models
Check whether the model's latest catalog version is compatible with the specified device profiles. The server resolves the model's latest size from its CatalogModelVersion artifact packages, then calls OptimizationService.check_compatibility_from_metadata with the model's framework and optional device targets.
Create a new Model in the internal registry (distinct from CatalogModelVariant — see workspace memory 'Model Registries' for the two-registry distinction). Operators create Models when they want to track a custom or fine-tuned model that isn't in the global catalog.
List Models in the caller's org with extensive filter support for the dashboard's model browser.
Register a new Version for a Model. Artifacts (concrete weights / package files) are uploaded separately via the upload routes; creating a Version is metadata-only.
List every Version for a Model. Powers the version timeline on the model detail page.
Delete a Model. Cascades to all versions and their artifact records; does NOT delete the underlying R2 objects (left for the cleanup job to GC) so an accidental delete can be recovered manually within the 7-day R2 lifecycle window.
Fetch a single model by id. Powers the model detail page.
Partial update of a Model. Flipping `enabled: false` immediately blocks new deployments from using the model but doesn't affect in-flight rollouts.
Delete a Version and cascade the deletion to its artifacts.
Return the set of filter values available for the models list page. Distinct frameworks come from Model.framework; capabilities come from CatalogModelVariant.task_taxonomy flattened across all variants whose name appears in the org's Model rows. Used to populate the filter panel dropdowns without a round-trip per option.
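The flattening described above can be sketched in a few lines. This is a minimal illustration, not the server's code: it assumes Model rows and variant rows are plain dicts and that `task_taxonomy` is a list of strings, and the function name `filter_options` is hypothetical.

```python
def filter_options(models, variants):
    """Collect distinct filter values in one pass (illustrative sketch)."""
    # Names of the org's Model rows; only variants with these names contribute.
    model_names = {m["name"] for m in models}
    # Distinct frameworks come straight from the Model rows.
    frameworks = sorted({m["framework"] for m in models})
    # Capabilities are the task_taxonomy lists flattened across matching variants.
    capabilities = sorted(
        {
            task
            for v in variants
            if v["name"] in model_names
            for task in v["task_taxonomy"]
        }
    )
    return {"frameworks": frameworks, "capabilities": capabilities}
```

Returning both lists in one response is what lets the filter panel populate every dropdown from a single call.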
Generate a presigned download URL for the primary weights ArtifactResource associated with an artifact package. The model_id is verified first to enforce org-scoped access. URL expires in 3600 seconds.
Return aggregated inference telemetry for a model within a time window. Combines InferenceSession rows (org-scoped, within the window) with fleet Device health status. Latency metrics are computed as percentile distributions. The fleet block summarises device availability independent of this model's inference history.
Generate an optimization plan for a model. The server resolves the model's latest catalog version size and delegates to OptimizationService.get_optimization_plan_from_metadata(). The plan describes recommended quantization strategies, format targets, and expected size/accuracy tradeoffs per device profile.
Return the pre-computed optimized configuration for a specific device type. Configurations are produced by running `octomil optimize` locally and uploading the result. This endpoint fetches the stored config artifact for the given model_id + device_type combination.
Return aggregated training-update statistics for a model, broken down by catalog version. Includes per-version update counts, sample counts, device counts, and the latest recorded metrics dict from TrainingUpdate. Scoped to the caller's org (Model.org_id == auth.org_id).
Fetch a Version by id without needing the parent Model.id. Used by deployment + rollout views that hold version ids but don't always track the parent model.
Return detailed training-update statistics for a specific model version. Computes per-metric percentiles (p50, p95, p99) and per-device breakdowns from all TrainingUpdate rows for the model_id + version combination. Returns a zero-count response (not 404) when no updates exist for the version.
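As a rough sketch of the percentile step and the zero-count behaviour, the following uses Python's standard `statistics.quantiles`. The interpolation method the server actually uses is not stated here, so the `inclusive` method, the function name `summarize_metric`, and the response field names are all assumptions.

```python
from statistics import quantiles


def summarize_metric(values):
    """Return count plus p50/p95/p99 for one metric (illustrative sketch)."""
    if not values:
        # Mirrors the zero-count response: an empty summary, not an error.
        return {"count": 0, "p50": None, "p95": None, "p99": None}
    if len(values) == 1:
        # A single sample is its own percentile at every level.
        v = float(values[0])
        return {"count": 1, "p50": v, "p95": v, "p99": v}
    # n=100 yields 99 cut points; index k-1 holds the k-th percentile.
    cuts = quantiles(values, n=100, method="inclusive")
    return {
        "count": len(values),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }
```

A caller can then distinguish "no updates yet" from "version not found" by checking `count` rather than catching an error.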
Return all artifact packages across every catalog version for a model. Uses CatalogModelVariant.resolve_by_name_for_org to select the correct variant row (org-override beats blessed row on name collision). Returns [] when no catalog variant exists for the model name.
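The precedence rule (an org override beats the blessed row when both share a name) can be illustrated as below. This is a sketch under stated assumptions, not the body of `CatalogModelVariant.resolve_by_name_for_org`: the `Variant` dataclass, the `pick_variant` helper, and the convention that a blessed row has `org_id=None` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Variant:
    name: str
    org_id: Optional[str]  # assumption: None marks the blessed (global) row


def pick_variant(
    variants: Sequence[Variant], name: str, org_id: str
) -> Optional[Variant]:
    """Prefer the caller's org override, then fall back to the blessed row."""
    matches = [v for v in variants if v.name == name]
    for v in matches:
        if v.org_id == org_id:
            return v  # org override wins on a name collision
    for v in matches:
        if v.org_id is None:
            return v  # blessed row is the fallback
    return None  # no variant: the endpoint would return []
```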
Return a manifest of all models as a name-keyed map. SDK clients use this for local caching: check the ETag header before re-parsing. The endpoint is public (no auth required) to allow CLI and CI tooling to fetch the manifest without an API key.
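The client-side caching pattern might look like the sketch below. It deliberately leaves out the HTTP call so it stays self-contained: the `ManifestCache` class and its `update` method are hypothetical, and the caller is assumed to pass in the `ETag` header value and the response body it received.

```python
import json


class ManifestCache:
    """Re-parse the manifest only when the ETag changes (illustrative sketch)."""

    def __init__(self):
        self.etag = None
        self.manifest = None
        self.parse_count = 0  # exposed only to make the behaviour observable

    def update(self, etag, body):
        # Same ETag as last time: reuse the parsed manifest and skip json.loads.
        if etag is not None and etag == self.etag:
            return self.manifest
        self.manifest = json.loads(body)
        self.etag = etag
        self.parse_count += 1
        return self.manifest
```

A response without an ETag is always re-parsed in this sketch, which is the conservative choice when freshness cannot be verified.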
Publish a draft ModelVersion. Validates that at least one artifact has been uploaded and checksummed, flips lifecycle_status to 'published', and makes the version visible to SDK clients for download.
Resolve a model by name and return its manifest entry. Used by SDK CLI tooling to look up a canonical model entry without knowing its UUID. Searches Model rows by name (case-sensitive exact match).
Transition a catalog version's lifecycle to 'deprecated'. Only versions with lifecycle == 'active' can be deprecated. The operation sets CatalogModelVersion.lifecycle = 'deprecated' and commits.
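The guard on this transition can be sketched as a small pure function. The dict-shaped version record, the `deprecate` helper, and the `LifecycleError` exception are hypothetical; how the server reports a rejected transition is not stated in this reference.

```python
class LifecycleError(Exception):
    """Raised when a version is not in a state that allows deprecation."""


def deprecate(version: dict) -> dict:
    """Return a copy of the version with lifecycle set to 'deprecated'."""
    # Only 'active' versions may be deprecated; anything else is rejected.
    if version["lifecycle"] != "active":
        raise LifecycleError(
            f"cannot deprecate a version in state {version['lifecycle']!r}"
        )
    return {**version, "lifecycle": "deprecated"}
```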
Generate presigned download URLs for all stored resources of a catalog version. Iterates all ArtifactPackage rows for the version, then all ArtifactResource rows with a non-null storage_key. Resources that fail presigning are silently skipped. Returns {source: 'catalog', resources: []} when no resources have storage_keys.
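The skip-on-failure loop might be sketched as follows. The `presign` callable stands in for whatever storage client the server uses and is an assumption, as are the helper name `presign_all` and the per-resource `url` field; only the `source` and `resources` keys come from the description above.

```python
def presign_all(resources, presign):
    """Presign every resource that has a storage_key, skipping failures."""
    out = []
    for resource in resources:
        key = resource.get("storage_key")
        if key is None:
            continue  # nothing stored for this resource
        try:
            url = presign(key)
        except Exception:
            continue  # failures are skipped silently, as described above
        out.append({"storage_key": key, "url": url})
    # The envelope is returned even when no resource could be presigned.
    return {"source": "catalog", "resources": out}
```

Because failures are swallowed, a client that expects a specific file should check that it actually appears in `resources`.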
List all artifact packages for a specific catalog version. Does not require model_id; the version_id directly addresses the CatalogModelVersion row. Returns [] when no packages exist for the version.
Create the catalog version record after a successful presigned PUT upload. Verifies the object exists in S3 before creating the ArtifactPackage and ArtifactResource rows. Auto-creates the internal Model row on first confirm for slug-shaped model_ids (OCT-46 #3). Framework is derived from the storage_key file extension.
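Deriving the framework from the file extension could look like the sketch below. Only `.pt` and `.onnx` are named in this reference; the framework labels, the `EXTENSION_TO_FRAMEWORK` table, and the `None` result for unknown extensions are assumptions made for illustration.

```python
import os

# Assumption: label strings are illustrative; only the two extensions
# mentioned in this reference are mapped.
EXTENSION_TO_FRAMEWORK = {".pt": "pytorch", ".onnx": "onnx"}


def framework_from_storage_key(storage_key):
    """Map the storage key's file extension to a framework label, or None."""
    _, ext = os.path.splitext(storage_key)
    # Compare case-insensitively so "MODEL.ONNX" and "model.onnx" agree.
    return EXTENSION_TO_FRAMEWORK.get(ext.lower())
```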
Return a presigned S3 PUT URL so the client can upload a model artifact directly to object storage, bypassing the server. The storage key is derived from org_id/model_name/version/format+extension. After a successful PUT, call models.versions.confirm_upload to create the version record.
Upload a model weights file and convert it to one or more mobile artifact formats server-side. Supports PyTorch (.pt) and ONNX (.onnx) inputs. Server runs Docker-based conversion for .pt files. Enforces plan gate (model_versions_monthly) before accepting the upload. Auto-creates the internal Model row on first upload for slug-shaped model_ids (OCT-46 #3).
Upload a primary weights file plus N sidecar files as a single coherent artifact bundle. All files are grouped into ONE ArtifactPackage row with N+1 ArtifactResource rows. This is required for models that need sidecar files to be co-located at runtime (e.g. piper-vits TTS: model.onnx + tokens.txt + espeak-ng-data). Enforces plan gate (model_versions_monthly). len(extras) MUST equal len(extras_kind); 422 on mismatch.
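A client can check the length constraint before sending the request. The sketch below pairs each sidecar with its kind and shows the resulting one-package, N+1-resource shape; the `plan_bundle` helper, the `"weights"` kind label for the primary file, and the dict layout are hypothetical, and a `ValueError` stands in for the server's 422.

```python
def plan_bundle(primary, extras, extras_kind):
    """Describe the single package a bundle upload would create (sketch)."""
    # Each extra needs exactly one kind; the server answers 422 on a mismatch.
    if len(extras) != len(extras_kind):
        raise ValueError(
            f"len(extras)={len(extras)} != len(extras_kind)={len(extras_kind)}"
        )
    resources = [{"filename": primary, "kind": "weights"}]  # label is assumed
    resources += [
        {"filename": name, "kind": kind}
        for name, kind in zip(extras, extras_kind)
    ]
    # One package row holding N+1 resource rows.
    return {"package_count": 1, "resources": resources}
```

For example, `plan_bundle("model.onnx", ["tokens.txt"], ["tokens"])` describes one package with two resources; the kind string `"tokens"` is illustrative only.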
Upload pre-converted model artifact files — no server-side conversion. Creates one ArtifactPackage row per file. Use this path when artifacts are produced locally via `octomil convert`. Enforces plan gate (model_versions_monthly). Note: for coherent multi-file bundles (e.g. ONNX + tokenizer + sidecar) use models.versions.upload_bundle instead, which groups all files into a single ArtifactPackage.