Deployment Validation

The compatibility check endpoint validates whether a model can run on target devices before you deploy. When a model doesn't fit, it suggests alternatives ranked by relevance.

Quick Start

cURL
Python
JavaScript

curl -X POST https://api.octomil.com/api/v1/deploy/check-compatibility \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "gemma3-4b",
    "devices": ["device-abc-123"]
  }'

import requests

response = requests.post(
    "https://api.octomil.com/api/v1/deploy/check-compatibility",
    headers={"Authorization": "Bearer <token>"},
    json={
        "model_name": "gemma3-4b",
        "devices": ["device-abc-123"],
    },
)
print(response.json())

const response = await fetch(
  "https://api.octomil.com/api/v1/deploy/check-compatibility",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer <token>",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model_name: "gemma3-4b",
      devices: ["device-abc-123"],
    }),
  }
);
const data = await response.json();
console.log(data);

How It Works

For each target device, the compatibility check:

Classifies the device class (DeviceClass: flagship, high, mid, low) based on RAM, NPU, and chip family
Resolves the optimal artifact format (ArtifactFormat: coreml, tflite, gguf, onnx, mlx, mnn) and executor delegate (ExecutorDelegate: coreml_ane, nnapi, xnnpack, gpu_delegate, mlx, llama_cpp, etc.)
Checks model size against the device's maximum capacity
Checks format availability and flags if conversion is needed
Estimates performance (tok/s, latency) from benchmark data when available
Finds alternatives if the model doesn't fit

Canonical Types

Device classification, artifact formats, and executor delegates are defined as canonical domain types in octomil-contracts/enums/. The server imports them from server/app/domain/types/. See the contracts repo for the YAML definitions.

Device Classes

Class	RAM	Examples	Max Model Size
`flagship`	8 GB+	iPhone 15 Pro, Galaxy S24, Pixel 9	2048 MB
`high`	6 GB+	iPhone 14, Galaxy S22, Pixel 7	1024 MB
`mid`	4-6 GB	Mid-range Android devices	500 MB
`low`	<4 GB	Budget phones	200 MB

Response Structure

{
  "model_id": "model-abc",
  "model_name": "gemma3-4b",
  "model_version": "1.0.0",
  "model_size_bytes": 4200000000,
  "all_compatible": false,
  "devices": [
    {
      "device_id": "device-abc-123",
      "device_class": "mid",
      "compatible": false,
      "format": "tflite",
      "executor": "xnnpack",
      "quantization": "float32",
      "issues": [
        {
          "severity": "error",
          "message": "Model size (4000 MB) exceeds device limit (500 MB)",
          "suggestion": "Use a quantized variant (int8/int4) or a smaller model. Max for mid tier: 500 MB."
        }
      ],
      "performance": null,
      "runtime_config": {
        "engine": "xnnpack",
        "format": "tflite",
        "compute_units": "cpu_and_gpu"
      }
    }
  ],
  "alternatives": [
    {
      "model_id": "model-def",
      "model_name": "gemma3-1b",
      "version": "1.0.0",
      "size_bytes": 200000000,
      "reason": "Same family, fits within 500 MB limit",
      "same_family": true,
      "estimated_tokens_per_second": null
    },
    {
      "model_id": "model-ghi",
      "model_name": "phi-mini",
      "version": "2.0.0",
      "size_bytes": 150000000,
      "reason": "Alternative model (150 MB), fits within 500 MB limit",
      "same_family": false,
      "estimated_tokens_per_second": null
    }
  ]
}

Issue Severity

Severity	Meaning	Deploy blocked?
`error`	Model cannot run on this device	Yes
`warning`	Model can run but needs conversion or has caveats	No

Common issues:

Model too large (error) -- model size exceeds device class maximum
Format not available (warning) -- the optimal format hasn't been converted yet, but conversion will happen during deployment

Alternative Ranking

When a model is incompatible, the system suggests alternatives using closest-first ranking:

Same family, smaller size -- gemma3-4b to gemma3-1b
Cross-family alternatives -- gemma3-4b to phi-mini or smollm-360m

Within each group, alternatives are sorted by size (ascending). Up to 10 alternatives are returned.

Check and Deploy

Check compatibility and deploy using the API:

Check compatibility

cURL
Python
JavaScript

curl -X POST https://api.octomil.com/api/v1/deploy/check-compatibility \
  -H "Authorization: Bearer edg_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "gemma3-4b",
    "devices": ["device-abc-123", "device-def-456"]
  }'

import requests

response = requests.post(
    "https://api.octomil.com/api/v1/deploy/check-compatibility",
    headers={"Authorization": "Bearer edg_..."},
    json={
        "model_name": "gemma3-4b",
        "devices": ["device-abc-123", "device-def-456"],
    },
)
print(response.json())

const response = await fetch(
  "https://api.octomil.com/api/v1/deploy/check-compatibility",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer edg_...",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model_name: "gemma3-4b",
      devices: ["device-abc-123", "device-def-456"],
    }),
  }
);
const data = await response.json();
console.log(data);

Execute deployment

If compatible, deploy:

cURL
Python
JavaScript

curl -X POST https://api.octomil.com/api/v1/deploy/execute \
  -H "Authorization: Bearer edg_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "gemma3-4b",
    "devices": ["device-abc-123", "device-def-456"]
  }'

import requests

response = requests.post(
    "https://api.octomil.com/api/v1/deploy/execute",
    headers={"Authorization": "Bearer edg_..."},
    json={
        "model_name": "gemma3-4b",
        "devices": ["device-abc-123", "device-def-456"],
    },
)
print(response.json())

const response = await fetch(
  "https://api.octomil.com/api/v1/deploy/execute",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer edg_...",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model_name: "gemma3-4b",
      devices: ["device-abc-123", "device-def-456"],
    }),
  }
);
const data = await response.json();
console.log(data);

Or use the Python SDK:

from octomil import ModelRegistry

registry = ModelRegistry(api_key="edg_...")

# Check compatibility via the registry
# Deploy via the dashboard or REST API after confirming compatibility

Gotchas

Compatibility check does not reserve capacity — a successful check doesn't guarantee the device will have enough memory at deploy time. Another app or model could consume memory between check and deploy.
Device tier is based on specs, not state — a flagship device with 2GB free RAM is still classified as "flagship" (8GB+ total). The tier reflects hardware capability, not current availability.
Format conversion happens during deployment — if the optimal format (e.g., CoreML) hasn't been pre-converted, deployment triggers conversion. This adds minutes to the first deploy. Pre-convert by uploading with formats="coreml,tflite".
Alternatives are from your catalog only — the system suggests models you've already uploaded. It doesn't recommend models from Hugging Face or other sources.
Performance estimates require benchmark data — estimated_tokens_per_second is null if no one has submitted benchmarks for that model/device combination. Contribute with octomil benchmark --share.

Rollouts — gradual deployment after compatibility is confirmed
Device Profiling — benchmark data that feeds performance estimates
Model Catalog — model versioning and available formats
Local Inference — local inference with engine auto-selection

Quick Start​

How It Works​

Device Classes​

Response Structure​

Issue Severity​

Alternative Ranking​

Check and Deploy​

Check compatibility​

Execute deployment​

Gotchas​

Related​