
Deployment Validation

The compatibility check endpoint validates whether a model can run on target devices before you deploy. When a model doesn't fit, it suggests alternatives ranked by relevance.

Quick Start

curl -X POST https://api.octomil.com/api/v1/deploy/check-compatibility \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"model_name": "gemma3-4b",
"devices": ["device-abc-123"]
}'

How It Works

For each target device, the compatibility check:

  1. Assigns the device a class (DeviceClass: flagship, high, mid, low) based on RAM, NPU, and chip family
  2. Resolves the optimal artifact format (ArtifactFormat: coreml, tflite, gguf, onnx, mlx, mnn) and executor delegate (ExecutorDelegate: coreml_ane, nnapi, xnnpack, gpu_delegate, mlx, llama_cpp, etc.)
  3. Checks model size against the device's maximum capacity
  4. Checks format availability and flags if conversion is needed
  5. Estimates performance (tok/s, latency) from benchmark data when available
  6. Finds alternatives if the model doesn't fit

Canonical Types

Device classification, artifact formats, and executor delegates are defined as canonical domain types in octomil-contracts/enums/. The server imports them from server/app/domain/types/. See the contracts repo for the YAML definitions.

Device Classes

Class     RAM     Examples                            Max Model Size
flagship  8 GB+   iPhone 15 Pro, Galaxy S24, Pixel 9  2048 MB
high      6 GB+   iPhone 14, Galaxy S22, Pixel 7      1024 MB
mid       4-6 GB  Mid-range Android devices           500 MB
low       <4 GB   Budget phones                       200 MB
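
The table above can be read as a lookup from device class to a size ceiling. The following is a minimal sketch of that lookup, not the server's implementation. It classifies on RAM alone (the real classifier also weighs NPU and chip family), it places a device with exactly 6 GB in "high", and it assumes 1 MB = 1024 * 1024 bytes. The function names are hypothetical.

```python
from typing import Optional

# Assumption: the docs do not say whether MB means 10^6 or 2^20 bytes.
MB = 1024 * 1024

# Max model size per device class, copied from the table above.
MAX_MODEL_SIZE_MB = {"flagship": 2048, "high": 1024, "mid": 500, "low": 200}


def classify_by_ram(ram_gb: float) -> str:
    """Map total RAM to a device class using only the RAM column.

    Hypothetical helper: NPU and chip family are ignored here.
    """
    if ram_gb >= 8:
        return "flagship"
    if ram_gb >= 6:
        return "high"
    if ram_gb >= 4:
        return "mid"
    return "low"


def size_issue(model_size_bytes: int, device_class: str) -> Optional[str]:
    """Return an error message if the model exceeds the class limit, else None."""
    limit_mb = MAX_MODEL_SIZE_MB[device_class]
    size_mb = model_size_bytes / MB
    if size_mb > limit_mb:
        return f"Model size ({size_mb:.0f} MB) exceeds device limit ({limit_mb} MB)"
    return None


if __name__ == "__main__":
    device_class = classify_by_ram(4)  # a 4 GB device
    print(device_class, size_issue(4_200_000_000, device_class))
```

Note that classification uses total RAM, so the result does not change with how much memory is currently free.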

Response Structure

{
  "model_id": "model-abc",
  "model_name": "gemma3-4b",
  "model_version": "1.0.0",
  "model_size_bytes": 4200000000,
  "all_compatible": false,
  "devices": [
    {
      "device_id": "device-abc-123",
      "device_class": "mid",
      "compatible": false,
      "format": "tflite",
      "executor": "xnnpack",
      "quantization": "float32",
      "issues": [
        {
          "severity": "error",
          "message": "Model size (4000 MB) exceeds device limit (500 MB)",
          "suggestion": "Use a quantized variant (int8/int4) or a smaller model. Max for mid tier: 500 MB."
        }
      ],
      "performance": null,
      "runtime_config": {
        "engine": "xnnpack",
        "format": "tflite",
        "compute_units": "cpu_and_gpu"
      }
    }
  ],
  "alternatives": [
    {
      "model_id": "model-def",
      "model_name": "gemma3-1b",
      "version": "1.0.0",
      "size_bytes": 200000000,
      "reason": "Same family, fits within 500 MB limit",
      "same_family": true,
      "estimated_tokens_per_second": null
    },
    {
      "model_id": "model-ghi",
      "model_name": "phi-mini",
      "version": "2.0.0",
      "size_bytes": 150000000,
      "reason": "Alternative model (150 MB), fits within 500 MB limit",
      "same_family": false,
      "estimated_tokens_per_second": null
    }
  ]
}

Issue Severity

Severity  Meaning                                            Deploy blocked?
error     Model cannot run on this device                    Yes
warning   Model can run but needs conversion or has caveats  No

Common issues:

  • Model too large (error) -- model size exceeds device class maximum
  • Format not available (warning) -- the optimal format hasn't been converted yet, but conversion will happen during deployment
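
A client can apply the severity rule above to a parsed response to find which devices are blocked. The sketch below assumes the response has already been decoded into a Python dict with the field names shown under Response Structure. The function name blocking_issues is hypothetical and not part of any SDK.

```python
from typing import Dict, List


def blocking_issues(response: dict) -> Dict[str, List[str]]:
    """Map each device_id to its "error" messages; warnings are ignored."""
    blocked: Dict[str, List[str]] = {}
    for device in response.get("devices", []):
        errors = [
            issue["message"]
            for issue in device.get("issues", [])
            if issue.get("severity") == "error"
        ]
        if errors:
            blocked[device["device_id"]] = errors
    return blocked


if __name__ == "__main__":
    # Trimmed version of the sample response from this page.
    sample = {
        "all_compatible": False,
        "devices": [
            {
                "device_id": "device-abc-123",
                "issues": [
                    {
                        "severity": "error",
                        "message": "Model size (4000 MB) exceeds device limit (500 MB)",
                    }
                ],
            }
        ],
    }
    for device_id, messages in blocking_issues(sample).items():
        print(device_id, messages)
```

Devices that carry only warnings do not appear in the result, which matches the "Deploy blocked?" column.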

Alternative Ranking

When a model is incompatible, the system suggests alternatives using closest-first ranking:

  1. Same family, smaller size -- gemma3-4b to gemma3-1b
  2. Cross-family alternatives -- gemma3-4b to phi-mini or smollm-360m

Within each group, alternatives are sorted by size (ascending). Up to 10 alternatives are returned.
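
The ordering described above can be sketched as a two-part sort key: family group first, then size. This is an illustration only, not the server's ranking code. It assumes each candidate is a dict with the same_family and size_bytes fields from the response, and that candidates over the device limit are filtered out first. The function name and the size_limit_bytes parameter are hypothetical.

```python
from typing import Dict, List


def rank_alternatives(
    candidates: List[Dict], size_limit_bytes: int, max_results: int = 10
) -> List[Dict]:
    """Order candidates same-family first, then by ascending size, capped at max_results."""
    # Assumption: only models that fit the device limit are suggested.
    fitting = [c for c in candidates if c["size_bytes"] <= size_limit_bytes]
    # False sorts before True, so `not same_family` puts same-family models first.
    ranked = sorted(fitting, key=lambda c: (not c["same_family"], c["size_bytes"]))
    return ranked[:max_results]


if __name__ == "__main__":
    # The two alternatives from the sample response on this page.
    candidates = [
        {"model_name": "phi-mini", "size_bytes": 150000000, "same_family": False},
        {"model_name": "gemma3-1b", "size_bytes": 200000000, "same_family": True},
    ]
    limit = 500 * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes
    for alt in rank_alternatives(candidates, limit):
        print(alt["model_name"])
```

With this key, gemma3-1b is listed before phi-mini even though it is larger, which is the order shown in the sample response.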

Check and Deploy

Check compatibility and deploy using the API:

Check compatibility

curl -X POST https://api.octomil.com/api/v1/deploy/check-compatibility \
-H "Authorization: Bearer edg_..." \
-H "Content-Type: application/json" \
-d '{
"model_name": "gemma3-4b",
"devices": ["device-abc-123", "device-def-456"]
}'

Execute deployment

If compatible, deploy:

curl -X POST https://api.octomil.com/api/v1/deploy/execute \
-H "Authorization: Bearer edg_..." \
-H "Content-Type: application/json" \
-d '{
"model_name": "gemma3-4b",
"devices": ["device-abc-123", "device-def-456"]
}'

Or use the Python SDK:

from octomil import ModelRegistry

registry = ModelRegistry(api_key="edg_...")

# Check compatibility via the registry
# Deploy via the dashboard or REST API after confirming compatibility
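
The check-then-deploy flow can also be scripted directly against the two REST endpoints shown above, using only the Python standard library. This is a rough sketch under stated assumptions: the helper names (build_request, send, check_then_deploy) are hypothetical, the deploy step is gated on all_compatible as a design choice of this example, and the shape of the /deploy/execute response is not documented here, so it is returned unparsed beyond JSON decoding.

```python
import json
import urllib.request
from typing import Callable, List

BASE_URL = "https://api.octomil.com/api/v1/deploy"


def build_request(
    path: str, token: str, model_name: str, devices: List[str]
) -> urllib.request.Request:
    """Build the POST request used by both endpoints (same body as the curl examples)."""
    body = json.dumps({"model_name": model_name, "devices": devices}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{path}",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


def check_then_deploy(
    token: str,
    model_name: str,
    devices: List[str],
    send_fn: Callable[[urllib.request.Request], dict] = send,
) -> dict:
    """Run the compatibility check and deploy only if every device passed."""
    check = send_fn(build_request("check-compatibility", token, model_name, devices))
    if not check.get("all_compatible", False):
        # Surface the check result so the caller can inspect issues and alternatives.
        return {"deployed": False, "check": check}
    result = send_fn(build_request("execute", token, model_name, devices))
    return {"deployed": True, "check": check, "result": result}
```

Passing send_fn as a parameter lets the flow be exercised with a stub instead of a live call. A passing check still does not reserve capacity, so the deploy step can fail even after all_compatible was true.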

Gotchas

  • Compatibility check does not reserve capacity -- a successful check doesn't guarantee the device will have enough memory at deploy time. Another app or model could consume memory between check and deploy.
  • Device tier is based on specs, not state -- a flagship device with 2 GB of free RAM is still classified as "flagship" (8 GB+ total). The tier reflects hardware capability, not current availability.
  • Format conversion happens during deployment -- if the optimal format (e.g., CoreML) hasn't been pre-converted, deployment triggers conversion. This adds minutes to the first deploy. Pre-convert by uploading with formats="coreml,tflite".
  • Alternatives are from your catalog only -- the system suggests models you've already uploaded. It doesn't recommend models from Hugging Face or other sources.
  • Performance estimates require benchmark data -- estimated_tokens_per_second is null if no one has submitted benchmarks for that model/device combination. Contribute with octomil benchmark --share.