Deployment Validation

The compatibility check endpoint validates whether a model can run on target devices before you deploy. When a model doesn't fit, it suggests alternatives ranked by relevance.

Quick Start

curl -X POST https://api.octomil.com/api/v1/deploy/check-compatibility \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "gemma-4b",
    "devices": ["device-abc-123"]
  }'

How It Works

For each target device, the compatibility check:

  1. Classifies the device tier (flagship, high, mid, low) based on RAM, NPU, and chip family
  2. Resolves the optimal format (CoreML, TFLite, GGUF) and executor (ANE, NNAPI, XNNPACK, Metal)
  3. Checks model size against the device's maximum capacity
  4. Checks format availability and flags if conversion is needed
  5. Estimates performance (tok/s, latency) from benchmark data when available
  6. Finds alternatives if the model doesn't fit
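The steps above can be sketched in a few lines of Python. This is an illustrative model of the check, not the actual server implementation; names like check_device and TIER_LIMITS_MB are hypothetical, and the size limits come from the tier table below.

```python
# Illustrative sketch of the per-device compatibility check.
# Limits (in MB) match the device-tier table in this doc.
TIER_LIMITS_MB = {"flagship": 2048, "high": 1024, "mid": 500, "low": 200}

def check_device(model_size_mb, tier, available_formats, optimal_format):
    """Return (compatible, issues) for one target device."""
    issues = []
    limit = TIER_LIMITS_MB[tier]
    if model_size_mb > limit:  # step 3: size vs. tier capacity
        issues.append({
            "severity": "error",
            "message": f"Model size ({model_size_mb} MB) exceeds device limit ({limit} MB)",
        })
    if optimal_format not in available_formats:  # step 4: format availability
        issues.append({
            "severity": "warning",
            "message": f"Format {optimal_format} not available; conversion needed",
        })
    # Only error-severity issues make a device incompatible (see Issue Severity).
    compatible = not any(i["severity"] == "error" for i in issues)
    return compatible, issues
```

A 4 GB model on a mid-tier device produces a size error plus a conversion warning; a 150 MB model already converted to the optimal format passes cleanly.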

Device Tiers

| Tier     | RAM    | Examples                           | Max Model Size |
|----------|--------|------------------------------------|----------------|
| Flagship | 8 GB+  | iPhone 15 Pro, Galaxy S24, Pixel 9 | 2048 MB        |
| High     | 6 GB+  | iPhone 14, Galaxy S22, Pixel 7     | 1024 MB        |
| Mid      | 4-6 GB | Mid-range Android devices          | 500 MB         |
| Low      | <4 GB  | Budget phones                      | 200 MB         |
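The RAM thresholds in the table map to tiers as follows. Note this sketch uses RAM only; the real classifier also weighs NPU and chip family, which are not modeled here.

```python
def classify_tier(ram_gb: float) -> str:
    """Map total device RAM to a tier, per the thresholds in the table.
    Simplified: the actual classifier also considers NPU and chip family."""
    if ram_gb >= 8:
        return "flagship"
    if ram_gb >= 6:
        return "high"
    if ram_gb >= 4:
        return "mid"
    return "low"
```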

Response Structure

{
  "model_id": "model-abc",
  "model_name": "gemma-4b",
  "model_version": "1.0.0",
  "model_size_bytes": 4200000000,
  "all_compatible": false,
  "devices": [
    {
      "device_id": "device-abc-123",
      "device_tier": "mid",
      "compatible": false,
      "format": "tflite",
      "executor": "xnnpack",
      "quantization": "float32",
      "issues": [
        {
          "severity": "error",
          "message": "Model size (4000 MB) exceeds device limit (500 MB)",
          "suggestion": "Use a quantized variant (int8/int4) or a smaller model. Max for mid tier: 500 MB."
        }
      ],
      "performance": null,
      "runtime_config": {
        "engine": "xnnpack",
        "format": "tflite",
        "compute_units": "cpu_and_gpu"
      }
    }
  ],
  "alternatives": [
    {
      "model_id": "model-def",
      "model_name": "gemma-1b",
      "version": "1.0.0",
      "size_bytes": 200000000,
      "reason": "Same family, fits within 500 MB limit",
      "same_family": true,
      "estimated_tokens_per_second": null
    },
    {
      "model_id": "model-ghi",
      "model_name": "phi-mini",
      "version": "2.0.0",
      "size_bytes": 150000000,
      "reason": "Alternative model (150 MB), fits within 500 MB limit",
      "same_family": false,
      "estimated_tokens_per_second": null
    }
  ]
}
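A client typically reduces this response to a per-device verdict plus fallback options. A minimal sketch (summarize is a hypothetical helper, not part of the SDK):

```python
def summarize(resp: dict) -> list[str]:
    """Flatten a compatibility response into one line per device,
    plus suggested alternatives when anything is incompatible."""
    lines = []
    for dev in resp["devices"]:
        status = "ok" if dev["compatible"] else "blocked"
        lines.append(f"{dev['device_id']}: {status} ({dev['format']}/{dev['executor']})")
    if not resp["all_compatible"]:
        for alt in resp["alternatives"]:
            lines.append(f"  try {alt['model_name']} ({alt['size_bytes'] // 1_000_000} MB)")
    return lines
```

For the example response above, this yields one "blocked" line for the mid-tier device followed by the two alternatives.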

Issue Severity

| Severity | Meaning                                           | Deploy blocked? |
|----------|---------------------------------------------------|-----------------|
| error    | Model cannot run on this device                   | Yes             |
| warning  | Model can run but needs conversion or has caveats | No              |

Common issues:

  • Model too large (error) -- model size exceeds device tier's maximum
  • Format not available (warning) -- the optimal format hasn't been converted yet, but conversion will happen during deployment
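The gating rule reduces to a one-liner when processing the issues array client-side (an illustrative helper, not part of the SDK):

```python
def deploy_blocked(issues: list[dict]) -> bool:
    """A device blocks deployment only if it has an error-severity issue;
    warnings (e.g. pending format conversion) do not block."""
    return any(i["severity"] == "error" for i in issues)
```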

Alternative Ranking

When a model is incompatible, the system suggests alternatives using closest-first ranking:

  1. Same family, smaller size -- gemma-4b to gemma-1b
  2. Cross-family alternatives -- gemma-4b to phi-mini or smollm-360m

Within each group, alternatives are sorted by size (ascending). Up to 10 alternatives are returned.
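The ranking rule can be expressed as a single sort key: same-family first, then ascending size, capped at 10. A sketch of that logic (rank_alternatives is an illustrative name; the real ranking runs server-side):

```python
def rank_alternatives(alts: list[dict], limit_mb: int) -> list[dict]:
    """Closest-first ranking: models that fit the tier limit, same-family
    before cross-family, smaller sizes first, at most 10 results."""
    fitting = [a for a in alts if a["size_bytes"] <= limit_mb * 1_000_000]
    fitting.sort(key=lambda a: (not a["same_family"], a["size_bytes"]))
    return fitting[:10]
```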

Check and Deploy

Check compatibility and deploy using the API:

Check compatibility

curl -X POST https://api.octomil.com/api/v1/deploy/check-compatibility \
  -H "Authorization: Bearer edg_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "gemma-4b",
    "devices": ["device-abc-123", "device-def-456"]
  }'

Execute deployment

If compatible, deploy:

curl -X POST https://api.octomil.com/api/v1/deploy/execute \
  -H "Authorization: Bearer edg_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "gemma-4b",
    "devices": ["device-abc-123", "device-def-456"]
  }'

Or use the Python SDK:

from octomil import ModelRegistry

registry = ModelRegistry(api_key="edg_...")

# Check compatibility via the registry
# Deploy via the dashboard or REST API after confirming compatibility
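The two REST calls above can also be chained directly from Python with the requests library. This is a sketch; deploy_if_compatible is an illustrative helper, not part of the octomil SDK:

```python
import requests

API = "https://api.octomil.com/api/v1"

def deploy_if_compatible(token: str, model_name: str, devices: list[str]) -> list[dict]:
    """Check compatibility for all devices; deploy only if every device passes.
    Returns [] after deploying, or the suggested alternatives otherwise."""
    headers = {"Authorization": f"Bearer {token}"}
    payload = {"model_name": model_name, "devices": devices}
    check = requests.post(f"{API}/deploy/check-compatibility",
                          json=payload, headers=headers)
    check.raise_for_status()
    result = check.json()
    if not result["all_compatible"]:
        # Let the caller pick a fallback (e.g. a smaller same-family model).
        return result["alternatives"]
    requests.post(f"{API}/deploy/execute",
                  json=payload, headers=headers).raise_for_status()
    return []
```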

Gotchas

  • Compatibility check does not reserve capacity — a successful check doesn't guarantee the device will have enough memory at deploy time. Another app or model could consume memory between check and deploy.
  • Device tier is based on specs, not state — a flagship device with 2GB free RAM is still classified as "flagship" (8GB+ total). The tier reflects hardware capability, not current availability.
  • Format conversion happens during deployment — if the optimal format (e.g., CoreML) hasn't been pre-converted, deployment triggers conversion. This adds minutes to the first deploy. Pre-convert by uploading with formats="coreml,tflite".
  • Alternatives are from your catalog only — the system suggests models you've already uploaded. It doesn't recommend models from Hugging Face or other sources.
  • Performance estimates require benchmark data — estimated_tokens_per_second is null if no one has submitted benchmarks for that model/device combination. Contribute with octomil benchmark --share.