# Deployment Validation
The compatibility check endpoint validates whether a model can run on target devices before you deploy. When a model doesn't fit, it suggests alternatives ranked by relevance.
## Quick Start

**cURL**

```bash
curl -X POST https://api.octomil.com/api/v1/deploy/check-compatibility \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "gemma-4b",
    "devices": ["device-abc-123"]
  }'
```
**Python**

```python
import requests

response = requests.post(
    "https://api.octomil.com/api/v1/deploy/check-compatibility",
    headers={"Authorization": "Bearer <token>"},
    json={
        "model_name": "gemma-4b",
        "devices": ["device-abc-123"],
    },
)
print(response.json())
```
**JavaScript**

```javascript
const response = await fetch(
  "https://api.octomil.com/api/v1/deploy/check-compatibility",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer <token>",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model_name: "gemma-4b",
      devices: ["device-abc-123"],
    }),
  }
);
const data = await response.json();
console.log(data);
```
## How It Works
For each target device, the compatibility check:
- Classifies the device tier (flagship, high, mid, low) based on RAM, NPU, and chip family
- Resolves the optimal format (CoreML, TFLite, GGUF) and executor (ANE, NNAPI, XNNPACK, Metal)
- Checks model size against the device's maximum capacity
- Checks format availability and flags if conversion is needed
- Estimates performance (tok/s, latency) from benchmark data when available
- Finds alternatives if the model doesn't fit
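The format/executor resolution step can be sketched as a simple decision rule. This is an illustrative simplification based on the format and executor names listed above, not the service's actual resolution tables (GGUF and desktop targets are omitted for brevity):

```python
def resolve_runtime(platform: str, has_npu: bool) -> tuple[str, str]:
    """Pick a (format, executor) pair for a device platform.

    Hypothetical mapping: Apple devices get CoreML (ANE when an NPU is
    present, Metal otherwise); Android devices get TFLite (NNAPI with an
    NPU, XNNPACK on CPU otherwise).
    """
    if platform == "ios":
        return ("coreml", "ane" if has_npu else "metal")
    return ("tflite", "nnapi" if has_npu else "xnnpack")
```

For example, a mid-tier Android phone without an NPU resolves to `("tflite", "xnnpack")`, which matches the sample response later in this page.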
## Device Tiers
| Tier | RAM | Examples | Max Model Size |
|---|---|---|---|
| Flagship | 8 GB+ | iPhone 15 Pro, Galaxy S24, Pixel 9 | 2048 MB |
| High | 6 GB+ | iPhone 14, Galaxy S22, Pixel 7 | 1024 MB |
| Mid | 4-6 GB | Mid-range Android devices | 500 MB |
| Low | <4 GB | Budget phones | 200 MB |
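The table above can be expressed as a small lookup. A minimal sketch, classifying by total RAM only (the real check also weighs NPU and chip family; the function names here are illustrative):

```python
# Tier -> maximum on-device model size in MB, mirroring the table above.
TIER_LIMITS_MB = {"flagship": 2048, "high": 1024, "mid": 500, "low": 200}

def classify_tier(ram_gb: float) -> str:
    """Classify a device tier from total RAM (a simplification)."""
    if ram_gb >= 8:
        return "flagship"
    if ram_gb >= 6:
        return "high"
    if ram_gb >= 4:
        return "mid"
    return "low"

def fits(model_size_mb: float, ram_gb: float) -> bool:
    """Does a model of this size fit the device's tier limit?"""
    return model_size_mb <= TIER_LIMITS_MB[classify_tier(ram_gb)]
```

For instance, `fits(4000, 4)` is `False`: a 4000 MB model exceeds the 500 MB limit of a 4 GB ("mid") device, matching the error in the sample response below.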
## Response Structure

```json
{
  "model_id": "model-abc",
  "model_name": "gemma-4b",
  "model_version": "1.0.0",
  "model_size_bytes": 4200000000,
  "all_compatible": false,
  "devices": [
    {
      "device_id": "device-abc-123",
      "device_tier": "mid",
      "compatible": false,
      "format": "tflite",
      "executor": "xnnpack",
      "quantization": "float32",
      "issues": [
        {
          "severity": "error",
          "message": "Model size (4000 MB) exceeds device limit (500 MB)",
          "suggestion": "Use a quantized variant (int8/int4) or a smaller model. Max for mid tier: 500 MB."
        }
      ],
      "performance": null,
      "runtime_config": {
        "engine": "xnnpack",
        "format": "tflite",
        "compute_units": "cpu_and_gpu"
      }
    }
  ],
  "alternatives": [
    {
      "model_id": "model-def",
      "model_name": "gemma-1b",
      "version": "1.0.0",
      "size_bytes": 200000000,
      "reason": "Same family, fits within 500 MB limit",
      "same_family": true,
      "estimated_tokens_per_second": null
    },
    {
      "model_id": "model-ghi",
      "model_name": "phi-mini",
      "version": "2.0.0",
      "size_bytes": 150000000,
      "reason": "Alternative model (150 MB), fits within 500 MB limit",
      "same_family": false,
      "estimated_tokens_per_second": null
    }
  ]
}
```
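A typical client walks the `devices` array and collects blocking issues per device. A minimal sketch against the schema above (the helper name is illustrative):

```python
def blocking_issues(resp: dict) -> dict:
    """Map device_id -> list of error messages for devices that
    cannot deploy. Warnings are ignored here; they don't block."""
    out = {}
    for dev in resp["devices"]:
        errors = [
            issue["message"]
            for issue in dev.get("issues", [])
            if issue["severity"] == "error"
        ]
        if errors:
            out[dev["device_id"]] = errors
    return out
```

Applied to the sample response above, this returns one entry for `device-abc-123` with its size-limit error.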
## Issue Severity

| Severity | Meaning | Deploy blocked? |
|---|---|---|
| error | Model cannot run on this device | Yes |
| warning | Model can run but needs conversion or has caveats | No |
Common issues:
- Model too large (error) -- model size exceeds device tier's maximum
- Format not available (warning) -- the optimal format hasn't been converted yet, but conversion will happen during deployment
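Per the table above, only `error` issues block a deploy; `warning` issues (such as a pending format conversion) do not. That rule reduces to a one-line check over a device entry from the response:

```python
def deploy_blocked(device: dict) -> bool:
    """True if any issue on this device entry has severity 'error'.
    Warnings (e.g. 'format not available') do not block."""
    return any(i["severity"] == "error" for i in device.get("issues", []))
```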
## Alternative Ranking

When a model is incompatible, the system suggests alternatives using closest-first ranking:

1. Same family, smaller size -- `gemma-4b` to `gemma-1b`
2. Cross-family alternatives -- `gemma-4b` to `phi-mini` or `smollm-360m`

Within each group, alternatives are sorted by size (ascending). Up to 10 alternatives are returned.
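The ranking described above can be sketched in a few lines over the `alternatives` schema (field names match the response; the function name is illustrative):

```python
def rank_alternatives(alts: list) -> list:
    """Closest-first ranking: same-family alternatives first, then
    cross-family, each group sorted by size ascending, capped at 10."""
    same = sorted(
        (a for a in alts if a["same_family"]),
        key=lambda a: a["size_bytes"],
    )
    other = sorted(
        (a for a in alts if not a["same_family"]),
        key=lambda a: a["size_bytes"],
    )
    return (same + other)[:10]
```

Note that a same-family model ranks ahead of a cross-family one even when the cross-family model is smaller, as in the sample response where `gemma-1b` (200 MB) precedes `phi-mini` (150 MB).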
## Check and Deploy
Check compatibility and deploy using the API:
### Check compatibility
**cURL**

```bash
curl -X POST https://api.octomil.com/api/v1/deploy/check-compatibility \
  -H "Authorization: Bearer edg_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "gemma-4b",
    "devices": ["device-abc-123", "device-def-456"]
  }'
```
**Python**

```python
import requests

response = requests.post(
    "https://api.octomil.com/api/v1/deploy/check-compatibility",
    headers={"Authorization": "Bearer edg_..."},
    json={
        "model_name": "gemma-4b",
        "devices": ["device-abc-123", "device-def-456"],
    },
)
print(response.json())
```
**JavaScript**

```javascript
const response = await fetch(
  "https://api.octomil.com/api/v1/deploy/check-compatibility",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer edg_...",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model_name: "gemma-4b",
      devices: ["device-abc-123", "device-def-456"],
    }),
  }
);
const data = await response.json();
console.log(data);
### Execute deployment
If compatible, deploy:
**cURL**

```bash
curl -X POST https://api.octomil.com/api/v1/deploy/execute \
  -H "Authorization: Bearer edg_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "gemma-4b",
    "devices": ["device-abc-123", "device-def-456"]
  }'
```
**Python**

```python
import requests

response = requests.post(
    "https://api.octomil.com/api/v1/deploy/execute",
    headers={"Authorization": "Bearer edg_..."},
    json={
        "model_name": "gemma-4b",
        "devices": ["device-abc-123", "device-def-456"],
    },
)
print(response.json())
```
**JavaScript**

```javascript
const response = await fetch(
  "https://api.octomil.com/api/v1/deploy/execute",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer edg_...",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model_name: "gemma-4b",
      devices: ["device-abc-123", "device-def-456"],
    }),
  }
);
const data = await response.json();
console.log(data);
```
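The decision between the two calls reduces to inspecting the check result. A minimal sketch of that gating logic (the function name and the `"try:"`/`"abort"` return values are illustrative conventions, not part of the API):

```python
def next_action(check_result: dict) -> str:
    """Decide what to do after a compatibility check: proceed to
    /deploy/execute, fall back to the top-ranked alternative, or abort."""
    if check_result["all_compatible"]:
        return "deploy"
    alts = check_result.get("alternatives", [])
    return f"try:{alts[0]['model_name']}" if alts else "abort"
```

Because alternatives arrive already ranked closest-first, the first entry is the best fallback candidate.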
Or use the Python SDK:

```python
from octomil import ModelRegistry

registry = ModelRegistry(api_key="edg_...")

# Check compatibility via the registry, then deploy via the dashboard
# or the REST API once compatibility is confirmed.
```
## Gotchas
- Compatibility check does not reserve capacity — a successful check doesn't guarantee the device will have enough memory at deploy time. Another app or model could consume memory between check and deploy.
- Device tier is based on specs, not state — a flagship device with 2GB free RAM is still classified as "flagship" (8GB+ total). The tier reflects hardware capability, not current availability.
- Format conversion happens during deployment — if the optimal format (e.g., CoreML) hasn't been pre-converted, deployment triggers conversion. This adds minutes to the first deploy. Pre-convert by uploading with `formats="coreml,tflite"`.
- Alternatives are from your catalog only — the system suggests models you've already uploaded. It doesn't recommend models from Hugging Face or other sources.
- Performance estimates require benchmark data — `estimated_tokens_per_second` is `null` if no one has submitted benchmarks for that model/device combination. Contribute with `octomil benchmark --share`.
## Related
- Rollouts — gradual deployment after compatibility is confirmed
- Device Profiling — benchmark data that feeds performance estimates
- Model Catalog — model versioning and available formats
- Local Inference — local inference with engine auto-selection