CLI Reference

Installation

curl -fsSL https://octomil.com/install.sh | sh

Downloads a standalone binary — no Python required.

Verify:

octomil --version

Commands

octomil serve

Start a local OpenAI-compatible inference server.

octomil serve <model> [options]

Option            Default     Description
--port, -p        8080        Port to listen on
--host            0.0.0.0     Host to bind to
--engine, -e      auto        Force engine (mlx-lm, llama.cpp, mnn, mlc-llm, onnxruntime)
--benchmark       off         Run latency benchmark on startup
--share           off         Share anonymous benchmark data with Octomil Cloud
--json-mode       off         Default all responses to JSON output
--cache-size      2048        KV cache size in MB
--no-cache        off         Disable KV cache
--max-queue       32          Max pending requests in queue (0 to disable)
--models          -           Comma-separated models for multi-model serving
--auto-route      off         Enable automatic query routing (requires --models)
--route-strategy  complexity  Routing strategy for --auto-route

# Basic usage
octomil serve phi-mini

# Model with specific quantization (model:variant syntax)
octomil serve gemma-3b:4bit
octomil serve gemma-3b:8bit

# Specific engine + port
octomil serve gemma-1b --engine llama.cpp --port 8080

# Multi-model with routing
octomil serve smollm-360m --models smollm-360m,phi-mini,llama-3b --auto-route

# JSON mode
octomil serve gemma-1b --json-mode

# Speech-to-text (Whisper)
octomil serve whisper-base
octomil serve whisper-large-v3

API usage

Once the server is running, use any OpenAI-compatible client:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-mini",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
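
The same endpoint also works from any programming language. A minimal sketch using only the Python standard library (the model name and port match the `octomil serve phi-mini` example above):

```python
import json
from urllib import request

# Build an OpenAI-style chat completions request.
payload = {
    "model": "phi-mini",
    "messages": [{"role": "user", "content": "Hello"}],
}
req = request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

try:
    with request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
        print(body["choices"][0]["message"]["content"])
except OSError:
    # No server reachable; start one with: octomil serve phi-mini
    print("server not running")
```

Official OpenAI SDKs work the same way: point their base URL at http://localhost:8080/v1.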

octomil benchmark

Run inference benchmarks on a model.

octomil benchmark <model> [options]

Option            Default  Description
--local           off      Keep results local (don't upload)
--iterations, -n  10       Number of inference iterations
--max-tokens      50       Max tokens per iteration
--engine, -e      auto     Force a specific engine
--all-engines     off      Benchmark all available engines

octomil benchmark gemma-1b --all-engines
octomil benchmark phi-mini --iterations 20 --local

Every benchmark run contributes anonymous performance data to the Octomil community leaderboard — helping everyone find the fastest engine for their hardware. Your results improve routing decisions and benchmark rankings for all users.

Never shared: prompts, model outputs, model files/weights, IP address, device IDs, or user profile data.

Shared (anonymous hardware telemetry): model name, backend/runtime, platform + architecture, OS version, accelerator type, total RAM, iteration count, latency stats (avg/min/max/p50/p90/p95/p99), TTFT, TPOT, throughput, and peak memory.

Opt out anytime: pass --local to keep results on your machine, or disable telemetry globally in your dashboard settings. No data is uploaded without an active API key (octomil login or OCTOMIL_API_KEY).
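
For reference, the latency summary fields listed above (avg/min/max plus percentiles) can be reproduced from raw per-iteration timings. An illustrative sketch, not Octomil's internal implementation:

```python
import math

def latency_stats(samples_ms):
    """Summarize per-iteration latencies (ms) into the avg/min/max and
    percentile fields shared as anonymous benchmark telemetry."""
    s = sorted(samples_ms)
    n = len(s)

    def pct(p):
        # Nearest-rank percentile: smallest sample covering p% of the data.
        return s[max(0, math.ceil(p / 100 * n) - 1)]

    return {
        "avg": sum(s) / n,
        "min": s[0],
        "max": s[-1],
        "p50": pct(50),
        "p90": pct(90),
        "p95": pct(95),
        "p99": pct(99),
    }

print(latency_stats([12.0, 15.5, 11.2, 30.1, 14.8]))
```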

octomil deploy

Deploy a model to devices.

octomil deploy <name> [options]

Option          Default  Description
--version, -v   latest   Version to deploy
--phone         off      Deploy to connected phone
--rollout, -r   100      Rollout percentage (1-100)
--strategy, -s  canary   Strategy: canary, immediate, blue_green
--target, -t    -        Target formats: ios, android
--devices       -        Comma-separated device IDs
--group, -g     -        Device group name
--dry-run       off      Preview without deploying

# Deploy to phone (app-install QR -> pairing QR -> multi-device select)
octomil deploy phi-4-mini --phone

# Canary rollout to 10%
octomil deploy sentiment-v1 --rollout 10 --strategy canary

# Target specific devices
octomil deploy gemma-1b --devices device_1,device_2

# Dry run
octomil deploy gemma-1b --group production --dry-run

octomil login

Authenticate with Octomil Cloud.

octomil login [options]

Option     Default  Description
--api-key  -        Paste API key directly (skip browser)

# Browser-based (default)
octomil login

# Direct API key (CI/headless)
octomil login --api-key edg_...

Or set the environment variable:

export OCTOMIL_API_KEY=edg_...

octomil push

Upload model artifacts to the registry. Auto-downloads and converts if the model isn't local.

octomil push [path] --model-id <id> --version <version>

Option               Default   Description
path (positional)    -         Path to artifacts or model name (phi-4-mini, hf:org/model, ollama:name)
--model-id, -m       inferred  Model ID in the registry
--version, -v        required  Semantic version (e.g. 1.0.0)
--quantize, -q       -         Quantize models before pushing: auto, int8, int4, dynamic, float16
--quality-threshold  -         Reject quantized models if quality drops below this value (0.0-1.0)

# Push local artifacts
octomil push ./converted --model-id sentiment-v1 --version 1.0.0

# Auto-download, convert, and push (no local files needed)
octomil push phi-4-mini --version 1.0.0

# Explicit source
octomil push hf:microsoft/Phi-4-mini --version 1.0.0
octomil push ollama:phi4-mini --version 1.0.0

# Push with quantization
octomil push ./converted --model-id sentiment-v1 --version 2.0.0 --quantize int8

# Quantize with quality gate
octomil push phi-4-mini --version 1.0.0 --quantize auto --quality-threshold 0.95

octomil pull

Download a model from the registry.

octomil pull <name> [options]

Option         Default  Description
--version, -v  latest   Version to download
--format       -        Model format (onnx, coreml, tflite)
--output, -o   .        Output directory

octomil pull sentiment-v1 --version 1.0.0 --format coreml

octomil convert

Convert a model to edge formats locally. This is the primary conversion path: all conversion runs on your machine, with no server round-trip required.

octomil convert <model_path> [options]

Option                      Default      Description
--formats, -f               onnx         Target formats: onnx, coreml, tflite (comma-separated)
--output, -o                ./converted  Output directory
--input-shape               1,3,224,224  Input tensor shape
--push                      off          Upload converted artifacts to the registry after conversion
--validate / --no-validate  --validate   Run validation checks on converted artifacts

# Convert locally
octomil convert model.pt --formats onnx,coreml,tflite --output converted_models

# Convert and push to registry in one step
octomil convert model.pt --formats onnx,coreml,tflite --push --model-id sentiment-v1 --version 1.0.0

# Skip validation (faster, for CI pipelines)
octomil convert model.pt --formats onnx,coreml --no-validate

octomil eval

Run quality evaluation comparing cloud vs on-device inference. Sends test inputs to the server's eval endpoint and reports whether the model meets a quality threshold.

octomil eval <model_id> --test-data <path> [options]

Option                 Default                         Description
model_id (positional)  required                        Model ID to evaluate
--test-data, -d        required                        Path to JSONL file with test inputs
--threshold, -t        0.95                            Quality threshold (0.0-1.0)
--api-base             http://localhost:8000           Server API base URL (also reads OCTOMIL_API_BASE)
--metrics, -m          similarity,exact_match,latency  Comma-separated metrics to compute

The test data file is in JSONL format. Each line is a JSON object with at least an "input" key and optionally an "expected_output" key:

{"input": "Great product, fast shipping", "expected_output": "positive"}
{"input": "Terrible experience, would not buy again", "expected_output": "negative"}
{"input": "It was okay", "expected_output": "neutral"}
# Basic quality eval
octomil eval sentiment-v1 --test-data tests.jsonl

# Custom threshold and metrics
octomil eval sentiment-v1 --test-data tests.jsonl --threshold 0.90 --metrics similarity,latency

# Against a remote server
octomil eval sentiment-v1 --test-data tests.jsonl --api-base https://api.octomil.com

The command exits with code 1 if the quality threshold is not met, making it suitable for CI pipelines. Output includes overall score, per-metric breakdowns, and statistical significance (p-value, effect size) when available.
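
Test files in this format are straightforward to generate programmatically. A small sketch (the records match the JSONL example above; the output path is illustrative):

```python
import json
import tempfile
from pathlib import Path

cases = [
    {"input": "Great product, fast shipping", "expected_output": "positive"},
    {"input": "Terrible experience, would not buy again", "expected_output": "negative"},
    {"input": "It was okay", "expected_output": "neutral"},
]

# One JSON object per line, newline-terminated: the JSONL shape eval expects.
path = Path(tempfile.gettempdir()) / "tests.jsonl"
path.write_text("".join(json.dumps(c) + "\n" for c in cases))

# Sanity-check: every line parses and carries the required "input" key.
for line in path.read_text().splitlines():
    assert "input" in json.loads(line)
```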

octomil quantize

Quantize a model for edge deployment without pushing to the registry. Supports ONNX, TFLite, CoreML, and GGUF formats.

octomil quantize <model_path> [options]

Option                   Default                Description
model_path (positional)  required               Path to model file (.onnx, .tflite, .mlpackage, .mlmodel, .gguf) or directory
--method, -m             auto                   Quantization method: auto, int8, int4, dynamic, float16
--output, -o             <model_dir>/quantized  Output directory for quantized models
--quality-threshold      -                      Reject quantized models if quality score drops below this value (0.0-1.0)

# Auto-select best quantization method
octomil quantize model.onnx

# Specific method
octomil quantize model.onnx --method int8

# Custom output directory
octomil quantize model.onnx --method int4 --output ./optimized

# With quality gate — rejects if quality drops too far
octomil quantize model.onnx --method auto --quality-threshold 0.95

# Quantize all models in a directory
octomil quantize ./models/ --method float16 --output ./quantized

Output reports size reduction, compression ratio, and quality scores (when --quality-threshold is set) for each processed file.
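
The two size figures relate as follows; a worked illustration with made-up file sizes (not real measurements):

```python
original_bytes = 4_800_000_000   # hypothetical fp16 model on disk
quantized_bytes = 1_300_000_000  # hypothetical int4 output

# Compression ratio: how many times smaller the quantized file is.
compression_ratio = original_bytes / quantized_bytes
# Size reduction: fraction of the original size that was removed.
size_reduction = 1 - quantized_bytes / original_bytes

print(f"compression ratio: {compression_ratio:.2f}x")
print(f"size reduction:    {size_reduction:.1%}")
```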

octomil check

Check device compatibility for a local model file.

octomil check <model_path> [options]

Option         Default  Description
--devices, -d  -        Device profiles (e.g. iphone_15_pro,pixel_8)

octomil check model.onnx --devices iphone_15_pro,pixel_8

octomil list models

List available models with their variants and supported engines.

octomil list models

Output shows all available models, quantization variants, and which engines support each:

Model             Variants    Engines
gemma-1b          4bit, 8bit  mlx, mnn, mlc-llm, llama.cpp, onnxruntime
gemma-4b          4bit, 8bit  mlx, mnn, mlc-llm, llama.cpp, onnxruntime
phi-4-mini        4bit, 8bit  mlx, mnn, mlc-llm, llama.cpp
llama-3.2-1b      4bit, 8bit  mlx, mnn, mlc-llm, llama.cpp, onnxruntime
llama-3.2-3b     4bit, 8bit  mlx, mnn, mlc-llm, llama.cpp, onnxruntime
whisper-tiny      fp16        whisper.cpp
whisper-base      fp16        whisper.cpp
whisper-small     fp16        whisper.cpp
whisper-medium    fp16        whisper.cpp
whisper-large-v3  fp16        whisper.cpp
...

octomil scan

Scan the local network for Octomil inference servers and devices.

octomil scan [options]

Option     Default  Description
--timeout  5        Scan timeout in seconds

octomil scan
# Found 2 Octomil instances:
# 192.168.1.42:8000 — phi-4-mini on mlx (58 tok/s)
# 192.168.1.100:8000 — gemma-1b on llama.cpp (34 tok/s)

octomil status

Show deployment status for a model.

octomil status <name>

octomil dashboard

Open the Octomil dashboard in your browser.

octomil dashboard

octomil init

Initialize an Octomil organization for enterprise use.

octomil init <org_name> [options]

Option        Default  Description
--compliance  -        Compliance preset: hipaa, gdpr, pci, soc2
--region      us       Data region: us, eu, ap
--api-base    -        Override API base URL

octomil init "Acme Corp" --compliance hipaa --region us

octomil org

Show current organization info and settings.

octomil org

octomil demo code-assistant

Interactive code assistant powered by a local LLM.

octomil demo code-assistant [options]

Option           Default  Description
--model, -m      auto     Model to serve
--url            -        Connect to existing server
--port, -p       8099     Port for auto-started server
--no-auto-start  off      Don't auto-start server

octomil demo code-assistant
octomil demo code-assistant --model phi-mini

octomil launch

Launch a coding agent powered by a local model. Starts octomil serve in the background (if not already running) and configures the agent to use the local endpoint.

octomil launch <agent> [options]

Argument  Description
claude    Launch Claude Code with local backend
codex     Launch OpenAI Codex CLI
openclaw  Launch OpenClaw agent
aider     Launch Aider coding assistant

Option       Default  Description
--model, -m  qwen3    Model to serve
--port, -p   8080     Port for local server

octomil launch claude
octomil launch aider --model deepseek-coder-v2
octomil launch codex --model codestral

octomil models

List available models from ollama and the Octomil registry.

octomil models [options]

Option    Default  Description
--source  all      Filter source: all, ollama, registry

octomil models
octomil models --source ollama

octomil rollback

Rollback a model to a previous version.

octomil rollback <name> [options]

Option        Default   Description
--to-version  previous  Version to rollback to

# Rollback to the previous version
octomil rollback sentiment-v1

# Rollback to a specific version
octomil rollback sentiment-v1 --to-version 1.0.0

octomil pair

Connect to a pairing session as a device. Enter the code displayed by octomil deploy --phone to receive the model deployment.

octomil pair <code> [options]

Option          Default  Description
--device-id     auto     Device identifier
--platform, -p  auto     Device platform: ios, android, python
--device-name   -        Friendly device name

octomil pair ABC123
octomil pair ABC123 --device-name "Test iPhone"

octomil team

Manage organization team members.

octomil team <subcommand>

Subcommand   Description
add <email>  Invite a team member
list         List team members
set-policy   Set organization security policies

Option (add)  Default  Description
--role        member   Role: admin, member, viewer

Option (set-policy)  Default  Description
--require-mfa        off      Require MFA for all members
--session-hours      24       Session duration in hours

octomil team add alice@acme.com --role admin
octomil team list
octomil team set-policy --require-mfa --session-hours 8

octomil keys

Manage API keys.

octomil keys <subcommand>

Subcommand       Description
create <name>    Create a new API key
list             List API keys
revoke <key_id>  Revoke an API key

Option (create)  Default  Description
--scope          -        Permission scope (repeatable): devices:read, devices:write, models:read, models:write, training:read, training:write
--expires        -        Expiration (e.g. 30d, 90d)

octomil keys create deploy-key --scope devices:write --scope models:read
octomil keys list
octomil keys revoke key_abc123

octomil train

Federated training across deployed devices.

octomil train <subcommand>

Subcommand      Description
start <model>   Start federated training
status <model>  Show training progress
stop <model>    Stop active training

Option (start)  Default  Description
--strategy      fedavg   Aggregation strategy: fedavg, fedprox, scaffold, krum, fedmedian, fedtrimmedavg, fedopt, fedadam, ditto
--rounds        10       Number of training rounds
--min-devices   2        Minimum devices per round
--group         -        Device group to train with

octomil train start sentiment-v1 --strategy fedavg --rounds 50
octomil train start sentiment-v1 --strategy scaffold --group production
octomil train status sentiment-v1
octomil train stop sentiment-v1

octomil federation

Manage cross-organization federations.

octomil federation <subcommand>

Subcommand                  Description
create <name>               Create a new federation
invite <name> <org_ids>     Invite organizations
join <name>                 Join a federation
list                        List federations
show <name>                 Show federation details
members <name>              List federation members
share <model> <federation>  Share a model with a federation

octomil federation create "healthcare-consortium"
octomil federation invite "healthcare-consortium" org_123 org_456
octomil federation share phi-mini "healthcare-consortium"

octomil integrations

Manage observability export integrations (metrics + logs).

octomil integrations <subcommand>

Subcommand    Description
list          List all configured integrations
create        Create a metrics or log integration
delete <id>   Delete an integration
test <id>     Test an integration
connect-otlp  Connect an OTLP collector for both metrics and logs

Option (list)  Default  Description
--type         all      Filter: metrics, logs, all
--json         off      Output as JSON

Option (connect-otlp)  Default         Description
--endpoint             required        OTLP collector URL (e.g. http://collector:4318)
--name                 OTLP Collector  Display name
--headers-json         -               Auth headers as JSON

# List all integrations
octomil integrations list

# Connect OTLP collector (recommended — configures metrics + logs)
octomil integrations connect-otlp --endpoint http://otel-collector:4318

# With auth headers
octomil integrations connect-otlp --endpoint https://otlp.grafana.net \
  --headers-json '{"Authorization": "Basic abc123"}'

# Create individual integrations
octomil integrations create --kind metrics --type prometheus --name prod-prom \
  --config-json '{"prefix": "octomil"}'

octomil integrations create --kind logs --type splunk --name prod-splunk \
  --endpoint https://splunk.example.com/services/collector --format hec

# Test and delete
octomil integrations test int_abc123 --kind metrics
octomil integrations delete int_abc123 --kind metrics

Environment variables

Variable               Description
OCTOMIL_API_KEY        API key for Octomil Cloud
OCTOMIL_API_BASE       Override API base URL
OCTOMIL_DASHBOARD_URL  Dashboard URL for browser login (default: https://app.octomil.com)
OCTOMIL_MODEL          Default model for demo/serve

Config files

Path                    Description
~/.octomil/credentials  API key + org from octomil login (JSON)
~/.octomil/config.json  Organization settings from octomil init
~/.octomil/models/      Downloaded model cache