
Quickstart

Get up and running with on-device inference in 5 minutes.

Four steps. Two shell commands. A few lines of code.


1. Install the CLI (~30s)

curl -fsSL https://octomil.com/install.sh | sh

2. Sign in (~30s)

octomil login

Your browser opens. Sign in with Google or create a passkey — the CLI receives your credentials automatically.

Try it now

See your model on a phone before writing any code. Run octomil deploy phi-4-mini --phone; a QR code appears, and scanning it runs the model on your device.

CI / SSH

In headless environments (CI, SSH, containers), octomil login falls back to a manual API key prompt. You can also pass --api-key directly or set OCTOMIL_API_KEY.
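For example, in a CI job you can export the key before any octomil command runs. This is a sketch; the key value shown is a placeholder — use a real key from your Octomil dashboard.

```shell
# Headless auth: export the key so octomil login skips the browser flow.
# "edg_your_key" is a placeholder value.
export OCTOMIL_API_KEY="edg_your_key"

# octomil login now reads OCTOMIL_API_KEY instead of prompting;
# equivalently, pass the key inline:
#   octomil login --api-key "edg_your_key"
```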


3. Push a model (~1 min)

octomil push phi-4-mini

The CLI imports phi-4-mini from Hugging Face into your Octomil registry. The server handles conversion to edge formats (CoreML, TFLite). After import, it prints a ready-to-paste SDK snippet with your real apiKey and orgId.

Already have a local model file? Upload it directly:

octomil push ./model.safetensors --model-id phi-4-mini

Auto-quantization

Add --quantize to automatically optimize your model for on-device performance (e.g., octomil push phi-4-mini --quantize). Octomil selects the best quantization strategy based on target hardware. See Model Optimizer for details.

Explicit sources & versioning

Pin a specific source with hf:org/model. Version defaults to 1.0.0 — override with --version 2.0.0.

Smart Routing — Once pushed, Octomil automatically routes inference between device and cloud based on model capabilities, input complexity, and device resources. Common queries run on-device; hard queries fall back to the cloud with no code changes. See Smart Routing for configuration options.


4. Add the SDK to your app (~2 min)

One dependency. No config files.

Add the package URL in Xcode:

https://github.com/octomil-ai/octomil-ios.git

Or in Package.swift:

dependencies: [
.package(url: "https://github.com/octomil-ai/octomil-ios.git", from: "1.0.0")
]

Requires iOS 15+.

Initialize the client, download the model, and run inference:

import Octomil

let client = OctomilClient(apiKey: "edg_...", orgId: "your-org-id")
try await client.register()
let model = try await client.downloadModel(modelId: "phi-4-mini")
let result = try model.predict(input: ["features": inputData])

Get your credentials

octomil push prints a ready-to-paste SDK snippet with your real apiKey and orgId after every import.

The model caches on-device — subsequent launches are instant. Inference runs entirely on the device: no network latency, no per-request cost, fully offline.

Streaming Inference — Building an LLM-powered app? Use model.stream() instead of model.predict() to receive tokens as they are generated, enabling real-time chat UIs and progressive output. See Streaming Inference for usage and examples.
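A minimal sketch of a streaming loop, assuming model.stream() returns an async sequence of String tokens — the exact signature and input shape are assumptions; see the Streaming Inference docs for the real API:

```swift
// Sketch only — stream()'s signature and token type are assumptions.
var transcript = ""
for try await token in model.stream(input: ["prompt": "Draft a short reply"]) {
    transcript += token  // append each token as it arrives
    // Update your chat UI here for progressive output.
}
print(transcript)
```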

Embeddings — Need vector representations? Call model.embed(input) alongside predict() to generate embeddings for search, RAG, or similarity features. See Embeddings for supported models and dimensions.
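A sketch of comparing two embeddings, assuming model.embed(input:) returns a [Float] vector — check the Embeddings docs for the actual return type and dimensions:

```swift
// Sketch only — embed(input:) and its return type are assumptions.
let query = try model.embed(input: "how do I reset my password?")
let doc   = try model.embed(input: "Reset your password from Settings > Account.")

// Cosine similarity: closer to 1.0 means more semantically similar.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    let dot   = zip(a, b).map(*).reduce(0, +)
    let normA = a.map { $0 * $0 }.reduce(0, +).squareRoot()
    let normB = b.map { $0 * $0 }.reduce(0, +).squareRoot()
    return dot / (normA * normB)
}
let score = cosineSimilarity(query, doc)
```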


Your model is live

Devices download it on first launch and run inference fully offline. Here's what to do next.

Monitor your fleet

octomil dashboard

Opens app.octomil.com — device status, inference latency, error rates, and resource usage in real time.

Evaluate model quality

Verify that your on-device model matches cloud quality before rolling out widely:

octomil eval phi-4-mini --dataset my-eval-set

Runs your evaluation suite against the deployed model and reports accuracy, latency, and regression metrics. See Quality Evaluation for configuring eval datasets and thresholds.

Deploy to more devices

Roll your model out to a device group or your entire fleet:

octomil deploy phi-4-mini --group production --rollout 25

Start with a 25% canary, watch the metrics, then ramp to 100%. Roll back instantly if anything looks off.

  • Rollouts — canary, blue-green, and immediate strategies
  • iOS SDK — background updates, pairing deep links, CoreML options
  • Android SDK — NNAPI acceleration, WorkManager scheduling

Train better models

Improve your model using on-device data — without it ever leaving the device:

octomil train phi-4-mini --group production --rounds 10

Federated learning aggregates gradient updates from your fleet. The model improves every round while user data stays private.

Test what works

Compare model versions with A/B experiments across your fleet:

octomil experiment create --model-a phi-4-mini:v1 --model-b phi-4-mini:v2 --traffic 50

  • Experiments — traffic splits, metrics, significance testing

Ship it

Once you have a winner, promote it to your entire fleet:

octomil deploy phi-4-mini --version 2.0.0 --strategy immediate