Quickstart
Get up and running with on-device inference in 5 minutes.
Four steps. Two shell commands. A few lines of code.
1. Install the CLI ~30s
- macOS / Linux:
  curl -fsSL https://octomil.com/install.sh | sh
- Homebrew:
  brew install octomil/tap/octomil
- Windows (PowerShell):
  irm https://octomil.com/install.ps1 | iex
2. Sign in ~30s
octomil login
Your browser opens. Sign in with Google or create a passkey — the CLI receives your credentials automatically.
See your model on a phone before writing any code. Run octomil deploy phi-4-mini --phone — a QR code appears, scan it, and the model runs on your device.
In headless environments (CI, SSH, containers), octomil login falls back to a manual API key prompt. You can also pass --api-key directly or set OCTOMIL_API_KEY.
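For scripted use, a minimal sketch of both options (the key value is a placeholder, and passing --api-key to octomil login is an assumption based on the note above):

```shell
# Option 1: pass the key to login directly
octomil login --api-key edg_xxxxxxxxxxxx

# Option 2: export it once; subsequent commands pick it up
export OCTOMIL_API_KEY=edg_xxxxxxxxxxxx
octomil push phi-4-mini
```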
3. Push a model ~1 min
octomil push phi-4-mini
The CLI imports phi-4-mini from HuggingFace into your Octomil registry. The server handles conversion to edge formats (CoreML, TFLite). After import, it prints a ready-to-paste SDK snippet with your real apiKey and orgId.
Already have a local model file? Upload it directly:
octomil push ./model.safetensors --model-id phi-4-mini
Add --quantize to automatically optimize your model for on-device performance (e.g., octomil push phi-4-mini --quantize). Octomil selects the best quantization strategy based on target hardware. See Model Optimizer for details.
Pin a specific source with hf:org/model. Version defaults to 1.0.0 — override with --version 2.0.0.
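Put together, a pinned push might look like this sketch (hf:org/model stands in for a real HuggingFace path):

```shell
octomil push hf:org/model --version 2.0.0
```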
Smart Routing — Once pushed, Octomil automatically routes inference between device and cloud based on model capabilities, input complexity, and device resources. Common queries run on-device; hard queries fall back to the cloud with no code changes. See Smart Routing for configuration options.
4. Add the SDK to your app ~2 min
One dependency. No config files.
iOS (Swift)
Add the package URL in Xcode:
https://github.com/octomil-ai/octomil-ios.git
Or in Package.swift:
dependencies: [
    .package(url: "https://github.com/octomil-ai/octomil-ios.git", from: "1.0.0")
]
Requires iOS 15+.

Android (Kotlin)
In build.gradle.kts:
dependencies {
    implementation("ai.octomil:octomil-android:1.0.0")
    implementation("com.google.ai.edge.litert:litert:2.0.0")
}
Requires API 26+ (Android 8.0).

Python
pip install octomil
Requires Python 3.9+.

Node.js
pnpm install @octomil/sdk
Requires Node.js 18+.
Initialize the client, download the model, and run inference:
iOS (Swift)
import Octomil

let client = OctomilClient(apiKey: "edg_...", orgId: "your-org-id")
try await client.register()
let model = try await client.downloadModel(modelId: "phi-4-mini")
let result = try model.predict(input: ["features": inputData])

Android (Kotlin)
import ai.octomil.OctomilClient

val client = OctomilClient(apiKey = "edg_...", orgId = "your-org-id", context = this)
client.register()
val model = client.downloadModel(modelId = "phi-4-mini")
val result = model.predict(mapOf("features" to inputData))

Python
import octomil

client = octomil.Client(api_key="edg_...", org_id="your-org-id")
text = client.predict("phi-4-mini", [{"role": "user", "content": "Hello"}])
print(text)

Node.js
import { OctomilClient } from "@octomil/sdk";

const client = new OctomilClient({ apiKey: "edg_...", orgId: "your-org-id" });
const result = await client.predict("phi-4-mini", { text: "Hello" });
console.log(result.label, result.scores);
The model is cached on-device, so subsequent launches are instant. Inference runs entirely on the device: no network latency, no per-request cost, and full offline support.
Streaming Inference — Building an LLM-powered app? Use model.stream() instead of model.predict() to receive tokens as they are generated, enabling real-time chat UIs and progressive output. See Streaming Inference for usage and examples.
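A minimal Python sketch of the streaming pattern, assuming the Python client exposes a stream() counterpart to predict() (the method name and token-by-token iteration are assumptions, not confirmed API):

```python
import octomil

client = octomil.Client(api_key="edg_...", org_id="your-org-id")

# Iterate over tokens as they are generated instead of waiting for the full reply
for token in client.stream("phi-4-mini", [{"role": "user", "content": "Hello"}]):
    print(token, end="", flush=True)
```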
Embeddings — Need vector representations? Call model.embed(input) alongside predict() to generate embeddings for search, RAG, or similarity features. See Embeddings for supported models and dimensions.
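As a sketch, assuming an embed() counterpart on the Python client (the signature and return shape are assumptions):

```python
import octomil

client = octomil.Client(api_key="edg_...", org_id="your-org-id")

# Embed two strings for use in search or similarity features
docs = ["on-device inference", "edge AI deployment"]
vectors = [client.embed("phi-4-mini", d) for d in docs]
```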
Your model is live
Devices download it on first launch and run inference fully offline. Here's what to do next.
Monitor your fleet
octomil dashboard
Opens app.octomil.com — device status, inference latency, error rates, and resource usage in real time.
- Monitoring dashboard — metrics, alerts, fleet health
Evaluate model quality
Verify that your on-device model matches cloud quality before rolling out widely:
octomil eval phi-4-mini --dataset my-eval-set
Runs your evaluation suite against the deployed model and reports accuracy, latency, and regression metrics. See Quality Evaluation for configuring eval datasets and thresholds.
Deploy to more devices
Roll your model out to a device group or your entire fleet:
octomil deploy phi-4-mini --group production --rollout 25
Start with 25% canary, watch metrics, then ramp to 100%. Roll back instantly if anything looks off.
- Rollouts — canary, blue-green, and immediate strategies
- iOS SDK — background updates, pairing deep links, CoreML options
- Android SDK — NNAPI acceleration, WorkManager scheduling
Train better models
Improve your model using on-device data — without it ever leaving the device:
octomil train phi-4-mini --group production --rounds 10
Federated learning aggregates gradient updates from your fleet. The model improves every round while user data stays private.
- Federated training — aggregation strategies, privacy, convergence
- Personalization — per-device fine-tuning with Ditto (Python SDK)
Test what works
Compare model versions with A/B experiments across your fleet:
octomil experiment create --model-a phi-4-mini:v1 --model-b phi-4-mini:v2 --traffic 50
- Experiments — traffic splits, metrics, significance testing
Ship it
Once you have a winner, promote it to your entire fleet:
octomil deploy phi-4-mini --version 2.0.0 --strategy immediate
- CLI Reference — every command and flag
- Supported Models — what runs, on what hardware