Skip to main content

Octomil API

Welcome to the Octomil API. Use the switcher below to choose the right surface for your use case.

  • Base URL: http://localhost:8080/v1
  • Use for: prompt/response inference on local or edge runtimes
  • Primary endpoint: Inference

Start locally:

octomil serve phi-4-mini

Get Started

If you're just getting started, follow the quickstart.

Libraries

Octomil local inference is compatible with OpenAI client libraries.

pip install openai
import openai

client = openai.OpenAI(
base_url="http://localhost:8080/v1",
api_key="not-needed",
)

Control Plane Details

Authentication

All control-plane endpoints (except device registration) require Authorization: Bearer <token>.

Store your API key securely

Device API keys are returned once during registration and cannot be retrieved later.

Rate Limiting

Authenticated control-plane endpoints are rate-limited to 100 requests per minute per device.

Error Shape

{
"error": "bad_request",
"message": "Validation failed: 'device_id' is required.",
"status_code": 400
}
CodeErrorDescription
400bad_requestInvalid or missing request fields
401unauthorizedMissing or invalid API key
404not_foundResource does not exist
409conflictResource already exists or state conflict
429rate_limitedToo many requests; check Retry-After
500internal_errorUnexpected server error

OpenAPI

Download openapi.yaml