Device Targeting
Octomil analyzes your inference telemetry to identify models that would benefit from on-device deployment. When a cloud-served model accumulates enough request volume and your device fleet has compatible hardware, Octomil generates a recommendation with a cost savings estimate and a confidence score.
How It Works
The recommendation engine runs periodically against your telemetry data and evaluates each model along three dimensions:
- Request volume -- models with high daily request counts have the most to gain from moving to device.
- Device diversity -- models requested from many distinct devices are good candidates because deployment reaches more users.
- Fleet size -- a larger fleet of compatible devices increases the potential cost savings.
Analysis Criteria
A model is evaluated for recommendation when it exceeds the minimum daily request threshold (default: 1,000 requests/day). The engine then:
- Calculates current cloud inference cost using the configured cost-per-millisecond rate
- Estimates on-device cost savings accounting for an adoption factor (not all devices will be online)
- Checks device compatibility against the model's size and format requirements
- Produces a confidence score and recommendation type
Recommendation Types
| Type | Confidence Range | Meaning |
|---|---|---|
| `deploy_to_device` | > 0.8 | High confidence. The model is a strong candidate for on-device deployment. |
| `canary_test` | 0.5 -- 0.8 | Moderate confidence. Recommend starting with a canary rollout to validate. |
| `monitor` | < 0.5 | Low confidence. Continue collecting telemetry before deciding. |
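The thresholds in the table above map directly to a recommendation type. As a sketch (the `recommendation_type` helper is illustrative, not part of any Octomil SDK):

```python
def recommendation_type(confidence: float) -> str:
    """Map a 0.0-1.0 confidence score to a recommendation type."""
    if confidence > 0.8:
        return "deploy_to_device"
    if confidence >= 0.5:
        return "canary_test"
    return "monitor"
```

Note that 0.8 itself falls in the `canary_test` band; only scores strictly above 0.8 produce `deploy_to_device`.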
Confidence Scoring
The confidence score is a weighted composite of three signals:
| Signal | Weight | What It Measures |
|---|---|---|
| Request volume | 40% | Daily request count relative to the minimum threshold. Higher volume increases confidence. |
| Device diversity | 30% | Number of unique devices requesting the model. More devices means broader deployment impact. |
| Fleet size | 30% | Total compatible devices in the fleet. Larger fleets amplify cost savings. |
The score is normalized to 0.0 -- 1.0. A score of 1.0 means the model has high volume, is requested from many devices, and the fleet is large enough to absorb the workload.
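The weighting can be sketched as follows. This assumes each signal has already been normalized to 0.0 -- 1.0 (how Octomil normalizes the raw counts is not specified here), and the function name is illustrative:

```python
# Weights from the signal table: volume 40%, diversity 30%, fleet 30%
WEIGHTS = {"volume": 0.4, "diversity": 0.3, "fleet": 0.3}

def confidence_score(volume: float, diversity: float, fleet: float) -> float:
    """Weighted composite of three pre-normalized (0.0-1.0) signals."""
    score = (WEIGHTS["volume"] * volume
             + WEIGHTS["diversity"] * diversity
             + WEIGHTS["fleet"] * fleet)
    return round(score, 2)
```

For example, the per-signal sub-scores reported in the detailed recommendation response (`volume_score` 0.95, `diversity_score` 0.82, `fleet_score` 0.78) combine under these weights to roughly 0.86.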
Cost Model
The cost savings estimate uses a straightforward formula:
```text
estimated_monthly_savings = daily_requests * avg_latency_ms * cost_per_ms * 30 * adoption_factor
```
| Parameter | Default | Description |
|---|---|---|
| `cost_per_ms` | $0.000003 | Cloud inference cost per millisecond of compute time |
| `adoption_factor` | 0.7 | Fraction of devices expected to successfully adopt the on-device model |
These values are configurable. See Configuration below.
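A direct translation of the formula, with defaults from the parameter table (the function name is illustrative):

```python
def estimated_monthly_savings(
    daily_requests: int,
    avg_latency_ms: float,
    cost_per_ms: float = 0.000003,
    adoption_factor: float = 0.7,
) -> float:
    """Projected monthly USD savings from moving a model on-device."""
    monthly = daily_requests * avg_latency_ms * cost_per_ms * 30 * adoption_factor
    return round(monthly, 2)
```

For example, a model serving 10,000 requests/day at 20 ms average latency projects to about $12.60/month in savings at the default rates.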
API Endpoints
`GET /api/v1/recommendations`
List all current recommendations.
**cURL**

```bash
curl -H "Authorization: Bearer <token>" \
  https://api.octomil.com/api/v1/recommendations
```

**Python**

```python
import requests

response = requests.get(
    "https://api.octomil.com/api/v1/recommendations",
    headers={"Authorization": "Bearer <token>"},
)
print(response.json())
```

**JavaScript**

```javascript
const response = await fetch(
  "https://api.octomil.com/api/v1/recommendations",
  {
    headers: { "Authorization": "Bearer <token>" },
  }
);
const data = await response.json();
console.log(data);
```
Response:
```json
{
  "recommendations": [
    {
      "model_id": "text-classifier-v3",
      "recommendation_type": "deploy_to_device",
      "confidence": 0.87,
      "daily_requests": 45200,
      "avg_latency_ms": 12.3,
      "estimated_monthly_savings_usd": 142.80,
      "compatible_devices": 1823,
      "unique_requesting_devices": 412,
      "analysis_window_days": 7,
      "created_at": "2026-02-19T08:00:00Z"
    },
    {
      "model_id": "sentiment-v2",
      "recommendation_type": "canary_test",
      "confidence": 0.64,
      "daily_requests": 8300,
      "avg_latency_ms": 18.7,
      "estimated_monthly_savings_usd": 28.05,
      "compatible_devices": 945,
      "unique_requesting_devices": 87,
      "analysis_window_days": 7,
      "created_at": "2026-02-19T08:00:00Z"
    }
  ]
}
```
`GET /api/v1/recommendations/{model_id}`
Get the recommendation for a specific model.
**cURL**

```bash
curl -H "Authorization: Bearer <token>" \
  https://api.octomil.com/api/v1/recommendations/text-classifier-v3
```

**Python**

```python
import requests

response = requests.get(
    "https://api.octomil.com/api/v1/recommendations/text-classifier-v3",
    headers={"Authorization": "Bearer <token>"},
)
print(response.json())
```

**JavaScript**

```javascript
const response = await fetch(
  "https://api.octomil.com/api/v1/recommendations/text-classifier-v3",
  {
    headers: { "Authorization": "Bearer <token>" },
  }
);
const data = await response.json();
console.log(data);
```
Response:
```json
{
  "model_id": "text-classifier-v3",
  "recommendation_type": "deploy_to_device",
  "confidence": 0.87,
  "daily_requests": 45200,
  "avg_latency_ms": 12.3,
  "estimated_monthly_savings_usd": 142.80,
  "compatible_devices": 1823,
  "unique_requesting_devices": 412,
  "analysis_window_days": 7,
  "created_at": "2026-02-19T08:00:00Z",
  "details": {
    "volume_score": 0.95,
    "diversity_score": 0.82,
    "fleet_score": 0.78,
    "cost_breakdown": {
      "current_monthly_cloud_cost_usd": 203.99,
      "projected_monthly_device_cost_usd": 61.19,
      "savings_usd": 142.80,
      "adoption_factor": 0.7
    }
  }
}
```
`POST /api/v1/recommendations/{model_id}/deploy`
Trigger a canary deployment for a recommended model. This creates a rollout that starts sending a percentage of traffic to on-device inference.
**cURL**

```bash
curl -X POST \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  https://api.octomil.com/api/v1/recommendations/text-classifier-v3/deploy
```

**Python**

```python
import requests

response = requests.post(
    "https://api.octomil.com/api/v1/recommendations/text-classifier-v3/deploy",
    headers={"Authorization": "Bearer <token>"},
)
print(response.json())
```

**JavaScript**

```javascript
const response = await fetch(
  "https://api.octomil.com/api/v1/recommendations/text-classifier-v3/deploy",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer <token>",
      "Content-Type": "application/json",
    },
  }
);
const data = await response.json();
console.log(data);
```
Response:
```json
{
  "rollout_id": "rol_a1b2c3d4",
  "model_id": "text-classifier-v3",
  "status": "canary",
  "canary_percentage": 10,
  "created_at": "2026-02-19T14:30:00Z"
}
```
The canary starts at 10% of eligible devices. Monitor the rollout in the Rollouts dashboard and promote or roll back as needed.
Dashboard Widget
The Monitoring page includes a Recommendations panel that shows:
- Models with active recommendations, sorted by estimated savings
- Recommendation type badge (`deploy_to_device`, `canary_test`, `monitor`)
- Estimated monthly savings in USD
- A Deploy button for models with `deploy_to_device` or `canary_test` recommendations

The Deploy button triggers the `/deploy` endpoint and creates a canary rollout directly from the dashboard.
Configuration
Configure the recommendation engine using environment variables.
| Variable | Default | Description |
|---|---|---|
| `OCTOMIL_CLOUD_COST_PER_MS` | 0.000003 | Cost in USD per millisecond of cloud inference compute |
| `OCTOMIL_RECOMMEND_MIN_DAILY_REQUESTS` | 1000 | Minimum daily requests before a model is evaluated |
| `OCTOMIL_RECOMMEND_ADOPTION_FACTOR` | 0.7 | Expected fraction of compatible devices that will run the on-device model |
| `OCTOMIL_RECOMMEND_LOOKBACK_DAYS` | 7 | Number of days of telemetry to analyze |
Example:
```bash
export OCTOMIL_CLOUD_COST_PER_MS=0.000005
export OCTOMIL_RECOMMEND_MIN_DAILY_REQUESTS=500
export OCTOMIL_RECOMMEND_ADOPTION_FACTOR=0.6
export OCTOMIL_RECOMMEND_LOOKBACK_DAYS=14
```
Workflow
A typical workflow using recommendations:
- Enable telemetry on your inference server with `octomil serve --api-key <your-api-key>`.
- Wait for data -- the engine needs at least one lookback window of telemetry data.
- Review recommendations in the Monitoring dashboard or via the API.
- Deploy a canary for high-confidence recommendations using the Deploy button or the API.
- Monitor the canary in the Rollouts dashboard. Check error rates and latency on devices.
- Promote or roll back based on canary results.
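The review-and-deploy steps can also be scripted against the endpoints on this page. A minimal sketch: the endpoint paths and response fields come from the examples above, while the helper names and the 0.8 cutoff are illustrative choices:

```python
import requests

API = "https://api.octomil.com/api/v1"
HEADERS = {"Authorization": "Bearer <token>"}

def select_for_deploy(recommendations: list[dict], min_confidence: float = 0.8) -> list[str]:
    """Pick model IDs whose recommendation clears the confidence bar."""
    return [
        rec["model_id"]
        for rec in recommendations
        if rec["recommendation_type"] == "deploy_to_device"
        and rec["confidence"] >= min_confidence
    ]

def run() -> None:
    """Fetch current recommendations, then trigger a canary for each selected model."""
    recs = requests.get(f"{API}/recommendations", headers=HEADERS).json()
    for model_id in select_for_deploy(recs["recommendations"]):
        requests.post(f"{API}/recommendations/{model_id}/deploy", headers=HEADERS)
```

Each POST returns a `rollout_id`; monitoring, promotion, and rollback still happen in the Rollouts dashboard or API as described above.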
Related Docs
- Octomil Serve -- local inference server setup
- Telemetry and Observability -- how telemetry data is collected
- Model Rollouts -- canary and gradual deployment
- Monitoring Dashboard -- view recommendations in the dashboard