Export Metrics

Octomil supports managed metrics export integrations so teams can route operational telemetry to existing observability tools.

Supported Integrations

  • Prometheus
  • Datadog
  • OpenTelemetry
  • StatsD / DogStatsD

The OTLP Collector integration provides a single endpoint for both metrics and log export. Instead of configuring metrics and logs separately, point Octomil at your OpenTelemetry collector and both streams are routed automatically.

How it works

When you connect an OTLP collector, Octomil creates two integrations under the hood:

  • Metrics integration — pushes to <endpoint>/v1/metrics
  • Logs integration — pushes to <endpoint>/v1/logs

Both integrations share the same endpoint, name, and authentication headers.
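The endpoint derivation described above can be sketched in a few lines of Python. This is only an illustration of how one base URL maps to the two signal paths; the helper name `otlp_signal_urls` is hypothetical and not part of the Octomil SDK.

```python
from urllib.parse import urlsplit, urlunsplit


def otlp_signal_urls(endpoint: str) -> dict:
    """Return the per-signal OTLP/HTTP URLs derived from one base endpoint.

    Hypothetical helper for illustration; not an Octomil API.
    """
    parts = urlsplit(endpoint)
    # Drop a trailing slash so "http://host:4318/" and "http://host:4318"
    # produce the same result.
    base_path = parts.path.rstrip("/")
    return {
        signal: urlunsplit(
            (parts.scheme, parts.netloc, f"{base_path}/v1/{signal}", "", "")
        )
        for signal in ("metrics", "logs")
    }


urls = otlp_signal_urls("http://otel-collector:4318")
print(urls["metrics"])
print(urls["logs"])
```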

In addition to aggregated fleet metrics, Octomil forwards every raw SDK telemetry event to your collector in real time. Your backend therefore receives per-request inference data rather than pre-bucketed summaries, so you can compute your own histograms, percentiles, and alerting thresholds.
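To make "compute your own percentiles" concrete, here is a minimal sketch that derives latency percentiles from per-request events using the nearest-rank method. The event shape (a flat dict keyed by attribute name) and the sample values are assumptions made for illustration; in practice you would usually let your observability backend do this.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


# Hypothetical per-request events, shaped as flat dicts for illustration.
events = [
    {"model.id": "m1", "inference.duration_ms": 120.0},
    {"model.id": "m1", "inference.duration_ms": 95.0},
    {"model.id": "m1", "inference.duration_ms": 310.0},
    {"model.id": "m1", "inference.duration_ms": 150.0},
    {"model.id": "m1", "inference.duration_ms": 101.0},
]
durations = [event["inference.duration_ms"] for event in events]
print("p50:", percentile(durations, 50))
print("p99:", percentile(durations, 99))
```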

What your collector receives

LogRecords (pushed to /v1/logs) — every SDK telemetry event as a structured log with all original attributes, timestamps, and trace/span IDs.

Gauge metrics (pushed to /v1/metrics) — numeric attributes extracted as named gauges:

| Metric name | Source attribute | Description |
| --- | --- | --- |
| octomil.inference.duration_ms | inference.duration_ms | End-to-end inference latency |
| octomil.inference.ttft_ms | inference.ttft_ms | Time to first token |
| octomil.inference.throughput_tps | inference.throughput_tps | Tokens per second |
| octomil.inference.total_tokens | inference.total_tokens | Total tokens generated |
| octomil.device.battery_level | battery_level | Raw battery percentage (not bucketed) |
| octomil.device.memory_used_mb | memory_used_mb | Device memory usage |

Every data point carries org.id, device.id, model.id, and event.name as attributes, so you can filter and group in your backend.
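The mapping from event attributes to gauges can be pictured roughly as follows. This is a sketch of the idea only, not Octomil's implementation: the flat-dict event shape and the `extract_gauges` helper are assumptions, while the metric names and attribute keys are taken from the table and sentence above.

```python
# Source attribute -> gauge name, as listed in the table above.
GAUGE_SOURCES = {
    "inference.duration_ms": "octomil.inference.duration_ms",
    "inference.ttft_ms": "octomil.inference.ttft_ms",
    "inference.throughput_tps": "octomil.inference.throughput_tps",
    "inference.total_tokens": "octomil.inference.total_tokens",
    "battery_level": "octomil.device.battery_level",
    "memory_used_mb": "octomil.device.memory_used_mb",
}

# Attributes attached to every data point.
COMMON_ATTRIBUTES = ("org.id", "device.id", "model.id", "event.name")


def extract_gauges(event):
    """Turn one telemetry event (assumed to be a flat dict) into gauge points."""
    attributes = {key: event[key] for key in COMMON_ATTRIBUTES if key in event}
    points = []
    for source, metric_name in GAUGE_SOURCES.items():
        value = event.get(source)
        # Only numeric values become gauges; bool is a subclass of int in
        # Python, so it is excluded explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            points.append(
                {"name": metric_name, "value": value, "attributes": attributes}
            )
    return points


# Hypothetical event for illustration.
event = {
    "org.id": "org-1",
    "device.id": "dev-42",
    "model.id": "m1",
    "event.name": "inference.completed",
    "inference.duration_ms": 118.5,
    "battery_level": 64,
}
for point in extract_gauges(event):
    print(point["name"], point["value"])
```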

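Example: querying forwarded metrics in Grafana with PromQL:

# Approximate p99 inference latency per model over the last hour.
# The forwarded metrics are gauges, so this uses quantile_over_time
# rather than histogram_quantile, which requires histogram buckets.
quantile by (model_id) (0.99, quantile_over_time(0.99, octomil_inference_duration_ms[1h]))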

# Devices with battery below 20%
octomil_device_battery_level < 20
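The queries above use underscores where the metric and attribute names use dots, because Prometheus-style names cannot contain dots. The sketch below shows the usual sanitization rule in simplified form. Real exporters may apply extra steps, such as appending unit suffixes, so treat the function as an approximation; `prometheus_name` is a hypothetical helper.

```python
import re


def prometheus_name(otel_name):
    """Approximate the usual OTLP-to-Prometheus name sanitization."""
    # Replace every character outside [a-zA-Z0-9_] with an underscore.
    sanitized = re.sub(r"[^a-zA-Z0-9_]", "_", otel_name)
    # Names must not start with a digit.
    if sanitized[:1].isdigit():
        sanitized = "_" + sanitized
    return sanitized


print(prometheus_name("octomil.inference.duration_ms"))
print(prometheus_name("model.id"))
```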

Example — querying forwarded logs in Grafana Loki:

{service_name="octomil"} | json | event_name="inference.completed"

Forwarding is fire-and-forget: it never blocks or delays event ingestion. If your collector is unreachable, events are dropped silently, and a circuit breaker prevents retry storms.
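For readers unfamiliar with the pattern, the following is a generic circuit-breaker sketch in Python. It is not Octomil's implementation: the class, the threshold, and the cooldown value are all assumptions chosen for illustration. It covers only the drop-on-failure and breaker behavior; a real forwarder would also run off the ingestion path (for example on a background worker) so that it never blocks.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Stops calling a failing target until a cooldown has passed."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if a send attempt may be made right now."""
        if self._opened_at is None:
            return True  # Closed: traffic flows normally.
        # Open: allow a probe only once the cooldown has elapsed.
        return self._clock() - self._opened_at >= self.cooldown_s

    def record(self, ok: bool) -> None:
        """Record the outcome of a send attempt."""
        if ok:
            self._failures = 0
            self._opened_at = None
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()  # Open (or re-open) the circuit.


def forward(event, send, breaker) -> bool:
    """Attempt one delivery; drop the event on failure or open circuit.

    Returns True if the event was delivered, False if it was dropped.
    """
    if not breaker.allow():
        return False
    try:
        send(event)
    except Exception:
        breaker.record(False)
        return False
    breaker.record(True)
    return True
```

With this design, a collector outage costs at most `failure_threshold` failed attempts per cooldown window instead of one failed attempt per event.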

Setup

Dashboard: Navigate to Settings, open the OTLP Collector card, and enter your collector URL.

CLI:

octomil integrations connect-otlp --endpoint http://otel-collector:4318

Python SDK:

api.connect_otlp_collector("Production Grafana", "http://collector:4318")

Node SDK:

await client.integrations.connectOtlpCollector({
  name: "Production Grafana",
  endpoint: "http://collector:4318",
});

Compatible collectors

The OTLP integration works with any collector that accepts OTLP/HTTP:

  • Grafana Alloy
  • Datadog Agent (with OTLP receiver)
  • New Relic OTLP endpoint
  • Honeycomb
  • Any OpenTelemetry Collector distribution

Configure in Octomil

  1. Open Settings.
  2. Go to Metrics export.
  3. Click Add integration.
  4. Select provider and enter connection details.
  5. Run Test.
  6. Save and enable.

Configure via SDK

Manage integrations programmatically with the Python or Node SDKs.

from octomil import Octomil

api = Octomil(api_key="edg_...", org_id="your-org-id")

# List integrations
metrics_integrations = api.integrations.list_metrics_integrations()
log_integrations = api.integrations.list_log_integrations()

# Create a metrics integration
api.integrations.create_metrics_integration(
    name="Prod Prometheus",
    integration_type="prometheus",
    config={"prefix": "octomil", "scrape_interval": 30},
)

# Test an integration (integration_id is the ID of an existing integration)
api.integrations.test_metrics_integration(integration_id)

# Delete an integration
api.integrations.delete_metrics_integration(integration_id)

Configure via CLI

# List all configured integrations
octomil integrations list

# Connect OTLP collector (metrics + logs in one step)
octomil integrations connect-otlp --endpoint http://collector:4318

# Create individual integrations
octomil integrations create --kind metrics --type prometheus --name prod-prom \
  --config-json '{"prefix":"octomil"}'

# Test an integration
octomil integrations test <id> --kind metrics

# Delete an integration
octomil integrations delete <id> --kind metrics

See the CLI Reference for the full list of options.

Metric Families

  • SLO metrics
  • Infrastructure health metrics
  • Device fleet metrics
  • SDK stability metrics
  • Training operation metrics

Best Practices

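  1. Adopt consistent naming and tag conventions per environment (dev, staging, prod).
  2. Avoid high-cardinality dimensions in dashboards and alerts (for example, prefer grouping by model.id over device.id where that is sufficient).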
  3. Set alert thresholds for crash rate, completion rate, and SLO attainment.
  4. Rotate integration credentials on a fixed schedule.
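One way to act on the cardinality advice is to count distinct values per attribute over a sample of data points before building dashboards on them. The sketch below assumes data points are dicts with an "attributes" mapping, which is an illustrative shape rather than a documented Octomil format.

```python
from collections import defaultdict


def attribute_cardinality(points):
    """Count the distinct values seen for each attribute key."""
    seen = defaultdict(set)
    for point in points:
        for key, value in point.get("attributes", {}).items():
            seen[key].add(value)
    return {key: len(values) for key, values in seen.items()}


# Hypothetical sample: three devices all running the same model.
sample = [
    {"attributes": {"device.id": "dev-1", "model.id": "m1"}},
    {"attributes": {"device.id": "dev-2", "model.id": "m1"}},
    {"attributes": {"device.id": "dev-3", "model.id": "m1"}},
]
print(attribute_cardinality(sample))
```

Attributes whose count grows with fleet size (typically device identifiers) are the ones to keep out of alert rules and dashboard groupings.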

Troubleshooting

No metrics arriving

Check:

  • integration is enabled
  • credentials/endpoint are valid
  • test action succeeds in Settings
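If the checks above pass but nothing arrives, it can help to confirm independently that the collector accepts OTLP/HTTP. The sketch below builds a POST to /v1/logs with an empty OTLP JSON payload using only the Python standard library. It assumes the collector has JSON-encoded OTLP/HTTP enabled, and the helper names are hypothetical. The wire encoding Octomil itself uses is not stated here, so a success only shows that the endpoint is reachable and speaks OTLP/HTTP.

```python
import json
import urllib.request


def build_otlp_probe(endpoint, headers=None):
    """Build (but do not send) an empty OTLP/HTTP logs export request."""
    url = endpoint.rstrip("/") + "/v1/logs"
    body = json.dumps({"resourceLogs": []}).encode("utf-8")
    all_headers = {"Content-Type": "application/json"}
    all_headers.update(headers or {})  # e.g. authentication headers
    return urllib.request.Request(url, data=body, headers=all_headers, method="POST")


def probe(endpoint, headers=None, timeout_s=5.0):
    """Send the probe and return the HTTP status code.

    Raises urllib.error.URLError (or HTTPError) if the collector is
    unreachable or rejects the request.
    """
    request = build_otlp_probe(endpoint, headers)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.status


# Usage, run from a host that can reach the collector:
#   print(probe("http://collector:4318"))
```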

Delayed or sparse metrics

Check:

  • export cadence in your observability stack
  • dashboard query windows and aggregation settings