runtime
Submit real inference benchmark results from a device.
Returns public engine defaults, supported capabilities, and routing policies. No authentication is required. SDKs call this on first launch to populate defaults without needing a full plan request. The response is static for a given server version and is safe to cache for plan_ttl_seconds.
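The caching behavior described above could be sketched on the SDK side roughly as follows. This is a minimal illustration, not the SDK's actual implementation: the `DefaultsCache` class and the `fetch` callable are hypothetical, and the sketch assumes the response payload carries `plan_ttl_seconds` as a top-level numeric field.

```python
import time
from typing import Any, Callable, Dict, Optional


class DefaultsCache:
    """Hypothetical helper: keeps the defaults payload for plan_ttl_seconds."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, Any]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # callable that performs the unauthenticated request
        self._clock = clock  # injectable so the expiry logic can be tested
        self._value: Optional[Dict[str, Any]] = None
        self._expires_at = 0.0

    def get(self) -> Dict[str, Any]:
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            payload = self._fetch()
            # Assumption: the TTL is reported inside the payload itself.
            ttl = float(payload.get("plan_ttl_seconds", 0))
            self._value = payload
            self._expires_at = now + ttl
        return self._value
```

Injecting the clock keeps the expiry logic deterministic under test; in an application the default monotonic clock is usually sufficient.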
Retrieve a single route telemetry event by route_id. Scoped to the authenticated caller's org_id. Used by the dashboard and SDK debugging tools to inspect routing decisions for a specific request.
Ingest a route telemetry event from an SDK. The event records the outcome of a routing decision: which engine was selected, whether fallback occurred, performance metrics, and gate evaluation results. The server validates enum-constrained fields, strips privacy-sensitive content, and stores the event for monitoring queries. Returns 202 Accepted synchronously.
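As an illustration of the validate-then-strip step described above, here is a small sketch of how an event might be checked before it is stored. Everything specific in it is an assumption made for the example: the `sanitize_event` function, the allowed values for `locality`, and the keys treated as privacy-sensitive are hypothetical and are not taken from the actual schema.

```python
from typing import Any, Dict

# Hypothetical enum constraints; the real schema defines its own fields and values.
ALLOWED_VALUES = {
    "locality": {"local", "cloud"},
}

# Hypothetical keys treated as privacy-sensitive for this sketch.
SENSITIVE_KEYS = {"prompt", "output"}


def sanitize_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Reject out-of-range enum values, then drop sensitive keys."""
    for field, allowed in ALLOWED_VALUES.items():
        if field in event and event[field] not in allowed:
            raise ValueError(f"invalid value for {field}: {event[field]!r}")
    # Return a copy so the caller's event is left untouched.
    return {k: v for k, v in event.items() if k not in SENSITIVE_KEYS}
```

Validating before stripping means a malformed event is rejected outright rather than being partially stored.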
List route telemetry events for the authenticated org with comprehensive filtering. Supports windowed queries by timeframe and filtering by app, deployment, experiment, variant, route_id, request_id, fallback status, locality, mode, and gate taxonomy fields. Returns newest-first.
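The windowing, filtering, and ordering described above can be pictured with a small in-memory sketch. The `list_events` function and the event shape (a dict with an integer `timestamp` plus filterable fields) are assumptions for illustration only; the real endpoint's parameter names and storage are not shown in this section.

```python
from typing import Any, Dict, List


def list_events(
    events: List[Dict[str, Any]],
    since: int,
    until: int,
    **filters: Any,
) -> List[Dict[str, Any]]:
    """Keep events inside [since, until) that match every filter, newest first."""
    selected = [
        e
        for e in events
        if since <= e["timestamp"] < until
        and all(e.get(key) == value for key, value in filters.items())
    ]
    # Newest-first ordering, as the endpoint description states.
    return sorted(selected, key=lambda e: e["timestamp"], reverse=True)
```

For example, `list_events(events, 0, 100, fallback=True)` would return only the fallback events in that window, ordered from most recent to oldest.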
Returns aggregated runtime monitoring statistics for the authenticated org within the specified timeframe. Aggregates up to 10,000 events per call. Provides operational insight into routing decisions, fallback rates, local-vs-cloud split, top fallback triggers, per-context breakdowns, and gate class distribution. Used by the dashboard monitoring view.
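To make the aggregation concrete, the sketch below computes a few of the statistics named above over a capped list of events. It is an assumption-laden illustration: the `aggregate` function, the event field names (`fallback`, `locality`, `fallback_trigger`), and the choice to apply the cap to the first 10,000 events in the list are all hypothetical, and only the 10,000 limit itself comes from the description.

```python
from collections import Counter
from typing import Any, Dict, List

MAX_EVENTS = 10_000  # per-call cap stated in the endpoint description


def aggregate(events: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize fallback rate, locality split, and top fallback triggers."""
    window = events[:MAX_EVENTS]  # assumption: the cap applies to the first N events
    total = len(window)
    fallbacks = [e for e in window if e.get("fallback")]
    triggers = Counter(e.get("fallback_trigger", "unknown") for e in fallbacks)
    return {
        "total": total,
        "fallback_rate": len(fallbacks) / total if total else 0.0,
        "locality_split": dict(Counter(e.get("locality", "unknown") for e in window)),
        "top_fallback_triggers": triggers.most_common(3),
    }
```

Guarding the division keeps an empty timeframe from raising, which matters for a dashboard view that may query windows with no traffic.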
Request an optimized runtime execution plan for a model on a specific device.