route
Compute and return a routing decision. Given model parameters and device capabilities, returns the optimal execution target (device or cloud), format, engine, and quantization. When deployment_id is supplied, the deployment's serving_policy overrides the prefer hint; an active serving_policy experiment, in turn, takes precedence over the deployment policy.
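The precedence described here (experiment over deployment policy over prefer hint) can be sketched as a small resolver. This is an illustration only: the function name, its parameters, and the policy values are hypothetical and are not part of the documented API.

```python
from typing import Optional


def resolve_serving_policy(
    prefer: Optional[str],
    deployment_policy: Optional[str] = None,
    experiment_policy: Optional[str] = None,
) -> Optional[str]:
    """Pick the winning policy under the documented precedence (hypothetical helper)."""
    # Highest priority: an active serving_policy experiment.
    if experiment_policy is not None:
        return experiment_policy
    # Next: the serving_policy of the deployment named by deployment_id.
    if deployment_policy is not None:
        return deployment_policy
    # Fallback: the caller's prefer hint, which may be None.
    return prefer
```

With no deployment_id and no experiment, the prefer hint is the only input that remains, which matches its role as a hint rather than an override.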
Retrieve a routing decision from the in-memory decision store by decision_id. The in-memory store is ephemeral; decisions may expire or be evicted. For persistent history use route.decisions.list.
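Because the in-memory store is ephemeral, a caller may want to fall back to persistent history when a lookup misses. The sketch below assumes two hypothetical callables supplied by the caller: one that returns None on a miss (a real client would more likely surface a not-found error), and one that returns history records carrying a decision_id field. Neither signature comes from the documented API.

```python
from typing import Callable, Dict, List, Optional

Decision = Dict[str, object]


def get_decision_with_fallback(
    decision_id: str,
    fetch_from_store: Callable[[str], Optional[Decision]],
    list_history: Callable[[], List[Decision]],
) -> Optional[Decision]:
    """Try the ephemeral store first, then scan persistent history (hypothetical helper)."""
    # Fast path: the decision is still in the in-memory store.
    decision = fetch_from_store(decision_id)
    if decision is not None:
        return decision
    # Slow path: the decision expired or was evicted, so search the history records.
    for past in list_history():
        if past.get("decision_id") == decision_id:
            return past
    return None
```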
Retrieve the active routing policy for the org. Supports conditional GET via If-None-Match / ETag to avoid re-fetching an unchanged policy. Response includes Cache-Control: private, max-age=300.
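A client can combine the max-age=300 freshness window with ETag revalidation to avoid re-downloading the policy. The following is a minimal sketch, not the official client: PolicyCache and the transport callable are hypothetical, and the transport is assumed to take request headers and return a (status, response headers, body) tuple.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical transport: request headers in, (status, response headers, body) out.
Transport = Callable[[Dict[str, str]], Tuple[int, Dict[str, str], Optional[dict]]]


class PolicyCache:
    """Client-side cache for the routing policy (illustrative sketch)."""

    def __init__(
        self,
        transport: Transport,
        max_age: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._transport = transport
        self._max_age = max_age
        self._clock = clock
        self._etag: Optional[str] = None
        self._policy: Optional[dict] = None
        self._fetched_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        # Still fresh under Cache-Control max-age: no request at all.
        if self._policy is not None and now - self._fetched_at < self._max_age:
            return self._policy
        # Stale or empty: revalidate, sending the stored ETag if there is one.
        headers: Dict[str, str] = {}
        if self._etag is not None:
            headers["If-None-Match"] = self._etag
        status, resp_headers, body = self._transport(headers)
        if status == 304 and self._policy is not None:
            # Policy unchanged: keep the cached body and restart the freshness window.
            self._fetched_at = now
            return self._policy
        if status == 200 and body is not None:
            self._etag = resp_headers.get("ETag")
            self._policy = body
            self._fetched_at = now
            return body
        raise RuntimeError(f"unexpected status {status}")
```

Injecting the clock keeps the freshness logic testable without waiting five minutes; a real client would also need case-insensitive header lookup.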
Route a query (OpenAI-style message list) to the appropriate model tier based on complexity scoring. When decompose=true, the server may return a DecomposedQueryRouteResponse containing sub-decisions instead of a single decision.
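Since decompose=true may change the response shape, client code has to handle both forms. The sketch below is illustrative only: the messages and sub_decisions field names and both helper functions are assumptions, not taken from the schema; only the decompose flag appears in the description above.

```python
from typing import Any, Dict, List

Message = Dict[str, str]


def build_query_route_request(messages: List[Message], decompose: bool = False) -> Dict[str, Any]:
    """Build a request body; the 'messages' field name is assumed (OpenAI-style)."""
    return {"messages": messages, "decompose": decompose}


def collect_decisions(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Normalize either response shape into a flat list of decisions."""
    # Assumed field name for the sub-decisions of a DecomposedQueryRouteResponse.
    subs = response.get("sub_decisions")
    if subs is not None:
        return list(subs)
    # Plain response: treat the whole object as the single decision.
    return [response]
```

Normalizing to a list lets downstream code iterate the same way whether or not the server chose to decompose the query.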
List the named serving policy presets with their canonical configurations. Presets are the authoritative source for serving policy defaults; use these names when configuring deployments.
Get aggregated routing statistics for a specific model: device vs cloud percentages, total decision count, and common routing reasons.