GET /api/v1/models/:model_id/inference-metrics

Return aggregated inference telemetry for a model within a time window. The response combines InferenceSession rows (org-scoped and restricted to the window) with fleet Device health status. Latency metrics are reported as percentile distributions. The fleet block summarises device availability independently of this model's inference history.
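To illustrate what "percentile distributions" means for the latency metrics, here is a minimal sketch of a percentile summary over per-session latencies. It is an assumption-laden illustration, not the service's implementation: the nearest-rank method, the p50/p95/p99 choice, the function names, and the sample values are all hypothetical and are not stated on this page.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p percent of the samples are less than or equal to it.
    (Assumed method; the endpoint may interpolate instead.)"""
    if not values:
        return None
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarise_latency(latencies_ms):
    """Build a percentile summary for a list of latencies in milliseconds.
    The p50/p95/p99 keys are illustrative, not the documented response shape."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}


# Hypothetical latencies (ms) for sessions that fall inside the time window.
sample = [12.0, 15.0, 11.0, 40.0, 13.0, 14.0, 90.0, 16.0, 12.5, 13.5]
summary = summarise_latency(sample)
print(summary)
```

With only ten samples, the upper percentiles collapse onto the single slowest session, which is one reason a percentile summary over a short window should be read with care.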

Request

Responses

Success