GET /api/v1/models/:model_id/inference-metrics
Return aggregated inference telemetry for a model within a time window. Combines InferenceSession rows (org-scoped, within the window) with fleet Device health status. Latency metrics are computed as percentile distributions. The fleet block summarises device availability independent of this model's inference history.
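The description above implies two independent aggregations: latency percentiles over the organisation's sessions for this model inside the window, and a fleet-wide count of device health states. The Python below is a minimal sketch of that logic, not the service's implementation. Several things in it are assumptions: the `InferenceSession` and `Device` field names, the half-open `[start, end)` window, the nearest-rank percentile method, the percentiles reported (p50/p95/p99), and the status strings. The endpoint's actual schema and percentile method are not documented here.

```python
import math
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class InferenceSession:
    # Hypothetical shape; the real InferenceSession schema is not documented here.
    org_id: str
    model_id: str
    started_at: datetime
    latency_ms: float


@dataclass
class Device:
    # Hypothetical shape; the health status values are assumed.
    device_id: str
    health_status: str


def percentile(sorted_values, p):
    """Nearest-rank percentile (an assumed method) over an ascending list."""
    if not sorted_values:
        raise ValueError("percentile of empty data")
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def latency_percentiles(sessions, org_id, model_id, start, end, ps=(50, 95, 99)):
    """Org-scoped, window-filtered latency percentiles for one model."""
    latencies = sorted(
        s.latency_ms
        for s in sessions
        if s.org_id == org_id
        and s.model_id == model_id
        and start <= s.started_at < end  # half-open window (assumption)
    )
    if not latencies:
        return {}
    return {f"p{p}": percentile(latencies, p) for p in ps}


def fleet_summary(devices):
    """Counts devices by health status; ignores inference history entirely."""
    return dict(Counter(d.health_status for d in devices))


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    sessions = [
        InferenceSession("org1", "m1", t0 + timedelta(minutes=i), float(10 * (i + 1)))
        for i in range(10)
    ]
    print(latency_percentiles(sessions, "org1", "m1", t0, t0 + timedelta(days=1)))
    print(fleet_summary([Device("d1", "healthy"), Device("d2", "offline")]))
```

Keeping the fleet summary as a separate function reflects the note that device availability is reported independently of the model's inference history; in this sketch it takes no session data at all.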
Request
Responses
- 200: Success
- default: Error response