monitoring
Unified org-wide activity feed of deployment events and cloud usage, newest first. Supports cursor-based pagination via the before parameter.
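As a rough illustration of cursor-based pagination with a `before` parameter, the sketch below walks a newest-first feed page by page. The `fetch_page` callable, the `timestamp` field, and the choice of the oldest event's timestamp as the cursor are all assumptions made for the example; the actual cursor format is not stated here.

```python
from typing import Callable, Iterator, Optional

# Hypothetical page fetcher: takes the `before` cursor (None for the first page)
# and returns events newest first. The real client call is not shown in this page.
FetchPage = Callable[[Optional[str]], list]


def iter_activity(fetch_page: FetchPage) -> Iterator[dict]:
    """Walk the feed from newest to oldest using a `before` cursor."""
    before: Optional[str] = None
    while True:
        page = fetch_page(before)
        if not page:
            return
        yield from page
        # Assumption: the cursor is the timestamp of the oldest event on the page.
        before = page[-1]["timestamp"]


# In-memory stand-in for the endpoint, used only to exercise the loop.
_EVENTS = [{"id": i, "timestamp": f"2024-01-01T00:00:{i:02d}Z"} for i in range(5, 0, -1)]


def fake_fetch(before: Optional[str], page_size: int = 2) -> list:
    older = [e for e in _EVENTS if before is None or e["timestamp"] < before]
    return older[:page_size]


activity_ids = [event["id"] for event in iter_activity(fake_fetch)]
```

The loop stops on the first empty page, so the caller does not need to know the total number of events in advance.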
Create an alert rule that watches a named metric and fires when it crosses the configured threshold over the evaluation window. Returns the new rule's identity; call `monitoring.alerts.get` for the full object.
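To make "crosses the configured threshold over the evaluation window" concrete, here is a minimal sketch of one possible evaluation. The field names (`comparison`, `window_seconds`), the use of a windowed mean, and the no-data behavior are assumptions for illustration, not the service's documented schema or logic.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class AlertRule:
    # All field names here are illustrative, not the service's schema.
    metric: str
    threshold: float
    comparison: str  # "gt" or "lt"
    window_seconds: int


def rule_fires(rule: AlertRule, samples: list, now: float) -> bool:
    """Return True when the windowed mean crosses the threshold.

    `samples` holds (unix_timestamp, value) pairs for `rule.metric`.
    """
    window_start = now - rule.window_seconds
    in_window = [value for ts, value in samples if window_start <= ts <= now]
    if not in_window:
        return False  # Assumption: no data means the rule does not fire.
    mean = fmean(in_window)
    if rule.comparison == "gt":
        return mean > rule.threshold
    if rule.comparison == "lt":
        return mean < rule.threshold
    raise ValueError(f"unsupported comparison: {rule.comparison!r}")


rule = AlertRule(metric="error_rate", threshold=0.05, comparison="gt", window_seconds=300)
samples = [(1000.0, 0.01), (1200.0, 0.08), (1290.0, 0.10)]
fired = rule_fires(rule, samples, now=1300.0)
```

Averaging over the window is only one choice; a real evaluator might instead require every sample in the window to breach the threshold.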
List alert rules in an org. Returns a bare array (not an envelope) because alert rules are bounded and the dashboard renders them in a single table.
Delete an alert rule. Verifies org_id ownership before deletion.
Get full details for a single alert rule. Scoped by org_id.
Partially update an alert rule. Only whitelisted fields are applied.
Return aggregated monitoring metrics for a single app. Computed by MonitoringService.get_app_metrics(). Powers the app detail page monitoring tab.
Return monitoring metrics for a specific capability within an app. Computed by MonitoringService.get_capability_metrics(app_id, capability). Enables per-capability drill-down on the app monitoring page.
Return time-series monitoring data scoped to a specific capability within an app. Calls MonitoringService.get_capability_metrics_timeseries(app_id, capability, start=now-hours). Enables per-capability trend charts.
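The `start=now-hours` expression in the call above can be read as a lookback window ending at the current time. A minimal sketch, assuming `hours` is a positive number and that timestamps are in UTC (neither is confirmed on this page; `lookback_start` is a hypothetical helper name):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def lookback_start(hours: float, now: Optional[datetime] = None) -> datetime:
    """Return the start of a lookback window that ends at `now`."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    # Assumption: the window is anchored to the current UTC time.
    now = now or datetime.now(timezone.utc)
    return now - timedelta(hours=hours)


fixed_now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
window_start = lookback_start(24, now=fixed_now)
```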
Return recent deployment events for an app. Calls MonitoringService.get_app_deployment_events(app_id, event_type=..., limit=...). Covers events such as version deploys, rollbacks, and status changes.
List recent runtime route events for an app with pagination. Returns RouteEventListResponse with events[], total, limit, offset. Each event is built from RuntimeRouteEvent.to_event_dict() via _build_event_response(). Distinct from monitoring.apps.get_events which returns deployment events.
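Because the response carries `events[]`, `total`, `limit`, and `offset`, a client can page through results by advancing the offset. The sketch below assumes a hypothetical `fetch(limit, offset)` callable that returns a dict with those four keys; the `route` field and the in-memory data are stand-ins.

```python
from typing import Callable, Iterator

# Hypothetical fetcher: (limit, offset) -> dict shaped like RouteEventListResponse.
FetchEvents = Callable[[int, int], dict]


def iter_route_events(fetch: FetchEvents, limit: int = 50) -> Iterator[dict]:
    """Yield every event by advancing `offset` until `total` is reached."""
    offset = 0
    while True:
        page = fetch(limit, offset)
        events = page["events"]
        yield from events
        offset += len(events)
        # Stop on an empty page too, in case `total` changes between calls.
        if not events or offset >= page["total"]:
            return


# In-memory stand-in for the endpoint, used only to exercise the loop.
_ROWS = [{"route": f"/r/{i}"} for i in range(7)]


def fake_fetch_events(limit: int, offset: int) -> dict:
    return {
        "events": _ROWS[offset:offset + limit],
        "total": len(_ROWS),
        "limit": limit,
        "offset": offset,
    }


routes = [e["route"] for e in iter_route_events(fake_fetch_events, limit=3)]
```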
Return aggregated runtime request telemetry for an app, derived from RuntimeRouteEvent records. Loads up to 10,000 events within the timeframe and passes them to _build_monitoring_summary_response(). Distinct from monitoring.apps.get which reads from MonitoringService.
Return bucketed runtime request count time-series for an app. Loads up to 10,000 RuntimeRouteEvent records within the timeframe and passes them to _build_runtime_timeseries_response(). Used to render throughput trend charts on the app monitoring page.
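Bucketing a list of event timestamps into a request-count series could look roughly like the following. This is not the body of `_build_runtime_timeseries_response()`, which is not shown here; the fixed-width buckets, the half-open `[start, end)` range, and the zero-filled gaps are assumptions.

```python
import math
from collections import Counter
from datetime import datetime, timedelta, timezone


def bucket_counts(timestamps: list, start: datetime, end: datetime,
                  bucket: timedelta) -> list:
    """Count events per fixed-width bucket over [start, end), keeping empty buckets."""
    counts: Counter = Counter()
    for ts in timestamps:
        if start <= ts < end:
            # Integer index of the bucket this timestamp falls into.
            counts[(ts - start) // bucket] += 1
    n_buckets = math.ceil((end - start) / bucket)
    return [(start + i * bucket, counts.get(i, 0)) for i in range(n_buckets)]


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
stamps = [t0 + timedelta(minutes=m) for m in (1, 2, 11, 20)]
series = bucket_counts(stamps, t0, t0 + timedelta(minutes=15), timedelta(minutes=5))
```

Emitting empty buckets keeps the x-axis of a throughput chart evenly spaced even when no requests arrived in an interval.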
Return time-series monitoring data for an app. Calls MonitoringService.get_app_metrics_timeseries(app_id, start=now-hours). Used to render metric trend charts on the app monitoring page.
List monitoring events for a deployment: lifecycle transitions, rollout steps, health gate evaluations, and governance actions. Designed for the deployment detail page's activity/events timeline. Events are fetched from MonitoringService.get_deployment_events(), which aggregates across the deployment_events table and any correlated rollout history entries.
Return aggregated runtime monitoring metrics for a single deployment. Computed by MonitoringService.get_deployment_metrics() from the most recent telemetry ingested for this deployment_id. This endpoint drives the deployment detail page's 'Monitoring' tab.
Return aggregated monitoring metrics for a single device. Computed by MonitoringService.get_device_metrics(). Covers inference activity, heartbeat health, and device assignment state as available in the monitoring service.
Create an incident, optionally linked to an alert rule. On success notifies verified status-page subscribers and dispatches `incident.created` to configured webhooks (Slack, PagerDuty, etc.) in the background.
List incidents for an org. Used by the dashboard's incidents panel and by SRE consoles aggregating across orgs (one call per org).
Rollup statistics for the incidents panel. Single aggregated call so the dashboard doesn't need to compute MTTR / resolution rate client-side.
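For reference, the kind of rollup this call saves the client from computing might look like the sketch below. It treats MTTR as the mean time from creation to resolution and resolution rate as resolved incidents divided by all incidents; those definitions, and the `created_at` / `resolved_at` field names, are assumptions rather than the endpoint's documented behavior.

```python
from datetime import datetime, timedelta, timezone


def incident_stats(incidents: list) -> dict:
    """Compute resolution rate and MTTR from incident dicts.

    Each incident is assumed to have `created_at` and an optional `resolved_at`.
    """
    resolved = [i for i in incidents if i.get("resolved_at") is not None]
    durations = [(i["resolved_at"] - i["created_at"]).total_seconds() for i in resolved]
    return {
        "total": len(incidents),
        "resolved": len(resolved),
        "resolution_rate": len(resolved) / len(incidents) if incidents else 0.0,
        "mttr_seconds": sum(durations) / len(durations) if durations else None,
    }


opened = datetime(2024, 1, 1, tzinfo=timezone.utc)
sample_incidents = [
    {"created_at": opened, "resolved_at": opened + timedelta(minutes=30)},
    {"created_at": opened, "resolved_at": opened + timedelta(minutes=60)},
    {"created_at": opened, "resolved_at": opened + timedelta(minutes=90)},
    {"created_at": opened, "resolved_at": None},  # still open
]
stats = incident_stats(sample_incidents)
```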
Update an incident's status, severity, assignee, or resolution notes. Transitions to `resolved` send a resolution email to verified status-page subscribers; any status change dispatches `incident.{status}` to configured webhooks.
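The notification rules described above can be summarized as a small decision function. This is a sketch of the stated behavior only; `plan_notifications` is a hypothetical helper, and any status value other than `resolved` (such as `investigating`) is made up for the example.

```python
from typing import Optional


def plan_notifications(old_status: str, new_status: str) -> dict:
    """Decide which notifications a status update would trigger."""
    changed = old_status != new_status
    # Any status change dispatches `incident.{status}` to webhooks.
    webhook_event: Optional[str] = f"incident.{new_status}" if changed else None
    # Only a transition to `resolved` emails status-page subscribers.
    email_subscribers = changed and new_status == "resolved"
    return {"webhook_event": webhook_event, "email_subscribers": email_subscribers}


resolved_plan = plan_notifications("open", "resolved")
triage_plan = plan_notifications("open", "investigating")
noop_plan = plan_notifications("open", "open")
```

Updates that only touch severity, assignee, or resolution notes leave the status unchanged, so under this reading they trigger neither notification.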
Query time-series metrics for an org within a lookback window. Returns an array of metric records; schema depends on the metric type.
Return aggregated runtime monitoring metrics for a single model. Computed by MonitoringService.get_model_metrics(). Distinct from models.get_inference_metrics which computes from InferenceSession rows; this endpoint delegates to the MonitoringService abstraction layer.
Single-call summary for the dashboard's monitoring landing page. Polled on page mount; not intended for high-frequency SDK polling.