

GET /api/v1/inference/speculative-config

Returns the recommended speculative decoding configuration for a model/device combination. No authentication required. enabled=true requires at least 6 GB of RAM and a supported chip family.
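As a rough illustration, a client call might look like the Python sketch below. Only the path, the GET method, the lack of authentication, and the `enabled` field come from this page. The host `example.invalid`, the query parameter names `model` and `device`, and the assumption that the body is JSON are hypothetical placeholders, so check them against the actual request schema before relying on this.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical host: this page documents only the path, not the server address.
BASE_URL = "https://example.invalid"
PATH = "/api/v1/inference/speculative-config"


def build_url(base_url, **params):
    """Join the base URL, the documented path, and any query parameters."""
    url = base_url.rstrip("/") + PATH
    query = urllib.parse.urlencode(params)
    return url + "?" + query if query else url


def fetch_speculative_config(base_url=BASE_URL, timeout=10.0, **params):
    """Send an unauthenticated GET and decode the body (assumed to be JSON)."""
    request = urllib.request.Request(build_url(base_url, **params), method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def is_enabled(config):
    """Read the `enabled` flag, treating a missing field as False."""
    return bool(config.get("enabled", False))
```

A caller could then write `is_enabled(fetch_speculative_config(model="...", device="..."))`, substituting whatever parameter names the endpoint actually accepts, and fall back to ordinary decoding when the result is false.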

Request

Responses

Success