GET /api/v1/inference/speculative-config
Return the recommended speculative decoding configuration for a model/device combination. No authentication required. The configuration is returned with enabled=true only when the device has at least 6 GB of RAM and a supported chip family.
Request
Responses
- 200: Success
- default: Error response
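A minimal client sketch for this endpoint follows. Only the HTTP method, the path, and the absence of authentication come from the description above. The host `https://api.example.com`, the query parameter names (`model`, `device`), and the JSON response shape with a top-level `enabled` field are assumptions made for illustration; check the actual request schema before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical host; replace with the real API origin.
BASE_URL = "https://api.example.com"
# Path taken from the endpoint documented above.
ENDPOINT = "/api/v1/inference/speculative-config"


def build_url(base_url, params):
    """Join the endpoint path with URL-encoded query parameters."""
    query = urllib.parse.urlencode(params)
    url = base_url.rstrip("/") + ENDPOINT
    return url + ("?" + query if query else "")


def fetch_speculative_config(base_url, params, timeout=10.0):
    """Send the GET request and decode the body as JSON (assumed format).

    No Authorization header is sent, since the endpoint needs no authentication.
    """
    request = urllib.request.Request(build_url(base_url, params), method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def is_enabled(config):
    """Assumed response shape: a top-level boolean `enabled` field."""
    return bool(config.get("enabled", False))
```

Under these assumptions, `fetch_speculative_config(BASE_URL, {"model": "my-model", "device": "my-device"})` would issue the request, and `is_enabled(...)` on the result would report whether speculative decoding is recommended. A non-2xx status surfaces as `urllib.error.HTTPError`, which corresponds to the default error response.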