GET /api/v1/inference/kv-cache-config
Return the recommended KV cache configuration for a model/device-class combination. No authentication required. The runtime_config object is intended to be merged directly into engine init parameters.
Request
Responses
- 200: Success
- default: Error response
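The following Python sketch illustrates one way a client might call this endpoint and merge the returned runtime_config into engine init parameters. It is not taken from the API's own documentation: the query parameter names `model` and `device_class`, the base URL, and the assumption that the 200 response is a JSON object with a top-level `runtime_config` key are all hypothetical and should be checked against the actual request and response schemas.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "/api/v1/inference/kv-cache-config"


def build_url(base_url: str, model: str, device_class: str) -> str:
    """Build the request URL.

    ASSUMPTION: "model" and "device_class" are hypothetical query
    parameter names; the source does not list the request parameters.
    """
    query = urlencode({"model": model, "device_class": device_class})
    return f"{base_url.rstrip('/')}{ENDPOINT}?{query}"


def fetch_kv_cache_config(base_url: str, model: str, device_class: str,
                          timeout: float = 10.0) -> dict:
    """GET the recommended configuration and decode the JSON body.

    No Authorization header is sent, since the endpoint requires no
    authentication. urlopen raises urllib.error.HTTPError for 4xx/5xx
    status codes, which covers the "default" error response.
    """
    with urlopen(build_url(base_url, model, device_class), timeout=timeout) as resp:
        return json.load(resp)


def merge_runtime_config(engine_params: dict, response_body: dict) -> dict:
    """Shallow-merge runtime_config over existing engine init parameters.

    ASSUMPTION: runtime_config is a top-level key of the response body.
    On key conflicts the recommended value wins; the inputs are not mutated.
    """
    runtime_config = response_body.get("runtime_config") or {}
    return {**engine_params, **runtime_config}
```

A caller would typically pass the result of `fetch_kv_cache_config(...)` to `merge_runtime_config(defaults, body)` and hand the merged dictionary to the engine constructor. A shallow merge is used here because the source says the object is meant to be merged directly; if runtime_config turns out to contain nested sections, a recursive merge may be more appropriate.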