

GET /api/v1/inference/kv-cache-config

Return the recommended KV cache configuration for a model/device-class combination. No authentication required. The runtime_config object in the response body is intended to be merged directly into the engine's initialization parameters.
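A minimal client sketch follows, using only the Python standard library. It assumes a hypothetical base URL (https://api.example.com) and hypothetical query parameter names model and device_class; this page does not document the actual request parameters, so check the Request section of your deployment before relying on them. The only response field the sketch depends on is runtime_config, which the description above names. No Authorization header is sent, since the endpoint requires no authentication.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

# ASSUMPTION: hypothetical host; replace with your deployment's base URL.
BASE_URL = "https://api.example.com"
ENDPOINT = "/api/v1/inference/kv-cache-config"


def build_kv_cache_config_url(model: str, device_class: str,
                              base_url: str = BASE_URL) -> str:
    """Build the GET URL.

    ASSUMPTION: the query parameter names 'model' and 'device_class'
    are illustrative guesses, not documented names.
    """
    query = urllib.parse.urlencode({"model": model, "device_class": device_class})
    return f"{base_url}{ENDPOINT}?{query}"


def fetch_kv_cache_config(model: str, device_class: str,
                          timeout: float = 10.0) -> dict[str, Any]:
    """Fetch the recommended config; no auth header, per the endpoint docs."""
    url = build_kv_cache_config_url(model, device_class)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def merge_runtime_config(engine_params: dict[str, Any],
                         response: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with runtime_config laid over engine_params.

    This is a shallow merge: keys from runtime_config replace same-named
    keys in engine_params, and the input dicts are not modified.
    """
    runtime_config = response.get("runtime_config") or {}
    return {**engine_params, **runtime_config}
```

Usage might look like merge_runtime_config(my_engine_params, fetch_kv_cache_config("my-model", "my-device-class")), with the result passed to your engine's constructor. A shallow merge is one reading of "merged directly"; if your engine nests KV cache settings under a sub-key, you may need a deep merge instead.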

Request

Responses

Success