POST /api/v1/inference/stream

Streaming inference. Uses the same routing semantics as inference.create, but the response is sent as text/event-stream, so the SDK can start consuming tokens before the model has finished generating.

Request

Responses

Server-Sent Events stream. Each event is a JSON-encoded ResponseStreamEvent. The terminal event has type: "done"; SDKs should close the connection after receiving it.
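
As an illustration of how a client might consume this stream, the following is a minimal sketch in Python, not the SDK's actual implementation. It assumes each event's JSON payload arrives in standard SSE `data:` lines terminated by a blank line. The helper name `iter_stream_events`, the `"token"` event type, and the `text` field are hypothetical; only `type: "done"` comes from the description above.

```python
import json
from typing import Iterable, Iterator


def iter_stream_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded events from SSE lines, stopping after the "done" event.

    Hypothetical helper: assumes each event is a JSON object carried in
    one or more `data:` lines and dispatched by a blank line.
    """
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        if line == "":
            # A blank line dispatches whatever data has been buffered.
            if data_parts:
                event = json.loads("\n".join(data_parts))
                data_parts = []
                yield event
                if event.get("type") == "done":
                    return  # terminal event: caller should close the connection
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # Per the SSE format, one leading space after the colon is dropped.
            data_parts.append(value[1:] if value.startswith(" ") else value)


# Simulated stream; the "token" type and "text" field are assumptions.
sample = [
    'data: {"type": "token", "text": "Hel"}\n',
    "\n",
    ": keep-alive\n",
    'data: {"type": "token", "text": "lo"}\n',
    "\n",
    'data: {"type": "done"}\n',
    "\n",
    'data: {"type": "token", "text": "ignored"}\n',
    "\n",
]
events = list(iter_stream_events(sample))
```

In a real client, `lines` would be the decoded line iterator of the HTTP response body, and the connection would be closed once the generator returns.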