POST /api/v1/inference/stream

Streaming inference. Uses the same routing semantics as inference.create, but the response is sent as text/event-stream, so the SDK can start consuming tokens before the model has finished generating.

Request

Responses

200
default

Server-Sent Events stream. Each event is a JSON-encoded ResponseStreamEvent. The terminal event has type: "done"; SDKs should close the connection after receiving it.
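As a rough illustration of how a client might consume this stream, the sketch below parses Server-Sent Events lines, decodes each event's data payload as JSON, and stops once it sees an event whose type is "done". It is a minimal sketch, not the SDK's actual implementation: the function name iter_stream_events is hypothetical, the only ResponseStreamEvent field it relies on is type, and it ignores the optional SSE fields event, id, and retry.

```python
import json
from typing import Iterable, Iterator


def iter_stream_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield JSON-decoded events from SSE lines, stopping after type == "done".

    Hypothetical helper: `lines` is any iterable of text lines from the
    response body, for example a file-like object wrapping the HTTP stream.
    """
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        if line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):
                value = value[1:]  # SSE strips one leading space after the colon
            data_parts.append(value)
            continue
        if line == "" and data_parts:
            # A blank line ends the event; multi-line data is joined with "\n".
            event = json.loads("\n".join(data_parts))
            data_parts = []
            yield event
            if event.get("type") == "done":
                return  # terminal event: the caller should close the connection
```

Because the generator returns as soon as it yields the terminal event, a caller that wraps it in a `with` block around the HTTP response will close the connection promptly, which matches the guidance above. Any event fields other than type would come from the ResponseStreamEvent schema, which is not reproduced here.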
