POST /api/v1/inference/stream
Streaming inference. Same routing semantics as inference.create, but the response is sent as text/event-stream, so the SDK can start consuming tokens before the model has finished generating.
Request
Responses
- 200: Server-Sent Events stream. Each event is a JSON-encoded ResponseStreamEvent. The terminal event has type: "done"; SDKs should close the connection after receiving it.
- default: Error response.
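As an illustration of consuming the 200 response, here is a minimal Python sketch of a client-side event loop. It assumes each ResponseStreamEvent arrives in standard SSE `data:` lines terminated by a blank line. The function name `iter_stream_events`, the `"token"` event type, and the `"text"` field are hypothetical placeholders; only `type: "done"` comes from the description above. The sketch reads from any iterable of lines, so it is not tied to a particular HTTP library.

```python
import json
from typing import Iterable, Iterator


def iter_stream_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield JSON-decoded events from SSE lines; stop after type == "done".

    Hypothetical helper, not part of any official SDK.
    """
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        if line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):
                value = value[1:]  # SSE strips one leading space
            data_parts.append(value)
            continue
        if line == "" and data_parts:
            # A blank line dispatches the buffered event.
            event = json.loads("\n".join(data_parts))
            data_parts = []
            yield event
            if event.get("type") == "done":
                return  # terminal event: the caller should close the connection
        # Other SSE fields (event:, id:, retry:) are ignored in this sketch.


# Simulated stream; the "token"/"text" payload shape is assumed for illustration.
sample = [
    'data: {"type": "token", "text": "Hel"}\n',
    "\n",
    ": keep-alive\n",
    'data: {"type": "token", "text": "lo"}\n',
    "\n",
    'data: {"type": "done"}\n',
    "\n",
    'data: {"type": "token", "text": "ignored"}\n',
    "\n",
]
events = list(iter_stream_events(sample))
```

In a real client, `lines` would come from the decoded body of the HTTP response, and the connection would be closed as soon as the generator returns.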