# Control
Control what the model generates using parameters, structured schemas, and stop sequences.
## Generation parameters
Pass these in your chat completions request:
| Parameter | Default | Description |
|---|---|---|
| temperature | 1.0 | Randomness. 0 = deterministic (greedy), 2 = maximum variety |
| top_p | 1.0 | Nucleus sampling. 0.9 = sample only from tokens within the top 90% of probability mass |
| max_tokens | model max | Maximum tokens to generate |
| stop | null | Up to 4 sequences that stop generation |
| frequency_penalty | 0 | Penalize tokens proportionally to how often they have already appeared (-2.0 to 2.0) |
| presence_penalty | 0 | Penalize any token that has appeared at least once, regardless of count (-2.0 to 2.0) |
| seed | null | Fixed seed for reproducible output |
```python
response = client.chat.completions.create(
    model="phi-4-mini",
    messages=[{"role": "user", "content": "Write a haiku about code"}],
    temperature=0.7,
    max_tokens=50,
    stop=["\n\n"],
)
```
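Conceptually, temperature and top_p are transformations on the model's next-token probability distribution. The toy sketch below illustrates the mechanics on hand-written logits; it is not the server's actual implementation (real decoders special-case temperature 0 as greedy selection rather than dividing by zero):

```python
import math

def apply_temperature(logits, temperature):
    # Scale logits by 1/temperature, then softmax.
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus(probs, top_p):
    # Keep the smallest set of tokens whose cumulative probability
    # reaches top_p; everything else is excluded from sampling.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return set(kept)

logits = [2.0, 1.0, 0.1]
low_t = apply_temperature(logits, 0.5)   # sharper: top token dominates
high_t = apply_temperature(logits, 2.0)  # flatter: more variety
```

With top_p=0.9 on these logits at temperature 1.0, only the two most likely tokens fall inside the nucleus; the tail token is never sampled.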
## Structured output

Force the model to produce valid JSON, optionally matching a specific schema. Output is enforced at the token level -- every response is valid on the first attempt, with no retry loop.
### JSON mode
Returns valid JSON (any shape):
```python
response = client.chat.completions.create(
    model="phi-4-mini",
    messages=[{"role": "user", "content": "List 3 colors as JSON"}],
    response_format={"type": "json_object"},
)
```
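Because JSON mode guarantees the content parses, you can hand it straight to json.loads without a try/except retry loop. A minimal sketch -- the content string here is a simulated payload; a real call would read response.choices[0].message.content:

```python
import json

# Simulated response content; in a real call this would be
# content = response.choices[0].message.content
content = '{"colors": ["red", "green", "blue"]}'

data = json.loads(content)  # safe: JSON mode guarantees valid JSON
print(data["colors"])
```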
### JSON Schema
Returns JSON matching a specific schema:
```python
response = client.chat.completions.create(
    model="phi-4-mini",
    messages=[{"role": "user", "content": "Extract the person's name and age"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                },
                "required": ["name", "age"],
            },
        },
    },
)
```
The model will only generate tokens that produce a valid {"name": "...", "age": ...} object. See Structured Decoding for implementation details and supported schema features.
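Even with token-level enforcement, a lightweight client-side check keeps downstream code honest about the types it expects. A sketch mirroring the schema above -- the content string is simulated; a real call would read response.choices[0].message.content:

```python
import json

# Simulated schema-constrained output
content = '{"name": "Ada", "age": 36}'
person = json.loads(content)

# Mirror the schema's "required" keys and types before using the data.
assert isinstance(person["name"], str)
assert isinstance(person["age"], int)
```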
## Stop sequences
Stop generation when the model emits any of up to 4 sequences:
```python
response = client.chat.completions.create(
    model="phi-4-mini",
    messages=[{"role": "user", "content": "Count to 10"}],
    stop=["5", "\n\n"],
)
```
The stop sequence is not included in the response. finish_reason will be "stop".
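Conceptually, the server watches the generated text for any of the stop sequences and cuts generation at the earliest match, dropping the match itself. An illustrative client-side version of that logic (not the actual server implementation, which operates on the token stream):

```python
def truncate_at_stop(text, stop_sequences):
    # Cut at the earliest occurrence of any stop sequence;
    # the stop sequence itself is not included in the result.
    cut = len(text)
    for s in stop_sequences:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)
    return text[:cut]

print(truncate_at_stop("1, 2, 3, 4, 5, 6", ["5", "\n\n"]))
```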
## System messages
Use the system role to set behavior, persona, or constraints:
```python
response = client.chat.completions.create(
    model="phi-4-mini",
    messages=[
        {"role": "system", "content": "You are a concise technical writer. Never use more than 2 sentences."},
        {"role": "user", "content": "Explain DNS"},
    ],
)
```
## Reproducibility

Set seed for reproducible output. Responses with the same seed, model, and input should be identical:
```python
response = client.chat.completions.create(
    model="phi-4-mini",
    messages=[{"role": "user", "content": "Pick a number"}],
    seed=42,
    temperature=1.0,
)
```
Determinism is best-effort. Different hardware (CPU vs GPU, different quantizations) may produce slightly different results even with the same seed.
## Related
- Structured Decoding -- deep dive on schema enforcement
- Responses -- understanding the response object
- Tool calling -- controlling function call behavior