
Control

Control what the model generates using parameters, structured schemas, and stop sequences.

Generation parameters

Pass these in your chat completions request:

Parameter | Default | Description
temperature | 1.0 | Randomness. 0 = deterministic, 2 = maximum variety
top_p | 1.0 | Nucleus sampling. 0.9 = sample only from the smallest set of tokens covering 90% of probability mass
max_tokens | model max | Maximum number of tokens to generate
stop | null | Up to 4 sequences that stop generation
frequency_penalty | 0 | Penalize tokens in proportion to how often they have already appeared (-2.0 to 2.0)
presence_penalty | 0 | Flat penalty on any token that has appeared at least once (-2.0 to 2.0)
seed | null | Fixed seed for reproducible output
response = client.chat.completions.create(
    model="phi-4-mini",
    messages=[{"role": "user", "content": "Write a haiku about code"}],
    temperature=0.7,
    max_tokens=50,
    stop=["\n\n"],
)

Structured output

Force the model to produce valid JSON matching a schema. Output is enforced at the token level, so every response is valid on the first attempt.

JSON mode

Returns valid JSON (any shape):

response = client.chat.completions.create(
    model="phi-4-mini",
    messages=[{"role": "user", "content": "List 3 colors as JSON"}],
    response_format={"type": "json_object"},
)

JSON Schema

Returns JSON matching a specific schema:

response = client.chat.completions.create(
    model="phi-4-mini",
    messages=[{"role": "user", "content": "Extract the person's name and age"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"}
                },
                "required": ["name", "age"]
            }
        }
    },
)

The model will only generate tokens that produce a valid {"name": "...", "age": ...} object. See Structured Decoding for implementation details and supported schema features.
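The enforced output arrives as a JSON string in the message content, so consuming it is a plain json.loads call. A minimal sketch, where the sample string stands in for response.choices[0].message.content:

```python
import json

# Stand-in for response.choices[0].message.content. With json_schema
# enforcement the content is guaranteed to parse and to match the schema,
# so no try/except or retry loop is needed here.
content = '{"name": "Ada Lovelace", "age": 36}'

person = json.loads(content)
print(person["name"], person["age"])  # prints: Ada Lovelace 36
```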

Stop sequences

Stop generation when the model emits any of up to 4 sequences:

response = client.chat.completions.create(
    model="phi-4-mini",
    messages=[{"role": "user", "content": "Count to 10"}],
    stop=["5", "\n\n"],
)

The stop sequence is not included in the response. finish_reason will be "stop".
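The truncation semantics can be sketched as a small helper. This is illustrative only; the server stops during generation rather than post-processing the finished text:

```python
def apply_stop(text: str, stops: list[str]) -> str:
    """Truncate text at the earliest stop sequence; the sequence itself is dropped."""
    cut = len(text)
    for s in stops:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)  # keep the earliest match across all stop sequences
    return text[:cut]

print(apply_stop("1 2 3 4 5 6 7", ["5", "\n\n"]))  # prints "1 2 3 4 "
```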

System messages

Use the system role to set behavior, persona, or constraints:

response = client.chat.completions.create(
    model="phi-4-mini",
    messages=[
        {"role": "system", "content": "You are a concise technical writer. Never use more than 2 sentences."},
        {"role": "user", "content": "Explain DNS"},
    ],
)

Reproducibility

Set seed for deterministic output. Responses with the same seed, model, and input will be identical:

response = client.chat.completions.create(
    model="phi-4-mini",
    messages=[{"role": "user", "content": "Pick a number"}],
    seed=42,
    temperature=1.0,
)
Note:

Determinism is best-effort. Different hardware (CPU vs GPU, different quantizations) may produce slightly different results even with the same seed.