# Control
Control what the model generates using parameters, structured schemas, and stop sequences.
## Generation parameters
Pass these in your chat completions request:
| Parameter | Default | Description |
|---|---|---|
| temperature | 1.0 | Randomness. 0 = deterministic (greedy), 2 = maximum variety |
| top_p | 1.0 | Nucleus sampling. 0.9 = sample only from tokens within the top 90% of probability mass |
| max_tokens | model max | Maximum tokens to generate |
| stop | null | Up to 4 sequences that stop generation |
| frequency_penalty | 0 | Penalize tokens proportionally to how often they have already appeared (-2.0 to 2.0) |
| presence_penalty | 0 | Penalize any token that has appeared at least once, regardless of count (-2.0 to 2.0) |
| seed | null | Fixed seed for reproducible output |
```python
response = client.chat.completions.create(
    model="phi-4-mini",
    messages=[{"role": "user", "content": "Write a haiku about code"}],
    temperature=0.7,
    max_tokens=50,
    stop=["\n\n"],
)
```
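Conceptually, temperature and top_p are transformations on the model's next-token probability distribution. The toy sketch below illustrates the mechanics on hand-written logits; it is not the server's actual implementation (real decoders special-case temperature 0 as greedy selection rather than dividing by zero):

```python
import math

def apply_temperature(logits, temperature):
    # Scale logits by 1/temperature, then softmax.
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus(probs, top_p):
    # Keep the smallest set of tokens whose cumulative probability
    # reaches top_p; everything else is excluded from sampling.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return set(kept)

logits = [2.0, 1.0, 0.1]
low_t = apply_temperature(logits, 0.5)   # sharper: top token dominates
high_t = apply_temperature(logits, 2.0)  # flatter: more variety
```

With top_p=0.9 on these logits at temperature 1.0, only the two most likely tokens fall inside the nucleus; the tail token is never sampled.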
## Structured output

Force the model to produce valid JSON, optionally matching a specific schema. Output is enforced at the token level -- every response is valid on the first attempt, with no retry loop.
### JSON mode
Returns valid JSON (any shape):
```python
response = client.chat.completions.create(
    model="phi-4-mini",
    messages=[{"role": "user", "content": "List 3 colors as JSON"}],
    response_format={"type": "json_object"},
)
```
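Because JSON mode guarantees the content parses, you can hand it straight to json.loads without a try/except retry loop. A minimal sketch -- the content string here is a simulated payload; a real call would read response.choices[0].message.content:

```python
import json

# Simulated response content; in a real call this would be
# content = response.choices[0].message.content
content = '{"colors": ["red", "green", "blue"]}'

data = json.loads(content)  # safe: JSON mode guarantees valid JSON
print(data["colors"])
```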
### JSON Schema
Returns JSON matching a specific schema:
```python
response = client.chat.completions.create(
    model="phi-4-mini",
    messages=[{"role": "user", "content": "Extract the person's name and age"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                },
                "required": ["name", "age"],
            },
        },
    },
)
```
The model will only generate tokens that produce a valid {"name": "...", "age": ...} object. See Structured Decoding for implementation details and supported schema features.
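Even with token-level enforcement, a lightweight client-side check keeps downstream code honest about the types it expects. A sketch mirroring the schema above -- the content string is simulated; a real call would read response.choices[0].message.content:

```python
import json

# Simulated schema-constrained output
content = '{"name": "Ada", "age": 36}'
person = json.loads(content)

# Mirror the schema's "required" keys and types before using the data.
assert isinstance(person["name"], str)
assert isinstance(person["age"], int)
```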
## Stop sequences
Stop generation when the model emits any of up to 4 sequences:
```python
response = client.chat.completions.create(
    model="phi-4-mini",
    messages=[{"role": "user", "content": "Count to 10"}],
    stop=["5", "\n\n"],
)
```
The stop sequence is not included in the response. finish_reason will be "stop".
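Conceptually, the server watches the generated text for any of the stop sequences and cuts generation at the earliest match, dropping the match itself. An illustrative client-side version of that logic (not the actual server implementation, which operates on the token stream):

```python
def truncate_at_stop(text, stop_sequences):
    # Cut at the earliest occurrence of any stop sequence;
    # the stop sequence itself is not included in the result.
    cut = len(text)
    for s in stop_sequences:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)
    return text[:cut]

print(truncate_at_stop("1, 2, 3, 4, 5, 6", ["5", "\n\n"]))
```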
## System messages
Use the system role to set behavior, persona, or constraints:
```python
response = client.chat.completions.create(
    model="phi-4-mini",
    messages=[
        {"role": "system", "content": "You are a concise technical writer. Never use more than 2 sentences."},
        {"role": "user", "content": "Explain DNS"},
    ],
)
```
## Reproducibility

Set seed for reproducible output. Responses with the same seed, model, and input should be identical:
```python
response = client.chat.completions.create(
    model="phi-4-mini",
    messages=[{"role": "user", "content": "Pick a number"}],
    seed=42,
    temperature=1.0,
)
```
Determinism is best-effort. Different hardware (CPU vs GPU, different quantizations) may produce slightly different results even with the same seed.
## Related
- Structured Decoding -- deep dive on schema enforcement
- Responses -- understanding the response object
- Tool calling -- controlling function call behavior