# Workflows
Chain multiple inference calls, tool invocations, and conditional logic into multi-step workflows. Workflows run on-device with cloud fallback where needed.
## Basic chain
Pass the output of one inference call as input to another:
```python
import json

# Step 1: Extract entities
extract = client.chat.completions.create(
    model="phi-4-mini",
    messages=[
        {"role": "system", "content": "Extract all company names from the text. Return a JSON object with a 'names' array."},
        {"role": "user", "content": article_text},
    ],
    response_format={"type": "json_object"},
)
companies = json.loads(extract.choices[0].message.content)

# Step 2: Summarize each
for company in companies["names"]:
    summary = client.chat.completions.create(
        model="phi-4-mini",
        messages=[
            {"role": "system", "content": f"Write a 1-sentence summary of {company}'s role in this article."},
            {"role": "user", "content": article_text},
        ],
        max_tokens=100,
    )
    print(f"{company}: {summary.choices[0].message.content}")
```
## Tool-augmented workflow
Combine inference with tool calls for workflows that interact with external systems:
```python
import json

tools = [
    {"type": "function", "function": {
        "name": "search_db",
        "description": "Search the product database",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }},
    {"type": "function", "function": {
        "name": "send_email",
        "description": "Send an email to a customer",
        "parameters": {
            "type": "object",
            "properties": {
                "to": {"type": "string"},
                "subject": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["to", "subject", "body"],
        },
    }},
]

messages = [
    {"role": "system", "content": "You are a customer support agent. Use tools to look up info and respond to customers."},
    {"role": "user", "content": "Customer asks: Is the X200 in stock? Their email is user@example.com"},
]

# Run until the model stops calling tools
while True:
    response = client.chat.completions.create(
        model="phi-4-mini", messages=messages, tools=tools,
    )
    choice = response.choices[0]
    if choice.finish_reason == "tool_calls":
        messages.append(choice.message)
        for tc in choice.message.tool_calls:
            # Tool arguments arrive as a JSON string; parse before dispatching
            result = execute_tool(tc.function.name, json.loads(tc.function.arguments))
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": json.dumps(result)})
    else:
        print(choice.message.content)
        break
```
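The loop above assumes an `execute_tool` helper that maps tool names to real functions. A minimal sketch might look like this; the `search_db` and `send_email` handlers here are hypothetical stand-ins, not part of any real API:

```python
import json

def search_db(query: str) -> dict:
    # Hypothetical stand-in for a real inventory lookup
    inventory = {"X200": {"in_stock": True, "quantity": 14}}
    return inventory.get(query, {"in_stock": False})

def send_email(to: str, subject: str, body: str) -> dict:
    # Hypothetical stand-in for a real mail service call
    return {"status": "queued", "to": to}

# Map tool names (as declared in `tools`) to Python callables
TOOL_REGISTRY = {"search_db": search_db, "send_email": send_email}

def execute_tool(name: str, arguments) -> dict:
    """Dispatch one tool call. Arguments may arrive as a JSON string
    or an already-parsed dict; unknown tool names return an error
    payload so the model can recover instead of the loop crashing."""
    if isinstance(arguments, str):
        arguments = json.loads(arguments)
    fn = TOOL_REGISTRY.get(name)
    if fn is None:
        return {"error": f"unknown tool: {name}"}
    return fn(**arguments)
```

Returning an error payload (rather than raising) keeps the loop alive when the model hallucinates a tool name: the error goes back to the model as a tool result, and it can retry.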
## Conditional routing
Route different inputs to different models based on classification:
```python
# Classify the query
classification = client.chat.completions.create(
    model="smollm-360m",  # fast, small model for classification
    messages=[
        {"role": "system", "content": "Classify as 'simple' or 'complex'. Reply with one word."},
        {"role": "user", "content": user_query},
    ],
    max_tokens=5,
)
complexity = classification.choices[0].message.content.strip().lower()

# Route to appropriate model
model = "gemma3-1b" if complexity == "simple" else "phi-4-mini"
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": user_query}],
)
```
For automatic routing without manual classification, see Routing.
## Patterns
| Pattern | When to use |
|---|---|
| Chain | Output of step N is input to step N+1 |
| Fan-out | One input, multiple parallel inferences |
| Tool loop | Model calls tools until it has enough info to answer |
| Classify-then-route | Small model classifies, larger model handles |
| Verify | Second model checks first model's output |
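Fan-out is the one pattern in this table with no example above. It can be sketched with a thread pool; `fan_out` is a hypothetical helper, and the assumption is that each call is I/O-bound from the caller's side, so threads are enough:

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(fn, inputs, max_workers=4):
    """Run one inference callable over many inputs in parallel,
    returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, inputs))

# Usage sketch: summarize each company concurrently
# summaries = fan_out(summarize_company, companies["names"])
```

Note that a single on-device model may serialize requests internally, so the practical win is overlapping any cloud-fallback calls rather than true local parallelism.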
## On-device considerations
- Memory -- each model in a workflow loads into memory. Use the same model across steps when possible, or unload between steps on constrained devices.
- Latency -- each step adds inference time. Keep workflows short (2-4 steps) for interactive use cases.
- Structured output -- use JSON Schema between steps to avoid parsing errors in the chain.
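The structured-output point can be made concrete with a small guard between steps; `parse_step_output` is a hypothetical helper that fails fast instead of propagating malformed data down the chain:

```python
import json

def parse_step_output(raw: str, required_keys: tuple) -> dict:
    """Parse one step's JSON output and verify the keys the next
    step depends on are present before handing the data onward."""
    data = json.loads(raw)
    missing = [k for k in required_keys if k not in data]
    if missing:
        raise ValueError(f"step output missing keys: {missing}")
    return data

# Usage sketch:
# companies = parse_step_output(extract.choices[0].message.content, ("names",))
```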
## Related
- Tool calling -- defining and handling tool calls
- Routing -- automatic model selection
- Control -- constraining output between steps