Federated LLMs: Prompting, Cascading, and Fine-Tuning at Scale
Large Language Models have changed everything—including federated learning.
The old FL paradigm: Train a small model (~100M parameters) from scratch across devices.
The new FL paradigm: Adapt a massive pre-trained model (7B-70B parameters) using federated techniques.
But LLMs bring unique challenges to federated learning:
- Size: 7B parameters ≈ 28 GB in fp32, or 14 GB in fp16 (won't fit on most devices)
- Compute: Full fine-tuning requires massive GPU memory
- Inference cost: Running LLM inference on-device drains battery
- Privacy: LLM memorization can leak training data
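The size numbers above are simple arithmetic: weight memory is parameter count times bytes per parameter, before counting activations or KV cache. A minimal sketch (the helper name `weight_memory_gb` is ours, not from any library):

```python
# Back-of-the-envelope memory for LLM weights at common precisions.
# Weights only -- activations, optimizer state, and KV cache add more.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

for prec in ("fp32", "fp16", "int8", "int4"):
    print(f"7B @ {prec}: {weight_memory_gb(7e9, prec):.1f} GB")
# 7B @ fp32: 28.0 GB -- the figure cited above
```

Even aggressive int4 quantization leaves a 7B model at roughly 3.5 GB of weights, which is why federated work on LLMs leans on adaptation techniques rather than shipping full models to devices.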
This post explores cutting-edge techniques for federated LLMs, from Virginia Smith's research group and beyond, showing how to make federated learning work in the foundation model era.