Federated LLMs: Prompting, Cascading, and Fine-Tuning at Scale
Large Language Models have changed everything—including federated learning.
The old FL paradigm: Train a small model (~100M parameters) from scratch across devices.
The new FL paradigm: Adapt a massive pre-trained model (7B-70B parameters) using federated techniques.
But LLMs bring unique challenges to federated learning:
- Size: 7B parameters ≈ 28 GB in fp32, or 14 GB in fp16 (won't fit on most devices)
- Compute: Full fine-tuning requires massive GPU memory
- Inference cost: Running LLM inference on-device drains battery
- Privacy: LLM memorization can leak training data
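The size numbers above are simple arithmetic: weight memory is parameter count times bytes per parameter, before counting activations or KV cache. A minimal sketch (the helper name `weight_memory_gb` is ours, not from any library):

```python
# Back-of-the-envelope memory for LLM weights at common precisions.
# Weights only -- activations, optimizer state, and KV cache add more.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

for prec in ("fp32", "fp16", "int8", "int4"):
    print(f"7B @ {prec}: {weight_memory_gb(7e9, prec):.1f} GB")
# 7B @ fp32: 28.0 GB -- the figure cited above
```

Even aggressive int4 quantization leaves a 7B model at roughly 3.5 GB of weights, which is why federated work on LLMs leans on adaptation techniques rather than shipping full models to devices.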
This post explores cutting-edge techniques for federated LLMs, from Virginia Smith's research group and beyond, showing how to make federated learning work in the foundation model era.