Variance Reduction: The Secret to Fast FL Convergence
Why does federated learning take so many communication rounds to converge?
A typical FL training job might require:
- Standard SGD: 1,000+ rounds to converge
- With variance reduction: 100-200 rounds to converge
- Result: a 5-10× reduction in communication rounds, which usually translates into a comparable wall-clock speedup, since communication dominates FL training time
Variance reduction is the algorithmic technique that makes this possible. It's the difference between federated learning being a research curiosity and a production-viable technology.
This post dives into variance reduction methods—MARINA, PAGE, SAGA, and their variants—and explains why they're fundamental to efficient federated learning.
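To make the core idea concrete before diving into the individual methods: a variance-reduced estimator corrects each stochastic gradient with a control variate built from stored past gradients. The sketch below shows the SAGA estimator on a toy least-squares problem (the problem setup, step size, and variable names are illustrative assumptions, not any specific FL implementation):

```python
import numpy as np

# Toy least-squares problem: minimize (1/n) * sum_i 0.5 * (a_i @ x - b_i)^2.
# (Illustrative setup, not from the post; FL details like clients are omitted.)
rng = np.random.default_rng(0)
n, d = 50, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true  # consistent system, so x_true is the exact minimizer

def grad_i(x, i):
    # Gradient of the i-th component function 0.5 * (a_i @ x - b_i)^2.
    return A[i] * (A[i] @ x - b[i])

x = np.zeros(d)
table = np.array([grad_i(x, i) for i in range(n)])  # stored per-sample gradients
avg = table.mean(axis=0)                            # running average of the table

# Conservative step size based on the largest per-sample smoothness constant.
L_max = np.max(np.sum(A**2, axis=1))
lr = 1.0 / (3.0 * L_max)

for step in range(2000):
    i = rng.integers(n)
    g_new = grad_i(x, i)
    # SAGA estimator: unbiased, and its variance shrinks as the stored
    # gradients converge to the gradients at the optimum.
    g = g_new - table[i] + avg
    x -= lr * g
    # Keep the running average and the stored gradient for sample i in sync.
    avg += (g_new - table[i]) / n
    table[i] = g_new

print("final error:", np.linalg.norm(x - x_true))
```

Unlike plain SGD, whose gradient noise forces a decaying step size, this estimator's variance vanishes at the optimum, which is what enables the linear convergence (and hence the round savings) discussed above.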