Variance Reduction: The Secret to Fast FL Convergence
Why does federated learning take so many communication rounds to converge?
A typical FL training job might require:
- Standard SGD: 1,000+ rounds to converge
- With variance reduction: 100-200 rounds to converge
- Result: a 5-10× reduction in communication rounds, which usually translates into a comparable wall-clock speedup, since communication dominates FL training time
Variance reduction is the algorithmic technique that makes this possible. It's the difference between federated learning being a research curiosity and a production-viable technology.
This post dives into variance reduction methods—MARINA, PAGE, SAGA, and their variants—and explains why they're fundamental to efficient federated learning.
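To make the core idea concrete before diving into the individual methods: a variance-reduced estimator corrects each stochastic gradient with a control variate built from stored past gradients. The sketch below shows the SAGA estimator on a toy least-squares problem (the problem setup, step size, and variable names are illustrative assumptions, not any specific FL implementation):

```python
import numpy as np

# Toy least-squares problem: minimize (1/n) * sum_i 0.5 * (a_i @ x - b_i)^2.
# (Illustrative setup, not from the post; FL details like clients are omitted.)
rng = np.random.default_rng(0)
n, d = 50, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true  # consistent system, so x_true is the exact minimizer

def grad_i(x, i):
    # Gradient of the i-th component function 0.5 * (a_i @ x - b_i)^2.
    return A[i] * (A[i] @ x - b[i])

x = np.zeros(d)
table = np.array([grad_i(x, i) for i in range(n)])  # stored per-sample gradients
avg = table.mean(axis=0)                            # running average of the table

# Conservative step size based on the largest per-sample smoothness constant.
L_max = np.max(np.sum(A**2, axis=1))
lr = 1.0 / (3.0 * L_max)

for step in range(2000):
    i = rng.integers(n)
    g_new = grad_i(x, i)
    # SAGA estimator: unbiased, and its variance shrinks as the stored
    # gradients converge to the gradients at the optimum.
    g = g_new - table[i] + avg
    x -= lr * g
    # Keep the running average and the stored gradient for sample i in sync.
    avg += (g_new - table[i]) / n
    table[i] = g_new

print("final error:", np.linalg.norm(x - x_true))
```

Unlike plain SGD, whose gradient noise forces a decaying step size, this estimator's variance vanishes at the optimum, which is what enables the linear convergence (and hence the round savings) discussed above.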