

Model Compression for Edge Devices: Making LLMs Run on Smartphones

10 min read

The irony of modern federated learning: We want to train sophisticated models on edge devices, but those same devices often can't even run the models.

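A state-of-the-art language model has 7B+ parameters. At 32-bit precision each parameter takes 4 bytes, so the weights alone need about 28 GB. An iPhone 15 Pro has 8 GB of RAM, and that is shared with the operating system and every other app. The math doesn't work.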
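To make the arithmetic concrete, here is a minimal sketch that estimates weight memory at a few bit widths. It counts the weights only (no activations, KV cache, or runtime overhead) and uses decimal gigabytes; the function name `weight_memory_gb` and the constants are illustrative, not part of any library.

```python
def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 7_000_000_000  # 7B parameters, as in the text
DEVICE_RAM_GB = 8       # iPhone 15 Pro RAM, as in the text

for bits in (32, 16, 8, 4):
    size = weight_memory_gb(PARAMS, bits)
    # Naive check: ignores the OS, other apps, and inference-time buffers.
    verdict = "fits" if size < DEVICE_RAM_GB else "does not fit"
    print(f"{bits:>2}-bit: {size:5.1f} GB -> {verdict} in {DEVICE_RAM_GB} GB")
```

Under these assumptions the weights come to 28 GB at 32-bit, 14 GB at 16-bit, 7 GB at 8-bit, and 3.5 GB at 4-bit. The 8-bit case fits only on paper: it leaves about 1 GB for everything else on the device, which is why lower precision is only one part of the picture.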

Model compression techniques (quantization, pruning, low-rank adaptation) are not just optimizations; they're prerequisites for production FL on edge devices. This post explores current compression methods and how Octomil makes them accessible.