# Batch Loss Score: Speed Up Deep Learning Training with a 3-Line Code Injection for Data Pruning
Dynamic data pruning (skipping less informative training samples) can dramatically speed up deep learning, but computing a per-sample loss to decide what to skip is expensive. Batch Loss Score (BLS) achieves comparable results using only batch-level statistics, smoothed with an exponential moving average (EMA).
## The Insight
From the perspective of a single sample, the batch loss is a noisy measurement of that sample's individual loss, where the noise comes from the stochastic composition of each batch. An EMA acts as a first-order low-pass filter, attenuating this noise while preserving the slowly varying signal.
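A quick numerical illustration of the low-pass behaviour (a self-contained toy, not code from the paper; the constant true loss and the noise level are made up for the demonstration):

```python
import random

random.seed(0)

# Pretend a sample's true loss is a constant 1.0, but each batch-level
# observation of it is corrupted by noise from stochastic batch composition.
true_loss = 1.0
observations = [true_loss + random.gauss(0, 0.5) for _ in range(500)]

# First-order low-pass filter: an EMA with smoothing factor beta.
beta = 0.9
ema = observations[0]
ema_trace = [ema]
for x in observations[1:]:
    ema = beta * ema + (1 - beta) * x
    ema_trace.append(ema)

# The filtered signal tracks the true loss far more tightly than raw batches.
raw_err = sum((x - true_loss) ** 2 for x in observations) / len(observations)
ema_err = sum((e - true_loss) ** 2 for e in ema_trace) / len(ema_trace)
print(f"raw MSE {raw_err:.3f} vs EMA MSE {ema_err:.3f}")
```

With `beta = 0.9` the steady-state noise variance is reduced by roughly a factor of `(1 - beta) / (1 + beta)`, which is why the EMA estimate hugs the true loss much more closely than any single batch observation.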
## How BLS Works
- Track the EMA of batch losses: readily available during normal training
- Assign scores to individual samples: based on each sample's contribution to the batch loss over time
- Prune low-scoring samples: skip them in future training iterations
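The three steps above can be sketched in a toy training loop. This is one plausible reading of the method, not the paper's code: names such as `bls_score`, `ema`, `beta`, and `active` are illustrative, and scalar losses stand in for a real model. The marked lines are the injected bookkeeping.

```python
import random

random.seed(0)

n_samples, batch_size = 100, 10
easy = set(range(80, 100))            # low-loss ("less informative") samples
def loss_of(i):                       # stand-in for a real per-batch loss
    return 0.1 if i in easy else 1.0

bls_score = [0.0] * n_samples         # per-sample scores (step 2)
ema, beta = 0.0, 0.9                  # running EMA of batch losses (step 1)
active = list(range(n_samples))       # samples still being trained on

for epoch in range(30):
    random.shuffle(active)
    for start in range(0, len(active), batch_size):
        batch = active[start:start + batch_size]
        batch_loss = sum(loss_of(i) for i in batch) / len(batch)
        # --- injected: EMA of batch loss, credited to the batch members ---
        ema = beta * ema + (1 - beta) * batch_loss
        for i in batch:
            bls_score[i] = beta * bls_score[i] + (1 - beta) * batch_loss
        # ------------------------------------------------------------------
    if epoch == 24:                   # one-shot prune: drop lowest-scoring 20%
        active.sort(key=lambda i: bls_score[i])
        active = active[len(active) // 5:]

pruned = sorted(set(range(n_samples)) - set(active))
print(f"pruned {len(pruned)} samples, {sum(i in easy for i in pruned)} of them easy")
```

Because batches containing an easy sample have slightly lower loss on average, that sample's score EMA drifts below the scores of harder samples, so the pruned set is dominated by the low-loss samples (step 3), and later epochs simply never draw them.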
## Key Advantages
| Feature | BLS | Per-Sample Methods |
|---|---|---|
| Implementation | 3-line code injection | Often requires custom training loops |
| Computational cost | Negligible (uses existing batch loss) | Requires per-sample forward passes |
| Compatibility | Works with any training framework | May require framework modifications |
| Memory | No additional storage | May require storing per-sample metrics |
## Theoretical Foundation
The EMA mechanism is formally proven to function as a first-order low-pass filter, yielding a score that approximates each sample's smoothed, persistent contribution to the loss.
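The filter claim can be made concrete with a standard signal-processing derivation (this is textbook material, not reproduced from the paper). With smoothing factor $\beta \in (0, 1)$, the score update

$$s_t = \beta\, s_{t-1} + (1 - \beta)\, \ell_t$$

is a first-order IIR filter with transfer function and frequency response

$$H(z) = \frac{1-\beta}{1 - \beta z^{-1}}, \qquad \left|H(e^{j\omega})\right| = \frac{1-\beta}{\sqrt{1 - 2\beta\cos\omega + \beta^2}}.$$

The gain is $1$ at $\omega = 0$ and falls monotonically to $(1-\beta)/(1+\beta)$ at $\omega = \pi$: a slow, persistent change in a sample's loss passes through, while high-frequency noise from batch composition is attenuated.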
## Practical Impact
BLS enables data pruning benefits (faster training, potentially better generalization) without the infrastructure complexity of per-sample importance scoring. The 3-line integration means any existing training pipeline can adopt it immediately.