# Batch Loss Score: Speed Up Deep Learning Training with a 3-Line Code Injection for Data Pruning
Dynamic data pruning (skipping less informative training samples) can dramatically speed up deep learning, but computing a per-sample loss to decide what to skip is expensive. Batch Loss Score (BLS) achieves comparable results using only batch-level statistics, smoothed with an exponential moving average (EMA).
## The Insight
From the perspective of a single sample, the batch loss is a noisy measurement of that sample's individual loss, where the noise comes from the stochastic composition of each batch. An EMA acts as a first-order low-pass filter, attenuating this noise while preserving the slowly varying signal.
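A quick numerical illustration of the low-pass behaviour (a self-contained toy, not code from the paper; the constant true loss and the noise level are made up for the demonstration):

```python
import random

random.seed(0)

# Pretend a sample's true loss is a constant 1.0, but each batch-level
# observation of it is corrupted by noise from stochastic batch composition.
true_loss = 1.0
observations = [true_loss + random.gauss(0, 0.5) for _ in range(500)]

# First-order low-pass filter: an EMA with smoothing factor beta.
beta = 0.9
ema = observations[0]
ema_trace = [ema]
for x in observations[1:]:
    ema = beta * ema + (1 - beta) * x
    ema_trace.append(ema)

# The filtered signal tracks the true loss far more tightly than raw batches.
raw_err = sum((x - true_loss) ** 2 for x in observations) / len(observations)
ema_err = sum((e - true_loss) ** 2 for e in ema_trace) / len(ema_trace)
print(f"raw MSE {raw_err:.3f} vs EMA MSE {ema_err:.3f}")
```

With `beta = 0.9` the steady-state noise variance is reduced by roughly a factor of `(1 - beta) / (1 + beta)`, which is why the EMA estimate hugs the true loss much more closely than any single batch observation.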
## How BLS Works
- Track the EMA of batch losses: readily available during normal training
- Assign scores to individual samples: based on each sample's contribution to the batch loss over time
- Prune low-scoring samples: skip them in future training iterations
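The three steps above can be sketched in a toy training loop. This is one plausible reading of the method, not the paper's code: names such as `bls_score`, `ema`, `beta`, and `active` are illustrative, and scalar losses stand in for a real model. The marked lines are the injected bookkeeping.

```python
import random

random.seed(0)

n_samples, batch_size = 100, 10
easy = set(range(80, 100))            # low-loss ("less informative") samples
def loss_of(i):                       # stand-in for a real per-batch loss
    return 0.1 if i in easy else 1.0

bls_score = [0.0] * n_samples         # per-sample scores (step 2)
ema, beta = 0.0, 0.9                  # running EMA of batch losses (step 1)
active = list(range(n_samples))       # samples still being trained on

for epoch in range(30):
    random.shuffle(active)
    for start in range(0, len(active), batch_size):
        batch = active[start:start + batch_size]
        batch_loss = sum(loss_of(i) for i in batch) / len(batch)
        # --- injected: EMA of batch loss, credited to the batch members ---
        ema = beta * ema + (1 - beta) * batch_loss
        for i in batch:
            bls_score[i] = beta * bls_score[i] + (1 - beta) * batch_loss
        # ------------------------------------------------------------------
    if epoch == 24:                   # one-shot prune: drop lowest-scoring 20%
        active.sort(key=lambda i: bls_score[i])
        active = active[len(active) // 5:]

pruned = sorted(set(range(n_samples)) - set(active))
print(f"pruned {len(pruned)} samples, {sum(i in easy for i in pruned)} of them easy")
```

Because batches containing an easy sample have slightly lower loss on average, that sample's score EMA drifts below the scores of harder samples, so the pruned set is dominated by the low-loss samples (step 3), and later epochs simply never draw them.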
## Key Advantages
| Feature | BLS | Per-Sample Methods |
|---|---|---|
| Implementation | 3-line code injection | Often requires custom training loops |
| Computational cost | Negligible (uses existing batch loss) | Requires per-sample forward passes |
| Compatibility | Works with any training framework | May require framework modifications |
| Memory | No additional storage | May require storing per-sample metrics |
## Theoretical Foundation
The EMA mechanism is formally proven to function as a first-order low-pass filter, yielding a score that approximates each sample's smoothed, persistent contribution to the loss.
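The filter claim can be made concrete with a standard signal-processing derivation (this is textbook material, not reproduced from the paper). With smoothing factor $\beta \in (0, 1)$, the score update

$$s_t = \beta\, s_{t-1} + (1 - \beta)\, \ell_t$$

is a first-order IIR filter with transfer function and frequency response

$$H(z) = \frac{1-\beta}{1 - \beta z^{-1}}, \qquad \left|H(e^{j\omega})\right| = \frac{1-\beta}{\sqrt{1 - 2\beta\cos\omega + \beta^2}}.$$

The gain is $1$ at $\omega = 0$ and falls monotonically to $(1-\beta)/(1+\beta)$ at $\omega = \pi$: a slow, persistent change in a sample's loss passes through, while high-frequency noise from batch composition is attenuated.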
## Practical Impact
BLS enables data pruning benefits (faster training, potentially better generalization) without the infrastructure complexity of per-sample importance scoring. The 3-line integration means any existing training pipeline can adopt it immediately.