Batch Loss Score: Speed Up Deep Learning Training with a 3-Line Code Injection for Data Pruning

2026-04-07T19:54:19.241Z·1 min read
Dynamic data pruning — skipping less informative training samples — can dramatically speed up deep learning. But computing per-sample loss is expensive. Batch Loss Score (BLS) achieves similar results using only batch-level statistics with an exponential moving average.

The Insight

From the perspective of a single sample, the batch loss is a noisy measurement of its individual loss, with noise from stochastic batch composition. An EMA acts as a first-order low-pass filter, attenuating this noise.
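As a toy illustration of this filtering view (the constants and the noise model here are illustrative, not taken from the paper), an EMA of noisy batch-level loss measurements recovers a sample's underlying loss:

```python
# Illustrative only: EMA as a first-order low-pass filter on noisy batch losses.
import random

random.seed(0)

beta = 0.1        # smoothing factor (hypothetical value)
true_loss = 2.0   # a sample's underlying individual loss
ema = 0.0
estimates = []

for t in range(1, 201):
    # Batch loss = individual loss + zero-mean noise from batch composition.
    noisy = true_loss + random.gauss(0.0, 0.5)
    ema = (1 - beta) * ema + beta * noisy
    # Bias-correct for the zero initialization (Adam-style correction).
    estimates.append(ema / (1 - (1 - beta) ** t))

# The smoothed estimate settles close to the true individual loss.
print(round(estimates[-1], 2))
```

The EMA averages over roughly `1/beta` recent batches, so the variance of the estimate shrinks by about `beta / (2 - beta)` relative to a single noisy measurement.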

How BLS Works

  1. Track EMA of batch losses — Readily available during normal training
  2. Assign scores to individual samples — Based on their contribution to batch loss over time
  3. Prune low-scoring samples — Skip them in future training iterations
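The three steps above can be sketched in a toy training loop. This is a hedged reconstruction: the summary only says scores come from each sample's contribution to the EMA-smoothed batch loss, so the exact scoring rule, `BETA`, and `PRUNE_FRACTION` below are assumptions, and the forward pass is replaced by a stand-in function:

```python
# Hedged sketch of a BLS-style pruning loop (scoring rule is an assumption).
import random

random.seed(0)
BETA = 0.1            # EMA smoothing factor (assumed)
PRUNE_FRACTION = 0.3  # fraction of lowest-scoring samples to skip (assumed)

# Toy dataset: each sample id maps to a fixed "difficulty" standing in
# for its individual loss.
data = {i: random.uniform(0.1, 3.0) for i in range(100)}
scores = {i: 0.0 for i in data}   # per-sample EMA scores
active = set(data)

def batch_loss(batch):
    # Stand-in for a forward pass: mean member loss plus batch noise.
    return sum(data[i] for i in batch) / len(batch) + random.gauss(0, 0.1)

for epoch in range(20):
    ids = list(active)
    random.shuffle(ids)
    for start in range(0, len(ids), 10):
        batch = ids[start:start + 10]
        loss = batch_loss(batch)   # already computed in normal training
        for i in batch:            # the "injection": update EMA scores
            scores[i] = (1 - BETA) * scores[i] + BETA * loss

# Prune the lowest-scoring samples; future iterations skip them.
ranked = sorted(active, key=lambda i: scores[i])
active -= set(ranked[: int(PRUNE_FRACTION * len(ranked))])
print(len(active))  # samples remaining after pruning
```

Note that the score update reuses the batch loss the training loop already computes, which is why the method adds essentially no overhead.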

Key Advantages

| Feature | BLS | Per-Sample Methods |
| --- | --- | --- |
| Implementation | 3-line code injection | Often requires custom training loops |
| Computational cost | Negligible (uses existing batch loss) | Requires per-sample forward passes |
| Compatibility | Works with any training framework | May require framework modifications |
| Memory | No additional storage | May require storing per-sample metrics |

Theoretical Foundation

The EMA mechanism is formally proven to function as a first-order low-pass filter, yielding a score that approximates the smoothed and persistent contribution of each sample to the loss.
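In standard EMA notation (the symbols below are assumed, not taken from the paper), the recursion and its filtering interpretation are:

```latex
% EMA of batch losses \ell_t, with smoothing factor \beta \in (0, 1]:
s_t = (1 - \beta)\, s_{t-1} + \beta\, \ell_t
% Unrolling shows s_t is a geometrically weighted average of past batch losses:
s_t = \beta \sum_{k=0}^{t-1} (1 - \beta)^k \, \ell_{t-k}
% In the z-domain this is a first-order low-pass filter,
H(z) = \frac{\beta}{1 - (1 - \beta)\, z^{-1}},
% which attenuates the high-frequency noise from stochastic batch composition.
```

The geometric weights decay with a time constant of roughly $1/\beta$ batches, so the score tracks each sample's persistent contribution to the loss while averaging out batch-to-batch fluctuation.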

Practical Impact

BLS enables data pruning benefits (faster training, potentially better generalization) without the infrastructure complexity of per-sample importance scoring. The 3-line integration means any existing training pipeline can adopt it immediately.

Original source · 2026-04-07T00:00:00.000Z