Cog-DRIFT: Teaching LLMs to Learn from Problems They Can't Yet Solve Through Task Reformulation
A fundamental limitation of RLVR (Reinforcement Learning with Verifiable Rewards) is that a model cannot learn from problems it never solves: when every rollout fails, there is no meaningful reward signal to learn from. Cog-DRIFT addresses this by reformulating hard problems into easier variants that still teach the model what it needs to know.
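To see why unsolved problems yield no signal, consider one common instantiation of RLVR that uses group-relative advantages (GRPO-style normalization; this specific estimator is an assumption for illustration, not something the description above commits to). If all rollouts for a problem receive reward 0, the normalized advantages are all 0, so the policy gradient for that problem vanishes:

```python
def group_advantages(rewards: list[float]) -> list[float]:
    """Normalize a group of rollout rewards to zero-mean, unit-std
    advantages (GRPO-style). A group of identical rewards, such as
    all-zero for an unsolved problem, produces all-zero advantages,
    i.e. no learning signal."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    if std == 0.0:
        return [0.0] * len(rewards)  # every rollout failed (or all passed)
    return [(r - mean) / std for r in rewards]
```

A mixed group like `[1.0, 0.0]` yields nonzero advantages, while an all-failure group contributes nothing, which is exactly the gap Cog-DRIFT targets.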
The Core Insight
If a problem is too hard, don't skip it — transform it. The approach converts challenging open-ended problems into cognitively simpler formats:
| Original Format | Reformulated Format | Benefit |
|---|---|---|
| Open-ended generation | Multiple choice | Smaller search space |
| Free-form reasoning | Cloze/fill-in-the-blank | Denser learning signal |
| Complex generation | Discriminative tasks | Binary feedback |
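The reformulations in the table can be sketched in a few lines. This is a minimal illustration, not the paper's actual pipeline: the `Problem` container, the distractor list, and the exact-string cloze blanking are all hypothetical choices made here for concreteness. The key property both functions share is that the original answer is preserved.

```python
import random
from dataclasses import dataclass

@dataclass
class Problem:
    question: str
    answer: str

LABELS = "ABCD"

def to_multiple_choice(p: Problem, distractors: list[str], seed: int = 0) -> tuple[str, str]:
    # Open-ended -> multiple choice: the correct answer is kept among
    # shuffled options, shrinking the search space to one of N choices.
    options = distractors + [p.answer]
    random.Random(seed).shuffle(options)
    prompt = "\n".join([p.question] + [f"{LABELS[i]}) {o}" for i, o in enumerate(options)])
    return prompt, LABELS[options.index(p.answer)]

def to_cloze(p: Problem, worked_solution: str) -> tuple[str, str]:
    # Free-form reasoning -> cloze: blank the answer inside a worked
    # solution, so the model fills one span instead of generating the
    # whole derivation (a denser learning signal per token).
    return worked_solution.replace(p.answer, "____"), p.answer
```

For example, `to_cloze(Problem("What is 7 * 8?", "56"), "7 * 8 = 56")` produces the prompt `"7 * 8 = ____"` with target `"56"`.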
How Cog-DRIFT Works
- Reformulate — Transform hard problems into easier variants that preserve the original answer
- Organize by difficulty — Create an adaptive curriculum from easy to hard formats
- Bootstrap learning — Train on structured, easier formats first
- Transfer back — Knowledge transfers to improve performance on original open-ended problems
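The four steps above can be sketched as a curriculum loop. Everything here is assumed for illustration: the binary exact-match reward, the fixed easy-to-hard format ordering, and the `model_solve(prompt) -> prediction` interface standing in for a policy rollout.

```python
FORMATS = ["multiple_choice", "cloze", "open_ended"]  # ordered easy -> hard

def verifiable_reward(prediction: str, answer: str) -> float:
    # Binary verifiable reward: exact match against the preserved answer.
    return 1.0 if prediction.strip() == answer.strip() else 0.0

def pick_training_variant(model_solve, variants: dict) -> tuple[str, float]:
    """Walk the curriculum from easiest to hardest format and train on
    the first variant the current model can solve. As capability grows,
    harder formats start yielding reward, until the open-ended original
    itself becomes learnable. `variants` maps format -> (prompt, answer)."""
    for fmt in FORMATS:
        prompt, answer = variants[fmt]
        reward = verifiable_reward(model_solve(prompt), answer)
        if reward > 0.0:
            return fmt, reward   # meaningful signal at this difficulty tier
    return FORMATS[0], 0.0       # no signal anywhere yet; stay at the easiest tier
```

A model that can only handle multiple choice gets trained there first; once the cloze or open-ended variants become solvable, the loop naturally advances to them.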
Why This Matters
Current RLVR methods, including those used to train o1-style reasoning models, hit a ceiling: the model can only learn from problems within its current capability range. Cog-DRIFT breaks through this ceiling with a scaffolding approach, much like easing a student into a hard calculus problem via the simpler algebra it builds on before tackling the full problem.
Practical Impact
- Enables learning from previously unsolvable problems
- Creates a natural curriculum that progresses from easy to hard
- The reformulated variants preserve the original answer, ensuring learning transfers back
- Addresses a key bottleneck in scaling reasoning capabilities through RL