CoALFake: Human-LLM Collaborative Annotation for Cross-Domain Fake News Detection
A new approach called CoALFake addresses one of the biggest challenges in fake news detection: cross-domain generalization. Rather than training on one domain and failing on another, CoALFake uses collaborative Human-LLM annotation with domain-aware active learning.
The Problem
Current fake news detection systems suffer from:
- Narrow domain specificity — Models trained on political news fail on health misinformation
- Poor generalization — Performance degrades dramatically across domains
- Label scarcity — High-quality labeled data is expensive and slow to acquire
- Rigid categorization — Hard domain boundaries lose nuanced features
The CoALFake Solution
Human-LLM Co-Annotation
- LLMs provide fast, low-cost annotation at scale
- Human oversight ensures label reliability and catches LLM errors
- Combines the speed of AI with the judgment of humans
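The co-annotation loop above can be sketched as follows. This is an illustrative sketch only, not CoALFake's actual pipeline: the function names, the keyword-matching stand-in for an LLM call, and the confidence threshold are all assumptions. The core idea is that the LLM labels everything, and only low-confidence items are routed to a human reviewer.

```python
# Hypothetical sketch of a human-LLM co-annotation loop (all names illustrative).

def llm_annotate(text):
    """Stand-in for an LLM call that returns (label, confidence)."""
    fake_markers = ("miracle cure", "doctors hate", "shocking truth")
    score = sum(marker in text.lower() for marker in fake_markers) / len(fake_markers)
    label = "fake" if score > 0 else "real"
    confidence = max(score, 1 - score)
    return label, confidence

def co_annotate(articles, human_review, threshold=0.9):
    """Label articles with the LLM; defer uncertain cases to a human."""
    labels, review_queue = {}, []
    for idx, text in enumerate(articles):
        label, conf = llm_annotate(text)
        if conf >= threshold:
            labels[idx] = label          # trust the LLM on high-confidence items
        else:
            review_queue.append(idx)     # route uncertain items to a human
    for idx in review_queue:
        labels[idx] = human_review(articles[idx])
    return labels
```

In this division of labor, human effort scales with the LLM's uncertainty rather than with the dataset size, which is what makes the combination cheaper than pure human annotation.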
Domain-Aware Active Learning
- Domain embedding techniques capture both domain-specific nuances and cross-domain patterns
- Smart sampling strategy prioritizes diverse domain coverage
- Trains a domain-agnostic model that generalizes better
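One way to make sampling prioritize diverse domain coverage is a greedy selection that trades off model uncertainty against how often each domain has already been picked. The sketch below is an assumed illustration of that idea, not the paper's exact algorithm; the scoring rule and data layout are invented for clarity.

```python
# Hypothetical domain-aware batch selection (illustrative, not CoALFake's method):
# rank unlabeled items by uncertainty, but discount items from domains that are
# already well represented in the current batch.

from collections import Counter

def select_batch(candidates, batch_size):
    """candidates: list of (item_id, domain, uncertainty) tuples."""
    domain_counts = Counter()
    selected = []
    remaining = list(candidates)
    for _ in range(min(batch_size, len(remaining))):
        # Score = uncertainty / (1 + picks from this domain so far),
        # so each selection nudges the batch toward uncovered domains.
        best = max(remaining, key=lambda c: c[2] / (1 + domain_counts[c[1]]))
        selected.append(best[0])
        domain_counts[best[1]] += 1
        remaining.remove(best)
    return selected
```

With this rule, a slightly less uncertain item from an untouched domain can outrank a more uncertain item from an over-sampled one, which is the behavior a domain-agnostic model needs from its training pool.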
Results
Experimental results across multiple datasets demonstrate that CoALFake outperforms existing approaches in cross-domain settings, while requiring significantly less human annotation effort.
Why This Matters
Fake news operates across every domain — politics, health, finance, science, entertainment. A detection system that only works in one domain is of limited practical value. CoALFake's human-AI collaborative approach offers a scalable path to more robust, generalizable misinformation detection.