De Jure: Automated Extraction of Regulatory Rules Using LLM Self-Refinement Without Human Annotation
Automating Legal Compliance: De Jure Extracts Machine-Readable Rules From Regulatory Documents
A new paper presents De Jure, a fully automated pipeline for extracting structured regulatory rules from raw legal documents, requiring no human annotation, domain-specific prompting, or annotated gold data.
The Pipeline
De Jure operates through four stages:
- Normalization: source documents are converted to structured Markdown
- Semantic decomposition: an LLM breaks each document into structured rule units
- Multi-criteria evaluation: an LLM-as-judge scores each extraction across 19 dimensions (covering metadata, definitions, and rule semantics)
- Iterative repair: low-scoring extractions are regenerated within a bounded budget
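The evaluation and repair stages can be pictured as a bounded loop. The following is a minimal sketch under stated assumptions, not the paper's implementation: `refine`, `toy_extract`, `toy_judge`, the 0.8 threshold, and the two criterion names are all hypothetical, and the two toy functions stand in for real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Scores = Dict[str, float]  # criterion name -> score in [0, 1]


@dataclass
class Extraction:
    rule_text: str
    scores: Scores
    iterations: int  # iteration at which this candidate was produced


def refine(
    section: str,
    extract: Callable[[str, List[str]], str],
    judge: Callable[[str, str], Scores],
    threshold: float = 0.8,   # assumed pass mark, not from the paper
    max_iterations: int = 3,  # the bounded repair budget
) -> Extraction:
    """Extract, judge, and regenerate until every criterion passes
    or the budget is spent. Returns the best candidate seen.
    Assumes the judge returns at least one criterion."""
    feedback: List[str] = []
    best: Optional[Extraction] = None
    for i in range(1, max_iterations + 1):
        candidate = extract(section, feedback)
        scores = judge(section, candidate)
        # Rank candidates by their weakest criterion.
        if best is None or min(scores.values()) > min(best.scores.values()):
            best = Extraction(candidate, scores, i)
        # Names of failing criteria are fed to the next extraction attempt.
        feedback = sorted(n for n, s in scores.items() if s < threshold)
        if not feedback:
            break
    assert best is not None
    return best


def toy_extract(section: str, feedback: List[str]) -> str:
    # Stand-in for an LLM call: adds a field for each flagged criterion.
    rule = "obligation: " + section
    for name in feedback:
        rule += f"; {name}: filled"
    return rule


def toy_judge(section: str, rule: str) -> Scores:
    # Stand-in for an LLM judge: a criterion passes if its field is present.
    return {n: 1.0 if f"{n}:" in rule else 0.0
            for n in ("obligation", "definitions")}


result = refine("Banks must report breaches within 72 hours.",
                toy_extract, toy_judge)
print(result.iterations, result.rule_text)
```

In this toy run the first attempt fails the `definitions` criterion, and the second attempt, given that feedback, passes both. A real judge would return one score per evaluation dimension, and the loop would stop at the budget even if some criteria still fail.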
Key Results
- Peak extraction quality reached within three judge-guided iterations
- Generalizes across finance, healthcare, and AI governance
- Works with both open and closed-source models
- In downstream RAG compliance QA, responses grounded in the extracted rules were preferred 73.8% of the time with single-rule retrieval and 84.0% with broader retrieval
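To make the retrieval-depth distinction concrete, here is a small sketch of top-k rule retrieval, where k = 1 corresponds to single-rule retrieval and a larger k to broader retrieval. It is an illustration only: the word-overlap scoring is a deliberately simple stand-in for whatever retriever the paper uses, and the function names, rules, and question are invented for the example.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    # Lowercased alphanumeric words; punctuation is dropped.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, rules: List[str], k: int) -> List[str]:
    """Return the k rules that share the most words with the question."""
    q = tokens(question)
    ranked = sorted(rules, key=lambda r: len(q & tokens(r)), reverse=True)
    return ranked[:k]


# Made-up rules standing in for the pipeline's extracted rule units.
rules = [
    "An operator must notify affected users of a breach within 30 days.",
    "A breach means unauthorized disclosure of user records.",
    "Audit logs must be retained for five years.",
]
question = "When must users be notified of a breach?"

single = retrieve(question, rules, k=1)   # single-rule retrieval
broader = retrieve(question, rules, k=2)  # broader retrieval
print(single)
print(broader)
```

With k = 2 the answering model also sees the definition of "breach" alongside the notification rule, which is one plausible reason broader retrieval could help in a compliance QA setting.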
Why It Matters
Converting dense legal text into machine-readable rules has traditionally been costly and expert-intensive. De Jure demonstrates that explicit, interpretable evaluation criteria can substitute for human annotation, offering a scalable and auditable path toward regulation-grounded LLM alignment.
Applications
- Automated compliance checking
- Regulatory gap analysis
- AI governance enforcement
- Cross-jurisdictional regulatory comparison
arXiv: 2604.02276