De Jure: Automated Extraction of Regulatory Rules Using LLM Self-Refinement Without Human Annotation

2026-04-05

A new paper presents De Jure, a fully automated pipeline for extracting structured regulatory rules from raw legal documents, requiring no human annotation, domain-specific prompting, or annotated ...

Automating Legal Compliance: De Jure Extracts Machine-Readable Rules From Regulatory Documents

The Pipeline

De Jure operates through four stages:

1. Normalization: source documents are converted to structured Markdown
2. Semantic decomposition: an LLM breaks each document into structured rule units
3. Multi-criteria evaluation: an LLM-as-judge scores each extraction across 19 dimensions (metadata, definitions, rule semantics)
4. Iterative repair: low-scoring extractions are regenerated within a bounded budget
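The evaluate-and-repair loop in the last two stages can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the names `Extraction`, `refine`, `toy_extract`, and `toy_judge` are hypothetical, the two toy criteria stand in for the 19 dimensions, and the 0.8 threshold, the mean-score tie-breaking, and the default budget of three iterations are assumptions chosen for the example. In a real pipeline the two callables would wrap LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Extraction:
    """One structured rule unit (hypothetical shape)."""
    rule_text: str
    fields: Dict[str, str]


Scores = Dict[str, float]  # criterion name -> score in [0, 1]


def refine(
    section: str,
    extract: Callable[[str, List[str]], Extraction],
    judge: Callable[[str, Extraction], Scores],
    threshold: float = 0.8,   # assumed pass mark per criterion
    max_iters: int = 3,       # assumed repair budget
) -> Tuple[Extraction, Scores, int]:
    """Regenerate an extraction until every criterion passes or the budget runs out."""
    feedback: List[str] = []
    best: Tuple[float, Extraction, Scores] = (-1.0, Extraction("", {}), {})
    for iteration in range(1, max_iters + 1):
        candidate = extract(section, feedback)
        scores = judge(section, candidate)
        if not scores:
            raise ValueError("judge must return at least one criterion")
        # Criteria below the threshold become feedback for the next attempt.
        feedback = sorted(name for name, s in scores.items() if s < threshold)
        if not feedback:
            return candidate, scores, iteration
        mean = sum(scores.values()) / len(scores)
        if mean > best[0]:
            best = (mean, candidate, scores)
    # Budget exhausted: fall back to the best-scoring attempt seen.
    return best[1], best[2], max_iters


# Toy stand-ins for the LLM extractor and LLM judge (assumptions for the demo).
def toy_extract(section: str, feedback: List[str]) -> Extraction:
    fields = {"obligation": section}
    if "actor" in feedback:  # repair only what the judge flagged
        fields["actor"] = "bank"
    return Extraction(rule_text=section, fields=fields)


def toy_judge(section: str, candidate: Extraction) -> Scores:
    return {name: 1.0 if name in candidate.fields else 0.0
            for name in ("obligation", "actor")}


final, final_scores, used = refine(
    "Banks must report suspicious transactions.", toy_extract, toy_judge
)
print(used, final.fields, final_scores)
```

In this toy run the first attempt omits the `actor` field, the judge flags it, and the second attempt passes both criteria, so the loop stops before exhausting its budget.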

Key Results

Peak extraction quality reached within three judge-guided iterations
Generalizes across finance, healthcare, and AI governance
Works with both open and closed-source models
Downstream RAG compliance QA: responses preferred 73.8% of the time at single-rule retrieval, 84.0% with broader retrieval

Why It Matters

Converting dense legal text into machine-readable rules has traditionally been costly and expert-intensive. De Jure demonstrates that explicit, interpretable evaluation criteria can substitute for human annotation, offering a scalable and auditable path toward regulation-grounded LLM alignment.

Applications

Automated compliance checking
Regulatory gap analysis
AI governance enforcement
Cross-jurisdictional regulatory comparison

arXiv: 2604.02276
