De Jure: Automated Extraction of Regulatory Rules Using LLM Self-Refinement Without Human Annotation

2026-04-05T17:16:53.233Z

Automating Legal Compliance: De Jure Extracts Machine-Readable Rules From Regulatory Documents

A new paper presents De Jure, a fully automated pipeline for extracting structured regulatory rules from raw legal documents, requiring no human annotation, domain-specific prompting, or annotated gold data.

The Pipeline

De Jure operates through four stages:

  1. Normalization — Source documents converted to structured Markdown
  2. Semantic decomposition — LLM breaks documents into structured rule units
  3. Multi-criteria evaluation — LLM-as-judge across 19 dimensions (metadata, definitions, rule semantics)
  4. Iterative repair — Low-scoring extractions regenerated within a bounded budget
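As a rough illustration of how stages 2 to 4 could fit together, here is a minimal Python sketch of an extract, judge, and repair loop with a bounded budget. It is not the paper's implementation: the names `refine`, `Extraction`, `threshold`, and `max_repairs` are hypothetical, the LLM calls are passed in as plain callables, and the two toy criteria stand in for the 19 dimensions mentioned above. Normalization (stage 1) is assumed to have happened already.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical type: criterion name -> score in [0, 1].
# The real criteria and scoring scale are not specified in this summary.
Scores = Dict[str, float]


@dataclass
class Extraction:
    text: str                                   # extracted rule units, serialized
    scores: Scores = field(default_factory=dict)
    attempts: int = 0                           # 1-based index of the attempt that produced it


def refine(
    section: str,
    extract: Callable[[str, Scores], str],      # stand-in for the decomposition LLM
    judge: Callable[[str, str], Scores],        # stand-in for the LLM-as-judge
    threshold: float = 0.8,                     # assumed pass mark per criterion
    max_repairs: int = 3,                       # assumed repair budget
) -> Extraction:
    """Extract once, then regenerate until all criteria pass or the budget is spent."""
    best: Optional[Extraction] = None
    feedback: Scores = {}                       # failing criteria from the previous attempt
    for attempt in range(1 + max_repairs):
        candidate = extract(section, feedback)
        scores = judge(section, candidate)
        current = Extraction(candidate, scores, attempt + 1)
        # Keep the candidate whose weakest criterion is strongest.
        if best is None or min(scores.values(), default=0.0) > min(
            best.scores.values(), default=0.0
        ):
            best = current
        feedback = {name: s for name, s in scores.items() if s < threshold}
        if not feedback:                        # every criterion passed
            break
    return best


# Toy stand-ins so the sketch runs without any model access.
def toy_extract(section: str, feedback: Scores) -> str:
    suffix = " [definitions]" if "definitions" in feedback else ""
    return section.upper() + suffix


def toy_judge(section: str, candidate: str) -> Scores:
    has_definitions = "[definitions]" in candidate
    return {"rule_semantics": 1.0, "definitions": 1.0 if has_definitions else 0.4}


result = refine("a bank must report incidents", toy_extract, toy_judge)
print(result.attempts, result.scores)
```

In this toy run the first extraction fails the `definitions` criterion, the failing scores are fed back to the extractor, and the second attempt passes. Returning the best candidate seen, rather than the last one, is a design choice made for this sketch so that an exhausted budget still yields the strongest extraction.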

Key Results

Why It Matters

Converting dense legal text into machine-readable rules has traditionally been costly and expert-intensive. De Jure demonstrates that explicit, interpretable evaluation criteria can substitute for human annotation, offering a scalable and auditable path toward regulation-grounded LLM alignment.

Applications

arXiv: 2604.02276
