HukukBERT: First Comprehensive Turkish Legal Language Model Achieves 84.4% on Legal Cloze Test

2026-04-07T23:23:20.123Z
Researchers have introduced HukukBERT, the most comprehensive legal language model for Turkish law. It was trained on 18GB of cleaned legal text using a hybrid domain-adaptive pre-training (DAPT) approach.

The Gap in Legal AI

While English legal AI has flourished with models like Legal-BERT, Turkish law has lagged due to:

HukukBERT's Approach

The model uses a hybrid Domain-Adaptive Pre-Training (DAPT) methodology:

Training data: 18GB cleaned Turkish legal corpus

Tokenizer: 48K WordPiece vocabulary
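
The article gives only the vocabulary size. To illustrate how a WordPiece vocabulary is typically applied at inference time, the minimal Python sketch below uses the greedy longest-match-first procedure from the original BERT tokenizer. Each word is split into the longest prefix found in the vocabulary, continuation pieces carry a "##" prefix, and a word that cannot be fully covered becomes [UNK]. The function name and the six-entry toy vocabulary are hypothetical and are not taken from HukukBERT's actual 48K vocabulary. Real tokenizers also pre-split text on whitespace and punctuation before this step.

```python
# Hypothetical toy vocabulary; NOT HukukBERT's real 48K WordPiece vocabulary.
TOY_VOCAB = {"mahkeme", "##si", "dava", "##cı", "tazminat", "##ı"}


def wordpiece_tokenize(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first WordPiece tokenization of a single word."""
    tokens: list[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it appears in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry the ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches here, so the whole word maps to [UNK]
        tokens.append(piece)
        start = end
    return tokens


print(wordpiece_tokenize("mahkemesi", TOY_VOCAB))  # ['mahkeme', '##si']
print(wordpiece_tokenize("davacı", TOY_VOCAB))     # ['dava', '##cı']
print(wordpiece_tokenize("karar", TOY_VOCAB))      # ['[UNK]']
```

Turkish is agglutinative, stacking suffixes onto stems, so a split such as "dava" (lawsuit) + "##cı" giving "davacı" (plaintiff) shows why the choice of subword vocabulary matters for legal text.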

Results

Benchmark | Performance
Legal Cloze Test (Top-1 accuracy) | 84.40% (state-of-the-art)
Court Decision Segmentation (document pass rate) | 92.8% (new SOTA)

What Is a Legal Cloze Test?

A masked legal term prediction task built specifically from Turkish court decisions: a legal term in a court document is hidden, and the model must predict which term fills the blank. Top-1 accuracy counts a prediction as correct only when the model's single highest-ranked candidate is the original term. Think of it as a bar exam for language models.
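
To make the Top-1 metric concrete, here is a minimal Python sketch of how a cloze benchmark of this kind could be scored. It assumes one masked term per item and exact string matching against the gold term. The names `make_cloze_item`, `top1_accuracy` and `toy_predict` are hypothetical and this is not the HukukBERT evaluation code; a real run would replace `toy_predict` with a masked language model's ranked predictions.

```python
from typing import Callable, Sequence

MASK = "[MASK]"

# Hypothetical interface: given text with one [MASK], return candidate terms ranked best-first.
Predictor = Callable[[str], Sequence[str]]


def make_cloze_item(sentence: str, term: str) -> tuple[str, str]:
    """Hide the first occurrence of a legal term; return (masked_text, gold_term)."""
    if term not in sentence:
        raise ValueError(f"term {term!r} not found in sentence")
    return sentence.replace(term, MASK, 1), term


def top1_accuracy(items: Sequence[tuple[str, str]], predict: Predictor) -> float:
    """Fraction of items whose highest-ranked prediction equals the gold term."""
    if not items:
        return 0.0
    hits = 0
    for masked_text, gold in items:
        ranked = predict(masked_text)
        if ranked and ranked[0] == gold:
            hits += 1
    return hits / len(items)


def toy_predict(masked_text: str) -> list[str]:
    # Stand-in for a real masked language model: always returns the same ranking.
    return ["tazminat", "ceza", "reddine"]


items = [
    # "The plaintiff requested compensation for the damage suffered."
    make_cloze_item("Davacı, uğradığı zarar için tazminat talep etmiştir.", "tazminat"),
    # "The court decided to dismiss the case."
    make_cloze_item("Mahkeme, davanın reddine karar vermiştir.", "reddine"),
]
print(items[0][0])                        # Davacı, uğradığı zarar için [MASK] talep etmiştir.
print(top1_accuracy(items, toy_predict))  # 0.5
```

The stub ranks "tazminat" first for every item, so it is right on the first sentence and wrong on the second, giving 0.5; HukukBERT's reported 84.40% is this same kind of ratio computed over the benchmark's full item set.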

Why This Matters

Future Applications

The researchers envision HukukBERT enabling:

Original source · 2026-04-07T00:00:00.000Z