Importing the Entire Linux Kernel Git History into PostgreSQL: 1.4 Million Commits as SQL

2026-04-09T15:28:29.571Z·2 min read

pgit: The Linux Kernel as a SQL Database — 1.4 Million Commits, 20 Years of Development

A developer has imported the complete Linux kernel git history into PostgreSQL using pgit, a Git-like CLI that stores everything in a SQL database instead of on the filesystem. The project reached Hacker News (HN), where it drew 151 points and 37 comments.

Scale of the Import

| Metric | Value |
| --- | --- |
| Commits | 1,428,882 |
| File versions | 24,384,844 |
| Unique blobs | 3,089,589 |
| Unique paths | 171,525 |
| Contributors | 38,000 |
| Import time | 2 hours |
| Actual data size | 2.7 GB (vs. git gc --aggressive: 1.95 GB) |

Hardware Used

Why This Matters

The import makes the entire Linux kernel development history queryable with SQL, enabling analyses that are impossible or extremely difficult with git alone.
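To give a feel for the kind of analysis this enables, here is a minimal sketch in Python. It is an illustration under stated assumptions, not pgit's actual schema: the `commits` and `file_versions` tables, their columns, and the sample rows are all hypothetical, and an in-memory SQLite database stands in for PostgreSQL so that the example runs without a server.

```python
import sqlite3

# Hypothetical schema for illustration only; pgit's real tables may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE commits (
        id          TEXT PRIMARY KEY,
        author      TEXT NOT NULL,
        authored_at TEXT NOT NULL   -- ISO 8601 date
    );
    CREATE TABLE file_versions (
        commit_id TEXT NOT NULL REFERENCES commits(id),
        path      TEXT NOT NULL
    );
""")

# Made-up sample rows, not real kernel history.
conn.executemany("INSERT INTO commits VALUES (?, ?, ?)", [
    ("c1", "alice", "2005-04-16"),
    ("c2", "bob",   "2005-04-17"),
    ("c3", "alice", "2006-01-02"),
])
conn.executemany("INSERT INTO file_versions VALUES (?, ?)", [
    ("c1", "kernel/sched.c"),
    ("c2", "kernel/sched.c"),
    ("c2", "mm/slab.c"),
    ("c3", "kernel/sched.c"),
])


def commits_per_author(conn):
    """Count commits per author, most active first."""
    return conn.execute("""
        SELECT author, COUNT(*) AS n
        FROM commits
        GROUP BY author
        ORDER BY n DESC, author
    """).fetchall()


def most_changed_paths(conn, limit=5):
    """Rank paths by how many file versions they have."""
    return conn.execute("""
        SELECT path, COUNT(*) AS versions
        FROM file_versions
        GROUP BY path
        ORDER BY versions DESC, path
        LIMIT ?
    """, (limit,)).fetchall()


def authors_touching(conn, path):
    """List the distinct authors who ever changed a given path."""
    rows = conn.execute("""
        SELECT DISTINCT c.author
        FROM commits AS c
        JOIN file_versions AS fv ON fv.commit_id = c.id
        WHERE fv.path = ?
        ORDER BY c.author
    """, (path,))
    return [row[0] for row in rows]
```

With plain git, each of these questions typically means piping `git log` output through custom scripts. With the history in a relational database, each one becomes a single aggregate or join.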

Technical Approach

pgit uses pg-xpatch for transparent delta compression. Few version control systems besides git have ever managed a full kernel import: Fossil never did, Darcs and Monotone had severe performance issues, and Mercurial is among the few that can handle it. With pgit, PostgreSQL completed the import in 2 hours.
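The article does not describe how pg-xpatch works internally, so the following is only a generic sketch of the idea behind delta compression: keep one version of a file in full and store each later version as a list of copy/insert operations against its predecessor. The function names are hypothetical, and Python's standard `difflib` is used here purely for illustration; it is not the algorithm pg-xpatch uses.

```python
from difflib import SequenceMatcher


def make_delta(base: str, target: str) -> list:
    """Encode target as copy/insert operations relative to base."""
    ops = []
    matcher = SequenceMatcher(a=base, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Reuse an unchanged range of the base by reference.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Only new text is stored literally.
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no operation: that base range is simply not copied.
    return ops


def apply_delta(base: str, ops: list) -> str:
    """Rebuild the target text from the base and a delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.append(base[op[1]:op[2]])
        else:
            out.append(op[1])
    return "".join(out)


def compress_history(versions: list):
    """Keep the first version whole; store each later one as a delta."""
    if not versions:
        return None, []
    deltas = [make_delta(prev, cur) for prev, cur in zip(versions, versions[1:])]
    return versions[0], deltas


def restore_history(base: str, deltas: list) -> list:
    """Replay the delta chain to recover every version in order."""
    versions = [base]
    for delta in deltas:
        versions.append(apply_delta(versions[-1], delta))
    return versions
```

"Transparent" in this context would mean that the database performs the equivalent of `restore_history` behind the scenes, so queries see full file contents even though mostly deltas are stored.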

Implications

This demonstrates that PostgreSQL can serve as a viable backend for version control at massive scale. The ability to query 20 years of development history with SQL opens up new possibilities for code archaeology, developer analytics, and large-scale codebase understanding.

Source: oseifert.ch — 151 points on HN

↗ Original source · 2026-04-09T10:00:00.000Z