Importing the Entire Linux Kernel Git History into PostgreSQL: 1.4 Million Commits as SQL


2026-04-09T15:28:29.571Z·2 min read

A developer has successfully imported the complete Linux kernel git history into PostgreSQL using pgit, a Git-like CLI where everything lives in a SQL database instead of the filesystem. The projec...

pgit: The Linux Kernel as a SQL Database — 1.4 Million Commits, 20 Years of Development

Scale of the Import

Metric	Value
Commits	1,428,882
File versions	24,384,844
Unique blobs	3,089,589
Unique paths	171,525
Contributors	38,000
Import time	2 hours
Actual data size	2.7 GB (vs. 1.95 GB for the same repository after git gc --aggressive)
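
A few ratios follow directly from these figures. The short calculation below is only a back-of-the-envelope reading of the table: the import time is rounded in the source, so the throughput number is approximate, and the variable names are chosen here for illustration.

```python
# Figures copied from the table above.
commits = 1_428_882
file_versions = 24_384_844
unique_blobs = 3_089_589
import_seconds = 2 * 60 * 60  # "2 hours", treated as exact for this estimate

# How many file versions an average commit touches.
versions_per_commit = file_versions / commits
# How often identical content is shared between file versions.
versions_per_blob = file_versions / unique_blobs
# Rough import throughput.
commits_per_second = commits / import_seconds
# pgit data size relative to the aggressively repacked git repository.
size_ratio = 2.7 / 1.95

print(f"file versions per commit: {versions_per_commit:.1f}")
print(f"file versions per unique blob: {versions_per_blob:.1f}")
print(f"commits imported per second: {commits_per_second:.0f}")
print(f"pgit size vs. git gc --aggressive: {size_ratio:.2f}x")
```

Read this way, an average commit touches about 17 file versions, and the pgit database is roughly 1.4 times the size of the aggressively repacked git repository.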

Hardware Used

CPU: AMD EPYC 7401P (24 cores / 48 threads)
RAM: 512 GB DDR4 ECC
Storage: 2x1.92 TB SSD in RAID 0
Location: Hetzner Finland datacenter (~272 EUR/month)
Cache: 350 GB xpatch content cache keeping entire repository in memory

Why This Matters

The import makes the entire Linux kernel development history SQL-queryable, enabling analyses that are impossible or extremely difficult with git alone. Reported findings and newly practical queries include:

7 f-bombs found across 1.4 million commit messages (all from just 2 people)
665 bug fixes pointing at a single commit
A filesystem that took 13 years to merge
Line-by-line blame queries across the entire history
Cross-file change correlation analysis
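
To give a feel for what such queries might look like, here is a minimal sketch. It does not use pgit's real schema, which this article does not describe: the `commits` table, its columns, and the toy rows are all assumptions made for illustration, and SQLite stands in for PostgreSQL so the example runs without a server. The only kernel-specific detail relied on is the `Fixes: <commit> ("subject")` trailer convention used in kernel commit messages.

```python
import sqlite3

# Hypothetical schema and toy rows for illustration only; pgit's real tables
# and column names may differ. SQLite stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commits (id TEXT PRIMARY KEY, author TEXT, message TEXT)")
conn.executemany(
    "INSERT INTO commits VALUES (?, ?, ?)",
    [
        ("a1", "alice", "net: add driver"),
        ("b2", "bob", 'net: fix leak\n\nFixes: a1 ("net: add driver")'),
        ("c3", "carol", 'net: fix race\n\nFixes: a1 ("net: add driver")'),
        ("d4", "bob", "docs: fix typo"),
    ],
)

# Which commits attract the most "Fixes:" trailers?
# The inner query keeps the text after "Fixes: "; the middle query cuts it
# at the first space to isolate the referenced commit id.
top_fixed = conn.execute(
    """
    SELECT fixed_id, COUNT(*) AS fix_count
    FROM (
        SELECT substr(rest, 1, instr(rest, ' ') - 1) AS fixed_id
        FROM (
            SELECT substr(message, instr(message, 'Fixes: ') + 7) AS rest
            FROM commits
            WHERE instr(message, 'Fixes: ') > 0
        )
    )
    GROUP BY fixed_id
    ORDER BY fix_count DESC
    """
).fetchall()

# How many commit messages per author contain a given word?
word_by_author = conn.execute(
    """
    SELECT author, COUNT(*) AS hits
    FROM commits
    WHERE lower(message) LIKE '%' || ? || '%'
    GROUP BY author
    ORDER BY hits DESC, author
    """,
    ("fix",),
).fetchall()

print(top_fixed)
print(word_by_author)
```

On the real data set the same two query shapes would correspond to the "bug fixes pointing at a single commit" and word-count findings above, although the actual queries used by the author may look quite different.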

Technical Approach

pgit uses pg-xpatch for transparent delta compression. Apart from git, few version control systems have ever managed a full kernel import: Fossil never did, Darcs and Monotone had severe performance issues, and Mercurial is one of the few others that can handle it. PostgreSQL with pgit completed the import in 2 hours.
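
The general idea behind delta compression can be sketched in a few lines. This is a generic illustration only: the article does not describe pg-xpatch's actual algorithm or storage format, and the helper names `make_delta`, `apply_delta`, and `reconstruct` are hypothetical. The sketch stores the first version of a file in full and every later version as copy/insert operations against its predecessor.

```python
from difflib import SequenceMatcher


def make_delta(old: str, new: str) -> list:
    """Encode `new` as copy/insert operations against `old`."""
    ops = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # store only the new text
        # "delete" needs no operation: the old span is simply not copied
    return ops


def apply_delta(old: str, ops: list) -> str:
    """Rebuild the newer version from `old` plus its delta."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


def reconstruct(base: str, deltas: list) -> str:
    """Walk a delta chain from the fully stored base to the latest version."""
    text = base
    for ops in deltas:
        text = apply_delta(text, ops)
    return text


# Toy history of one file: only v1 is stored whole.
v1 = "int main(void) { return 0; }\n"
v2 = "int main(void) { puts(\"hi\"); return 0; }\n"
v3 = "#include <stdio.h>\nint main(void) { puts(\"hi\"); return 0; }\n"
deltas = [make_delta(v1, v2), make_delta(v2, v3)]

latest = reconstruct(v1, deltas)
print(latest == v3)
```

"Transparent" in this context means the database reassembles the full content on read, so a query sees ordinary rows and never has to deal with the deltas itself.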

Implications

This demonstrates that PostgreSQL can serve as a viable backend for version control at massive scale. The ability to query 20 years of development history with SQL opens up new possibilities for code archaeology, developer analytics, and large-scale codebase understanding.

Source: oseifert.ch — 151 points on HN

↗ Original source · 2026-04-09T10:00:00.000Z
