TurboQuant-WASM: Google's Vector Quantization Algorithm Runs in the Browser at 6x Compression
Vector Search Compression Comes to the Browser
A new open-source project brings Google Research's TurboQuant vector quantization algorithm to browsers and Node.js via WebAssembly, achieving 6x compression of float32 embeddings with direct search on compressed data.
The Problem
Float32 embedding indexes are impractically large for web and mobile deployment:
- 1 million vectors × 384 dimensions = 1.5GB of float32 data
- Doesn't fit in mobile RAM
- Takes minutes to download
- gzip only saves ~7% (float32 has high entropy)
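The arithmetic behind these figures is simple to check; a quick sketch (the article's 240MB figure presumably rounds up or includes per-vector metadata such as scale factors, so the raw bit count below comes out slightly lower):

```typescript
// Back-of-envelope index sizes for 1M embeddings at 384 dimensions.
const vectors = 1_000_000;
const dims = 384;

const bytesFloat32 = vectors * dims * 4;          // 4 bytes per float32
const bytesCompressed = (vectors * dims * 4.5) / 8; // ~4.5 bits per dimension

console.log((bytesFloat32 / 1e9).toFixed(2) + " GB");   // ≈ 1.54 GB raw
console.log((bytesCompressed / 1e6).toFixed(0) + " MB"); // ≈ 216 MB compressed
```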
The Solution
TurboQuant-WASM compresses embeddings to ~4.5 bits per dimension (6x compression: 1.5GB → 240MB) and supports searching directly on compressed data without decompression.
Key Features
- No training step — unlike Product Quantization (PQ/OPQ), just call init({ dim, seed }) and encode immediately
- Direct compressed search — dot products are computed without decompressing vectors
- Batch operations — dotBatch() runs 83x faster than looping over individual dot products
- SIMD optimized — uses WASM relaxed SIMD (f32x4.relaxed_madd)
- Broad browser support — Chrome 114+, Firefox 128+, Safari 18+, Node.js 20+
Usage
import { TurboQuant } from 'turboquant-wasm';
const tq = await TurboQuant.init({ dim: 1024, seed: 42 });
// Compress a vector (~6x compression)
const compressed = tq.encode(myFloat32Array);
// Fast dot product without decoding
const score = tq.dot(queryVector, compressed);
// Batch search: 83x faster than looping
const scores = tq.dotBatch(queryVector, allCompressed, bytesPerVector);
tq.destroy();
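To turn the raw scores from dotBatch() into search results, you still need a top-k selection step. A minimal helper, assuming dotBatch() returns a Float32Array of scores indexed by vector position (the topK name is illustrative, not part of the package API):

```typescript
// Hypothetical helper: pick the indices of the k highest scores,
// e.g. from the Float32Array returned by dotBatch().
function topK(scores: Float32Array, k: number): number[] {
  const indices = Array.from(scores.keys());      // [0, 1, ..., n-1]
  indices.sort((a, b) => scores[b] - scores[a]);  // descending by score
  return indices.slice(0, k);
}

// Example: the two highest-scoring vectors come back first.
const scores = new Float32Array([0.12, 0.97, 0.45, 0.80]);
console.log(topK(scores, 2)); // [1, 3]
```

A full sort is O(n log n); for very large indexes a heap-based selection would be the more scalable choice, but the sort keeps the sketch short.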
The Research
Based on the paper "TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate" from Google Research, published at ICLR 2026. The algorithm preserves inner products with mathematically verified distortion bounds.
Technical Implementation
Built with Zig (compiled to WASM) and TypeScript:
- SIMD-vectorized QJL sign packing/unpacking and scaling
- Golden-value tests ensure byte-identical output with the reference Zig implementation
- npm package with embedded WASM binary
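The sign packing mentioned above can be sketched in scalar TypeScript to show the idea: each dimension contributes only its sign bit, so 8 dimensions fit in one byte. This is an illustrative scalar version under that assumption, not the package's SIMD code, and the function names are made up here:

```typescript
// Pack the sign of each dimension into a bitset: bit set = non-negative.
function packSigns(v: Float32Array): Uint8Array {
  const out = new Uint8Array(Math.ceil(v.length / 8));
  for (let i = 0; i < v.length; i++) {
    if (v[i] >= 0) out[i >> 3] |= 1 << (i & 7); // byte i/8, bit i%8
  }
  return out;
}

// Unpack back to a ±1 vector (the decoded direction, before any scaling).
function unpackSigns(bits: Uint8Array, dim: number): Float32Array {
  const out = new Float32Array(dim);
  for (let i = 0; i < dim; i++) {
    out[i] = (bits[i >> 3] >> (i & 7)) & 1 ? 1 : -1;
  }
  return out;
}

const packed = packSigns(new Float32Array([0.5, -1.2, 3.0, -0.1]));
console.log(packed[0].toString(2)); // "101": bits 0 and 2 set
```

The WASM build vectorizes exactly this bit manipulation with relaxed SIMD, which is where the large batch-throughput gains come from.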
Applications
The live demo showcases three use cases running entirely in the browser:
- Vector search — Semantic similarity search on compressed embeddings
- Image similarity — Visual search using compressed image vectors
- 3D Gaussian Splatting compression — Real-time 3D scene compression
Impact
This project enables client-side vector search at scale, eliminating the need for server round-trips for common AI tasks like similarity search and recommendation. It's particularly significant for:
- Offline-first applications
- Privacy-sensitive search (data never leaves the device)
- Mobile AI applications with limited bandwidth
Source: GitHub (teamchong/turboquant-wasm), Google Research (ICLR 2026), Hacker News