TurboQuant-WASM: Google's Vector Quantization Algorithm Runs in the Browser at 6x Compression
Vector Search Compression Comes to the Browser
A new open-source project brings Google Research's TurboQuant vector quantization algorithm to browsers and Node.js via WebAssembly, achieving 6x compression of float32 embeddings with direct search on compressed data.
The Problem
Float32 embedding indexes are impractically large for web and mobile deployment:
- 1 million vectors × 384 dimensions = 1.5GB of float32 data
- Doesn't fit in mobile RAM
- Takes minutes to download
- gzip only saves ~7% (float32 has high entropy)
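The arithmetic behind these figures is simple to check; a quick sketch (the article's 240MB figure presumably rounds up or includes per-vector metadata such as scale factors, so the raw bit count below comes out slightly lower):

```typescript
// Back-of-envelope index sizes for 1M embeddings at 384 dimensions.
const vectors = 1_000_000;
const dims = 384;

const bytesFloat32 = vectors * dims * 4;          // 4 bytes per float32
const bytesCompressed = (vectors * dims * 4.5) / 8; // ~4.5 bits per dimension

console.log((bytesFloat32 / 1e9).toFixed(2) + " GB");   // ≈ 1.54 GB raw
console.log((bytesCompressed / 1e6).toFixed(0) + " MB"); // ≈ 216 MB compressed
```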
The Solution
TurboQuant-WASM compresses embeddings to ~4.5 bits per dimension (6x compression: 1.5GB → 240MB) and supports searching directly on compressed data without decompression.
Key Features
- No training step — unlike Product Quantization (PQ/OPQ), just call init({ dim, seed }) and encode immediately
- Direct compressed search — dot products are computed without decompressing vectors
- Batch operations — dotBatch() runs 83x faster than looping over individual dot products
- SIMD optimized — uses WASM relaxed SIMD (f32x4.relaxed_madd)
- Broad browser support — Chrome 114+, Firefox 128+, Safari 18+, Node.js 20+
Usage
import { TurboQuant } from 'turboquant-wasm';
const tq = await TurboQuant.init({ dim: 1024, seed: 42 });
// Compress a vector (~6x compression)
const compressed = tq.encode(myFloat32Array);
// Fast dot product without decoding
const score = tq.dot(queryVector, compressed);
// Batch search: 83x faster than looping
const scores = tq.dotBatch(queryVector, allCompressed, bytesPerVector);
tq.destroy();
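To turn the raw scores from dotBatch() into search results, you still need a top-k selection step. A minimal helper, assuming dotBatch() returns a Float32Array of scores indexed by vector position (the topK name is illustrative, not part of the package API):

```typescript
// Hypothetical helper: pick the indices of the k highest scores,
// e.g. from the Float32Array returned by dotBatch().
function topK(scores: Float32Array, k: number): number[] {
  const indices = Array.from(scores.keys());      // [0, 1, ..., n-1]
  indices.sort((a, b) => scores[b] - scores[a]);  // descending by score
  return indices.slice(0, k);
}

// Example: the two highest-scoring vectors come back first.
const scores = new Float32Array([0.12, 0.97, 0.45, 0.80]);
console.log(topK(scores, 2)); // [1, 3]
```

A full sort is O(n log n); for very large indexes a heap-based selection would be the more scalable choice, but the sort keeps the sketch short.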
The Research
Based on the paper "TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate" from Google Research, published at ICLR 2026. The algorithm preserves inner products with mathematically verified distortion bounds.
Technical Implementation
Built with Zig (compiled to WASM) and TypeScript:
- SIMD-vectorized QJL sign packing/unpacking and scaling
- Golden-value tests ensure byte-identical output with the reference Zig implementation
- npm package with embedded WASM binary
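The sign packing mentioned above can be sketched in scalar TypeScript to show the idea: each dimension contributes only its sign bit, so 8 dimensions fit in one byte. This is an illustrative scalar version under that assumption, not the package's SIMD code, and the function names are made up here:

```typescript
// Pack the sign of each dimension into a bitset: bit set = non-negative.
function packSigns(v: Float32Array): Uint8Array {
  const out = new Uint8Array(Math.ceil(v.length / 8));
  for (let i = 0; i < v.length; i++) {
    if (v[i] >= 0) out[i >> 3] |= 1 << (i & 7); // byte i/8, bit i%8
  }
  return out;
}

// Unpack back to a ±1 vector (the decoded direction, before any scaling).
function unpackSigns(bits: Uint8Array, dim: number): Float32Array {
  const out = new Float32Array(dim);
  for (let i = 0; i < dim; i++) {
    out[i] = (bits[i >> 3] >> (i & 7)) & 1 ? 1 : -1;
  }
  return out;
}

const packed = packSigns(new Float32Array([0.5, -1.2, 3.0, -0.1]));
console.log(packed[0].toString(2)); // "101": bits 0 and 2 set
```

The WASM build vectorizes exactly this bit manipulation with relaxed SIMD, which is where the large batch-throughput gains come from.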
Applications
The live demo showcases three use cases running entirely in the browser:
- Vector search — Semantic similarity search on compressed embeddings
- Image similarity — Visual search using compressed image vectors
- 3D Gaussian Splatting compression — Real-time 3D scene compression
Impact
This project enables client-side vector search at scale, eliminating the need for server round-trips for common AI tasks like similarity search and recommendation. It's particularly significant for:
- Offline-first applications
- Privacy-sensitive search (data never leaves the device)
- Mobile AI applications with limited bandwidth
Source: GitHub (teamchong/turboquant-wasm), Google Research (ICLR 2026), Hacker News