TurboQuant-WASM: Google's Vector Quantization Algorithm Runs in the Browser at 6x Compression

2026-04-05

Vector Search Compression Comes to the Browser

A new open-source project brings Google Research's TurboQuant vector quantization algorithm to browsers and Node.js via WebAssembly, achieving 6x compression of float32 embeddings with direct search on compressed data.

The Problem

Float32 embedding indexes are often too large for practical web and mobile deployment.

The Solution

TurboQuant-WASM compresses embeddings to about 4.5 bits per dimension, roughly a 6x reduction (for example, a 1.5GB index shrinks to about 240MB), and supports searching directly on the compressed data without decompression.
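The size figures above can be checked with a few lines of arithmetic. The sketch below is illustrative only: `vectorBytes` is a hypothetical helper (not part of turboquant-wasm), the dimension of 1024 is borrowed from the usage example, and the quoted sizes are assumed to be in decimal units (1.5GB = 1500MB).

```typescript
// Hypothetical helper for illustration; not part of turboquant-wasm.
// Bytes needed to store one vector at a given bit width per dimension.
function vectorBytes(dim: number, bitsPerDim: number): number {
  return Math.ceil((dim * bitsPerDim) / 8);
}

const exampleDim = 1024; // same dimension as the usage example

const float32Bytes = vectorBytes(exampleDim, 32);     // 4096 bytes
const compressedBytes = vectorBytes(exampleDim, 4.5); // 576 bytes

// Ratio implied by the bit widths alone (32 / 4.5).
const payloadRatio = float32Bytes / compressedBytes;

// Ratio implied by the quoted index sizes (1.5GB to 240MB, decimal units assumed).
const quotedRatio = 1500 / 240;

console.log(payloadRatio.toFixed(2)); // "7.11"
console.log(quotedRatio.toFixed(2));  // "6.25"
```

The bit widths alone give about 7.1x, while the quoted index sizes give 6.25x; the article rounds to 6x and does not say what accounts for the difference.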

Key Features

Usage

import { TurboQuant } from 'turboquant-wasm';

const tq = await TurboQuant.init({ dim: 1024, seed: 42 });

// Compress a vector (~6x compression)
const compressed = tq.encode(myFloat32Array);

// Fast dot product without decoding
const score = tq.dot(queryVector, compressed);

// Batch search: 83x faster than looping
const scores = tq.dotBatch(queryVector, allCompressed, bytesPerVector);

tq.destroy();
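A batch call like `dotBatch` yields one score per compressed vector, but a search usually needs the best few matches. The sketch below shows one way to rank them. It assumes the scores arrive as an array-like of numbers in the same order as the stored vectors (the article does not state the return type), and `topK` is a hypothetical helper, not part of turboquant-wasm.

```typescript
// Hypothetical helper: indices of the k highest scores, best first.
function topK(scores: ArrayLike<number>, k: number): number[] {
  const indices = Array.from({ length: scores.length }, (_, i) => i);
  // Sort indices by descending score.
  indices.sort((a, b) => scores[b] - scores[a]);
  return indices.slice(0, Math.max(0, k));
}

// Stand-in scores; in practice these would come from the batch search call.
const exampleScores = new Float32Array([0.12, 0.87, 0.45, 0.91, 0.33]);
console.log(topK(exampleScores, 3)); // [ 3, 1, 2 ]
```

A full sort is the simplest option; for very large indexes a bounded heap or partial selection would avoid sorting every score.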

The Research

The project is based on the paper "TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate" from Google Research, published at ICLR 2026. The algorithm approximately preserves inner products, with mathematically proven bounds on the distortion.
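To make the idea of a bounded inner-product error concrete, here is a toy uniform scalar quantizer. It is a deliberately simple stand-in and is not the TurboQuant algorithm; every name in it is hypothetical. It only illustrates the general pattern: store a few bits per dimension, compute the dot product against the reconstructed values, and bound the error by the quantization step.

```typescript
// Toy uniform scalar quantizer for illustration. This is NOT TurboQuant.
interface ToyQuantized {
  codes: Uint8Array; // one code per dimension (left unpacked for clarity)
  min: number;       // smallest value in the original vector
  step: number;      // spacing between adjacent quantization levels
}

function toyQuantize(v: Float32Array, bits: number): ToyQuantized {
  const levels = (1 << bits) - 1; // assumes 1 <= bits <= 8
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < v.length; i++) {
    if (v[i] < min) min = v[i];
    if (v[i] > max) max = v[i];
  }
  const step = max > min ? (max - min) / levels : 1;
  const codes = new Uint8Array(v.length);
  for (let i = 0; i < v.length; i++) {
    codes[i] = Math.round((v[i] - min) / step);
  }
  return { codes, min, step };
}

// Dot product against the quantized vector, reconstructing values on the fly.
function toyApproxDot(query: Float32Array, q: ToyQuantized): number {
  let sum = 0;
  for (let i = 0; i < query.length; i++) {
    sum += query[i] * (q.min + q.codes[i] * q.step);
  }
  return sum;
}

function toyExactDot(a: Float32Array, b: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Deterministic example data.
const toyVec = Float32Array.from({ length: 64 }, (_, i) => Math.sin(i));
const toyQuery = Float32Array.from({ length: 64 }, (_, i) => Math.cos(i * 0.5));

const toyCompressed = toyQuantize(toyVec, 4);
const toyError = Math.abs(toyExactDot(toyQuery, toyVec) - toyApproxDot(toyQuery, toyCompressed));

// Each coordinate is off by at most step / 2, so the dot product is off by
// at most (step / 2) * sum(|query[i]|).
let toyQueryL1 = 0;
for (let i = 0; i < toyQuery.length; i++) toyQueryL1 += Math.abs(toyQuery[i]);
const toyBound = (toyCompressed.step / 2) * toyQueryL1;

console.log(toyError <= toyBound + 1e-6); // true
```

This worst-case bound is loose and specific to the toy scheme; the bounds in the TurboQuant paper are of a different, much tighter kind.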

Technical Implementation

The library is built with Zig (compiled to WASM) and TypeScript.

Applications

The live demo showcases three use cases running entirely in the browser:

  1. Vector search — Semantic similarity search on compressed embeddings
  2. Image similarity — Visual search using compressed image vectors
  3. 3D Gaussian Splatting compression — Real-time 3D scene compression

Impact

This project enables client-side vector search at scale, eliminating server round-trips for common AI tasks such as similarity search and recommendation.


Source: GitHub (teamchong/turboquant-wasm), Google Research (ICLR 2026), Hacker News
