Hypura: Run LLMs Larger Than Your Mac's Memory Using Storage-Aware Scheduling
Breaking the Memory Barrier on Apple Silicon
Hypura is a new open-source tool that enables running LLMs that exceed a Mac's physical memory by intelligently placing model tensors across GPU, RAM, and NVMe storage tiers based on access patterns and hardware capabilities.
The Problem
Consumer Apple Silicon Macs have fast unified memory and NVMe storage, but limited capacity. A 32 GB M1 Max cannot natively load a 40 GB model: macOS swap-thrashes until the kernel kills the process. Standard llama.cpp simply fails on models that exceed available memory.
How Hypura Works
Hypura reads the GGUF model file, profiles your hardware (GPU working set, RAM, NVMe bandwidth), and solves a placement optimization that assigns every tensor to the optimal tier:
GPU (Metal) — Attention layers, norms, embeddings. Fastest access, limited by recommendedMaxWorkingSetSize.
RAM — Overflow for frequently accessed tensors not on GPU.
NVMe — Dense FFN weights (~60% of model size) stream from NVMe through a dynamically sized pool buffer.
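The tier assignment above can be sketched as a greedy placement pass: sort tensors by how often they are accessed, then give the hottest ones the fastest tier that still has room. This is an illustrative sketch, not Hypura's actual solver; the tensor names, heat scores, and capacities are invented for the example.

```python
def place_tensors(tensors, capacities):
    """tensors: list of (name, size_bytes, heat); capacities: {tier: bytes}.
    Returns {name: tier}. Tiers are tried fastest-first."""
    tiers = ["gpu", "ram", "nvme"]
    free = dict(capacities)
    placement = {}
    # Hotter (more frequently accessed) tensors get first claim on fast tiers.
    for name, size, heat in sorted(tensors, key=lambda t: -t[2]):
        for tier in tiers:
            if free[tier] >= size:
                placement[name] = tier
                free[tier] -= size
                break
        else:
            raise MemoryError(f"no tier can hold {name}")
    return placement

# Illustrative sizes (bytes) and heat scores; not real model data.
tensors = [
    ("attn.q", 2, 0.9), ("embed", 4, 0.8),
    ("ffn.w1", 8, 0.3), ("ffn.w2", 8, 0.3),
]
caps = {"gpu": 6, "ram": 4, "nvme": 100}
print(place_tensors(tensors, caps))
# → attention and embeddings land on GPU, dense FFN weights spill to NVMe
```

A real placement would also weigh tier bandwidth and per-tensor access order, but the fastest-tier-first greedy pass captures the core idea.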
Key Innovations
- MoE expert optimization: In Mixtral-style MoE models, only 2 of 8 experts fire per token. Router interception identifies the selected experts and loads only the needed strides from NVMe (75% I/O reduction)
- Neuron cache: Tracks loaded expert slices across tokens, achieving 99.5% hit rate from temporal locality
- Co-activation tracking: Predicts which experts will fire next for speculative prefetch
- Models that fit in memory run at full Metal GPU speed with zero overhead
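The router-interception and neuron-cache ideas fit together naturally: read the router's top-k choice, then serve each expert slice from an LRU cache so repeated selections across tokens never touch NVMe again. The sketch below is hypothetical (the class names, `load_fn` hook, and cache capacity are invented for illustration), but it shows how temporal locality produces the high hit rates the article describes.

```python
from collections import OrderedDict

class ExpertCache:
    """LRU cache of expert weight slices keyed by (layer, expert)."""
    def __init__(self, capacity, load_fn):
        self.cache = OrderedDict()
        self.capacity = capacity
        self.load_fn = load_fn  # would read the slice from NVMe in reality
        self.hits = self.misses = 0

    def get(self, layer, expert):
        key = (layer, expert)
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)  # mark as most recently used
        else:
            self.misses += 1
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
            self.cache[key] = self.load_fn(layer, expert)
        return self.cache[key]

def top_k_experts(router_logits, k=2):
    """Indices of the k highest-scoring experts, as the router selects them."""
    return sorted(range(len(router_logits)),
                  key=lambda i: router_logits[i], reverse=True)[:k]

# Simulate 10 tokens whose router keeps picking the same two experts.
cache = ExpertCache(capacity=16, load_fn=lambda l, e: f"slice[{l}][{e}]")
for logits in [[0.9, 0.1, 0.8, 0, 0, 0, 0, 0]] * 10:
    for e in top_k_experts(logits):
        cache.get(layer=0, expert=e)
hit_rate = cache.hits / (cache.hits + cache.misses)
print(f"hit rate: {hit_rate:.0%}")  # only the first token misses
```

With a steady expert distribution, only the first selection of each slice pays NVMe latency; everything after is a RAM hit, which is where the reported 99.5% hit rate would come from.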
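Co-activation tracking can be sketched as a co-occurrence counter: record which expert pairs fire together, and when one expert is selected, speculatively prefetch its most frequent partners. This is an invented illustration of the idea, not Hypura's implementation.

```python
from collections import Counter
from itertools import combinations

class CoActivationTracker:
    """Counts how often expert pairs fire together across tokens."""
    def __init__(self):
        self.pair_counts = Counter()

    def observe(self, experts):
        """Record one token's set of selected experts."""
        for a, b in combinations(sorted(experts), 2):
            self.pair_counts[(a, b)] += 1

    def predict(self, expert, top=1):
        """Experts that most often co-fire with `expert` (prefetch candidates)."""
        scores = Counter()
        for (a, b), n in self.pair_counts.items():
            if a == expert:
                scores[b] += n
            elif b == expert:
                scores[a] += n
        return [e for e, _ in scores.most_common(top)]

tracker = CoActivationTracker()
for selected in [(0, 2), (0, 2), (0, 5), (1, 2)]:
    tracker.observe(selected)
print(tracker.predict(0))  # → [2]: expert 2 co-fired with 0 most often
```

When the router picks expert 0, the scheduler can start streaming expert 2's slice from NVMe before the router for that layer has even run, hiding I/O latency behind compute.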
Benchmarks
- 31 GB Mixtral 8x7B on 32 GB Mac mini → 2.2 tok/s
- 40 GB Llama 70B on 32 GB Mac → 0.3 tok/s
- Both would crash vanilla llama.cpp
Significance
Hypura lowers the barrier to large-model inference by running models that exceed physical memory on consumer hardware. For researchers, developers, and enthusiasts who can't afford $10,000+ GPU servers, this opens new possibilities for local AI experimentation.