Running a One Trillion-Parameter LLM Locally on AMD Ryzen AI Max+ Cluster

2026-03-01
AMD runs the 1T-parameter Kimi K2.5 model locally across 4 Ryzen AI Max+ nodes (480GB total VRAM) using llama.cpp RPC — no cloud required.

AMD demonstrates distributed inference of the Kimi K2.5 one-trillion-parameter model across a cluster of four Framework Desktop systems, each with a Ryzen AI Max+ 395 and 128GB of RAM.

The Setup

Hardware: 4x Framework Desktop
CPU/GPU: AMD Ryzen AI Max+ 395
RAM per node: 128GB
Total VRAM: 480GB (120GB per node via TTM)
AI framework: AMD ROCm
Inference engine: llama.cpp RPC
Model: Kimi-K2.5 (UD_Q2_K_XL, 375GB)
Network: 5Gbps Ethernet
OS: Ubuntu 24.04.3 LTS
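As a sanity check on these numbers: a 375GB file holding roughly one trillion parameters averages about 3 bits per weight, which is plausible for a 2-bit-class mixed quantization such as UD_Q2_K_XL (some tensors are kept at higher precision). A quick back-of-the-envelope calculation:

```python
# Back-of-the-envelope check: average bits per weight for the quantized model.
# Figures from the article: ~375 GB file, ~1 trillion parameters
# (decimal GB assumed for simplicity).
model_bytes = 375e9
n_params = 1e12

bits_per_weight = model_bytes * 8 / n_params
print(f"{bits_per_weight:.1f} bits/weight")  # → 3.0 bits/weight
```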

Key Technical Details

Extended VRAM allocation: The BIOS caps GPU-allocatable memory at 96GB per node (384GB total). Raising the Linux TTM (Translation Table Manager) limits via kernel parameters increases this to 120GB per node (480GB total), enough to hold the 375GB quantized model.
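The article doesn't list the exact parameters used, but one common approach on Linux is to raise the `ttm` module's `pages_limit` and `page_pool_size` at boot. A hedged sketch, assuming 4 KiB pages and deriving the value from the 120GB-per-node figure:

```shell
# Assumption: the TTM budget is raised via the ttm module's pages_limit and
# page_pool_size boot parameters (the exact values AMD used are not stated).
# 120 GiB / 4 KiB per page = 31457280 pages.
budget_gib=120
pages=$(( budget_gib * 1024 * 1024 * 1024 / 4096 ))
echo "ttm.pages_limit=${pages} ttm.page_pool_size=${pages}"
# Append the printed options to GRUB_CMDLINE_LINUX_DEFAULT in
# /etc/default/grub, then: sudo update-grub && sudo reboot
```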

Llama.cpp RPC: Multi-node inference is orchestrated across four machines as a single logical AI accelerator using llama.cpp's RPC mode.
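A minimal sketch of how such a cluster is wired up, assuming llama.cpp is built with the RPC backend enabled (`-DGGML_RPC=ON`); hostnames and the model filename are placeholders, not taken from the article:

```shell
# On each of the three worker nodes: expose the local GPU over RPC.
./rpc-server --host 0.0.0.0 --port 50052

# On the head node: llama.cpp splits the model across the local backend
# and the remote RPC backends listed in --rpc.
./llama-cli -m kimi-k2.5-UD_Q2_K_XL.gguf \
    --rpc node2:50052,node3:50052,node4:50052 \
    -ngl 99 -p "Hello"
```

This is a cluster-specific command sequence, not runnable standalone; `-ngl 99` offloads all layers to the (distributed) GPU backends.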

About Kimi K2.5

Kimi K2.5 is Moonshot AI's most advanced open reasoning model: a state-of-the-art open model for coding, long-horizon reasoning, and agentic workflows. It is natively multimodal, reasoning over text, image, and video inputs.

Significance

This demonstrates that trillion-parameter-class models can run locally on consumer hardware without cloud infrastructure, democratizing access to frontier AI capabilities.


Source: AMD Developer Blog
