Running a One Trillion-Parameter LLM Locally on AMD Ryzen AI Max+ Cluster
AMD demonstrates distributed inference of Kimi K2.5, a one-trillion-parameter model, across a 4-node cluster of Framework Desktop systems, each with a Ryzen AI Max+ 395 and 128GB of RAM.
The Setup
| Component | Specification |
|---|---|
| Hardware | 4x Framework Desktop |
| CPU/GPU | AMD Ryzen AI Max+ 395 |
| RAM per node | 128GB |
| Total VRAM | 480GB (120GB per node via TTM) |
| AI Framework | AMD ROCm |
| Inference Engine | llama.cpp RPC |
| Model | Kimi-K2.5 (UD_Q2_K_XL, 375GB) |
| Network | 5Gbps Ethernet |
| OS | Ubuntu 24.04.3 LTS |
Key Technical Details
Extended VRAM allocation: The BIOS caps GPU-allocatable memory at 96GB per node (384GB total). Using Linux TTM (Translation Table Manager) kernel parameters, the limit is raised to 120GB per node (480GB total), enough to hold the 375GB quantized model.
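The article doesn't list the exact parameter values used. One common way to raise the GTT limit on Linux is via the `ttm.pages_limit` and `ttm.page_pool_size` kernel parameters, which are expressed in pages rather than bytes. A minimal sketch of the arithmetic for a 120GB target, assuming standard 4KiB pages (the specific target and resulting values are illustrative, not confirmed by the source):

```shell
# Compute the TTM page count for a 120 GiB GTT target, assuming 4 KiB pages.
TARGET_BYTES=$((120 * 1024 * 1024 * 1024))
PAGE_SIZE=4096
PAGES=$((TARGET_BYTES / PAGE_SIZE))

# Print the kernel command-line options to append to GRUB_CMDLINE_LINUX
# in /etc/default/grub, followed by update-grub and a reboot.
echo "ttm.pages_limit=$PAGES ttm.page_pool_size=$PAGES"
```

After rebooting, the larger GTT pool lets the integrated GPU address the additional system memory as VRAM.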
llama.cpp RPC: llama.cpp's RPC mode lets one node orchestrate inference across all four machines, presenting the cluster as a single logical AI accelerator.
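In broad strokes, this works by running llama.cpp's `rpc-server` on the worker nodes and pointing the head node at them with `--rpc`. A minimal sketch, assuming the nodes are reachable as `node2`..`node4` and using a placeholder GGUF filename (the exact commands and file names in AMD's setup are not given in the source):

```shell
# On each worker node: expose the local GPU backend over the network.
# -H binds the listen address, -p the port (50052 is the conventional default).
rpc-server -H 0.0.0.0 -p 50052

# On the head node: list the workers via --rpc so their memory joins the pool,
# and offload all model layers (-ngl 99) across the combined backends.
llama-cli -m Kimi-K2.5-UD_Q2_K_XL.gguf \
  --rpc node2:50052,node3:50052,node4:50052 \
  -ngl 99 -p "Hello"
```

The head node splits the model's layers across the local and remote backends, so the 5Gbps Ethernet link carries activations between nodes during each token's forward pass, which is why network bandwidth matters for throughput.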
About Kimi K2.5
Kimi K2.5 is Moonshot AI's most advanced open reasoning model — a state-of-the-art open model for coding, long-horizon reasoning, and agent-style workflows. It's natively multimodal, reasoning over text, image, and video inputs.
Significance
This demonstrates that trillion-parameter-class models can run locally on consumer hardware without cloud infrastructure, democratizing access to frontier AI capabilities.
Source: AMD Developer Blog