Running a One Trillion-Parameter LLM Locally on AMD Ryzen AI Max+ Cluster
AMD demonstrates distributed inference of Kimi K2.5, a one-trillion-parameter model, across a 4-node cluster of Framework Desktop systems, each with a Ryzen AI Max+ 395 and 128GB of RAM.
The Setup
| Component | Specification |
|---|---|
| Hardware | 4x Framework Desktop |
| CPU/GPU | AMD Ryzen AI Max+ 395 |
| RAM per node | 128GB |
| Total VRAM | 480GB (120GB per node via TTM) |
| AI Framework | AMD ROCm |
| Inference Engine | llama.cpp RPC |
| Model | Kimi-K2.5 (UD_Q2_K_XL, 375GB) |
| Network | 5Gbps Ethernet |
| OS | Ubuntu 24.04.3 LTS |
Key Technical Details
Extended VRAM allocation: The BIOS caps GPU-allocatable memory at 96GB per node (384GB total). Using Linux TTM (Translation Table Manager) kernel parameters, the limit is raised to 120GB per node (480GB total), enough to hold the 375GB quantized model.
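The article doesn't list the exact parameter values used. One common way to raise the GTT limit on Linux is via the `ttm.pages_limit` and `ttm.page_pool_size` kernel parameters, which are expressed in pages rather than bytes. A minimal sketch of the arithmetic for a 120GB target, assuming standard 4KiB pages (the specific target and resulting values are illustrative, not confirmed by the source):

```shell
# Compute the TTM page count for a 120 GiB GTT target, assuming 4 KiB pages.
TARGET_BYTES=$((120 * 1024 * 1024 * 1024))
PAGE_SIZE=4096
PAGES=$((TARGET_BYTES / PAGE_SIZE))

# Print the kernel command-line options to append to GRUB_CMDLINE_LINUX
# in /etc/default/grub, followed by update-grub and a reboot.
echo "ttm.pages_limit=$PAGES ttm.page_pool_size=$PAGES"
```

After rebooting, the larger GTT pool lets the integrated GPU address the additional system memory as VRAM.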
llama.cpp RPC: llama.cpp's RPC mode lets one node orchestrate inference across all four machines, presenting the cluster as a single logical AI accelerator.
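In broad strokes, this works by running llama.cpp's `rpc-server` on the worker nodes and pointing the head node at them with `--rpc`. A minimal sketch, assuming the nodes are reachable as `node2`..`node4` and using a placeholder GGUF filename (the exact commands and file names in AMD's setup are not given in the source):

```shell
# On each worker node: expose the local GPU backend over the network.
# -H binds the listen address, -p the port (50052 is the conventional default).
rpc-server -H 0.0.0.0 -p 50052

# On the head node: list the workers via --rpc so their memory joins the pool,
# and offload all model layers (-ngl 99) across the combined backends.
llama-cli -m Kimi-K2.5-UD_Q2_K_XL.gguf \
  --rpc node2:50052,node3:50052,node4:50052 \
  -ngl 99 -p "Hello"
```

The head node splits the model's layers across the local and remote backends, so the 5Gbps Ethernet link carries activations between nodes during each token's forward pass, which is why network bandwidth matters for throughput.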
About Kimi K2.5
Kimi K2.5 is Moonshot AI's most advanced open reasoning model — a state-of-the-art open model for coding, long-horizon reasoning, and agent-style workflows. It's natively multimodal, reasoning over text, image, and video inputs.
Significance
This demonstrates that trillion-parameter-class models can run locally on consumer hardware without cloud infrastructure, democratizing access to frontier AI capabilities.
Source: AMD Developer Blog