Why Video Games Still Baffle AI Models: Julian Togelius on the Limits of LLM Intelligence
Despite rapid improvement in coding and other domains, large language models remain terrible at playing video games, according to Julian Togelius, director of NYU's Game Innovation Lab. In a recent paper, he argues this failure reveals fundamental limitations in current AI approaches.
The Coding Paradox
Togelius frames coding as a 'well-behaved game': you get a specification, write code, run it, and receive immediate feedback. 'From that perspective, writing code is an extremely well-designed game.' This, he argues, helps explain why LLMs excel at coding.
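Togelius's feedback loop can be made concrete with a toy sketch (an illustrative example, not from the paper): the specification is an executable test, a 'move' is a candidate implementation, and the feedback is immediate and unambiguous.

```python
# Illustrative sketch: coding as a "well-behaved game".
# The specification is explicit (an executable check), the move is writing
# code, and the feedback arrives instantly as pass or fail.

def specification(candidate) -> bool:
    """The 'rules of the game': an executable spec for a sorting function."""
    return candidate([3, 1, 2]) == [1, 2, 3] and candidate([]) == []

def attempt(xs):
    """A 'move' by the player: one candidate implementation."""
    return sorted(xs)

# Immediate, text-based feedback -- the property Togelius argues video games lack.
feedback = "pass" if specification(attempt) else "fail"
print(feedback)  # -> pass
```

The tight loop of spec, attempt, and instant verdict is exactly what reinforcement-style fine-tuning of coding models can exploit.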
Why Games Are Different
Video games lack the clear structure that makes coding tractable for AI:
- No explicit specification or rules provided in text
- Different games have fundamentally different mechanics and input representations
- Rewards are often delayed and unclear
- The AI must learn to play through experience, not text
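The delayed-reward point above can be sketched in a few lines (an assumed toy example, not from the paper): the agent makes many moves but receives a single signal only when the episode ends, so no individual move gets direct feedback, unlike running code against a test.

```python
import random

# Illustrative sketch: sparse, delayed reward in a toy "game".
# The agent acts for many steps with no per-move feedback; a single
# win/lose signal arrives only at the end of the episode.

def play_episode(policy, length=50, seed=0):
    rng = random.Random(seed)
    state = 0
    for _ in range(length):
        action = policy(state)  # the agent never sees a per-move reward
        # Moves are stochastic: the chosen action succeeds only 80% of the time.
        state += action if rng.random() < 0.8 else -action
    return 1 if state > 10 else 0  # one delayed reward at the very end

reward = play_episode(lambda s: 1)  # a policy that always pushes forward
```

Credit assignment over such long, unlabeled sequences is one reason game play is far harder for an LLM than iterating on code.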
'There is a widespread perception that because we can build AI that plays particular games well, we should be able to build one that plays any game. I am not sure we are going to get there,' says Togelius.
The Data Problem
AI models succeed at heavily documented games like Minecraft and Pokemon because enormous volumes of guides, walkthroughs, and gameplay text exist online. For less-studied games, that data is scarce, and Togelius is blunt about the result: 'They fail. They absolutely suck. All of them. They do not even do as well as a simple search algorithm.'
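The 'simple search algorithm' baseline can be illustrated with a sketch (an assumed example, not from the paper): breadth-first search over game states needs no training data at all; it simply explores legal moves until it reaches the goal.

```python
from collections import deque

# Illustrative sketch: a data-free baseline of the kind Togelius alludes to.
# Breadth-first search finds a shortest action sequence on a small grid world.

def bfs_plan(start, goal, walls, width=5, height=5):
    """Return a shortest list of moves from start to goal, or None."""
    moves = {"U": (0, -1), "D": (0, 1), "L": (-1, 0), "R": (1, 0)}
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        (x, y), path = frontier.popleft()
        if (x, y) == goal:
            return path
        for name, (dx, dy) in moves.items():
            nxt = (x + dx, y + dy)
            if (0 <= nxt[0] < width and 0 <= nxt[1] < height
                    and nxt not in walls and nxt not in seen):
                seen.add(nxt)
                frontier.append((nxt, path + [name]))
    return None  # goal unreachable

plan = bfs_plan((0, 0), (4, 4), walls={(1, 1), (2, 2)})
```

A baseline like this has no knowledge of any particular game, which is what makes an LLM losing to it striking.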
What This Means for AI
The gaming failure highlights that LLMs operate primarily through pattern matching on text, not through genuine understanding of dynamic interactive environments. Success in benchmarks does not guarantee real-world or real-game performance.
Source: IEEE Spectrum