Why Video Games Still Baffle AI Models: Julian Togelius on the Limits of LLM Intelligence
Despite rapid improvement in coding and other domains, large language models remain terrible at playing video games, according to Julian Togelius, director of NYU's Game Innovation Lab. In a recent paper, he argues this failure reveals fundamental limitations in current AI approaches.
The Coding Paradox
Togelius frames coding as a 'well-behaved game': you get a specification, write code, run it, and receive immediate feedback. 'From that perspective, writing code is an extremely well-designed game.' This, he argues, helps explain why LLMs excel at coding.
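Togelius's feedback loop can be made concrete with a toy sketch (an illustrative example, not from the paper): the specification is an executable test, a 'move' is a candidate implementation, and the feedback is immediate and unambiguous.

```python
# Illustrative sketch: coding as a "well-behaved game".
# The specification is explicit (an executable check), the move is writing
# code, and the feedback arrives instantly as pass or fail.

def specification(candidate) -> bool:
    """The 'rules of the game': an executable spec for a sorting function."""
    return candidate([3, 1, 2]) == [1, 2, 3] and candidate([]) == []

def attempt(xs):
    """A 'move' by the player: one candidate implementation."""
    return sorted(xs)

# Immediate, text-based feedback -- the property Togelius argues video games lack.
feedback = "pass" if specification(attempt) else "fail"
print(feedback)  # -> pass
```

The tight loop of spec, attempt, and instant verdict is exactly what reinforcement-style fine-tuning of coding models can exploit.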
Why Games Are Different
Video games lack the clear structure that makes coding tractable for AI:
- No explicit specification or rules provided in text
- Different games have fundamentally different mechanics and input representations
- Rewards are often delayed and unclear
- The AI must learn to play through experience, not text
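The delayed-reward point above can be sketched in a few lines (an assumed toy example, not from the paper): the agent makes many moves but receives a single signal only when the episode ends, so no individual move gets direct feedback, unlike running code against a test.

```python
import random

# Illustrative sketch: sparse, delayed reward in a toy "game".
# The agent acts for many steps with no per-move feedback; a single
# win/lose signal arrives only at the end of the episode.

def play_episode(policy, length=50, seed=0):
    rng = random.Random(seed)
    state = 0
    for _ in range(length):
        action = policy(state)  # the agent never sees a per-move reward
        # Moves are stochastic: the chosen action succeeds only 80% of the time.
        state += action if rng.random() < 0.8 else -action
    return 1 if state > 10 else 0  # one delayed reward at the very end

reward = play_episode(lambda s: 1)  # a policy that always pushes forward
```

Credit assignment over such long, unlabeled sequences is one reason game play is far harder for an LLM than iterating on code.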
'There is a widespread perception that because we can build AI that plays particular games well, we should be able to build one that plays any game. I am not sure we are going to get there,' says Togelius.
The Data Problem
AI models succeed at heavily documented games like Minecraft and Pokemon because enormous volumes of guides, walkthroughs, and gameplay text exist online. For less-studied games, that data is scarce, and Togelius is blunt about the result: 'They fail. They absolutely suck. All of them. They do not even do as well as a simple search algorithm.'
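The 'simple search algorithm' baseline can be illustrated with a sketch (an assumed example, not from the paper): breadth-first search over game states needs no training data at all; it simply explores legal moves until it reaches the goal.

```python
from collections import deque

# Illustrative sketch: a data-free baseline of the kind Togelius alludes to.
# Breadth-first search finds a shortest action sequence on a small grid world.

def bfs_plan(start, goal, walls, width=5, height=5):
    """Return a shortest list of moves from start to goal, or None."""
    moves = {"U": (0, -1), "D": (0, 1), "L": (-1, 0), "R": (1, 0)}
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        (x, y), path = frontier.popleft()
        if (x, y) == goal:
            return path
        for name, (dx, dy) in moves.items():
            nxt = (x + dx, y + dy)
            if (0 <= nxt[0] < width and 0 <= nxt[1] < height
                    and nxt not in walls and nxt not in seen):
                seen.add(nxt)
                frontier.append((nxt, path + [name]))
    return None  # goal unreachable

plan = bfs_plan((0, 0), (4, 4), walls={(1, 1), (2, 2)})
```

A baseline like this has no knowledge of any particular game, which is what makes an LLM losing to it striking.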
What This Means for AI
The gaming failure highlights that LLMs operate primarily through pattern matching on text, not through genuine understanding of dynamic interactive environments. Success in benchmarks does not guarantee real-world or real-game performance.
Source: IEEE Spectrum