AI Takes on Super Mario Bros: A Tougher Benchmark than Pokémon?
Pokémon is widely seen as a hard test for artificial intelligence, but researchers from the Hao AI Lab at the University of California San Diego believe that Super Mario Bros. is an even tougher nut to crack.
The AI Showdown
In a recent experiment, Hao AI Lab pitted several AI models against one another in Super Mario Bros., using a modified version of the game running on an emulator. The standout performer was Anthropic’s Claude 3.7, closely followed by Claude 3.5. In contrast, Google’s Gemini 1.5 Pro and OpenAI’s GPT-4o struggled, showing just how demanding this classic game can be for AI.
The Mechanics Behind the Madness
To level the playing field, the lab developed a custom framework called GamingAgent, which gave each model basic instructions, such as “If an obstacle or enemy is near, move/jump left to dodge,” along with in-game screenshots. The models had to do more than steer Mario: they had to plan complex maneuvers and develop gameplay strategies. Surprisingly, reasoning models such as OpenAI’s o1 performed worse than their non-reasoning counterparts, even though reasoning models typically do better on other benchmarks.
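The loop described above (instructions plus a screenshot go in, a controller action comes out) can be sketched roughly as follows. This is a minimal illustration, not GamingAgent's actual code: the `Frame` class, `stub_model`, `run_episode`, and the action names are all hypothetical, and the default "move_right" action is an assumption added for the example.

```python
from dataclasses import dataclass
from typing import Callable, List

# Instruction text quoted in the article; everything else here is hypothetical.
INSTRUCTIONS = "If an obstacle or enemy is near, move/jump left to dodge."


@dataclass
class Frame:
    """Stand-in for an in-game screenshot (a real agent would pass image data)."""
    step: int
    obstacle_near: bool


def stub_model(instructions: str, frame: Frame) -> str:
    """Placeholder for a call to a vision-language model.

    Returns the name of a controller action. The rule mirrors the quoted
    instruction; "move_right" as the default is an assumption.
    """
    return "jump_left" if frame.obstacle_near else "move_right"


def run_episode(frames: List[Frame], model: Callable[[str, Frame], str]) -> List[str]:
    """Ask the model for one action per frame and collect the results."""
    return [model(INSTRUCTIONS, frame) for frame in frames]


frames = [Frame(0, False), Frame(1, True), Frame(2, False)]
actions = run_episode(frames, stub_model)
print(actions)
```

Swapping `stub_model` for a function that calls a real model would keep the rest of the loop unchanged, which is the point of passing the model in as a parameter.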
The reason for this discrepancy is time. Reasoning models usually take seconds to decide on their next move, and in a fast-paced game like Super Mario Bros., every second counts. A split-second decision can mean the difference between gracefully clearing a pit and falling into it.
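A back-of-the-envelope calculation makes the timing problem concrete. The sketch below assumes Mario keeps moving at a constant speed while the model is deciding; the speed, gap, and latency figures are made-up illustrative numbers, not measurements from the experiment, and the function names are hypothetical.

```python
def distance_travelled_blind(speed_px_per_s: float, latency_s: float) -> float:
    """Pixels Mario covers while the model is still deciding on an action."""
    return speed_px_per_s * latency_s


def reacts_in_time(gap_px: float, speed_px_per_s: float, latency_s: float) -> bool:
    """True if the action arrives before Mario reaches the hazard."""
    return distance_travelled_blind(speed_px_per_s, latency_s) < gap_px


# Illustrative values only: a hazard 60 px ahead, Mario moving at 100 px/s.
GAP_PX = 60.0
SPEED_PX_PER_S = 100.0

fast_ok = reacts_in_time(GAP_PX, SPEED_PX_PER_S, latency_s=0.3)  # sub-second reply
slow_ok = reacts_in_time(GAP_PX, SPEED_PX_PER_S, latency_s=3.0)  # multi-second reply

print("0.3 s latency reacts in time:", fast_ok)
print("3.0 s latency reacts in time:", slow_ok)
```

Under these assumed numbers, the model that answers in a fraction of a second acts while the hazard is still ahead, whereas the one that takes several seconds acts on a screenshot that is already out of date.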
AI in Gaming: A Mixed Bag
For decades, games have served as a testing ground for AI capabilities. However, some experts doubt that an AI’s gaming prowess says much about broader technological progress. Unlike the real world, games are tightly structured and can supply a practically limitless amount of training data.
Renowned researcher Andrej Karpathy, a founding member of OpenAI, recently commented on this trend, describing an “evaluation crisis” in AI. He wrote, “I don’t really know what [AI] metrics to look at right now. TLDR, my reaction is I don’t really know how good these models are right now.” It’s a cautionary note: milestones like these are impressive, but they also raise deeper questions about what today’s AI can actually do.
Let the Games Continue
Despite the ongoing debate over how to evaluate AI, one thing is clear: watching AI navigate the colorful, pixelated world of Mario is a captivating spectacle. The demands of this classic game pose a distinctive challenge that continues to intrigue researchers and casual fans alike.
The AI Buzz Hub team is excited to see where these breakthroughs take us.