The Illusion of AI Intelligence: Misunderstanding Language and Tokens
How many instances of the letter "r" are in the word "strawberry"? AI systems like GPT-4o and Claude have confidently answered two, one short of the correct three. However advanced large language models (LLMs) may appear, they can err in ways that expose their limitations, and those blunders become viral sensations that remind us of the gap between human intelligence and artificial intelligence.
Large language models excel at generating text and solving complex problems, processing vast amounts of data far more quickly than humans can. Still, the inability of these systems to grasp basic concepts like letters and syllables is a stark reminder that they lack true cognitive abilities: they do not think or interpret reality the way humans do.
Most LLMs are built on transformers, a deep learning architecture that breaks text down into tokens: units that can range from entire words to syllables or individual letters, depending on the model's design. Matthew Guzdial, an AI researcher at the University of Alberta, explains that LLMs don't "read" text in the conventional sense. When a word like "the" is input, it is transformed into a numeric encoding, and the model never comprehends its individual letters.
This limitation stems from how transformers fundamentally operate. Text is converted into numerical representations, which the model uses to formulate its responses. It might register that "straw" and "berry" combine to form "strawberry," but it does not represent the word's underlying linguistic structure, such as the sequence of letters that makes it up. Consequently, it cannot reliably count how many times a given letter appears.
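To make that concrete, here is a minimal sketch of what a model actually receives, using OpenAI's open-source tiktoken tokenizer. The choice of encoding is illustrative; other models split text differently.

```python
# A rough look at how "strawberry" reaches a model: as token IDs, not letters.
# Requires the open-source tiktoken package (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by several recent OpenAI models

word = "strawberry"
token_ids = enc.encode(word)

print(token_ids)                                 # a short list of integer IDs
print([enc.decode([tid]) for tid in token_ids])  # each ID maps back to a subword chunk, not a letter

# Counting the "r"s is trivial on the raw string...
print(word.count("r"))  # 3

# ...but the model never sees this character sequence, only the integer IDs above.
```

The point is not the particular IDs, which vary from tokenizer to tokenizer, but that the character-level question "how many r's?" has to be answered from representations that were never character-level in the first place.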
Addressing these issues is complex, as they are inherent to the architecture of LLMs. The challenge becomes even more pronounced when these models are exposed to multiple languages, each with different grammatical structures. For instance, languages like Chinese or Thai do not use spaces between words, perplexing models that rely on tokenization based on English standards. Research indicates that certain languages require up to ten times more tokens to convey the same message as English.
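As a rough illustration of that token inflation, the same tokenizer sketch can be pointed at roughly equivalent sentences in different languages. The sample sentences below are approximate translations, and the exact counts depend on the tokenizer, so treat the output as indicative rather than a benchmark.

```python
# A rough sketch of token inflation across languages, using the same tiktoken encoding.
# The sentences are approximate translations of "The weather is nice today."
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "English": "The weather is nice today.",
    "Chinese": "今天天气很好。",       # no spaces between words
    "Thai": "วันนี้อากาศดี",            # no spaces between words
}

for lang, text in samples.items():
    ids = enc.encode(text)
    print(f"{lang}: {len(text)} characters -> {len(ids)} tokens")
```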
Sheridan Feucht, a PhD student at Northeastern University, commented on the difficulty of establishing a universal definition for a "word" in the context of LLMs, suggesting there may never be an ideal tokenization approach due to the ambiguity inherent in languages.
In contrast, image-generating AI tools like Midjourney and DALL-E employ diffusion models rather than transformer architecture. A diffusion model learns from an enormous collection of images and generates a new one by starting from random noise and gradually denoising it into a coherent picture. While these generators have made striking progress at depicting recognizable objects and faces, they still struggle with finer details like fingers and handwriting, a failure mode that mirrors the text problems seen in LLMs.
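For a sense of what "starting from random noise and denoising it" means mechanically, here is a toy, self-contained sketch of a DDPM-style sampling loop. The noise-prediction function is a stand-in for the trained network that systems of this class learn from their image data, so nothing meaningful is generated here, only the shape of the procedure.

```python
# Toy sketch of reverse diffusion: start from noise, repeatedly remove predicted noise.
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(x, t):
    # Stand-in for a trained denoising network; a real model learns to estimate
    # the noise present in x at step t from millions of training images.
    return np.zeros_like(x)

num_steps = 50
betas = np.linspace(1e-4, 0.02, num_steps)   # noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

x = rng.standard_normal((64, 64, 3))         # begin with pure noise, shaped like a 64x64 RGB image

for t in reversed(range(num_steps)):
    eps = predict_noise(x, t)
    # DDPM-style update: subtract the scaled noise estimate, rescale,
    # then add a little fresh noise at every step except the last.
    x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t > 0:
        x = x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)

print(x.shape)  # with a real predict_noise, x would now resemble an image from the training distribution
```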
Asmelash Teka Hadgu, co-founder of Lesan, notes that image generators excel at producing larger subjects featured prominently in training data while faltering on intricate features due to limited exposure. Unlike LLMs, these models can sometimes improve their performance simply by acquiring more targeted training data.
There is some irony in the fact that, even as memes about AI miscounting the letters in "strawberry" circulate, OpenAI is reportedly developing a new LLM code-named Strawberry. The model is said to aim for stronger reasoning abilities and the capability to generate synthetic training data, potentially enabling the kind of contextual understanding that current models lack. Meanwhile, Google DeepMind has introduced AlphaProof and AlphaGeometry 2, AI systems for formal mathematical reasoning that achieved scores which would earn a silver medal at the International Math Olympiad.
The juxtaposition of entertaining AI mishaps with advances like OpenAI's Strawberry captures where the field stands. Amid the memes and mockery lie deeper questions about what intelligence really means, both artificial and human. OpenAI CEO Sam Altman even leaned into the coincidence, showcasing a flourishing berry harvest from his garden as a nod to the quirks of AI.