To the average person, it must seem like the field of artificial intelligence is making huge strides. According to the press releases and some of the more gushing media accounts, OpenAI’s DALL-E 2 can seemingly create spectacular images from any text; another OpenAI system called GPT-3 can talk about just about anything; and a system called Gato, released in May by DeepMind, a division of Alphabet, apparently worked well on every task the company could throw at it. One of DeepMind’s high-level executives went so far as to boast that in the quest for artificial general intelligence (AGI), AI with the flexibility and ingenuity of human intelligence, “The Game is Over!” And Elon Musk recently said that he would be surprised if we didn’t have artificial general intelligence by 2029.
Do not be fooled. Machines may one day be as smart as humans, and maybe even smarter, but the game is far from over. There is still a huge amount of work to be done to create machines that can truly understand and reason about the world around them. What we really need now is less posturing and more fundamental research.
To be sure, there are indeed some ways in which AI is really making progress: synthetic images look increasingly realistic, and speech recognition can often work in noisy environments. But we are still light-years away from general-purpose, human-level AI that can understand the meaning of articles and videos, or deal with unexpected obstacles and interruptions. We are still stuck with the exact same challenges that academic scientists (myself included) have been pointing out for years: making AI reliable and getting it to cope with unusual circumstances.
Take the recently celebrated Gato, an alleged jack of all trades, and how it captioned an image of a pitcher throwing a baseball. The system gave three different answers: “A baseball player throwing a ball on a baseball field,” “A man throwing a baseball to a pitcher on a baseball field,” and “A baseball player at bat and a catcher in the dirt during a baseball game.” The first answer is correct, but the other two contain hallucinations of other players who are not seen in the image. The system has no idea what is actually in the photo, as opposed to what is typical of roughly similar images. Any baseball fan would recognize that this was the pitcher who had just thrown the ball, not the other way around, and while we would expect a catcher and a batter to be nearby, they clearly do not appear in the picture.
A baseball player throwing a ball on top of a baseball field.
A man throws a baseball at a pitcher on a baseball field.
A baseball player hitting and a catcher in the dirt during a baseball game.
Likewise, DALL-E 2 could not tell the difference between a red cube on top of a blue cube and a blue cube on top of a red cube. A newer system, released in May, couldn’t tell the difference between an astronaut riding a horse and a horse riding an astronaut.
When systems like DALL-E make mistakes, the result is funny, but other AI mistakes cause serious problems. To take another example, a Tesla on autopilot recently drove straight toward a human worker carrying a stop sign in the middle of the road, slowing down only when the human driver intervened. The system could recognize people on their own (as they appeared in the training data) and stop signs in their usual locations (again, as they appeared in the training images), but it did not slow down when confronted with the unusual combination of the two, in which the stop sign was in a new and unusual position.
Unfortunately, the fact that these systems are still unreliable and struggle with new circumstances is usually buried in the fine print. Gato worked well on all the tasks DeepMind reported, but rarely as well as other contemporary systems. GPT-3 often creates fluent prose, but it still struggles with basic arithmetic and has so little grasp of reality that it is prone to creating sentences like “Some experts believe that eating a sock helps the brain get out of the altered state due to meditation,” when no expert has ever said such a thing. A cursory look at recent headlines wouldn’t tell you anything about these issues.
The subplot here is that the largest teams of AI researchers are no longer found in the academy, where peer review used to be the coin of the realm, but in corporations. And unlike universities, corporations have no incentive to play fair. Rather than subject their splashy new papers to academic scrutiny, they have taken to publication by press release, enticing journalists and sidestepping the peer review process. We know only what the companies want us to know.
In the software industry there is a word for this kind of strategy: demoware, software designed to look good in a demo but not necessarily good enough for the real world. Often, demoware becomes vaporware, announced for shock and awe in order to discourage competitors, but never released at all.
However, chickens tend to come home to roost eventually. Cold fusion may have sounded great, but you still can’t get it at the mall. The cost in AI will likely be a winter of deflated expectations. Too many products, like driverless cars, automated radiologists and all-purpose digital agents, have been demoed, publicized, and never delivered. For now, investment dollars keep coming in on promise (who wouldn’t want a self-driving car?), but if the core issues of reliability and handling outliers aren’t solved, investment will dry up. We will be left with powerful deepfakes, huge networks that emit huge amounts of carbon, and solid advances in machine translation, speech recognition, and object recognition, but too little else to show for all the premature hype.
Deep learning has improved machines’ ability to spot patterns in data, but it has three major flaws. The patterns it learns are, ironically, superficial, not conceptual; the results it produces are difficult to interpret; and the results are difficult to use in the context of other processes, such as memory and reasoning. As Harvard computer scientist Les Valiant noted, “The central challenge [going forward] is to unite the formulation of … learning and reasoning.” You can’t deal with a person carrying a stop sign if you don’t really understand what a stop sign is.
For now, we’re trapped in a “local minimum” where companies pursue benchmarks rather than fundamental ideas, and make small improvements to the technologies they already have, rather than pausing to ask more fundamental questions. Instead of pursuing flashy direct-to-the-media demos, we need more people asking basic questions about how to build systems that can learn and reason at the same time. As it stands, current engineering practice runs far ahead of scientific understanding, working harder to use tools that are not fully understood than to develop new tools and a clearer theoretical foundation. That is why fundamental research remains crucial.
That so much of the AI research community (like those shouting “Game Over”) doesn’t even see this is, well, heartbreaking.
Imagine if an alien were to study all human interaction just by looking down at the shadows on the ground, noticing, to its credit, that some shadows are bigger than others and that all shadows disappear at night, and maybe even noticing that the shadows regularly grew and shrank at certain periodic intervals, without ever looking up to see the sun or recognizing the three-dimensional world above.
It’s time for artificial intelligence researchers to look up. We cannot ‘solve’ AI with PR alone.
This is an opinion and analysis article, and the views of the author or authors are not necessarily those of Scientific American.