Opinion: The Turing test is about us, not the bots, and it has failed.
Fans of the slow-burn mainstream media U-turn had a treat last week.
On Saturday, news broke that Blake Lemoine, a Google engineer tasked with checking a chatbot called LaMDA for hate speech, had been placed on paid leave for disclosing confidential information.
Lemoine had indeed gone public, but rather than revealing anything as useful as Google’s messaging strategy (a trade secret if ever there was one), he claimed that LaMDA was sentient.
Armed with a transcript in which LaMDA did indeed seem to pass the Turing test, Lemoine was, for the media, a tech whistleblower from heaven. By the time the news filtered onto BBC radio bulletins on Sunday night, it was being reported as an event of some significance.
On Twitter, with its large and active AI R&D community, the claim was torn apart within hours – but who trusts Twitter?
A few days later, the story was still flying, but by now journalists had leavened it with expert commentary from a handful of academics, hedged with the usual reservations.
On balance, no, probably not – but isn’t it a fascinating area to talk about?
By the time the story dropped off the radar at the end of the week, the few outlets still covering it had found better experts who, one suspects, were as exasperated as the rest of us. No. Absolutely not. And you won’t find anyone in AI who thinks otherwise. Yet the conversation was still about sentience rather than about why the story was interesting.
That Google has to use humans to check its chatbot’s output for hate speech, for instance. With that, we were back on the planet.
For future reference, and to save everyone time, here is the killer flaw in every paranoid-android story that invokes the Turing test as a touchstone for sentience: it isn’t one.
It was never intended that way. Turing proposed it in a 1950 paper as a way of avoiding the question “Can machines think?”
He sensibly characterized that question as unanswerable until you have figured out what thought is. We hadn’t then. We haven’t now.
Instead, the test – can a machine hold a persuasive human conversation? – was designed as a thought experiment to probe arguments that machine intelligence was impossible. It tests human perceptions and misconceptions; but, like Google’s “quantum supremacy” claims, the test itself is tautologous: passing the test means only that the test has been passed. By itself it proves nothing more.
Take a hungry Labrador – which is to say, any Labrador that is not asleep or dead – that becomes aware of the possibility of food.
An animal of prodigious and insatiable appetite, the Labrador will, at the slightest hint of available calories, put on an astonishing show of deep desire and immense unmet need. Does this reflect an altered cognitive state, analogous to that of the amorous human teenager it so closely resembles? Or is it learned behavior that turns emotional blackmail into snacks? We may think we know, but without a much broader context we can’t; we may simply be gullible. Pass the Lab’s test and the Lab gets fed. In itself, nothing more.
The first system that arguably passed the Turing test – in the spirit, if not the letter, of the various versions Turing proposed – was itself a study of the psychology of human-machine interaction. ELIZA, the progenitor chatbot, was a 1966 program by MIT computer researcher Joseph Weizenbaum.
It was designed to roughly mimic the therapeutic practice of reflecting a patient’s statements back at them as questions.
“I want to kill my editor.”
“Why do you want to kill your editor?”
“He always makes sure that I meet deadlines.”
“Why don’t you like meeting deadlines?” and so on.
Famously, Weizenbaum was surprised when his secretary, one of the first subjects, imbued it with intelligence and asked to be left alone with the terminal.
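The trick behind that exchange can be approximated in a few lines of pattern matching. A minimal, hypothetical sketch – the rules and names below are illustrative inventions, not Weizenbaum’s original DOCTOR script:

```python
import re

# Illustrative ELIZA-style rules: match a statement shape, reflect it
# back as a question. A real script has many more rules plus pronoun
# swapping ("my" -> "your", etc.).
RULES = [
    (re.compile(r"i want to (.+)", re.IGNORECASE),
     "Why do you want to {0}?"),
    (re.compile(r"he (?:always )?(.+)", re.IGNORECASE),
     "Why does it bother you that he {0}?"),
]

def respond(statement: str) -> str:
    """Return a canned reflection of the statement, or a stock prompt."""
    statement = statement.strip().rstrip(".!")
    for pattern, template in RULES:
        match = pattern.match(statement)
        if match:
            return template.format(*match.groups())
    return "Tell me more."
```

No understanding is involved: the program never models what an “editor” or a “deadline” is, which is exactly why Weizenbaum was unsettled that users imbued it with intelligence anyway.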
The Google chatbot is a distant descendant of ELIZA, fed vast amounts of text from the internet and turned into a language model by machine learning. It is an automated method actor.
A human actor who can’t add up can play Turing most convincingly – but question them on the Entscheidungsproblem and you will soon discover they are no mathematician. Large language models are very good at simulating conversation, but unless you can generate the context that tests whether the thing is what it appears to be, you can say no more than that.
We are still a long way from defining sentience, although our increasingly nuanced appreciation of animal cognition shows that it can take many forms.
At least three lineages – birds, mammals and cephalopods – separated by considerable evolutionary distance, appear to embody three very different systems. If machine sentience arises, it won’t be because a chatbot suddenly prints out a declaration of cyborg rights. It will come after decades of focused research, building on models and tests and successes and failures. And it will not be an imitation of ourselves.
And that’s why the Turing test, while fascinating and thought-provoking, has outlived its sell-by date. It doesn’t do what people think it does; instead it serves as a Hollywood-flavored invitation to fantasy. It swallows up attention that should be going to the real dangers of machine-generated information. It is the astrology of AI, not the astronomy.
The term “artificial intelligence” is almost as bad, as everyone since Turing has known, but we’re stuck with it. It is time, though, to move the conversation on and say goodbye to the brilliant Alan Turing’s least useful legacy.