Is the Turing Test Dead?

When Alan Turing first proposed an approach to distinguish the “minds” of machines from those of human beings in 1950, the idea that a machine could ever achieve human-level intelligence was almost laughable.

In the Turing test, which Turing himself originally called the “imitation game,” human interrogators hold conversations with unseen interlocutors and must decide whether each one is a human or a computer. In 2014, a chatbot masquerading as a Ukrainian teenager named Eugene Goostman seemed to put one of the first nails in the Turing test’s coffin by fooling more than one-third of its human interrogators into thinking they were talking to another person, although some researchers dispute the claim that the chatbot truly passed the test.

Today, we run into seemingly intelligent machines all day long. Our smart speakers tell us to bring umbrellas on our way out the door, and large language models (LLMs) like ChatGPT can write promotion-worthy emails. Stacked up against a human, these machines might easily be mistaken for the real thing.

Does this mean the Turing test is a thing of the past?

In a new paper published 10 November in the journal Intelligent Computing, a pair of researchers propose a new kind of intelligence test that treats machines as participants in a psychological study, probing how closely their reasoning matches that of human beings. The researchers are Philip Johnson-Laird, a Princeton psychology professor and a pioneer of the mental-model theory of human reasoning, and Marco Ragni, a professor of predictive analytics at Chemnitz University of Technology in Germany.

“As chatbots have approached and succeeded at the Turing test, it has quietly slipped away from importance.” —Anders Sandberg, University of Oxford

In their paper, Johnson-Laird and Ragni argue that the Turing test was never a good measure of machine intelligence in the first place, as it fails to address the process of human thinking.

“Given that such algorithms do not reason in the way that humans do, the Turing test and any others it has inspired are obsolete,” they write.

Anders Sandberg, a senior research fellow at the University of Oxford’s Future of Humanity Institute, agrees with that assertion. Even so, he’s not convinced that an assessment of human-style reasoning will be the ultimate test of intelligence either.

“As chatbots have approached and succeeded at the Turing test, it has quietly slipped away from importance,” Sandberg says. “This paper tries to see if a program reasons the way humans reason. That is both interesting and useful, but will of course only tell us if there is human-style intelligence, not some other form of potentially valuable intelligence.”

Likewise, even though Turing tests may be going out of fashion, that doesn’t necessarily mean they’re no longer useful, says Huma Shah, an assistant professor of computing at Coventry University whose research has focused on the Turing test and machine intelligence.

“In terms of indistinguishability, no, [the Turing test is not obsolete],” Shah says. “You can apply indistinguishability to other areas where we would want a machine’s performance to be as good as, or better than, a human carrying out that task efficiently and ethically. For example, in facial recognition, or the ability to drive safely while avoiding hurting passengers and pedestrians.”

As for Johnson-Laird and Ragni’s test, it would be carried out in three steps. First, a machine would be asked a battery of questions to probe its reasoning; for example, “If Ann is intelligent, does it follow that Ann is intelligent or she is rich, or both?” Second, the machine would be tested on whether it understands its own reasoning, by giving answers such as “Nothing in the premise supports the possibility that Ann is rich.” Finally, researchers would look under the machine’s hood to determine whether its neural networks are built to simulate human cognition.
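The Ann question captures the gap the test is designed to expose. In classical logic, “Ann is intelligent or she is rich” follows trivially from “Ann is intelligent” (a rule known as disjunction introduction), yet people often reject the inference because the premise says nothing about wealth, just as mental-model theory predicts. The Python sketch below is a hypothetical illustration of that contrast, not the authors’ actual protocol; the mental-model step in particular is a deliberately crude simplification.

```python
from itertools import product

# Propositions: A = "Ann is intelligent", B = "Ann is rich".

def classically_valid(premise, conclusion):
    """Brute-force truth-table entailment: the conclusion must hold
    in every truth assignment where the premise holds."""
    return all(conclusion(a, b)
               for a, b in product([True, False], repeat=2)
               if premise(a, b))

def premise(a, b):       # "Ann is intelligent"
    return a

def conclusion(a, b):    # "Ann is intelligent or she is rich, or both"
    return a or b

# Classical logic licenses the inference via disjunction introduction.
print(classically_valid(premise, conclusion))   # True

# A crude mental-model reading (an assumption for illustration): reasoners
# entertain only the possibilities the premise explicitly describes.
# "Ann is intelligent" yields the single model {A}; richness is simply
# not represented in it, so the disjunction feels unsupported.
premise_models = [{"A"}]
rich_supported = any("B" in model for model in premise_models)
print(rich_supported)   # False: nothing in the premise supports "Ann is rich"
```

Both checks are trivial to run, but they give opposite verdicts on the same inference, which is roughly the mismatch between formal validity and human judgment that the proposed test probes.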

This last step is where Sandberg worries there could be complications.

“The last step can be very hard,” he says. “Most LLMs are vast neural networks that are not particularly inspectable, despite much research on how to do this.”

Translating a machine’s internal representation of reasoning into a form that humans can understand may even ultimately distort the original nature of the machine’s thought process, Sandberg says. In other words, would we recognize a machine’s interpretation of human reasoning if we saw it?

This question is especially complicated as the science of human cognition itself isn’t yet set in stone.

While replacing the Turing test may not be a simple process, Shah says that alternatives like this reasoning test offer a chance to advance how we think about big questions such as what it means to be human. They may also help shed light on what it means to be a computer, such as what processes take place inside a neural network’s black box.

“If new tests for human-machine indistinguishability progress machine ‘explainability’—for example, the ‘reasoning’ in algorithms that render their decision-making comprehensible to the general public, such as in financial algorithms for insurance, mortgages, loans, etc., then this objective is an invaluable contribution to progressing intelligent machinery,” Shah says.

Source: IEEE Spectrum Computing