Can AI reason?
Plenty of AI experts will tell you that the answer, at least for current frontier models, is: No.
For example, in this discussion, Yann LeCun, chief AI scientist at Meta, says LLM (large language model) reasoning is “very, very primitive” because the amount of computation the LLM performs is the same regardless of the complexity of the question being asked. He contrasts this with human thought, where the amount of time we take to answer and the number of reasoning steps we go through depend on the complexity of the problem.
But LeCun also uses the example of the chess grandmaster, who through years of practice has learned to respond very quickly (you could say “instinctively”) to a chess position. Their thinking has moved from “System 2” to “System 1”, as theorised by Daniel Kahneman. So LeCun is acknowledging that human thinking works very differently depending on which particular human is doing the thinking, and how practised they are at the task at hand.
And we still don’t have a very clear understanding of how humans think. We like to believe our minds are not a total “black box” like an LLM. But our own box is probably at best a darkish shade of grey. When I was an experimental psychology undergrad at Oxford in the 90s, I learned that firm conclusions about human cognition were hard to come by. The majority of our thought processes were “poorly understood” and “required further research”. 25+ years later, a great deal more research is still needed.
Kahneman and many others have at least shown that humans are far from consistently logical thinkers. We are error-prone. So are LLMs. But perhaps LLMs are error-prone in different ways to humans.
Age of reasoning
In a recent internal presentation, according to Bloomberg, OpenAI set out 5 levels of AI advancement on their path towards AGI as follows:
Level 1. Chatbots: AI with conversational language
Level 2. Reasoners: human-level problem solving
Level 3. Agents: systems that can take actions
Level 4. Innovators: AI that can aid in invention
Level 5. Organisations: AI that can do the work of an organisation
OpenAI executives said the company is now on the cusp of reaching the second level, Reasoners. This “refers to systems that can do basic problem-solving tasks as well as a human with doctorate-level education who doesn’t have access to any tools.”
As reported by Reuters, an OpenAI project called Strawberry is a key part of getting to “Reasoners” level. Strawberry has similarities to a method developed at Stanford in 2022 called “Self-Taught Reasoner” or “STaR”, one of the Reuters sources said. STaR enables AI models to bootstrap their own intelligence levels by iteratively creating their own training data.
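To make that idea a little more concrete, here is a rough Python sketch of a STaR-style bootstrapping loop. The helper functions (generate_rationale and fine_tune) are hypothetical placeholders rather than a real API, and the actual method has more moving parts, but the core loop is: generate reasoning traces, keep the ones that reach the right answer, and train on them.

```python
# Rough sketch of a STaR-style self-improvement loop.
# generate_rationale() and fine_tune() are hypothetical placeholders,
# not functions from any real library.

def star_bootstrap(model, problems, answers, rounds=3):
    """Iteratively fine-tune a model on its own correct reasoning traces."""
    for _ in range(rounds):
        training_data = []
        for problem, answer in zip(problems, answers):
            # Ask the current model for a step-by-step rationale and an answer.
            rationale, predicted = generate_rationale(model, problem)
            # Keep only the traces that actually reach the known correct answer.
            if predicted == answer:
                training_data.append((problem, rationale, answer))
        # Fine-tune on the filtered, self-generated data, then repeat.
        model = fine_tune(model, training_data)
    return model
```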
Dead parrots
LLMs learn the rules of language, and many more concepts about the world, through a statistical training process. We can say this is “merely statistical” or “merely pattern recognition” or even “merely stochastic parroting”, and the learning processes and mechanisms are indeed different to how humans learn, but it would probably be an overstatement to say they are completely different. Like LLMs, humans learn iteratively with feedback loops. And LLMs’ fundamental machine learning mechanism of neural networks is inspired by (although not the same as) our brain’s neuronal architecture.
When we learn language as children, we are learning a set of complex and often messy “rules”, yet most of us couldn’t state (even as adults) what all those rules are. That doesn’t mean we can’t reliably apply those rules when we speak. Similarly, we learn to apply many rules of logic, usually without being able to formally state those rules.
We can only infer how a human thinks, or what they know, by either observing their actions/outputs or trusting their self-reporting of their internal thought process. If we’re talking about the specific human who is me, I can seemingly “introspect” my own thoughts and the steps I take as I try to solve a tricky problem.
When it comes to LLMs, we can infer how they “think” by observing their outputs, and by analysing “features” within the model itself (a process that is central to developing “explainable AI”, a very nascent field). We might be tempted to ask an LLM chatbot to explain its own thought process, but with current models we almost certainly can’t trust them to “introspect” accurately. They don’t “have access” to their own thinking processes in the same way that humans do (or at least we feel that we do!).
But as we start to collaborate with AI, it seems pretty essential for us to be able to understand a bit more about “how they think” and the type of reasoning that AI is capable of.
Beyond reasonable doubt
I recently challenged Anthropic’s Claude 3.5 Sonnet to help me solve a logical problem relating to a fictional football team. While I was overall impressed by how well Claude appeared to understand and attempt to solve the problem, a key learning for me was that the chatbot was misleading about the steps it would take to reach a solution. It was able to write Python code to solve the problem, and asked if I wanted to run the code, but when it appeared to run the code and displayed the results, these turned out to be an illusion. Claude had not in fact run the code at all, because Claude (for now) cannot run Python code.
Subsequently I re-ran the experiment, taking the code that Claude generated and running it myself. While the code didn’t work at first, I found that by feeding back to Claude the errors I got, the chatbot was able to iteratively improve the code and quite quickly arrived at a pretty good solution.
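For anyone curious, the loop I ran by hand looks roughly like the sketch below. ask_claude() is a hypothetical wrapper around a chatbot API (not Anthropic’s actual SDK); the point is the structure: generate code, execute it locally, and feed the errors back.

```python
import subprocess

def iterate_with_feedback(task_description, max_attempts=5):
    """Sketch of the generate-run-feedback loop I performed manually.

    ask_claude() is a hypothetical helper that sends a prompt to a chatbot
    and returns its reply as a string; it is not a real SDK function.
    """
    prompt = f"Write a Python script that solves this problem:\n{task_description}"
    for attempt in range(max_attempts):
        code = ask_claude(prompt)
        with open("solution.py", "w") as f:
            f.write(code)
        # Run the generated code locally, since the chatbot cannot execute it.
        result = subprocess.run(
            ["python", "solution.py"], capture_output=True, text=True
        )
        if result.returncode == 0:
            return result.stdout  # It ran; still check the output is correct.
        # Feed the error back so the model can revise its own code.
        prompt = (
            f"Your code failed with this error:\n{result.stderr}\n"
            "Please fix the code and return the full corrected script."
        )
    return None
```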
The next generation of AI tools will have these processes built in. They will have a better understanding of what they can and can’t do. They will be able to execute cognitive tasks step by step, rather than just say that they do that. They will be able to explore multiple potential answers simultaneously and select the best one. They will be able to plan, test and self-improve. They will get much better at thinking before they speak. And maybe then we will really start to trust in their ability to reason.
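Some of this is already taking shape in research, for example sampling several candidate answers and picking the best one (often called best-of-n sampling, or self-consistency when combined with majority voting). A minimal sketch, assuming hypothetical generate_answer and score_answer helpers standing in for a model call and a verifier:

```python
def best_of_n(question, n=8):
    """Sample several candidate answers and keep the highest-scoring one.

    generate_answer() and score_answer() are hypothetical placeholders for
    a sampling call to a model and for a verifier or scoring function.
    """
    candidates = [generate_answer(question) for _ in range(n)]
    return max(candidates, key=score_answer)
```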
How do you find AI models’ ability to reason? Please do share your experiences and thoughts.