Tuesday, September 12, 2023

Can Large Language Models Reason?

I am bothered by the common use of the term Artificial Intelligence (AI) to refer to Large Language Model (LLM) products like ChatGPT. They are clearly good at predicting text, but people are equating that with thinking, and that's wrong.

Here's a good article by Melanie Mitchell, a professor at the Santa Fe Institute, that looks at the reasoning capabilities of LLMs in detail. It's one of the better articles on the subject that I've seen recently.

What should we believe about the reasoning abilities of today’s large language models? As the headlines above illustrate, there’s a debate raging over whether these enormous pre-trained neural networks have achieved humanlike reasoning abilities, or whether their skills are in fact “a mirage.”

Reasoning is a central aspect of human intelligence, and robust domain-independent reasoning abilities have long been a key goal for AI systems. While large language models (LLMs) are not explicitly trained to reason, they have exhibited “emergent” behaviors that sometimes look like reasoning. But are these behaviors actually driven by true abstract reasoning abilities, or by some other less robust and generalizable mechanism—for example, by memorizing their training data and later matching patterns in a given problem to those found in training data? 

Why does this matter? If robust general-purpose reasoning abilities have emerged in LLMs, this bolsters the claim that such systems are an important step on the way to trustworthy general intelligence.  On the other hand, if LLMs rely primarily on memorization and pattern-matching rather than true reasoning, then they will not be generalizable—we can’t trust them to perform well on “out of distribution” tasks, those that are not sufficiently similar to tasks they’ve seen in the training data. 
