How intelligent are large language models (LLMs)?

  • #1
A.T.
Science Advisor
TL;DR Summary
François Chollet argues that LLMs are not ever going to be truly "intelligent" in the usual sense -- although other approaches to AI might get there.
  • Like
  • Informative
Likes jbergman, DaveE and PeroK
  • #2
LLVM's are essentially a Magic 8-Ball, just with more outputs. How intelligent are they?
 
  • Like
  • Haha
Likes AlexB23, jbergman and jedishrfu
  • #3
Vanadium 50 said:
LLVM's are essentially a Magic 8-Ball, just with more outputs. How intelligent are they?
That's an excellent analogy, @Vanadium 50!

Magic 8-balls have a multi-sided die with human-written answers on each side, and the shake randomly determines which side appears in the window.

PS: I once gave that analogy to Garrett Lisi to use when explaining his E8 theory, with the multi-sided die representing his hyperdimensional particle that would sometimes appear as one particle or another depending on the circumstances.
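For anyone who wants to see the analogy in code, here is a minimal sketch (my own toy, with made-up token probabilities, not how any real model is implemented): generating text amounts to repeatedly rolling a weighted die over a vocabulary.

Python:
import random

# Toy "weighted die" picture of next-token sampling. The probabilities below
# are invented for this example; a real LLM computes a fresh distribution from
# the whole preceding context after every token, rather than reusing one die.
next_token_probs = {
    "yes": 0.40,
    "no": 0.25,
    "ask": 0.20,
    "again": 0.10,
    "later": 0.05,
}

def shake_the_8_ball(probs, n_tokens=5):
    """Sample a short continuation by repeatedly rolling the weighted die."""
    tokens = list(probs)
    weights = list(probs.values())
    return " ".join(random.choices(tokens, weights=weights, k=n_tokens))

print(shake_the_8_ball(next_token_probs))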
 
  • Like
Likes Vanadium 50
  • #4
A.T. said:
TL;DR Summary: François Chollet argues that LLMs are not ever going to be truly "intelligent" in the usual sense -- although other approaches to AI might get there.

Astrophysicist Sean Carroll interviews AI researcher François Chollet:
https://www.preposterousuniverse.co...eep-learning-and-the-meaning-of-intelligence/
The fundamental problem, IMO, with Chollet's argument is that he exaggerates human intelligence. The majority of humans cannot perform objective reasoning and data analysis. Instead, most people do something akin to what ChatGPT does, only with limited, biased data, biased a priori reasoning and a good measure of dishonesty thrown in.

I saw an interesting video recently where someone asked one of the LLMs to rate all post-war UK Prime Ministers. It was a stunningly more intelligent, unbiased and objective analysis than almost any human could produce, as most humans are driven by their largely unsubstantiated political biases.

On many issues, humans are in a state of denial and our own lack of intelligence is one of them. One of the biggest dangers is the arrogance of Chollet and his assumed superiority of human thought. And his pointless quibbling over what intelligence really is. On a practical level, there is a real danger that these systems could out-think us and outwit us - using our obvious human failings against us. Especially as the human race remains divided into factions who distrust or even hate each other. On a practical level, we cannot unite to prevent catastrophic climate change. And, on a practical level, we may be susceptible to being usurped by AI.

This last point is essentially what Geoffrey Hinton has been saying. For example:



I would stress the parallel with climate change. If catastrophic climate change is a risk, then there is no point in trying to convince yourself that it can't happen. You have to assume the risk is real and work on that basis. The same is true with the existential threat from AI. Who wants to bet that Chollet is right and pretend there is nothing to worry about? Like climate change, by the time we realise it is actually happening, it's too late!
 
  • Like
  • Skeptical
Likes Ranvaldo, Bandersnatch, jbergman and 4 others
  • #5
PeroK said:
The fundamental problem, IMO, with Chollet's argument is that he exaggerates human intelligence. The majority of humans cannot perform objective reasoning and data analysis. Instead, most people do something akin to what ChatGPT does, only with limited, biased data, biased a priori reasoning and a good measure of dishonesty thrown in.
Although LLMs don't add anything to this - the superiority of quantitative, statistically based decision making in many (but not all) areas was well established by Kahneman & Tversky and other cognitive psychologists in the 60s and 70s.
PeroK said:
I saw an interesting video recently where someone asked one of the LLMs to rate all post-war UK Prime Ministers. It was a stunningly more intelligent, unbiased and objective analysis than almost any human could produce, as most humans are driven by their largely unsubstantiated political biases.
Yes, but political views and biases represent subjective policy preferences, so how do you objectively rate politicians other than by how well they implemented the preferences of their constituents?
PeroK said:
On many issues, humans are in a state of denial and our own lack of intelligence is one of them. One of the biggest dangers is the arrogance of Chollet and his assumed superiority of human thought. And his pointless quibbling over what intelligence really is. On a practical level, there is a real danger that these systems could out-think us and outwit us - using our obvious human failings against us. Especially as the human race remains divided into factions who distrust or even hate each other. On a practical level, we cannot unite to prevent catastrophic climate change. And, on a practical level, we may be susceptible to being usurped by AI.

This last point is essentially what Geoffrey Hinton has been saying. For example:



I would stress the parallel with climate change. If catastrophic climate change is a risk, then there is no point in trying to convince yourself that it can't happen. You have to assume the risk is real and work on that basis. The same is true with the existential threat from AI. Who wants to bet that Chollet is right and pretend there is nothing to worry about? Like climate change, by the time we realise it is actually happening, it's too late!

It's a long way from AI being able to provide better decisions than humans to possessing the agency to implement those decisions.

A good example of how far LLMs are from providing critical decision making - LLMs that can pass medical exams cannot provide reliable medical diagnoses:

https://www.nature.com/articles/s41591-024-03097-1

Abstract

Clinical decision-making is one of the most impactful parts of a physician’s responsibilities and stands to benefit greatly from artificial intelligence solutions and large language models (LLMs) in particular. However, while LLMs have achieved excellent performance on medical licensing exams, these tests fail to assess many skills necessary for deployment in a realistic clinical decision-making environment, including gathering information, adhering to guidelines, and integrating into clinical workflows. Here we have created a curated dataset based on the Medical Information Mart for Intensive Care database spanning 2,400 real patient cases and four common abdominal pathologies as well as a framework to simulate a realistic clinical setting. We show that current state-of-the-art LLMs do not accurately diagnose patients across all pathologies (performing significantly worse than physicians), follow neither diagnostic nor treatment guidelines, and cannot interpret laboratory results, thus posing a serious risk to the health of patients. Furthermore, we move beyond diagnostic accuracy and demonstrate that they cannot be easily integrated into existing workflows because they often fail to follow instructions and are sensitive to both the quantity and order of information. Overall, our analysis reveals that LLMs are currently not ready for autonomous clinical decision-making while providing a dataset and framework to guide future studies.
 
  • Like
  • Skeptical
Likes russ_watters and PeroK
  • #6
Define what intelligence is and we can answer your question. The classical measure of intelligence has been an IQ test. Based on that, there is no doubt that LLMs are intelligent.
 
  • Like
Likes AlexB23, Tazerfish, russ_watters and 1 other person
  • #7
I get that they can be bad at math and certain types of reasoning. I have met many people who have worse capabilities in both. Should we declare those people as unintelligent and completely useless? If there's going to be a definition, it should go both ways. Otherwise it's a biased definition.

One thing is for sure, ChatGPT can stay on topic better than my spouse. :oldwink:
 
  • Like
Likes PeroK
  • #8
  • Like
Likes mattt and PeroK
  • #9
Vanadium 50 said:
LLVM's are essentially a Magic 8-Ball, just with more outputs.
No, LLVM (originally the acronym for a low level virtual machine) is a suite of middleware that facilitates cross-platform implementation of compiled languages :-p

I often make this slip myself.
 
  • Haha
  • Like
Likes Tazerfish and Vanadium 50
  • #10
This thread seems to have become confused between large language models (LLMs), which is what the interview in the OP was about, and artificial intelligence (AI) in general.

LLMs aren't 'bad at math': they don't do any math. AlphaProof and AlphaGeometry 2, which seem to be really good at proving things, are not LLMs any more than is AlphaZero, which is really good at chess. They are all examples of AI, but none of them has any claim to being an artificial general intelligence (AGI)*.

Chollet is not "quibbling" over what intelligence is; he is simply pointing out that LLMs are designed to match patterns in the data they are fed and this limits their capability. He is not saying that human intelligence is inherently superior to AI - in fact quite the opposite: about 24 minutes in he says "so for instance, genetic algorithms if implemented the right way, have the potential of demonstrating true creativity and of inventing new things in a way that LLMs cannot, LLMs cannot invent anything 'cause they're limited to interpolations. A genetic algorithm with the right search space and the right fitness function can actually invent entirely new systems that no human could anticipate", and provides an example.
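As a toy illustration of the kind of search Chollet is describing (my own sketch, not from the interview; the bit-string "search space" and the fitness function here are invented purely for the example), a genetic algorithm just loops over selection, crossover and mutation:

Python:
import random

# Toy genetic algorithm: evolve a bit string towards a fixed target pattern.
# The search space (bit strings) and the fitness function are made up for
# this example; the point is only the select/crossover/mutate loop.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]

def fitness(candidate):
    # Number of positions that already match the target.
    return sum(c == t for c, t in zip(candidate, TARGET))

def crossover(a, b):
    # Splice two parents at a random cut point.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(candidate, rate=0.05):
    # Flip each bit with a small probability.
    return [1 - bit if random.random() < rate else bit for bit in candidate]

def evolve(pop_size=30, generations=200):
    population = [[random.randint(0, 1) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 4]
        population = [mutate(crossover(random.choice(parents), random.choice(parents)))
                      for _ in range(pop_size)]
    return max(population, key=fitness)

print(evolve(), TARGET)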

We seem to have this rabbit-hole discussion about once a month, someone should write an Insight. Oh.



* some members of the Google DeepMind team do make this claim for ChatGPT in https://arxiv.org/abs/2311.02462
 
  • Like
Likes mattt, jbergman, russ_watters and 3 others
  • #11
pbuk said:
We seem to have this rabbit-hole discussion about once a month, someone should write an Insight. Oh.
Just because we have that Insight doesn't mean that everything it says is valid. Nor should it silence debate on the topic.
 
  • #12
pbuk said:
Chollet is not "quibbling" over what intelligence is; he is simply pointing out that LLMs are designed to match patterns in the data they are fed and this limits their capability.
Exactly. Knowing the limits of a technology is important when you use it.

He is directly quantifying the abilities of AIs in general intelligence using the ARC tests, where average humans do much better than LLMs. These tests consist of novel questions that have not been widely distributed on the Internet, so LLMs cannot have been trained on them.
 
  • Like
Likes jbergman and russ_watters
  • #13
PeroK said:
The fundamental problem, IMO, with Chollet's argument is that he exaggerates human intelligence. The majority of humans cannot perform objective reasoning and data analysis. Instead, most people do something akin to what ChatGPT does, only with limited, biased data, biased a priori reasoning and a good measure of dishonesty thrown in.

I saw an interesting video recently where someone asked one of the LLM's to rate all post-war UK Prime Ministers. It was a stunningly more intelligent, unbiased and objective analysis than almost any human could produce. As most humans are driven by their largely unsubstantiated political biases.
But an LLM is just a statistical cross section of human-generated data. It's data analysis of those inputs - those human-biased opinions. It isn't doing any political analysis at all, objective or otherwise; I agree with @BWV that there is inherently no such thing.

I wonder if they asked it what political party it supports.
 
  • #14
russ_watters said:
But an LLM is just a statistical cross section of human-generated data. It's data analysis of those inputs - those human-biased opinions. It isn't doing any political analysis at all, objective or otherwise
It doesn't matter how it does it. That's the point. You can quibble that it's not intelligent all you like. It does stuff that stands up to objective analysis.

That it has no independent political opinion is not in itself a lack of intelligence.

It's not human, but it is intelligent.
 
  • Skeptical
  • Like
Likes jbergman, russ_watters and Filip Larsen
  • #15
PeroK said:
It does stuff that stands up to objective analysis.
Objective analysis is exactly what Chollet is applying to LLMs and other AI approaches. And it shows that LLMs are good at interpolating what has been fed into them, but not good (worse than humans) at extrapolating from it.

But even in the interpolation part LLMs are not very efficient, given that you have to feed them much more text than a human could ever read, just to make them sound like a human.
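To make the interpolation/extrapolation point concrete with a loose analogy from ordinary curve fitting (my own toy example, not Chollet's): a model that matches its training data closely can still be wildly wrong outside the range it was fit on.

Python:
import numpy as np

# Fit a polynomial to noisy samples of sin(x) on [0, 2*pi], then evaluate it
# inside the fitted range (interpolation) and well outside it (extrapolation).
rng = np.random.default_rng(0)
x_train = np.linspace(0, 2 * np.pi, 40)
y_train = np.sin(x_train) + rng.normal(scale=0.05, size=x_train.size)

coeffs = np.polyfit(x_train, y_train, deg=7)

for x in (np.pi / 3, 4 * np.pi):  # inside vs. far outside the training range
    print(f"x={x:5.2f}  fit={np.polyval(coeffs, x):10.2f}  true={np.sin(x):6.2f}")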
 
  • Like
Likes jbergman, russ_watters and PeterDonis
  • #16
Some of the latest LLMs have been shown to have emergent capabilities for analogical reasoning, which I assume are only going to get better; that is my primary reason to consider current LLMs "intelligent". But since it is such a difficult term to agree on (even in a purely human context), perhaps it would in a sense be more productive for discussions to characterize LLMs (when they are producing correct output) as "wise" rather than "intelligent"?
 
  • Like
Likes PeroK
  • #17
PeroK said:
It does stuff that stands up to objective analysis.
Yes, but that doesn't mean the stuff it does qualifies as "intelligent". You say:

PeroK said:
The majority of humans cannot perform objective reasoning and data analysis. Instead, most people do something akin to what ChatGPT does, only with limited, biased data, biased a priori reasoning and a good measure of dishonesty thrown in.
But you are not saying here that ChatGPT is intelligent; you are saying that humans (or at least most of them) are not intelligent, at least not in the domain of activity under discussion. And in any case "objective reasoning and data analysis" is only one of many possible definitions of "intelligent", and IMO it's a very narrow one. Humans do lots of things that can reasonably be called intelligent but which don't fall into that category.
 
  • #18
Filip Larsen said:
Some of the latest LLMs have been shown to have emergent capabilities for analogical reasoning, which I assume are only going to get better; that is my primary reason to consider current LLMs "intelligent" ...
And this is what many people refuse to acknowledge. At PF we have declared that LLMs are no more intelligent than a washing machine and that you may as well ask a toaster to help with your homework. Meanwhile, from lay people to high school students to researchers, many are getting "intelligent" and insightful answers from them, even if those answers are not perfect.

It doesn't matter that we have an Insight that "proves" that ChatGPT is unreliable - in practical terms it is more reliable than almost any individual human being.

And the speed of development is such that in all the areas where we still claim human superiority (like medical diagnoses), it's only a matter of time before the LLMs can outdo the human experts. Or, at least, that is far more likely than that the development grinds to a halt because the vital spark of human intelligence is missing and cannot be simulated.
 
  • Skeptical
  • Like
Likes russ_watters and Filip Larsen
  • #19
PeroK said:
I saw an interesting video recently where someone asked one of the LLMs to rate all post-war UK Prime Ministers. It was a stunningly more intelligent, unbiased and objective analysis than almost any human could produce.
How do you know? On what basis is this claim made?
 
  • #20
There are cases where LLMs have decoded ancient text that no human had ever decoded before. If they are just parroting back what they were trained on, that couldn't happen. What they are doing when training is building a model of the world inside their neural networks, just like the model of the world that you build when you train your neural network by interacting with the environment. So I think the continued cries of "they can't be intelligent because they are not human!!" are missing the mark.
 
  • Like
Likes mattt, Bandersnatch, Borg and 1 other person
  • #21
PeterDonis said:
How do you know? On what basis is this claim made?
I used my intelligence to judge the analysis.
 
  • #22
PeroK said:
in practical terms it is more reliable than almost any individual human being.
But when it comes to getting answers to hard questions, we don't just ask a random individual human being. We draw on cumulative knowledge and understanding that has been built up over a long time. For example, the knowledge I draw on to answer questions in the relativity forum comes from textbooks and peer-reviewed papers that describe detailed theoretical models and key experiments, plus my own personal experience of living in a curved spacetime. LLMs don't have anything like that. Sure, the text of Misner, Thorne & Wheeler might be in its training data, but the word frequency mining it's doing on that data is very different from what I'm doing when I understand the equations and use them to work problems. It's not even the same as what, for example, Wolfram Alpha is doing when you feed it a question.
 
  • Like
Likes pbuk and russ_watters
  • #23
PeroK said:
I used my intelligence to judge the analysis.
How? You have been saying humans aren't intelligent. But now you're saying you are?
 
  • #24
PeroK said:
It doesn't matter how it does it. That's the point. You can quibble that it's not intelligent all you like. It does stuff that stands up to objective analysis.
Who is doing the judging? This entire line of reasoning is circular. It's programmed by humans, gathers and analyzes the opinions of humans and then is judged on its "objectivity" in how it aggregates those opinions by humans.

What's worse is this: There's literally no such thing as completely objective when it comes to politics. Much of what separates positions is pure opinion.

But at least we should all be able to agree, objectively, that the Dallas Cowboys suck and any LLM that disagrees was improperly coded.
PeroK said:
That it has no independent political opinion is not in itself a lack of intelligence.
I agree with that. What makes it not intelligent is that it doesn't think. My calculator is better at math than I am, but that's not the bar most people set for defining "AI".

PeroK said, in a prior post:
The fundamental problem, IMO, with Chollet's argument is that he exaggerates human intelligence.
There's more than one kind of intelligence. A spreadsheet is orders of magnitude more intelligent than I am if we judge each of us on our ability to do math. But computers still suck at coordinating movement, which humans barely have to think to do. The trick with AI, if the specific aim is replacing humans, is not just how smart it is.
 
  • Skeptical
Likes PeroK
  • #25
PeterDonis said:
But when it comes to getting answers to hard questions, we don't just ask a random individual human being. We draw on cumulative knowledge and understanding that has been built up over a long time. For example, the knowledge I draw on to answer questions in the relativity forum comes from textbooks and peer-reviewed papers that describe detailed theoretical models and key experiments, plus my own personal experience of living in a curved spacetime. LLMs don't have anything like that. Sure, the text of Misner, Thorne & Wheeler might be in its training data, but the word frequency mining it's doing on that data is very different from what I'm doing when I understand the equations and use them to work problems. It's not even the same as what, for example, Wolfram Alpha is doing when you feed it a question.
That was your training data and you use your algorithms (whatever they are) to process what you know and answer questions in a given context. That process has been simulated by LLMs. If I want to ask a question about GR, I don't care how the answer is generated. I'm only interested in the quality of the answer. It may be that your answers are still superior to an LLM. And, perhaps that will always be the case. Personally, however, I doubt that. Moreover, an LLM has all the advantages that IT has over humans - 24/7 availability, instant replies and unlimited patience (!) - that may balance the equation in its favour, even if your eventual answer is objectively somewhat superior in content.
 
  • #26
russ_watters said:
Who is doing the judging? This entire line of reasoning is circular.
I am. That line of reasoning comes from an intelligent, educated human (me). So, your argument is against human intelligence (mine). If I am incapable of intelligent reasoning, then where does that leave us?
 
  • Haha
Likes russ_watters
  • #27
PeroK said:
That was your training data and you use your algorithms (whatever they are) to process what you know and answer questions in a given context. That process has been simulated by LLMs.
You have no basis for this claim since you don't know what human brains do with their input data. I strongly doubt that whatever human brains are doing with input data is anywhere near as simple as what LLMs do with their input data. For one thing, human input data is much more varied than text, which is all that LLMs can take as input.
 
  • #28
PeroK said:
If I want to ask a question about GR, I don't care how the answer is generated. I'm only interested in the quality of the answer.
But if you don't know how the answer is generated, you have only your own prior knowledge of the subject to use in judging the quality of the answer. Which makes the answer useless; you can only judge its correctness if you already know what's correct.

With a GR textbook, OTOH, you know a lot about how its content is generated, which means you don't have to just rely on your own prior knowledge. You can learn from a textbook by accepting the fact that it contains a lot of information that is accurate--because of the process that produced it--but which you don't currently have. You can't learn from an LLM that way.
 
  • #29
PeroK said:
If I am incapable of intelligent reasoning, then where does that leave us?
You tell us. You are the one who has been arguing that humans are incapable of intelligent reasoning.
 
  • Like
Likes russ_watters
  • #30
PeroK said:
And this is what many people refuse to acknowledge. At PF we have declared that LLMs are no more intelligent than a washing machine and that you may as well ask a toaster to help with your homework.

...in practical terms it is more reliable than almost any individual human being.
This hyperbole is not helpful. Nobody is claiming LLMs can't give useful answers. Or even that they can do it less often than an unfiltered cross section of the internet. The difference between ChatGPT and PF is that on ChatGPT you are asking a filtered cross section of the internet what it thinks of Relativity whereas on PF you are asking physics professors. Or for homework help -- well, ChatGPT doesn't do homework help, does it? It just gives answers.

PeroK said:
It may be that your answers are still superior to an LLM. And, perhaps that will always be the case. Personally, however, I doubt that.

We agree on that. And when that happens, we should definitely revisit our policy.
 
  • #31
phyzguy said:
There are cases where LLMs have decoded ancient text that no human had ever decoded before. If they are just parroting back what they were trained on, that couldn't happen. What they are doing when training is building a model of the world inside their neural networks, just like the model of the world that you build when you train your neural network by interacting with the environment. So I think the continued cries of "they can't be intelligent because they are not human!!" are missing the mark.
I mean... an LLM figuring out a language seems like a task pretty well in its wheelhouse.
 
  • Like
Likes pbuk
  • #32
PeroK said:
I am. That line of reasoning comes from an intelligent, educated human (me). So, your argument is against human intelligence (mine). If I am incapable of intelligent reasoning, then where does that leave us?
Lol, I didn't say you aren't intelligent, I said you aren't objective.
 
  • #33
Is the question easier to agree on if flipped around: what kind of abilities or qualities should a (future) model exhibit to be considered of, say, average human intelligence?

I think perhaps the OP question (or the "flipped" question) is not really that interesting in regards to LLMs. Research (driven by whatever motive) will by all accounts drive LLMs to be more and more likely to produce what most would consider output generated by an intelligence (i.e. a sort of "intelligence is in the eye of the beholder" measure), and the interesting question in that context seems rather to be whether this path of evolution will be "blocked" by some fundamental but so-far undiscovered mechanism or model structure.

Compare, if you will, with the study of animal intelligence. Clearly the mere presence of a brain in an animal does not imply it is capable of what we would classify as intelligence, but some animals (e.g. chimpanzees) are clearly able to exhibit intelligent, or at least very adaptive, behavior in their domain even if they cannot be trained to explain general relativity. In that context I guess my question becomes: what set of mechanisms or structures in the human brain, compared to such an animal brain, makes it qualitatively more capable of intelligent behavior? Considering that Homo sapiens has a common evolutionary ancestor with every species on this planet, I can only see the significant difference being the structure and scale of the brain. And if so, why shouldn't LLMs with the right size and structure also be able to achieve human-level intelligence via such an evolutionary path? I am not saying such a path is guaranteed to be found, more that such a path has already been shown to exist and to be evolutionarily reachable in the example of Homo sapiens, so why not also with LLMs as a starting point?

(I realize discussion of the level of intelligence exhibited by possible future models is not what the OP posed as a question, but, just to repeat myself, since we have such trouble answering that question maybe it is easier or more relevant to discuss whether there is anything fundamentally blocking current models from evolving to a point where everyone would agree that yes, now the behavior is intelligent.)
 
  • #34
Filip Larsen said:
why shouldn't LLMs with the right size and structure also be able to achieve human-level intelligence via such an evolutionary path?
It took millions of years for humans to evolve whatever intelligence we have, and that was under selection pressures imposed by the necessity of surviving and reproducing in the real world. Whatever "evolution" is being used with LLMs has been happening for a much, much shorter time and under very different selection pressures. So I don't see any reason to expect LLMs to achieve human level intelligence any time soon on these grounds.
 
  • #35
PeterDonis said:
Whatever "evolution" is being used with LLMs has been happening for a much, much shorter time and under very different selection pressures.
Yes, but with an artificial selection pressure much more focused on optimizing towards behavior (output) that we will consider intelligent. In our research towards general AI we are able to establish a much more accelerated evolution encompassing scales and structures that may vary wildly over a few "generations", limited only by hardware efficiency and energy consumption at each cycle, i.e. without also having the selective pressures for a body to survive and compete in a physical environment.

But, as mentioned, my question was not really about how long an evolutionary path towards general AI would take, but more whether there is any reason why such a path should not exist using more or less the known LLM mechanisms as a starting point. Or put differently: if LLMs can already exhibit emergent reasoning by analogy (which I understand is both surprising and undisputed), it is hard for me to see why other similar traits that we consider to be part of "intelligent lines of thought" could not also be emergent at some scale or structure of the models.
 