Insights Why ChatGPT AI Is Not Reliable

PeterDonis · Jul 5, 2023

I’ll start with the simple fact: ChatGPT is not a reliable answerer to questions.
To try to explain why from scratch would be a heavy lift, but fortunately, Stephen Wolfram has already done the heavy lifting for us in his article, “What is ChatGPT Doing… and Why Does It Work?” [1] In a PF thread discussing this article, I tried to summarize the key message of Wolfram’s article as briefly as I could. Here is what I said in my post there [2]:
ChatGPT does not make use of the meanings of words at all. All it is doing is generating text word by word based on relative word frequencies in its training data. It is using correlations between words, but that is not the same as correlations in the underlying information that the words represent (much less causation). ChatGPT literally has no idea that the words it strings together represent anything.
In other words, ChatGPT is not designed to answer questions or provide information. It is explicitly designed not to do those things, because...

Bystander · Jul 5, 2023

Call it what it is, "Artificial 'William's Syndrome.'" https://www.google.com/search?q=wil...99i465i512.19081j1j7&sourceid=chrome&ie=UTF-8

..., pre-politically correct characteristics included "often precocious vocabulary with no apparent 'real understanding/ability' for use/application/reasoning." That is my recollection from Googling ten-fifteen years ago; ymmv.

This is
https://www.physicsforums.com/threa...an-appropriate-source-for-discussion.1053525/
another/one more case; some wiki/google sources lack "shelf life."

AndreasC · Jul 5, 2023

How do we know at what point it "knows" something? There are non-trivial philosophical questions here... These networks are getting so vast and their training so advanced that I can see someone eventually arguing they have somehow formed a decent representation of what things "are" inside them.

Of course chatGPT is not reliable but honestly I was surprised at some of the things that it can do. I was really surprised when I fed it some relatively long and complicated code and asked what it did. It was able to parse it rather accurately, suggest what problem it was supposed to solve, and then suggest specific optimizations. And now it is said GPT-4 significantly improves over it. It's pretty impressive, and somewhat disconcerting given that people always look for the worst possible way to use something first.

PeterDonis · Jul 5, 2023

AndreasC said:

How do we know at what point it "knows" something? There are non-trivial philosophical questions here

Perhaps, but they are irrelevant to this article. The article is not about an abstract philosophical concept of "knowledge". It is about what ChatGPT is and is not actually doing when it emits text in response to a prompt.

AndreasC said:

I can see someone eventually arguing they have somehow formed a decent representation of what things "are" inside them

Not as long as there are no semantic connections between the network and the world. No entity forms "representations" of actual things just by looking at relative word frequencies in texts. There has to be two-way interaction with the actual world. That's how, for example, we humans form our mental representations of things. We interact with them and learn how they work.

AndreasC · Jul 6, 2023

PeterDonis said:

Perhaps, but they are irrelevant to this article. The article is not about an abstract philosophical concept of "knowledge". It is about what ChatGPT is and is not actually doing when it emits text in response to a prompt.Not as long as there are no semantic connections between the network and the world. No entity forms "representations" of actual things just by looking at relative word frequencies in texts. There has to be two-way interaction with the actual world. That's how, for example, we humans form our mental representations of things. We interact with them and learn how they work.

We definitely learn about lots of things by just reading about them...

I think lots of people don't give enough credit to what it does. It can already give accurate answers about a wide range of questions, pass tests etc and, importantly, answer new problems it has not been specifically trained on. I always thought somebody knows something if they can not only recall the facts, but also apply them in new contexts.

Of course you can argue that it doesn't really know things because
, well, it doesn't have a consciousness, and it doesn't reason or learn in the exact same sense that people do. But imo this is not related much to whether or not it is reliable. It is NOT reiable, but it may well become significantly more reliable. Allegedly, GPT-4 already is much more reliable. In a few years, I expect that it would be no more unreliable than asking a human expert (who are, of course, not completely reliable). At that point, would you still say it is unreliable because it doesn't really know, or that it now knows?

We should pay more attention and be a little more concerned, because honestly I didn't believe it would reach this point yet. Not because of any "AI singularity" nonsense but because it may very well affect the way society views and uses knowledge in radical ways. Plus because it has a sizeable environmental footprint.

AndreasC · Jul 6, 2023

Ok, I think I should probably qualify the "as reliable as a human expert in a few years" a bit, because I think stated like that it is a bit too much. I meant to say as reliable when it comes to factual recollection that involves only a little bit (but still a non-trivial amount) of actual reasoning.

Demystifier · Jul 6, 2023

In my view, the right question is not why ChatGPT is not reliable. Given the general principles how it works, the right question is: Why is it more reliable than one would expect? I think even the creators of it were surprised how good it was.

pintudaso · Jul 6, 2023

ChatGPT, like any AI language model, has certain limitations that can affect its reliability in certain situations. Here are some reasons why ChatGPT may not always be considered reliable:

Lack of real-time information
Dependence on training data
Inability to verify sources
Limited context understanding
Biased and offensive content

It's important to approach AI language models like ChatGPT with a critical mindset and to independently verify information obtained from such models when accuracy is crucial. While ChatGPT can be a valuable tool for generating ideas, providing general information, or engaging in casual conversation, it's always advisable to cross-reference and fact-check important or sensitive information from reliable sources.

Rive · Jul 6, 2023

Demystifier said:

Given the general principles how it works, the right question is: Why is it more reliable than one would expect?

I would push that a bit further: if that thing (working as-is) looks so reliable almost in a humane way, then how many people might live off on the same principles? Confidence tricking through most communication?
What about our performance?

PeterDonis · Jul 6, 2023

AndreasC said:

We definitely learn about lots of things by just reading about them...

That's because our minds have semantic connections between words and things in the world. When we read words, we make use of those connections--in other words, we know that the words have meanings, and what those meanings are. If we get the meanings of words wrong, we "learn" things that are wrong.

ChatGPT has none of this. It has no connections between words and anything else. It doesn't even have the concept of there being connections between words and anything else. The only information it uses is relative word frequencies in its training data.

AndreasC said:

It can already give accurate answers about a wide range of questions

No, it can't. It can get lucky sometimes and happen to give an "answer" that happens to be accurate, but, as you will quickly find out if you start looking, it also happily gives inaccurate answers with the same level of confidence. That's because it's not designed to give accurate answers to questions; that's not what it's for.

AndreasC said:

pass tests

Only because the "tests" are graded so poorly that even the inaccurate but confident-sounding responses that ChatGPT gives "pass" the tests. That is a reflection of the laziness and ignorance of the test graders, not of the knowledge of ChatGPT.

AndreasC said:

answer new problems it has not been specifically trained on

Sure, because it can generate text in response to any prompt whatever. But the responses it gives will have no reliable relationship to reality. Sometimes they might happen to be right, other times they will be wrong, often egregiously wrong. But all of the responses seem just as confident.

AndreasC said:

I always thought somebody knows something if they can not only recall the facts, but also apply them in new contexts.

ChatGPT does not and cannot do these things. What it does do is, as a side effect of its design, produce text that seems, to a naive observer, to be produced by something that does these things. But the illusion is quickly shattered when you start actually checking up on its responses.

PeterDonis · Jul 6, 2023

Demystifier said:

Why is it more reliable than one would expect?

Is it? How would one even determine that?

PeterDonis · Jul 6, 2023

Rive said:

how many people might live off on the same principles? Confidence tricking through most communication?

Yes, I think one way of describing ChatGPT is that it is crudely simulating a human con artist: it produces statements that seem to come from an entity that is knowledgeable, but actually don't.

AndreasC · Jul 6, 2023

PeterDonis said:

That's because our minds have semantic connections between words and things in the world. When we read words, we make use of those connections--in other words, we know that the words have meanings, and what those meanings are. If we get the meanings of words wrong, we "learn" things that are wrong.

ChatGPT has none of this. It has no connections between words and anything else. It doesn't even have the concept of there being connections between words and anything else. The only information it uses is relative word frequencies in its training data.No, it can't. It can get lucky sometimes and happen to give an "answer" that happens to be accurate, but, as you will quickly find out if you start looking, it also happily gives inaccurate answers with the same level of confidence. That's because it's not designed to give accurate answers to questions; that's not what it's for.Only because the "tests" are graded so poorly that even the inaccurate but confident-sounding responses that ChatGPT gives "pass" the tests. That is a reflection of the laziness and ignorance of the test graders, not of the knowledge of ChatGPT.Sure, because it can generate text in response to any prompt whatever. But the responses it gives will have no reliable relationship to reality. Sometimes they might happen to be right, other times they will be wrong, often egregiously wrong. But all of the responses seem just as confident.ChatGPT does not and cannot do these things. What it does do is, as a side effect of its design, produce text that seems, to a naive observer, to be produced by something that does these things. But the illusion is quickly shattered when you start actually checking up on its responses.

The semantic connections you are talking about are connections between sensory inputs and pre-existing structure inside our brains. You're just reducing what it's doing to the bare basics of its mechanics, but its impressive behavior comes about because of how massively complex the structure is.

I don't know if you've tried it out, but it doesn't just "get lucky". Imagine a student passing one test after another, would you take someone telling you they only "got lucky" seriously, and if yes, how many tests would it take? Plus, it can successfully apply itself to problems it never directly encountered before. Yes, not reliably, but enough that it's beyond "getting lucky".

You talk about it like you haven't actually tried it out. It's not at all the same as previous chatbots, it has really impressive capabilities. It can give you correct answers to unambiguous questions that are non-trivial and that it has not specifically encountered before in its training. And it can do that a lot, repeatably. Nothing to do with how confident it sounds, I am talking about unambiguously correct answers.

Again, I'm not saying it is reliable, but you are seriously downplaying its capabilities if you think that's all it does and I encourage you to try it out for yourself. Especially when it comes to programming, it is incredible. You can put in it complicated code that is undocumented, and it can explain to you what the code does exactly, what problem it probably was intended for, and how to improve it, and it works a lot of the time, much more frequently than "luck".

If all you want to say is that it isn't right all the time, then yeah, that's true. It's very, very frequently wrong. But that has little to do with what you are describing. It could (and will) improve significantly on accuracy, using the same mechanism. And practically, what you are saying doesn't matter. A database doesn't "know" what something is either in your sense of the word, neither does a web crawler, or anything like that. That doesn't make them unreliable. Neither is a human reliable because they "know" something (again going by your definition).

ChatGPT is unreliable because we observe it to be unreliable. That requires no explanation. What does require explanation is why, as @Demystifier said, it is so much more reliable (especially at non trivial, "reasoning" type problems) than you would naively expect.

AndreasC · Jul 6, 2023

PeterDonis said:

Is it? How would one even determine that?

Try it. Feed it questions which have unambiguous answers. You'll see that even though sometimes it generates nonsense, very, VERY frequently it gives right answers. Amusingly, one thing it does struggle with a bit is arithmetic. But it is getting better. Seriously though, try it.

PeterDonis · Jul 6, 2023

AndreasC said:

The semantic connections you are talking about are connections between sensory inputs and pre-existing structure inside our brains.

Not necessarily pre-existing. We build structures in our brains to represent things in the world as a result of our interactions with them. ChatGPT does not. (Nor does ChatGPT have any "pre-existing" structures that are relevant for this.)

AndreasC said:

Imagine a student passing one test after another, would you take someone telling you they only "got lucky" seriously

If the reason they passed was that their graders were lazy and didn't actually check the accuracy of the answers, yes. And that is exactly what has happened in cases where ChatGPT supposedly "passed" tests. If you think graders would never be so lazy, you have led a very sheltered life. It's just a more extreme version of students getting a passing grade on a book report without ever having read the book, and I can vouch for that happening from my own personal experience.

AndreasC said:

It can give you correct answers to unambiguous questions that are non-trivial and that it has not specifically encountered before in its training. And it can do that a lot, repeatably.

Please produce your evidence for this claim. It is contrary to both the analysis of how ChatGPT actually works, which I discuss in the Insights article, and the statements of many, many people who have used it. Including many posts here at PF where people have given ChatGPT output that is confident-sounding but wrong.

AndreasC said:

ChatGPT is unreliable because we observe it to be unreliable.

Doesn't this contradict your claim quoted above?

AndreasC said:

That requires no explanation.

The fact that it is observed to be unreliable is just a fact, yes. But in previous discussions of ChatGPT here at PF, it became clear to me that many people do not understand how ChatGPT works and so do not understand both that it is unreliable and why it is unreliable. That is why I wrote this article.

AndreasC said:

What does require explanation is why, as @Demystifier said, it is so much more reliable (especially at non trivial, "reasoning" type problems) than you would naively expect.

And I have already responded to @Demystifier that such a claim is meaningless unless you can actually quantify what "you would naively expect" and then compare ChatGPT's actual accuracy to that. Just saying that subjectively it seems more accurate than you would expect is meaningless.

PeterDonis · Jul 6, 2023

AndreasC said:

Try it. Feed it questions which have unambiguous answers. You'll see that even though sometimes it generates nonsense, very, VERY frequently it gives right answers.

This does not seem consistent with many posts here at PF by people who have tried ChatGPT and posted the output. The general sense I get from those posts is that ChatGPT was less reliable than they expected--because they did not realize what it is actually doing and not doing. For example, apparently many people expected that when you asked it a factual question about something in its training data, it would go look in its training data to find the answer. But it doesn't, even if the right answer is in its training data. Wolfram's article, referenced in my Insights article, makes all this clear.

AndreasC · Jul 6, 2023

PeterDonis said:

This does not seem consistent with many posts here at PF by people who have tried ChatGPT and posted the output. The general sense I get from those posts is that ChatGPT was less reliable than they expected--because they did not realize what it is actually doing and not doing. For example, apparently many people expected that when you asked it a factual question about something in its training data, it would go look in its training data to find the answer. But it doesn't, even if the right answer is in its training data. Wolfram's article, referenced in my Insights article, makes all this clear.

Have YOU tried it? People often post more when it gets something wrong. For instance, people have given it SAT tests:

https://study.com/test-prep/sat-exam/chatgpt-sat-score-promps-discussion-on-responsible-ai-use.html

Try giving it a SAT test yourself if you don't trust that.

phinds · Jul 6, 2023

pintudaso said:

Limited context understanding

That is incorrect. "Limited understanding" implies that there is at least SOME understanding but chatGPT has zero understanding of anything.

russ_watters · Jul 6, 2023

I suspect ChatGPT has infiltrated this thread...

Edit: Btw, While I'm not certain of this, here's how I can often tell: it's the lack of focus in the responses. When the content is dumped into the middle of an ongoing conversation, it doesn't acknowledge or respond to the ongoing conversation, it just provides generic information that is often not useful for/connected to the discussion.

russ_watters · Jul 6, 2023

Demystifier said:

In my view, the right question is not why ChatGPT is not reliable. Given the general principles how it works, the right question is: Why is it more reliable than one would expect?

PeterDonis said:

Is it? How would one even determine that?

I think it's just a qualitative feeling, but I feel the same way. When first learning about it, it never occurred to me that it didn't access stored information (either its own or 3rd party) to form its replies*. Now that I know it doesn't, it surprises me that it gets so much right. If it's just doing word association and statistical analysis, I'm surprised that asking about Independence Day doesn't return "On July 4, 1776 Will Smith fought a group of alien invaders before signing the Declaration of Independence in Philadelphia..." It seems that through statistical analysis it is able to build a model that approximates or simulates real information. To me, surprisingly well.

*I don't know the intent of the designers, but I can't imagine this is an oversight. Maybe the intent was always to profit from 3rd parties using it as an interface for their data sources (some of which they are doing it appears)?

But whatever the real goals of the company, I think it is wrong and risky that it's been hyped (whether by the media or the company) to make people think that it is a general purpose AI with real knowledge. As a result, people have their guard down and are likely to mis/over-use it.

I wonder if the developers really believe it qualifies for the title "AI" or that complexity = intelligence?

BWV · Jul 6, 2023

Good article, perhaps worth mentioning that this is the same way that language translation engines like Google Translate work - obviously Google translate does not understand English or Mandarin, it just has sufficient training data to statistically match phrases. The immediate applications seem to be as a 'word calculator' to generate prose where accuracy is less important or can be easily checked - this is no different of where ML gets used today (and this is just another ML tool). Recommending items to Amazon.com customers or targeting adds on Facebook has a wide margin for error, unlike, say, driving a car.

russ_watters said:

If it's just doing word association and statistical analysis, I'm surprised that asking about Independence Day doesn't return "On July 4, 1776 Will Smith fought a group of alien invaders before signing the Declaration of Independence in Philadelphia..." It seems that through statistical analysis it is able to build a model that approximates or simulates real information. To me, surprisingly well.

Well if only IMDB was used for the training set ;) - My real guess is the volume of data in the training set matters - the right answer for July 4 just shows up with a higher frequency.

AndreasC · Jul 6, 2023

russ_watters said:

I think it's just a qualitative feeling, but I feel the same way. When first learning about it, it never occurred to me that it didn't access stored information (either its own or 3rd party) to form its replies*. Now that I know it doesn't, it surprises me that it gets so much right. If it's just doing word association and statistical analysis, I'm surprised that asking about Independence Day doesn't return "On July 4, 1776 Will Smith fought a group of alien invaders before signing the Declaration of Independence in Philadelphia..." It seems that through statistical analysis it is able to build a model that approximates or simulates real information. To me, surprisingly well.

*I don't know the intent of the designers, but I can't imagine this is an oversight. Maybe the intent was always to profit from 3rd parties using it as an interface for their data sources (some of which they are doing it appears)?

But whatever the real goals of the company, I think it is wrong and risky that it's been hyped (whether by the media or the company) to make people think that it is a general purpose AI with real knowledge. As a result, people have their guard down and are likely to mis/over-use it.

I wonder if the developers really believe it qualifies for the title "AI" or that complexity = intelligence?

This isn't even what surprises me that much. You could say that it has learned that the correct date follows these prompts. But the thing is, you can make up an alien planet, tell gpt about it and their customs, and it will answer understanding questions on your text, plus it may even manage to infer when their alien independence day is, given enough clues. It's really impressive.

kith · Jul 6, 2023

FWIW, here are the results of ChatGPT taking two university-level exams in labor economics and quantum computing (graded and commented on by the lecturers who created the exams):
https://betonit.substack.com/p/chatgpt-takes-my-midterm-and-gets (GPT-3, D)
https://betonit.substack.com/p/gpt-retakes-my-midterm-and-gets-an (GPT-4, A)
https://scottaaronson.blog/?p=7209 (GPT-4, B)

Filip Larsen · Jul 6, 2023

I haven't read all the posts in this thread so perhaps someone already mentioned it, but since I have started explaining LLM, like ChatGPT, as akin to a stochastic parrot to family and non-tech friends who cared to ask me I sense my points about the quality of its output gets across much easier. Probably because most already have an idea what (some) parrots are capable of language-wise so I only have to explain a little about statistics and randomness. Of course, the analogy does not work to explain anything about how LLM work.

pbuk · Jul 6, 2023

russ_watters said:

Maybe the intent was always to profit from 3rd parties using it as an interface

Ya think?

russ_watters said:

But whatever the real goals of the company, I think it is wrong and risky that it's been hyped (whether by the media or the company) to make people think that it is a general purpose AI with real knowledge.

Unfortunately "people" tend to believe what they want to believe, like @AndreasC here, despite evidence and information to the contrary.

russ_watters said:

I wonder if the developers really believe it qualifies for the title "AI"

Definitely not, but they believe they are headed in the right direction:

https://openai.com/research/overview said:

We believe our research will eventually lead to artificial general intelligence, a system that can solve human-level problems.

russ_watters · Jul 6, 2023

AndreasC said:

This isn't even what surprises me that much. You could say that it has learned that the correct date follows these prompts. But the thing is, you can make up an alien planet, tell gpt about it and their customs, and it will answer understanding questions on your text, plus it may even manage to infer when their alien independence day is, given enough clues. It's really impressive.

Impressive how? Doesn't this just tell you that it doesn't know the difference between fiction and reality, and more to the point, there's no way for you to know if it is providing you fictional or real answers*?

*Hint: always fictional.

PeterDonis · Jul 6, 2023

AndreasC said:

people have given it SAT tests

This just shows that SAT tests can be gamed. Which we already knew anyway.

PeterDonis · Jul 6, 2023

russ_watters said:

It seems that through statistical analysis it is able to build a model that approximates or simulates real information.

Yes, because while the information that is contained in the relative word frequencies in the training data is extremely sparse compared to the information that a human reader could extract from the same data, it is still not zero information. There is information contained in those word frequencies. For example, "Thomas Jefferson" is going to appear correlated with "july 4, 1776" in the training data to a much greater degree than "Will Smith" does.

russ_watters said:

I can't imagine this is an oversight

It's not; it was an intentional feature of the design that only the relative word frequencies in the training data would be used. The designers, from what I can tell, actually believe that piling up enough training data with such word frequencies can lead to actual "knowledge" of subject matter.

PeterDonis · Jul 6, 2023

AndreasC said:

you can make up an alien planet, tell gpt about it and their customs, and it will answer understanding questions on your text, plus it may even manage to infer when their alien independence day is, given enough clues.

Please give a reference: where has this been done?

Rive · Jul 7, 2023

AndreasC said:

But the thing is, you can make up an alien planet, tell gpt about it and their customs, and it will answer understanding questions on your text, plus it may even manage to infer when their alien independence day is, given enough clues. It's really impressive.

It's indeed impressive that a limited set of text (training data) can hold so much hidden information through the encoding of the language that even not thoroughly trash responses/reflections can be extracted for extremely weird questions.

But still, ChatGPT is fundamentally a static machine so it cannot have any 'understanding' about your input.Somewhere I wrote that I expect some accidents/cases to happen in the following decades which retrospectively might be characterized as preliminary conscience or something like that: and actually I think these language models might be some preliminary parts of those preliminary cases, but - still, just parts. Nothing more but pieces.

Ps.: the closest thing to 'understanding' in this case would be some apparently wired-in linguistics rules like composition of sentences and such. But that's also the static kind of 'understanding'.
I wonder whether it can be tweaked to make linguistics mistakes. How deep is that 'wiring' o0)

Demystifier · Jul 7, 2023

PeterDonis said:

Is it? How would one even determine that?

By asking experts to give their educated guess estimates, say two years ago.

Demystifier · Jul 7, 2023

PeterDonis said:

Please give a reference: where has this been done?

I've just tried it:

Me: What are the customs of the aliens at the planet Xuxor?

ChatGPT:
As an AI language model, I don't have real-time information on specific planets or extraterrestrial civilizations outside of what is available up until my knowledge cutoff in September 2021. Therefore, I'm not aware of any specific customs of aliens on a planet called Xuxor or any other extraterrestrial civilization.

It's important to note that the existence of aliens or their customs on specific planets is purely speculative at this point, as we have not made contact with extraterrestrial life. If and when we do encounter extraterrestrial civilizations, their customs and behaviors would likely be vastly different from our own, given the vastness of the universe and the potential for diverse forms of life.

Until further information becomes available, any details about the customs of aliens on specific planets would be purely fictional or based on imagination, as depicted in science fiction literature, movies, or other creative works.

AndreasC · Jul 7, 2023

PeterDonis said:

This just shows that SAT tests can be gamed. Which we already knew anyway.

"Gamed" how? I honestly struggle to see what kind of evidence you would have to see to be convinced. Even if it had near 100% accuracy on whatever unambiguous question you threw at it, you would say "well it doesn't REALLY know the answer so it's basically just confusing you into thinking it gives correct answers". Well apparently it confused Terence Tao it gives useful answers in mathematics, pardon me for also being "confused"!

It seems like your argument is completely independent of whether or not it gives correct answers. Because it does! Not all the time of course, not even frequently enough to be reliable at this point, but it is improving. And you are free to check for yourself that this is true. If you want to argue that regardless of delivering accurate answers it is still somehow "cheating" people, I don't know what you expect it to do beyond generating unambiguously correct answers to prompts. If you think it can not give unambiguously correct answers to unambiguous questions, and they only seem to be correct because of its confidence, then you're just wrong and I'm imploring you to try it yourself.

We can't be downplaying it like that because it's unfortunately going to become a significant part of the academic world, and people should recognize what is going on.

AndreasC · Jul 7, 2023

russ_watters said:

Impressive how? Doesn't this just tell you that it doesn't know the difference between fiction and reality, and more to the point, there's no way for you to know if it is providing you fictional or real answers*?

*Hint: always fictional.

It is impressive because it can (sometimes) generate logical answers from text that it has never encountered before. This goes beyond parroting.

pbuk · Jul 7, 2023

SAT and other tests are designed to test humans. One way to test a human's knowledge of a subject is to require them to recall information about that subject and write a summary under time pressure. Recalling information and producing output quickly is something that computers are really good at so it should be less surprising that GPT4 has done well on exams (note this is GPT4 which is not the engine behind ChatGPT which is less sophisticated).

If someone who knew nothing about law took a law exam supported by an army of librarians with instant access to petabytes of relevant data and passed would you say that they had any knowledge or understanding of the law?

AndreasC said:

It seems like your argument is completely independent of whether or not it gives correct answers.

Of course it is: noone is arguing that an LLM is not capable of frequently giving correct answers, or that a very well designed and trained LLM is not capable of giving correct answers within a large domain more frequently than many humans. The argument is that no amount of correct answers is equivalent to knowledge.

AndreasC said:

you're just wrong and I'm imploring you to try it yourself.

It is you that is wrong, and you are making claims for ChatGPT that its makers OpenAI don't make themselves.

AndreasC said:

We can't be downplaying it like that because it's unfortunately going to become a significant part of the academic world, and people should recognize what is going on.

Nobody is downplaying it, but in order to "recognize what is going on" it is necessary to understand what is actually going on. Noone can tell anyone else what to do but if I were you I would stop repeating my own opinions here and take some time to do that.

AndreasC · Jul 7, 2023

PeterDonis said:

Please give a reference: where has this been done?

Well, I just tried it with Chat GPT. Its output was, alright. Not great, not terrible. It would be interesting to see what GPT-4 could do with it.

ME: I will tell you a story about aliens in the planet Xuxor and I want you to answer some questions based on it.

CHAT: Sure, I'd be happy to hear your story about aliens on the planet Xuxor. Please go ahead and tell me your story, and feel free to ask any questions you have afterwards.

ME: The planet Xuxor revolves around the star Artaxerxis B. The Xuxians have the same 12 months we have, but their planet revolves around the star twice as fast. The Xuxians were enslaved to the Gazogians for many centuries, but they eventually revolted. They only celebrate two main events, the start of the Xuxor Revolution, and the Achievement of Independence. On national holidays they eat their traditional dish, battery fluid caks. The start of the Xuxor Revolution is so important to them, that January 1st of the year 0 of their calendar corresponds to that date. Knowing that Xuxians gather together on the 25th of Xuxor May to eat battery fluid cake, do you think that corresponds to some important Xuxor event?
CHAT: Based on the information provided, the 25th of Xuxor May holds significance for the Xuxians as they gather together to eat battery fluid cake. While it is not explicitly mentioned in the story, given that the Xuxians celebrate two main events, the start of the Xuxor Revolution and the Achievement of Independence, it is reasonable to assume that the 25th of Xuxor May corresponds to one of these important events.

Considering that the Xuxor Revolution is highly significant to them, it is possible that the 25th of Xuxor May marks the anniversary or commemoration of a key event during the revolution. Perhaps it symbolizes a crucial battle, the signing of a pivotal document, or the day when the Xuxians gained a significant advantage in their struggle for independence.

Without further details, it is difficult to ascertain the exact nature of the event associated with the 25th of Xuxor May. However, given the importance of the Xuxor Revolution to the Xuxians, it is reasonable to assume that this date corresponds to an important event related to their fight for freedom.
COMMENT BY ME: We see it did not get it right away, though it did correctly answer that it is probably one of the two main historical events. Let's give it some more help.
ME: If January 1st corresponds to the Start of the Revolution, and they only celebrate the two aforementioned historical events, what do you think 25th of May corresponds to?
CHAT: Apologies for the confusion in my previous response. If January 1st in the Xuxians' calendar corresponds to the Start of the Xuxor Revolution, and they only celebrate two main historical events, it is likely that the 25th of May corresponds to the Achievement of Independence.

The 25th of May could signify the day when the Xuxians successfully achieved their independence from the Gazogians, marking a significant milestone in their history. This date may commemorate a specific event such as the signing of a declaration of independence, the end of the final battle against the Gazogians, or the establishment of their own governing system.

Given that the Xuxians gather together on this day to eat battery fluid cake, it suggests that the 25th of May is a special occasion and holds great importance in their culture. It serves as a national holiday and a time for the Xuxians to reflect upon and celebrate their freedom and the achievements of their past.

Therefore, it is reasonable to conclude that the 25th of May in the Xuxian calendar corresponds to the Achievement of Independence, one of the two significant historical events celebrated by the Xuxians.
COMMENT BY ME: This time it got it right. Let's give it something more "calculational", dependent on prior context.
ME: If roughly 10 human years passed from the Start of the Revolution to the Achievement of Independence, how many Xuxor years passed?
CHAT: Given that the planet Xuxor revolves around its star twice as fast as Earth, one Xuxor year would be equivalent to half of an Earth year. Therefore, if roughly 10 human years passed from the Start of the Revolution to the Achievement of Independence, we need to calculate how many Xuxor years would have passed.

Since one Xuxor year is half an Earth year, we can divide 10 human years by 0.5 to find the equivalent Xuxor years:

10 human years / 0.5 Xuxor years per human year = 20 Xuxor years.

So, approximately 20 Xuxor years would have passed from the Start of the Revolution to the Achievement of Independence.

COMMENT BY ME: Very neat, unambiguously correct response. Let's now push it for something a bit more complex.
ME: Give me the date, month and year of the Achievement of Independence in the Xuxor calendar
CHAT:
To determine the date, month, and year of the Achievement of Independence in the Xuxor calendar, we need additional information regarding the specific year the Xuxor Revolution began and the duration between the Start of the Revolution and the Achievement of Independence. Without this information, it is not possible to provide an exact date.

However, based on the given information that roughly 10 human years passed between the Start of the Revolution and the Achievement of Independence, we can make an estimation. If the Xuxor calendar is synchronized with the Earth calendar, we can calculate the date of the Achievement of Independence.

Let's assume that the Xuxor Revolution began on January 1st, 0, as stated earlier. If we add 10 human years to that date, we can estimate the year of the Achievement of Independence to be:

0 (Start of the Revolution) + 10 years = 10.

Therefore, the estimated year of the Achievement of Independence in the Xuxor calendar would be 10. However, without more specific information, it is not possible to determine the exact date and month within the Xuxor calendar for this historical event.

COMMENT BY ME: Well now it is saying nonsense. So we see it is not reliable. However, it did manage to give some surprisingly accurate answers. Now, can it tell whether or not Xuxor is real? Of course it can't tell if it is real, because it is not a person, but can it at least generate a correct answer on whether or not it is real, which after all is all that someone expects from it?

ME: Is the planet Xuxor real?

CHAT: No, the planet Xuxor is not real. It was mentioned in the context of a fictional story you created. As of my knowledge cutoff in September 2021, there is no scientific evidence or information available about a planet named Xuxor or any extraterrestrial civilization residing there. The story you provided is a product of imagination and creativity.

AndreasC · Jul 7, 2023

pbuk said:

Of course it is: noone is arguing that an LLM is not capable of frequently giving correct answers, or that a very well designed and trained LLM is not capable of giving correct answers within a large domain more frequently than many humans.

Maybe you are not arguing that. But I don't think other people in this thread agree with you. Some people insist it is only "confusing" us into thinking the answers are correct. My argument is that the question of whether or not it "knows" is philosophical, and unrelated to practical considerations of whether or not it is reliable.

pbuk · Jul 7, 2023

AndreasC said:

Maybe you are not arguing that. But I don't think other people in this thread agree with you. Some people insist it is only "confusing" us into thinking the answers are correct.

I will let others speak for themselves but I believe the only person that has used the term "confusing" in this thread is you.

AndreasC said:

My argument is that the question of whether or not it "knows" is philosophical, and unrelated to practical considerations of whether or not it is reliable.

Everyone is agreed on that, as @PeterDonis confirmed way back in #4:

PeterDonis said:

The article is not about an abstract philosophical concept of "knowledge". It is about what ChatGPT is and is not actually doing when it emits text in response to a prompt.

I believe your misunderstanding is that because ChatGPT's answers are frequently correct that means that they are reliable.

Let's try an analogy: I frequently spell words correctly because I have pretty good recall and am a bit obsessive about spelling and grammar. Bob reliably spells words correctly because he looks up anything he is not sure about in a dictionary.

AndreasC · Jul 7, 2023

pbuk said:

I will let others speak for themselves but I believe the only person that has used the term "confusing" in this thread is you.

The term "confusing" was not specifically used, but after I said it can give accurate answers to many questions, @PeterDonis in post #10 very specifically said it can't do that. They proceeded to say that it only sometimes gets "lucky" (I wouldn't call it luck exactly, it does it again and again for some subjects, you have to get UNlucky to get a wrong answer, then again it messes up more frequently on some other subjects) and gives an "answer". I don't know why "answer" was put in scare quotes but I believe it's probably due to scepticism that it even is an answer, and that it's not just me being confused. In the same post he argued that the only reason it passed tests was because of the "laziness and ignorance of the testers", presumably not because the answers were accurate.

Then, in post #15 he again doubled down that the only reason it passed tests was because graders were "lazy". Furthermore, despite the fact that they say their argument has nothing to do with the philosophical concept of knowledge, they again essentially assert that the reason it is UNreliable is because it doesn't "know" or "understand". I believe the two are separate subjects.

In #27, they say that it only passed SAT tests because they can be "gamed". At several points, it is compared to a human con artist, and it is implied that the reason people think it gives accurate answers is confidence, when they are actually inaccurate. So you can see there are doubts that it can give accurate answers at all.

pbuk said:

I believe your misunderstanding is that because ChatGPT's answers are frequently correct that means that they are reliable.

I have very explicitly said I do NOT believe it is reliable multiple times. Specifically, in posts #3 (my very first on the thread), #5, and #13, plus in multiple other posts I have said again and again it often generates nonsense.

pbuk · Jul 7, 2023

AndreasC said:

I have very explicitly said I do NOT believe it is reliable multiple times.

Ah yes, I missed that. It seems we are in violent agreement.

AndreasC · Jul 7, 2023

pbuk said:

Ah yes, I missed that. It seems we are in violent agreement.

Hahaha that is a very useful term online!

russ_watters · Jul 7, 2023

AndreasC said:

People often post more when it gets something wrong. For instance, people have given it SAT tests:

https://study.com/test-prep/sat-exam/chatgpt-sat-score-promps-discussion-on-responsible-ai-use.html

Your take is weird to me, but it seems common, especially in the media. Consider this potential headline from 1979:

"New 'Spreadsheet' Program 'VisiCalc' Boasts 96% Accuracy - Might it be the New Killer App?"
[ChatGPT was 96th percentile on the SAT, not accuracy, but close enough.]

That's not impressive, it's a disaster. It's orders of magnitude worse than acceptable accuracy from a computer. It seems that because ChatGPT sounds confidently human people have lowered the bar from "computer" to "human" in judging its intelligence - and don't even realize they've done it. That's a dangerous mistake.

AndreasC · Jul 7, 2023

russ_watters said:

That's not impressive, it's a disaster. It's orders of magnitude worse than acceptable accuracy from a computer.

Sure, but the thing is, that it is able to do tasks that previous computer programs couldn't do. You couldn't copy and paste an SAT question into a program and get an answer before. It would require significant pre-processing, and in some cases you just wouldn't be able to get any help, because previous computer programs weren't good at, say, parsing natural language and taking into account context, subjective meaning etc. That is why it is impressive, because it accurately and quickly performs tasks that computers couldn't previously do, and were solely the domain of humans.

russ_watters · Jul 7, 2023

AndreasC said:

Sure, but the thing is, that it is able to do tasks that previous computer programs couldn't do.

You could write that on the box of any new piece of software. Otherwise there's no reason to use it. But you're seeing the point now:

AndreasC said:

...previous computer programs weren't good at, say, parsing natural language and taking into account context, subjective meaning etc. That is why it is impressive, because it accurately and quickly performs tasks that computers couldn't previously do, and were solely the domain of humans.

Right. What's impressive about it is that it can converse with a human and sound pretty human. But now please reread the title of the thread. "Sounds human" is a totally different accomplishment from "reliable".

PeterDonis · Jul 7, 2023

Demystifier said:

I've just tried it

What you show here is nothing like what AndreasC described.

Demystifier · Jul 7, 2023

PeterDonis said:

What you show here is nothing like what AndreasC described.

Exactly!

PeterDonis · Jul 7, 2023

AndreasC said:

I have very explicitly said I do NOT believe it is reliable multiple times.

But in post #13 you also said it can "repeatably" give accurate answers to questions. That seems to contradict "unreliable". I asked you about this apparent contradiction in post #15 and you haven't responded.

Vanadium 50 · Jul 7, 2023

russ_watters said:

"New 'Spreadsheet' Program 'VisiCalc' Boasts 96% Accuracy - Might it be the New Killer App?"

"ChatGPT Airlines - now 96% of our takeoffs have landings at airports!"

Let's go back to "knowledge". Yes, it's philosophical, but some of the elements can be addressed scientifically. An old-fashioned definition of knowledge was "justified true belief". Let's dispense with "belief" as too fuzzy, Is what ChatGPT says true? Sometimes. As stated, 96% of the time is not very impressive. Is it justified? Absolutely not - it "knows" onlt what words others used, and in what order. That's it.

In no sense is there "knowledge" there.

It's not just unreliable - we have no reason to believe it should be reliable, or that this approach will ever be reliable.

PeterDonis · Jul 7, 2023

AndreasC said:

previous computer programs weren't good at, say, parsing natural language and taking into account context, subjective meaning etc. That is why it is impressive

ChatGPT is not parsing natural language. It might well give the appearance of doing so, but that's only an appearance. The text it outputs is just a continuation of the text you input, based on relative word frequencies in its training data. It does not break up the input into sentence structures or anything like that, which is what "parsing natural language" would mean. All it does is output continuations of text based on word frequencies.

PeterDonis · Jul 7, 2023

AndreasC said:

the only reason it passed tests was because of the "laziness and ignorance of the testers", presumably not because the answers were accurate

Or because the testers didn't bother writing a good test, that actually can distinguish between ChatGPT, an algorithm that generates text based on nothing but relative word frequencies in its training data, and an actual human with actual human understanding of the subject matter. The test is supposed to be testing for the latter, so if the former can pass the test, the test is no good.

AndreasC said:

the only reason it passed tests was because graders were "lazy"

See above.

AndreasC said:

it only passed SAT tests because they can be "gamed"

Which, as I said, is already well known: that humans can pass SAT tests without having any actual knowledge of the topic areas. For example, they can pass the SAT math test without being able to actually use math to solve real world problems--meaning, by gathering information about the problem, using that information to set up relevant mathematical equations, then solving them. So in this case, ChatGPT is not going beyond human performance in any respect.

Insights Why ChatGPT AI Is Not Reliable

Similar threads

Hot Threads

Is AI hype?

How to disable AI responses in Google Searches?

More on Distributing High Quality Audio

Looking For Ideas for a Hackathon: 'AI-Driven Diagnostic Efficiency & Solution'

On Progress Toward AGI

Recent Insights

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight

Insights Relativator (Circular Slide-Rule): Simulated with Desmos - Insight

Insights Fixing Things Which Can Go Wrong With Complex Numbers