Why ChatGPT AI Is Not Reliable
I’ll start with the simple fact: ChatGPT is not a reliable answerer of questions.
To try to explain why from scratch would be a heavy lift, but fortunately, Stephen Wolfram has already done the heavy lifting for us in his article, “What is ChatGPT Doing… and Why Does It Work?” [1] In a PF thread discussing this article, I tried to summarize the key message of Wolfram’s article as briefly as I could. Here is what I said in my post there [2]:
ChatGPT does not make use of the meanings of words at all. All it is doing is generating text word by word based on relative word frequencies in its training data. It is using correlations between words, but that is not the same as correlations in the underlying information that the words represent (much less causation). ChatGPT literally has no idea that the words it strings together represent anything.
In other words, ChatGPT is not designed to answer questions or provide information. It is explicitly designed not to do those things, because, as I said in the quote above, it only works with words in themselves; it does not work with, and does not even have any concept of, the information that the words represent. And that makes it unreliable, by design.
So, to give some examples of misconceptions that I have encountered: when you ask ChatGPT a question that you might think would be answerable by a Google Search, ChatGPT is not doing that. When you ask ChatGPT a question that you might think would be answerable by looking in a database (as Wolfram Alpha, for example, does when you ask it something like “What is the distance from New York to Los Angeles?”), ChatGPT is not doing that. And so on, for any value of “which you might think would be answerable by…”. The same is true if you substitute “looking for information in its training data” for any of the above: the fact that, for example, there is a huge body of posts on Instagram in ChatGPT’s training data does not mean that if you ask it a question about Instagram posts, it will look at those posts in its training data and analyze them to answer the question. It won’t. While there is, of course, voluminous information in ChatGPT’s training data for a human reader, ChatGPT does not use, or even comprehend, any of that information. All it gets from its training data is relative word frequencies.
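To make the "relative word frequencies" point concrete, here is a deliberately crude sketch: a bigram model that generates a "continuation" of a prompt purely from counts of which word followed which in a toy corpus. The corpus and function names here are mine, and a real large language model is vastly more sophisticated, but the principle of generating from observed word statistics rather than from meaning is the same:

```python
import random
from collections import defaultdict

# Toy "training data": count which word follows which.
corpus = ("the distance from new york to los angeles is large . "
          "the distance from new york to boston is small .").split()

follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

def continue_text(prompt_word, length=8, seed=0):
    """Generate a continuation by repeatedly sampling a next word in
    proportion to how often it followed the current word in training.
    No lookup, no meaning -- just observed word frequencies."""
    rng = random.Random(seed)
    out = [prompt_word]
    for _ in range(length):
        candidates = follows.get(out[-1])
        if not candidates:
            break
        out.append(rng.choice(candidates))
    return " ".join(out)

print(continue_text("the"))
```

Note that the generator will happily produce fluent-looking sentences about distances without containing a single distance fact; that gap is the whole point of the article.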
So why do ChatGPT responses seem reliable? Why do they seem like they must be coming from a process that “knows” the information involved? Because our cognitive systems are designed to interpret things that way. When we see text that looks syntactically and grammatically correct and seems to be confidently asserting something, we assume that it must have been produced, if not by an actual human, at least by an “AI” that is generating the text based on some kind of actual knowledge. In other words, ChatGPT fools our cognitive systems into attributing qualities to it that it does not have.
This security hole, if you will, in our cognitive systems is not a recent discovery. Human con artists have used many of the same tricks throughout human history. The only difference with human con artists is that they were doing it intentionally, whereas ChatGPT has no intentions at all and is doing it as a side effect of its design. But the result is much the same: let the reader beware.
[1] https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
just no way of getting away from speaking in terms of intention
"
Probably. Maybe it's better to talk about "design". ChatGPT is not designed to be independent of the echo chamber. It is designed to produce echoes indistinguishable from the rest of the echo chamber.
Given the way ChatGPT works, it doesn't matter.
"
I was probably unclear. The training data very much matters. And who decides what training data to use or not to use?
The only way it is "unbiased" is that it is difficult to generate output that is differently biased than the training dataset.
ChatGPT attempts to create an "echo" indistinguishable from its "training echoes".
"
There is just no way of getting away from speaking in terms of intention and volition, the English language won't let us. We cannot resist the temptation to say that it is "attempting" or "trying" to do something, but it is no more attempting to create an echo or anything else than my dishwasher is motivated to clean the dinner dishes.
The entire ChatGPT phenomenon makes me think of Searle's Chinese Room thought experiment: https://en.wikipedia.org/wiki/Chinese_room
ChatGPT is at least honest.
"
Is it being honest when it commits academic fraud by fabricating sources?
[Spoiler: it's being nothing.]
Who decided what sources are reliable for training and what ones are not?
"Given the way ChatGPT works, it doesn't matter.
Text generated to look like other text on the internet is going to be unreliable whether it is patterned on reliable text or not. For example, essentially everything you'll find about duplicate bridge on the internet will be reliable. Some stuff will be written for beginning players, some will be written by and for the relatively small community of world-class players, most will fall somewhere in between, but it's all reasonable advice. But we still get
http://bridgewinners.com/article/view/sorry-i-know-it-is-stupid-to-post-conversations-with-chatgpt/
http://bridgewinners.com/article/view/testing-chatgpt-the-media-hyped-ai-robot/
http://bridgewinners.com/article/view/using-chatgpt/
Are you speaking as a participant, the author of the Insight, or as a moderator?
"
The last.
But this misses a larger fallacy. The question of whether ChatGPT is reliable or not does not depend on whether people are reliable or not, nor on which people are more reliable than others.
please note my advice in post #105.
"
Are you speaking as a participant, the author of the Insight, or as a moderator?
Following is free advice, my feelings won't be hurt if you choose to ignore it… Form your opinions of the Supreme Court by reading the actual opinions (posted at https://www.supremecourt.gov/) and by joining the live blog on opinion days at scotusblog.com. We spend a lot of time complaining about pop-sci videos…. Popular press coverage of major court cases is far worse.
"
I hate to have to repeat myself, but all thread participants, please note my advice in post #105.
ChatGPT is at least honest.
"
This is a category error, but one that is almost impossible to avoid. The English language has no natural way of talking about chatbot output, so we inevitably find ourselves saying things like "it thinks" or "it knows" when of course it does no such thing – it's just arranging words to form sequences that resemble patterns already out there on the internet (and as evidence of just how hard it is to avoid this category error, I just finished backspacing away the words "it has seen on the internet"). Saying that ChatGPT is honest makes no more sense than saying that the motor hauling my truck up a steep hill is "gutsy and determined to succeed" – the difference is that we know how to talk about mechanical devices in terms of their performance characteristics without attributing sentience to them.
you are not in an echo chamber
"
Isn't it just the opposite? ChatGPT attempts to create an "echo" indistinguishable from its "training echoes".
With ChatGPT you are not in an echo chamber being fed a steady diet of misinformation.
"
True, because "misinformation" requires intent just as much as "honesty" does, and ChatGPT has no intent. Or, to put it another way, ChatGPT is not reliably unreliable any more than it is reliably reliable.
"
I stumbled on a twitter feed about the COVID vaccine. Everyone on the thread believed that it was harmful. One woman was puzzled by those who willingly took the vaccine and they all agreed it must be down to "low intelligence".
That is "gross"
"
While I understand the point being made in the context of this thread, let's please not take it any further, in order to ensure that we don't derail the thread into a political discussion.
This isn't exactly true (though depends on what you mean by "gross").
"
In contrast with social media software, for example, whose model is to focus information based on your perceived prejudices.
With ChatGPT you are not in an echo chamber being fed a steady diet of misinformation.
For example, I stumbled on a twitter feed about the COVID vaccine. Everyone on the thread believed that it was harmful. One woman was puzzled by those who willingly took the vaccine and they all agreed it must be down to "low intelligence".
That is "gross"; and your examples of ChatGPT bias pale by comparison.
Chat GPT doesn't have gross political, religious or philosophical prejudices.
"
This isn't exactly true (though depends on what you mean by "gross"). It has guardrails designed to constrain content, which reflect the biases of the programmers. For example, a few months ago someone asked it for religious jokes and while it was OK with Christian jokes it declined to provide Islamic jokes. I think this bias has since been corrected.
It is also biased by its programmers' choice of source information. For example, the user base of Reddit has a lot more say in the generated output than the membership of AARP.
ferry ride from Istanbul to Trieste take
"
Perhaps you should have asked about the ferry from Constantinople.
Not even that, it just predicts words. It doesn't care if the sentence it makes actually describes anything real. It cannot.
An example:
Q: How long does a ferry ride from Istanbul to Trieste take?
ChatGPT:
A direct ferry ride from Istanbul to Trieste is not available, as these two cities are located in different countries and are quite far apart. Istanbul is in Turkey, while Trieste is in northeastern Italy.
To travel between Istanbul and Trieste, you would need to consider alternative transportation options such as flights, trains, or buses…
Of course, there is a route from Istanbul to Trieste (at least that's what google tells me).
Sure more data, more parameters will make it better, but it's still not reliable.
"
You may be right and it'll die a death. I'm not so sure. The reasons for the adoption of technology are often social and cultural, rather than technical.
In fact, there is evidence it's already taken off.
There's no comparison.
"
Didn't I just make one?
"
Chat GPT, however imperfectly, is working on a global pool of human knowledge.
"
Actually, it is working on a pool of human writing.
The idea is that writing is a good enough proxy for knowledge and that word frequency distributions* are a good enough proxy for understanding. The thread as well as some past ones highlight many cases where this does not work.
FWIW, I think ChatGPT could write horoscopes as well as the "professionals". But probably not write prescriptions.
* But not letter frequency distributions, which we had 40 years ago doing much the same thing. That would just be crazy talk.
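That decades-old letter-level trick can be sketched in a few lines: an order-2 letter Markov chain in the style of Shannon's classic text-generation examples. The toy text and names here are mine; this is not a claim about how any particular historical program worked:

```python
import random
from collections import defaultdict

# Order-2 *letter* Markov chain: for each pair of letters, record
# which letter followed it in the training text.
text = "the thin then they them there these theme"
table = defaultdict(list)
for i in range(len(text) - 2):
    table[text[i:i + 2]].append(text[i + 2])

def babble(seed_pair="th", length=30, seed=1):
    """Extend the seed pair one letter at a time, sampling each next
    letter in proportion to how often it followed the last two letters."""
    rng = random.Random(seed)
    out = seed_pair
    for _ in range(length):
        nxt = table.get(out[-2:])
        if not nxt:  # pair never seen mid-text: nothing to sample
            break
        out += rng.choice(nxt)
    return out

print(babble())
```

The output is vaguely English-looking gibberish, which was about as exciting as this approach got forty years ago; scaling the same idea up to words and enormous corpora is what makes it look impressive today.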
ChatGPT is at least honest.
"
No, it's not. "Honest" requires intent. ChatGPT has no intent.
"
I would trust ChatGPT more than I would the US Supreme Court, for example.
"
I don't see how you would even compare the two. The US Supreme Court issues rulings that say what the law is. You don't "trust" or "not trust" the US Supreme Court. You either abide by its rulings or you get thrown in jail.
So what? That doesn't make ChatGPT good enough to rely on.
"
People already rely on a steady diet of lies and misinformation from human sources. ChatGPT is at least honest. I would trust ChatGPT more than I would the US Supreme Court, for example.
Should we climb the Moine Ridge on Thursday this week?
To make an informed decision about climbing the Moine Ridge on Thursday, I recommend checking weather forecasts, consulting with experienced climbers or local mountaineering authorities, and assessing your own skills and experience. Additionally, consider factors such as trail conditions, safety equipment, and the overall fitness and preparedness of your climbing team.
Mountain environments can be unpredictable and potentially dangerous, so it's essential to prioritize safety and make well-informed decisions.
People can do what they want.
"
Sure, but now you're backing away from your previous claim. People are free to choose to do stupid things, of course; but previously you were saying that relying on ChatGPT for practical information was not stupid. Now you're back-pedaling and saying, well, yes, it is stupid, just like relying on astrology, but there will always be people who choose to do stupid things.
Chat GPT, however imperfectly, is working on a global pool of human knowledge.
"
No, it's working on a global pool of text. That's not the same as "knowledge". ChatGPT has no information about the connection of any of the text in its training data with the actual world. It doesn't even make use of the text in itself; it only makes use of the relative word frequencies in the text.
"
ten years from now, who knows how many people will be using it or its competitors as their mentor?
"
Not in its current form. The next obvious step in the evolution of such models–connecting them to actual real world data–is already being taken, at the very least with the paid version of ChatGPT (mentioned in earlier posts), which includes actual lookups in various data sources (web search, for one, and for another, ironically, Wolfram Alpha) for generating responses. In other words, to do the key things that the current free version, which is what this Insights article discussed, does not. Ten years from now, I expect that further steps along those lines will have been taken and will have made these tools reliable in a way that the current ChatGPT is not.
Doesn't that contradict your previous claim here?
If you're not willing to do this yourself, on what basis do you justify saying that someone else could do it?
"
People can do what they want. It's an option, for sure. In fact, we've seen some evidence on here that significant numbers of people are using it to learn about physics.
If some people choose to live by astrological charts, them others can choose to live by ChatGPT. I choose to do neither. For the time being.
Could you make the same argument for astrology? Yesterday it told me to talk to a loved one and it worked!
"
There's no comparison. Chat GPT, however imperfectly, is working on a global pool of human knowledge. There's a rationale that it's trying to produce an unprejudiced, balanced answer.
Perhaps it will fail to develop. But, ten years from now, who knows how many people will be using it or its competitors as their mentor?
I don't personally intend to, no.
"
Doesn't that contradict your previous claim here?
"
In a practical sense, you could live according to what answers ChatGPT gives you.
"
If you're not willing to do this yourself, on what basis do you justify saying that someone else could do it?
there are worse ways to get answers.
"
So what? That doesn't make ChatGPT good enough to rely on.
For your sake I sincerely hope you don't try this. Unless, of course, you only ask it questions whose answers you don't really care about anyway and aren't going to use to determine any actions.
"
I don't personally intend to, no. But, there are worse ways to get answers.
You are too focused, IMO, on how it does things and not what it does.
"
Could you make the same argument for astrology? Yesterday it told me to talk to a loved one and it worked!
In a practical sense, you could live according to what answers ChatGPT gives you.
"
For your sake I sincerely hope you don't try this. Unless, of course, you only ask it questions whose answers you don't really care about anyway and aren't going to use to determine any actions. Particularly any actions that involve risk of harm to you or others.
"
Wolfram Alpha is a mathematical engine. It's not able to communicate on practical everyday matters.
"
Sure it is. You can ask it questions in natural language about everyday matters and it gives you answers, if the answers are in its databases. Unlike ChatGPT, it "knows" when it doesn't know an answer and tells you so. ChatGPT doesn't even have the concept of "doesn't know", because it doesn't even have the concept of "know". All it has is the relative word frequencies in its training data, and all it does is produce a "continuation" of the text you give it as input, according to those relative word frequencies.
Granted, Wolfram Alpha doesn't communicate its answers in natural language, but the answers are still understandable. Plus, it also includes in its answers the assumptions it made while parsing your natural language input (which ChatGPT doesn't even do at all–not just that it doesn't include any assumptions in its output, but it doesn't even parse its input). For example, if you ask Wolfram Alpha "what is the distance from New York to Los Angeles", it includes in its answer that it assumed that by "New York" you meant the city, not the state.
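The contrast can be caricatured in a few lines. A lookup-based answerer has a natural notion of "not knowing", because a query either hits its validated data or it doesn't. (The database and function names below are hypothetical stand-ins for illustration, not Wolfram Alpha's actual API.)

```python
# A tiny stand-in for a validated database of facts.
DISTANCES_MILES = {
    ("new york", "los angeles"): 2451,
    ("new york", "chicago"): 790,
}

def lookup_answer(city_a, city_b):
    """Answer from validated data, or explicitly admit ignorance --
    something a pure text continuer has no concept of."""
    key = (city_a.lower(), city_b.lower())
    if key in DISTANCES_MILES:
        return f"{DISTANCES_MILES[key]} miles"
    return "I don't know"

print(lookup_answer("New York", "Chicago"))  # 790 miles
print(lookup_answer("New York", "Trieste"))  # I don't know
```

A text generator built only on word frequencies will emit *some* plausible-sounding string for the second query too; the difference is that the lookup engine's failure mode is honest and visible.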
"
You are too focused, IMO, on how it does things and not what it does.
"
Huh? The Insights article under discussion, and the Wolfram article it references, are entirely about what ChatGPT does, and what it doesn't do. Wolfram also goes into some detail about the "how", but the "what" is the key part I focused on.
ChatGPT does not have any "objective assessment of the world". All it has is the relative word frequencies in its training data.
Wolfram Alpha, ironically, would be a much better thing to describe with the phrase you use here. It actually does contain a database (more precisely multiple databases with different entry and lookup criteria) with validated information about the world, which it uses to answer questions.
Only for the same reason a rock doesn't.
"
In a practical sense, you could live according to what answers ChatGPT gives you. Wolfram Alpha is a mathematical engine. It's not able to communicate on practical everyday matters. Nor can a rock.
How any software works is not really the issue if you are an end user. The important thing is what it outputs.
You are too focused, IMO, on how it does things and not what it does.
Chat GPT, if anything, is more reliable in terms of its objective assessment of the world
"
ChatGPT does not have any "objective assessment of the world". All it has is the relative word frequencies in its training data.
Wolfram Alpha, ironically, would be a much better thing to describe with the phrase you use here. It actually does contain a database (more precisely multiple databases with different entry and lookup criteria) with validated information about the world, which it uses to answer questions.
"
Chat GPT doesn't have gross political, religious or philosophical prejudices.
"
Only for the same reason a rock doesn't.
Chat GPT, if anything, is more reliable in terms of its objective assessment of the world than the vast majority of human beings.
Chat GPT doesn't have gross political, religious or philosophical prejudices.
If you talked to an Oil Company Executive, then there was no climate change and the biggest threat to humanity was the environmental movement.
Most human beings deliberately lie if it is in their interests. With Chat GPT at least you know it isn't deliberately lying to you.
I don't know where AI is going, or where we are heading, but I could make a case that Chat GPT is more rational, intelligent and truthful than 99% of the people on this planet.
However, an epistemologist would say that an envelope that contained the phrase "It is after 2:30 and before 2:00" does not possess knowledge, even though it is correct about as often as ChatGPT.
Here are a few.
https://stealthoptional.com/tech/chatgpt-paid-vs-free/
https://www.wepc.com/tips/is-chat-gpt-plus-worth-it/
https://www.wired.com/story/chatgpt-plus-web-browsing-openai/
https://www.businessinsider.com/chatgpt-plus-free-openai-paid-version-chatbot-2023-2
"
Thanks! It looks like, at the very least, the paid version includes searching the Internet for actual answers to prompts, so it is not the same thing as the free version that my Insights article (and the Wolfram article it references) discuss.
I know someone who has the paid version and says it's a lot more reliable.
"
Is there any reference online about this paid version and how it differs from the free version?
It also tries to err on the side of providing an answer
"
It doesn't even "try"–it will always output text in response to a prompt.
"
even when it has no idea what the right answer is
"
It never does, since it has no "idea" of any content at all. All it has any "idea" of is relative word frequencies.
As discussed, ChatGPT doesn't even try to output what is correct. It tries to output what is written often. There is some hope that there is a correlation between that and correctness, but that's not always true, and it was not hard to come up with examples.
ChatGPT is the love child of Clever Hans and the Mechanical Turk.
Second, the problem with discussing "AI", much less its purpose, is that it is such a huge area that lumping it all together is seldom helpful. Personally, I feel that the most interesting work has been done in motion, balance and sensors.
Third, we had this technology almost 40 years ago. That was based on letters, not words, and it was much slower than real-time. And nobody got excited.
….and this Insight addresses an important but not well discussed problem that more to the point is why we frown upon chat-bot questions and answers on PF.
*Edit: Also, this isn't what AI is "for". AI's promise is in being able to solve problems that are currently out of reach of computers but don't even require conscious thought by people. These problems – such as self-driving cars – are often ones where reliability is important.
edit2: Ok, I say that, but I can't be so sure it's true, particularly because of wildcards like Elon Musk who are willing to put the public at risk to test experimental software.

Most people care about the result. Of course it has some limitations that are fundamental, and they don't necessarily want people knowing that.
"
You're contradicting yourself. The "limitations that are fundamental" are crucial effects on the result. They're not just irrelevant side issues.
what happens when some business or government does the math and figures it would rather risk being wrong than pay experts?
"
If I know that's what your business is doing, you won't get my business.
I suspect that a lot of people feel this way; they just don't know that that's what the business is doing. Certainly OpenAI has not done anything to inform the public of what ChatGPT is actually doing, and not doing. I suspect that is because if they did do so, interest in what OpenAI is doing would evaporate.
But what if you want the answer as if given by Homer Simpson, or a Shakespearian Sonnet? Alpha can't do that ;)
"
I think they already do it the Max Power way:
If what you want is the answer to an exercise, and it can give you the correct answer, say, 99% of the time, then that's good enough for many people and in many contexts
"
Is it?
Perhaps if my only purpose is to get a passing grade on the exercise, by hook or by crook, this would be good enough.
But for lots of other purposes, it seems wrong. It's not even a matter of percentage accuracy; it's a matter of what the thing is doing and not doing, as compared with what my purpose is. If my purpose is to actually understand the subject matter, I need to learn from a source that actually understands the subject matter. If my purpose is to learn a particular fact, I need to learn from a source that will respond based on that particular fact. For example, if I ask for the distance from New York to Chicago, I don't want an answer from a source that will generate text based on word frequencies in its input data; I want an answer from a source that will look up that distance in a database of verified distances and output what it finds. (Wolfram Alpha, for example, does this in response to queries of that sort.)
go back again to wondering what the creators are thinking about this…
"
I think they are planning to monetize this by first making a name for themselves and then selling a product where "close enough is good enough". For example, customer service chatbots.
"ChatGPT Airlines – now 96% of our takeoffs have landings at airports!"
"
"New from OceanGate: now 99% Reliable – Twice as Reliable as our Previous Subs!"
(too soon?)
"
It's not just unreliable – we have no reason to believe it should be reliable, or that this approach will ever be reliable.
"
I go back again to wondering what the creators are thinking about this…
"
Definitely not [AI], but they believe they are headed in the right direction:
"
OpenAI's website is really weird. It is exceptionally thin on content and heavy on flash, with most of the front page just being pointless slogans and photos of people doing office things (was it created by ChatGPT?). It even features a video on top that apparently has no sound? All this to sell a predominantly text-based application (ironic)? The first section of the front page, though, contains one actual piece of information, in slogan form:
"Creating safe AGI that benefits all of humanity"
That's quite an ambitious goal/claim. It's not surprising that everyday people believe it's more than it really is, when that's what the company is saying.
The trajectory of the app, and the way they've talked about flaws such as hallucinations, implies they think their approach is viable and that refinements that improve its reliability should result in it becoming "reliable enough". Ironically this may increase the risk/danger of misuse, as people apply it to more and more situations where reliability should matter. I can't see how this approach would ever be acceptable for industrial automation. Maybe for a toy drone it won't matter if it unexpectedly/unpredictably crashes for no apparent reason "only" 0.1% of the time, but that won't ever be acceptable for a self driving car or airplane.
It can't do this. There is no there there.
the only reason it passed tests was because of the "laziness and ignorance of the testers", presumably not because the answers were accurate
"
Or because the testers didn't bother writing a good test, that actually can distinguish between ChatGPT, an algorithm that generates text based on nothing but relative word frequencies in its training data, and an actual human with actual human understanding of the subject matter. The test is supposed to be testing for the latter, so if the former can pass the test, the test is no good.
"
the only reason it passed tests was because graders were "lazy"
"
See above.
"
it only passed SAT tests because they can be "gamed"
"
Which, as I said, is already well known: that humans can pass SAT tests without having any actual knowledge of the topic areas. For example, they can pass the SAT math test without being able to actually use math to solve real world problems–meaning, by gathering information about the problem, using that information to set up relevant mathematical equations, then solving them. So in this case, ChatGPT is not going beyond human performance in any respect.
previous computer programs weren't good at, say, parsing natural language and taking into account context, subjective meaning etc. That is why it is impressive
"
ChatGPT is not parsing natural language. It might well give the appearance of doing so, but that's only an appearance. The text it outputs is just a continuation of the text you input, based on relative word frequencies in its training data. It does not break up the input into sentence structures or anything like that, which is what "parsing natural language" would mean. All it does is output continuations of text based on word frequencies.
"New 'Spreadsheet' Program 'VisiCalc' Boasts 96% Accuracy – Might it be the New Killer App?"
"
"ChatGPT Airlines – now 96% of our takeoffs have landings at airports!"
Let's go back to "knowledge". Yes, it's philosophical, but some of the elements can be addressed scientifically. An old-fashioned definition of knowledge was "justified true belief". Let's dispense with "belief" as too fuzzy. Is what ChatGPT says true? Sometimes. As stated, 96% of the time is not very impressive. Is it justified? Absolutely not – it "knows" only what words others used, and in what order. That's it.
In no sense is there "knowledge" there.
It's not just unreliable – we have no reason to believe it should be reliable, or that this approach will ever be reliable.
I have very explicitly said I do NOT believe it is reliable multiple times.
"
But in post #13 you also said it can "repeatably" give accurate answers to questions. That seems to contradict "unreliable". I asked you about this apparent contradiction in post #15 and you haven't responded.
What you show here is nothing like what AndreasC described.
"
Exactly!
I've just tried it
"
What you show here is nothing like what AndreasC described.
Sure, but the thing is, that it is able to do tasks that previous computer programs couldn't do.
"
You could write that on the box of any new piece of software. Otherwise there's no reason to use it. But you're seeing the point now:
"
…previous computer programs weren't good at, say, parsing natural language and taking into account context, subjective meaning etc. That is why it is impressive, because it accurately and quickly performs tasks that computers couldn't previously do, and were solely the domain of humans.
"
Right. What's impressive about it is that it can converse with a human and sound pretty human. But now please reread the title of the thread. "Sounds human" is a totally different accomplishment from "reliable".
People often post more when it gets something wrong. For instance, people have given it SAT tests:
https://study.com/test-prep/sat-exam/chatgpt-sat-score-promps-discussion-on-responsible-ai-use.html
"
Your take is weird to me, but it seems common, especially in the media. Consider this potential headline from 1979:
"New 'Spreadsheet' Program 'VisiCalc' Boasts 96% Accuracy – Might it be the New Killer App?"
[ChatGPT was 96th percentile on the SAT, not accuracy, but close enough.]
That's not impressive, it's a disaster. It's orders of magnitude worse than acceptable accuracy from a computer. It seems that because ChatGPT sounds confidently human, people have lowered the bar from "computer" to "human" in judging its intelligence – and don't even realize they've done it. That's a dangerous mistake.
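A quick back-of-envelope calculation shows why "96% per answer" is disastrous by computing standards (assuming, for simplicity, that errors are independent across answers):

```python
# Probability that every answer in a session is correct, if each
# answer is independently right 96% of the time.
per_answer = 0.96
for n in (1, 10, 50, 100):
    p_all_correct = per_answer ** n
    print(f"{n:3d} answers: {p_all_correct:.1%} chance of zero errors")
# At 50 questions the odds of a flawless session are already below
# 13% -- unthinkable for a spreadsheet or a database lookup.
```

A spreadsheet that miscomputed 4% of its cells would never have become a killer app; the bar for a computer tool has always been effectively 100%.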
Please give a reference: where has this been done?
"
I've just tried it:
Me: What are the customs of the aliens at the planet Xuxor?
ChatGPT:
As an AI language model, I don't have real-time information on specific planets or extraterrestrial civilizations outside of what is available up until my knowledge cutoff in September 2021. Therefore, I'm not aware of any specific customs of aliens on a planet called Xuxor or any other extraterrestrial civilization.
It's important to note that the existence of aliens or their customs on specific planets is purely speculative at this point, as we have not made contact with extraterrestrial life. If and when we do encounter extraterrestrial civilizations, their customs and behaviors would likely be vastly different from our own, given the vastness of the universe and the potential for diverse forms of life.
Until further information becomes available, any details about the customs of aliens on specific planets would be purely fictional or based on imagination, as depicted in science fiction literature, movies, or other creative works.
Is it? How would one even determine that?
"
By asking experts to give their educated guess estimates, say two years ago.
you can make up an alien planet, tell gpt about it and their customs, and it will answer understanding questions on your text, plus it may even manage to infer when their alien independence day is, given enough clues.
"
Please give a reference: where has this been done?
It seems that through statistical analysis it is able to build a model that approximates or simulates real information.
"
Yes, because while the information contained in the relative word frequencies in the training data is extremely sparse compared to the information that a human reader could extract from the same data, it is still not zero. There is information contained in those word frequencies. For example, "Thomas Jefferson" is going to appear correlated with "July 4, 1776" in the training data to a much greater degree than "Will Smith" does.
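To make that point concrete, here is a toy sketch of how bare co-occurrence counts, with no notion of meaning at all, still encode the Jefferson/Smith asymmetry. The corpus and the function are entirely hypothetical illustrations of the statistical point, not of how GPT is actually trained:

```python
# A made-up "training corpus" (hypothetical; for illustration only).
corpus = [
    "thomas jefferson drafted the declaration adopted on july 4 1776",
    "thomas jefferson is tied to july 4 1776 in countless texts",
    "on july 4 1776 the declaration written by thomas jefferson was adopted",
    "will smith fought alien invaders in the film independence day",
    "will smith released a new album this year",
]

def cooccurrence(texts, phrase_a, phrase_b):
    """Count texts in which both phrases appear together."""
    return sum(1 for t in texts if phrase_a in t and phrase_b in t)

print(cooccurrence(corpus, "thomas jefferson", "july 4 1776"))  # 3
print(cooccurrence(corpus, "will smith", "july 4 1776"))        # 0
```

A generator driven purely by such counts will tend to emit "July 4, 1776" near "Thomas Jefferson" without representing either one; the correlation, not the meaning, does the work.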
"
I can't imagine this is an oversight
"
It's not; it was an intentional feature of the design that only the relative word frequencies in the training data would be used. The designers, from what I can tell, actually believe that piling up enough training data with such word frequencies can lead to actual "knowledge" of subject matter.
people have given it SAT tests
"
This just shows that SAT tests can be gamed. Which we already knew anyway.
This isn't even what surprises me that much. You could say that it has learned that the correct date follows these prompts. But the thing is, you can make up an alien planet, tell gpt about it and their customs, and it will answer understanding questions on your text, plus it may even manage to infer when their alien independence day is, given enough clues. It's really impressive.
"
Impressive how? Doesn't this just tell you that it doesn't know the difference between fiction and reality, and more to the point, there's no way for you to know if it is providing you fictional or real answers*?
*Hint: always fictional.
https://betonit.substack.com/p/chatgpt-takes-my-midterm-and-gets (GPT-3, D)
https://betonit.substack.com/p/gpt-retakes-my-midterm-and-gets-an (GPT-4, A)
https://scottaaronson.blog/?p=7209 (GPT-4, B)
In my view, the right question is not why ChatGPT is unreliable. Given the general principles of how it works, the right question is: Why is it more reliable than one would expect?
"
"
Is it? How would one even determine that?
"
I think it's just a qualitative feeling, but I feel the same way. When I first learned about it, it never occurred to me that it didn't access stored information (either its own or a 3rd party's) to form its replies*. Now that I know it doesn't, it surprises me that it gets so much right. If it's just doing word association and statistical analysis, I'm surprised that asking about Independence Day doesn't return "On July 4, 1776 Will Smith fought a group of alien invaders before signing the Declaration of Independence in Philadelphia…" It seems that through statistical analysis it is able to build a model that approximates or simulates real information. To me, surprisingly well.
*I don't know the intent of the designers, but I can't imagine this is an oversight. Maybe the intent was always to profit from 3rd parties using it as an interface for their data sources (some of which, it appears, they are already doing)?
But whatever the real goals of the company, I think it is wrong and risky that it's been hyped (whether by the media or the company) to make people think that it is a general purpose AI with real knowledge. As a result, people have their guard down and are likely to mis/over-use it.
I wonder if the developers really believe it qualifies for the title "AI" or that complexity = intelligence?
Edit: Btw, while I'm not certain of this, here's how I can often tell: it's the lack of focus in the responses. When ChatGPT output is dumped into the middle of an ongoing conversation, it doesn't acknowledge or respond to that conversation; it just provides generic information that is often not useful for, or connected to, the discussion.
Limited context understanding
"
That is incorrect. "Limited understanding" implies that there is at least SOME understanding, but ChatGPT has zero understanding of anything.
Try it. Feed it questions which have unambiguous answers. You'll see that even though sometimes it generates nonsense, very, VERY frequently it gives right answers.
"
This does not seem consistent with many posts here at PF by people who have tried ChatGPT and posted the output. The general sense I get from those posts is that ChatGPT was less reliable than they expected, because they did not realize what it is actually doing and not doing. For example, apparently many people expected that when you asked it a factual question about something in its training data, it would go look in its training data to find the answer. But it doesn't, even if the right answer is in its training data. Wolfram's article, referenced in my Insights article, makes all this clear.
The semantic connections you are talking about are connections between sensory inputs and pre-existing structure inside our brains.
"
Not necessarily pre-existing. We build structures in our brains to represent things in the world as a result of our interactions with them. ChatGPT does not. (Nor does ChatGPT have any "pre-existing" structures that are relevant for this.)
"
Imagine a student passing one test after another, would you take someone telling you they only "got lucky" seriously
"
If the reason they passed was that their graders were lazy and didn't actually check the accuracy of the answers, yes. And that is exactly what has happened in cases where ChatGPT supposedly "passed" tests. If you think graders would never be so lazy, you have led a very sheltered life. It's just a more extreme version of students getting a passing grade on a book report without ever having read the book, and I can vouch for that happening from my own personal experience.
"
It can give you correct answers to unambiguous questions that are non-trivial and that it has not specifically encountered before in its training. And it can do that a lot, repeatably.
"
Please produce your evidence for this claim. It is contrary to both the analysis of how ChatGPT actually works, which I discuss in the Insights article, and the statements of many, many people who have used it. Including many posts here at PF where people have given ChatGPT output that is confident-sounding but wrong.
"
ChatGPT is unreliable because we observe it to be unreliable.
"
Doesn't this contradict your claim quoted above?
"
That requires no explanation.
"
The fact that it is observed to be unreliable is just a fact, yes. But in previous discussions of ChatGPT here at PF, it became clear to me that many people do not understand how ChatGPT works and so do not understand both that it is unreliable and why it is unreliable. That is why I wrote this article.
"
What does require explanation is why, as @Demystifier said, it is so much more reliable (especially at non trivial, "reasoning" type problems) than you would naively expect.
"
And I have already responded to @Demystifier that such a claim is meaningless unless you can actually quantify what "you would naively expect" and then compare ChatGPT's actual accuracy to that. Just saying that subjectively it seems more accurate than you would expect is meaningless.
how many people might live off the same principles? Confidence tricking through most communication?
"
Yes, I think one way of describing ChatGPT is that it is crudely simulating a human con artist: it produces statements that seem to come from an entity that is knowledgeable, but actually don't.
Why is it more reliable than one would expect?
"
Is it? How would one even determine that?
We definitely learn about lots of things by just reading about them…
"
That's because our minds have semantic connections between words and things in the world. When we read words, we make use of those connections–in other words, we know that the words have meanings, and what those meanings are. If we get the meanings of words wrong, we "learn" things that are wrong.
ChatGPT has none of this. It has no connections between words and anything else. It doesn't even have the concept of there being connections between words and anything else. The only information it uses is relative word frequencies in its training data.
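Generation from nothing but relative word frequencies can be sketched in a few lines. The following is a deliberately crude bigram model over a hypothetical toy corpus; real LLMs use learned weights over tokens rather than raw counts, but the principle, picking the next word from frequency statistics alone, is the point being illustrated:

```python
from collections import Counter, defaultdict

# A deliberately crude bigram "language model": the only thing it ever
# consults is how often word B followed word A in a toy corpus.
corpus = (
    "the declaration of independence was signed on july 4 1776 . "
    "the declaration of independence was drafted by thomas jefferson . "
    "thomas jefferson was the principal author of the declaration ."
).split()

follows = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    follows[a][b] += 1  # relative word frequencies, nothing else

def next_word(word):
    """Most frequent successor of `word`; no meanings involved."""
    return follows[word].most_common(1)[0][0]

# Greedy generation: fluent-looking fragments, zero understanding.
w, out = "thomas", ["thomas"]
for _ in range(5):
    w = next_word(w)
    out.append(w)
print(" ".join(out))
```

The output looks locally plausible only because the counts encode which words tend to follow which; nothing in the model connects any word to anything outside the text.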
"
It can already give accurate answers about a wide range of questions
"
No, it can't. It can get lucky sometimes and happen to give an "answer" that happens to be accurate, but, as you will quickly find out if you start looking, it also happily gives inaccurate answers with the same level of confidence. That's because it's not designed to give accurate answers to questions; that's not what it's for.
"
pass tests
"
Only because the "tests" are graded so poorly that even the inaccurate but confident-sounding responses that ChatGPT gives "pass" the tests. That is a reflection of the laziness and ignorance of the test graders, not of the knowledge of ChatGPT.
"
answer new problems it has not been specifically trained on
"
Sure, because it can generate text in response to any prompt whatever. But the responses it gives will have no reliable relationship to reality. Sometimes they might happen to be right, other times they will be wrong, often egregiously wrong. But all of the responses seem just as confident.
"
I always thought somebody knows something if they can not only recall the facts, but also apply them in new contexts.
"
ChatGPT does not and cannot do these things. What it does do is, as a side effect of its design, produce text that seems, to a naive observer, to be produced by something that does these things. But the illusion is quickly shattered when you start actually checking up on its responses.
How do we know at what point it "knows" something? There are non-trivial philosophical questions here
"
Perhaps, but they are irrelevant to this article. The article is not about an abstract philosophical concept of "knowledge". It is about what ChatGPT is and is not actually doing when it emits text in response to a prompt.
"
I can see someone eventually arguing they have somehow formed a decent representation of what things "are" inside them
"
Not as long as there are no semantic connections between the network and the world. No entity forms "representations" of actual things just by looking at relative word frequencies in texts. There has to be two-way interaction with the actual world. That's how, for example, we humans form our mental representations of things. We interact with them and learn how they work.
…, pre-politically correct characteristics included "often precocious vocabulary with no apparent 'real understanding/ability' for use/application/reasoning." That is my recollection from Googling ten-fifteen years ago; ymmv.
This is one more such case:
https://www.physicsforums.com/threa…an-appropriate-source-for-discussion.1053525/
Some wiki/google sources lack "shelf life."