
Why ChatGPT Is Not Reliable


I’ll start with the simple fact: ChatGPT is not a reliable answerer of questions.

To try to explain why from scratch would be a heavy lift, but fortunately, Stephen Wolfram has already done the heavy lifting for us in his article, “What is ChatGPT Doing… and Why Does It Work?” [1] In a PF thread discussing this article, I tried to summarize as briefly as I could the key message of Wolfram’s article. Here is what I said in my post there [2]:

ChatGPT does not make use of the meanings of words at all. All it is doing is generating text word by word based on relative word frequencies in its training data. It is using correlations between words, but that is not the same as correlations in the underlying information that the words represent (much less causation). ChatGPT literally has no idea that the words it strings together represent anything.

In other words, ChatGPT is not designed to actually answer questions or provide information. In fact, it is explicitly designed not to do those things, because, as I said in the quote above, it only works with words in themselves; it does not work with, and does not even have any concept of, the information that the words represent. And that makes it unreliable, by design.

So, to give some examples of misconceptions that I have encountered: when you ask ChatGPT a question that you might think would be answerable by a Google Search, ChatGPT is not doing that. When you ask ChatGPT a question that you might think would be answerable by looking in a database (as Wolfram Alpha, for example, does when you ask it something like “what is the distance from New York to Los Angeles?”), ChatGPT is not doing that. And so on, for any value of “that you might think would be answerable by…”. And the same is true if you substitute “looking for information in its training data” for any of the above: the fact that, for example, there is a huge body of posts on Instagram in ChatGPT’s training data does not mean that if you ask it a question about Instagram posts, it will look at those posts in its training data and analyze them in order to answer the question. It won’t. While there is, of course, voluminous information in ChatGPT’s training data for a human reader, ChatGPT does not use, or even comprehend, any of that information. Literally all it gets from its training data is relative word frequencies.
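
To make “generating text word by word based on relative word frequencies” concrete, here is a toy sketch in Python. It is emphatically not ChatGPT's actual architecture (ChatGPT uses a large neural network rather than a lookup table of counts), but it illustrates the principle described above: the next word is drawn from a probability distribution over words, and nothing anywhere in the process represents what the words mean.

```python
import random
from collections import defaultdict, Counter

# Toy illustration only: a bigram model built from word-pair counts.
# ChatGPT's real model is a large neural network, but the principle the
# article describes is the same: the next word is sampled from a
# distribution conditioned on what came before, with no notion of meaning.

training_text = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog ."
)

# Count how often each word follows each other word.
counts = defaultdict(Counter)
words = training_text.split()
for current_word, next_word in zip(words, words[1:]):
    counts[current_word][next_word] += 1

def generate(start, length=10):
    """Generate text word by word from relative word frequencies."""
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break
        choices, weights = zip(*followers.items())
        out.append(random.choices(choices, weights=weights)[0])
    return " ".join(out)

print(generate("the"))
# e.g. "the cat sat on the rug . the dog sat on"
# Fluent-looking, but nothing here "knows" what a cat or a rug is.
```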

So why do ChatGPT responses seem like they are reliable? Why do they seem like they must be coming from a process that “knows” the information involved? Because our cognitive systems are designed to interpret things that way. When we see text that looks syntactically, grammatically correct and seems like it is confidently asserting something, we assume that it must have been produced, if not by an actual human, at least by an “AI” that is generating the text based on some kind of actual knowledge. In other words, ChatGPT fools our cognitive systems into attributing qualities to it that it does not actually have.

This security hole, if you will, in our cognitive systems is not a recent discovery. Human con artists have made use of much the same tricks throughout human history. The only difference with the human con artists is that they were doing it intentionally, whereas ChatGPT has no intentions at all and is doing it as a side effect of its design. But the end result is much the same: let the reader beware.

[1] https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/

[2] https://www.physicsforums.com/threads/stephen-wolfram-explains-how-chatgpt-works.1050431/post-6903906

138 replies
  1. Ken G says:
    Perhaps ChatGPT can itself give us some assistance here. When I prompted it to describe why it writes a poem about poetry when given the single word "poem", this was part of its answer:
    "The process of training me involves a cost function, which is used to fine-tune my responses. During training, I'm presented with a wide variety of text and corresponding prompts, and I learn to predict the next word or phrase based on those inputs. The cost function helps adjust the model's internal parameters to minimize the difference between its predictions and the actual training data."

    I think the important element of this is that it is trained on prompts and text, not just text. So it does not just predict the next word in a body of text and then see how well it did; it tries to predict responses to prompts. But a prompt is not a normal aspect of either human communication or bodies of text (if I came up to you and said "poem", you would not think you were being prompted; you would have no idea what I wanted and would probably ask me what I was talking about. ChatGPT does not ask us what we are talking about; it is trained to treat everything as a prompt). So its training must look at bodies of text as if they were in response to something, and look for some kind of correlation with something earlier that is then treated as a prompt.
    It says: "The prompt/response framework is a way to communicate how the model learns context and generates responses, but it doesn't necessarily represent the actual structure of the training data."
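
    As far as is publicly documented, the "cost function" in the quoted answer is the standard cross-entropy loss on next-word prediction. A minimal sketch of what "minimize the difference between its predictions and the actual training data" amounts to, with invented numbers:

    ```python
    import math

    # Minimal sketch of the "cost function" idea in the quoted answer:
    # cross-entropy loss on next-word prediction. Numbers are invented.
    def cross_entropy(predicted_probs, actual_next_word):
        """Loss is small when the model puts high probability on the word
        that actually appears next in the training text."""
        return -math.log(predicted_probs[actual_next_word])

    # Hypothetical model output for the context "the cat sat on the ...":
    predicted_probs = {"mat": 0.6, "rug": 0.3, "moon": 0.1}

    print(cross_entropy(predicted_probs, "mat"))   # ~0.51  (good prediction)
    print(cross_entropy(predicted_probs, "moon"))  # ~2.30  (poor prediction)
    # Training adjusts the model's parameters to push this loss down, i.e. to
    # raise the probability assigned to the words that actually occur.
    ```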

  2. Ken G says:

    PeterDonis said

    Where are these layers in the Wolfram article's description? (I am referring to the Wolfram article that is referenced in the Insights article at the start of this thread.)

    I presume the "loss function" used for the training must be the crucial place where the trainers impose their intentions on the output. I keep coming back to the fact that you can prompt ChatGPT with the single word "poem" and it will write a poem about poetry. Surely this is not what you can get from simply taking a body of written work and trying to predict the words that come after "poem" in that body of data, because very few poems are about poetry. There must be someplace where the trainers have decided what they expect the user to want from ChatGPT, and have trained it to produce a poem when the prompt is the word "poem", and have trained it to make the poem be about something mentioned in the prompt. That would go beyond simply predicting what comes after the word "poem" in some vast body of text. The LLM would have to be trained to predict what words follow other words that also qualify as satisfying the prompt in some way.

    So getting back to the issue of a prompt as a correction: the trainers expect that people will want to iterate with ChatGPT to improve its output in a session, so they will give prompts that should be interpreted as corrections. That's where ChatGPT is trained to be obsequious about accepting corrections; it has to be built into the way it is trained, not just a simple algorithm for predicting text that follows the text in the prompt (again, humans are rarely so obsequious).
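
    A hedged sketch of the kind of training data being described here: instruction-style fine-tuning organizes data as (prompt, response) pairs rather than raw running text, so a bare word like "poem" is learned as a request. The pairs below are invented for illustration; the actual datasets are not public.

    ```python
    # Invented examples only: the general *shape* of instruction-style training
    # data, i.e. (prompt, response) pairs rather than raw running text. The
    # model is trained to predict the response given the prompt, which is what
    # turns a bare word like "poem" into something treated as a request.
    training_pairs = [
        {"prompt": "poem",
         "response": "Ink and silence, line by line...\n(a short poem about poetry)"},
        {"prompt": "shorten that poem",
         "response": "Ink and silence, line by line."},
        {"prompt": "No, that's wrong, the distance should be 3,944 km.",
         "response": "You're right, my apologies. Using 3,944 km instead..."},
    ]
    ```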

  3. PeterDonis says:

    AndreasC said

    ChatGPT has additional layers on top of that to "push" it to favor some kinds of response more than others.

    Where are these layers in the Wolfram article's description? (I am referring to the Wolfram article that is referenced in the Insights article at the start of this thread.)

  4. PeterDonis says:

    russ_watters said

    it therefore doesn't know if the input is wrong, and will simply riff on the wrong input. This gives the appearance to the user that they are winning a debate or providing useful clarification on which to get a better answer, when in reality they may just be steering it towards providing or expanding on a wrong answer.

    Yes, agreed.

  5. Ken G says:

    PeterDonis said

    It has a database of relative word frequencies. That's the only database it has. (At least, that's the case for the version that was reviewed in the articles being discussed here. Later versions might have changed some things.)

    Yes, good point: it starts with a training dataset and then generates its own internal dataset, such that it only needs to query that to generate its responses. But the manner in which that dataset is trained is a key aspect; that's where the intelligence of the trainers leaves its mark. They must have had certain expectations that they built in, which end up looking like there is a difference between a prompt like "poem", which will generate a poem about poetry, and a follow-up prompt like "shorten that poem". It can even count the words it uses in its answer, and prohibit certain types of answers, so it has some extra scaffolding in its training.

  6. russ_watters says:

    PeterDonis said

    No, it's designed to accept any input whatever and generate response text based on the relative word frequency algorithm that Wolfram describes in his article (referenced in the Insights article). It has no idea that some input is "criticism/clarification" based on its previous responses. It has no semantics.

    I understand, but I think you might be reading past my point. As you say, it "accepts any input" regardless of what the input is. What I'm pointing out is that it therefore doesn't know if the input is wrong, and will simply riff on the wrong input. This gives the appearance to the user that they are winning a debate or providing useful clarification on which to get a better answer, when in reality they may just be steering it towards providing or expanding on a wrong answer.

  7. PeterDonis says:

    Ken G said

    it has a database to inform those frequencies

    It has a database of relative word frequencies. That's the only database it has. (At least, that's the case for the version that was reviewed in the articles being discussed here. Later versions might have changed some things.)

  8. Ken G says:

    PeterDonis said

    It doesn't search any database; that was one of the main points of the Insights article. All it does is generate text based on relative word frequencies, using the prompt given as input as its starting point.

    Yes, but it has a database to inform those frequencies. That database must be "primed" in some way to establish what the ChatGPT is, and how it should relate to prompts. For example, if you prompt it with "describe yourself", it will say "I am ChatGPT, a creation of OpenAI. I'm a language model powered by the GPT-3.5 architecture, designed to understand and generate human-like text based on the input I receive. I have been trained on a diverse range of text sources up until September 2021, so I can provide information, answer questions, assist with writing, generate creative content, and more. However, it's important to note that I don't possess consciousness, emotions, or personal experiences. My responses are based on patterns in the data I've been trained on, and I aim to be a helpful and informative tool for various tasks and conversations." So that's a highly specialized set of data to look for word associations with "describe yourself," it has been trained to favor certain word frequencies in response to certain prompts.

    Also, if you correct it, it will invariably apologize obsequiously. So it is in some sense "programmed to accept corrections," in the sense that it uses a word association database that expects to be corrected and is trained to respond to that in certain ways.

    It would seem that its training also expects to provide certain types of responses. For example, if you just give it the one-word prompt "poem", it will write a poem. Also, since you did not specify a subject, it will write a poem about poetry! I think that was a conscious decision by its programmers; there are built-in expectations about what a prompt is trying to accomplish, including corrections. It could be said that ChatGPT inherits some elements of the intelligence of its trainers.
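
    One publicly visible mechanism for this kind of priming in chat-style models is a hidden "system" message prepended to every conversation, on top of whatever fine-tuning has been done; exactly how ChatGPT combines the two is not established here. A rough, invented illustration of the pattern (this is not OpenAI's actual system prompt):

    ```python
    # Invented illustration: chat-style models typically see a hidden "system"
    # message before the user's prompt, so even pure next-word prediction is
    # conditioned on identity/behavior text. This is not OpenAI's actual prompt,
    # and training-time fine-tuning contributes to the behavior as well.
    conversation = [
        {"role": "system",
         "content": "You are ChatGPT, a language model trained by OpenAI. "
                    "You do not have consciousness, emotions, or personal experiences."},
        {"role": "user", "content": "describe yourself"},
    ]
    # The reply is generated conditioned on all of the above, which is why
    # "describe yourself" reliably produces the canned self-description.
    ```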

  9. PeterDonis says:

    Ken G said

    it will typically search its database

    It doesn't search any database; that was one of the main points of the Insights article. All it does is generate text based on relative word frequencies, using the prompt given as input as its starting point.

  10. PeterDonis says:

    Ken G said

    it gives the prompt some kind of special status, so when you correct something it said, it tends to become quite obsequious

    I don't think that's anything explicitly designed, at least not in the version of ChatGPT that was reviewed in the Insights article and the Wolfram article it references. It is just a side effect of its algorithm.

  11. Ken G says:

    PeterDonis said

    No, it's designed to accept any input whatever and generate response text based on the relative word frequency algorithm that Wolfram describes in his article (referenced in the Insights article). It has no idea that some input is "criticism/clarification" based on its previous responses. It has no semantics.

    Yet it gives the prompt some kind of special status, so when you correct something it said, it tends to become quite obsequious. It is not normal in human language to be that obsequious, I don't think it could get that from simply predicting the next word. When it has made a mistake and you tell it that it has, it will typically search its database a little differently, placing some kind of emphasis on your correction. If you tell it that one plus one equals three, however, it has enough contrary associations in its database that it sticks to its guns, but it will still not tell you that you have made a mistake (which surely would be the norm in its database of reactions), it will suggest you might be joking or using an alternative type of mathematics. The status it gives to the prompt must be an important element of how it works, and it is constrained to be polite in the extreme, which amounts to "riffing on" your prompt.

  12. PeterDonis says:

    russ_watters said

    it is designed to* accept criticism/clarification/correction

    No, it's designed to accept any input whatever and generate response text based on the relative word frequency algorithm that Wolfram describes in his article (referenced in the Insights article). It has no idea that some input is "criticism/clarification" based on its previous responses. It has no semantics.

  13. Ken G says:
    Also, it should perhaps be noted that this question goes way beyond whether we should shun or accept AI in the physics classroom, the question of "what does an LLM really know" goes to the heart of what this type of AI will be useful for in the future. Let us not forget IBM's painful lesson that was Watson! They thought that because it was the best Jeopardy contestant ever, it could analyze patient data and help suggest improved cancer treatments. So far that has been a dismal failure, because of the problem of connecting the machine and its capabilities with the necessary data. A human can better tell what matters and what doesn't, and can integrate disparate forms of information, whereas Watson had vastly more computing power but could not find a way to use it, unlike if it was trying to win a chess game or a Jeopardy match, games with simple rules and relatively straightforward connections between the data. To get AI to live up to its potential, we may have to first understand better what knowledge even is, and what separates it from some vast pile of disjoint information. What will that knowledge tell us about ourselves?
  14. russ_watters says:

    neobaud said

    …. If it misunderstands you, you can even clarify your meaning in a way that is really only possible with another human.

    PeterDonis said

    This seems to be vastly overstating ChatGPT's capabilities. You can "clarify" by giving it another input, but it won't have any semantic meaning to ChatGPT any more than your original input did.

    My understanding is that it is designed to* accept criticism/clarification/correction. That makes such follow-up mostly useless, since it will simply be riffing on what you tell it, regardless of whether it is accurate or not. In other words, you'll always win a debate with it, even if you're wrong.

    *Whether actual design or "emergent" behavior I don't know, but I don't think it matters.

  15. Ken G says:

    AngryBeavers said

    More than I meant to reply….heh. Basically, LLMs have strengths and weaknesses, but none are going to stun an intellectual community in any area that might be relevant.

    My question is, how much of this is due to the fact that these are just early-generation attempts, versus how much is fundamental to the way LLMs must work? If we fix up their ability to recognize logical contradictions, and enhance their ability to do mathematical logic, will we get to a point where it is very hard to distinguish their capabilities from the capabilities of the physics teachers who pose the questions in the first place? And if we did, what would that mean for our current ideas about what conceptual understanding is, since physics seems like a place where conceptual understanding plays a crucial role in achieving expertise? These kinds of AI-related questions always remind me of B. F. Skinner's great point: "The real question is not whether machines think but whether men do. The mystery which surrounds a thinking machine already surrounds a thinking man." (Or woman.)

  16. Ken G says:

    Motore said

    Well, LLMs do not have computational algorithms (yet); they deal with text pattern recognition, so I don't know why it's so surprising that they cannot do calculations.

    It's because I have seen them report the Python code they used to do the calculation, and the Python code does not yield the quantitative result they report. So that's pretty odd; they seem to be able to associate their prompts with actual Python code that is correct, and still get the answer wrong.

    Motore said

    Here is a proposal for a math extension for LLMs: https://aclanthology.org/2023.acl-industry.4.pdf

    Yes, this is the kind of thing that is needed, and is what I'm expecting will be in place in a few years, so it seems likely that ten years from now, LLMs will be able to answer physics questions fairly well, as long as they only require associating the question with a formula without conceptual analysis first. It will then be interesting to see how much LLMs have to teach us about what we do and do not comprehend about our own physics, and what physics understanding actually is. This might be pedagogically significant for our students, or something much deeper.

    Motore said

    Anyway, testing ChatGPT (or Bard) a little more, I find it useful for initial code generation, but to have a properly functioning script I found myself going to StackOverflow 70% of the time. The explanations and examples are all already there; with LLMs you have to ask a lot of questions (which means typing and waiting) and still don't get the right answers some of the time. And mind you, this is not complex code (for that, I never use LLMs), just some small scripts for everyday use.

    Then the question is, why do you not use LLMs for complex code, and will that still be true in ten years? That might be the coding equivalent of using LLMs to solve physics questions, say on a graduate level final exam.
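
    The math-extension proposal linked above is in this spirit: instead of letting the language model produce digits by word prediction, the arithmetic is handed off to an exact evaluator. A minimal sketch of such an evaluator, illustrative only and not the mechanism described in that paper:

    ```python
    import ast
    import operator as op

    # Illustrative only: an exact evaluator of the kind a "math extension"
    # could call, so that arithmetic is computed rather than predicted.
    _OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
            ast.Div: op.truediv, ast.Pow: op.pow, ast.USub: op.neg}

    def safe_eval(expr: str) -> float:
        """Exactly evaluate a plain arithmetic expression such as '2.5e3 * 1.2e-4'."""
        def _eval(node):
            if isinstance(node, ast.Expression):
                return _eval(node.body)
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                return node.value
            if isinstance(node, ast.BinOp):
                return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
            if isinstance(node, ast.UnaryOp):
                return _OPS[type(node.op)](_eval(node.operand))
            raise ValueError("not a plain arithmetic expression")
        return _eval(ast.parse(expr, mode="eval"))

    print(safe_eval("6.674e-11 * 5.97e24 / 6.371e6**2"))  # ~9.82, computed exactly
    ```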

  17. phinds says:

    AngryBeavers said

    I see. I was wondering. By what I read, it seemed the only LLM being discussed was ChatGPT, which I now understand why. I do not understand exactly why ChatGPT gets so much flack though, even still. At the end of the day, it's merely a tool.

    The issue, on this forum at least, is not actually the tool so much as people's misunderstanding of how it works and the reasons for its behavior. People give it too much credit for "intelligence" and so forth.

  18. Ken G says:
    In my rather limited experience, I found Bard was even worse than ChatGPT when it comes to mathematical reasoning that results in a quantitative answer to a physics or astronomy question. ChatGPT often uses the correct physics equation but does the calculation wrong, whereas Bard just makes the physics up completely. If there are any free LLMs that are better at these types of questions than those two, I'd like to know about them, because those two are, quite frankly, just awful if you need reliability. That said, I will say that ChatGPT can often be prompted into a correct answer if you check it carefully, whereas I found Bard to be hopeless even when corrected. Hence I do think ChatGPT can be a useful physics tool, but only when used interactively and from a position of some prior knowledge. On the other hand, it is probably correct often enough to be able to give a student a passing grade on most physics exams, though I should not think it would ever result in an A, at least in my experience with it.

    Ten years from now? Well, of course it will have improved enough to give excellent results on the kinds of exams that are currently given. Does that point to a problem with the exams, if they can be answered correctly by a language manipulation model that does not have any comprehension of its source material? Possibly, yes: it may mean that we are not asking our students the right kinds of questions if we want them to be critical thinkers and not semantic parrots.

  19. PeterDonis says:

    AngryBeavers said

    all of you are aware that there are many, many more LLMs besides ChatGPT?

    Yes, and no doubt how these models work will continue to evolve. The reason for specifically considering ChatGPT in the Insights article under discussion is that that specific one (because of its wide public accessibility) has been the subject of quite a few PF threads, and in a number of those threads it became apparent that there are common misconceptions about how ChatGPT works, which the Insights article was intended to help correct.

  20. nsaspook says:
    https://arxiv.org/abs/2308.02312
    Who Answers It Better? An In-Depth Analysis of ChatGPT and Stack Overflow Answers to Software Engineering Questions

    Over the last decade, Q&A platforms have played a crucial role in how programmers seek help online. The emergence of ChatGPT, however, is causing a shift in this pattern. Despite ChatGPT's popularity, there hasn't been a thorough investigation into the quality and usability of its responses to software engineering queries. To address this gap, we undertook a comprehensive analysis of ChatGPT's replies to 517 questions from Stack Overflow (SO). We assessed the correctness, consistency, comprehensiveness, and conciseness of these responses. Additionally, we conducted an extensive linguistic analysis and a user study to gain insights into the linguistic and human aspects of ChatGPT's answers. Our examination revealed that 52% of ChatGPT's answers contain inaccuracies and 77% are verbose. Nevertheless, users still prefer ChatGPT's responses 39.34% of the time due to their comprehensiveness and articulate language style. These findings underscore the need for meticulous error correction in ChatGPT while also raising awareness among users about the potential risks associated with seemingly accurate answers.

    Users get tricked by appearance.

    Our user study results show that users prefer ChatGPT answers 34.82% of the time. However, 77.27% of these preferences are incorrect answers. We believe this observation is worth investigating. During our study, we observed that only when the error in the ChatGPT answer is obvious, users can identify the error. However, when the error is not readily verifiable or requires external IDE or documentation, users often fail to identify the incorrectness or underestimate the degree of error in the answer. Surprisingly, even when the answer has an obvious error, 2 out of 12 participants still marked them as correct and preferred that answer. From semi-structured interviews, it is apparent that polite language, articulated and text-book style answers, comprehensiveness, and affiliation in answers make completely wrong answers seem correct. We argue that these seemingly correct-looking answers are the most fatal. They can easily trick users into thinking that they are correct, especially when they lack the expertise or means to readily verify the correctness. It is even more dangerous when a human is not involved in the generation process and generated results are automatically used elsewhere by another AI. The chain of errors will propagate and have devastating effects in these situations. With the large percentage of incorrect answers ChatGPT generates, this situation is alarming. Hence it is crucial to communicate the level of correctness to users.

  21. Ken G says:
    One thing I have noticed interacting with ChatGPT is that it is no good at all at explaining how it comes up with the answers it does. Recently I asked it a question that required mathematical reasoning (something it is really terrible at for some reason), and it did an excellent calculation that involved invoking physics equations that were not at all obvious from the prompt. So that was impressive, and it arrived at a correct expression that involved simply multiplying a chain of different terms that varied widely in magnitude. After the remarkably correct analysis it did, it got that final multiplication wrong by many orders of magnitude!

    So I told it that it was wrong, but the rest was right, so just do the multiplication again. This time it got a different answer, but still completely wrong. So I asked it to describe how it carried out the multiplication, step by step, and it couldn't do it. So I told it to carry out the multiplication of the first two numbers, report the answer, then multiply the third one, and so on, and it then did get the answer correct. Then I used Bard and it got the answer wrong also, though it could report the Python code it used, but the Python code did not give the answer Bard gave! So those AIs cannot seem to track how they arrive at their own answers, and I think that may be closely related to why their numerical results are horrendously unreliable. At some point along the way, they seem to do something that in human terms would be called "guessing", but they cannot distinguish that from any other type of analysis they do, including following Python codes.
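
    That particular failure mode is easy to guard against: once the model has produced the symbolic expression, do the chain multiplication yourself in a couple of lines of Python rather than trusting the model's arithmetic. A minimal sketch, with placeholder factors rather than the ones from the exchange described above:

    ```python
    import math

    # The factors below are placeholders standing in for whatever chain of terms
    # the model produced; the point is only that the final multiplication should
    # be done by a calculator, not by word prediction.
    factors = [1.38e-23, 3.0e2, 2.5e19]

    result = math.prod(factors)          # exact product of the stated factors
    print(f"{result:.3e}")               # 1.035e-01
    ```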

  22. PeterDonis says:

    neobaud said

    Its value is that it can interpret your question

    But it doesn't "interpret" at all. It has no semantics.

    neobaud said

    you can even clarify your meaning in a way that is really only possible with another human

    This seems to be vastly overstating ChatGPT's capabilities. You can "clarify" by giving it another input, but it won't have any semantic meaning to ChatGPT any more than your original input did.

  23. PeterDonis says:

    mattt said

    There are many differences between how biological nervous systems work, and how the computers that execute the algorithms behind large language models work.

    Yes, and a major difference, at least for the version of ChatGPT that this article discusses, is that biological nervous systems have two-way real-time connections with the outside world, whereas ChatGPT does not–it has a one-way, "frozen" connection from its training data, and that connection is much more limited than the connections biological systems have since it only consists of relative word frequencies in text.

  24. PeterDonis says:

    russ_watters said

    Is there any mathematical logic in it at all?

    Not in the version of ChatGPT that my Insights article, and the Wolfram article it references, are based on. Future versions might add something along these lines.

  25. russ_watters says:

    Swamp Thing said

    Can one break down how a blind statistical process can do that?

    Is there any mathematical logic in it at all? Obviously search engines and chatbots use word association so "big" = "mass" and from that you get a list of masses. Can it then do =? Seems messy, but I'd think by statistical analysis/word frequency it would associate a 5 digit number with "bigger" than a 4 digit number even without doing the math.

    For what it's worth, translating your question into searchenginese doesn't give a one-word answer but does give a table of masses in the 5th hit.

    I think this is a good exercise, because people so vastly over-estimate the capabilities of chat-bots. It's amazing how convincing you can make it with a large enough data set without using actual intelligence.
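
    A toy way to see the gap between that kind of surface association and actually doing the math (illustrative only):

    ```python
    # Toy contrast between a surface heuristic ("more digits means bigger",
    # roughly the association described above) and actually doing the math.
    # The heuristic looks right most of the time and fails as soon as the
    # surface pattern stops tracking the quantity.

    def looks_bigger(a: str, b: str) -> bool:
        """Surface heuristic: judge by digit count, no arithmetic."""
        return len(a) > len(b)

    def is_bigger(a: str, b: str) -> bool:
        """Actually do the math."""
        return float(a) > float(b)

    print(looks_bigger("12000", "9000"), is_bigger("12000", "9000"))  # True True
    print(looks_bigger("0.001", "900"), is_bigger("0.001", "900"))    # True False
    ```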

  26. Vanadium 50 says:

    Nugatory said

    just no way of getting away from speaking in terms of intention

    Probably. Maybe it's better to talk about "design". ChatGPT is not designed to be independent of the echo chamber. It is designed to produce echoes indistinguishable from the rest of the echo chamber.

  27. Vanadium 50 says:

    Nugatory said

    Given the way ChatGPT works, it doesn't matter.

    I was probably unclear. The training data very much matters. And who decides what training data to use or not to use?

    The only way it is "unbiased" is that it is difficult to generate output that is differently biased than the training dataset.

  28. Nugatory says:

    Vanadium 50 said

    ChatGPT attempts to create an "echo" indistinguishable from its "training echoes".

    There is just no way of getting away from speaking in terms of intention and volition, the English language won't let us. We cannot resist the temptation to say that it is "attempting" or "trying" to do something, but it is no more attempting to create an echo or anything else than my dishwasher is motivated to clean the dinner dishes.

    The entire ChatGPT phenomenon makes me think of Searle's Chinese Room thought experiment: https://en.wikipedia.org/wiki/Chinese_room

  29. Nugatory says:

    Vanadium 50 said

    Who decided what sources are reliable for training and what ones are not?

    Given the way ChatGPT works, it doesn't matter.

    Text generated to look like other text on the internet is going to be unreliable whether it is patterned on reliable text or not. For example, essentially everything you'll find about duplicate bridge on the internet will be reliable. Some stuff will be written for beginning players, some will be written by and for the relatively small community of world-class players, most will fall somewhere in between, but it's all reasonable advice. But we still get
    http://bridgewinners.com/article/view/sorry-i-know-it-is-stupid-to-post-conversations-with-chatgpt/
    http://bridgewinners.com/article/view/testing-chatgpt-the-media-hyped-ai-robot/
    http://bridgewinners.com/article/view/using-chatgpt/

  30. Vanadium 50 says:
    The issue with "neutrality" is fraught with peril. For example, in the war between Mordor and Gondor, if we only use sources from one side or the other, we might get different opinions. Most people here would say that the side of Gondor is completely truthful and the side of Mordor is nothing but propaganda and lies. But of course the Mordorian side would say differently. Who decided what sources are reliable for training and what ones are not? Ot do we toss them all in and let ChatGPT sort it out? Because then the point of view will be set by whomever can write the most.

    But this misses a larger fallacy. The question of whether ChatGPT is reliable or not does not depend on whether people are reliable or not, nor on which people are more reliable than others.

  31. PeterDonis says:

    Nugatory said

    Following is free advice, my feelings won't be hurt if you choose to ignore it… Form your opinions of the Supreme Court by reading the actual opinions (posted at https://www.supremecourt.gov/) and by joining the live blog on opinion days at scotusblog.com. We spend a lot of time complaining about pop-sci videos…. Popular press coverage of major court cases is far worse.

    I hate to have to repeat myself, but all thread participants, please note my advice in post #105.

  32. Nugatory says:

    PeroK said

    ChatGPT is at least honest.

    This is a category error, but one that is almost impossible to avoid. The English language has no natural way of talking about chatbot output so we inevitably find ourselves saying things like "it thinks" or "it knows" when of course it does no such thing – it's just arranging words to form sequences that resemble patterns already out there on the internet (and as evidence of just how hard it is to avoid this category error I just finished backspacing away the words "it has seen on the internet"). Saying that ChatGPT is honest makes no more sense than saying that the motor hauling my truck up a steep hill is "gutsy and determined to succeed" – the difference is that we know how to talk about mechanical devices in terms of their performance characteristics without attributing sentience to them.

    I would trust ChatGPT more than I would the US Supreme Court, for example.

    Following is free advice, my feelings won't be hurt if you choose to ignore it… Form your opinions of the Supreme Court by reading the actual opinions (posted at https://www.supremecourt.gov/) and by joining the live blog on opinion days at scotusblog.com. We spend a lot of time complaining about pop-sci videos…. Popular press coverage of major court cases is far worse.

  33. Vanadium 50 says:

    PeroK said

    you are not in an echo chamber

    Isn't it just the opposite? ChatGPT attempts to create an "echo" indistinguishable from its "training echoes".

  34. PeterDonis says:

    PeroK said

    With ChatGPT you are not in an echo chamber being fed a steady diet of misinformation.

    True, because "misinformation" requires intent just as much as "honesty" does, and ChatGPT has no intent. Or, to put it another way, ChatGPT is not reliably unreliable any more than it is reliably reliable. :wink:

    PeroK said

    I stumbled on a twitter feed about the COVID vaccine. Everyone on the thread believed that it was harmful. One woman was puzzled by those who willingly took the vaccine and they all agreed it must be down to "low intelligence".

    That is "gross"

    While I understand the point being made in the context of this thread, let's please not take it any further, in order to ensure that we don't derail the thread into a political discussion.

  35. PeroK says:

    russ_watters said

    This isn't exactly true (though depends on what you mean by "gross").

    In contrast with social media software, for example, whose model is to focus information based on your perceived prejudices.

    With ChatGPT you are not in an echo chamber being fed a steady diet of misinformation.

    For example, I stumbled on a twitter feed about the COVID vaccine. Everyone on the thread believed that it was harmful. One woman was puzzled by those who willingly took the vaccine and they all agreed it must be down to "low intelligence".

    That is "gross"; and your examples of ChatGPT bias pale by comparison.

  36. russ_watters says:

    PeroK said

    Chat GPT doesn't have gross political, religious or philosophical prejudices.

    This isn't exactly true (though depends on what you mean by "gross"). It has guardrails designed to constrain content, which reflect the biases of the programmers. For example, a few months ago someone asked it for religious jokes and while it was OK with Christian jokes it declined to provide Islamic jokes. I think this bias has since been corrected.

    It is also biased by its programmers' choice of source information. For example, the user base of Reddit has a lot more say in the generated output than the membership of AARP.

  37. PeroK says:

    Motore said

    Not even that, it just predicts words. It doesn't care if the sentence it makes actually describes anything real. It cannot.
    An example:
    Q: How long does a ferry ride from Istanbul to Trieste take?
    ChatGPT:
    A direct ferry ride from Istanbul to Trieste is not available, as these two cities are located in different countries and are quite far apart. Istanbul is in Turkey, while Trieste is in northeastern Italy.

    To travel between Istanbul and Trieste, you would need to consider alternative transportation options such as flights, trains, or buses…

    Of course, there is a route from Istanbul to Trieste (at least that's what google tells me).

    Sure more data, more parameters will make it better, but it's still not reliable.

    You may be right and it'll die a death. I'm not so sure. The reasons for the adoption of technology are often social and cultural, rather than technical.

    In fact, there is evidence it's already taken off.

  38. Vanadium 50 says:

    PeroK said

    There's no comparison.

    Didn't I just make one? :smile:

    PeroK said

    Chat GPT, however imperfectly, is working on a global pool of human knowledge.

    Actually, it is working on a pool of human writing.

    The idea is that writing is a good enough proxy for knowledge and that word frequency distributions* are a good enough proxy for understanding. The thread as well as some past ones highlight many cases where this does not work.

    FWIW, I think ChatGPT could write horoscopes as well as the "professionals". But probably not write prescriptions.

    * But not letter frequency distributions, which we had 40 years ago doing much the same thing. That would just be crazy talk.

  39. PeterDonis says:

    PeroK said

    ChatGPT is at least honest.

    No, it's not. "Honest" requires intent. ChatGPT has no intent.

    PeroK said

    I would trust ChatGPT more than I would the US Supreme Court, for example.

    I don't see how you would even compare the two. The US Supreme Court issues rulings that say what the law is. You don't "trust" or "not trust" the US Supreme Court. You either abide by its rulings or you get thrown in jail.

  40. PeroK says:

    PeterDonis said

    So what? That doesn't make ChatGPT good enough to rely on.

    People already rely on a steady diet of lies and misinformation from human sources. ChatGPT is at least honest. I would trust ChatGPT more than I would the US Supreme Court, for example.

  41. PeroK says:
    So I tried it, but it was noncommittal!

    Should we climb the Moine Ridge on Thursday this week?

    To make an informed decision about climbing the Moine Ridge on Thursday, I recommend checking weather forecasts, consulting with experienced climbers or local mountaineering authorities, and assessing your own skills and experience. Additionally, consider factors such as trail conditions, safety equipment, and the overall fitness and preparedness of your climbing team.
    Mountain environments can be unpredictable and potentially dangerous, so it's essential to prioritize safety and make well-informed decisions.

  42. PeterDonis says:

    PeroK said

    People can do what they want.

    Sure, but now you're backing away from your previous claim. People are free to choose to do stupid things, of course; but previously you were saying that relying on ChatGPT for practical information was not stupid. Now you're back-pedaling and saying, well, yes, it is stupid, just like relying on astrology, but there will always be people who choose to do stupid things.

  43. PeterDonis says:

    PeroK said

    Chat GPT, however imperfectly, is working on a global pool of human knowledge.

    No, it's working on a global pool of text. That's not the same as "knowledge". ChatGPT has no information about the connection of any of the text in its training data with the actual world. It doesn't even make use of the text in itself; it only makes use of the relative word frequencies in the text.

    PeroK said

    ten years from now, who knows how many people will be using it or its competitors as their mentor?

    Not in its current form. The next obvious step in the evolution of such models–connecting them to actual real world data–is already being taken, at the very least with the paid version of ChatGPT (mentioned in earlier posts), which includes actual lookups in various data sources (web search, for one, and for another, ironically, Wolfram Alpha) for generating responses. In other words, to do the key things that the current free version, which is what this Insights article discussed, does not. Ten years from now, I expect that further steps along those lines will have been taken and will have made these tools reliable in a way that the current ChatGPT is not.
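
    Schematically, that step changes the pipeline from pure text generation to "generate, but consult a real data source when the question calls for one." A rough sketch of the pattern, with invented placeholder functions rather than the actual plugin or search interface:

    ```python
    # Rough sketch of the lookup pattern described above, with invented
    # placeholder functions; this is not the actual plugin or search interface.

    def needs_lookup(question: str) -> bool:
        """Placeholder: decide whether the question asks for factual data."""
        return any(w in question.lower() for w in ("distance", "population", "mass"))

    def external_lookup(question: str) -> str:
        """Placeholder standing in for a web search or Wolfram Alpha query."""
        return "distance New York to Los Angeles: about 3,940 km (2,450 mi)"

    def generate_from_words(prompt: str) -> str:
        """Placeholder for the bare language model: text in, text out."""
        return f"[model-generated prose based on: {prompt!r}]"

    def answer(question: str) -> str:
        if needs_lookup(question):
            # The model writes prose *around* retrieved data instead of inventing
            # the figures from word statistics alone.
            return generate_from_words(question + "\n" + external_lookup(question))
        return generate_from_words(question)

    print(answer("What is the distance from New York to Los Angeles?"))
    ```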

  44. PeroK says:

    PeterDonis said

    Doesn't that contradict your previous claim here?

    If you're not willing to do this yourself, on what basis do you justify saying that someone else could do it?

    People can do what they want. It's an option, for sure. In fact, we've seen some evidence on here that significant numbers of people are using it to learn about physics.

    If some people choose to live by astrological charts, then others can choose to live by ChatGPT. I choose to do neither. For the time being.

  45. PeroK says:

    Vanadium 50 said

    Could you make the same argument for astrology? Yesterday it told me to talk to a loved one and it worked!

    There's no comparison. Chat GPT, however imperfectly, is working on a global pool of human knowledge. There's a rationale that it's trying to produce an unprejudiced, balanced answer.

    Perhaps it will fail to develop. But, ten years from now, who knows how many people will be using it or its competitors as their mentor?

  46. PeterDonis says:

    PeroK said

    I don't personally intend to, no.

    Doesn't that contradict your previous claim here?

    PeroK said

    In a practical sense, you could live according to what answers ChatGPT gives you.

    If you're not willing to do this yourself, on what basis do you justify saying that someone else could do it?

