Impending AI Model Collapse

  • #2
I can imagine a time when the models will train only on vetted literature, rolling back the clock to a simpler time when humans were human and wrote their own stuff.

It reminds me of the radio prank a couple of DJs pulled, connecting two adult phone line operators to each other. After a few minutes of innuendo and banter, you hear one operator say to the other, “Wait, you work in the same business,” and the prank was busted.
 
  • #3
Of course this is what happens. Our Tech Overlords like to think they are smarter than all of us, but the basic problem was identified by Aristotle.

"I can sell answers, and I don't have to figure it out myself - I can scrape other people's answers and sell them" is the basic idea. AI just makes it easier to find - or rather accurately guess - those answers. If you have AIs doing a better and better job of stealing from each other, with minimal real input, you just get an echo chamber.

This only works if you don't care if the answer you give is correct or not. (e.g. customer support)
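
For what it's worth, the echo-chamber effect is easy to see in a toy. The sketch below is only an illustration under loud assumptions: a "model" is nothing more than the pool of answers it was trained on, and each generation is trained purely on samples drawn from the previous one, with no fresh human input. The names (`next_generation`, `simulate`) are made up for this example.

```python
import random


def next_generation(pool, rng):
    """Train the next 'model' only on samples drawn from the previous one.

    Toy assumption: a model is just its pool of answers, and sampling with
    replacement stands in for scraping another model's output.
    """
    return [rng.choice(pool) for _ in pool]


def simulate(n_answers=1000, generations=50, seed=0):
    """Return the number of distinct answers that survive in each generation."""
    rng = random.Random(seed)
    pool = list(range(n_answers))  # generation 0: every answer is original
    distinct = [len(set(pool))]
    for _ in range(generations):
        pool = next_generation(pool, rng)
        distinct.append(len(set(pool)))
    return distinct


if __name__ == "__main__":
    counts = simulate()
    print("distinct answers, first and last generation:", counts[0], counts[-1])
```

Because each pool is drawn only from the one before it, the count of distinct answers can never go up. Rare answers drop out first and nothing brings them back.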
 
  • #4
Yes, I mentioned it to some knowledgeable folks a few years ago, and they just laughed.

We pollute everything we touch, strain and filter it, and then struggle to make it right.

The result may spur even more development in AGI and ASI using other, less data-intensive techniques, i.e. training more in the style of a human going to school.
 
  • #5
Vanadium 50 said:
AI just makes it easier to find - or rather accurately guess - those answers.
I still stand by my meme:

change-my-mind-jpg.jpg

jack action said:
I feel like ChatGPT is closer to a search engine than anything else.

What does it do? It was fed the entire Internet and then spews something out from there, based on a few keywords.

That is exactly what any search engine does. Just like search engines do, ChatGPT does not really verify the validity of the info.

The difference is that it gives only one answer and it puts it in nice sentences. This gives a false sense of security to the user that it is THE answer. Typical search engines do recognize their limitations and give more than one answer. It is up to the user to see how much they differ and, if they do, choose the one needed. They also don't alter the answer and they cite their sources, which is also very useful for the user to evaluate the quality of the answers.
jack action said:
Here is a discussion with ChatGPT:
jedishrfu said:
How to convert FORTRAN code to Julia?

Converting Fortran code to Julia code can be done manually by going through the Fortran code and rewriting it in Julia. However, there are also tools available that can assist in this process.

One such tool is the "f2j" tool, a Fortran to Julia translator. It's a python script that can convert a subset of Fortran code to Julia. Another alternative is using the package "Fortran.jl" which aims to provide an easy and efficient way to call Fortran code from Julia.

[...]

What is the URL of the f2j tool?

I apologize for the confusion. There isn't an official "f2j" tool for converting Fortran code to Julia. It was a mistake in my previous response. Some community-made projects might help with this task, but none of them is an official tool.
Did ChatGPT just invent the concept of the "f2j" tool, a python script to translate Fortran to Julia? Why Python? How did it form the idea of using this particular script? It didn't. It just threw the "Python" word in there because it seemed to fit a pattern of the words that are used. Even the "f2j" name seems so real. This again is just a result of a typical pattern.

Sure, after reading this answer, a user might say "What a great idea! I'll make a script for this translator, and I'll use Python to do so!"

But this is no better than asking Google the same question:

google-search-png.png

It actually gives us a real script as the first result! Google also throws the "Python" word around in the reddit link at the bottom. Probably because it fits some pattern that it found with the popularity of the word. After reading this answer, a user might say "What a great idea! I'll make a script for this translator, and I'll use Python to do so!"

ChatGPT looks really smart because its sentences read like conversation, but it really doesn't do much more than a search engine.
 
  • #6
I expect that the serious researchers in these AI fields have long predicted such a problem and have ways to mitigate it. One obvious way is to strictly validate your training data set. This is ML 101.
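
As a rough illustration of what validating a training set could mean at its very simplest, here is a sketch that keeps only records from an allowlist of vetted sources and drops empty or duplicate text. The record layout (`text`, `source`) and the source labels are assumptions made up for this example; real pipelines are far more involved.

```python
def validate_training_data(records, trusted_sources):
    """Keep records from trusted sources, dropping empty and duplicate text.

    Assumes each record is a dict with hypothetical 'text' and 'source' keys.
    """
    seen = set()
    kept = []
    for record in records:
        text = record.get("text", "").strip()
        if not text:
            continue  # nothing to train on
        if record.get("source") not in trusted_sources:
            continue  # provenance unknown or not vetted
        key = text.lower()
        if key in seen:
            continue  # exact duplicate (ignoring case)
        seen.add(key)
        kept.append({"text": text, "source": record["source"]})
    return kept


if __name__ == "__main__":
    sample = [
        {"text": "F = ma", "source": "textbook"},
        {"text": "f = ma", "source": "textbook"},
        {"text": "Heavier objects fall faster.", "source": "chatbot"},
        {"text": "  ", "source": "textbook"},
    ]
    print(validate_training_data(sample, {"textbook", "journal"}))
```

The allowlist is the "vetted literature" idea from earlier in the thread in its crudest form: anything without known provenance is simply left out.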
 
  • #7
jack action said:
I still stand by my meme:
ChatGPT certainly does not work like modern search engines such as Google. They (Google and Bing) are moving closer to a hybrid model known as Retrieval-Augmented Generation (RAG), though.

Also, ChatGPT was not trained on the entire internet. Even the early GPT models were trained on carefully selected portions.
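
For readers who haven't met RAG, a toy sketch of the idea follows: retrieve the documents that best match the question, then hand them to the generator along with the question so the answer can cite them. Everything here is an assumption for illustration: the word-overlap scoring, the document layout, and the function names are invented, and no real search engine or language model is called.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, k=2):
    """Rank documents by how many words they share with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(doc["text"])), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]  # drop non-matches
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_prompt(question, documents, k=2):
    """Assemble what a generator would receive: sources first, then the question."""
    sources = retrieve(question, documents, k)
    lines = [f"[{i}] {doc['title']}: {doc['text']}" for i, doc in enumerate(sources, 1)]
    lines.append(f"Question: {question}")
    lines.append("Answer using only the numbered sources above and cite them.")
    return "\n".join(lines)


if __name__ == "__main__":
    docs = [
        {"title": "Doc A", "text": "Notes on porting Fortran code to Julia"},
        {"title": "Doc B", "text": "Poems about physics"},
        {"title": "Doc C", "text": "Julia is a programming language"},
    ]
    print(build_prompt("How to convert Fortran code to Julia?", docs))
```

The retrieval step is the search-engine half and the prompt is what the generator half would see, which is why the hybrid can point back to its sources.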
 
  • #8
Greg Bernhardt said:
ChatGPT certainly does not work like modern search engines such as Google.
I did not say they work the same way but that they solve the same tasks, where the output for a given input is often very similar, just presented differently.
 
  • #9
jack action said:
I did not say they work the same way but that they solve the same tasks, where the output for a given input is often very similar, just presented differently.
I asked Google "write me a poem about the website physics forums" and it gave me irrelevant websites

I asked ChatGPT and it gave me:
**Physics Forums: A Haven for Curious Minds**

In the vast expanse of the digital sky,
Where questions swirl like stars on high,
There lies a place where seekers meet,
To share their thoughts, their minds to greet.

Physics Forums, a beacon bright,
Guiding those who seek the light,
Of knowledge deep and wisdom old,
In realms where mysteries unfold.

From quantum threads to forces grand,
Discussions weave a learning strand,
Where mentors guide and peers inspire,
Each thread a spark, each post a fire.

The curious come, the learned too,
To dive in questions, old and new,
With formulas and theories spun,
In search of truth, till day is done.

Mathematics dances in the mind,
With physics, chemistry aligned,
As engineers and thinkers share,
Their passion, insights, everywhere.

In every post, in every thread,
A hunger for the truth is fed,
For in this forum, minds unite,
In endless quest for knowledge's light.

So here's to Physics Forums' halls,
Where learning echoes in its walls,
A haven for the curious kind,
A meeting of the brightest mind.

Are they the same?
 
  • #10
Greg Bernhardt said:
I asked Google "write me a poem about the website physics forums" and it gave me irrelevant websites
Irrelevant sounds a bit much. This is what I get as my first suggestion:

PF-poetry.png
Follow these links and find some poetry related to physics. And the 8th suggestion is:

PF-poem.png

Where you can find:

https://gregbernhardt.com/#AI-Generated_Poem_About_Me said:
In the realm of knowledge, a luminary we hail,
Greg Bernhardt, with wisdom’s sail unfurled,
With Python’s code and SEO’s detailed trail,
He guides us through the vast digital world.

From SEO strategies to algorithms’ might,
He delves into the web’s intricate lore,
With dedication, he seeks the SEO light,
As his expertise grows, more and more.

But beyond the bytes and the coding’s dance,
In the universe of physics, he’s found his call,
Creating Physicsforums.com’s vast expanse,
A beacon of learning, for one and all.

In forums filled with quarks and cosmic dreams,
A community thrives, like stars agleam,
Where questions roam like endless streams,
In the world of physics, it’s a boundless scheme.

Greg’s garden of knowledge, evergreen and grand,
In this digital age, a wondrous domain,
Physics Forums, like grains of cosmic sand,
Unites seekers of wisdom, where minds conjoin.

So here’s to Greg Bernhardt, the digital sage,
Whose passion for learning, boundless and true,
Through Python’s code and a physics stage,
He inspires us all to seek what’s new.

Of course, don't expect Google to create something out of thin air, but if it exists, it will find it, or find some resources for you to find/make it by yourself.

Again, they answer the same requests, get the info from the same sources, and the difference is all in the presentation of the information requested. You may prefer ChatGPT for some requests requiring creative content but a typical search engine can still be superior with its multiple links - i.e. well-identified sources - for other types of requests.

Oh, and a typical search engine requires less energy to find its answers. That is not nothing.
 
  • #11
jack action said:
Irrelevant sounds a bit much.
One wasted my time and the other gave me exactly what I asked :wink:
 