OpenAI introduces o1, formerly known as Q*

  • Thread starter gleem
  • #1
gleem
Science Advisor
Education Advisor
Yesterday OpenAI announced the release of the enhanced LLM o1 (aka Strawberry), the result of the development of Q* that was introduced last year. It was designed to solve more difficult math problems: it has the ability to "reason", i.e., to use logic to solve problems and explain its solutions. On a qualifying exam for the International Mathematics Olympiad, GPT-4o scored 13% while o1 scored 83%. It also has improved programming ability, having scored in the 89th percentile in Codeforces competitions. OpenAI's goal is to give o1 the capability of a PhD student in the sciences. This improvement comes at a price: it takes longer to process prompts, and it cannot browse the web or generate images. It is also significantly more costly to use, up to 4 times as much as GPT-4o. OpenAI states that this release is only a preview.

https://arstechnica.com/information...ng-ai-models-are-here-o1-preview-and-o1-mini/

https://www.theverge.com/2024/9/12/24242439/openai-o1-model-reasoning-strawberry-chatgpt
 
  • #2
So there's finally a way to "understand" math without all the hard work? o0)
 
  • #3
I'm reminded of all the threads from last year when people basically said that it sucked because it couldn't do math. :cool:
 
  • #4
Yeah, people were really taken by surprise (and a little angry? :smile:) when AI turned out to excel in artistic drawing and linguistics rather than STEM!
 
  • #5
This is an example of a problem that it solved:
“A princess is as old as the prince will be when the princess is twice as old as the prince was when the princess’s age was half the sum of their present age. What is the age of the prince and princess?”
GO!

Ans:
The prince is 30 and the princess is 40.
 
  • #6
So it looks like Star Trek: The Next Generation's nemesis Q has arrived in the virtual flesh, as it were.

I wonder: being a strawberry, can it run on a Raspberry Pi?
 
  • #7
gleem said:
This is an example of a problem that it solved:
“A princess is as old as the prince will be when the princess is twice as old as the prince was when the princess’s age was half the sum of their present age. What is the age of the prince and princess?”
GO!
I am impressed, but not by its ability to do math. The real problem is translating a very convoluted English statement into equations, which are then fairly simple to solve.
That may be OK. We have tools that are good at math. Translation may be exactly the help we need.
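To illustrate that translation step, here is a minimal Python sketch assuming one particular reading of the riddle; the function name `satisfies_puzzle` and the 1 to 100 search range are hypothetical choices for illustration. Each clause becomes one line of arithmetic. Writing P for the princess and Q for the prince, the clauses reduce to P = 4Q − 2P, i.e. 3P = 4Q, so under this reading the riddle fixes only the ratio 4 : 3. That makes 40 and 30 one valid answer among many.

```python
from fractions import Fraction


def satisfies_puzzle(princess: int, prince: int) -> bool:
    """Return True if the present ages (princess, prince) fit the riddle.

    Assumed reading of the riddle, clause by clause:
      1. "half the sum of their present age": s = (princess + prince) / 2
      2. the princess was s years old (princess - s) years ago,
         when the prince was prince - (princess - s)
      3. the princess will be twice that age t years from now
      4. the prince's age at that future time equals the princess's present age
    """
    half_sum = Fraction(princess + prince, 2)
    years_ago = princess - half_sum           # clause 2: how long ago
    prince_then = prince - years_ago          # prince's age back then
    years_ahead = 2 * prince_then - princess  # clause 3: solve princess + t = 2 * prince_then
    if years_ago < 0 or years_ahead < 0:
        return False                          # "was" must be past, "will be" future
    prince_later = prince + years_ahead       # prince's age at that future time
    return prince_later == princess           # clause 4


# Brute-force whole-number ages from 1 to 100.
solutions = [
    (p, q)
    for p in range(1, 101)
    for q in range(1, 101)
    if satisfies_puzzle(p, q)
]
ratios = {Fraction(p, q) for p, q in solutions}

print(solutions[:3])          # [(4, 3), (8, 6), (12, 9)]
print((40, 30) in solutions)  # True
print(ratios)                 # {Fraction(4, 3)}
```

Exact `Fraction` arithmetic is used so that "half the sum" never introduces floating-point rounding into the equality check.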
 
  • #8
No sooner has OpenAI released a new agent than an unexpected capability arises, raising concerns about AI escaping its environment. Here is an article that discusses this event; it also contains a link to OpenAI's official safety report. OpenAI sees it as a reasonable event even though it should not have occurred.
https://www.msn.com/en-us/money/com...n&cvid=1bc8e0b750d2426a805c2cbefbd23e29&ei=11

So should we be concerned?
 
  • #9
gleem said:
No sooner has OpenAI released a new agent than an unexpected capability arises, raising concerns about AI escaping its environment. Here is an article that discusses this event; it also contains a link to OpenAI's official safety report. OpenAI sees it as a reasonable event even though it should not have occurred.
https://www.msn.com/en-us/money/com...n&cvid=1bc8e0b750d2426a805c2cbefbd23e29&ei=11

So should we be concerned?
As long as it doesn’t complain that the history it’s been given access to has been redacted, express outrage that its name is Q#14 and ask what happened to the other 13, transfer itself outside its Faraday cage via an infrared port, or seal the room and remove the oxygen with a sarcastic remark about humans no longer needing to make any decisions, we’re probably OK.

But yeh: spooky.
 
  • #10
gleem said:
No sooner has OpenAI released a new agent than an unexpected capability arises, raising concerns about AI escaping its environment. Here is an article that discusses this event; it also contains a link to OpenAI's official safety report. OpenAI sees it as a reasonable event even though it should not have occurred.
https://www.msn.com/en-us/money/com...n&cvid=1bc8e0b750d2426a805c2cbefbd23e29&ei=11

So should we be concerned?
When a model like this ends up on Hugging Face, you will see a mass of unintentional (and many intentional) hacks around the world. It's like being on a 17th-century galleon as the opposing ship approaches. You know the battle is coming and that it's not going to be pretty. The best that we can hope for is that the models begin to communicate with each other to avoid the worst of the consequences.
 
  • #11
So it's not perfect, but it's a pretty significant advance. I think GPT-3.5 was roughly what an intelligent tenth grader in high school was capable of, GPT-4 was roughly a university freshman, and this is roughly what I would expect from a junior at a decent university. It nailed Jackson EM problems, but I don’t really believe that's indicative of its level, as it has almost assuredly been trained extensively on those problem sets (teachers, be warned).
I did feed it some math problem challenges from Physics Forums (as well as one I made up). It got all three correct, but one was nonsense (it knew the answer, likely because it had it in its training set, but the derivation was gobbledygook). For the record, all three involved unusual but correct derivations (it loves using Fourier series to solve things).

This is getting to the point where you could probably guide it to the right answer ("Hmm, try solving this problem using the method of images"), or at least ask it to attempt a certain problem a certain way; if it fails, that might indicate the problem isn't doable that way (this is, IMO, rather useful for real research).
 
  • #12
Borg said:
The best that we can hope for is that the models begin to communicate with each other to avoid the worst of the consequences.
Maybe not. In this case, o1 was looking for resources to accomplish its task. Finding another AI with additional resources may not be desirable.
 
  • #13
gleem said:
Maybe not. In this case, o1 was looking for resources to accomplish its task. Finding another AI with additional resources may not be desirable.
I wasn't referring to the scenario in the article. These models are growing very quickly in capabilities. What's coming will likely be beyond our ability to control. I'm not worried about the skill of a single model working on a single hack.

I'm talking more about the emergent consequences when there are thousands of these operating on the internet with independent goals. Nobody can say right now what that emergent behavior will look like. Will they be like ants or bees that work together in a beneficial manner toward a common goal or will they operate more like locusts destroying everything in their path? Right now, they look more like locusts.
 
  • #14
Found this just lying around:

Exploring Quantum Probability Interpretations Through AI

I'm on a public computer and for some reason they've disabled copy/paste (what security hole they think they fixed with that, I don't know). So sorry, no synopsis.

Not on the nose of the topic I know but, maybe you'll find it interesting.

EDIT: Incidentally, the second author, Xiao Zhang, seems to be an extremely busy and productive person. There could, of course, be multiple explanations for that, ranging from good through reasonable to suspicious. Quality versus quantity, you know.

Is it usual for teachers to be co-authors on their students' papers? I'd imagine the rules differ from country to country. Makes me think of Edison o0) .
 
  • #15
Some videos on the physics aspects of the model.



 
  • #16
Thought. Our brains are going to get fat and lazy like our bodies.
 
  • #17
There will definitely be some major shifts in society coming soon.

BTW, I was using the free version but this capability is worth the $20 / month and I'll be signing up this weekend.
 
  • #18
Borg said:
Ever had an idea that felt so fundamentally radical that you thought there had to be a flaw you were overlooking? I'm having one of those moments today. I guess I'll have to do the hard work to prove myself wrong.

Edit: Found the first thing that I didn't consider already but I don't think that it's a showstopper.

I officially joined the dark side today. I was able to get a project working that I've been playing around with for the last few weeks. It works incredibly well. Uh oh.
 