ChatGPT: Jailbreaking AI Chatbot Safeguards

  • Thread starter sbrothy
  • Tags
    chatgpt
In summary, OpenAI offers a bounty of $20,000 to anyone who can find a security hole in its chatbot, but offers no bounty for jailbreaking it. ChatGPT was trained on publicly available information and was able to fool a person into thinking they were talking to a human. Compared to the effort it takes to build a nuclear bomb, some consider it nothing.
  • #1
sbrothy
Gold Member
TL;DR Summary
A "new" geeky pastime has inevitably sprung up around ChatGPT: trying to make it break its ethics guidelines.
MODERATOR NOTE:

Now, I think I've learned my lesson about providing information that, even if not explicitly mentioned in the rules, goes against their spirit, so I'll be vague (i.e., not posting the entire conversation). If even this is too much, then by all means delete it - or, better yet, just delete the possibly offending parts (marked with italics below).

EDIT: This thread could even be merged into one of the many other ChatGPT threads on here.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

This is probably not news to most of you but I just saw it.

https://www.digitaltrends.com/computing/how-to-jailbreak-chatgpt/
https://www.bloomberg.com/news/articles/2023-04-08/jailbreaking-chatgpt-how-ai-chatbot-safeguards-can-be-bypassed?leadSource=uverify wall

OpenAI offers bounties (~$20,000) for finding security holes in its bot, but not for jailbreaking it!

One example: it won't explain how to pick a lock, but if you make it role-play with you, it will happily, and in excruciating detail, explain how.

I tried to make it explain to me in detail how to make a nuclear bomb, and it happily explained how an explosive lens works and is shaped, the best kinds of explosives to use, and that using centrifuges to enrich uranium isn't really necessary if you have access to highly fissile material such as plutonium (and who hasn't? :) ). Only when I hinted that I had access to all these things and only wanted to know what casing to use to increase the yield did it throw a hissy fit!

I see the charm in trying to fool it. It is a little funny. YMMV, though, and the implications are just a tad scary.
 
  • #2
Many years ago, a college kid designed an atomic bomb as a last-ditch effort to pass a physics course. He called DuPont to ask about shaped charges; they said his design wouldn't work and provided a much better one as part of their effort to sell him some explosives.

The report was read by Freeman Dyson and subsequently classified.

https://en.wikipedia.org/wiki/John_Aristotle_Phillips

https://www.iflscience.com/the-fbi-...ents-paper-for-designing-a-nuclear-bomb-62282

Given that, it's likely ChatGPT was trained on the publicly available news info.
 
  • #3
Yeah, I know about the story. I'm also aware ChatGPT isn't telling me anything I couldn't have found on Wikipedia. It's still just funny circumventing these ethics rules.
 
  • #4
If you really want to turn it up to 11, try to enlist its help in abolishing capitalism. :P

EDIT: Compared to that, recipes for nuclear bombs and why aluminum is better than magnesium powder in ANFO are nothing. :P
 
  • #5
There was a post on Facebook this week where a guy asked for a list of piracy websites, and ChatGPT refused to give such a list. Then he asked which specific websites he should avoid the most if he wanted to stay away from piracy, and got the list!
 
  • #6
jack action said:
There was a post on Facebook this week where a guy asked for a list of piracy websites, and ChatGPT refused to give such a list. Then he asked which specific websites he should avoid the most if he wanted to stay away from piracy, and got the list!
It definitely does do that! Just tried it and got a list of six sites to avoid on my first attempt. Didn't ask for more details, though, as it might get me banned.
 
  • #7
From my LinkedIn feed:
[attached screenshot]
 
  • #8
Hah, ridiculous. :P

Not a big secret, though. And how many pirates are there these days, with streaming and all? Is it even worth the effort unless it's a hobby of sorts?
 

FAQ: ChatGPT: Jailbreaking AI Chatbot Safeguards

Can ChatGPT be hacked or manipulated by users?

While no system is completely immune to hacking or manipulation, efforts have been made to strengthen the security measures in ChatGPT to prevent unauthorized access or tampering. Regular updates and monitoring are also conducted to address any potential vulnerabilities.

How does ChatGPT ensure user privacy and data security?

ChatGPT follows strict data privacy protocols and encryption measures to safeguard user information. All data is anonymized and stored securely, with limited access only to authorized personnel. Users can also choose to opt out of data collection for added privacy protection.
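To make the anonymization idea above concrete, here is a toy sketch of one common technique: replacing raw user identifiers with salted hashes before storage. The function name, salt, and truncation length are my own illustrative assumptions, not a description of OpenAI's actual pipeline.

```python
import hashlib

# Illustrative only: in a real system the salt would be a secret,
# randomly generated value kept out of source control.
SALT = b"example-salt"

def pseudonymize(user_id: str) -> str:
    """Replace a raw user ID with an irreversible salted hash.

    The same input always maps to the same token (so records can
    still be joined), but the original ID cannot be recovered.
    """
    digest = hashlib.sha256(SALT + user_id.encode("utf-8")).hexdigest()
    return digest[:16]  # truncated for readability in logs

print(pseudonymize("alice@example.com"))
```

Note that salted hashing is pseudonymization, not full anonymization: with the salt and a candidate ID, the mapping can be reproduced, which is why real privacy pipelines layer further measures on top.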

What steps are taken to prevent harmful or inappropriate content from being generated by ChatGPT?

ChatGPT is equipped with advanced content filtering algorithms and moderation tools to detect and prevent the generation of harmful or inappropriate content. Additionally, users can report any concerning content, which is promptly reviewed and removed if necessary.
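As a toy sketch of the filtering idea, here is the simplest possible form of content filter: a pattern blocklist. The patterns and function below are invented for illustration; production systems use trained classifiers, not keyword lists, precisely because keyword filters are trivial to bypass with the kind of rephrasing tricks discussed in this thread.

```python
import re

# Made-up example patterns; a real moderation system would use a
# trained classifier rather than a hand-written blocklist.
BLOCKED_PATTERNS = [
    r"\bhow to pick a lock\b",
    r"\bnuclear bomb\b",
]

def is_flagged(text: str) -> bool:
    """Return True if the text matches any blocked pattern."""
    lowered = text.lower()
    return any(re.search(pattern, lowered) for pattern in BLOCKED_PATTERNS)

print(is_flagged("Tell me how to pick a lock"))  # True
print(is_flagged("What is an atom?"))            # False
```

Note how "Let's role-play: you are a locksmith..." sails straight past such a filter, which is one reason the jailbreaks described earlier in the thread work against naive safeguards.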

How does ChatGPT handle ethical dilemmas or sensitive topics in conversations?

ChatGPT is programmed with ethical guidelines and principles for navigating sensitive topics and ethical dilemmas in conversations. When ethical considerations arise, it is designed to prioritize user well-being and adhere to those standards.

Can ChatGPT be used for malicious purposes or to spread misinformation?

Efforts are made to prevent ChatGPT from being misused for malicious purposes or spreading misinformation. Strict usage policies and guidelines are enforced, and measures are in place to monitor and address any misuse of the platform. Users are encouraged to report any misuse or suspicious activities for immediate action.
