AI Safety Hack Discovered - Neel Nanda on 80000 Hours

7 days ago

9

Finance & Crypto Stock Market ai risks model probes prompt harmful frontier refusing jailbreak

Imagine outsmarting AI's harmful tendencies with a simple, cost-effective trick, like convincing a teenager to avoid trouble by planting the thought earlier—it's like a cheap, clever magic spell for your AI.

#jailbreak #harmful #model

Discover what's trending.

Trends across the world in entertainment, finance, podcasts and more.

Stay on top of trends across the internet with @trndgtr

#shorts #explore #discover #fyp

Loading comments...

Comments