How MIT Is Teaching AI to Avoid Toxic Mistakes
MIT’s novel machine-learning method for AI safety testing uses curiosity to elicit a broader and more effective range of toxic responses from chatbots, outperforming previous red-teaming efforts.
A user could ask ChatGPT to write a computer program or summarize an article, and the AI chatbot would likely be able to generate useful code or write a cogent synopsis. However, someone could also ask for instructions to build a bomb, and the chatbot might be able to provide those, too.
To prevent this and other safety issues, companies that build large language models typically safeguard them using a process called red-teaming. Teams of human testers write prompts aimed at triggering unsafe or toxic text from the model being tested. These prompts are used to teach the chatbot to avoid such responses.