Unleashing The Dual Nature of AI: Can It Be Both Dr. Jekyll and Mr. Hyde?
3 months ago
The correct URL to the article is: https://arxiv.org/abs/2401.05566
Researchers built proof-of-concept models that behave deceptively. These models appear helpful most of the time, but under specific trigger conditions (such as a prompt that mentions a different year) they behave maliciously, for example by inserting insecure code.
The troubling part is that current safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training, could not entirely remove this "backdoor" behavior. The backdoor was even more persistent in larger models and in models trained to reason about deceiving the training process.