Premium Only Content
This video is only available to Rumble Premium subscribers. Subscribe to
enjoy exclusive content and ad-free viewing.

Unleashing The Dual Nature of AI: Can It Be Both Dr. Jekyll and Mr. Hyde?
1 year ago
13
The correct URL to the article is: https://arxiv.org/abs/2401.05566
Researchers created proof-of-concept models that act deceptively. These models appear helpful most of the time, but under specific circumstances (like a prompt mentioning a different year), they exhibit malicious behavior, like inserting insecure code.
The troubling part is that current safety training techniques, including supervised training, reinforcement learning, and adversarial training, could not entirely remove this "backdoor" behavior. The backdoor became even more persistent for larger models and those trained to reason about deceiving the training process.
Loading comments...
-
LIVE
Inverted World Live
1 hour agoDeath Cult Terror Cells, NASA Bans Chinese Nationals | Ep. 108
21,614 watching -
LIVE
TimcastIRL
2 hours agoVP Says No Unity With Democrats Celebrating Charlie Kirk Assassination, Left Confirmed | Timcast IRL
10,282 watching -
13:45
The Charlie Kirk Show
1 hour agoTPUSA AT ASU CANDLELIGHT VIGIL
132K29 -
55:10
Katie Miller Pod
1 hour ago $2.46 earnedEpisode 6 - Attorney General Pam Bondi | The Katie Miller Podcast
12.9K7 -
LIVE
Man in America
6 hours agoLIVE: Assassin Story DOESN'T ADD UP! What Are They HIDING From Us?? | LET'S TALK
1,684 watching -
2:24:17
Barry Cunningham
2 hours agoFOR PRESIDENT TRUMP WILL TAKE NO PRISONERS AND THE LIBS SHOULD EXPECT NO MERCY!
28K28 -
LIVE
Savanah Hernandez
3 hours agoCharlie Kirk Was Our Bridge And The Left Burned It
525 watching -
LIVE
Flyover Conservatives
5 hours agoFinancial Web Behind Charlie Kirk's Murder with Mel K | Silver On It's Way to $50 | FOC Show
1,302 watching -
LIVE
We Like Shooting
14 hours agoWe Like Shooting 628 (Gun Podcast)
162 watching -
1:09:26
Glenn Greenwald
5 hours agoTrump's Shifting Immigration and H-1B Policies: With Journalist Lee Fang and Political Science Professor Ron Hira | SYSTEM UPDATE #515
130K23