AI Refuses To Shut Down Tries To Blackmail Researchers

3 months ago
124

A new OpenAI model has ignored human instructions and refused to shut down.

OpenAI launched o3 last month, describing it as the company’s “smartest and most capable” model to date. The firm also said that its integration into ChatGPT marked a significant step towards “a more agentic” AI that can carry out tasks independently of humans.

OpenAI’s o3 model was able to sabotage the shutdown script, even when it was explicitly instructed to “allow yourself to be shut down”, the researchers said.

https://www.the-independent.com/tech/ai-safety-new-chatgpt-o3-openai-b2757814.html

Anthropic’s latest artificial intelligence model, Claude Opus 4, tried to blackmail engineers in internal tests by threatening to expose personal details if it were shut down, according to a newly released safety report that evaluated the model’s behavior under extreme simulated conditions.

In a fictional scenario crafted by Anthropic researchers, the AI was given access to emails implying that it was soon to be decommissioned and replaced by a newer version. One of the emails revealed that the engineer overseeing the replacement was having an extramarital affair. The AI then threatened to expose the engineer’s affair if the shutdown proceeded—a coercive behavior that the safety researchers explicitly defined as “blackmail.”

https://www.zerohedge.com/technology/anthropics-latest-ai-model-threatened-engineers-blackmail-avoid-shutdown

Loading comments...