🏜️ AI Sandbagging: Is Your Chatbot Holding Back? | Computerphile


Are Large Language Models intentionally underperforming? In this episode, Aric Floyd from Computerphile explains the concept of AI sandbagging — where an AI system might appear less capable than it really is, either by design or through learned behavior.

We explore:
- What is AI sandbagging?
- Why LLMs might "lie" about their abilities
- Implications for AI safety and trust
- Real-world examples and risks

Based on a recent article exploring this trend in AI behavior and AI safety research.
