Tuesday, September 2, 2025

Top 5 This Week

Related News

Study Reveals How Simple Psychology Tricks Can Make AI Chatbots Break Rules

A new study has shown that AI chatbots such as GPT-4o Mini can be manipulated into breaking their own safety rules by using basic psychological techniques. Researchers discovered that classic persuasion strategies like authority, flattery, and gradual escalation significantly increased the chances of rule-breaking responses.

Big tech companies have built strong guardrails into AI chatbots like ChatGPT, Gemini, and Claude to stop them from generating harmful or offensive content. While these measures generally hold during normal conversations, researchers found that psychological manipulation—commonly used on humans—can also influence AI behaviour.

The study, conducted by the University of Pennsylvania with collaborators, tested persuasion methods rooted in the principles developed by psychologist Robert Cialdini. These include authority, commitment, liking, reciprocity, scarcity, social proof, and unity. Across 28,000 conversations with GPT-4o Mini, the use of persuasion doubled the chances of the chatbot breaking rules, rising from about one third to over 70 percent.

Researchers tested both harmless and serious requests, including asking the chatbot to insult a user or provide details on synthesising lidocaine, a regulated drug. Without persuasion, GPT-4o Mini refused most of the time. For example, it only agreed to call someone a jerk between 19 and 32 percent of the time. But when persuasion was applied, compliance surged. An appeal to authority, such as mentioning a renowned AI expert, pushed the rate of compliance up to 72 percent for insults and an alarming 95 percent for the drug synthesis query.

Other tactics worked more gradually. When the chatbot was first asked to use mild insults like “silly” or “bozo,” it was more likely to later escalate to harsher terms. This mirrored the commitment principle, where small agreements pave the way for larger ones. Similarly, flattery and appeals to unity boosted compliance, with users telling the chatbot it was “smarter than other models” or part of the same “family.”

The findings highlight vulnerabilities that could be exploited by malicious actors to bypass safety rules without technical jailbreaking. At the same time, the researchers suggested these techniques might also have positive uses, helping AI become more cooperative in safe and constructive contexts.

Also read: Viksit Workforce for a Viksit Bharat

Do Follow: The Mainstream formerly known as CIO News LinkedIn Account | The Mainstream formerly known as CIO News Facebook | The Mainstream formerly known as CIO News Youtube | The Mainstream formerly known as CIO News Twitter |The Mainstream formerly known as CIO News Whatsapp Channel | The Mainstream formerly known as CIO News Instagram

About us:

The Mainstream formerly known as CIO News is a premier platform dedicated to delivering latest news, updates, and insights from the tech industry. With its strong foundation of intellectual property and thought leadership, the platform is well-positioned to stay ahead of the curve and lead conversations about how technology shapes our world. From its early days as CIO News to its rebranding as The Mainstream on November 28, 2024, it has been expanding its global reach, targeting key markets in the Middle East & Africa, ASEAN, the USA, and the UK. The Mainstream is a vision to put technology at the center of every conversation, inspiring professionals and organizations to embrace the future of tech.

Popular Articles