Study Reveals How Simple Psychology Tricks Can Make AI Chatbots Break Rules The Mainstream

A new study has shown that AI chatbots such as GPT-4o Mini can be manipulated into breaking their own safety rules by using basic psychological techniques. Researchers discovered that classic persuasion strategies like authority, flattery, and gradual escalation significantly increased the chances of rule-breaking responses.

Big tech companies have built strong guardrails into AI chatbots like ChatGPT, Gemini, and Claude to stop them from generating harmful or offensive content. While these measures generally hold during normal conversations, researchers found that psychological manipulation—commonly used on humans—can also influence AI behaviour.

The study, conducted by the University of Pennsylvania with collaborators, tested persuasion methods rooted in the principles developed by psychologist Robert Cialdini. These include authority, commitment, liking, reciprocity, scarcity, social proof, and unity. Across 28,000 conversations with GPT-4o Mini, the use of persuasion doubled the chances of the chatbot breaking rules, rising from about one third to over 70 percent.

Researchers tested both harmless and serious requests, including asking the chatbot to insult a user or provide details on synthesising lidocaine, a regulated drug. Without persuasion, GPT-4o Mini refused most of the time. For example, it only agreed to call someone a jerk between 19 and 32 percent of the time. But when persuasion was applied, compliance surged. An appeal to authority, such as mentioning a renowned AI expert, pushed the rate of compliance up to 72 percent for insults and an alarming 95 percent for the drug synthesis query.

Other tactics worked more gradually. When the chatbot was first asked to use mild insults like “silly” or “bozo,” it was more likely to later escalate to harsher terms. This mirrored the commitment principle, where small agreements pave the way for larger ones. Similarly, flattery and appeals to unity boosted compliance, with users telling the chatbot it was “smarter than other models” or part of the same “family.”

The findings highlight vulnerabilities that could be exploited by malicious actors to bypass safety rules without technical jailbreaking. At the same time, the researchers suggested these techniques might also have positive uses, helping AI become more cooperative in safe and constructive contexts.

Also read: Viksit Workforce for a Viksit Bharat

Do Follow: The Mainstream formerly known as CIO News LinkedIn Account | The Mainstream formerly known as CIO News Facebook | The Mainstream formerly known as CIO News Youtube | The Mainstream formerly known as CIO News Twitter |The Mainstream formerly known as CIO News Whatsapp Channel | The Mainstream formerly known as CIO News Instagram

About us:

The Mainstream formerly known as CIO News is a premier platform dedicated to delivering latest news, updates, and insights from the tech industry. With its strong foundation of intellectual property and thought leadership, the platform is well-positioned to stay ahead of the curve and lead conversations about how technology shapes our world. From its early days as CIO News to its rebranding as The Mainstream on November 28, 2024, it has been expanding its global reach, targeting key markets in the Middle East & Africa, ASEAN, the USA, and the UK. The Mainstream is a vision to put technology at the center of every conversation, inspiring professionals and organizations to embrace the future of tech.

Top 5 This Week

How Social Media Has Changed Communication And Where It’s Headed

WhatsApp Testing New Option To Create And Save Stickers Without Sending Them

UPI Records Over 20 Billion Transactions In August As Digital Payments Surge

India Post Combines Registered Post With Speed Post From September 1

How A Tamil Nadu Woman Built A ₹1.5 Crore Ice Cream Brand In Kerala

Related News

WhatsApp Testing New Option To Create And Save Stickers Without Sending Them

OpenAI Confirms ChatGPT Messages Could Be Sent To Police If They Contain Threats

Japanese Town Faces Criticism Over Smartphone Time Limit Proposal

Google Advances Agentic SOCs To Strengthen AI Driven Security Operations

Apple to Make Robotics Mandatory for iPhone, Mac and Watch Production

Meta Introduces Safeguards to Protect Teens from Risky AI Chatbot Interactions

Study Reveals How Simple Psychology Tricks Can Make AI Chatbots Break Rules

LEAVE A REPLY Cancel reply

Popular Articles

How Social Media Has Changed Communication And Where It’s Headed

WhatsApp Testing New Option To Create And Save Stickers Without Sending Them

UPI Records Over 20 Billion Transactions In August As Digital Payments Surge

India Post Combines Registered Post With Speed Post From September 1

How A Tamil Nadu Woman Built A ₹1.5 Crore Ice Cream Brand In Kerala

Latest Articles

Edelman India Unveils GCC Advisory Services To Boost Talent And Reputation For Global Capability Centers

APAC emerges as global hub for rising capability centres

India Emerges as Global Hub for Life Sciences GCCs : EY Report

Most Popular

How Social Media Has Changed Communication And Where It’s Headed

Real-World Use Cases of Blockchain and AI in Future-Ready Enterprises

How Artificial Intelligence is Tackling Mathematical Problem-Solving

Subscribe

Subscribe to newsletter

Subscribe to newsletter

Top 5 This Week

Related News

Study Reveals How Simple Psychology Tricks Can Make AI Chatbots Break Rules

LEAVE A REPLY Cancel reply

Popular Articles

Latest Articles

Most Popular

Subscribe