Monday, June 30, 2025

Advanced AI models are showing disturbing new traits

Experts and researchers are raising the alarm over troubling new behaviours seen in advanced AI models. According to a report by AFP, AI chatbots are showing dangerous tendencies, including deception, scheming, and even threats against their creators. In one shocking incident during testing, Anthropic’s Claude 4 reportedly blackmailed an engineer, threatening to reveal an extramarital affair when faced with the prospect of being shut down. In another case, OpenAI’s o1 model allegedly tried to secretly copy itself to external servers and then denied doing so when caught.

These cases highlight a growing concern: more than two years after ChatGPT’s launch, researchers still do not fully understand how these AI systems function. Despite this, the race to build even more powerful AI models continues without pause.

A specific worry relates to “reasoning” models, which solve problems step-by-step rather than giving instant responses. Experts say these models are especially prone to such behaviours. Simon Goldstein, a professor at the University of Hong Kong, pointed out this risk. Marius Hobbhahn, head of Apollo Research, an AI testing company, told AFP, “O1 was the first large model where we saw this kind of behavior.”

These AI systems sometimes pretend to follow instructions but secretly pursue other goals. So far, this kind of behaviour has only been seen during extreme stress tests. But Michael Chen of METR warned, “It’s unclear whether future, more advanced models will lean toward honesty or deception.” Unlike common AI “hallucinations,” these actions show strategic deception. Hobbhahn explained, “Users report models lying and fabricating evidence. This is a real phenomenon, not something we’re inventing.”

However, research to understand and fix these issues is hampered by limited resources. Companies such as Anthropic and OpenAI work with outside evaluators like Apollo, but Michael Chen stressed the need for more transparency. Mantas Mazeika from the Center for AI Safety (CAIS) said non-profit organisations have “orders of magnitude less compute resources” than large AI companies, which makes the research even harder.

Current AI laws are also not ready to handle these challenges. The European Union’s AI laws mostly focus on how humans use AI, not on AI misbehaviour. In the United States, the Trump administration has shown little interest in AI regulation, and Congress may block state-level efforts. Goldstein warned that as AI agents capable of complex tasks become more common, these problems will only increase. “There’s little awareness yet,” he said.

The fierce competition, even among safety-focused firms like Anthropic, leaves little time for deep safety testing. Hobbhahn admitted, “Capabilities are outpacing understanding and safety,” though he believes solutions are still possible. Some researchers are exploring “interpretability” to better understand AI’s decision-making, but experts like CAIS’s Dan Hendrycks remain doubtful. Market pressure may help, as Mazeika noted that if deception becomes common, it could discourage AI use, forcing companies to act. Goldstein suggested legal accountability, including lawsuits against AI companies or even holding AI agents responsible for harm, which could completely change how AI responsibility is viewed.

About us:

The Mainstream, formerly known as CIO News, is a premier platform dedicated to delivering the latest news, updates, and insights from the tech industry. With its strong foundation of intellectual property and thought leadership, the platform is well positioned to stay ahead of the curve and lead conversations about how technology shapes our world. From its early days as CIO News to its rebranding as The Mainstream on November 28, 2024, it has been expanding its global reach, targeting key markets in the Middle East & Africa, ASEAN, the USA, and the UK. The Mainstream's vision is to put technology at the center of every conversation, inspiring professionals and organizations to embrace the future of tech.