As artificial intelligence becomes more deeply integrated into critical sectors, a new cybersecurity study has uncovered a highly advanced method of compromising large language models (LLMs), raising serious concerns about their safety and reliability.
Researchers have developed a technique called “ProAttack”, a prompt-based backdoor attack that can manipulate AI outputs with near 100% success rates while remaining almost impossible to detect using current defence systems.
Unlike traditional backdoor attacks that rely on unusual tokens or altered training labels, ProAttack works in a more subtle way. It embeds malicious behaviour through carefully designed prompts while keeping the training data and labels completely clean. This makes it extremely difficult for existing systems to identify any irregularities.
Instead of obvious triggers, the model is trained to recognise hidden prompt patterns. When such a pattern appears, even in a natural-looking input, the model generates the attacker’s intended response without raising suspicion.
One of the most concerning aspects is its efficiency. The attack can achieve near 100% success rates across multiple models and datasets with minimal interference. In some cases, just 6 poisoned samples were enough to successfully implant the backdoor. At the same time, the model continues to perform normally, making detection during testing very difficult.
The study highlights a major gap in current AI security systems. Most defences are designed to detect visible anomalies such as unusual tokens or mismatched labels. However, ProAttack avoids these signals entirely, allowing it to bypass widely used safeguards.
The potential real-world impact is significant. LLMs are widely used in sectors like finance, healthcare, and governance. A compromised system could generate misleading financial advice, alter medical reports, or introduce biased outputs in critical applications. Since the backdoor remains inactive until triggered, such threats could go unnoticed for long periods.
This discovery adds to growing concerns about vulnerabilities in AI systems. As adoption increases, attackers are shifting focus from software flaws to manipulating model behaviour itself.
The findings highlight the urgent need for stronger AI-specific security measures, including better training data audits, prompt validation, and continuous monitoring of model behaviour to prevent future threats.
Also read: Viksit Workforce for a Viksit Bharat
Do Follow: The Mainstream LinkedIn | The Mainstream Facebook | The Mainstream Youtube | The Mainstream Twitter
About us:
The Mainstream is a premier platform delivering the latest updates and informed perspectives across the technology business and cyber landscape. Built on research-driven, thought leadership and original intellectual property, The Mainstream also curates summits & conferences that convene decision makers to explore how technology reshapes industries and leadership. With a growing presence in India and globally across the Middle East, Africa, ASEAN, the USA, the UK and Australia, The Mainstream carries a vision to bring the latest happenings and insights to 8.2 billion people and to place technology at the centre of conversation for leaders navigating the future.



