A recent study by researchers from IIT Delhi and Friedrich Schiller University (FSU) Jena, Germany, has found that leading Artificial Intelligence (AI) models, while strong in basic scientific tasks, struggle with deeper reasoning required for authentic research.
Published in Nature Computational Science, the study shows that current AI systems can handle perception-based tasks with near-perfect accuracy but fail to demonstrate true scientific understanding. The researchers warn that such limitations could pose risks if AI is used unsupervised in research or safety-critical environments.
The team, led by IIT Delhi associate professor NM Anoop Krishnan and FSU Jena professor Kevin Maik Jablonka, developed MaCBench, the first benchmark to assess how vision-language models perform in real-world chemistry and materials science tasks.
Their findings revealed a paradox: AI models excelled at identifying laboratory equipment but struggled with spatial reasoning, cross-modal integration, and multistep logical inference—skills essential for real scientific discovery.
Krishnan said, “Our findings represent a crucial reality check for the scientific community. While these AI systems show remarkable capabilities in routine data processing tasks, they are not yet ready for autonomous scientific reasoning.” He added, “The strong correlation we observed between model performance and internet data availability suggests these systems may be relying more on pattern matching than genuine scientific understanding.”
Safety assessments reveal gaps
Jablonka highlighted a concerning result: “While models excelled at identifying laboratory equipment with 77 per cent accuracy, they performed poorly when evaluating safety hazards in similar setups, achieving only 46 per cent accuracy.” He noted that AI cannot yet replace the tacit knowledge scientists rely on for safe laboratory operations.
The researchers also conducted ablation studies, finding that AI performed better when identical information was presented as text rather than images, revealing incomplete multimodal integration—a key requirement for scientific work.
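The text-versus-image comparison described above can be sketched in miniature. The sketch below is entirely hypothetical: the per-question results are placeholder data standing in for real model responses (which in the actual study would come from vision-language model API calls), and it only illustrates how per-modality accuracy would be compared in such an ablation.

```python
# Hypothetical sketch of a text-vs-image ablation: the same questions are
# posed in two modalities and per-modality accuracy is compared.
# The (predicted, expected) pairs below are placeholder data, not study results.

def accuracy(results):
    """Fraction of correct answers in a list of (predicted, expected) pairs."""
    return sum(pred == expected for pred, expected in results) / len(results)

# Placeholder outcomes for the same five questions asked in each modality.
text_results = [("A", "A"), ("B", "B"), ("C", "C"), ("D", "A"), ("B", "B")]
image_results = [("A", "A"), ("C", "B"), ("C", "C"), ("D", "A"), ("A", "B")]

gap = accuracy(text_results) - accuracy(image_results)
print(f"text: {accuracy(text_results):.0%}, "
      f"image: {accuracy(image_results):.0%}, gap: {gap:.0%}")
```

With these placeholder numbers the text condition scores higher, mirroring the qualitative pattern the researchers report; the actual MaCBench figures are in the published study.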
IIT Delhi PhD scholar Indrajeet Mandal said, “Our work provides a roadmap for both the capabilities and limitations of current AI systems in science. While these models show promise as assistive tools for routine tasks, human oversight remains essential for complex reasoning and safety-critical decisions.”
The study concludes that future AI scientific assistants must prioritize understanding over pattern recognition and integrate robust frameworks for human–AI collaboration.