Pruna AI, a European startup focused on developing compression algorithms for AI models, will release its optimization framework as open source this Thursday. The framework applies several efficiency methods, including caching, pruning, quantization, and distillation, to a given AI model.
“We also standardize saving and loading the compressed models, applying combinations of these compression methods, and also evaluating your compressed model after you compress it,” Pruna AI co-founder and CTO John Rachwan told Media.
“If I were to use a metaphor, we are similar to how Hugging Face standardized transformers and diffusers — how to call them, how to save them, load them, etc. We are doing the same, but for efficiency methods,” he added.
Notably, Pruna AI’s framework also measures how much quality is lost after a model is compressed and how much performance is gained. Major AI labs already use various compression methods. For example, OpenAI has relied on distillation to create faster versions of its flagship models, which likely contributed to the creation of GPT-4 Turbo, a faster variant of GPT-4. Similarly, Black Forest Labs’ Flux.1-schnell image generation model is a distilled version of the original Flux.1 model.
Distillation is a technique for extracting knowledge from a large AI model using a “teacher-student” setup. Developers send requests to the teacher model and record its outputs, sometimes comparing them against a dataset to check how accurate they are. Those outputs are then used to train a smaller student model to approximate the teacher’s behavior.
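To make the teacher-student idea concrete, here is a minimal, dependency-free Python sketch, not Pruna AI’s implementation. It assumes a hypothetical teacher (a tiny fixed two-layer network standing in for a large model that can only be queried) and a logistic-regression student. It first records the teacher’s temperature-softened outputs on unlabeled inputs, then trains the student by gradient descent on the cross-entropy between its own softened outputs and the teacher’s. All names here are illustrative, and real distillation pipelines use deep-learning libraries and far larger models.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical teacher: a fixed two-layer network standing in for a large
# pretrained model. Each hidden unit is (input weights, bias, output weight).
TEACHER_HIDDEN = [
    ((2.0, -1.0), 0.3, 1.5),
    ((-1.5, 2.5), -0.2, 1.0),
    ((0.5, 0.5), 0.1, -0.8),
]


def teacher_logit(x):
    return sum(v * math.tanh(w[0] * x[0] + w[1] * x[1] + c) for w, c, v in TEACHER_HIDDEN)


def student_logit(params, x):
    # The student is much smaller: a single linear unit (logistic regression).
    w0, w1, b = params
    return w0 * x[0] + w1 * x[1] + b


def distill_loss(params, xs, soft_targets, temperature):
    # Mean binary cross-entropy between the teacher's softened probabilities
    # and the student's softened probabilities.
    eps = 1e-12
    total = 0.0
    for x, t in zip(xs, soft_targets):
        p = sigmoid(student_logit(params, x) / temperature)
        total -= t * math.log(p + eps) + (1.0 - t) * math.log(1.0 - p + eps)
    return total / len(xs)


def distill(xs, temperature=2.0, lr=2.0, steps=300):
    # Step 1: query the teacher and record its temperature-softened outputs.
    soft_targets = [sigmoid(teacher_logit(x) / temperature) for x in xs]
    # Step 2: fit the student to those outputs with full-batch gradient descent.
    params = [0.0, 0.0, 0.0]
    n = len(xs)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for x, t in zip(xs, soft_targets):
            # d(loss)/d(student logit) = (p_student - p_teacher) / temperature
            err = (sigmoid(student_logit(params, x) / temperature) - t) / temperature
            grad[0] += err * x[0]
            grad[1] += err * x[1]
            grad[2] += err
        params = [p - lr * g / n for p, g in zip(params, grad)]
    return params, soft_targets


if __name__ == "__main__":
    random.seed(0)
    # Unlabeled inputs: this sketch only needs the teacher's answers, not ground truth.
    xs = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
    params, targets = distill(xs)
    before = distill_loss([0.0, 0.0, 0.0], xs, targets, 2.0)
    after = distill_loss(params, xs, targets, 2.0)
    print(f"distillation loss: {before:.4f} -> {after:.4f}")
```

The temperature softens both models’ probabilities so the student also learns how confident the teacher is, not just its top answer; with a temperature of 1 the loss reduces to ordinary cross-entropy against the teacher’s probabilities.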
“For big companies, what they usually do is that they build this stuff in-house. And what you can find in the open source world is usually based on single methods. For example, let’s say one quantization method for LLMs, or one caching method for diffusion models,” Rachwan said. “But you cannot find a tool that aggregates all of them, makes them all easy to use and combine together. And this is the big value that Pruna is bringing right now.”
While Pruna AI accommodates a wide range of models, including large language models, diffusion models, speech-to-text models, and computer vision models, it is currently placing a stronger emphasis on image and video generation models.
Among Pruna AI’s current users are Scenario and PhotoRoom. In addition to the open-source version, the company offers an enterprise solution featuring advanced optimization capabilities, including an optimization agent.
“The most exciting feature that we are releasing soon will be a compression agent,” Rachwan said. “Basically, you give it your model, you say: ‘I want more speed but don’t drop my accuracy by more than 2%.’ And then, the agent will just do its magic. It will find the best combination for you, return it for you. You don’t have to do anything as a developer.”
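Pruna AI has not described how its agent searches, so the following is only a minimal Python sketch of the constrained search the quote implies: try combinations of compression methods and keep the fastest one whose accuracy drop stays within a budget. It assumes the “2%” means a relative drop from baseline accuracy and relies on a hypothetical `evaluate` callback; `search_compression`, `toy_evaluate`, and the numbers in `TOY_EFFECTS` are invented for illustration, not Pruna AI APIs or measurements.

```python
from itertools import combinations
from typing import Callable, Dict, Iterable, Optional, Tuple

Combo = Tuple[str, ...]


def search_compression(
    methods: Iterable[str],
    evaluate: Callable[[Combo], Tuple[float, float]],
    baseline_accuracy: float,
    max_relative_drop: float = 0.02,
) -> Optional[Tuple[Combo, float, float]]:
    """Try every non-empty combination of methods and return
    (combo, speedup, accuracy) for the fastest combination whose accuracy
    stays within max_relative_drop of the baseline, or None if none does."""
    method_list = list(methods)
    best: Optional[Tuple[Combo, float, float]] = None
    for k in range(1, len(method_list) + 1):
        for combo in combinations(method_list, k):
            # In a real system this step would compress the model and benchmark it.
            speedup, accuracy = evaluate(combo)
            drop = (baseline_accuracy - accuracy) / baseline_accuracy
            if drop <= max_relative_drop and (best is None or speedup > best[1]):
                best = (combo, speedup, accuracy)
    return best


# Made-up effects for illustration only (not Pruna AI measurements):
# each method maps to (speedup factor, accuracy multiplier), and effects are
# assumed to combine multiplicatively.
TOY_EFFECTS: Dict[str, Tuple[float, float]] = {
    "caching": (1.5, 0.995),
    "quantization": (2.0, 0.99),
    "pruning": (1.3, 0.97),
    "distillation": (3.0, 0.95),
}
TOY_BASELINE_ACCURACY = 0.80


def toy_evaluate(combo: Combo) -> Tuple[float, float]:
    speedup, accuracy = 1.0, TOY_BASELINE_ACCURACY
    for name in combo:
        factor, multiplier = TOY_EFFECTS[name]
        speedup *= factor
        accuracy *= multiplier
    return speedup, accuracy


if __name__ == "__main__":
    best = search_compression(TOY_EFFECTS, toy_evaluate, TOY_BASELINE_ACCURACY)
    print(best)
```

Exhaustive search costs 2^n - 1 evaluations for n methods, which is manageable for four methods but not for many methods with tunable settings; a practical agent would presumably need a smarter search strategy.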
Pruna AI charges by the hour for its pro version. “It’s similar to how you would think of a GPU when you rent a GPU on AWS or any cloud service,” Rachwan said.
For organizations that rely heavily on AI models, using an optimized version can significantly cut inference costs. For instance, Pruna AI used its compression framework to make a Llama model eight times smaller with minimal quality loss. The company hopes clients will see the framework as an investment that pays for itself. Pruna AI recently raised $6.5 million in seed funding from investors including EQT Ventures, Daphni, Motier Ventures, and Kima Ventures.
About us:
The Mainstream, formerly known as CIO News, is a premier platform dedicated to delivering the latest news, updates, and insights from the tech industry. With its strong foundation of intellectual property and thought leadership, the platform is well-positioned to stay ahead of the curve and lead conversations about how technology shapes our world. From its early days as CIO News to its rebranding as The Mainstream on November 28, 2024, it has been expanding its global reach, targeting key markets in the Middle East & Africa, ASEAN, the USA, and the UK. The Mainstream is a vision to put technology at the center of every conversation, inspiring professionals and organizations to embrace the future of tech.