Wednesday, December 17, 2025

Meta releases SAM Audio as open source AI model for sound isolation

A new artificial intelligence model from Meta is set to simplify complex audio editing tasks. On Tuesday, the Menlo Park-based technology company introduced SAM Audio, the latest addition to its Segment Anything Model family, designed to identify, separate, and isolate specific sounds from mixed audio.

The model allows users to edit audio using text prompts, visual signals, or time stamps, automating the entire workflow. Like other models in the SAM series, SAM Audio is released as an open source model with a permissive licence that supports both research and commercial use.

In a newsroom post, Meta shared details about the audio-focused AI model. SAM Audio is available for download through the company's website, GitHub, and Hugging Face. Users who do not wish to run the model locally can test its features through the Segment Anything Playground, which also provides access to other SAM models. The model is offered under the SAM Licence, a Meta-owned licence that allows wide usage.
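
For readers who want to fetch the released files programmatically rather than through the website, the sketch below uses the huggingface_hub library. The repository id shown is an assumed placeholder for illustration; the actual id should be taken from Meta's release pages.

```python
# Minimal sketch: downloading released model files from Hugging Face.
# The repo_id below is an assumed placeholder, not confirmed by the article;
# check Meta's GitHub or Hugging Face pages for the actual repository name.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="facebook/sam-audio",  # assumption for illustration only
)
print(f"Model files downloaded to: {local_dir}")
```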

Meta described SAM Audio as a unified AI audio model that uses text commands, visual cues, and time-based instructions to isolate sounds from complex audio mixtures. Audio editing tasks like isolating individual sound elements have traditionally required specialised tools and manual effort, often with limited accuracy. The new model aims to address this challenge.

SAM Audio supports three types of prompting. With text prompts, users can type descriptions such as “drum beat” or “background noise.” Visual prompting allows users to click on an object or a person in a video so that any sound originating from that source can be isolated. Time-span prompting lets users mark a specific part of the timeline to target a sound.
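
To make the three prompting modes concrete, the sketch below models each one as a simple data structure and shows what it targets. The class and field names are illustrative assumptions for explanation only; they are not SAM Audio's actual interface.

```python
# Illustrative sketch of the three prompt types described above.
# These dataclasses and the dispatch function are assumptions for
# explanation; they do not reflect SAM Audio's real API.
from dataclasses import dataclass
from typing import Union


@dataclass
class TextPrompt:
    description: str          # e.g. "drum beat" or "background noise"


@dataclass
class VisualPrompt:
    frame_time_s: float       # timestamp of the video frame that was clicked
    x: int                    # pixel coordinates of the clicked object or person
    y: int


@dataclass
class TimeSpanPrompt:
    start_s: float            # start of the marked region on the timeline
    end_s: float              # end of the marked region


Prompt = Union[TextPrompt, VisualPrompt, TimeSpanPrompt]


def describe(prompt: Prompt) -> str:
    """Return a human-readable summary of what the prompt targets."""
    if isinstance(prompt, TextPrompt):
        return f"isolate sounds matching the description '{prompt.description}'"
    if isinstance(prompt, VisualPrompt):
        return (f"isolate the sound coming from the object clicked at "
                f"({prompt.x}, {prompt.y}) around {prompt.frame_time_s:.1f}s")
    return f"isolate the sound heard between {prompt.start_s:.1f}s and {prompt.end_s:.1f}s"


if __name__ == "__main__":
    for p in (TextPrompt("background noise"),
              VisualPrompt(frame_time_s=12.0, x=640, y=360),
              TimeSpanPrompt(start_s=3.5, end_s=7.0)):
        print(describe(p))
```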

For example, in an audio clip where a person is speaking on the phone with music playing and children's voices in the background, users can isolate the main voice, the music, or the ambient sounds with a single command. A technology publication briefly tested the model and found it fast and efficient, though real-world testing was limited.

Technically, SAM Audio is a generative separation model that extracts target and residual stems from an audio mixture. It uses a flow-matching Diffusion Transformer and operates in the latent space of a Descript Audio Codec (DAC) variational autoencoder variant.
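
For readers unfamiliar with flow matching, the sketch below shows the general sampling idea in a latent space: a learned velocity field is integrated from random noise towards a clean latent, conditioned on the encoded mixture and the prompt. The tiny stand-in network, latent size, and Euler integrator here are a generic illustration of the technique, not Meta's architecture or code.

```python
# Conceptual sketch of flow-matching sampling in a latent space.
# A small MLP stands in for the conditional velocity field that a
# Diffusion Transformer would learn; everything here is illustrative.
import torch
import torch.nn as nn

LATENT_DIM = 64   # assumed latent size, for illustration only


class VelocityField(nn.Module):
    """Predicts d(latent)/dt given the current latent, time, and conditioning."""

    def __init__(self, dim: int):
        super().__init__()
        # input: current latent + mixture latent + prompt embedding + time scalar
        self.net = nn.Sequential(
            nn.Linear(dim * 3 + 1, 256),
            nn.SiLU(),
            nn.Linear(256, dim),
        )

    def forward(self, x_t, t, mixture, prompt):
        t_feat = t.expand(x_t.shape[0], 1)
        return self.net(torch.cat([x_t, mixture, prompt, t_feat], dim=-1))


@torch.no_grad()
def sample_target_latent(model, mixture, prompt, steps: int = 50):
    """Euler-integrate the velocity field from noise (t=0) towards a latent (t=1)."""
    x = torch.randn_like(mixture)                  # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.tensor([[i * dt]])
        x = x + dt * model(x, t, mixture, prompt)  # one Euler step along the flow
    return x                                       # latent of the isolated (target) stem


if __name__ == "__main__":
    model = VelocityField(LATENT_DIM)
    mixture_latent = torch.randn(1, LATENT_DIM)    # stand-in for the encoded audio mixture
    prompt_embedding = torch.randn(1, LATENT_DIM)  # stand-in for the encoded prompt
    target = sample_target_latent(model, mixture_latent, prompt_embedding)
    print(target.shape)  # torch.Size([1, 64])
```

In a full system, the sampled target latent would then be decoded back to a waveform by the codec's decoder, with the residual stem covering everything the prompt did not select.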
