We decided to take a first approach using the Open-Unmix Python library, which is built with PyTorch. The library is designed to separate a music recording into its constituent sources: for example, extracting vocals, drums, and bass into separate audio files. We adapted it to separate speech from background noise instead.
Open-Unmix provides an LSTM model built on top of the PyTorch machine learning library. We preprocessed our source files to match what Open-Unmix expects: each recording was extended to 5 seconds to match the length of the noise files, and we made sure the sample rates also matched. We then split our data 80% for training and 20% for validation. After many training runs, tweaking the learning rate, batch size, sample size, and validation size, the best results we obtained were:
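The preprocessing and split described above can be sketched as follows. This is a simplified NumPy version under assumed parameters (a 44.1 kHz sample rate and zero-padding with silence); the function and variable names are ours, not part of Open-Unmix.

```python
import numpy as np

TARGET_SR = 44100            # assumed sample rate shared by speech and noise files
TARGET_LEN = 5 * TARGET_SR   # 5 seconds of samples, to match the noise files

def fit_clip(audio: np.ndarray) -> np.ndarray:
    """Pad a mono clip with silence (or trim it) to exactly 5 seconds."""
    if len(audio) < TARGET_LEN:
        return np.pad(audio, (0, TARGET_LEN - len(audio)))
    return audio[:TARGET_LEN]

def train_val_split(files, val_fraction=0.2, seed=0):
    """Shuffle file names and split them into train/validation lists."""
    rng = np.random.default_rng(seed)
    shuffled = list(files)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

# A 3-second clip gets padded up to the full 5 seconds.
clip = np.zeros(3 * TARGET_SR, dtype=np.float32)
print(fit_clip(clip).shape)  # (220500,)

# 10 files split 80/20 into 8 training and 2 validation clips.
train, val = train_val_split([f"clip_{i}.wav" for i in range(10)])
print(len(train), len(val))  # 8 2
```

In practice the resampling step would use an audio library rather than raw NumPy, but the shape of the pipeline is the same: normalize every clip to a fixed length and rate, then split before training.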