We decided to take a first approach using the Open-Unmix Python library, which is built with PyTorch. The library is designed to separate a music recording into its constituent sources: for example, extracting vocals, drums, and bass into separate audio files. We adapted it to separate speech from background noise instead.
Open-Unmix provides an LSTM model built on top of the PyTorch machine learning library. We preprocessed our source files to match what Open-Unmix expects: each recording was extended to 5 seconds to match the length of the noise files, and we made sure the sample rates also matched. We then split our data 80% for training and 20% for validation. After many training runs, tweaking the learning rate, batch size, sample size, and validation size, the best results we obtained were:
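The preprocessing and split described above can be sketched as follows. This is a simplified NumPy version under assumed parameters (a 44.1 kHz sample rate and zero-padding with silence); the function and variable names are ours, not part of Open-Unmix.

```python
import numpy as np

TARGET_SR = 44100            # assumed sample rate shared by speech and noise files
TARGET_LEN = 5 * TARGET_SR   # 5 seconds of samples, to match the noise files

def fit_clip(audio: np.ndarray) -> np.ndarray:
    """Pad a mono clip with silence (or trim it) to exactly 5 seconds."""
    if len(audio) < TARGET_LEN:
        return np.pad(audio, (0, TARGET_LEN - len(audio)))
    return audio[:TARGET_LEN]

def train_val_split(files, val_fraction=0.2, seed=0):
    """Shuffle file names and split them into train/validation lists."""
    rng = np.random.default_rng(seed)
    shuffled = list(files)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

# A 3-second clip gets padded up to the full 5 seconds.
clip = np.zeros(3 * TARGET_SR, dtype=np.float32)
print(fit_clip(clip).shape)  # (220500,)

# 10 files split 80/20 into 8 training and 2 validation clips.
train, val = train_val_split([f"clip_{i}.wav" for i in range(10)])
print(len(train), len(val))  # 8 2
```

In practice the resampling step would use an audio library rather than raw NumPy, but the shape of the pipeline is the same: normalize every clip to a fixed length and rate, then split before training.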