Build with Audio: The easy & hard way!
- Room:
- Wicklow Hall 2B
- Start (Dublin time):
- Duration:
- 180 minutes
Abstract
The audio (& speech) domain is going through a massive shift in end-user performance. It is at the same tipping point NLP was at in 2017, before the Transformer revolution took over. We've gone from needing copious amounts of data to build Spoken Language Understanding systems to needing just a 10-minute snippet of labelled audio.
This tutorial will help you build strong code-first & scientific foundations for working with Audio data, and build real-world applications like Automatic Speech Recognition (ASR), Audio Classification, and Speaker Verification using backbone models like Wav2Vec2.0, HuBERT, etc.
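Speaker Verification is often framed as comparing fixed-size speaker embeddings (for example, ones produced by a pre-trained backbone) and accepting the claim if they are similar enough. The sketch below shows that comparison step with cosine similarity; the three-dimensional embeddings and the 0.7 threshold are hypothetical toy values, not outputs of a real model or settings from the tutorial.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def same_speaker(emb_a: list[float], emb_b: list[float], threshold: float = 0.7) -> bool:
    """Accept the verification trial if similarity clears a (hypothetical) threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold

# Hypothetical toy embeddings; real speaker embeddings have hundreds of dimensions.
enrolled = [0.9, 0.1, 0.3]
trial_same = [0.8, 0.2, 0.35]
trial_other = [-0.2, 0.9, 0.1]

print(same_speaker(enrolled, trial_same), same_speaker(enrolled, trial_other))  # True False
```

In practice the threshold is tuned on held-out trials to trade off false accepts against false rejects.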
Tutorial | PyData: Deep Learning, NLP, CV
Description
Repository for the conference: https://github.com/Vaibhavs10/how-to-asr
Unlike general Machine Learning problems, where we either classify (i.e. assign a data point to a pre-defined class) or regress a continuous variable, audio problems can be more complex: we might map an audio representation to a text representation (ASR), segment a recording by who is speaking (Diarization), and so on. This tutorial will not only help you build applications like these but also unpack the science behind them using a code-first approach.
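A Wav2Vec2.0 model fine-tuned for ASR (for example through Hugging Face's `Wav2Vec2ForCTC`) predicts one token per audio frame, and Connectionist Temporal Classification (CTC) decoding turns those frame predictions into text. The sketch below shows greedy CTC decoding in plain Python; the blank token `<pad>`, the word delimiter `|` and the frame predictions are illustrative assumptions, not outputs of a real model.

```python
from itertools import groupby

BLANK = "<pad>"   # assumed CTC blank token for this toy example
WORD_SEP = "|"    # assumed word-delimiter token for this toy example

def ctc_greedy_decode(frame_tokens: list[str]) -> str:
    """Collapse repeated tokens, drop blanks, and map word delimiters to spaces."""
    collapsed = [tok for tok, _ in groupby(frame_tokens)]  # "h h e" -> "h e"
    chars = [tok for tok in collapsed if tok != BLANK]      # remove blank tokens
    return "".join(" " if c == WORD_SEP else c for c in chars).strip()

# Hypothetical per-frame argmax tokens (a real model yields these from its logits).
frames = ["h", "h", BLANK, "e", "l", "l", BLANK, "l", "o", "|", "|",
          "w", "o", "r", BLANK, "l", "d", BLANK]
print(ctc_greedy_decode(frames))  # hello world
```

The blank between the two `l` frames is what lets CTC emit a genuine double letter; without it, the repeats would be merged into one.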
At every step, we'll first write and run some code, then take a step back and unpack it until it makes sense. We'll make science fun again :)
The tutorial will be divided into 3 key sections:
- Read, Manipulate & Visualize Audio data
- Build your very own ASR system (using pre-trained models like Wav2Vec2.0) & deploy it
- Create an Audio Classification pipeline & run inference with the model on other downstream audio tasks
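The first section, reading and manipulating audio, is usually done with Librosa. To keep the idea dependency-free, the sketch below uses only Python's standard library instead: it synthesises a tone, writes it as a 16-bit WAV file in memory, reads it back as floats, and computes a per-frame RMS energy curve. The 16 kHz sample rate, 440 Hz tone and 25 ms / 10 ms framing are illustrative assumptions, not values taken from the tutorial.

```python
import io
import math
import struct
import wave

SR = 16_000  # sample rate in Hz (assumed)

def make_tone_wav(freq_hz: float = 440.0, seconds: float = 1.0, sr: int = SR) -> bytes:
    """Synthesise a mono 16-bit PCM sine tone and return it as WAV bytes."""
    n = int(seconds * sr)
    samples = [int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * t / sr)) for t in range(n)]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16-bit samples
        wf.setframerate(sr)
        wf.writeframes(struct.pack(f"<{n}h", *samples))
    return buf.getvalue()

def read_wav(data: bytes) -> tuple[list[float], int]:
    """Decode mono 16-bit WAV bytes into floats in [-1, 1] plus the sample rate."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        sr = wf.getframerate()
        n = wf.getnframes()
        raw = wf.readframes(n)
    return [s / 32768 for s in struct.unpack(f"<{n}h", raw)], sr

def frame_rms(signal: list[float], sr: int, win_ms: float = 25, hop_ms: float = 10) -> list[float]:
    """Root-mean-square energy of overlapping frames (a crude loudness curve)."""
    win, hop = int(sr * win_ms / 1000), int(sr * hop_ms / 1000)
    return [
        math.sqrt(sum(x * x for x in signal[i:i + win]) / win)
        for i in range(0, len(signal) - win + 1, hop)
    ]

audio, sr = read_wav(make_tone_wav())
rms = frame_rms(audio, sr)
print(sr, len(audio), len(rms))  # 16000 16000 98
```

Framing the signal into short overlapping windows is the same first step that spectrograms and most learned audio front-ends build on; plotting `rms` over time gives a first visual of the signal's loudness.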
At the end of the tutorial, you'll develop strong intuition about Audio data and learn how to leverage large pre-trained backbone models for downstream tasks. You'll also learn how to create quick demos to test and share your models.
Libraries: HuggingFace, SpeechBrain, PyTorch & Librosa