If you have experience with Next-Gen Kaldi or backend engineering and want to work part time on a project, please contact me at my Gmail address, nadirapovey. I think the job would suit master's students best.
My interests are Speech Processing, Text to Speech, Speech to Text, ML and AI.
Nadira Next-gen Kaldi
Date | Topics | video | Readings |
July 2, 2022 | Install k2-fsa | https://youtu.be/HerxbUHs-V4 | https://icefall.readthedocs.io/en/latest/installation/index.html
https://k2-fsa.github.io/k2/installation/conda.html |
July 3, 2022 | Install graphviz | https://youtu.be/Oe6Ak9XnwOg | https://icefall.readthedocs.io/en/latest/installation/index.html |
July 4, 2022 | Install lhotse | https://youtu.be/TOJlvsw_LB0 | https://icefall.readthedocs.io/en/latest/installation/index.html |
July 6, 2022 | Install Icefall | https://youtu.be/LVmrBD0tLfE | https://icefall.readthedocs.io/en/latest/installation/index.html |
July 7, 2022 | Icefall prepare.sh | https://youtu.be/ofEIoJL-mGM | https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/prepare.sh |
July 8, 2022 | Train Icefall model
ASR folder tree view | https://youtu.be/-T6H8MKXAKc | https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md |
July 20, 2022 | Hugging Face | https://www.youtube.com/watch?v=ElN3r9dkKE4 | https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition |
August 6, 2022 | Hugging Face | https://www.youtube.com/watch?v=GizpIES_8O8 | https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition |
September 7, 2022 | K2: train and decode. | https://installati.one/ubuntu/22.04/nvidia-dkms-510/ | |
September 18, 2022 | Install sherpa server | watch on YouTube | |
Dec 15, 2022 | BART: Abstractive Summarization | | https://arxiv.org/pdf/1910.13461.pdf |
Nadira Kaldi
Date | Topics | video | PowerPoint | Readings |
Jan 10, 2022 | Install Kaldi: Ubuntu | video | Downloading Kaldi: http://kaldi-asr.org/doc/install.html | |
Jan 10, 2022 | Install Kaldi: RedHat | video | Downloading Kaldi: http://kaldi-asr.org/doc/install.html | |
Jan 10, 2022 | LibriSpeech training | LibriSpeech training script: https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/run.sh
Excellent folder structure visualization: https://eleanorchodroff.com/tutorial/kaldi/training-acoustic-models.html#create-files-for-conf | ||
Sep 19, 2022 | Submit PR |
Other projects
Date | Topic | Links |
July 1, 2022 | Learning wav2vec2.0 | https://npovey1-speech-to-text-streamlit-app-streamlit-app-9fsg9s.streamlitapp.com/
https://www.charlywargnier.com/post/how-to-make-a-create-a-speech-to-text-app-in-streamlit,
https://github.com/CharlyWargnier/speech-to-text-streamlit-app |
July 29 | DeEsser For Free In Audacity! | https://www.youtube.com/watch?v=rXNns6FKBOU&ab_channel=JoeEssay
https://forum.audacityteam.org/viewtopic.php?p=245549#p245549 |
Dan Kaldi
Date | Topics | video | Readings |
June 5, 2022 | Which model to start with? Aspire, WSJ, LibriSpeech or Mini LibriSpeech? | video Dan#1 | Examples included with Kaldi: https://kaldi-asr.org/doc/examples.html,
Mini LibriSpeech: https://github.com/kaldi-asr/kaldi/tree/master/egs/mini_librispeech |
June 7, 2022 | X-Vectors vs I-Vectors | video Dan#3 | SRE16 xvectors: https://github.com/kaldi-asr/kaldi/tree/master/egs/sre16/v2
SRE16 Xvector Model: https://kaldi-asr.org/models/m3
X-Vectors: Robust DNN Embeddings for Speaker Recognition: https://ieeexplore.ieee.org/abstract/document/8461375
OnlineIvectorFeature Class Reference: https://kaldi-asr.org/doc/classkaldi_1_1OnlineIvectorFeature.html#af7c4234c6b1d5d807dbb4292cf36b98c
GBO notes: i-vectors and x-vectors: https://desh2608.github.io/2022-04-07-gbo-ivectors/
|
June 8, 2022 | Which dataset to use to benchmark the performance? | video Dan#4 | LibriSpeech: https://www.openslr.org/12
Switchboard-1 https://catalog.ldc.upenn.edu/LDC97S62
https://paperswithcode.com/sota/speech-recognition-on-switchboard-300hr |
June 9, 2022 | Can we fine-tune ASR models in Kaldi by training it on more audio files? | video Dan#5 | fine-tuning script: https://github.com/kaldi-asr/kaldi/blob/master/egs/opensat20/s5/local/chain/run_finetune_tl.sh |
June 12, 2022 | Biased Language Models | video Dan#8 | |
June 13, 2022 | How to improve the WER of the LibriSpeech model? | video Dan#9 | RNNLM Kaldi https://github.com/kaldi-asr/kaldi/tree/master/scripts/rnnlm |
June 14, 2022 | LibriSpeech run.sh explained | video Dan#10 | Training Script: https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/run.sh |
June 15, 2022 | Recommended Books & Learning Material | video Dan#11 | Speech and Language Processing (3rd ed. draft) https://web.stanford.edu/~jurafsky/slp3/
Automatic Speech Recognition: A Deep Learning Approach. Amazon link https://tinyurl.com/dcvkncpw
Google Scholar: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C48&as_vis=1&q=automatic+speech+recognition+asr&btnG= |
June 16, 2022 | LibriSpeech run.sh explained Part 2 | video Dan#12 | https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/local/chain/tuning/run_tdnn_1d.sh
https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/run.sh |
Dan Next-gen Kaldi
Date | Topics | video | Readings |
June 6, 2022 | Next-Gen Kaldi for Beginners? | video Dan#2 | SRE16 xvectors: https://github.com/kaldi-asr/kaldi/tree/master/egs/sre16/v2
SRE16 Xvector Model: https://kaldi-asr.org/models/m3
X-Vectors: Robust DNN Embeddings for Speaker Recognition: https://ieeexplore.ieee.org/abstract/document/8461375
OnlineIvectorFeature Class Reference: https://kaldi-asr.org/doc/classkaldi_1_1OnlineIvectorFeature.html#af7c4234c6b1d5d807dbb4292cf36b98c
GBO notes: i-vectors and x-vectors: https://desh2608.github.io/2022-04-07-gbo-ivectors/ |
June 11, 2022 | Which recipe from Icefall can I start with? | video Dan#6 | LibriSpeech: https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR
TIMIT: https://github.com/k2-fsa/icefall/tree/master/egs/timit/ASR
Icefall: https://github.com/k2-fsa/icefall/tree/master/egs
TIMIT dataset https://lhotse.readthedocs.io/en/latest/cli.html?highlight=TIMIT#lhotse-download-timit |
June 10, 2022 | Can we now prepare the text, segments, wav.scp, utt2spk, and spk2utt files using Lhotse scripts from Next-gen Kaldi? | video Dan#7 | Lhotse: https://lhotse.readthedocs.io/en/latest/getting-started.html
lhotse.kaldi.export_to_kaldi: https://lhotse.readthedocs.io/en/latest/kaldi.html |
July 23, 2022 | Next-gen Kaldi Intro. | video Dan#21 | powerpoint slides: https://shorturl.at/KMVY4 |
July 23, 2022 | RNNT BAAI Conference | video Dan#22 | powerpoint slides: https://shorturl.at/KMVY4 |
July 23, 2022 | Reworked Conformer Model | video Dan#23 | powerpoint slides: https://shorturl.at/KMVY4 |
July 23, 2022 | Next-gen Kaldi for Smart Phone Devices | video Dan#24 | powerpoint slides: https://shorturl.at/KMVY4 |
July 23, 2022 | Next-gen Kaldi vs WeNet | video Dan#25 | powerpoint slides: https://shorturl.at/KMVY4 |
July 23, 2022 | WFST to Integrate a Language Model | video Dan#26 | powerpoint slides: https://shorturl.at/KMVY4 |
July 23, 2022 | Data Augmentation | video Dan#27 | powerpoint slides: https://shorturl.at/KMVY4 |
July 23, 2022 | RNNT and Conformer | video Dan#28 | powerpoint slides: https://shorturl.at/KMVY4 |
July 23, 2022 | Favorite Toolkit for Students | video Dan#29 | powerpoint slides: https://shorturl.at/KMVY4 |
July 23, 2022 | BAAI 2022 Conference Full Version | video Dan#30 | powerpoint slides: https://shorturl.at/KMVY4 |
September 2, 2022 | What is BPE and lang_bpe_500? | What is BPE and lang_bpe_500? | Speech Recognition with weighted finite-state transducers: https://cs.nyu.edu/~mohri/pub/hbka.pdf
What is HCLG.fst?: https://nadirapovey.blogspot.com/2021/12/what-is-hclgfst.html
Icefall: https://github.com/k2-fsa/icefall |
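For readers who want a feel for BPE (byte pair encoding) before watching the video, here is a minimal sketch of the idea: start from characters and repeatedly merge the most frequent adjacent pair. It is a toy illustration only. As far as I understand, Icefall trains its BPE model with SentencePiece, and the `500` in `lang_bpe_500` refers to the vocabulary size; the function names `learn_bpe` and `apply_bpe` below are made up for this example and are not part of Icefall.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of words (toy version)."""
    # Each word starts out as a tuple of single characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself so the
        # result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(_merge(list(symbols), best))] += freq
        vocab = new_vocab
    return merges


def _merge(symbols, pair):
    """Replace each occurrence of `pair` in `symbols` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def apply_bpe(word, merges):
    """Split a word into subword units by replaying the merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = _merge(symbols, pair)
    return symbols


if __name__ == "__main__":
    merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
    print(merges)
    print(apply_bpe("lowest", merges))
```

With more merges the units grow from characters toward whole words, so the number of merges controls how large the subword vocabulary becomes.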
YouTube Videos I Liked
Automatic Speech Recognition - An Overview | https://www.youtube.com/watch?v=q67z7PTGRi8&ab_channel=MicrosoftResearch |
Lecture 9 - Speech Recognition (ASR) [Andrew Senior] | https://www.youtube.com/watch?v=HyUtT_z-cms |
MIT 6.S191: Automatic Speech Recognition | https://www.youtube.com/watch?v=sR6_bZ6VkAg |
Lecture 12: End-to-End Models for Speech Processing [Stanford] | https://www.youtube.com/watch?v=3MjIkWxXigM |
I Built a Personal Speech Recognition System for my AI Assistant | https://www.youtube.com/watch?v=YereI6Gn3bM&ab_channel=TheA.I.Hacker-MichaelPhi |
you need to learn Kubernetes RIGHT NOW!! | https://www.youtube.com/watch?v=7bA0gTroJjw&ab_channel=NetworkChuck |
Papers
Paper | link | Date |
CTC Variations Through New WFST Topologies | https://arxiv.org/pdf/2110.03098.pdf | 26 Jun 2022 |
Datasets Collected for my Research
Date | Dataset name | Data description | # items | Link | File name | Blog posts |
August 18, 2022 | talksatgoogle | IDs for the YouTube videos where manual and audio captions are located | 3577 | data | talksatgoogle_levenshtein_score.txt | link |
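The file name above mentions a Levenshtein score. The exact scoring used for the dataset is not described on this page, so the following is only a sketch of the standard Levenshtein (edit) distance, plus a word-level error rate built on top of it. The helper names `levenshtein` and `word_error_rate` are assumptions made for this example.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (strings or lists of words)."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(ref_text, hyp_text):
    """Word-level edit distance divided by the number of reference words."""
    ref = ref_text.split()
    hyp = hyp_text.split()
    if not ref:
        raise ValueError("reference text must contain at least one word")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    manual = "the cat sat on the mat"   # made-up caption pair for illustration
    automatic = "the cat sit on mat"
    print(levenshtein(manual.split(), automatic.split()))
    print(word_error_rate(manual, automatic))
```

The same function works on characters or on words, depending on whether you pass strings or lists of words.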
Ask questions at: https://github.com/npovey/speech/discussions