Nadira Povey

If you have experience with Next-Gen Kaldi or backend engineering and want to work part-time on a project, please contact me at my Gmail address (nadirapovey). I think the job would best suit Master's students.

My interests are Speech Processing, Text to Speech, Speech to Text, ML and AI.

Nadira Next-gen Kaldi

Date	Topics	video	Readings
July 2, 2022	Install k2-fsa	https://youtu.be/HerxbUHs-V4	https://icefall.readthedocs.io/en/latest/installation/index.html https://k2-fsa.github.io/k2/installation/conda.html
July 3, 2022	Install graphviz	https://youtu.be/Oe6Ak9XnwOg	https://icefall.readthedocs.io/en/latest/installation/index.html
July 4, 2022	Install lhotse	https://youtu.be/TOJlvsw_LB0	https://icefall.readthedocs.io/en/latest/installation/index.html
July 6, 2022	Install Icefall	https://youtu.be/LVmrBD0tLfE	https://icefall.readthedocs.io/en/latest/installation/index.html
July 7, 2022	Icefall prepare.sh	https://youtu.be/ofEIoJL-mGM	https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/prepare.sh
July 8, 2022	Train Icefall model ASR folder tree view	https://youtu.be/-T6H8MKXAKc	https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md
July 20, 2022	Hugging Face	https://www.youtube.com/watch?v=ElN3r9dkKE4	https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition
August 6, 2022	Hugging Face	https://www.youtube.com/watch?v=GizpIES_8O8	https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition
September 7, 2022	K2: train and decode.		https://installati.one/ubuntu/22.04/nvidia-dkms-510/
September 18, 2022	Install sherpa server	watch on YouTube
Dec 15, 2022	BART: Abstractive Summarization		https://arxiv.org/pdf/1910.13461.pdf

Nadira Kaldi

Date	Topics	video	PowerPoint	Readings
Jan 10, 2022	Install Kaldi: Ubuntu	video		Downloading Kaldi: http://kaldi-asr.org/doc/install.html
Jan 10, 2022	Install Kaldi: RedHat	video		Downloading Kaldi: http://kaldi-asr.org/doc/install.html
Jan 10, 2022	LibriSpeech training			LibriSpeech training script: https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/run.sh Excellent folder structure visualization: https://eleanorchodroff.com/tutorial/kaldi/training-acoustic-models.html#create-files-for-conf
Sep 19, 2022	Submit PR

Other projects

Date	Topic	Links
July 1, 2022	Learning wav2vec2.0	https://npovey1-speech-to-text-streamlit-app-streamlit-app-9fsg9s.streamlitapp.com/ https://www.charlywargnier.com/post/how-to-make-a-create-a-speech-to-text-app-in-streamlit, https://github.com/CharlyWargnier/speech-to-text-streamlit-app
July 29	DeEsser For Free In Audacity!	https://www.youtube.com/watch?v=rXNns6FKBOU&ab_channel=JoeEssay https://forum.audacityteam.org/viewtopic.php?p=245549#p245549

Dan Kaldi

Date	Topics	video	Readings
June 5, 2022	Which model to start with? Aspire, WSJ, LibriSpeech or Mini LibriSpeech?	video Dan#1	Examples included with Kaldi: https://kaldi-asr.org/doc/examples.html, Mini LibriSpeech: https://github.com/kaldi-asr/kaldi/tree/master/egs/mini_librispeech
June 7, 2022	X-Vectors vs I-Vectors	video Dan#3	SRE16 xvectors: https://github.com/kaldi-asr/kaldi/tree/master/egs/sre16/v2 SRE16 Xvector Model: https://kaldi-asr.org/models/m3 X-Vectors: Robust DNN Embeddings for Speaker Recognition: https://ieeexplore.ieee.org/abstract/document/8461375 OnlineIvectorFeature Class Reference: https://kaldi-asr.org/doc/classkaldi_1_1OnlineIvectorFeature.html#af7c4234c6b1d5d807dbb4292cf36b98c GBO notes: i-vectors and x-vectors: https://desh2608.github.io/2022-04-07-gbo-ivectors/
June 8, 2022	Which dataset to use to benchmark the performance?	video Dan#4	LibriSpeech: https://www.openslr.org/12 Switchboard-1 https://catalog.ldc.upenn.edu/LDC97S62 https://paperswithcode.com/sota/speech-recognition-on-switchboard-300hr
June 9, 2022	Can we fine-tune ASR models in Kaldi by training it on more audio files?	video Dan#5	fine-tuning script: https://github.com/kaldi-asr/kaldi/blob/master/egs/opensat20/s5/local/chain/run_finetune_tl.sh
June 12, 2022	Biased Language Models	video Dan#8
June 13, 2022	How to improve WER for the LibriSpeech model?	video Dan#9	RNNLM Kaldi https://github.com/kaldi-asr/kaldi/tree/master/scripts/rnnlm
June 14, 2022	LibriSpeech run.sh explained	video Dan#10	Training Script: https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/run.sh
June 15, 2022	Recommended Books & Learning Material	video Dan#11	Speech and Language Processing (3rd ed. draft) https://web.stanford.edu/~jurafsky/slp3/ Automatic Speech Recognition: A Deep Learning Approach. Amazon link https://tinyurl.com/dcvkncpw Google Scholar: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C48&as_vis=1&q=automatic+speech+recognition+asr&btnG=
June 16, 2022	LibriSpeech run.sh explained Part 2	video Dan#12	https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/local/chain/tuning/run_tdnn_1d.sh https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/run.sh

Dan Next-gen Kaldi

Date	Topics	video	Readings
June 6, 2022	Next-Gen Kaldi for Beginners?	video Dan#2	SRE16 xvectors: https://github.com/kaldi-asr/kaldi/tree/master/egs/sre16/v2 SRE16 Xvector Model: https://kaldi-asr.org/models/m3 X-Vectors: Robust DNN Embeddings for Speaker Recognition: https://ieeexplore.ieee.org/abstract/document/8461375 OnlineIvectorFeature Class Reference: https://kaldi-asr.org/doc/classkaldi_1_1OnlineIvectorFeature.html#af7c4234c6b1d5d807dbb4292cf36b98c GBO notes: i-vectors and x-vectors: https://desh2608.github.io/2022-04-07-gbo-ivectors/
June 11, 2022	Which recipe from Icefall can I start with?	video Dan#6	LibriSpeech: https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR TIMIT: https://github.com/k2-fsa/icefall/tree/master/egs/timit/ASR Icefall: https://github.com/k2-fsa/icefall/tree/master/egs TIMIT dataset https://lhotse.readthedocs.io/en/latest/cli.html?highlight=TIMIT#lhotse-download-timit
June 10, 2022	Can we now prepare the `text`, `segments`, `wav.scp`, `utt2spk`, and `spk2utt` files using Lhotse scripts from Next-gen Kaldi?	video Dan#7	Lhotse: https://lhotse.readthedocs.io/en/latest/getting-started.html lhotse.kaldi.export_to_kaldi: https://lhotse.readthedocs.io/en/latest/kaldi.html
July 23, 2022	Next-gen Kaldi Intro.	video Dan#21	powerpoint slides: https://shorturl.at/KMVY4
July 23, 2022	RNNT BAAI Conference	video Dan#22	powerpoint slides: https://shorturl.at/KMVY4
July 23, 2022	Reworked Conformer Model	video Dan#23	powerpoint slides: https://shorturl.at/KMVY4
July 23, 2022	Next-gen Kaldi for Smart Phone Devices	video Dan#24	powerpoint slides: https://shorturl.at/KMVY4
July 23, 2022	Next-gen Kaldi vs WeNet	video Dan#25	powerpoint slides: https://shorturl.at/KMVY4
July 23, 2022	WFST to Integrate a Language Model	video Dan#26	powerpoint slides: https://shorturl.at/KMVY4
July 23, 2022	Data Augmentation	video Dan#27	powerpoint slides: https://shorturl.at/KMVY4
July 23, 2022	RNNT and Conformer	video Dan#28	powerpoint slides: https://shorturl.at/KMVY4
July 23, 2022	Favorite Toolkit for Students	video Dan#29	powerpoint slides: https://shorturl.at/KMVY4
July 23, 2022	BAAI 2022 Conference Full Version	video Dan#30	powerpoint slides: https://shorturl.at/KMVY4
September 2, 2022	What is BPE and lang_bpe_500?	What is BPE and lang_bpe_500?	Speech Recognition with weighted finite-state transducers: https://cs.nyu.edu/~mohri/pub/hbka.pdf What is HCLG.fst?: https://nadirapovey.blogspot.com/2021/12/what-is-hclgfst.html Icefall: https://github.com/k2-fsa/icefall

YouTube Videos I Liked

Automatic Speech Recognition - An Overview	https://www.youtube.com/watch?v=q67z7PTGRi8&ab_channel=MicrosoftResearch
Lecture 9 - Speech Recognition (ASR) [Andrew Senior]	https://www.youtube.com/watch?v=HyUtT_z-cms
MIT 6.S191: Automatic Speech Recognition	https://www.youtube.com/watch?v=sR6_bZ6VkAg
Lecture 12: End-to-End Models for Speech Processing [Stanford]	https://www.youtube.com/watch?v=3MjIkWxXigM
I Built a Personal Speech Recognition System for my AI Assistant	https://www.youtube.com/watch?v=YereI6Gn3bM&ab_channel=TheA.I.Hacker-MichaelPhi
you need to learn Kubernetes RIGHT NOW!!	https://www.youtube.com/watch?v=7bA0gTroJjw&ab_channel=NetworkChuck

Papers

Paper	link	Date
CTC Variations Through New WFST Topologies	https://arxiv.org/pdf/2110.03098.pdf	26 Jun 2022

Datasets Collected for my Research

date	dataset name	data description	# items	link	file_name	blog posts
August 18, 2022	talksatgoogle	ids for the YouTube videos where manual and audio captions are located	3577	data	talksatgoogle_levenshtein_score.txt	link
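
The file name above suggests that each video is scored by a Levenshtein (edit) distance between its two caption tracks. The exact scoring used for the dataset is not stated here, so the following is only a minimal sketch of one plausible approach: a word-level edit distance normalized by the length of the manual caption. The function names `levenshtein` and `caption_score`, the lowercasing, and the normalization are all assumptions made for illustration.

```python
def levenshtein(a, b):
    """Edit distance between two sequences; insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # deleting the first i items of a yields the empty prefix of b
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def caption_score(manual, automatic):
    """Hypothetical score: word-level edit distance divided by the manual caption length."""
    ref = manual.lower().split()
    hyp = automatic.lower().split()
    if not ref:
        # Assumed convention for an empty reference: 0.0 if both are empty, else 1.0.
        return 0.0 if not hyp else 1.0
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(caption_score("the cat sat on the mat", "the cat sat on a mat"))
```

Under this assumed definition a lower score means the two caption tracks agree more closely, so videos could be filtered by a threshold on the score.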