Nadira Povey

If you have experience with Next-gen Kaldi or backend engineering and would like to work part time on a project, please contact me at my Gmail address, nadirapovey. I think the job would suit Master's students best.

My interests are Speech Processing, Text to Speech, Speech to Text, ML and AI.

Nadira Next-gen Kaldi

Date
Topics
video
Readings
July 2, 2022
Install k2-fsa
https://youtu.be/HerxbUHs-V4
https://icefall.readthedocs.io/en/latest/installation/index.html https://k2-fsa.github.io/k2/installation/conda.html
July 3, 2022
Install graphviz
https://youtu.be/Oe6Ak9XnwOg
https://icefall.readthedocs.io/en/latest/installation/index.html
July 4, 2022
Install lhotse
https://youtu.be/TOJlvsw_LB0
https://icefall.readthedocs.io/en/latest/installation/index.html
July 6, 2022
Install Icefall
https://youtu.be/LVmrBD0tLfE
https://icefall.readthedocs.io/en/latest/installation/index.html
July 7, 2022
Icefall prepare.sh
https://youtu.be/ofEIoJL-mGM
https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/prepare.sh
July 8, 2022
Train Icefall model ASR folder tree view
https://youtu.be/-T6H8MKXAKc
https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md
July 20, 2022
Hugging Face
https://www.youtube.com/watch?v=ElN3r9dkKE4
https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition
August 6, 2022
Hugging Face
https://www.youtube.com/watch?v=GizpIES_8O8
https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition
September 7, 2022
K2: train and decode.
https://installati.one/ubuntu/22.04/nvidia-dkms-510/
September 18, 2022
Install sherpa server
watch on youtube
Dec 15, 2022
BART: Abstractive Summarization
https://arxiv.org/pdf/1910.13461.pdf

Nadira Kaldi

Date
Topics
video
PowerPoint
Readings
Jan 10, 2022
Install Kaldi: Ubuntu
video
Downloading Kaldi: http://kaldi-asr.org/doc/install.html
Jan 10, 2022
Install Kaldi: RedHat
video
Downloading Kaldi: http://kaldi-asr.org/doc/install.html
Jan 10, 2022
LibriSpeech training
LibriSpeech training script: https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/run.sh Excellent folder structure visualization: https://eleanorchodroff.com/tutorial/kaldi/training-acoustic-models.html#create-files-for-conf
Sep 19, 2022
Submit PR

Other projects

Date
Topic
Links
July 1, 2022
Learning wav2vec2.0
https://npovey1-speech-to-text-streamlit-app-streamlit-app-9fsg9s.streamlitapp.com/ https://www.charlywargnier.com/post/how-to-make-a-create-a-speech-to-text-app-in-streamlit, https://github.com/CharlyWargnier/speech-to-text-streamlit-app
July 29
DeEsser For Free In Audacity!
https://www.youtube.com/watch?v=rXNns6FKBOU&ab_channel=JoeEssay https://forum.audacityteam.org/viewtopic.php?p=245549#p245549

Dan Kaldi

Date
Topics
video
Readings
June 5, 2022
Which model to start with? Aspire, WSJ, LibriSpeech or Mini LibriSpeech?
video Dan#1
Examples included with Kaldi: https://kaldi-asr.org/doc/examples.html, Mini LibriSpeech: https://github.com/kaldi-asr/kaldi/tree/master/egs/mini_librispeech
June 7, 2022
X-Vectors vs I-Vectors
video Dan#3
SRE16 xvectors: https://github.com/kaldi-asr/kaldi/tree/master/egs/sre16/v2 SRE16 Xvector Model: https://kaldi-asr.org/models/m3 X-Vectors: Robust DNN Embeddings for Speaker Recognition: https://ieeexplore.ieee.org/abstract/document/8461375 OnlineIvectorFeature Class Reference: https://kaldi-asr.org/doc/classkaldi_1_1OnlineIvectorFeature.html#af7c4234c6b1d5d807dbb4292cf36b98c GBO notes: i-vectors and x-vectors: https://desh2608.github.io/2022-04-07-gbo-ivectors/
June 8, 2022
Which dataset to use to benchmark the performance?
video Dan#4
LibriSpeech: https://www.openslr.org/12 Switchboard-1: https://catalog.ldc.upenn.edu/LDC97S62 https://paperswithcode.com/sota/speech-recognition-on-switchboard-300hr
June 9, 2022
Can we fine-tune ASR models in Kaldi by training it on more audio files?
video Dan#5
fine-tuning script: https://github.com/kaldi-asr/kaldi/blob/master/egs/opensat20/s5/local/chain/run_finetune_tl.sh
June 12, 2022
Biased Language Models
video Dan#8
June 13, 2022
How to improve the WER of a LibriSpeech model?
video Dan#9
RNNLM Kaldi https://github.com/kaldi-asr/kaldi/tree/master/scripts/rnnlm
June 14, 2022
LibriSpeech run.sh explained
video Dan#10
Training Script: https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/run.sh
June 15, 2022
Recommended Books & Learning Material
video Dan#11
Speech and Language Processing (3rd ed. draft) https://web.stanford.edu/~jurafsky/slp3/ Automatic Speech Recognition: A Deep Learning Approach. Amazon link https://tinyurl.com/dcvkncpw Google Scholar: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C48&as_vis=1&q=automatic+speech+recognition+asr&btnG=
June 16, 2022
LibriSpeech run.sh explained Part2
video Dan#12
https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/local/chain/tuning/run_tdnn_1d.sh https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/run.sh

Dan Next-gen Kaldi

Date
Topics
video
Readings
June 6, 2022
Next-Gen Kaldi for Beginners?
video Dan#2
SRE16 xvectors: https://github.com/kaldi-asr/kaldi/tree/master/egs/sre16/v2 SRE16 Xvector Model: https://kaldi-asr.org/models/m3 X-Vectors: Robust DNN Embeddings for Speaker Recognition: https://ieeexplore.ieee.org/abstract/document/8461375 OnlineIvectorFeature Class Reference: https://kaldi-asr.org/doc/classkaldi_1_1OnlineIvectorFeature.html#af7c4234c6b1d5d807dbb4292cf36b98c GBO notes: i-vectors and x-vectors: https://desh2608.github.io/2022-04-07-gbo-ivectors/
June 11, 2022
Which recipe from Icefall can I start with?
video Dan#6
LibriSpeech: https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR TIMIT: https://github.com/k2-fsa/icefall/tree/master/egs/timit/ASR Icefall: https://github.com/k2-fsa/icefall/tree/master/egs TIMIT dataset https://lhotse.readthedocs.io/en/latest/cli.html?highlight=TIMIT#lhotse-download-timit
June 10, 2022
Can we now prepare the text, segments, wav.scp, utt2spk, and spk2utt files using Lhotse scripts from Next-gen Kaldi?
video Dan#7
Lhotse: https://lhotse.readthedocs.io/en/latest/getting-started.html lhotse.kaldi.export_to_kaldi: https://lhotse.readthedocs.io/en/latest/kaldi.html
July 23, 2022
Next-gen Kaldi Intro.
video Dan#21
powerpoint slides: https://shorturl.at/KMVY4
July 23, 2022
RNNT BAAI Conference
video Dan#22
powerpoint slides: https://shorturl.at/KMVY4
July 23, 2022
Reworked Conformer Model
video Dan#23
powerpoint slides: https://shorturl.at/KMVY4
July 23, 2022
Next-gen Kaldi for Smart Phone Devices
video Dan#24
powerpoint slides: https://shorturl.at/KMVY4
July 23, 2022
Next-gen Kaldi vs WeNet
video Dan#25
powerpoint slides: https://shorturl.at/KMVY4
July 23, 2022
WFST to Integrate a Language Model
video Dan#26
powerpoint slides: https://shorturl.at/KMVY4
July 23, 2022
Data Augmentation
video Dan#27
powerpoint slides: https://shorturl.at/KMVY4
July 23, 2022
RNNT and Conformer
video Dan#28
powerpoint slides: https://shorturl.at/KMVY4
July 23, 2022
Favorite Toolkit for Students
video Dan#29
powerpoint slides: https://shorturl.at/KMVY4
July 23, 2022
BAAI 2022 Conference Full Version
video Dan#30
powerpoint slides: https://shorturl.at/KMVY4
September 2, 2022
What is BPE and lang_bpe_500?
What is BPE and lang_bpe_500?
Speech Recognition with weighted finite-state transducers: https://cs.nyu.edu/~mohri/pub/hbka.pdf What is HCLG.fst?: https://nadirapovey.blogspot.com/2021/12/what-is-hclgfst.html Icefall: https://github.com/k2-fsa/icefall
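On the question above about preparing utt2spk and spk2utt: the two files carry the same information, because spk2utt is simply the inverse mapping of utt2spk. Below is a minimal Python sketch of that inversion, assuming the usual Kaldi data-directory line format `<utterance-id> <speaker-id>`. The function name and the sample IDs are made up for illustration; this is not Lhotse's or Kaldi's own code (Kaldi ships `utils/utt2spk_to_spk2utt.pl` for the same job).

```python
from collections import defaultdict


def utt2spk_to_spk2utt(utt2spk_lines):
    """Invert utt2spk lines ("<utt-id> <spk-id>") into spk2utt lines
    ("<spk-id> <utt-id> <utt-id> ..."), sorted by speaker and utterance."""
    spk2utt = defaultdict(list)
    for line in utt2spk_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        utt_id, spk_id = line.split(maxsplit=1)
        spk2utt[spk_id].append(utt_id)
    return [
        f"{spk} {' '.join(sorted(utts))}"
        for spk, utts in sorted(spk2utt.items())
    ]


# Hypothetical utterance and speaker IDs, for illustration only.
example = [
    "spk1-utt1 spk1",
    "spk1-utt2 spk1",
    "spk2-utt1 spk2",
]
for row in utt2spk_to_spk2utt(example):
    print(row)
```

Prefixing each utterance ID with its speaker ID, as in the sample, keeps both files in a consistent sort order, which Kaldi's data validation expects.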

YouTube Videos I Liked

Automatic Speech Recognition - An Overview
https://www.youtube.com/watch?v=q67z7PTGRi8&ab_channel=MicrosoftResearch
Lecture 9 - Speech Recognition (ASR) [Andrew Senior]
https://www.youtube.com/watch?v=HyUtT_z-cms
MIT 6.S191: Automatic Speech Recognition
https://www.youtube.com/watch?v=sR6_bZ6VkAg
Lecture 12: End-to-End Models for Speech Processing [Stanford]
https://www.youtube.com/watch?v=3MjIkWxXigM
I Built a Personal Speech Recognition System for my AI Assistant
https://www.youtube.com/watch?v=YereI6Gn3bM&ab_channel=TheA.I.Hacker-MichaelPhi
you need to learn Kubernetes RIGHT NOW!!
https://www.youtube.com/watch?v=7bA0gTroJjw&ab_channel=NetworkChuck

Papers

Paper
link
Date
CTC Variations Through New WFST Topologies
https://arxiv.org/pdf/2110.03098.pdf
26 Jun 2022

Datasets Collected for my Research

date
dataset name
data description
# items
link
file_name
blog posts
August 18, 2022
talksatgoogle
ids for the YouTube videos where manual and audio captions are located
3577
data
talksatgoogle_levenshtein_score.txt
link
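The file name talksatgoogle_levenshtein_score.txt suggests that each video is scored with a Levenshtein (edit) distance, presumably between its two caption tracks. This page does not say how the score is computed, so the following is only a generic word-level sketch of the distance itself. The function name and the sample sentences are invented for illustration.

```python
def levenshtein(ref, hyp):
    """Minimum number of insertions, deletions and substitutions
    needed to turn sequence `ref` into sequence `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


# Made-up caption snippets, split into words.
manual = "welcome to talks at google".split()
automatic = "welcome to talk at google".split()
print(levenshtein(manual, automatic))
```

Passing lists of words gives a word-level distance; passing plain strings gives a character-level one.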

Ask questions at: https://github.com/npovey/speech/discussions

Install Kaldi: Ubuntu
Install Kaldi: Red Hat
LibriSpeech training
#1 Which model to start with? Aspire, WSJ, LibriSpeech or Mini LibriSpeech?
#2 Next Gen Kaldi for Beginners?
#3 X-Vectors vs I-Vectors
#4 Which dataset to use to benchmark the performance?
#5 Can we fine-tune ASR models in Kaldi by training it on more audio files?
#6 Which recipe from Icefall can I start with?
#7 Can we now prepare text, segments, wav.scp, utt2spk, and spk2utt files using Lhotse scripts from Next-gen Kaldi?
#8 What are biased language models?
#9 We trained a LibriSpeech model using Kaldi scripts, what is the next step? What can we do now to improve its Word Error Rate?
#11 Recommended Books & Learning Material
#14 k2 installed but (ModuleNotFoundError: No module named 'k2')
#15 ModuleNotFoundError: No module named 'graphviz'
#16 Install lhotse
#17 Install Icefall
#18 prepare.sh
#20_1 Train Icefall model
#20_2 Files and Folders for icefall/egs/librispeech/ASR
#21 DeEsser For Free In Audacity!
What is BPE and lang_bpe_500?
Next-gen Kaldi: training and decoding for LibriSpeech dataset.
Next-gen Kaldi: Reworked Conformer Model
Next-gen Kaldi: what is it?
Next-gen Kaldi: training and decoding for LibriSpeech dataset.
Next-gen Kaldi: recent work with RNN-T
Next-gen Kaldi: My Sherpa Server Installation
How to submit PR on GitHub
BART: Abstractive Summarization