LibriSpeech training

Step 0: Make a copy of run.sh
Step 1: Install flac
Step 2: Change cmd.sh
Step 3: Needed if only one GPU is used
Step 4: Create a screen to run the script.
Step 5: Run run_edited.sh

LibriSpeech training will need at least one GPU and 400GB storage space. The tutorial below was created using AWS instance.

Step 0: Make a copy of run.sh

Normally I copy the run.sh to new file that I can edit. But we can edit original run.sh if wanted.

cp run.sh run_edited.sh

Need to edit path to where we want to put all downloaded folders

nano run_edited.sh

Edit the new version. Add few lines after line 7. Pick a data folder location that you want. I chose the one below.

mkdir -p nadira/data
data=nadira/data

Line 17 was data=/export/a15/vpanayotov/data , If you don’t have that folder you can create it, or just make a data folder.

run_edited.sh top few lines:

#!/usr/bin/env bash


# Set this to somewhere where you want to put your data, or where
# someone else has already put it.  You'll want to change this
# if you're not on the CLSP grid.
# data=/export/a15/vpanayotov/data
mkdir -p nadira/data # <--- added
data=nadira/data # <--- added
# base url for downloads.
data_url=www.openslr.org/resources/12
lm_url=www.openslr.org/resources/11

Step 1: Install flac

ubuntu@ip-172-31-6-144:~/kaldi/egs/librispeech/s5$ sudo apt install flac

Step 2: Change cmd.sh

Comment out the last 3 lines and add three lines that you see below. As AWS doesn’t have qsub package we can’t use the last 3 lines.

# you can change cmd.sh depending on what type of queue you are using.
# If you have no queueing system and want to run on a local machine, you
# can change all instances 'queue.pl' to run.pl (but be careful and run
# commands one by one: most recipes will exhaust the memory on your
# machine).  queue.pl works with GridEngine (qsub).  slurm.pl works
# with slurm.  Different queues are configured differently, with different
# queue names and different ways of specifying things like memory;
# to account for these differences you can create and edit the file
# conf/queue.conf to match your queue's configuration.  Search for
# conf/queue.conf in http://kaldi-asr.org/doc/queue.html for more information,
# or search for the string 'default_config' in utils/queue.pl or utils/slurm.pl.

# comment out line below as we don't have qsub on AWS
# export train_cmd="queue.pl --mem 2G"
# export decode_cmd="queue.pl --mem 4G"
# export mkgraph_cmd="queue.pl --mem 8G"

# my instance has 8 vCPUs
export train_cmd="run.pl --max-jobs-run 8"
export decode_cmd="run.pl --max-jobs-run 8"
export mkgraph_cmd="run.pl --max-jobs-run 8"

Step 3: Needed if only one GPU is used

Need to change the option for nvidia-smi otherwise it will assume that we have multiple GPUs. My AWS instance only has one GPU.

 sudo nvidia-smi -c 3

ubuntu@ip-172-31-6-144:~/kaldi/egs/librispeech/s5$ pico local/chain/run_tdnn.sh

add --use-gpu=wait \

steps/nnet3/chain/train.py --stage $train_stage \
    --cmd "$decode_cmd" \
    --use-gpu=wait \
    --feat.online-ivector-dir $train_ivector_dir \
    --feat.cmvn-opts "--norm-means=false --norm-vars=false" \
    --chain.xent-regularize $xent_regularize \
    --chain.leaky-hmm-coefficient 0.1 \
    --chain.l2-regularize 0.0 \
    --chain.apply-deriv-weights false \
    --chain.lm-opts="--num-extra-lm-states=2000" \
    --egs.dir "$common_egs_dir" \
    --egs.stage $get_egs_stage \
    --egs.opts "--frames-overlap-per-eg 0 --constrained false" \
    --egs.chunk-width $frames_per_eg \
    --trainer.dropout-schedule $dropout_schedule \
    --trainer.add-option="--optimization.memory-compression-level=2" \
    --trainer.num-chunk-per-minibatch 64 \
    --trainer.frames-per-iter 2500000 \
    --trainer.num-epochs 4 \
    --trainer.optimization.num-jobs-initial 3 \
    --trainer.optimization.num-jobs-final 16 \
    --trainer.optimization.initial-effective-lrate 0.00015 \
    --trainer.optimization.final-effective-lrate 0.000015 \
    --trainer.max-param-change 2.0 \
    --cleanup.remove-egs $remove_egs \
    --feat-dir $train_data_dir \
    --tree-dir $tree_dir \
    --lat-dir $lat_dir \
    --dir $dir  || exit 1;

If step 3 is not done when using only one GPU we will get error below:

ERROR (nnet3-chain-train[5.5.997~1-054af]:AllocateNewRegion():cu-allocator.cc:491) Failed to allocate a memory region of 12582912 bytes.  Possibly this is due to sharing the GPU.  Try switching the GPUs to exclusive mode (nvidia-smi -c 3) and using the option --use-gpu=wait to scripts like steps/nnet3/chain/train.py.  Memory info: free:10M, used:15099M, total:15109M, free/total:0.000711461 CUDA error: 'out of memory'

Step 4: Create a screen to run the script.

screen -S libri

Step 5: Run run_edited.sh

./run_edited.sh > output.txt

The following results can be observed after 10 days of training. My instance has 8 vCPUs and 1GPU.

Training is completed. We can find final Neural Net system in exp/chain_cleaned/tdnn_1d_sp/final.mdl

and graph in exp/chain_cleaned/tdnn_1d_sp/graph_tgsmall/HCLG.fst

2022-01-09 02:15:39,486 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 496/520   Jobs: 15   Epoch: 3.67/4.0 (91.8% complete)   lr: 0.000272
2022-01-09 02:50:23,893 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 497/520   Jobs: 15   Epoch: 3.68/4.0 (92.1% complete)   lr: 0.000270
2022-01-09 03:25:41,148 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 498/520   Jobs: 15   Epoch: 3.69/4.0 (92.4% complete)   lr: 0.000268
2022-01-09 04:01:32,203 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 499/520   Jobs: 15   Epoch: 3.71/4.0 (92.7% complete)   lr: 0.000266
2022-01-09 04:37:38,180 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 500/520   Jobs: 15   Epoch: 3.72/4.0 (93.0% complete)   lr: 0.000264
2022-01-09 05:14:40,809 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 501/520   Jobs: 16   Epoch: 3.73/4.0 (93.3% complete)   lr: 0.000280
2022-01-09 05:53:18,670 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 502/520   Jobs: 16   Epoch: 3.74/4.0 (93.6% complete)   lr: 0.000278
2022-01-09 06:31:05,279 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 503/520   Jobs: 16   Epoch: 3.76/4.0 (93.9% complete)   lr: 0.000276
2022-01-09 07:08:46,082 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 504/520   Jobs: 16   Epoch: 3.77/4.0 (94.2% complete)   lr: 0.000274
2022-01-09 07:47:07,220 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 505/520   Jobs: 16   Epoch: 3.78/4.0 (94.6% complete)   lr: 0.000272
2022-01-09 08:23:57,390 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 506/520   Jobs: 16   Epoch: 3.80/4.0 (94.9% complete)   lr: 0.000270
2022-01-09 09:02:17,142 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 507/520   Jobs: 16   Epoch: 3.81/4.0 (95.2% complete)   lr: 0.000268
2022-01-09 09:40:52,534 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 508/520   Jobs: 16   Epoch: 3.82/4.0 (95.5% complete)   lr: 0.000266
2022-01-09 10:18:47,231 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 509/520   Jobs: 16   Epoch: 3.83/4.0 (95.9% complete)   lr: 0.000264
2022-01-09 10:57:05,727 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 510/520   Jobs: 16   Epoch: 3.85/4.0 (96.2% complete)   lr: 0.000262
2022-01-09 11:35:18,583 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 511/520   Jobs: 16   Epoch: 3.86/4.0 (96.5% complete)   lr: 0.000260
2022-01-09 12:12:56,573 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 512/520   Jobs: 16   Epoch: 3.87/4.0 (96.8% complete)   lr: 0.000258
2022-01-09 12:51:06,920 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 513/520   Jobs: 16   Epoch: 3.89/4.0 (97.2% complete)   lr: 0.000256
2022-01-09 13:29:19,736 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 514/520   Jobs: 16   Epoch: 3.90/4.0 (97.5% complete)   lr: 0.000254
2022-01-09 14:07:52,940 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 515/520   Jobs: 16   Epoch: 3.91/4.0 (97.8% complete)   lr: 0.000252
2022-01-09 14:46:16,641 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 516/520   Jobs: 16   Epoch: 3.92/4.0 (98.1% complete)   lr: 0.000251
2022-01-09 15:24:13,838 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 517/520   Jobs: 16   Epoch: 3.94/4.0 (98.4% complete)   lr: 0.000249
2022-01-09 16:01:52,900 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 518/520   Jobs: 16   Epoch: 3.95/4.0 (98.8% complete)   lr: 0.000247
2022-01-09 16:40:28,511 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 519/520   Jobs: 16   Epoch: 3.96/4.0 (99.1% complete)   lr: 0.000245
2022-01-09 17:18:47,249 [steps/nnet3/chain/train.py:529 - train - INFO ] Iter: 520/520   Jobs: 16   Epoch: 3.98/4.0 (99.4% complete)   lr: 0.000240
2022-01-09 17:58:31,315 [steps/nnet3/chain/train.py:585 - train - INFO ] Doing final combination to produce final.mdl
2022-01-09 17:58:31,315 [steps/libs/nnet3/train/chain_objf/acoustic_model.py:571 - combine_models - INFO ] Combining set([512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511]) models.
2022-01-09 17:59:36,259 [steps/nnet3/chain/train.py:614 - train - INFO ] Cleaning up the experiment directory exp/chain_cleaned/tdnn_1d_sp
tree-info exp/chain_cleaned/tdnn_1d_sp/tree
tree-info exp/chain_cleaned/tdnn_1d_sp/tree
fstcomposecontext --context-size=2 --central-position=1 --read-disambig-syms=data/lang_test_tgsmall/phones/disambig.int --write-disambig-syms=data/lang_test_tgsmall/tmp/disambig_ilabels_2_1.int data/lang_test_tgsmall/tmp/ilabels_2_1.2242 data/lang_test_tgsmall/tmp/LG.fst
fstisstochastic data/lang_test_tgsmall/tmp/CLG_2_1.fst
make-h-transducer --disambig-syms-out=exp/chain_cleaned/tdnn_1d_sp/graph_tgsmall/disambig_tid.int --transition-scale=1.0 data/lang_test_tgsmall/tmp/ilabels_2_1 exp/chain_cleaned/tdnn_1d_sp/tree exp/chain_cleaned/tdnn_1d_sp/final.mdl
fstrmsymbols exp/chain_cleaned/tdnn_1d_sp/graph_tgsmall/disambig_tid.int
fstminimizeencoded
fstrmepslocal
fsttablecompose exp/chain_cleaned/tdnn_1d_sp/graph_tgsmall/Ha.fst 'fstrmsymbols --remove-arcs=true --apply-to-output=true data/lang_test_tgsmall/oov.int data/lang_test_tgsmall/tmp/CLG_2_1.fst|'
fstdeterminizestar --use-log=true
fstrmsymbols --remove-arcs=true --apply-to-output=true data/lang_test_tgsmall/oov.int data/lang_test_tgsmall/tmp/CLG_2_1.fst
fstisstochastic exp/chain_cleaned/tdnn_1d_sp/graph_tgsmall/HCLGa.fst
add-self-loops --self-loop-scale=1.0 --reorder=true exp/chain_cleaned/tdnn_1d_sp/final.mdl exp/chain_cleaned/tdnn_1d_sp/graph_tgsmall/HCLGa.fst
fstisstochastic exp/chain_cleaned/tdnn_1d_sp/graph_tgsmall/HCLG.fst
fstrmsymbols data/lang_test_tgmed/phones/disambig.int
fstdeterminizestar
fstrmsymbols data/lang_test_tgmed/phones/disambig.int
fstdeterminizestar
fstdeterminizestar
fstrmsymbols data/lang_test_tgmed/phones/disambig.int
fstrmsymbols data/lang_test_tgmed/phones/disambig.int
fstdeterminizestar
ubuntu@ip-172-31-6-144:~/kaldi/egs/librispeech/s5$

graph folder:

ubuntu@ip-172-31-6-144:~/kaldi/egs/librispeech/s5/exp/chain_cleaned/tdnn_1d_sp/graph_tgsmall$ ls
HCLG.fst  disambig_tid.int  num_pdfs  phones  phones.txt  words.txt
ubuntu@ip-172-31-6-144:~/kaldi/egs/librispeech/s5/exp/chain_cleaned/tdnn_1d_sp/graph_tgsmall$Fo

Comments:

https://github.com/npovey/speech/discussions/categories/announcements