To train the model, check `RESULTS.md` in the `icefall/egs/librispeech/ASR` folder. I picked the medium-sized model to train.
- I created the script `medium_librispeech.sh`.
- Comment out the `export CUDA_VISIBLE_DEVICES` statement if you are using all GPUs. Since I want to use all of my GPUs, I don't need the export.
- Change `--world-size` from 8 to 2, because I only have 2 GPUs (a small sketch for keeping `--world-size` in sync with the GPU count follows this list).
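As a convenience (my own addition, not part of the icefall recipe), the GPU count can be read from `nvidia-smi` and reused as the `--world-size` value, so the number does not have to be edited by hand on each machine:

```bash
# Hypothetical helper, assuming nvidia-smi is installed.
# Counts the GPUs on this machine so the count can be passed as --world-size.
num_gpus=$(nvidia-smi -L | wc -l)
echo "Detected ${num_gpus} GPU(s); use --world-size ${num_gpus} in medium_librispeech.sh"
```

If you only want to use a subset of the GPUs, keep the `export CUDA_VISIBLE_DEVICES=...` line instead and set `--world-size` to the number of GPUs you listed there.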
`np@np-INTEL:/mnt/speech1/nadira/stt/icefall/egs/librispeech/ASR$ pico medium_librispeech.sh`

```bash
# export CUDA_VISIBLE_DEVICES="0,1"
# using all, don't need to export
./pruned_transducer_stateless5/train.py \
  --world-size 2 \
  --num-epochs 40 \
  --start-epoch 1 \
  --full-libri 1 \
  --exp-dir pruned_transducer_stateless5/exp-M \
  --max-duration 300 \
  --use-fp16 0 \
  --num-encoder-layers 18 \
  --dim-feedforward 1024 \
  --nhead 4 \
  --encoder-dim 256 \
  --decoder-dim 512 \
  --joiner-dim 512
```
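Because training on full LibriSpeech takes a long time, I run the script in the background with a log file. This is a generic sketch (the `nohup` call and log path are my own choices, not part of the recipe); `train.py` normally writes TensorBoard events under the experiment directory, so adjust that path if your setup differs:

```bash
# Hypothetical launch commands, not part of the icefall recipe.
chmod +x medium_librispeech.sh

# Run in the background and keep a log file so the terminal can be closed.
nohup ./medium_librispeech.sh > medium_librispeech.log 2>&1 &

# Follow the training log.
tail -f medium_librispeech.log

# Optional: monitor loss curves (train.py writes TensorBoard events under
# the experiment directory; adjust the path if yours differs).
tensorboard --logdir pruned_transducer_stateless5/exp-M/tensorboard --port 6006
```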
`np@np-INTEL:/mnt/speech1/nadira/stt/icefall/egs/librispeech/ASR$ pico RESULTS.md`

At the end of training we should get the results shown below (copied from `RESULTS.md`):
#### Medium
Number of model parameters: 30896748 (i.e., 30.9 M).
| decoding method | test-clean (WER %) | test-other (WER %) | comment |
|-------------------------------------|------------|------------|-----------------------------------------|
| greedy search (max sym per frame 1) | 2.88 | 6.69 | --epoch 39 --avg 17 --max-duration 600 |
| modified beam search | 2.83 | 6.59 | --epoch 39 --avg 17 --max-duration 600 |
| fast beam search | 2.83 | 6.61 | --epoch 39 --avg 17 --max-duration 600 |
The training commands are:
```bash
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
./pruned_transducer_stateless5/train.py \
--world-size 8 \
--num-epochs 40 \
--start-epoch 0 \
--full-libri 1 \
--exp-dir pruned_transducer_stateless5/exp-M \
--max-duration 300 \
--use-fp16 0 \
--num-encoder-layers 18 \
--dim-feedforward 1024 \
--nhead 4 \
--encoder-dim 256 \
--decoder-dim 512 \
--joiner-dim 512
```
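For reference, the comment column in the table suggests how those numbers were produced. Below is a sketch of the corresponding decoding commands using the recipe's `decode.py`; the exact flag names can vary between icefall versions, so double-check them against `./pruned_transducer_stateless5/decode.py --help`. The model-size flags must match the ones used for training.

```bash
# Sketch only: verify flag names against decode.py --help for your icefall
# version. --epoch/--avg/--max-duration follow the table's comment column;
# the model-size flags must match the training command above.
for method in greedy_search modified_beam_search fast_beam_search; do
  ./pruned_transducer_stateless5/decode.py \
    --epoch 39 \
    --avg 17 \
    --exp-dir pruned_transducer_stateless5/exp-M \
    --max-duration 600 \
    --decoding-method ${method} \
    --num-encoder-layers 18 \
    --dim-feedforward 1024 \
    --nhead 4 \
    --encoder-dim 256 \
    --decoder-dim 512 \
    --joiner-dim 512
done
```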