June 9, 2022
*This blog was created from Dan Povey's interview with us: https://youtu.be/ElDIaOswY18
Hello, this is Daniel Povey, and today we're asking him: can we fine-tune ASR models in Kaldi by training them on more audio files?
Fine-tuning is a bit of a weak spot in Kaldi, and I'll try to explain why. Fine-tuning means that we already have a trained model and want to train it further on other data that better matches our domain. The general idea is to train with a lower learning rate, because you don't want to destroy the parameters the model has already learned. Unfortunately, this generally hasn't worked too well in Kaldi. We have a few recipes that are supposed to do this, but the performance is just so-so. Later in the video maybe we'll add some links to the recipes.
Fine-tuning script: https://github.com/kaldi-asr/kaldi/blob/master/egs/opensat20/s5/local/chain/run_finetune_tl.sh
I suspect that one of the reasons it doesn't work so well is that our normal TDNN models have batch norm layers in them. I believe that when you train on a different type of data, the batch norm stats are different and don't match. This means that right from the start the model performs worse, because in effect you're normalizing with a different data distribution than the one the model was originally trained on.
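To make the suspected mismatch concrete, here is a minimal sketch in plain Python. It is not Kaldi code: the two "domains" are just hypothetical Gaussian activations with made-up means and spreads, and `normalize_with_stored_stats` is an illustrative helper, not a Kaldi function. It only shows what happens when statistics stored from one data distribution are used to normalize another.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical activations feeding a batch norm layer.
# "source" stands in for the original training data,
# "target" for in-domain data with a shifted, wider distribution.
source = [rng.gauss(0.0, 1.0) for _ in range(10000)]
target = [rng.gauss(2.0, 3.0) for _ in range(10000)]

# Statistics accumulated on the original training data.
stored_mean = statistics.fmean(source)
stored_std = statistics.pstdev(source)


def normalize_with_stored_stats(values, mean, std, eps=1e-5):
    """Normalize using fixed, previously stored statistics (illustrative only)."""
    return [(v - mean) / (std + eps) for v in values]


source_out = normalize_with_stored_stats(source, stored_mean, stored_std)
target_out = normalize_with_stored_stats(target, stored_mean, stored_std)

for name, values in (("source", source_out), ("target", target_out)):
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    print(f"{name}: mean={mean:.2f} std={std:.2f}")
```

The source-domain output comes out close to zero mean and unit spread, which is what the following layers saw during training. The target-domain output does not, so the rest of the network starts from inputs it was never trained on.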
In principle, one fix would be to train the original model without batch norm, replacing it with the normalize layer in Kaldi, which is something like layer norm. But that would require retraining the original model.
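As a rough illustration of why a per-frame normalization avoids stored dataset statistics, here is a small sketch in plain Python. It is an assumption-laden stand-in, not Kaldi's implementation: `rms_normalize` is a hypothetical helper that rescales each frame to a target root-mean-square value, computed from that frame alone.

```python
import math


def rms_normalize(frame, target_rms=1.0, eps=1e-8):
    """Scale one frame so its root-mean-square equals target_rms.

    Illustrative helper: the statistic comes from the frame itself,
    so nothing learned from a training set is stored or reused.
    """
    rms = math.sqrt(sum(v * v for v in frame) / len(frame))
    scale = target_rms / max(rms, eps)
    return [v * scale for v in frame]


# A hypothetical activation vector, and the same vector at 3x the scale,
# standing in for data from a domain with larger activations.
frame = [0.5, -1.0, 2.0, 0.25]
scaled_frame = [3.0 * v for v in frame]

out_a = rms_normalize(frame)
out_b = rms_normalize(scaled_frame)

print(out_a)
print(out_b)
```

Both inputs map to the same output, because the scaling is recomputed for every frame. Note that this sketch only removes differences in scale; it does not subtract a mean, so it is not a full layer norm.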
In general I recommend that people simply train from scratch using a mixture of the original training data and the in-domain data if that's possible.
Thank you. Bye