#5 Fine-tuning (YouTube auto transcript)

Hello, this is Daniel Povey, and today we're asking him: can we fine-tune ASR models in Kaldi by training them on more audio files?
Okay, so fine-tuning is a little bit of a weak spot in Kaldi, and I'll try to explain the reasons why. Firstly, fine-tuning means that we already have a trained model, and we want to train it on some other data that better matches our domain. The general idea is that you train using a lower learning rate, because you don't want to destroy the already-learned parameters.
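(As a rough illustration of that idea, not Kaldi's actual training code: a minimal PyTorch sketch, where the model, data, and learning rate are all hypothetical stand-ins.)

```python
import torch
import torch.nn as nn

# Toy stand-in for an already-trained acoustic model (not a real Kaldi TDNN).
model = nn.Sequential(nn.Linear(40, 256), nn.ReLU(), nn.Linear(256, 2000))

# Pretend these are in-domain adaptation features and targets.
feats = torch.randn(32, 40)
targets = torch.randint(0, 2000, (32,))

# Key point: a learning rate well below the one used for original
# training, so the already-learned parameters are not destroyed.
opt = torch.optim.SGD(model.parameters(), lr=1e-4)

for _ in range(10):  # a few adaptation steps on the new data
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(feats), targets)
    loss.backward()
    opt.step()
```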
Now, unfortunately, this generally hasn't worked too well in Kaldi. We have a few recipes that are supposed to do this, but the performance is just so-so. Later in the video maybe we'll add some links to the recipes.
I suspect that one of the reasons it doesn't work so well is that our normal models with TDNNs have batch norm layers in them. I believe what happens is that when you train on a different type of data, the batch norm stats don't match, and this means that right from the start the model is performing worse, because in effect you're normalizing using a different data distribution than the one it was originally trained on.
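(To make that concrete: a small PyTorch sketch, not Kaldi code, showing how batch norm's stored statistics stop matching when the data distribution shifts.)

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(40)

# Accumulate running mean/variance on source-domain data (mean 0, std 1).
bn.train()
for _ in range(100):
    bn(torch.randn(32, 40))

# At test time the frozen running stats are applied to every input.
bn.eval()
shifted = torch.randn(32, 40) * 3 + 2  # hypothetical in-domain distribution
out = bn(shifted)

# Output is no longer normalized: mean/std come out near 2 and 3, not 0 and 1.
print(out.mean().item(), out.std().item())
```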
Now, in principle, one fix for this would be to train the original model without batch norm, replacing it with the normalize layer in Kaldi, which is something like layer norm. But that would require retraining the original model.
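(Sketched in PyTorch rather than Kaldi's xconfig: nn.LayerNorm stands in here for Kaldi's normalize layer, which is similar but not identical.)

```python
import torch.nn as nn

# Batch-norm variant: normalization depends on running dataset statistics,
# which go stale when the data distribution changes.
model_bn = nn.Sequential(nn.Linear(40, 256), nn.BatchNorm1d(256), nn.ReLU())

# Layer-norm variant: each frame is normalized over its own activations,
# so there are no stored statistics to mismatch at fine-tuning time.
model_ln = nn.Sequential(nn.Linear(40, 256), nn.LayerNorm(256), nn.ReLU())
```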
In general, I recommend that people simply train from scratch, using a mixture of the original training data and the in-domain data, if that's possible.
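(In Kaldi itself this amounts to pooling the data directories, e.g. with utils/combine_data.sh; as a generic PyTorch sketch with hypothetical corpora:)

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Hypothetical stand-ins for the original corpus and the in-domain data.
original = TensorDataset(torch.randn(1000, 40), torch.randint(0, 2000, (1000,)))
in_domain = TensorDataset(torch.randn(200, 40), torch.randint(0, 2000, (200,)))

# Train from scratch on the union of both corpora, shuffled together.
mixed = ConcatDataset([original, in_domain])
loader = DataLoader(mixed, batch_size=32, shuffle=True)
```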
Thank you. Bye-bye.