#8 Biased LMs youtube auto transcript

0:01
hello this is Daniel Povey and today
0:02
we're going to ask him what are biased
0:04
language models
0:07
okay a biased language model is a language
0:09
model that's mostly estimated from
0:12
the specific utterance or recording that
0:15
you're trying to recognize
0:17
so
0:19
it's something that you can estimate
0:20
when you have the transcript available
0:23
and you normally do it for data cleanup
0:25
or alignment purposes
0:28
so the idea is if someone gives you a
0:30
transcript and you're not sure if it's
0:32
correct or you're not sure if it's the
0:34
transcript for that utterance
0:36
then you build a biased language model on
0:38
that transcript it mostly
0:41
has probability mass just for that
0:43
sequence
0:44
and you uh
0:46
you do data alignment with that
0:48
graph
0:50
from that language model
0:51
and you see whether it recognizes the
0:53
same utterance you know
0:56
you look to see if that same sequence is
0:58
there or maybe you cut out parts where
1:00
it didn't align because those are
1:02
probably wrong
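
A minimal sketch of the idea described above, under stated assumptions: the biased language model is approximated here as a bigram model estimated from the supplied transcript and interpolated with a uniform distribution over a small background vocabulary, so most (but not all) probability mass sits on the transcript's word sequence. The function names, the `bias_weight` parameter, and the use of a plain-text hypothesis in place of a real decoder's output are all hypothetical choices for illustration; an actual system would compile the language model into a decoding graph and align audio against it.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def build_biased_bigram_lm(transcript, background_vocab, bias_weight=0.9):
    """Return prob(prev, word) for a bigram LM biased toward `transcript`.

    bias_weight is a hypothetical knob: the share of probability mass
    given to bigrams seen in the transcript; the rest is spread uniformly.
    """
    words = ["<s>"] + transcript.split() + ["</s>"]
    bigram_counts = defaultdict(Counter)
    for prev, word in zip(words, words[1:]):
        bigram_counts[prev][word] += 1
    vocab = set(background_vocab) | set(words[1:])
    uniform = 1.0 / len(vocab)

    def prob(prev, word):
        if word not in vocab:
            return 0.0
        seen = bigram_counts.get(prev)
        if not seen:
            # History never occurred in the transcript: fall back to uniform.
            return uniform
        biased = seen[word] / sum(seen.values())
        return bias_weight * biased + (1.0 - bias_weight) * uniform

    return prob


def sequence_prob(prob, sentence):
    """Probability of a whole word sequence under the bigram model."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    total = 1.0
    for prev, word in zip(words, words[1:]):
        total *= prob(prev, word)
    return total


def unaligned_spans(transcript, hypothesis):
    """List the regions where the recognized words differ from the transcript.

    These are the candidate parts to cut out as "probably wrong".
    """
    ref, hyp = transcript.split(), hypothesis.split()
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    return [
        (tag, ref[i1:i2], hyp[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


transcript = "the cat sat on the mat"
lm = build_biased_bigram_lm(transcript, background_vocab=["dog", "ran"])
on_transcript = sequence_prob(lm, transcript)
off_transcript = sequence_prob(lm, "the dog ran on the mat")
mismatches = unaligned_spans(transcript, "the cat sat on a mat")
```

In this sketch `on_transcript` comes out much larger than `off_transcript`, which is the "bias", while the small uniform share still lets the recognizer output something else when the audio disagrees with the transcript. `mismatches` then lists the spans that did not line up.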
1:03
follow-up question do you build biased
1:06
language models per sentence
1:09
uh
1:10
i mean normally you
1:12
would build them at the level of
1:15
uh however you got the transcript so if
1:18
you got the transcript in let's say one
1:20
file that covers the whole recording
1:22
then you'd normally build a biased
1:24
language model at that level or if you
1:26
got them for individual segments of the
1:28
recording then you'd get them per
1:30
segment; often these things don't
1:32
necessarily correspond
1:34
to what we would think of as a sentence
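
The point about granularity can be sketched as follows: one biased model is estimated per transcript unit, whatever that unit happens to be (a whole recording or an individual segment). This is an illustrative sketch only; the unit identifiers and function names are hypothetical, and raw bigram counts stand in for a full language model.

```python
from collections import Counter


def biased_bigram_counts(text):
    """Bigram counts for one transcript unit, with sentence-boundary markers."""
    words = ["<s>"] + text.split() + ["</s>"]
    return Counter(zip(words, words[1:]))


def build_per_unit(transcripts):
    """Build one set of biased counts per unit, at whatever level
    the transcripts were supplied (hypothetical unit ids as keys)."""
    return {unit_id: biased_bigram_counts(text)
            for unit_id, text in transcripts.items()}


# Transcript supplied as one file covering the whole recording.
per_recording = build_per_unit({"rec1": "hello world how are you"})

# Transcripts supplied for individual segments of the same recording.
per_segment = build_per_unit({
    "rec1-seg1": "hello world",
    "rec1-seg2": "how are you",
})
```

With the whole-recording transcript, the bigram that crosses the segment boundary ("world" followed by "how") is part of the model; with per-segment transcripts each model only covers its own segment's words.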
1:39
okay thank you
1:41
goodbye
1:43