#7 Can we now prepare text, segments, wav.scp, utt2spk, and spk2utt files using Lhotse scripts from Next-gen Kaldi?

Answer: No, not directly

Currently, to train with Kaldi, we need to create the text, segments, wav.scp, utt2spk, and spk2utt files.
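As a reminder of what those five files contain, here is a minimal sketch in plain Python that writes them for a toy dataset. The recording ID, audio path, utterance IDs, speaker ID, and transcripts are all made up for illustration, and write_kaldi_dir is a hypothetical helper, not part of Kaldi or Lhotse. It only shows the usual one-entry-per-line layout of each file and that spk2utt is the inverse mapping of utt2spk.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical toy data: one recording, two utterances, one speaker.
recordings = {"rec1": "/data/audio/rec1.wav"}  # made-up path
utterances = [
    # (utt_id, rec_id, start_sec, end_sec, speaker_id, transcript)
    ("spk1-rec1-0000", "rec1", 0.00, 2.50, "spk1", "hello world"),
    ("spk1-rec1-0001", "rec1", 2.50, 5.00, "spk1", "good morning"),
]


def write_kaldi_dir(out_dir, recordings, utterances):
    """Write text, segments, wav.scp, utt2spk and spk2utt into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    utts = sorted(utterances)  # Kaldi expects entries sorted by ID

    lines = {
        # wav.scp: <recording-id> <audio path or command>
        "wav.scp": [f"{r} {p}" for r, p in sorted(recordings.items())],
        # segments: <utt-id> <recording-id> <start> <end>
        "segments": [f"{u} {r} {s:.2f} {e:.2f}" for u, r, s, e, _, _ in utts],
        # text: <utt-id> <transcript>
        "text": [f"{u} {t}" for u, _, _, _, _, t in utts],
        # utt2spk: <utt-id> <speaker-id>
        "utt2spk": [f"{u} {spk}" for u, _, _, _, spk, _ in utts],
    }

    # spk2utt: <speaker-id> <utt-id> <utt-id> ... (inverse of utt2spk)
    spk2utt = defaultdict(list)
    for u, _, _, _, spk, _ in utts:
        spk2utt[spk].append(u)
    lines["spk2utt"] = [
        f"{spk} {' '.join(us)}" for spk, us in sorted(spk2utt.items())
    ]

    for name, content in lines.items():
        (out / name).write_text("\n".join(content) + "\n", encoding="utf-8")
    return out
```

Calling write_kaldi_dir("data/toy", recordings, utterances) would create the five files under data/toy. Prefixing each utterance ID with its speaker ID, as done here, is the usual Kaldi convention so that sorting by utterance and by speaker agree.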

Can we now prepare these files using Lhotse scripts from Next-gen Kaldi?

We first have to know how to prepare the data in Lhotse style; we can then convert from Lhotse style to Kaldi style using the Lhotse function lhotse.kaldi.export_to_kaldi(recordings, supervisions, output_dir, map_underscores_to=None, prefix_spk_id=False)

In other words, we can prepare the data in Lhotse format and then export it from Lhotse format to the current Kaldi format.
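The export step might look roughly like the sketch below. It assumes the Lhotse manifests have already been written to disk by a Lhotse data-preparation script. The wrapper name export_manifests_to_kaldi and the manifest file names in the usage note are hypothetical; only export_to_kaldi and its parameters come from the signature quoted above.

```python
from pathlib import Path


def export_manifests_to_kaldi(recordings_path, supervisions_path, output_dir):
    """Load Lhotse manifests from disk and export them as a Kaldi data dir.

    Hypothetical wrapper for illustration; returns the names of the files
    found in output_dir after the export.
    """
    # Imported lazily so this module can be imported without Lhotse installed.
    from lhotse import RecordingSet, SupervisionSet
    from lhotse.kaldi import export_to_kaldi

    recordings = RecordingSet.from_file(recordings_path)
    supervisions = SupervisionSet.from_file(supervisions_path)

    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Same arguments as in the signature quoted above, with its defaults.
    export_to_kaldi(
        recordings,
        supervisions,
        output_dir=output_dir,
        map_underscores_to=None,
        prefix_spk_id=False,
    )
    return sorted(p.name for p in output_dir.iterdir())
```

A call such as export_manifests_to_kaldi("recordings.jsonl.gz", "supervisions.jsonl.gz", "data/train") would then list what was written. Exactly which files appear may depend on the Lhotse version, so it is worth checking the output directory; if spk2utt is missing, it can be derived from utt2spk.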

We can’t get those five files directly from the Lhotse scripts alone; the export step is needed.


#7 data prep youtube auto transcript