kaldi lattice

Overview

Two lattice structures

The Lattice structure

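A Lattice is an FST whose weights have two parts (graph cost and acoustic cost). The input symbols are transition-ids and the output symbols are words.
The graph cost in the weight is the sum of three parts: the LM cost, the transition probabilities, and the pronunciation cost.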

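The CompactLattice structure

A CompactLattice is similar to a Lattice, except that it is an acceptor: the input and output symbols are the same (both are words), and each weight has two parts (the cost pair and a sequence of transition-ids). Compared with a Lattice, a CompactLattice moves the transition-ids from the input side into the weight.

Kaldi lattices guarantee that each word sequence corresponds to exactly one path through the lattice.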

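Lattice implementation

Let a be the graph cost and b the acoustic cost. Then
a LatticeWeight is the pair (a,b), and
its semiring behaves like the lexicographic semiring on (a+b,a-b): multiplication adds the pairs component-wise, and addition keeps the pair with the lower total cost a+b, breaking ties on a-b.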
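
The semiring described above can be sketched in a few lines. This is a simplified illustration, not Kaldi's actual LatticeWeight class: the type name SketchWeight and the free functions below are hypothetical names chosen for this example, and floating-point edge cases such as infinite costs are ignored.

```cpp
// Illustrative stand-in for a lattice weight (a, b); not Kaldi's real class.
struct SketchWeight {
  double graph;     // a: graph cost (LM + transition + pronunciation)
  double acoustic;  // b: acoustic cost
};

// Returns +1 if w1 is better (lower cost), -1 if w2 is better, 0 on a tie.
// Compares the total a+b first, then breaks ties on a-b.
inline int Compare(const SketchWeight &w1, const SketchWeight &w2) {
  const double t1 = w1.graph + w1.acoustic;
  const double t2 = w2.graph + w2.acoustic;
  if (t1 < t2) return 1;
  if (t1 > t2) return -1;
  const double d1 = w1.graph - w1.acoustic;
  const double d2 = w2.graph - w2.acoustic;
  if (d1 < d2) return 1;
  if (d1 > d2) return -1;
  return 0;
}

// Semiring multiplication: extend a path, so costs add component-wise.
inline SketchWeight Times(const SketchWeight &w1, const SketchWeight &w2) {
  return {w1.graph + w2.graph, w1.acoustic + w2.acoustic};
}

// Semiring addition: choose between alternative paths, keeping the better one.
inline SketchWeight Plus(const SketchWeight &w1, const SketchWeight &w2) {
  return Compare(w1, w2) >= 0 ? w1 : w2;
}
```

For example, Times applied to (1.0, 2.0) and (0.5, 3.0) gives (1.5, 5.0), while Plus keeps (1.0, 2.0) because its total cost 3.0 is lower than 3.5.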

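Lattice algorithms are more efficient on a Lattice than on a CompactLattice: a CompactLattice weight contains transition-ids, so operations such as taking the best path have to concatenate transition-id sequences.

By convention, lattices are written to file with unscaled acoustic costs. A Lattice can be read from an archive even if the archive actually contains CompactLattices.

The ConvertLattice() function converts a Lattice into a CompactLattice. Lattice conversion: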

  Lattice lat;
  // initialize lat.
  CompactLattice compact_lat;
  ConvertLattice(lat, &compact_lat);

Using OpenFst algorithms

  Lattice lat;
  // initialize lat.
  Lattice best_path;
  fst::ShortestPath(lat, &best_path);

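CompactLattice implementation

The weight contains not only the costs but also a sequence of transition-ids. Multiplication appends the transition-id sequences; addition returns the operand with the better (lower) cost, together with its transition-id sequence.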
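
A minimal sketch of these two operations follows. It is an illustration only: SketchCompactWeight is a hypothetical type, not Kaldi's CompactLatticeWeight, and exact cost ties are resolved here simply by keeping the first operand, whereas a real implementation would need a deterministic rule based on the sequences as well.

```cpp
#include <vector>

// Illustrative stand-in for a CompactLattice weight; not Kaldi's real class.
struct SketchCompactWeight {
  double graph;           // graph cost
  double acoustic;        // acoustic cost
  std::vector<int> tids;  // transition-id sequence carried in the weight
};

// Multiplication: costs add, transition-id sequences are concatenated.
inline SketchCompactWeight Times(const SketchCompactWeight &x,
                                 const SketchCompactWeight &y) {
  SketchCompactWeight r{x.graph + y.graph, x.acoustic + y.acoustic, x.tids};
  r.tids.insert(r.tids.end(), y.tids.begin(), y.tids.end());
  return r;
}

// Addition: the operand with the lower total cost wins, and its
// transition-id sequence is the one that survives.
// Assumption of this sketch: on an exact tie the first operand is kept.
inline SketchCompactWeight Plus(const SketchCompactWeight &x,
                                const SketchCompactWeight &y) {
  const double tx = x.graph + x.acoustic;
  const double ty = y.graph + y.acoustic;
  return (ty < tx) ? y : x;
}
```

The copy and concatenation inside Times is the extra work that makes best-path style algorithms slower on a CompactLattice than on a Lattice.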

lattice generation

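The corresponding class is LatticeSimpleDecoder. The process is:

Generate a state-level lattice.
Prune the lattice using lattice-delta.
Use a special determinization algorithm to keep only the best path for each word sequence.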
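
The effect of the last two steps can be illustrated on an explicit list of paths. This is only a sketch of the outcome, under the assumption that all paths have been enumerated with their total costs; the real decoder works directly on the FST and never enumerates paths. SketchPath and PruneAndKeepBest are hypothetical names for this example.

```cpp
#include <algorithm>
#include <limits>
#include <map>
#include <vector>

// One complete path through a toy lattice (hypothetical type).
struct SketchPath {
  std::vector<int> words;  // word sequence
  std::vector<int> tids;   // transition-id sequence (the alignment)
  double cost;             // total path cost
};

// Drops paths whose cost is more than `beam` worse than the best path,
// then keeps only the lowest-cost path for each distinct word sequence.
// The result is ordered by word sequence.
std::vector<SketchPath> PruneAndKeepBest(const std::vector<SketchPath> &paths,
                                         double beam) {
  double best = std::numeric_limits<double>::infinity();
  for (const SketchPath &p : paths) best = std::min(best, p.cost);

  std::map<std::vector<int>, SketchPath> kept;
  for (const SketchPath &p : paths) {
    if (p.cost > best + beam) continue;  // outside the beam: pruned
    auto it = kept.find(p.words);
    if (it == kept.end()) {
      kept.emplace(p.words, p);
    } else if (p.cost < it->second.cost) {
      it->second = p;  // better alignment for the same word sequence
    }
  }

  std::vector<SketchPath> out;
  for (const auto &kv : kept) out.push_back(kv.second);
  return out;
}
```

With a beam of 10, two paths for the same word sequence costing 5.0 and 4.0 collapse to the 4.0 one, and a path costing 20.0 is dropped because it is more than 10 worse than the best.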

lattice operation

lattice-prune --acoustic-scale=0.1 --beam=5 ark:in.lats ark:out.lats
lattice-best-path --acoustic-scale=0.1 ark:in.lats ark:out.tra ark:out.ali
lattice-nbest --n=10 --acoustic-scale=0.1 ark:in.lats ark:out.nbest
#LM rescore
lattice-lmrescore --lm-scale=-1.0 ark:in.lats G_old.fst ark:nolm.lats
lattice-lmrescore --lm-scale=1.0 ark:nolm.lats G_new.fst ark:out.lats
#probability scaling
lattice-scale --acoustic-scale=0.1 --lm-scale=0.8 ark:in.lats ark:out.lats
#used in discriminative (MMI) training, to ensure the correct transcription is present in the denominator lattice
lattice-union ark:num_lats.ark ark:den_lats.ark ark:augmented_den_lats.ark
lattice-interp --alpha=0.4 ark:1.lats ark:2.lats ark:3.lats
lattice-to-phones final.mdl ark:1.lats ark:phones.lats
lattice-project ark:1.lats ark:- | lattice-compose ark:- ark:2.lats ark:3.lats
lattice-equivalent ark:1.lats ark:2.lats || echo "Not equivalent!"
lattice-rmali ark:in.lats ark:word.lats
#Boosted MMI training
lattice-boost-ali --silence-phones=1:2:3 --b=0.1 final.mdl ark:1.lats \
  ark:1.ali ark:boosted.lats
#compute posteriors with forward-backward
lattice-to-post --acoustic-scale=0.1 ark:1.lats ark:- | \
  gmm-acc-stats 10.mdl "$feats" ark:- 1.acc
lattice-determinize ark:1.lats ark:det.lats
#compute the (oracle) WER
cat $data/text | \
  sed 's:<NOISE>::g' | sed 's:<SPOKEN_NOISE>::g' | \
  scripts/sym2int.pl --ignore-first-field $lang/words.txt | \
  lattice-oracle --word-symbol-table=$lang/words.txt \
    "ark:gunzip -c $dir/lats.pruned.gz|" ark:- ark,t:$dir/oracle.tra \
    2>$dir/oracle.log
#add transition probabilities
lattice-add-trans-probs --transition-scale=1.0 --self-loop-scale=0.1 \
  final.mdl ark:1.lats ark:2.lats
lattice-to-fst --lm-scale=0.0 --acoustic-scale=0.0 ark:1.lats ark:1.words
lattice-copy ark:1.lats ark,t:- | head -50
lattice-to-nbest --acoustic-scale=0.1 --n=10 ark:1.lats ark:1.nbest
nbest-to-linear ark:1.nbest ark:1.ali ark:1.words ark:1.lmscore ark:1.acscore
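
To make the role of --acoustic-scale and of the forward-backward computation more concrete, here is a toy sketch that computes arc posteriors on a small acyclic graph. It is not how lattice-to-post is implemented, and that tool reports per-frame posteriors over transition-ids rather than per-arc values. The sketch assumes states are numbered in topological order, state 0 is the start, the last state is the only final state, and arcs are sorted by source state; SketchArc and ArcPosteriors are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// One arc of a toy acyclic lattice (hypothetical type).
struct SketchArc {
  int src, dst;     // assumed: src < dst (topological numbering)
  double graph;     // graph cost
  double acoustic;  // unscaled acoustic cost
};

// Numerically stable log(exp(x) + exp(y)).
inline double LogAdd(double x, double y) {
  const double ninf = -std::numeric_limits<double>::infinity();
  if (x == ninf) return y;
  if (y == ninf) return x;
  const double hi = x > y ? x : y;
  const double lo = x > y ? y : x;
  return hi + std::log1p(std::exp(lo - hi));
}

// Returns the posterior probability of each arc.
// Arc log-likelihood = -(graph + acoustic_scale * acoustic).
std::vector<double> ArcPosteriors(const std::vector<SketchArc> &arcs,
                                  int num_states, double acoustic_scale) {
  const double ninf = -std::numeric_limits<double>::infinity();
  std::vector<double> alpha(num_states, ninf), beta(num_states, ninf);
  std::vector<double> loglike(arcs.size());
  alpha[0] = 0.0;
  beta[num_states - 1] = 0.0;

  // Forward pass: arcs sorted by src, so alpha[src] is complete when used.
  for (std::size_t i = 0; i < arcs.size(); ++i) {
    loglike[i] = -(arcs[i].graph + acoustic_scale * arcs[i].acoustic);
    alpha[arcs[i].dst] =
        LogAdd(alpha[arcs[i].dst], alpha[arcs[i].src] + loglike[i]);
  }
  // Backward pass: reverse order, so beta[dst] is complete when used.
  for (std::size_t i = arcs.size(); i-- > 0;) {
    beta[arcs[i].src] =
        LogAdd(beta[arcs[i].src], loglike[i] + beta[arcs[i].dst]);
  }

  const double total = alpha[num_states - 1];
  std::vector<double> post(arcs.size());
  for (std::size_t i = 0; i < arcs.size(); ++i) {
    post[i] = std::exp(alpha[arcs[i].src] + loglike[i] +
                       beta[arcs[i].dst] - total);
  }
  return post;
}
```

A smaller acoustic scale flattens the differences between competing arcs, which is why lattice commands that compare or normalize scores usually take --acoustic-scale explicitly while the stored lattice keeps unscaled acoustic costs.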

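In a Lattice, the word, transition-id, and weight on a given arc are not necessarily aligned with each other, and the timing information obtained from it is also inexact.
In a CompactLattice, the information on a single arc has no clear meaning by itself; it only becomes meaningful along a complete path.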
