Notes on "Utterance-level Aggregation For Speaker Recognition In The Wild"

Paper: https://arxiv.org/abs/1902.10107v1
Open-source code: http://www.robots.ox.ac.uk/~vgg/research/speakerID/

Network structure

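Input: a 257-dimensional vector per frame (256 frequency components plus 1 DC component).
Backbone: Thin-ResNet, which extracts frame-level features.
NetVLAD or GhostVLAD layer: aggregates the frame-level features into a single utterance-level feature. Most methods simply apply an average pooling layer over the frame dimension, which gives every frame the same weight. In practice frames do not contribute equally to the result; for example, frames that contain speech should matter more than silent frames. The approach in this paper in effect learns to assign a different weight to each frame automatically.
Training loss: standard softmax loss and additive margin softmax (AM-Softmax).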
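
The aggregation layer and the AM-Softmax loss listed above can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the code link above for that); it is a minimal illustration under stated assumptions. The function names `ghostvlad_aggregate` and `am_softmax_loss`, the tensor shapes, the per-cluster L2 normalization, and the values `margin=0.35` and `scale=30.0` are all illustrative choices, not taken from these notes. The final dimension-reduction layer that would normally follow the aggregation is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def ghostvlad_aggregate(frames, centers, W, b, num_ghost=0):
    """Aggregate frame-level features (T, D) into one utterance-level vector.

    centers: (K + G, D) cluster centers, the last G being "ghost" clusters.
    W, b:    parameters of the soft-assignment layer, shapes (D, K + G), (K + G,).
    With num_ghost=0 this reduces to a NetVLAD-style aggregation.
    """
    # Soft assignment: each frame gets a different weight for each cluster,
    # in contrast to average pooling, where every frame weighs the same.
    assign = softmax(frames @ W + b, axis=1)                 # (T, K + G)
    # Residual of every frame to every cluster center.
    residuals = frames[:, None, :] - centers[None, :, :]     # (T, K + G, D)
    # Weighted sum of residuals over the time axis.
    vlad = (assign[:, :, None] * residuals).sum(axis=0)      # (K + G, D)
    # Ghost clusters absorb uninformative frames and are then discarded.
    num_real = centers.shape[0] - num_ghost
    vlad = vlad[:num_real]
    # Assumption: L2-normalize each cluster's residual, then flatten.
    vlad = vlad / (np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12)
    return vlad.reshape(-1)                                  # (K * D,)

def am_softmax_loss(embeddings, class_weights, labels, margin=0.35, scale=30.0):
    """Additive margin softmax: cross entropy on scale * (cos(theta) - margin)."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=0, keepdims=True)
    cos = e @ w                                              # (N, C) cosines
    idx = np.arange(len(labels))
    cos_m = cos.copy()
    cos_m[idx, labels] -= margin                             # margin on target class only
    logits = scale * cos_m
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[idx, labels].mean()

# Toy usage with random data: 50 frames of 16-dim features, 4 real + 2 ghost clusters.
rng = np.random.default_rng(0)
T, D, K, G = 50, 16, 4, 2
frames = rng.normal(size=(T, D))
centers = rng.normal(size=(K + G, D))
W, b = rng.normal(size=(D, K + G)), np.zeros(K + G)
utterance_vec = ghostvlad_aggregate(frames, centers, W, b, num_ghost=G)
print(utterance_vec.shape)
```

Setting `margin=0.0` in `am_softmax_loss` gives a plain softmax loss on scaled cosine similarities, which makes it easy to compare the two losses on the same embeddings.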
