Mahesh Kumar Nandwana¹, Mitchell McLaren¹, Luciana Ferrer², Diego Castan¹, Aaron Lawson¹

¹ Speech Technology and Research Laboratory, SRI International, Menlo Park, California, USA
² Instituto de Investigación en Ciencias de la Computación, UBA-CONICET, Argentina

http://150.162.46.34:8080/icassp2019/ICASSP2019/pdfs/0006001.pdf

Abstract:
In this work, we assess the impact of vocal effort on discrimination and calibration performance of a state-of-the-art speaker recognition system.
We analyze three levels of vocal effort (low, normal, and high) from the SRI-FRTIV corpus.
We use a deep neural network (DNN) speaker embedding system with probabilistic linear discriminant analysis (PLDA) and find that variation in vocal effort significantly degrades system performance.
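A PLDA backend scores a trial by comparing two hypotheses: the two embeddings come from the same speaker, or from different speakers. The sketch below assumes the common two-covariance PLDA formulation, with a global mean `mu`, a between-speaker covariance `B`, and a within-speaker covariance `W`. The paper's actual backend may differ in variant and preprocessing (for example, LDA or length normalization), and the function names and toy parameters are illustrative only.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate normal, evaluated via a Cholesky factor."""
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, x - mean)            # whitened residual
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (x.shape[0] * np.log(2.0 * np.pi) + log_det + z @ z)


def plda_llr(x1, x2, mu, B, W):
    """Two-covariance PLDA log-likelihood ratio for one trial (assumed model).

    Model: x = mu + y + e, speaker factor y ~ N(0, B), residual e ~ N(0, W).
    """
    T = B + W                                   # covariance of a single embedding
    Z = np.zeros_like(B)
    x = np.concatenate([x1, x2])
    m = np.concatenate([mu, mu])
    # Same speaker: the shared speaker factor makes the embeddings covary through B.
    cov_same = np.block([[T, B], [B, T]])
    # Different speakers: the two embeddings are independent.
    cov_diff = np.block([[T, Z], [Z, T]])
    return float(gaussian_logpdf(x, m, cov_same) - gaussian_logpdf(x, m, cov_diff))


# Hypothetical toy parameters: 3-dim embeddings, speaker variability dominates.
B, W, mu = 4.0 * np.eye(3), np.eye(3), np.zeros(3)
a = np.array([2.0, 2.0, 2.0])
print(plda_llr(a, a + 0.1, mu, B, W), plda_llr(a, -a, mu, B, W))
```

Under this model the score is symmetric in the two embeddings, and it rises when they agree along directions where speaker variability (`B`) dominates within-speaker variability (`W`).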
We apply both mixture PLDA (mix-PLDA) and trial-based calibration with condition PLDA similarity (TBC-CPLDA) to improve system robustness.
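Trial-based calibration trains a calibration model on the fly for each test trial, using only the calibration trials whose conditions most resemble that trial. The sketch below is a simplified illustration under stated assumptions, not the TBC-CPLDA implementation: each side of a trial is described by a hypothetical condition vector, cosine similarity stands in for the condition-PLDA similarity, trial similarity averages the enrollment-side and test-side similarities, and the calibration model is linear logistic regression (`llr = a * score + b`) fitted with prior-balanced Newton steps. `fit_linear_calibration`, `trial_based_calibrate`, and `n_select` are made-up names.

```python
import numpy as np


def fit_linear_calibration(scores, labels, n_iter=50, reg=1e-4):
    """Fit llr = a * score + b by prior-balanced logistic regression (Newton steps).

    labels: 1 for target trials, 0 for non-target trials. A small ridge term
    keeps each Newton step well defined even if the classes are separable.
    """
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    n_tgt = y.sum()
    n_non = y.size - n_tgt
    if n_tgt == 0 or n_non == 0:
        raise ValueError("need both target and non-target trials")
    # Give each class half of the total weight (equal effective prior).
    w = np.where(y == 1, 0.5 / n_tgt, 0.5 / n_non)
    X = np.stack([s, np.ones_like(s)], axis=1)
    theta = np.zeros(2)
    for _ in range(n_iter):
        p = 0.5 * (1.0 + np.tanh(0.5 * (X @ theta)))   # numerically stable sigmoid
        grad = X.T @ (w * (p - y)) + reg * theta
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X + reg * np.eye(2)
        theta = theta - np.linalg.solve(hess, grad)
    return float(theta[0]), float(theta[1])


def cosine(u, v):
    """Cosine similarity; a stand-in for the condition-PLDA similarity."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def trial_based_calibrate(score, enroll_cond, test_cond,
                          cand_scores, cand_labels, cand_conds, n_select=1000):
    """Calibrate one raw score using the candidate trials most similar in condition.

    cand_conds: sequence of (enroll_cond, test_cond) vectors, one per candidate trial.
    """
    sims = np.array([0.5 * (cosine(enroll_cond, ce) + cosine(test_cond, ct))
                     for ce, ct in cand_conds])
    # Most similar candidates first; a stable sort keeps ties in original order.
    idx = np.argsort(-sims, kind="stable")[:n_select]
    a, b = fit_linear_calibration(np.asarray(cand_scores)[idx],
                                  np.asarray(cand_labels)[idx])
    return float(a * score + b)
```

Because the mapping is refit per trial, conditions whose raw scores are offset from each other (for example, a hypothetical pool in which high-effort trials score systematically higher) can each receive their own scale and offset, which a single global calibration cannot provide. The selected subset must contain both target and non-target trials; otherwise `fit_linear_calibration` raises an error.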
Our proposed approaches resulted in 18% and 33% relative improvement in discrimination and calibration performance respectively on the SRI-FRTIV corpus.
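The abstract reports discrimination and calibration performance separately. The metrics are not named in this part of the paper, so the sketch below assumes two common choices in speaker recognition: equal error rate (EER) for discrimination and the log-likelihood-ratio cost (Cllr) for calibration quality. It also shows how a relative improvement is computed from a baseline and a proposed value. All scores and numbers are hypothetical toy values, not results from the paper.

```python
import numpy as np


def compute_eer(target_scores, nontarget_scores):
    """Equal error rate: the point where miss rate equals false-alarm rate."""
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    # Sweep every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([tgt, non]))
    p_miss = np.array([np.mean(tgt < t) for t in thresholds])
    p_fa = np.array([np.mean(non >= t) for t in thresholds])
    # Take the threshold where the two error rates are closest.
    i = np.argmin(np.abs(p_miss - p_fa))
    return float((p_miss[i] + p_fa[i]) / 2.0)


def compute_cllr(target_llrs, nontarget_llrs):
    """Cost of natural-log likelihood ratios, in bits.

    0 is perfect; a system that always outputs LLR = 0 gets exactly 1.
    """
    tgt = np.asarray(target_llrs, dtype=float)
    non = np.asarray(nontarget_llrs, dtype=float)
    # log(1 + exp(-s)) computed stably as logaddexp(0, -s), then converted to bits.
    c_tgt = np.mean(np.logaddexp(0.0, -tgt)) / np.log(2.0)
    c_non = np.mean(np.logaddexp(0.0, non)) / np.log(2.0)
    return float(0.5 * (c_tgt + c_non))


def relative_improvement(baseline, proposed):
    """Relative reduction of a lower-is-better metric such as EER or Cllr."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - proposed) / baseline


# Hypothetical toy scores (natural-log LLRs), not data from the paper.
tgt = np.array([2.0, 3.0, 4.0, 0.5])
non = np.array([-1.0, 0.0, 1.0, -2.0])
print(compute_eer(tgt, non), compute_eer(tgt + 5.0, non + 5.0))    # same EER
print(compute_cllr(tgt, non), compute_cllr(tgt + 5.0, non + 5.0))  # shifted: higher Cllr
# Hypothetical EER drop from 5.0% to 4.1%: 0.9 points absolute, 18% relative.
print(f"{relative_improvement(5.0, 4.1):.0%}")  # prints 18%
```

Shifting every score by the same constant leaves the EER unchanged, because discrimination depends only on how the scores are ordered, but in this example it raises Cllr, since the shifted scores are far from valid log-likelihood ratios. This is how a system can discriminate well and still be poorly calibrated.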

From Wikipedia:
Vocal effort is a quantity varied by speakers when adjusting to an increase or decrease in the communication distance.
The communication distance is the distance between the speaker and the listener.
Vocal effort is a subjective physiological quantity, and is mainly dependent on subglottal pressure, vocal fold tension and jaw opening.
Vocal effort is different from sound pressure.
To measure vocal effort, listeners are asked to rate the distance between speaker and addressee.

1. INTRODUCTION
Variability in the acoustic signal is a persistent challenge for speaker recognition systems operating under real-world conditions.
Such variability is caused by either intrinsic or extrinsic factors.
Intrinsic factors are associated with the speaker rather than the recording environment.
These factors include changes in vocal effort, speaking style [1], non-speech sounds [2], [3], [4], emotions, language [5], aging, etc. across recordings of the same speaker.
Extrinsic factors are associated with differences in the recording environment across recordings.
These factors include changes in background noise, microphone, room acoustics, distance from the microphone [6], transmission channel, codec [7], etc.
Intrinsic factors are also known as speaker-dependent factors, whereas extrinsic factors are called speaker-independent factors [8].
During recent decades, US government evaluations and programs (such as the NIST Speaker Recognition Evaluations (SRE), the IARPA BEST program, and the DARPA RATS program) have motivated particular research directions in the speaker recognition community.
Those research programs have primarily focused on the problem of extrinsic variability, including channel effects, transmission noise, and environmental noise.
Intrinsic variability, in contrast, has received comparatively little research attention.
Yet, intrinsic variability is a key factor for unconstrained applications, such as forensic speaker recognition.
This work focuses specifically on vocal effort variation, one class of intrinsic variability.
Vocal effort has been shown to impact the performance of speaker recognition systems [9].
In the past, a number of studies focused on different levels of vocal effort, such as whispers [10], shouts [11], and screams [4].
The impact of Lombard speech on the performance of speaker verification systems was considered in [12], [13].
The main contributions of this work are as follows.
First, rather than a classical GMM-UBM or i-vector system, we use a state-of-the-art speaker recognition system based on DNN speaker embeddings.
Second, rather than focusing on a single vocal effort level such as whispering or shouting, we develop mitigation approaches for a range of vocal effort levels from low to high.
Third, we use a relatively large number of speakers with sufficient audio data per speaker to get significant results.
Also, to the best of our knowledge, this study is the first to consider the calibration of speaker recognition systems across a range of vocal effort levels.
In this study, we first assess the impact of vocal effort on the discrimination and calibration performance of a speaker recognition system based on DNN speaker embeddings.
We then apply mixture PLDA (mix-PLDA) using meta information and the recently proposed trial-based calibration with condition PLDA similarity (TBC-CPLDA) to mitigate the impact of vocal effort.
We use the SRI-FRTIV corpus for all experiments.
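Mixture PLDA with meta information combines several PLDA models and lets side information about a trial decide how much each contributes. The sketch below shows one generic way to do this and is not necessarily the paper's formulation: it assumes each component is a two-covariance PLDA model (`mu`, `B`, `W`) trained for one condition (hypothetically, one per vocal effort level) and that metadata supplies a weight per component for the trial. The score is the log of the weighted same-speaker likelihoods minus the log of the weighted different-speaker likelihoods; `mix_plda_llr` and the toy parameters are illustrative.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate normal, evaluated via a Cholesky factor."""
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, x - mean)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (x.shape[0] * np.log(2.0 * np.pi) + log_det + z @ z)


def mix_plda_llr(x1, x2, components, weights):
    """LLR under a metadata-weighted mixture of two-covariance PLDA models.

    components: list of (mu, B, W) tuples, one per condition (assumed setup).
    weights: per-component probabilities for this trial; should sum to 1.
    """
    x = np.concatenate([x1, x2])
    log_same, log_diff = [], []
    for (mu, B, W), w in zip(components, weights):
        if w <= 0:
            continue  # component ruled out by the metadata
        T, Z = B + W, np.zeros_like(B)
        m = np.concatenate([mu, mu])
        log_same.append(np.log(w) + gaussian_logpdf(x, m, np.block([[T, B], [B, T]])))
        log_diff.append(np.log(w) + gaussian_logpdf(x, m, np.block([[T, Z], [Z, T]])))
    # log sum_k w_k p_k(x | same)  minus  log sum_k w_k p_k(x | different)
    return float(np.logaddexp.reduce(log_same) - np.logaddexp.reduce(log_diff))


# Hypothetical two-condition setup (e.g. "low" and "high" effort), 2-dim embeddings.
comp_low = (np.zeros(2), 4.0 * np.eye(2), np.eye(2))
comp_high = (np.ones(2), 2.0 * np.eye(2), 3.0 * np.eye(2))
x1, x2 = np.array([1.0, 2.0]), np.array([1.2, 1.8])
print(mix_plda_llr(x1, x2, [comp_low, comp_high], [0.5, 0.5]))
```

With one-hot weights, for example when a trial's vocal effort label is known, the mixture reduces to scoring with the single matching component. Softer weights interpolate, and the resulting LLR always lies between the smallest and largest LLRs of the components with nonzero weight.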

Paper: Analysis and Mitigation of Vocal Effort Variations in Speaker Recognition (ICASSP 2019)
