End-to-end speech synthesis: voice-based e-health applications

Healthcare has been one of the countless beneficiaries of the revolutionary advances brought by widespread computing. Fast, efficient data organization, storage, and access have greatly sped up the medical enterprise, yet many low-hanging fruits remain unpicked. Chief among them is the wider application of technologies that can process speech.

In this post, we’ll share how speech technology can improve healthcare in the following three ways. (1) Speech technology can be used to improve the efficiency of medical personnel. (2) Voice-based interactions for patients and hospital visitors can simplify access to information and speed up the registration process. Finally, (3) voice signal analysis can be used for earlier diagnosis and to help track changes in medical conditions over time.

Let’s explore these three ways in detail.

Helping physicians

Research has shown that medical personnel, unsurprisingly, spend a significant amount of their time talking with their patients but, surprisingly, can spend even more time documenting these physician-patient sessions (Arndt et al., 2017). Taking notes, creating and updating medical records, and so on take up a considerable amount of time. Physicians often make use of any resources available to speed up this process. For example, a physician can record the session with a patient to revisit its details later, or employ a human scribe who can either transcribe the session recordings or document the session directly as it unfolds. It is no secret, however, that needing additional personnel with non-trivial training makes these options something of a luxury and not always reliable.

The first and most obvious remedy for this time sink is a digital dictation device with a speech-to-text system (introduced in our previous post) that automatically processes the recordings and prepares a draft transcript of the session, which the physician or a trained scribe can then quickly post-edit. Speech-to-text can also be used to transcribe post-session notes dictated by physicians. Processing medical speech is not without its challenges, though. While multiple products on the market already serve this use case, it is useful to look at the challenges these systems face.
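
Both post-editing effort and recognition accuracy are commonly measured with the word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the system's draft into a reference transcript, divided by the number of words in the reference. A minimal sketch, assuming transcripts that are already normalised (lower-cased, no punctuation) and split on whitespace:

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words.

    Computed with the standard Levenshtein dynamic program over words.
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance between the
    # first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)


ref = "patient reports chest pain since monday"
hyp = "patient report chest pain since monday"
print(word_error_rate(ref, hyp))  # one substitution over six reference words
```

A WER of 0.1 means the corrections amount to roughly one word edit per ten reference words, which makes it a practical way to compare candidate systems on a sample of a clinic's own recordings.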

As with many other non-professional recordings, audio quality in healthcare is highly inconsistent. There is often background noise at varying levels, and speakers are frequently unclear, especially when they speak too fast or too far from the microphone. Good audio quality is usually not a primary concern for either the patient or the physician during a session. Thus, any speech recognition system trained only on clean audio will likely struggle with the wide range of acoustic conditions found in medical recordings. Fortunately, special speech-to-text training techniques and data preparation can help create noise-robust systems that provide much better recognition quality on such inconsistent inputs.
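
One widely used data-preparation technique of this kind is additive noise augmentation: clean training utterances are mixed with background noise at a chosen signal-to-noise ratio (SNR, in dB: 10·log10(P_speech / P_noise)), so the model sees many acoustic conditions during training. The sketch below is plain Python for self-containment and assumes mono signals given as lists of float samples at the same sampling rate; the synthetic tone and Gaussian noise are stand-ins for real speech and recorded ward noise.

```python
import math
import random
from typing import List, Sequence


def mean_power(x: Sequence[float]) -> float:
    """Average of the squared samples of a signal."""
    return sum(s * s for s in x) / len(x)


def mix_at_snr(clean: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Add `noise` to `clean`, scaled so that the mix has the requested SNR in dB.

    The noise is repeated (tiled) or truncated to match the clean signal's length.
    """
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")
    tiled = [noise[i % len(noise)] for i in range(len(clean))]
    p_clean, p_noise = mean_power(clean), mean_power(tiled)
    if p_noise == 0:
        raise ValueError("noise must not be silent")
    # SNR_dB = 10 * log10(p_clean / (scale**2 * p_noise)), solved for scale.
    scale = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, tiled)]


# Example: corrupt a 1-second, 16 kHz synthetic tone with Gaussian noise
# at a randomly drawn SNR between 0 and 20 dB.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
noisy = mix_at_snr(clean, noise, snr_db=rng.uniform(0.0, 20.0))
```

Drawing a different noise clip and SNR for each training example is what exposes the model to a spread of conditions rather than a single noise level.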

Medical language is also highly specialized, with a vast vocabulary, acronyms and other forms of expression not found in everyday language. Commercial speech-to-text systems are often trained to cover as many application domains as possible and are therefore called general-domain systems. Using a general-domain system in a narrow domain like medicine often results in a noticeable drop in recognition accuracy. Fortunately, we can use domain adaptation to adjust a general-domain speech-to-text system to a narrower domain or, given enough data, even train a domain-specific system. Such systems trade accuracy on the general domain for much better accuracy on the domain of interest. Given the vast size of the medical vocabulary, it can sometimes make sense to focus on a particular medical specialisation, such as radiology or dermatology.
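
On the language-model side, one classic domain-adaptation technique is linear interpolation of a general-domain model with a model estimated on in-domain text: P(w) = λ·P_in(w) + (1 - λ)·P_gen(w). The toy sketch below uses add-one-smoothed unigram models and two one-sentence "corpora" purely for illustration; real systems interpolate n-gram or neural models trained on large text collections and tune λ on held-out in-domain data. Acoustic-model adaptation is not shown.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def unigram_lm(tokens: Iterable[str], vocab: Set[str], alpha: float = 1.0) -> Dict[str, float]:
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(t for t in tokens if t in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(p_in: Dict[str, float], p_gen: Dict[str, float], lam: float) -> Dict[str, float]:
    """Linear interpolation: P(w) = lam * P_in(w) + (1 - lam) * P_gen(w)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return {w: lam * p_in[w] + (1.0 - lam) * p_gen[w] for w in p_gen}


# Toy corpora; both models share one vocabulary so their keys match.
general = "the patient went to the shop and the weather was nice".split()
medical = "patient presents with contact dermatitis on the left forearm".split()
vocab = set(general) | set(medical)

p_gen = unigram_lm(general, vocab)
p_med = unigram_lm(medical, vocab)
p_adapted = interpolate(p_med, p_gen, lam=0.5)
print(p_gen["dermatitis"] < p_adapted["dermatitis"])  # True: the medical term gained probability
```

The interpolated model still assigns reasonable probability to general words, which is the trade-off described above: some general-domain accuracy is given up in exchange for better coverage of the target domain.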

While speech-to-text is a significant boost on its own, we can make further improvements by building on top of it. Speech-to-text can serve as a stepping stone for smarter downstream applications that process the recognised text in useful ways. For example, it would be useful to extract structured information from the raw transcript and use it to prepare draft medical records, which often follow a regular structure. Spoken language understanding tools (also introduced in our previous post) can extract information from raw transcripts, ranging from recognising highly regular patterns such as medicine dosages and standardised measurements to classifying and extracting more nebulous entities such as medical history, symptom descriptions or the patient’s chief complaint. The extracted structured information can then be used to prepare the actual record drafts: templates in which the physician fills in the missing pieces and checks the rest.
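
The "highly regular patterns" end of this spectrum can often be handled with hand-written rules. A minimal rule-based sketch for medication dosages; the pattern, the unit list and the `Dosage` record are illustrative assumptions, not a clinical-grade extractor, and it assumes numbers in the transcript are already written as digits:

```python
import re
from typing import List, NamedTuple


class Dosage(NamedTuple):
    drug: str
    amount: float
    unit: str


# Illustrative pattern for "<drug> <number> <unit>", e.g. "amoxicillin 500 mg".
# Spoken forms such as "five hundred milligrams" would need text normalisation first.
DOSAGE_RE = re.compile(
    r"\b(?P<drug>[a-z][a-z-]+)\s+(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>mg|mcg|g|ml|units?)\b",
    re.IGNORECASE,
)


def extract_dosages(transcript: str) -> List[Dosage]:
    """Return every drug/amount/unit triple that matches the pattern."""
    return [
        Dosage(m["drug"].lower(), float(m["amount"]), m["unit"].lower())
        for m in DOSAGE_RE.finditer(transcript)
    ]


print(extract_dosages("Start amoxicillin 500 mg three times a day, continue ibuprofen 200mg as needed."))
# [Dosage(drug='amoxicillin', amount=500.0, unit='mg'), Dosage(drug='ibuprofen', amount=200.0, unit='mg')]
```

A rule this naive would also accept "take 500 mg" with "take" as the drug name, which is why practical systems combine such patterns with a drug lexicon or a trained entity recogniser for the less regular cases.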

Assistance for patients and visitors

It’s not only doctors and other medical personnel that speech technologies can help. A clever combination of speech recognition, speech synthesis and chatbot technologies can produce a useful virtual assistant for hospitals, capable of helping people with common questions and problems. In turn, this can optimise the workload of hospital personnel and reduce the time people spend waiting in queues for information and registration. Such a hospital assistant can, for example, give directions or guide visitors and patients through the relevant hospital procedures to get them where they need or want to be.
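
In such an assistant, speech recognition turns the visitor's question into text, a dialogue component decides what the person wants, and speech synthesis reads the answer back. The middle step can be sketched, very roughly, as keyword-based intent routing; the intents, keywords and fallback name below are hypothetical, and recognition and synthesis are not shown:

```python
import re
from typing import Dict, List

# Hypothetical intents and keywords for a hospital help desk; a deployed
# assistant would more likely use a trained intent classifier.
INTENT_KEYWORDS: Dict[str, List[str]] = {
    "directions": ["where", "find", "way to", "ward", "department"],
    "registration": ["register", "appointment", "check in", "booking"],
    "visiting_hours": ["visit", "visiting", "hours"],
}


def classify_intent(utterance: str, fallback: str = "handover_to_staff") -> str:
    """Return the intent with the most whole-word keyword hits in the recognised text."""
    text = utterance.lower()
    scores = {
        intent: sum(len(re.findall(r"\b" + re.escape(kw) + r"\b", text)) for kw in keywords)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Nothing matched: leave the request to a human rather than guess.
    return best if scores[best] > 0 else fallback


print(classify_intent("Where is the cardiology department?"))   # directions
print(classify_intent("I need to check in for my appointment"))  # registration
print(classify_intent("Can I get a sandwich here?"))              # handover_to_staff
```

The explicit fallback keeps unrecognised requests with hospital staff instead of producing a confident but wrong answer.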

Beyond procedural help in hospital corridors, speech technologies can also improve the quality of life of patients with disabilities. Together, speech recognition and speech synthesis can provide a practical substitute for the largely visual interface of a modern computer, opening up many of the benefits of modern computing (Hawley, 2002). For example, a person with a visual impairment can get spoken feedback from a computer that relays information traditionally presented on the screen. Similarly, a person with limited mobility can issue voice commands to modern virtual assistants, which, thanks to network-connected household devices, can automatically perform a range of useful tasks.

Most of these use cases require integrating speech technology, which provides the voice interface, with the underlying systems that do the actual useful work and that visitors and patients often cannot yet access directly. While some such integrations exist today, the amount of research and commercial interest in this space suggests that we will see such applications increasingly often in the future.

Diagnosis and analysis

The frontier of speech technology in health goes beyond assistance. A highly active and promising area of speech technology research is the diagnosis of various physical and mental disorders that affect a patient’s speech and other vocal expressions. A diverse set of technologies is used for diagnostic purposes, and many of them consider not only the linguistic content of speech, i.e., what is being said, but pay even more attention to its paralinguistic features, i.e., how it is being said. Both the content and the manner of speech can provide important clues for a more accurate diagnosis.
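
Paralinguistic analysis typically starts from low-level acoustic measurements computed over short frames, such as loudness and pitch. A minimal sketch of two such measurements, RMS energy and an autocorrelation-based estimate of the fundamental frequency (F0); the 75 to 400 Hz search range is an assumed typical speaking range, and real systems add voicing detection and more robust pitch trackers:

```python
import math
from typing import Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame, a simple loudness measure."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def estimate_f0(frame: Sequence[float], sample_rate: int,
                f0_min: float = 75.0, f0_max: float = 400.0) -> float:
    """Estimate the fundamental frequency (pitch) of a voiced frame, in Hz.

    Autocorrelation method: pick the lag, among the lags allowed by
    [f0_min, f0_max], at which the frame best matches a shifted copy of itself.
    No voiced/unvoiced decision is made.
    """
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)
    if lag_max < lag_min:
        raise ValueError("frame too short for the requested pitch range")
    best_lag, best_corr = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag


# Example: a 40 ms frame of a synthetic 200 Hz "voice" sampled at 16 kHz.
sr = 16000
frame = [math.sin(2 * math.pi * 200 * t / sr) for t in range(640)]
print(round(estimate_f0(frame, sr), 1), round(rms(frame), 3))  # 200.0 0.707
```

Statistics of such frame-level values over a whole recording (mean pitch, pitch variability, pauses, energy dynamics) are the kind of "how it is being said" features diagnostic models build on.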

Quite obviously, speech technologies can be used to diagnose, classify and track speech disorders. They can help identify acute speech disorders and establish an initial diagnosis without expensive and potentially painful medical procedures. Likewise, tracking how pronunciation changes over time provides a non-perceptual record of whether a disorder is progressing or receding. Providing or improving a diagnosis from an audio recording alone does not require expensive medical equipment. In addition, it helps avoid subjective perceptual judgments by the clinician and, most importantly for the patient, it is painless and easy (Wu et al., 2018).

Similar methods can be applied to monitor child development and detect problems early, since an early diagnosis of a speech disorder enables effective early intervention that reduces potential problems later. One simple implementation would be a convenient mobile application that parents can use to record their child speaking a set of carefully chosen benchmark words; the recordings can then be analysed and compared with a set of reference recordings from known healthy speakers (Kothalkar et al., 2018).
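
The comparison step can be done in many ways, and the method used in the cited work may differ from the one shown here. One classic, simple option for comparing a recording of a benchmark word with reference recordings is dynamic time warping (DTW), which aligns two feature sequences spoken at different rates. A sketch, assuming each recording has already been reduced to a 1-D sequence of per-frame feature values (for example pitch or energy); the reference contours below are made up, and any decision threshold would have to be calibrated on labelled data:

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance between two 1-D feature sequences.

    Each frame may align with one or more frames of the other sequence, so a
    slower or faster rendition of the same word can still match closely.
    Local cost is the absolute difference between feature values.
    """
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    n, m = len(a), len(b)
    inf = float("inf")
    # d[i][j]: cost of the best alignment of a[:i] with b[:j].
    d: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def distance_to_references(recording: Sequence[float],
                           references: Sequence[Sequence[float]]) -> float:
    """Smallest DTW distance from a new recording to any healthy reference."""
    return min(dtw_distance(recording, ref) for ref in references)


# Hypothetical per-frame feature contours for one benchmark word.
references = [[1.0, 2.0, 3.0, 2.0], [1.0, 1.0, 2.0, 3.0, 3.0, 2.0]]
print(distance_to_references([1.0, 2.0, 2.0, 3.0, 2.0], references))  # 0.0: same shape, different timing
```

Comparing against several healthy references rather than one allows for normal variation between children.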

Speech signal analysis could also prove useful for the early diagnosis of neurodegenerative conditions, some of which are tricky to diagnose early because the symptoms are subtle and could be explained by several other conditions. Prominent examples where speech signal analysis has shown promise for early detection include amyotrophic lateral sclerosis (An et al., 2018), Parkinson’s disease (Tsanas et al., 2012) and brain injury (Falcone et al., 2013). Many of the markers used today rely on self-assessment or on the performance of tasks judged perceptually by a human observer, but conditions affecting the brain usually become noticeable to a human only after significant brain damage has already occurred. Statistical analysis of speech recordings, however, has been shown to be more sensitive to early symptoms than the human ear (An et al., 2018).
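
Classical dysphonia measures are among the features commonly used in this line of work, for example jitter (cycle-to-cycle variation of the pitch period) and shimmer (cycle-to-cycle variation of the amplitude). A minimal sketch of their "local" variants, assuming the per-cycle periods and peak amplitudes have already been extracted from a sustained vowel by a pitch-marking tool (not shown); the input values are made up and no clinical thresholds are implied:

```python
from typing import Sequence


def _relative_mean_abs_diff(values: Sequence[float]) -> float:
    """Mean absolute difference between consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two consecutive cycles")
    diffs = [abs(values[i] - values[i - 1]) for i in range(1, len(values))]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def local_jitter(periods_ms: Sequence[float]) -> float:
    """Cycle-to-cycle variation of the pitch period (0.01 corresponds to 1 %)."""
    return _relative_mean_abs_diff(periods_ms)


def local_shimmer(peak_amplitudes: Sequence[float]) -> float:
    """Cycle-to-cycle variation of the per-cycle peak amplitude."""
    return _relative_mean_abs_diff(peak_amplitudes)


# Hypothetical period sequences: perfectly steady vs. strongly irregular phonation.
print(local_jitter([5.0, 5.0, 5.0, 5.0]))  # 0.0
print(local_jitter([4.0, 6.0, 4.0, 6.0]))  # 0.4
```

Single measures like these are rarely decisive on their own; studies typically feed many of them into a statistical classifier.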

It has been shown that even subtle mental disorders can be analysed using both the content and the manner of a patient’s speech (Cummins et al., 2015). For example, research shows that depression notably affects the patient’s vocabulary, and this change can be measured statistically to bolster the clinician’s confidence. At the same time, subtle pronunciation differences can betray the speaker’s mood and state of mind not only to a perceptive human listener but also to statistical machine analysis.
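
On the linguistic side, vocabulary changes are usually quantified with simple statistics over transcripts. A sketch computing two illustrative features, lexical diversity (type-token ratio) and the rate of first-person singular pronouns, assuming an English transcript; these particular features are examples chosen here, not ones taken from the cited review, and on their own they are not diagnostic:

```python
import re
from typing import Dict

# Illustrative word list; which lexical cues are informative is an empirical
# question for each clinical study.
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}


def lexical_features(transcript: str) -> Dict[str, float]:
    """Simple vocabulary statistics over an English transcript."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    if not tokens:
        raise ValueError("transcript contains no words")
    return {
        # Share of distinct words: lower values mean a less varied vocabulary.
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # Share of tokens that are first-person singular pronouns.
        "first_person_rate": sum(t in FIRST_PERSON_SINGULAR for t in tokens) / len(tokens),
    }


print(lexical_features("I feel tired and I cannot sleep"))  # ratios 6/7 and 2/7
```

The type-token ratio falls as texts get longer, so comparisons across sessions or patients should use transcripts of similar length.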

While many of these methods promise early, easy and, most importantly, objective diagnosis for a whole range of medical conditions, they are currently limited mostly to research laboratories and curated medical datasets. In other words, they are still mostly the future, but early signs give us cautious hope that these technologies might soon deliver substantial real benefits for doctors and patients.

Figure 2: Research into automatic paralinguistic and linguistic analysis of patients’ speech promises improvements in the diagnosis of a range of medical conditions.

In this blog post, we have described just a few promising avenues where speech technology can help both the medical practitioner and the patient. While many of the technologies mentioned above have been tested only in research and experimental settings and can therefore be considered somewhat premature, their growing number also points to the potential of these advances to improve healthcare on multiple levels.

It should be noted that the medical industry is one of high stakes and moral responsibility. Medical data is often highly sensitive, and particular care should be taken to protect patient privacy. The technologists and researchers bringing these improvements to life should also take care not to oversell them and should clearly indicate where the new technology still falls short. These improvements should help and facilitate the work of medical experts, but cannot substitute for it.

Translated from: https://medium.com/language-tech/voice-based-applications-for-e-health-befdadb1fd31
