Speech recognition using Azure Speech Service

01/14/2020


Azure Speech Service is a cloud-based API that offers the following functionality:

Speech-to-text transcribes audio files or streams to text.

Text-to-speech converts input text into human-like synthesized speech.

Speech translation enables real-time, multi-language translation for both speech-to-text and speech-to-speech.

Voice assistants can create human-like conversation interfaces for applications.

This article explains how speech-to-text is implemented in the sample Xamarin.Forms application using the Azure Speech Service. The following screenshots show the sample application on iOS and Android:

Create an Azure Speech Service resource

Azure Speech Service is part of Azure Cognitive Services, which provides cloud-based APIs for tasks such as image recognition, speech recognition and translation, and Bing search.

The sample project requires an Azure Cognitive Services resource to be created in your Azure portal. A Cognitive Services resource can be created for a single service, such as Speech Service, or as a multi-service resource. The steps to create a Speech Service resource are as follows:

1. Create a multi-service or single-service resource.

2. Obtain the API key and region information for your resource.

3. Update the sample Constants.cs file.

For a step-by-step guide to creating a resource, see Create a Cognitive Services resource.

Note

If you don't have an Azure subscription, create a free account before you begin. Once you have an account, a single-service resource can be created at the free tier to try out the service.

Configure your app with the Speech Service

After creating a Cognitive Services resource, update the Constants.cs file with the region and API key from your Azure resource:

public static class Constants
{
    public static string CognitiveServicesApiKey = "YOUR_KEY_GOES_HERE";
    public static string CognitiveServicesRegion = "westus";
}
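A common failure when first running the sample is leaving the placeholder key in place. The following is a minimal sketch of a startup guard, assuming a hypothetical helper named ConstantsCheck that is not part of the sample; it only checks that the placeholder was replaced and that a region is set, not that the key is valid.

```csharp
using System;

public static class Constants
{
    public static string CognitiveServicesApiKey = "YOUR_KEY_GOES_HERE";
    public static string CognitiveServicesRegion = "westus";
}

// Hypothetical helper (assumption, not in the sample project).
public static class ConstantsCheck
{
    const string Placeholder = "YOUR_KEY_GOES_HERE";

    // True when the placeholder key has been replaced and a region is set.
    public static bool IsConfigured(string apiKey, string region) =>
        !string.IsNullOrWhiteSpace(apiKey)
        && apiKey != Placeholder
        && !string.IsNullOrWhiteSpace(region);
}

public static class Program
{
    public static void Main()
    {
        bool ready = ConstantsCheck.IsConfigured(
            Constants.CognitiveServicesApiKey,
            Constants.CognitiveServicesRegion);

        if (!ready)
        {
            Console.WriteLine("Update Constants.cs with your API key and region.");
        }
    }
}
```

A check like this could run before the recognizer is created, so a missing key produces a clear message instead of a service error.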

Install NuGet Speech Service package

The sample application uses the Microsoft.CognitiveServices.Speech NuGet package to connect to the Azure Speech Service. Install this NuGet package in the shared project and each platform project.

Create an IMicrophoneService interface

Each platform requires permission to access the microphone. The sample project provides an IMicrophoneService interface in the shared project, and uses the Xamarin.Forms DependencyService to obtain platform implementations of the interface.

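public interface IMicrophoneService
{
    // Completes with true when microphone access has been granted.
    Task<bool> GetPermissionAsync();

    // Called by the platform once the user has answered the permission request.
    void OnRequestPermissionResult(bool isGranted);
}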

Create the page layout

The sample project defines a basic page layout in the MainPage.xaml file. The key layout elements are a Button that starts the transcription process, a Label to contain the transcribed text, and an ActivityIndicator to show when transcription is in progress:

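<ContentPage ...>
    ...
    <Label x:Name="transcribedText"
           ... />
    <ActivityIndicator x:Name="transcribingIndicator"
                       ...
                       IsRunning="False" />
    <Button x:Name="transcribeButton"
            ...
            Clicked="TranscribeClicked"/>
    ...
</ContentPage>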

Implement the Speech Service

The MainPage.xaml.cs code-behind file contains all of the logic to send audio to the Azure Speech Service and receive transcribed text from it.

The MainPage constructor gets an instance of the IMicrophoneService interface from the DependencyService:

public partial class MainPage : ContentPage
{
    SpeechRecognizer recognizer;
    IMicrophoneService micService;
    bool isTranscribing = false;

    public MainPage()
    {
        InitializeComponent();

        // Resolve the platform-specific microphone service.
        micService = DependencyService.Resolve<IMicrophoneService>();
    }

    // ...
}

The TranscribeClicked method is called when the transcribeButton instance is tapped:

async void TranscribeClicked(object sender, EventArgs e)
{
    bool isMicEnabled = await micService.GetPermissionAsync();

    // EARLY OUT: make sure mic is accessible
    if (!isMicEnabled)
    {
        UpdateTranscription("Please grant access to the microphone!");
        return;
    }

    // initialize speech recognizer
    if (recognizer == null)
    {
        var config = SpeechConfig.FromSubscription(Constants.CognitiveServicesApiKey, Constants.CognitiveServicesRegion);
        recognizer = new SpeechRecognizer(config);
        recognizer.Recognized += (obj, args) =>
        {
            UpdateTranscription(args.Result.Text);
        };
    }

    // if already transcribing, stop speech recognizer
    if (isTranscribing)
    {
        try
        {
            await recognizer.StopContinuousRecognitionAsync();
        }
        catch (Exception ex)
        {
            UpdateTranscription(ex.Message);
        }
        isTranscribing = false;
    }

    // if not transcribing, start speech recognizer
    else
    {
        Device.BeginInvokeOnMainThread(() =>
        {
            InsertDateTimeRecord();
        });
        try
        {
            await recognizer.StartContinuousRecognitionAsync();
        }
        catch (Exception ex)
        {
            UpdateTranscription(ex.Message);
        }
        isTranscribing = true;
    }
    UpdateDisplayState();
}

The TranscribeClicked method does the following:

Checks if the application has access to the microphone and exits early if it does not.

Creates an instance of the SpeechRecognizer class if it doesn't already exist.

Stops continuous transcription if it is in progress.

Inserts a timestamp and starts continuous transcription if it is not in progress.

Notifies the application to update its appearance based on the new application state.
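The start/stop toggle in the steps above can be sketched in isolation. This is a minimal sketch, not the sample's code: IContinuousRecognizer, TranscriptionToggle, and FakeRecognizer are hypothetical names that stand in for the SpeechRecognizer start and stop calls so the logic runs without the Speech SDK. Unlike TranscribeClicked, which flips isTranscribing even when a call throws, this variant changes state only after the call succeeds.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical abstraction over the two recognizer calls used by the sample.
public interface IContinuousRecognizer
{
    Task StartAsync();
    Task StopAsync();
}

public class TranscriptionToggle
{
    readonly IContinuousRecognizer recognizer;

    public bool IsTranscribing { get; private set; }

    public TranscriptionToggle(IContinuousRecognizer recognizer)
    {
        this.recognizer = recognizer;
    }

    // Returns null on success, or the error message if the call failed.
    public async Task<string> ToggleAsync()
    {
        try
        {
            if (IsTranscribing)
            {
                await recognizer.StopAsync();
                IsTranscribing = false;
            }
            else
            {
                await recognizer.StartAsync();
                IsTranscribing = true; // reached only when the start call succeeded
            }
            return null;
        }
        catch (Exception ex)
        {
            return ex.Message; // state is left unchanged on failure
        }
    }
}

// Fake recognizer whose start call can be made to fail, for demonstration only.
public class FakeRecognizer : IContinuousRecognizer
{
    public bool FailOnStart { get; set; }

    public Task StartAsync() =>
        FailOnStart
            ? Task.FromException(new InvalidOperationException("no network"))
            : Task.CompletedTask;

    public Task StopAsync() => Task.CompletedTask;
}

public static class Program
{
    public static async Task Main()
    {
        var fake = new FakeRecognizer { FailOnStart = true };
        var toggle = new TranscriptionToggle(fake);

        Console.WriteLine(await toggle.ToggleAsync() ?? "started"); // no network
        Console.WriteLine(toggle.IsTranscribing);                    // False

        fake.FailOnStart = false;
        Console.WriteLine(await toggle.ToggleAsync() ?? "started"); // started
        Console.WriteLine(toggle.IsTranscribing);                    // True
    }
}
```

Keeping the state change inside the try block means the button label and ActivityIndicator would not claim that transcription is running after a failed start. Whether that trade-off suits the sample is a design choice.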

The remaining MainPage class methods are helpers for displaying the application state:

void UpdateTranscription(string newText)
{
    Device.BeginInvokeOnMainThread(() =>
    {
        if (!string.IsNullOrWhiteSpace(newText))
        {
            transcribedText.Text += $"{newText}\n";
        }
    });
}

void InsertDateTimeRecord()
{
    var msg = $"=================\n{DateTime.Now.ToString()}\n=================";
    UpdateTranscription(msg);
}

void UpdateDisplayState()
{
    Device.BeginInvokeOnMainThread(() =>
    {
        if (isTranscribing)
        {
            transcribeButton.Text = "Stop";
            transcribeButton.BackgroundColor = Color.Red;
            transcribingIndicator.IsRunning = true;
        }
        else
        {
            transcribeButton.Text = "Transcribe";
            transcribeButton.BackgroundColor = Color.Green;
            transcribingIndicator.IsRunning = false;
        }
    });
}

The UpdateTranscription method writes the provided newText string to the Label element named transcribedText. It forces this update to happen on the UI thread so it can be called from any context without causing exceptions. The InsertDateTimeRecord method writes the current date and time to the transcribedText instance to mark the start of a new transcription. Finally, the UpdateDisplayState method updates the Button and ActivityIndicator elements to reflect whether or not transcription is in progress.

Create platform microphone services

The application must have microphone access to collect speech data. The IMicrophoneService interface must be implemented and registered with the DependencyService on each platform for the application to function.

Android

The sample project defines an IMicrophoneService implementation for Android called AndroidMicrophoneService:

[assembly: Dependency(typeof(AndroidMicrophoneService))]
namespace CognitiveSpeechService.Droid.Services
{
    public class AndroidMicrophoneService : IMicrophoneService
    {
        public const int RecordAudioPermissionCode = 1;
        private TaskCompletionSource<bool> tcsPermissions;
        string[] permissions = new string[] { Manifest.Permission.RecordAudio };

        public Task<bool> GetPermissionAsync()
        {
            tcsPermissions = new TaskCompletionSource<bool>();

            // Runtime permissions are only required from API level 23 onwards.
            if ((int)Build.VERSION.SdkInt < 23)
            {
                tcsPermissions.TrySetResult(true);
            }
            else
            {
                var currentActivity = MainActivity.Instance;
                if (ActivityCompat.CheckSelfPermission(currentActivity, Manifest.Permission.RecordAudio) != (int)Permission.Granted)
                {
                    RequestMicPermissions();
                }
                else
                {
                    tcsPermissions.TrySetResult(true);
                }
            }

            return tcsPermissions.Task;
        }

        public void OnRequestPermissionResult(bool isGranted)
        {
            tcsPermissions.TrySetResult(isGranted);
        }

        void RequestMicPermissions()
        {
            if (ActivityCompat.ShouldShowRequestPermissionRationale(MainActivity.Instance, Manifest.Permission.RecordAudio))
            {
                Snackbar.Make(MainActivity.Instance.FindViewById(Android.Resource.Id.Content),
                        "Microphone permissions are required for speech transcription!",
                        Snackbar.LengthIndefinite)
                    .SetAction("Ok", v =>
                    {
                        ((Activity)MainActivity.Instance).RequestPermissions(permissions, RecordAudioPermissionCode);
                    })
                    .Show();
            }
            else
            {
                ActivityCompat.RequestPermissions((Activity)MainActivity.Instance, permissions, RecordAudioPermissionCode);
            }
        }
    }
}

The AndroidMicrophoneService has the following features:

The Dependency attribute registers the class with the DependencyService.

The GetPermissionAsync method checks if permissions are required based on the Android SDK version, and calls RequestMicPermissions if permission has not already been granted.

The RequestMicPermissions method uses the Snackbar class to request permissions from the user if a rationale is required; otherwise it directly requests audio recording permissions.

The OnRequestPermissionResult method is called with a bool result once the user has responded to the permissions request.
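The features above rely on a TaskCompletionSource to turn a callback-based permission request into a Task that can be awaited. A minimal sketch of that bridge, with the Android calls removed so it runs on plain .NET, might look like the following. CallbackPermissionBridge is a hypothetical name, and the null-conditional guard in OnRequestPermissionResult is an addition that the sample does not have.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for a platform permission service; not part of the sample.
public class CallbackPermissionBridge
{
    TaskCompletionSource<bool> tcsPermissions;

    // Starts a request and returns a task that completes when the callback fires.
    public Task<bool> GetPermissionAsync()
    {
        tcsPermissions = new TaskCompletionSource<bool>();
        return tcsPermissions.Task;
    }

    // Called by the platform callback with the user's decision.
    public void OnRequestPermissionResult(bool isGranted)
    {
        // Guards against a callback that arrives before any request was made.
        tcsPermissions?.TrySetResult(isGranted);
    }
}

public static class Program
{
    public static async Task Main()
    {
        var bridge = new CallbackPermissionBridge();

        Task<bool> pending = bridge.GetPermissionAsync();
        Console.WriteLine(pending.IsCompleted); // False

        // Simulates the platform reporting that the user granted access.
        bridge.OnRequestPermissionResult(true);
        Console.WriteLine(await pending);       // True
    }
}
```

TrySetResult is used rather than SetResult so that a second callback for the same request is ignored instead of throwing.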

The MainActivity class is customized to update the AndroidMicrophoneService instance when permissions requests are complete:

public class MainActivity : global::Xamarin.Forms.Platform.Android.FormsAppCompatActivity
{
    IMicrophoneService micService;
    internal static MainActivity Instance { get; private set; }

    protected override void OnCreate(Bundle savedInstanceState)
    {
        Instance = this;
        // ...
        micService = DependencyService.Resolve<IMicrophoneService>();
    }

    public override void OnRequestPermissionsResult(int requestCode, string[] permissions, [GeneratedEnum] Android.Content.PM.Permission[] grantResults)
    {
        // ...
        switch (requestCode)
        {
            case AndroidMicrophoneService.RecordAudioPermissionCode:
                if (grantResults[0] == Permission.Granted)
                {
                    micService.OnRequestPermissionResult(true);
                }
                else
                {
                    micService.OnRequestPermissionResult(false);
                }
                break;
        }
    }
}

The MainActivity class defines a static reference called Instance, which is required by the AndroidMicrophoneService object when requesting permissions. It overrides the OnRequestPermissionsResult method to update the AndroidMicrophoneService object when the permissions request is approved or denied by the user.

Finally, the Android application must include the permission to record audio in the AndroidManifest.xml file:

<manifest ...>
    ...
    <uses-permission android:name="android.permission.RECORD_AUDIO" />
</manifest>

iOS

The sample project defines an IMicrophoneService implementation for iOS called iOSMicrophoneService:

[assembly: Dependency(typeof(iOSMicrophoneService))]
namespace CognitiveSpeechService.iOS.Services
{
    public class iOSMicrophoneService : IMicrophoneService
    {
        TaskCompletionSource<bool> tcsPermissions;

        public Task<bool> GetPermissionAsync()
        {
            tcsPermissions = new TaskCompletionSource<bool>();
            RequestMicPermission();
            return tcsPermissions.Task;
        }

        public void OnRequestPermissionResult(bool isGranted)
        {
            tcsPermissions.TrySetResult(isGranted);
        }

        void RequestMicPermission()
        {
            var session = AVAudioSession.SharedInstance();
            session.RequestRecordPermission((granted) =>
            {
                tcsPermissions.TrySetResult(granted);
            });
        }
    }
}

The iOSMicrophoneService has the following features:

The Dependency attribute registers the class with the DependencyService.

The GetPermissionAsync method calls RequestMicPermission to request permissions from the device user.

The RequestMicPermission method uses the shared AVAudioSession instance to request recording permissions.

The OnRequestPermissionResult method updates the TaskCompletionSource instance with the provided bool value.

Finally, the iOS app Info.plist must include a message that tells the user why the app is requesting access to the microphone. Edit the Info.plist file to include the following tags within the <dict> element:

...
<key>NSMicrophoneUsageDescription</key>
<string>Voice transcription requires microphone access</string>

UWP

The sample project defines an IMicrophoneService implementation for UWP called UWPMicrophoneService:

[assembly: Dependency(typeof(UWPMicrophoneService))]
namespace CognitiveSpeechService.UWP.Services
{
    public class UWPMicrophoneService : IMicrophoneService
    {
        public async Task<bool> GetPermissionAsync()
        {
            bool isMicAvailable = true;
            try
            {
                // Initializing audio capture fails if microphone access is denied.
                var mediaCapture = new MediaCapture();
                var settings = new MediaCaptureInitializationSettings();
                settings.StreamingCaptureMode = StreamingCaptureMode.Audio;
                await mediaCapture.InitializeAsync(settings);
            }
            catch (Exception)
            {
                isMicAvailable = false;
            }

            if (!isMicAvailable)
            {
                // Open the microphone privacy settings so the user can enable access.
                await Windows.System.Launcher.LaunchUriAsync(new Uri("ms-settings:privacy-microphone"));
            }

            return isMicAvailable;
        }

        public void OnRequestPermissionResult(bool isGranted)
        {
            // intentionally does nothing
        }
    }
}

The UWPMicrophoneService has the following features:

The Dependency attribute registers the class with the DependencyService.

The GetPermissionAsync method attempts to initialize a MediaCapture instance. If that fails, it opens the microphone privacy settings so the user can enable the microphone.

The OnRequestPermissionResult method exists to satisfy the interface but is not required for the UWP implementation.

Finally, the UWP Package.appxmanifest must specify that the application uses the microphone. Double-click the Package.appxmanifest file and select the Microphone option on the Capabilities tab in Visual Studio 2019:

Test the application

Run the app and click the Transcribe button. The app should request microphone access and begin the transcription process. The ActivityIndicator will animate, showing that transcription is active. As you speak, the app will stream audio data to the Azure Speech Service resource, which will respond with transcribed text. The transcribed text will appear in the Label element as it is received.

Note

Android emulators fail to load and initialize the Speech Service libraries. Testing on a physical device is recommended for the Android platform.
