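CSDN recently launched a campaign titled "Try Microsoft Azure AI Cognitive Services for free, with gifts to give away" (《0元试用微软 Azure人工智能认知服务,精美礼品大放送》). The campaign is still running. I signed up right away, but only found time today to actually try the services.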

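The campaign asks participants to write a blog post about their trial of at least three of these services: speech-to-text, text-to-speech, speech translation, text analytics, text translation, and language understanding.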

After trying speech-to-text, text-to-speech, and speech translation, I decided to build a real-time speech translator, and it worked well in practice.

Let's walk through the steps. First, go to https://portal.azure.cn/ and sign in.

Getting the key

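Type 认知服务 (Cognitive Services) into the search box and press Enter.

Then create a Speech service resource.

Enter a name, choose a location, select the free pricing tier, then create a new resource group and select it.

Click Create. While the resource is being created, the portal shows that deployment is in progress.

When deployment finishes, click Go to resource.

Then click Keys and Endpoint to view the keys and the location/region.

There are two keys, and either one works. Note down the location/region as well, because the program later needs both the key and the region to call the service.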
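
Rather than pasting the key into source code, one option is to read it from environment variables. The following is a minimal sketch, not part of the official samples; the variable names SPEECH_KEY and SPEECH_REGION and the helper load_speech_credentials are assumptions made for this example, not names required by Azure.

```python
import os


def load_speech_credentials(env=None):
    """Return (key, region) read from environment variables.

    SPEECH_KEY and SPEECH_REGION are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get("SPEECH_KEY", "").strip()
    region = env.get("SPEECH_REGION", "").strip()
    if not key or not region:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError("Set SPEECH_KEY and SPEECH_REGION before running")
    return key, region
```

With those variables set in the shell, the later snippets could start with `speech_key, service_region = load_speech_credentials()` instead of a hard-coded pair.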

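A first look at Azure Cognitive Services

Azure Cognitive Services documentation: https://docs.azure.cn/zh-cn/cognitive-services/

Following the documentation, we first install the Azure Speech SDK for Python: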

pip install azure-cognitiveservices-speech

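Let's start with speech-to-text.

Testing speech-to-text

Documentation: https://docs.azure.cn/zh-cn/cognitive-services/speech-service/get-started-speech-to-text?tabs=windowsinstall&pivots=programming-language-python

After copying the official sample code, I made a few small changes so that it recognizes speech from the microphone: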

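import azure.cognitiveservices.speech as speechsdk

# Replace with your own key and region from the portal.
speech_key, service_region = "59392xxxxxxxxxx559de", "chinaeast2"
speech_config = speechsdk.SpeechConfig(
    subscription=speech_key, region=service_region, speech_recognition_language="zh-cn")
# With no audio_config given, the recognizer listens on the default microphone.
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

print("说:", end="")
# Block until one utterance has been recognized, then print its text.
result = speech_recognizer.recognize_once()
print(result.text)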

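speech_recognition_language sets the language to recognize; here I set it to Chinese (zh-cn).

After running the script I spoke one sentence into the microphone, and the program recognized it accurately (说 is the prompt, meaning "Say"; the sentence means "Microsoft's AI services are very easy to use"):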

说:微软人工智能服务非常好用。

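Testing text-to-speech

Documentation: https://docs.azure.cn/zh-cn/cognitive-services/speech-service/get-started-text-to-speech?tabs=script%2Cwindowsinstall&pivots=programming-language-python

The documentation also shows how to save the synthesized audio to a file, but here I only demonstrate playing it directly through the speaker: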

import azure.cognitiveservices.speech as speechsdk
from azure.cognitiveservices.speech.audio import AudioOutputConfig

# Reuses speech_config from the speech-to-text snippet above.
speech_config.speech_synthesis_language = "zh-cn"
# Play the synthesized audio through the default speaker.
audio_config = AudioOutputConfig(use_default_speaker=True)
speech_synthesizer = speechsdk.SpeechSynthesizer(
    speech_config=speech_config, audio_config=audio_config)

text_words = "微软人工智能服务非常好用。"
result = speech_synthesizer.speak_text_async(text_words).get()
# Print the reason only when synthesis did not complete.
if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
    print(result.reason)
The synthesized speech sounded very good to me.
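
For the file-saving variant, a rough sketch might look like the following. It assumes the SDK's AudioOutputConfig accepts a filename argument, as in the official quickstart; the function name synthesize_to_file and its parameters are made up for this example, and it needs a valid key and network access to actually run.

```python
def synthesize_to_file(text, wav_path, key, region, language="zh-cn"):
    """Synthesize text and write the audio to wav_path; return True on success."""
    # Imported here so the function can be defined before the SDK is installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_synthesis_language = language
    # filename=... sends the audio to a WAV file instead of the speaker.
    audio_config = speechsdk.audio.AudioOutputConfig(filename=wav_path)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config)
    result = synthesizer.speak_text_async(text).get()
    return result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted
```

A call such as `synthesize_to_file("微软人工智能服务非常好用。", "output.wav", speech_key, service_region)` would then be expected to leave the audio in output.wav.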

Testing speech translation

Documentation: https://docs.azure.cn/zh-cn/cognitive-services/speech-service/get-started-speech-translation?tabs=script%2Cwindowsinstall&pivots=programming-language-python

In my tests, speech translation covers both speech-to-text and translation in a single call:

from_language, to_language = 'zh-cn', 'en'
translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription=speech_key, region=service_region, speech_recognition_language=from_language)
translation_config.add_target_language(to_language)
recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config)


def speakAndTranslation():
    # Listen for one utterance; always return a (text, translation) pair.
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.TranslatedSpeech:
        return result.text, result.translations[to_language]
    elif result.reason == speechsdk.ResultReason.RecognizedSpeech:
        # Speech was recognized but could not be translated.
        return result.text, None
    elif result.reason == speechsdk.ResultReason.NoMatch:
        print(result.no_match_details)
    elif result.reason == speechsdk.ResultReason.Canceled:
        print(result.cancellation_details)
    # Nothing usable was recognized.
    return None, None


speakAndTranslation()

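Running this and speaking one sentence gives:

('大家好才是真的好。', 'Everyone is really good.')

Both the original text and its translation are returned together, so the speech translator below uses this interface as well.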
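
Because result.translations is keyed by target language code, the same call can return several translations when add_target_language is called more than once (for example, adding "de" next to "en"). The sketch below only formats such a result for display; format_translations is a hypothetical helper, and the German entry in the usage note is made-up sample data, not real service output.

```python
def format_translations(source_text, translations):
    """Render the source text and each translation on its own line.

    translations is a mapping from language code to translated text,
    the same shape as result.translations in the snippet above.
    """
    lines = [f"source: {source_text}"]
    # Sort by language code so the output order is stable.
    for lang in sorted(translations):
        lines.append(f"{lang}: {translations[lang]}")
    return "\n".join(lines)
```

For instance, `format_translations("你好", {"en": "Hello", "de": "Hallo"})` lists the source first, then the de and en lines in that order.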

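Building the speech translator

The program's logic is roughly as follows: it runs in a loop, each time listening for one sentence from the microphone, recognizing and translating it, printing both the original text and the translation, and then reading the translation aloud. A sentence containing 退出 ("exit") ends the loop.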
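
To make the control flow easier to follow and to try without a microphone, here is a sketch with the speech calls replaced by plain callables. The names run_translator, listen, speak, and show are assumptions for this illustration; in the real program listen would wrap the translation recognizer and speak would wrap the synthesizer.

```python
def run_translator(listen, speak, exit_word="退出", show=print):
    """Loop: listen, print, speak; stop when exit_word is heard.

    listen() returns a (text, translation) pair; either item may be None.
    Returns the number of translations that were spoken.
    """
    spoken = 0
    while True:
        text, translation = listen()
        if not text:
            # Nothing was recognized; listen again.
            continue
        show(f"heard: {text}")
        show(f"translation: {translation}")
        if exit_word in text:
            return spoken
        if translation:
            speak(translation)
            spoken += 1
```

Feeding it a canned sequence of pairs, with `list.append` standing in for speak, is one way to check the loop before wiring in the Azure calls.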

Complete code:

"""
Code by 小小明
CSDN homepage: https://blog.csdn.net/as604049322
"""
__author__ = '小小明'
__time__ = '2021/10/30'

import azure.cognitiveservices.speech as speechsdk

from azure.cognitiveservices.speech.audio import AudioOutputConfig

speech_key, service_region = "59xxxxde", "chinaeast2"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region,
                                       speech_recognition_language="zh-cn")
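# Note: the text read aloud is the English translation, so an English
# locale such as "en-US" may sound more natural here than "zh-cn".
speech_config.speech_synthesis_language = "zh-cn"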
audio_config = AudioOutputConfig(use_default_speaker=True)
speech_synthesizer = speechsdk.SpeechSynthesizer(
    speech_config=speech_config, audio_config=audio_config)

from_language, to_language = 'zh-cn', 'en'
translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription=speech_key, region=service_region, speech_recognition_language=from_language)
translation_config.add_target_language(to_language)
recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config)


def speakAndTranslation():
    # Listen for one utterance; always return a (text, translation) pair.
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.TranslatedSpeech:
        return result.text, result.translations[to_language]
    elif result.reason == speechsdk.ResultReason.RecognizedSpeech:
        # Speech was recognized but could not be translated.
        return result.text, None
    elif result.reason == speechsdk.ResultReason.NoMatch:
        print(result.no_match_details)
    elif result.reason == speechsdk.ResultReason.Canceled:
        print(result.cancellation_details)
    # Nothing usable was recognized.
    return None, None


def speak(text_words):
    # Read the text aloud and report details if synthesis was canceled.
    result = speech_synthesizer.speak_text_async(text_words).get()
    if result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = result.cancellation_details
        print("Synthesis canceled:", cancellation_details.reason)
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            if cancellation_details.error_details:
                print("Error details:", cancellation_details.error_details)


while True:
    # "说" is the prompt ("Say"); "译文" labels the translation.
    print("说:", end=" ")
    text, translation_text = speakAndTranslation()
    if not text:
        # Recognition failed or was canceled; listen again.
        continue
    print(text)
    print("译文:", translation_text)
    # Saying a sentence that contains "退出" ("exit") ends the program.
    if "退出" in text:
        break
    if translation_text:
        speak(translation_text)

I ran it briefly. The printed output looked like this (说 is the "Say" prompt and 译文 is the "Translation" label):

说: 我只想进转过山和大海。
译文: I just want to go in and out of the mountains and the sea.
说: 也穿越,人山人海。
译文: Also through, the sea of people and mountains.
说: 我曾经目睹这一切全部都随风飘然。
译文: I've seen it all blow in the wind.
说: 转眼成空。
译文: It's empty.
说: 问,世间能有几多愁?
译文: Q, how much worry can there be in the world?
说: 退出。
译文: quit.

As for the spoken output, you will have to try it yourself.


This article is reposted from: CSDN blog