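A while ago I shared a post on speech-to-text (the iFLYTEK edition).
Today I am trying the open-source DeepSpeech, which is built on Baidu's Deep Speech paper and Google's deep learning framework TensorFlow. In the project's own words: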
DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu’s Deep Speech research paper. Project DeepSpeech uses Google’s TensorFlow to make the implementation easier.
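Installation is fairly simple:
1. First install Python and virtualenv. Python can be downloaded from the official Python website (I used version 3.9 here; version 3.10 does not work with deepspeech 0.9).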
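Since the Python version matters here, a quick check before going further can save a failed install. The snippet below is only a sketch: py_ok_for_deepspeech is a made-up helper name, and it assumes that any Python 3 release up to 3.9 is acceptable, while the post itself only confirms that 3.9 works and 3.10 does not.

```shell
# Hypothetical helper: succeeds for Python 3.x with x <= 9.
# Assumption: only 3.9 (works) and 3.10 (fails) are confirmed by the post.
py_ok_for_deepspeech() {
    pyver_major=${1%%.*}            # "3.9.13" -> "3"
    pyver_minor=${1#*.}             # "3.9.13" -> "9.13"
    pyver_minor=${pyver_minor%%.*}  # "9.13"   -> "9"
    [ "$pyver_major" -eq 3 ] && [ "$pyver_minor" -le 9 ]
}

# Report on the interpreter that "python3" resolves to, if there is one.
if command -v python3 >/dev/null 2>&1; then
    current=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    if py_ok_for_deepspeech "$current"; then
        echo "Python $current should be fine for deepspeech 0.9"
    else
        echo "Python $current is probably too new for deepspeech 0.9"
    fi
fi
```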
2. Create a Python virtual environment
virtualenv -p python3 $HOME/tmp/deepspeech-venv/
source $HOME/tmp/deepspeech-venv/Scripts/activate
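The Scripts/activate path in the command above is where virtualenv puts the activation script on Windows (for example when working in Git Bash); on Linux and macOS it lives under bin/activate instead. A small sketch that handles both layouts follows; pick_activate is a hypothetical helper name, not part of virtualenv.

```shell
VENV="$HOME/tmp/deepspeech-venv"

# Hypothetical helper: print the activation script for a virtualenv directory,
# trying the Windows layout first and then the Linux/macOS one.
pick_activate() {
    if [ -f "$1/Scripts/activate" ]; then
        echo "$1/Scripts/activate"
    elif [ -f "$1/bin/activate" ]; then
        echo "$1/bin/activate"
    else
        return 1
    fi
}

if activate_script=$(pick_activate "$VENV"); then
    # Source the script so the environment applies to the current shell.
    . "$activate_script"
else
    echo "no virtualenv found at $VENV" >&2
fi
```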
3. Install deepspeech, then download the pre-trained model and the sample audio files.
pip3 install deepspeech
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/audio-0.9.3.tar.gz
tar xvf audio-0.9.3.tar.gz
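The model and scorer files are large, so it is worth confirming that the downloads actually completed before testing. The check below is a minimal sketch: check_downloads is a hypothetical helper, and it assumes the archive unpacks into a directory named audio.

```shell
# Hypothetical helper: report any argument that is neither a directory
# nor a non-empty file, and return non-zero if something is missing.
check_downloads() {
    dl_status=0
    for dl_path in "$@"; do
        if [ -d "$dl_path" ] || [ -s "$dl_path" ]; then
            echo "ok: $dl_path"
        else
            echo "missing or empty: $dl_path" >&2
            dl_status=1
        fi
    done
    return "$dl_status"
}

# "audio" is the assumed name of the directory created by the tar command.
check_downloads deepspeech-0.9.3-models.pbmm \
                deepspeech-0.9.3-models.scorer \
                audio \
    || echo "some files are missing; re-run the curl commands" >&2
```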
4. With everything in place, we can test the recognition quality. For the audio file tested, the recognized result was: why should one halt on the way. The accuracy is quite good.
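The post does not show the command used for the test, so here is a sketch of what it could look like with the deepspeech command-line client and its --model, --scorer and --audio options. The transcribe wrapper is a hypothetical helper, and the audio file name is an assumption: the post does not say which of the sample files was used.

```shell
MODEL=deepspeech-0.9.3-models.pbmm
SCORER=deepspeech-0.9.3-models.scorer

# Hypothetical wrapper: run the deepspeech client on one WAV file,
# failing early if the client or the audio file is not available.
transcribe() {
    wav_path=$1
    if ! command -v deepspeech >/dev/null 2>&1; then
        echo "deepspeech not on PATH; activate the virtualenv first" >&2
        return 1
    fi
    if [ ! -f "$wav_path" ]; then
        echo "audio file not found: $wav_path" >&2
        return 1
    fi
    deepspeech --model "$MODEL" --scorer "$SCORER" --audio "$wav_path"
}

# Assumed sample file name; substitute whichever WAV file you want to test.
transcribe audio/4507-16021-0012.wav || echo "transcription skipped" >&2
```

The transcript is printed to standard output, so it can be redirected to a file if needed.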