ASRT_SpeechRecognition/README.md

# A Deep-Learning-Based Chinese Speech Recognition System
A deep-learning-based Chinese speech recognition system. If you like it, please give it a **"Star"**~

[![GPL-3.0 Licensed](https://img.shields.io/badge/License-GPL3.0-blue.svg?style=flat)](https://opensource.org/licenses/GPL-3.0) [![TensorFlow Version](https://img.shields.io/badge/Tensorflow-1.4+-blue.svg)](https://www.tensorflow.org/) [![Keras Version](https://img.shields.io/badge/Keras-2.0+-blue.svg)](https://keras.io/) [![Python Version](https://img.shields.io/badge/Python-3.x-blue.svg)](https://www.python.org/) 

**ReadMe Language** | Chinese | [English](https://github.com/nl8590687/ASRT_SpeechRecognition/blob/master/README_EN.md) |

[**View this project's Wiki documentation**](https://github.com/nl8590687/ASRT_SpeechRecognition/wiki)

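If you run into any problems while running or using the program, please raise them in an issue and I will reply as soon as possible. QQ group for discussion with the project author: **867888133**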

Before asking, please [check the FAQ](https://github.com/nl8590687/ASRT_SpeechRecognition/wiki/issues) first to avoid duplicate questions.

For the principles behind ASRT, see this article:
* [ASRT: A Chinese Speech Recognition System](https://blog.ailemon.me/2018/08/29/asrt-a-chinese-speech-recognition-system/)

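For the frequently asked questions about how the statistical language model works, see:

* [Statistical language model: from Chinese Pinyin to text](https://blog.ailemon.me/2017/04/27/statistical-language-model-chinese-pinyin-to-words/)
* [Simple word-frequency statistics without a Chinese word segmentation algorithm](https://blog.ailemon.me/2017/02/20/simple-words-frequency-statistic-without-segmentation-algorithm/)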

## Introduction 简介

This project is implemented with Keras and TensorFlow, and is based on deep convolutional neural networks, long short-term memory (LSTM) networks, an attention mechanism, and CTC.

* **Steps**

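First, clone this project to your computer with Git, then download the datasets needed for training. The download links are listed in the [last part of this document](https://github.com/nl8590687/ASRT_SpeechRecognition#data-sets-%E6%95%B0%E6%8D%AE%E9%9B%86).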
```shell
$ git clone https://github.com/nl8590687/ASRT_SpeechRecognition.git
```

Alternatively, use the "Fork" button to make your own copy of this project, then clone it locally with your own SSH key.

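After cloning the repository, enter the project root directory and create a `dataset/` subdirectory (a symbolic link also works), then extract the downloaded datasets directly into it: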
```shell
$ cd ASRT_SpeechRecognition

$ mkdir dataset

$ tar zxf <dataset-archive-filename> -C dataset/
```

Then copy all files from the `datalist` directory into `dataset/`, so that they sit alongside the datasets.
```shell
$ cp -rf datalist/* dataset/
```
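
For readers who prefer to script the setup, the shell steps above can be sketched in Python using only the standard library. The function name `prepare_dataset` and its arguments are hypothetical and not part of this project; the sketch assumes gzip-compressed tar archives and Python 3.8 or newer.

```python
import shutil
import tarfile
from pathlib import Path


def prepare_dataset(project_root, archives):
    """Extract archives into <project_root>/dataset and copy datalist/* next to them.

    Hypothetical helper that mirrors the shell commands in this README.
    """
    root = Path(project_root)
    dataset_dir = root / "dataset"
    dataset_dir.mkdir(exist_ok=True)  # like `mkdir dataset`

    for archive in archives:
        # like `tar zxf <archive> -C dataset/`
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(dataset_dir)

    # like `cp -rf datalist/* dataset/`: merge the contents of datalist/ into dataset/
    shutil.copytree(root / "datalist", dataset_dir, dirs_exist_ok=True)
    return dataset_dir
```

Called from the project root, this might look like `prepare_dataset(".", ["data_thchs30.tgz"])`, assuming the archive was downloaded into the current directory.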

The currently available models are 24, 25, and 251.

Before running this project, please install the required [Python 3 dependencies](https://github.com/nl8590687/ASRT_SpeechRecognition#python-import).

To start training, run:
```shell
$ python3 train_mspeech.py
```
To start testing, run:
```shell
$ python3 test_mspeech.py
```
Before testing, make sure the model file path specified in the code exists.
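
One way to fail early with a clear message is to check the path before loading the model. This is a minimal sketch; `require_model_file` is a hypothetical helper, not a function provided by this project.

```python
from pathlib import Path


def require_model_file(path):
    """Return the path if the model file exists, otherwise raise a clear error.

    Hypothetical helper; the project itself does not ship this function.
    """
    model_path = Path(path)
    if not model_path.is_file():
        raise FileNotFoundError(f"Model file not found: {model_path}")
    return model_path
```

Calling it with the path you filled in for the test script would raise `FileNotFoundError` immediately, instead of failing later while the weights are being loaded.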

To start the ASRT API server, run:
```shell
$ python3 asrserver.py
```

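Note that once the API server is running, you need the client software that corresponds to this ASRT project to perform speech recognition. See the Wiki page [ASRT Client Demo](https://github.com/nl8590687/ASRT_SpeechRecognition/wiki/ClientDemo).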

To train and use model 251, modify the corresponding `import SpeechModel` line in the code.
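
As an alternative to editing the import line by hand, the model could in principle be selected at run time. The sketch below assumes that each model lives in a module named `SpeechModel` followed by its number (for example `SpeechModel251`); that naming scheme and both helper functions are assumptions for illustration, so check the actual file names in the repository first.

```python
import importlib

AVAILABLE_MODELS = ("24", "25", "251")  # model ids listed in this README


def speech_model_module_name(model_id):
    """Map a model id to an ASSUMED module name such as 'SpeechModel251'."""
    model_id = str(model_id)
    if model_id not in AVAILABLE_MODELS:
        raise ValueError(f"Unknown model id: {model_id}")
    return f"SpeechModel{model_id}"


def load_speech_model_module(model_id):
    """Import the assumed module; only works when run from the project root."""
    return importlib.import_module(speech_model_module_name(model_id))
```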

## Model 模型

### Speech Model 语音模型

CNN + LSTM/GRU + CTC

The maximum length of the input audio is 16 seconds, and the output is the corresponding Hanyu Pinyin sequence.
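
To make the input limit and the CTC output stage concrete, here is a small standard-library sketch. It checks a WAV file against the 16-second limit and shows generic best-path (greedy) CTC decoding: take the highest-scoring symbol per frame, merge repeats, and drop blanks. This is not the project's own decoder; the function names, the toy symbol table, and the position of the blank symbol are all assumptions.

```python
import wave

MAX_AUDIO_SECONDS = 16.0  # limit stated in this README


def wav_duration_seconds(path):
    """Duration of a WAV file in seconds, from its frame count and sample rate."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_model_input(path, limit=MAX_AUDIO_SECONDS):
    """True if the recording is no longer than the given limit."""
    return wav_duration_seconds(path) <= limit


def ctc_greedy_decode(frame_scores, symbols, blank_index):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    decoded = []
    previous = None
    for scores in frame_scores:
        best = max(range(len(scores)), key=lambda i: scores[i])
        # Emit only when the symbol changes and is not the blank.
        if best != previous and best != blank_index:
            decoded.append(symbols[best])
        previous = best
    return decoded


# Toy example: three symbols, the last one acting as the CTC blank (assumed).
toy_symbols = ["ni3", "hao3", "_"]
toy_frames = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.10, 0.10, 0.80],
    [0.10, 0.80, 0.10],
    [0.10, 0.70, 0.20],
]
toy_result = ctc_greedy_decode(toy_frames, toy_symbols, blank_index=2)
```

In the toy example the repeated `ni3` and `hao3` frames collapse to one symbol each, so `toy_result` is `["ni3", "hao3"]`.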

* About downloading trained models

The complete source code, including trained model parameters, is available in the archives of each released version under [releases](https://github.com/nl8590687/ASRT_SpeechRecognition/releases) in this GitHub repository.

### Language Model 语言模型

A maximum-entropy hidden Markov model based on probabilistic graphs

The input is a Hanyu Pinyin sequence, and the output is the corresponding Chinese character text.
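
The general idea of decoding Pinyin into characters with a hidden-Markov-style model can be illustrated with a tiny Viterbi search. This is only a sketch of the technique, not the project's language model: the candidate table, the bigram probabilities, the fallback value, and the function name are all invented for illustration.

```python
import math

# Toy tables, invented for illustration only.
CANDIDATES = {
    "ni3": {"你": 0.9, "拟": 0.1},
    "hao3": {"好": 0.8, "郝": 0.2},
}
BIGRAMS = {("你", "好"): 0.6, ("拟", "好"): 0.05}
UNSEEN = 1e-6  # fallback probability for character pairs not in BIGRAMS


def pinyin_to_text(pinyin_seq, candidates=CANDIDATES, bigrams=BIGRAMS):
    """Viterbi search for the most probable character sequence."""
    if not pinyin_seq:
        return ""
    # best[char] = (log probability, best path ending in char)
    best = {c: (math.log(p), [c]) for c, p in candidates[pinyin_seq[0]].items()}
    for syllable in pinyin_seq[1:]:
        step = {}
        for char, emit in candidates[syllable].items():
            # Pick the best predecessor for this character.
            score, path = max(
                (prev_score + math.log(bigrams.get((prev, char), UNSEEN)), prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            step[char] = (score + math.log(emit), path + [char])
        best = step
    return "".join(max(best.values())[1])
```

With the toy tables above, `pinyin_to_text(["ni3", "hao3"])` selects 你 followed by 好, because that pair has by far the highest combined probability.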

## About Accuracy 关于准确率

Currently, the best model reaches roughly 80% Hanyu Pinyin accuracy on the test set.

However, since some teams in China and abroad can reach 98%, the accuracy still needs further improvement.
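
This README does not spell out how the Pinyin accuracy is computed. A common convention, assumed here, is one minus the edit distance between the reference and recognized sequences divided by the reference length. The sketch below implements that convention; both function names are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two token sequences."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_token in enumerate(reference, start=1):
        current = [i]
        for j, hyp_token in enumerate(hypothesis, start=1):
            cost = 0 if ref_token == hyp_token else 1
            current.append(min(
                previous[j] + 1,         # reference token missing from hypothesis
                current[j - 1] + 1,      # extra token in hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def pinyin_accuracy(reference, hypothesis):
    """1 - (edit distance / reference length), floored at 0 (assumed definition)."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return max(0.0, 1.0 - edit_distance(reference, hypothesis) / len(reference))
```

For example, a five-syllable reference with one wrongly recognized tone has an edit distance of 1, which gives an accuracy of about 0.8 under this definition.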

## Python Import
Python dependencies

* python_speech_features
* TensorFlow
* Keras
* Numpy
* wave
* matplotlib
* math
* Scipy
* h5py
* http
* urllib
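
A quick way to see which of these are missing from the current environment is to probe each import name with `importlib`. The lower-cased import names (`tensorflow`, `keras`, `numpy`, `scipy`) are assumed from the list above, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Import names assumed from the dependency list in this README.
REQUIRED_MODULES = [
    "python_speech_features", "tensorflow", "keras", "numpy", "wave",
    "matplotlib", "math", "scipy", "h5py", "http", "urllib",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Running `print(missing_modules())` before training lists whatever still needs to be installed. `wave`, `math`, `http`, and `urllib` are part of the Python standard library and should never appear in the result.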

## Data Sets 数据集
* **Tsinghua University THCHS30 Chinese Speech Dataset**

  data_thchs30.tgz 
[OpenSLR China mirror](<http://cn-mirror.openslr.org/resources/18/data_thchs30.tgz>)
[OpenSLR international mirror](<http://www.openslr.org/resources/18/data_thchs30.tgz>)

  test-noise.tgz 
[OpenSLR China mirror](<http://cn-mirror.openslr.org/resources/18/test-noise.tgz>)
[OpenSLR international mirror](<http://www.openslr.org/resources/18/test-noise.tgz>)

  resource.tgz 
[OpenSLR China mirror](<http://cn-mirror.openslr.org/resources/18/resource.tgz>)
[OpenSLR international mirror](<http://www.openslr.org/resources/18/resource.tgz>)

* **Free ST Chinese Mandarin Corpus** 

  ST-CMDS-20170001_1-OS.tar.gz 
[OpenSLR China mirror](<http://cn-mirror.openslr.org/resources/38/ST-CMDS-20170001_1-OS.tar.gz>)
[OpenSLR international mirror](<http://www.openslr.org/resources/38/ST-CMDS-20170001_1-OS.tar.gz>)

* **AIShell-1 Open-Source Dataset** 

  data_aishell.tgz
[OpenSLR China mirror](<http://cn-mirror.openslr.org/resources/33/data_aishell.tgz>)
[OpenSLR international mirror](<http://www.openslr.org/resources/33/data_aishell.tgz>)

  Note: how to extract this dataset

  ```shell
  $ tar xzf data_aishell.tgz
  $ cd data_aishell/wav
  $ for f in *.tar.gz; do tar xvf "$f"; done
  ```

* **Primewords Chinese Corpus Set 1** 

  primewords_md_2018_set1.tar.gz
[OpenSLR China mirror](<http://cn-mirror.openslr.org/resources/47/primewords_md_2018_set1.tar.gz>)
[OpenSLR international mirror](<http://www.openslr.org/resources/47/primewords_md_2018_set1.tar.gz>)

* **aidatatang_200zh**

  200.zip
[OpenSLR China mirror](<http://cn-mirror.openslr.org/resources/62/200.zip>)
[OpenSLR international mirror](<http://www.openslr.org/resources/62/200.zip>)
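
The AIShell-1 extraction note in the list above unpacks one outer archive and then every per-speaker archive inside `data_aishell/wav`. The second step can be sketched in Python as follows; `extract_nested_archives` is a hypothetical helper, and the sketch assumes the inner files are gzip-compressed tar archives.

```python
import tarfile
from pathlib import Path


def extract_nested_archives(wav_dir):
    """Extract every *.tar.gz found directly inside wav_dir, in place.

    Hypothetical helper that mirrors the shell loop in the AIShell-1 note.
    """
    wav_dir = Path(wav_dir)
    extracted = []
    for archive in sorted(wav_dir.glob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(wav_dir)
        extracted.append(archive.name)
    return extracted
```

The returned list of archive names makes it easy to confirm that every archive was processed.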

Special thanks to those who made these speech datasets publicly available!

If the dataset links above cannot be opened or downloaded, please visit [OpenSLR](http://www.openslr.org) directly.

## Log
Log link: [Progress log](https://github.com/nl8590687/ASRT_SpeechRecognition/blob/master/log.md)

## Contributors 贡献者们

[@zw76859420](https://github.com/zw76859420) 
@madeirak @ZJUGuoShuai @williamchenwl

@nl8590687 (repo owner)

[**Donate to the author**](https://github.com/nl8590687/ASRT_SpeechRecognition/wiki/donate)