ASRT_SpeechRecognition/README.md

# A Deep-Learning-Based Chinese Speech Recognition System

README language: [Chinese (中文版)](https://github.com/nl8590687/ASRT_SpeechRecognition/blob/master/README.md) | [English](https://github.com/nl8590687/ASRT_SpeechRecognition/blob/master/README_EN.md)

## Introduction

This project is built with Keras and TensorFlow. It combines long short-term memory (LSTM) networks, convolutional neural networks (CNN), and connectionist temporal classification (CTC).

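[View this project's Wiki page](https://github.com/nl8590687/ASRT_SpeechRecognition/wiki) (still being improved)

The project can now be trained normally.

After cloning the repository with git, copy all files from the `datalist` directory into the `dataset` directory, so that they sit alongside the datasets.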
```shell
$ cp -rf datalist/* dataset/
```
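
If you prefer to do the copy from Python (for example on a system without `cp`), the step above can be sketched as follows. This is only an illustration: the function name `copy_datalist` is hypothetical and not part of the repository, and it assumes Python 3.8+ for `dirs_exist_ok`.

```python
import shutil
from pathlib import Path


def copy_datalist(src="datalist", dst="dataset"):
    """Copy every non-hidden entry of `src` into `dst`, overwriting existing files."""
    src_dir, dst_dir = Path(src), Path(dst)
    if not src_dir.is_dir():
        raise FileNotFoundError(f"source directory not found: {src_dir}")
    dst_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for entry in sorted(src_dir.iterdir()):
        # The shell glob `datalist/*` skips dotfiles, so skip them here too.
        if entry.name.startswith("."):
            continue
        target = dst_dir / entry.name
        if entry.is_dir():
            # Merge into an existing directory instead of failing (Python 3.8+).
            shutil.copytree(entry, target, dirs_exist_ok=True)
        else:
            shutil.copy2(entry, target)
        copied.append(target)
    return copied
```

Calling `copy_datalist()` from the repository root should have the same effect as the `cp -rf` command above.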

The models currently available are 22, 24, and 25.

To start training, run:
```shell
$ python3 train_mspeech.py
```
To start testing, run:
```shell
$ python3 test_mspeech.py
```
Before testing, make sure the model file path set in the code actually exists.
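
A small guard like the one below can make a missing model file fail early with a clear message. It is a sketch only: `require_model_file` is a hypothetical helper, and the path you pass in is whatever you have set in your own copy of the test script.

```python
from pathlib import Path


def require_model_file(path):
    """Return `path` as a Path, or raise a clear error if the file is missing."""
    model_path = Path(path)
    if not model_path.is_file():
        raise FileNotFoundError(
            f"model file not found: {model_path} "
            "(check the path set in the test script)"
        )
    return model_path
```

Call it on the model path before loading the weights, so a typo in the path is reported before any audio is processed.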

To start the ASRT API server, run:
```shell
$ python3 asrserver.py
```

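To train and use model 25, change the corresponding `import SpeechModel` statement in the code.

If you run into any problems while running or using the program, please raise them in an issue and I will reply as soon as possible.

Before asking, please first [check the FAQ](https://github.com/nl8590687/ASRT_SpeechRecognition/wiki/issues).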

## Model

### Speech Model

CNN + LSTM/GRU + CTC
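
To illustrate what the CTC stage contributes, here is a minimal pure-Python sketch of greedy (best-path) CTC decoding: take the highest-scoring label in each frame, merge consecutive repeats, then drop the blank label. It is not the decoding code used in this repository, and the blank index is an assumption that you would need to match to your own model's output layout.

```python
def ctc_greedy_decode(frame_scores, blank=0):
    """Greedy CTC decoding over a list of per-frame score lists.

    `blank` is the index of the CTC blank label (an assumption here;
    different toolkits place it at different positions).
    """
    # 1. Best label per frame.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]

    # 2. Merge consecutive repeats, 3. remove blanks.
    decoded = []
    previous = None
    for label in best_path:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Toy example: frames whose best labels are 1, 1, blank, 1, 2, 2, blank.
frames = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.1, 0.7],
    [0.9, 0.05, 0.05],
]
print(ctc_greedy_decode(frames))  # [1, 1, 2]
```

The blank between the two runs of label 1 is what lets CTC output the same syllable twice in a row.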

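* About downloading trained models

The complete source program is included in the archive of each released version, listed under Releases in this GitHub repository.

### Language Model

A maximum-entropy hidden Markov model based on a probabilistic graph.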
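
As a rough picture of how a Markov-style language model turns a pinyin sequence into characters, the sketch below runs Viterbi decoding over a toy candidate table. It is a plain first-order model, not the maximum-entropy model described above, and every name and probability in it (`viterbi_decode`, `candidates`, `unigram`, `bigram`) is made up for illustration.

```python
import math


def viterbi_decode(pinyin_seq, candidates, unigram, bigram, floor=1e-6):
    """Pick the most probable character string for a pinyin sequence.

    candidates: pinyin -> list of possible characters
    unigram:    character -> probability of starting the sequence
    bigram:     two-character string -> transition probability
    floor:      probability used for unseen events
    """
    # best[ch] = (log probability, best path ending in ch)
    best = {
        ch: (math.log(unigram.get(ch, floor)), [ch])
        for ch in candidates[pinyin_seq[0]]
    }
    for syllable in pinyin_seq[1:]:
        step = {}
        for ch in candidates[syllable]:
            score, path = max(
                (prev_score + math.log(bigram.get(prev + ch, floor)), prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            step[ch] = (score, path + [ch])
        best = step
    return "".join(max(best.values())[1])


# Toy tables, invented for this example only.
candidates = {"zhong1": ["中", "钟"], "guo2": ["国"]}
unigram = {"中": 0.6, "钟": 0.4}
bigram = {"中国": 0.9, "钟国": 0.01}
print(viterbi_decode(["zhong1", "guo2"], candidates, unigram, bigram))  # 中国
```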

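## About Accuracy

Currently, speech_model22 has been trained on a GPU for 120+ hours (about 50 epochs) and reaches roughly 70+% accuracy on Chinese pinyin on the test set.

However, since some teams in China and abroad can reach 97%, the accuracy still needs to be improved further.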
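
This README does not say exactly how the pinyin accuracy figure is computed. One common choice, shown here purely as an assumption, is `1 - edit_distance / reference_length` over pinyin syllables. The function names below are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of syllables."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def syllable_accuracy(ref, hyp):
    """Accuracy as 1 - (edit distance / number of reference syllables)."""
    if not ref:
        raise ValueError("reference must not be empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


reference = "bu2 shi4 yi2 ge4".split()
hypothesis = "bu4 shi4 yi2 ge4".split()
print(syllable_accuracy(reference, hypothesis))  # 0.75
```

Under this definition a wrong tone counts as a full syllable error, which is one reason label mistakes in the training data matter so much.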

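* One known way to improve accuracy further is to correct labeling errors in the datasets, especially pinyin errors in the syllable files of ST-CMDS, which contain a certain proportion of wrong labels. If you are willing to help correct some of them, you are very welcome to do so: submit a Pull Request with your corrections and you will be added to this repository's list of contributors.

Examples: `不是: bu4 shi4 -> bu2 shi4`, `一个: yi1 ge4 -> yi2 ge4`, `了解: le5 jie3 -> liao3 jie3`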
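
The first two examples follow a regular pattern: 不 and 一 are read with the second tone when the next syllable carries the fourth tone. A helper like the one below could flag such candidates for manual review before a Pull Request. It is a hypothetical sketch, not a tool from this repository, and it does not cover polyphonic characters such as 了 in the third example.

```python
def suggest_sandhi_fixes(chars, syllables):
    """Return (index, old, new) suggestions for 不/一 before a fourth-tone syllable.

    chars:     the text, one character per syllable
    syllables: pinyin with tone digits, e.g. ["bu4", "shi4"]
    """
    if len(chars) != len(syllables):
        raise ValueError("chars and syllables must have the same length")

    rules = {("不", "bu4"): "bu2", ("一", "yi1"): "yi2"}
    suggestions = []
    for i in range(len(chars) - 1):
        new = rules.get((chars[i], syllables[i]))
        if new and syllables[i + 1].endswith("4"):
            suggestions.append((i, syllables[i], new))
    return suggestions


print(suggest_sandhi_fixes("不是", ["bu4", "shi4"]))  # [(0, 'bu4', 'bu2')]
```

The output is only a list of suggestions. Each one should still be checked against the audio, since the speaker's actual pronunciation is what the label has to match.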

* Corrected so far:

ST-CMDS

train:  20170001P00001A    20170001P00001I    20170001P00002A

## Python Import
Python dependencies

* python_speech_features
* TensorFlow
* Keras
* NumPy
* wave
* matplotlib
* math
* SciPy
* h5py
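
To check quickly which of these are missing from your environment, something like the following may help. The module names are the usual import names for the packages listed above. `wave` and `math` ship with Python itself. `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Import names for the dependencies listed above.
REQUIRED = [
    "python_speech_features",
    "tensorflow",
    "keras",
    "numpy",
    "wave",
    "matplotlib",
    "math",
    "scipy",
    "h5py",
]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("missing:", missing_modules(REQUIRED))
```

An empty list means every dependency can be located. It does not verify that the installed versions are compatible with each other.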

## Data Sets
* Tsinghua University THCHS30 Chinese speech dataset

data_thchs30.tgz 
<http://cn-mirror.openslr.org/resources/18/data_thchs30.tgz>
<http://www.openslr.org/resources/18/data_thchs30.tgz>

test-noise.tgz 
<http://cn-mirror.openslr.org/resources/18/test-noise.tgz>
<http://www.openslr.org/resources/18/test-noise.tgz>

resource.tgz 
<http://cn-mirror.openslr.org/resources/18/resource.tgz>
<http://www.openslr.org/resources/18/resource.tgz>

* Free ST Chinese Mandarin Corpus

ST-CMDS-20170001_1-OS.tar.gz 
<http://cn-mirror.openslr.org/resources/38/ST-CMDS-20170001_1-OS.tar.gz>
<http://www.openslr.org/resources/38/ST-CMDS-20170001_1-OS.tar.gz>

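Special thanks to the predecessors who made these speech datasets publicly available.

If the dataset links above cannot be opened or downloaded, please visit [OpenSLR](http://www.openslr.org).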
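
The archives can also be fetched from a script. The sketch below builds both mirror URLs listed above for a given file and tries them in order. The helper names and the `dataset` target directory are assumptions, and the archives are large, so expect the download to take a while.

```python
import urllib.request
from pathlib import Path

# The two hosts used in the links above.
MIRRORS = ("http://cn-mirror.openslr.org", "http://www.openslr.org")


def archive_urls(resource_id, filename):
    """Build the candidate URLs for one archive, in mirror order."""
    return [f"{host}/resources/{resource_id}/{filename}" for host in MIRRORS]


def download_archive(resource_id, filename, dest_dir="dataset"):
    """Try each mirror in turn and save the archive under `dest_dir`."""
    dest = Path(dest_dir) / filename
    dest.parent.mkdir(parents=True, exist_ok=True)
    last_error = None
    for url in archive_urls(resource_id, filename):
        try:
            urllib.request.urlretrieve(url, dest)
            return dest
        except OSError as err:  # URLError and HTTPError are OSError subclasses
            last_error = err
    raise RuntimeError(f"could not download {filename}") from last_error
```

For example, `download_archive(18, "data_thchs30.tgz")` would fetch the main THCHS30 archive, and `download_archive(38, "ST-CMDS-20170001_1-OS.tar.gz")` the ST-CMDS corpus.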

## Log

Link: [Progress log](https://github.com/nl8590687/ASRT_SpeechRecognition/blob/master/log.md)

## Contributors
@ZJUGuoShuai @williamchenwl

@nl8590687 (repo owner)