# A Deep-Learning-Based Chinese Speech Recognition System

[![GPL-3.0 Licensed](https://img.shields.io/badge/License-GPL3.0-blue.svg?style=flat)](https://opensource.org/licenses/GPL-3.0) 
[![TensorFlow Version](https://img.shields.io/badge/Tensorflow-1.13+-blue.svg)](https://www.tensorflow.org/) 
[![Keras Version](https://img.shields.io/badge/Keras-2.3+-blue.svg)](https://keras.io/) 
[![Python Version](https://img.shields.io/badge/Python-3.5+-blue.svg)](https://www.python.org/) 

**ReadMe Language** | [中文版](https://github.com/nl8590687/ASRT_SpeechRecognition/blob/master/README.md) | English |

[**ASRT Project Home Page**](https://asrt.ailemon.me/) | [**Released Download**](https://asrt.ailemon.me/download) | [**View this project's wiki document (Chinese)**](https://asrt.ailemon.me/docs/) | [**Experience Demo**](https://asrt.ailemon.me/demo)

If you have any questions while working with this project, feel free to open an issue in this repo and I will respond as soon as possible.

Please check the [FAQ Page (Chinese)](https://asrt.ailemon.me/docs/issues) before asking, to avoid duplicate questions.

An introduction to ASRT:
* [ASRT: Chinese Speech Recognition System (Chinese)](https://blog.ailemon.me/2018/08/29/asrt-a-chinese-speech-recognition-system/)

How to train and deploy with ASRT:
* [Teach you how to use ASRT to train Chinese ASR model (Chinese)](<https://blog.ailemon.me/2020/08/20/teach-you-how-use-asrt-train-chinese-asr-model/>)
* [Teach you how to use ASRT to deploy Chinese ASR API Server (Chinese)](<https://blog.ailemon.me/2020/08/27/teach-you-how-use-asrt-deploy-chinese-asr-api-server/>)

For frequently asked questions about the principles of the statistical language model, see:
* [Simple Chinese word frequency statistics to generate N-gram language model (Chinese)](https://blog.ailemon.me/2017/02/20/simple-words-frequency-statistic-without-segmentation-algorithm/)
* [Statistical Language Model: Chinese Pinyin to Words (Chinese)](https://blog.ailemon.me/2017/04/27/statistical-language-model-chinese-pinyin-to-words/)

For questions about CTC, see: 

* [[Translation] Sequence Modeling with CTC (Chinese)](<https://blog.ailemon.me/2019/07/18/sequence-modeling-with-ctc/>)

For more information, please refer to the author's blog: [AILemon Blog](https://blog.ailemon.me/) (Chinese)

## Introduction

This project is implemented with Keras and TensorFlow. It is based on deep convolutional neural networks, long short-term memory (LSTM) networks, an attention mechanism, and CTC.

* **Steps**

First, clone the project to your computer with Git, then download the data sets needed for training. For the download links, see the [Data Sets section at the end of this document](https://github.com/nl8590687/ASRT_SpeechRecognition/blob/master/README_EN.md#data-sets).
```shell
$ git clone https://github.com/nl8590687/ASRT_SpeechRecognition.git
```

Alternatively, use the "Fork" button to create your own copy of the project and then clone it locally with your own SSH key.

After cloning the repository, go to the project root directory, create a subdirectory `dataset/` (a symbolic link works too), and extract the downloaded data sets directly into it.

Note that in the current version, both the THCHS30 and ST-CMDS data sets must be downloaded and used. Using other data sets requires modifying the source code.

```shell
$ cd ASRT_SpeechRecognition

$ mkdir dataset

$ tar zxf <dataset archive file name> -C dataset/
```

Then copy all the files in the `datalist` directory into the `dataset` directory, so that they sit alongside the data sets.

```shell
$ cp -rf datalist/* dataset/
```
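The extraction and copy steps above can also be scripted. The following is only a minimal sketch, not part of the project: the function name `prepare_dataset` is hypothetical, it assumes gzip-compressed tar archives (as `tar zxf` does), and it needs Python 3.8+ for `dirs_exist_ok`.

```python
import shutil
import tarfile
from pathlib import Path


def prepare_dataset(archives, datalist_dir="datalist", dataset_dir="dataset"):
    """Extract each archive into dataset_dir, then copy the list files in.

    Hypothetical helper that mirrors the shell steps above.
    """
    dataset = Path(dataset_dir)
    dataset.mkdir(parents=True, exist_ok=True)  # like `mkdir dataset`

    for archive in archives:
        # "r:gz" matches `tar zxf`: gzip-compressed tar archives only.
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(dataset)

    # Roughly equivalent to `cp -rf datalist/* dataset/`.
    shutil.copytree(datalist_dir, dataset, dirs_exist_ok=True)
    return dataset


# Example call, run from the project root after downloading the archives:
# prepare_dataset(["data_thchs30.tgz", "ST-CMDS-20170001_1-OS.tar.gz"])
```

Only extract archives from sources you trust, since `extractall` writes whatever paths the archive contains.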

The currently available models are 24, 25, and 251.

Before running this project, please install the required [Python 3 dependencies](https://github.com/nl8590687/ASRT_SpeechRecognition#python-import).

To start training, execute:
```shell
$ python3 train_mspeech.py
```
To test the model, execute:
```shell
$ python3 test_mspeech.py
```
Before testing, make sure that the model file path set in the code exists.

To start the ASRT API server, execute:
```shell
$ python3 asrserver.py
```

Please note that after starting the API server, you need to use the client software that corresponds to this ASRT project for speech recognition. For details, see the Wiki documentation [ASRT Client Demo](https://asrt.ailemon.me/docs/client-demo).

If you want to train and use Model 251, change the corresponding `import SpeechModel` line in the code files.

If you run into any problem while running or using the program, please raise it in an issue and I will reply as soon as possible.


## Model

### Speech Model

CNN + LSTM/GRU + CTC

The maximum length of the input audio is 16 seconds, and the output is the corresponding Chinese pinyin sequence. 
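Both halves of that sentence can be illustrated with a short sketch. This is not the project's code: the function names, the pinyin labels, and the blank symbol `"_"` are assumptions made for illustration. The first helper checks a WAV file against the 16-second limit using the standard `wave` module, and the second applies the usual CTC best-path rule (collapse repeats, then drop blanks) to per-frame labels.

```python
import wave

MAX_SECONDS = 16  # input limit stated above


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_model_input(path, max_seconds=MAX_SECONDS):
    """True if the recording is short enough for the speech model."""
    return wav_duration_seconds(path) <= max_seconds


def ctc_greedy_decode(frame_labels, blank):
    """Collapse consecutive repeats, then remove blank labels."""
    decoded = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Illustrative per-frame labels; "_" stands for the CTC blank.
frames = ["ni3", "ni3", "_", "hao3", "hao3", "_"]
print(ctc_greedy_decode(frames, blank="_"))  # ['ni3', 'hao3']
```

A blank between two identical labels keeps them separate, which is how CTC can emit the same pinyin twice in a row.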

* Downloading trained models

The complete source code, including trained model weights, is available in the release archives on the GitHub [releases](https://github.com/nl8590687/ASRT_SpeechRecognition/releases) page.

The released software can also be downloaded from the [ASRT download page](https://asrt.ailemon.me/download).

### Language Model 

Maximum Entropy Hidden Markov Model Based on Probability Graph. 

The input is a Chinese pinyin sequence, and the output is the corresponding Chinese character text. 
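To give a feel for pinyin-to-character conversion, here is a toy sketch. It is a plain first-order HMM decoded with the Viterbi algorithm, not the maximum entropy model the project uses, and every probability, name, and table below is invented for illustration.

```python
import math


def viterbi_decode(pinyins, start, trans, emit, floor=1e-9):
    """Return the most probable character string for a pinyin sequence.

    start[c]    : P(c is the first character)
    trans[p][c] : P(next character is c | previous character is p)
    emit[c][py] : P(pinyin py | character c)
    Unseen start/transition events get the small probability `floor`.
    Assumes every pinyin has at least one candidate character in `emit`.
    """
    def candidates(pinyin):
        return [c for c, table in emit.items() if pinyin in table]

    # best[c] = (log-probability, path) of the best path ending in c.
    best = {
        c: (math.log(start.get(c, floor)) + math.log(emit[c][pinyins[0]]), [c])
        for c in candidates(pinyins[0])
    }
    for pinyin in pinyins[1:]:
        step = {}
        for c in candidates(pinyin):
            score, path = max(
                (s + math.log(trans.get(prev, {}).get(c, floor)), p)
                for prev, (s, p) in best.items()
            )
            step[c] = (score + math.log(emit[c][pinyin]), path + [c])
        best = step
    return "".join(max(best.values())[1])


# Toy tables (all values are made up).
start = {"你": 0.6, "拟": 0.1}
trans = {"你": {"好": 0.8, "郝": 0.01}, "拟": {"好": 0.1}}
emit = {
    "你": {"ni3": 1.0},
    "拟": {"ni3": 1.0},
    "好": {"hao3": 0.9, "hao4": 0.1},
    "郝": {"hao3": 1.0},
}
print(viterbi_decode(["ni3", "hao3"], start, trans, emit))  # 你好
```

Working in log-probabilities avoids numerical underflow on long sequences.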

## About Accuracy

At present, the best model reaches roughly 80% pinyin accuracy on the test set.

However, since leading teams in China and abroad currently achieve 98%, the accuracy still needs further improvement.

## Python libraries that need importing

* python_speech_features
* TensorFlow (1.13+)
* Keras (2.3+)
* Numpy
* wave
* matplotlib
* math
* Scipy
* h5py
* http
* urllib

[Dependent Environment Details](https://asrt.ailemon.me/docs/dependent-environment)
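A quick way to see which of these libraries are missing from your environment is sketched below. It is not part of the project, and it assumes the lower-case import names shown in the list (for example `tensorflow` and `numpy`). It does not check version numbers.

```python
import importlib.util

# Import names assumed for the libraries listed above.
REQUIRED_MODULES = [
    "python_speech_features", "tensorflow", "keras", "numpy", "wave",
    "matplotlib", "math", "scipy", "h5py", "http", "urllib",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Note that `wave`, `math`, `http`, and `urllib` ship with Python 3 itself, so only the remaining libraries need to be installed.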

## Data Sets 

[Some free Chinese speech datasets (Chinese)](https://blog.ailemon.me/2018/11/21/free-open-source-chinese-speech-datasets/)

* **Tsinghua University THCHS30 Chinese voice data set**

  data_thchs30.tgz 
[Download](<http://www.openslr.org/resources/18/data_thchs30.tgz>)

  test-noise.tgz 
[Download](<http://www.openslr.org/resources/18/test-noise.tgz>)

  resource.tgz 
[Download](<http://www.openslr.org/resources/18/resource.tgz>)

* **Free ST Chinese Mandarin Corpus**

  ST-CMDS-20170001_1-OS.tar.gz 
[Download](<http://www.openslr.org/resources/38/ST-CMDS-20170001_1-OS.tar.gz>)

* **AIShell-1 Open Source Dataset** 

  data_aishell.tgz
[Download](<http://www.openslr.org/resources/33/data_aishell.tgz>)

  Note: extract this dataset as follows:

  ```
  $ tar xzf data_aishell.tgz
  $ cd data_aishell/wav
  $ for f in *.tar.gz; do tar xvf "$f"; done
  ```

* **Primewords Chinese Corpus Set 1** 

  primewords_md_2018_set1.tar.gz
[Download](<http://www.openslr.org/resources/47/primewords_md_2018_set1.tar.gz>)

* **aidatatang_200zh**

  aidatatang_200zh.tgz
[Download](<http://www.openslr.org/resources/62/aidatatang_200zh.tgz>)

* **MagicData**

  train_set.tar.gz
[Download](<http://www.openslr.org/resources/68/train_set.tar.gz>)

  dev_set.tar.gz
[Download](<http://www.openslr.org/resources/68/dev_set.tar.gz>)

  test_set.tar.gz
[Download](<http://www.openslr.org/resources/68/test_set.tar.gz>)

  metadata.tar.gz
[Download](<http://www.openslr.org/resources/68/metadata.tar.gz>)

Special thanks to those who made these speech data sets publicly available.

If a dataset link above cannot be opened or downloaded, try the [OpenSLR](http://www.openslr.org) website.

## License

[GPL v3.0](LICENSE) © [nl8590687](https://github.com/nl8590687) Author: [ailemon](https://ailemon.me/)

## Contributors
[@zw76859420](https://github.com/zw76859420) 
@madeirak @ZJUGuoShuai @williamchenwl

@nl8590687 (repo owner)

[**Donate**](https://github.com/nl8590687/ASRT_SpeechRecognition/wiki/donate)