# A Deep-Learning-Based Chinese Speech Recognition System
[![GPL-3.0 Licensed ](https://img.shields.io/badge/License-GPL3.0-blue.svg?style=flat )](https://opensource.org/licenses/GPL-3.0)
[![TensorFlow Version ](https://img.shields.io/badge/Tensorflow-1.13+-blue.svg )](https://www.tensorflow.org/)
[![Keras Version ](https://img.shields.io/badge/Keras-2.3+-blue.svg )](https://keras.io/)
[![Python Version ](https://img.shields.io/badge/Python-3.5+-blue.svg )](https://www.python.org/)
**ReadMe Language** | [中文版 ](https://github.com/nl8590687/ASRT_SpeechRecognition/blob/master/README.md ) | English |
[**ASRT Project Home Page** ](https://asrt.ailemon.me/ ) | [**Released Download** ](https://asrt.ailemon.me/download ) | [**View this project's wiki document (Chinese)** ](https://asrt.ailemon.me/docs/ ) | [**Experience Demo** ](https://asrt.ailemon.me/demo )
If you run into any problems while working with this project, feel free to open an issue in this repo and I will respond as soon as possible.
Please check the [FAQ Page (Chinese)](https://asrt.ailemon.me/docs/issues) before asking, to avoid duplicate questions.
An introductory post about ASRT:
* [ASRT: Chinese Speech Recognition System (Chinese) ](https://blog.ailemon.me/2018/08/29/asrt-a-chinese-speech-recognition-system/ )
How to train and deploy with ASRT:
* [Teach you how to use ASRT to train Chinese ASR model (Chinese) ](<https://blog.ailemon.me/2020/08/20/teach-you-how-use-asrt-train-chinese-asr-model/> )
* [Teach you how to use ASRT to deploy Chinese ASR API Server (Chinese) ](<https://blog.ailemon.me/2020/08/27/teach-you-how-use-asrt-deploy-chinese-asr-api-server/> )
For frequently asked questions about the principles of the statistical language model, see:
* [Simple Chinese word frequency statistics to generate N-gram language model (Chinese) ](https://blog.ailemon.me/2017/02/20/simple-words-frequency-statistic-without-segmentation-algorithm/ )
* [Statistical Language Model: Chinese Pinyin to Words (Chinese) ](https://blog.ailemon.me/2017/04/27/statistical-language-model-chinese-pinyin-to-words/ )
For questions about CTC, see:
* [[Translation] Sequence Modeling with CTC (Chinese)](<https://blog.ailemon.me/2019/07/18/sequence-modeling-with-ctc/>)

For more information, please refer to the author's blog: [AILemon Blog](https://blog.ailemon.me/) (Chinese)
## Introduction
This project is built with Keras and TensorFlow. It implements speech recognition using deep convolutional neural networks, long short-term memory (LSTM) networks, an attention mechanism, and CTC.
* **Steps**
First, clone the project to your computer with Git, then download the datasets needed for training. For the download links, see the [Data Sets section at the end of this document](https://github.com/nl8590687/ASRT_SpeechRecognition/blob/master/README_EN.md#data-sets).
```shell
$ git clone https://github.com/nl8590687/ASRT_SpeechRecognition.git
```
Alternatively, use the "Fork" button to create your own copy of the project, then clone it locally with your own SSH key.
After cloning the repository via git, go to the project root directory; create a subdirectory `dataset/` (you can use a soft link instead), and then extract the downloaded datasets directly into it.
Note that in the current version, both the THCHS30 and ST-CMDS datasets must be downloaded and used; using other datasets requires modifying the source code.
```shell
$ cd ASRT_SpeechRecognition
$ mkdir dataset
$ tar zxf <dataset-archive-file> -C dataset/
```
Then copy all the files in the `datalist` directory into the `dataset` directory, so that they sit alongside the extracted datasets.
```shell
$ cp -rf datalist/* dataset/
```
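For readers who prefer to script the setup, the shell steps above can be approximated in Python. This is only a sketch: the function name `prepare_dataset` is hypothetical and not part of ASRT, and it assumes Python 3.8+ (for `dirs_exist_ok`) and that the archives are ordinary tar files.

```python
import shutil
import tarfile
from pathlib import Path


def prepare_dataset(project_root, archives):
    """Extract each archive into <project_root>/dataset, then copy datalist/ into it.

    Hypothetical helper for illustration; not part of the ASRT code base.
    """
    root = Path(project_root)
    dataset_dir = root / "dataset"
    dataset_dir.mkdir(exist_ok=True)  # roughly `mkdir dataset`

    for archive in archives:
        # "r:*" lets tarfile detect the compression (gzip, etc.) on its own
        with tarfile.open(str(archive), "r:*") as tar:
            tar.extractall(str(dataset_dir))  # roughly `tar zxf ... -C dataset/`

    # Roughly `cp -rf datalist/* dataset/` (this also copies hidden files)
    shutil.copytree(str(root / "datalist"), str(dataset_dir), dirs_exist_ok=True)
    return dataset_dir
```

Calling `prepare_dataset(".", ["data_thchs30.tgz", "ST-CMDS-20170001_1-OS.tar.gz"])` from the project root would mirror the commands above, assuming both archives have already been downloaded into that directory.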
The currently available models are 24, 25 and 251.
Before running this project, please install the required [Python 3 dependencies](https://github.com/nl8590687/ASRT_SpeechRecognition#python-import).
To start training, run:
```shell
$ python3 train_mspeech.py
```
To test the model, run:
```shell
$ python3 test_mspeech.py
```
Before testing, make sure the model file path set in the code files actually exists.
To start the ASRT API server, run:
```shell
$ python3 asrserver.py
```
Please note that after starting the API server, you need to use the client software that corresponds to this ASRT project for speech recognition. For details, see the wiki document [ASRT Client Demo](https://asrt.ailemon.me/docs/client-demo).
If you want to train and use Model 251, change the corresponding `import SpeechModel` statement in the code files.
If you encounter any problems while running or using the program, please raise them in an issue and I will reply as soon as possible.
## Model
### Speech Model
CNN + LSTM/GRU + CTC
The maximum length of the input audio is 16 seconds, and the output is the corresponding Chinese pinyin sequence.
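As a rough illustration of the 16-second limit, the sketch below checks the duration of a WAV file before it is passed to the model. It uses only the standard-library `wave` module (which is in the dependency list); the function names and the `MAX_SECONDS` constant are hypothetical and not taken from the ASRT code.

```python
import wave

MAX_SECONDS = 16.0  # input limit stated in this README


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_model_input(path, max_seconds=MAX_SECONDS):
    """True if the recording is short enough for the speech model."""
    return wav_duration_seconds(path) <= max_seconds
```

Longer recordings would need to be split into shorter segments first; how best to segment them is outside the scope of this sketch.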
* Questions about downloading trained models
The complete source code, including trained model weights, is available in the archives of each released version on the GitHub [releases](https://github.com/nl8590687/ASRT_SpeechRecognition/releases) page.
Released builds of the software can also be downloaded from the [ASRT download page](https://asrt.ailemon.me/download).
### Language Model
Maximum Entropy Hidden Markov Model Based on Probability Graph.
The input is a Chinese pinyin sequence, and the output is the corresponding Chinese character text.
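To make the pinyin-to-text step more concrete, here is a toy Viterbi search over an HMM-style graph. It is not the project's implementation: the candidate table and all probabilities are invented for illustration, and a real model would estimate them from a corpus.

```python
import math

# Hypothetical toy data: candidate characters for each pinyin syllable
CANDIDATES = {"ni3": ["你", "拟"], "hao3": ["好", "郝"]}
# Hypothetical probabilities, invented for this example
START_PROB = {"你": 0.8, "拟": 0.2}
BIGRAM_PROB = {("你", "好"): 0.9, ("你", "郝"): 0.1,
               ("拟", "好"): 0.5, ("拟", "郝"): 0.5}
FLOOR = 1e-8  # probability assigned to unseen events


def pinyin_to_text(syllables):
    """Return the most probable character sequence for a pinyin sequence."""
    if not syllables:
        return ""
    # best[c] = (log-probability, text) of the best path ending in character c
    best = {c: (math.log(START_PROB.get(c, FLOOR)), c)
            for c in CANDIDATES[syllables[0]]}
    for syllable in syllables[1:]:
        new_best = {}
        for cur in CANDIDATES[syllable]:
            # Keep only the best-scoring predecessor for each candidate
            new_best[cur] = max(
                (score + math.log(BIGRAM_PROB.get((prev, cur), FLOOR)), text + cur)
                for prev, (score, text) in best.items()
            )
        best = new_best
    return max(best.values())[1]
```

With the toy tables above, `pinyin_to_text(["ni3", "hao3"])` selects the path through 你 and 好, because that path has the highest combined start and bigram probability.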
## About Accuracy
At present, the best model reaches roughly 80% pinyin accuracy on the test set.
However, since leading teams in China and abroad currently report up to 98%, the accuracy still needs further improvement.
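This README does not say exactly how the pinyin accuracy is computed. One common choice, assumed here purely for illustration, is one minus the edit distance between the reference and predicted pinyin sequences divided by the reference length. The function names below are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of pinyin syllables."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,              # deletion
                cur[j - 1] + 1,           # insertion
                prev[j - 1] + (r != h),   # substitution (cost 0 on a match)
            ))
        prev = cur
    return prev[-1]


def pinyin_accuracy(ref, hyp):
    """Assumed metric: 1 - edit_distance / len(ref)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)
```

Under this definition, a prediction that drops one syllable from a five-syllable reference has an edit distance of 1 and therefore an accuracy of about 0.8.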
## Python libraries that need importing
* python_speech_features
* TensorFlow (1.13+)
* Keras (2.3+)
* Numpy
* wave
* matplotlib
* math
* Scipy
* h5py
* http
* urllib
[Dependent Environment Details ](https://asrt.ailemon.me/docs/dependent-environment )
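A quick way to see which of the libraries listed above are missing from the current environment is sketched below. The module list uses import names, which may differ from the package names used by an installer, and the helper `missing_modules` is hypothetical. It does not check version requirements such as TensorFlow 1.13+.

```python
import importlib.util

# Import names of the libraries listed in this README
REQUIRED_MODULES = [
    "python_speech_features", "tensorflow", "keras", "numpy", "wave",
    "matplotlib", "math", "scipy", "h5py", "http", "urllib",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules()` returns an empty list when every listed library can be found.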
## Data Sets
[Some free Chinese speech datasets (Chinese) ](https://blog.ailemon.me/2018/11/21/free-open-source-chinese-speech-datasets/ )
* **Tsinghua University THCHS30 Chinese voice data set**
data_thchs30.tgz
[Download ](<http://www.openslr.org/resources/18/data_thchs30.tgz> )
test-noise.tgz
[Download ](<http://www.openslr.org/resources/18/test-noise.tgz> )
resource.tgz
[Download ](<http://www.openslr.org/resources/18/resource.tgz> )
* **Free ST Chinese Mandarin Corpus**
ST-CMDS-20170001_1-OS.tar.gz
[Download ](<http://www.openslr.org/resources/38/ST-CMDS-20170001_1-OS.tar.gz> )
* **AIShell-1 Open Source Dataset**
data_aishell.tgz
[Download ](<http://www.openslr.org/resources/33/data_aishell.tgz> )
Note: to extract this dataset, run:
```
$ tar xzf data_aishell.tgz
$ cd data_aishell/wav
$ for tar in *.tar.gz; do tar xvf "$tar"; done
```
* **Primewords Chinese Corpus Set 1**
primewords_md_2018_set1.tar.gz
[Download ](<http://www.openslr.org/resources/47/primewords_md_2018_set1.tar.gz> )
* **aidatatang_200zh**
aidatatang_200zh.tgz
[Download ](<http://www.openslr.org/resources/62/aidatatang_200zh.tgz> )
* **MagicData**
train_set.tar.gz
[Download ](<http://www.openslr.org/resources/68/train_set.tar.gz> )
dev_set.tar.gz
[Download ](<http://www.openslr.org/resources/68/dev_set.tar.gz> )
test_set.tar.gz
[Download ](<http://www.openslr.org/resources/68/test_set.tar.gz> )
metadata.tar.gz
[Download ](<http://www.openslr.org/resources/68/metadata.tar.gz> )
Special thanks to those who created and publicly released these speech datasets.
If a dataset link above cannot be opened or downloaded, visit [OpenSLR](http://www.openslr.org) directly.
## License
[GPL v3.0 ](LICENSE ) © [nl8590687 ](https://github.com/nl8590687 ) Author: [ailemon ](https://ailemon.me/ )
## Contributors
[@zw76859420 ](https://github.com/zw76859420 )
@madeirak @ZJUGuoShuai @williamchenwl
@nl8590687 (repo owner)
[**Donate** ](https://github.com/nl8590687/ASRT_SpeechRecognition/wiki/donate )