# ASRT: A Deep-Learning-Based Chinese Speech Recognition System
[![GPL-3.0 Licensed](https://img.shields.io/badge/License-GPL3.0-blue.svg?style=flat)](https://opensource.org/licenses/GPL-3.0)
[![TensorFlow Version](https://img.shields.io/badge/Tensorflow-1.15+-blue.svg)](https://www.tensorflow.org/)
[![Python Version](https://img.shields.io/badge/Python-3.6+-blue.svg)](https://www.python.org/)

**ReadMe Language** | [中文版](https://github.com/nl8590687/ASRT_SpeechRecognition/blob/master/README.md) | English |

[**ASRT Project Home Page**](https://asrt.ailemon.net/) |
[**Released Download**](https://asrt.ailemon.net/download) |
[**View this project's wiki document (Chinese)**](https://asrt.ailemon.net/docs/) |
[**Experience Demo**](https://asrt.ailemon.net/demo) |
[**Donate**](https://github.com/nl8590687/ASRT_SpeechRecognition/wiki/donate)

If you have any questions while working with this project, feel free to open an issue in this repo, and I will respond as soon as possible.

Please check the [FAQ Page (Chinese)](https://asrt.ailemon.net/docs/issues) before asking, to avoid duplicate questions.

An introductory post about ASRT:
* [ASRT: Chinese Speech Recognition System (Chinese)](https://blog.ailemon.net/2018/08/29/asrt-a-chinese-speech-recognition-system/)

For how to train and deploy with ASRT, see:
* [Teach you how to use ASRT to train Chinese ASR model (Chinese)](https://blog.ailemon.net/2020/08/20/teach-you-how-use-asrt-train-chinese-asr-model/)
* [Teach you how to use ASRT to deploy Chinese ASR API Server (Chinese)](https://blog.ailemon.net/2020/08/27/teach-you-how-use-asrt-deploy-chinese-asr-api-server/)

For frequently asked questions about the principles of the statistical language model, see:
* [Simple Chinese word frequency statistics to generate N-gram language model (Chinese)](https://blog.ailemon.net/2017/02/20/simple-words-frequency-statistic-without-segmentation-algorithm/)
* [Statistical Language Model: Chinese Pinyin to Words (Chinese)](https://blog.ailemon.net/2017/04/27/statistical-language-model-chinese-pinyin-to-words/)

For questions about CTC, see:
* [[Translation] Sequence Modeling with CTC (Chinese)](https://blog.ailemon.net/2019/07/18/sequence-modeling-with-ctc/)

For more information, please refer to the author's blog: [AILemon Blog](https://blog.ailemon.net/) (Chinese)
## Introduction
This project is implemented with tensorflow.keras and is based on deep convolutional neural networks, long short-term memory (LSTM) networks, an attention mechanism, and CTC.

* **Steps**

First, clone the project to your computer with Git, then download the datasets needed for training. For the download links, see the [Data Sets section at the end of this document](https://github.com/nl8590687/ASRT_SpeechRecognition/blob/master/README_EN.md#data-sets).
```shell
$ git clone https://github.com/nl8590687/ASRT_SpeechRecognition.git
```
Alternatively, use the "Fork" button to create your own copy of the project, then clone it locally with your own SSH key.

After cloning the repository, go to the project root directory and create a subdirectory `dataset/` for the datasets (a symbolic link also works), then extract the downloaded datasets directly into it.
```shell
$ cd ASRT_SpeechRecognition
$ mkdir dataset
$ tar zxf <dataset archive file name> -C dataset/
```
Then copy all the files in the `datalist` directory into the `dataset/` directory, so that they sit alongside the datasets.

Note that in the current version, the configuration file includes two datasets by default, THCHS30 and ST-CMDS; remove them if you don't need them. To use other datasets, add the data configuration yourself and organize the data beforehand in the standard format supported by ASRT.
```shell
$ cp -rf datalist/* dataset/
```
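The dataset preparation steps above can also be scripted. The following is a minimal sketch in Python, assuming the downloaded archives are gzip-compressed tarballs and that Python 3.8 or later is available (for `dirs_exist_ok`). The function name `prepare_dataset_dir` and its parameters are hypothetical and not part of ASRT.

```python
import shutil
import tarfile
from pathlib import Path


def prepare_dataset_dir(archives, datalist_dir="datalist", dataset_dir="dataset"):
    """Extract dataset archives and copy the datalist files next to them.

    Hypothetical helper for illustration; it mirrors the shell commands above.
    """
    dataset = Path(dataset_dir)
    dataset.mkdir(parents=True, exist_ok=True)  # like `mkdir dataset`

    for archive in archives:
        # Like `tar zxf <archive> -C dataset/`
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(dataset)

    # Like `cp -rf datalist/* dataset/` (merges into the existing directory)
    shutil.copytree(datalist_dir, dataset, dirs_exist_ok=True)
    return dataset
```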
The currently available models are 24, 25, and 251.

Before running this project, install the required [Python 3 dependencies](https://github.com/nl8590687/ASRT_SpeechRecognition#python-import).

To start training, run:
```shell
$ python3 train_speech_model.py
```
To evaluate the trained model, run:
```shell
$ python3 evaluate_speech_model.py
```
Before testing, make sure the model file path specified in the code files exists.

To start the ASRT API server, run:
```shell
$ python3 asrserver.py
```
Note that after starting the API server, you need to use the client software for this ASRT project to perform speech recognition. For details, see the wiki documentation: [ASRT Client Demo](https://asrt.ailemon.net/docs/client-demo).

To train and use a model other than Model 251, make the change at the corresponding `import speech_model_zoo` location in the code files.

If you run into any problem while running or using the program, please raise it in an issue, and I will reply as soon as possible.
## Model
### Speech Model
CNN/LSTM/GRU + CTC

The maximum length of the input audio is 16 seconds, and the output is the corresponding Chinese pinyin sequence.
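To illustrate how the frame-level output of a CTC-trained acoustic model becomes a pinyin sequence, here is a minimal sketch of greedy (best-path) CTC decoding. It is a general illustration, not ASRT's actual decoding code; the toy vocabulary, the blank index, and the function name `ctc_greedy_decode` are assumptions made for the example.

```python
def ctc_greedy_decode(frame_probs, vocab, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded = []
    previous = None
    for index in best_path:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank symbol.
        if index != previous and index != blank:
            decoded.append(vocab[index])
        previous = index
    return decoded


# Toy example (hypothetical values): 5 frames over a 3-symbol vocabulary.
vocab = ["<blank>", "ni3", "hao3"]
frame_probs = [
    [0.10, 0.80, 0.10],  # ni3
    [0.10, 0.70, 0.20],  # ni3 again (merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank
    [0.20, 0.10, 0.70],  # hao3
    [0.60, 0.20, 0.20],  # blank
]
print(ctc_greedy_decode(frame_probs, vocab))  # ['ni3', 'hao3']
```

A blank between two identical symbols keeps them separate, which is how CTC represents a genuinely repeated syllable.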
* Questions about downloading trained models

The released software packages, which include trained model weights, can be downloaded from the [ASRT download page](https://asrt.ailemon.net/download).

The GitHub [Releases](https://github.com/nl8590687/ASRT_SpeechRecognition/releases) page includes archives of each released version of the software along with its introduction. Each release provides a zip file that contains the trained model weight files.
### Language Model
A maximum entropy hidden Markov model based on a probabilistic graph.

The input is a Chinese pinyin sequence, and the output is the corresponding Chinese character text.
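To give a feel for pinyin-to-text conversion, the sketch below runs a Viterbi search over a toy bigram model. It is a simplified stand-in and not ASRT's language model: the candidate characters, the probabilities, and the function name `pinyin_to_text` are all made up for illustration, and a real model would estimate its statistics from a corpus.

```python
import math

# Toy data (hypothetical): candidate characters per pinyin syllable,
# plus unigram and bigram probabilities.
CANDIDATES = {"ni3": ["你", "拟"], "hao3": ["好", "郝"]}
UNIGRAM = {"你": 0.6, "拟": 0.1, "好": 0.5, "郝": 0.05}
BIGRAM = {("你", "好"): 0.8, ("你", "郝"): 0.01, ("拟", "好"): 0.05}
FLOOR = 1e-6  # probability floor for unseen events


def pinyin_to_text(pinyin_seq):
    """Viterbi search for the most probable character sequence."""
    # best[char] = (log probability, best text ending in char)
    best = {
        c: (math.log(UNIGRAM.get(c, FLOOR)), c) for c in CANDIDATES[pinyin_seq[0]]
    }
    for syllable in pinyin_seq[1:]:
        step = {}
        for char in CANDIDATES[syllable]:
            # Keep only the best-scoring path that ends in this character.
            step[char] = max(
                (score + math.log(BIGRAM.get((prev, char), FLOOR)), text + char)
                for prev, (score, text) in best.items()
            )
        best = step
    return max(best.values())[1]


print(pinyin_to_text(["ni3", "hao3"]))  # 你好
```

Keeping only the best path per ending character is what makes the search linear in the sequence length instead of exponential in the number of candidates.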
## About Accuracy
At present, the best model reaches roughly 80% pinyin accuracy on the test set.

However, since current international and domestic teams can achieve 98%, the accuracy still needs further improvement.
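This README does not state exactly how the pinyin accuracy is computed. One common definition is one minus the edit distance between the reference and recognized pinyin sequences, normalized by the reference length. The sketch below assumes that definition; the function names and sample data are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two token sequences."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_token in enumerate(reference, start=1):
        current = [i]
        for j, hyp_token in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_token != hyp_token)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def pinyin_accuracy(references, hypotheses):
    """1 - (total edit distance / total reference length) over a test set."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return 1.0 - errors / total


# Toy test set: one substitution in the first utterance, one deletion in the second.
refs = [["ni3", "hao3", "shi4", "jie4"], ["zao3", "shang4", "hao3"]]
hyps = [["ni3", "hao3", "shi4", "jie3"], ["zao3", "hao3"]]
print(pinyin_accuracy(refs, hyps))
```

Here there are 2 errors over 7 reference syllables, so the accuracy is 5/7, or about 0.714.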
## Python Dependency Library
* tensorflow (1.15 - 2.x)
* numpy
* wave
* matplotlib
* math
* scipy
* requests

If you have trouble installing these packages, you can install them with the following command, provided that you have a GPU and that CUDA 11.2 and cuDNN 8.1 are already installed:
```shell
$ pip install -r requirements.txt
```
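After installation, a quick way to see which of the listed modules are still missing is to probe for them from Python. This is a minimal sketch using the standard library's `importlib.util.find_spec`; the list below simply repeats the module names from this section, and `find_missing_modules` is a hypothetical helper name.

```python
import importlib.util

# Module names taken from the dependency list above
# (wave and math ship with the Python standard library).
REQUIRED_MODULES = [
    "tensorflow", "numpy", "wave", "matplotlib", "math", "scipy", "requests",
]


def find_missing_modules(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing_modules(REQUIRED_MODULES)
print("Missing modules:", ", ".join(missing) if missing else "none")
```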
[Dependent Environment Details](https://asrt.ailemon.net/docs/dependent-environment)
## Data Sets
[Some free Chinese speech datasets (Chinese)](https://blog.ailemon.net/2018/11/21/free-open-source-chinese-speech-datasets/)
* **Tsinghua University THCHS30 Chinese speech dataset**
  * data_thchs30.tgz: [Download](<http://www.openslr.org/resources/18/data_thchs30.tgz>)
  * test-noise.tgz: [Download](<http://www.openslr.org/resources/18/test-noise.tgz>)
  * resource.tgz: [Download](<http://www.openslr.org/resources/18/resource.tgz>)
* **Free ST Chinese Mandarin Corpus**
  * ST-CMDS-20170001_1-OS.tar.gz: [Download](<http://www.openslr.org/resources/38/ST-CMDS-20170001_1-OS.tar.gz>)
* **AIShell-1 Open Source Dataset**
  * data_aishell.tgz: [Download](<http://www.openslr.org/resources/33/data_aishell.tgz>)

Note: extract this dataset as follows:
```shell
$ tar xzf data_aishell.tgz
$ cd data_aishell/wav
$ for tar in *.tar.gz; do tar xvf "$tar"; done
```
* **Primewords Chinese Corpus Set 1**
  * primewords_md_2018_set1.tar.gz: [Download](<http://www.openslr.org/resources/47/primewords_md_2018_set1.tar.gz>)
* **aidatatang_200zh**
  * aidatatang_200zh.tgz: [Download](<http://www.openslr.org/resources/62/aidatatang_200zh.tgz>)
* **MagicData**
  * train_set.tar.gz: [Download](<http://www.openslr.org/resources/68/train_set.tar.gz>)
  * dev_set.tar.gz: [Download](<http://www.openslr.org/resources/68/dev_set.tar.gz>)
  * test_set.tar.gz: [Download](<http://www.openslr.org/resources/68/test_set.tar.gz>)
  * metadata.tar.gz: [Download](<http://www.openslr.org/resources/68/metadata.tar.gz>)

Special thanks to those who made these speech datasets publicly available.

If a dataset link above cannot be opened or downloaded, visit [OpenSLR](http://www.openslr.org).
## License
[GPL v3.0](LICENSE) © [nl8590687](https://github.com/nl8590687) Author: [ailemon](https://www.ailemon.net/)
## Contributors
[@zw76859420](https://github.com/zw76859420)
@madeirak @ZJUGuoShuai @williamchenwl

@nl8590687 (repo owner)