A Deep-Learning-Based Chinese Speech Recognition System
ReadMe Language | 中文版 | English |
View this project's wiki document (Chinese)
If you have any questions while working with this project, feel free to open an issue in this repo and I will respond as soon as possible.
Please check the FAQ Page (Chinese) before asking, to avoid duplicate questions.
A post about ASRT's introduction
For frequently asked questions about the principles of the statistical language model, see:
- Simple word frequency statistics without Chinese word segmentation algorithm (Chinese)
- Statistical Language Model: Chinese Pinyin to Words (Chinese)
Introduction
This project is built with Keras and TensorFlow. It is implemented with deep convolutional neural networks, long short-term memory (LSTM) networks, an attention mechanism, and CTC.
- Steps
First, clone the project to your computer with Git, then download the datasets needed for training. For the download links, see the Data Sets section at the end of this document.
$ git clone https://github.com/nl8590687/ASRT_SpeechRecognition.git
Alternatively, use the "Fork" button to create your own copy of the project, then clone it locally with your own SSH key.
After cloning the repository, go to the project root directory and create a subdirectory dataset/ (a symbolic link also works), then extract the downloaded datasets directly into it.
$ cd ASRT_SpeechRecognition
$ mkdir dataset
$ tar zxf <dataset zip files name> -C dataset/
Then copy all the files in the datalist directory into the dataset directory, so that they sit alongside the datasets.
$ cp -rf datalist/* dataset/
Currently available models are 24, 25 and 251.
Before running this project, please install the required Python 3 dependencies (listed below).
To start training this project, please execute:
$ python3 train_mspeech.py
To start the test of this project, please execute:
$ python3 test_mspeech.py
Before testing, make sure the model file path set in the code files exists.
To start the ASRT API server, please execute:
$ python3 asrserver.py
Please note that after starting the API server, you need to use the client software that corresponds to this ASRT project for speech recognition. For details, see the Wiki documentation ASRT Client Demo.
If you want to train and use Model 251, change the SpeechModel import at the corresponding position in the code files.
If you run into any problem while running or using the program, please report it in an issue and I will reply as soon as possible.
Model
Speech Model
CNN + LSTM/GRU + CTC
The maximum length of the input audio is 16 seconds, and the output is the corresponding Chinese pinyin sequence.
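To illustrate how a CTC output turns into a pinyin sequence, here is a minimal sketch of greedy CTC decoding: pick one label per frame, collapse consecutive repeats, then drop the blank label. This is an illustration under assumptions, not necessarily how this project decodes. The toy vocabulary, the blank index, and the function name `ctc_greedy_decode` are all hypothetical.

```python
def ctc_greedy_decode(frame_ids, blank_id):
    """Collapse consecutive repeated labels, then drop the blank label."""
    decoded = []
    previous = None
    for label in frame_ids:
        # Keep a label only when it starts a new run and is not the blank.
        if label != previous and label != blank_id:
            decoded.append(label)
        previous = label
    return decoded


# Hypothetical toy vocabulary; index 3 plays the role of the CTC blank.
PINYIN = ["ni3", "hao3", "ma5"]
BLANK = 3

# One (already arg-maxed) label per audio frame.
frames = [0, 0, 3, 1, 1, 1, 3, 3, 2]
print([PINYIN[i] for i in ctc_greedy_decode(frames, BLANK)])
# prints ['ni3', 'hao3', 'ma5']
```

A blank between two identical labels keeps them as two separate syllables, which is why `[0, 3, 0]` decodes to `[0, 0]` rather than `[0]`.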
- Questions about downloading trained models
The complete source program for each released version can be downloaded as an archive from the releases page on GitHub.
Language Model
Maximum Entropy Hidden Markov Model Based on Probability Graph.
The input is a Chinese pinyin sequence, and the output is the corresponding Chinese character text.
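The pinyin-to-text step can be pictured as a search over candidate characters. The sketch below uses a plain HMM-style Viterbi search with made-up probabilities. It is a simplification for illustration, not the project's maximum entropy model, and the tables `CANDIDATES` and `BIGRAM`, the constant `UNSEEN`, and the function `pinyin_to_text` are hypothetical.

```python
import math

# Hypothetical toy tables; a real model would estimate these from a corpus.
CANDIDATES = {
    "ni3": {"你": 0.9, "拟": 0.1},
    "hao3": {"好": 0.8, "郝": 0.2},
}
BIGRAM = {
    ("你", "好"): 0.6,
    ("拟", "好"): 0.05,
    ("你", "郝"): 0.01,
    ("拟", "郝"): 0.01,
}
UNSEEN = 1e-6  # fallback probability for character pairs not in BIGRAM


def pinyin_to_text(pinyin_seq):
    """Viterbi search for the most probable character sequence.

    Assumes a non-empty sequence whose syllables all appear in CANDIDATES.
    """
    # best[ch] = (log probability, text) of the best path ending in ch
    best = {ch: (math.log(p), ch) for ch, p in CANDIDATES[pinyin_seq[0]].items()}
    for syllable in pinyin_seq[1:]:
        step = {}
        for ch, p in CANDIDATES[syllable].items():
            # Extend every previous path with ch and keep the best one.
            step[ch] = max(
                (score + math.log(BIGRAM.get((prev, ch), UNSEEN)) + math.log(p),
                 text + ch)
                for prev, (score, text) in best.items()
            )
        best = step
    return max(best.values())[1]


print(pinyin_to_text(["ni3", "hao3"]))  # prints 你好
```

Working in log probabilities avoids numerical underflow on long sentences, where a product of many small probabilities would round to zero.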
About Accuracy
At present, the best model reaches roughly 80% pinyin accuracy on the test set.
However, since leading teams in China and abroad can achieve 98%, the accuracy still needs further improvement.
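This README does not define how the pinyin correct rate is computed. One common choice, assumed here purely for illustration, is one minus the edit distance between the reference and recognized pinyin sequences, divided by the reference length. The function names and sample sequences below are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (r != h),  # substitution, or free match
            ))
        previous = current
    return previous[-1]


def pinyin_accuracy(ref, hyp):
    """1 - (edit distance / reference length); assumes a non-empty reference."""
    return 1.0 - edit_distance(ref, hyp) / len(ref)


# Made-up example: one syllable dropped and one tone recognized wrongly.
ref = ["jin1", "tian1", "tian1", "qi4", "hen3", "hao3"]
hyp = ["jin1", "tian1", "qi4", "hen2", "hao3"]
print(edit_distance(ref, hyp))    # prints 2
print(pinyin_accuracy(ref, hyp))  # 1 - 2/6, about 0.667
```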
Python libraries that need importing
- python_speech_features
- TensorFlow
- Keras
- NumPy
- wave
- matplotlib
- math
- SciPy
- h5py
- http
- urllib
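Before training, it can help to check which of the libraries above are importable. The following is a small sketch using only the standard library. The lowercase import names in `REQUIRED_MODULES` are the usual module names for these packages and are an assumption, not something taken from the project code. The helper `missing_modules` is hypothetical.

```python
import importlib.util

# Assumed import names for the libraries listed above.
REQUIRED_MODULES = [
    "python_speech_features", "tensorflow", "keras", "numpy", "wave",
    "matplotlib", "math", "scipy", "h5py", "http", "urllib",
]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or half-initialized modules.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

Note that wave, math, http and urllib ship with Python 3 itself, so only the remaining entries need to be installed separately.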
Data Sets
- Tsinghua University THCHS30 Chinese voice data set
  data_thchs30.tgz Download
  test-noise.tgz Download
  resource.tgz Download
- Free ST Chinese Mandarin Corpus
  ST-CMDS-20170001_1-OS.tar.gz Download
- AIShell-1 Open Source Dataset
  data_aishell.tgz Download
  Note: unzip this dataset
  $ tar xzf data_aishell.tgz
  $ cd data_aishell/wav
  $ for tar in *.tar.gz; do tar xvf $tar; done
- Primewords Chinese Corpus Set 1
  primewords_md_2018_set1.tar.gz Download
Special thanks to everyone who made these speech datasets publicly available.
If a dataset link above cannot be opened or downloaded, try this link instead: OpenSLR
Logs
Links: Progress Logs
Contributors
@ZJUGuoShuai @williamchenwl
@nl8590687 (repo owner)