A Deep-Learning-Based Chinese Speech Recognition System

ReadMe Language | 中文版 | English |

View this project's wiki pages (Chinese)

If you have any questions while working with this project, feel free to open an issue in this repo, and I will respond as soon as possible.

Please check the FAQ Page (Chinese) first to avoid asking questions that have already been answered.

A post about ASRT's introduction

ASRT: Chinese Speech Recognition System (Chinese)

For questions about the principles of the statistical language model that are often asked, see:

Introduction

This project is implemented with Keras and TensorFlow. It is based on deep convolutional neural networks, long short-term memory (LSTM) networks, an attention mechanism, and CTC.

Steps

First, clone the project to your computer with Git, then download the data sets needed for training. For the download links, see the Data Sets section at the end of this document.

$ git clone https://github.com/nl8590687/ASRT_SpeechRecognition.git

Alternatively, use the "Fork" button to create your own copy of the project, then clone it locally with your own SSH key.

After cloning the repository via git, go to the project root directory; create a subdirectory dataset/ (you can use a soft link instead), and then extract the downloaded datasets directly into it.

$ cd ASRT_SpeechRecognition

$ mkdir dataset

$ tar zxf <dataset zip files name> -C dataset/

Then copy all the files in the 'datalist' directory into the dataset directory, so that they sit alongside the extracted data sets.

$ cp -rf datalist/* dataset/
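The same preparation steps can also be scripted. The following is a minimal sketch using only the Python standard library, not part of this project; the function name `prepare_dataset` and its parameters are hypothetical, and it assumes Python 3.8 or later (for `dirs_exist_ok`).

```python
import shutil
import tarfile
from pathlib import Path


def prepare_dataset(archives, datalist_dir="datalist", dataset_dir="dataset"):
    """Extract each archive into dataset_dir, then copy datalist_dir's contents into it."""
    dataset = Path(dataset_dir)
    dataset.mkdir(parents=True, exist_ok=True)  # same as: mkdir dataset

    for archive in archives:
        # "r:*" lets tarfile detect gzip compression (.tgz / .tar.gz) on its own.
        # Only extract archives that come from a source you trust.
        with tarfile.open(archive, "r:*") as tar:
            tar.extractall(dataset)  # same as: tar zxf <archive> -C dataset/

    # same as: cp -rf datalist/* dataset/
    for entry in Path(datalist_dir).iterdir():
        target = dataset / entry.name
        if entry.is_dir():
            shutil.copytree(entry, target, dirs_exist_ok=True)
        else:
            shutil.copy2(entry, target)

    return dataset
```

Call it from the project root with the paths of the archives you downloaded, for example `prepare_dataset(["data_thchs30.tgz"])`.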

Currently available models are 24, 25 and 251.

Before running this project, please install the required Python 3 libraries, listed under "Python libraries that need importing".

To start training this project, please execute:

$ python3 train_mspeech.py

To test this project, please execute:

$ python3 test_mspeech.py

Before testing, make sure the model file path set in the code exists.

To start the ASRT API server, please execute:

$ python3 asrserver.py

If you want to train and use Model 251, change the SpeechModel import in the corresponding code files so that it points to that model.
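As an alternative to editing the import by hand, the model module could be chosen at run time. This is only a sketch: it assumes the model modules are named `SpeechModel<number>.py` and are on the import path, and the class name `ModelSpeech` is an assumption that should be checked against the chosen file. The helper `load_speech_model_class` is hypothetical and not part of this project.

```python
import importlib


def load_speech_model_class(model_id="251", class_name="ModelSpeech"):
    """Import SpeechModel<model_id> from the import path and return its model class.

    class_name is an assumption; check the chosen SpeechModel*.py file for the real name.
    """
    module = importlib.import_module(f"SpeechModel{model_id}")
    return getattr(module, class_name)
```

With this helper, switching between models 24, 25 and 251 becomes a change of one argument instead of an edit in every script.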

If you run into any problem while running or using the program, please open an issue and I will reply as soon as possible.

Model

Speech Model

CNN + LSTM/GRU + CTC
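To illustrate the CTC part of this pipeline, here is a generic sketch of greedy (best-path) CTC decoding in plain Python. It is not the decoder used by this project; the function name, the toy scores, and the choice of class 2 as the blank label are assumptions made for the example.

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, blank_index):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best_per_frame = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    merged = [index for index, _ in groupby(best_per_frame)]
    return [index for index in merged if index != blank_index]


# Toy example: 3 output classes, with class 2 used as the CTC blank.
scores = [
    [0.9, 0.05, 0.05],  # frame 0 -> class 0
    [0.8, 0.1, 0.1],    # frame 1 -> class 0 (repeat, merged with frame 0)
    [0.1, 0.1, 0.8],    # frame 2 -> blank
    [0.7, 0.2, 0.1],    # frame 3 -> class 0 (a new token, because a blank came before it)
    [0.1, 0.8, 0.1],    # frame 4 -> class 1
]
print(ctc_greedy_decode(scores, blank_index=2))  # [0, 0, 1]
```

The blank label is what lets the same symbol appear twice in a row in the output: repeats are merged first, and blanks are removed only afterwards.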

The maximum length of the input audio is 16 seconds, and the output is the corresponding Chinese pinyin sequence.
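Because of the 16-second limit, it can help to check a recording's length before passing it to the model. A minimal sketch using the standard `wave` module follows; the helper names are hypothetical and it assumes an uncompressed PCM WAV file.

```python
import wave

MAX_SECONDS = 16.0  # input limit stated in this README


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_model_input(path, max_seconds=MAX_SECONDS):
    """True if the recording is no longer than the model's maximum input length."""
    return wav_duration_seconds(path) <= max_seconds
```

Longer recordings would need to be split into shorter segments first.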

Questions about downloading trained models

The complete source program can be obtained from the archives of each released version on the GitHub releases page.

Language Model

A maximum-entropy hidden Markov model based on a probabilistic graph.

The input is a Chinese pinyin sequence, and the output is the corresponding Chinese character text.
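The idea of mapping pinyin to characters with a Markov model can be sketched as a Viterbi search. The code below is a generic toy illustration, not this project's language model; the function name, the table layout, and all probabilities are invented for the example.

```python
import math


def pinyin_to_text(pinyins, candidates, bigram, floor=1e-6):
    """Viterbi search for the most probable character sequence for a pinyin sequence.

    candidates: pinyin -> {character: P(character | pinyin)}
    bigram: (previous character, character) -> transition probability
    floor: probability used for character pairs missing from bigram
    """
    # best[ch] = (log-probability, text) of the best path ending in ch
    best = {ch: (math.log(p), ch) for ch, p in candidates[pinyins[0]].items()}
    for pinyin in pinyins[1:]:
        step = {}
        for ch, emit in candidates[pinyin].items():
            step[ch] = max(
                (score + math.log(bigram.get((prev, ch), floor)) + math.log(emit), text + ch)
                for prev, (score, text) in best.items()
            )
        best = step
    return max(best.values())[1]


# Toy probabilities, invented for illustration only.
candidates = {
    "bu2": {"不": 1.0},
    "shi4": {"是": 0.5, "事": 0.3, "市": 0.2},
}
bigram = {("不", "是"): 0.6, ("不", "事"): 0.01}
print(pinyin_to_text(["bu2", "shi4"], candidates, bigram))  # 不是
```

Working in log-probabilities avoids numeric underflow on long sentences, and keeping only the best path per final character keeps the search linear in the sentence length.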

About Accuracy

At present, the best model reaches roughly 80% pinyin accuracy on the test set.

However, since current international and domestic teams can achieve 98%, the accuracy still needs to be improved further.
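One common way to compute a pinyin accuracy figure like this is from the edit distance between the reference and the recognized syllable sequences. The sketch below assumes that definition (1 minus total edit distance over total reference syllables); the README does not state how the figure was measured, and the function names are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two token sequences."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_token in enumerate(reference, start=1):
        current = [i]
        for j, hyp_token in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_token != hyp_token)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def pinyin_accuracy(references, hypotheses):
    """1 - (total edit distance / total reference syllables) over a test set."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return 1.0 - errors / total


refs = [["bu2", "shi4"], ["yi2", "ge4"]]
hyps = [["bu4", "shi4"], ["yi2", "ge4"]]
print(pinyin_accuracy(refs, hyps))  # 0.75
```

Note that a wrong tone counts as a full error under this measure, which is one reason label errors in the training data matter.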

At present, one way to keep improving the accuracy is to correct labeling errors in the data sets, especially in the ST-CMDS syllable file, where a certain percentage of the labels are wrong. If you see this and are willing to help correct some of these labeling mistakes, your contribution is very welcome. Corrections can be submitted as a Pull Request, and you will be added to the list of contributors of this repo.

Samples:

不是： bu4 shi4 -> bu2 shi4
一个：yi1 ge4 -> yi2 ge4
了解：le5 jie3 -> liao3 jie3
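Mistakes of this kind can be located with a small script before fixing them by hand. The following is a sketch only: the helper `find_suspect_pairs` and the example label line are hypothetical, and it flags candidates rather than rewriting them, since a pair such as `le5 jie3` may be correct in other contexts.

```python
# Known mislabelled syllable pairs, taken from the samples above.
CORRECTIONS = {
    ("bu4", "shi4"): ("bu2", "shi4"),    # 不是
    ("yi1", "ge4"): ("yi2", "ge4"),      # 一个
    ("le5", "jie3"): ("liao3", "jie3"),  # 了解
}


def find_suspect_pairs(syllables):
    """Return (position, found pair, suggested pair) for each known suspect pair."""
    suspects = []
    for i in range(len(syllables) - 1):
        pair = (syllables[i], syllables[i + 1])
        if pair in CORRECTIONS:
            suspects.append((i, pair, CORRECTIONS[pair]))
    return suspects


line = "wo3 bu4 shi4 yi1 ge4 ren2"  # hypothetical label line
for position, found, suggested in find_suspect_pairs(line.split()):
    print(position, " ".join(found), "->", " ".join(suggested))
```

Each flagged position should still be checked against the transcript before the label is changed.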

Corrected part:

ST-CMDS

train: 20170001P00001A 20170001P00001I 20170001P00002A

Python libraries that need importing

python_speech_features
TensorFlow
Keras
Numpy
wave
matplotlib
math
Scipy
h5py
http
urllib
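A quick way to see which of these libraries are still missing is to ask Python whether each one can be found. This is a small sketch, not part of the project; the lower-cased import names (`tensorflow`, `keras`, `numpy`, `scipy`) are the usual ones for these packages, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Import names for the libraries listed above (lower-cased where the import name differs).
REQUIRED_MODULES = [
    "python_speech_features", "tensorflow", "keras", "numpy", "wave",
    "matplotlib", "math", "scipy", "h5py", "http", "urllib",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print(missing_modules())  # an empty list means everything is installed
```

`wave`, `math`, `http` and `urllib` ship with Python 3 itself, so only the remaining entries ever need to be installed.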

Data Sets

Tsinghua University THCHS30 Chinese voice data set

data_thchs30.tgz Download

test-noise.tgz Download

resource.tgz Download
Free ST Chinese Mandarin Corpus

ST-CMDS-20170001_1-OS.tar.gz Download
AIShell-1 Open Source Dataset (not yet used by this project; it will be added later)

data_aishell.tgz Download
Primewords Chinese Corpus Set 1 (not yet used by this project; it will be added later)

primewords_md_2018_set1.tar.gz Download

Special thanks to those who made these speech data sets publicly available.

If a data set link above cannot be opened or downloaded, try OpenSLR instead.

Logs

Links: Progress Logs

Contributors

@ZJUGuoShuai @williamchenwl

@nl8590687 (repo owner)

Donate
