nl fe928f5f1f add some infomation for a question often asked 2018-09-27 17:29:18 +08:00
datalist fix bug where recognition into Chinese characters was sometimes incomplete 2018-05-28 17:43:50 +08:00
general_function fix init bug for multi gpu model 2018-09-26 19:47:04 +08:00
model_language fix bugs and modify dropout and update language model 2018-05-05 13:41:45 +08:00
.gitignore fix bugs and improve asrserver 2018-05-11 16:56:59 +08:00
LICENSE add license and gitignore.modify py code main.py 2017-08-22 17:56:05 +08:00
LanguageModel.py sync files 2018-07-19 21:21:51 +08:00
README.md add some infomation for a question often asked 2018-09-27 17:29:18 +08:00
README_EN.md add some infomation for a question often asked 2018-09-27 17:29:18 +08:00
SpeechModel24.py adjust default configuration 2018-07-15 14:21:10 +08:00
SpeechModel25.py adjust default configuration 2018-07-15 14:21:10 +08:00
SpeechModel26.py adjust default configuration 2018-07-15 14:21:10 +08:00
SpeechModel251.py modify model 251 and add multi-GPU code 2018-07-23 19:59:29 +08:00
SpeechModel251_p.py modify model 251 and add multi-GPU code 2018-07-23 19:59:29 +08:00
asrserver.py switch model to 251 2018-07-27 14:31:48 +08:00
dict.txt correct errors in dict.txt 2018-07-29 19:51:45 +08:00
log.md add a new model 251 2018-07-06 13:57:53 +08:00
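readdata24.py changed the random data-reading method, which can improve the model's generalization ability 2018-07-09 17:50:17 +08:00
test.py fix bug where recognition into Chinese characters was sometimes incomplete 2018-05-28 17:43:50 +08:00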
testClient.py delete no use code 2018-07-28 17:17:38 +08:00
test_mspeech.py switch model to 251 2018-07-27 14:31:48 +08:00
train_mspeech.py switch model to 251 2018-07-27 14:31:48 +08:00

README_EN.md

A Deep-Learning-Based Chinese Speech Recognition System

GPL-3.0 Licensed TensorFlow Version Keras Version Python Version

ReadMe Language 中文版 English

View this project's wiki page (In progress..)

An introductory post about ASRT: ASRT: Chinese Speech Recognition System

For frequently asked questions about the principles of the statistical language model, see: [Simple word frequency statistics without Chinese word segmentation algorithm](https://blog.ailemon.me/2017/02/20/simple-words-frequency-statistic-without-segmentation-algorithm/)

Introduction

This project is implemented with Keras and TensorFlow, using deep convolutional neural networks, long short-term memory (LSTM) networks, an attention mechanism, and CTC.

The project can now be properly trained.

After cloning the repository with git, copy all the files in the datalist directory into the dataset directory, so that they sit alongside the data sets.

$ cp -rf datalist/* dataset/

Currently available models are 24, 25 and 251.

To start training this project, please execute:

$ python3 train_mspeech.py

To start the test of this project, please execute:

$ python3 test_mspeech.py

Before testing, make sure the model file path set in the code files actually exists.

To start the ASRT API server, please execute:

$ python3 asrserver.py

If you want to train and use Model 251, change the SpeechModel import statement at the corresponding position in the code files.

If you run into any problem while running or using the program, please open an issue and I will reply as soon as possible.
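As a rough illustration of that change: the model number only decides which SpeechModel module gets imported. The sketch below maps a model number to the module names of the SpeechModel*.py files in the repository root. The helper `speech_model_module` is hypothetical and not part of the project, and nothing is assumed about what each module exports.

```python
# Hypothetical helper, not part of ASRT. Module names mirror the
# SpeechModel*.py files listed in the repository root.
AVAILABLE_MODELS = ("24", "25", "251")


def speech_model_module(model_id):
    """Return the module name to import for a given model number."""
    model_id = str(model_id)
    if model_id not in AVAILABLE_MODELS:
        raise ValueError(
            "unknown model %r, expected one of %s"
            % (model_id, ", ".join(AVAILABLE_MODELS))
        )
    return "SpeechModel" + model_id


if __name__ == "__main__":
    # Inside the repository, the module could then be loaded with
    # importlib.import_module(speech_model_module(251)).
    print(speech_model_module(251))
```

Keeping the choice in one place like this would mean the training, testing, and server scripts do not each need a hand-edited import line.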

You can check the FAQ first before asking questions.

Model

Speech Model

CNN + LSTM/GRU + CTC
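To make the CTC part of this pipeline more concrete, here is a minimal sketch of greedy (best-path) CTC decoding in plain Python: take the best label per frame, merge consecutive repeats, then drop blanks. It is an illustration only. The function name, the score layout, and the position of the blank label are assumptions of this example, and the project's own decoding may differ.

```python
def ctc_greedy_decode(frame_scores, blank):
    """Collapse per-frame scores into a label sequence (best-path decoding).

    frame_scores: list of frames, each a list with one score per label.
    blank: index of the CTC blank label (an assumption of this sketch).
    """
    # Step 1: pick the best-scoring label in every frame.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]

    decoded = []
    previous = None
    for label in best_path:
        # Step 2: skip consecutive repeats. Step 3: skip blanks.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


if __name__ == "__main__":
    # Three labels, with index 2 used as the blank in this toy example.
    scores = [
        [0.1, 0.8, 0.1],  # label 1
        [0.1, 0.8, 0.1],  # label 1 again (merged with the previous frame)
        [0.1, 0.1, 0.8],  # blank
        [0.1, 0.8, 0.1],  # label 1 (kept, because a blank separated it)
        [0.7, 0.2, 0.1],  # label 0
    ]
    print(ctc_greedy_decode(scores, blank=2))
```

The blank label is what lets the same syllable appear twice in a row in the output: repeats are merged only when no blank sits between them.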

  • Questions about downloading trained models

The complete source program for each released version can be obtained from the archives on the project's GitHub releases page.

Language Model

Maximum Entropy Hidden Markov Model Based on Probability Graph.
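The README names a hidden Markov model here but does not show how it decodes. As a rough illustration only, assuming the language model's task is to pick the most likely Chinese character for each pinyin syllable, a bigram Viterbi search could look like the sketch below. The candidate table, the probabilities, the smoothing floor, and the function name are toy values invented for this example, and the maximum-entropy part is not modeled.

```python
import math

FLOOR = 1e-6  # assumed smoothing value for unseen characters and bigrams


def viterbi_pinyin_to_text(pinyin_seq, candidates, unigram, bigram):
    """Return the most probable character string for a pinyin sequence.

    candidates: pinyin -> list of possible characters.
    unigram: character -> probability of starting a sequence.
    bigram: two-character string -> transition probability.
    """
    # best[ch] = (log probability, best path ending in ch)
    best = {
        ch: (math.log(unigram.get(ch, FLOOR)), ch)
        for ch in candidates[pinyin_seq[0]]
    }
    for py in pinyin_seq[1:]:
        step = {}
        for ch in candidates[py]:
            # Keep only the best predecessor for each candidate character.
            step[ch] = max(
                (score + math.log(bigram.get(prev + ch, FLOOR)), path + ch)
                for prev, (score, path) in best.items()
            )
        best = step
    return max(best.values())[1]


if __name__ == "__main__":
    toy_candidates = {"ni3": ["你", "拟"], "hao3": ["好", "郝"]}
    toy_unigram = {"你": 0.6, "拟": 0.01}
    toy_bigram = {"你好": 0.5, "你郝": 0.001, "拟好": 0.001}
    print(viterbi_pinyin_to_text(["ni3", "hao3"], toy_candidates,
                                 toy_unigram, toy_bigram))
```

Working in log space avoids numeric underflow on long sentences, which is why the sketch adds logarithms instead of multiplying probabilities.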

About Accuracy

At present, the best model reaches roughly 80% pinyin accuracy on the test set.

However, since leading teams in China and abroad can reach 97%, the accuracy still needs to be improved further.
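A pinyin accuracy figure of this kind is commonly derived from the edit distance between the predicted and the reference syllable sequences. The sketch below shows that calculation. Whether the project computes its number in exactly this way is an assumption, and both function names are made up for the example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two syllable lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,              # deletion
                cur[j - 1] + 1,           # insertion
                prev[j - 1] + (r != h),   # substitution or match
            ))
        prev = cur
    return prev[-1]


def pinyin_accuracy(pairs):
    """1 - (total edit distance / total reference syllables)."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    if total == 0:
        raise ValueError("no reference syllables given")
    return 1.0 - errors / total


if __name__ == "__main__":
    reference = "bu2 shi4 yi2 ge4".split()
    predicted = "bu4 shi4 yi2 ge4".split()
    print(pinyin_accuracy([(reference, predicted)]))
```

Under this definition a single wrong tone counts as a full syllable error, so label mistakes in the data set directly lower the measured accuracy.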

  • At present, one way to keep improving accuracy is to correct labeling errors in the data sets, especially errors in the ST-CMDS syllable file, where a certain percentage of the labels are wrong. If you see this and are willing to help correct some of these labeling mistakes, you are very welcome to do so. Corrections can be submitted as a Pull Request, and you will be added to this repo's list of contributors.

Samples:

  • 不是: bu4 shi4 -> bu2 shi4
  • 一个: yi1 ge4 -> yi2 ge4
  • 了解: le5 jie3 -> liao3 jie3

  • Corrected part:

ST-CMDS

train: 20170001P00001A 20170001P00001I 20170001P00002A
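For anyone who wants to help with these corrections, the sample mistakes above can be used to flag suspicious spots in a label line for manual review. This is a hedged sketch, not a project tool: the function name is hypothetical, the input is assumed to be a space-separated pinyin string, and matches are only candidates, since a pair such as "le5 jie3" can be correct when it spans two different words.

```python
# Wrong -> corrected syllable pairs, taken from the samples in this README.
SUSPECT_PAIRS = {
    ("bu4", "shi4"): ("bu2", "shi4"),
    ("yi1", "ge4"): ("yi2", "ge4"),
    ("le5", "jie3"): ("liao3", "jie3"),
}


def find_suspects(syllables):
    """Return (index, found_pair, suggested_pair) for each suspicious pair."""
    hits = []
    for i in range(len(syllables) - 1):
        pair = (syllables[i], syllables[i + 1])
        if pair in SUSPECT_PAIRS:
            hits.append((i, pair, SUSPECT_PAIRS[pair]))
    return hits


if __name__ == "__main__":
    line = "wo3 bu4 shi4 yi1 ge4 ren2"
    for index, found, suggested in find_suspects(line.split()):
        print(index, " ".join(found), "->", " ".join(suggested))
```

Reporting candidates instead of rewriting the file keeps a human in the loop, which matters because the right reading depends on the characters behind the syllables.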

Python libraries that need importing

  • python_speech_features
  • TensorFlow
  • Keras
  • Numpy
  • wave
  • matplotlib
  • math
  • Scipy
  • h5py
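Before training, it can help to confirm that all of these libraries are importable. The short check below is a sketch that uses only the standard library. The import names in `REQUIRED` are the usual lowercase module names for the packages listed above, and the helper name is made up for this example.

```python
import importlib.util

# Usual import names for the libraries listed above.
REQUIRED = [
    "python_speech_features", "tensorflow", "keras", "numpy",
    "wave", "matplotlib", "math", "scipy", "h5py",
]


def missing_modules(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are available.")
```

Note that wave and math ship with Python itself, so only the remaining entries need to be installed separately.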

Data Sets

  • Tsinghua University THCHS30 Chinese voice data set

data_thchs30.tgz http://www.openslr.org/resources/18/data_thchs30.tgz

test-noise.tgz http://www.openslr.org/resources/18/test-noise.tgz

resource.tgz http://www.openslr.org/resources/18/resource.tgz

  • Free ST Chinese Mandarin Corpus

ST-CMDS-20170001_1-OS.tar.gz http://www.openslr.org/resources/38/ST-CMDS-20170001_1-OS.tar.gz

Special thanks to those who made these speech data sets publicly available.

If the dataset links above cannot be opened or downloaded, try [OpenSLR](http://www.openslr.org).
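The archives listed above can also be fetched and unpacked from Python. The sketch below uses only the standard library and the URLs given in this README. The `dataset` target directory follows the directory name mentioned in the introduction, while the function names and the skip-if-present behavior are assumptions of this example. The archives are large, so the demo at the bottom only prints the planned paths.

```python
import os
import tarfile
import urllib.request

DATASET_URLS = [
    "http://www.openslr.org/resources/18/data_thchs30.tgz",
    "http://www.openslr.org/resources/18/test-noise.tgz",
    "http://www.openslr.org/resources/18/resource.tgz",
    "http://www.openslr.org/resources/38/ST-CMDS-20170001_1-OS.tar.gz",
]


def archive_path(url, target_dir="dataset"):
    """Local file path for an archive URL."""
    return os.path.join(target_dir, url.rsplit("/", 1)[-1])


def fetch_and_extract(url, target_dir="dataset"):
    """Download one archive (unless already present) and unpack it."""
    os.makedirs(target_dir, exist_ok=True)
    path = archive_path(url, target_dir)
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    # "r:*" lets tarfile detect the compression of .tgz and .tar.gz files.
    with tarfile.open(path, "r:*") as archive:
        archive.extractall(target_dir)
    return path


if __name__ == "__main__":
    # Call fetch_and_extract(url) for each entry to actually download.
    for url in DATASET_URLS:
        print(url, "->", archive_path(url))
```

Only unpack archives from a source you trust, since `extractall` writes whatever paths the archive contains.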

Logs

Links: Progress Logs

Contributors

@ZJUGuoShuai @williamchenwl

@nl8590687 (repo owner)

Donate