A Deep-Learning-Based Chinese Speech Recognition System

ReadMe Language | 中文版 | English |

ASRT Project Home Page | Released Download | View this project's wiki document (Chinese) | Experience Demo

If you have any questions while working with this project, feel free to open an issue in this repo and I will respond as soon as possible.

Please check the FAQ Page (Chinese) before asking, to avoid duplicate questions.

A post about ASRT's introduction

ASRT: Chinese Speech Recognition System (Chinese)

For how to use ASRT to train and deploy, see:

For questions about the principles of the statistical language model that are often asked, see:

For questions about CTC, see:

[Translation] Sequence Modeling with CTC (Chinese)

For more information, please refer to the author's blog: AILemon Blog (Chinese)

Introduction

This project is implemented with Keras and TensorFlow. It is based on deep convolutional neural networks, long short-term memory (LSTM) networks, an attention mechanism, and CTC.

Steps

First, clone the project to your computer with Git, then download the data sets needed for training. For the download links, see the Data Sets section at the end of this document.

$ git clone https://github.com/nl8590687/ASRT_SpeechRecognition.git

Alternatively, use the "Fork" button to create your own copy of the project, then clone it locally with your own SSH key.

After cloning the repository via git, go to the project root directory; create a subdirectory dataset/ (you can use a soft link instead), and then extract the downloaded datasets directly into it.

Note that in the current version, both the Thchs30 and ST-CMDS data sets must be downloaded and used. Using other data sets requires modifying the source code.

$ cd ASRT_SpeechRecognition

$ mkdir dataset

$ tar zxf <dataset zip files name> -C dataset/

Then copy all the files in the datalist/ directory into dataset/, so that they sit alongside the extracted data sets.

$ cp -rf datalist/* dataset/

Currently available models are 24, 25 and 251.

Before running this project, please install the required Python 3 dependencies.

To start training this project, please execute:

$ python3 train_mspeech.py

To start the test of this project, please execute:

$ python3 test_mspeech.py

Before testing, make sure the model file path specified in the code exists.

To start the ASRT API server, please execute:

$ python3 asrserver.py

Please note that after starting the API server, you need to use the client software that corresponds to this ASRT project for speech recognition. For details, see the Wiki documentation ASRT Client Demo.

If you want to train and use Model 251, change the SpeechModel import statement at the corresponding position in the code files so that it points to that model.
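As a rough illustration, switching models comes down to which SpeechModel module gets imported. The sketch below selects the module by version number instead of editing the import by hand. The helper names are hypothetical, and the class name `ModelSpeech` is an assumption about what each `SpeechModel*.py` file exposes, so adjust it to match the actual code.

```python
import importlib


def speech_model_module_name(version):
    """Build the module name for a model version, e.g. 251 -> 'SpeechModel251'."""
    return "SpeechModel{}".format(version)


def load_model_class(version, class_name="ModelSpeech"):
    """Import the model module for `version` and return its model class.

    `class_name` is an assumed name, not confirmed by this README.
    Raises ImportError if the module is not on the import path.
    """
    module = importlib.import_module(speech_model_module_name(version))
    return getattr(module, class_name)
```

When run from the project root, `load_model_class(251)` would attempt to import `SpeechModel251` and return its model class.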

If you run into any problem while running or using the program, please raise it in an issue and I will reply as soon as possible.

Model

Speech Model

CNN + LSTM/GRU + CTC
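To make the CTC part more concrete, here is a minimal sketch of greedy (best-path) CTC decoding: take the most likely label per frame, collapse consecutive repeats, then drop the blank label. The toy vocabulary, the frame labels and the blank index are made up for illustration and are not taken from this project's code.

```python
def ctc_greedy_decode(frame_label_ids, blank_id):
    """Collapse consecutive repeated labels, then drop blanks."""
    decoded = []
    previous = None
    for label in frame_label_ids:
        if label != previous and label != blank_id:
            decoded.append(label)
        previous = label
    return decoded


# Hypothetical toy vocabulary; index 3 plays the role of the CTC blank.
vocab = ["ni3", "hao3", "ma5", "_"]
# Hypothetical per-frame argmax labels produced by an acoustic model.
frames = [0, 0, 3, 1, 1, 1, 3, 3, 2]
pinyin = [vocab[i] for i in ctc_greedy_decode(frames, blank_id=3)]
print(pinyin)
```

Note that a blank between two identical labels keeps both of them, which is how CTC can represent a repeated syllable.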

The maximum length of the input audio is 16 seconds, and the output is the corresponding Chinese pinyin sequence.
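Because of the 16-second limit, it can help to check a recording's duration before passing it to the model. The following is a small sketch that uses only the standard `wave` module; the function names are hypothetical and not part of this project.

```python
import wave

MAX_SECONDS = 16.0  # maximum input length stated in this README


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / float(wav_file.getframerate())


def fits_model_input(path, max_seconds=MAX_SECONDS):
    """Return True if the recording is short enough for the speech model."""
    return wav_duration_seconds(path) <= max_seconds
```

Longer recordings would need to be split into shorter segments first.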

Questions about downloading trained models

The complete source code, including trained model weights, is available in the archives of each released version on the GitHub releases page.

The released software can be downloaded here: ASRT download page

Language Model

Maximum Entropy Hidden Markov Model Based on Probability Graph.

The input is a Chinese pinyin sequence, and the output is the corresponding Chinese character text.
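The pinyin-to-character step can be pictured as a search for the most probable character sequence given the pinyin. The sketch below is a plain first-order HMM decoded with the Viterbi algorithm, not the maximum entropy model used by this project, and every candidate list and probability in it is a made-up toy value.

```python
import math

# Hypothetical toy data: candidate characters for each pinyin syllable.
CANDIDATES = {
    "ni3": ["你", "拟"],
    "hao3": ["好", "郝"],
}
# Hypothetical start and transition (bigram) probabilities.
START_PROB = {"你": 0.6, "拟": 0.4}
TRANS_PROB = {
    ("你", "好"): 0.8, ("你", "郝"): 0.2,
    ("拟", "好"): 0.5, ("拟", "郝"): 0.5,
}
FLOOR = 1e-8  # probability assigned to unseen events


def viterbi_pinyin_to_text(pinyin_seq):
    """Return the most probable character string for a pinyin sequence."""
    if not pinyin_seq:
        return ""
    # best[char] = (log probability of the best path ending in char, path)
    best = {
        char: (math.log(START_PROB.get(char, FLOOR)), char)
        for char in CANDIDATES[pinyin_seq[0]]
    }
    for syllable in pinyin_seq[1:]:
        new_best = {}
        for cur in CANDIDATES[syllable]:
            score, path = max(
                (prev_score + math.log(TRANS_PROB.get((prev, cur), FLOOR)), prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            new_best[cur] = (score, path + cur)
        best = new_best
    return max(best.values())[1]


print(viterbi_pinyin_to_text(["ni3", "hao3"]))
```

A real model would estimate these probabilities from a large text corpus and use a full pinyin dictionary.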

About Accuracy

At present, the best model reaches roughly 80% pinyin accuracy on the test set.

However, since current international and domestic teams can achieve 98%, the accuracy still needs to be improved further.
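This README does not state how the pinyin accuracy is measured. One common approach, shown here purely as an assumption, is to compute the edit distance between the predicted and reference pinyin sequences and report one minus the resulting error rate. The function names are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences of syllables."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def pinyin_accuracy(references, hypotheses):
    """Return 1 - (total edit errors / total reference syllables)."""
    total_syllables = sum(len(ref) for ref in references)
    if total_syllables == 0:
        raise ValueError("references contain no syllables")
    total_errors = sum(
        edit_distance(ref, hyp) for ref, hyp in zip(references, hypotheses)
    )
    return 1.0 - total_errors / total_syllables
```

For example, one wrong syllable in a three-syllable reference gives an accuracy of two thirds under this definition.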

Python libraries that need importing

python_speech_features
TensorFlow (1.13+)
Keras (2.3+)
Numpy
wave
matplotlib
math
Scipy
h5py
http
urllib
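A quick way to see which of these libraries are missing from an environment is to ask Python whether each module can be found. The sketch below lists the modules by their usual lowercase import names; the function name is hypothetical, and the script only checks presence, not the version requirements given above.

```python
import importlib.util

# Import names corresponding to the libraries listed above.
REQUIRED_MODULES = [
    "python_speech_features", "tensorflow", "keras", "numpy", "wave",
    "matplotlib", "math", "scipy", "h5py", "http", "urllib",
]


def find_missing_modules(module_names):
    """Return the names of modules that cannot be found for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Note that `wave`, `math`, `http` and `urllib` ship with the Python standard library, so only the remaining entries need to be installed separately.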

Dependent Environment Details

Data Sets

Some free Chinese speech datasets (Chinese)

Tsinghua University THCHS30 Chinese voice data set

data_thchs30.tgz Download

test-noise.tgz Download

resource.tgz Download

Free ST Chinese Mandarin Corpus

ST-CMDS-20170001_1-OS.tar.gz Download

AIShell-1 Open Source Dataset

data_aishell.tgz Download

Note: to extract this dataset, run:

$ tar xzf data_aishell.tgz
$ cd data_aishell/wav
$ for tar in *.tar.gz; do tar xvf "$tar"; done

Primewords Chinese Corpus Set 1

primewords_md_2018_set1.tar.gz Download

aidatatang_200zh

aidatatang_200zh.tgz Download

MagicData

train_set.tar.gz Download

dev_set.tar.gz Download

test_set.tar.gz Download

metadata.tar.gz Download

Special thanks to those who made these speech data sets publicly available.

If a dataset link above cannot be opened or downloaded, try this link instead: OpenSLR

License

Contributors

@zw76859420 @madeirak @ZJUGuoShuai @williamchenwl

@nl8590687 (repo owner)

Donate
