ASRT_SpeechRecognition


ASRT is a deep-learning-based Chinese speech recognition system. If you like this project, please star it.

ReadMe Language | 中文版 (Chinese) | English

ASRT Project Home Page | Release Downloads | Project Wiki Documentation (Chinese) | Try the Demo | Donate

If you have any questions while working with this project, feel free to open an issue in this repo and I will respond as soon as possible.

Please check the FAQ page (Chinese) before asking, to avoid duplicate questions.

If the program behaves abnormally at runtime, please include a complete screenshot when asking, and state the CPU architecture, GPU model, operating system, and the Python, TensorFlow and CUDA versions used, as well as whether any code was modified or any datasets were added or removed.

Introduction

This project is implemented with tensorflow.keras, using a deep convolutional neural network, a long short-term memory (LSTM) network, an attention mechanism, and CTC.

Minimum requirements for training

Hardware

CPU: 4+ cores (x86_64, amd64)
RAM: 16 GB+
GPU: NVIDIA, 11 GB+ graphics memory (GTX 1080 Ti or better)
Disk: 500 GB HDD (or SSD)

Software

Linux: Ubuntu 18.04 + / CentOS 7 +
Python: 3.6 +
TensorFlow: 1.15, 2.x+ (the latest release and x.x.0 releases are not recommended)

Quick Start

The following steps use Linux as an example:

First, clone the project to your computer with Git, and then download the datasets needed for training. For download links, see the Data Sets section at the end of this document.

$ git clone https://github.com/nl8590687/ASRT_SpeechRecognition.git

Alternatively, use the "Fork" button to create your own copy of the project and then clone it locally with your own SSH key.

After cloning the repository, go to the project root directory, create the directory /data/speech_data for the datasets (a symbolic link also works), and then extract the downloaded datasets directly into it.

$ cd ASRT_SpeechRecognition

$ mkdir -p /data/speech_data

$ tar zxf <dataset archive file name> -C /data/speech_data/

Note that in the current version, six datasets (Thchs30, ST-CMDS, Primewords, aishell-1, aidatatang200, MagicData) are enabled by default in the configuration file (asrt_config.json); remove any you do not need. To use other datasets, you need to add their configuration yourself and organize the data in advance in the standard format supported by ASRT.

To download the pinyin label list files (datalist) for the default datasets:

$ python download_default_datalist.py

Currently available models are 24, 25, 251 and 251bn.

Before running this project, install the required Python 3 dependency libraries (see Python Dependency Library below).

To start training, run:

$ python3 train_speech_model.py

To evaluate the model, run:

$ python3 evaluate_speech_model.py

Before testing, make sure the model file path specified in the code files exists.

To run speech recognition on a single WAV audio file:

$ python3 predict_speech_file.py
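Since the speech model accepts at most 16 seconds of audio (see Model below), it can help to check a file's length before passing it to predict_speech_file.py. The sketch below uses only Python's standard wave module. The helper names wav_duration_seconds and fits_model_input, and the script name check_wav_length.py, are hypothetical and not part of ASRT; treating the 16-second cap as a limit on the raw file duration is an assumption.

```python
import sys
import wave

# Maximum input length stated in this README's Model section; applying it to
# the raw file duration is an assumption of this sketch.
MAX_SECONDS = 16.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def fits_model_input(path: str, max_seconds: float = MAX_SECONDS) -> bool:
    """True if the file is no longer than the model's maximum input length."""
    return wav_duration_seconds(path) <= max_seconds


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_wav_length.py a.wav b.wav
    for p in sys.argv[1:]:
        seconds = wav_duration_seconds(p)
        verdict = "ok" if seconds <= MAX_SECONDS else "too long, split it first"
        print(f"{p}: {seconds:.2f} s ({verdict})")
```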

To start the ASRT API server, run:

$ python3 asrserver_http.py

Note that once the API server is running, you need to use the client software for this ASRT project to perform speech recognition. For details, see the Wiki documentation on downloading the ASRT Client SDK & Demo.

To test whether calls to the API service succeed, run:

$ python3 client_http.py
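For calling the HTTP server from your own Python code, here is a minimal standard-library client sketch. It assumes the server listens on port 20001 (as in the Docker example below) and accepts a JSON body of base64-encoded PCM samples at the /all path; that path and the field names are our assumption of what client_http.py sends, so check that file before relying on them. build_payload and recognize are hypothetical helper names.

```python
import base64
import json
import urllib.request
import wave

# ASSUMPTIONS: port 20001 matches the Docker example; the "/all" path and the
# JSON field names are our reading of client_http.py, not a documented API.
ASSUMED_URL = "http://127.0.0.1:20001/all"


def build_payload(path: str) -> dict:
    """Read a PCM WAV file and pack it into a JSON-serializable request body."""
    with wave.open(path, "rb") as wav:
        frames = wav.readframes(wav.getnframes())
        return {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "byte_width": wav.getsampwidth(),
            # Raw PCM bytes as URL-safe base64 text.
            "samples": base64.urlsafe_b64encode(frames).decode("ascii"),
        }


def recognize(path: str, url: str = ASSUMED_URL, timeout: float = 30.0) -> dict:
    """POST the audio to a running ASRT HTTP server and return its JSON reply."""
    body = json.dumps(build_payload(path)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With asrserver_http.py running, `print(recognize("test.wav"))` would send a file (test.wav is a placeholder) and print whatever JSON the server returns.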

If you want to train and use a model other than Model 251bn, change the corresponding speech_model_zoo import in the code files.

If you run into any problem while running or using the program, please open an issue and I will reply as soon as possible.

To deploy ASRT with Docker:

$ docker pull ailemondocker/asrt_service:1.2.0
$ docker run --rm -it -p 20001:20001 --name asrt-server -d ailemondocker/asrt_service:1.2.0

This starts an API server for recognition only, not for training.

Model

Speech Model

DCNN + CTC

The maximum length of the input audio is 16 seconds, and the output is the corresponding Chinese pinyin sequence.
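To illustrate how a CTC output becomes a pinyin sequence, here is a sketch of best-path (greedy) CTC decoding: pick the most probable label per frame, merge consecutive repeats, then drop blanks. It is not taken from ASRT's code. It assumes the blank is the last class index, which is the convention of Keras' CTC utilities, and the two-syllable vocabulary and frame probabilities are made up for the example.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(
    frame_probs: Sequence[Sequence[float]], blank: Optional[int] = None
) -> List[int]:
    """Best-path CTC decoding.

    For each frame, take the most probable label, then merge consecutive
    repeats and drop blanks. If ``blank`` is None, the last class index is
    treated as the blank (the convention used by Keras' CTC utilities).
    """
    decoded: List[int] = []
    previous: Optional[int] = None
    for probs in frame_probs:
        blank_id = len(probs) - 1 if blank is None else blank
        label = max(range(len(probs)), key=lambda k: probs[k])
        if label != previous and label != blank_id:
            decoded.append(label)
        previous = label
    return decoded


# Hypothetical 3-class vocabulary: two pinyin syllables plus the blank.
VOCAB = ["ni3", "hao3"]
frames = [
    [0.8, 0.1, 0.1],  # ni3
    [0.7, 0.2, 0.1],  # ni3 (repeat, merged)
    [0.1, 0.1, 0.8],  # blank
    [0.1, 0.9, 0.0],  # hao3
    [0.2, 0.6, 0.2],  # hao3 (repeat, merged)
]
print([VOCAB[i] for i in ctc_greedy_decode(frames)])  # ['ni3', 'hao3']
```

A blank between two identical labels keeps both, which is how CTC represents a genuinely repeated syllable.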

Questions about downloading trained models

Released builds of the software, including trained model weights, can be downloaded from the ASRT download page.

The GitHub Releases page contains archives of each released version of the software together with its release notes. Each release includes a zip file containing the trained model weight files.

Language Model

A maximum-entropy hidden Markov model based on a probabilistic graphical model.

The input is a Chinese pinyin sequence, and the output is the corresponding Chinese character text.
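The pinyin-to-character step can be pictured as HMM decoding: characters are the hidden states, pinyin syllables are the observations, and Viterbi search finds the most likely character string. The sketch below is a plain bigram HMM with Viterbi, not the maximum-entropy model in LanguageModel2.py; the tables CANDIDATES, START and BIGRAM and their probabilities are hypothetical, and polyphonic characters are ignored.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy tables, for illustration only; not ASRT's real statistics.
# Emission P(pinyin | character) is taken as 1 for each listed candidate
# (polyphonic characters are ignored).
CANDIDATES: Dict[str, List[str]] = {
    "ni3": ["你", "拟"],
    "hao3": ["好", "郝"],
}
START: Dict[str, float] = {"你": 0.02, "拟": 0.001, "好": 0.01, "郝": 0.0005}
BIGRAM: Dict[Tuple[str, str], float] = {("你", "好"): 0.3, ("拟", "好"): 0.001}
FLOOR = 1e-8  # smoothing for unseen start or transition probabilities


def viterbi(pinyins: List[str]) -> str:
    """Most likely character string for a pinyin sequence (log-space Viterbi)."""
    # best[c] = (log-probability of the best path ending in c, that path)
    best = {c: (math.log(START.get(c, FLOOR)), c) for c in CANDIDATES[pinyins[0]]}
    for py in pinyins[1:]:
        step = {}
        for c in CANDIDATES[py]:
            # Extend every surviving path with c and keep only the best one.
            step[c] = max(
                (score + math.log(BIGRAM.get((prev, c), FLOOR)), path + c)
                for prev, (score, path) in best.items()
            )
        best = step
    return max(best.values())[1]


print(viterbi(["ni3", "hao3"]))  # 你好
```

Working in log space avoids numeric underflow on long sentences, and keeping only the best path per character keeps the search linear in sentence length.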

About Accuracy

At present, the best model reaches roughly 85% pinyin accuracy on the test set.
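One common way to compute a pinyin accuracy figure like this is one minus the token-level edit distance divided by the reference length. The sketch below assumes that definition; evaluate_speech_model.py may compute it differently, and edit_distance and pinyin_accuracy are hypothetical helper names.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (single-row DP)."""
    row = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, 1):
        diag, row[0] = row[0], i
        for j, h in enumerate(hyp, 1):
            cost = min(
                row[j] + 1,       # delete r
                row[j - 1] + 1,   # insert h
                diag + (r != h),  # substitute (or match when equal)
            )
            diag, row[j] = row[j], cost
    return row[-1]


def pinyin_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """1 - edit_distance / len(ref), clipped at 0."""
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


ref = "ni3 hao3 shi4 jie4".split()
hyp = "ni3 hao3 si4 jie4".split()
print(pinyin_accuracy(ref, hyp))  # 0.75
```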

Python Dependency Library

tensorflow (1.15 - 2.x)
numpy
wave (Python standard library)
matplotlib
math (Python standard library)
scipy
requests
flask
waitress

If you have trouble installing these packages, and you have a GPU with CUDA 11.2 and cuDNN 8.1 installed, run the following command:

$ pip install -r requirements.txt

Dependency Environment Details and Hardware Requirements

ASRT Client SDK for Calling Speech Recognition API

ASRT provides client SDKs for several platforms and programming languages so that client applications can add speech recognition features; the SDKs work via RPC. Please refer to the ASRT project documentation for details.

Client Platform	Project Repos Link
Windows Client SDK & Demo	ASRT_SDK_WinClient
Python3 Client SDK & Demo (Any Platform)	ASRT_SDK_Python3
Golang Client SDK & Demo	asrt-sdk-go
Java Client SDK & Demo	ASRT_SDK_Java

Data Sets

For the full list, please refer to: Some free Chinese speech datasets (Chinese)

Dataset	Time	Size	Download (CN Mirrors)	Download (Source)
THCHS30	40h	6.01G	data_thchs30.tgz	data_thchs30.tgz
ST-CMDS	100h	7.67G	ST-CMDS-20170001_1-OS.tar.gz	ST-CMDS-20170001_1-OS.tar.gz
AIShell-1	178h	14.51G	data_aishell.tgz	data_aishell.tgz
Primewords	100h	8.44G	primewords_md_2018_set1.tar.gz	primewords_md_2018_set1.tar.gz
aidatatang_200zh	200h	17.47G	aidatatang_200zh.tgz	aidatatang_200zh.tgz
MagicData	755h	52G/1.0G/2.2G	train_set.tar.gz / dev_set.tar.gz / test_set.tar.gz	train_set.tar.gz / dev_set.tar.gz / test_set.tar.gz

Note: how to extract the AISHELL-1 dataset:

$ tar xzf data_aishell.tgz
$ cd data_aishell/wav
$ for tar in *.tar.gz; do tar xvf "$tar"; done
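On systems without a POSIX shell, the same two-level extraction can be sketched with Python's tarfile module: unpack the outer archive, then every nested .tar.gz under data_aishell/wav in place. extract_aishell is a hypothetical helper, not part of ASRT; it uses the safer "data" extraction filter only when the running Python provides it.

```python
import tarfile
from pathlib import Path


def _extract(tar: tarfile.TarFile, dest: Path) -> None:
    # Use the safer "data" extraction filter when this Python provides it.
    if hasattr(tarfile, "data_filter"):
        tar.extractall(dest, filter="data")
    else:
        tar.extractall(dest)


def extract_aishell(archive: str, dest: str) -> int:
    """Python counterpart of the shell steps for AISHELL-1.

    Unpacks data_aishell.tgz into dest, then every nested *.tar.gz inside
    data_aishell/wav, in place. Returns how many nested archives were unpacked.
    """
    dest_dir = Path(dest)
    with tarfile.open(archive, "r:gz") as outer:
        _extract(outer, dest_dir)
    wav_dir = dest_dir / "data_aishell" / "wav"
    count = 0
    for inner in sorted(wav_dir.glob("*.tar.gz")):
        with tarfile.open(inner, "r:gz") as tar:
            _extract(tar, wav_dir)
        count += 1
    return count
```

For example, `extract_aishell("data_aishell.tgz", "/data/speech_data")` unpacks everything under the dataset directory used in Quick Start.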

Special thanks to everyone who made these speech datasets publicly available.

If a dataset link above cannot be opened or downloaded, try OpenSLR.

ASRT Documents

ASRT project's Wiki document

A post introducing ASRT

ASRT: Chinese Speech Recognition System (Chinese)

About how to use ASRT for training and deployment:

For frequently asked questions about the principles of the statistical language model, see:

For questions about CTC, see:

[Translation] Sequence Modeling with CTC (Chinese)

For more information, please refer to the author's blog: AILemon Blog (Chinese)

License

Cite this project

DOI: 10.5281/zenodo.5808434

Contributors

Contributors Page

@nl8590687 (repo owner)
