Contents
I. Environment Setup
1. Create a virtual environment
2. Install dependencies and PyTorch
Official site: PyTorch download page
3. Before installing FunASR, make sure the following dependencies are installed
Calling from Python code (recommended)
4. Model download
5. Start the FunASR service
II. Server Deployment and Client Connection
2.1 HTML connection
III. Inference and Recognition Models
1. Real-time speech recognition
2. Non-real-time speech recognition
I. Environment Setup
Source code: FunASR
FunASR/README_zh.md at main · alibaba-damo-academy/FunASR · GitHub
1. Create a virtual environment
conda create -n funasr python==3.9 -y
conda activate funasr
2. Install dependencies and PyTorch
Official site: PyTorch download page
Install FunASR from source in editable mode (run this inside the cloned FunASR directory):
pip3 install -e ./ -i https://pypi.tuna.tsinghua.edu.cn/simple
Install PyTorch 2.5.0 with CUDA 12.4, using either conda or pip:
conda install pytorch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 pytorch-cuda=12.4 -c pytorch -c nvidia -y
pip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu124
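Before moving on, it is worth confirming that PyTorch can actually see the GPU. A minimal check (the printed versions depend on which of the two commands above you used):
import torch
print(torch.__version__)          # e.g. 2.5.0+cu124 for the pip wheel above
print(torch.cuda.is_available())  # should be True if the CUDA driver is working
print(torch.version.cuda)         # CUDA version PyTorch was built against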
3. Before installing FunASR, make sure the following dependencies are installed:
pip3 install -U funasr -i https://pypi.tuna.tsinghua.edu.cn/simple
Or install the dependencies via a requirements.txt:
touch requirements.txt
# Ultralytics requirements
# Usage: pip3 install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
# Base ----------------------------------------
matplotlib>=3.2.2
numpy>=1.22.2 # pinned by Snyk to avoid a vulnerability
opencv-python>=4.6.0
pillow>=7.1.2
pyyaml>=5.3.1
requests>=2.23.0
scipy>=1.4.1
torch>=1.7.0
torchvision>=0.8.1
tqdm>=4.64.0
# Logging -------------------------------------
# tensorboard>=2.13.0
# dvclive>=2.12.0
# clearml
# comet
# Plotting ------------------------------------
pandas>=1.1.4
seaborn>=0.11.0
# Export --------------------------------------
# coremltools>=7.0.b1 # CoreML export
# onnx>=1.12.0 # ONNX export
# onnxsim>=0.4.1 # ONNX simplifier
# nvidia-pyindex # TensorRT export
# nvidia-tensorrt # TensorRT export
# scikit-learn==0.19.2 # CoreML quantization
# tensorflow>=2.4.1 # TF exports (-cpu, -aarch64, -macos)
# tflite-support
# tensorflowjs>=3.9.0 # TF.js export
# openvino-dev>=2023.0 # OpenVINO export
# Extras --------------------------------------
psutil # system utilization
py-cpuinfo # display CPU info
# thop>=0.1.1 # FLOPs computation
# ipython # interactive notebook
# albumentations>=1.0.3 # training augmentations
# pycocotools>=2.0.6 # COCO mAP
# roboflow
pip3 install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install torchaudio -i https://pypi.tuna.tsinghua.edu.cn/simple
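A quick sanity check that funasr and torchaudio import cleanly (the printed path and version will differ per machine):
import funasr
import torchaudio
print(funasr.__file__)         # which funasr installation is active
print(torchaudio.__version__)  # torchaudio build that was just installed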
I recorded a short clip of my own voice to test, and the result was quite good:
funasr ++model=paraformer-zh ++vad_model="fsmn-vad" ++punc_model="ct-punc" ++input=/home/sxj/FunASR/outputs/c.wav
Models are cached at:
/home/sxj/.cache/modelscope/hub/iic
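To see which models have actually been downloaded, that cache directory can simply be listed (a minimal sketch; the path above is specific to this machine, so expanduser is used here instead):
import os
cache_dir = os.path.expanduser("~/.cache/modelscope/hub/iic")
print(os.listdir(cache_dir))  # one entry per downloaded model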
Calling from Python code (recommended)
from funasr import AutoModel
model = AutoModel(model="paraformer-zh")
res = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/vad_example.wav")
print(res)
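generate() also accepts a local file path instead of a URL, and each element of the returned list carries the recognized text under the "text" key. A minimal sketch continuing from the model above (the wav path is a placeholder):
res = model.generate(input="/path/to/your_audio.wav")  # placeholder path, replace with a real wav file
print(res[0]["text"])  # recognized text for that input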
4. Model download
Real-time speech recognition models: FunASR speech recognition model download
Test audio (Chinese, English)
5. Start the FunASR service
cd runtime
bash run_server_2pass.sh
Once it starts successfully, the service listens on port 10095.
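Before pointing a client at the service, it can help to confirm that the port is actually reachable. A minimal check using only the standard library (host and port as used in this guide):
import socket
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.settimeout(3)
    result = s.connect_ex(("127.0.0.1", 10095))  # 0 means the TCP connection was accepted
print("port open" if result == 0 else f"connect failed, errno {result}")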
II. Server Deployment and Client Connection
1. Change into the websocket runtime directory
cd /home/sxj/FunASR/runtime/python/websocket
2. Server side
First install the dependencies:
pip install -r requirements_client.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
python funasr_wss_server.py
3. Client
python funasr_wss_client.py
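The client script takes the server address and an audio file on the command line. A typical invocation might look like the line below; the flag names are assumptions based on common versions of this script, so check python funasr_wss_client.py --help for the exact options:
python funasr_wss_client.py --host 127.0.0.1 --port 10095 --mode 2pass --audio_in /home/sxj/FunASR/outputs/c.wav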
4. Run the HTML5 page: /home/sxj/FunASR/runtime/html5/static
The funasr_samples folder contains several kinds of client connections; HTML and Python are used as examples here.
2.1 HTML connection
Open file:///home/sxj/FunASR/web-pages/public/static/online/index.html in a browser to run index.html.
Change the ASR server address to ws://127.0.0.1:10095 and click Connect to test; if the connection fails, change the port to 10096.
Deployment and development documentation
The deployed models come from ModelScope or from the user's own fine-tuning, and customized services are supported; see the detailed documentation (click here).
III. Inference and Recognition Models
Quick start
funasr ++model=paraformer-zh ++vad_model="fsmn-vad" ++punc_model="ct-punc" ++input=asr_example_zh.wav
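The same pipeline can also be driven from Python. A minimal sketch that mirrors the CLI flags above, combining the ASR, VAD, and punctuation models (asr_example_zh.wav stands in for any local audio file):
from funasr import AutoModel
# ASR + VAD + punctuation restoration, as in the CLI command above
model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc")
res = model.generate(input="asr_example_zh.wav")
print(res[0]["text"])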
1. Real-time speech recognition
from funasr import AutoModel
import soundfile
import os

chunk_size = [0, 10, 5]  # [0, 10, 5] = 600 ms, [0, 8, 4] = 480 ms
encoder_chunk_look_back = 4  # number of chunks to look back for encoder self-attention
decoder_chunk_look_back = 1  # number of encoder chunks to look back for decoder cross-attention

model = AutoModel(model="paraformer-zh-streaming")

wav_file = os.path.join(model.model_path, "example/asr_example.wav")
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = chunk_size[1] * 960  # 600 ms of audio at 16 kHz

cache = {}
total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)
for i in range(total_chunk_num):
    speech_chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
    is_final = i == total_chunk_num - 1
    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size,
                         encoder_chunk_look_back=encoder_chunk_look_back,
                         decoder_chunk_look_back=decoder_chunk_look_back)
    print(res)
Note: chunk_size is the streaming latency configuration. [0, 10, 5] means text is emitted to the screen at a granularity of 10 * 60 = 600 ms, with 5 * 60 = 300 ms of lookahead. Each inference step takes 600 ms of audio as input (16000 * 0.6 = 9600 samples) and outputs the corresponding text; the last audio chunk must be sent with is_final=True to force the final words to be emitted.
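As a quick check of the arithmetic above, the stride in samples for the two configurations mentioned in the code comment (assuming 16 kHz audio):
sample_rate = 16000
for cfg in ([0, 10, 5], [0, 8, 4]):
    stride = cfg[1] * 960                     # samples consumed per inference step
    print(cfg, stride, stride / sample_rate)  # [0, 10, 5] -> 9600 samples = 0.6 s; [0, 8, 4] -> 7680 samples = 0.48 s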
2. Non-real-time speech recognition
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model_dir = "iic/SenseVoiceSmall"
model = AutoModel(
    model=model_dir,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cuda:0",
)

# en
res = model.generate(
    input=f"{model.model_path}/example/en.mp3",
    cache={},
    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
    merge_vad=True,
    merge_length_s=15,
)
text = rich_transcription_postprocess(res[0]["text"])
print(text)
Parameter notes:
model_dir: model name, or a path to the model on local disk.
vad_model: enables VAD. VAD splits long audio into short segments, so the measured inference time covers VAD plus SenseVoice, i.e. the whole pipeline. To time the SenseVoice model by itself, disable the VAD model.
vad_kwargs: configuration for the VAD model; max_single_segment_time is the maximum duration of a segment cut by vad_model, in milliseconds (ms).
use_itn: whether the output includes punctuation and inverse text normalization.
batch_size_s: enables dynamic batching; it is the total audio duration in one batch, in seconds (s).
merge_vad: whether to merge the short audio fragments cut by the VAD model; the merged length is merge_length_s, in seconds (s).
ban_emo_unk: disables the emo_unk tag; with it disabled, every sentence is assigned an emotion label.
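For example, to time SenseVoice on its own as described above, the VAD model can simply be left out. A minimal sketch continuing from the imports and model_dir above (en.mp3 is short enough to be decoded in one pass):
import time
model_no_vad = AutoModel(model=model_dir, device="cuda:0")  # no vad_model: the clip goes straight through SenseVoice
start = time.time()
res = model_no_vad.generate(
    input=f"{model_no_vad.model_path}/example/en.mp3",
    cache={},
    language="auto",
    use_itn=True,
)
print(rich_transcription_postprocess(res[0]["text"]))
print(f"SenseVoice-only time: {time.time() - start:.2f} s")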
To be continued...
Reference: FunASR/README_zh.md at main · modelscope/FunASR · GitHub