phezzan 13e0e826bf first commit		2024-12-17 15:47:24 +08:00
models--BELLE-2--Belle-whisper-large-v3-turbo-zh	first commit	2024-12-17 15:47:24 +08:00
audio.txt	first commit	2024-12-17 15:47:24 +08:00
audio.wav	first commit	2024-12-17 15:47:24 +08:00
README.md	first commit	2024-12-17 15:47:24 +08:00
wav2text.py	first commit	2024-12-17 15:47:24 +08:00

README.md

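Project Name: Automatic Speech Recognition (ASR) Project

Project Overview

This project uses the Hugging Face transformers library and the BELLE-2-Belle-whisper-large-v3-turbo-zh model to perform Chinese speech recognition.
It takes an audio file as input and outputs the corresponding text, optionally with timestamps.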

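Features

Automatically converts Chinese audio files to text.
Supports timestamp output, which makes it easy to locate content within the audio.
Saves the recognition result to a text file.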

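Requirements

1. Python

Make sure Python 3.12.7 is installed.

2. FFmpeg

This project requires FFmpeg to process audio files.

FFmpeg installation guide:

Windows users:
Download the FFmpeg package and add it to the system environment variables (PATH).
Download link

Linux/Mac users:
Install it with a package manager: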

sudo apt-get install ffmpeg       # Ubuntu/Debian
brew install ffmpeg               # macOS

3. Python Dependencies

This project uses the following Python libraries. Install them inside the project's virtual environment:

pip install transformers torch tokenizers requests tqdm safetensors numpy jinja2 regex packaging colorama

Installation and Setup

1. Create a virtual environment

Use conda to create a virtual environment named ai:

conda create -n ai python=3.12.7
conda activate ai

2. Install dependencies

Install the Python packages the project needs:

pip install transformers torch tokenizers requests tqdm safetensors numpy jinja2 regex packaging colorama
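After the install finishes, it can help to confirm that every package is importable from the active environment. The following is a minimal sketch, not part of the project: the helper name missing_packages is hypothetical, and it assumes each package's import name matches its pip name, which holds for the packages listed above.

```python
from importlib.util import find_spec

# Import names for the packages in the pip command above.
REQUIRED = [
    "transformers", "torch", "tokenizers", "requests", "tqdm",
    "safetensors", "numpy", "jinja2", "regex", "packaging", "colorama",
]

def missing_packages(names):
    """Return the names that cannot be found as top-level importable modules."""
    return [name for name in names if find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Running it inside the ai environment lists any package that still needs to be installed.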

3. Make sure FFmpeg is installed

Verify that FFmpeg was installed successfully:

ffmpeg -version
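The same check can be done from Python, which is useful when the script should fail early with a clear message. This is a sketch under the assumption that the ffmpeg executable is expected on PATH; the function name ffmpeg_version_line is hypothetical.

```python
import shutil
import subprocess

def ffmpeg_version_line():
    """Return the first line printed by `ffmpeg -version`.

    Returns None if ffmpeg is not on PATH or prints nothing.
    """
    path = shutil.which("ffmpeg")
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None

if __name__ == "__main__":
    version = ffmpeg_version_line()
    print(version if version else "ffmpeg was not found on PATH")
```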

4. Download the model files

Place the BELLE-2-Belle-whisper-large-v3-turbo-zh model files in the local directory ./models--BELLE-2--Belle-whisper-large-v3-turbo-zh.
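The directory name follows the Hugging Face cache naming pattern, where the model's config.json usually sits in a snapshots/<revision> subfolder rather than at the top level. Whether this project's directory is laid out that way is an assumption. The sketch below uses a hypothetical helper, find_model_dir, to locate the folder that actually holds config.json so it can be passed as the model path.

```python
from pathlib import Path

def find_model_dir(root):
    """Return the shallowest directory at or under root that contains config.json.

    Returns None if root is not a directory or no config.json is found.
    """
    root = Path(root)
    if not root.is_dir():
        return None
    if (root / "config.json").is_file():
        return root
    # Prefer the match closest to root if several exist.
    candidates = sorted(root.rglob("config.json"), key=lambda p: len(p.parts))
    return candidates[0].parent if candidates else None

if __name__ == "__main__":
    print(find_model_dir("./models--BELLE-2--Belle-whisper-large-v3-turbo-zh"))
```

If it prints None, the model files are probably missing or incomplete.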

Running the Project

Rename your input audio file to output_file2.wav, place it in the project root directory, and run the following script:

python main.py

Example Output

The program transcribes the audio file and saves the result to 20241028_151221.txt.
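The output file name looks like a timestamp in YYYYMMDD_HHMMSS form, although the core code hard-codes it. If a fresh name per run is wanted, one possible approach is sketched below; the helper timestamped_filename is hypothetical and not part of the project.

```python
from datetime import datetime

def timestamped_filename(now=None, suffix=".txt"):
    """Build a file name such as 20241028_151221.txt from a datetime."""
    now = now or datetime.now()
    return now.strftime("%Y%m%d_%H%M%S") + suffix

# A fixed datetime reproduces the name used in this README.
print(timestamped_filename(datetime(2024, 10, 28, 15, 12, 21)))  # 20241028_151221.txt
```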

Core Code

from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="./models--BELLE-2--Belle-whisper-large-v3-turbo-zh"
)

transcriber.model.config.forced_decoder_ids = (
    transcriber.tokenizer.get_decoder_prompt_ids(
        language="zh",
        task="transcribe"
    )
)

# Pass return_timestamps=True to get segment timestamps in the result
transcription = transcriber("output_file2.wav", return_timestamps=True)
print(transcription)

# Save the recognition result
with open("20241028_151221.txt", "w", encoding="utf-8") as f:
    f.write(str(transcription))
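With return_timestamps=True, the pipeline result is a dictionary with the full text plus a chunks list, where each chunk carries a (start, end) timestamp tuple and its text. Writing str(transcription) stores that dictionary as-is. A more readable alternative is sketched below. The helper format_chunks and the sample data are illustrative only, not actual output from this model.

```python
def format_chunks(result):
    """Turn a timestamped ASR result into '[start - end] text' lines."""

    def fmt(seconds):
        # A timestamp can be None, for example when the last segment is cut off.
        return f"{seconds:.2f}s" if seconds is not None else "?"

    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        lines.append(f"[{fmt(start)} - {fmt(end)}] {chunk['text'].strip()}")
    return "\n".join(lines)

# Hand-written sample in the same shape as the pipeline result.
sample = {
    "text": "你好世界",
    "chunks": [
        {"timestamp": (0.0, 1.5), "text": "你好"},
        {"timestamp": (1.5, None), "text": "世界"},
    ],
}
print(format_chunks(sample))
```

The returned string can be written to the output file in place of str(transcription).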

Directory Structure

project-root/
│
├── models--BELLE-2--Belle-whisper-large-v3-turbo-zh/   # Model files directory
├── output_file2.wav                                   # Input audio file
├── main.py                                            # Main program
├── requirements.txt                                   # Dependency list
└── README.md                                          # Project documentation

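Notes

FFmpeg configuration: if the program reports that audio processing support is missing, make sure FFmpeg is installed correctly.
Model path: make sure the model parameter points to the correct local model path.
Virtual environment: the ai virtual environment must be activated before running the project.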
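The virtual environment note can be checked at startup. conda sets the CONDA_DEFAULT_ENV environment variable when an environment is activated, so a script can compare it to the expected name. This is a minimal sketch; the function in_expected_env is hypothetical and not part of the project.

```python
import os

def in_expected_env(expected="ai", environ=None):
    """Return True if the active conda environment name equals `expected`."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV") == expected

if __name__ == "__main__":
    if not in_expected_env():
        print("Warning: the conda environment 'ai' does not appear to be active.")
```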

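Contributing and Feedback

If you run into problems or have suggestions for improvement, please submit an Issue or Pull Request.

License

This project is licensed under the Apache License 2.0.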
