SenseVoice in FunASR 1.2.0: timestamps and character probabilities are misaligned #2324

psk-github · 2024-12-20T08:41:18Z

🐛 Bug

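With the character-level timestamp feature that SenseVoice supports in FunASR 1.2.0, the number of characters does not match the number of timestamps.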

To Reproduce

Run the official SenseVoice timestamp demo in a Notebook and recognize the audio file directly.

The audio file is attached:
temp.zip

Code sample

import torch
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model_dir = "iic/SenseVoiceSmall"

model = AutoModel(
    model=model_dir,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cpu",
    ncpu=8,
    disable_update=True,
    disable_pbar=True,
)
print("Model loaded")

# torch.set_num_threads(8)
# torch.set_num_interop_threads(8)

res = model.generate(
    input="sd_pr_right.wav",
    cache={},
    language="zh",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
    merge_vad=False,
    merge_length_s=15,
    output_timestamp=True,
    return_raw_text=True,
)
print(res)
text = rich_transcription_postprocess(res[0]["text"])
print(text)

Expected behavior

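The number of recognized characters should match the number of timestamps. Instead, the result contains 49 characters but only 47 timestamps. Recognition quality on the English portion is also poor.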
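To make the mismatch easier to report, a small helper like the one below can count both lists and show which entries have no partner. This is only a sketch: `alignment_report` is a hypothetical name, the data is synthetic, and both the `"timestamp"` key mentioned in the comment and the choice of what counts as one "character" (character vs. token) are assumptions about the FunASR result, not something confirmed here.

```python
from itertools import zip_longest


def alignment_report(units, timestamps):
    """Compare a list of text units with a list of timestamp entries."""
    # Pair entries by position; the shorter list is padded with None.
    pairs = list(zip_longest(units, timestamps, fillvalue=None))
    return {
        "n_units": len(units),
        "n_timestamps": len(timestamps),
        "missing_timestamps": len(units) - len(timestamps),
        # Entries that ended up without a partner on either side.
        "unpaired": [p for p in pairs if p[0] is None or p[1] is None],
    }


# Synthetic data only, standing in for the real recognition result.
units = list("abcde")
timestamps = [[0, 100], [100, 200], [200, 300]]
report = alignment_report(units, timestamps)
print(report)

# With a real result one might try something like the following
# (the "timestamp" key name is an assumption):
# report = alignment_report(list(text), res[0]["timestamp"])
```

Positional pairing only shows that the tail is unpaired; it does not identify which characters actually lost their timestamps, so the raw `res` output is still needed to locate the two missing entries.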

Environment

Notebook, 8-core CPU, 32 GB RAM
FunASR 1.2.0

Additional context

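The issue occurred this morning on a Notebook instance. After that instance hit its 1-hour idle timeout, I switched to a different instance, and there the character count and the timestamp count matched. However, in a local python3.8 docker image with FunASR 1.2.0 installed manually, recognizing the same audio file reproduces the issue reliably.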
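Since the behavior differs between environments, it may help to record the installed versions in each one and compare them. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8); `collect_versions` is a hypothetical helper, and the default package list is an assumption based on the imports in the code sample.

```python
import platform
from importlib import metadata


def collect_versions(packages=("funasr", "torch")):
    """Return the Python version, platform string, and package versions."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            info[name] = "not installed"
    return info


versions = collect_versions()
for name, value in versions.items():
    print(f"{name}: {value}")
```

Running this in both the Notebook instance and the docker image would show whether the two environments differ in anything besides FunASR itself.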

psk-github added the bug label Dec 20, 2024

LauraGPT assigned R1ckShi Dec 24, 2024
