PaddleSpeech r0.2.0
S2T
- Replace kaldi_fbank with paddleaudio. #1612
- Support online CTC decoder. #821 #1626
- Improve accuracy of Conformer. Support using Kaiming uniform as the default initialization. #1577
TTS
- Add SpeedySpeech multi-speaker support for synthesize_e2e.py. #1370 by @jerryuhoo
- Add WaveRNN for CSMSC dataset. #1379
- Add Tacotron2 for CSMSC / LJSpeech datasets. #1314 / #1416
- Add GE2E Tacotron2 Voice Cloning for AISHELL-3 dataset. #1419
- Update text frontend. #1506
- Add HiFiGAN for LJSpeech / AISHELL-3 / VCTK datasets. #1549 / #1581 / #1587
- Add NPU support for TransformerTTS. #1593 by @windstamp
- Add CNN Decoder for Streaming FastSpeech2. #1634
Audio
- Add paddleaudio.compliance modules that offer audio feature APIs aligned with Kaldi and Librosa. #1518
- Add unit tests and benchmarks for audio feature APIs. #1548
- [audio] refactor audio arch #1494 by @zh794390558
- [audio] dtw metric #1493 by @zh794390558
- [audio] fix compliance bug #1597 by @zh794390558
Deployment
- [speechx] high performance inference of speech task #1496 by @SmileGoat @zh794390558
- [speechx] fix normalizer bug #1600 #1621 #1619 #1633 #1635 by @SmileGoat
- [speechx] refactor speechx #1631 #1616 #1576 #1572 #1541 by @zh794390558
- [speechx] simplify cmake compiler #1538 #1536 #1535 by @zh794390558
Server
- [websocket] added online asr engine #1627 by @WilliamZhang06
- [server] added engine type and asr inference #1475 by @WilliamZhang06
- [server] added asr engine #1413 by @WilliamZhang06
- [server] added engine factory and config #1399 by @WilliamZhang06
- [server] added engine framework #1383 by @WilliamZhang06
- [server] update readme #1604 by @lym0302
- [server] add server cls #1554 by @lym0302
- [server] add paddlespeech_server stats #1510 by @lym0302
- [server] add cli #1466 by @lym0302
- [server] add tts postprocess #1411 by @lym0302
- [server] tts server #1386 by @lym0302
Vector
CLI
- Batch input supported. #1460
- TTS: Add WaveRNN for CSMSC dataset.
- TTS: Add HiFiGAN for LJSpeech / AISHELL-3 / VCTK datasets.
- Vector: Add speaker verification demo and doc. #1605 by @honei
Demo
- [vec][search] update client image url #1628 by @qingen
- [server] add server demo #1480 by @lym0302
- [vec][search] add audio similarity search #1609 by @qingen
Acknowledgements
Special thanks to @WilliamZhang06 @yt605155624 @windstamp @Jackwaterveg @honei @SmileGoat @KPatr1ck @zh794390558 @lym0302 @qingen