professional post-processing. That costs a lot of money (think at least five digits). Compare this to an untrained speaker recording in their bedroom with a $50 USB podcast microphone.

The sound examples are from the gaming context as well, so it is no surprise that it works well there. Are there any links to real-world examples, e.g. how does it sound when reading a news article or a weather report? Samples where the model has already seen the input during training always sound as good as the original.

WaveGlow also needs considerable resources for training and inference. That is fine for the gaming context, where the spoken voice is pre-recorded, but I personally aim for realtime inference on restricted hardware, which makes WaveGlow not a first choice for me.
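To make the realtime criterion concrete: the real-time factor (RTF) is synthesis time divided by the duration of the generated audio, and values below 1.0 mean faster-than-realtime synthesis. A minimal sketch of measuring it, assuming a generic PyTorch mel-to-waveform callable (the `vocoder` name and its signature are placeholders, not a specific WaveGlow or Mozilla TTS API):

```python
# Minimal sketch: estimate the real-time factor (RTF) of a vocoder on the
# target hardware. `vocoder` and `mel` are placeholders for your own
# mel-to-waveform model and input spectrogram.
import time

import torch


def real_time_factor(vocoder, mel, sample_rate=22050):
    """Return RTF = synthesis time / audio duration (lower is better)."""
    with torch.no_grad():
        start = time.perf_counter()
        audio = vocoder(mel)  # assumed: (batch, n_mels, frames) -> (batch, samples)
        elapsed = time.perf_counter() - start
    audio_seconds = audio.shape[-1] / sample_rate
    return elapsed / audio_seconds
```

Running this on the restricted target device (rather than a desktop GPU) is what decides whether a heavy vocoder like WaveGlow is usable in practice.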
The link to Nvidia FastPitch is interesting, I had not heard about it before.
For comparison, you might want to look out for work with the 'Gothic' dataset.
>>> Ole_Klett
[February 18, 2021, 1:42pm]
I just read this.
I started to wonder: are we wasting our time here?
We try to accomplish the same, but most SoundCloud samples are not convincing, to say the least.
And this guy uses the same technologies with way better results.
What am I overlooking here?
[This is an archived TTS discussion thread from discourse.mozilla.org/t/results-can-be-so-much-better-we-are-all-doing-it-wrong]