flossypossy
DF Vagrant
would be cool
GhostTears said:This has been brought up so many times and I have no Idea what eveyone is thinking is going to happen if this software actually existed. what, are you going to match the tone of every single moan and sloppy dick suck to perfectly match emma watsons particular accent and the one "OH YES"? Like get real.
FaceRipper said: I am currently working on developing my own voice dataset of some person (I haven't decided yet). I have researched it quite a bit, so here is what I can tell you. The best program is tacotron-2 by Iperov on GitHub. Tacotron is Google's deep-learning text-to-speech (TTS) model. The GitHub versions are clones of the idea.
To make a "good" voice you need around 20 hours of clean audio. Then you need to make a CSV file in which each row contains the name of the WAV file (1-10 sec), the text of the WAV file, and the same text with numbers and punctuation fully spelled out, delimited with the '|' character.
Iterations are slower (I believe) than with image deepfakes, and it takes around 300-400k iterations to get a good clean sound that has no raspiness to it. The end result's manner of speech and inflections can be weird if the text entered doesn't correspond well to inflections in the source audio. So, just like with image deepfakes, you have to limit the scope of the data to get higher-quality results. Here is a Twitch streamer named Forsen; a subscriber of his built Trump and Forsen voice models, most likely with Tacotron. You can hear it is pretty good.
FakesWizard said:
Does there exist a comprehensive tutorial on how to do this?
stron01 said: Does anyone know which program they are using?
https://www.youtube.com/channel/UCID5qusrF32kSj-oSGq3rJg/featured