MrDeepFakes Forums

Some content may not be available to Guests. Consider registering an account to enjoy unrestricted access to guides, support and tools

  • We have temporarily disabled registrations. It will be re-enabled within a few weeks.

Voice generation

lordquas

DF Pleb
GhostTears said:
This has been brought up so many times and I have no Idea what eveyone is thinking is going to happen if this software actually existed. what, are you going to match the tone of every single moan and sloppy dick suck to perfectly match emma watsons particular accent and the one "OH YES"? Like get real.
 

FaceRipper

DF Admirer
I am currently working on developing my own voice dataset of some person (i havent decided yet). I have researched it quite a bit so here is what I can tell you. The best program is tacotron-2 by Iperov on github. Tacotron is google's implementation of TTS speech deep learning. The github versions are clones of the idea.

To make a "good" voice you need around 20 hours of clean audio. Then you need to make a csv file that each row contains the name of the wav file (1-10sec), text of wave file, fully spelled out numbers and punctuation text of wave file. Deliminated with '|' character.

Iterations are slower (i believe) than image deepfakes, and it takes around 300-400k iterations to get a good clean sound that has no raspyness to it. The end result's manner of speech and inflections can be weird if the text entered doesn't correspond well to inflections in the source audio. So just like image deepfakes you have to limit the scope of data to get more quality results. Here is a twitch streamer named Forsen that a subscriber of his built Trump and Forsen voice models most likely with tacotron. You can hear it is pretty good.
 

FakesWizard

DF Vagrant
Verified Video Creator
FaceRipper said:
I am currently working on developing my own voice dataset of some person (i havent decided yet).  I have researched it quite a bit so here is what I can tell you.  The best program is tacotron-2 by Iperov on github.  Tacotron is google's implementation of TTS speech deep learning.  The github versions are clones of the idea.  

To make a "good" voice you need around 20 hours of clean audio.  Then you need to make a csv file that each row contains the name of the wav file (1-10sec), text of wave file, fully spelled out numbers and punctuation text of wave file.  Deliminated with '|' character.  

Iterations are slower (i believe) than image deepfakes, and it takes around 300-400k iterations to get a good clean sound that has no raspyness to it.  The end result's manner of speech and inflections can be weird if the text entered doesn't correspond well to inflections in the source audio.  So just like image deepfakes you have to limit the scope of data to get more quality results.   Here is a twitch streamer named Forsen that a subscriber of his built Trump and Forsen voice models most likely with tacotron.  You can hear it is pretty good.  


Does there exist a comprehensive tutorial on how to do this?
 

FaceRipper

DF Admirer
FakesWizard said:
FaceRipper said:
I am currently working on developing my own voice dataset of some person (i havent decided yet).  I have researched it quite a bit so here is what I can tell you.  The best program is tacotron-2 by Iperov on github.  Tacotron is google's implementation of TTS speech deep learning.  The github versions are clones of the idea.  

To make a "good" voice you need around 20 hours of clean audio.  Then you need to make a csv file that each row contains the name of the wav file (1-10sec), text of wave file, fully spelled out numbers and punctuation text of wave file.  Deliminated with '|' character.  

Iterations are slower (i believe) than image deepfakes, and it takes around 300-400k iterations to get a good clean sound that has no raspyness to it.  The end result's manner of speech and inflections can be weird if the text entered doesn't correspond well to inflections in the source audio.  So just like image deepfakes you have to limit the scope of data to get more quality results.   Here is a twitch streamer named Forsen that a subscriber of his built Trump and Forsen voice models most likely with tacotron.  You can hear it is pretty good.  


Does there exist a comprehensive tutorial on how to do this?


Not yet as far as I know.  The only good youtube video 
 

Mr.BaBe

DF Enthusiast
Verified Video Creator
I come across Real-Time-Voice-Cloning-master.

a short youtube video shows how it works.


download link:
https://github.com/CorentinJ/Real-Time-Voice-Cloning

I ttried to install it, but it has a lot of depencies and I get stuck on installing webrtcvad, and coundt worked. According to video it seems decent work.

I will try other voice programs posted on different topic,
https://github.com/andabi/deep-voice-conversion
https://github.com/mazzzystar/randomCNN-voice-transfer
https://github.com/keithito/tacotron

try this
https://github.com/NVIDIA/tacotron2

anyone has able to use any of this or other program, that actually works and we can use on DF videos.
 
stron01 said:

This is amazing wow! They even got the cartoons speech impediment correct.

Has anyone done any deepfake audio yet that's believable yet? I don't know how anyone can gather an entire 20 hours for something it seems impossible even if they're on podcast you have to go back and type out every word they're saying? Or am I misunderstanding something here
 

Morfeus

DF Vagrant
Is someone dealing with this issue? I need a voice clone sofatware or a someone who can do it for me
 

Theking13

DF Vagrant
Wanted to chime in.

So I played with Real time Voice Cloning using the Google using the Colab version someone made. 

It's is very fast even through having to use Colab and uploading audio and stuff. But the generated voice sound like someone is talking through a muffled mic, sometimes you can hear words and sometimes you may get a decent sentence out. 
Not sure if I'm doing something wrong or audio I gave was a issue. I tried different audio lengths from 10 seconds to many minutes ripped from video files and cleaning it up.

The best results I got still sounded like someone talking though a gaming headset mic.

The guy who implemented this said it needs more work that he can't focus on yet. So I think it may just need improvemts.

https://colab.research.google.com/g...ing/blob/master/Real_Time_Voice_Cloning.ipynb
 
D

[deleted]

Guest
I think that if we're talking about shifting the actress voice, you could either have her saying something out of context (like in all the pre-deepfake fake porn videos when the heads were cut and pasted), or could add a sound tape from the said celeb's sex scene in a movie if there is something like that... otherwise, the debate seems wuite pointless to me
 

greywolf22

DF Vagrant
New Descript app has Overdub feature. Basically you add sound file, app gives it's transcription then you can generate voice(text to speech). Unfortunately this feature is in beta and they require voice authentication to access this it.
Anybody knows any hacker?

https://www.descript.com/

 

Insidious

DF Pleb
I find it funny (ironic) that people on THIS site think "deep voice" software is a pie in the sky fantasy. A few years ago there was no way for the average person to deep fake a movie. Now we have FREE software that automates the entire process on an average PC. Cloning a voice seems the logical next step. It's only a matter of time.
 

jarjarbinks

DF Vagrant
Definitely eager to see the future with these types of programs. It would be neat seeing a character like Elsa from Frozen have a voice like James Earl Jones. Though I can imagine the definite issues and how easy it would be to abuse it. Whether it be political figures or celebrities, a lot of people are going to be “cancelled”.
 

fsalgo

DF Vagrant
I was thinking about building an app to "deepfake" voices instead.
It was a good idea i think but hard to accomplish
 
Top