Voice generation

flossypossy · Jun 25, 2019

would be cool

lordquas · Jul 15, 2019

GhostTears said:
This has been brought up so many times and I have no Idea what eveyone is thinking is going to happen if this software actually existed. what, are you going to match the tone of every single moan and sloppy dick suck to perfectly match emma watsons particular accent and the one "OH YES"? Like get real.

FaceRipper · Jul 16, 2019

I am currently working on developing my own voice dataset of some person (I haven't decided who yet). I have researched it quite a bit, so here is what I can tell you. The best program is tacotron-2 by Iperov on GitHub. Tacotron is Google's deep-learning text-to-speech (TTS) model. The GitHub versions are reimplementations of the idea.

To make a "good" voice you need around 20 hours of clean audio. Then you need to make a CSV file in which each row contains the name of the WAV file (1-10 seconds long), the text spoken in the WAV file, and the same text with numbers and punctuation fully spelled out. The fields are delimited with the '|' character.

Iterations are slower (I believe) than with image deepfakes, and it takes around 300-400k iterations to get a good clean sound with no raspiness to it. The end result's manner of speech and inflections can be weird if the text entered doesn't correspond well to inflections in the source audio. So, just like with image deepfakes, you have to limit the scope of the data to get higher-quality results. As an example, a subscriber of a Twitch streamer named Forsen built Trump and Forsen voice models, most likely with Tacotron. You can hear it is pretty good.

FakesWizard · Jul 16, 2019

Does there exist a comprehensive tutorial on how to do this?

FaceRipper · Jul 16, 2019

FakesWizard said:
Does there exist a comprehensive tutorial on how to do this?

Not yet as far as I know. The only good youtube video

Mr.BaBe · Sep 20, 2019

I came across Real-Time-Voice-Cloning.

A short YouTube video shows how it works.

Download link:
https://github.com/CorentinJ/Real-Time-Voice-Cloning

I tried to install it, but it has a lot of dependencies. I got stuck installing webrtcvad and couldn't get it working. According to the video it seems to work decently.

I will try other voice programs posted in a different topic:
https://github.com/andabi/deep-voice-conversion
https://github.com/mazzzystar/randomCNN-voice-transfer
https://github.com/keithito/tacotron

try this
https://github.com/NVIDIA/tacotron2

Has anyone been able to use any of these, or another program, that actually works and that we can use on DF videos?

Mr.BaBe · Sep 21, 2019

Does anyone know which program they are using?

https://www.youtube.com/channel/UCID5qusrF32kSj-oSGq3rJg/featured

testerdumbu · Sep 22, 2019

stron01 said:
Does anyone know which program they are using?

https://www.youtube.com/channel/UCID5qusrF32kSj-oSGq3rJg/featured

This is amazing, wow! They even got the cartoon's speech impediment correct.

Has anyone done any deepfake audio yet that's believable? I don't know how anyone can gather an entire 20 hours of audio for something; it seems impossible. Even if they're on a podcast, you have to go back and type out every word they're saying? Or am I misunderstanding something here?

Morfeus · Oct 8, 2019

Is someone dealing with this issue? I need voice clone software or someone who can do it for me.

Theking13 · Nov 18, 2019

Wanted to chime in.

So I played with Real-Time-Voice-Cloning using the Google Colab version someone made.

It is very fast, even with having to use Colab and upload audio and so on. But the generated voice sounds like someone talking through a muffled mic. Sometimes you can make out words, and sometimes you may get a decent sentence out.
Not sure if I'm doing something wrong or if the audio I gave it was the issue. I tried different audio lengths, from 10 seconds to many minutes, ripped from video files and cleaned up.

The best results I got still sounded like someone talking through a gaming headset mic.

The guy who implemented this said it needs more work that he can't focus on yet. So I think it may just need improvements.

https://colab.research.google.com/g...ing/blob/master/Real_Time_Voice_Cloning.ipynb

[deleted] · Feb 23, 2020

I think that if we're talking about shifting the actress's voice, you could either have her saying something out of context (like in all the pre-deepfake fake porn videos when the heads were cut and pasted), or could add a sound tape from the said celeb's sex scene in a movie if there is something like that... otherwise, the debate seems quite pointless to me

Nickrus · Mar 24, 2020

That is right but it's pretty easy to find relevant voice parts so it's worth discussing

greywolf22 · Apr 3, 2020

The new Descript app has an Overdub feature. Basically, you add a sound file, the app gives you its transcription, and then you can generate a voice (text to speech). Unfortunately this feature is in beta and they require voice authentication to access it.
Anybody knows any hacker?

https://www.descript.com/

Insidious · May 19, 2020

I find it funny (ironic) that people on THIS site think "deep voice" software is a pie in the sky fantasy. A few years ago there was no way for the average person to deep fake a movie. Now we have FREE software that automates the entire process on an average PC. Cloning a voice seems the logical next step. It's only a matter of time.

jarjarbinks · May 20, 2020

Definitely eager to see the future with these types of programs. It would be neat seeing a character like Elsa from Frozen have a voice like James Earl Jones. Though I can imagine the definite issues and how easy it would be to abuse it. Whether it be political figures or celebrities, a lot of people are going to be “cancelled”.

manusia6 · May 29, 2020

would be cool if we can actually do such stuff

fsalgo · Mar 2, 2021

I was thinking about building an app to "deepfake" voices instead.
It was a good idea, I think, but hard to accomplish.

fartmcmuff · Mar 6, 2022

have at em pervs
