Voice Cloning: New Open-Source Zero-Shot Model "MetaVoice" (only TTS is free; S2S sounds possible but is not public)

666VR999 · Feb 12, 2024

MetaVoice-1B is a 1.2B parameter base model trained on 100K hours of speech for TTS (text-to-speech). It has been built with the following priorities:

Emotional speech rhythm and tone in English. No hallucinations.
Zero-shot cloning for American & British voices, with 30s reference audio.
Support for (cross-lingual) voice cloning with finetuning. We have had success with as little as 1 minute training data for Indian speakers.
Support for long-form synthesis.
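
For the 30s reference audio mentioned in the list above, here is a minimal sketch of cutting a longer recording down to its first 30 seconds using only the Python standard library. It assumes the clip is an uncompressed PCM WAV file. The function name `trim_wav` is made up for illustration and is not part of MetaVoice.

```python
import wave


def trim_wav(src_path, dst_path, max_seconds=30.0):
    """Copy at most max_seconds of audio from src_path to dst_path.

    Hypothetical helper, not part of MetaVoice. Assumes a PCM WAV input.
    Returns the duration in seconds that was written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        # Keep the whole clip if it is already shorter than the limit.
        n_keep = min(src.getnframes(), int(max_seconds * rate))
        frames = src.readframes(n_keep)

    with wave.open(dst_path, "wb") as dst:
        # Reuse channels, sample width and rate from the source file;
        # the frame count in the header is corrected when the file closes.
        dst.setparams(params)
        dst.writeframes(frames)

    return n_keep / rate
```

Usage would look like `trim_wav("speaker_full.wav", "speaker_30s.wav")`, where both file names are placeholders.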

I got really good clone results with 1-minute clips, but the free model only generates output from text, which isn't where most of the fun is. What you really want is a version that copies the emotion from the original speech. It still might work for JOI stuff, though. I also think speech-to-speech (S2S) is possible, but it's behind a paywall:

TTS by MetaVoice

MetaVoice - Text to Speech & AI Voice Changer

Emotive, human-like speech at scale in any voice or style. Perfect for content creators, developers, and businesses. Use text to speech to voice content for your videos, brands, characters, or AI agents. Alternatively, use our AI voice changer to transform your voice to a different style, whilst...

themetavoice.xyz

You can also download the ~5 GB model, but the Python training code needs a module called flash-attention, which is only officially supported on Linux. Maybe some of the Windows Subsystem for Linux (WSL) folks will be able to get it running. I'd love to dabble with the Python and figure out how to change the input from TTS to S2S, since the clone result was good. Typing out the text or auto-transcribing it from a video is going to mean less emotion and the sync drifting.

GitHub - metavoiceio/metavoice-src: Foundational model for human-like, expressive TTS

Foundational model for human-like, expressive TTS. Contribute to metavoiceio/metavoice-src development by creating an account on GitHub.

github.com
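
Since the flash-attention dependency mentioned above is the sticking point for Windows users, here is a minimal sketch of checking whether Python is running on native Linux or inside WSL before attempting the install. It relies on the common heuristic that WSL kernel release strings contain "microsoft". The function name `classify_platform` and its return labels are invented for illustration, and it does not install or import anything.

```python
import platform


def classify_platform(system=None, release=None):
    """Return "linux", "linux-wsl", or "other" for the current machine.

    Hypothetical helper. The optional arguments exist only so the logic
    can be exercised without actually running on each platform.
    """
    system = system or platform.system()
    release = release if release is not None else platform.uname().release

    if system == "Linux":
        # Heuristic: WSL kernels report a release string with "microsoft" in it.
        if "microsoft" in release.lower():
            return "linux-wsl"
        return "linux"
    return "other"


if __name__ == "__main__":
    print(classify_platform())
```

If this prints "other", setting up WSL first is probably the path of least resistance.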
