Qwen3 TTS Voice Clone

No training required; a 10–20s clip creates a custom voice. Create the voice first, then synthesize with qwen3-tts-vc-realtime.

My voices

Text to synthesize (up to 600 characters)

History

Sample outputs

Cherry

Sunny, upbeat, friendly young woman

Serena

Gentle, warm young woman

Ethan

Standard Mandarin with a slight northern accent; sunny, warm, energetic

Chelsie

Anime-style virtual girlfriend

Momo

Playful, cute, teasing tone

Vivian

Spunky, cute, a little feisty

Voice cloning workflow

Provide a short clip, create a custom voice, then synthesize speech.

10–20s clip

Recommended length is 10–20s; the maximum is 60s.

Format & sample rate

WAV, MP3, or M4A; sample rate of at least 24kHz; mono; file size under 10MB.

Clean speech

The clip needs at least 3s of continuous, clearly read speech, with no background noise or singing.
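The measurable parts of these requirements can be checked locally before uploading. Below is a minimal sketch for WAV files only, using Python's standard `wave` module. The function name `check_wav_clip` and the constants are illustrative and not part of any Qwen tooling, and "10MB" is assumed to mean 10 MiB. Speech quality (clear reading, no noise or singing) cannot be verified this way and still needs a listen.

```python
import os
import wave

# Limits taken from the requirements listed on this page.
MIN_SAMPLE_RATE_HZ = 24_000
MIN_DURATION_S = 3.0    # a shorter clip cannot contain 3s of continuous speech
MAX_DURATION_S = 60.0
MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB


def check_wav_clip(path):
    """Return a list of problems found; an empty list means the clip passes."""
    problems = []

    if os.path.getsize(path) >= MAX_FILE_BYTES:
        problems.append("file size is not under 10MB")

    # Read the header fields needed for the remaining checks.
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate

    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if rate < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is below 24000 Hz")
    if duration < MIN_DURATION_S:
        problems.append(f"clip is {duration:.1f}s, shorter than 3s")
    if duration > MAX_DURATION_S:
        problems.append(f"clip is {duration:.1f}s, longer than 60s")

    return problems
```

Calling `check_wav_clip("my_voice.wav")` returns an empty list when the file meets every measurable requirement. MP3 and M4A files would need a separate decoder, which this sketch leaves out.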

Create then synthesize

Create the voice first, then synthesize with the same target_model that was specified when the voice was created.
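The create-then-synthesize rule can be enforced in client code with a small guard. The sketch below makes no network calls and does not reproduce the real API: `CustomVoice`, `build_synthesis_request`, and the shape of the returned dict are hypothetical names chosen for illustration. Only the model name `qwen3-tts-vc-realtime` and the `target_model` field come from this page, and the 600 limit is assumed to count characters.

```python
from dataclasses import dataclass

SYNTHESIS_MODEL = "qwen3-tts-vc-realtime"  # model name given on this page
MAX_TEXT_CHARS = 600  # assumption: the text field's limit counts characters


@dataclass(frozen=True)
class CustomVoice:
    """Hypothetical record of a voice produced by the create step."""
    voice_id: str
    target_model: str


def build_synthesis_request(voice, text, model=SYNTHESIS_MODEL):
    """Build a plain dict describing a synthesis call (illustrative shape)."""
    # The voice must be used with the model it was created for.
    if voice.target_model != model:
        raise ValueError(
            f"voice was created for {voice.target_model!r}, not {model!r}"
        )
    if not text:
        raise ValueError("text is empty")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    return {"model": model, "voice": voice.voice_id, "text": text}
```

For example, `build_synthesis_request(CustomVoice("voice-123", "qwen3-tts-vc-realtime"), "Hello")` returns a dict, while a voice created for a different model raises `ValueError` before any request is sent.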

Synthesis examples (preset voices)

These examples use preset voices, not cloned ones; actual results depend on your input.

Synthesis example · Cherry

Synthesis example · Dylan

Voice clone FAQ

Key requirements and workflow questions.
