Qwen3 TTS Voice Clone

No training is required: a 10–20 s audio clip is enough to create a custom voice. Create the voice first, then synthesize speech with qwen3-tts-vc-realtime.

Preset voices

Cherry: Sunny, upbeat, friendly young woman
Serena: Gentle, warm young woman
Ethan: Standard Mandarin with a slight northern accent; sunny, warm, energetic
Chelsie: Anime-style virtual girlfriend
Momo: Playful, cute, teasing tone
Vivian: Spunky, cute, a little feisty

Model overview

Voice cloning workflow

Provide a short clip, create a custom voice, then synthesize speech.

10–20s clip

Recommended 10–20s, max 60s.
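As a rough illustration of the length guidance above, here is a minimal Python sketch that labels a clip duration in seconds. The function name `classify_clip_duration` and its labels are hypothetical; only the thresholds (10–20 s recommended, 60 s maximum) come from this page.

```python
def classify_clip_duration(seconds: float) -> str:
    """Label a reference-clip duration against the stated guidance.

    Hypothetical helper: 10-20 s is recommended, 60 s is the maximum.
    """
    if seconds <= 0:
        raise ValueError("duration must be positive")
    if seconds > 60:
        return "exceeds maximum"
    if 10 <= seconds <= 20:
        return "recommended"
    # Between 0 and 10 s, or between 20 and 60 s: within the limit but not ideal.
    return "outside recommended range"
```

For example, `classify_clip_duration(15)` returns `"recommended"`, while a 45 s clip is still under the limit but falls outside the recommended range.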

Format & sample rate

WAV, MP3, or M4A; sample rate of 24 kHz or higher; mono; under 10 MB.
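Before uploading, you could pre-check a clip locally. The sketch below is a hypothetical helper (`check_reference_clip` is not part of any Qwen SDK) that uses only the Python standard library. Because the stdlib `wave` module reads WAV only, MP3 and M4A files are checked by extension and size alone. It also assumes "10MB" means 10,000,000 bytes, the stricter decimal reading. It does not replace the service's own validation.

```python
import os
import wave

# Limits taken from this page; the byte count is an assumed reading of "10MB".
MAX_BYTES = 10_000_000
MIN_RATE_HZ = 24_000
MAX_SECONDS = 60
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".m4a"}


def check_reference_clip(path: str) -> list[str]:
    """Return a list of problems; an empty list means every local check passed."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format {ext!r}")
        return problems
    if os.path.getsize(path) >= MAX_BYTES:
        problems.append("file is 10 MB or larger")
    if ext != ".wav":
        # MP3/M4A would need a third-party decoder; only WAV headers are inspected.
        return problems
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        duration = wf.getnframes() / rate
    if rate < MIN_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is below 24 kHz")
    if channels != 1:
        problems.append(f"{channels} channels; mono is required")
    if duration > MAX_SECONDS:
        problems.append(f"duration {duration:.1f} s exceeds {MAX_SECONDS} s")
    return problems
```

Returning a list instead of raising on the first failure lets you report every problem with a clip at once.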

Clean speech

Include at least 3 s of continuous, clearly read speech, with no background noise or singing.
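This page does not say how the service checks for clean speech. As a crude local heuristic, assuming the clip is already decoded into a list of float samples in [-1, 1], you could measure the longest run of consecutive 20 ms frames whose RMS energy exceeds a threshold and compare it with 3 s. The function name, frame size, and threshold of 0.02 are arbitrary assumptions. Energy alone also cannot tell speech apart from noise or singing.

```python
import math


def longest_voiced_run(samples, sample_rate, frame_ms=20, rms_threshold=0.02):
    """Return the length in seconds of the longest run of consecutive frames
    whose RMS energy exceeds rms_threshold.

    Hypothetical energy heuristic: it detects sustained sound, not speech quality.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    best = current = 0
    # Step through non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        if rms > rms_threshold:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best * frame_len / sample_rate
```

A clip would pass this rough check when `longest_voiced_run(samples, rate) >= 3.0`.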

Create then synthesize

Create the voice first, then synthesize with the same target_model that was specified when the voice was created.
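One way to enforce that rule on the client side is to record which target_model each cloned voice was created for and refuse to synthesize with any other model. `ClonedVoice`, `voice_id`, and `check_synthesis_model` are hypothetical names for illustration. This page does not show the real voice-creation and synthesis calls, so the sketch does not model them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClonedVoice:
    """Hypothetical record of a cloned voice."""
    voice_id: str      # identifier returned when the voice is created (assumed)
    target_model: str  # model the voice was created for


def check_synthesis_model(voice: ClonedVoice, model: str) -> None:
    """Raise ValueError if synthesis would use a different model than the voice's target_model."""
    if voice.target_model != model:
        raise ValueError(
            f"voice {voice.voice_id!r} was created for {voice.target_model!r}, "
            f"not {model!r}"
        )
```

For example, a voice created with `target_model="qwen3-tts-vc-realtime"` passes `check_synthesis_model(voice, "qwen3-tts-vc-realtime")`, while any other model name raises before a request is sent.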

Synthesis examples (preset voices)

These examples use preset voices, not cloned ones; actual results depend on your input.

Synthesis example · Cherry


Synthesis example · Dylan

