MiMo v2 TTS

Generate high-quality speech from text using the latest MiMo v2 TTS API. Support styles, voice selection, and emotions.

561

Total Visits

MIMO API Key

User Context (Optional)

Voice

mimo_default (Default)

Voice

Style (Optional)

Text to Synthesize (Assistant)

About MiMo v2 TTS

This online tool is powered by the latest MiMo v2 TTS (Text-to-Speech) model released by Xiaomi, capable of automatically converting input text into highly natural and fluent speech. You can generate vivid, expressive voice content by configuring speech styles and inserting fine-grained audio tags.

⚠️ Disclaimer: In order to bring this tool to you quickly, it was built fast and might have edge-case bugs. If you experience issues or have feature requests, please feel free to raise them!

🔗 Quick Links

🔑 Apply for MIMO API Key (Console)
📖 Official Speech Synthesis API Docs
💰 Billing: Currently free for a limited time.

🌟 Configuration Guide

1. API Key Application & Security

Before using this tool, you must provide a valid MIMO API Key.

How to apply: Visit the Xiaomi MiMo Console to register and generate your unique Key.
🔒 Privacy Guarantee: All API calls from this website are made directly from your browser to the official servers. We will NEVER record, collect, or upload your API Key. If you are still concerned, you can delete or revoke the key in the console after using the tool.

2. Voice Selection (Built-in Voices)

You can choose an official pre-set voice from the dropdown:

mimo_default: MiMo-Default
default_zh: MiMo-Chinese Female Voice
default_en: MiMo-English Female Voice (Note: Voice cloning is currently not supported by the API)

3. Overall Speech Style Control

Input your desired emotion or dialect into the "Style" input box. The tool will automatically prepend it as <style>Your Style</style> to the target content. You can even combine styles separated by spaces!

Supported styles include but are not limited to:

Speech Rate: Speed up / Slow down
Emotions: Happy / Sad / Angry
Roles: Sun Wukong / Lin Daiyu
Style Change: Whisper / Clamped voice / Taiwanese accent / Singing
Dialects: Northeastern dialect / Sichuan dialect / Henan dialect / Cantonese

Examples:

<style>Happy</style>Tomorrow is Friday, so happy!
<style>Whisper</style>Oh my goodness, it's so cold today! You know that wind, it's howling like a knife!
(Note: To achieve the best singing style, you must add ONLY <style>唱歌</style> at the very beginning of the target text).

4. Fine-grained Audio Tags

Through inline Audio Tags, you can exercise fine-grained control to precisely adjust tone, emotion, and expression style—whether it's a whisper, a hearty laugh, or inserting breaths, pauses, and coughs. Insert them directly into the target text. Examples:

Achoo! Ahem. I—I really [cough] think I am coming down with a terrible [cough] terrible cold.
[heavy breathing] Just... give me... a second.
It's just so stupid! (sobbing) he just ate the whole thing in one bite!

5. Roles: User Context vs Assistant Text

Assistant Text (Required): The target text for speech synthesis MUST be placed in an assistant role message. This field is the actual speech audio that will be generated.
User Context (Optional): Provides a background conversational context for the TTS engine. It helps the TTS model adapt a suitable tone in response to the user's input.

Comments(0)

Please login to participate

No comments yet. Be the first!

Sponsor Us

More Tools