Grok TTS
xAI's text-to-speech model with 169 expressive voices across 36 languages. Browse and filter the catalog by language, gender, tone, and use case via the voices API. Inline speech tags ([pause], [laugh], <whisper>) give fine-grained control over delivery, and the original Grok voices (eve, ara, rex, sal, leo) remain available.
Specifications
| Field | Value |
|---|---|
| Provider | xAI |
| Category | Voice |
Pricing (Starter tier)
1,950.00 credits / 1M input characters
Higher tiers get volume discounts. See tiers
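
As a rough illustration of the per-character rate above, the sketch below estimates the Starter-tier cost of a request from its character count. The function name estimate_credits is hypothetical, and the calculation assumes billing is linear in input characters with no rounding rule or minimum charge, which the pricing line does not state.

```shell
# Sketch: estimate Starter-tier credits for a number of input characters.
# Assumes linear billing at 1,950 credits per 1,000,000 characters; rounding
# rules and minimum charges are not modeled because they are not documented here.
estimate_credits() {
  awk -v chars="$1" 'BEGIN { printf "%.2f\n", chars * 1950 / 1000000 }'
}

# A maximum-length request (15,000 characters):
estimate_credits 15000
```

Under these assumptions the cost scales directly with text length, so splitting a long script into several requests should not change the total.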
Quick Start
curl -X POST https://query.genx.sh/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"grok-tts","params":{"text":"Hello, welcome to GenX. How can I help you today?","voice_id":"eve","language":"en","codec":"mp3"}}'
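
When the text contains apostrophes or inline speech tags, JSON embedded in a single-quoted shell string becomes awkward to escape. One possible approach, sketched below, is to write the request body to a file and pass it with curl's --data @file form. The helper name write_payload and the file name payload.json are hypothetical, and the placement of the tags inside the text is only an illustration.

```shell
# Sketch: write the request body to a file so the text can contain
# apostrophes and inline speech tags without shell escaping.
# write_payload is a hypothetical helper; its argument is the output path.
write_payload() {
  cat > "$1" <<'EOF'
{
  "model": "grok-tts",
  "params": {
    "text": "Welcome back. [pause] <whisper>Here's a secret.</whisper> [laugh]",
    "voice_id": "eve",
    "language": "en",
    "codec": "mp3"
  }
}
EOF
}

write_payload payload.json

# Send it (requires network access and a valid key):
# curl -X POST https://query.genx.sh/api/v1/generate \
#   -H "Authorization: Bearer YOUR_API_KEY" \
#   -H "Content-Type: application/json" \
#   --data @payload.json
```

The quoted heredoc delimiter ('EOF') keeps the shell from expanding anything inside the JSON, so the file contents match what is typed.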
Parameters
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | required | - | The text to convert to speech (max 15,000 characters). Supports inline speech tags such as [pause], [laugh], and <whisper>text</whisper>. |
| voice_id | string | optional | eve | Voice ID. Call GET /api/v1/models/grok-tts/voices to browse 169 voices across 36 languages, filterable by language, gender, tone, and use_case. Legacy IDs eve, ara, rex, sal, leo continue to work. |
| language | string | required | en | BCP-47 language code (en, es, fr, de, ja, ko, hi, ar, ru, pt, vi, sv-se, and many more). Match the voice's native language for best results. |
| codec | string | optional | mp3 | Audio codec: mp3, wav, pcm, mulaw, alaw. |
| sample_rate | number | optional | 24000 | Sample rate in Hz: 8000, 16000, 22050, 24000, 44100, 48000. |
| bit_rate | number | optional | 128000 | Bit rate for MP3 only: 32000, 64000, 96000, 128000, 192000. |
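
The voice_id description mentions a catalog endpoint with filters. A minimal sketch of building such a request follows. Only the path /api/v1/models/grok-tts/voices comes from the table; the host (borrowed from the Quick Start), the use of query-string parameters named after the filters, the example value female, and the need for an Authorization header are all assumptions. voices_url is a hypothetical helper.

```shell
# Sketch: build a voices-catalog URL filtered by language and gender.
# Assumptions: the host matches the Quick Start, and filters are passed as
# query-string parameters named after the documented filter fields.
voices_url() {
  printf 'https://query.genx.sh/api/v1/models/grok-tts/voices?language=%s&gender=%s\n' "$1" "$2"
}

voices_url es female

# Fetch the filtered catalog (requires network access and a valid key):
# curl -H "Authorization: Bearer YOUR_API_KEY" "$(voices_url es female)"
```

If the catalog accepts these filters, the returned voice IDs could then be passed as voice_id in a generate request, with language set to the voice's native language.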