| Item | Details |
|---|---|
| Duration | 1.5 hours |
| Day | 2 of 2 |
Learning Objectives
By the end of this module, students will be able to:
- Select appropriate TTS engines and voices
- Configure language settings for different locales
- Implement filler phrases for natural conversation
- Add pronunciation rules for correct TTS output
- Test voice configurations effectively
Topics
1. Text-to-Speech Engines (25 min)
Available TTS Engines
| Provider | Engine Code | Example Voice | Reference |
|---|---|---|---|
| Amazon Polly | amazon | amazon.Joanna-Neural | Voice IDs |
| Cartesia | cartesia | cartesia.a167e0f3-df7e-4d52-a9c3-f949145efdab | Voice IDs |
| Deepgram | deepgram | deepgram.aura-asteria-en | Voice IDs |
| ElevenLabs | elevenlabs | elevenlabs.thomas | Voice IDs |
| Google Cloud | gcloud | gcloud.en-US-Casual-K | Voice IDs |
| Microsoft Azure | azure | azure.en-US-AvaNeural | Voice IDs |
| OpenAI | openai | openai.alloy | Voice IDs |
| Rime | rime | rime.luna:arcana | Voice IDs |
Pricing: Rime, Cartesia, and ElevenLabs are premium, usage-based providers. Check your SignalWire dashboard for pricing details.
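Every voice string in the table has the same shape: the engine code, a dot, then the provider's voice ID. The sketch below splits such a string at its first dot to show how the two parts line up with the Engine Code and Example Voice columns. The split_voice helper is hypothetical, and splitting at the first dot is an assumption drawn from the examples above, not documented SDK behavior.

```python
# Engine codes taken from the table above.
KNOWN_ENGINES = {"amazon", "cartesia", "deepgram", "elevenlabs",
                 "gcloud", "azure", "openai", "rime"}

def split_voice(voice: str) -> tuple[str, str]:
    """Split an 'engine.voice_id' string at the first dot (hypothetical helper)."""
    engine, sep, voice_id = voice.partition(".")
    if not sep or not engine or not voice_id:
        raise ValueError(f"expected 'engine.voice_id', got {voice!r}")
    return engine, voice_id

for v in ["amazon.Joanna-Neural", "gcloud.en-US-Casual-K", "rime.luna:arcana"]:
    engine, voice_id = split_voice(v)
    # Everything after the first dot belongs to the provider's voice ID.
    print(f"{engine} -> {voice_id} (known engine: {engine in KNOWN_ENGINES})")
```

Splitting only at the first dot keeps provider IDs that contain their own punctuation, such as rime.luna:arcana or a Cartesia UUID, intact.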
Voice Selection
```python
# Choose ONE of these per language; they are alternatives, not a sequence.
# Rime voices
agent.add_language("English", "en-US", "rime.spore")
# Google Cloud voices
agent.add_language("English", "en-US", "gcloud.en-US-Casual-K")
# Amazon Polly voices
agent.add_language("English", "en-US", "amazon.Joanna-Neural")
# ElevenLabs voices
agent.add_language("English", "en-US", "elevenlabs.thomas")
```
Popular Rime Voices
| Voice | Style | Best For |
|---|---|---|
| rime.spore | Professional, clear | Business, support |
| rime.marsh | Warm, friendly | Hospitality, sales |
| rime.cove | Calm, measured | Healthcare, finance |
| rime.brook | Energetic, upbeat | Marketing, entertainment |
Choosing the Right Voice
Consider:
- Brand alignment - Does the voice match your brand?
- Use case - Support calls need a different voice than sales calls
- Clarity - Can callers understand it easily?
- Fatigue - How does it sound over long calls?
2. Language Configuration (20 min)
Basic Language Setup
```python
agent.add_language(
    name="English",     # Display name
    code="en-US",       # BCP-47 language code
    voice="rime.spore"  # TTS voice
)
```
Common Language Codes
| Language | Code | Example Voice |
|---|---|---|
| English (US) | en-US | rime.spore |
| English (UK) | en-GB | gcloud.en-GB-Neural2-A |
| Spanish (US) | es-US | gcloud.es-US-Neural2-A |
| Spanish (MX) | es-MX | gcloud.es-MX-Neural2-A |
| French (CA) | fr-CA | gcloud.fr-CA-Neural2-A |
| French (FR) | fr-FR | gcloud.fr-FR-Neural2-A |
| German | de-DE | gcloud.de-DE-Neural2-A |
| Portuguese (BR) | pt-BR | gcloud.pt-BR-Neural2-A |
| Mandarin | zh-CN | gcloud.cmn-CN-Neural2-A |
| Japanese | ja-JP | gcloud.ja-JP-Neural2-A |
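The codes above all follow the language-REGION shape: a lowercase language subtag, a hyphen, and an uppercase region subtag. Full BCP-47 (RFC 5646) allows much more, such as script and variant subtags, so the check below is a deliberately simplified sketch that accepts only this common two-part form. looks_like_locale is a hypothetical helper you might run before calling add_language().

```python
import re

# Simplified: 2-3 lowercase letters, a hyphen, then 2 uppercase letters.
# Real BCP-47 tags can be richer (e.g. "zh-Hans-CN"); this check rejects those.
_LOCALE = re.compile(r"[a-z]{2,3}-[A-Z]{2}")

def looks_like_locale(code: str) -> bool:
    """Return True if code has the simple language-REGION form."""
    return _LOCALE.fullmatch(code) is not None

for code in ["en-US", "pt-BR", "english-us", "en_US", "EN-us"]:
    print(code, looks_like_locale(code))
```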
Multi-Language Agents
```python
# Primary language first
agent.add_language("English", "en-US", "rime.spore")
# Additional languages
agent.add_language("Spanish", "es-US", "gcloud.es-US-Neural2-A")
agent.add_language("French", "fr-CA", "gcloud.fr-CA-Neural2-A")
```
With several languages configured, the AI can detect which language the caller is speaking and switch to it automatically.
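One way to picture a multi-language setup is as an ordered list whose first entry is the default. The sketch below models that with plain dictionaries and a hypothetical voice_for helper that returns the voice for a detected language code and falls back to the first (primary) entry. The fallback rule is an assumption used to show why ordering matters; this is not how the SignalWire platform performs detection or switching internally.

```python
# Ordered like the add_language() calls above: primary language first.
LANGUAGES = [
    {"name": "English", "code": "en-US", "voice": "rime.spore"},
    {"name": "Spanish", "code": "es-US", "voice": "gcloud.es-US-Neural2-A"},
    {"name": "French", "code": "fr-CA", "voice": "gcloud.fr-CA-Neural2-A"},
]

def voice_for(detected_code: str, languages: list[dict] = LANGUAGES) -> str:
    """Return the voice for detected_code, or the primary language's voice."""
    for lang in languages:
        if lang["code"] == detected_code:
            return lang["voice"]
    # Assumed behavior: an unconfigured language falls back to the first entry.
    return languages[0]["voice"]

print(voice_for("es-US"))
print(voice_for("de-DE"))
```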
3. Filler Phrases (25 min)
What Are Fillers?
Fillers are phrases spoken while the AI is processing or waiting. There are two types:
| Type | Purpose | Example |
|---|---|---|
| Speech fillers | Natural hesitation words during pauses | “Um”, “Uh”, “Well”, “Let me think” |
| Function fillers | Phrases while executing functions | “One moment please”, “Let me check that” |
Both types fill silences that would otherwise make the conversation feel stalled.
Adding Language-Level Fillers
Important: You must provide BOTH speech_fillers AND function_fillers together. If you only provide one, the SDK falls back to a deprecated format.
```python
agent.add_language(
    "English",
    "en-US",
    "rime.spore",
    speech_fillers=["Um", "Uh", "Well", "Let me think"],
    function_fillers=[
        "One moment please...",
        "Let me look into that...",
        "Sure, checking now..."
    ]
)
```
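Because supplying only one filler type falls back to the deprecated format rather than failing, it can be worth failing fast in your own code. The check_fillers function below is a hypothetical pre-flight helper, not part of the SDK; it raises if exactly one of the two lists is provided.

```python
from typing import Optional

def check_fillers(speech_fillers: Optional[list[str]],
                  function_fillers: Optional[list[str]]) -> None:
    """Raise ValueError if only one of the two filler lists is supplied."""
    # An empty list counts as "not provided".
    has_speech = bool(speech_fillers)
    has_function = bool(function_fillers)
    if has_speech != has_function:
        missing = "function_fillers" if has_speech else "speech_fillers"
        raise ValueError(f"provide {missing} too; both filler types are required together")

# Both present passes; neither present also passes (no fillers configured).
check_fillers(["Um", "Well"], ["One moment please..."])
check_fillers(None, None)

try:
    check_fillers(["Um", "Uh"], None)
except ValueError as err:
    print(err)
```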
Context-Appropriate Fillers
Professional/Formal:
```python
agent.add_language(
    "English", "en-US", "rime.spore",
    speech_fillers=["Well", "Let me see", "One moment"],
    function_fillers=[
        "One moment please...",
        "Allow me to check...",
        "I'll look into that for you..."
    ]
)
```
Casual/Friendly:
```python
agent.add_language(
    "English", "en-US", "rime.marsh",
    speech_fillers=["Um", "So", "Let's see"],
    function_fillers=[
        "Sure thing, checking now!",
        "One sec...",
        "Got it, looking now..."
    ]
)
```
Healthcare/Sensitive:
```python
agent.add_language(
    "English", "en-US", "rime.cove",
    speech_fillers=["I see", "Of course", "Certainly"],
    function_fillers=[
        "I understand, let me help with that...",
        "Of course, checking your information now...",
        "I'm looking into this for you..."
    ]
)
```
Function-Specific Fillers
Set fillers for specific functions that take time. This uses a separate fillers parameter on the @tool decorator (not the same as language-level fillers):
```python
@agent.tool(
    description="Look up order status",
    fillers=[
        "Looking up your order now...",
        "Checking the system for your order...",
        "One moment while I find that..."
    ]
)
def get_order_status(order_id: str):
    # Takes a few seconds
    return f"Order {order_id} shipped yesterday."
```
Note: Function-level fillers on @tool are different from language-level speech_fillers/function_fillers. Function-level fillers override the language-level fillers for that specific function.
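The override rule can be modeled as a simple lookup: if the function defines its own fillers, use them; otherwise fall back to the language-level function_fillers. The pick_filler helper and both tables are hypothetical, and choosing one phrase at random is an assumption of the sketch. It shows the precedence described in the note, not the platform's actual implementation.

```python
import random
from typing import Optional

LANGUAGE_FUNCTION_FILLERS = ["One moment please...", "Let me look into that..."]

TOOL_FILLERS = {
    "get_order_status": ["Looking up your order now...",
                         "One moment while I find that..."],
    # Tools without an entry here have no function-level fillers.
}

def pick_filler(tool_name: str, rng: Optional[random.Random] = None) -> str:
    """Pick a filler phrase: function-level fillers win over language-level ones."""
    rng = rng or random.Random()
    candidates = TOOL_FILLERS.get(tool_name) or LANGUAGE_FUNCTION_FILLERS
    return rng.choice(candidates)

print(pick_filler("get_order_status"))  # from the tool's own list
print(pick_filler("get_weather"))       # falls back to the language-level list
```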
4. Speech Recognition Optimization (15 min)
Language Code Importance
The language code affects speech recognition accuracy:
```python
# US English - recognizes American accents better
agent.add_language("English", "en-US", "rime.spore")
# UK English - recognizes British accents better
agent.add_language("English", "en-GB", "rime.cove")
```
Hints for Better Recognition
Add hints for domain-specific terms:
```python
agent.set_hints([
    "Acme",        # Company name
    "TechCorp",    # Partner name
    "SKU",         # Industry term
    "API",         # Technical term
    "A B C 1 2 3"  # Common format
])
```
We’ll cover hints in detail in Module 1.7.
5. Pronunciation Rules (15 min)
What Are Pronunciation Rules?
Pronunciation rules tell the TTS engine how to say specific words correctly. This is essential for:
- Acronyms - “API” should be “A P I” not “appy”
- Brand names - “SignalWire” should be “Signal Wire”
- Technical terms - “PostgreSQL” should be “Postgres Q L”
- Industry jargon - “SaaS” should be “sass” not “S A A S”
Adding Pronunciation Rules
```python
# Spell out acronyms letter by letter
agent.add_pronunciation("API", "A P I")
agent.add_pronunciation("SDK", "S D K")
agent.add_pronunciation("CLI", "C L I")
# Pronounce as a word
agent.add_pronunciation("SIP", "sip", ignore_case=True)
agent.add_pronunciation("VoIP", "voyp")
# Brand names
agent.add_pronunciation("SignalWire", "Signal Wire")
agent.add_pronunciation("PostgreSQL", "Postgres Q L")
```
The ignore_case Parameter
By default, pronunciation matching is case-sensitive. Use ignore_case=True for flexible matching:
```python
# Only matches "API" exactly
agent.add_pronunciation("API", "A P I")
# Matches "api", "Api", "API", etc.
agent.add_pronunciation("api", "A P I", ignore_case=True)
```
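To see what ignore_case changes, the sketch below applies (replace, with, ignore_case) rules to a sentence using Python's re module. Treating each rule as a whole-word match is an assumption made for the sketch; the TTS pipeline's exact matching rules may differ, so confirm the behavior with a test call. apply_pronunciations is a hypothetical helper.

```python
import re

def apply_pronunciations(text: str, rules: list[tuple[str, str, bool]]) -> str:
    """Apply (replace, with, ignore_case) rules in order, matching whole words."""
    for replace, with_text, ignore_case in rules:
        flags = re.IGNORECASE if ignore_case else 0
        pattern = r"\b" + re.escape(replace) + r"\b"
        # A function replacement keeps backslashes in with_text literal.
        text = re.sub(pattern, lambda _m: with_text, text, flags=flags)
    return text

sentence = "The API docs say: call the api, then the Api."
print(apply_pronunciations(sentence, [("API", "A P I", False)]))  # only "API" changes
print(apply_pronunciations(sentence, [("api", "A P I", True)]))   # all three change
```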
Common Pronunciation Patterns
| Term | Pronunciation | Notes |
|---|---|---|
| API | A P I | Spell out |
| SDK | S D K | Spell out |
| HTTP | H T T P | Spell out |
| URL | U R L | Spell out |
| SIP | sip | Say as word |
| VoIP | voyp | Say as word |
| SQL | sequel or S Q L | Either works |
| GIF | gif or jif | Your choice! |
Bulk Pronunciation Setup
For many rules, use set_pronunciations():
```python
agent.set_pronunciations([
    {"replace": "API", "with": "A P I"},
    {"replace": "SDK", "with": "S D K"},
    {"replace": "HTTP", "with": "H T T P", "ignore_case": True},
    {"replace": "SignalWire", "with": "Signal Wire"}
])
```
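When most rules are acronyms spelled letter by letter, the list for set_pronunciations() can be generated instead of typed by hand. The spell_out and build_rules helpers below are hypothetical conveniences; they only produce the same replace/with dictionaries shown above, which you would then pass to agent.set_pronunciations().

```python
def spell_out(term: str) -> str:
    """Insert spaces between letters: 'HTTP' -> 'H T T P'."""
    return " ".join(term)

def build_rules(acronyms: list[str], custom: dict[str, str]) -> list[dict]:
    """Build set_pronunciations()-style dicts: spelled-out acronyms plus custom entries."""
    rules = [{"replace": a, "with": spell_out(a)} for a in acronyms]
    rules += [{"replace": term, "with": spoken} for term, spoken in custom.items()]
    return rules

rules = build_rules(["API", "SDK", "HTTP"],
                    {"SignalWire": "Signal Wire", "VoIP": "voyp"})
for rule in rules:
    print(rule)
# Then, in your agent (requires the SignalWire SDK):
# agent.set_pronunciations(rules)
```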
Pronunciation vs Hints
| Feature | Purpose | Example |
|---|---|---|
| Hints | Help ASR hear words correctly | agent.set_hints(["SignalWire"]) |
| Pronunciation | Help TTS say words correctly | agent.add_pronunciation("API", "A P I") |
Use both together for best results:
```python
# Help recognize AND pronounce correctly
agent.set_hints(["SignalWire", "API", "SDK"])
agent.add_pronunciation("API", "A P I")
agent.add_pronunciation("SDK", "S D K")
agent.add_pronunciation("SignalWire", "Signal Wire")
```
Pattern Hints (ASR Replacement)
Sometimes users say something that should be interpreted as something else. Pattern hints let you replace what the ASR heard with what you want the AI to receive.
This is the reverse of pronunciation: instead of changing how words are spoken, you’re changing how heard words are interpreted.
```python
# "swimmel" -> "SWML" (common mispronunciation)
agent.add_pattern_hint(
    hint="SWML",        # What to help recognize
    pattern="swimmel",  # What users might say
    replace="SWML",     # What to send to the AI
    ignore_case=True
)

# "swig" or "schwaig" -> "SWAIG"
agent.add_pattern_hint(
    hint="SWAIG",
    pattern="swig|schwaig",
    replace="SWAIG",
    ignore_case=True
)

# Phone number format normalization
agent.add_pattern_hint(
    hint="phone format",
    pattern=r"(\d{3})\s*(\d{3})\s*(\d{4})",
    replace=r"(\1) \2-\3",
    ignore_case=False
)
```
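The replacement step behaves like a regular-expression substitution on the transcript. The sketch below reproduces the three examples with Python's re.sub so you can check a pattern before deploying it. It assumes the platform's regex dialect treats these patterns and the \1-style backreferences the way Python does, which you should verify with a test call; apply_pattern_hints is a hypothetical helper.

```python
import re

# (pattern, replace, ignore_case) triples mirroring the add_pattern_hint() calls above.
PATTERN_HINTS = [
    ("swimmel", "SWML", True),
    ("swig|schwaig", "SWAIG", True),
    (r"(\d{3})\s*(\d{3})\s*(\d{4})", r"(\1) \2-\3", False),
]

def apply_pattern_hints(transcript: str) -> str:
    """Rewrite an ASR transcript the way the pattern hints describe."""
    for pattern, replace, ignore_case in PATTERN_HINTS:
        flags = re.IGNORECASE if ignore_case else 0
        transcript = re.sub(pattern, replace, transcript, flags=flags)
    return transcript

print(apply_pattern_hints("How do I write Swimmel and call swig?"))
print(apply_pattern_hints("My number is 555 123 4567"))
```

Unanchored patterns such as swig also match inside longer words ("swigging" becomes "SWAIGging" in Python); adding \b word boundaries, if the platform's dialect supports them, avoids that.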
Complete Speech Optimization Example
```python
# 1. Hints: Help ASR recognize domain terms
agent.set_hints(["SWML", "SWAIG", "SignalWire", "API"])
# 2. Pattern hints: Fix common mishearings
agent.add_pattern_hint("SWML", "swimmel", "SWML", ignore_case=True)
agent.add_pattern_hint("SWAIG", "swig", "SWAIG", ignore_case=True)
# 3. Pronunciation: Help TTS say terms correctly
agent.add_pronunciation("API", "A P I")
agent.add_pronunciation("SWML", "swimmel")  # Say it phonetically
agent.add_pronunciation("SignalWire", "Signal Wire")
```
Summary:
- Hints → Help ASR recognize words
- Pattern Hints → Transform what was heard into something else
- Pronunciation → Help TTS speak words correctly
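Putting the three mechanisms in order: pattern hints act on what the caller said before it reaches the AI, and pronunciation rules act on what the AI says before it reaches TTS. The round trip below models only these two text transformations, since hints bias the recognizer itself and cannot be simulated in plain Python. The inbound and outbound functions are hypothetical and use simple regex substitution.

```python
import re

def inbound(transcript: str) -> str:
    """Pattern hints: what the ASR heard -> what the AI receives."""
    transcript = re.sub("swimmel", "SWML", transcript, flags=re.IGNORECASE)
    return re.sub("swig", "SWAIG", transcript, flags=re.IGNORECASE)

def outbound(reply: str) -> str:
    """Pronunciations: what the AI writes -> what the TTS engine speaks."""
    for term, spoken in [("API", "A P I"), ("SWML", "swimmel"),
                         ("SignalWire", "Signal Wire")]:
        reply = re.sub(r"\b" + re.escape(term) + r"\b", spoken, reply)
    return reply

print(inbound("what is swimmel"))
print(outbound("SWML is SignalWire's markup, and you can fetch it from the API."))
```

The caller's "swimmel" reaches the AI as SWML, and the AI's SWML is spoken back as "swimmel", which is exactly the pairing set up in the complete example above.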
6. Testing Voice Configuration (15 min)
Using swaig-test
```bash
# View language configuration in SWML
swaig-test agent.py --dump-swml | grep -A 20 '"languages"'
```
Expected Output
```json
"languages": [
  {
    "name": "English",
    "code": "en-US",
    "voice": "rime.spore",
    "speech_fillers": ["Um", "Uh", "Well"],
    "function_fillers": [
      "One moment please...",
      "Let me check..."
    ]
  }
]
```
Note: If you only see a fillers field (not speech_fillers and function_fillers), you need to provide both filler types in add_language().
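For more than a quick look, the same check can be scripted. Assuming you have saved the dumped SWML as JSON (for example, by redirecting the swaig-test output to a file and removing any non-JSON log lines), the sketch below walks the document, collects every entry of every languages list wherever it is nested, and reports entries missing either filler field. The sample document is a trimmed, hypothetical example, and find_languages and missing_fillers are hypothetical helpers.

```python
import json

SAMPLE_SWML = """
{"version": "1.0.0",
 "sections": {"main": [{"ai": {"languages": [
   {"name": "English", "code": "en-US", "voice": "rime.spore",
    "speech_fillers": ["Um", "Uh", "Well"],
    "function_fillers": ["One moment please...", "Let me check..."]},
   {"name": "Spanish", "code": "es-US", "voice": "gcloud.es-US-Neural2-A",
    "fillers": ["Un momento por favor..."]}
 ]}}]}}
"""

def find_languages(node):
    """Yield every language entry under any 'languages' key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "languages" and isinstance(value, list):
                yield from value
            else:
                yield from find_languages(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_languages(item)

def missing_fillers(swml: dict) -> list[str]:
    """Names of languages lacking speech_fillers or function_fillers."""
    return [
        lang.get("name", "?")
        for lang in find_languages(swml)
        if "speech_fillers" not in lang or "function_fillers" not in lang
    ]

print(missing_fillers(json.loads(SAMPLE_SWML)))
```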
Live Testing Checklist
When testing with real calls:
- Voice is clear and understandable
- Pronunciation of company/product names is correct
- Fillers play at appropriate times
- Language matches user’s speech
- No awkward pauses
Configuration Patterns
Professional Support Agent
```python
agent.add_language(
    "English",
    "en-US",
    "rime.spore",  # Professional voice
    speech_fillers=["Well", "Let me see", "One moment"],
    function_fillers=[
        "One moment please...",
        "Let me check on that for you...",
        "I'm looking into this now..."
    ]
)
```
Friendly Sales Agent
```python
agent.add_language(
    "English",
    "en-US",
    "rime.marsh",  # Warm, friendly voice
    speech_fillers=["Um", "So", "Oh"],
    function_fillers=[
        "Great question! Let me check...",
        "Let me find that for you...",
        "Sure thing, looking now!",
        "Absolutely, checking now..."
    ]
)
```
Healthcare Information Line
```python
agent.add_language(
    "English",
    "en-US",
    "rime.cove",  # Calm, measured voice
    speech_fillers=["I see", "Of course", "Certainly"],
    function_fillers=[
        "I understand, let me help...",
        "Of course, one moment...",
        "I'm checking that information now..."
    ]
)

# Spanish option
agent.add_language(
    "Spanish",
    "es-US",
    "gcloud.es-US-Neural2-A",
    speech_fillers=["Bueno", "A ver", "Pues"],
    function_fillers=[
        "Un momento por favor...",
        "Déjeme verificar...",
        "Estoy buscando esa información..."
    ]
)
```
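If an agent needs to switch between these patterns (for example, per deployment), keeping them as data avoids copy-pasting add_language() calls. The PROFILES table and language_kwargs helper below are a hypothetical way to organize this, not an SDK feature; the keys match the voice, speech_fillers, and function_fillers parameters used throughout this module.

```python
PROFILES = {
    "support": {
        "voice": "rime.spore",
        "speech_fillers": ["Well", "Let me see", "One moment"],
        "function_fillers": ["One moment please...", "Let me check on that for you..."],
    },
    "sales": {
        "voice": "rime.marsh",
        "speech_fillers": ["Um", "So", "Oh"],
        "function_fillers": ["Great question! Let me check...", "Sure thing, looking now!"],
    },
    "healthcare": {
        "voice": "rime.cove",
        "speech_fillers": ["I see", "Of course", "Certainly"],
        "function_fillers": ["I understand, let me help...", "Of course, one moment..."],
    },
}

def language_kwargs(profile: str) -> dict:
    """Return a copy of a profile's settings, ready to unpack into add_language()."""
    settings = PROFILES[profile]
    # Copy the lists so callers cannot accidentally mutate the shared table.
    return {key: list(value) if isinstance(value, list) else value
            for key, value in settings.items()}

kwargs = language_kwargs("healthcare")
print(kwargs["voice"])
# In your agent (requires the SignalWire SDK):
# agent.add_language("English", "en-US", **kwargs)
```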
Common Mistakes
1. Wrong Language Code Format
Wrong:
```python
agent.add_language("English", "english-us", "rime.spore")
```
Right:
```python
agent.add_language("English", "en-US", "rime.spore")
```
2. Only Providing One Filler Type
Wrong (falls back to deprecated fillers field):
```python
agent.add_language("English", "en-US", "rime.spore",
                   speech_fillers=["Um", "Uh", "Well"])
# Missing function_fillers!
```
Right (both types required):
```python
agent.add_language("English", "en-US", "rime.spore",
                   speech_fillers=["Um", "Uh", "Well"],
                   function_fillers=["One moment please...", "Let me check..."])
```
3. Too Many Fillers
Wrong:
```python
speech_fillers=[
    "Um...", "Uh...", "Well...", "So...",
    "Let's see...", "Hmm...",  # 20 more...
]
```
Keep it to 3-5 appropriate fillers per type.
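The 3-5 guideline is easy to enforce automatically. The hypothetical lint_filler_counts helper below returns a warning for each filler list that falls outside a configurable range; the default bounds come straight from the guideline above.

```python
def lint_filler_counts(speech_fillers: list[str], function_fillers: list[str],
                       low: int = 3, high: int = 5) -> list[str]:
    """Return a warning for each filler list whose length is outside [low, high]."""
    warnings = []
    for label, fillers in (("speech_fillers", speech_fillers),
                           ("function_fillers", function_fillers)):
        if not low <= len(fillers) <= high:
            warnings.append(f"{label} has {len(fillers)} entries; aim for {low}-{high}")
    return warnings

too_many = ["Um...", "Uh...", "Well...", "So...", "Let's see...", "Hmm..."]
print(lint_filler_counts(too_many, ["One moment please...", "Let me check...",
                                    "Sure, checking now..."]))
```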
4. Mismatched Tone
Wrong:
```python
# Professional prompt but casual fillers
agent.prompt_add_section("Role", "You are a formal legal assistant.")
agent.add_language("English", "en-US", "rime.spore",
                   speech_fillers=["Yo", "Like", "Ya know"],
                   function_fillers=["Sure thing!", "No prob!", "Gotcha!"])
```
Key Takeaways
- Voice selection matters - Match voice to brand and use case
- Language codes enable ASR - Correct codes improve recognition
- Fillers create naturalness - Fill processing gaps
- Pronunciation ensures clarity - Help TTS say acronyms and names correctly
- Test with real calls - swaig-test shows config, calls show experience
- Consistency is key - Voice, fillers, and prompts should align
Preparation for Lab 1.6
- Working agent with prompts configured
- Think about your agent’s “voice personality”
- Consider what fillers fit your use case
Lab Preview
In Lab 1.6, you will:
- Select and configure a voice
- Add appropriate filler phrases
- Test voice output
- Optionally add a second language