Module 1.6: Voice and Language Configuration


Duration	1.5 hours
Day	2 of 2

Learning Objectives

By the end of this module, students will be able to:

Select appropriate TTS engines and voices
Configure language settings for different locales
Implement filler phrases for natural conversation
Add pronunciation rules for correct TTS output
Test voice configurations effectively

Topics

1. Text-to-Speech Engines (25 min)

Available TTS Engines

Provider	Engine Code	Example Voice	Reference
Amazon Polly	`amazon`	`amazon.Joanna-Neural`	Voice IDs
Cartesia	`cartesia`	`cartesia.a167e0f3-df7e-4d52-a9c3-f949145efdab`	Voice IDs
Deepgram	`deepgram`	`deepgram.aura-asteria-en`	Voice IDs
ElevenLabs	`elevenlabs`	`elevenlabs.thomas`	Voice IDs
Google Cloud	`gcloud`	`gcloud.en-US-Casual-K`	Voice IDs
Microsoft Azure	`azure`	`azure.en-US-AvaNeural`	Voice IDs
OpenAI	`openai`	`openai.alloy`	Voice IDs
Rime	`rime`	`rime.luna:arcana`	Voice IDs

Pricing: Rime, Cartesia, and ElevenLabs are premium, usage-based providers. Check your SignalWire dashboard for pricing details.
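
The example voices in the table all follow an engine.voice shape. The sketch below splits such a string into its parts. It assumes the text before the first dot is always the engine code, and that a trailing ":" suffix (as in `rime.luna:arcana`) names a model variant; that reading is inferred from the table, not confirmed by it. `split_voice` and `VoiceId` are hypothetical names, not part of the SDK.

```python
from typing import NamedTuple, Optional


class VoiceId(NamedTuple):
    engine: str
    voice: str
    model: Optional[str]


def split_voice(voice_id: str) -> VoiceId:
    """Split 'engine.voice[:model]' into its parts (hypothetical helper)."""
    engine, sep, rest = voice_id.partition(".")
    if not sep or not engine or not rest:
        raise ValueError(f"expected 'engine.voice', got {voice_id!r}")
    # Assumption: a ':' suffix selects a model variant, as in 'rime.luna:arcana'.
    voice, _, model = rest.partition(":")
    return VoiceId(engine, voice, model or None)


print(split_voice("amazon.Joanna-Neural"))
print(split_voice("rime.luna:arcana"))
```

A check like this can catch a voice string that is missing its engine prefix before the agent is deployed.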

Voice Selection

# Each call below is an alternative; pick one voice per language.
# Rime voices
agent.add_language("English", "en-US", "rime.spore")

# Google Cloud voices
agent.add_language("English", "en-US", "gcloud.en-US-Casual-K")

# Amazon Polly voices
agent.add_language("English", "en-US", "amazon.Joanna-Neural")

# ElevenLabs voices
agent.add_language("English", "en-US", "elevenlabs.thomas")

Popular Rime Voices

Voice	Style	Best For
`rime.spore`	Professional, clear	Business, support
`rime.marsh`	Warm, friendly	Hospitality, sales
`rime.cove`	Calm, measured	Healthcare, finance
`rime.brook`	Energetic, upbeat	Marketing, entertainment

Choosing the Right Voice

Consider:

Brand alignment - Does the voice match your brand?
Use case - Support needs a different voice than sales
Clarity - Can users understand clearly?
Fatigue - How does it sound over long calls?

2. Language Configuration (20 min)

Basic Language Setup

agent.add_language(
    name="English",        # Display name
    code="en-US",         # BCP-47 language code
    voice="rime.spore"    # TTS voice
)

Common Language Codes

Language	Code	Example Voice
English (US)	`en-US`	`rime.spore`
English (UK)	`en-GB`	`gcloud.en-GB-Neural2-A`
Spanish (US)	`es-US`	`gcloud.es-US-Neural2-A`
Spanish (MX)	`es-MX`	`gcloud.es-MX-Neural2-A`
French (CA)	`fr-CA`	`gcloud.fr-CA-Neural2-A`
French (FR)	`fr-FR`	`gcloud.fr-FR-Neural2-A`
German	`de-DE`	`gcloud.de-DE-Neural2-A`
Portuguese (BR)	`pt-BR`	`gcloud.pt-BR-Neural2-A`
Mandarin	`zh-CN`	`gcloud.cmn-CN-Neural2-A`
Japanese	`ja-JP`	`gcloud.ja-JP-Neural2-A`

Multi-Language Agents

# Primary language first
agent.add_language("English", "en-US", "rime.spore")

# Additional languages
agent.add_language("Spanish", "es-US", "gcloud.es-US-Neural2-A")
agent.add_language("French", "fr-CA", "gcloud.fr-CA-Neural2-A")

The AI detects the caller's language and switches automatically.
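
When an agent supports several languages, keeping them in one table makes the primary-first ordering explicit. The sketch below is one possible way to do that. It uses only the three-argument `add_language` call shown above; `register_languages` and `RecordingAgent` are hypothetical, and the stub exists only so the example runs without the SDK installed.

```python
# Language table: (display name, BCP-47 code, voice). Primary language first.
LANGUAGES = [
    ("English", "en-US", "rime.spore"),
    ("Spanish", "es-US", "gcloud.es-US-Neural2-A"),
    ("French", "fr-CA", "gcloud.fr-CA-Neural2-A"),
]


def register_languages(agent, languages=LANGUAGES):
    """Call agent.add_language once per row, preserving order."""
    for name, code, voice in languages:
        agent.add_language(name, code, voice)


class RecordingAgent:
    """Stand-in for a real agent so the sketch runs without the SDK."""

    def __init__(self):
        self.calls = []

    def add_language(self, name, code, voice, **kwargs):
        self.calls.append((name, code, voice))


stub = RecordingAgent()
register_languages(stub)
print(stub.calls[0])  # the primary language row
```

With a real agent, `register_languages(agent)` would replace the three separate calls.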

3. Filler Phrases (25 min)

What Are Fillers?

Fillers are phrases spoken while the AI is processing or waiting. There are two types:

Type	Purpose	Example
Speech fillers	Natural hesitation words during pauses	“Um”, “Uh”, “Well”, “Let me think”
Function fillers	Phrases while executing functions	“One moment please”, “Let me check that”

They make the conversation feel natural.

Adding Language-Level Fillers

Important: You must provide BOTH speech_fillers AND function_fillers together. If you only provide one, the SDK falls back to a deprecated format.

agent.add_language(
    "English",
    "en-US",
    "rime.spore",
    speech_fillers=["Um", "Uh", "Well", "Let me think"],
    function_fillers=[
        "One moment please...",
        "Let me look into that...",
        "Sure, checking now..."
    ]
)

Context-Appropriate Fillers

Professional/Formal:

agent.add_language(
    "English", "en-US", "rime.spore",
    speech_fillers=["Well", "Let me see", "One moment"],
    function_fillers=[
        "One moment please...",
        "Allow me to check...",
        "I'll look into that for you..."
    ]
)

Casual/Friendly:

agent.add_language(
    "English", "en-US", "rime.marsh",
    speech_fillers=["Um", "So", "Let's see"],
    function_fillers=[
        "Sure thing, checking now!",
        "One sec...",
        "Got it, looking now..."
    ]
)

Healthcare/Sensitive:

agent.add_language(
    "English", "en-US", "rime.cove",
    speech_fillers=["I see", "Of course", "Certainly"],
    function_fillers=[
        "I understand, let me help with that...",
        "Of course, checking your information now...",
        "I'm looking into this for you..."
    ]
)

Function-Specific Fillers

Set fillers for specific functions that take time. This uses a separate fillers parameter on the @tool decorator (not the same as language-level fillers):

@agent.tool(
    description="Look up order status",
    fillers=[
        "Looking up your order now...",
        "Checking the system for your order...",
        "One moment while I find that..."
    ]
)
def get_order_status(order_id: str):
    # Takes a few seconds
    return f"Order {order_id} shipped yesterday."

Note: Fillers set on @tool are separate from the language-level speech_fillers and function_fillers. For that specific function, the @tool fillers override the language-level function_fillers.
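
The override rule can be modelled in a few lines. This is only an illustration of the precedence described in the note: the platform's actual selection logic is not shown in this module, and picking a filler at random is an assumption. `choose_filler` is a hypothetical name.

```python
import random
from typing import Optional, Sequence


def choose_filler(
    tool_fillers: Optional[Sequence[str]],
    language_function_fillers: Sequence[str],
    rng: random.Random,
) -> Optional[str]:
    """Pick a filler, preferring the tool's own list when it has one."""
    pool = tool_fillers or language_function_fillers
    # Assumption: one phrase is picked at random from whichever list applies.
    return rng.choice(list(pool)) if pool else None


rng = random.Random(0)
language_level = ["One moment please...", "Let me look into that..."]
order_tool = ["Looking up your order now..."]

print(choose_filler(order_tool, language_level, rng))  # tool list wins
print(choose_filler(None, language_level, rng))        # falls back to language list
```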

4. Speech Recognition Optimization (15 min)

Language Code Importance

The language code affects speech recognition accuracy:

# US English - recognizes American accents better
agent.add_language("English", "en-US", "rime.spore")

# UK English - recognizes British accents better
agent.add_language("English", "en-GB", "rime.cove")

Hints for Better Recognition

Add hints for domain-specific terms:

agent.set_hints([
    "Acme",              # Company name
    "TechCorp",          # Partner name
    "SKU",               # Industry term
    "API",               # Technical term
    "A B C 1 2 3"        # Common format
])

We’ll cover hints in detail in Module 1.7.

5. Pronunciation Rules (15 min)

What Are Pronunciation Rules?

Pronunciation rules tell the TTS engine how to say specific words correctly. This is essential for:

Acronyms - “API” should be “A P I” not “appy”
Brand names - “SignalWire” should be “Signal Wire”
Technical terms - “PostgreSQL” should be “Postgres Q L”
Industry jargon - “SaaS” should be “sass” not “S A A S”

Adding Pronunciation Rules

# Spell out acronyms letter by letter
agent.add_pronunciation("API", "A P I")
agent.add_pronunciation("SDK", "S D K")
agent.add_pronunciation("CLI", "C L I")

# Pronounce as a word
agent.add_pronunciation("SIP", "sip", ignore_case=True)
agent.add_pronunciation("VoIP", "voyp")

# Brand names
agent.add_pronunciation("SignalWire", "Signal Wire")
agent.add_pronunciation("PostgreSQL", "Postgres Q L")

The `ignore_case` Parameter

By default, pronunciation matching is case-sensitive. Use ignore_case=True for flexible matching:

# Only matches "API" exactly
agent.add_pronunciation("API", "A P I")

# Matches "api", "Api", "API", etc.
agent.add_pronunciation("api", "A P I", ignore_case=True)
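
To see what case-sensitive and case-insensitive matching mean in practice, the sketch below imitates a pronunciation rule with Python's `re` module. It is a local illustration, not the engine's implementation: whole-word matching is an assumption, and `apply_rule` is a hypothetical helper.

```python
import re


def apply_rule(text: str, replace: str, with_: str, ignore_case: bool = False) -> str:
    """Replace whole-word occurrences of `replace` with `with_`."""
    flags = re.IGNORECASE if ignore_case else 0
    # Assumption: rules match whole words only, so "APIs" is left alone.
    pattern = r"\b" + re.escape(replace) + r"\b"
    # A function replacement keeps `with_` literal (no backslash escapes).
    return re.sub(pattern, lambda match: with_, text, flags=flags)


print(apply_rule("Our api and API", "API", "A P I"))
# → Our api and A P I
print(apply_rule("Our api and API", "api", "A P I", ignore_case=True))
# → Our A P I and A P I
```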

Common Pronunciation Patterns

Term	Pronunciation	Notes
`API`	`A P I`	Spell out
`SDK`	`S D K`	Spell out
`HTTP`	`H T T P`	Spell out
`URL`	`U R L`	Spell out
`SIP`	`sip`	Say as word
`VoIP`	`voyp`	Say as word
`SQL`	`sequel` or `S Q L`	Either works
`GIF`	`gif` or `jif`	Your choice!

Bulk Pronunciation Setup

For many rules, use set_pronunciations():

agent.set_pronunciations([
    {"replace": "API", "with": "A P I"},
    {"replace": "SDK", "with": "S D K"},
    {"replace": "HTTP", "with": "H T T P", "ignore_case": True},
    {"replace": "SignalWire", "with": "Signal Wire"}
])
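
If the rules already live in a glossary, the list can be generated instead of typed by hand. A minimal sketch, assuming only the rule shape shown above (`replace`, `with`, and an optional `ignore_case` key); `to_rules` is a hypothetical helper.

```python
def to_rules(glossary, ignore_case=False):
    """Turn {term: spoken form} into the list-of-dicts shape shown above."""
    rules = []
    for term, spoken in glossary.items():
        rule = {"replace": term, "with": spoken}
        if ignore_case:
            rule["ignore_case"] = True
        rules.append(rule)
    return rules


GLOSSARY = {
    "API": "A P I",
    "SDK": "S D K",
    "SignalWire": "Signal Wire",
}

rules = to_rules(GLOSSARY)
print(rules[0])
```

The result could then be passed along as `agent.set_pronunciations(rules)`.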

Pronunciation vs Hints

Feature	Purpose	Example
Hints	Help ASR hear words correctly	`agent.set_hints(["SignalWire"])`
Pronunciation	Help TTS say words correctly	`agent.add_pronunciation("API", "A P I")`

Use both together for best results:

# Help recognize AND pronounce correctly
agent.set_hints(["SignalWire", "API", "SDK"])
agent.add_pronunciation("API", "A P I")
agent.add_pronunciation("SDK", "S D K")
agent.add_pronunciation("SignalWire", "Signal Wire")

Pattern Hints (ASR Replacement)

Sometimes users say something that should be interpreted as something else. Pattern hints let you replace what the ASR heard with what you want the AI to receive.

This is the reverse of pronunciation - instead of changing how words are spoken, you’re changing how heard words are interpreted.

# "swimmel" -> "SWML" (common mispronunciation)
agent.add_pattern_hint(
    hint="SWML",           # What to help recognize
    pattern="swimmel",     # What users might say
    replace="SWML",        # What to send to the AI
    ignore_case=True
)

# "swig" or "schwaig" -> "SWAIG"
agent.add_pattern_hint(
    hint="SWAIG",
    pattern="swig|schwaig",
    replace="SWAIG",
    ignore_case=True
)

# Phone number format normalization
agent.add_pattern_hint(
    hint="phone format",
    pattern=r"(\d{3})\s*(\d{3})\s*(\d{4})",
    replace=r"(\1) \2-\3",
    ignore_case=False
)
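
Patterns like these can be tried locally before they go into an agent. The sketch below assumes the platform's regex dialect behaves like Python's `re` module for these patterns, including `\1`-style backreferences; that is an assumption to verify, and `rewrite` is a hypothetical helper.

```python
import re


def rewrite(heard: str, pattern: str, replace: str, ignore_case: bool = False) -> str:
    """Apply one pattern/replace pair to a transcript string."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.sub(pattern, replace, heard, flags=flags)


print(rewrite("what is Swimmel", "swimmel", "SWML", ignore_case=True))
# → what is SWML

print(rewrite("call 555 123 4567", r"(\d{3})\s*(\d{3})\s*(\d{4})", r"(\1) \2-\3"))
# → call (555) 123-4567

# Unanchored alternatives also match inside longer words.
print(rewrite("swigging", "swig|schwaig", "SWAIG", ignore_case=True))
# → SWAIGging

# Word boundaries restrict the match to the standalone word.
print(rewrite("swigging", r"\b(?:swig|schwaig)\b", "SWAIG", ignore_case=True))
# → swigging
```

Whether to anchor with `\b` is a design choice: anchoring avoids accidental rewrites, while unanchored patterns are more forgiving of ASR spacing errors.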

Complete Speech Optimization Example

# 1. Hints: Help ASR recognize domain terms
agent.set_hints(["SWML", "SWAIG", "SignalWire", "API"])

# 2. Pattern hints: Fix common mishearings
agent.add_pattern_hint("SWML", "swimmel", "SWML", ignore_case=True)
agent.add_pattern_hint("SWAIG", "swig", "SWAIG", ignore_case=True)

# 3. Pronunciation: Help TTS say terms correctly
agent.add_pronunciation("API", "A P I")
agent.add_pronunciation("SWML", "swimmel")  # Say it phonetically
agent.add_pronunciation("SignalWire", "Signal Wire")

Summary:

Hints → Help ASR recognize words

Pattern Hints → Transform what was heard into something else

Pronunciation → Help TTS speak words correctly

6. Testing Voice Configuration (15 min)

Using swaig-test

# View language configuration in SWML
swaig-test agent.py --dump-swml | grep -A 20 '"languages"'

Expected Output

"languages": [
  {
    "name": "English",
    "code": "en-US",
    "voice": "rime.spore",
    "speech_fillers": ["Um", "Uh", "Well"],
    "function_fillers": [
      "One moment please...",
      "Let me check..."
    ]
  }
]

Note: If you only see a fillers field (not speech_fillers and function_fillers), you need to provide both filler types in add_language().
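
The same check can be automated rather than read off the grep output. The sketch below searches a JSON document for any "languages" list and reports entries that lack either filler field. It assumes the dumped SWML is a single JSON document; the sample string is a hypothetical, trimmed-down stand-in, and `find_languages` and `missing_filler_types` are hypothetical helpers.

```python
import json


def find_languages(node):
    """Yield every language dict under any 'languages' key in a JSON tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "languages" and isinstance(value, list):
                yield from (item for item in value if isinstance(item, dict))
            else:
                yield from find_languages(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_languages(item)


def missing_filler_types(swml_text):
    """Return names of languages lacking speech_fillers or function_fillers."""
    doc = json.loads(swml_text)
    return [
        lang.get("name", "?")
        for lang in find_languages(doc)
        if not ("speech_fillers" in lang and "function_fillers" in lang)
    ]


# Hypothetical sample; a real SWML document has more surrounding structure.
sample = json.dumps(
    {"ai": {"languages": [
        {"name": "English", "code": "en-US", "voice": "rime.spore",
         "fillers": ["Um", "Uh"]},
    ]}}
)
print(missing_filler_types(sample))  # → ['English']
```

Searching by key instead of by a fixed path keeps the check independent of where the languages list sits in the document.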

Live Testing Checklist

When testing with real calls:

Voice is clear and understandable
Pronunciation of company/product names is correct
Fillers play at appropriate times
Language matches user’s speech
No awkward pauses

Configuration Patterns

Professional Support Agent

agent.add_language(
    "English",
    "en-US",
    "rime.spore",  # Professional voice
    speech_fillers=["Well", "Let me see", "One moment"],
    function_fillers=[
        "One moment please...",
        "Let me check on that for you...",
        "I'm looking into this now..."
    ]
)

Friendly Sales Agent

agent.add_language(
    "English",
    "en-US",
    "rime.marsh",  # Warm, friendly voice
    speech_fillers=["Um", "So", "Oh"],
    function_fillers=[
        "Great question! Let me check...",
        "Let me find that for you...",
        "Sure thing, looking now!",
        "Absolutely, checking now..."
    ]
)

Healthcare Information Line

agent.add_language(
    "English",
    "en-US",
    "rime.cove",  # Calm, measured voice
    speech_fillers=["I see", "Of course", "Certainly"],
    function_fillers=[
        "I understand, let me help...",
        "Of course, one moment...",
        "I'm checking that information now..."
    ]
)

# Spanish option
agent.add_language(
    "Spanish",
    "es-US",
    "gcloud.es-US-Neural2-A",
    speech_fillers=["Bueno", "A ver", "Pues"],
    function_fillers=[
        "Un momento por favor...",
        "Déjeme verificar...",
        "Estoy buscando esa información..."
    ]
)

Common Mistakes

1. Wrong Language Code Format

Wrong:

agent.add_language("English", "english-us", "rime.spore")

Right:

agent.add_language("English", "en-US", "rime.spore")
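
A quick format check can catch this mistake before deployment. The sketch below tests only the language-REGION shape used throughout this module (for example `en-US`). BCP-47 allows other valid forms that this pattern rejects, so treat it as a lint for this course's examples, not a full validator. `looks_like_locale` is a hypothetical helper.

```python
import re

# Two or three lowercase letters, a hyphen, then a two-letter uppercase region.
_LOCALE = re.compile(r"^[a-z]{2,3}-[A-Z]{2}$")


def looks_like_locale(code: str) -> bool:
    """True for codes shaped like 'en-US'; False for 'english-us', 'en_US'."""
    return bool(_LOCALE.match(code))


for candidate in ("en-US", "english-us", "en_US", "pt-BR"):
    print(candidate, looks_like_locale(candidate))
```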

2. Only Providing One Filler Type

Wrong (falls back to deprecated fillers field):

agent.add_language("English", "en-US", "rime.spore",
    speech_fillers=["Um", "Uh", "Well"])
# Missing function_fillers!

Right (both types required):

agent.add_language("English", "en-US", "rime.spore",
    speech_fillers=["Um", "Uh", "Well"],
    function_fillers=["One moment please...", "Let me check..."])

3. Too Many Fillers

Wrong:

speech_fillers=[
    "Um...", "Uh...", "Well...", "So...",
    "Let's see...", "Hmm...", # 20 more...
]

Keep it to 3-5 appropriate fillers per type.
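
Mistakes 2 and 3 can both be guarded against with a small pre-check. The sketch below builds the two filler keyword arguments together and refuses lopsided or oversized lists. The limit of five comes from the guideline above; `filler_kwargs` is a hypothetical helper, not an SDK function.

```python
def filler_kwargs(speech_fillers=None, function_fillers=None, max_each=5):
    """Return add_language keyword arguments, or raise on a risky setup."""
    if bool(speech_fillers) != bool(function_fillers):
        raise ValueError("provide both speech_fillers and function_fillers, or neither")
    if not speech_fillers:
        return {}
    for label, items in (
        ("speech_fillers", speech_fillers),
        ("function_fillers", function_fillers),
    ):
        if len(items) > max_each:
            raise ValueError(f"{label} has {len(items)} entries; keep it to {max_each} or fewer")
    return {
        "speech_fillers": list(speech_fillers),
        "function_fillers": list(function_fillers),
    }


kwargs = filler_kwargs(
    speech_fillers=["Um", "Uh", "Well"],
    function_fillers=["One moment please...", "Let me check..."],
)
print(sorted(kwargs))
```

The result could be unpacked into the call, as in `agent.add_language("English", "en-US", "rime.spore", **kwargs)`.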

4. Mismatched Tone

Wrong:

# Professional prompt but casual fillers
agent.prompt_add_section("Role", "You are a formal legal assistant.")
agent.add_language("English", "en-US", "rime.spore",
    speech_fillers=["Yo", "Like", "Ya know"],
    function_fillers=["Sure thing!", "No prob!", "Gotcha!"])

Key Takeaways

Voice selection matters - Match voice to brand and use case
Language codes guide ASR - Correct codes improve recognition accuracy
Fillers create naturalness - Fill processing gaps
Pronunciation ensures clarity - Help TTS say acronyms and names correctly
Test with real calls - swaig-test shows config, calls show experience
Consistency is key - Voice, fillers, and prompts should align

Preparation for Lab 1.6

Have a working agent with prompts configured
Think about your agent’s “voice personality”
Consider what fillers fit your use case

Lab Preview

In Lab 1.6, you will:

Select and configure a voice
Add appropriate filler phrases
Test voice output
Optionally add a second language

Next: Module 1.7 - SWAIG Functions Basics
