Duration 1.5 hours
Day 2 of 2

Learning Objectives

By the end of this module, students will be able to:

  • Select appropriate TTS engines and voices
  • Configure language settings for different locales
  • Implement filler phrases for natural conversation
  • Add pronunciation rules for correct TTS output
  • Test voice configurations effectively

Topics

1. Text-to-Speech Engines (25 min)

Available TTS Engines

| Provider | Engine Code | Example Voice | Reference |
|---|---|---|---|
| Amazon Polly | amazon | amazon.Joanna-Neural | Voice IDs |
| Cartesia | cartesia | cartesia.a167e0f3-df7e-4d52-a9c3-f949145efdab | Voice IDs |
| Deepgram | deepgram | deepgram.aura-asteria-en | Voice IDs |
| ElevenLabs | elevenlabs | elevenlabs.thomas | Voice IDs |
| Google Cloud | gcloud | gcloud.en-US-Casual-K | Voice IDs |
| Microsoft Azure | azure | azure.en-US-AvaNeural | Voice IDs |
| OpenAI | openai | openai.alloy | Voice IDs |
| Rime | rime | rime.luna:arcana | Voice IDs |

Pricing: Rime, Cartesia, and ElevenLabs are premium, usage-based providers. Check your SignalWire dashboard for pricing details.
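
Every example voice in the table follows the shape engine code, dot, voice ID. The sketch below checks a voice string against that shape before it reaches add_language(). The helper split_voice and the KNOWN_ENGINES set are hypothetical names for illustration, not part of the SDK, and splitting at the first dot is an assumption drawn from the examples above.

```python
# Engine codes copied from the table above.
KNOWN_ENGINES = {
    "amazon", "cartesia", "deepgram", "elevenlabs",
    "gcloud", "azure", "openai", "rime",
}


def split_voice(voice: str) -> tuple[str, str]:
    """Split 'engine.voice_id' at the first dot and validate the engine code.

    Hypothetical helper: the SDK may parse voice strings differently.
    """
    engine, dot, voice_id = voice.partition(".")
    if not dot or not voice_id:
        raise ValueError(f"expected 'engine.voice_id', got {voice!r}")
    if engine not in KNOWN_ENGINES:
        raise ValueError(f"unknown engine code {engine!r} in {voice!r}")
    return engine, voice_id


print(split_voice("gcloud.en-US-Casual-K"))
print(split_voice("rime.luna:arcana"))
```

A check like this catches a mistyped engine prefix at startup rather than on the first live call. It does not confirm that the voice ID itself exists with the provider.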

Voice Selection

# Rime voices
agent.add_language("English", "en-US", "rime.spore")

# Google Cloud voices
agent.add_language("English", "en-US", "gcloud.en-US-Casual-K")

# Amazon Polly voices
agent.add_language("English", "en-US", "amazon.Joanna-Neural")

# ElevenLabs voices
agent.add_language("English", "en-US", "elevenlabs.thomas")

| Voice | Style | Best For |
|---|---|---|
| rime.spore | Professional, clear | Business, support |
| rime.marsh | Warm, friendly | Hospitality, sales |
| rime.cove | Calm, measured | Healthcare, finance |
| rime.brook | Energetic, upbeat | Marketing, entertainment |

Choosing the Right Voice

Consider:

  1. Brand alignment - Does the voice match your brand?
  2. Use case - Support needs a different voice than sales
  3. Clarity - Can users understand clearly?
  4. Fatigue - How does it sound over long calls?

2. Language Configuration (20 min)

Basic Language Setup

agent.add_language(
    name="English",        # Display name
    code="en-US",         # BCP-47 language code
    voice="rime.spore"    # TTS voice
)

Common Language Codes

| Language | Code | Example Voice |
|---|---|---|
| English (US) | en-US | rime.spore |
| English (UK) | en-GB | gcloud.en-GB-Neural2-A |
| Spanish (US) | es-US | gcloud.es-US-Neural2-A |
| Spanish (MX) | es-MX | gcloud.es-MX-Neural2-A |
| French (CA) | fr-CA | gcloud.fr-CA-Neural2-A |
| French (FR) | fr-FR | gcloud.fr-FR-Neural2-A |
| German | de-DE | gcloud.de-DE-Neural2-A |
| Portuguese (BR) | pt-BR | gcloud.pt-BR-Neural2-A |
| Mandarin | zh-CN | gcloud.cmn-CN-Neural2-A |
| Japanese | ja-JP | gcloud.ja-JP-Neural2-A |
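
All codes in the table share one shape: a lowercase language part, a hyphen, and an uppercase region part. A quick shape check can reject values such as "english-us" before they reach the agent. The sketch below is deliberately simplified; full BCP-47 permits more forms (scripts, variants) than this pattern accepts, and looks_like_language_code is a hypothetical helper name.

```python
import re

# Simplified shape check: 2-3 lowercase letters, a hyphen, 2 uppercase letters.
# This covers the codes in the table above, not the whole BCP-47 grammar.
LANGUAGE_CODE = re.compile(r"[a-z]{2,3}-[A-Z]{2}")


def looks_like_language_code(code: str) -> bool:
    """Return True if code has the language-REGION shape used in this module."""
    return LANGUAGE_CODE.fullmatch(code) is not None


for candidate in ("en-US", "pt-BR", "english-us", "EN-us"):
    print(candidate, looks_like_language_code(candidate))
```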

Multi-Language Agents

# Primary language first
agent.add_language("English", "en-US", "rime.spore")

# Additional languages
agent.add_language("Spanish", "es-US", "gcloud.es-US-Neural2-A")
agent.add_language("French", "fr-CA", "gcloud.fr-CA-Neural2-A")

The AI detects the caller's language and switches automatically.
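
When an agent supports several languages, keeping them in one ordered list makes the primary-first rule easy to see and review. The sketch below registers languages from such a list. RecordingAgent is a stand-in written for this example so the code runs without the SDK; it only mimics the three-argument add_language() call shown above. LANGUAGES and configure_languages are hypothetical names as well.

```python
class RecordingAgent:
    """Stand-in for a real agent; it only records add_language calls."""

    def __init__(self):
        self.languages = []

    def add_language(self, name, code, voice):
        self.languages.append({"name": name, "code": code, "voice": voice})


# Order matters: the primary language goes first.
LANGUAGES = [
    ("English", "en-US", "rime.spore"),
    ("Spanish", "es-US", "gcloud.es-US-Neural2-A"),
    ("French", "fr-CA", "gcloud.fr-CA-Neural2-A"),
]


def configure_languages(agent, languages):
    """Register each (name, code, voice) tuple, rejecting duplicate codes."""
    seen = set()
    for name, code, voice in languages:
        if code in seen:
            raise ValueError(f"language code {code!r} listed twice")
        seen.add(code)
        agent.add_language(name, code, voice)


agent = RecordingAgent()
configure_languages(agent, LANGUAGES)
print([lang["code"] for lang in agent.languages])
```

With a real agent object, the same configure_languages() call would work unchanged, since it relies only on add_language().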


3. Filler Phrases (25 min)

What Are Fillers?

Fillers are phrases spoken while the AI is processing or waiting. There are two types:

| Type | Purpose | Example |
|---|---|---|
| Speech fillers | Natural hesitation words during pauses | “Um”, “Uh”, “Well”, “Let me think” |
| Function fillers | Phrases while executing functions | “One moment please”, “Let me check that” |

They make the conversation feel natural.

Adding Language-Level Fillers

Important: You must provide BOTH speech_fillers AND function_fillers together. If you only provide one, the SDK falls back to a deprecated format.

agent.add_language(
    "English",
    "en-US",
    "rime.spore",
    speech_fillers=["Um", "Uh", "Well", "Let me think"],
    function_fillers=[
        "One moment please...",
        "Let me look into that...",
        "Sure, checking now..."
    ]
)
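
Because supplying only one filler list triggers the fallback described above, it can help to fail fast in your own code. The sketch below builds the keyword arguments for add_language() and raises if exactly one list is present. filler_kwargs is a hypothetical helper, not an SDK function.

```python
def filler_kwargs(speech_fillers=None, function_fillers=None):
    """Return keyword arguments for add_language, or fail if only one list is set.

    Hypothetical guard: both lists must be non-empty, or both must be omitted.
    """
    if bool(speech_fillers) != bool(function_fillers):
        raise ValueError("provide both speech_fillers and function_fillers, or neither")
    if not speech_fillers:
        return {}
    return {
        "speech_fillers": list(speech_fillers),
        "function_fillers": list(function_fillers),
    }


kwargs = filler_kwargs(
    speech_fillers=["Um", "Uh", "Well"],
    function_fillers=["One moment please...", "Let me check..."],
)
print(sorted(kwargs))
```

The result can be unpacked into the call, for example agent.add_language("English", "en-US", "rime.spore", **kwargs).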

Context-Appropriate Fillers

Professional/Formal:

agent.add_language(
    "English", "en-US", "rime.spore",
    speech_fillers=["Well", "Let me see", "One moment"],
    function_fillers=[
        "One moment please...",
        "Allow me to check...",
        "I'll look into that for you..."
    ]
)

Casual/Friendly:

agent.add_language(
    "English", "en-US", "rime.marsh",
    speech_fillers=["Um", "So", "Let's see"],
    function_fillers=[
        "Sure thing, checking now!",
        "One sec...",
        "Got it, looking now..."
    ]
)

Healthcare/Sensitive:

agent.add_language(
    "English", "en-US", "rime.cove",
    speech_fillers=["I see", "Of course", "Certainly"],
    function_fillers=[
        "I understand, let me help with that...",
        "Of course, checking your information now...",
        "I'm looking into this for you..."
    ]
)

Function-Specific Fillers

Set fillers for specific functions that take time. This uses a separate fillers parameter on the @tool decorator (not the same as language-level fillers):

@agent.tool(
    description="Look up order status",
    fillers=[
        "Looking up your order now...",
        "Checking the system for your order...",
        "One moment while I find that..."
    ]
)
def get_order_status(order_id: str):
    # Takes a few seconds
    return f"Order {order_id} shipped yesterday."

Note: The fillers parameter on @tool is separate from the language-level speech_fillers and function_fillers. For that specific function, the fillers set on @tool override the language-level function_fillers.
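
The override rule can be pictured as a simple lookup: use the function's own fillers when it has any, otherwise use the language-level function fillers. The sketch below models only that precedence. It is an illustration of the rule as stated in this module, not the SDK's internal logic, and fillers_for_call is a hypothetical name.

```python
def fillers_for_call(function_name, tool_fillers, language_function_fillers):
    """Pick the filler list for one function call.

    tool_fillers maps a function name to the list set on that function's
    decorator. Functions without an entry use the language-level list.
    """
    specific = tool_fillers.get(function_name)
    return specific if specific else language_function_fillers


tool_fillers = {"get_order_status": ["Looking up your order now..."]}
language_fillers = ["One moment please...", "Let me look into that..."]

print(fillers_for_call("get_order_status", tool_fillers, language_fillers))
print(fillers_for_call("get_store_hours", tool_fillers, language_fillers))
```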


4. Speech Recognition Optimization (15 min)

Language Code Importance

The language code affects speech recognition accuracy:

# US English - recognizes American accents better
agent.add_language("English", "en-US", "rime.spore")

# UK English - recognizes British accents better
agent.add_language("English", "en-GB", "rime.cove")

Hints for Better Recognition

Add hints for domain-specific terms:

agent.set_hints([
    "Acme",              # Company name
    "TechCorp",          # Partner name
    "SKU",               # Industry term
    "API",               # Technical term
    "A B C 1 2 3"        # Common format
])

We’ll cover hints in detail in Module 1.7.


5. Pronunciation Rules (15 min)

What Are Pronunciation Rules?

Pronunciation rules tell the TTS engine how to say specific words correctly. This is essential for:

  • Acronyms - “API” should be “A P I” not “appy”
  • Brand names - “SignalWire” should be “Signal Wire”
  • Technical terms - “PostgreSQL” should be “Postgres Q L”
  • Industry jargon - “SaaS” should be “sass” not “S A A S”

Adding Pronunciation Rules

# Spell out acronyms letter by letter
agent.add_pronunciation("API", "A P I")
agent.add_pronunciation("SDK", "S D K")
agent.add_pronunciation("CLI", "C L I")

# Pronounce as a word
agent.add_pronunciation("SIP", "sip", ignore_case=True)
agent.add_pronunciation("VoIP", "voyp")

# Brand names
agent.add_pronunciation("SignalWire", "Signal Wire")
agent.add_pronunciation("PostgreSQL", "Postgres Q L")

The ignore_case Parameter

By default, pronunciation matching is case-sensitive. Use ignore_case=True for flexible matching:

# Only matches "API" exactly
agent.add_pronunciation("API", "A P I")

# Matches "api", "Api", "API", etc.
agent.add_pronunciation("api", "A P I", ignore_case=True)

Common Pronunciation Patterns

| Term | Pronunciation | Notes |
|---|---|---|
| API | A P I | Spell out |
| SDK | S D K | Spell out |
| HTTP | H T T P | Spell out |
| URL | U R L | Spell out |
| SIP | sip | Say as word |
| VoIP | voyp | Say as word |
| SQL | sequel or S Q L | Either works |
| GIF | gif or jif | Your choice! |

Bulk Pronunciation Setup

For many rules, use set_pronunciations():

agent.set_pronunciations([
    {"replace": "API", "with": "A P I"},
    {"replace": "SDK", "with": "S D K"},
    {"replace": "HTTP", "with": "H T T P", "ignore_case": True},
    {"replace": "SignalWire", "with": "Signal Wire"}
])
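
To preview what a rule list does to a sentence before placing a call, the rules can be applied locally with Python's re module. This is a rough offline simulation under stated assumptions: it matches whole words only and applies rules in list order, and the platform's real matching may differ. apply_pronunciations is a hypothetical helper; the rule dictionaries use the same keys as set_pronunciations() above.

```python
import re


def apply_pronunciations(text, rules):
    """Apply {"replace", "with", "ignore_case"} rules to text, in list order."""
    for rule in rules:
        flags = re.IGNORECASE if rule.get("ignore_case", False) else 0
        # Assumption: whole-word matching. The platform may match differently.
        pattern = r"\b" + re.escape(rule["replace"]) + r"\b"
        replacement = rule["with"]
        # A function replacement keeps the spoken form literal (no escapes).
        text = re.sub(pattern, lambda _match: replacement, text, flags=flags)
    return text


rules = [
    {"replace": "API", "with": "A P I"},
    {"replace": "HTTP", "with": "H T T P", "ignore_case": True},
]
print(apply_pronunciations("The api and the API use http.", rules))
```

In this run the lowercase "api" is left alone because the first rule is case-sensitive, while "http" is rewritten because its rule sets ignore_case.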

Pronunciation vs Hints

| Feature | Purpose | Example |
|---|---|---|
| Hints | Help ASR hear words correctly | agent.set_hints(["SignalWire"]) |
| Pronunciation | Help TTS say words correctly | agent.add_pronunciation("API", "A P I") |

Use both together for best results:

# Help recognize AND pronounce correctly
agent.set_hints(["SignalWire", "API", "SDK"])
agent.add_pronunciation("API", "A P I")
agent.add_pronunciation("SDK", "S D K")
agent.add_pronunciation("SignalWire", "Signal Wire")
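
Since the same terms tend to appear in both lists, one option is to keep a single glossary and derive hints and pronunciation rules from it, so the two never drift apart. The sketch below shows that idea. GLOSSARY and split_glossary are hypothetical names, and the "Acme" entry is only an example of a term that needs a hint but no pronunciation rule.

```python
# term -> spoken form; None means "recognize it, but let TTS say it as written".
GLOSSARY = {
    "SignalWire": "Signal Wire",
    "API": "A P I",
    "SDK": "S D K",
    "Acme": None,
}


def split_glossary(glossary):
    """Return (hints, pronunciation_rules) built from one glossary dict."""
    hints = list(glossary)
    rules = [
        {"replace": term, "with": spoken}
        for term, spoken in glossary.items()
        if spoken is not None
    ]
    return hints, rules


hints, rules = split_glossary(GLOSSARY)
print(hints)
print(rules)
```

The two results fit the calls already shown in this module: agent.set_hints(hints) and agent.set_pronunciations(rules).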

Pattern Hints (ASR Replacement)

Sometimes users say something that should be interpreted as something else. Pattern hints let you replace what the ASR heard with what you want the AI to receive.

This is the reverse of pronunciation - instead of changing how words are spoken, you’re changing how heard words are interpreted.

# "swimmel" -> "SWML" (how the acronym is usually spoken aloud)
agent.add_pattern_hint(
    hint="SWML",           # What to help recognize
    pattern="swimmel",     # What users might say
    replace="SWML",        # What to send to the AI
    ignore_case=True
)

# "swig" or "schwaig" -> "SWAIG"
agent.add_pattern_hint(
    hint="SWAIG",
    pattern="swig|schwaig",
    replace="SWAIG",
    ignore_case=True
)

# Phone number format normalization
agent.add_pattern_hint(
    hint="phone format",
    pattern=r"(\d{3})\s*(\d{3})\s*(\d{4})",
    replace=r"(\1) \2-\3",
    ignore_case=False
)
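
Patterns like these can be tried against sample transcripts locally before deployment. The sketch below uses Python's re.sub() as a stand-in for the platform's matcher. That is an assumption: the platform's regex flavor and backreference syntax may not match Python's exactly. simulate_pattern_hint is a hypothetical helper name.

```python
import re


def simulate_pattern_hint(transcript, pattern, replace, ignore_case=False):
    """Apply one pattern/replace pair to a transcript using Python's re module."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.sub(pattern, replace, transcript, flags=flags)


print(simulate_pattern_hint(
    "I wrote some Swimmel today", "swimmel", "SWML", ignore_case=True))
print(simulate_pattern_hint(
    "call me at 555 123 4567",
    r"(\d{3})\s*(\d{3})\s*(\d{4})",
    r"(\1) \2-\3",
))
# Without word boundaries, a short pattern also matches inside longer words.
print(simulate_pattern_hint(
    "he swigged it", "swig|schwaig", "SWAIG", ignore_case=True))
```

The last call shows why short patterns deserve a test: under Python's semantics, "swig" also matches inside "swigged". If the platform behaves the same way, anchoring the pattern with word boundaries such as \bswig\b would avoid that.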

Complete Speech Optimization Example

# 1. Hints: Help ASR recognize domain terms
agent.set_hints(["SWML", "SWAIG", "SignalWire", "API"])

# 2. Pattern hints: Fix common mishearings
agent.add_pattern_hint("SWML", "swimmel", "SWML", ignore_case=True)
agent.add_pattern_hint("SWAIG", "swig", "SWAIG", ignore_case=True)

# 3. Pronunciation: Help TTS say terms correctly
agent.add_pronunciation("API", "A P I")
agent.add_pronunciation("SWML", "swimmel")  # Say it phonetically
agent.add_pronunciation("SignalWire", "Signal Wire")

Summary:

  • Hints → Help ASR recognize words
  • Pattern Hints → Transform what was heard into something else
  • Pronunciation → Help TTS speak words correctly

6. Testing Voice Configuration (15 min)

Using swaig-test

# View language configuration in SWML
swaig-test agent.py --dump-swml | grep -A 20 '"languages"'

Expected Output

"languages": [
  {
    "name": "English",
    "code": "en-US",
    "voice": "rime.spore",
    "speech_fillers": ["Um", "Uh", "Well"],
    "function_fillers": [
      "One moment please...",
      "Let me check..."
    ]
  }
]

Note: If you only see a fillers field (not speech_fillers and function_fillers), you need to provide both filler types in add_language().
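
The same check can be automated instead of read by eye. The sketch below searches a dumped document for every "languages" list and reports entries that lack either filler key. It assumes the dump is a single JSON document and makes no assumption about where "languages" sits inside it. The wrapper keys in the sample are arbitrary placeholders, not the real SWML layout, and find_languages and filler_problems are hypothetical names.

```python
import json


def find_languages(node):
    """Yield every entry found under any "languages" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "languages" and isinstance(value, list):
                yield from value
            else:
                yield from find_languages(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_languages(item)


def filler_problems(swml_text):
    """Return one message per language entry that lacks either filler list."""
    problems = []
    for lang in find_languages(json.loads(swml_text)):
        if not isinstance(lang, dict):
            continue
        missing = [k for k in ("speech_fillers", "function_fillers") if k not in lang]
        if missing:
            problems.append(f"{lang.get('name', '?')}: missing {', '.join(missing)}")
    return problems


# Placeholder document: the outer keys are illustrative only.
sample = json.dumps({"wrapper": [{"inner": {"languages": [
    {"name": "English", "code": "en-US", "voice": "rime.spore", "fillers": ["Um"]},
    {"name": "Spanish", "code": "es-US", "voice": "gcloud.es-US-Neural2-A",
     "speech_fillers": ["Bueno"], "function_fillers": ["Un momento..."]},
]}}]})
print(filler_problems(sample))
```

To run it on a real agent, save the output of swaig-test agent.py --dump-swml to a file and pass the file's contents to filler_problems().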

Live Testing Checklist

When testing with real calls:

  • Voice is clear and understandable
  • Pronunciation of company/product names is correct
  • Fillers play at appropriate times
  • Language matches user’s speech
  • No awkward pauses

Configuration Patterns

Professional Support Agent

agent.add_language(
    "English",
    "en-US",
    "rime.spore",  # Professional voice
    speech_fillers=["Well", "Let me see", "One moment"],
    function_fillers=[
        "One moment please...",
        "Let me check on that for you...",
        "I'm looking into this now..."
    ]
)

Friendly Sales Agent

agent.add_language(
    "English",
    "en-US",
    "rime.marsh",  # Warm, friendly voice
    speech_fillers=["Um", "So", "Oh"],
    function_fillers=[
        "Great question! Let me check...",
        "Let me find that for you...",
        "Sure thing, looking now!",
        "Absolutely, checking now..."
    ]
)

Healthcare Information Line

agent.add_language(
    "English",
    "en-US",
    "rime.cove",  # Calm, measured voice
    speech_fillers=["I see", "Of course", "Certainly"],
    function_fillers=[
        "I understand, let me help...",
        "Of course, one moment...",
        "I'm checking that information now..."
    ]
)

# Spanish option
agent.add_language(
    "Spanish",
    "es-US",
    "gcloud.es-US-Neural2-A",
    speech_fillers=["Bueno", "A ver", "Pues"],
    function_fillers=[
        "Un momento por favor...",
        "Déjeme verificar...",
        "Estoy buscando esa información..."
    ]
)

Common Mistakes

1. Wrong Language Code Format

Wrong:

agent.add_language("English", "english-us", "rime.spore")

Right:

agent.add_language("English", "en-US", "rime.spore")

2. Only Providing One Filler Type

Wrong (falls back to deprecated fillers field):

agent.add_language("English", "en-US", "rime.spore",
    speech_fillers=["Um", "Uh", "Well"])
# Missing function_fillers!

Right (both types required):

agent.add_language("English", "en-US", "rime.spore",
    speech_fillers=["Um", "Uh", "Well"],
    function_fillers=["One moment please...", "Let me check..."])

3. Too Many Fillers

Wrong:

speech_fillers=[
    "Um...", "Uh...", "Well...", "So...",
    "Let's see...", "Hmm...", # 20 more...
]

Keep it to 3-5 appropriate fillers per type.
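
The 3-5 guideline is easy to enforce with a small check at startup or in a unit test. The sketch below counts distinct fillers and returns a warning when the count is outside that range. check_filler_count is a hypothetical helper, and treating exact repeats as one filler is an assumption of this example.

```python
def check_filler_count(fillers, low=3, high=5):
    """Return a warning if the number of distinct fillers is outside low..high."""
    distinct = list(dict.fromkeys(fillers))  # drop exact repeats, keep order
    if len(distinct) < low:
        return f"only {len(distinct)} distinct fillers; aim for {low}-{high}"
    if len(distinct) > high:
        return f"{len(distinct)} distinct fillers; trim to {low}-{high}"
    return None


print(check_filler_count(["Um", "Uh", "Well"]))
print(check_filler_count(["Um...", "Uh...", "Well...", "So...", "Let's see...", "Hmm..."]))
```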

4. Mismatched Tone

Wrong:

# Professional prompt but casual fillers
agent.prompt_add_section("Role", "You are a formal legal assistant.")
agent.add_language("English", "en-US", "rime.spore",
    speech_fillers=["Yo", "Like", "Ya know"],
    function_fillers=["Sure thing!", "No prob!", "Gotcha!"])

Key Takeaways

  1. Voice selection matters - Match voice to brand and use case
  2. Language codes guide ASR - Correct codes improve recognition
  3. Fillers create naturalness - They fill processing gaps
  4. Pronunciation ensures clarity - Help TTS say acronyms and names correctly
  5. Test with real calls - swaig-test shows the configuration; live calls show the experience
  6. Consistency is key - Voice, fillers, and prompts should align

Preparation for Lab 1.6

  • Working agent with prompts configured
  • Think about your agent’s “voice personality”
  • Consider what fillers fit your use case

Lab Preview

In Lab 1.6, you will:

  1. Select and configure a voice
  2. Add appropriate filler phrases
  3. Test voice output
  4. Optionally add a second language


SignalWire AI Agents Certification Program