| Duration | 2 hours |
| --- | --- |
| Day | 6 of 7 |
Learning Objectives
By the end of this module, students will be able to:
- Optimize agent response times
- Configure speech and timing parameters
- Reduce latency in function calls
- Profile and benchmark agent performance
Topics
1. Voice AI Performance Factors (20 min)
The Latency Stack
```
┌──────────────────────────────────────┐
│            Total Latency             │
├──────────────────────────┬───────────┤
│ Speech Recognition (STT) │ 100-300ms │
├──────────────────────────┼───────────┤
│ AI Processing            │ 200-500ms │
├──────────────────────────┼───────────┤
│ Function Execution       │ Variable  │
├──────────────────────────┼───────────┤
│ Text-to-Speech (TTS)     │ 50-200ms  │
├──────────────────────────┼───────────┤
│ Network/Audio            │ 50-100ms  │
└──────────────────────────┴───────────┘
```
Performance Goals
| Metric | Good | Acceptable | Poor |
|---|---|---|---|
| First response | <1s | 1-2s | >2s |
| Function response | <500ms | 500ms-1s | >1s |
| Total turn latency | <2s | 2-3s | >3s |
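Taken together, the stack and the goals above imply a budget for each turn. A minimal sketch of that arithmetic, using the illustrative worst-case numbers from the diagram (not measurements):

```python
# Worst-case latency per fixed stage, in ms (ranges from the stack above).
STACK_WORST_MS = {
    "stt": 300,
    "ai_processing": 500,
    "tts": 200,
    "network_audio": 100,
}

TURN_BUDGET_MS = 2000  # "Good" total turn latency from the table

# Whatever remains after the fixed stages is the budget for function execution.
fixed = sum(STACK_WORST_MS.values())
function_budget_ms = TURN_BUDGET_MS - fixed
print(f"Fixed stages: {fixed}ms, leaving {function_budget_ms}ms for functions")
# -> Fixed stages: 1100ms, leaving 900ms for functions
```

That leftover is why the function-response goal sits under one second: a slower function consumes the entire turn budget on its own.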
2. Speech Timing Configuration (30 min)
Attention Timeout
How long to wait for the user to start speaking:

```python
agent.set_params({
    "attention_timeout": 10000,  # 10 seconds
    "attention_timeout_prompt": "Are you still there?"
})
```
End of Speech Detection
Balance responsiveness against the risk of cutting the caller off:

```python
agent.set_params({
    # Time to wait after the user stops speaking
    "end_of_speech_timeout": 500,  # 500ms default

    # Barge-in settings (barge-in is enabled by default)
    "barge_min_words": 2  # Require 2 words to interrupt
})
```
Speed Control
```python
agent.set_params({
    "speech_rate": 1.0,  # 1.0 = normal, 1.2 = faster
    "ai_volume": 1.0     # Volume level
})
```
Timing Comparison
| End-of-speech timeout | Use case |
|---|---|
| Short (300ms) | Quick transactions, confirmations |
| Medium (500ms) | Normal conversation |
| Long (800ms) | Thoughtful answers, elderly callers |
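These presets can be centralized so every agent uses the same numbers. A minimal sketch built on the `set_params` call shown above; `TIMING_PRESETS` and `apply_timing` are illustrative names, not SDK API:

```python
# Map call styles to the end-of-speech presets from the table above.
TIMING_PRESETS = {
    "transaction": {"end_of_speech_timeout": 300},   # quick confirmations
    "conversation": {"end_of_speech_timeout": 500},  # normal conversation
    "thoughtful": {"end_of_speech_timeout": 800},    # longer pauses expected
}

def apply_timing(agent, style: str = "conversation"):
    """Apply a shared timing preset to an agent."""
    agent.set_params(TIMING_PRESETS[style])
```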
3. Function Performance (35 min)
Fillers for Long Operations
Keep the caller engaged during processing:
```python
@agent.tool(
    description="Search inventory",
    parameters={
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"}
        },
        "required": ["query"]
    },
    fillers=[
        "Let me check on that...",
        "One moment while I search...",
        "Still looking..."
    ]
)
def search_inventory(args: dict, raw_data: dict = None) -> SwaigFunctionResult:
    query = args.get("query", "")
    # Long operation
    results = database.search(query)
    return SwaigFunctionResult(f"Found {len(results)} items.")
```
Timeout Handling
```python
import requests
from requests.exceptions import Timeout

@agent.tool(
    description="Get order status",
    parameters={
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "description": "Order ID"}
        },
        "required": ["order_id"]
    }
)
def get_order(args: dict, raw_data: dict = None) -> SwaigFunctionResult:
    order_id = args.get("order_id", "")
    try:
        response = requests.get(
            f"https://api.example.com/orders/{order_id}",
            timeout=3  # 3 second max
        )
        return SwaigFunctionResult(f"Order status: {response.json()['status']}")
    except Timeout:
        return SwaigFunctionResult(
            "I'm having trouble reaching our order system. "
            "Can I take your number and call you back?"
        )
```
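A single timeout number applies to the whole request. `requests` also accepts a `(connect, read)` tuple, which lets a function fail fast on an unreachable host while allowing a slower response read; the split below is an illustrative budget, not a recommendation:

```python
import requests

# Fail within 0.5s if the host is unreachable; allow up to 2s for the body.
response = requests.get(
    "https://api.example.com/orders/123",
    timeout=(0.5, 2.0)  # (connect, read)
)
```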
Async Patterns
```python
import asyncio
import aiohttp

class AsyncAgent(AgentBase):
    def __init__(self):
        super().__init__(name="async-agent")
        self._setup_functions()

    def _setup_functions(self):
        @self.tool(
            description="Check multiple systems",
            parameters={
                "type": "object",
                "properties": {},
                "required": []
            },
            fillers=["Checking all systems..."]
        )
        async def check_all_systems(args: dict, raw_data: dict = None) -> SwaigFunctionResult:
            async with aiohttp.ClientSession() as session:
                # Run the health checks in parallel
                tasks = [
                    self.check_system(session, "inventory"),
                    self.check_system(session, "shipping"),
                    self.check_system(session, "billing")
                ]
                results = await asyncio.gather(*tasks, return_exceptions=True)

                # Summarize, skipping any check that failed
                statuses = [r for r in results if not isinstance(r, Exception)]
                return SwaigFunctionResult(f"Systems checked: {len(statuses)} operational")

    async def check_system(self, session, system):
        async with session.get(f"https://api.example.com/{system}/health") as resp:
            return await resp.json()
```
Caching Strategies
```python
from datetime import datetime, timedelta

# functools.lru_cache works for static data; a TTL cache like this one
# suits data that goes stale, such as pricing.
class CachedAgent(AgentBase):
    def __init__(self):
        super().__init__(name="cached-agent")
        self._cache = {}
        self._cache_ttl = timedelta(minutes=5)

    def _get_cached(self, key: str):
        if key in self._cache:
            value, timestamp = self._cache[key]
            if datetime.now() - timestamp < self._cache_ttl:
                return value
        return None

    def _set_cached(self, key: str, value):
        self._cache[key] = (value, datetime.now())

    @AgentBase.tool(
        description="Get product pricing",
        parameters={
            "type": "object",
            "properties": {
                "product_id": {"type": "string", "description": "Product ID"}
            },
            "required": ["product_id"]
        }
    )
    def get_pricing(self, args: dict, raw_data: dict = None) -> SwaigFunctionResult:
        product_id = args.get("product_id", "")

        # Check cache first
        cached = self._get_cached(f"price:{product_id}")
        if cached:
            return SwaigFunctionResult(f"Price: ${cached}")

        # Fetch and cache
        price = self.fetch_price(product_id)  # your backend lookup
        self._set_cached(f"price:{product_id}", price)
        return SwaigFunctionResult(f"Price: ${price}")
```
4. Prompt Optimization (25 min)
Concise Prompts
```python
# BAD: Too verbose
self.prompt_add_section(
    "Role",
    "You are a highly skilled and knowledgeable customer service "
    "representative working for our esteemed company. Your primary "
    "responsibility is to assist customers with their inquiries in "
    "a professional, courteous, and efficient manner while maintaining "
    "the highest standards of service excellence..."
)

# GOOD: Clear and concise
self.prompt_add_section(
    "Role",
    "Customer service agent for TechCorp. Help with orders, "
    "returns, and product questions."
)
```
Structured Instructions
```python
# Use bullets for clarity
self.prompt_add_section(
    "Response Guidelines",
    bullets=[
        "Keep answers under 2 sentences when possible",
        "Confirm understanding before taking action",
        "Offer alternatives if request can't be fulfilled"
    ]
)
```
Focused Context
```python
# Only include relevant information
class FocusedAgent(AgentBase):
    def __init__(self, department: str):
        super().__init__(name=f"{department}-agent")

        # Department-specific prompt only
        if department == "sales":
            self._setup_sales_prompt()
        elif department == "support":
            self._setup_support_prompt()

    def _setup_sales_prompt(self):
        self.prompt_add_section(
            "Role",
            "Sales agent. Help with pricing and purchases."
        )
        # Only sales-relevant info

    def _setup_support_prompt(self):
        self.prompt_add_section(
            "Role",
            "Support agent. Troubleshoot issues."
        )
        # Only support-relevant info
```
5. Profiling and Benchmarking (30 min)
Measuring Performance
```python
import time
import logging

logger = logging.getLogger(__name__)

class ProfiledAgent(AgentBase):
    def __init__(self):
        super().__init__(name="profiled-agent")
        self._setup_functions()

    def _setup_functions(self):
        @self.tool(
            description="Profiled function",
            parameters={
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Query to process"}
                },
                "required": ["query"]
            }
        )
        def my_function(args: dict, raw_data: dict = None) -> SwaigFunctionResult:
            query = args.get("query", "")
            start = time.perf_counter()

            # Your logic here
            result = process_query(query)

            elapsed = time.perf_counter() - start
            logger.info(f"my_function took {elapsed:.3f}s")
            return SwaigFunctionResult(result)
```
Performance Decorator
```python
import functools
import time
import logging

logger = logging.getLogger(__name__)

def profile(func):
    """Decorator to profile function execution time."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        logger.info(f"{func.__name__}: {elapsed:.3f}s")
        return result
    return wrapper

class OptimizedAgent(AgentBase):
    @profile
    def _process_order(self, order_id: str):
        # Implementation
        pass
```
Load Testing
```bash
# Using hey for HTTP load testing
hey -n 100 -c 10 \
  -m POST \
  -H "Content-Type: application/json" \
  -d '{}' \
  http://localhost:3000/agent
```

For more complex scenarios, use locust:

```python
# locustfile.py
from locust import HttpUser, task, between

class AgentUser(HttpUser):
    wait_time = between(1, 3)

    @task
    def call_agent(self):
        self.client.post("/agent", json={})

    @task
    def call_function(self):
        self.client.post("/agent/swaig", json={
            "function": "get_status",
            "args": {}
        })
```
Performance Dashboard
```python
from datetime import datetime
from collections import defaultdict

class MetricsAgent(AgentBase):
    def __init__(self):
        super().__init__(name="metrics-agent")
        self.metrics = defaultdict(list)

    def record_metric(self, name: str, value: float):
        self.metrics[name].append({
            "value": value,
            "timestamp": datetime.now().isoformat()
        })

    def get_metrics_summary(self):
        summary = {}
        for name, values in self.metrics.items():
            vals = [v["value"] for v in values]
            summary[name] = {
                "count": len(vals),
                "avg": sum(vals) / len(vals) if vals else 0,
                "min": min(vals) if vals else 0,
                "max": max(vals) if vals else 0
            }
        return summary
```
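The dashboard pairs naturally with the `profile` decorator from earlier: feed each elapsed time into `record_metric` and the summary reflects live traffic. A sketch; `timed` and `lookup_order` are hypothetical names used only for illustration:

```python
import time

agent = MetricsAgent()

def timed(name):
    """Record each call's elapsed seconds into the agent's metrics store."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                agent.record_metric(name, time.perf_counter() - start)
        return wrapper
    return decorator

@timed("order_lookup")
def lookup_order(order_id: str):
    ...  # existing function body

lookup_order("ORD-1")
print(agent.get_metrics_summary())  # {'order_lookup': {'count': 1, ...}}
```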
Performance Optimization Checklist
Speech Settings
- Appropriate attention timeout
- End-of-speech timeout tuned
- Barge-in configured if needed
Function Performance
- Timeouts on all external calls
- Fillers for slow operations
- Caching where appropriate
- Async for parallel operations
Prompts
- Concise system prompt
- Only necessary context included
- Clear, structured instructions
Monitoring
- Performance logging enabled
- Key metrics tracked
- Alerts for degradation
Key Takeaways
- Latency is cumulative: every millisecond counts
- Use fillers: keep users engaged during processing
- Timeout everything: never block indefinitely
- Cache wisely: balance data freshness against speed
- Measure constantly: you can't improve what you don't measure
Preparation for Lab 3.3
- Identify slow functions in your agents
- Gather baseline performance metrics
- List external dependencies
Lab Preview
In Lab 3.3, you will:
- Profile an existing agent
- Implement caching and timeouts
- Configure optimal speech settings
- Measure improvement with load testing