Duration 2 hours
Day 7 of 7

Learning Objectives

By the end of this module, students will be able to:

  • Diagnose common voice AI issues
  • Use debugging tools effectively
  • Implement error handling strategies
  • Build resilient agents

Topics

1. Common Issues and Symptoms (25 min)

Issue Classification

Category       Symptoms                Common Causes
Connectivity   No response, timeouts   Network, auth, SSL
Recognition    Wrong understanding     Accent, noise, jargon
Logic          Wrong actions taken     Prompt issues, missing context
Performance    Slow responses          API delays, cold starts
Integration    Function failures       API errors, data issues

Diagnostic Decision Tree

Issue Detected
     │
     ├─── Agent not responding?
     │         │
     │         ├── Check: Network connectivity
     │         ├── Check: Authentication
     │         └── Check: Health endpoint
     │
     ├─── Wrong understanding?
     │         │
     │         ├── Check: STT configuration
     │         ├── Check: Language settings
     │         └── Check: Hints/vocabulary
     │
     ├─── Wrong actions?
     │         │
     │         ├── Check: Prompt clarity
     │         ├── Check: Function parameters
     │         └── Check: Context state
     │
     └─── Slow responses?
               │
               ├── Check: Function latency
               ├── Check: External API times
               └── Check: Cold start issues
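
The decision tree above can also be kept as data, so that runbooks or triage scripts walk the same checklist in the same order. This is a minimal sketch: `CHECKS` and `checks_for` are hypothetical names chosen for illustration, not part of any SDK.

```python
# Hypothetical helper: the decision tree above expressed as a lookup table.
CHECKS = {
    "not_responding": ["Network connectivity", "Authentication", "Health endpoint"],
    "wrong_understanding": ["STT configuration", "Language settings", "Hints/vocabulary"],
    "wrong_actions": ["Prompt clarity", "Function parameters", "Context state"],
    "slow_responses": ["Function latency", "External API times", "Cold start issues"],
}


def checks_for(symptom: str) -> list[str]:
    """Return the ordered checks for a symptom, or an empty list if unknown."""
    return list(CHECKS.get(symptom, []))


if __name__ == "__main__":
    # Print a numbered checklist for one branch of the tree.
    for step, check in enumerate(checks_for("slow_responses"), start=1):
        print(f"{step}. Check: {check}")
```

Keeping the branches in one place makes it easier to extend the tree as new failure modes turn up.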

2. Using swaig-test (30 min)

Basic Testing

# Dump SWML output
swaig-test my_agent.py --dump-swml

# List all registered tools
swaig-test my_agent.py --list-tools

# Test specific function
swaig-test my_agent.py --exec get_order --order_id "12345"

# Test with metadata
swaig-test my_agent.py --exec process_payment \
  --amount "100" \
  --meta customer_id="C001" \
  --meta verified="true"

Debugging Function Issues

# Verbose output
swaig-test my_agent.py --exec my_function --x "test" --verbose

# Check function signature
swaig-test my_agent.py --list-tools --json | jq '.[] | select(.name=="my_function")'

# Test with raw_data simulation
swaig-test my_agent.py --exec my_function \
  --query "help" \
  --raw '{"call_id": "test-123", "meta_data": {"customer": "John"}}'

SWML Validation

# Validate SWML structure
swaig-test my_agent.py --dump-swml --validate

# Check specific sections
swaig-test my_agent.py --dump-swml | jq '.sections.main[0].ai'

# Verify functions are exposed
swaig-test my_agent.py --dump-swml | jq '.sections.main[0].ai.SWAIG.functions'

3. Log-Based Debugging (30 min)

Enable Debug Logging

import logging
import os

# Import paths per the SignalWire Agents SDK; adjust if your version differs
from signalwire_agents import AgentBase
from signalwire_agents.core.function_result import SwaigFunctionResult

# Enable debug logging when the DEBUG environment variable is set
logging.basicConfig(
    level=logging.DEBUG if os.getenv("DEBUG") else logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

logger = logging.getLogger(__name__)


class DebuggableAgent(AgentBase):
    def __init__(self):
        name = "debuggable-agent"
        super().__init__(name=name)
        logger.debug(f"Agent initialized: {name}")
        self._setup_functions()

    def _setup_functions(self):
        @self.tool(
            description="Debug function",
            parameters={
                "type": "object",
                "properties": {
                    "input": {"type": "string", "description": "Input data"}
                },
                "required": ["input"]
            }
        )
        def debug_function(args: dict, raw_data: dict = None) -> SwaigFunctionResult:
            # Named user_input to avoid shadowing the built-in input()
            user_input = args.get("input", "")
            raw_data = raw_data or {}
            global_data = raw_data.get("global_data", {})
            logger.debug(f"Function called with input: {user_input}")
            logger.debug(f"raw_data: {raw_data}")
            logger.debug(f"Current global_data: {global_data}")

            # process() is a placeholder for your own business logic
            result = process(user_input)
            logger.debug(f"Function result: {result}")

            return SwaigFunctionResult(result)

Run with Debug Mode

# Enable debug logging
DEBUG=1 python my_agent.py

# Or, if your agent reads a LOGLEVEL variable instead of DEBUG
LOGLEVEL=DEBUG python my_agent.py

Analyzing Logs

# Filter for specific call
grep "call-123" agent.log

# Filter for errors
grep -E "(ERROR|CRITICAL)" agent.log

# Filter for function calls
grep "Function called" agent.log

# JSON log parsing (requires logs written as one JSON object per line)
jq 'select(.level=="ERROR")' agent.log

# Timeline analysis (same JSON-lines requirement)
jq -r '[.timestamp, .message] | @tsv' agent.log | head -20
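
The `jq` filters above expect each log line to be a JSON object with `timestamp`, `level`, and `message` fields, which a plain-text `basicConfig` format string does not produce. One way to get there with only the standard library is a custom formatter. The sketch below is an illustration under assumptions: `JsonFormatter` is a hypothetical class name, and `call_id` is an assumed correlation field passed through `extra=`.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line (hypothetical helper)."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "name": record.name,
            "message": record.getMessage(),
        }
        # call_id is an assumed correlation field supplied via `extra=`.
        call_id = getattr(record, "call_id", None)
        if call_id is not None:
            entry["call_id"] = call_id
        return json.dumps(entry)


if __name__ == "__main__":
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("agent")
    log.addHandler(handler)
    log.setLevel(logging.DEBUG)
    # Every line for this call can later be found by filtering on call_id.
    log.error("Order lookup failed", extra={"call_id": "call-123"})
```

With a correlation field on every record, filtering one call becomes a `jq 'select(.call_id=="call-123")'` query rather than a free-text `grep`.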

4. Error Handling Strategies (25 min)

Graceful Degradation

# `api` and `cache` stand in for your own API client and cache client
class ResilientAgent(AgentBase):
    def __init__(self):
        super().__init__(name="resilient-agent")
        self._setup_functions()

    def _setup_functions(self):
        @self.tool(
            description="Get order status",
            parameters={
                "type": "object",
                "properties": {
                    "order_id": {"type": "string", "description": "Order ID"}
                },
                "required": ["order_id"]
            }
        )
        def get_order_status(args: dict, raw_data: dict = None) -> SwaigFunctionResult:
            order_id = args.get("order_id", "")
            try:
                # Primary: Real-time API
                status = api.get_order(order_id)
                return SwaigFunctionResult(f"Order {order_id}: {status}")

            except ConnectionError:
                logger.warning("API unavailable, trying cache")
                try:
                    # Fallback: Cached data
                    status = cache.get(f"order:{order_id}")
                    if status:
                        return SwaigFunctionResult(
                            f"Order {order_id}: {status} "
                            "(Note: This may be slightly outdated)"
                        )
                except Exception:
                    # A cache failure is non-fatal, but log it so it stays visible
                    logger.warning("Cache lookup failed", exc_info=True)

                # Final fallback: Helpful response
                return SwaigFunctionResult(
                    "I'm having trouble checking order status right now. "
                    "Would you like me to have someone call you back?"
                )

            except ValueError:
                logger.error(f"Invalid order ID: {order_id}")
                return SwaigFunctionResult(
                    "That doesn't look like a valid order number. "
                    "Order numbers are 8 digits. Could you check and try again?"
                )

            except Exception as e:
                logger.error(f"Unexpected error: {e}", exc_info=True)
                return SwaigFunctionResult(
                    "Something went wrong. Let me connect you with support."
                )
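
The `ValueError` branch above only helps if something actually raises `ValueError` for a malformed ID. One option is to validate before calling the API. The sketch below assumes the 8-digit rule stated in the error message; `normalize_order_id` is a hypothetical helper, and stripping spaces and dashes is an assumption about how transcribed numbers might arrive.

```python
import re

# Assumption taken from the error message above: order numbers are 8 digits.
ORDER_ID_PATTERN = re.compile(r"[0-9]{8}")


def normalize_order_id(raw: str) -> str:
    """Strip spaces and dashes, then require exactly 8 digits.

    Raises ValueError when the cleaned value does not match.
    """
    cleaned = re.sub(r"[\s-]", "", raw)
    if not ORDER_ID_PATTERN.fullmatch(cleaned):
        raise ValueError(f"invalid order ID: {raw!r}")
    return cleaned


if __name__ == "__main__":
    print(normalize_order_id("1234 5678"))
```

Calling this at the top of the handler's `try` block routes bad input to the friendly "check and try again" response instead of a failed API call.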

Retry Logic

import time
from functools import wraps


def retry(max_attempts=3, delay=1, backoff=2):
    """Retry decorator with exponential backoff."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            attempts = 0
            current_delay = delay

            while attempts < max_attempts:
                try:
                    return func(*args, **kwargs)
                except (ConnectionError, TimeoutError) as e:
                    attempts += 1
                    if attempts == max_attempts:
                        raise
                    logger.warning(
                        f"Attempt {attempts} failed: {e}. "
                        f"Retrying in {current_delay}s"
                    )
                    time.sleep(current_delay)
                    current_delay *= backoff

        return wrapper
    return decorator


class RetryAgent(AgentBase):
    @retry(max_attempts=3, delay=0.5)
    def _fetch_data(self, id: str):
        return api.get(id)

    @AgentBase.tool(
        description="Get data with retry",
        parameters={
            "type": "object",
            "properties": {
                "id": {"type": "string", "description": "Data ID"}
            },
            "required": ["id"]
        }
    )
    def get_data(self, args: dict, raw_data: dict = None) -> SwaigFunctionResult:
        id = args.get("id", "")
        try:
            data = self._fetch_data(id)
            return SwaigFunctionResult(f"Data: {data}")
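        except Exception:
            # Log the final failure so it can be diagnosed later
            logger.error("get_data failed after retries", exc_info=True)
            return SwaigFunctionResult(
                "Unable to retrieve data after multiple attempts."
            )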
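
On a live call, retry delays add directly to the silence the caller hears, so it is worth knowing the worst case before choosing parameters. The sketch below mirrors the sleep logic of the `retry` decorator above; `backoff_schedule` is a hypothetical helper name.

```python
def backoff_schedule(max_attempts: int = 3, delay: float = 1, backoff: float = 2) -> list[float]:
    """Return the delays slept between attempts when every attempt fails.

    Mirrors the retry decorator above: it sleeps after each failed attempt
    except the last one, multiplying the delay by `backoff` each time.
    """
    delays = []
    current = delay
    for _ in range(max_attempts - 1):  # no sleep after the final attempt
        delays.append(current)
        current *= backoff
    return delays


if __name__ == "__main__":
    schedule = backoff_schedule(max_attempts=3, delay=0.5)
    print(f"Sleeps: {schedule}, worst-case added wait: {sum(schedule)}s")
```

For `@retry(max_attempts=3, delay=0.5)` the sleeps are 0.5 s and 1.0 s, so the wrapper adds up to 1.5 s on top of the time spent in the three failed calls themselves.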

Circuit Breaker

from datetime import datetime, timedelta


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=60):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.last_failure = None
        self.state = "closed"  # closed, open, half-open

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if datetime.now() - self.last_failure > timedelta(seconds=self.reset_timeout):
                # Timeout elapsed: let one trial call through
                self.state = "half-open"
            else:
                raise CircuitOpenError("Circuit breaker is open")

        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            self.last_failure = datetime.now()

            if self.failures >= self.failure_threshold:
                self.state = "open"
                logger.error(f"Circuit breaker opened after {self.failures} failures")

            raise

        # Success: close the circuit and reset the count, so that only
        # consecutive failures trip the breaker
        self.state = "closed"
        self.failures = 0
        return result


class CircuitBreakerAgent(AgentBase):
    def __init__(self):
        super().__init__(name="circuit-agent")
        self.api_circuit = CircuitBreaker(failure_threshold=3, reset_timeout=30)
        self._setup_functions()

    def _setup_functions(self):
        @self.tool(
            description="Protected API call",
            parameters={
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "API query"}
                },
                "required": ["query"]
            }
        )
        def protected_call(args: dict, raw_data: dict = None) -> SwaigFunctionResult:
            query = args.get("query", "")
            try:
                result = self.api_circuit.call(api.search, query)
                return SwaigFunctionResult(f"Found: {result}")

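            except CircuitOpenError:
                return SwaigFunctionResult(
                    "Our search system is temporarily unavailable. "
                    "Please try again in a few minutes."
                )

            except Exception:
                # The breaker re-raises the underlying error; handle it here
                logger.error("Search failed", exc_info=True)
                return SwaigFunctionResult(
                    "I couldn't complete that search. Please try again shortly."
                )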
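
Testing a circuit breaker against the wall clock means waiting out the reset timeout. A common workaround is to inject the clock. The sketch below is a compact variant written for illustration, not the class above: `ClockedCircuitBreaker`, its `clock` parameter, and `demo` are hypothetical names, and it uses `time.monotonic` by default instead of `datetime.now()`.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class ClockedCircuitBreaker:
    """Compact breaker with an injectable clock (hypothetical test-friendly variant)."""

    def __init__(self, failure_threshold=3, reset_timeout=30, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None
        self.state = "closed"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"  # allow one trial call
            else:
                raise CircuitOpenError("Circuit breaker is open")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, opens the circuit.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.state = "closed"
        self.failures = 0
        return result


def demo():
    """Walk the breaker through open, rejected, and recovered states."""
    now = [0.0]  # fake clock, advanced manually
    breaker = ClockedCircuitBreaker(failure_threshold=2, reset_timeout=30,
                                    clock=lambda: now[0])

    def failing():
        raise ConnectionError("backend down")

    observed = []
    for _ in range(2):
        try:
            breaker.call(failing)
        except ConnectionError:
            pass
    observed.append(breaker.state)

    try:
        breaker.call(lambda: "ok")
    except CircuitOpenError:
        observed.append("rejected")

    now[0] += 31  # jump past the reset timeout without sleeping
    observed.append(breaker.call(lambda: "ok"))
    observed.append(breaker.state)
    return observed


if __name__ == "__main__":
    print(demo())
```

The same injection approach works for the `datetime`-based class if its constructor accepts a function that returns the current time.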

5. Production Debugging (30 min)

Remote Debugging Setup

import os

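if os.getenv("REMOTE_DEBUG"):
    import debugpy
    # 0.0.0.0 accepts connections from any host: restrict access to this
    # port (firewall or SSH tunnel) and never expose it publicly
    debugpy.listen(("0.0.0.0", 5678))
    print("Waiting for debugger attach...")
    debugpy.wait_for_client()  # Blocks startup until a debugger attaches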

Request Replay

import json
from datetime import datetime, timezone


class ReplayableAgent(AgentBase):
    def __init__(self):
        super().__init__(name="replayable-agent")
        self.request_log = []
        self._setup_functions()  # Defined as in the earlier examples

    def log_request(self, function_name: str, args: dict, raw_data: dict):
        """Log request for replay."""
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "function": function_name,
            "args": args,
            "raw_data": raw_data
        }
        self.request_log.append(entry)

        # Also write to file; default=str keeps non-JSON values from raising
        with open("requests.jsonl", "a") as f:
            f.write(json.dumps(entry, default=str) + "\n")

    def replay_request(self, entry: dict):
        """Replay a logged request."""
        # Assumes the handler is reachable as a method named after the function
        func = getattr(self, entry["function"])
        # Handlers take the args dict as a single positional parameter
        return func(entry["args"], raw_data=entry["raw_data"])


# Replay from file
def replay_from_log(agent, log_file: str):
    with open(log_file) as f:
        for line in f:
            entry = json.loads(line)
            print(f"Replaying: {entry['function']}")
            try:
                result = agent.replay_request(entry)
                print(f"Result: {result}")
            except Exception as e:
                print(f"Error: {e}")

Health Check Debugging

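import time
from datetime import datetime, timezone

import requests

# `app` (a FastAPI-style application), `agent`, and `db` are assumed to be
# defined elsewhere in your service


@app.get("/debug/health")
async def debug_health():
    """Detailed health check for debugging."""
    results = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "checks": {}
    }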

    # Agent SWML generation
    start = time.perf_counter()
    try:
        swml = agent.get_swml()
        results["checks"]["swml_generation"] = {
            "status": "ok",
            "duration_ms": (time.perf_counter() - start) * 1000,
            "function_count": len(swml.get("functions", []))
        }
    except Exception as e:
        results["checks"]["swml_generation"] = {
            "status": "error",
            "error": str(e)
        }

    # Database connectivity
    start = time.perf_counter()
    try:
        db.execute("SELECT 1")
        results["checks"]["database"] = {
            "status": "ok",
            "duration_ms": (time.perf_counter() - start) * 1000
        }
    except Exception as e:
        results["checks"]["database"] = {
            "status": "error",
            "error": str(e)
        }

    # External API
    start = time.perf_counter()
    try:
        response = requests.get(
            "https://api.example.com/health",
            timeout=5
        )
        results["checks"]["external_api"] = {
            "status": "ok" if response.ok else "degraded",
            "status_code": response.status_code,
            "duration_ms": (time.perf_counter() - start) * 1000
        }
    except Exception as e:
        results["checks"]["external_api"] = {
            "status": "error",
            "error": str(e)
        }

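    # Memory and CPU usage
    import psutil
    process = psutil.Process()
    results["system"] = {
        "memory_mb": process.memory_info().rss / 1024 / 1024,
        # Without an interval, the first cpu_percent() call on a new
        # Process object returns a meaningless 0.0
        "cpu_percent": process.cpu_percent(interval=0.1)
    }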

    return results
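
The endpoint above returns per-check detail, but dashboards and alerts usually want a single verdict. One way to derive it is to take the worst status across all checks. The sketch below assumes the `results` shape built above; `overall_status` is a hypothetical helper.

```python
def overall_status(results: dict) -> str:
    """Collapse per-check statuses into "ok", "degraded", or "error".

    The worst status wins; a check with no status field counts as an error.
    """
    checks = results.get("checks", {})
    statuses = [check.get("status", "error") for check in checks.values()]
    if "error" in statuses:
        return "error"
    if "degraded" in statuses:
        return "degraded"
    return "ok"


if __name__ == "__main__":
    sample = {
        "checks": {
            "swml_generation": {"status": "ok", "duration_ms": 12.5},
            "external_api": {"status": "degraded", "status_code": 503},
        }
    }
    print(overall_status(sample))
```

Adding the verdict to the response (for example as `results["status"]`) lets a monitor alert on one field while the detail remains available for debugging.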

Troubleshooting Quick Reference

Agent Won’t Start

# Check syntax
python -m py_compile my_agent.py

# Check imports
python -c "from my_agent import *"

# Check environment
env | grep SIGNALWIRE

# Verbose startup
python my_agent.py --verbose

Function Not Called

# Verify function registered
swaig-test my_agent.py --list-tools | grep function_name

# Check SWML includes function
swaig-test my_agent.py --dump-swml | jq '.sections.main[0].ai.SWAIG.functions[] | select(.function=="function_name")'

# Test function directly
swaig-test my_agent.py --exec function_name --param "value"

Slow Performance

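# Profile the agent, sorted by cumulative time
python -m cProfile -s cumulative my_agent.py

# Time external calls (timing.txt is a curl --write-out format file you
# create, e.g. containing: time_total: %{time_total}s\n)
curl -w "@timing.txt" -o /dev/null -s https://api.example.com/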

# Check memory
python -c "import tracemalloc; tracemalloc.start(); from my_agent import *; print(tracemalloc.get_traced_memory())"
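
Whole-process profiling shows where time goes overall, but the "Check: Function latency" step in the decision tree is often easier to answer per function. The sketch below logs the duration of each call and raises the log level when a threshold is exceeded. `timed` and its `threshold_ms` parameter are hypothetical names, and the 500 ms default is an arbitrary illustration, not a recommended budget.

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)


def timed(threshold_ms: float = 500.0):
    """Log how long the wrapped function takes; warn above threshold_ms."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on both success and failure, so slow errors are timed too.
                elapsed_ms = (time.perf_counter() - start) * 1000
                level = logging.WARNING if elapsed_ms > threshold_ms else logging.DEBUG
                logger.log(level, "%s took %.1f ms", func.__name__, elapsed_ms)
        return wrapper
    return decorator


@timed(threshold_ms=200)
def lookup(order_id: str) -> str:
    """Stand-in for a slow external call."""
    time.sleep(0.01)
    return f"Order {order_id}: shipped"


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    print(lookup("12345678"))
```

Applied to the helper that wraps an external API call, this separates time spent in your own code from time spent waiting on the network.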

Key Takeaways

  1. Systematic debugging - Follow the decision tree
  2. Use the tools - swaig-test is your friend
  3. Log thoughtfully - Context and correlation
  4. Handle errors gracefully - Fallbacks and retries
  5. Monitor continuously - Catch issues before users do

Preparation for Lab 3.6

  • Identify a problematic agent or function
  • Gather error logs
  • Note reproduction steps

Lab Preview

In Lab 3.6, you will:

  1. Debug a broken agent
  2. Fix common issues
  3. Implement error handling
  4. Add resilience patterns


SignalWire AI Agents Certification Program