| Duration | 2 hours |
|---|---|
| Day | 7 of 7 |
Learning Objectives
By the end of this module, students will be able to:
- Diagnose common voice AI issues
- Use debugging tools effectively
- Implement error handling strategies
- Build resilient agents
Topics
1. Common Issues and Symptoms (25 min)
Issue Classification
| Category | Symptoms | Common Causes |
|---|---|---|
| Connectivity | No response, timeouts | Network, auth, SSL |
| Recognition | Wrong understanding | Accent, noise, jargon |
| Logic | Wrong actions taken | Prompt issues, missing context |
| Performance | Slow responses | API delays, cold starts |
| Integration | Function failures | API errors, data issues |
Diagnostic Decision Tree
```
Issue Detected
│
├─── Agent not responding?
│    │
│    ├── Check: Network connectivity
│    ├── Check: Authentication
│    └── Check: Health endpoint
│
├─── Wrong understanding?
│    │
│    ├── Check: STT configuration
│    ├── Check: Language settings
│    └── Check: Hints/vocabulary
│
├─── Wrong actions?
│    │
│    ├── Check: Prompt clarity
│    ├── Check: Function parameters
│    └── Check: Context state
│
└─── Slow responses?
     │
     ├── Check: Function latency
     ├── Check: External API times
     └── Check: Cold start issues
```
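The "Agent not responding?" branch can be checked from a script before you dig into logs. Below is a minimal sketch that uses only the Python standard library. It sends one GET request with HTTP Basic credentials and reports one of three outcomes: a network-level failure (no HTTP response at all), an authentication rejection (401/403), or an endpoint that answered. It assumes your agent endpoint is protected with Basic auth. The function name and the result fields are this example's own choices.

```python
import base64
import time
import urllib.error
import urllib.request


def probe_agent(url: str, user: str, password: str, timeout: float = 5.0) -> dict:
    """Classify an unresponsive agent: network failure, auth rejection, or reachable."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as e:
        # The server answered with an error status (HTTPError subclasses URLError)
        status = e.code
    except (urllib.error.URLError, TimeoutError, ConnectionError) as e:
        # No HTTP response at all: DNS failure, refused connection, TLS error, timeout
        return {"check": "network", "ok": False, "detail": str(e)}
    elapsed_ms = round((time.perf_counter() - start) * 1000, 1)
    if status in (401, 403):
        return {"check": "auth", "ok": False, "status": status, "ms": elapsed_ms}
    return {"check": "health", "ok": 200 <= status < 300, "status": status, "ms": elapsed_ms}
```

A result of `{"check": "auth", ...}` points at credentials rather than networking. A `network` result with a DNS or TLS message in `detail` points the other way.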
2. Using swaig-test (30 min)
Basic Testing
```bash
# Dump SWML output
swaig-test my_agent.py --dump-swml

# List all registered tools
swaig-test my_agent.py --list-tools

# Test specific function
swaig-test my_agent.py --exec get_order --order_id "12345"

# Test with metadata
swaig-test my_agent.py --exec process_payment \
  --amount "100" \
  --meta customer_id="C001" \
  --meta verified="true"
```
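swaig-test exercises a function through the agent. For fast unit tests, it can also help to keep the decision logic in a plain function. You can then call it with a hand-built `args` dict and `raw_data` payload that mirror the `--exec` and `--meta` calls above. The sketch below uses a hypothetical payment handler. It returns plain strings so it runs without the SDK installed; the real tool handler would wrap the message in `SwaigFunctionResult`. Reading metadata from `raw_data["meta_data"]` is an assumption of this example.

```python
def process_payment_logic(args: dict, raw_data: dict = None) -> str:
    """Hypothetical payment logic kept free of SDK types so it is easy to unit test."""
    meta = (raw_data or {}).get("meta_data", {})
    try:
        amount = float(args.get("amount", ""))
    except (TypeError, ValueError):
        return "I didn't catch a valid amount. How much would you like to pay?"
    if amount <= 0:
        return "The amount has to be greater than zero."
    if meta.get("verified") != "true":
        return "I need to verify your identity before taking a payment."
    return f"Charging {amount:.2f} to account {meta.get('customer_id', 'unknown')}."


def fake_raw_data(call_id: str = "test-call", **meta) -> dict:
    """Build a minimal raw_data payload with call_id and meta_data keys."""
    return {"call_id": call_id, "meta_data": meta}
```

With this split, edge cases such as a missing amount or an unverified caller can be checked in milliseconds. swaig-test then confirms that the wiring through the agent works.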
Debugging Function Issues
```bash
# Verbose output
swaig-test my_agent.py --exec my_function --x "test" --verbose

# Check function signature
swaig-test my_agent.py --list-tools --json | jq '.[] | select(.name=="my_function")'

# Test with raw_data simulation
swaig-test my_agent.py --exec my_function \
  --query "help" \
  --raw '{"call_id": "test-123", "meta_data": {"customer": "John"}}'
```
SWML Validation
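```bash
# Validate SWML structure
swaig-test my_agent.py --dump-swml --validate

# Inspect the ai verb (select it by key: it may not be the first verb in main,
# since an answer verb often comes first)
swaig-test my_agent.py --dump-swml | jq '.sections.main[] | select(.ai) | .ai'

# Verify functions are exposed
swaig-test my_agent.py --dump-swml | jq '.sections.main[] | select(.ai) | .ai.SWAIG.functions'
```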
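The same check can be scripted, which is useful in CI. The sketch below assumes the SWML JSON (for example, saved from `--dump-swml`) has the layout that the jq paths above rely on: a `sections.main` list of verbs, with functions under `ai.SWAIG.functions`, each one named by a `function` key. It scans every verb instead of assuming the `ai` verb comes first.

```python
def ai_functions(swml: dict) -> list:
    """Return SWAIG function names from every ai verb in sections.main."""
    names = []
    for verb in swml.get("sections", {}).get("main", []):
        ai = verb.get("ai") if isinstance(verb, dict) else None
        if not ai:
            continue  # answer, record_call, etc.
        for fn in ai.get("SWAIG", {}).get("functions", []):
            names.append(fn.get("function", "<unnamed>"))
    return names
```

A CI step could load the dumped JSON with `json.load` and assert that every expected function name appears in `ai_functions(doc)`.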
3. Log-Based Debugging (30 min)
Enable Debug Logging
```python
import logging
import os

from signalwire_agents import AgentBase, SwaigFunctionResult

# Enable debug logging when the DEBUG environment variable is set
logging.basicConfig(
    level=logging.DEBUG if os.getenv("DEBUG") else logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)


class DebuggableAgent(AgentBase):
    def __init__(self):
        name = "debuggable-agent"
        super().__init__(name=name)
        logger.debug(f"Agent initialized: {name}")
        self._setup_functions()

    def _setup_functions(self):
        @self.tool(
            description="Debug function",
            parameters={
                "type": "object",
                "properties": {
                    "input": {"type": "string", "description": "Input data"}
                },
                "required": ["input"],
            },
        )
        def debug_function(args: dict, raw_data: dict = None) -> SwaigFunctionResult:
            user_input = args.get("input", "")
            raw_data = raw_data or {}
            global_data = raw_data.get("global_data", {})

            # raw_data can contain caller details; keep DEBUG off in production
            logger.debug(f"Function called with input: {user_input}")
            logger.debug(f"raw_data: {raw_data}")
            logger.debug(f"Current global_data: {global_data}")

            result = process(user_input)  # process() stands in for your business logic
            logger.debug(f"Function result: {result}")
            return SwaigFunctionResult(result)
```
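The log analysis commands that follow assume two things: JSON-formatted log lines for the jq filters, and a call ID on each line for the call grep. Below is a minimal sketch of both using the standard `logging` module. A formatter writes one JSON object per line with the `timestamp`, `level`, and `message` fields those jq filters read. A `LoggerAdapter` stamps each record with a `call_id` taken from `raw_data`. The field names are this example's choice, not an SDK log format.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Write each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Set by CallLogger below; None for records logged without it
            "call_id": getattr(record, "call_id", None),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry, default=str)


class CallLogger(logging.LoggerAdapter):
    """Attach the current call's ID to every record it logs."""

    def process(self, msg, kwargs):
        kwargs.setdefault("extra", {})["call_id"] = self.extra["call_id"]
        return msg, kwargs


def make_json_logger(name: str, stream=None) -> logging.Logger:
    """Create a logger that writes JSON lines to stream (None means sys.stderr)."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # avoid a second, plain-text copy via the root logger
    return logger
```

Call `make_json_logger` once at module level, since each call adds another handler. Inside a tool handler, wrap it per call: `log = CallLogger(base_logger, {"call_id": raw_data.get("call_id")})`, then use `log.debug(...)` as before.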
Run with Debug Mode
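```bash
# Enable debug logging (read by the logging setup above)
DEBUG=1 python my_agent.py

# LOGLEVEL only has an effect if your code or framework reads it;
# the setup above checks DEBUG alone
LOGLEVEL=DEBUG python my_agent.py
```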
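With the `basicConfig` call above, only `DEBUG` changes the level, so `LOGLEVEL=DEBUG` does nothing unless the code reads it. Below is a small sketch that honors both. `LOGLEVEL` wins when it names a valid level, then the `DEBUG` flag applies, then the default is INFO. It relies on `logging.getLevelName`, which returns the numeric level when given a known level name.

```python
import logging
import os


def resolve_log_level(env=None) -> int:
    """Pick a level from LOGLEVEL, falling back to the DEBUG flag, then INFO."""
    env = os.environ if env is None else env
    name = env.get("LOGLEVEL", "").strip().upper()
    # getLevelName maps a known name like "DEBUG" to its int; unknown names give a str
    level = logging.getLevelName(name) if name else None
    if isinstance(level, int):
        return level
    return logging.DEBUG if env.get("DEBUG") else logging.INFO
```

Pass the result to the existing setup: `logging.basicConfig(level=resolve_log_level(), format=...)`.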
Analyzing Logs
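```bash
# These assume agent output is redirected to a file, e.g.
#   python my_agent.py 2> agent.log   (basicConfig logs to stderr)

# Filter for specific call (only works if log lines include the call ID)
grep "call-123" agent.log

# Filter for errors
grep -E "(ERROR|CRITICAL)" agent.log

# Filter for function calls
grep "Function called" agent.log

# JSON log parsing (requires JSON-formatted log lines, not the text format above)
jq 'select(.level=="ERROR")' agent.log

# Timeline analysis
jq -r '[.timestamp, .message] | @tsv' agent.log | head -20
```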
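The jq timeline shows events but not the gaps between them, and gaps are usually where latency hides. The sketch below filters JSON log lines for one call, sorts them by time, and flags gaps above a threshold. It assumes each line is a JSON object with an ISO 8601 `timestamp`, a `call_id`, and a `message`; non-JSON lines are skipped. Before Python 3.11, `datetime.fromisoformat` does not accept a trailing `Z`, so write UTC offsets as `+00:00`.

```python
import json
from datetime import datetime


def call_timeline(lines, call_id: str, slow_ms: float = 1000.0):
    """Return (timestamp, gap_ms, message, is_slow) tuples for one call, in time order."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as startup banners
        if entry.get("call_id") == call_id:
            events.append((datetime.fromisoformat(entry["timestamp"]), entry.get("message", "")))
    events.sort(key=lambda event: event[0])

    timeline, previous = [], None
    for ts, message in events:
        gap_ms = 0.0 if previous is None else (ts - previous).total_seconds() * 1000
        timeline.append((ts.isoformat(), round(gap_ms, 1), message, gap_ms >= slow_ms))
        previous = ts
    return timeline
```

Typical use is `with open("agent.log") as f: rows = call_timeline(f, "call-123")`, then print the rows where `is_slow` is true.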
4. Error Handling Strategies (25 min)
Graceful Degradation
```python
import logging

from signalwire_agents import AgentBase, SwaigFunctionResult

logger = logging.getLogger(__name__)

# `api` and `cache` stand in for your own order-service and cache clients.


class ResilientAgent(AgentBase):
    def __init__(self):
        super().__init__(name="resilient-agent")
        self._setup_functions()

    def _setup_functions(self):
        @self.tool(
            description="Get order status",
            parameters={
                "type": "object",
                "properties": {
                    "order_id": {"type": "string", "description": "Order ID"}
                },
                "required": ["order_id"],
            },
        )
        def get_order_status(args: dict, raw_data: dict = None) -> SwaigFunctionResult:
            order_id = args.get("order_id", "")
            try:
                # Primary: real-time API
                status = api.get_order(order_id)
                return SwaigFunctionResult(f"Order {order_id}: {status}")
            except (ConnectionError, TimeoutError):
                # Match these to the exceptions your HTTP client actually raises
                logger.warning("Order API unavailable, trying cache")
                try:
                    # Fallback: cached data
                    status = cache.get(f"order:{order_id}")
                    if status:
                        return SwaigFunctionResult(
                            f"Order {order_id}: {status} "
                            "(Note: This may be slightly outdated)"
                        )
                except Exception as e:
                    # A cache failure should not hide the original outage
                    logger.warning(f"Cache lookup failed: {e}")
                # Final fallback: offer a helpful next step
                return SwaigFunctionResult(
                    "I'm having trouble checking order status right now. "
                    "Would you like me to have someone call you back?"
                )
            except ValueError:
                # Assumes the API client raises ValueError for malformed IDs
                logger.error(f"Invalid order ID: {order_id}")
                return SwaigFunctionResult(
                    "That doesn't look like a valid order number. "
                    "Order numbers are 8 digits. Could you check and try again?"
                )
            except Exception as e:
                logger.error(f"Unexpected error: {e}", exc_info=True)
                return SwaigFunctionResult(
                    "Something went wrong. Let me connect you with support."
                )
```
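The nested `try` blocks above grow quickly as fallbacks are added. One way to flatten them, sketched here as an alternative rather than a required pattern, is an ordered list of sources that are tried until one returns a value. The labels and fetch callables are hypothetical; in the agent, they would wrap `api.get_order` and `cache.get`. The returned label lets the handler choose its wording, for example adding "may be slightly outdated" when the hit came from the cache. Unlike the handler above, this helper treats every exception as "try the next source", so validate input such as the order ID before calling it.

```python
import logging
from typing import Callable, Optional, Sequence, Tuple

logger = logging.getLogger(__name__)


def first_available(
    sources: Sequence[Tuple[str, Callable[[], Optional[str]]]],
) -> Tuple[Optional[str], Optional[str]]:
    """Try each (label, fetch) in order; return (label, value) from the first hit."""
    for label, fetch in sources:
        try:
            value = fetch()
        except Exception as e:  # a failing source should not stop the chain
            logger.warning("Source %s failed: %s", label, e)
            continue
        if value is not None:
            return label, value
    return None, None  # nothing available: use the final spoken fallback
```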
Retry Logic
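```python
import logging
import time
from functools import wraps

from signalwire_agents import AgentBase, SwaigFunctionResult

logger = logging.getLogger(__name__)


def retry(max_attempts=3, delay=1, backoff=2):
    """Retry decorator with exponential backoff for transient network errors."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            attempts = 0
            current_delay = delay
            while attempts < max_attempts:
                try:
                    return func(*args, **kwargs)
                except (ConnectionError, TimeoutError) as e:
                    attempts += 1
                    if attempts == max_attempts:
                        raise  # Out of attempts: let the caller decide what to say
                    logger.warning(
                        f"Attempt {attempts} failed: {e}. "
                        f"Retrying in {current_delay}s"
                    )
                    time.sleep(current_delay)
                    current_delay *= backoff
        return wrapper
    return decorator


class RetryAgent(AgentBase):
    def __init__(self):
        super().__init__(name="retry-agent")

    # Worst case adds 0.5s + 1.0s of waiting; keep retries short, the caller is on the line
    @retry(max_attempts=3, delay=0.5)
    def _fetch_data(self, data_id: str):
        return api.get(data_id)  # `api` stands in for your external client

    @AgentBase.tool(
        description="Get data with retry",
        parameters={
            "type": "object",
            "properties": {
                "id": {"type": "string", "description": "Data ID"}
            },
            "required": ["id"],
        },
    )
    def get_data(self, args: dict, raw_data: dict = None) -> SwaigFunctionResult:
        data_id = args.get("id", "")
        try:
            data = self._fetch_data(data_id)
            return SwaigFunctionResult(f"Data: {data}")
        except (ConnectionError, TimeoutError):
            return SwaigFunctionResult(
                "Unable to retrieve data after multiple attempts."
            )
        except Exception as e:
            # Other errors are not retried, so don't claim they were
            logger.error(f"Unexpected error fetching {data_id}: {e}", exc_info=True)
            return SwaigFunctionResult(
                "Something went wrong while retrieving that data."
            )
```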
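The decorator above waits a fixed, growing delay and has no overall limit. On a live call, total wait time matters more than the attempt count. In addition, many clients retrying in lockstep can overload an API that is just recovering. A common refinement is the "full jitter" variant of exponential backoff, swapped in here for illustration; it is not what the decorator above does. Each pause is randomized, and the function gives up early when the next pause would exceed a total time budget. The defaults are illustrative. `sleep` and `clock` are injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.25,
    max_delay: float = 2.0,
    deadline: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Retry transient errors with full-jitter backoff inside a total time budget."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    start = clock()
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise
            # Full jitter: random pause between 0 and the capped exponential delay
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            pause = random.uniform(0, cap)
            if clock() - start + pause > deadline:
                raise  # waiting again would exceed the caller's time budget
            sleep(pause)
    raise RuntimeError("unreachable: the last attempt returns or raises")
```

Inside `get_data`, this would replace the decorated helper: `data = call_with_retry(lambda: api.get(data_id))`.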
Circuit Breaker
```python
import logging
from datetime import datetime, timedelta

from signalwire_agents import AgentBase, SwaigFunctionResult

logger = logging.getLogger(__name__)


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=60):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.last_failure = None
        self.state = "closed"  # closed, open, half-open

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            # After reset_timeout, let one trial request through
            if datetime.now() - self.last_failure > timedelta(seconds=self.reset_timeout):
                self.state = "half-open"
            else:
                raise CircuitOpenError("Circuit breaker is open")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            self.last_failure = datetime.now()
            # A failed trial, or too many consecutive failures, (re)opens the circuit
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"
                logger.error(f"Circuit breaker opened after {self.failures} failures")
            raise
        # Any success closes the circuit and resets the count, so only
        # consecutive failures trip it
        self.state = "closed"
        self.failures = 0
        return result


class CircuitBreakerAgent(AgentBase):
    def __init__(self):
        super().__init__(name="circuit-agent")
        self.api_circuit = CircuitBreaker(failure_threshold=3, reset_timeout=30)
        self._setup_functions()

    def _setup_functions(self):
        @self.tool(
            description="Protected API call",
            parameters={
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "API query"}
                },
                "required": ["query"],
            },
        )
        def protected_call(args: dict, raw_data: dict = None) -> SwaigFunctionResult:
            query = args.get("query", "")
            try:
                # `api` stands in for your search client
                result = self.api_circuit.call(api.search, query)
                return SwaigFunctionResult(f"Found: {result}")
            except CircuitOpenError:
                return SwaigFunctionResult(
                    "Our search system is temporarily unavailable. "
                    "Please try again in a few minutes."
                )
            except Exception as e:
                # The call went through and failed; the breaker has counted it
                logger.error(f"Search failed: {e}")
                return SwaigFunctionResult("I couldn't complete that search right now.")
```
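Because `CircuitBreaker` reads `datetime.now()` directly, its state transitions are awkward to unit test. Below is a variant that takes the clock as a parameter, defaulting to `time.monotonic`, which is not affected by wall-clock changes. With it, a test can move time forward and check each transition: closed to open after repeated failures, open to half-open after the timeout, and then back to closed or open depending on the trial call. It defines its own `CircuitOpenError` so it runs standalone. Like the original, it is not thread-safe.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class ClockedCircuitBreaker:
    """Closed/open/half-open breaker that takes its clock as a parameter."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None
        self.state = "closed"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("Circuit breaker is open")
            self.state = "half-open"  # timeout elapsed: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial, or too many consecutive failures, (re)opens the circuit
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the count
        self.state = "closed"
        self.failures = 0
        return result
```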
5. Production Debugging (30 min)
Remote Debugging Setup
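```python
import os

if os.getenv("REMOTE_DEBUG"):
    import debugpy

    # 0.0.0.0 accepts connections on every interface: use it only on an
    # isolated network, or bind 127.0.0.1 and connect through an SSH tunnel
    debugpy.listen(("0.0.0.0", 5678))
    print("Waiting for debugger attach...")
    debugpy.wait_for_client()  # blocks startup until a debugger attaches
```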
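The snippet above binds debugpy to every interface. To make loopback the safer default, one option is to read the bind address from the environment. `REMOTE_DEBUG_HOST` and `REMOTE_DEBUG_PORT` are hypothetical variable names chosen for this sketch; `REMOTE_DEBUG` is the switch the snippet already uses.

```python
import os


def remote_debug_address(env=None):
    """Return (host, port) for debugpy, or None when remote debugging is off.

    Defaults to loopback so the debugger is reachable only through a tunnel.
    """
    env = os.environ if env is None else env
    if not env.get("REMOTE_DEBUG"):
        return None
    host = env.get("REMOTE_DEBUG_HOST", "127.0.0.1")
    port = int(env.get("REMOTE_DEBUG_PORT", "5678"))
    return host, port
```

Pass the result to `debugpy.listen(address)` in place of the hard-coded tuple. From your workstation, forward the port with `ssh -L 5678:localhost:5678 user@server` and attach to `localhost:5678`.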
Request Replay
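```python
import json
from collections import deque
from datetime import datetime, timezone

from signalwire_agents import AgentBase


class ReplayableAgent(AgentBase):
    # Tool handlers are assumed to be methods on this class (declared with
    # @AgentBase.tool, as in RetryAgent) so replay can find them by name.

    def __init__(self):
        super().__init__(name="replayable-agent")
        # Keep only recent requests in memory; the file keeps the full history
        self.request_log = deque(maxlen=1000)

    def log_request(self, function_name: str, args: dict, raw_data: dict):
        """Log a request for replay. Call this at the top of each tool handler."""
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "function": function_name,
            "args": args,
            "raw_data": raw_data,  # may contain caller details; redact before production use
        }
        self.request_log.append(entry)
        # Also append to a JSON Lines file
        with open("requests.jsonl", "a") as f:
            f.write(json.dumps(entry, default=str) + "\n")

    def replay_request(self, entry: dict):
        """Replay a logged request against the handler of the same name."""
        handler = getattr(self, entry["function"])
        # Handlers take (args, raw_data), so pass them positionally
        return handler(entry["args"], entry["raw_data"])


def replay_from_log(agent, log_file: str):
    """Replay every request in a JSON Lines log, continuing past failures."""
    with open(log_file) as f:
        for line in f:
            if not line.strip():
                continue
            entry = json.loads(line)
            print(f"Replaying: {entry['function']}")
            try:
                result = agent.replay_request(entry)
                print(f"Result: {result}")
            except Exception as e:
                print(f"Error: {e}")
```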
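Writing `args` and `raw_data` verbatim to `requests.jsonl` can put card numbers or other personal data on disk. One option, sketched here, is to mask known-sensitive keys before `log_request` writes the entry, for example with `json.dumps(redact(entry))`. The key list is a hypothetical example; build yours from the fields your functions actually receive. Replays of redacted requests see the mask string instead of the real value, which is usually what you want outside production.

```python
# Example key names only; list the fields your own functions actually receive
SENSITIVE_KEYS = frozenset({"card_number", "cvv", "ssn", "password", "phone"})


def redact(value, sensitive=SENSITIVE_KEYS, mask="[REDACTED]"):
    """Return a copy of a JSON-like value with sensitive keys masked at any depth."""
    if isinstance(value, dict):
        return {
            key: (mask if isinstance(key, str) and key.lower() in sensitive
                  else redact(item, sensitive, mask))
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item, sensitive, mask) for item in value]
    return value  # strings, numbers, booleans and None are immutable
```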
Health Check Debugging
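```python
import time
from datetime import datetime, timezone

import psutil
import requests

# `app` (a FastAPI app), `agent` (your AgentBase instance), and `db`
# (a database handle with an execute() method) are assumed to exist.


@app.get("/debug/health")
def debug_health():
    """Detailed health check for debugging.

    Declared with plain `def` so FastAPI runs it in a worker thread: the
    blocking DB and HTTP calls below would otherwise stall the event loop.
    Keep this route private; error strings can reveal internal details.
    """
    results = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "checks": {},
    }

    # Agent SWML generation
    start = time.perf_counter()
    try:
        swml = agent.get_swml()  # assumed to return the SWML document as a dict
        # Functions live under sections.main[].ai.SWAIG.functions, not at the top level
        ai_verbs = [verb["ai"] for verb in swml["sections"]["main"] if "ai" in verb]
        functions = ai_verbs[0].get("SWAIG", {}).get("functions", []) if ai_verbs else []
        results["checks"]["swml_generation"] = {
            "status": "ok",
            "duration_ms": (time.perf_counter() - start) * 1000,
            "function_count": len(functions),
        }
    except Exception as e:
        results["checks"]["swml_generation"] = {"status": "error", "error": str(e)}

    # Database connectivity
    start = time.perf_counter()
    try:
        db.execute("SELECT 1")
        results["checks"]["database"] = {
            "status": "ok",
            "duration_ms": (time.perf_counter() - start) * 1000,
        }
    except Exception as e:
        results["checks"]["database"] = {"status": "error", "error": str(e)}

    # External API
    start = time.perf_counter()
    try:
        response = requests.get("https://api.example.com/health", timeout=5)
        results["checks"]["external_api"] = {
            "status": "ok" if response.ok else "degraded",
            "status_code": response.status_code,
            "duration_ms": (time.perf_counter() - start) * 1000,
        }
    except Exception as e:
        results["checks"]["external_api"] = {"status": "error", "error": str(e)}

    # Process resource usage
    process = psutil.Process()
    results["system"] = {
        "memory_mb": process.memory_info().rss / 1024 / 1024,
        # Without an interval, the first cpu_percent() call returns a meaningless 0.0
        "cpu_percent": process.cpu_percent(interval=0.1),
    }

    return results
```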
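The endpoint above always returns HTTP 200. As a result, a load balancer or uptime monitor polling it cannot tell a broken dependency from a healthy one without parsing the body. One policy, sketched here, rolls the checks up into a single status and maps that status to an HTTP code. Any error gives 503. A degraded check still gives 200, so an orchestrator does not restart an instance over a slow third party. The mapping is a design choice, not a standard.

```python
def overall_status(checks: dict) -> tuple:
    """Roll per-check statuses into (status, http_status_code).

    Policy: any "error" -> ("error", 503); otherwise any "degraded" ->
    ("degraded", 200); otherwise ("ok", 200).
    """
    statuses = {check.get("status") for check in checks.values()}
    if "error" in statuses:
        return "error", 503
    if "degraded" in statuses:
        return "degraded", 200
    return "ok", 200
```

In FastAPI, the handler could set `results["status"], code = overall_status(results["checks"])` and then return `JSONResponse(results, status_code=code)` from `fastapi.responses`.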
Troubleshooting Quick Reference
Agent Won’t Start
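```bash
# Check syntax
python -m py_compile my_agent.py

# Check imports (surfaces missing packages and import-time errors)
python -c "from my_agent import *"

# Check environment (this prints secret values to your terminal)
env | grep SIGNALWIRE

# Verbose startup (uses the DEBUG switch from the logging setup)
DEBUG=1 python my_agent.py
```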
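These checks can be bundled into one preflight run, for example in CI or a container entrypoint. The sketch below reports missing environment variables and any exception raised while importing the agent module. The required variables depend on your deployment, so the list is a parameter. Importing the module runs its top-level code, so keep server startup behind an `if __name__ == "__main__":` guard.

```python
import importlib
import os


def preflight(module_name: str, required_env: list, env=None) -> list:
    """Return human-readable problems; an empty list means ready to start."""
    env = os.environ if env is None else env
    problems = [f"missing environment variable: {name}"
                for name in required_env if not env.get(name)]
    try:
        importlib.import_module(module_name)
    except Exception as e:  # SyntaxError, ImportError, or errors raised at import time
        problems.append(f"cannot import {module_name}: {type(e).__name__}: {e}")
    return problems
```

A non-empty result can be printed and turned into a non-zero exit code, so a broken deploy fails before it starts taking calls.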
Function Not Called
```bash
# Verify function registered
swaig-test my_agent.py --list-tools | grep function_name

# Check SWML includes function (select the ai verb by key, not by position)
swaig-test my_agent.py --dump-swml | jq '.sections.main[] | select(.ai) | .ai.SWAIG.functions[] | select(.function=="function_name")'

# Test function directly
swaig-test my_agent.py --exec function_name --param "value"
```
Slow Performance
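```bash
# Profile the agent (stats print when the process exits)
python -m cProfile -s cumulative my_agent.py

# Time external calls (DNS, connect, TLS, first byte, total)
curl -o /dev/null -s -w "dns=%{time_namelookup}s connect=%{time_connect}s tls=%{time_appconnect}s ttfb=%{time_starttransfer}s total=%{time_total}s\n" https://api.example.com/

# Check memory (current and peak bytes traced while importing the agent)
python -c "import tracemalloc; tracemalloc.start(); from my_agent import *; print(tracemalloc.get_traced_memory())"
```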
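cProfile covers the whole process. To see which tool handler or external call is slow during real calls, a lightweight option is to time each one and log a warning when it passes a threshold. Below is a sketch using only the standard library. The 500 ms default is an illustrative number, not a platform limit.

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)


def timed(threshold_ms: float = 500.0):
    """Log how long the wrapped call took; use WARNING when above threshold_ms."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on success and on exceptions, so failures are timed too
                elapsed_ms = (time.perf_counter() - start) * 1000
                level = logging.WARNING if elapsed_ms > threshold_ms else logging.DEBUG
                logger.log(level, "%s took %.1f ms", func.__name__, elapsed_ms)
        return wrapper
    return decorator
```

To time a tool, apply `@timed(threshold_ms=300)` directly beneath the tool decorator so the registered handler is the timed one. This assumes the SDK accepts a wrapped function as a handler; `functools.wraps` keeps the original name and signature metadata.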
Key Takeaways
- Systematic debugging - Follow the decision tree
- Use the tools - swaig-test is your friend
- Log thoughtfully - Context and correlation
- Handle errors gracefully - Fallbacks and retries
- Monitor continuously - Catch issues before users do
Preparation for Lab 3.6
- Identify a problematic agent or function
- Gather error logs
- Note reproduction steps
Lab Preview
In Lab 3.6, you will:
- Debug a broken agent
- Fix common issues
- Implement error handling
- Add resilience patterns