Intelligent Call Orchestration — AWS Connect & Chime SMA
An end-to-end voice AI integration that seamlessly bridges an AI agent to a live agent through AWS infrastructure. A caller dials into AWS Connect and is routed to the AI agent via Chime SMA. When a transfer is needed, the AI agent hands the call back to a live agent queue, all within the same PSTN call.
The AWS Connect flow plays the prompt "Transfer you to Willow" and then forwards the call to +13124420260, the Chime SMA number. Lambda processes the resulting SIP event, logs it in DynamoDB, and bridges the call to the LiveKit SIP server, where the AI agent answers.
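The LEG-1 step can be sketched as two small functions: one journals the event and one builds the SMA response. This is a minimal sketch under several assumptions:

- It assumes a Python Lambda and the Chime SMA event shape (`InvocationEventType`, `CallDetails.TransactionId`, `CallDetails.Participants`).
- It assumes a Voice Connector that routes to the LiveKit SIP server.
- The Voice Connector ARN and destination number are hypothetical placeholders.
- The `journal` dict is a hypothetical in-memory stand-in for the DynamoDB table. A real handler would write with boto3 `put_item`/`update_item` instead.

```python
from datetime import datetime, timezone

# Hypothetical placeholders: substitute the real Voice Connector ARN and the
# number the Voice Connector routes to the LiveKit SIP server.
LIVEKIT_VC_ARN = "arn:aws:chime:us-east-1:111122223333:vc/example-connector-id"
LIVEKIT_SIP_URI = "+15550100000"


def journal_event(journal: dict, event: dict) -> dict:
    """Record the latest event for this call (in-memory stand-in for a DDB write)."""
    details = event["CallDetails"]
    item = journal.setdefault(details["TransactionId"], {
        "TransactionId": details["TransactionId"],
        "TransferInProgress": False,
        "PendingTransferToPstn": False,
    })
    item["LastEventType"] = event["InvocationEventType"]
    item["LastEventTime"] = datetime.now(timezone.utc).isoformat()
    return item


def bridge_to_livekit(event: dict) -> dict:
    """Build the SMA response that bridges LEG-A to LiveKit via the Voice Connector."""
    leg_a = next(p for p in event["CallDetails"]["Participants"]
                 if p["ParticipantTag"] == "LEG-A")
    return {
        "SchemaVersion": "1.0",
        "Actions": [{
            "Type": "CallAndBridge",
            "Parameters": {
                "CallTimeoutSeconds": 30,
                "CallerIdNumber": leg_a["From"],  # pass the original caller ID through
                "Endpoints": [{
                    "BridgeEndpointType": "AWS",
                    "Arn": LIVEKIT_VC_ARN,
                    "Uri": LIVEKIT_SIP_URI,
                }],
            },
        }],
    }
```

The journaling function only initializes the transfer flags when it first sees a call. Later events update just the `LastEvent*` fields, so any transfer state set elsewhere is preserved.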
When the AI agent determines that a transfer is needed, it triggers a call transfer through the LiveKit SIP API. Lambda receives the resulting event and uses the CallAndBridge action to connect the caller to the AWS Connect live agent queue. The caller then hears "Hello, this is Prince William County 311 Team. How can I help you today?"
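One way to express the LEG-2 hand-off is sketched below. It rests on an assumption the text does not spell out: the AI agent side sets `PendingTransferToPstn` on the call's DynamoDB item before the LiveKit leg drops. Lambda then receives a `HANGUP` event with `LEG-B` disconnected. Instead of ending the call, it bridges the still-connected caller (`LEG-A`) to the AWS Connect queue's phone number. `CONNECT_QUEUE_NUMBER` is a hypothetical placeholder, and `state` is a plain dict standing in for the DDB item.

```python
# Hypothetical: E.164 number of the AWS Connect flow that feeds the live agent queue.
CONNECT_QUEUE_NUMBER = "+15550199999"


def handle_hangup(event: dict, state: dict) -> dict:
    """On a HANGUP event, bridge LEG-A to AWS Connect if a transfer is pending."""
    participants = event["CallDetails"]["Participants"]
    leg_b_gone = any(p["ParticipantTag"] == "LEG-B" and p.get("Status") == "Disconnected"
                     for p in participants)
    if leg_b_gone and state.get("PendingTransferToPstn") and not state.get("TransferInProgress"):
        leg_a = next(p for p in participants if p["ParticipantTag"] == "LEG-A")
        # Flip the flags so a repeated event cannot start a second transfer.
        state["PendingTransferToPstn"] = False
        state["TransferInProgress"] = True
        return {"SchemaVersion": "1.0", "Actions": [{
            "Type": "CallAndBridge",
            "Parameters": {
                "CallTimeoutSeconds": 30,
                "CallerIdNumber": leg_a["From"],
                "Endpoints": [{"BridgeEndpointType": "PSTN", "Uri": CONNECT_QUEUE_NUMBER}],
            },
        }]}
    # No transfer pending: return no new actions. A real handler might also
    # hang up any leg that is still connected.
    return {"SchemaVersion": "1.0", "Actions": []}
```

The two flags act as a simple guard against double transfers. Once `TransferInProgress` is set, a duplicate `HANGUP` delivery falls through to the no-op branch.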
These metrics were measured in the integration test environment and serve as the target baselines for production deployment.
The Lambda function is the brain of the call flow. It processes every Chime SMA event and decides what action to take.
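The "one entry point, many event types" pattern can be sketched as a dispatch table. This is a minimal sketch that routes each SMA invocation by `InvocationEventType` (for example `NEW_INBOUND_CALL`, `ACTION_SUCCESSFUL`, `ACTION_FAILED`, `HANGUP`) and wraps whatever actions the matching handler returns in the SMA response envelope. The `make_lambda_handler` factory and the handler signature are illustrative assumptions, not this project's actual code.

```python
from typing import Callable, Dict, List, Optional

# An SMA action is a dict such as {"Type": "CallAndBridge", "Parameters": {...}}.
Action = dict
Handler = Callable[[dict], List[Action]]


def make_lambda_handler(handlers: Dict[str, Handler]):
    """Build a Lambda entry point that routes each SMA invocation by event type."""
    def lambda_handler(event: dict, context: Optional[object] = None) -> dict:
        handler = handlers.get(event.get("InvocationEventType", ""))
        # Event types with no registered handler produce no new actions.
        actions = handler(event) if handler else []
        return {"SchemaVersion": "1.0", "Actions": actions}
    return lambda_handler
```

Keeping the routing separate from the per-event logic means each event type (new call, bridge result, hangup) can be tested on its own. The entry point only guarantees a well-formed response envelope.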
DDB serves two critical roles:
Call-state tracking: the TransferInProgress and PendingTransferToPstn fields tell Lambda where each call is in the flow.
Event journaling: the LastEventType, LastEventTime, and TransactionId fields record the most recent event for each call.
Agent availability is checked separately through the GetCurrentUserData API, which reports agent status in the CCP. If no agents are Active, the AI informs the caller of the wait time or offers a callback.
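The agent-availability check reduces to a small decision function. This sketch makes several assumptions:

- The `UserDataList` returned by the Amazon Connect `GetCurrentUserData` API has already been fetched, for example with the boto3 `connect` client's `get_current_user_data`.
- Each agent's status is read from `Status.StatusName`.
- The status name to match is a parameter, because status names are configured per Connect instance. This page uses "Active".
- The returned outcome labels are hypothetical.

```python
from typing import List


def count_available(user_data_list: List[dict], status_name: str = "Active") -> int:
    """Count agents whose current status name matches status_name."""
    return sum(1 for u in user_data_list
               if u.get("Status", {}).get("StatusName") == status_name)


def next_step(user_data_list: List[dict], status_name: str = "Active") -> str:
    """'transfer' if any agent is available, else let the AI offer wait time or a callback."""
    if count_available(user_data_list, status_name) > 0:
        return "transfer"
    return "offer_wait_or_callback"
```

Running this check before starting LEG-2 lets the AI agent tell the caller about the wait before handing the call to an empty queue.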
| Metric | Measured | Value |
|---|---|---|
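The three timing metrics are differences between call events:

- TTFA: SIP call answered to first TTS audio heard by the caller.
- CTD: agent transfer request to LEG-2 connected.
- E2E: SIP answered to call ended.

A minimal sketch for computing them, assuming per-call event timestamps in epoch milliseconds. The field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallTimestamps:
    """Per-call event times in epoch milliseconds (field names are hypothetical)."""
    sip_answered: int
    first_tts_audio: int
    call_ended: int
    transfer_requested: Optional[int] = None  # set only when a transfer happened
    leg2_connected: Optional[int] = None


def call_metrics(t: CallTimestamps) -> dict:
    """Return TTFA, CTD and E2E in seconds; CTD is None for calls without a transfer."""
    ctd = None
    if t.transfer_requested is not None and t.leg2_connected is not None:
        ctd = (t.leg2_connected - t.transfer_requested) / 1000
    return {
        "TTFA": (t.first_tts_audio - t.sip_answered) / 1000,
        "CTD": ctd,
        "E2E": (t.call_ended - t.sip_answered) / 1000,
    }
```

For example, a call answered at t=1000 ms, with first audio at 2500 ms, a transfer requested at 40000 ms and connected at 43000 ms, and ended at 61000 ms, yields TTFA 1.5 s, CTD 3.0 s and E2E 60.0 s.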
Quick-reference glossary for every abbreviation used across this application.

| Abbreviation | Meaning |
|---|---|
| SIP | Session Initiation Protocol — signaling protocol for establishing, modifying, and terminating voice/video calls |
| PSTN | Public Switched Telephone Network — the global circuit-switched phone network |
| RTP | Real-time Transport Protocol — carries audio/video media payloads over UDP |
| SRTP | Secure RTP — encrypted version of RTP used in WebRTC and secure VoIP |
| DTMF | Dual-Tone Multi-Frequency — touch-tone signals (0-9, *, #) sent in-band or via SIP INFO/RTP events |
| IVR | Interactive Voice Response — automated phone menu system ("Press 1 for…") |
| DID | Direct Inward Dialing — a phone number that routes directly to a specific destination without a switchboard |
| LEG-1 | First call segment: Caller → AWS Connect → Chime SMA → LiveKit SIP → AI Agent |
| LEG-2 | Second call segment: AI Agent → Chime SMA → AWS Connect → Live Agent (silent transfer) |
| CCP | Contact Control Panel — AWS Connect agent desktop for handling calls |
| SMA | SIP Media Application — AWS Chime SDK component that invokes Lambda on each call event |
| SDK | Software Development Kit — AWS Chime SDK provides programmable voice/video APIs |
| VC | Voice Connector — AWS Chime component that bridges SIP trunks to/from external networks |
| DDB | DynamoDB — AWS NoSQL database used for call-state tracking and event journaling |
| SNS | Simple Notification Service — AWS pub/sub messaging for alerts (email, SMS, PagerDuty) |
| CMA | CloudWatch Metric Alarms — threshold-based alerts on AWS metrics (e.g. active call count) |
| PK | Partition Key — the primary key used to distribute and look up items in DynamoDB |
| TTFA | Time To First Audio — elapsed time from SIP call answered to first TTS audio heard by caller |
| CTD | Call Transfer Delay — elapsed time from agent transfer request to LEG-2 connected |
| E2E | End-to-End Duration — total call time measured on the phone system (SIP answered → call ended) |
| WebRTC | Web Real-Time Communication — browser API for peer-to-peer audio/video/data |
| ICE | Interactive Connectivity Establishment — protocol that finds the best network path between peers |
| STUN | Session Traversal Utilities for NAT — discovers a host's public IP for NAT traversal |
| TURN | Traversal Using Relays around NAT — relay server for when direct peer connection fails |
| NAT | Network Address Translation — maps private IPs to public IPs at the router |
| SFU | Selective Forwarding Unit — LiveKit's media server that routes audio/video tracks between participants |
| TTS | Text-to-Speech — converts agent text responses into spoken audio (Deepgram) |
| STT | Speech-to-Text — transcribes caller speech into text for the LLM |
| LLM | Large Language Model — the AI model that generates agent responses (e.g. GPT-4o) |
| VAD | Voice Activity Detection — detects when a speaker starts/stops talking for turn-taking |
| UAT | User Acceptance Testing — pre-production environment for validation (current call limit: 10) |
| API | Application Programming Interface — programmatic endpoints for service interaction |
| JWT | JSON Web Token — signed token used for LiveKit room authentication |
| OTEL | OpenTelemetry — observability framework for distributed traces, metrics, and logs |
| CI/CD | Continuous Integration / Continuous Deployment — automated build, test, and deploy pipeline |