Stream to OpenAI Realtime API agent with cXML

In this guide, we will build a Node.js application that serves a cXML Script that initiates a two-way (bidirectional) <Stream> to the OpenAI Realtime API. When a caller initiates a SIP or PSTN call, SignalWire requests cXML from your server and streams the call audio to your application, which bridges it to OpenAI and plays the model's responses back to the caller.
Wondering why this guide uses cXML to stream to OpenAI, instead of using the native SWML AI integration? Since OpenAI's Realtime API is built for Speech-to-Speech (or "Voice-to-Voice") models, the SignalWire platform must stream audio directly to and from OpenAI instead of handling the STT, TTS, and LLM aspects with our integrated toolchain. This guide showcases the flexibility of the SignalWire platform to integrate with emerging unified audio models.
Prerequisites
Before you begin, ensure you have:
- SignalWire Space - Sign up free
- OpenAI API Key - Get access (requires paid account)
- Node.js 20+ - For running the TypeScript server (Install Node)
- ngrok or other tunneling service - For local development tunneling (Install ngrok)
- Docker (optional) - For containerized deployment
Quickstart
Clone and install
Clone the SignalWire Solutions repository, navigate to this example, and install.
git clone https://github.com/signalwire/solutions-architecture
cd solutions-architecture/code/cxml-realtime-agent-stream
npm install
Add OpenAI credentials
Select Local or Docker
- Local
- Docker
When running the server on your local machine, store your credentials in a .env file.
cp .env.example .env
Edit .env and add your OpenAI API key:
OPENAI_API_KEY=sk-your-actual-api-key-here
When running the server in production with the Docker container, store your credentials in a secrets folder.
mkdir secrets
echo "sk-your-actual-api-key-here" > secrets/openai_api_key.txt
Run application
- Local
- Docker
npm run build
npm start
docker-compose up --build signalwire-assistant
Your AI assistant webhook is now running at http://localhost:5050/incoming-call.
Make sure your server is running and the health check passes:
curl http://localhost:5050/health
# Should return: {"status":"healthy"}
Create a cXML Script
Next, we need to tell SignalWire to request cXML from your server when a call comes in.
- Navigate to My Resources in your Dashboard.
- Click Create Resource, select Script as the resource type, and choose cXML.
- Under Handle Using, select External URL.
- Set the Primary Script URL to your server's webhook endpoint.
Select the Local tab below if you ran the application locally, and the Docker tab if you're running it with Docker.
- Local
- Docker
Use ngrok to expose port 5050 on your development machine:
ngrok http 5050
Append /incoming-call to the HTTPS URL returned by ngrok.
https://abc123.ngrok.io/incoming-call
For production environments, set your server URL + /incoming-call:
https://your-domain.com/incoming-call
For this example, you must include /incoming-call at the end of your URL. This is the specific webhook endpoint that our application uses to handle incoming calls.
- Give the cXML Script a descriptive name, such as "AI Voice Assistant".
- Save your new Resource.
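With the Resource pointed at your server, the webhook's job is simply to return cXML containing a bidirectional <Stream>. A minimal sketch of that response, assuming the /media-stream WebSocket path used elsewhere in this guide (the actual example app's server code may differ):

```typescript
// Sketch of the cXML the /incoming-call webhook returns. The <Connect><Stream>
// verbs tell SignalWire to open a two-way media stream to our WebSocket
// endpoint (/media-stream, as used in this guide's examples).
function buildIncomingCallCxml(host: string): string {
  return `<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://${host}/media-stream" />
  </Connect>
</Response>`;
}

// Served with Content-Type: text/xml from the /incoming-call route.
const cxml = buildIncomingCallCxml("abc123.ngrok.io");
```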
Assign SIP address or phone number
To test your AI assistant, create a SIP address or phone number and assign it as a handler for your cXML Script Resource.
- From the My Resources tab, select your cXML Script
- Open the Addresses & Phone Numbers tab
- Click Add, and select either SIP Address or Phone Number
- Fill out any required details, and save the configuration
Test application
Dial the SIP address or phone number assigned to your cXML Script. You should now be speaking to your newly created agent!
How it works
Technical Deep Dive
Audio Processing Pipeline
Audio Flow Details:
- Inbound: Phone → SignalWire → Base64 → Transport → ArrayBuffer → OpenAI
- Outbound: OpenAI → ArrayBuffer → Transport → Base64 → SignalWire → Phone
- Latency: Typically 150-300ms end-to-end
- Quality: Depends on codec choice (G.711 vs PCM16)
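The two base64 hops in the flow above take only a few lines of Node. This is an illustrative sketch with hypothetical helper names; the repository's transport layer performs these conversions internally:

```typescript
// Illustrative helpers for the inbound/outbound audio hops described above.
// SignalWire media frames carry base64-encoded audio in their payload field.

// Inbound: base64 payload from SignalWire -> ArrayBuffer for OpenAI.
function payloadToArrayBuffer(payload: string): ArrayBuffer {
  const buf = Buffer.from(payload, "base64");
  // Slice so we return exactly the audio bytes, not the whole backing pool.
  return buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);
}

// Outbound: ArrayBuffer from OpenAI -> base64 payload for SignalWire.
function arrayBufferToPayload(audio: ArrayBuffer): string {
  return Buffer.from(audio).toString("base64");
}

// Round-trip example with a fake 4-byte audio chunk.
const chunk = new Uint8Array([0x7f, 0x00, 0x80, 0xff]).buffer;
const b64 = arrayBufferToPayload(chunk);
const back = payloadToArrayBuffer(b64);
```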
Codec Selection Guide
Choose the right audio codec for your use case:
PCM16 @ 24kHz
Crystal clear audio for demos and high-quality applications
G.711 μ-law @ 8kHz
Standard telephony quality, lower bandwidth usage
Configure Audio Format
- SignalWire cXML
- Environment Variable
<!-- High quality audio -->
<Stream url="wss://your-server.com/media-stream" codec="L16@24000h" />
<!-- Standard telephony -->
<Stream url="wss://your-server.com/media-stream" />
# In your .env file
AUDIO_FORMAT=pcm16 # or g711_ulaw
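One plausible way the server could translate the AUDIO_FORMAT variable into the Realtime API's format name and matching sample rate; the exact mapping in the example repo may differ:

```typescript
// Sketch: resolving AUDIO_FORMAT into the OpenAI Realtime audio format name
// and the sample rate this guide associates with it.
type AudioFormat = "pcm16" | "g711_ulaw";

function resolveAudioFormat(env: string | undefined): { format: AudioFormat; sampleRate: number } {
  // Default to telephony-grade G.711 mu-law when the variable is unset.
  const fmt = (env ?? "g711_ulaw") as AudioFormat;
  switch (fmt) {
    case "pcm16":
      return { format: "pcm16", sampleRate: 24000 }; // high quality
    case "g711_ulaw":
      return { format: "g711_ulaw", sampleRate: 8000 }; // standard telephony
    default:
      throw new Error(`Unsupported AUDIO_FORMAT: ${env}`);
  }
}
```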
Advanced Configuration
- Interruption Handling
- Audio Timing
- Error Recovery
The transport layer automatically handles interruptions:
// When user interrupts AI speech:
// 1. Transport detects voice activity
// 2. Sends 'clear' event to SignalWire
// 3. Truncates OpenAI audio at last played position
// 4. Resumes with new user input
session.on('interruption', (event) => {
console.log('🛑 User interrupted AI speech');
});
Mark events track audio playback timing:
// Transport sends mark events for each audio chunk
{
"event": "mark",
"mark": { "name": "item123:45" }, // itemId:chunkNumber
"streamSid": "..."
}
// Used for precise interruption timing
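Given that itemId:chunkNumber naming scheme, decoding a mark name might look like the following. This is an illustrative helper, not code from the repository:

```typescript
// Sketch: decode a mark event name ("itemId:chunkNumber") so the transport
// knows which item, and how much of it, has actually been played back.
function parseMarkName(name: string): { itemId: string; chunk: number } {
  const idx = name.lastIndexOf(":");
  if (idx < 0) throw new Error(`Malformed mark name: ${name}`);
  return { itemId: name.slice(0, idx), chunk: Number(name.slice(idx + 1)) };
}

// On interruption, itemId/chunk tell us where to truncate OpenAI's audio.
const mark = parseMarkName("item123:45");
```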
Built-in error handling and recovery:
session.on('error', (error) => {
console.error('Session error:', error);
// Transport automatically attempts reconnection
});
transport.on('*', (event) => {
if (event.type === 'transport_error') {
// Handle transport-specific errors
console.error('Transport error:', event.error);
}
});
For production deployments:
- Use G.711 μ-law for standard phone calls (lower latency)
- Use PCM16 for high-fidelity demos (better quality)
- Monitor WebSocket connection stability
- Implement connection pooling for high traffic
- Track audio latency metrics
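As a sketch of the last tip, a small rolling-window tracker for end-to-end audio latency could look like this (class and method names are illustrative, not part of the example repo):

```typescript
// Sketch: rolling-window latency metrics (average and p95) for monitoring
// the 150-300ms end-to-end latency noted earlier.
class LatencyTracker {
  private samples: number[] = [];
  constructor(private readonly windowSize = 100) {}

  record(ms: number): void {
    this.samples.push(ms);
    if (this.samples.length > this.windowSize) this.samples.shift();
  }

  average(): number {
    if (this.samples.length === 0) return 0;
    return this.samples.reduce((a, b) => a + b, 0) / this.samples.length;
  }

  p95(): number {
    if (this.samples.length === 0) return 0;
    const sorted = [...this.samples].sort((a, b) => a - b);
    return sorted[Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95))];
  }
}

const tracker = new LatencyTracker();
[150, 200, 250, 300].forEach((ms) => tracker.record(ms));
```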
Deployment
Local development
- Install dependencies:
npm install

- Set up environment:
cp .env.example .env
# Edit .env with your OpenAI API key

- Start your server:
npm run build
npm start
# Or for development with hot reload:
npm run dev

- Expose with ngrok:
npx ngrok http 5050
# Note the HTTPS URL (e.g., https://abc123.ngrok.io)

- Configure SignalWire webhook: use the ngrok HTTPS URL + /incoming-call, for example https://abc123.ngrok.io/incoming-call

- Test your setup:
# Check health endpoint
curl https://abc123.ngrok.io/health
# Should return: {"status":"healthy","timestamp":"..."}
Production with Docker
- Dockerfile
- docker-compose.yml
FROM node:20-alpine
# Install system dependencies
RUN apk add --no-cache dumb-init
WORKDIR /app
# Copy package files
COPY package*.json ./
# Install all dependencies (dev dependencies are needed for the TypeScript build)
RUN npm ci
# Copy source code
COPY . .
# Build TypeScript
RUN npm run build
# Remove dev dependencies now that the build is done
RUN npm prune --omit=dev && npm cache clean --force
# Create non-root user
RUN addgroup -g 1001 -S nodejs && \
adduser -S nodeuser -u 1001
# Change ownership and switch to non-root user
RUN chown -R nodeuser:nodejs /app
USER nodeuser
EXPOSE 5050
# Use dumb-init for proper signal handling
ENTRYPOINT ["dumb-init", "--"]
CMD ["node", "dist/index.js"]
services:
  signalwire-assistant:
    build: .
    ports:
      - "${PORT:-5050}:${PORT:-5050}"
    environment:
      - PORT=${PORT:-5050}
      - AUDIO_FORMAT=pcm16
    secrets:
      - openai_api_key
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:5050/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

secrets:
  openai_api_key:
    file: ./secrets/openai_api_key.txt
Console Output to Look For:
📡 Server running on http://0.0.0.0:5050
🏥 Health check: http://0.0.0.0:5050/health
🔊 Audio format: g711_ulaw (8kHz telephony)
🎙️ Voice: alloy
# When calls come in:
📞 Incoming call - Audio format: g711_ulaw, SignalWire codec: default
📱 Client connected to WebSocket
🔧 Tool call started: get_weather
✅ Tool call completed: get_weather
Complete example
See the GitHub repo for a complete working example, including weather and time function usage, error handling, and a production Docker setup.