Stream to OpenAI Realtime API agent with cXML

Put OpenAI Speech-to-Speech models on the phone with cXML <Stream>

In this guide, we will build a Node.js application that serves a cXML Script that initiates a two-way (bidirectional) <Stream> to the OpenAI Realtime API. When a caller initiates a SIP or PSTN call to the assigned phone number, the SignalWire platform requests and runs the script.

Wondering why this guide uses cXML to stream to OpenAI, instead of using the native SWML AI integration? Since OpenAI's Realtime API is built for Speech-to-Speech (or "Voice-to-Voice") models, the SignalWire platform must stream audio directly to and from OpenAI instead of handling the STT, TTS, and LLM aspects with our integrated toolchain. This guide showcases the flexibility of the SignalWire platform to integrate with emerging unified audio models.

Prerequisites

Before you begin, ensure you have:

  • SignalWire Space - Sign up free
  • OpenAI API Key - Get access (requires paid account)
  • Node.js 20+ - For running the TypeScript server (Install Node)
  • ngrok or other tunneling service - For local development tunneling (Install ngrok)
  • Docker (optional) - For containerized deployment

Quickstart

Clone and install

Clone the SignalWire Solutions repository, navigate to this example, and install.

git clone https://github.com/signalwire/solutions-architecture
cd solutions-architecture/code/cxml-realtime-agent-stream
npm install

Add OpenAI credentials

When running the server on your local machine, store your credentials in a .env file.

cp .env.example .env

Edit .env and add your OpenAI API key:

.env
OPENAI_API_KEY=sk-your-actual-api-key-here

Run application

npm run build
npm start

Your AI assistant webhook is now running at http://localhost:5050/incoming-call.

Health check

Make sure your server is running and the health check passes:

curl http://localhost:5050/health
# Should return: {"status":"healthy"}

Create a cXML Script

Next, we need to tell SignalWire to request cXML from your server when a call comes in.

  • Navigate to My Resources in your Dashboard.
  • Click Create Resource, select Script as the resource type, and choose cXML.
  • Under Handle Using, select External Url.
  • Set the Primary Script URL to your server's webhook endpoint.

If you're running the application locally, use ngrok to expose port 5050 on your development machine (the same applies when the server runs in a local Docker container):

ngrok http 5050

Append /incoming-call to the HTTPS URL returned by ngrok, for example: https://abc123.ngrok.io/incoming-call

Set routes

For this example, you must include /incoming-call at the end of your URL. This is the specific webhook endpoint that our application uses to handle incoming calls.

  • Give the cXML Script a descriptive name, such as "AI Voice Assistant".
  • Save your new Resource.

Assign SIP address or phone number

To test your AI assistant, create a SIP address or phone number and assign it as a handler for your cXML Script Resource.

  • From the My Resources tab, select your cXML Script
  • Open the Addresses & Phone Numbers tab
  • Click Add, and select either SIP Address or Phone Number
  • Fill out any required details, and save the configuration

Test application

Dial the SIP address or phone number assigned to your cXML Script. You should now be speaking to your newly created agent!


How it works

First, your server needs to handle incoming call webhooks from SignalWire.

Set up the HTTP endpoint

import Fastify from 'fastify';

const app = Fastify();

app.post('/incoming-call', async (req, reply) => {
  const host = req.headers.host;
  const wsUrl = `wss://${host}/media-stream`;

  // Return cXML instructions to stream audio
  const cxml = `<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Stream url="${wsUrl}" />
</Response>`;

  reply.type('text/xml').send(cxml);
});

app.listen({ port: 5050, host: '0.0.0.0' });

Webhook URL Format

Your webhook URL must include /incoming-call at the end:

  • Local: https://your-ngrok-url.ngrok.io/incoming-call
  • Production: https://your-domain.com/incoming-call
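
The /health endpoint that the quickstart curls can live on the same Fastify server. A minimal sketch (the response shape mirrors the curl output shown earlier; the repo's actual implementation may differ):

// Health check endpoint: used by the quickstart's curl test,
// and suitable as a container or load-balancer probe.
app.get('/health', async () => {
  return { status: 'healthy', timestamp: new Date().toISOString() };
});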

Next, we will create a WebSocket server to handle bidirectional audio streaming.

Initialize WebSocket Server

import websocket from '@fastify/websocket';
import { SignalWireRealtimeTransportLayer } from '../transports/SignalWireRealtimeTransportLayer.js';
import { RealtimeSession, RealtimeAgent } from '@openai/agents/realtime';
import { AGENT_CONFIG } from '../config.js';

interface SignalWireMessage {
  event: 'start' | 'media' | 'stop' | 'mark';
  media?: {
    payload: string; // Base64 encoded audio
    track?: 'inbound' | 'outbound';
  };
  start?: {
    streamSid: string;
    callSid: string;
    mediaFormat?: {
      encoding: string;
      sampleRate: number;
      channels: number;
    };
  };
}

app.register(websocket);

app.get('/media-stream', { websocket: true }, async (connection) => {
  console.log('📞 Client connected to WebSocket');

  try {
    // Create SignalWire transport layer with configured audio format
    const signalWireTransportLayer = new SignalWireRealtimeTransportLayer({
      signalWireWebSocket: connection,
      audioFormat: AGENT_CONFIG.audioFormat
    });

    // Create AI agent and session
    // (agentConfig is the RealtimeAgentConfiguration defined in "Create the AI Session" below)
    const realtimeAgent = new RealtimeAgent(agentConfig);
    const session = new RealtimeSession(realtimeAgent, {
      transport: signalWireTransportLayer,
      model: 'gpt-4o-realtime-preview'
    });

    // Connect to OpenAI Realtime API
    await session.connect({
      apiKey: process.env.OPENAI_API_KEY
    });

    // Handle session events
    session.on('agent_tool_start', (context, agent, tool, details) => {
      console.log('🔧 Tool call started:', details);
    });

  } catch (error) {
    console.error('❌ Transport initialization failed:', error);
  }
});

The SignalWireRealtimeTransportLayer is the critical component that bridges SignalWire's WebSocket protocol with OpenAI's Realtime API:

// Key features of the transport layer:
const transport = new SignalWireRealtimeTransportLayer({
  signalWireWebSocket: connection,
  audioFormat: 'g711_ulaw' // or 'pcm16'
});

// Automatic handling of:
// 1. Audio format conversion
// 2. Base64 encoding/decoding
// 3. Interruption detection
// 4. Mark event tracking
// 5. Session cleanup

Session Lifecycle:

  1. WebSocket connection → SignalWire connects to /media-stream
  2. Transport creation → Bridge between SignalWire and OpenAI
  3. AI session start → RealtimeSession connects to OpenAI
  4. Audio streaming → Bidirectional real-time audio
  5. Tool execution → Function calls processed server-side
  6. Session cleanup → Graceful disconnect and resource cleanup (sketched below)
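
That last step matters on telephony: when the caller hangs up, SignalWire closes the WebSocket, and the OpenAI session should be torn down with it. A minimal sketch of that wiring inside the /media-stream handler (assuming the RealtimeSession exposes a close() method, per the @openai/agents SDK):

// Tear down the OpenAI session when SignalWire closes the call's WebSocket
connection.on('close', () => {
  console.log('📞 Caller disconnected, closing AI session');
  session.close();
});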

SignalWire sends several types of messages through the WebSocket:

Event   Purpose                        Key data
start   Connection initialized         streamSid, callSid, mediaFormat
media   Audio data packet (~20ms)      Base64 encoded payload, track
mark    Audio playback confirmation    name (for timing)
stop    Stream ending                  None
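
You won't normally parse these frames yourself, since the transport layer handles them, but it helps to see the raw protocol. A hypothetical handler, for illustration only:

// Illustrative only: the SignalWireRealtimeTransportLayer already does this.
connection.on('message', (raw: Buffer) => {
  const msg: SignalWireMessage = JSON.parse(raw.toString());

  switch (msg.event) {
    case 'start':
      // First frame: capture stream/call IDs and the negotiated media format
      console.log('Stream started:', msg.start?.streamSid, msg.start?.mediaFormat);
      break;
    case 'media': {
      // ~20ms of audio, base64 encoded
      const audio = Buffer.from(msg.media?.payload ?? '', 'base64');
      // ...forward the audio to OpenAI...
      break;
    }
    case 'mark':
      // SignalWire confirms a named audio chunk finished playing
      break;
    case 'stop':
      // Caller hung up or the stream was stopped
      break;
  }
});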

Key features

  • Automatic audio format conversion between SignalWire and OpenAI
  • Interruption handling using clear events and mark tracking
  • Base64 encoding/decoding for audio data
  • Session lifecycle management with proper cleanup
  • Error recovery and reconnection handling

Audio Format Support:

  • Input: G.711 μ-law (8kHz) or PCM16 (24kHz) from SignalWire
  • Output: Matches input format automatically
  • OpenAI Integration: Handles format negotiation transparently

Connect your WebSocket bridge to OpenAI's Realtime API for AI processing.

Create the AI Session

import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';
import type { RealtimeAgentConfiguration } from '@openai/agents/realtime';
import { SignalWireRealtimeTransportLayer } from '../transports/SignalWireRealtimeTransportLayer.js';
import { allTools } from '../tools/index.js';

// Configure the AI agent
const agentConfig: RealtimeAgentConfiguration = {
  name: 'SignalWire Voice Assistant',
  instructions: `You are a helpful and friendly voice assistant.
Always start every conversation by greeting the caller first.
You can help with weather information, time queries, and general conversation.
Be concise and friendly in your responses.`,
  tools: allTools, // Weather, time, and other tools
  voice: 'alloy'
};

async function createAISession(signalWireWebSocket: WebSocket): Promise<RealtimeSession> {
  // Create transport layer that bridges SignalWire and OpenAI
  const transport = new SignalWireRealtimeTransportLayer({
    signalWireWebSocket,
    audioFormat: 'g711_ulaw' // or 'pcm16' for HD audio
  });

  // Create agent and session
  const agent = new RealtimeAgent(agentConfig);
  const session = new RealtimeSession(agent, {
    transport,
    model: 'gpt-4o-realtime-preview'
  });

  // Connect to OpenAI
  await session.connect({
    apiKey: process.env.OPENAI_API_KEY
  });

  return session;
}

Send Audio Back to Caller

// Audio is automatically handled by SignalWireRealtimeTransportLayer
// The transport layer manages:
// 1. Audio format conversion (g711_ulaw ↔ pcm16)
// 2. Base64 encoding/decoding
// 3. Chunk timing and interruption handling
// 4. Mark events for tracking audio playback

// Example of session event handling:
session.on('agent_tool_start', (context, agent, tool, details) => {
  console.log('🔧 Tool call started:', details);
});

session.on('agent_tool_end', (context, agent, tool, result, details) => {
  console.log('✅ Tool call completed:', details);
});

session.on('error', (error) => {
  console.error('❌ Session error:', error);
});

Environment Configuration

Set up your environment variables for different deployment scenarios:

Create a .env file in your project root:

# Required
OPENAI_API_KEY=sk-your-actual-api-key-here

# Optional
PORT=5050
AUDIO_FORMAT=g711_ulaw  # or 'pcm16' for HD audio

Audio Format Options

Choose the right audio format for your use case:

  • g711_ulaw (8kHz): Standard telephony quality (default)
  • pcm16 (24kHz): High definition audio for demos
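
For illustration, a config module might read these variables as in the sketch below; the AGENT_CONFIG shape here is an assumption, and the repo's config.ts is the source of truth.

// config.ts (sketch): read deployment settings from the environment
export type AudioFormat = 'g711_ulaw' | 'pcm16';

export const AGENT_CONFIG = {
  port: Number(process.env.PORT ?? 5050),
  audioFormat: (process.env.AUDIO_FORMAT ?? 'g711_ulaw') as AudioFormat,
};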

Enable your AI to execute server-side tools during conversations.

Define Tools

import { tool as realtimeTool } from '@openai/agents/realtime';
import { z } from 'zod';

// Weather tool using the real US National Weather Service API
const weatherTool = realtimeTool({
  name: 'get_weather',
  description: 'Get current weather information for any US city',
  parameters: z.object({
    location: z.string().describe('The US city or location to get weather for (include state if needed for clarity)')
  }),
  execute: async ({ location }) => {
    try {
      // Step 1: Geocoding - convert the city name to coordinates
      const geocodeUrl = `https://nominatim.openstreetmap.org/search?format=json&q=${encodeURIComponent(location)}&countrycodes=us&limit=1`;
      const geocodeResponse = await fetch(geocodeUrl, {
        headers: {
          'User-Agent': 'SignalWire-OpenAI-Voice-Assistant/1.0.0'
        }
      });

      if (!geocodeResponse.ok) {
        return 'Sorry, weather information is currently unavailable.';
      }

      const geocodeData = await geocodeResponse.json();
      if (!geocodeData || geocodeData.length === 0) {
        return `Sorry, I couldn't find the location "${location}". Please try a different city name.`;
      }

      const lat = parseFloat(geocodeData[0].lat);
      const lon = parseFloat(geocodeData[0].lon);

      // Step 2: Get the forecast from weather.gov
      const pointsUrl = `https://api.weather.gov/points/${lat},${lon}`;
      const pointsResponse = await fetch(pointsUrl);
      const pointsData = await pointsResponse.json();

      const forecastUrl = pointsData.properties?.forecast;
      if (!forecastUrl) {
        return 'Sorry, weather information is currently unavailable.';
      }

      const forecastResponse = await fetch(forecastUrl);
      const forecastData = await forecastResponse.json();

      const currentPeriod = forecastData.properties?.periods?.[0];
      if (!currentPeriod) {
        return 'Sorry, weather information is currently unavailable.';
      }

      // Format the response for voice
      const cityName = geocodeData[0].display_name.split(',')[0];
      return `In ${cityName}, it's currently ${currentPeriod.detailedForecast.toLowerCase()}`;

    } catch (error) {
      return 'Sorry, weather information is currently unavailable.';
    }
  }
});

// Time tool example (no external API required)
const timeTool = realtimeTool({
  name: 'get_time',
  description: 'Get the current time in Eastern Time',
  parameters: z.object({}), // No parameters needed
  execute: async () => {
    try {
      const now = new Date();
      const easternTime = now.toLocaleString('en-US', {
        timeZone: 'America/New_York',
        timeZoneName: 'short',
        weekday: 'long',
        year: 'numeric',
        month: 'long',
        day: 'numeric',
        hour: 'numeric',
        minute: '2-digit'
      });
      return `The current time in Eastern Time is ${easternTime}.`;
    } catch (error) {
      return 'Sorry, time information is currently unavailable.';
    }
  }
});

// Export all tools
export const allTools = [weatherTool, timeTool];

// Add to your AI agent configuration
const agentConfig = {
  name: 'SignalWire Voice Assistant',
  instructions: `You are a helpful and friendly voice assistant.
Always start every conversation by greeting the caller first.
You can help with weather information, time queries, and general conversation.
Be concise and friendly in your responses.`,
  tools: allTools,
  voice: 'alloy'
};

Here's what happens when a caller triggers a tool:

  1. User asks: "What's the weather in New York?"
  2. AI recognizes intent: Needs weather information
  3. Function call triggered: get_weather({ location: "New York" })
  4. Server executes: Fetches from weather API
  5. Result returned: AI incorporates into response
  6. User hears: "The weather in New York is 72°F and sunny."

All of this happens in real-time during the conversation.


Technical Deep Dive


Audio Processing

Audio Flow Details:

  • Inbound: Phone → SignalWire → Base64 → Transport → ArrayBuffer → OpenAI
  • Outbound: OpenAI → ArrayBuffer → Transport → Base64 → SignalWire → Phone
  • Latency: Typically 150-300ms end-to-end
  • Quality: Depends on codec choice (G.711 vs PCM16)
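
In Node.js, both directions of that base64 ↔ ArrayBuffer conversion come down to Buffer. A minimal sketch:

// Inbound: base64 payload from SignalWire -> ArrayBuffer for OpenAI
function decodeInbound(payload: string): ArrayBuffer {
  const buf = Buffer.from(payload, 'base64');
  // Copy into a standalone ArrayBuffer (a Buffer may be a view into a shared pool)
  return buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength) as ArrayBuffer;
}

// Outbound: ArrayBuffer from OpenAI -> base64 payload for SignalWire
function encodeOutbound(audio: ArrayBuffer): string {
  return Buffer.from(audio).toString('base64');
}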

Codec Selection Guide

Choose the right audio codec for your use case:

  • PCM16 @ 24kHz: Crystal clear audio for demos and high-quality applications
  • G.711 μ-law @ 8kHz: Standard telephony quality, lower bandwidth usage

Configure Audio Format

<!-- High quality audio -->
<Stream url="wss://your-server.com/media-stream" codec="L16@24000h" />

<!-- Standard telephony -->
<Stream url="wss://your-server.com/media-stream" />

Advanced Configuration

The transport layer automatically handles interruptions:

// When user interrupts AI speech:
// 1. Transport detects voice activity
// 2. Sends 'clear' event to SignalWire
// 3. Truncates OpenAI audio at last played position
// 4. Resumes with new user input

session.on('interruption', (event) => {
  console.log('🛑 User interrupted AI speech');
});

Performance Optimization

For production deployments:

  • Use G.711 μ-law for standard phone calls (lower latency)
  • Use PCM16 for high-fidelity demos (better quality)
  • Monitor WebSocket connection stability
  • Implement connection pooling for high traffic
  • Track audio latency metrics

Deployment

Local development

  1. Install dependencies

    npm install
  2. Set up environment

    cp .env.example .env
    # Edit .env with your OpenAI API key
  3. Start your server

    npm run build
    npm start

    # Or for development with hot reload:
    npm run dev
  4. Expose with ngrok

    npx ngrok http 5050
    # Note the HTTPS URL (e.g., https://abc123.ngrok.io)
  5. Configure SignalWire webhook

    • Use the ngrok HTTPS URL + /incoming-call
    • Example: https://abc123.ngrok.io/incoming-call
  6. Test your setup

    # Check health endpoint
    curl https://abc123.ngrok.io/health

    # Should return: {"status":"healthy","timestamp":"..."}

Production with Docker

FROM node:20-alpine

# Install system dependencies
RUN apk add --no-cache dumb-init

WORKDIR /app

# Copy package files
COPY package*.json ./

# Install all dependencies (dev dependencies are needed for the TypeScript build)
RUN npm ci

# Copy source code
COPY . .

# Build TypeScript
RUN npm run build

# Drop dev dependencies from the final image
RUN npm prune --omit=dev && npm cache clean --force

# Create non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodeuser -u 1001

# Change ownership and switch to non-root user
RUN chown -R nodeuser:nodejs /app
USER nodeuser

EXPOSE 5050

# Use dumb-init for proper signal handling
ENTRYPOINT ["dumb-init", "--"]
CMD ["node", "dist/index.js"]

Security & Secrets:

  • Use Docker secrets or external secret management (AWS Secrets Manager, Azure Key Vault)
  • Never commit API keys to version control
  • Use non-root user in Docker containers
  • Implement proper CORS and rate limiting (see the sketch after this list)
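
For the CORS and rate-limiting point, Fastify has first-party plugins. A minimal sketch using @fastify/cors and @fastify/rate-limit (neither is part of this example repo; tune the limits to your traffic):

import cors from '@fastify/cors';
import rateLimit from '@fastify/rate-limit';

// Restrict browser origins and cap request rates on the HTTP endpoints.
// (The WebSocket media stream is not affected by these HTTP-level limits.)
await app.register(cors, { origin: ['https://your-domain.com'] });
await app.register(rateLimit, { max: 100, timeWindow: '1 minute' });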

Monitoring & Observability:

  • Set up health checks (/health endpoint included)
  • Implement structured logging with correlation IDs
  • Monitor WebSocket connection metrics
  • Track audio latency and quality metrics
  • Set up alerting for failed calls

Scalability & Performance:

  • Use horizontal scaling with session affinity
  • Implement connection pooling for high traffic
  • Consider using Redis for session state if needed
  • Monitor memory usage (audio buffers can accumulate)

Error Handling:

  • Graceful degradation when OpenAI API is unavailable
  • Retry logic with exponential backoff (sketched below)
  • Proper WebSocket reconnection handling
  • Fallback responses when tools fail
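
A generic backoff helper along these lines (a sketch, not from the repo) can wrap the OpenAI connection call:

// Generic retry helper with exponential backoff
async function withBackoff<T>(fn: () => Promise<T>, retries = 3, baseMs = 500): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err;
      const delay = baseMs * 2 ** attempt; // 500ms, 1s, 2s, ...
      console.warn(`Attempt ${attempt + 1} failed, retrying in ${delay}ms`);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Example: retry the OpenAI connection a few times before giving up
// await withBackoff(() => session.connect({ apiKey: process.env.OPENAI_API_KEY }));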

Development Workflow:

# Local development with hot reload
npm run dev

# Type checking
npm run typecheck

# Production build
npm run build && npm start

# Debug logging
DEBUG=openai-agents:* npm run dev

Console Output to Look For:

📡 Server running on http://0.0.0.0:5050
🏥 Health check: http://0.0.0.0:5050/health
🔊 Audio format: g711_ulaw (8kHz telephony)
🎙️ Voice: alloy

# When calls come in:
📞 Incoming call - Audio format: g711_ulaw, SignalWire codec: default
📞 Client connected to WebSocket
🔧 Tool call started: get_weather
✅ Tool call completed: get_weather

Complete example

See the GitHub repo for a complete working example, including weather and time function usage, error handling, and a production Docker setup.



Resources