Voice AI infrastructure
for developers
Build voice agents without managing WebSockets, VAD, or latency. We handle the infrastructure, you handle the conversation.
"Voice AI has unique infrastructure demands that traditional cloud architectures aren't built for. By leveraging Cloudflare, Layercode delivers the most performant and low-latency voice AI infrastructure that scales."
"Layercode makes it very easy to build and prototype low-latency voice features for our text-based agents built with NextJS and React."
You built a working prototype in a weekend. But when real users start talking to your agent, everything breaks...
The gap between "cool demo" and production-ready voice AI can be months of work wrangling WebSocket connections, voice-activity-detection tuning, global deployment, session recording, and observability tooling.
Layercode closes that gap.
Vapi and Retell's visual builders work great for simple use cases and rapid prototyping. But when your logic gets complex, you start fighting their platforms instead of building your product. Layercode gives you a webhook: Write TypeScript. Ship.
LiveKit and Pipecat offer open-source frameworks with complete architectural control. But most teams can't afford to spend months on WebRTC, TURN servers, and audio pipeline debugging. Layercode handles the infrastructure. You handle the intelligence.
OpenAI's Realtime API is impressive technology and works well for simple demos. But it's a black box: you can't swap models mid-conversation, control prompts dynamically, or use your own fine-tuned LLM. Layercode calls YOUR backend. You control everything.
Layercode's Node.js SDK integrates with the tools you already use. Here's a complete voice agent backend using the Vercel AI SDK:
import { createOpenAI } from "@ai-sdk/openai";
import { streamText } from "ai";
import { streamResponse } from "@layercode/node-server-sdk";
const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY! });
export const POST = async (request: Request) => {
  const body = await request.json();
  return streamResponse(body, async ({ stream }) => {
    if (body.type === "message") {
      const { textStream } = streamText({
        model: openai("gpt-4o-mini"),
        system: "You are a helpful voice assistant.",
        messages: [{ role: "user", content: body.text }],
        onFinish: () => stream.end(),
      });
      await stream.ttsTextStream(textStream);
    }
  });
};

Layercode handles real-time audio streaming. You handle the conversation.
Your user talks into their browser, phone, or mobile app. Layercode captures the audio stream at the nearest edge location and runs speech-to-text in real-time.
We send transcribed text to your webhook. You process it with any LLM: OpenAI, Claude, Gemini, etc. Stream your response back via our SDK.
Layercode converts your text to speech and streams audio back to the user. The entire round-trip happens in under a second.
You receive text, you send text. No audio processing, no WebSocket management, no VAD tuning.
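The text-in, text-out contract above can be sketched as a plain function, independent of any SDK. The `type` and `text` fields follow the example earlier on this page; the `reply` callback and the echo response are illustrative stand-ins for streaming text back via the SDK:

```typescript
// Sketch of the webhook contract: Layercode POSTs transcribed user speech,
// your code decides what to say back. Field names (`type`, `text`) follow
// the SDK example above; everything else here is illustrative.
type WebhookBody = { type: string; text?: string };

export function handleTurn(
  body: WebhookBody,
  reply: (text: string) => void
): boolean {
  // Only "message" events carry transcribed speech; ignore anything else.
  if (body.type === "message" && body.text) {
    reply(`You said: ${body.text}`); // swap in your LLM call here
    return true;
  }
  return false;
}
```

The point of the shape: your handler never touches audio, only strings, so it is trivially unit-testable.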
OpenAI, Claude, Gemini, Llama, Mistral, etc. Use whatever model fits your use case.
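Because your webhook is ordinary TypeScript, model choice is just a branch in your code rather than a platform setting. A minimal sketch, assuming a length-based routing rule (the threshold and the `gpt-4o` upgrade path are illustrative, not a Layercode API):

```typescript
// Pick a model id per turn. The threshold and model names are
// illustrative assumptions — substitute whatever providers you use.
export function pickModel(userText: string): string {
  // Route long, complex turns to a larger model; keep quick turns fast and cheap.
  return userText.length > 200 ? "gpt-4o" : "gpt-4o-mini";
}
```

You could just as easily branch on conversation state, user tier, or cost budget — it is your backend.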
Vercel, AWS, Railway, your own servers. Layercode connects to it via webhook.
Avoid vendor lock-in: Switch between Deepgram, ElevenLabs, Cartesia, Rime and Inworld with a single config change. Test different models, optimize for cost or quality.
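A provider switch of the kind described above amounts to changing one field of pipeline configuration, sketched here as an illustrative object (not Layercode's actual config schema):

```typescript
// Illustrative only — not Layercode's real configuration schema.
// Swapping TTS providers changes one field; your webhook code is untouched.
const pipelineConfig = {
  tts: { provider: "elevenlabs" }, // e.g. change to "cartesia" or "rime"
  stt: { provider: "deepgram" },
};

export default pipelineConfig;
```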
Replay any conversation, inspect latency breakdowns, and view transcripts to debug production issues.
Every call is recorded automatically. Download audio files, export transcripts, build training datasets. All stored securely.
Pay only for active conversation time. Silence is always free. No minimum commitments.
Connect users via browser, iOS, Android, or phone. Same backend, same pipeline, multiple channels.
One invoice for speech-to-text, text-to-speech, and infrastructure.
Other voice AI platforms run on centralized cloud infrastructure. When your user is in Tokyo and your servers are in Virginia, latency kills the conversation. Pauses feel unnatural. Users talk over the agent. The experience falls apart.
Layercode is powered by Cloudflare's global edge network. We process audio at the location nearest to your user, not in a distant data center.
Users connect to the nearest edge location. Speech-to-text, voice activity detection, and audio streaming happen locally in milliseconds instead of hundreds of milliseconds.
No capacity planning. No provisioning. Every conversation runs in its own isolated environment that scales automatically with demand.
Platform traffic spikes don't affect your users. Each session runs in complete isolation with dedicated resources.
Deploy once, serve users everywhere. No multi-region setup, no latency-based routing rules, no infrastructure headaches.
Layercode is built for production workloads with enterprise security requirements. Your data is encrypted in transit and at rest. Session recordings are stored securely in SOC 2 compliant infrastructure.
Per-second billing for active conversation time. Silence is free. STT, TTS, and infrastructure costs consolidated into one simple rate. Start with $100 in free credits, no credit card required.
View pricing details

From zero to production-ready in minutes, not days. $100 in free credits to get started.