Voice agents

@cloudflare/voice gives Cloudflare's Agent SDK a streaming voice transport — WebSocket frames carry STT input and TTS output, with Workers AI doing the heavy lifting. ayjnt detects withVoice(), provisions the AI binding, and emits a typed useVoiceAgent hook the client can call without thinking about transport URLs.

The mixin

A voice agent extends a class produced by withVoice(Agent):

agents/chat/agent.ts ts

import { Agent } from "agents";
import { withVoice } from "@cloudflare/voice";
import { WorkersAIFluxSTT, WorkersAITTS } from "@cloudflare/voice/providers";
import type { GeneratedEnv } from "@ayjnt/env";

type State = { transcript: Array<{ role: "user" | "assistant"; text: string }> };

export default class ChatAgent extends withVoice(Agent)<GeneratedEnv, State> {
  override initialState: State = { transcript: [] };

  voice = {
    stt: new WorkersAIFluxSTT({ binding: this.env.AI }),
    tts: new WorkersAITTS({ binding: this.env.AI }),
  };

  // onTranscript fires once a full utterance is recognized.
  async onTranscript(text: string): Promise<string> {
    this.setState({
      transcript: [...this.state.transcript, { role: "user", text }],
    });

    const reply = await this.respondTo(text);

    this.setState({
      transcript: [...this.state.transcript, { role: "assistant", text: reply }],
    });

    return reply; // ← TTS'd back to the client.
  }

  private async respondTo(text: string): Promise<string> {
    // call your LLM here
    return `I heard you say: ${text}`;
  }
}

What ayjnt wires up

When ayjnt detects withVoice( in an agent's source file, it flips a voice feature flag and:

Adds an ai binding to wrangler.jsonc. The step is idempotent: if another feature such as browser tools has already added the binding, nothing changes.
Adds AI: Ai to GeneratedEnv.
Generates a typed useVoiceAgent hook in @ayjnt/<agentId> that points at the right route path and uses ayjnt's URL shape instead of the SDK's default.
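The idempotent binding step can be pictured with a small sketch. It is not ayjnt's actual generator code: the WranglerConfig type and the ensureAiBinding helper are hypothetical names used only for illustration. The one part taken from Wrangler itself is the shape of the binding, an "ai" key holding { "binding": "AI" }.

```typescript
// Hypothetical minimal model of the parts of wrangler.jsonc touched here.
type WranglerConfig = {
  name: string;
  ai?: { binding: string };
  [key: string]: unknown;
};

// Hypothetical helper: returns a config that has the AI binding.
// An existing binding is left untouched, so repeated calls are no-ops.
function ensureAiBinding(config: WranglerConfig): WranglerConfig {
  if (config.ai) return config; // already provisioned by another feature
  return { ...config, ai: { binding: "AI" } };
}

const once = ensureAiBinding({ name: "my-app" });
const twice = ensureAiBinding(once);

console.log(JSON.stringify(twice)); // {"name":"my-app","ai":{"binding":"AI"}}
console.log(once === twice);        // true: the second call changed nothing
```

Returning the same object when the binding already exists is what lets several features request the binding without producing duplicates.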

The generated hook

Voice agents get a different client hook than normal agents. Where a regular agent would generate useAgent from agents/react, a voice agent gets useVoiceAgent, a thin generated wrapper around useAyjntVoiceAgent from ayjnt/voice/client:

.ayjnt/dist/@ayjnt/chat/index.ts (generated) ts

import { useAyjntVoiceAgent } from "ayjnt/voice/client";

export function useVoiceAgent(opts?: {
  name?: string;
  host?: string;
  enabled?: boolean;
  onReconnect?: () => void;
}) {
  return useAyjntVoiceAgent({
    agent: "chat",
    routePath: "/chat",  // ← ayjnt's URL shape
    ...opts,
  });
}

Why the wrapper: the upstream useVoiceAgent from @cloudflare/voice/react hardcodes a WebSocketVoiceTransport backed by PartySocket, which connects to /agents/<kebab>/<name>. ayjnt's URL shape is /<route>/<instance>; the wrapper supplies a raw-WebSocket transport that connects to the right URL.

Using the hook in your UI

agents/chat/app.tsx tsx

import { useVoiceAgent } from "@ayjnt/chat";

export default function VoiceUI() {
  const voice = useVoiceAgent({ name: "demo" });
  // voice.connected, voice.listening, voice.speaking, voice.error
  // voice.start(), voice.stop()

  return (
    <div>
      <button onClick={voice.connected ? voice.stop : voice.start}>
        {voice.connected ? "Stop" : "Talk"}
      </button>
      <p>{voice.listening ? "Listening…" : voice.speaking ? "Speaking…" : "Idle"}</p>
    </div>
  );
}

STT and TTS providers

The voice mixin doesn't pick STT or TTS for you — it expects the agent to expose a voice = { stt, tts } instance field. The most useful providers from @cloudflare/voice/providers:

WorkersAIFluxSTT: streaming STT against Workers AI's Flux speech-recognition model.
WorkersAITTS: Workers AI's TTS endpoint. Streams audio frames back over the WebSocket.

You can also write your own providers that implement the STT and TTS interfaces if you want to point at OpenAI, Deepgram, ElevenLabs, etc.

onTranscript lifecycle

onTranscript(text) is the central hook. It fires once the STT engine flushes a complete utterance (silence detection + finalization). Whatever string you return is sent to the TTS engine and streamed back to the client as audio.

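Return undefined to skip the TTS pass, which is useful when you want to send a state update without speaking. The ChatAgent example above declares Promise<string>, so widen the return type to Promise<string | undefined> if you do this.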
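The turn described in this section can be modelled in a few lines. This is a sketch of the behaviour as described here, not the mixin's implementation: runTurn, TranscriptHandler and Speak are hypothetical names, and the real mixin streams audio rather than calling a single speak function.

```typescript
// Hypothetical types for the sketch; the real mixin's internals may differ.
type TranscriptHandler = (text: string) => Promise<string | undefined>;
type Speak = (reply: string) => Promise<void>;

// Models one turn: a finalized utterance goes in, speech optionally comes out.
// Returns true if the reply was handed to TTS, false if TTS was skipped.
async function runTurn(
  text: string,
  onTranscript: TranscriptHandler,
  speak: Speak,
): Promise<boolean> {
  const reply = await onTranscript(text);
  if (reply === undefined) return false; // handler chose to stay silent
  await speak(reply);
  return true;
}

// Example handler: stays silent on "mute", echoes everything else.
const handler: TranscriptHandler = async (text) =>
  text === "mute" ? undefined : `I heard you say: ${text}`;

// Example TTS stand-in that just records what would be spoken.
const spoken: string[] = [];
const speak: Speak = async (reply) => {
  spoken.push(reply);
};

runTurn("hello", handler, speak).then((didSpeak) => {
  console.log(didSpeak, spoken);
});
```

A handler that returns undefined can still call setState before returning, so connected clients see the state change without any audio being produced.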

Detection rules

// ✓ detected
export default class ChatAgent extends withVoice(Agent)<…> {}

// ✓ detected
const VoiceAgent = withVoice(Agent);
export default class ChatAgent extends VoiceAgent<…> {}

// ✗ NOT detected — aliased import
import { withVoice as wv } from "@cloudflare/voice";
export default class ChatAgent extends wv(Agent)<…> {}

Detection is a source-level regex match for the literal text withVoice( in the file. It does not resolve imports, which is why an aliased import goes undetected.
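A minimal sketch of that check, run against the three cases above. The exact pattern ayjnt uses is not shown in this guide, so VOICE_PATTERN and usesVoice are assumptions made for illustration, and the generic arguments are dropped from the sample sources.

```typescript
// Assumed pattern: the literal text `withVoice(`. ayjnt's real regex may differ.
const VOICE_PATTERN = /withVoice\(/;

// Hypothetical helper: true when the source text contains `withVoice(`.
function usesVoice(source: string): boolean {
  return VOICE_PATTERN.test(source);
}

const direct = `export default class ChatAgent extends withVoice(Agent) {}`;

const viaConst = `const VoiceAgent = withVoice(Agent);
export default class ChatAgent extends VoiceAgent {}`;

const aliased = `import { withVoice as wv } from "@cloudflare/voice";
export default class ChatAgent extends wv(Agent) {}`;

console.log(usesVoice(direct));   // true
console.log(usesVoice(viaConst)); // true
console.log(usesVoice(aliased));  // false: "withVoice" is never followed by "("
```

Under this assumed pattern, any occurrence of the text counts, including one inside a comment or string, so the safest habit is to import withVoice under its own name and call it directly.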

Why a custom transport

The URL shape difference

Upstream WebSocketVoiceTransport uses PartySocket with prefix: "agents", which hardcodes a URL shape of /agents/<kebab>/<name>. ayjnt's URL shape is /<route>/<instance> — flatter, no "agents" segment. The generated hook supplies an AyjntVoiceTransport that opens a raw WebSocket to the right URL while keeping the upstream message protocol intact.
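The two URL shapes can be compared side by side. The helpers below are illustrative only: toKebab, upstreamVoiceUrl and ayjntVoiceUrl are hypothetical names, the wss:// scheme and example.com host are assumptions, and the upstream kebab-case conversion may differ in edge cases.

```typescript
// Illustrative kebab-case conversion, e.g. "ChatAgent" -> "chat-agent".
function toKebab(name: string): string {
  return name.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

// Upstream shape described above: /agents/<kebab>/<name>
function upstreamVoiceUrl(host: string, agent: string, name: string): string {
  return `wss://${host}/agents/${toKebab(agent)}/${name}`;
}

// ayjnt shape described above: /<route>/<instance>
function ayjntVoiceUrl(host: string, routePath: string, instance: string): string {
  return `wss://${host}${routePath}/${instance}`;
}

console.log(upstreamVoiceUrl("example.com", "ChatAgent", "demo"));
// wss://example.com/agents/chat-agent/demo

console.log(ayjntVoiceUrl("example.com", "/chat", "demo"));
// wss://example.com/chat/demo
```

Only the address differs between the two; the frames exchanged once the socket is open follow the upstream message protocol either way.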

Reference

Cloudflare's Voice Agents docs
examples/voice-agent — ChatAgent + UI from this guide.
src/runtime/voiceClient.tsx — AyjntVoiceTransport + the wrapper hook.