ayjnt / All examples
rag · multi-agent · workers-ai

Agentic RAG

A two-agent retrieval pipeline. QA agent decomposes the question, fans out retrievals to the Index agent over typed RPC, then composes a grounded answer. Workers AI for both embeddings and generation.

What you'll learn
  • Typed cross-agent RPC with `getAgent<T>` for multi-agent pipelines
  • Cosine-similarity over Workers AI embeddings (bge-base-en)
  • Calling Workers AI through the HTTP API when bindings aren't available
Step 01

Start from the blank scaffold

Every ayjnt example starts here. `bunx ayjnt new` drops a one-agent project with a single `alive` agent that responds "I'm alive" to any request — enough to prove the pipeline works before you replace it with the real thing.

~/my-agent-app
my-app/ (blank scaffold)
├── agent.ts
├── package.json
├── tsconfig.json
├── .gitignore
└── README.md
Step 02

Two agents, two roles

Index holds an in-memory vector store per corpus (/index/policies, /index/recipes). QA orchestrates plan → retrieve → compose. They talk over typed RPC — `getAgent<IndexAgent>` returns a DO stub with method autocomplete.

~/my-agent-app
my-app/agents/
├── index/agent.ts
├── qa/agent.ts
└── shared.ts        (Workers AI helper)
Step 03

agents/shared.ts — Workers AI over HTTP

ayjnt's wrangler.jsonc generator doesn't yet support custom bindings like `AI`, so we call the Workers AI HTTP API directly. It needs two secrets: `CF_ACCOUNT_ID` and `CF_API_TOKEN` (a token with the "Workers AI: Read" permission). For deployment, set each with `wrangler secret put`.
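Concretely, following standard wrangler conventions (`.dev.vars` for local dev is a wrangler feature; the values below are placeholders):

```shell
# Local dev: wrangler reads plain-text vars from .dev.vars (keep it untracked)
cat > .dev.vars <<'EOF'
CF_ACCOUNT_ID=your-account-id
CF_API_TOKEN=your-workers-ai-read-token
EOF

# Production: store each value as an encrypted secret (prompts on stdin)
wrangler secret put CF_ACCOUNT_ID
wrangler secret put CF_API_TOKEN
```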

agents/shared.ts
type AiEnv = { CF_ACCOUNT_ID?: string; CF_API_TOKEN?: string };

export async function runWorkersAi<T = unknown>(env: AiEnv, model: string, body: unknown): Promise<T> {
  const acct = env.CF_ACCOUNT_ID, token = env.CF_API_TOKEN;
  if (!acct || !token) throw new Error("CF_ACCOUNT_ID and CF_API_TOKEN must be set");

  const url = `https://api.cloudflare.com/client/v4/accounts/${acct}/ai/run/${model}`;
  const res = await fetch(url, {
    method: "POST",
    headers: { "content-type": "application/json", authorization: `Bearer ${token}` },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Workers AI ${res.status}: ${await res.text()}`);
  const json = await res.json() as { success?: boolean; result?: T };
  if (!json.success || json.result === undefined) throw new Error("Workers AI returned non-success");
  return json.result;
}
Step 04

agents/index/agent.ts — vector store on a DO

Each doc is `{ id, text, embedding }` stored in state. `addDoc(text)` embeds via bge-base-en (768-dim); `search(query, k)` computes cosine similarity against every doc and returns the top k. Both are callable over typed RPC from the QA agent.

agents/index/agent.ts
import { Agent } from "agents";
import type { GeneratedEnv } from "@ayjnt/env";
import { runWorkersAi } from "../shared.ts";

type Doc = { id: string; text: string; embedding: number[] };
type State = { docs: Doc[] };

export default class IndexAgent extends Agent<GeneratedEnv, State> {
  override initialState: State = { docs: [] };

  async addDoc(text: string): Promise<{ id: string }> {
    const embedding = await this.embed(text.trim());
    const id = crypto.randomUUID();
    this.setState({ docs: [...this.state.docs, { id, text: text.trim(), embedding }] });
    return { id };
  }

  async search(query: string, k = 3): Promise<{ id: string; text: string; score: number }[]> {
    if (this.state.docs.length === 0) return [];
    const q = await this.embed(query);
    return this.state.docs
      .map((d) => ({ id: d.id, text: d.text, score: cosine(q, d.embedding) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }

  private async embed(text: string): Promise<number[]> {
    const r = await runWorkersAi<{ data: number[][] }>(this.env, "@cf/baai/bge-base-en-v1.5", { text: [text] });
    return r.data[0]!;
  }

  // DELETE handler elided for brevity — see examples/agentic-rag
  override async onRequest() { return Response.json({ instance: this.name, count: this.state.docs.length }); }
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i]! * b[i]!; na += a[i]! * a[i]!; nb += b[i]! * b[i]!; }
  return dot / (Math.sqrt(na * nb) || 1);
}
Step 05

agents/qa/agent.ts — plan → retrieve → compose

Three steps in one method: (1) llama decomposes the question into 2-3 subqueries; (2) each subquery fans out to Index.search over typed RPC; (3) llama composes a grounded answer from the deduplicated union of evidence. The whole trace (plan + hits + answer) is stored in state for replay.

agents/qa/agent.ts
import { Agent } from "agents";
import { getAgent } from "ayjnt/rpc";
import type { GeneratedEnv } from "@ayjnt/env";
import type IndexAgent from "../index/agent.ts";
import { runWorkersAi } from "../shared.ts";

export default class QAAgent extends Agent<GeneratedEnv, { history: any[]; pending: boolean }> {
  override initialState = { history: [], pending: false };

  override async onRequest(request: Request): Promise<Response> {
    if (request.method !== "POST") return Response.json({ instance: this.name, ...this.state });
    const { question } = (await request.json()) as { question: string };
    this.setState({ ...this.state, pending: true });

    const plan = await this.plan(question);
    const index = await getAgent<IndexAgent>(this.env.INDEX_AGENT, "main");
    const hits = await Promise.all(plan.map(async (sub) => ({ sub, docs: await index.search(sub, 3) })));
    const evidence = hits.flatMap((h) => h.docs.map((d) => d.text))
      .filter((t, i, a) => a.indexOf(t) === i).join("\n\n---\n\n");
    const answer = await this.compose(question, evidence);

    const qa = { id: crypto.randomUUID(), question, plan, hits, answer, at: Date.now() };
    this.setState({ history: [...this.state.history, qa], pending: false });
    return Response.json({ ok: true, qa });
  }

  private async plan(question: string): Promise<string[]> {
    const r = await runWorkersAi<{ response: string }>(this.env, "@cf/meta/llama-3.1-8b-instruct", {
      messages: [
        { role: "system", content: "Return ONLY a JSON array of 2-3 search queries." },
        { role: "user", content: question },
      ],
    });
    const m = r.response.match(/\[[^\]]*\]/);
    try { return m ? JSON.parse(m[0]) : [question]; } catch { return [question]; }
  }

  private async compose(question: string, evidence: string) {
    const r = await runWorkersAi<{ response: string }>(this.env, "@cf/meta/llama-3.1-8b-instruct", {
      messages: [
        { role: "system", content: "Answer using ONLY the context. Be concise. Say so if insufficient." },
        { role: "user", content: `CONTEXT:\n${evidence}\n\nQUESTION:\n${question}` },
      ],
    });
    return r.response.trim();
  }
}
Step 06

Run + index + ask

POST to /index to embed documents, POST to /qa to ask. The response includes the plan and the top hits per subquery so you can see how the LLM decomposed the question.

~/my-agent-app
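With the dev server running, the loop looks roughly like this. The port is wrangler's default (8787) and the instance paths (`/index/main`, `/qa/main`) come from ayjnt's file-tree routing plus the `"main"` instance the QA agent targets — treat the exact routes as a sketch:

```shell
# Index a document (the request body mirrors addDoc's input)
curl -X POST http://localhost:8787/index/main \
  -H 'content-type: application/json' \
  -d '{"text":"ayjnt is a Cloudflare-Workers-native agent framework."}'

# Ask a question; the response includes the plan, per-subquery hits, and the answer
curl -X POST http://localhost:8787/qa/main \
  -H 'content-type: application/json' \
  -d '{"question":"What is ayjnt and how does it use Durable Objects?"}'
```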
Step 07

What it looks like

The trace is the interesting output. You can see what the planner decomposed the question into, which docs each subquery surfaced, and how the composer grounded the final answer.

pipeline trace result
  question:  What is ayjnt and how does it use Durable Objects?
  ────────────────────────────────────────────────────────────
  plan:                                          │
    • what is ayjnt                              │ (llama-3.1)
    • how does ayjnt use durable objects         │
    • durable objects cloudflare workers         │
                                                 │
  retrieve (getAgent<IndexAgent>.search):        │
    for "what is ayjnt":                         │
      [0.87]  ayjnt is a Cloudflare-Workers-nat…│
      [0.67]  An ayjnt agent's URL is derived…  │
    for "how ayjnt uses durable objects":        │
      [0.80]  ayjnt is a Cloudflare-Workers-nat…│
      [0.63]  Durable Objects provide single-i…│
                                                 │
  compose:                                       │
    ayjnt is a framework for Cloudflare Workers  │ (llama-3.1)
    where each folder under agents/ becomes a    │
    Durable Object class. The framework auto-    │
    generates the worker entry point and         │
    wrangler config from the file tree. Each DO  │
    is a single-instance, strongly consistent    │
    stateful object running on the edge.
Step 08

Deploy

`ayjnt deploy` checks your git tree is clean + synced with origin, regenerates the wrangler config from scratch, then shells out to `wrangler deploy`. The committed migrations.json file is the source of truth for what's in production.

~/my-agent-app
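The deploy sequence, written out (the git checks mirror what the command enforces before it will run):

```shell
git status --porcelain   # must print nothing: working tree clean
git push                 # tree must be synced with origin
bunx ayjnt deploy        # regenerates wrangler config, then runs `wrangler deploy`
```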