When people first build with LLMs they reach for an agent framework — LangGraph, CrewAI, Mastra, AutoGen — and immediately get lost in nodes, edges, state schemas, and abstractions. Six months later they realize the framework was solving problems they didn't have, while papering over the one they did.
Writing an agent loop from scratch teaches you what an agent actually is. Do it once and you'll know whether you need a framework or not. Spoiler: for 80% of products, you don't.
What an agent really is
A single LLM call answers one question and stops. An agent is an LLM in a loop with tools. That's it. The pseudocode:
while not done:
    response = llm.generate(messages, tools)
    if response.is_final:
        return response.text
    for tool_call in response.tool_calls:
        result = execute(tool_call)
        messages.append(tool_result(result))
Everything else — multi-agent orchestration, routing, supervisors — is a wrapper on this. Get this loop right and you can fake the rest.
A working example: a research agent
Let's build a tiny agent that can answer factual questions by searching the web. Two tools: search(query) and fetch(url). We'll use the Anthropic SDK in TypeScript, but the shape is nearly identical for OpenAI and Gemini.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const tools: Anthropic.Tool[] = [
  {
    name: "search",
    description:
      "Search the web. Returns top 5 results with title + URL + snippet.",
    input_schema: {
      type: "object",
      properties: { query: { type: "string" } },
      required: ["query"],
    },
  },
  {
    name: "fetch",
    description: "Fetch the full text of a web page. Returns up to 5000 chars.",
    input_schema: {
      type: "object",
      properties: { url: { type: "string" } },
      required: ["url"],
    },
  },
];
// Dispatch the model's tool request to a real implementation. Errors come
// back as strings rather than throwing, so the model can see and recover.
async function executeTool(name: string, input: any): Promise<string> {
  try {
    if (name === "search") return await webSearch(input.query);
    if (name === "fetch") return await fetchPage(input.url);
    return `Unknown tool: ${name}`;
  } catch (err) {
    return `Tool ${name} failed: ${err instanceof Error ? err.message : String(err)}`;
  }
}
async function runAgent(question: string, maxSteps = 10): Promise<string> {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: question },
  ];

  for (let step = 0; step < maxSteps; step++) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-5",
      max_tokens: 4096,
      tools,
      messages,
    });

    // Append the assistant turn so the model sees its own output next loop.
    messages.push({ role: "assistant", content: response.content });

    if (response.stop_reason === "end_turn") {
      const textBlock = response.content.find(
        (b): b is Anthropic.TextBlock => b.type === "text",
      );
      return textBlock?.text ?? "";
    }

    if (response.stop_reason === "tool_use") {
      const toolResults: Anthropic.ToolResultBlockParam[] = [];
      for (const block of response.content) {
        if (block.type !== "tool_use") continue;
        const result = await executeTool(block.name, block.input);
        toolResults.push({
          type: "tool_result",
          tool_use_id: block.id,
          content: result,
        });
      }
      // Tool results go back as a single user message, one per tool_use id.
      messages.push({ role: "user", content: toolResults });
      continue;
    }

    throw new Error(`Unexpected stop_reason: ${response.stop_reason}`);
  }

  throw new Error(`Agent exceeded ${maxSteps} steps`);
}
That's the entire agent, in under a hundred lines of TypeScript. Run it with runAgent("What did Anthropic announce in March 2026?") and watch the model search, read pages, and synthesize an answer.
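The loop assumes two helpers, webSearch and fetchPage, that you supply yourself. Here's a minimal sketch: fetchPage uses the global fetch available in Node 18+, and webSearch is a stub you wire to whichever search API you use. The regex-based HTML stripping is deliberately crude.

// Minimal stand-ins for the two tool implementations. Swap the regex
// stripping for a real HTML-to-text library in production.
async function fetchPage(url: string): Promise<string> {
  const res = await fetch(url);
  if (!res.ok) return `HTTP ${res.status} fetching ${url}`;
  const html = await res.text();
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/<style[\s\S]*?<\/style>/gi, "")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  return text.slice(0, 5000);
}

async function webSearch(query: string): Promise<string> {
  // Wire this to your search provider (Brave, Serper, SearXNG, ...) and
  // return the top 5 results as "title - URL - snippet", one per line.
  return `No search provider configured (query was: ${query})`;
}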
What the loop does, step by step
- Send the user's question + tool schemas to the model.
- Model decides: do I have enough info to answer, or do I need a tool?
- If stop_reason === "end_turn", it answered. Return the text.
- If stop_reason === "tool_use", it requested one or more tools. Execute each one and append the results to the message history.
- Loop. The model now sees its previous tool calls and their results.
The model is not "calling" the tool — your code is. The model just emits a structured request, you run the actual function, and you append the result so the model sees it on the next turn.
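Concretely, here's what that exchange looks like on the wire, per Anthropic's Messages API: the model emits a tool_use content block, and you answer with a tool_result block whose tool_use_id matches. The id and the result text below are illustrative.

// An assistant content block the model might emit (id is illustrative):
const toolUseBlock = {
  type: "tool_use",
  id: "toolu_abc123",
  name: "search",
  input: { query: "Anthropic March 2026 announcements" },
};

// The user message you append in response. tool_use_id must match the id
// above, or the API rejects the request.
const toolResultMessage = {
  role: "user",
  content: [
    {
      type: "tool_result",
      tool_use_id: "toolu_abc123",
      content: "1. Example Result Title - https://example.com - snippet...",
    },
  ],
};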
The five things that go wrong
Building this once teaches you the failure modes that all frameworks try to abstract away:
- Infinite loops. Without maxSteps, an agent that keeps requesting tools (or keeps retrying a tool that always fails) burns through your budget. Always cap.
- Token bloat. The message history grows every turn, and long-running agents eventually hit the context window. Solutions: summarize old turns, drop tool results once they're no longer needed, or use prompt caching. (A trimming sketch follows this list.)
- Tool errors aren't fatal. If fetch(url) 404s, return that as a tool result, not as an exception. The model can recover; throwing kills the loop.
- JSON schema drift. If your tool's input schema is sloppy, the model will pass weird inputs. Make schemas strict, use enums, and validate on the server.
- Stop conditions. "Done" is harder than it sounds. Sometimes the model wants to ask a clarifying question, which means it's not done but also has no tool calls. Read stop_reason carefully.
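As one concrete mitigation for token bloat, here's a sketch that truncates tool results from older turns before each model call. The helper name and the thresholds are made up for illustration; tune them for your workload.

// Shrink tool results in all but the most recent user turns. Old results
// have usually served their purpose once the model has acted on them.
// Mutates the history in place.
function trimOldToolResults(
  messages: Anthropic.MessageParam[],
  keepLast = 2,
  maxChars = 200,
) {
  const userTurnCount = messages.filter((m) => m.role === "user").length;
  let seen = 0;
  for (const m of messages) {
    if (m.role !== "user") continue;
    seen++;
    if (seen > userTurnCount - keepLast || !Array.isArray(m.content)) continue;
    for (const block of m.content) {
      if (block.type === "tool_result" && typeof block.content === "string") {
        block.content = block.content.slice(0, maxChars) + " [truncated]";
      }
    }
  }
}

Call it on messages right before each client.messages.create in the loop.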
Adding the things you actually need
Once the basic loop works, you'll incrementally want:
- Streaming. Show the model's reasoning as it happens. Anthropic's streaming API gives you this; just buffer the text deltas.
- System prompt. A persona, output format constraints, refusal rules. Goes in the system parameter, not in messages.
- Memory. Long-running conversations need summarization or vector recall. Don't reach for a memory library on day one; just save and reload messages from disk.
- Tracing / logging. Print every model output, every tool call, every result. You will read these logs hundreds of times.
- Cost tracking. Each response.usage has token counts. Sum them up. (A sketch follows this list.)
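For cost tracking, a minimal sketch: the input_tokens and output_tokens fields come from the SDK's usage object, but the per-token rates below are placeholders, not real pricing.

// Accumulate token usage across the loop; call add(response.usage) after
// every model call.
class CostTracker {
  inputTokens = 0;
  outputTokens = 0;

  add(usage: { input_tokens: number; output_tokens: number }) {
    this.inputTokens += usage.input_tokens;
    this.outputTokens += usage.output_tokens;
  }

  // Rates in dollars per million tokens -- placeholder numbers, check the
  // current price sheet before trusting the result.
  estimatedCost(inputPerM = 3, outputPerM = 15): number {
    return (this.inputTokens * inputPerM + this.outputTokens * outputPerM) / 1e6;
  }
}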
When NOT to roll your own
- Multi-agent orchestration with shared state. If you have five agents that need to coordinate via a graph with retries, parallel branches, and human-in-the-loop, LangGraph genuinely solves real problems. Roll your own first, then graduate when you feel the pain.
- You need observability out of the box. LangSmith / Langfuse give you a tracing UI for free. Worth it if you're not going to build your own dashboard.
- You're a team of >5 working on agent code. A shared framework gives you onboarding docs, type hints, and a common vocabulary.
For a single product with one or two specialized agents, the from-scratch loop is faster to ship, easier to debug, and lets you swap models without rewriting your stack.
What you've actually learned
After writing this loop, the whole "agent" mystique dissolves. Multi-agent systems? Just one loop calling another loop as a tool (sketched below). Planning agents? The same loop with planner_step and executor_step tools. Self-correction? The same loop with a critic tool.
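To make the multi-agent point concrete, here's the shape of one loop exposed as a tool to another. The research_subagent name is hypothetical; runAgent is the function from above.

// A whole agent, wrapped as just another tool. The outer loop neither knows
// nor cares that executing this tool runs an inner LLM loop.
const subAgentTool: Anthropic.Tool = {
  name: "research_subagent",
  description: "Delegate a self-contained research question to a sub-agent.",
  input_schema: {
    type: "object",
    properties: { question: { type: "string" } },
    required: ["question"],
  },
};

async function executeSubAgent(input: { question: string }): Promise<string> {
  return await runAgent(input.question);
}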
Frameworks aren't wrong; people just reach for them earlier than they need to. Build the 50-line version first.
Further reading
- Anthropic's official tool use docs.
- Building effective agents — Anthropic blog, December 2024. Read this twice.
- Look up: ReAct prompting, function calling, MCP for tool sharing, agent observability.