
How to pick an agent framework: LangGraph vs CrewAI vs Mastra vs no framework

Most teams pick a framework first and discover requirements later. Pick one (or none) based on what your agent actually needs to do.

Every six months a new "definitive" agent framework launches. By 2026 there are dozens. The honest answer most senior builders have settled on: for simple tool-use loops, no framework. For graph-shaped multi-step workflows, LangGraph. For Python-only multi-agent demos, CrewAI. For TypeScript shops, Mastra. Anything beyond that is mostly noise.

The dirty secret: most agents don't need a framework

If your agent is "call LLM, parse tool calls, run tools, loop until done," you do not need a framework. That's about 30 lines of code in any language. The Anthropic SDK, OpenAI SDK, and Vercel AI SDK all support tool use natively. You write a while loop, you handle errors yourself, and you can debug what's happening because you wrote it.

The industry's most thoughtful builders (Anthropic's own team in their public agent posts, Cursor's team, etc.) repeatedly say: "start without a framework, add one when you have a specific reason." Frameworks add abstraction layers; when something breaks, you end up debugging both your own code and the framework's idea of what was supposed to happen.

The pattern of failure: a team picks LangChain on day 1, spends weeks fighting framework abstractions, writes custom code to bypass the framework, and ends up with the worst of both worlds.

LangGraph: the right choice when you actually have a graph

LangChain's LangGraph is the framework most senior builders begrudgingly respect. It treats agent flows as explicit state graphs — nodes are steps, edges are transitions, state is shared. When you have a real multi-step workflow with branches, retries, and human-in-the-loop checkpoints, this model maps cleanly to the problem.

Use LangGraph when: your agent has 5+ distinct steps, multiple decision branches, persistence between runs, or human-in-the-loop approval gates. The state-as-graph mental model becomes valuable when the workflow is genuinely complex.
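The state-graph mental model itself is small enough to sketch without the library. This is not LangGraph's actual API — node names, state fields, and the dispatch loop below are made up to illustrate the idea of nodes updating shared state and edges (including conditional ones) choosing the next step:

```python
# Framework-free sketch of the state-graph idea: nodes are functions that
# update shared state and return the name of the next node; END stops the run.
END = "END"

def draft(state):
    state["text"] = f"draft of {state['topic']}"
    return "review"                                 # unconditional edge

def review(state):
    state["approved"] = "draft" in state["text"]
    return END if state["approved"] else "draft"    # conditional edge

NODES = {"draft": draft, "review": review}

def run(state, entry="draft"):
    node = entry
    while node != END:
        node = NODES[node](state)   # each step mutates state, picks next node
    return state
```

What LangGraph layers on top of this core loop is typed state, checkpointing, streaming, and human-in-the-loop interrupts; the question is whether your workflow is branchy enough to pay for those.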

LangGraph is also where LangChain has fixed many of its old sins. The newer Python and JS SDKs are cleaner. Observability via LangSmith is built in. Persistence is straightforward.

Weakness: still complex. The conceptual overhead is real. Don't reach for it for simple ReAct loops.

CrewAI: the demo-first multi-agent framework

CrewAI's hook is multi-agent: define a few "agents" with roles, give them tasks, watch them collaborate. Demos look great. The simplicity of the API is genuinely appealing.

The reality at production: CrewAI's role-based mental model — "the marketer", "the researcher", "the editor" — is mostly theater. Real multi-agent systems work better as a single coherent agent with branching, not multiple personas talking to each other. The roleplay approach burns a lot of tokens for marginal quality lift over a well-prompted single agent.

Use CrewAI when: you need a quick demo of agentic behavior, you specifically benefit from role separation (rare), or you're prototyping a workflow you'll later rewrite. Don't ship CrewAI to production for anything where reliability matters.

Mastra: the TypeScript-first option that's actually good

Mastra is the agent framework most worth your attention if you're building in TypeScript / Next.js. It's deliberately scoped: workflows, agents, RAG, evals, memory, observability. The DX is excellent — autocomplete works, types help, integration with Vercel and Cloudflare deployment is native.

Use Mastra when: your stack is Next.js or Node.js, you want the structure of LangGraph without the Python lock-in, or you want first-class observability and eval support without bolting on three vendors.

Mastra is newer, the community is smaller, and the API is still moving. But the design choices are sound, the maintainers are responsive, and it doesn't have LangChain's accumulated baggage.

OpenAI Assistants API and Anthropic's Computer Use

  • OpenAI Assistants API — fine for simple chat assistants with file search. Limited for production agents because state management is opaque (you don't see your own thread state). Most teams who started here graduated to building tool-use loops directly.
  • Anthropic Computer Use — Claude controlling a browser/desktop. Cool, slow, expensive, brittle. Worth experimenting with for specific use cases (data entry automation, QA testing) but not a general-purpose framework.

Pydantic AI, LlamaIndex Agents, AutoGen, smolagents

  • Pydantic AI — clean Python, type-driven. Smaller scope than LangGraph but well-designed. Worth considering for Python-only single-agent workflows.
  • LlamaIndex Agents — the agent layer of LlamaIndex. Use if you're already deep in LlamaIndex for RAG. Standalone, less compelling.
  • Microsoft AutoGen — research-y. Multi-agent. Demos are interesting, production usage is rare. Keep watching.
  • HuggingFace smolagents — minimal Python framework, code-as-action paradigm (agent writes Python rather than tool calls). Interesting for technical use cases.
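The code-as-action idea is easy to see framework-free: instead of emitting structured tool calls, the model emits a Python snippet that the host executes against an allow-listed set of tools. The snippet and tool below are invented for illustration; smolagents adds sandboxing and iteration on top of this core trick:

```python
def run_action(snippet: str, tools: dict) -> dict:
    """Execute a model-written snippet with only allow-listed tools in scope."""
    scope = dict(tools)                         # tools exposed as plain callables
    exec(snippet, {"__builtins__": {}}, scope)  # no builtins: crude containment
    return scope                                # snippet's results land in scope

# The model might emit the snippet "total = add(2, 3)" given a tool named `add`.
```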

When NOT to use any framework

For 80% of agent use cases, write the loop yourself. Sketched here against Anthropic's Messages API (the OpenAI shape differs slightly, but the structure is identical):

messages = [{"role": "user", "content": query}]
while True:
    response = client.messages.create(model=MODEL, max_tokens=4096,
                                      messages=messages, tools=tools)
    if response.stop_reason != "tool_use":
        return response.content
    # Echo the assistant turn back, then answer each tool call with a result.
    messages.append({"role": "assistant", "content": response.content})
    results = [{"type": "tool_result", "tool_use_id": block.id,
                "content": run_tool(block.name, block.input)}
               for block in response.content if block.type == "tool_use"]
    messages.append({"role": "user", "content": results})

That's it. Add error handling, retries with exponential backoff, structured logging, and you have a production agent in a single file. You'll understand exactly what happens. You can debug it. You won't fight an abstraction.
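Exponential backoff, for instance, is a few lines. The retryable exception types and delay constants below are placeholders; substitute whatever your SDK actually raises:

```python
import random
import time

def with_retries(fn, max_attempts=5, base_delay=1.0,
                 retryable=(TimeoutError, ConnectionError)):
    """Call fn(), retrying transient failures with exponential backoff + jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            time.sleep(delay)
```

Wrap the `client.messages.create` call in this and you have covered the most common production failure mode (rate limits and timeouts) without any framework.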

When a framework actually helps

  • You have a graph-shaped workflow with branches, joins, retries, human approval. → LangGraph or Mastra Workflows.
  • You need persistent agent state across long-running runs. → LangGraph's checkpointing or Mastra's memory.
  • You need built-in observability and you don't want to integrate Langfuse/Helicone/Pydantic Logfire yourself. → Frameworks include this.
  • You have multiple devs and want a shared mental model. → A framework provides shared vocabulary.
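Before adopting a framework for persistence alone, it is worth seeing how little the naive version costs. The file path and JSON encoding here are arbitrary choices for illustration:

```python
import json
from pathlib import Path

def save_checkpoint(path, messages):
    """Persist the conversation so a long run can resume after a crash."""
    Path(path).write_text(json.dumps(messages))

def load_checkpoint(path):
    """Return saved messages, or a fresh history if no checkpoint exists."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else []
```

What LangGraph's checkpointing or Mastra's memory adds over this is per-step granularity, thread IDs, and pluggable storage backends; whether you need those is the real decision.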

Watch for framework drift

Frameworks in this space change fast. The 2024 LangChain you knew is not the 2026 LangChain. CrewAI changed its core API twice in 2025. Mastra reorganized its workflow API in late 2025. Pin your dependencies, read the changelog before upgrades, and accept that your framework choice today will look outdated in 18 months.
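Pinning means exact versions, not ranges. The package versions below are placeholders, not recommendations:

```
# requirements.txt — exact pins; upgrade deliberately, after reading the changelog
langgraph==0.2.60
langchain-core==0.3.29
```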

This is another reason "no framework" remains attractive: a hand-rolled agent loop in 30 lines doesn't have a changelog to follow.

Decision tree

  • Simple tool-use loop, single agent: no framework, just SDK
  • Multi-step workflow with branches, Python: LangGraph
  • Multi-step workflow, TypeScript: Mastra
  • Type-safe single Python agent: Pydantic AI
  • Quick multi-agent demo: CrewAI (don't ship it)
  • Browser/desktop automation: Anthropic Computer Use

Next steps

  • Read about agent loop patterns: ReAct, Plan-and-Execute, Reflexion
  • Look into agent observability tools: Langfuse, Helicone, LangSmith
  • Read about agent memory strategies (separate concept worth its own deep dive)
  • Build a no-framework agent first; only adopt one when you can name the specific limitation

Last updated: 2026-04-29
