

What is tool use (function calling)? How LLMs reach beyond text

Tool use lets an LLM call functions you provide — search the web, query a DB, send an email. It's the bridge from chat to action, and the foundation of every modern AI agent.

Tool use (also called function calling) is the mechanism that turns an LLM from a chatbot that talks into one that does. You describe a function (its name, parameters, what it returns) and the model decides when to call it as part of answering. The provider's API defines the handshake: the model picks a tool, your code executes it and returns the result, and the model continues. Almost every agent in 2026 is a tool-use loop with extras.

The basic mechanic

You define a tool with a JSON schema:

{
  "name": "get_weather",
  "description": "Get current weather for a city",
  "input_schema": {
    "type": "object",
    "properties": {
      "city": { "type": "string", "description": "e.g. Taipei" }
    },
    "required": ["city"]
  }
}

You pass a list of such tools alongside your prompt. The model sees them and may decide to call one. When it does, instead of plain text, it returns a structured tool-use block: call get_weather with {"city": "Taipei"}. Your code:

  1. Receives the tool-use request.
  2. Actually calls getWeather("Taipei") — hits a real weather API.
  3. Sends the result back to the model.
  4. The model continues the conversation, now armed with weather data.
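
In code, the whole loop is short. Here's a minimal sketch in TypeScript, where callModel is a hypothetical wrapper around your provider's chat API and the message shapes are deliberately simplified (no specific SDK implied):

type Message = { role: "user" | "assistant" | "tool"; content: string; name?: string };
type ToolCall = { name: string; input: Record<string, unknown> };
type ModelTurn = { text?: string; toolCall?: ToolCall };

declare function callModel(messages: Message[]): Promise<ModelTurn>; // hypothetical provider wrapper

// Map of tool name -> implementation. Looking names up here also
// rejects hallucinated tools (see failure modes below).
const toolRegistry: Record<string, (input: any) => Promise<unknown>> = {
  get_weather: async ({ city }) => ({ city, temp_c: 24, summary: "cloudy" }), // stub; swap in a real API
};

async function runLoop(messages: Message[]): Promise<string> {
  for (let step = 0; step < 10; step++) {          // step limit guards against infinite loops
    const turn = await callModel(messages);
    if (!turn.toolCall) return turn.text ?? "";    // plain text means we're done
    const impl = toolRegistry[turn.toolCall.name];
    const result = impl
      ? await impl(turn.toolCall.input)
      : { error: `Unknown tool: ${turn.toolCall.name}` };
    messages.push({ role: "tool", name: turn.toolCall.name, content: JSON.stringify(result) });
  }
  throw new Error("Tool loop exceeded step limit");
}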

This loop can go many rounds. Modern Claude, GPT-5, and Gemini all support parallel tool use (call multiple tools in one turn) and multi-turn tool use (call → respond → call again).
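
Parallel tool use, for instance, arrives as multiple tool-use blocks in a single assistant turn. The shape below roughly follows Anthropic's content-block format; field names and IDs vary by provider:

[
  { "type": "tool_use", "id": "call_1", "name": "get_weather", "input": { "city": "Taipei" } },
  { "type": "tool_use", "id": "call_2", "name": "get_weather", "input": { "city": "Osaka" } }
]

You execute both calls and return both results before the model continues.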

What makes good tool design

Tool quality is most of agent quality. Three principles:

Clear, narrow names. search_web beats do_research. The model picks the tool whose name and description match the situation; vague names cause wrong picks.

Detailed descriptions. The description is part of the prompt. Be specific: "Returns top 10 search results from Google. Use this when the user asks about current events or facts not likely in your training data. Don't use for code lookups — use search_docs instead."

Tight, validated input schemas. Specify types, enums where possible (status: 'open' | 'closed'), required vs optional. Validate on the way in — if the model passes garbage, return an error message that guides the next call.
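
For example, with a schema validator like zod (any validation library works), the rejection message can tell the model exactly what to fix. The update_ticket tool here is hypothetical:

import { z } from "zod";

const UpdateTicketInput = z.object({
  ticket_id: z.string(),
  status: z.enum(["open", "closed"]), // enum instead of a free-form string
});

declare function updateTicket(input: z.infer<typeof UpdateTicketInput>): unknown; // hypothetical business logic

function handleUpdateTicket(rawInput: unknown) {
  const parsed = UpdateTicketInput.safeParse(rawInput);
  if (!parsed.success) {
    // Return the error to the model; it usually self-corrects next turn.
    return { error: `Invalid input: ${parsed.error.message}` };
  }
  return updateTicket(parsed.data);
}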

Outputs the model can actually use. Return structured data (JSON) for things the model needs to reason over. Return short text snippets for things meant to be read out. Don't dump 50,000 tokens of HTML.

What LLMs actually call

The categories of tools that show up in real agents:

  • Information retrieval: search_web, fetch_url, query_database, vector_search.
  • Computation: run_python_code, calculate, parse_csv. Sandboxed code execution is huge for math/data tasks.
  • External actions: send_email, create_calendar_event, post_message. Writing actions need extra care.
  • System interaction: read_file, write_file, run_shell_command. The basis of coding agents.
  • Domain APIs: lookup_customer, create_invoice, update_ticket. Your business logic.

Anthropic's computer-use mode adds take_screenshot, move_mouse, click, type — turning the entire OS into tools.

How tool use changes prompting

With tools available, the prompt shifts from "answer this" to "figure out how to answer this with these tools." Models become noticeably more helpful when given the right tools because they stop hallucinating things they could have looked up.

A powerful pattern: always-on retrieval. Give the model a search_knowledge_base tool. Prompt it to use the tool whenever the user asks about your domain. The model decides when retrieval is needed — no need for a hard-coded pipeline.
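
A tool definition for that pattern might look like this, mirroring the schema format from earlier (the description text is illustrative):

{
  "name": "search_knowledge_base",
  "description": "Search our internal docs. Use this whenever the user asks about our product, pricing, or policies. Returns the top 5 matching passages.",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": { "type": "string", "description": "Natural-language search query" }
    },
    "required": ["query"]
  }
}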

Another pattern: specialized sub-agents as tools. Wrap a different LLM call (with different model, prompt, tools) as a single "tool" the parent agent can call. Cheaper models for narrow tasks, frontier model for orchestration.
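
A sketch of that wrapping, reusing the hypothetical callModel and Message shapes from the loop above; where you route the nested call to a cheaper model is a configuration detail left out here:

// A whole nested LLM call, exposed to the parent agent as one tool.
async function summarizeDocument(input: { text: string }): Promise<string> {
  const turn = await callModel([
    { role: "user", content: `Summarize in three bullets:\n${input.text}` },
  ]); // point this wrapper at a cheaper model
  return turn.text ?? "";
}

// Registered like any other tool:
// toolRegistry.summarize_document = async (input) => summarizeDocument(input);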

When NOT to add tools

  • One-shot tasks. If a single LLM call solves the user's question ("summarize this paragraph"), tools just add latency.
  • Tools whose execution is unreliable. Models will call broken tools, get errors, retry, fail. Bad for UX. Fix the tool first.
  • Tools the model can't choose between. Three near-duplicate tools with vague descriptions make the model pick wrong. Consolidate.
  • Hidden side effects. Don't expose tools that delete things, send messages, or move money without explicit confirmation in the loop. Wrap with a permission step.
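
A minimal confirmation wrapper for that last case, assuming a hypothetical askUser function that surfaces a yes/no prompt in your UI:

declare function askUser(question: string): Promise<boolean>; // hypothetical UI hook

function withConfirmation<T>(name: string, run: (input: T) => Promise<unknown>) {
  return async (input: T) => {
    const ok = await askUser(`Allow ${name} with input ${JSON.stringify(input)}?`);
    if (!ok) return { error: "User declined. Do not retry without new instructions." };
    return run(input);
  };
}

// Usage: toolRegistry.send_email = withConfirmation("send_email", sendEmailImpl);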

Failure modes to watch

Three common ones in production:

Tool-call infinite loops. The model calls the same tool repeatedly because results aren't satisfying or it doesn't know what else to try. Solutions: cap the step count, deduplicate recent calls, and give the model an explicit "give up" path.
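
A deduplication guard can ride along with the step limit from the loop sketch above; when the model repeats an identical call, return a nudge instead of re-executing:

const recentCalls = new Set<string>();

function guardDuplicate(name: string, input: unknown) {
  const key = `${name}:${JSON.stringify(input)}`;
  if (recentCalls.has(key)) {
    // Surface a nudge instead of re-running the identical call.
    return { error: "You already made this exact call. Try different arguments or answer with what you have." };
  }
  recentCalls.add(key);
  return null; // not a duplicate; execute the tool normally
}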

Hallucinated tool calls. The model invents a tool that doesn't exist. Modern APIs validate this and reject; older code might silently run garbage. Validate tool names server-side.

Wrong arguments. The model passes correctly typed but wrong-content data: city: "the city of the user's choice". Strict descriptions and validation catch this; an explicit error message back to the model usually fixes it next turn.

The relationship to MCP

MCP (Model Context Protocol) is a standardized way of exposing tools across AI clients. An MCP server defines tools the same way you'd define them in your own app, but in a standardized protocol any MCP-aware client can use. Tool use is the underlying capability; MCP is the interoperability layer on top.
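
Concretely, an MCP server advertises its tools in a tools/list response. The shape below follows the MCP spec as I understand it; note inputSchema is camelCase in MCP, versus input_schema in the Anthropic-style example earlier:

{
  "tools": [
    {
      "name": "get_weather",
      "description": "Get current weather for a city",
      "inputSchema": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  ]
}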

Further reading

  • What is an AI agent
  • What is MCP (Model Context Protocol)
  • Structured outputs from LLMs: tool use, JSON mode, schemas
  • Build an agent loop from scratch (no framework)
  • Defending against prompt injection: realistic guardrails for 2026

Last updated: 2026-04-29
