An LLM that can only generate text is a fancy autocomplete engine. The moment you give it tools — the ability to call functions, query APIs, execute code, read databases — you get something categorically different. You get an agent.
Tool use is the architectural boundary between "chatbot" and "agent." It's also where most production systems break down. Not because tool calling is hard to implement. Because the ecosystem has exploded into four or five distinct patterns, each with different trade-offs, and most teams pick one without understanding why.
This article is a map. Local tools, API-based tools, plugin tools, MCP, stateful tools — what they are, when they make sense, and where they'll burn you. We'll use LangChain throughout, since that's where most of this runs in practice. And we'll cover tool configuration and error handling, because that's where the bugs live.
```mermaid
graph LR
A([Agent]) --> B[Local Tools\nfast · in-process · no isolation]
A --> C[API-Based Tools\nnetwork · shared · fault-isolated]
A --> D[Plugin Tools\nthird-party · provider-managed]
A --> E[MCP Servers\nprotocol-standard · cross-framework]
A --> F[Stateful Tools\npersistent session · high risk]
style A fill:#4A90E2,color:#fff,stroke:#2c6fad
style B fill:#6BCF7F,color:#fff,stroke:#4aad61
style C fill:#6BCF7F,color:#fff,stroke:#4aad61
style D fill:#6BCF7F,color:#fff,stroke:#4aad61
style E fill:#6BCF7F,color:#fff,stroke:#4aad61
style F fill:#FFA07A,color:#fff,stroke:#cc6040
```
Each of these is a different answer to the same question: where does tool logic live, and who owns it? The right answer depends on your scale, your team structure, and your tolerance for operational complexity.
How LangChain Structures LLM Interactions
Before tools make sense, you need to understand how LangChain models the conversation itself.
Every interaction with an LLM is a sequence of messages. LangChain formalizes this as a list passed to the model, and there are two main types you'll work with constantly:
- HumanMessage — input from the user or orchestration layer
- AIMessage — the model's response, which may include text, tool call requests, or both
There are others — most importantly ToolMessage, which carries the result of a tool execution back into the context. (FunctionMessage served a similar role in older LangChain versions but is deprecated; use ToolMessage in current code.) These two message types are your core loop.
```python
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="What's the weather in Mumbai?"),
]
```
When a model decides to call a tool, the AIMessage it returns doesn't contain the final answer — it contains a tool_calls field describing which tool to invoke and with what arguments. Your application is responsible for executing that tool and appending a ToolMessage with the result. Then the model gets the full context and generates its final response.
This message-passing loop is how LangGraph manages agent state. If you want the full picture of how this fits into a stateful graph, From LLMs to Agents: The Mindset Shift Nobody Talks About covers that transition in depth.
Binding and Invoking Tools
Tools get attached to a model using .bind_tools(). This tells the model what functions are available, what their inputs look like, and how to call them. The model doesn't execute tools — it requests them. Your code executes them.
```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a given city."""
    return f"The weather in {city} is 32°C and humid."

model = ChatOpenAI(model="gpt-4o")
model_with_tools = model.bind_tools([get_weather])

response = model_with_tools.invoke([HumanMessage(content="Weather in Mumbai?")])
print(response.tool_calls)
# [{'name': 'get_weather', 'args': {'city': 'Mumbai'}, 'id': 'call_abc123'}]
```
The @tool decorator does three things: it registers the function as a tool, extracts the schema from the type annotations, and uses the docstring as the tool's description. That description is what the model reads to decide whether and when to call the tool. Write it carefully.
Local Tools
Local tools are functions that run in the same process as your agent. No network call, no API key, no external dependency — just Python.
They're the default starting point for most agentic systems, and they cover a surprisingly wide range of use cases: data transformation, calculations, string processing, in-memory state manipulation, calling internal modules, wrapping SDK clients that are already initialized.
What a Local Tool Looks Like
```python
from langchain_core.tools import tool
from pydantic import BaseModel, Field
import json

class SearchInput(BaseModel):
    query: str = Field(description="The search query to look up")
    max_results: int = Field(default=5, description="Maximum number of results to return")

@tool(args_schema=SearchInput)
def search_knowledge_base(query: str, max_results: int = 5) -> str:
    """
    Search the internal knowledge base for documents matching the query.
    Returns a JSON list of matching document titles and snippets.
    """
    # Actual implementation would hit a vector store or search index
    results = [{"title": f"Doc about {query}", "snippet": "..."}]
    return json.dumps(results[:max_results])
```
A few things worth noting about this pattern:
The args_schema gives you explicit Pydantic validation on inputs before anything executes. If the model hallucinates a wrong type, Pydantic catches it before your function does.
The docstring is part of the tool's metadata sent to the model. It's not documentation for you — it's an instruction to the LLM. Be specific about what the tool returns, not just what it does.
The return type should be a string. Models read tool results as text. If you return a dict or list, LangChain will coerce it, but you're better off controlling the serialization explicitly.
Schema Is the Interface
A local tool's schema is its primary interface with the model. That schema includes:
- Name
- Description (from docstring)
- Input parameter names, types, and field descriptions
- Which parameters are required vs. optional
```python
print(search_knowledge_base.name)                  # "search_knowledge_base"
print(search_knowledge_base.description)           # The docstring
print(search_knowledge_base.args_schema.schema())  # Full JSON schema
```
This metadata is what the model uses to decide: should I call this tool, and if so, with what arguments? Poor descriptions lead to wrong calls. Vague parameter names lead to hallucinated inputs. This is one of the most common failure modes in agentic systems — not the model being dumb, but the tool schema being unclear.
Where Local Tools Break Down
Local tools are fine for a single agent on one machine. In production, you'll hit these walls fast:
Scalability. Every agent instance loads every tool into memory. In a multi-agent setup with dozens of specialized tools, you can't selectively deploy tools to specific instances without restructuring your codebase.
Shared logic becomes a liability. Two agents needing the same tool means either a shared module (tight coupling) or copied implementation (divergence). When business logic changes, there's no central versioning, no staged rollout, no rollback — just redeploys.
No fault boundary. A bug in a local tool can take down the entire agent process. There's nothing separating tool execution from the orchestration layer.
Local tools are the right starting point. They're not the right ending point for anything that needs to run reliably at scale.
API-Based Tools
The straightforward solution to local tool limitations: move tool execution behind an HTTP boundary. Your agent calls a service; the service runs the tool; the service returns a result.
This is the same decomposition you'd apply to any distributed system. Tools become microservices. Or more commonly, they become thin wrappers around existing internal APIs your organization already maintains.
What an API Tool Looks Like
```python
from langchain_core.tools import tool
import httpx

@tool
def get_stock_price(ticker: str) -> str:
    """
    Get the current stock price for a given ticker symbol.
    Returns price in USD as a string.
    """
    response = httpx.get(
        f"https://api.internal.company.com/stocks/{ticker}",
        headers={"Authorization": "Bearer ..."},
        timeout=5.0,
    )
    response.raise_for_status()
    data = response.json()
    return f"{ticker}: ${data['price']:.2f}"
```
From LangChain's perspective, this is still just a tool — same @tool decorator, same schema, same invocation pattern. The implementation difference is the entire point: you get independent deployability, horizontal scaling, fault isolation (a failing service returns an error rather than crashing the agent), and a single shared implementation that every agent calls.
The trade-off is latency and operational complexity. A local tool executes in under 1ms for pure computation, or 5–50ms if it's hitting an already-initialized SDK client. Move it behind HTTP and you're looking at 50–200ms for a co-located internal service, 200–500ms for a third-party API, and 500ms+ once you add auth token refresh, retries, and cold starts. In a 10-step ReAct loop, that delta between local and API is 5 seconds of wall-clock time the user is waiting. You need auth, observability, timeouts, and circuit breakers on both ends.
For most production systems, the right model is a mix: fast utility functions as local tools, shared business logic behind APIs. The 7 GenAI Architectures article covers how this plays out in practice.
Plugin Tools
Plugin tools take the API model further: instead of building your own tool APIs, you consume third-party capabilities that model providers have pre-integrated.
The idea originated with OpenAI's ChatGPT plugin system. The model knows about a catalog of available plugins, can reason about which ones to call, and the provider handles the protocol details. It's tool use as a marketplace.
How Plugin Tools Are Built
Most plugin tools are generated from OpenAPI specs. If a service has an OpenAPI/Swagger schema, you can auto-generate LangChain tools from it without writing a single tool wrapper by hand:
```python
from langchain_community.agent_toolkits.openapi import planner
from langchain_community.agent_toolkits.openapi.spec import reduce_openapi_spec
from langchain_community.utilities.requests import RequestsWrapper
import yaml, httpx

# Fetch the OpenAPI spec
raw_spec = yaml.safe_load(httpx.get("https://api.example.com/openapi.yaml").text)
spec = reduce_openapi_spec(raw_spec)

# Build tools from every operation in the spec
requests_wrapper = RequestsWrapper(headers={"Authorization": "Bearer ..."})
agent = planner.create_openapi_agent(
    spec,
    requests_wrapper,
    llm=model,
    verbose=True,
)
```
reduce_openapi_spec trims the spec down to what the LLM can fit in context. Each API endpoint becomes a callable tool. The model reads the spec's description fields to decide which endpoints to call and how to construct parameters.
This is the right pattern when you're integrating a third-party API that already has a well-documented OpenAPI spec: Stripe, Twilio, Notion, HubSpot. You get a full set of tools in minutes instead of writing wrappers for every endpoint.
The schema normalization problem. Different providers describe tool schemas differently. OpenAI uses one JSON schema format. Gemini uses another. Anthropic's tool input schema has its own structure. LangChain normalizes across these at the framework level — bind_tools() handles the provider-specific serialization — but if you're calling provider APIs directly, you're responsible for the translation:
```python
# Anthropic tool schema
anthropic_tool = {
    "name": "get_weather",
    "description": "Get current weather for a city",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"}
        },
        "required": ["city"],
    },
}

# OpenAI tool schema — same tool, different envelope
openai_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"],
        },
    },
}
```
Same tool, two different schemas. LangChain's @tool decorator generates one canonical definition and handles the conversion. This is one of the concrete reasons to use a framework rather than raw API calls — you write the tool once, deploy it against any provider.
The Ecosystem Today
Gemini has the most mature built-in plugins: Google Workspace, Search grounding, and code execution as first-class capabilities, not third-party wrappers. Anthropic skipped the plugin marketplace entirely — their ecosystem bet is MCP (next section). Microsoft's Copilot stack leans into the Graph API (Office 365, Teams, Azure); strong inside that ecosystem, irrelevant outside it.
On the open-source side: LlamaIndex tool specs, Haystack components, and LangChain Hub cover common integrations (Slack, Notion, GitHub, Jira) well enough that you rarely need to write those wrappers yourself.
The hard limit across all of them: vendor lock-in. Tools built for one provider's format don't port. That's the problem MCP was built to solve.
Model Context Protocol (MCP)
MCP is a protocol specification, not a framework. It defines a standard way for LLM applications to communicate with tool servers, so that any MCP-compatible client can use any MCP-compatible server — regardless of which model or framework you're using.
The analogy that holds: MCP is to agent tools what USB-C is to device charging. Before USB-C, every device had its own cable. After USB-C, the interface is standardized and the hardware is interchangeable.
Here's the architecture at a glance before we get into the protocol details:
```mermaid
graph TD
    LLM[LLM]
    Agent[LangGraph Agent]
    Client[MCP Client<br/>langchain-mcp-adapters]
    S1[MCP Server<br/>GitHub]
    S2[MCP Server<br/>Filesystem]
    S3[MCP Server<br/>PostgreSQL]
    S4[MCP Server<br/>Custom API]
    LLM <-->|tool_calls / ToolMessage| Agent
    Agent <-->|session management| Client
    Client <-->|JSON-RPC over stdio| S1
    Client <-->|JSON-RPC over stdio| S2
    Client <-->|JSON-RPC over HTTP/SSE| S3
    Client <-->|JSON-RPC over HTTP/SSE| S4
    style LLM fill:#4A90E2,color:#fff,stroke:#2c6fad
    style Agent fill:#7B68EE,color:#fff,stroke:#5a4ecc
    style Client fill:#7B68EE,color:#fff,stroke:#5a4ecc
    style S1 fill:#6BCF7F,color:#fff,stroke:#4aad61
    style S2 fill:#6BCF7F,color:#fff,stroke:#4aad61
    style S3 fill:#6BCF7F,color:#fff,stroke:#4aad61
    style S4 fill:#6BCF7F,color:#fff,stroke:#4aad61
```
The key insight: the agent doesn't know or care what's behind each MCP server. The client discovers capabilities at runtime via tools/list. Add a new server, remove an old one — the agent adapts without a redeploy.
The Protocol
MCP defines two sides:
MCP Server — exposes capabilities: tools, resources (files, database records, API responses), and prompts. A server might expose a search_web tool, a read_file resource, and a summarize_document prompt. Servers can be local processes (stdio transport) or remote services (HTTP/SSE transport). They're language-agnostic — the spec is JSON-RPC over stdio or HTTP, so you can implement a server in Python, Go, TypeScript, or anything else.
MCP Client — the agent or orchestration layer that connects to servers, discovers their capabilities, and routes tool calls to the appropriate server. A client maintains a session with one or more servers and handles the protocol handshake, capability negotiation, and message routing.
The core message types:
- tools/list — client asks server what tools it exposes
- tools/call — client invokes a specific tool with arguments
- resources/list, resources/read — client reads resources
- sampling/createMessage — server can request LLM completions from the client (bidirectional)
That last one is interesting — servers can call back into the LLM. This enables genuinely recursive, multi-model workflows.
```mermaid
sequenceDiagram
    participant User
    participant Agent as LangGraph Agent<br/>(MCP Client)
    participant LLM as LLM
    participant Adapter as LangChain<br/>MCP Adapter
    participant S1 as MCP Server<br/>(Filesystem)
    participant S2 as MCP Server<br/>(GitHub)
    User->>Agent: User message
    Agent->>LLM: messages + bound tools
    LLM-->>Agent: AIMessage (tool_calls: [read_file])
    Agent->>Adapter: route tool call → read_file
    Adapter->>S1: tools/call { name: "read_file", args: {...} }
    S1-->>Adapter: ToolResult { content: "..." }
    Adapter-->>Agent: ToolMessage
    Agent->>LLM: messages + ToolMessage
    LLM-->>Agent: AIMessage (tool_calls: [search_repo])
    Agent->>Adapter: route tool call → search_repo
    Adapter->>S2: tools/call { name: "search_repo", args: {...} }
    S2-->>Adapter: ToolResult { content: "..." }
    Adapter-->>Agent: ToolMessage
    Note over S2,Agent: sampling/createMessage<br/>(server calls back into LLM)
    S2->>Agent: sampling/createMessage { prompt: "..." }
    Agent->>LLM: createMessage request
    LLM-->>Agent: completion
    Agent-->>S2: sampling result
    Agent->>LLM: full message history
    LLM-->>Agent: AIMessage (final answer)
    Agent-->>User: Response
```
LangChain + MCP
LangChain has first-class MCP support through langchain-mcp-adapters:
```python
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from langchain_mcp_adapters.tools import load_mcp_tools
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

# Connect to a local MCP server
server_params = StdioServerParameters(
    command="uvx",
    args=["mcp-server-filesystem", "/tmp/workspace"],
)

async with stdio_client(server_params) as (read, write):
    async with ClientSession(read, write) as session:
        await session.initialize()

        # Load all tools exposed by this server
        tools = await load_mcp_tools(session)

        model = ChatOpenAI(model="gpt-4o")
        agent = create_react_agent(model, tools)

        result = await agent.ainvoke({
            "messages": [{"role": "user", "content": "List all Python files in the workspace"}]
        })
```
load_mcp_tools discovers the server's tool catalog and wraps each tool in a LangChain-compatible BaseTool. From that point, LangGraph doesn't care that these tools come from MCP — they look identical to local tools from the graph's perspective.
You can connect to multiple MCP servers simultaneously:
```python
from langchain_mcp_adapters.client import MultiServerMCPClient

async with MultiServerMCPClient(
    {
        "filesystem": {"command": "uvx", "args": ["mcp-server-filesystem", "/workspace"]},
        "github": {"url": "https://mcp.github.com", "transport": "streamable_http"},
        "postgres": {"command": "uvx", "args": ["mcp-server-postgres", DATABASE_URL]},
    }
) as client:
    tools = client.get_tools()
    agent = create_react_agent(model, tools)
```
Each server is isolated. Tools from the filesystem server and tools from the GitHub server are all available to the agent with no additional integration work.
What MCP Gets Right
Interoperability. A tool server written once works with any MCP client — Claude, GPT-4, Llama, your custom orchestration layer. The ecosystem compounds.
Separation of concerns. Tool implementation is completely decoupled from agent implementation. Teams can own their MCP servers independently. Tool servers can be updated, versioned, and deployed without touching agent code.
Discovery. The client doesn't need to know tool schemas in advance. It connects to a server, calls tools/list, and gets the full catalog dynamically. This enables genuinely dynamic tool loading based on context.
Bidirectionality. Servers can request LLM completions from the client. This supports patterns like tool-level reasoning, embedded summarization, and multi-model chains that are awkward to implement with static tool definitions.
What MCP Doesn't Solve Yet
Be honest about the gaps:
Authentication and authorization are not standardized. The MCP spec doesn't define how servers authenticate clients or how fine-grained permissions work. Every server implements its own auth. There's no standard for capability tokens, scoped credentials, or revocation. If you're building security-critical systems, you need to layer auth on top — the protocol won't do it for you. This connects directly to the patterns in Credential Scoping for Agents.
Tool discovery at scale is unsolved. If your organization has 200 MCP servers, how does an agent know which ones to connect to? Static configuration doesn't scale. Dynamic discovery registries don't exist in a standardized form yet.
Error semantics are underspecified. Tool failures return JSON-RPC errors, but the protocol doesn't define retry semantics, partial failure handling, or how clients should reason about transient vs. permanent failures.
Server health and observability. There's no standard for health checks, metrics, or distributed tracing across MCP server boundaries. You're stitching together your own observability story. The patterns from Agentic AI Observability apply here, but you'll need to instrument MCP calls explicitly.
MCP is early but directionally correct, and it's well positioned to become the standard. Building MCP-compatible tool servers today means you're not locked into any one framework tomorrow.
Stateful Tools
Most tools are stateless: input in, output out, nothing persists. Stateful tools break that model. They maintain state across calls — session objects, database connections, in-memory caches, workflow state machines.
Why You'd Want Them
Some tool interactions are inherently multi-step. A browser automation tool needs to maintain a session across page loads. A database tool might manage a transaction across multiple queries. A file editing tool might stage changes before committing. Implementing these as multiple stateless calls creates coordination overhead and race conditions. A stateful tool wraps the complexity.
```python
from langchain_core.tools import BaseTool
from typing import Optional
import psycopg2

class PostgresSessionTool(BaseTool):
    name: str = "postgres_query"
    description: str = """
    Execute SQL queries within a persistent database session.
    Maintains transaction state across multiple calls.
    Always call commit_transaction or rollback_transaction when done.
    """
    _connection: Optional[psycopg2.extensions.connection] = None
    _cursor: Optional[psycopg2.extensions.cursor] = None

    def _run(self, query: str, operation: str = "query") -> str:
        if operation == "connect":
            self._connection = psycopg2.connect(DATABASE_URL)
            self._cursor = self._connection.cursor()
            return "Connected to database"
        if operation == "query":
            self._cursor.execute(query)
            return str(self._cursor.fetchall())
        if operation == "commit":
            self._connection.commit()
            return "Transaction committed"
        if operation == "rollback":
            self._connection.rollback()
            return "Transaction rolled back"
        raise ValueError(f"Unknown operation: {operation}")
```
The Security Problems
Stateful tools are where the most dangerous security bugs live.
State leakage across sessions. If a stateful tool is shared across multiple concurrent agent runs (which is the natural implementation in a server), one agent's state can bleed into another. A browser session, an open file handle, a pending transaction — all can be read or modified by the wrong agent if you're not careful about isolation.
Persistent side effects from partial failures. A stateless tool that fails leaves nothing behind. A stateful tool that fails mid-workflow can leave state partially applied: a transaction open, a file locked, a session authenticated. The agent might retry from the beginning while the previous call's state is still live.
Expanded blast radius. A compromised stateful tool has access to everything in its session. A database connection tool that's been manipulated via prompt injection can execute arbitrary queries against an open connection that already has elevated permissions.
How to mitigate:
First, never share stateful tool instances across agent invocations. Create a fresh instance per agent run. This is the same principle as not sharing database connection pools across untrusted tenants.
Second, implement explicit lifecycle management and enforce it. Tools should have initialize() and cleanup() methods, and your orchestration layer should call cleanup() regardless of whether the agent succeeded or failed. LangGraph's node lifecycle hooks are useful here.
Third, scope credentials to the minimum required for each operation. An agent that needs to read from a database shouldn't be holding a connection with write permissions. This applies at every layer — the Tool Execution Firewall pattern is directly relevant when tools have persistent capabilities.
Fourth, log every state transition. Stateful tools are where audit trails become critical. You need to reconstruct exactly what state the tool was in at every point in the agent's execution. This isn't optional for anything running in production. See Agent Audit Trails: Logging Context, Not Just Actions.
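The first, second, and fourth mitigations compose naturally. A minimal sketch of per-run lifecycle enforcement; `StatefulTool` and `tool_session` are illustrative names, not framework APIs:

```python
from contextlib import contextmanager

class StatefulTool:
    """Hypothetical stateful tool with an explicit lifecycle."""
    def __init__(self):
        self.active = False
        self.log = []  # audit trail of state transitions

    def initialize(self):
        self.active = True
        self.log.append("initialize")

    def cleanup(self):
        self.active = False
        self.log.append("cleanup")

@contextmanager
def tool_session():
    """One fresh tool instance per agent run; cleanup always fires."""
    t = StatefulTool()
    t.initialize()
    try:
        yield t
    finally:
        t.cleanup()  # runs even if the agent run raised

# Usage: state never outlives the run, even on failure
with tool_session() as t:
    t.log.append("query")
```

The same shape maps onto LangGraph node lifecycle hooks: instance creation on entry, guaranteed cleanup on exit, and the `log` list stands in for whatever audit sink you actually use.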
Automated Tool Development
As tool catalogs grow, the engineering cost of maintaining them grows with it. One emerging pattern: use the model to generate tools, not just call them.
Foundation Models as Tool Makers
Frontier models are good at generating well-formed tool definitions from a natural language description or an existing schema. The practical use case is wrapping APIs you don't control — given an OpenAPI spec or a plain English description of what a service does, a model can produce a correctly typed, docstring-annotated Python function ready to bind.
Here's a concrete pattern: you have an internal service with a spec, and you want a tool without writing the wrapper yourself.
```python
from langchain_openai import ChatOpenAI
import json

llm = ChatOpenAI(model="gpt-4o")

def generate_tool_from_description(service_name: str, description: str, example_response: dict) -> str:
    """Ask the LLM to produce a @tool-decorated Python function."""
    prompt = f"""Write a LangChain @tool function for this internal service.

Service: {service_name}
Description: {description}
Example response shape: {json.dumps(example_response, indent=2)}

Requirements:
- Use @tool decorator from langchain_core.tools
- Type-annotated parameters
- Docstring that describes what the tool returns, not just what it does
- Return type must be str
- Call httpx.get with a placeholder URL
- Include basic error handling

Return only the Python function, no explanation."""
    response = llm.invoke(prompt)
    return response.content

# Usage
generated_code = generate_tool_from_description(
    service_name="inventory-service",
    description="Returns current stock levels for a given SKU",
    example_response={"sku": "ABC-123", "quantity": 42, "warehouse": "MUM-01"},
)
print(generated_code)
```
The output is a starting point, not a finished tool. You still need to: review the generated docstring (it determines how the model calls the tool), replace placeholder URLs with real endpoints, and validate the schema with tool.args_schema.schema() before binding. The generation removes the boilerplate; the review gate removes the risk.
Don't skip the review step. A generated tool with a vague docstring will be called incorrectly by the same model that generated it.
Real-Time Code Generation
The more aggressive pattern: agents that write code, test it, and execute it in a loop — the "code interpreter" approach. The agent doesn't generate a reusable tool definition; it generates a one-shot script to answer a specific question.
````python
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
import subprocess
import sys

llm = ChatOpenAI(model="gpt-4o")

@tool
def generate_and_execute(task: str, data_context: str) -> str:
    """
    Generate and execute a Python script to complete a data analysis task.
    Provide the task description and any relevant data context.
    Returns execution output or a descriptive error.
    """
    code_prompt = f"""Write a Python script to: {task}

Available context: {data_context}

Rules:
- Use only stdlib + pandas + numpy
- Print results to stdout
- Handle exceptions and print errors clearly
- No file I/O, no network calls"""

    # Intentional nested LLM call: this tool itself invokes the LLM to generate code.
    # The outer agent decides *when* to use this tool; the inner call handles *what* to generate.
    # Two-level pattern — not a mistake, but document it clearly for anyone reading the code.
    code_response = llm.invoke(code_prompt)
    code = code_response.content.strip().removeprefix("```python").removesuffix("```").strip()

    # In production: run this in a sandboxed subprocess, not exec()
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=10,
    )
    if result.returncode != 0:
        return f"Execution failed:\n{result.stderr[:500]}"
    return result.stdout[:1000]
````
Note the subprocess over exec() — it gives you process isolation, timeout enforcement, and stderr capture. PythonREPLTool from langchain_experimental runs code in the host process with full permissions. That's acceptable locally; in production you want a container boundary with network isolation and no credential access.
The pattern works well for: data analysis over known schemas, metric calculations, format transformations. It breaks for: anything requiring persistent state, network access to internal services, or tasks where the generated code is likely to be wrong in non-obvious ways that need human judgment to catch.
Tool Use Configuration
The tool_choice Parameter
When you bind tools to a model, you can control how aggressively the model uses them via tool_choice:
```python
# Auto: model decides when to use tools (default)
model.bind_tools(tools, tool_choice="auto")

# Required: model MUST call at least one tool
model.bind_tools(tools, tool_choice="required")

# Any: alias for required in some providers
model.bind_tools(tools, tool_choice="any")

# Specific: force a specific tool call
model.bind_tools(tools, tool_choice={"type": "function", "function": {"name": "get_weather"}})
```
auto is the default and the right choice for most agentic workflows. The model reasons about whether tools are needed.
required / any forces a tool call on every invocation. Use this when you know the model should always take an action — a router that must classify every input, a data extraction pipeline where every call should produce structured output, a workflow where proceeding without tool use is always wrong. The risk: the model will call something, even if no tool is appropriate. In edge cases, it'll call the closest thing it can find with nonsense arguments. Test your edge cases before deploying this mode.
Specific tool forcing is for structured extraction and workflow checkpoints where you need a guaranteed schema on every call. Combine with a Pydantic output parser for reliable structured output.
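The guarantee you get from forcing a tool is a schema contract you can validate mechanically. A provider-agnostic sketch of that idea, with `Invoice` and `extract_invoice` as hypothetical names:

```python
from pydantic import BaseModel, Field, ValidationError

class Invoice(BaseModel):
    """The forced tool's args schema doubles as your output schema."""
    vendor: str = Field(description="Vendor name")
    total: float = Field(description="Total amount in USD")

# The shape bind_tools(tools, tool_choice=...) serializes to for OpenAI-style APIs
forced_choice = {"type": "function", "function": {"name": "extract_invoice"}}

def parse_forced_call(args: dict) -> Invoice:
    """Validate the forced tool call's arguments against the schema."""
    return Invoice(**args)

# A well-formed forced call parses cleanly...
inv = parse_forced_call({"vendor": "Acme Corp", "total": 1234.5})

# ...and a malformed one fails loudly instead of flowing downstream
try:
    parse_forced_call({"vendor": "Acme Corp", "total": "a lot"})
except ValidationError:
    pass
```

Because the model must emit arguments for `extract_invoice` on every call, every response either parses into a valid `Invoice` or raises at a single, well-defined boundary.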
The Too Many Tools Problem
There's a limit to how many tools you can bind to an agent before reasoning quality degrades. In practice, this threshold sits somewhere around 20–30 tools depending on the model. Beyond that, models start making wrong tool selections — choosing a vaguely relevant tool over the correct one, or failing to call any tool when one is clearly needed. The problem is attention: every tool's schema consumes context, and when the list is long, the signal-to-noise ratio in the tool selection step drops.
The production fix is dynamic tool selection: don't bind all tools upfront. Select the relevant subset for each request at runtime.
```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# All available tools, with their descriptions as the retrieval corpus
all_tools = [get_customer_profile, update_ticket_status, search_knowledge_base,
             calculate_sla_deadline, get_product_details, check_inventory, ...]

# Build a vector index over tool descriptions at startup
tool_descriptions = [
    {"name": t.name, "description": t.description} for t in all_tools
]
tool_index = FAISS.from_texts(
    texts=[t["description"] for t in tool_descriptions],
    embedding=OpenAIEmbeddings(),
    metadatas=tool_descriptions,
)
tool_lookup = {t.name: t for t in all_tools}

def select_tools_for_request(user_query: str, k: int = 8) -> list:
    """Retrieve the k most relevant tools for this query."""
    results = tool_index.similarity_search(user_query, k=k)
    return [tool_lookup[r.metadata["name"]] for r in results if r.metadata["name"] in tool_lookup]

# Per-request: bind only the relevant tools
def run_agent(user_query: str):
    relevant_tools = select_tools_for_request(user_query, k=8)
    model = ChatOpenAI(model="gpt-4o").bind_tools(relevant_tools)
    # ... rest of agent invocation
```
This keeps the bound tool list to 8–10 per request regardless of catalog size. The embedding index adds ~5ms at query time — a worthwhile trade for consistent reasoning quality across a large tool catalog.
One important detail: tool descriptions need to be retrieval-optimized, not just model-optimized. A description written to tell the LLM how to use a tool may not match, at retrieval time, the user queries that should trigger it. Write two descriptions if needed — one for retrieval, one for the model's schema — or use a separate retrieval_hint field during indexing.
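A dependency-free sketch of the two-description idea. The metadata layout and the `retrieval_hint` contents are hypothetical — the point is that the text you embed for retrieval can differ from the description the model sees:

```python
# Hypothetical metadata layout: one text for the model's schema,
# one phrased like the user queries that should trigger the tool.
tools_meta = [
    {
        "name": "calculate_sla_deadline",
        # What the LLM sees in the tool schema:
        "description": "Compute the SLA deadline from ticket priority and creation time.",
        # What user queries actually sound like — index THIS for retrieval:
        "retrieval_hint": "when is this ticket due, how long until breach, "
                          "deadline for responding, SLA countdown",
    },
]

# At index-build time, embed the hint if present, else fall back to the description.
texts_to_index = [m.get("retrieval_hint", m["description"]) for m in tools_meta]
```

The model still receives only `description` in its tool schema; `retrieval_hint` exists purely to improve similarity search.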
When Things Break: Error Handling
Tool call failures in production follow predictable patterns. Here's the handling stack, in order:
1. Validate before the model sees it
Schema validation catches malformed inputs before they reach your tool's implementation. Define strict Pydantic schemas and let validation fail fast:
```python
from pydantic import BaseModel, Field
from langchain_core.tools import tool

class QueryInput(BaseModel):
    query: str = Field(min_length=1, max_length=1000)
    filters: dict[str, str] = Field(default_factory=dict)
    limit: int = Field(default=10, ge=1, le=100)

@tool(args_schema=QueryInput)
def search_records(query: str, filters: dict, limit: int) -> str:
    """Search records matching query with optional filters."""
    ...
```
If the model sends limit: "ten" instead of limit: 10, Pydantic rejects it before your code runs. The error message goes back to the model, which usually self-corrects on the next attempt.
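You can see the rejection in isolation without any LLM in the loop. A small demo against the same schema (the `feedback` string is a hypothetical stand-in for what you'd put in the ToolMessage):

```python
from pydantic import BaseModel, Field, ValidationError

class QueryInput(BaseModel):
    query: str = Field(min_length=1, max_length=1000)
    limit: int = Field(default=10, ge=1, le=100)

# A model-generated call with limit="ten" fails validation before any
# tool code runs; the error text is what gets fed back to the model.
try:
    QueryInput(query="open tickets", limit="ten")
    feedback = None
except ValidationError as err:
    feedback = f"Error in 'limit': {err.errors()[0]['msg']}"
```

Note that Pydantic v2's default (lax) mode still coerces `limit: "10"` to the integer 10 — only genuinely unparseable values like `"ten"` are rejected.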
2. Retry with exponential backoff
Transient failures — network timeouts, rate limits, temporary service unavailability — need retry logic. Don't implement this inside the tool; implement it at the tool invocation layer so it applies uniformly:
```python
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
from langchain_core.tools import tool
import httpx

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10),
    retry=retry_if_exception_type((httpx.TimeoutException, httpx.HTTPStatusError)),
)
def _call_api_with_retry(url: str, payload: dict) -> dict:
    response = httpx.post(url, json=payload, timeout=10.0)
    response.raise_for_status()
    return response.json()

@tool
def my_api_tool(input: str) -> str:
    """Call the internal API."""
    result = _call_api_with_retry("https://api.internal.com/endpoint", {"input": input})
    return result["output"]
```
2a. Retry with feedback (LLM self-correction)
Exponential backoff handles infrastructure failures. A different failure mode needs a different fix: the model called the tool with wrong arguments. The tool ran, returned an error, and the model needs to see that error in context to correct itself. This is the retry-with-feedback loop.
LangChain's ToolMessage is the mechanism. When a tool call fails, append a ToolMessage with the error — the model reads it and retries with corrected arguments, usually successfully:
```python
from langchain_core.messages import HumanMessage, ToolMessage
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o").bind_tools(tools)

def run_with_tool_feedback(user_query: str, max_retries: int = 3) -> str:
    messages = [HumanMessage(content=user_query)]
    for attempt in range(max_retries):
        response = model.invoke(messages)
        messages.append(response)
        if not response.tool_calls:
            # No tool call — final text response
            return response.content
        # Execute each tool call; catch failures and feed them back
        for tc in response.tool_calls:
            try:
                tool_fn = tool_lookup[tc["name"]]
                result = tool_fn.invoke(tc["args"])
                messages.append(ToolMessage(
                    content=result,
                    tool_call_id=tc["id"],
                ))
            except Exception as e:
                # Feed the error back — model will see it and can correct
                messages.append(ToolMessage(
                    content=f"Error: {str(e)}. Check your arguments and try again.",
                    tool_call_id=tc["id"],
                ))
        # Loop again: the model reads the ToolMessages and either answers
        # or retries with corrected arguments
    return "Max retries reached. Could not complete the request."
```
The key: don't silently swallow tool errors or raise immediately. Return a descriptive ToolMessage with the error and let the model attempt self-correction. GPT-4o and Claude 3.5+ handle this well — they'll adjust argument types, fix field names, or switch to a different tool. Cap retries at 2–3; if the model can't correct after that, the problem is usually the tool schema, not the model.
3. Fall back gracefully
Not every tool failure should abort the agent. Define what the fallback behavior is for each tool and encode it explicitly:
```python
@tool
def get_real_time_price(ticker: str) -> str:
    """Get real-time stock price. Falls back to last known price if unavailable."""
    try:
        return fetch_live_price(ticker)
    except PriceServiceUnavailable:
        cached = get_cached_price(ticker)
        if cached:
            return f"{ticker}: ${cached['price']:.2f} (cached {cached['age_minutes']}m ago)"
        return f"Price unavailable for {ticker}. Please try again later."
```
Return a meaningful string the model can reason about. Don't raise an exception unless the failure is genuinely unrecoverable and you want the agent to stop.
4. Log everything
Every tool call — its inputs, its outputs, its latency, its success or failure — should be logged with the agent's run ID. This is what lets you reconstruct exactly what happened when something goes wrong in production:
```python
import structlog
import time
from functools import wraps

logger = structlog.get_logger()

def logged_tool(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        tool_name = func.__name__
        try:
            result = func(*args, **kwargs)
            logger.info("tool.success",
                tool=tool_name,
                args=kwargs,
                latency_ms=(time.monotonic() - start) * 1000,
            )
            return result
        except Exception as e:
            logger.error("tool.failure",
                tool=tool_name,
                args=kwargs,
                error=str(e),
                latency_ms=(time.monotonic() - start) * 1000,
            )
            raise
    return wrapper

# Order matters: @logged_tool wraps the raw function first, then @tool
# converts the wrapped function into a LangChain tool. @wraps preserves
# the name and docstring that @tool reads to build the schema.
@tool
@logged_tool
def get_customer_profile(customer_id: str) -> str:
    """Fetch customer profile from CRM by customer ID."""
    ...
```
The MCP vs RAG vs Tools article goes into when tool use is the right answer versus retrieval or direct reasoning — worth reading alongside this if you're making architectural decisions.
Comparison: Choosing Your Tool Pattern
Before the decision framework, here's the full picture side by side:
| Tool Type | Latency | Fault Isolation | Scalability | Operational Complexity | Best For |
|---|---|---|---|---|---|
| Local | Very low | None — tool crash = agent crash | Poor | Low | Computation, formatting, in-process logic |
| API-based | Medium | High — service boundary | Good | Medium | Shared business logic, centralized impl |
| Plugin | Medium | Provider-dependent | Medium | Low | Third-party commoditized capabilities |
| MCP | Medium | High — server boundary | Excellent | Medium | Multi-framework ecosystems, tool distribution |
| Stateful | Variable | Risky — shared state leaks | Hard | High | Multi-step processes requiring persistent session |
"Operational complexity" here means the surface area you're responsible for — not the difficulty of the initial implementation. A local tool is trivial to add and painful to maintain at scale. An MCP server requires more upfront work but gives you independent deployability, versioning, and the ability to reuse across any MCP-compatible agent.
The latency column deserves a note: all networked tool types (API, Plugin, MCP) add a round trip. In a multi-step ReAct loop where an agent makes 8–12 tool calls, a 100ms tool latency difference compounds into 1–1.2 seconds. Profile your tool latency early, especially for latency-sensitive user-facing workflows.
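The compounding is simple arithmetic, but it's worth keeping the formula visible when budgeting latency — a trivial sketch:

```python
def loop_overhead_s(extra_ms_per_call: float, tool_calls: int) -> float:
    """Extra wall-clock seconds when every tool call in a ReAct loop
    pays an added network round trip."""
    return extra_ms_per_call * tool_calls / 1000.0

# 100 ms of added per-call latency across a typical 8-12 call loop:
loop_overhead_s(100, 8)    # 0.8 s
loop_overhead_s(100, 12)   # 1.2 s
```

The overhead scales linearly with loop depth, which is why agents with long tool-call chains feel networked latency much more sharply than single-shot calls do.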
What This Looks Like in Practice
Here's how a customer support agent would actually decompose its tool layer across types:
graph LR
AGENT([Customer Support Agent])
AGENT --> LOCAL[Local Tools]
AGENT --> API[API-Based Tools]
AGENT --> MCP[MCP Servers]
AGENT --> STATEFUL[Stateful Tool]
LOCAL --> L1["format_ticket_summary(ticket)\n→ str"]
LOCAL --> L2["validate_email(email)\n→ bool"]
LOCAL --> L3["calculate_sla_deadline(priority, created_at)\n→ datetime"]
API --> A1["get_customer_profile(customer_id)\nCRM lookup · shared across agents"]
API --> A2["update_ticket_status(ticket_id, status)\nInternal ticketing API"]
MCP --> M1["mcp-server-knowledge-base\nSearch internal docs, FAQs, runbooks"]
MCP --> M2["mcp-server-incident-history\nQuery past incidents and resolutions"]
STATEFUL --> S1["BrowserSessionTool\nNavigate support portal · fill forms\nIsolated per session · explicit cleanup"]
style AGENT fill:#4A90E2,color:#fff,stroke:#2c6fad
style LOCAL fill:#6BCF7F,color:#fff,stroke:#4aad61
style L1 fill:#98D8C8,color:#333,stroke:#6ab8a8
style L2 fill:#98D8C8,color:#333,stroke:#6ab8a8
style L3 fill:#98D8C8,color:#333,stroke:#6ab8a8
style API fill:#6BCF7F,color:#fff,stroke:#4aad61
style A1 fill:#98D8C8,color:#333,stroke:#6ab8a8
style A2 fill:#98D8C8,color:#333,stroke:#6ab8a8
style MCP fill:#7B68EE,color:#fff,stroke:#5a4ecc
style M1 fill:#B8B0F0,color:#333,stroke:#8880cc
style M2 fill:#B8B0F0,color:#333,stroke:#8880cc
style STATEFUL fill:#FFA07A,color:#fff,stroke:#cc6040
style S1 fill:#FFCBA4,color:#333,stroke:#cc9060
The split is intentional, not arbitrary. Formatting and validation are pure functions — no reason to add a network hop. CRM lookup is shared across three other agents; centralizing it as an API means one place to update auth, rate limiting, and field mappings. The knowledge base and incident history are MCP servers because the platform team owns them independently, and they'll eventually be used by agents built in different frameworks. Browser automation is stateful because navigating the support portal is inherently multi-step — you can't do it with a single stateless call.
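The local tools in that diagram are plain pure functions — no I/O, no network hop. A sketch of one, where the SLA table and the function body are illustrative assumptions, not the article's actual implementation:

```python
from datetime import datetime, timedelta

# Hypothetical SLA table — illustration only.
SLA_HOURS = {"critical": 4, "high": 8, "normal": 24, "low": 72}

def calculate_sla_deadline(priority: str, created_at: str) -> str:
    """Pure function: deterministic, in-process — a textbook local tool."""
    hours = SLA_HOURS.get(priority.lower(), SLA_HOURS["normal"])
    deadline = datetime.fromisoformat(created_at) + timedelta(hours=hours)
    return deadline.isoformat()

calculate_sla_deadline("high", "2025-01-15T09:00:00")  # → "2025-01-15T17:00:00"
```

Because there's no state and no external dependency, this tool needs none of the retry, fallback, or isolation machinery the networked types require — which is exactly why it should stay local.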
Each tool type earns its place by solving a specific problem. When you don't have that specific problem, use the simpler type.
Conclusion
Tool use is the thing that makes agents actually useful. But the ecosystem has grown fast, and most teams are building on assumptions they've never examined.
Here's where things actually stand:
Local tools are where you start. They're fast, simple, and appropriate for anything that doesn't need to scale beyond a single process. Most agents should have at least some local tools for computation, formatting, and in-process logic.
API-based tools are what production systems actually run on. You want fault isolation, independent deployability, and shared implementation. The operational overhead is real but unavoidable at scale.
Plugin tools are useful for standard integrations you don't want to maintain. Use them for commoditized capabilities — search, calendar, email — not for anything business-critical that needs your specific behavior.
MCP is the right architectural bet for anything that needs to survive framework churn. Build your tool servers MCP-compatible now. The protocol is still maturing, but the direction is clear: this is how tools will be distributed and consumed going forward.
Stateful tools are powerful and dangerous. Use them when you genuinely need persistent state across tool calls. Isolate instances per agent run. Log every state transition. Treat them with the same security posture you'd apply to any component with persistent elevated access.
Automated tool development is still early-stage. It works for specific patterns — code interpretation, OpenAPI-to-tool generation, schema-driven extraction. Don't build production systems that depend on LLM-generated tools without a human review gate somewhere in the loop.
On configuration: default to auto for tool_choice. Use required deliberately and with edge-case testing. And implement the full error handling stack — schema validation, retry with backoff, graceful degradation, and logging — before you call anything production-ready.
Which Tool Type Do You Actually Need?
flowchart TD
START([Start here]) --> Q1{Simple computation,\nformatting, or\nin-process logic?}
Q1 -->|Yes| LOCAL[Local Tool]
Q1 -->|No| Q2{Shared capability\nused by multiple\nagents?}
Q2 -->|Yes| API[API-Based Tool]
Q2 -->|No| Q3{Third-party service\nyou don't want to\nmaintain wrappers for?}
Q3 -->|Yes| PLUGIN[Plugin Tool\nOpenAPI generation\nif they have a spec]
Q3 -->|No| Q4{Tools need to work\nacross frameworks,\nteams, or providers?}
Q4 -->|Yes| MCP[MCP Server]
Q4 -->|No| Q5{Multi-step process\nrequiring persistent\nsession state?}
Q5 -->|Yes| STATEFUL[Stateful Tool\nisolation + explicit\nlifecycle management]
Q5 -->|No| LOCAL2[Local Tool\nstart simple]
style START fill:#4A90E2,color:#fff,stroke:#2c6fad
style LOCAL fill:#6BCF7F,color:#fff,stroke:#4aad61
style LOCAL2 fill:#6BCF7F,color:#fff,stroke:#4aad61
style API fill:#6BCF7F,color:#fff,stroke:#4aad61
style PLUGIN fill:#6BCF7F,color:#fff,stroke:#4aad61
style MCP fill:#7B68EE,color:#fff,stroke:#5a4ecc
style STATEFUL fill:#FFA07A,color:#fff,stroke:#cc6040
style Q1 fill:#FFD93D,color:#333,stroke:#ccaa00
style Q2 fill:#FFD93D,color:#333,stroke:#ccaa00
style Q3 fill:#FFD93D,color:#333,stroke:#ccaa00
style Q4 fill:#FFD93D,color:#333,stroke:#ccaa00
style Q5 fill:#FFD93D,color:#333,stroke:#ccaa00
Start at the top. Go down only when you have a concrete reason to. Every step down the list adds complexity — make sure it's complexity you actually need.
The agents that survive production aren't the ones with the most tools. They're the ones that know when to use each tool, handle failures without collapsing, and make every tool call observable. Build toward that from the start.
For a deeper look at how tool execution fits into the broader security architecture of agentic systems, see Zero Trust Agents and The Agent DMZ.
References
LangChain & LangGraph
- LangChain Tools documentation — https://python.langchain.com/docs/concepts/tools/
- LangChain `bind_tools()` reference — https://python.langchain.com/docs/how_to/tool_calling/
- LangChain OpenAPI toolkit — https://python.langchain.com/docs/integrations/toolkits/openapi/
- LangGraph prebuilt ReAct agent — https://langchain-ai.github.io/langgraph/reference/prebuilt/
- `langchain-mcp-adapters` repository — https://github.com/langchain-ai/langchain-mcp-adapters
Model Context Protocol (MCP)
- MCP specification — https://modelcontextprotocol.io/specification
- MCP Python SDK — https://github.com/modelcontextprotocol/python-sdk
- MCP server: filesystem — https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem
- MCP server: postgres — https://github.com/modelcontextprotocol/servers/tree/main/src/postgres
Provider Tool Use APIs
- Anthropic tool use documentation — https://docs.anthropic.com/en/docs/build-with-claude/tool-use
- OpenAI function calling documentation — https://platform.openai.com/docs/guides/function-calling
- Google Gemini function calling — https://ai.google.dev/gemini-api/docs/function-calling
OpenAPI Specification
- OpenAPI 3.x specification — https://spec.openapis.org/oas/v3.1.0
Libraries
- Pydantic v2 documentation — https://docs.pydantic.dev/latest/
- Tenacity retry library — https://tenacity.readthedocs.io/en/latest/
- FAISS (Facebook AI Similarity Search) — https://github.com/facebookresearch/faiss
- httpx — https://www.python-httpx.org/
- psycopg2 — https://www.psycopg.org/docs/
- structlog — https://www.structlog.org/en/stable/
ReAct Reasoning Pattern
- Yao, S., Zhao, J., Yu, D., et al. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 2023. https://arxiv.org/abs/2210.03629
RLHF / Instruction Tuning
- Ouyang, L., Wu, J., Jiang, X., et al. (2022). Training language models to follow instructions with human feedback. arXiv:2203.02155. https://arxiv.org/abs/2203.02155
Related Articles
- Agentic AI Observability: Why Traditional Monitoring Breaks with Autonomous Systems
- Beyond Copy-Paste: Staying Relevant in the Age of AI Code Assistants