How Large Language Models Use Tools

Background

1. Transformer

The Transformer is the foundation of all modern large language models (LLMs), including GPT, Claude, Llama, and Gemini.
It is a sequence model that predicts each next token given all previous tokens:

P(x_t | x_1, x_2, …, x_{t-1})

Transformers use self-attention to capture relationships between tokens, allowing them to reason over long contexts and generate coherent, contextually relevant language.
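
As a rough illustration, the core self-attention computation can be sketched in a few lines of NumPy (toy dimensions, random vectors, no learned projection matrices or masking; this is a sketch, not a production implementation):

import numpy as np

def self_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise token similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                              # weighted mix of value vectors

# Toy example: 4 tokens with 8-dimensional embeddings attend to each other
x = np.random.default_rng(0).normal(size=(4, 8))
print(self_attention(x, x, x).shape)                # (4, 8)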

However, Transformers are purely text-based — they do not natively know how to fetch real-time data or interact with tools and APIs.

That limitation is what protocols like MCP (Model Context Protocol) are designed to solve.

2. RAG (Retrieval-Augmented Generation)

RAG extends LLMs by combining them with retrieval systems to access external knowledge at inference time.

How it works

  • A retriever searches a knowledge base or document store for relevant context.
  • The retrieved information is injected into the model’s prompt.
  • The LLM generates an answer grounded in that external information.

Example using LangChain:

from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Embed the source texts and index them in an in-memory FAISS vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(["docs"], embedding=embeddings)
retriever = vectorstore.as_retriever()

# Build a chain that retrieves relevant chunks and lets the LLM answer from them
llm = ChatOpenAI(model="gpt-4-turbo")
qa = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
qa.run("What are the key findings?")

RAG rose to prominence in 2023, during the surge of enterprise ChatGPT applications, when developers needed to connect LLMs to private or frequently updated data sources without retraining the models themselves.

It became the standard solution for building internal chatbots and knowledge-grounded assistants.

However, RAG is read-only — it retrieves information before generation but cannot act, update, or call live systems.

To address that limitation, the AI industry moved toward MCP, which adds dynamic, runtime interactivity.

3. MCP (Model Context Protocol)

The Model Context Protocol (MCP), introduced by Anthropic in November 2024, is an open standard that defines how large language models can discover, call, and interact with external tools, APIs, or resources during inference. It evolved from earlier function-calling mechanisms, generalizing them into a structured and interoperable protocol that any orchestrator can implement.

MCP allows an LLM to emit structured messages to request operations, receive results, and use those results in its reasoning loop.

This transforms a static, text-only LLM into a fully interactive reasoning agent.

A typical MCP message looks like this:

{
  "type": "mcp_call",
  "target": "weather_api",
  "action": "get_forecast",
  "parameters": {"city": "Paris"}
}

When the model generates such a message, the MCP runtime (for example, Claude, AWS Bedrock, or another orchestrator) intercepts it, executes the corresponding request on the specified API or service, and returns the result to the model. The runtime then feeds that response back into the model’s context window, allowing it to continue reasoning with current information. This capability enables dynamic tool use, live data access, and decision-driven behavior — far beyond what RAG’s static, pre-retrieval model can do.
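
A minimal sketch of what such a runtime does, assuming a toy in-process tool registry (the TOOLS mapping and the get_forecast stub below are hypothetical stand-ins for illustration, not part of the MCP specification):

import json

# Hypothetical local tool registry: (target, action) -> Python callable
def get_forecast(city: str) -> dict:
    return {"city": city, "temperature": 18, "condition": "Sunny"}  # stub data

TOOLS = {("weather_api", "get_forecast"): get_forecast}

def handle_model_output(text: str) -> str | None:
    """If the model emitted an mcp_call, execute it and return a tool response."""
    try:
        msg = json.loads(text)
    except json.JSONDecodeError:
        return None  # ordinary prose from the model, nothing to execute
    if msg.get("type") != "mcp_call":
        return None
    tool = TOOLS[(msg["target"], msg["action"])]
    result = tool(**msg["parameters"])
    # Re-inject the result as text the model can read on its next turn
    return f"<tool_response>{json.dumps(result)}</tool_response>"

print(handle_model_output(
    '{"type": "mcp_call", "target": "weather_api", '
    '"action": "get_forecast", "parameters": {"city": "Paris"}}'
))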

MCP has been adopted as a foundation for tool-using LLMs, powering systems like Claude 3, AWS Bedrock Agents, and integration frameworks like LangGraph. Through MCP, an LLM is no longer limited to “read-only” operations; it can now interact with the world, performing tasks, running analyses, and chaining live data operations — all using text as its interface.

4. Comparison: Different Tool-Calling Approaches

While MCP is Anthropic’s protocol, different LLM providers implement tool calling in varying ways:

| Approach | Provider(s) | Key Characteristics | Standardization |
| --- | --- | --- | --- |
| Function Calling | OpenAI, Google | Native API support; tools defined in the API request; JSON schema for parameters | Provider-specific |
| MCP | Anthropic (Claude) | Open protocol; server-client architecture; tool discovery and registration | Open standard |
| ReAct Pattern | LangChain, custom frameworks | Reasoning + acting loop; text-based tool descriptions; requires prompt engineering | Framework-level |
| Bedrock Agents | AWS | Managed service; action groups; integrates with AWS services | AWS-specific |
| Tool Use API | Various (Cohere, etc.) | Provider-specific implementations; similar concepts, different formats | Provider-specific |

Key Differences:

  • OpenAI Function Calling: Tightly integrated into the API; tools are passed as parameters in each request, and models are fine-tuned to generate function calls in JSON format (see the sketch below).
  • MCP: Focuses on tool discovery and interoperability. MCP servers expose tools that any MCP-compatible client can use, promoting reusability across different applications.
  • ReAct Pattern: More flexible but requires careful prompt design. The model explicitly reasons about which tool to use and why, making the decision process more transparent.

Each approach has trade-offs between ease of use, flexibility, and vendor lock-in.
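
For contrast, a minimal OpenAI-style function-calling request looks roughly like this (a sketch using the openai Python SDK; the get_weather tool is a made-up example, and an API key is assumed to be configured):

from openai import OpenAI

client = OpenAI()

# Tools are declared inline with every request, described by JSON Schema
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# If the model chose to call the tool, the call appears here instead of plain text
print(response.choices[0].message.tool_calls)

The practical difference from MCP is that the tool schema travels with every request rather than being discovered from a server.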

How Can a Text-Only Transformer “Use” a Tool?

1. The Flow

Although Transformers operate on text, MCP gives them a way to “use” external tools through structured communication.

The LLM emits formatted output (like JSON) that the runtime interprets as an MCP request, executes, and then re-injects as text.

Step-by-step flow

Step 1: User asks a question

"What's the current stock price of Tesla?"

Step 2: LLM reasoning (Transformer)

  • Recognizes it needs external data
  • Generates a structured MCP call:
{ "type": "mcp_call", "target": "finance_api", "action": "get_stock", "parameters": {"ticker": "TSLA"} }

Step 3: MCP runtime intercepts this structured output and performs the request

Step 4: MCP server/tool executes the call and returns a result:

{"price": 280.14, "currency": "USD"}

Step 5: Runtime injects the result back into the model’s context:

<tool_response>{"price":280.14,"currency":"USD"}</tool_response>

Step 6: LLM continues generation

"Tesla's current stock price is approximately $280."

The Transformer itself remains unaware of APIs or tools — it only generates tokens. But through MCP, those tokens are interpreted as actionable commands, bridging the model’s reasoning with external computation.

2. When the LLM “Decides” to Use MCP Tools

At the start of a session, the LLM is informed about available MCP tools — their names, descriptions, input types, and expected outputs.

Using that metadata, it decides dynamically when a particular request requires external execution.

The model’s internal reasoning determines when to emit a structured mcp_call, and the MCP runtime handles everything else.
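
For illustration, the kind of metadata a runtime might surface to the model at session start could look like this (the dictionary shape and prompt wording here are assumptions made for the sketch, not the actual MCP schema):

# Hypothetical tool descriptors, as a runtime might advertise them to the model
AVAILABLE_TOOLS = [
    {
        "name": "weather_api",
        "description": "Get the weather forecast for a city.",
        "actions": {"get_forecast": {"city": "string"}},
    },
    {
        "name": "finance_api",
        "description": "Look up current stock prices.",
        "actions": {"get_stock": {"ticker": "string"}},
    },
]

def build_system_prompt(tools: list[dict]) -> str:
    """Render tool metadata into text the model sees before the conversation."""
    lines = ["You can call these tools by emitting an mcp_call JSON message:"]
    for t in tools:
        lines.append(f"- {t['name']}: {t['description']} (actions: {list(t['actions'])})")
    return "\n".join(lines)

print(build_system_prompt(AVAILABLE_TOOLS))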

3. How MCP Tools Are Used

  1. Expose tools through MCP configuration files or manifests (including endpoints, parameters, and descriptions).
  2. Provide metadata about available tools to the LLM before or at runtime.
  3. Allow the model to reason about when to use those tools.
  4. Execute the structured calls through the MCP runtime and feed the results back as context.

This creates a closed reasoning-action-feedback loop, allowing a model to think, act, and iterate — effectively integrating decision-making with real-world capabilities.
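
Putting those steps together, a minimal version of the loop might look like the sketch below, where call_model and execute_call are scripted stand-ins for a real LLM API and a real tool dispatcher (the message format mirrors the earlier examples and is an assumption, not a specific runtime):

import json

# Scripted stand-in for a real LLM API: first emits an mcp_call, then answers
FAKE_MODEL_TURNS = iter([
    '{"type": "mcp_call", "target": "finance_api", '
    '"action": "get_stock", "parameters": {"ticker": "TSLA"}}',
    "Tesla's current stock price is approximately $280.",
])

def call_model(messages: list[dict]) -> str:
    return next(FAKE_MODEL_TURNS)  # a real system would call an LLM API here

def execute_call(msg: dict) -> dict:
    return {"price": 280.14, "currency": "USD"}  # stubbed tool result

def agent_loop(user_question: str, max_steps: int = 5) -> str:
    """Reason, act, feed the observation back, and repeat until a final answer."""
    messages = [{"role": "user", "content": user_question}]
    for _ in range(max_steps):
        output = call_model(messages)
        try:
            msg = json.loads(output)
        except json.JSONDecodeError:
            return output  # plain text means the model has produced its final answer
        if msg.get("type") != "mcp_call":
            return output
        result = execute_call(msg)
        messages.append({"role": "assistant", "content": output})
        messages.append({"role": "user",
                         "content": f"<tool_response>{json.dumps(result)}</tool_response>"})
    return "Stopped after reaching the step limit."

print(agent_loop("What's the current stock price of Tesla?"))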

4. Error Handling in MCP

When a tool call fails or returns an error, the MCP runtime handles it gracefully:

  1. Tool Unavailable: If the requested tool doesn’t exist, the runtime returns an error message to the LLM, which can then try alternative approaches or inform the user.

  2. Execution Failure: If the tool execution fails (e.g., API timeout, invalid parameters), the error details are returned to the model as context:

<tool_error>{"error": "API timeout", "tool": "weather_api", "details": "Request timed out after 30s"}</tool_error>

  3. Model Retry Logic: The LLM can analyze the error and decide whether to:
    • Retry with corrected parameters
    • Use a different tool
    • Inform the user about the limitation

This error handling ensures robust agent behavior in production environments where external services may be unreliable.
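
As a sketch, a runtime-side wrapper that reports failures in the tool_error format shown above might look like this (the run_tool helper and the simulated timeout are illustrative assumptions):

import json

def run_tool(name: str, func, timeout_note: str = "30s", **params) -> str:
    """Execute a tool and wrap success or failure in text the model can read."""
    try:
        result = func(**params)
        return f"<tool_response>{json.dumps(result)}</tool_response>"
    except TimeoutError:
        err = {"error": "API timeout", "tool": name,
               "details": f"Request timed out after {timeout_note}"}
        return f"<tool_error>{json.dumps(err)}</tool_error>"
    except Exception as exc:  # invalid parameters, network errors, etc.
        err = {"error": type(exc).__name__, "tool": name, "details": str(exc)}
        return f"<tool_error>{json.dumps(err)}</tool_error>"

def flaky_weather_api(city: str) -> dict:
    raise TimeoutError  # simulate an unreliable upstream service

print(run_tool("weather_api", flaky_weather_api, city="Paris"))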

Example — Using AWS Bedrock Agents (MCP in Action)

1. What is AWS Bedrock Agents?

AWS Bedrock Agents (part of Amazon Bedrock) is a managed service for building AI agents that can use tools and execute actions.
It enables large language models to:

  • Discover and describe available tools through action groups,
  • Issue structured function calls at runtime,
  • Execute those calls securely and at scale, and
  • Incorporate the results directly into their reasoning context.

In short, Bedrock Agents operationalizes tool-calling architecture for production environments.

2. How Bedrock Agents Implements Tool Calling

| Component | Role |
| --- | --- |
| Action Groups | Registers APIs, Lambda functions, or databases as callable tools. |
| Agent Runtime | Manages execution of function calls and streams results back. |
| LLM (Transformer) | Emits structured function-call requests dynamically during reasoning. |
| Tool Server | External endpoint (Lambda, API Gateway) that performs the requested operation. |
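
As a rough sketch, a tool server behind an action group could be a small Lambda function like the one below. The event fields and response shape are simplified assumptions; the real Bedrock action-group contract carries additional routing metadata (API path, HTTP method, and so on):

import json

def lambda_handler(event, context):
    """Toy weather tool: reads a city from the event and returns a forecast."""
    # Simplified: we assume the city arrives as a plain field on the event
    city = event.get("city", "Seattle")
    forecast = {"city": city, "temperature": 15, "condition": "Cloudy"}  # stub data
    return {"statusCode": 200, "body": json.dumps(forecast)}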

3. Example Flow

Step 1 — Register Tool

tools:
  - name: weather_api
    protocol: MCP
    endpoint: https://api.weather.example.com/get_forecast
    description: Provides weather forecasts for a city.
    inputs:
      city: string

Step 2 — User Query

“What’s the weather in Seattle?”

Step 3 — LLM Generates MCP Call

{
  "type": "mcp_call",
  "target": "weather_api",
  "action": "get_forecast",
  "parameters": {"city": "Seattle"}
}

Step 4 — Agent Runtime Executes the Call

Tool returns:

{"temperature": 15, "condition": "Cloudy"}

Step 5 — Response Returned to LLM

<tool_response>{"temperature":15,"condition":"Cloudy"}</tool_response>

Step 6 — LLM Generates Final Answer

"It's 15°C and cloudy in Seattle today."

This demonstrates a complete tool-calling reasoning-action loop in a real-world deployment using AWS Bedrock Agents.
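
For completeness, invoking a deployed Bedrock agent from Python with boto3 looks roughly like this (the agent and alias IDs are placeholders, and AWS credentials are assumed to be configured):

import uuid
import boto3

# Runtime client for calling an already-created Bedrock agent
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.invoke_agent(
    agentId="AGENT_ID",             # placeholder: your agent's ID
    agentAliasId="AGENT_ALIAS_ID",  # placeholder: the deployed alias
    sessionId=str(uuid.uuid4()),    # groups turns into one conversation
    inputText="What's the weather in Seattle?",
)

# The answer is streamed back as chunks of bytes
answer = "".join(
    event["chunk"]["bytes"].decode("utf-8")
    for event in response["completion"]
    if "chunk" in event
)
print(answer)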

Summary

| Concept | Role | Focus (Tool Integration) |
| --- | --- | --- |
| Transformer (LLM) | Predicts tokens and emits structured tool calls | Foundation for reasoning |
| RAG | Retrieves static knowledge (popular in 2023 for data-grounded chatbots) | Can be wrapped as a tool or resource |
| MCP | Protocol for tool discovery, calls, and responses | Enables dynamic reasoning and external interaction |
| Bedrock Agents | AWS-managed agent service | Executes and secures tool calls at scale |
| External Tools | APIs, microservices, or databases | Provide live data and perform real-world actions |
