How Large Language Models Use Tools

Background

1. Transformer

The Transformer is the foundation of all modern large language models (LLMs), including GPT, Claude, Llama, and Gemini.
It is a sequence model that predicts each next token given all previous tokens:

P(x_t | x_1, x_2, …, x_{t-1})

Transformers use self-attention to capture relationships between tokens, allowing them to reason over long contexts and generate coherent, contextually relevant language.
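
As a rough illustration, the core self-attention computation can be sketched in a few lines of NumPy (toy dimensions, random vectors, no learned projection matrices or masking; this is a sketch, not a production implementation):

import numpy as np

def self_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise token similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                              # weighted mix of value vectors

# Toy example: 4 tokens with 8-dimensional embeddings attend to each other
x = np.random.default_rng(0).normal(size=(4, 8))
print(self_attention(x, x, x).shape)                # (4, 8)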

However, Transformers are purely text-based — they do not natively know how to fetch real-time data or interact with tools and APIs.

That limitation is what protocols like MCP (Model Context Protocol) are designed to solve.

2. RAG (Retrieval-Augmented Generation)

RAG extends LLMs by combining them with retrieval systems to access external knowledge at inference time.

How it works

  • A retriever searches a knowledge base or document store for relevant context.
  • The retrieved information is injected into the model’s prompt.
  • The LLM generates an answer grounded in that external information.

Example using LangChain:

from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Embed the source texts and index them in an in-memory FAISS vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(["docs"], embedding=embeddings)
retriever = vectorstore.as_retriever()

# Build a chain that retrieves relevant chunks and lets the LLM answer from them
llm = ChatOpenAI(model="gpt-4-turbo")
qa = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
qa.run("What are the key findings?")

RAG rose to prominence in 2023, during the surge of enterprise ChatGPT applications, when developers needed to connect LLMs to private or frequently updated data sources without retraining the models themselves.

It became the standard solution for building internal chatbots and knowledge-grounded assistants.

However, RAG is read-only — it retrieves information before generation but cannot act, update, or call live systems.

To address that limitation, the AI industry moved toward MCP, which adds dynamic, runtime interactivity.

3. MCP (Model Context Protocol)

The Model Context Protocol (MCP), introduced by Anthropic in November 2024, is an open standard that defines how large language models can discover, call, and interact with external tools, APIs, or resources during inference. It evolved from earlier function-calling mechanisms, generalizing them into a structured and interoperable protocol that any orchestrator can implement.

MCP allows an LLM to emit structured messages to request operations, receive results, and use those results in its reasoning loop.

This transforms a static, text-only LLM into a fully interactive reasoning agent.

A typical MCP message looks like this:

{
  "type": "mcp_call",
  "target": "weather_api",
  "action": "get_forecast",
  "parameters": {"city": "Paris"}
}

When the model generates such a message, the MCP runtime (for example, Claude, AWS Bedrock, or another orchestrator) intercepts it, executes the corresponding request on the specified API or service, and returns the result to the model. The runtime then feeds that response back into the model’s context window, allowing it to continue reasoning with current information. This capability enables dynamic tool use, live data access, and decision-driven behavior — far beyond what RAG’s static, pre-retrieval model can do.
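
A minimal sketch of what such a runtime does, assuming a toy in-process tool registry (the TOOLS mapping and the get_forecast stub below are hypothetical stand-ins for illustration, not part of the MCP specification):

import json

# Hypothetical local tool registry: (target, action) -> Python callable
def get_forecast(city: str) -> dict:
    return {"city": city, "temperature": 18, "condition": "Sunny"}  # stub data

TOOLS = {("weather_api", "get_forecast"): get_forecast}

def handle_model_output(text: str) -> str | None:
    """If the model emitted an mcp_call, execute it and return a tool response."""
    try:
        msg = json.loads(text)
    except json.JSONDecodeError:
        return None  # ordinary prose from the model, nothing to execute
    if msg.get("type") != "mcp_call":
        return None
    tool = TOOLS[(msg["target"], msg["action"])]
    result = tool(**msg["parameters"])
    # Re-inject the result as text the model can read on its next turn
    return f"<tool_response>{json.dumps(result)}</tool_response>"

print(handle_model_output(
    '{"type": "mcp_call", "target": "weather_api", '
    '"action": "get_forecast", "parameters": {"city": "Paris"}}'
))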

MCP has been adopted as a foundation for tool-using LLMs, powering systems like Claude 3, AWS Bedrock Agents, and integration frameworks like LangGraph. Through MCP, an LLM is no longer limited to “read-only” operations; it can now interact with the world, performing tasks, running analyses, and chaining live data operations — all using text as its interface.

4. Comparison: Different Tool-Calling Approaches

While MCP is Anthropic’s protocol, different LLM providers implement tool calling in varying ways:

| Approach | Provider(s) | Key Characteristics | Standardization |
| --- | --- | --- | --- |
| Function Calling | OpenAI, Google | Native API support; tools defined in the API request; JSON schema for parameters | Provider-specific |
| MCP | Anthropic (Claude) | Open protocol; server-client architecture; tool discovery and registration | Open standard |
| ReAct Pattern | LangChain, custom frameworks | Reasoning + acting loop; text-based tool descriptions; requires prompt engineering | Framework-level |
| Bedrock Agents | AWS | Managed service; action groups; integrates with AWS services | AWS-specific |
| Tool Use API | Various (Cohere, etc.) | Provider-specific implementations; similar concepts, different formats | Provider-specific |

Key Differences:

  • OpenAI Function Calling: Tightly integrated into the API; tools are passed as parameters in each request, and models are fine-tuned to generate function calls in JSON format (see the sketch below).
  • MCP: Focuses on tool discovery and interoperability. MCP servers expose tools that any MCP-compatible client can use, promoting reusability across different applications.
  • ReAct Pattern: More flexible but requires careful prompt design. The model explicitly reasons about which tool to use and why, making the decision process more transparent.

Each approach has trade-offs between ease of use, flexibility, and vendor lock-in.
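
For contrast, a minimal OpenAI-style function-calling request looks roughly like this (a sketch using the openai Python SDK; the get_weather tool is a made-up example, and an API key is assumed to be configured):

from openai import OpenAI

client = OpenAI()

# Tools are declared inline with every request, described by JSON Schema
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# If the model chose to call the tool, the call appears here instead of plain text
print(response.choices[0].message.tool_calls)

The practical difference from MCP is that the tool schema travels with every request rather than being discovered from a server.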

How Can a Text-Only Transformer “Use” a Tool?

1. The Flow

Although Transformers operate on text, MCP gives them a way to “use” external tools through structured communication.

The LLM emits formatted output (like JSON) that the runtime interprets as an MCP request, executes, and then re-injects as text.

Step-by-step flow

Step 1: User asks a question

"What's the current stock price of Tesla?"

Step 2: LLM reasoning (Transformer)

  • Recognizes it needs external data
  • Generates a structured MCP call:
{ "type": "mcp_call", "target": "finance_api", "action": "get_stock", "parameters": {"ticker": "TSLA"} }

Step 3: MCP runtime intercepts this structured output and performs the request

Step 4: MCP server/tool executes the call and returns a result:

{"price": 280.14, "currency": "USD"}

Step 5: Runtime injects the result back into the model’s context:

<tool_response>{"price":280.14,"currency":"USD"}</tool_response>

Step 6: LLM continues generation

"Tesla's current stock price is approximately $280."

The Transformer itself remains unaware of APIs or tools — it only generates tokens. But through MCP, those tokens are interpreted as actionable commands, bridging the model’s reasoning with external computation.

2. When the LLM “Decides” to Use MCP Tools

At the start of a session, the LLM is informed about available MCP tools — their names, descriptions, input types, and expected outputs.

Using that metadata, it decides dynamically when a particular request requires external execution.

The model’s internal reasoning determines when to emit a structured mcp_call, and the MCP runtime handles everything else.
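
For illustration, the kind of metadata a runtime might surface to the model at session start could look like this (the dictionary shape and prompt wording here are assumptions made for the sketch, not the actual MCP schema):

# Hypothetical tool descriptors, as a runtime might advertise them to the model
AVAILABLE_TOOLS = [
    {
        "name": "weather_api",
        "description": "Get the weather forecast for a city.",
        "actions": {"get_forecast": {"city": "string"}},
    },
    {
        "name": "finance_api",
        "description": "Look up current stock prices.",
        "actions": {"get_stock": {"ticker": "string"}},
    },
]

def build_system_prompt(tools: list[dict]) -> str:
    """Render tool metadata into text the model sees before the conversation."""
    lines = ["You can call these tools by emitting an mcp_call JSON message:"]
    for t in tools:
        lines.append(f"- {t['name']}: {t['description']} (actions: {list(t['actions'])})")
    return "\n".join(lines)

print(build_system_prompt(AVAILABLE_TOOLS))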

3. How MCP Tools Are Used

  1. Expose tools through MCP configuration files or manifests (including endpoints, parameters, and descriptions).
  2. Provide metadata about available tools to the LLM before or at runtime.
  3. Allow the model to reason about when to use those tools.
  4. Execute the structured calls through the MCP runtime and feed the results back as context.

This creates a closed reasoning-action-feedback loop, allowing a model to think, act, and iterate — effectively integrating decision-making with real-world capabilities.
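
Putting those steps together, a minimal version of the loop might look like the sketch below, where call_model and execute_call are scripted stand-ins for a real LLM API and a real tool dispatcher (the message format mirrors the earlier examples and is an assumption, not a specific runtime):

import json

# Scripted stand-in for a real LLM API: first emits an mcp_call, then answers
FAKE_MODEL_TURNS = iter([
    '{"type": "mcp_call", "target": "finance_api", '
    '"action": "get_stock", "parameters": {"ticker": "TSLA"}}',
    "Tesla's current stock price is approximately $280.",
])

def call_model(messages: list[dict]) -> str:
    return next(FAKE_MODEL_TURNS)  # a real system would call an LLM API here

def execute_call(msg: dict) -> dict:
    return {"price": 280.14, "currency": "USD"}  # stubbed tool result

def agent_loop(user_question: str, max_steps: int = 5) -> str:
    """Reason, act, feed the observation back, and repeat until a final answer."""
    messages = [{"role": "user", "content": user_question}]
    for _ in range(max_steps):
        output = call_model(messages)
        try:
            msg = json.loads(output)
        except json.JSONDecodeError:
            return output  # plain text means the model has produced its final answer
        if msg.get("type") != "mcp_call":
            return output
        result = execute_call(msg)
        messages.append({"role": "assistant", "content": output})
        messages.append({"role": "user",
                         "content": f"<tool_response>{json.dumps(result)}</tool_response>"})
    return "Stopped after reaching the step limit."

print(agent_loop("What's the current stock price of Tesla?"))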

4. Error Handling in MCP

When a tool call fails or returns an error, the MCP runtime handles it gracefully:

  1. Tool Unavailable: If the requested tool doesn’t exist, the runtime returns an error message to the LLM, which can then try alternative approaches or inform the user.

  2. Execution Failure: If the tool execution fails (e.g., API timeout, invalid parameters), the error details are returned to the model as context:

<tool_error>{"error": "API timeout", "tool": "weather_api", "details": "Request timed out after 30s"}</tool_error>

  3. Model Retry Logic: The LLM can analyze the error and decide whether to:
    • Retry with corrected parameters
    • Use a different tool
    • Inform the user about the limitation

This error handling ensures robust agent behavior in production environments where external services may be unreliable.
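
As a sketch, a runtime-side wrapper that reports failures in the tool_error format shown above might look like this (the run_tool helper and the simulated timeout are illustrative assumptions):

import json

def run_tool(name: str, func, timeout_note: str = "30s", **params) -> str:
    """Execute a tool and wrap success or failure in text the model can read."""
    try:
        result = func(**params)
        return f"<tool_response>{json.dumps(result)}</tool_response>"
    except TimeoutError:
        err = {"error": "API timeout", "tool": name,
               "details": f"Request timed out after {timeout_note}"}
        return f"<tool_error>{json.dumps(err)}</tool_error>"
    except Exception as exc:  # invalid parameters, network errors, etc.
        err = {"error": type(exc).__name__, "tool": name, "details": str(exc)}
        return f"<tool_error>{json.dumps(err)}</tool_error>"

def flaky_weather_api(city: str) -> dict:
    raise TimeoutError  # simulate an unreliable upstream service

print(run_tool("weather_api", flaky_weather_api, city="Paris"))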

Example — Using AWS Bedrock Agents (MCP in Action)

1. What is AWS Bedrock Agents?

AWS Bedrock Agents (part of Amazon Bedrock) is a managed service for building AI agents that can use tools and execute actions.
It enables large language models to:

  • Discover and describe available tools through action groups,
  • Issue structured function calls at runtime,
  • Execute those calls securely and at scale, and
  • Incorporate the results directly into their reasoning context.

In short, Bedrock Agents operationalizes tool-calling architecture for production environments.

2. How Bedrock Agents Implements Tool Calling

| Component | Role |
| --- | --- |
| Action Groups | Registers APIs, Lambda functions, or databases as callable tools. |
| Agent Runtime | Manages execution of function calls and streams results back. |
| LLM (Transformer) | Emits structured function-call requests dynamically during reasoning. |
| Tool Server | External endpoint (Lambda, API Gateway) that performs the requested operation. |
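
As a rough sketch, a tool server behind an action group could be a small Lambda function like the one below. The event fields and response shape are simplified assumptions; the real Bedrock action-group contract carries additional routing metadata (API path, HTTP method, and so on):

import json

def lambda_handler(event, context):
    """Toy weather tool: reads a city from the event and returns a forecast."""
    # Simplified: we assume the city arrives as a plain field on the event
    city = event.get("city", "Seattle")
    forecast = {"city": city, "temperature": 15, "condition": "Cloudy"}  # stub data
    return {"statusCode": 200, "body": json.dumps(forecast)}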

3. Example Flow

Step 1 — Register Tool

tools:
  - name: weather_api
    protocol: MCP
    endpoint: https://api.weather.example.com/get_forecast
    description: Provides weather forecasts for a city.
    inputs:
      city: string

Step 2 — User Query

“What’s the weather in Seattle?”

Step 3 — LLM Generates MCP Call

{
  "type": "mcp_call",
  "target": "weather_api",
  "action": "get_forecast",
  "parameters": {"city": "Seattle"}
}

Step 4 — Agent Runtime Executes the Call

Tool returns:

{"temperature": 15, "condition": "Cloudy"}

Step 5 — Response Returned to LLM

<tool_response>{"temperature":15,"condition":"Cloudy"}</tool_response>

Step 6 — LLM Generates Final Answer

"It's 15°C and cloudy in Seattle today."

This demonstrates a complete tool-calling reasoning-action loop in a real-world deployment using AWS Bedrock Agents.
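
For completeness, invoking a deployed Bedrock agent from Python with boto3 looks roughly like this (the agent and alias IDs are placeholders, and AWS credentials are assumed to be configured):

import uuid
import boto3

# Runtime client for calling an already-created Bedrock agent
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.invoke_agent(
    agentId="AGENT_ID",             # placeholder: your agent's ID
    agentAliasId="AGENT_ALIAS_ID",  # placeholder: the deployed alias
    sessionId=str(uuid.uuid4()),    # groups turns into one conversation
    inputText="What's the weather in Seattle?",
)

# The answer is streamed back as chunks of bytes
answer = "".join(
    event["chunk"]["bytes"].decode("utf-8")
    for event in response["completion"]
    if "chunk" in event
)
print(answer)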

Summary

| Concept | Role | Focus (Tool Integration) |
| --- | --- | --- |
| Transformer (LLM) | Predicts tokens and emits structured tool calls | Foundation for reasoning |
| RAG | Retrieves static knowledge (popular in 2023 for data-grounded chatbots) | Can be wrapped as a tool or resource |
| MCP | Protocol for tool discovery, calls, and responses | Enables dynamic reasoning and external interaction |
| Bedrock Agents | AWS-managed agent service | Executes and secures tool calls at scale |
| External Tools | APIs, microservices, or databases | Provide live data and perform real-world actions |
