TL;DR

The ReAct (Reasoning + Acting) framework is the architectural backbone of modern AI Agents. It interleaves explicit reasoning (Thought) with tool execution (Action) and environment feedback (Observation). This iterative loop reduces hallucinations, lets models handle complex multi-step tasks, and connects static LLMs to real-time external data.

✨ Key Takeaways

  • Actionable Reasoning: ReAct forces models to verbalize their plan before executing a tool.
  • Dynamic Grounding: Instead of guessing facts, a ReAct agent searches for them, reads the Observation, and updates its mental model.
  • Error Recovery: If an action fails (e.g., a 404 error from an API), the agent can reason about the failure and try a different approach.
  • Foundation of Agents: Major frameworks like LangChain, LlamaIndex, and AutoGen build on the ReAct loop as a core agent execution strategy.

💡 Quick Tool: Building a ReAct Agent? Use our JSON Formatter to validate the JSON schemas of the tools you provide to your LLM.
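For illustration, here is what a tool definition typically looks like in OpenAI's function-calling JSON schema format; the `get_weather` tool, its description, and its parameters are hypothetical examples, not part of any specific API:

```python
import json

# Hypothetical tool definition following OpenAI's function-calling schema format.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Returns the current weather for a given city.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. London",
                }
            },
            "required": ["location"],
        },
    },
}

# Round-trip through json.dumps/json.loads to confirm the schema is valid JSON.
serialized = json.dumps(weather_tool, indent=2)
parsed = json.loads(serialized)
print(parsed["function"]["name"])  # get_weather
```

A schema that fails to round-trip cleanly here would also fail when passed to an LLM API, so this is a cheap sanity check before wiring tools into an agent.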

What is the ReAct Framework?

Introduced in the 2022 paper "ReAct: Synergizing Reasoning and Acting in Language Models" by researchers at Princeton University and Google, ReAct stands for Reasoning and Acting.

Before ReAct, Large Language Models (LLMs) were generally used in two isolated ways:

  1. Reasoning-only: Prompting the model to think step-by-step (Chain of Thought) to solve math or logic puzzles based solely on its internal, pre-trained weights.
  2. Acting-only: Prompting the model to output a command or API call, without requiring it to explain why it chose that action.

ReAct merged these two paradigms. It posits that reasoning helps the model decide which action to take, while acting allows the model to gather information that improves its future reasoning.

📝 Glossary: To fully understand ReAct, it helps to be familiar with Chain of Thought (CoT) and general AI Agent concepts.

How ReAct Works: The Trajectory Loop

The core of ReAct is a loop that repeats until the model determines it has enough information to give a final answer. Each iteration consists of three distinct phases:

  1. Thought: The LLM analyzes the current state of the problem and decides what information it needs next.
  2. Action: The LLM selects a specific tool (e.g., Search, Calculator, Weather_API) and provides the necessary input parameters.
  3. Observation: The system executes the tool and returns the raw output back to the LLM.

The LLM then generates a new Thought based on the Observation, and the cycle repeats.
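Stripped of any model specifics, the cycle above can be sketched as a plain Python loop. The `mock_llm` below is a stand-in that replays canned responses so the example runs without API calls; a real agent would send the accumulated history to an LLM at that point:

```python
# Minimal sketch of the ReAct loop with a mocked model (no API calls).
# Canned responses simulate what an LLM would emit on each turn.
SCRIPT = [
    "Thought: I need the weather in London.\nAction: get_weather: London",
    "Thought: I have the data.\nAnswer: It is rainy and 12°C in London.",
]

def mock_llm(history, turn):
    return SCRIPT[turn]  # a real agent would send `history` to an LLM here

def get_weather(location):
    return "Rainy, 12°C"  # mock tool

def react_loop(query, max_steps=5):
    history = [f"Question: {query}"]
    for turn in range(max_steps):
        output = mock_llm(history, turn)   # Thought + Action (or Answer)
        history.append(output)
        if "Answer:" in output:            # terminal state: final answer reached
            return output.split("Answer:", 1)[1].strip()
        if "Action: get_weather:" in output:  # dispatch the requested tool
            location = output.split("Action: get_weather:", 1)[1].strip()
            history.append(f"Observation: {get_weather(location)}")
    return None  # iteration limit hit without an answer

result = react_loop("What is the weather in London?")
print(result)  # It is rainy and 12°C in London.
```

The key structural point is that the Observation is appended back into the history, so the next Thought is conditioned on real tool output rather than on the model's guesses.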

```mermaid
flowchart TD
    A[User Query] --> B(Thought)
    B -->|Decides next step| C["Action / Tool Use"]
    C -->|Executes API/Search| D[Observation]
    D -->|Feeds data back| B
    B -->|Has enough info| E[Final Answer]
    style A fill:#e1f5fe,stroke:#01579b
    style B fill:#fff3e0,stroke:#e65100
    style C fill:#e8f5e9,stroke:#2e7d32
    style D fill:#f3e5f5,stroke:#880e4f
    style E fill:#e8f5e9,stroke:#2e7d32
```

A Real-World ReAct Trace

User Query: "What is the age difference between the current US President and the CEO of Apple?"

  • Thought 1: I need to find out who the current US President is, and who the CEO of Apple is. Then I need their ages. Let's start with the US President.
  • Action 1: Search[Current US President 2026]
  • Observation 1: [Search Results: ...]
  • Thought 2: Okay, the president is X and is Y years old. Now I need to find the CEO of Apple.
  • Action 2: Search[CEO of Apple 2026]
  • Observation 2: [Search Results: Tim Cook...]
  • Thought 3: Tim Cook is the CEO. Let me find his age.
  • Action 3: Search[Tim Cook age]
  • Observation 3: [Search Results: Tim Cook is 65 years old...]
  • Thought 4: The president is Y, Tim Cook is 65. The difference is Y - 65. Let me calculate this.
  • Action 4: Calculator[Y - 65]
  • Observation 4: [Result: Z]
  • Thought 5: I have the final answer.
  • Final Answer: "The age difference is Z years."

ReAct vs Chain of Thought (CoT)

| Feature | Chain of Thought (CoT) | ReAct |
| --- | --- | --- |
| Information Source | Internal model weights only | External tools + internal weights |
| Hallucination Risk | High (if facts are unknown/outdated) | Low (can verify facts via tools) |
| Error Recovery | Poor (cascading logic errors) | Excellent (observes tool failures and adjusts) |
| Use Case | Math, logic, summarization | Web research, API interaction, multi-step execution |

ReAct in Practice (Python Example)

While frameworks like LangChain abstract this away, building a ReAct agent from scratch reveals how it actually works under the hood. It relies heavily on strict Prompt Engineering.

```python
import openai
import re

client = openai.OpenAI()

# Define a simple mock tool
def get_weather(location):
    if "London" in location:
        return "Rainy, 12°C"
    return "Sunny, 25°C"

# The System Prompt that enforces the ReAct loop
REACT_PROMPT = """
You run in a loop of Thought, Action, PAUSE, Observation.
At the end of the loop you output an Answer.

Use Thought to describe your thoughts about the question you have been asked.
Use Action to run one of the actions available to you - then return PAUSE.
Observation will be the result of running those actions.

Your available actions are:
get_weather:
e.g. get_weather: London
Returns the current weather in a location.

Example session:
Question: What is the weather in London?
Thought: I should check the weather in London.
Action: get_weather: London
PAUSE

You will be called again with this:
Observation: Rainy, 12°C

You then output:
Answer: The weather in London is rainy and 12°C.
"""

def run_react_agent(prompt):
    messages = [
        {"role": "system", "content": REACT_PROMPT},
        {"role": "user", "content": prompt}
    ]

    for _ in range(5):  # Maximum 5 iterations to prevent infinite loops
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            stop=["Observation:"]  # Stop generating when expecting an observation
        )

        result = response.choices[0].message.content
        print(result)
        messages.append({"role": "assistant", "content": result})

        if "Answer:" in result:
            break

        # Parse the Action
        action_match = re.search(r"Action: (\w+): (.*)", result)
        if action_match:
            action, action_input = action_match.groups()

            # Execute the tool
            if action == "get_weather":
                observation = get_weather(action_input)
            else:
                observation = "Error: Tool not found"

            print(f"Observation: {observation}")
            messages.append({"role": "user", "content": f"Observation: {observation}"})

# Run the agent
run_react_agent("What's the weather like in London?")
```

🔧 Try it now: When defining tools for ReAct agents, ensure your JSON descriptions are flawless. Validate them with our JSON Formatter.

Best Practices for ReAct Agents

  1. Provide Clear Tool Descriptions — The LLM's Thought relies entirely on knowing exactly what a tool does. A description like "Searches the web" is too vague. Use "Searches the live internet for current events, news, and facts."
  2. Implement an Iteration Limit (Max Steps) — ReAct loops can get stuck in infinite failure cycles (e.g., trying the same broken search query repeatedly). Always implement a hard stop (e.g., max_iterations=5).
  3. Format Enforcement — Smaller models (like Llama-3-8B) may forget to output the PAUSE or Action: keywords. Use JSON mode, Structured Outputs, or strict system prompts to enforce the format.
  4. Graceful Tool Failure — If a tool throws a 500 error, do not crash the application. Return the error string as the Observation (e.g., Observation: API returned 500 error). The LLM will read this and formulate a Thought to try a different tool.
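Point 4 can be implemented with a thin wrapper that converts exceptions into Observation strings instead of crashing the loop. The function names below are illustrative, and `flaky_api` is a mock that always fails:

```python
def flaky_api(query):
    # Illustrative tool that always fails, standing in for a real API call.
    raise RuntimeError("API returned 500 error")

def safe_execute(tool, tool_input):
    """Run a tool and turn any exception into an observation string."""
    try:
        return tool(tool_input)
    except Exception as exc:
        # The LLM reads this observation and can reason about an alternative approach.
        return f"Error: {exc}"

observation = safe_execute(flaky_api, "weather in London")
print(f"Observation: {observation}")  # Observation: Error: API returned 500 error
```

Because the error surfaces as a normal Observation, the agent's next Thought can react to it ("the API is down, let me try the search tool instead") rather than the whole run aborting.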

⚠️ Common Mistakes:

  • Giving the agent too many tools. Fix: Context windows have limits. If you have 100 tools, the LLM will get confused. Use a hierarchical approach or tool-retrieval system.
  • Forgetting the "PAUSE" token. Fix: Ensure your prompt explicitly tells the LLM to stop generating text after calling an Action, otherwise it will hallucinate the Observation itself!

FAQ

Q1: Is ReAct outdated now that OpenAI has function calling?

No. OpenAI's Function Calling (or Tool Calling) is simply a more reliable, JSON-structured way to execute the Action phase. The underlying cognitive architecture—requiring the model to generate a Thought before deciding to call a function—is still pure ReAct and is considered best practice.
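To make the relationship concrete, here is roughly how the weather tool from the earlier example would be declared for OpenAI's tool-calling API. Only the request payload is constructed here (no network call is made), and the commented-out line shows where it would be sent; the tool name and parameters are the same mock as before:

```python
# Declaring a ReAct-style tool for OpenAI's tool-calling API.
# Only the request payload is built here; no network call is made.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Returns the current weather in a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"],
            },
        },
    }
]

request_payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What's the weather in London?"}],
    "tools": tools,
}

# With the openai client this would be sent as:
# response = client.chat.completions.create(**request_payload)
# The model then returns structured tool_calls instead of free-text "Action:" lines,
# but the Thought -> Action -> Observation loop around it is unchanged.
print(request_payload["tools"][0]["function"]["name"])  # get_weather
```

In other words, function calling replaces the fragile regex parsing of `Action:` lines with structured JSON, while the surrounding ReAct loop stays the same.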

Q2: Why does my ReAct agent hallucinate Observations?

This happens when you don't configure proper stop sequences in your API call. The LLM doesn't know it's supposed to wait for the environment to reply, so it just keeps typing. Ensure you pass stop=["Observation:"] to the LLM.

Q3: Which models are best for ReAct?

Models with strong reasoning and instruction-following capabilities. GPT-4o, Claude 3.5 Sonnet, and DeepSeek-V3 excel at ReAct. Smaller models (<10B parameters) often struggle to maintain the strict Thought -> Action format over multiple turns.

Summary

The ReAct framework transforms Large Language Models from static text generators into dynamic, autonomous problem solvers. By enforcing a strict cycle of reasoning, acting, and observing, ReAct agents can navigate complex environments, utilize external tools, and recover from errors.

👉 Start using JSON Formatter now — Perfect your Agent's tool schemas.