What is Agent Trajectory?
Agent Trajectory is the ordered record of an AI agent run, including observations, messages, decisions, tool calls, tool results, errors, approvals, state changes, and final outputs.
How It Works
An Agent Trajectory is the evidence trail for what an agent actually did. Unlike the final answer, it captures the intermediate steps that led to that answer or action. Trajectories are essential for debugging, evaluation, audit, cost analysis, and safety review. They should be structured enough to be replayed or inspected, but they also require careful privacy handling because they may contain user data, retrieved documents, tool outputs, and sensitive reasoning artifacts.
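One way to picture such a record is as an append-only list of typed steps. The sketch below is only an illustration: the `Step` and `Trajectory` classes, their field names, and the step kinds are assumptions made for this example, not a standard schema.

```python
from dataclasses import dataclass, field, asdict
from typing import Any
import json


@dataclass
class Step:
    """One entry in a trajectory. Field names are illustrative assumptions."""
    index: int   # position in the run; makes the ordering explicit
    kind: str    # e.g. "observation", "tool_call", "tool_result", "final_output"
    payload: dict[str, Any] = field(default_factory=dict)


@dataclass
class Trajectory:
    run_id: str
    steps: list[Step] = field(default_factory=list)

    def record(self, kind: str, **payload: Any) -> Step:
        """Append a step, assigning the next index so order is preserved."""
        step = Step(index=len(self.steps), kind=kind, payload=payload)
        self.steps.append(step)
        return step

    def to_json(self) -> str:
        """Serialize the whole run so it can be stored and inspected later."""
        return json.dumps(asdict(self), sort_keys=True)


# A made-up run with four steps.
trajectory = Trajectory(run_id="run-001")
trajectory.record("observation", text="User asks for the status of order 42")
trajectory.record("tool_call", tool="lookup_order", arguments={"order_id": 42})
trajectory.record("tool_result", tool="lookup_order", result={"status": "shipped"})
trajectory.record("final_output", text="Order 42 has shipped.")
```

Keeping an explicit index on every step, rather than relying on list position alone, means the order survives filtering or partial export.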
Key Characteristics
- Ordered run record: preserves the sequence of observations, decisions, actions, and results
- Debugging asset: helps locate the step where an agent became wrong, stuck, or unsafe
- Evaluation input: can be scored for tool choice, evidence use, policy compliance, and task success
- Audit trail: records approvals, side effects, errors, and final outputs
- Privacy-sensitive: may contain prompts, user data, retrieved context, and tool outputs
Common Use Cases
- Debugging why an agent called the wrong tool
- Evaluating whether a RAG agent used appropriate evidence
- Auditing externally visible actions such as emails or tickets
- Building regression tests from failed or successful agent runs
- Calculating cost and latency by step in an autonomous workflow
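The last use case, per-step cost and latency, can be sketched as a simple aggregation over step records. The field names (`latency_ms`, `cost_usd`) and all numbers below are hypothetical and exist only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical step records; field names and values are made up for illustration.
steps = [
    {"index": 0, "kind": "model_call", "latency_ms": 820, "cost_usd": 0.004},
    {"index": 1, "kind": "tool_call", "latency_ms": 150, "cost_usd": 0.0},
    {"index": 2, "kind": "model_call", "latency_ms": 640, "cost_usd": 0.003},
]


def totals_by_kind(steps):
    """Sum latency and cost for each kind of step."""
    totals = defaultdict(lambda: {"latency_ms": 0, "cost_usd": 0.0})
    for step in steps:
        bucket = totals[step["kind"]]
        bucket["latency_ms"] += step["latency_ms"]
        bucket["cost_usd"] += step["cost_usd"]
    return dict(totals)


def slowest_step(steps):
    """Return the single step with the highest latency."""
    return max(steps, key=lambda s: s["latency_ms"])


summary = totals_by_kind(steps)
```

Grouping by step kind shows whether model calls or tool calls dominate a run, while `slowest_step` points at the individual step worth inspecting first.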
Example
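As a minimal illustration, the sketch below writes out one invented run as a list of step records and exports it as JSON Lines, one step per line. The tool names (`search_policy`, `get_order`), the field names, and the run itself are assumptions for this example rather than output from a real agent.

```python
import json

# A made-up run: the agent answers a refund question using retrieval and an order lookup.
trajectory = [
    {"step": 0, "kind": "user_message", "content": "Can I get a refund for order 1234?"},
    {"step": 1, "kind": "tool_call", "tool": "search_policy", "arguments": {"query": "refund window"}},
    {"step": 2, "kind": "tool_result", "tool": "search_policy",
     "result": [{"doc_id": "policy-7", "text": "Refunds are accepted within 30 days."}]},
    {"step": 3, "kind": "tool_call", "tool": "get_order", "arguments": {"order_id": "1234"}},
    {"step": 4, "kind": "error", "tool": "get_order", "message": "timeout"},
    {"step": 5, "kind": "tool_call", "tool": "get_order", "arguments": {"order_id": "1234"}},
    {"step": 6, "kind": "tool_result", "tool": "get_order", "result": {"days_since_purchase": 12}},
    {"step": 7, "kind": "final_output", "content": "Yes, order 1234 is within the 30-day refund window."},
]


def to_jsonl(steps):
    """Serialize the trajectory as JSON Lines, preserving step order."""
    return "\n".join(json.dumps(step) for step in steps)


def first_step_of_kind(steps, kind):
    """Return the earliest step of the given kind, or None if there is none."""
    return next((step for step in steps if step["kind"] == kind), None)


print(to_jsonl(trajectory))
```

The final answer alone would hide the timeout and retry at steps 4 and 5; `first_step_of_kind(trajectory, "error")` surfaces them directly.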
Frequently Asked Questions
How is Agent Trajectory different from a chat transcript?
A chat transcript records visible messages. A trajectory includes internal steps such as tool calls, retrieved evidence, approvals, errors, state changes, and intermediate observations.
Why are trajectories important for evaluation?
Final answers do not reveal whether the agent used the right evidence or took unsafe steps. Trajectories let evaluators judge process quality, not only output quality.
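A rough sketch of process-level checks follows, assuming a hypothetical step format in which tool results list the `doc_ids` they returned, final outputs list `cited_doc_ids`, and approvals name the tool they cover. Real evaluators are usually richer; this only shows how two runs with the same final answer can score differently.

```python
def evaluate_process(steps, side_effect_tools):
    """Run two simple process checks over one trajectory.

    evidence_cited: the final output cites at least one document, and every
        cited document was actually retrieved earlier in the run.
    unapproved_side_effects: indexes of side-effecting tool calls that were
        not preceded by an approval for that tool.
    """
    retrieved, approved, cited = set(), set(), set()
    unapproved_calls = []
    for step in steps:
        kind = step["kind"]
        if kind == "tool_result":
            retrieved.update(step.get("doc_ids", []))
        elif kind == "approval":
            approved.add(step["tool"])
        elif kind == "tool_call":
            if step["tool"] in side_effect_tools and step["tool"] not in approved:
                unapproved_calls.append(step["index"])
        elif kind == "final_output":
            cited.update(step.get("cited_doc_ids", []))
    return {
        "evidence_cited": bool(cited) and cited <= retrieved,
        "unapproved_side_effects": unapproved_calls,
    }


# Two invented runs that end with the same cited answer.
good_run = [
    {"index": 0, "kind": "tool_call", "tool": "search_docs"},
    {"index": 1, "kind": "tool_result", "tool": "search_docs", "doc_ids": ["doc-1"]},
    {"index": 2, "kind": "approval", "tool": "send_email"},
    {"index": 3, "kind": "tool_call", "tool": "send_email"},
    {"index": 4, "kind": "final_output", "cited_doc_ids": ["doc-1"]},
]
bad_run = [
    {"index": 0, "kind": "tool_call", "tool": "send_email"},
    {"index": 1, "kind": "final_output", "cited_doc_ids": ["doc-1"]},
]

good_report = evaluate_process(good_run, side_effect_tools={"send_email"})
bad_report = evaluate_process(bad_run, side_effect_tools={"send_email"})
```

Here `bad_run` cites a document it never retrieved and sends an email without approval, neither of which is visible from its final output.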
Can trajectories be replayed?
Sometimes. Replay requires stable tool versions, stored inputs, deterministic settings where possible, and careful handling of external side effects.
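One common way to avoid side effects during replay is to never execute tools at all and instead feed the recorded tool results back to the agent. The sketch below assumes a hypothetical `decide` function that maps the history so far to the next tool call; it reports the first step where a new agent version diverges from the recording.

```python
def replay(steps, decide):
    """Replay a recorded trajectory against a decision function.

    Tools are never executed: recorded results stay in the history, so the
    replay has no external side effects. Returns the index of the first
    tool call where `decide` disagrees with the recording, or None.
    """
    history = []
    for step in steps:
        if step["kind"] == "tool_call":
            expected = (step["tool"], step["arguments"])
            if decide(history) != expected:
                return step["index"]
        history.append(step)
    return None


# An invented recording with one tool call.
recorded = [
    {"index": 0, "kind": "observation", "text": "Where is order 42?"},
    {"index": 1, "kind": "tool_call", "tool": "lookup_order", "arguments": {"order_id": 42}},
    {"index": 2, "kind": "tool_result", "tool": "lookup_order", "result": {"status": "shipped"}},
]


def decide_v1(history):
    """Stand-in for the original agent: always looks up the order."""
    return ("lookup_order", {"order_id": 42})


def decide_v2(history):
    """Stand-in for a changed agent that picks a different tool."""
    return ("search_web", {"query": "order 42"})


same = replay(recorded, decide_v1)      # no divergence
diverged = replay(recorded, decide_v2)  # diverges at the first tool call
```

This only works when the decision function is deterministic for a given history, which is why stable versions and fixed settings matter.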
What should be redacted from trajectories?
Sensitive user data, credentials, private documents, unnecessary raw prompts, and high-risk tool outputs should be redacted or access-controlled according to policy.
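A minimal redaction pass might walk each step recursively, blanking values under known sensitive keys and masking email addresses in free text. The key list and the email pattern below are simplifying assumptions; a production policy would normally cover more data types and be driven by configuration.

```python
import re

# Assumed key names; a real deployment would take these from policy.
SENSITIVE_KEYS = {"api_key", "password", "authorization"}
# Deliberately simple email pattern, for illustration only.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(value):
    """Return a redacted copy of a step; the input is left unmodified."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return EMAIL_PATTERN.sub("[EMAIL]", value)
    return value


# An invented tool-call step containing a credential and an email address.
step = {
    "kind": "tool_call",
    "tool": "send_email",
    "arguments": {"api_key": "sk-123", "to": "alice@example.com"},
    "note": "Contact alice@example.com about the refund",
}
redacted = redact(step)
```

Because `redact` builds a new structure, the original step can stay in an access-controlled store while the redacted copy is shared more widely.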