Pydantic Agent
PydanticAI Agent Framework: Technical Analysis
Architecture
Agent as Core Container: The fundamental concept in PydanticAI is the Agent, which serves as the primary interface to interact with LLMs. An Agent instance encapsulates several components of an AI-driven application’s logic:
- System Prompt(s): One or more developer-defined instructions that prime the model (static or dynamic). Static system prompts can be given when creating the Agent, while dynamic prompts can be added via decorated functions (executed at runtime to inject context).
- Function Tool(s): A set of Python functions that the LLM is allowed to call during its reasoning process. These "tools" (registered with @agent.tool) enable the model to query external data or perform computations. Each tool's signature (except the contextual argument) is used to define a schema the LLM can use to invoke it, and its docstring serves as the description presented to the model.
- Structured Result Type: An optional Pydantic model or data type that defines the expected structure of the final answer. By specifying result_type when constructing the Agent, developers enforce that the conversation must end with the LLM returning data that can be parsed into this type (e.g. a custom BaseModel or even a simple type like bool).
- Dependency Injection (Deps): A dependency dataclass or model type for runtime context. The Agent is parameterized by a deps_type representing resources or state (database connections, user info, etc.) that can be injected into tools and prompt functions via a RunContext. This provides a type-safe way to pass in contextual data on each run (injected as ctx.deps in tool functions).
- LLM Model & Settings: The Agent can be configured with a default LLM backend and optional settings. The model is specified by a string or model class (e.g., "openai:gpt-4o" for GPT-4o) and is resolved to a Model interface internally. Settings like temperature, max tokens, etc., can be provided via ModelSettings either at Agent creation or per run, and will be merged appropriately. PydanticAI's design is model-agnostic – it supports providers such as OpenAI, Anthropic, Cohere, Google Vertex, etc., through a unified interface, and adding new model integrations requires implementing a simple interface in the pydantic_ai.models module.
Type-Safe, Pythonic Design: The framework is built by the creators of
Pydantic with an emphasis on clean architecture and type safety. Agents are
generically typed (Agent[DepsType, ResultType]) so that your IDE and static
type checkers can catch mismatches early. For example, an agent expecting
SupportDependencies and returning SupportResult will be of type
Agent[SupportDependencies, SupportResult]. This generic design, paired with
Pydantic’s runtime validation, ensures that both development and execution are
type-consistent – if you misuse a dependency or tool signature, you’ll either
get a type checker error or a clear runtime validation error. The overall
architecture is inspired by FastAPI’s approach (dependency injection and
Pydantic models), aiming to give developers a familiar, Pythonic experience when
building AI agents.
Internal Graph Engine: Under the hood, PydanticAI utilizes a graph-based execution model to orchestrate agent logic. Each step in an agent’s reasoning (injecting system prompts, calling the LLM, handling a tool invocation, validating results, etc.) is represented as a node in a directed graph, implemented by the companion library Pydantic Graph. This design helps avoid monolithic or “spaghetti” control flow by breaking the conversation into discrete, testable units. The Agent builds a graph of these nodes for each run, enabling complex multi-step interactions and even branching logic in a structured way. (For example, one node might represent sending a user prompt to the model, which leads to either a “tool call” node or a “final result” node depending on the model’s reply.) This graph architecture is largely abstracted away from the user API, but it provides a robust foundation for managing loops, retries, and multi-agent workflows in a maintainable fashion.
Multi-Agent and Reusability: PydanticAI agents are designed to be reusable
components. Much like a FastAPI app or router, an Agent can be instantiated
once (as a module-level object, for example) and used for many queries over its
lifetime. In more advanced scenarios, multiple agents can be composed to handle
different parts of a workflow or to interact with each other for complex tasks.
The framework doesn’t enforce a specific multi-agent protocol, but because each
Agent exposes a simple .run() interface, you can have one agent call another
(even as a tool) or coordinate them via custom code or graphs. This flexibility
allows developers to build hierarchical or cooperative agent systems if needed,
on top of the same core abstractions.
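For example, one agent can delegate to another from inside a tool. A hedged sketch of that pattern follows; the agent names and prompts are illustrative, and the usage pass-through mirrors the delegation examples in the docs:

```python
from pydantic_ai import Agent, RunContext

# Two independent agents; names and prompts are purely illustrative.
joke_agent = Agent("openai:gpt-4o", system_prompt="Reply with a single short joke.")

router_agent = Agent(
    "openai:gpt-4o",
    system_prompt="Answer the user, delegating joke requests to your tool.",
)


@router_agent.tool
async def tell_joke(ctx: RunContext[None], topic: str) -> str:
    """Delegate joke generation to the joke agent."""
    # Passing ctx.usage along lets both agents accumulate into one usage record.
    result = await joke_agent.run(f"Tell a joke about {topic}.", usage=ctx.usage)
    return result.data
```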
Data and Execution Flow
High-Level Flow: When you invoke an agent (e.g.
result = await agent.run(user_input, deps=...)), PydanticAI manages a
multi-turn conversation loop with the LLM, handling tool usage and validations
internally. The execution flow can be outlined as follows:
- Initializing the Conversation: The Agent prepares the initial message list. This typically includes the system prompt(s) – all static instructions and any dynamically generated system prompts from your @agent.system_prompt functions are evaluated now – followed by the user's prompt message. These are added to an internal message history buffer. (If you provided a message_history from a prior run to continue a conversation, those messages are prepended accordingly.)
- Model Request: The agent sends the conversation context to the configured LLM Model. Before calling the model's API, PydanticAI assembles the tool definitions and (if a structured result is expected) a special result schema definition for the model. For LLMs that support function calling (like OpenAI's), this means providing a JSON schema for each available tool and a pseudo-function for the final result. The parameters for each tool are derived from the function's signature (excluding the RunContext), and PydanticAI automatically generates a JSON Schema including types and descriptions (extracted from the docstring) for the model to use. This allows the model to decide during its output whether to call a function (tool) and with what arguments. The model request is made (via an async API call, typically), and the agent logs usage (tokens, API calls) along the way for monitoring.
- Handling the Model's Response: The LLM's response is captured and parsed into response parts. PydanticAI normalizes different provider outputs into a common format – typically a sequence of parts which could be plain text segments or function call invocations. For example, an OpenAI response with a function call arrives as structured data (function name and JSON arguments), whereas other models might return specially formatted text indicating a tool call. In both cases, the framework interprets these into TextPart and ToolCallPart objects internally. At this stage, the agent examines the parts: if the model's answer includes one or more tool calls, those take priority to be executed.
- Tool Invocation (if any): For each ToolCallPart in the response, PydanticAI invokes the corresponding Python function (the tool) with the provided arguments. This is done by constructing a RunContext (carrying the user dependencies, current usage stats, etc.) and calling your tool function. The arguments from the LLM are validated against the function's schema automatically – Pydantic will coerce and check types. If the LLM provided an invalid argument (e.g. wrong type, or a value that fails a Pydantic field validation), the framework catches the validation error and packages it into a special retry message that informs the LLM of the mistake. (The conversation is then set up so that the model gets this feedback and can attempt to call the tool again with corrected parameters.) Assuming the arguments are valid, the tool function executes and returns a result (e.g. a number from a database lookup). The agent takes the tool's result and serializes it into a message that represents the tool's output. This result message is appended to the conversation history as if the "assistant" (agent) responded with the tool's output.
- Continuing the Loop: With the tool output now in the context, the agent prompts the LLM again. Essentially, after any tool call, the Agent creates a new model request that includes: (a) the original conversation messages, plus (b) a system-level instruction or formatted content indicating the tool's result (so the model can use that information), and then (c) the latest user question if the conversation continues. This loop of "model reply -> possibly tool -> model reply…" continues until a termination condition is reached. Each iteration appends new messages (tool calls, results, or corrected prompts) to the message_history and invokes the model again, respecting any usage limits (e.g., max number of calls or tokens) configured. The framework's internal Graph logic makes these decisions: for example, if the LLM returns both a final answer and a tool call (which can happen in some model responses), a strategy (configurable via EndStrategy) determines whether to finalize early or execute remaining tool calls. By default the agent will handle all requested tool calls first, unless a final result has been confidently produced.
- Finalizing the Result: Eventually, the LLM produces an answer that signals the end of the conversation. In many cases with structured output, the model will "call" a special result function (internally defined by PydanticAI based on your result_type) containing the fields of the answer. For instance, if the desired result is a SupportResult model with fields support_advice, block_card, and risk, the model might finish by calling a function (say "FinalAnswer") with those three fields as arguments. PydanticAI recognizes this as the final result. It validates the returned data against the SupportResult schema, just as it would for a tool call, ensuring all fields are present and of correct type. The validated object (a Pydantic model or specified type) becomes the outcome of the run. If instead the model simply returns a raw text answer (which is allowed if no structured schema was set, or if the model failed to use the function schema), the agent can still attempt to parse it. If result_type is a simple type like str, or text is otherwise permitted, it will treat the text as the final result and run any validation on it. If a structured result was expected but the model gives plain text, the agent will not accept it; it will issue a corrective prompt (e.g. a system message like "Plain text responses are not permitted, please call one of the functions instead.") and loop back to the model request step. After final validation, the conversation loop ends.
- Returning the Output: The Agent returns a RunResult object to the caller, which includes the final parsed result data and metadata. For convenience, result.data holds the actual result (already a Pydantic model or Python type as defined) and is typically what developers use. For example, after running the support agent, result.data would be an instance of SupportResult with fields populated, or simply a Python bool/str if that was the result type. The RunResult also contains the full message history and usage information in case you need to inspect the conversation or token counts. Notably, by the time the result is returned, it is guaranteed to be validated against the schema you provided for accuracy and completeness – if the model couldn't produce a valid result within the retry limits, an exception would be raised instead. In practice, PydanticAI gives the model multiple chances to self-correct (feeding it errors) before giving up, which makes the outcome reliable once delivered.
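From the caller's side, the entire loop sits behind a single run call. A minimal sketch (the Answer model and prompt here are illustrative):

```python
import asyncio

from pydantic import BaseModel
from pydantic_ai import Agent


class Answer(BaseModel):
    city: str
    country: str


agent = Agent("openai:gpt-4o", result_type=Answer)


async def main() -> None:
    result = await agent.run("Which city will host the 2028 Olympics?")

    # result.data is already a validated Answer instance.
    print(result.data)

    # The full conversation (prompts, tool calls, retries, final answer) and
    # token usage are attached to the result for inspection.
    for message in result.all_messages():
        print(message)
    print(result.usage())


asyncio.run(main())
```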
Throughout this flow, PydanticAI handles the complexity of error cases (like
exceeding token limits or invalid responses) by raising structured exceptions
(e.g., UnexpectedModelBehavior for truly unsupported outputs) or by enforcing
usage limits via graceful stops. The developer can focus on defining the tools,
prompts, and schemas, and trust the agent to drive the LLM and tools to a valid
solution through this loop. The design also supports both synchronous and
streaming interactions – for instance, you could use agent.run_sync() for a
blocking call, or agent.run_stream() to yield partial results/tokens as they
arrive (under the hood, there are corresponding StreamModelRequestNode and
other nodes to handle streaming). In summary, the data/execution flow ensures
that the LLM is guided to produce a well-structured result through iterative
prompting and tool use, much like a dialogue between the AI and a set of
utilities, mediated by the Agent. Each “turn” is validated, logged, and checked,
culminating in a reliable outcome.
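A rough sketch of both calling styles follows; the exact streaming helpers (such as stream_text) may vary slightly between versions, and the agent here is illustrative:

```python
import asyncio

from pydantic_ai import Agent

# Illustrative agent with a plain-text result, so raw text can be streamed.
text_agent = Agent("openai:gpt-4o", system_prompt="Be concise.")

# Blocking call: convenient in scripts, tests, or notebooks.
result = text_agent.run_sync("Where does 'hello world' come from?")
print(result.data)


async def stream_answer() -> None:
    # Streaming call: partial output is yielded as it arrives.
    async with text_agent.run_stream("Where does 'hello world' come from?") as response:
        async for chunk in response.stream_text(delta=True):
            print(chunk, end="", flush=True)


asyncio.run(stream_answer())
```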
Code Structure
Repository and Module Layout: The PydanticAI codebase is organized into a
set of Python modules that reflect the framework’s conceptual components. At a
high level, the library is divided into the pydantic_ai package and a closely
related pydantic_graph package (included for graph-based control flow). Key
modules and their responsibilities include:
- pydantic_ai.agent: Defines the Agent class and its methods (like run, run_sync, etc.), along with supporting classes like EndStrategy and internal utilities. This is effectively the core of the framework – the Agent class is a dataclass that aggregates configuration (model, prompts, tools, result schema) and implements the logic to build and execute the conversation graph. It also holds internal registries for system prompt functions and tools that are attached via decorators.
- pydantic_ai.tools: Contains the definition of the Tool system. A Tool object wraps a user-defined function (the actual Python callable) along with metadata like its name, description, and JSON schema. The module provides the @agent.tool decorator which, when used, creates a Tool from the function and registers it to an Agent's internal tool list. It also defines the RunContext class which is passed into tool functions for dependency access. Essentially, this module handles function tool registration and execution mechanics.
- pydantic_ai.models: A subpackage with submodules for each supported LLM provider (OpenAI, Anthropic, Cohere, Google Gemini/Vertex, etc.). Each provider module implements a common interface – typically a subclass of a base Model class – that knows how to format requests and parse responses for that backend. For example, models.openai has classes to call OpenAI's chat API, converting PydanticAI's internal message format to the OpenAI API format and vice versa. Similarly, models.anthropic adapts to Anthropic's Claude API, which might not support function calling natively, so it implements a strategy to embed tool calls in the prompt. There are also models.function (for treating a local function as a pseudo-model) and models.test (a mock model for testing). This modular structure makes the agent code largely independent of any single LLM service – adding support for a new model is as simple as creating a new module that provides the necessary request/response logic.
- pydantic_ai.messages: Defines the classes used to represent message data in the conversation. This includes ModelMessage (an abstract base or union for different message roles) and specific message part types. Notably, it defines things like ModelRequest (an outgoing prompt with potentially multiple parts), ModelResponse (the model's answer, which may be composed of text and tool-call parts), TextPart, ToolCallPart, RetryPromptPart, etc. These classes allow the agent to treat a conversation uniformly, regardless of backend, by breaking down model outputs into a standard structure. The messages module essentially codifies the "language" of conversation between the agent and model (including how function calls are represented as message parts).
- pydantic_ai.result: Contains utilities for managing the structured result schema and validation. When an Agent has a result_type, the framework creates a corresponding result schema object (containing a JSON Schema or Pydantic model schema) used to guide the LLM. This module defines how to convert a Pydantic model or Python type into a schema, and how to validate model outputs against it (including custom validators). It also defines the ResultValidator protocol, which allows plugging in extra validation logic on the final result if needed. The logic that checks whether the LLM's final function call matches the schema and either accepts it or raises a ToolRetryError (to ask for correction) lives here. In short, pydantic_ai.result is responsible for final answer validation and any post-processing of the LLM's output into the desired Python object.
- pydantic_ai.settings and pydantic_ai.usage: These modules handle configuration of model calls and usage tracking. settings defines ModelSettings (e.g., default parameters like temperature, max_tokens, etc., which can be merged with per-run overrides) and the merge_model_settings logic. usage defines a Usage class to accumulate token counts and a UsageLimits class to enforce limits across one or multiple runs. The Agent uses these to decide when to halt further calls (for example, if a token budget is exhausted) or simply to report how many tokens were used.
- pydantic_ai.exceptions: Defines exception types for error handling. For example, UnexpectedModelBehavior is raised if the model returns something completely off-schema or if retries are exceeded. There are also specific exceptions for things like exceeding usage limits or model-specific errors. This module centralizes how errors in the agent's operation are represented, making it easier for user code to catch and handle them appropriately.
- pydantic_graph: This is a sub-package included with PydanticAI that provides a generic typed graph execution framework. It defines base classes like BaseNode and Graph and manages the GraphRunContext (which carries the state and dependencies through the graph). In the context of PydanticAI, there are specialized node classes defined (some within the internal _agent_graph.py using pydantic_graph), such as UserPromptNode, ModelRequestNode, HandleResponseNode, FinalResultNode, etc., each corresponding to a phase of the agent's flow. The Graph class handles executing these nodes in sequence until an End node is reached, carrying the result. This design enables advanced control flows (like conditional branches or loops) to be encoded cleanly. While pydantic_graph could be used independently for other state machines, in PydanticAI it is mostly an internal mechanism to organize the agent run logic. There is also a utility to output a graph diagram (using Mermaid) for debugging flows, though that is more of a development aid.
Beyond these, the repository contains example scripts and documentation
(e.g. the examples/ folder and the Sphinx documentation source) to illustrate
usage patterns such as building a chat app or a retrieval-augmented generation
(RAG) pipeline. The codebase is written in modern Python, utilizing asyncio
for concurrency and dataclasses for configuration containers (most core
classes like Agent and Tool are dataclasses for ease of instantiation and repr).
Generics from the typing module are used extensively to maintain type
information (e.g., Agent[DepsT, ResultT]), and Pydantic V2 features (like
BaseModel for results) are leveraged for data validation. This results in a
clean folder structure where each concern is in its own module, and cross-module
interactions are minimal and well-defined (for instance, the Agent module uses
the Graph module to run, the Tools module to prepare function schemas, the
Models module to actually interface with an API, etc.). The separation of
concerns makes the code relatively approachable: one could inspect
pydantic_ai.agent to understand high-level logic, dive into
pydantic_ai.models.openai to see how OpenAI API calls are made, or look at
pydantic_ai.tools to see how a tool’s schema is constructed. All of these
pieces come together when an Agent’s run() is called, as described in the flow
above.
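As a rough orientation, the names a typical application imports map directly onto the modules above. A hedged sketch (import paths and parameter names follow the API described in this analysis and may differ slightly across versions):

```python
# Core agent plumbing (re-exported from pydantic_ai.agent / pydantic_ai.tools).
from pydantic_ai import Agent

# Per-call model parameters and usage accounting.
from pydantic_ai.settings import ModelSettings
from pydantic_ai.usage import UsageLimits

# A concrete provider adapter, for when the client should be configured explicitly.
from pydantic_ai.models.openai import OpenAIModel

agent = Agent(
    OpenAIModel("gpt-4o"),
    model_settings=ModelSettings(temperature=0.2, max_tokens=512),
)

result = agent.run_sync(
    "Summarise the PydanticAI architecture in one sentence.",
    usage_limits=UsageLimits(request_limit=5, total_tokens_limit=2_000),
)
print(result.data)
```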
Implementation Details and Design Choices
Tool Schema and Function Calling: One of the standout implementation
features of PydanticAI is how it bridges Python functions with LLM function
calling. When you decorate a function with @agent.tool, the library
introspects its signature to automatically generate a JSON Schema that describes
that tool’s parameters and return type to the LLM. It strips out the first
RunContext parameter (since that’s internal) and uses the remaining parameter
types (via type hints) to build the schema. The developer’s docstring is parsed
(using the Griffe library) to pull in human-readable descriptions for the
function and its parameters. All this is packaged into the API call – for
OpenAI, it goes into the functions list of the ChatCompletion request; for
other models, the schema might be embedded in the prompt or handled in a
model-specific way. At runtime, if the model “calls” a function, PydanticAI
constructs a ToolCallPart containing the function name and arguments. The
actual invocation uses Python’s reflection: the framework finds the Tool
object by name and calls its underlying Python function. Before calling, it
validates and converts the arguments using Pydantic validation rules
(essentially Tool.validate() will run the JSON args through the schema, which
uses Pydantic’s type coercion). If validation fails – say a required field is
missing or type is wrong – the Tool.validate() raises a ToolRetryError
internally. The agent catches this and creates a special RetryPromptPart
message for the model, effectively saying: “the function call had an error with
these details, please try again”. This pattern lets the model self-correct
without the developer having to write any error-handling logic. Only if the
model fails repeatedly to call the tool correctly (exceeding a retry limit) will
the agent surface an error to the developer. This approach is similar to how
robust APIs handle bad requests and is a powerful pattern: PydanticAI uses the
combination of a strict schema and LLM guidance to get well-formed function
calls.
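A hedged sketch of how a documented tool feeds the generated schema; the get_exchange_rate tool, its docstring, and the retry budget are illustrative:

```python
from pydantic_ai import Agent, RunContext

agent = Agent("openai:gpt-4o", retries=2)


@agent.tool
async def get_exchange_rate(ctx: RunContext[None], base: str, quote: str) -> float:
    """Look up the current exchange rate between two currencies.

    Args:
        base: ISO 4217 code of the currency to convert from, e.g. "EUR".
        quote: ISO 4217 code of the currency to convert to, e.g. "USD".
    """
    # The docstring above is parsed (via Griffe) so the function description and
    # per-parameter descriptions end up in the JSON schema sent to the model;
    # `ctx` is stripped out, and `base`/`quote` become required string parameters.
    rates = {("EUR", "USD"): 1.08}  # stand-in for a real lookup
    return rates.get((base.upper(), quote.upper()), 1.0)
```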
Structured Result Enforcement: PydanticAI’s goal is to ensure the final
output conforms to the developer’s expected structure. If a result_type is
specified for the Agent, the framework essentially treats the final answer as
another “function call” (often referred to as the result tool). Internally, it
synthesizes a function schema for the result type – for example, if
result_type is a Pydantic BaseModel with certain fields, the JSON schema for
that model becomes the schema of a pseudo-function that the LLM can call to end
the conversation. The model doesn’t see this as anything special – it’s just one
more function it’s allowed to use (often the only one that actually ends the
interaction). When the model chooses to call it, PydanticAI knows the
conversation is concluding. The implementation in _agent_graph.py checks for
this result function among the tool calls. If found, it validates the payload
just like a normal tool. On success, it wraps the result data in a
MarkFinalResult object and signals the graph to terminate. If validation
fails, it triggers a retry just as with normal tools. By modeling the final
answer as a function call, the framework cleverly uses the same mechanism for
both intermediate tool calls and final result, simplifying the control logic.
Additionally, if text output is allowed (e.g. no result schema or an explicit
allow_text_result flag for that run), the agent can accept a plain answer. In
the code, _handle_text_response will join all text parts returned by the model
and then attempt to validate that string against the result schema (for a simple
type, this might just be returning the string; for a Pydantic model, it could
involve parsing JSON in the text). If that fails and schema was required, it
again uses the retry mechanism to ask the model to output via the proper
structured format. This design demonstrates the use of a Retry/Repair
Pattern – rather than immediately failing on malformed output, the agent gives
the LLM guided feedback to fix its output, which often results in a correct
answer on a subsequent try.
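A sketch of the result-validation hook described above, using the result_type-era decorator names this analysis refers to; the CityInfo model and the extra validation rule are illustrative:

```python
from pydantic import BaseModel
from pydantic_ai import Agent, ModelRetry, RunContext


class CityInfo(BaseModel):
    city: str
    country: str


agent = Agent("openai:gpt-4o", result_type=CityInfo)


@agent.result_validator
async def check_country(ctx: RunContext[None], result: CityInfo) -> CityInfo:
    # Validation beyond the schema itself: raising ModelRetry sends the message
    # back to the model as a retry prompt instead of failing the run outright.
    if not result.country:
        raise ModelRetry("Please include the country for the city.")
    return result
```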
Graph-Oriented Control Flow: Internally, the agent’s run loop is not written
as a typical while loop, but rather constructed as a chain of node transitions
using the Pydantic Graph system. Each node class represents a distinct stage of
processing and has a run() method that performs an action and returns the next
node to execute. For example, the UserPromptNode takes the initial user prompt
and returns a ModelRequestNode with the prepared first request. The
ModelRequestNode actually calls the LLM (awaiting the API call) and then
returns a HandleResponseNode containing the model’s raw response. The
HandleResponseNode examines the response parts and decides whether to branch
to a FinalResultNode (if the conversation is done) or back to a
ModelRequestNode (to continue the dialogue after tool execution). Finally,
FinalResultNode.run() produces an End signal carrying the MarkFinalResult
data, which stops the graph execution. This graph approach is essentially an
implementation of the State pattern or a workflow engine, making the complex
logic easier to maintain and extend. For instance, adding a new kind of node
(say, a node that handles a special kind of message or a reflection step) would
be easier than altering a giant loop. It also naturally allows asynchronous
operation – each node’s run() can be async and await only the necessary
tasks (like the actual model API call or an async database query in a tool). The
code for these nodes lives in a private module _agent_graph.py and leverages
generics to tie together the types (the graph’s state carries the message
history, the deps, etc., as generics to each node). While users of PydanticAI
don’t interact with these nodes directly, this design decision is key to the
framework’s extensibility and clarity. It’s an advanced pattern that sets
PydanticAI apart from simpler loop-based agent implementations, and is aimed at
keeping the flow organized even as new features (like self-reflection, branching
conversations, or multi-agent graphs) are introduced.
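The following is a conceptual sketch only, not PydanticAI's actual node classes, illustrating the "each node returns the next node" style that pydantic_graph formalises:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class End:
    """Terminal marker carrying the final result (sketch of an End signal)."""
    result: object


class ModelRequestNode:
    async def run(self, state: dict) -> HandleResponseNode:
        # ...call the LLM with state["messages"], record the response in state...
        return HandleResponseNode()


class HandleResponseNode:
    async def run(self, state: dict) -> ModelRequestNode | End:
        # Branch: tool calls loop back to another model request;
        # a final result ends the graph.
        if state.get("tool_calls"):
            return ModelRequestNode()
        return End(state.get("final_result"))


async def run_graph(state: dict) -> object:
    node: ModelRequestNode | HandleResponseNode = ModelRequestNode()
    while True:
        nxt = await node.run(state)
        if isinstance(nxt, End):
            return nxt.result
        node = nxt
```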
Model Abstraction and API Calls: To keep the agent logic provider-agnostic,
PydanticAI employs an abstraction for model APIs. When an Agent is created with
a model name or class, it resolves to a Model object (e.g., an instance of
OpenAIModel or AnthropicModel) via a registry or factory. Each Model class
implements methods like request(messages, settings) which returns a
ModelResponse (containing parts), and possibly an agent_model(...) helper
used in the graph. For example, in the OpenAI integration, agent_model likely
wraps the OpenAI ChatCompletion call with the function definitions prepared. For
providers without native function calling, the model class might implement a
workaround (such as injecting a special delimiter and JSON into the prompt that
the agent will parse back into a ToolCallPart). The key design here is that the
Agent doesn’t need to know the details – it just calls a method on a Model
interface. This is evident in the code where the agent computes
model_used = await self._get_model(...) and then later calls
model_used.request(...) with the message history. The response is then broken
down via the model class into parts that the HandleResponse node can interpret
uniformly. By isolating API logic in pydantic_ai.models, the framework also
makes testing easier (the test model can simulate responses), and it
future-proofs the system against API changes (only the model adapter might need
updates if an API endpoint changes, leaving agent logic untouched). The
inclusion of many providers out of the box, all conforming to the same internal
Model protocol, demonstrates a deliberate architectural choice to be
model-agnostic and pluggable.
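A hedged sketch of swapping providers behind the same agent code; the model classes and model-name strings below are examples and may differ by version:

```python
from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.models.openai import OpenAIModel

# The same agent definition can target different backends; only the Model changes.
agent = Agent("openai:gpt-4o", system_prompt="Answer briefly.")

# Equivalent explicit construction, useful for provider-specific configuration.
openai_agent = Agent(OpenAIModel("gpt-4o"), system_prompt="Answer briefly.")
claude_agent = Agent(AnthropicModel("claude-3-5-sonnet-latest"), system_prompt="Answer briefly.")

# A model can also be chosen per run, overriding the agent's default.
result = agent.run_sync("Say hi.", model="anthropic:claude-3-5-sonnet-latest")
print(result.data)
```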
Logging and Monitoring: Recognizing the need for observability in complex AI
applications, the implementation integrates with Pydantic Logfire (an
OpenTelemetry-based logging system) in a non-intrusive way. In the code, you’ll
notice context managers like with _logfire.span(...) around major steps
(preparing the model, making the request, handling the response). These create
trace spans for each part of the agent’s operation, which can be sent to Logfire
if enabled. Importantly, if the developer hasn’t installed or configured
Logfire, these calls default to no-ops, so there’s virtually no overhead. This
pattern – optional instrumentation – is a thoughtful implementation detail that
provides deep debugging capability (e.g., live view of what messages were sent,
what tool was called, how long each step took) without burdening the default
runtime. Developers can opt in by installing pydantic-ai[logfire] and get
instant traceability into their agent’s behavior, which is very useful in
production. Additionally, the agent records all messages and results in the
RunResult, so even without external tooling, one can log or inspect the
conversation. The usage tracking mentioned earlier is another implementation
aspect: every token in/out and every tool call increments counters in a Usage
object, which can be examined after the run or used to halt the run if
predefined UsageLimits are passed in.
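A sketch of opting in to instrumentation and usage limits, assuming pydantic-ai[logfire] is installed; the prompt and limit values are illustrative:

```python
import logfire
from pydantic_ai import Agent
from pydantic_ai.usage import UsageLimits

# Optional instrumentation: if Logfire isn't installed/configured,
# PydanticAI's spans are effectively no-ops.
logfire.configure()

agent = Agent("openai:gpt-4o", system_prompt="Be concise.")

result = agent.run_sync(
    "Explain tracing in one sentence.",
    usage_limits=UsageLimits(request_limit=3, total_tokens_limit=1_000),
)
print(result.usage())  # accumulated requests and token counts for this run
```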
Coding Patterns: The codebase largely follows modern Python best practices:
heavy use of type hints, dataclasses for configuration, and descriptive names.
The use of the final decorator (from typing_extensions) on classes like Agent
indicates that they are not meant to be subclassed by users – composition is
preferred for extension: you add behavior by attaching functions (tools,
validators, prompt functions) to an Agent rather than subclassing it. This is a
deliberate design choice to keep things explicit and avoid the pitfalls of deep
inheritance hierarchies. Where necessary, PydanticAI also
uses metaprogramming techniques; for instance, decorating a function as a tool
wraps it in a Tool object and attaches it to the agent’s internal dict. There
is careful handling of async vs sync – most public APIs have both async and sync
versions (run vs run_sync), implemented by running the async loop internally
for the sync call for user convenience. The framework is also extensively
tested (as indicated by coverage reports), and structured to allow injecting
test models or using a “test mode” (via the models.test provider) to simulate
LLM behavior for offline tests.
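A sketch of that offline-testing pattern, using TestModel and agent.override as described in the docs; the agent and test body are illustrative:

```python
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

agent = Agent("openai:gpt-4o", system_prompt="Be concise.")


def test_agent_offline() -> None:
    # TestModel fabricates plausible responses (including tool calls) without
    # hitting any real API, so agent logic can be exercised in unit tests.
    with agent.override(model=TestModel()):
        result = agent.run_sync("ping")
    assert result.data is not None
```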
In summary, the implementation of PydanticAI marries strong software engineering principles with the dynamic needs of AI systems. It uses Pydantic to enforce correctness at every boundary (input, tool args, output), and it uses an innovative graph-based engine to manage the complex control flow of agent-tool interactions. The code structure is logical and modular, reflecting the conceptual model of the framework, and it emphasizes reliability (through validation and retries) and developer ergonomics (through type hints, familiar patterns, and optional debugging aids). All these details work in concert to fulfill PydanticAI’s promise: making it less painful to build production-grade GenAI applications by handling the messy parts (parsing, validation, multi-step orchestration) in a robust, type-safe way.
References: The analysis above is based on the official PydanticAI documentation and the PydanticAI source code on GitHub, which provide deeper insights into the framework’s design and usage. For a hands-on understanding, the PydanticAI docs site contains a “Hello World” example and a Bank Support Agent example that illustrate how these components come together in practice, and the repository’s README and examples directory are excellent resources to see the framework in action.