Pydantic Agent

PydanticAI Agent Framework: Technical Analysis

Architecture

Agent as Core Container: The fundamental concept in PydanticAI is the Agent, which serves as the primary interface for interacting with LLMs. An Agent instance encapsulates several components of an AI-driven application’s logic (a construction sketch follows this list):

  • System Prompt(s): One or more developer-defined instructions that prime the model (static or dynamic). Static system prompts can be given when creating the Agent, while dynamic prompts can be added via decorated functions (executed at runtime to inject context).
  • Function Tool(s): A set of Python functions that the LLM is allowed to call during its reasoning process. These “tools” (registered with @agent.tool) enable the model to query external data or perform computations. Each tool’s signature (except the contextual argument) is used to define a schema for the LLM to invoke it, and its docstring serves as the description presented to the model.
  • Structured Result Type: An optional Pydantic model or data type that defines the expected structure of the final answer. By specifying result_type when constructing the Agent, developers enforce that the conversation must end with the LLM returning data that can be parsed into this type (e.g. a custom BaseModel or even a simple type like bool).
  • Dependency Injection (Deps): A dependency dataclass or model type for runtime context. The Agent is parameterized by a deps_type representing resources or state (database connections, user info, etc.) that can be injected into tools and prompt functions via a RunContext. This provides a type-safe way to pass in contextual data each run (injected as ctx.deps in tool functions).
  • LLM Model & Settings: The Agent can be configured with a default LLM backend and optional settings. The model is specified by a string or model class (e.g., "openai:gpt-4o" for GPT-4o) and is resolved to a Model interface internally. Settings like temperature, max tokens, etc., can be provided via ModelSettings either at Agent creation or per run, and will be merged appropriately. PydanticAI’s design is model-agnostic – it supports providers such as OpenAI, Anthropic, Cohere, Google Vertex, etc., through a unified interface, and adding new model integrations requires implementing a simple interface in the pydantic_ai.models module.
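
To make these components concrete, the following is a minimal construction sketch loosely modeled on the bank-support example referenced at the end of this analysis; the SupportDependencies and SupportResult names and the customer_id field are illustrative placeholders, not part of the library.

```python
from dataclasses import dataclass

from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext


@dataclass
class SupportDependencies:
    # Runtime context injected into tools and dynamic prompts (hypothetical).
    customer_id: int


class SupportResult(BaseModel):
    # Structured result type the conversation must end with.
    support_advice: str = Field(description="Advice returned to the customer")
    block_card: bool = Field(description="Whether to block the customer's card")
    risk: int = Field(description="Risk level of the query", ge=0, le=10)


support_agent = Agent(
    "openai:gpt-4o",                # default model, resolved to a Model internally
    deps_type=SupportDependencies,  # type of ctx.deps inside tools and prompts
    result_type=SupportResult,      # enforced structure of the final answer
    system_prompt="You are a support agent in our bank.",  # static system prompt
)


@support_agent.system_prompt
async def add_customer_id(ctx: RunContext[SupportDependencies]) -> str:
    # Dynamic system prompt: evaluated at run time with access to the deps.
    return f"The customer's id is {ctx.deps.customer_id}."
```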

Type-Safe, Pythonic Design: The framework is built by the creators of Pydantic with an emphasis on clean architecture and type safety. Agents are generically typed (Agent[DepsType, ResultType]) so that your IDE and static type checkers can catch mismatches early. For example, an agent expecting SupportDependencies and returning SupportResult will be of type Agent[SupportDependencies, SupportResult]. This generic design, paired with Pydantic’s runtime validation, ensures that both development and execution are type-consistent – if you misuse a dependency or tool signature, you’ll either get a type checker error or a clear runtime validation error. The overall architecture is inspired by FastAPI’s approach (dependency injection and Pydantic models), aiming to give developers a familiar, Pythonic experience when building AI agents.

Internal Graph Engine: Under the hood, PydanticAI utilizes a graph-based execution model to orchestrate agent logic. Each step in an agent’s reasoning (injecting system prompts, calling the LLM, handling a tool invocation, validating results, etc.) is represented as a node in a directed graph, implemented by the companion library Pydantic Graph. This design helps avoid monolithic or “spaghetti” control flow by breaking the conversation into discrete, testable units. The Agent builds a graph of these nodes for each run, enabling complex multi-step interactions and even branching logic in a structured way. (For example, one node might represent sending a user prompt to the model, which leads to either a “tool call” node or a “final result” node depending on the model’s reply.) This graph architecture is largely abstracted away from the user API, but it provides a robust foundation for managing loops, retries, and multi-agent workflows in a maintainable fashion.

Multi-Agent and Reusability: PydanticAI agents are designed to be reusable components. Much like a FastAPI app or router, an Agent can be instantiated once (as a module-level object, for example) and used for many queries over its lifetime. In more advanced scenarios, multiple agents can be composed to handle different parts of a workflow or to interact with each other for complex tasks. The framework doesn’t enforce a specific multi-agent protocol, but because each Agent exposes a simple .run() interface, you can have one agent call another (even as a tool) or coordinate them via custom code or graphs. This flexibility allows developers to build hierarchical or cooperative agent systems if needed, on top of the same core abstractions.
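
One concrete composition pattern is agent delegation, where one agent calls another from inside a tool. The sketch below follows that pattern in hedged form: the joke_agent/controller_agent names and prompts are invented, and forwarding ctx.usage so token counts accumulate on the outer run is flagged as an assumption in the comment.

```python
from pydantic_ai import Agent, RunContext

# A small "worker" agent with its own prompt and result type (illustrative names).
joke_agent = Agent(
    "openai:gpt-4o",
    result_type=list[str],
    system_prompt="Generate short jokes on the given topic.",
)

# A "controller" agent that delegates part of its work to the worker.
controller_agent = Agent("openai:gpt-4o", system_prompt="Pick the best joke for the user.")


@controller_agent.tool
async def generate_jokes(ctx: RunContext[None], topic: str) -> list[str]:
    """Generate candidate jokes about a topic."""
    # Delegation: one agent simply calls another agent's .run() from inside a tool.
    # Passing usage=ctx.usage (assumed keyword) lets token counts roll up to the outer run.
    result = await joke_agent.run(f"Write 3 jokes about {topic}.", usage=ctx.usage)
    return result.data
```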

Data and Execution Flow

High-Level Flow: When you invoke an agent (e.g. result = await agent.run(user_input, deps=...)), PydanticAI manages a multi-turn conversation loop with the LLM, handling tool usage and validations internally. The execution flow can be outlined as follows:

  1. Initializing the Conversation: The Agent prepares the initial message list. This typically includes the system prompt(s) – all static instructions and any dynamically generated system prompts from your @agent.system_prompt functions are evaluated now – followed by the user’s prompt message. These are added to an internal message history buffer. (If you provided a message_history from a prior run to continue a conversation, those messages are prepended accordingly.)

  2. Model Request: The agent sends the conversation context to the configured LLM Model. Before calling the model’s API, PydanticAI assembles the tool definitions and (if a structured result is expected) a special result schema definition for the model. For LLMs that support function calling (like OpenAI’s), this means providing a JSON schema for each available tool and a pseudo-function for the final result. The parameters for each tool are derived from the function’s signature (excluding the RunContext), and PydanticAI automatically generates a JSON Schema including types and descriptions (extracted from the docstring) for the model to use. This allows the model to decide during its output whether to call a function (tool) and with what arguments. The model request is made (via an async API call, typically), and the agent logs usage (tokens, API calls) along the way for monitoring.

  3. Handling the Model’s Response: The LLM’s response is captured and parsed into response parts. PydanticAI normalizes different provider outputs into a common format – typically a sequence of parts which could be plain text segments or function call invocations. For example, an OpenAI response with a function call arrives as structured data (function name and JSON arguments), whereas other models might return specially formatted text indicating a tool call. In both cases, the framework interprets these into TextPart and ToolCallPart objects internally. At this stage, the agent examines the parts: if the model’s answer includes one or more tool calls, those are executed first.

  4. Tool Invocation (if any): For each ToolCallPart in the response, PydanticAI will invoke the corresponding Python function (the tool) with the provided arguments. This is done by constructing a RunContext (carrying the user dependencies, current usage stats, etc.) and calling your tool function. The arguments from the LLM are validated against the function’s schema automatically – Pydantic will coerce and check types. If the LLM provided an invalid argument (e.g. wrong type or fails a Pydantic field validation), the framework will catch the validation error and package it into a special retry message that informs the LLM of the mistake. (The conversation is then set up such that the model gets this feedback and can attempt to call the tool again with corrected parameters.) Assuming the arguments are valid, the tool function executes and returns a result (e.g. a number from a database lookup). The agent takes the tool’s result and serializes it into a message that represents the tool’s output. This result message is appended to the conversation history as if the “assistant” (agent) responded with the tool’s output.

  5. Continuing the Loop: With the tool output now in the context, the agent prompts the LLM again. Essentially, after any tool call, the Agent creates a new model request that includes: (a) the prior conversation messages, plus (b) the tool-return message(s) carrying each tool’s output (so the model can use that information), and (c) any retry prompts produced by failed validations. This loop of “model reply -> possibly tool -> model reply…” continues until a termination condition is reached. Each iteration appends new messages (tool calls, results, or corrected prompts) to the message_history and invokes the model again, respecting any usage limits (e.g., max number of calls or tokens) configured. The framework’s internal Graph logic makes these decisions: for example, if the LLM returns both a final answer and a tool call (which can happen in some model responses), a strategy (configurable via EndStrategy) determines whether to finalize early or execute remaining tool calls. With the default “early” strategy the agent finalizes as soon as a valid final result is produced, skipping any remaining tool calls; the “exhaustive” strategy runs all requested tools first.

  6. Finalizing the Result: Eventually, the LLM produces an answer that signals the end of the conversation. In many cases with structured output, the model will “call” a special result function (internally defined by PydanticAI based on your result_type) containing the fields of the answer. For instance, if the desired result is a SupportResult model with fields support_advice, block_card, and risk, the model might finish by calling a function (say "FinalAnswer") with those three fields as arguments. PydanticAI recognizes this as the final result. It validates the returned data against the SupportResult schema, just as it would for a tool call, ensuring all fields are present and of correct type. The validated object (a Pydantic model or specified type) becomes the outcome of the run. If instead the model simply returns a raw text answer (which is allowed if no structured schema was set, or if the model failed to use the function schema), the agent can still attempt to parse it. If result_type is a simple type like str or otherwise text is permitted, it will treat the text as the final result and run any validation on it. If a structured result was expected but the model gives plain text, the agent will not accept it; it will issue a corrective prompt (e.g. a system message like “Plain text responses are not permitted, please call one of the functions instead.”) and loop back to the model request step. After final validation, the conversation loop ends.

  7. Returning the Output: The Agent returns a RunResult object to the caller, which includes the final parsed result data and metadata. For convenience, result.data holds the actual result (already a Pydantic model or Python type as defined) and is typically what developers use. For example, after running the support agent, result.data would be an instance of SupportResult with fields populated, or simply a Python bool/str if that was the result type. The RunResult also contains the full message history and usage information in case you need to inspect the conversation or token counts. Notably, by the time the result is returned, it is guaranteed to be validated against the schema you provided for accuracy and completeness – if the model couldn’t produce a valid result within the retry limits, an exception would be raised instead. In practice, PydanticAI tries to give the model multiple chances to self-correct (feeding it errors) before giving up, which makes the outcome reliable once delivered.
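
Reusing the hypothetical support_agent from the earlier construction sketch, a complete run and the pieces of the returned RunResult might look like the following; the all_messages() and usage() accessors reflect the result API described above.

```python
import asyncio


async def main() -> None:
    deps = SupportDependencies(customer_id=123)
    result = await support_agent.run(
        "I just lost my card, what should I do?", deps=deps
    )

    # result.data is the validated SupportResult instance (the parsed final answer).
    print(result.data.support_advice, result.data.block_card, result.data.risk)

    # The full conversation and token accounting are also available on the result.
    print(result.all_messages())  # request/response messages exchanged during the run
    print(result.usage())         # token and request counts accumulated for this run


asyncio.run(main())
```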

Throughout this flow, PydanticAI handles the complexity of error cases (like exceeding token limits or invalid responses) by raising structured exceptions (e.g., UnexpectedModelBehavior for truly unsupported outputs) or by enforcing usage limits via graceful stops. The developer can focus on defining the tools, prompts, and schemas, and trust the agent to drive the LLM and tools to a valid solution through this loop. The design also supports both synchronous and streaming interactions – for instance, you could use agent.run_sync() for a blocking call, or agent.run_stream() to yield partial results/tokens as they arrive (under the hood, there are corresponding StreamModelRequestNode and other nodes to handle streaming). In summary, the data/execution flow ensures that the LLM is guided to produce a well-structured result through iterative prompting and tool use, much like a dialogue between the AI and a set of utilities, mediated by the Agent. Each “turn” is validated, logged, and checked, culminating in a reliable outcome.
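
As a hedged illustration of the synchronous and streaming entry points mentioned above (the stream_text() accessor is an assumption about the streamed-result API):

```python
import asyncio

from pydantic_ai import Agent

concise_agent = Agent("openai:gpt-4o", system_prompt="Be concise.")

# Blocking convenience wrapper: runs the async machinery internally.
sync_result = concise_agent.run_sync("Summarise PydanticAI in one sentence.")
print(sync_result.data)


async def stream_demo() -> None:
    # Streaming: partial text is yielded as it arrives from the model.
    async with concise_agent.run_stream("Explain structured outputs briefly.") as stream:
        async for chunk in stream.stream_text():  # assumed streaming accessor
            print(chunk, end="", flush=True)


asyncio.run(stream_demo())
```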

Code Structure

Repository and Module Layout: The PydanticAI codebase is organized into a set of Python modules that reflect the framework’s conceptual components. At a high level, the library is divided into the pydantic_ai package and a closely related pydantic_graph package (included for graph-based control flow). Key modules and their responsibilities include:

  • pydantic_ai.agent: Defines the Agent class and its methods (like run, run_sync, etc.), along with supporting classes like EndStrategy and internal utilities. This is effectively the core of the framework – the Agent class is a dataclass that aggregates configuration (model, prompts, tools, result schema) and implements the logic to build and execute the conversation graph. It also holds internal registries for system prompt functions and tools that are attached via decorators.
  • pydantic_ai.tools: Contains the definition of the Tool system. A Tool object wraps a user-defined function (the actual Python callable) along with metadata like its name, description, and JSON schema. The module provides the @agent.tool decorator which, when used, creates a Tool from the function and registers it in the Agent’s internal tool list. It also defines the RunContext class which is passed into tool functions for dependency access. Essentially, this module handles function tool registration and execution mechanics.
  • pydantic_ai.models: A subpackage with submodules for each supported LLM provider (OpenAI, Anthropic, Cohere, Google Gemini/Vertex, etc.). Each provider module implements a common interface – typically a subclass of a base Model class – that knows how to format requests and parse responses for that backend. For example, models.openai will have classes to call OpenAI’s chat API, converting PydanticAI’s internal message format to the OpenAI API format and vice versa. Similarly, models.anthropic adapts to Anthropic’s Claude API, which might not support function calling natively, so it will implement a strategy to embed tool calls in the prompt. There’s also a models.function (for treating a local function as a pseudo-model) and models.test (a mock model for testing). This modular structure makes the agent code largely independent of any single LLM service – adding support for a new model is as simple as creating a new module that provides the necessary request/response logic.
  • pydantic_ai.messages: Defines the classes used to represent message data in the conversation. This includes ModelMessage (an abstract base or union for different message roles) and specific message part types. Notably, it defines things like ModelRequest (an outgoing prompt with potentially multiple parts), ModelResponse (the model’s answer, which may be composed of text and tool-call parts), TextPart, ToolCallPart, RetryPromptPart, etc. These classes allow the agent to treat a conversation uniformly, regardless of backend, by breaking down model outputs into a standard structure. The messages module essentially codifies the “language” of conversation between the agent and model (including how function calls are represented as message parts).
  • pydantic_ai.result: Contains utilities for managing the structured result schema and validation. When an Agent has a result_type, the framework creates a corresponding Result Schema object (likely containing a JSON Schema or Pydantic model schema) used to guide the LLM. This module defines how to convert a Pydantic model or Python type into a schema, and how to validate model outputs against it (including custom validators). It also defines the ResultValidator protocol, which allows plugging in extra validation logic on the final result if needed. The logic that checks if the LLM’s final function call matches the schema and either accepts it or raises a ToolRetryError (to ask for correction) lives here. In short, pydantic_ai.result is responsible for final answer validation and any post-processing of the LLM’s output into the desired Python object.
  • pydantic_ai.settings and pydantic_ai.usage: These modules handle configuration of model calls and usage tracking. settings defines ModelSettings (e.g., default parameters like temperature, max_tokens, etc., which can be merged with per-run overrides) along with the merge_model_settings logic. usage defines a Usage class to accumulate token counts and a UsageLimits class to enforce limits across one or multiple runs. The Agent uses these to decide when to halt further calls (for example, if a token budget is exhausted) or simply to report how many tokens were used (see the sketch after this list).
  • pydantic_ai.exceptions: Defines exception types for error handling. For example, UnexpectedModelBehavior is raised if the model returns something completely off-schema or if retries are exceeded. There are also specific exceptions for things like exceeding usage limits or model-specific errors. This module centralizes how errors in the agent’s operation are represented, making it easier for user code to catch and handle them appropriately.
  • pydantic_graph: This is a sub-package included with PydanticAI that provides a generic typed graph execution framework. It defines base classes like BaseNode and Graph and manages the GraphRunContext (which carries the state and dependencies through the graph). In the context of PydanticAI, there are specialized node classes defined (some within internal _agent_graph.py using pydantic_graph), such as UserPromptNode, ModelRequestNode, HandleResponseNode, FinalResultNode, etc., each corresponding to a phase of the agent’s flow. The Graph class handles executing these nodes in sequence until an End node is reached, carrying the result. This design enables advanced control flows (like conditional branches or loops) to be encoded cleanly. While pydantic_graph could be used independently for other state machines, in PydanticAI it’s mostly an internal mechanism to organize the agent run logic. There is also a utility to output a graph diagram (using Mermaid) for debugging flows, though that’s more of a development aid.
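
To ground the settings and usage modules, a per-run configuration might look like the following sketch; the exact field names on ModelSettings and UsageLimits (temperature, max_tokens, request_limit, total_tokens_limit) are stated here as assumptions.

```python
from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings
from pydantic_ai.usage import UsageLimits

agent = Agent("openai:gpt-4o")

result = agent.run_sync(
    "Give me one fun fact about Pydantic.",
    # Per-run settings are merged with any defaults set when the Agent was created.
    model_settings=ModelSettings(temperature=0.0, max_tokens=200),
    # Hard caps enforced by the usage module; exceeding them stops the run.
    usage_limits=UsageLimits(request_limit=3, total_tokens_limit=2_000),
)
print(result.usage())  # tokens and request counts recorded during the run
```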

Beyond these, the repository contains example scripts and documentation (e.g. the examples/ folder and the documentation source) to illustrate usage patterns such as building a chat app or a retrieval-augmented generation (RAG) pipeline. The codebase is written in modern Python, utilizing asyncio for concurrency and dataclasses for configuration containers (most core classes like Agent and Tool are dataclasses for ease of instantiation and repr). Generics from the typing module are used extensively to maintain type information (e.g., Agent[DepsT, ResultT]), and Pydantic V2 features (like BaseModel for results) are leveraged for data validation. This results in a clean folder structure where each concern is in its own module, and cross-module interactions are minimal and well-defined (for instance, the Agent module uses the Graph module to run, the Tools module to prepare function schemas, the Models module to actually interface with an API, etc.). The separation of concerns makes the code relatively approachable: one could inspect pydantic_ai.agent to understand high-level logic, dive into pydantic_ai.models.openai to see how OpenAI API calls are made, or look at pydantic_ai.tools to see how a tool’s schema is constructed. All of these pieces come together when an Agent’s run() is called, as described in the flow above.

Implementation Details and Design Choices

Tool Schema and Function Calling: One of the standout implementation features of PydanticAI is how it bridges Python functions with LLM function calling. When you decorate a function with @agent.tool, the library introspects its signature to automatically generate a JSON Schema that describes that tool’s parameters and return type to the LLM. It strips out the first RunContext parameter (since that’s internal) and uses the remaining parameter types (via type hints) to build the schema. The developer’s docstring is parsed (using the Griffe library) to pull in human-readable descriptions for the function and its parameters. All this is packaged into the API call – for OpenAI, it goes into the functions list of the ChatCompletion request; for other models, the schema might be embedded in the prompt or handled in a model-specific way. At runtime, if the model “calls” a function, PydanticAI constructs a ToolCallPart containing the function name and arguments. The actual invocation uses Python’s reflection: the framework finds the Tool object by name and calls its underlying Python function. Before calling, it validates and converts the arguments using Pydantic validation rules (essentially Tool.validate() will run the JSON args through the schema, which uses Pydantic’s type coercion). If validation fails – say a required field is missing or type is wrong – the Tool.validate() raises a ToolRetryError internally. The agent catches this and creates a special RetryPromptPart message for the model, effectively saying: “the function call had an error with these details, please try again”. This pattern lets the model self-correct without the developer having to write any error-handling logic. Only if the model fails repeatedly to call the tool correctly (exceeding a retry limit) will the agent surface an error to the developer. This approach is similar to how robust APIs handle bad requests and is a powerful pattern: PydanticAI uses the combination of a strict schema and LLM guidance to get well-formed function calls.
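
A sketch of this bridge, with an invented DatabaseConn dependency standing in for real infrastructure: the RunContext parameter is stripped from the generated schema, the include_pending type hint becomes a typed parameter the model must supply, and the docstring supplies the descriptions.

```python
from dataclasses import dataclass

from pydantic_ai import Agent, RunContext


class DatabaseConn:
    # Hypothetical stand-in for a real database client.
    async def customer_balance(self, *, include_pending: bool) -> float:
        return 123.45


@dataclass
class Deps:
    db: DatabaseConn


balance_agent = Agent("openai:gpt-4o", deps_type=Deps)


@balance_agent.tool
async def customer_balance(ctx: RunContext[Deps], include_pending: bool) -> float:
    """Return the customer's current account balance.

    Args:
        include_pending: Whether to include pending transactions in the balance.
    """
    # ctx is excluded from the JSON schema; include_pending is exposed to the LLM
    # with its type, and the docstring text above becomes the tool/parameter docs.
    return await ctx.deps.db.customer_balance(include_pending=include_pending)
```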

Structured Result Enforcement: PydanticAI’s goal is to ensure the final output conforms to the developer’s expected structure. If a result_type is specified for the Agent, the framework essentially treats the final answer as another “function call” (often referred to as the result tool). Internally, it synthesizes a function schema for the result type – for example, if result_type is a Pydantic BaseModel with certain fields, the JSON schema for that model becomes the schema of a pseudo-function that the LLM can call to end the conversation. The model doesn’t see this as anything special – it’s just one more function it’s allowed to use (often the only one that actually ends the interaction). When the model chooses to call it, PydanticAI knows the conversation is concluding. The implementation in _agent_graph.py checks for this result function among the tool calls. If found, it validates the payload just like a normal tool. On success, it wraps the result data in a MarkFinalResult object and signals the graph to terminate. If validation fails, it triggers a retry just as with normal tools. By modeling the final answer as a function call, the framework cleverly uses the same mechanism for both intermediate tool calls and final result, simplifying the control logic. Additionally, if text output is allowed (e.g. no result schema or an explicit allow_text_result flag for that run), the agent can accept a plain answer. In the code, _handle_text_response will join all text parts returned by the model and then attempt to validate that string against the result schema (for a simple type, this might just be returning the string; for a Pydantic model, it could involve parsing JSON in the text). If that fails and schema was required, it again uses the retry mechanism to ask the model to output via the proper structured format. This design demonstrates the use of a Retry/Repair Pattern – rather than immediately failing on malformed output, the agent gives the LLM guided feedback to fix its output, which often results in a correct answer on a subsequent try.
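
A hedged sketch of the result-enforcement and retry/repair pattern, assuming a result_validator decorator in line with the ResultValidator protocol mentioned earlier and a ModelRetry exception for requesting a correction:

```python
from pydantic import BaseModel
from pydantic_ai import Agent, ModelRetry, RunContext


class CityInfo(BaseModel):
    # The final answer must parse into this model; its JSON schema backs the
    # pseudo "result function" the model calls to end the conversation.
    city: str
    country: str


city_agent = Agent("openai:gpt-4o", result_type=CityInfo)


@city_agent.result_validator  # assumed decorator name, per the ResultValidator protocol
async def check_country(ctx: RunContext[None], result: CityInfo) -> CityInfo:
    # Extra validation on the final answer; raising ModelRetry feeds the error
    # back to the model so it can correct itself instead of failing outright.
    if not result.country.strip():
        raise ModelRetry("Please include the country as well.")
    return result
```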

Graph-Oriented Control Flow: Internally, the agent’s run loop is not written as a typical while loop, but rather constructed as a chain of node transitions using the Pydantic Graph system. Each node class represents a distinct stage of processing and has a run() method that performs an action and returns the next node to execute. For example, the UserPromptNode takes the initial user prompt and returns a ModelRequestNode with the prepared first request. The ModelRequestNode actually calls the LLM (awaiting the API call) and then returns a HandleResponseNode containing the model’s raw response. The HandleResponseNode examines the response parts and decides whether to branch to a FinalResultNode (if the conversation is done) or back to a ModelRequestNode (to continue the dialogue after tool execution). Finally, FinalResultNode.run() produces an End signal carrying the MarkFinalResult data, which stops the graph execution. This graph approach is essentially an implementation of the State pattern or a workflow engine, making the complex logic easier to maintain and extend. For instance, adding a new kind of node (say, a node that handles a special kind of message or a reflection step) would be easier than altering a giant loop. It also naturally allows asynchronous operation – each node’s run() can be async and await only the necessary tasks (like the actual model API call or an async database query in a tool). The code for these nodes lives in a private module _agent_graph.py and leverages generics to tie together the types (the graph’s state carries the message history, the deps, etc., as generics to each node). While users of PydanticAI don’t interact with these nodes directly, this design decision is key to the framework’s extensibility and clarity. It’s an advanced pattern that sets PydanticAI apart from simpler loop-based agent implementations, and is aimed at keeping the flow organized even as new features (like self-reflection, branching conversations, or multi-agent graphs) are introduced.
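
To make the node-transition idea concrete without reproducing pydantic_graph’s exact API, here is a deliberately simplified, self-contained sketch of the same pattern: each node’s run() performs one step and returns the next node, and a small driver loop advances until an End marker carries out the final value. The node names echo those described above, but the code is illustrative, not the framework’s implementation.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field


@dataclass
class End:
    # Terminal marker carrying the final value (analogous to the graph's End signal).
    value: str


@dataclass
class State:
    # Shared state threaded through the nodes (stands in for message history, deps, usage).
    messages: list[str] = field(default_factory=list)


@dataclass
class ModelRequestNode:
    prompt: str

    async def run(self, state: State) -> HandleResponseNode:
        # A real node would await the provider API here; we fake a reply instead.
        state.messages.append(f"user: {self.prompt}")
        return HandleResponseNode(raw_response="final: 42")


@dataclass
class HandleResponseNode:
    raw_response: str

    async def run(self, state: State) -> ModelRequestNode | End:
        state.messages.append(f"model: {self.raw_response}")
        # Branch: a tool call would loop back to another model request,
        # while a final answer ends the run.
        if self.raw_response.startswith("final:"):
            return End(self.raw_response.removeprefix("final:").strip())
        return ModelRequestNode(prompt="tool result ...")


async def run_graph(start: ModelRequestNode, state: State) -> str:
    node: ModelRequestNode | HandleResponseNode | End = start
    while not isinstance(node, End):
        node = await node.run(state)
    return node.value


print(asyncio.run(run_graph(ModelRequestNode("What is 6 * 7?"), State())))
```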

Model Abstraction and API Calls: To keep the agent logic provider-agnostic, PydanticAI employs an abstraction for model APIs. When an Agent is created with a model name or class, it resolves to a Model object (e.g., an instance of OpenAIModel or AnthropicModel) via a registry or factory. Each Model class implements methods like request(messages, settings) which returns a ModelResponse (containing parts), and possibly an agent_model(...) helper used in the graph. For example, in the OpenAI integration, agent_model likely wraps the OpenAI ChatCompletion call with the function definitions prepared. For providers without native function calling, the model class might implement a workaround (such as injecting a special delimiter and JSON into the prompt that the agent will parse back into a ToolCallPart). The key design here is that the Agent doesn’t need to know the details – it just calls a method on a Model interface. This is evident in the code where the agent computes model_used = await self._get_model(...) and then later calls model_used.request(...) with the message history. The response is then broken down via the model class into parts that the HandleResponse node can interpret uniformly. By isolating API logic in pydantic_ai.models, the framework also makes testing easier (the test model can simulate responses), and it future-proofs the system against API changes (only the model adapter might need updates if an API endpoint changes, leaving agent logic untouched). The inclusion of many providers out of the box, all conforming to the same internal Model protocol, demonstrates a deliberate architectural choice to be model-agnostic and pluggable.
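
The string shorthand and an explicitly constructed model object are interchangeable from the Agent’s point of view; the import path below follows the module layout described earlier and should be read as an assumption about the exact class name.

```python
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel

# String shorthand: resolved to the matching Model implementation internally.
agent_from_string = Agent("openai:gpt-4o")

# Explicit model object: same Model interface, just constructed by hand,
# which is also where provider-specific options would be supplied.
agent_from_model = Agent(OpenAIModel("gpt-4o"))
```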

Logging and Monitoring: Recognizing the need for observability in complex AI applications, the implementation integrates with Pydantic Logfire (an OpenTelemetry-based logging system) in a non-intrusive way. In the code, you’ll notice context managers like with _logfire.span(...) around major steps (preparing the model, making the request, handling the response). These create trace spans for each part of the agent’s operation, which can be sent to Logfire if enabled. Importantly, if the developer hasn’t installed or configured Logfire, these calls default to no-ops, so there’s virtually no overhead. This pattern – optional instrumentation – is a thoughtful implementation detail that provides deep debugging capability (e.g., live view of what messages were sent, what tool was called, how long each step took) without burdening the default runtime. Developers can opt in by installing pydantic-ai[logfire] and get instant traceability into their agent’s behavior, which is very useful in production. Additionally, the agent records all messages and results in the RunResult, so even without external tooling, one can log or inspect the conversation. The usage tracking mentioned earlier is another implementation aspect: every token in/out and every tool call increments counters in a Usage object, which can be examined after the run or used to halt the run if predefined UsageLimits are passed in.
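
Opting in might look like the following sketch, assuming the pydantic-ai[logfire] extra is installed; without the configure() call the internal spans remain no-ops.

```python
import logfire

from pydantic_ai import Agent

# Opt-in instrumentation: turns the otherwise no-op spans into real traces.
logfire.configure()

agent = Agent("openai:gpt-4o")
result = agent.run_sync("What is the capital of France?")

# Usage counters are tracked regardless of whether Logfire is enabled.
print(result.usage())
```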

Coding Patterns: The codebase largely follows modern Python best practices: heavy use of type hints, dataclasses for configuration, and descriptive names. The use of final decorators (from typing_extensions) on classes like Agent indicates that it’s not meant to be subclassed by users – composition is preferred for extension. The library encourages composition: you add behavior by attaching functions (tools, validators, prompt functions) to an Agent rather than subclassing Agent. This is a deliberate design to keep things explicit and avoid the pitfalls of deep inheritance hierarchies. Where necessary, PydanticAI also uses metaprogramming techniques; for instance, decorating a function as a tool wraps it in a Tool object and attaches it to the agent’s internal registry. There is careful handling of async vs. sync – most public APIs have both asynchronous and synchronous versions (run vs. run_sync), with the synchronous variant implemented by driving the async event loop internally for convenience. The framework is also extensively tested (as indicated by coverage reports) and structured to allow injecting test models or using a “test mode” (via the models.test provider) to simulate LLM behavior in offline tests.
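
A sketch of that offline-testing pattern, assuming an override() context manager for swapping the model (stated as an assumption, not taken from the text above) and a TestModel class in the models.test provider:

```python
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

agent = Agent("openai:gpt-4o", system_prompt="Answer briefly.")


def test_agent_runs_offline() -> None:
    # Swap in the mock backend for the duration of the block; no real API is called.
    with agent.override(model=TestModel()):
        result = agent.run_sync("Hello!")
        # TestModel produces synthetic output, so we only assert that a result came back.
        assert result.data is not None
```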

In summary, the implementation of PydanticAI marries strong software engineering principles with the dynamic needs of AI systems. It uses Pydantic to enforce correctness at every boundary (input, tool args, output), and it uses an innovative graph-based engine to manage the complex control flow of agent-tool interactions. The code structure is logical and modular, reflecting the conceptual model of the framework, and it emphasizes reliability (through validation and retries) and developer ergonomics (through type hints, familiar patterns, and optional debugging aids). All these details work in concert to fulfill PydanticAI’s promise: making it less painful to build production-grade GenAI applications by handling the messy parts (parsing, validation, multi-step orchestration) in a robust, type-safe way.

References: The analysis above is based on the official PydanticAI documentation and the PydanticAI source code on GitHub, which provide deeper insights into the framework’s design and usage. For a hands-on understanding, the PydanticAI docs site contains a “Hello World” example and a Bank Support Agent example that illustrate how these components come together in practice, and the repository’s README and examples directory are excellent resources to see the framework in action.