Agno Agent Framework
Phidata (Agno) Agent Framework – Technical Analysis
Architecture
Core Components: Phidata (rebranded as Agno) is an open-source framework for building AI agents and agent-based workflows. At its core is the Agent – an autonomous program that uses an LLM (Large Language Model) to interpret instructions and act. An Agent is highly modular: it is configured with a reasoning model (LLM interface), optional tools, a knowledge base, and memory/storage for context. These components work in unison to let the agent plan solutions, fetch information, and remember past interactions. The design is model-agnostic, meaning it can work with any LLM (OpenAI GPT series, local models, etc.) without being tied to a specific provider. It’s also multi-modal – agents can handle text by default, and integrate image, audio, or video capabilities via appropriate tools or models, enabling vision or speech tasks.
Tools (Functions): Tools are pluggable functions that extend an agent’s abilities to interact with external systems or data. In Phidata, “tools are functions that an Agent can run to achieve tasks” like web search, database queries, sending emails, etc.. You can attach either built-in toolkits (pre-made integrations such as a DuckDuckGo web search or YFinance stock lookup) or any custom Python function as a tool. Internally, the agent treats each tool as an action it can call (via function calling) during its reasoning process. This design avoids rigid chaining – the agent dynamically decides if/when to invoke a tool based on the query and its reasoning. Tools are registered with the agent’s model, which allows the model to generate a JSON-formatted function call when a tool is needed. In essence, the framework leverages the LLM’s ability to call functions (for models that support it) to execute these tools. This architecture principle – LLM + tools – follows the ReAct pattern (reasoning and acting) but implements it in a simple, Pythonic way (no complex DAG or chain objects).
Knowledge Base: Agents can be endowed with a domain-specific knowledge base
to augment the LLM’s training data with up-to-date or specialized information.
The knowledge base is typically a vector database (e.g. Qdrant, LanceDB,
Chroma) that stores embedded text chunks for semantic search. Phidata provides
an AgentKnowledge abstraction to manage this. Developers can load documents
(PDFs, websites, etc.) into the knowledge store; under the hood the text is
chunked, embedded, and indexed in the vector DB. At query time, the agent can
retrieve relevant pieces of knowledge: either automatically via a tool call
(the agent “searches” its knowledge base when needed, known as Agentic RAG) or
by pre-fetching context before the LLM is invoked. By default, Phidata enables
Agentic RAG, meaning it gives the model a tool to query the vector store on
the fly. This keeps the system prompt lean and only pulls in information as
required. (There’s also an add_context mode to always inject retrieval results
into the prompt, but by default search_knowledge=True lets the agent decide
when to call the knowledge tool.) The knowledge component thus plugs into the
architecture as either another tool or as a context provider, ensuring the agent
can reason over both its trained knowledge and supplemental data.
Memory and Storage: To carry on coherent conversations and personalize
responses, agents have a memory subsystem. Phidata defines three levels of
memory: (1) recent chat history (the last few messages in the conversation),
(2) persistent user memory (notes about the user or session), and (3)
conversation summaries for long contexts. Every agent invocation is part of
a session identified by a session id, with each user query/response pair
considered a run within that session. Out of the box, each Agent has an
in-memory AgentMemory object that tracks the message history and can summarize
or truncate it as needed. For longer-term retention or multi-session continuity,
the framework supports pluggable storage backends. For example, you can
attach a PgAgentStorage or SqliteAgentStorage so that all sessions and runs
are saved to a database. This allows agents to “remember” past conversations
even if the application restarts, by reloading the last session state from the
DB. In summary, memory is a first-class citizen in the architecture – the agent
uses short-term memory for context within the prompt (often the last 3-5
messages to avoid overflow) and can leverage long-term memory via storage for
persistence beyond a single chat session.
Multi-Agent Teams: Phidata’s architecture also enables composition of
agents – agents can collaborate as a team. Rather than a single monolithic
agent handling everything, you can create specialized agents and a coordinator
agent that delegates tasks. The framework supports this via a team parameter:
an agent can be initialized with a list of other agent instances as its team.
This “team leader” agent is backed by its own LLM model which is responsible for
routing subtasks to the appropriate team member. In practice, each sub-agent
can be considered a tool from the leader’s perspective. For example, you might
have a WebSearchAgent, FinanceAgent, and then a parent AnalysisAgent that
includes the first two in its team. When a complex query comes in, the
AnalysisAgent’s model can decide to ask the WebSearchAgent for information or
the FinanceAgent for data, then compile the results. The coordination is often
achieved by formulating the system instructions for the leader agent to
explicitly delegate (“First, use the Web Searcher to gather data… Then ask
the Finance Agent…”). Underneath, the framework handles invoking a sub-agent
like a function call and returning its output so the leader’s LLM can
incorporate it. This multi-agent orchestration is built-in but considered
experimental – it’s powerful for prototyping and simulating “AI teams,” though
the documentation advises that open-ended agent teams may be less reliable on
very complex tasks (due to current LLM limitations) and suggests using more
deterministic Workflows for production needs. (Workflows in Phidata are a
separate feature that let you script a fixed sequence of agent/tool steps with
logic, offering more predictable control than free-form multi-agent prompting.)
Design Principles: The architecture of Phidata/Agno is guided by simplicity and performance. Unlike some agent frameworks that require constructing complex chains or graphs of calls, Phidata embraces a “just pure Python” philosophy. An agent and its components are simple Python objects – you configure them in code without needing to define elaborate YAMLs or flow diagrams. This simplicity leads to minimal overhead: creating an agent is extremely lightweight. In fact, benchmarks showed agent instantiation and startup in Phidata is several orders of magnitude faster (on the order of 1000x+) than a certain chain-based framework. The memory footprint is also much smaller, since there are no heavy intermediate objects – just the model and whatever tools/knowledge you explicitly attach. This efficient architecture makes it feasible to spin up many agents or handle rapid interactions without lag. Furthermore, the design is extensible – because tools are just Python callables and knowledge bases are standard vector stores, developers can easily plug in new capabilities. To summarize the architecture: Phidata provides a minimalistic yet powerful skeleton (Agent + Model + Tools + Memory + Knowledge), and each piece cleanly interfaces with the others to enable dynamic, tool-using, multi-modal agents.
Flow
Agent Execution Flow: When you prompt a Phidata/Agno agent, a well-defined sequence of steps is executed from input to output. At a high level, the agent takes your query and orchestrates how the LLM and tools are used to produce an answer. Below is a typical flow for a single-agent interaction:
-
Prompt Construction: The agent first builds the system prompt that will guide the LLM. This is done using the agent’s configured description and instructions. The description is a high-level role or personality (“You are an enthusiastic news reporter…”), and the instructions are concrete guidelines or steps the agent should follow. Phidata combines these into a single system message – for example, it will place the description at the top and then list each instruction prefixed by “- ” or under an “## Instructions” section. (If
markdown=Trueor other flags are set, additional default instructions are auto-appended, such as “format your output in Markdown” whenmarkdown=True.) The developer can also supply an explicit system prompt template or use defaults. Essentially, this step establishes the context and rules that govern the agent’s behavior for this query. -
Include Context (Memory/Knowledge): Next, the agent incorporates any relevant context from memory or knowledge bases, depending on configuration. If the agent has prior chat history and is configured to include it (e.g.
add_history_to_messages=True), recent conversation messages will be added to the prompt (as preceding user/assistant turns) so the model sees the context. If the conversation is long, the agent might use a summary instead of raw history beyond a certain limit (Phidata can auto-summarize to keep prompts concise). Similarly, ifadd_context=Truefor the knowledge base, the agent will perform a vector search for the user’s question before calling the model, and attach the top results (e.g. as an extra system message or as part of the user query). However, by default Phidata uses the dynamic tool approach for knowledge (covered next), so typically this pre-fetch step is skipped unless explicitly enabled. At this stage, the agent also prepares any special context tools – for instance, ifread_chat_history=Trueorsearch_knowledge=True, it ensures functions are available to let the model pull in chat history or knowledge on demand. After this step, the agent has constructed the initial message set: usually one system message (with rules/instructions and possibly memory context) and the new user message (the query). -
Tool Availability Setup: If the agent has tools or a knowledge base, it registers them so that the model can use them. For OpenAI Chat models, this means the agent supplies a list of function definitions (name, description, parameters) corresponding to each tool in
agent.tools. For example, aDuckDuckGosearch tool might be exposed as a function"search_web(query: str) -> str"with a description. If sub-agents (team members) are present, each may likewise be exposed as a callable function (e.g., a function named after the sub-agent that takes a query and returns that agent’s answer). Under the hood, Phidata sets the tool mode to “auto” by default when any tools are present. “Auto” mode means the model can choose freely between answering directly or invoking a function call. (If no tools were present, the mode is “none”, meaning the model must directly answer; there’s also a way to force a specific tool call by settingtool_choice, but typically it’s on auto.) This tool registration is a key part of the flow – it tells the LLM “you have these tools you can call if needed”. Phidata’s design uses the native function-calling interface of the model if available, so the model will produce a structured JSON indicating a tool invocation when appropriate, rather than a free-text instruction that needs parsing. In essence, by this step the agent has armed the model with optional actions (tools and possibly built-in actions likesearch_knowledge) it can take to fulfill the request. -
LLM Reasoning & Tool Use: The agent now sends the prompt to the LLM (the model’s API). This initial call includes the system message, the user’s question, and the function definitions for tools. The model responds either with an answer or with a function call. Thanks to the function interface, if the model decides a tool is needed (say, it needs to look up information), the output is not a final answer but a message indicating an
action– e.g., call functionsearch_webwith input “latest AI news”. Phidata’s runtime detects this and treats it as an event to handle. The framework then executes the requested tool by calling the corresponding Python function or sub-agent. For example, ifDuckDuckGo.search()was invoked, the agent calls that, gets back search results, and formats the result (often as a string or JSON) to return to the model. This tool result is typically injected as a new assistant message (as if the assistant “returned” from the function with some data). With OpenAI’s API, this is done by sending the function’s output in a special assistant message of role “function” along with the function name, allowing the model to receive the result in the next step. At this point, the agent calls the LLM again, now providing the conversation so far (including the user query, the model’s function call, and the function’s result). The model then continues its reasoning with this new information in hand. It may decide it has enough to answer, or it might chain another tool call. Phidata will loop through this perception-action cycle as needed: model proposes an action, agent executes it and gives the observation back, model updates its thoughts, and so on. This repeats until the model produces a message that is a final answer rather than another tool request. (The framework sets a safety limit on tool call loops – e.g.,tool_call_limit– to avoid infinite cycles, but in practice the model usually stops after a few steps.) Notably, all of this happens seamlessly – the developer only calledagent.run()oragent.print_response(), and internally the agent is managing the message passing between the model and tools. -
Multi-Agent Delegation (if applicable): In a multi-agent team scenario, the above loop expands to include sub-agents as callable tools. For instance, if an agent team has members “HackerNews Researcher” and “Web Searcher”, the leader agent’s model might output a function call like
hackernews_researcher(query="X")as its next action. The Phidata framework will recognize this corresponds to a team member agent and trigger that agent’srun()method behind the scenes. The sub-agent (with its own model and tools) will execute independently and return a result (e.g. a summary of HackerNews stories). The main agent then receives that result as if a tool had returned. It can then proceed, possibly calling another sub-agent (likeweb_searcher(query="Y")next) as per its instructions, and finally compose the final answer. This is how message passing between agents is implemented – the leader uses the same mechanism of function calls to “ask” sub-agents for help. Each sub-agent’s output is fed back into the leader’s context for the final reasoning. From a flow perspective, it’s an nested loop: the leader agent’s LLM may pause to yield control to a sub-agent, which itself might do LLM calls and tool uses, then return a result to the leader’s flow. Phidata abstracts this so the user just seesagent_team.print_response(...)and gets a consolidated answer. This cooperative flow is powerful, but as noted, the on-the-fly coordination can sometimes misstep if the prompts or roles are not clear. Phidata’s docs suggest keeping team tasks fairly structured, or using a predefined Workflow when exact sequencing is required. -
Final Response and Structured Output: Once the LLM produces a final answer (no more tool calls), the agent returns that output. If streaming was enabled (
stream=True), the answer might have been progressively printed to the console/UI as the model generated it. If a structured output model was specified (a Pydantic schema for the response), the agent will attempt to parse the model’s answer into that schema. In many cases, Phidata can coerce the LLM to output JSON compliant with the schema by using a special “JSON mode” or OpenAI function calling to a dummy function representing the schema. For example, if you defined aMovieScriptPydantic model for the agent’s response, the final answer might be captured as aRunResponseobject whosecontentattribute is an instance ofMovieScriptfilled with the LLM’s generated fields. This allows downstream code to access structured data (likeresponse.content.genrefor the movie genre) instead of parsing text. If no structured model was set, the response is just a text string (which can still be retrieved via theRunResponsefor logging or displayed directly). Finally, the agent logs the interaction: it records the conversation (messages, tool calls, etc.) in its memory. If persistent storage is attached, the new run is saved to the database so that the session can be resumed later. At this point, the user receives the answer, and the flow for that query is complete.
Overall, the execution flow in Phidata is a loop of [Think → Act → Observe] cycles driven by the LLM, very much following the agent paradigm. Notably, the heavy lifting of deciding which tool to use or what to do next is learned behavior from the LLM’s prompts (e.g. the system instructions might encourage using the knowledge base first, etc.), rather than being hardcoded logic. The framework provides the scaffolding (prompt assembly, tool invocation, result injection, memory management), and the LLM “agent brain” navigates within that scaffold to solve the task. This flow, combined with function-calling, makes the interactions robust and less error-prone – the model’s tool requests are explicit and machine-readable, and each step’s state is transparently stored as a message, which simplifies debugging and monitoring of the agent’s reasoning process.
Code Structure
Repository and Package Layout: The Phidata/Agno codebase is organized into
clear modules that correspond to the major concepts of the framework. The main
Python package was historically named phi (for Phidata) and may now be
accessible as agno – in code examples you’ll see imports like
from phi.agent import Agent or from agno.agent import Agent, depending on
the version. Inside this package, key sub-packages include:
-
agent– This contains the definition of theAgentclass and related classes. TheAgentclass is central; it likely inherits from a BaseModel (possibly Pydantic’s) to define its numerous configuration fields (model, tools, knowledge, memory, instructions, etc.) with default values. For example, an Agent has attributes likename,description,tools,knowledge,memory,response_model, flags likemarkdown,show_tool_calls, etc., all defined in one place. The agent module also defines methods such asrun()(execute a single prompt and get a result) andprint_response()(convenience to run and stream/print the answer), and internal helper methods for building prompts or handling tool calls. In usage, this is the class developers instantiate and interact with. The agent package also defines data structures for responses and memory – e.g.,RunResponse(seen imported in examples) which encapsulates the model’s output and any metadata, and possiblyAgentMemory/AgentRunclasses to represent stored conversations. -
models– This module (often imported asphi.model) contains integrations for various LLM providers or types. Each supported model is typically a class implementing a common interface to generate chat completions. For instance,OpenAIChatis provided for OpenAI’s GPT-3.5/GPT-4 models, wrapping the API calls and handling of function-calling. There might be classes for other APIs (e.g., Cohere, Anthropic) or local models (like HuggingFace or Llama through an API). The model objects can have identifiers (id="gpt-4o"in examples indicates a particular OpenAI model variant) and possibly configuration like temperature, max tokens, etc. This separation allows the agent to be agnostic of which LLM is used – as long as the model class adheres to the expected interface (likely agenerate()orchat()method that accepts messages and optional function specs), the Agent can work with it. -
tools– This package provides a library of ready-made tool integrations (often called toolkits). Each tool is typically a small class or just a function that encapsulates some external interaction. For example,phi.tools.duckduckgomight contain aDuckDuckGoclass with asearchmethod, or a function that calls the DuckDuckGo API. Similarly,phi.tools.yfinancefor stock data,phi.tools.newspaper4kto scrape and summarize articles, etc. These toolkit classes often define anameanddescription(used in function-calling interface) and arunor__call__that actually performs the action (like doing an HTTP request). The code structure makes it easy to add new tools: one can simply write a function or class and pass it in the Agent’s tools list (the framework will wrap it appropriately if it’s a raw function). The tools package covers common needs (search, calculations, file I/O, etc.), and developers can consult the “Available Toolkits” documentation for what’s included. Notably, because tools are just functions, they can also be defined inline – as shown in the docs, you can write a custom Python function (e.g.,get_top_hackernews_stories) and directly include it inAgent(tools=[my_function]). The Agent code will detect it and handle it just like a built-in tool. -
knowledge– This module contains classes for different types of knowledge bases. Examples includePDFUrlKnowledgeBase,WebsiteKnowledgeBase(to fetch and index data from PDF files or websites), and possibly simpler ones likeTextKnowledgeBasefor in-memory text. These classes handle loading documents (using chunkers and embedders), and provide a method (likesearch(query)) to retrieve relevant snippets. AnAgentKnowledgeclass acts as a wrapper that the Agent understands – the agent can callknowledge.load()to index data and later use the knowledge base via thesearch_knowledgetool or context injection. Theknowledgepackage, together with thevectordbpackage, is what enables Retrieval-Augmented Generation flows. -
vectordb– Under knowledge, the vectordb sub-package integrates specific vector database backends. For instance,phi.vectordb.qdranthas aQdrantclass that implements storing embeddings and doing similarity search. Likewise,phi.vectordb.lancedbprovides aLanceDbclass. These classes abstract the details of connecting to the database, creating collections/tables, and querying. They are used by KnowledgeBase classes when loading or searching data. By isolating vector DB logic here, one could swap out Qdrant for Pinecone or FAISS by implementing a new class without changing the Agent logic. -
embeddings– Since knowledge bases require text embeddings, the framework includes an embeddings module. Here, you’ll find classes likeOpenAIEmbedder(to call OpenAI’s embedding API), possibly HuggingFace or local embedding models, etc. The knowledge loaders use these to convert text chunks into vectors. The separation of embedders means you can choose different embedding models (trade off quality vs cost) easily by plugging a different embedder into your knowledge config. -
memory/storage– The naming is a bit nuanced: thememoryconcept is mostly handled within the Agent, but the storage module contains implementations for persistent memory storage. Underphi.storage.agentthere are classes likePgAgentStorage(Postgres) andSqlAgentStorage(SQLite) for saving session data. These use databases to store chat histories, agent state, etc., typically via ORMs or direct SQL. For example,PgAgentStoragelikely defines a table schema (with session_id, messages, etc.) and methods to write/read runs. By providing these storage classes, the framework allows easy configuration of persistence – you instantiate (with DB connection info) and pass it to the Agent. Thestoragepackage might also contain other types (e.g., a memory storage that is just in-memory or file-based) and possibly storage for knowledge or other artifacts. In short, if it involves saving to disk or DB, it lives instorage. (Note: The docs also mention “Workflows” storage and such – but those are beyond the agent’s immediate scope.) -
workflows– Phidata also has a concept of Workflow for chaining agents/tools in a predetermined logic. If present,phi.workflowsmight allow defining a class inheriting from aWorkflowbase, where you add agents and implement arun()method manually (the Project’s docs hint at this approach). This is more for orchestrating multiple steps or scheduling tasks (e.g., an agent asks another agent and then sends an email). The existence of a workflows module indicates the framework’s flexibility to handle not just interactive Q&A, but also automations – one could deploy a workflow that triggers on events and uses agents internally. The question focus is the agent framework, so we note workflows as an extension for complex pipelines. -
Other modules: There are likely additional utility modules, for example
chunking(as listed in docs) for text splitting logic, and perhaps autilsor logging module. The project repository also shows directories like cookbook and tests. The cookbook contains example agent scripts and templates (for instance, predefined agent configurations for common tasks – the docs’ “Examples” like Recipe Creator Agent, Shopping Agent, etc., correspond to scripts in the cookbook). This helps developers by providing ready-to-run demos. The tests directory contains unit tests and performance benchmarks; e.g., there are scripts to measure instantiation time and memory usage vs other frameworks, ensuring that the performance claims are continuously validated.
In terms of coding patterns, Phidata emphasizes configuration via class
attributes and dataclasses/Pydantic models rather than imperative code. For
example, instead of writing code to manually manage prompt strings each time,
you instantiate an Agent with a set of parameters and the framework internally
uses those to compose prompts. The heavy use of default parameters (with
sensible defaults for things like search_knowledge, markdown, max_loops
for reasoning, etc.) means an Agent can be spun up with minimal code. The
snippet below from an official example demonstrates how many components come
together in a structured way:
1from phi.agent import Agent
2from phi.model.openai import OpenAIChat
3from phi.tools.duckduckgo import DuckDuckGo
4from phi.storage.agent.postgres import PgAgentStorage
5
6agent = Agent(
7 name="Web Assistant",
8 model=OpenAIChat(id="gpt-4o"),
9 tools=[DuckDuckGo()],
10 storage=PgAgentStorage(table_name="agent_sessions", db_url="<postgres-uri>"),
11 show_tool_calls=True,
12 markdown=True
13)
Even without knowing the internals, one can infer the code structure from such
an example: phi.agent.Agent ties together a model
(phi.model.openai.OpenAIChat), a tool from phi.tools, and a storage from
phi.storage.agent. This reflects a clean folder hierarchy where each
sub-package corresponds to a feature area (Agent, Model, Tools, Knowledge,
etc.), making the codebase quite navigable.
Use of Pydantic and Data Models: One notable implementation detail in the
code is the integration of Pydantic for data modeling. Phidata allows agents
to enforce output structure via response_model (alias output_model) which is
defined as a Pydantic BaseModel. In the code, the Agent likely has an
attribute response_model: Type[BaseModel] = None and a flag
structured_outputs: bool. When structured outputs are enabled, the agent’s
logic will either call the model in a mode that yields JSON or will parse the
final text through response_model.parse_obj. For example, in the Structured
Output example, a MovieScript model is defined (subclass of BaseModel with
fields for setting, ending, genre, etc.) and passed to the Agent. The agent then
produces a result that can be accessed as a Pydantic object. Internally, this
suggests the Agent class is itself possibly a Pydantic model for convenience
(defining all these fields with types helps with validation and defaults).
Indeed, community references indicate class Agent(BaseModel): ... to define
the agent’s schema (attributes like llm, tools, memory etc.) in code. This
approach means much of the agent’s configuration is declarative. It also means
the agent (or rather its config/state) can be easily serialized (to JSON, for
logging or monitoring) and that the framework can rely on Pydantic’s validation
for inputs/outputs. So, the code style is a blend of object-oriented (for the
operational logic) and data-model driven (for configuration and I/O).
Dependencies: Under the hood, the framework uses a number of dependencies
which are reflected in the code structure. For example, HTTPX is used for web
requests in tools (as seen in the HackerNews tool example using httpx.get).
SQLAlchemy or psycopg2 might be used in storage classes (for database access).
The OpenAI API (via the openai Python package) is a dependency for the
OpenAIChat model. Vector DB clients like qdrant-client or lancedb are
dependencies for those connectors. These are organized within their modules
(e.g., the Qdrant class uses the qdrant-client under the hood, LanceDB class
uses the lancedb Python SDK). The design favors composition over invention
– Phidata doesn’t reimplement a vector store or an LLM; it wraps existing
services in a unified interface. This keeps the code relatively lean: each
module is a thin integration layer, and the Agent class ties them together with
the logic of prompting and function-calling.
In summary, the code structure of Phidata/Agno is logically separated by feature
and mirrors the conceptual architecture: an Agent class orchestrating Models,
Tools, Knowledge, and Storage, each implemented in its own module. The use of
Pydantic models and Pythonic constructs makes the code approachable and
maintainable, allowing developers to reason about each part (e.g., check the
phi.tools directory for available actions, or the phi.vectordb for how to
plug in a new DB). This modular design also facilitates testing – each piece can
be tested in isolation (and indeed the repository’s tests likely include
separate tests for tool functionality, model integration, etc.). Overall, the
project’s structure underscores its goals of simplicity and flexibility,
enabling contributions or custom extensions without needing to modify a
monolithic core.
Implementation Details
Tool Invocation via Function Calling: One of the standout implementation
choices in Phidata/Agno is how it handles tool use. Instead of relying on the
LLM to output some special string that the framework then has to parse (as older
agent frameworks did), Phidata leverages OpenAI’s function calling API (and
analogous mechanisms for other models where possible). When you attach a tool to
an Agent, under the hood it’s represented as a function schema that the
model can call. The agent sets tool_choice="auto" by default, meaning the
model is free to decide if a tool should be called or not. When the model
chooses to use a tool, it actually returns a JSON object indicating the function
name and arguments, rather than a textual instruction. The framework’s code is
implemented to catch this response. Concretely, in the Agent’s run() method,
after sending the prompt to the model, it likely checks the response: if the
model’s output contains a function_call field (as the OpenAI API would for
function calls), the Agent parses out which tool was requested and the
parameters. It then calls the corresponding Python function (which could be a
simple def or a method on a Toolkit class) with those parameters. The result
(Python object) is converted to a string or JSON serializable form. Then the
Agent sends that back to the model in a new API call, but importantly, it labels
it as the result of the function. The implementation likely uses the model’s
messages list – appending an assistant message like: role: “function”, name:
“ToolName”, content: “{... result ...}”. This informs the LLM of the outcome.
The model then continues. All of this is done in a loop within Agent.run():
check for function_call, execute, append result, and query again, until a
final message of role “assistant” with no function call is obtained. The
pseudocode from a similar ReAct agent (not directly from Phidata, but
illustrating the approach) shows a loop with decide_next_action() parsing the
model output for either a tool name or a final answer, and repeating
accordingly. Phidata’s implementation aligns with this pattern but simplifies it
using the model’s native JSON output to avoid regex parsing. This design yields
a robust and safe execution loop – the model can’t trick the agent into doing
something unintended without explicitly calling a known function, and the
arguments are parsed by JSON (avoiding misreading text). It’s a modern design
choice that puts Phidata in line with current best practices for LLM agents.
Task Handling and Reasoning: The reasoning capability in Phidata is worth
noting. By setting reasoning=True on an Agent, an experimental feature is
enabled where the agent will “think” through a problem in multiple steps
internally before finalizing an answer. Implementation-wise, this likely means
the agent’s system prompt is augmented with something that triggers
chain-of-thought. Possibly it uses a hidden intermediate step where the model is
asked to produce a step-by-step solution which is then validated or trimmed. The
documentation mentions that reasoning combines CoT (Chain-of-Thought) and tool
use, and is currently limited to OpenAI models. This suggests the code might
prompt the model with something like “Think step by step and propose a solution,
but do not reveal the reasoning until final” and then capture the model’s
intermediate thoughts via function calls or a special token. There might even be
an internal tool for “self-reflection” – but from an implementation stance, one
can imagine the agent doing multiple model calls: one to get a draft reasoning,
another to verify or refine it, then the final answer. Since it’s experimental,
the code likely has guarded logic (maybe the agent class has a method
_run_reasoning() that wraps around the normal run loop). The important part is
that this feature shows Phidata’s extensibility: they can bolt on new patterns
(like self-consistency or self-critique loops) without breaking the core
architecture. They explicitly caution it breaks ~20% of the time, meaning the
implementation isn’t foolproof yet – a sign that it’s a work in progress and
probably disabled unless requested.
Memory Implementation: Phidata’s approach to memory is both in-memory and
persistent. Each Agent likely has an AgentMemory object which stores a list of
past AgentRun objects (each containing input, output, maybe tool calls) and
raw message logs. The in-memory list can be used for quick access (e.g., to
retrieve the last user query). The implementation probably keeps memory trim –
e.g., storing at most N recent messages if not using storage. When storage
(database) is attached via storage=..., the agent still uses the in-memory for
the current session, but additionally writes each run to the DB. On a new
session (or agent restart), it can load runs from the DB to reconstruct memory.
The AgentMemory could be a Pydantic model or a simple class with methods like
add_message(), get_recent_history(count) etc. For user memory (personal
notes) and summary, the agent likely uses additional fields – possibly the agent
has a user_memory: Dict[str, Any] and summary: str that can be managed via
tools or explicitly by the developer. Interestingly, the docs mention a tool
for reading chat history (read_chat_history=True) which adds a tool allowing
the model to fetch older messages on demand. Implementation-wise, if that flag
is on, the Agent registers a function (maybe called read_history) that when
invoked by the model will return a chunk of prior conversation (maybe summarized
if long). Similarly, read_tool_call_history=True provides a function to list
what tools have been used so far. These are clever features implemented as just
more tools – the agent doesn’t automatically dump the entire history in the
prompt, but the model can pull it if needed by calling that function. The code
for these tools likely lives in the agent class or a submodule and is added to
the tool list internally. It’s a neat, modular implementation of memory
retrieval.
Structured Outputs: The framework’s support for structured outputs is
implemented via a combination of prompt techniques and Pydantic parsing. When
structured_outputs=True on an agent with a given response_model, the agent
will aim to get the model’s final answer in JSON format. One method (as hinted
by code) is using a special model id like "gpt-4o-2024-08-06" – which might
correspond to an OpenAI model snapshot that had function-calling or better
formatting. Possibly Phidata internally calls the OpenAI API with a “function”
representing the output schema. For example, they might register a
pseudo-function like output_schema() with parameters matching the Pydantic
model fields, so that the model, instead of free-form answering, will choose to
call output_schema and thus return a JSON object that the API directly gives
to the agent (as a function result). This is speculation, but it aligns with how
one might enforce structure using function calling. Alternatively, the agent
could append an instruction: “Provide the answer in a JSON format matching this
schema: {…}” and then use Pydantic to parse the result. In either case, the
Agent’s implementation will take the model’s output (string or function-call
dict) and do response_model.parse_raw(...) or similar to populate a Pydantic
object. If parsing fails (e.g., model’s JSON was invalid), the agent might even
retry or fix brackets – robust implementation would account for minor model
deviations. The presence of a RunResponse class with a .content that can be
either text or an object suggests the code treats the final output
polymorphically. This feature demonstrates a design optimization: by using
Pydantic, they get automatic data validation and conversion. A developer’s
Pydantic model might have type hints (like date: datetime), and if the model
returns a date string, Pydantic will auto-convert it. This reduces the need for
custom parsing code in the framework.
Performance Optimizations: Phidata’s creators paid attention to performance at the implementation level. A few likely optimizations:
- Lazy initialization: Tools and knowledge bases might not load heavy
resources until used. For example, if you attach a PDF knowledge base, it
might delay embedding the documents until you call
load()explicitly (as seen in an example where they callagent.knowledge.load()once, then can reuse it for queries). This prevents slow setup when it’s not needed. - Minimal wrappers: The Agent class doesn’t create intermediate objects for each step (some frameworks create a new “chain” object per query, etc.). In Phidata, the Agent and Model instances are long-lived and reused for calls, and the internal loop simply uses Python control flow. The memory and knowledge retrieval are straightforward list/dict operations or single queries to a DB – nothing computationally heavy on the Python side. As a result, the overhead per call is low, nearly just the API calls to the LLM and tools execution. This is why they report “Agent creation is 6000x faster than LangGraph” and very low memory use – their agent creation likely just sets up some Python objects without any network calls or complex object graphs, whereas other frameworks might construct multi-step pipelines on init. They also possibly reuse HTTP sessions for tools (e.g., the HTTPX client could be persisted) to avoid reinit costs.
- Concurrent design: While not explicitly stated, the design could allow
concurrent agent calls if each call is independent. For example, one could
spin up multiple Agents (each with its own model instance or API key) to
handle parallel queries. Nothing in the implementation fundamentally prevents
it, since each agent encapsulates its state. Moreover, since it’s just Python
code orchestrating API calls, you could use
asyncioor multi-threading with these agents. The absence of external state or singletons in the core code suggests good thread-safety – an important but subtle implementation detail.
Monitoring and UI Hooks: Phidata includes an optional cloud component
(phidata.app) and an Agent UI for chatting with agents. In the code, this likely
corresponds to a flag like monitoring=True that, when enabled, causes the
agent to send logs or telemetry to the Phidata cloud. The docs imply that with
monitoring on, each agent run is logged to an online dashboard (Agent Monitoring
Platform). Implementation-wise, this could be as simple as an HTTP POST in the
Agent’s run() to an API endpoint with the conversation data (probably using
the same Pydantic models to serialize). The UI (playground) might use a
WebSocket or polling to fetch these messages in real-time. The framework’s code
is designed such that these are optional – if you don’t enable monitoring or UI,
none of those network calls happen. This modularity is likely achieved by having
the Agent check a config and call a Logger or Tracker class if present.
In conclusion, the implementation of Phidata (Agno) balances simplicity with
advanced capabilities. It builds on Python standards (data classes, function
calls) and modern LLM API features (function calling, Pydantic integration) to
provide a streamlined yet powerful agent runtime. Key design patterns include:
the ReAct loop implemented via function calls, the use of composition (Agent
containing Tools/Knowledge objects) and not inheritance for capabilities,
extensive use of configuration models, and an emphasis on defaults that “just
work”. This means a developer can do a lot with a few lines of code, as the
framework’s implementation handles the intricate parts like prompt assembly,
tool execution, context management, error handling (e.g., unknown tool requests)
and so on. All these are done in a way that is transparent – with
show_tool_calls=True, you can even see in the output when a tool was used and
what it returned, which is great for debugging. The combination of a clear
architecture, logical code structure, and thoughtful implementation details
(like leveraging JSON function calls and Pydantic models) makes Phidata/Agno a
robust yet developer-friendly framework for building AI agents. It exemplifies
how to harness the power of LLM “reasoning + acting” while abstracting the
boilerplate, allowing engineers to focus on crafting the right prompts, tools,
and knowledge to solve their domain problems.
References:
- Phidata (Agno) Official Documentation – Introduction & Concepts, Tools, Knowledge, Memory & Storage, Teams, Reasoning.
- Phidata GitHub Repository (agno-agi/agno) – README and Examples, Code Imports Illustration.
- Analytics Vidhya – “Building an Agentic RAG with Phidata” (Tarun R. Jain, 2024).
- Dev.to – “Building AI Agents with Agno (Phidata) – Tutorial” (Mehmet Akar, 2023).
- Medium – “Phidata: An open-source platform to build, ship and monitor agentic systems” (Shravan Koninti, 2023). (Plus various code examples and configuration snippets from official sources as cited throughout.)