Overview
The LlmAgent is the foundational agent class in Fast Agent that provides LLM interaction, conversation management, and rich console display capabilities. It serves as the base for all other agent types.
LlmAgent is typically used as a base class. For practical applications, consider using ToolAgent or McpAgent which extend LlmAgent with additional capabilities.
Key Capabilities
Message Management: multi-turn conversations with history tracking
Streaming Display: real-time response rendering with markdown support
Stop Handling: graceful handling of completion reasons
Usage Tracking: token counting and cost estimation
Class Hierarchy
LlmAgent
↓ extends
LlmDecorator
↓ uses
FastAgentLLMProtocol
LlmAgent: UI and interaction layer
LlmDecorator: core LLM logic and operations
FastAgentLLMProtocol: provider interface (OpenAI, Anthropic, etc.)
Source Code Reference
The LlmAgent class is defined in:
src/fast_agent/agents/llm_agent.py
Key methods at:
generate_impl(): src/fast_agent/agents/llm_agent.py:543
show_assistant_message(): src/fast_agent/agents/llm_agent.py:109
structured_impl(): src/fast_agent/agents/llm_agent.py:699
Basic Usage
Creating an Agent
import asyncio
from fast_agent.agents.agent_types import AgentConfig
from fast_agent.agents.llm_agent import LlmAgent
from fast_agent.core import Core
from fast_agent.llm.model_factory import ModelFactory
async def main():
    # Initialize Fast Agent core
    core = Core()
    await core.initialize()

    # Configure the agent
    config = AgentConfig(
        name="assistant",
        instruction="You are a helpful AI assistant.",
        model="gpt-4o-mini",
    )

    # Create and initialize agent
    agent = LlmAgent(config, context=core.context)
    await agent.attach_llm(ModelFactory.create_factory("gpt-4o-mini"))

    # Send a message
    response = await agent.send("What is Fast Agent?")
    print(response)

    await core.cleanup()

asyncio.run(main())
Generating Responses
from fast_agent.core.prompt import Prompt
from fast_agent.types import PromptMessageExtended
# Method 1: Simple send
response = await agent.send("Hello!")

# Method 2: Generate with message list
messages = [
    Prompt.user("Explain Python decorators"),
]
response = await agent.generate(messages, None)
text = response.first_text()

# Method 3: Full message control
message = PromptMessageExtended(
    role="user",
    content="What's 2+2?"
)
response = await agent.generate([message], None)
Message Display
Display Properties
The agent uses ConsoleDisplay for rich terminal output:
# Access display component
display = agent.display

# Check if streaming is enabled
enabled, mode = display.resolve_streaming_preferences()
print(f"Streaming: {enabled}, Mode: {mode}")
# Output: Streaming: True, Mode: markdown
Custom Message Display
from rich.text import Text

# Display assistant message with custom formatting
response_message = PromptMessageExtended(
    role="assistant",
    content="Here's the answer..."
)

await agent.show_assistant_message(
    response_message,
    name="MyAgent",
    model="gpt-4o",
    additional_message=Text("\nProcessed in 1.2s", style="dim"),
    render_markdown=True
)
User Message Display
user_message = PromptMessageExtended(
    role="user",
    content="Analyze this code"
)
agent.show_user_message(user_message)
Conversation History
Accessing History
# Get full conversation history
history = agent.message_history

for msg in history:
    print(f"{msg.role}: {msg.content[:50]}...")
    if msg.tool_calls:
        print(f"  Tool calls: {len(msg.tool_calls)}")
Managing History
# Clear conversation history
agent.clear()

# Clear history and prompts
agent.clear(clear_prompts=True)

# Load custom history
from fast_agent.types import PromptMessageExtended

conversation = [
    PromptMessageExtended(role="user", content="Hello"),
    PromptMessageExtended(role="assistant", content="Hi! How can I help?"),
    PromptMessageExtended(role="user", content="Tell me about Fast Agent"),
]
agent.load_message_history(conversation)
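Loaded history often comes from persistent storage. The helpers below are an illustrative sketch, not part of Fast Agent; they assume each message exposes `role` and `content` attributes as in the examples above:

```python
import json

def history_to_json(history):
    # Flatten message objects into plain dicts for storage.
    return json.dumps([{"role": m.role, "content": m.content} for m in history])

def history_from_json(payload, message_factory):
    # Rebuild messages with a factory such as PromptMessageExtended.
    return [
        message_factory(role=d["role"], content=d["content"])
        for d in json.loads(payload)
    ]
```

Passing `PromptMessageExtended` as `message_factory` yields a list suitable for `agent.load_message_history()`.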
History Configuration
config = AgentConfig(
    name="stateless_agent",
    instruction="Answer questions without context.",
    use_history=False  # Disable history tracking
)
Streaming Responses
Automatic Streaming
Streaming is enabled by default and handled automatically:
# This will stream the response token-by-token
response = await agent.send(
    "Write a long story about a space explorer"
)
Controlling Streaming
# Disable streaming for next turn only
agent.force_non_streaming_next_turn(reason="debugging output")
response = await agent.send("Generate code")
# This response won't stream

response = await agent.send("Another request")
# Streaming resumes
Streaming Internals
# Check if streaming is active
if agent._active_stream_handle:
    print("Currently streaming")

# Close active streaming display
agent.close_active_streaming_display(reason="starting parallel operation")
Stop Reason Handling
Understanding Stop Reasons
from fast_agent.types import LlmStopReason
response = await agent.generate([Prompt.user("Hi")], None)

match response.stop_reason:
    case LlmStopReason.END_TURN:
        print("Completed normally")
    case LlmStopReason.MAX_TOKENS:
        print("Hit token limit - consider increasing max_tokens")
    case LlmStopReason.TOOL_USE:
        print("Requested tool execution")
    case LlmStopReason.SAFETY:
        print("Safety filter triggered")
    case LlmStopReason.ERROR:
        print("Error occurred during generation")
    case LlmStopReason.CANCELLED:
        print("User cancelled generation")
Error Channel Details
from fast_agent.constants import FAST_AGENT_ERROR_CHANNEL
from fast_agent.mcp.helpers.content_helpers import get_text
response = await agent.generate([Prompt.user("Test")], None)

if response.stop_reason == LlmStopReason.ERROR:
    if response.channels and FAST_AGENT_ERROR_CHANNEL in response.channels:
        error_blocks = response.channels[FAST_AGENT_ERROR_CHANNEL]
        error_text = get_text(error_blocks[0])
        print(f"Error details: {error_text}")
Structured Output
Basic Structured Generation
from pydantic import BaseModel
class Recipe(BaseModel):
    name: str
    ingredients: list[str]
    steps: list[str]
    cook_time_minutes: int

messages = [Prompt.user("Give me a recipe for chocolate chip cookies")]
recipe, message = await agent.structured(
    messages,
    Recipe,
    None
)

if recipe:
    print(f"Recipe: {recipe.name}")
    print(f"Ingredients: {len(recipe.ingredients)}")
    print(f"Cook time: {recipe.cook_time_minutes} minutes")
Complex Models
from typing import Literal
from pydantic import BaseModel, Field
class Sentiment(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float = Field(ge=0, le=1)
    key_phrases: list[str]
    explanation: str

text = """I love using Fast Agent! The documentation is clear
and the API is intuitive."""
messages = [Prompt.user(f"Analyze sentiment: {text}")]
result, _ = await agent.structured(messages, Sentiment, None)

if result:
    print(f"Sentiment: {result.sentiment} ({result.confidence:.0%})")
    print(f"Key phrases: {', '.join(result.key_phrases)}")
Usage Tracking
Accessing Usage Data
# Get usage accumulator
usage = agent.usage_accumulator

if usage:
    print(f"Input tokens: {usage.input_tokens}")
    print(f"Output tokens: {usage.output_tokens}")
    print(f"Total tokens: {usage.total_tokens}")
    print(f"Total cost: ${usage.total_cost:.4f}")
    print(f"Context window: {usage.context_usage_percentage:.1f}%")

# Get current model name
model_name = agent.llm.model_name if agent.llm else None
print(f"Using model: {model_name}")

# Check context percentage during tool calls
if agent.usage_accumulator:
    ctx_pct = agent.usage_accumulator.context_usage_percentage
    if ctx_pct and ctx_pct > 80:
        print("Warning: Approaching context window limit")
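The 80% check above can be factored into a small reusable guard. `should_trim_history` below is a sketch, not a Fast Agent API:

```python
def should_trim_history(context_pct, threshold=80.0):
    # context_pct may be None when the provider reports no usage data;
    # in that case we conservatively keep the history.
    return context_pct is not None and context_pct > threshold
```

When it returns True, calling `agent.clear()` (or summarizing older turns) frees context before the next request.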
Advanced Features
Workflow Telemetry
from fast_agent.workflow_telemetry import WorkflowTelemetryProvider
# Create custom telemetry provider
class MyTelemetry(WorkflowTelemetryProvider):
    async def emit_delegation_step(self, step_data):
        print(f"Delegation: {step_data}")

# Attach to agent
agent.workflow_telemetry = MyTelemetry()
Message Hooks
Extend LlmAgent to add custom hooks:
class CustomAgent(LlmAgent):
    async def generate_impl(
        self,
        messages,
        request_params,
        tools
    ):
        # Pre-processing hook
        print(f"Generating response for {len(messages)} messages")

        # Call parent implementation
        response = await super().generate_impl(
            messages,
            request_params,
            tools
        )

        # Post-processing hook
        print(f"Response generated: {response.stop_reason}")
        return response
Custom Display
from fast_agent.ui.console_display import ConsoleDisplay
# Create custom display
class CustomDisplay(ConsoleDisplay):
    async def show_assistant_message(self, message, **kwargs):
        # Custom rendering logic
        print(f"[CUSTOM] {message.content}")
        await super().show_assistant_message(message, **kwargs)

# Attach to agent
agent.display = CustomDisplay(config=core.context.config)
Configuration Reference
AgentConfig Parameters
config = AgentConfig(
    # Required
    name="my_agent",

    # System prompt
    instruction="You are a helpful assistant.",

    # Model selection
    model="gpt-4o-mini",

    # History management
    use_history=True,

    # Description
    description="A general-purpose AI assistant",

    # Request parameters
    default_request_params=RequestParams(
        max_tokens=4096,
        temperature=0.7,
        use_history=True
    ),

    # Agent type
    agent_type=AgentType.BASIC,

    # API configuration
    api_key="sk-...",  # Optional override
)
RequestParams
from fast_agent.types import RequestParams
params = RequestParams(
    max_tokens=2048,
    temperature=0.5,
    top_p=0.9,
    use_history=True,
    systemPrompt="Custom system message"
)

response = await agent.generate(
    [Prompt.user("Hello")],
    params
)
Best Practices
Clear history between unrelated conversations
Monitor context window usage with usage_accumulator
Use use_history=False for stateless queries
Consider history size impact on latency and cost
Leave streaming enabled for better UX
Disable streaming for debugging or testing
Close streams before starting parallel operations
Handle streaming errors gracefully
Always check stop_reason after generation
Handle MAX_TOKENS by increasing limits or summarizing
Implement retry logic for ERROR stop reasons
Log errors with context for debugging
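The retry advice above can be sketched as a generic wrapper with exponential backoff. Nothing here is a Fast Agent API: `generate` stands in for `agent.generate`, and `is_error` for a check like `response.stop_reason == LlmStopReason.ERROR`:

```python
import asyncio

async def generate_with_retry(generate, messages, is_error, retries=2, base_delay=1.0):
    # Call once, then retry while the response is flagged as an error,
    # doubling the delay between attempts.
    response = await generate(messages, None)
    for attempt in range(retries):
        if not is_error(response):
            break
        await asyncio.sleep(base_delay * (2 ** attempt))
        response = await generate(messages, None)
    return response
```

The last response is returned even if all retries fail, so callers can still inspect its stop reason and the error channel.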
Next Steps
Tool Agent: add function calling capabilities
MCP Agent: connect to MCP servers for tools and resources
LLM Providers: configure different LLM providers
Message Types: learn about message structures