
Overview

The Evaluator-Optimizer workflow implements iterative refinement through a feedback loop between two specialized agents: a generator that produces content and an evaluator that assesses quality and provides actionable feedback. The loop continues until the content meets the quality threshold or the maximum number of refinements is reached.
This pattern is inspired by Anthropic’s “Building Effective Agents” research and is ideal for high-quality content generation.

Key Features

  • Iterative Refinement: Continuous improvement through feedback loops
  • Quality-Driven: Stops when content meets specified quality rating
  • Structured Evaluation: Uses Pydantic models for consistent feedback
  • Actionable Feedback: Evaluator provides specific improvement areas
  • Best Response Tracking: Returns highest-quality result even if target not reached

Basic Usage

import asyncio
from fast_agent import FastAgent

fast = FastAgent("Evaluator-Optimizer")

# Define generator agent
@fast.agent(
    name="generator",
    instruction="""You are a career coach specializing in cover letter writing.
    You are tasked with generating a compelling cover letter given the job posting,
    candidate details, and company information. Tailor the response to the company 
    and job requirements.""",
    servers=["fetch"],
    model="gpt-5-nano.low",
    use_history=True,
)

# Define evaluator agent
@fast.agent(
    name="evaluator",
    instruction="""Evaluate the following response based on the criteria below:
    1. Clarity: Is the language clear, concise, and grammatically correct?
    2. Specificity: Does the response include relevant and concrete details?
    3. Relevance: Does the response align with the prompt?
    4. Tone and Style: Is the tone professional and appropriate?
    5. Persuasiveness: Does the response effectively highlight value?
    6. Grammar and Mechanics: Are there spelling or grammatical issues?
    7. Feedback Alignment: Has the response addressed previous feedback?

    For each criterion:
    - Provide a rating (EXCELLENT, GOOD, FAIR, or POOR)
    - Offer specific feedback or suggestions for improvement

    Summarize your evaluation with overall quality rating and specific feedback.""",
    model="o3-mini.medium",
)

# Define the evaluator-optimizer workflow
@fast.evaluator_optimizer(
    name="cover_letter_writer",
    generator="generator",
    evaluator="evaluator",
    min_rating="EXCELLENT",
    max_refinements=3,
)
async def main() -> None:
    async with fast.run() as agent:
        job_posting = (
            "Software Engineer at LastMile AI. Responsibilities include developing AI systems, "
            "collaborating with cross-functional teams, and enhancing scalability. Skills required: "
            "Python, distributed systems, and machine learning."
        )
        candidate_details = (
            "Alex Johnson, 3 years in machine learning, contributor to open-source AI projects, "
            "proficient in Python and TensorFlow. Motivated by building scalable AI systems."
        )
        company_information = (
            "Look up from the LastMile AI About page: https://lastmileai.dev/about"
        )

        await agent.cover_letter_writer.send(
            f"Write a cover letter for the following job posting: {job_posting}\n\n"
            f"Candidate Details: {candidate_details}\n\n"
            f"Company information: {company_information}",
        )

if __name__ == "__main__":
    asyncio.run(main())

Configuration Parameters

  • name (string, required): Name of the evaluator-optimizer workflow
  • generator (string, required): Name of the content generator agent
  • evaluator (string, required): Name of the evaluator agent
  • min_rating (QualityRating, default: "GOOD"): Minimum acceptable quality rating: EXCELLENT, GOOD, FAIR, or POOR
  • max_refinements (int, default: 3): Maximum number of refinement iterations
  • refinement_instruction (string, optional): Custom instruction for the refinement process

Quality Ratings

The workflow uses a structured quality rating system:
class QualityRating(str, Enum):
    POOR = "POOR"          # Major improvements needed
    FAIR = "FAIR"          # Several improvements needed
    GOOD = "GOOD"          # Minor improvements possible
    EXCELLENT = "EXCELLENT" # No improvements needed
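The members carry an implicit order from POOR up to EXCELLENT, which is what the min_rating threshold relies on. A minimal sketch of such a threshold check (RATING_ORDER and meets_threshold are illustrative helpers, not part of the framework's API):

```python
from enum import Enum

class QualityRating(str, Enum):
    POOR = "POOR"
    FAIR = "FAIR"
    GOOD = "GOOD"
    EXCELLENT = "EXCELLENT"

# Hypothetical ordering used to compare a rating against min_rating
RATING_ORDER = [QualityRating.POOR, QualityRating.FAIR,
                QualityRating.GOOD, QualityRating.EXCELLENT]

def meets_threshold(rating: QualityRating, min_rating: QualityRating) -> bool:
    """Return True when the evaluator's rating is at or above the threshold."""
    return RATING_ORDER.index(rating) >= RATING_ORDER.index(min_rating)

print(meets_threshold(QualityRating.GOOD, QualityRating.GOOD))       # True
print(meets_threshold(QualityRating.FAIR, QualityRating.EXCELLENT))  # False
```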

Evaluation Response Format

class EvaluationResult(BaseModel):
    rating: QualityRating
    feedback: str
    needs_improvement: bool
    focus_areas: List[str]
Example evaluation:
{
  "rating": "GOOD",
  "feedback": "Cover letter demonstrates strong technical alignment but could better highlight specific achievements and company research.",
  "needs_improvement": true,
  "focus_areas": [
    "Add quantifiable achievements from past projects",
    "Reference specific LastMile AI products or initiatives",
    "Strengthen the closing call-to-action"
  ]
}
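Because EvaluationResult is a Pydantic model, an evaluator reply like the JSON above can be validated directly. A sketch, assuming Pydantic v2 (model_validate_json raises a ValidationError on malformed or out-of-range values):

```python
from enum import Enum
from typing import List
from pydantic import BaseModel

class QualityRating(str, Enum):
    POOR = "POOR"
    FAIR = "FAIR"
    GOOD = "GOOD"
    EXCELLENT = "EXCELLENT"

class EvaluationResult(BaseModel):
    rating: QualityRating
    feedback: str
    needs_improvement: bool
    focus_areas: List[str]

raw = ('{"rating": "GOOD", '
       '"feedback": "Strong technical alignment but needs specifics.", '
       '"needs_improvement": true, '
       '"focus_areas": ["Add quantifiable achievements"]}')

# Parse and validate the evaluator's JSON reply in one step (Pydantic v2 API)
result = EvaluationResult.model_validate_json(raw)
print(result.rating, result.needs_improvement)
```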

How It Works

Workflow Steps:
  1. Initial Generation: Generator creates first version
  2. Evaluation: Evaluator assesses quality and provides feedback
  3. Quality Check: Compare rating against minimum threshold
  4. Refinement: If needed and refinements remain, generator improves based on feedback
  5. Iteration: Repeat evaluation and refinement
  6. Completion: Return best result when quality met or max refinements reached
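The steps above can be sketched as a plain control loop. This is a simplified illustration of the pattern, not the framework's actual implementation; the generate and evaluate callables and the rating list are stand-ins:

```python
from dataclasses import dataclass

RATING_ORDER = ["POOR", "FAIR", "GOOD", "EXCELLENT"]

@dataclass
class Evaluation:
    rating: str
    feedback: str

def refine_loop(generate, evaluate, min_rating="GOOD", max_refinements=3):
    """Run the generate -> evaluate -> refine cycle, keeping the best draft seen."""
    draft = generate(feedback=None)              # 1. initial generation
    best_draft, best_rank = draft, -1
    threshold = RATING_ORDER.index(min_rating)
    for attempt in range(max_refinements + 1):
        ev = evaluate(draft)                     # 2. evaluation
        rank = RATING_ORDER.index(ev.rating)
        if rank > best_rank:                     # track the best response so far
            best_draft, best_rank = draft, rank
        if rank >= threshold or attempt == max_refinements:
            break                                # 3/6. quality met or budget spent
        draft = generate(feedback=ev.feedback)   # 4/5. refine and re-evaluate
    return best_draft

# Toy generator/evaluator: the draft improves once feedback is applied
def generate(feedback=None):
    return "v2" if feedback else "v1"

def evaluate(draft):
    return Evaluation("EXCELLENT" if draft == "v2" else "FAIR", "add detail")

print(refine_loop(generate, evaluate, min_rating="EXCELLENT"))  # v2
```

Note that even when the threshold is never reached, the loop returns the highest-rated draft, matching the best-response-tracking behavior described above.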

Generator History Modes

The generator’s use_history setting affects refinement:

With History (use_history=True)

@fast.agent(
    name="generator",
    instruction="Your instruction here",
    use_history=True,  # Generator sees conversation context
)
Refinement prompt references feedback conversationally:
You are tasked with improving your previous response.
This is iteration 2 of the refinement process.

<fastagent:feedback>
  <rating>GOOD</rating>
  <details>Strong technical content but needs more specific examples</details>
  <focus-areas>
    * Add quantifiable achievements
    * Reference specific company initiatives
  </focus-areas>
</fastagent:feedback>

Create an improved version...

Without History (use_history=False)

The previous iteration’s full output is included in refinement prompt.
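A minimal sketch of how a history-free refinement prompt might be assembled. The exact template is internal to the framework; the previous-response tag name here is an assumption, while the feedback tags mirror the example above:

```python
def build_refinement_prompt(previous_output: str, rating: str,
                            feedback: str, iteration: int) -> str:
    """Embed the full previous output, since a history-free generator has no context."""
    return (
        f"You are tasked with improving the response below.\n"
        f"This is iteration {iteration} of the refinement process.\n\n"
        # Hypothetical tag name; the real template may differ
        f"<fastagent:previous-response>\n{previous_output}\n</fastagent:previous-response>\n\n"
        f"<fastagent:feedback>\n"
        f"  <rating>{rating}</rating>\n"
        f"  <details>{feedback}</details>\n"
        f"</fastagent:feedback>\n\n"
        f"Create an improved version..."
    )

prompt = build_refinement_prompt("Dear Hiring Manager...", "GOOD",
                                 "Add specific achievements", 2)
print("<fastagent:previous-response>" in prompt)  # True
```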

Advanced Examples

Research Report Generator

@fast.agent(
    "researcher",
    instruction="""Research a topic thoroughly using available sources.
    Produce a comprehensive, well-cited report with clear sections.""",
    servers=["fetch"],
    model="sonnet",
    use_history=True,
)
@fast.agent(
    "research_evaluator",
    instruction="""Evaluate research reports on:
    1. Depth of research and source quality
    2. Accuracy and factual correctness
    3. Organization and structure
    4. Citation quality and completeness
    5. Clarity and readability
    
    Provide detailed, actionable feedback for improvement.""",
    model="sonnet",
)
@fast.evaluator_optimizer(
    name="research_assistant",
    generator="researcher",
    evaluator="research_evaluator",
    min_rating="EXCELLENT",
    max_refinements=5,
)
async def main() -> None:
    async with fast.run() as agent:
        await agent.research_assistant.send(
            "Produce a comprehensive report on the environmental impact of cryptocurrency mining"
        )

Code Quality Improver

@fast.agent(
    "code_generator",
    instruction="""Generate clean, well-documented Python code that solves the given problem.
    Follow PEP 8 style guidelines and include docstrings.""",
    servers=["filesystem"],
    use_history=True,
)
@fast.agent(
    "code_reviewer",
    instruction="""Review code for:
    1. Correctness and bug-free implementation
    2. Code style and PEP 8 compliance
    3. Documentation quality
    4. Performance and efficiency
    5. Error handling
    6. Test coverage
    
    Provide specific line-level feedback.""",
    model="sonnet",
)
@fast.evaluator_optimizer(
    name="code_improver",
    generator="code_generator",
    evaluator="code_reviewer",
    min_rating="GOOD",
    max_refinements=4,
)
async def main() -> None:
    async with fast.run() as agent:
        await agent.code_improver.send(
            "Create a Python class for managing a connection pool with automatic retry logic"
        )

Marketing Copy Optimizer

@fast.agent(
    "copywriter",
    instruction="""Write compelling marketing copy that engages the target audience,
    highlights key benefits, and includes a strong call-to-action.""",
    use_history=True,
)
@fast.agent(
    "copy_critic",
    instruction="""Evaluate marketing copy on:
    1. Attention-grabbing headline
    2. Clear value proposition
    3. Emotional resonance
    4. Call-to-action effectiveness
    5. Brand voice alignment
    6. Grammar and readability
    
    Focus on conversion optimization.""",
)
@fast.evaluator_optimizer(
    name="copy_optimizer",
    generator="copywriter",
    evaluator="copy_critic",
    min_rating="EXCELLENT",
    max_refinements=3,
)
async def main() -> None:
    async with fast.run() as agent:
        await agent.copy_optimizer.send(
            "Write landing page copy for a new AI-powered project management tool"
        )

Custom Refinement Instructions

Provide domain-specific guidance for the refinement process:
CUSTOM_REFINEMENT = """
You are an academic writing specialist.
Each refinement should strengthen:
1. Thesis clarity and argumentation
2. Evidence quality and citation accuracy
3. Academic tone and formality
4. Logical flow between paragraphs
5. Critical analysis depth
"""

@fast.evaluator_optimizer(
    name="academic_writer",
    generator="writer",
    evaluator="evaluator",
    refinement_instruction=CUSTOM_REFINEMENT,
    min_rating="EXCELLENT",
    max_refinements=4,
)

Best Practices

Specific Evaluator Instructions

Define clear, measurable evaluation criteria for consistent feedback

Actionable Feedback

Ensure evaluator provides specific, implementable improvement suggestions

Appropriate Refinement Limits

Balance quality goals with cost - typically 3-5 refinements

Generator History Management

Use history mode when refinements build on conversation context

Performance Considerations

Cost Scaling: Each refinement iteration adds two LLM calls (one generator call plus one evaluator call). Example with max_refinements=3:
  • Initial: 2 calls (generate + evaluate)
  • Refinement 1: 2 calls
  • Refinement 2: 2 calls
  • Refinement 3: 2 calls
  • Total: 8 LLM calls
Use appropriate max_refinements and consider using cheaper models for initial iterations.
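The arithmetic above can be expressed as a tiny helper (total_llm_calls is illustrative, not a framework function):

```python
def total_llm_calls(refinements_used: int) -> int:
    """Each iteration (the initial pass plus each refinement) costs 2 calls:
    one generate and one evaluate."""
    return 2 * (1 + refinements_used)

print(total_llm_calls(3))  # 8
print(total_llm_calls(0))  # 2 (initial generation and evaluation only)
```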

Tracking Refinement History

Access the refinement history for debugging or analysis:
result = await agent.my_optimizer.send("Generate content...")

# Access refinement history
history = agent.my_optimizer.refinement_history
for iteration in history:
    print(f"Attempt {iteration['attempt']}")
    print(f"Rating: {iteration['evaluation']['rating']}")
    print(f"Feedback: {iteration['evaluation']['feedback']}")

Use Cases

  • Content Creation: Blog posts, articles, marketing copy
  • Technical Writing: Documentation, reports, specifications
  • Creative Writing: Stories, scripts, poetry with quality standards
  • Code Generation: Iteratively improve code quality and style
  • Academic Writing: Research papers, essays, thesis work
  • Legal Documents: Contracts, policies with accuracy requirements
  • Translation: Improve translation quality through feedback
  • Resume/Cover Letters: Personalized, high-quality job applications
Comparison with Related Patterns

| Feature         | Evaluator-Optimizer | Chain       | MAKER            | Orchestrator  |
|-----------------|---------------------|-------------|------------------|---------------|
| Feedback Loop   | ✅ Iterative        | ❌ One-shot | ❌ Voting only   | ❌ Linear     |
| Quality Control | ✅ Explicit         | ❌ None     | ✅ Statistical   | ❌ None       |
| Refinement      | ✅ Guided           | ❌ None     | ❌ None          | ❌ None       |
| Agent Count     | 2 (gen + eval)      | Multiple    | 1 + wrapper      | Multiple      |
| Best For        | Quality content     | Pipelines   | High reliability | Complex tasks |
  • MAKER - Statistical reliability through voting (different quality approach)
  • Chain - Multi-stage processing without feedback loops
  • Orchestrator - Complex task decomposition without iterative refinement