Overview

MAKER (Massively decomposed Agentic processes with K-voting Error Reduction) achieves high reliability by sampling a worker agent multiple times and using “first-to-ahead-by-k” voting to select the consensus response. This pattern trades compute for accuracy, enabling cheap models to achieve reliability suitable for million-step tasks.
Based on “Solving a Million-Step LLM Task with Zero Errors” (arXiv:2511.09030)
Credit: Lucid Programmer (PR author)

Key Features

  • Statistical Consensus: Multiple samples voted to find agreement
  • First-to-ahead-by-k: Winner needs k-vote margin over alternatives
  • Red-Flagging: Discard suspicious responses before voting
  • Provable Bounds: Mathematical error guarantees based on per-step success rate
  • Cost-Effective: Cheap models with voting can replace expensive models

When to Use MAKER

Ideal Use Cases

Long chains of simple steps where rare errors compound:
  • ETL Pipelines: 1000s of row transformations - one bad parse = corrupted data
  • Code Migration: 1000s of file changes - one syntax error = build fails
  • Document Processing: 1000s of pages - one missed field = compliance failure
  • Data Validation: Millions of records - one wrong validation = bad data in prod
  • Automated Testing: 1000s of assertions - one false positive = wasted debugging
  • Cost Optimization: Cheap model + voting replaces expensive model
When NOT to use MAKER:
  • Single classifications (just use a good model - 95% accuracy is fine)
  • Creative/open-ended tasks (no “correct” answer to vote on)
  • Complex reasoning (need smarter model, not more samples)
  • Tasks where occasional errors are acceptable

The Math Behind MAKER

95% per-step accuracy over 100 steps:
  0.95^100 ≈ 0.6% overall success ❌

99.9% per-step accuracy (with MAKER) over 100 steps:
  0.999^100 ≈ 90% overall success ✅

For million-step tasks:
  Even 99% per-step accuracy gives essentially zero overall success (0.99^1,000,000 ≈ 0)
  MAKER pushes effective per-step reliability high enough (99.9999%+) for the chain to survive
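The compounding arithmetic above is easy to verify directly:

```python
# Sanity-check the compounding-error arithmetic above.

def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step in a chain succeeds independently."""
    return per_step ** steps

print(f"{chain_success(0.95, 100):.4f}")   # 0.0059 -> ~0.6%
print(f"{chain_success(0.999, 100):.4f}")  # 0.9048 -> ~90%
print(chain_success(0.99, 1_000_000))      # underflows to 0.0

# Per-step accuracy needed for even a 50/50 shot at a million clean steps:
print(f"{0.5 ** (1 / 1_000_000):.8f}")     # 0.99999931
```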

Basic Usage

import asyncio
from fast_agent import FastAgent

fast = FastAgent("MAKER Example")

# Define a classifier using a cheap model (Haiku)
@fast.agent(
    name="classifier",
    model="claude-3-haiku-20240307",
    instruction="""You are a customer support intent classifier.
Classify the customer message into exactly one of: COMPLAINT, QUESTION, REQUEST, FEEDBACK.
Respond with ONLY the single word classification, nothing else.

Examples:
- "This product is broken!" → COMPLAINT
- "How do I reset my password?" → QUESTION
- "Please cancel my subscription" → REQUEST
- "Just wanted to say I love the new feature" → FEEDBACK""",
)

# Wrap with MAKER for reliable, consistent classification
@fast.maker(
    name="reliable_classifier",
    worker="classifier",
    k=3,  # Require 3-vote margin for consensus
    max_samples=10,  # Max attempts before falling back to plurality
    match_strategy="normalized",  # Ignore case/whitespace differences
    red_flag_max_length=20,  # Discard verbose responses (should be one word)
)
async def main():
    async with fast.run() as agent:
        # Classify ambiguous customer messages
        result = await agent.reliable_classifier.send("I've been waiting for 3 days now.")
        print(result)

if __name__ == "__main__":
    asyncio.run(main())

Configuration Parameters

name (string, required)
  Name of the MAKER workflow

worker (string, required)
  Name of the worker agent to sample from

k (int, default: 3)
  Voting margin required (first-to-ahead-by-k). Higher k = more reliable but more samples needed. The paper recommends k ≥ 3 for high reliability.

max_samples (int, default: 50)
  Maximum samples before falling back to a plurality vote

match_strategy (MatchStrategy, default: "exact")
  How to compare responses for voting:
  • exact: Character-for-character match
  • normalized: Ignore case/whitespace
  • structured: Parse and compare JSON

match_fn (Callable[[str], str], optional)
  Custom normalization function (overrides match_strategy)

red_flag_max_length (int, optional)
  Discard responses longer than this many characters. Per the paper, overly long responses correlate with errors.

red_flag_validator (Callable[[str], bool], optional)
  Custom validator function. Return False to red-flag (discard) the response.

How First-to-Ahead-by-k Works

Example with k=3:
Sample 1: "COMPLAINT"     Votes: {COMPLAINT: 1}
Sample 2: "COMPLAINT"     Votes: {COMPLAINT: 2}
Sample 3: "QUESTION"      Votes: {COMPLAINT: 2, QUESTION: 1}
Sample 4: "COMPLAINT"     Votes: {COMPLAINT: 3, QUESTION: 1}
Sample 5: "COMPLAINT"     Votes: {COMPLAINT: 4, QUESTION: 1}
                           Leader margin: 4 - 1 = 3 ✅
                           Winner: "COMPLAINT" (converged)
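The trace above can be sketched as a short loop. This is an illustrative reimplementation of the voting rule, not fast-agent's internals; `sample` stands in for one call to the worker agent:

```python
from collections import Counter
from typing import Callable

def first_to_ahead_by_k(
    sample: Callable[[], str],
    k: int = 3,
    max_samples: int = 50,
) -> str:
    """Sample until one answer leads all others by k votes,
    falling back to a plurality vote at max_samples."""
    votes: Counter = Counter()
    for _ in range(max_samples):
        votes[sample()] += 1
        ranked = votes.most_common(2)
        leader, leader_count = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if leader_count - runner_up >= k:
            return leader  # converged
    return votes.most_common(1)[0][0]  # plurality fallback

# Deterministic demo: replay the sample sequence from the table above.
responses = iter(["COMPLAINT", "COMPLAINT", "QUESTION", "COMPLAINT", "COMPLAINT"])
print(first_to_ahead_by_k(lambda: next(responses), k=3))  # COMPLAINT
```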

Match Strategies

Exact Match

match_strategy="exact"  # "Hello" ≠ "hello"

Normalized Match

match_strategy="normalized"  # "Hello World" = "hello  world" = "HELLO WORLD"

Structured Match

match_strategy="structured"  # {"a": 1, "b": 2} = {"b": 2, "a": 1}
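One plausible way to realize these three strategies is as normalization functions applied before votes are counted (a sketch; the actual comparison logic lives inside the workflow):

```python
import json

def normalize_exact(r: str) -> str:
    # Character-for-character: no transformation.
    return r

def normalize_whitespace_case(r: str) -> str:
    # Lowercase and collapse runs of whitespace.
    return " ".join(r.lower().split())

def normalize_structured(r: str) -> str:
    # Parse JSON and re-serialize with sorted keys so that
    # key order and spacing no longer matter.
    return json.dumps(json.loads(r), sort_keys=True, separators=(",", ":"))

print(normalize_whitespace_case("Hello  World") == normalize_whitespace_case("HELLO WORLD"))  # True
print(normalize_structured('{"a": 1, "b": 2}') == normalize_structured('{"b": 2, "a": 1}'))   # True
```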

Custom Match Function

def custom_normalizer(response: str) -> str:
    # Extract only digits
    return "".join(c for c in response if c.isdigit())

@fast.maker(
    name="number_extractor",
    worker="extractor",
    k=3,
    match_fn=custom_normalizer,  # Overrides match_strategy
)

Red-Flagging

Red-flagging improves effective success rate by discarding confused responses:

Length-Based Red-Flagging

@fast.maker(
    name="concise_classifier",
    worker="classifier",
    k=3,
    red_flag_max_length=20,  # Expect one-word answers
)

Custom Validation

def validate_classification(response: str) -> bool:
    valid_classes = {"COMPLAINT", "QUESTION", "REQUEST", "FEEDBACK"}
    return response.strip().upper() in valid_classes

@fast.maker(
    name="validated_classifier",
    worker="classifier",
    k=3,
    red_flag_validator=validate_classification,
)
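The gain from red-flagging follows from conditional probability. Under an idealized model where the worker is correct with probability p and a filter discards a fraction f of the incorrect responses (and never discards a correct one), accuracy among the accepted samples rises to p / (p + (1 - p)(1 - f)):

```python
def effective_accuracy(p: float, f: float) -> float:
    """Accuracy among accepted samples when a red-flag filter catches
    a fraction f of incorrect responses (idealized: no correct
    responses are discarded)."""
    return p / (p + (1 - p) * (1 - f))

print(f"{effective_accuracy(0.95, 0.0):.3f}")  # 0.950 (no filtering)
print(f"{effective_accuracy(0.95, 0.5):.3f}")  # 0.974
print(f"{effective_accuracy(0.95, 0.9):.3f}")  # 0.995
```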

Accessing Voting Results

result = await agent.reliable_classifier.send("Message to classify")

# Access detailed voting statistics
stats = agent.reliable_classifier.last_result
print(f"Winner: {stats.winner}")
print(f"Votes: {stats.votes}")  # e.g., {'COMPLAINT': 4, 'QUESTION': 1}
print(f"Total samples: {stats.total_samples}")
print(f"Discarded samples: {stats.discarded_samples}")
print(f"Winning margin: {stats.margin}")
print(f"Converged: {stats.converged}")  # True if k-margin achieved

Advanced Examples

Data Validation Pipeline

@fast.agent(
    "email_validator",
    model="haiku",
    instruction="""Validate if the input is a properly formatted email address.
Respond with only: VALID or INVALID""",
)
@fast.maker(
    name="reliable_email_validator",
    worker="email_validator",
    k=5,  # Very high reliability for data validation
    max_samples=15,
    match_strategy="normalized",
    red_flag_max_length=10,
)

async def validate_million_emails(emails: list[str]) -> list[bool]:
    results = []
    async with fast.run() as agent:
        for email in emails:
            result = await agent.reliable_email_validator.send(email)
            results.append(result == "VALID")
    return results

Code Syntax Checker

@fast.agent(
    "syntax_checker",
    model="gpt-3.5-turbo",  # Cheap model
    instruction="""Check if the Python code has syntax errors.
Respond with only: VALID or ERROR""",
)
@fast.maker(
    name="reliable_syntax_checker",
    worker="syntax_checker",
    k=3,
    max_samples=12,
    match_strategy="normalized",
)

async def check_codebase(files: list[str]) -> dict[str, bool]:
    results = {}
    async with fast.run() as agent:
        for file_path in files:
            with open(file_path) as f:
                code = f.read()
            result = await agent.reliable_syntax_checker.send(
                f"Check syntax: {code}"
            )
            results[file_path] = result == "VALID"
    return results

Structured Data Extraction

@fast.agent(
    "data_extractor",
    model="haiku",
    instruction="""Extract name, email, and phone from the text.
Respond with JSON: {"name": "...", "email": "...", "phone": "..."}""",
)
@fast.maker(
    name="reliable_extractor",
    worker="data_extractor",
    k=4,
    max_samples=20,
    match_strategy="structured",  # Compare parsed JSON
    red_flag_max_length=200,
)

import json

async def extract_contact_info(documents: list[str]) -> list[dict]:
    results = []
    async with fast.run() as agent:
        for doc in documents:
            result = await agent.reliable_extractor.send(doc)
            results.append(json.loads(result))
    return results


Cost vs. Reliability Tradeoff

Higher k

More Reliable
  • Higher confidence in consensus
  • Better error bounds
  • More samples needed
  • Higher cost

Lower k

Faster/Cheaper
  • Quicker convergence
  • Fewer samples on average
  • Lower cost
  • Less strict consensus
Recommended k values:
  • k=2: Low-stakes, cost-sensitive
  • k=3: Standard (good balance)
  • k=5: High-stakes, critical accuracy
  • k=7+: Mission-critical, zero-error tolerance

Performance Characteristics

Typical Convergence (k=3, 95% per-step accuracy):
- Average samples: 5-7
- Convergence rate: 90%+
- Effective accuracy: 99.9%+

Cost Example:
- Worker: Haiku @ $0.25/MTok
- Average 6 samples per task
- 1000 tasks = 6000 calls
- Still cheaper than 1000 calls to an expensive model
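The cost example above works out as follows. The expensive-model rate and per-call token count here are illustrative assumptions, not quoted prices; substitute your provider's real rates:

```python
# Back-of-envelope cost check (all prices per million tokens).
CHEAP_PER_MTOK = 0.25      # Haiku-class rate, from the example above
EXPENSIVE_PER_MTOK = 15.0  # assumed rate for a frontier-class model
TOKENS_PER_CALL = 500      # assumed prompt + completion size

tasks = 1000
samples_per_task = 6

cheap_cost = tasks * samples_per_task * TOKENS_PER_CALL / 1e6 * CHEAP_PER_MTOK
expensive_cost = tasks * TOKENS_PER_CALL / 1e6 * EXPENSIVE_PER_MTOK
print(f"MAKER (6x cheap model): ${cheap_cost:.2f}")   # $0.75
print(f"Single expensive call:  ${expensive_cost:.2f}")  # $7.50
```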

Best Practices

Simple Worker Tasks

MAKER works best with simple, deterministic tasks where there’s a “correct” answer

Red-Flag Aggressively

Discard obvious errors early to improve effective success rate

Appropriate k

Match k to your reliability needs and cost constraints

Monitor Convergence

Track convergence rates to tune k and max_samples

Debugging

Enable detailed logging to see voting progress:
import logging
logging.basicConfig(level=logging.DEBUG)

# You'll see:
# DEBUG: Sample 1: 1 votes for this response
# DEBUG: Sample 2: 2 votes for this response
# DEBUG: Sample 3: 1 votes for alternative response
# DEBUG: Sample 4: 3 votes for this response
# DEBUG: MAKER converged: 3 votes, margin 2, 4 samples

Use Cases by Industry

  • Finance: Transaction classification, fraud detection flags
  • Healthcare: Medical coding, diagnosis categorization
  • Legal: Document classification, clause identification
  • Manufacturing: Quality control checks, defect classification
  • E-commerce: Product categorization, review sentiment
  • DevOps: Log analysis, error classification

Comparison with Other Patterns

Feature               | MAKER                   | Evaluator-Optimizer | Chain       | Router
Error Reduction       | ✅ Statistical           | ✅ Feedback-driven   | ❌ None      | ❌ None
Reliability Guarantee | ✅ Mathematical          | ❌ Heuristic         | ❌ None      | ❌ None
Task Type             | Simple, deterministic   | Complex, creative   | Any         | Any
Cost Model            | Multiple samples        | Multiple iterations | Single pass | Single pass
Best For              | High-volume, zero-error | Quality content     | Pipelines   | Routing
  • Evaluator-Optimizer - Quality through feedback (different approach)
  • Parallel - Multiple agents without voting
  • Chain - Sequential processing where MAKER can be a step