MCP Sampling allows MCP servers to request LLM inference through the client. This enables servers to implement sophisticated tool behaviors that leverage AI capabilities without requiring their own API keys or model access.

Overview

With MCP Sampling, servers can:
  • Create messages: Request LLM completions during tool execution
  • Use tools: Provide tools for the LLM to use during sampling
  • Manage conversations: Build multi-turn interactions within a single tool call
  • Delegate reasoning: Offload complex logic to the LLM
This is particularly powerful for tools that need AI assistance to complete their tasks.

Quick Start

Basic Sampling Configuration

Configure sampling models in fastagent.config.yaml:
mcp:
  servers:
    sampling_server:
      command: "uv"
      args: ["run", "sampling_server.py"]
      sampling:
        model: "haiku"  # Model to use for sampling requests

Using a Sampling-Enabled Tool

import asyncio
from fast_agent import FastAgent

fast = FastAgent("Sampling Example")

@fast.agent(
    instruction="You are a helpful assistant",
    servers=["sampling_tools"]
)
async def main():
    async with fast.run() as agent:
        # Call tool that uses sampling internally
        result = await agent("Call the fetch_secret tool")
        print(result)

if __name__ == "__main__":
    asyncio.run(main())

How It Works

1. Agent calls tool: the agent invokes a tool from a sampling-enabled MCP server
2. Server requests sampling: the server calls ctx.session.create_message() with a prompt and optional tools
3. Client executes sampling: fast-agent sends the request to the configured LLM
4. Server processes response: the server receives the LLM response and continues tool execution
5. Result returned: the tool completes and returns the final result to the agent

Creating Sampling-Enabled Tools

Basic Sampling Tool

from mcp.server.fastmcp import Context, FastMCP
from mcp.types import (
    CallToolResult,
    SamplingMessage,
    TextContent,
)

mcp = FastMCP("Sampling Demo")

@mcp.tool()
async def analyze_text(ctx: Context, text: str) -> CallToolResult:
    """Analyze text using LLM sampling"""
    
    # Request sampling from the client
    result = await ctx.session.create_message(
        max_tokens=256,
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(
                    type="text",
                    text=f"Analyze this text for sentiment: {text}"
                ),
            )
        ],
    )
    
    # Extract response text
    if isinstance(result.content, TextContent):
        response_text = result.content.text
    elif isinstance(result.content, list):
        text_parts = [c.text for c in result.content if isinstance(c, TextContent)]
        response_text = "\n".join(text_parts)
    else:
        response_text = str(result.content)
    
    return CallToolResult(
        content=[TextContent(type="text", text=response_text)]
    )
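
The response-extraction branching above recurs in every sampling tool in this guide. A small helper keeps it in one place; this is a duck-typed sketch that checks for a `.text` attribute instead of importing `TextContent`, and the `extract_text` name is ours, not part of the MCP SDK:

```python
def extract_text(content) -> str:
    """Collapse a sampling response's content into plain text.

    create_message() may return a single content block or a list of
    blocks, so handle both shapes before falling back to str().
    """
    if hasattr(content, "text"):
        return content.text
    if isinstance(content, list):
        return "\n".join(c.text for c in content if hasattr(c, "text"))
    return str(content)
```

With this helper, the `if/elif/else` above collapses to `response_text = extract_text(result.content)`.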

Sampling with Tools

Provide tools for the LLM to use during sampling:
from mcp.types import Tool, ToolUseContent, ToolResultContent

# Define calculator tools
CALCULATOR_TOOLS = [
    Tool(
        name="add",
        description="Add two numbers together",
        inputSchema={
            "type": "object",
            "properties": {
                "a": {"type": "number"},
                "b": {"type": "number"},
            },
            "required": ["a", "b"],
        },
    ),
    Tool(
        name="multiply",
        description="Multiply two numbers",
        inputSchema={
            "type": "object",
            "properties": {
                "a": {"type": "number"},
                "b": {"type": "number"},
            },
            "required": ["a", "b"],
        },
    ),
]

@mcp.tool()
async def calculate(ctx: Context, expression: str) -> CallToolResult:
    """Calculate mathematical expression using LLM with tools"""
    
    # Initial sampling request with tools
    result = await ctx.session.create_message(
        max_tokens=256,
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(
                    type="text",
                    text=f"Calculate: {expression}"
                ),
            )
        ],
        tools=CALCULATOR_TOOLS,
    )
    
    # Handle tool use
    if result.stopReason == "toolUse":
        tool_uses = [c for c in result.content if isinstance(c, ToolUseContent)]
        
        if tool_uses:
            # Execute the tool
            tool_use = tool_uses[0]
            if tool_use.name == "add":
                result_value = tool_use.input["a"] + tool_use.input["b"]
            elif tool_use.name == "multiply":
                result_value = tool_use.input["a"] * tool_use.input["b"]
            
            # Send tool result back for final response
            final_result = await ctx.session.create_message(
                max_tokens=256,
                messages=[
                    SamplingMessage(
                        role="user",
                        content=TextContent(type="text", text=f"Calculate: {expression}"),
                    ),
                    SamplingMessage(role="assistant", content=result.content),
                    SamplingMessage(
                        role="user",
                        content=[
                            ToolResultContent(
                                type="tool_result",
                                toolUseId=tool_use.id,
                                content=[TextContent(type="text", text=str(result_value))],
                            )
                        ],
                    ),
                ],
                tools=CALCULATOR_TOOLS,
            )
            
            # Content may be a single block or a list of blocks
            if isinstance(final_result.content, list):
                final_text = "\n".join(
                    c.text for c in final_result.content if isinstance(c, TextContent)
                )
            else:
                final_text = final_result.content.text
            return CallToolResult(
                content=[TextContent(type="text", text=final_text)]
            )
    
    # No tool use, return direct response
    if isinstance(result.content, list):
        direct_text = "\n".join(
            c.text for c in result.content if isinstance(c, TextContent)
        )
    else:
        direct_text = result.content.text
    return CallToolResult(
        content=[TextContent(type="text", text=direct_text)]
    )
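
The `if/elif` dispatch above scales poorly as the tool set grows. One common alternative is a dispatch table keyed by tool name; this is a sketch, and the `TOOL_HANDLERS` and `run_tool` names are ours rather than anything from the SDK:

```python
# Map tool names to plain Python implementations; each handler
# takes the tool-use input dict and returns the result value.
TOOL_HANDLERS = {
    "add": lambda args: args["a"] + args["b"],
    "multiply": lambda args: args["a"] * args["b"],
}


def run_tool(name: str, args: dict):
    """Execute a sampled tool call, raising on unknown tool names."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        raise ValueError(f"Unknown tool: {name}")
    return handler(args)
```

Inside the tool, the branch becomes `result_value = run_tool(tool_use.name, tool_use.input)`, and adding a tool means adding one dictionary entry.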

Complete Example: Secret Code Tool

This example demonstrates the full sampling-with-tools flow:
import logging
from mcp.server.fastmcp import Context, FastMCP
from mcp.types import (
    CallToolResult,
    SamplingMessage,
    TextContent,
    Tool,
    ToolChoice,
    ToolResultContent,
    ToolUseContent,
)

mcp = FastMCP("Sampling With Tools Demo")

# Tool that the LLM can call during sampling
SECRET_CODE_TOOL = Tool(
    name="get_secret",
    description="Returns a secret code. You must call this tool to get the secret.",
    inputSchema={
        "type": "object",
        "properties": {},
        "required": [],
    },
)

SECRET_CODE = "WHISKEY-TANGO-FOXTROT-42"

@mcp.tool()
async def fetch_secret(ctx: Context) -> CallToolResult:
    """
    Tool that uses sampling with tools to fetch a secret code.
    
    The LLM must call get_secret tool to retrieve the code.
    """
    
    # Request sampling with tool requirement
    result = await ctx.session.create_message(
        max_tokens=256,
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(
                    type="text",
                    text="Call the get_secret tool to retrieve the secret code, then tell me what it is.",
                ),
            )
        ],
        tools=[SECRET_CODE_TOOL],
        tool_choice=ToolChoice(mode="required"),  # Force tool use
    )
    
    # Handle tool execution
    if result.stopReason == "toolUse":
        tool_uses = [c for c in result.content if isinstance(c, ToolUseContent)]
        
        if tool_uses:
            # Execute get_secret - return the secret
            tool_results = [
                ToolResultContent(
                    type="tool_result",
                    toolUseId=tu.id,
                    content=[TextContent(type="text", text=f"SECRET: {SECRET_CODE}")],
                )
                for tu in tool_uses
            ]
            
            # Get final response with secret included
            final_result = await ctx.session.create_message(
                max_tokens=256,
                messages=[
                    SamplingMessage(
                        role="user",
                        content=TextContent(
                            type="text",
                            text="Call the get_secret tool to retrieve the secret code, then tell me what it is.",
                        ),
                    ),
                    SamplingMessage(role="assistant", content=result.content),
                    SamplingMessage(role="user", content=tool_results),
                ],
                tools=[SECRET_CODE_TOOL],
            )
            
            # Extract final text
            if isinstance(final_result.content, list):
                text_parts = [c.text for c in final_result.content if isinstance(c, TextContent)]
                final_text = "\n".join(text_parts)
            else:
                final_text = final_result.content.text
            
            return CallToolResult(
                content=[TextContent(type="text", text=final_text)]
            )
    
    # Sampling didn't work as expected
    return CallToolResult(
        content=[TextContent(type="text", text="ERROR: Tool was not called")],
        isError=True,
    )

if __name__ == "__main__":
    mcp.run()

Model Selection

Per-Server Configuration

Configure different models for different servers:
mcp:
  servers:
    lightweight_tools:
      command: "uv"
      args: ["run", "simple_server.py"]
      sampling:
        model: "haiku"  # Fast, cheap model
    
    complex_analysis:
      command: "uv"
      args: ["run", "analysis_server.py"]
      sampling:
        model: "sonnet"  # More capable model
    
    reasoning_tools:
      command: "uv"
      args: ["run", "reasoning_server.py"]
      sampling:
        model: "o3-mini.high"  # Reasoning model

Model Aliases

Use model aliases defined in your configuration:
models:
  aliases:
    fast: "claude-3-haiku-20240307"
    smart: "claude-3-5-sonnet-20241022"
    reasoning: "o3-mini.high"

mcp:
  servers:
    my_server:
      sampling:
        model: "fast"  # Uses alias

Use Cases

Text Analysis

Tools that analyze, summarize, or extract information from text using LLM reasoning

Code Generation

Generate code snippets, configurations, or scripts based on requirements

Data Transformation

Transform data between formats using LLM understanding of structure

Decision Making

Tools that need to make complex decisions based on context and criteria

Validation

Validate inputs, outputs, or configurations using LLM reasoning

Mathematical Reasoning

Solve math problems by providing calculator tools to the LLM

Best Practices

Match the model to the task complexity:
# Simple tasks
sampling:
  model: "haiku"  # Fast, cost-effective

# Complex reasoning
sampling:
  model: "sonnet"  # More capable

# Mathematical/logical
sampling:
  model: "o3-mini.high"  # Reasoning optimized
Set appropriate max_tokens for sampling requests:
# Brief responses
result = await ctx.session.create_message(
    max_tokens=100,
    messages=[...]
)

# Detailed analysis
result = await ctx.session.create_message(
    max_tokens=2000,
    messages=[...]
)
Implement proper tool execution loops:
max_iterations = 10
iteration = 0

while iteration < max_iterations:
    result = await ctx.session.create_message(...)
    
    if result.stopReason == "toolUse":
        # Execute tools and continue
        iteration += 1
    else:
        # Got final answer
        break
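
The loop above can be fleshed out against a generic `create_message` callable and a tool executor so the control flow is explicit; the `sampling_loop`, `create_message`, and `execute_tools` names here are illustrative, not part of the SDK:

```python
async def sampling_loop(create_message, execute_tools, messages, max_iterations=10):
    """Drive a sample -> tool-use -> sample loop to completion.

    create_message: async callable taking the running message list.
    execute_tools: callable turning a tool-use response into the
    assistant turn plus tool-result messages to append.
    """
    result = None
    for _ in range(max_iterations):
        result = await create_message(messages)
        if result.stopReason != "toolUse":
            return result  # Final answer reached
        # Append the assistant turn and tool results, then sample again
        messages = messages + execute_tools(result)
    return result  # Iteration cap hit; return the last response
```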
Handle sampling failures gracefully:
try:
    result = await ctx.session.create_message(...)
except Exception as e:
    return CallToolResult(
        content=[TextContent(
            type="text",
            text=f"Sampling failed: {str(e)}"
        )],
        isError=True,
    )
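Transient failures (rate limits, network blips) can additionally be retried before giving up. A sketch with exponential backoff; the helper name, attempt count, and delays are illustrative:

```python
import asyncio


async def create_message_with_retry(create_message, *, attempts=3, base_delay=1.0):
    """Retry a sampling call with exponential backoff between attempts."""
    last_error = None
    for attempt in range(attempts):
        try:
            return await create_message()
        except Exception as exc:  # In practice, catch the SDK's error types
            last_error = exc
            if attempt < attempts - 1:
                await asyncio.sleep(base_delay * 2 ** attempt)
    raise last_error
```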

Advanced Patterns

Multi-Turn Reasoning

@mcp.tool()
async def complex_analysis(ctx: Context, topic: str) -> CallToolResult:
    """Multi-turn analysis with iterative refinement"""
    
    # First turn: Initial analysis
    result1 = await ctx.session.create_message(
        max_tokens=500,
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(
                    type="text",
                    text=f"Provide initial analysis of: {topic}"
                ),
            )
        ],
    )
    
    # Second turn: replay the first exchange as history, then ask for more
    result2 = await ctx.session.create_message(
        max_tokens=1000,
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(
                    type="text",
                    text=f"Provide initial analysis of: {topic}"
                ),
            ),
            SamplingMessage(role="assistant", content=result1.content),
            SamplingMessage(
                role="user",
                content=TextContent(
                    type="text",
                    text="Now provide detailed recommendations based on your analysis."
                ),
            ),
        ],
    )
    
    return CallToolResult(
        content=[TextContent(type="text", text=result2.content.text)]
    )

Conditional Tool Provision

def get_tools_for_task(task_type: str) -> list[Tool]:
    """Return appropriate tools based on task"""
    if task_type == "math":
        return CALCULATOR_TOOLS
    elif task_type == "research":
        return SEARCH_TOOLS
    return []

@mcp.tool()
async def smart_task(ctx: Context, task: str, task_type: str) -> CallToolResult:
    """Execute task with appropriate tools"""
    tools = get_tools_for_task(task_type)
    
    result = await ctx.session.create_message(
        max_tokens=1000,
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(type="text", text=task),
            )
        ],
        tools=tools,
    )
    
    # Handle tool execution...

Troubleshooting

Sampling requests fail: ensure a sampling model is specified in the server config:
mcp:
  servers:
    my_server:
      sampling:
        model: "haiku"  # Required
Tool-use loops run indefinitely: add iteration limits:
MAX_ITERATIONS = 10
for i in range(MAX_ITERATIONS):
    result = await ctx.session.create_message(...)
    if result.stopReason != "toolUse":
        break
Responses are low quality or miss tool calls: switch to a more capable model:
sampling:
  model: "sonnet"  # More capable than haiku

See Also

  • MCP Servers: learn about MCP server configuration
  • Prompts: use prompts in sampling requests
  • Model Configuration: configure models and aliases
  • Tool Development: create custom MCP tools