drewtech/pyagent
Pydantic AI SQL Analysis Agent Demo


A production-ready demo showcasing Pydantic AI core features for experienced developers. This agent converts natural language questions into SQL queries, executes them, and returns type-safe structured results.

New to this project? Start with docs/QUICKSTART.md for setup and validation. Then explore docs/NAVIGATION.md for a complete learning path.

🎯 Features Demonstrated

1. Function Calling & Tool Integration

The agent uses two registered tools to solve problems:

  • write_sql_draft() - Generates SQL from natural language questions
  • execute_sql_query() - Safely executes validated SQL and returns results

The agent autonomously decides which tools to use and chains them together.

2. Structured Outputs with Type Safety

All agent responses are validated against Pydantic models:

from typing import Any

from pydantic import BaseModel

class AnalysisResult(BaseModel):
    user_question: str
    query: str                              # The executed SQL
    summary: str                            # Human-readable findings
    insights: list[str]                     # Key observations
    row_count: int                          # Result size
    sample_data: list[dict[str, Any]]       # Data preview

If the agent's output violates the schema, Pydantic raises a validation error and the agent retries automatically.
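The retry is driven by ordinary Pydantic validation. A minimal sketch of the schema check itself, assuming pydantic is installed (the retry loop around it is handled by the framework, not shown here):

```python
from typing import Any

from pydantic import BaseModel, ValidationError


class AnalysisResult(BaseModel):
    user_question: str
    query: str
    summary: str
    insights: list[str]
    row_count: int
    sample_data: list[dict[str, Any]]


# A well-formed response parses cleanly...
ok = AnalysisResult(
    user_question="Top customers?",
    query="SELECT name FROM customers LIMIT 3",
    summary="Three customers found.",
    insights=["All three are recent signups"],
    row_count=3,
    sample_data=[{"name": "Alice"}],
)

# ...while a malformed one (row_count is not an int) raises a
# ValidationError -- the signal used to ask the model to try again.
failed = False
try:
    AnalysisResult(
        user_question="Top customers?",
        query="SELECT 1",
        summary="",
        insights=[],
        row_count="three",
        sample_data=[],
    )
except ValidationError:
    failed = True
```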

3. Dependency Injection & Context

The agent receives dependencies through RunContext:

@agent.tool
async def execute_sql_query(
    ctx: RunContext[AgentDeps],  # Type-safe context
    sql: str,
    description: str = "",
) -> QueryResult:
    result = ctx.deps.db.execute_query(sql)  # Access injected DB
    return QueryResult(...)
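The same injection pattern in plain Python, using hypothetical stand-ins (`FakeDB` and the simplified `RunContext` below are illustrations, not the project's real types):

```python
from dataclasses import dataclass
from typing import Any


class FakeDB:
    """Stand-in for the real SQLite layer in src/database.py."""

    def execute_query(self, sql: str) -> dict[str, Any]:
        return {"columns": ["one"], "rows": [{"one": 1}], "row_count": 1}


@dataclass
class AgentDeps:
    db: FakeDB


@dataclass
class RunContext:
    """Simplified stand-in for pydantic_ai's RunContext."""

    deps: AgentDeps


def execute_sql_query(ctx: RunContext, sql: str) -> dict[str, Any]:
    # The tool never constructs its own DB handle; it reads ctx.deps,
    # so tests can inject a fake while production injects the real thing.
    return ctx.deps.db.execute_query(sql)


ctx = RunContext(deps=AgentDeps(db=FakeDB()))
result = execute_sql_query(ctx, "SELECT 1")
```

This is the design payoff of dependency injection: swapping the database requires changing only what is placed in `AgentDeps`, not the tool code.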

4. Full Type Safety

  • Function signatures provide IDE auto-completion
  • Pydantic validates all inputs/outputs
  • Type hints catch errors at write-time, not runtime

📚 Documentation

Document Purpose
docs/QUICKSTART.md 5-minute setup and validation — Start here!
docs/NAVIGATION.md Project overview and learning paths
docs/MCP_SERVER_GUIDE.md MCP pattern explanation and tool documentation
docs/MCP_IMPLEMENTATION_GUIDE.md Deep-dive code walkthrough and architecture
docs/MCP_DEMO_SUMMARY.md What was built: features and test results
docs/EXTENSIONS.md Advanced features: FastAPI, caching, monitoring, multi-agent patterns
docs/CI_CD_SETUP.md Travis CI & GitHub Actions pipeline configuration
docs/EXTERNAL_MCP_INTEGRATION.md Integrating SaaS MCP endpoints (Context7, custom services)
docs/CONTEXT7_TUTORIAL.md Quick tutorial: using Context7 documentation AI service

📋 Project Structure

pyagent/
├── src/
│   ├── __init__.py          # Package definition
│   ├── models.py            # Pydantic output schemas
│   ├── database.py          # SQLite layer with sample data
│   ├── agent.py             # Agent definition and tools
│   ├── mcp_server.py        # Local MCP analytics server
│   ├── external_mcp_client.py # SaaS MCP client (Context7, custom)
│   └── main.py              # CLI interface
├── tests/                   # Unit and integration tests
├── docs/                    # Complete documentation guides
│   ├── QUICKSTART.md        # Setup and validation (start here)
│   ├── NAVIGATION.md        # Overview and learning paths
│   ├── MCP_SERVER_GUIDE.md  # Tool documentation
│   ├── MCP_IMPLEMENTATION_GUIDE.md  # Deep-dive walkthrough
│   ├── MCP_DEMO_SUMMARY.md  # Feature summary
│   ├── EXTENSIONS.md        # Advanced patterns
│   ├── CI_CD_SETUP.md       # Pipeline configuration
│   └── EXTERNAL_MCP_INTEGRATION.md  # SaaS MCP integration
├── examples_mcp.py          # MCP server usage examples
├── examples_external_mcp.py # External/SaaS MCP integration examples
├── .github/workflows/       # GitHub Actions workflows
├── .travis.yml              # Travis CI configuration
├── pyproject.toml           # Dependencies and build config
├── .env.example             # Environment template
├── .gitignore               # Git ignore rules
├── LICENSE                  # MIT License
└── README.md                # This file

🚀 Quick Start

1. Install Dependencies

# Clone the repository
git clone https://github.com/drewtech/pyagent.git
cd pyagent

# Create virtual environment (optional but recommended)
python3.10 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -e .

2. Set Up API Key

# Copy the template
cp .env.example .env

# Edit .env and add your OpenAI API key
# Get it from: https://platform.openai.com/api-keys
OPENAI_API_KEY=sk-proj-xxxxx...

3. Run the Demo

python -m src.main

💡 Example Queries

The demo includes 4 pre-built examples:

1️⃣ Top Customers by Lifetime Value

Question: "Who are the top 3 customers by lifetime value? Include their email and country."

Expected Output:

  • Generated SQL query with proper JOINs and ORDER BY
  • Results showing customers sorted by spending
  • Insights about high-value customer locations

2️⃣ Orders by Status Distribution

Question: "How many orders are in each status (delivered, pending, shipped)? Show the count for each status."

Expected Output:

  • SQL with GROUP BY clause
  • Order counts per status
  • Distribution insights

3️⃣ Average Order Value by Category

Question: "What is the average order value for products in each category?"

Expected Output:

  • SQL with JOINs and aggregations
  • Average spend per product category
  • Category comparison insights

4️⃣ Customer Purchase Patterns

Question: "Which customers have made more than one purchase? Show their name and order count."

Expected Output:

  • SQL with GROUP BY and HAVING
  • Repeat customer identification
  • Pattern insights

Custom Questions

You can also ask your own questions about the data!

📊 Sample Data

The demo includes a pre-populated SQLite database:

Customers (8 records)

  • IDs 1-8 with names, emails, signup dates, countries
  • Lifetime value tracking from $1,800 to $6,100

Products (8 records)

Electronics:     Laptop Pro, Wireless Mouse, USB-C Cable, Mechanical Keyboard
Office:          Desk Lamp, Ergonomic Chair, Standing Desk
Stationery:      Notebook Set

Orders (12 records)

  • Various statuses: delivered, pending, shipped
  • Links customers to products with quantities and amounts
  • Date range: January 2024 - March 2024
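A stdlib `sqlite3` sketch of how sample data like this can be seeded and queried; the column names and rows below are assumptions for illustration, not the project's exact schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        status TEXT,
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Alice', 'US'), (2, 'Bob', 'DE');
    INSERT INTO orders VALUES
        (1, 1, 'delivered', 120.0),
        (2, 1, 'pending', 40.0),
        (3, 2, 'delivered', 75.0);
    """
)

# Mirrors example 2: order counts per status
rows = conn.execute(
    "SELECT status, COUNT(*) AS n FROM orders GROUP BY status ORDER BY status"
).fetchall()
# rows == [('delivered', 2), ('pending', 1)]
```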

🏗️ Code Architecture

Agent Definition (src/agent.py)

agent = Agent(
    model="anthropic:claude-3-5-sonnet-20241022",
    deps_type=AgentDeps,              # Injected dependencies
    output_type=AnalysisResult,       # Type-safe output
    system_prompt="""..."""           # Instructions
)

@agent.tool
async def execute_sql_query(
    ctx: RunContext[AgentDeps],
    sql: str,
) -> QueryResult:
    """Tool description for LLM - shown to agent"""
    result = ctx.deps.db.execute_query(sql)
    return QueryResult(...)

Key Points:

  • model= selects the provider and model (Claude here; easily swapped for GPT-4o, Gemini, etc.)
  • deps_type= provides type-safe dependency injection
  • output_type= guarantees structured JSON output
  • @agent.tool decorator registers functions as LLM-callable tools

Database Layer (src/database.py)

db = Database(":memory:")  # In-memory SQLite
result = db.execute_query("SELECT...")

# Returns:
{
    "columns": ["id", "name", ...],
    "rows": [{"id": 1, "name": "Alice", ...}, ...],
    "row_count": 42
}
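A minimal sketch of how a dict in that shape can be produced with the stdlib `sqlite3` module (simplified; the real implementation lives in src/database.py):

```python
import sqlite3
from typing import Any


def execute_query(conn: sqlite3.Connection, sql: str) -> dict[str, Any]:
    cursor = conn.execute(sql)
    # cursor.description holds one 7-tuple per column; index 0 is the name
    columns = [desc[0] for desc in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    return {"columns": columns, "rows": rows, "row_count": len(rows)}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'Alice')")
result = execute_query(conn, "SELECT id, name FROM t")
# result == {"columns": ["id", "name"],
#            "rows": [{"id": 1, "name": "Alice"}],
#            "row_count": 1}
```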

Safety Features:

  • validate_sql() checks for SQL injection and dangerous operations
  • Only SELECT queries allowed (no UPDATE/DELETE/DROP)
  • Parsed with sqlglot for syntax validation
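The project validates with sqlglot; the following is a simplified, stdlib-only illustration of the same select-only policy (not the project's actual validator):

```python
import re

DANGEROUS = {"insert", "update", "delete", "drop", "alter", "create",
             "attach", "pragma", "truncate"}


def validate_sql(sql: str) -> None:
    """Raise ValueError unless sql looks like a single SELECT statement."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:
        raise ValueError("multiple statements are not allowed")
    if not stripped.lower().startswith(("select", "with")):
        raise ValueError("only SELECT queries are allowed")
    # Whole-word scan, so column names like created_at are not false positives
    tokens = set(re.findall(r"[a-z_]+", stripped.lower()))
    bad = tokens & DANGEROUS
    if bad:
        raise ValueError(f"forbidden keyword(s): {sorted(bad)}")


validate_sql("SELECT name FROM customers")  # passes silently

blocked = False
try:
    validate_sql("DROP TABLE customers")
except ValueError:
    blocked = True
```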

Models (src/models.py)

class AnalysisResult(BaseModel):
    user_question: str           # Original question
    query: str                   # Generated SQL
    summary: str                 # Human-readable summary
    insights: list[str]          # Key findings
    row_count: int               # Number of rows
    sample_data: list[dict[str, Any]]  # Data preview

All fields are validated by Pydantic; the agent retries on validation failure.

🧪 Testing

Run the included tests:

# Install dev dependencies
pip install -e .[dev]

# Run tests
pytest tests/

# Run with coverage
pytest --cov=src tests/

🔄 Extensibility

Adding More Tools

@agent.tool
async def get_customer_graph(
    ctx: RunContext[AgentDeps],
    customer_id: int
) -> dict[str, Any]:
    """Build network graph of customer connections."""
    # Implementation
    return graph_data

Switching Models

Change one line to use a different provider:

# Using OpenAI (default)
agent = Agent(model="openai:gpt-4o", ...)

# Try with Anthropic Claude
agent = Agent(model="anthropic:claude-3-5-sonnet-20241022", ...)

# Try with Google Gemini
agent = Agent(model="google-gla:gemini-2.0-flash", ...)

# Try with local Ollama
agent = Agent(model="ollama:llama2", ...)

Adding Custom Output Validators

@agent.result_validator
def validate_insights(result: AnalysisResult) -> AnalysisResult:
    """Custom validation beyond Pydantic model."""
    if len(result.insights) == 0:
        raise ValueError("Must provide at least one insight")
    return result

🎓 Learning Resources

Understanding the Features

  1. Function Calling: The agent sees tool definitions and decides when to call them

    • Tools are Python functions with docstrings
    • Agent sees docstrings and uses them as instructions
    • Return types become part of agent context
  2. Structured Outputs: Pydantic models ensure type-safe responses

    • Agent is constrained to return valid JSON
    • Automatic retry if validation fails
    • IDE auto-completion for result fields
  3. Type Safety: Full type annotations enable:

    • Write-time error detection
    • IDE auto-completion for tool parameters
    • Runtime validation of all inputs/outputs

⚖️ License

MIT License - See LICENSE file

🚢 Production Considerations

This demo covers essential patterns for production use:

  • Error Handling: Query validation and graceful failures
  • Type Safety: Full Pydantic validation of inputs/outputs
  • Scalability: Async/await support for concurrent requests
  • Observability: Structured outputs enable detailed logging
  • Security: SQL injection prevention with query validation

For production, additionally consider:

  • Model fallback strategies (use cheaper model if primary fails)
  • Rate limiting and quota management
  • Usage tracking and cost monitoring
  • Fine-tuned prompts for your specific domain
  • Logging and observability integration

🤔 FAQ

Q: How does the agent decide which tool to use?
A: The LLM reads tool docstrings and signatures, then decides which tool is appropriate for the user's question.

Q: What happens if the SQL query is invalid?
A: The database validation layer returns an error, which is fed back to the agent for retry.
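Conceptually, that feedback loop looks like the hand-rolled sketch below; pydantic-ai implements this internally, and the `fake_draft`/`fake_execute` stubs stand in for the model and database:

```python
def run_with_retry(draft_sql, execute, question: str, max_retries: int = 2):
    """Ask for SQL, execute it, and feed any error back until it succeeds."""
    error = None
    for _ in range(max_retries + 1):
        sql = draft_sql(question, error)  # the drafter sees the previous error
        try:
            return execute(sql)
        except ValueError as exc:
            error = str(exc)
    raise RuntimeError(f"gave up after retries; last error: {error}")


# Stubs standing in for the LLM and the validation layer:
def fake_draft(question, error):
    return "SELECT 1" if error else "DROP TABLE t"  # first draft is invalid


def fake_execute(sql):
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT queries are allowed")
    return {"row_count": 1}


result = run_with_retry(fake_draft, fake_execute, "anything")
# result == {"row_count": 1}: the second draft succeeded
```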

Q: Can I use this with my own database?
A: Yes! Modify the Database class in src/database.py to connect to your actual database.

Q: Does it work with other LLMs?
A: Yes! Change the model parameter in src/agent.py to any supported provider (OpenAI, Google, Groq, etc.).
