# docs: add modular CrewAI + Promptfoo evaluation example (agent-task prompt testing) #5807
base: main
Conversation
📝 Walkthrough

Adds a new example directory demonstrating a CrewAI-style modular prompt setup for Promptfoo. Introduces agents.yaml and tasks.yaml to define an agent (trend_researcher) and a task (trend_identification_task). Implements composer.py to render templated fields and compose chat messages, with optional OpenAI chat completions integration driven by environment variables. Provides promptfooconfig.yaml configuring a file-based provider pointing to composer.py, default checks, and two example tests. Adds a README with setup, usage, and troubleshooting, plus requirements.txt for dependencies.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
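The walkthrough describes composer.py as a file-based Python provider. For readers unfamiliar with that pattern, here is a minimal sketch of the `call_api` entry point promptfoo expects from a Python provider — the agent id and vars below are illustrative, and the real composer.py builds full chat messages from agents.yaml/tasks.yaml rather than echoing its inputs:

```python
from typing import Any, Dict


def call_api(prompt: str, options: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
    """Minimal promptfoo Python provider: return a dict with an 'output' key."""
    cfg = (options or {}).get("config", {})
    vars = (context or {}).get("vars", {})
    # The real composer renders agents.yaml/tasks.yaml templates here and
    # optionally calls a chat model; this stub just echoes its inputs.
    agent_id = cfg.get("agent_id", "agent")
    text = vars.get("input_text", prompt)
    return {"output": f"[{agent_id}] {text}"}


result = call_api("hello", {"config": {"agent_id": "trend_researcher"}}, {"vars": {}})
print(result["output"])  # [trend_researcher] hello
```

This mirrors the provider contract the PR's config relies on via `id: file://./composer.py`.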
Pre-merge checks: ✅ 3 checks passed.
Actionable comments posted: 6
🧹 Nitpick comments (5)
examples/crewai-promptfoo-modular/README.md (1)
`80-86`: **Add a cleanup section to the README.**

The README is missing instructions for cleaning up resources after running the example, which is required by the coding guidelines. Consider adding a section explaining how to remove generated files, stop any running processes, or reset the environment.
Add a cleanup section after the Troubleshooting section:
## Cleanup

After running the evaluation, you may want to clean up generated files:

```bash
# Remove evaluation results
rm -rf promptfoo-outputs/

# Remove cached data (if any)
rm -rf .promptfoo/
```

If you set environment variables temporarily, unset them:

```bash
unset OPENAI_API_KEY
```
As per coding guidelines.

examples/crewai-promptfoo-modular/promptfooconfig.yaml (1)

`4-11`: **Consider adding a mix of providers for comparison.**

The configuration currently uses only one provider. Per learnings, including a mix of providers when comparing model performance helps demonstrate Promptfoo's capabilities and provides users with comparative insights.

Consider adding alternative providers:

```yaml
providers:
  - id: file://./composer.py
    label: "Composer Provider (GPT-4o-mini)"
    config:
      agent_id: trend_researcher
      task_id: trend_identification_task
      model: gpt-4o-mini
      temperature: 0.2
  - id: file://./composer.py
    label: "Composer Provider (Claude)"
    config:
      agent_id: trend_researcher
      task_id: trend_identification_task
      model: claude-3-5-sonnet-20241022
      temperature: 0.2
```
Then reference different providers in the tests to show model comparison.
Based on learnings.
examples/crewai-promptfoo-modular/composer.py (3)
`5-10`: **Handle missing OPENAI_API_KEY more gracefully at import time.**

The OpenAI client is initialized at module import (line 7) without checking if `OPENAI_API_KEY` is set. If the key is missing, the OpenAI SDK may use a default or raise an exception, potentially causing import failures even when OpenAI isn't needed.

Apply this diff to defer client initialization:
```diff
 try:
     from openai import OpenAI
-    oai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
-    USE_OPENAI = True
+    USE_OPENAI = True
+    oai_client = None  # Initialize lazily
 except ImportError:
     USE_OPENAI = False
+    oai_client = None
```

Then in `call_api`, initialize the client when needed:

```diff
     try:
         if USE_OPENAI:
+            global oai_client
+            if oai_client is None:
+                api_key = os.getenv("OPENAI_API_KEY")
+                if not api_key:
+                    return {"error": "OPENAI_API_KEY environment variable not set"}
+                oai_client = OpenAI(api_key=api_key)
             response = oai_client.chat.completions.create(
```
`59-79`: **Improve error handling specificity and consider rate limiting.**

The broad exception handler on line 78 catches all exceptions and returns a generic error. Consider handling specific exception types (authentication errors, rate limits, network errors) to provide more actionable feedback to users.
Apply this diff:
```diff
     try:
         if USE_OPENAI:
+            if oai_client is None:
+                return {"error": "OpenAI client not initialized"}
             response = oai_client.chat.completions.create(
                 model=cfg.get("model", "gpt-4o-mini"),
                 messages=messages,
                 temperature=cfg.get("temperature", 0.2),
             )
             return {"output": response.choices[0].message.content}
         else:
             return {"error": "OpenAI not installed or configured."}
-    except Exception as e:
-        return {"error": f"Composer error: {e}"}
+    except Exception as e:
+        import openai
+        if isinstance(e, openai.AuthenticationError):
+            return {"error": "OpenAI authentication failed. Check OPENAI_API_KEY."}
+        elif isinstance(e, openai.RateLimitError):
+            return {"error": f"OpenAI rate limit exceeded: {e}"}
+        elif isinstance(e, openai.APIError):
+            return {"error": f"OpenAI API error: {e}"}
+        else:
+            return {"error": f"Composer error: {type(e).__name__}: {e}"}
```
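The exception-mapping pattern suggested above can be exercised without the OpenAI SDK installed. In this sketch the exception classes are local stand-ins for the SDK's real `openai.AuthenticationError` and `openai.RateLimitError`, used only to show the dispatch logic:

```python
class AuthenticationError(Exception):
    """Stand-in for openai.AuthenticationError."""


class RateLimitError(Exception):
    """Stand-in for openai.RateLimitError."""


def to_error_dict(e: Exception) -> dict:
    """Map an exception to the {'error': ...} shape the provider returns."""
    if isinstance(e, AuthenticationError):
        return {"error": "OpenAI authentication failed. Check OPENAI_API_KEY."}
    if isinstance(e, RateLimitError):
        return {"error": f"OpenAI rate limit exceeded: {e}"}
    return {"error": f"Composer error: {type(e).__name__}: {e}"}
```

Keeping this mapping in one helper makes the specific-before-generic ordering easy to test in isolation.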
`1-3`: **Consider separating standard library and third-party imports.**

Following the Google Python Style Guide, separate standard library imports (os) from third-party imports (yaml) with a blank line.
Apply this diff:
```diff
 import os
+
 import yaml
 from typing import Any, Dict
```
As per coding guidelines (follow Google Python Style Guide).
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)

- examples/crewai-promptfoo-modular/README.md (1 hunks)
- examples/crewai-promptfoo-modular/agents.yaml (1 hunks)
- examples/crewai-promptfoo-modular/composer.py (1 hunks)
- examples/crewai-promptfoo-modular/promptfooconfig.yaml (1 hunks)
- examples/crewai-promptfoo-modular/requirements.txt (1 hunks)
- examples/crewai-promptfoo-modular/tasks.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (8)
examples/*/README.md
📄 CodeRabbit inference engine (.cursor/rules/examples.mdc)
examples/*/README.md:

- The README.md must begin with the folder name as an H1 heading
- Every example README must include instructions on how to run it with 'npx promptfoo@latest init --example example-name'
- Include a comprehensive README.md that explains the purpose, prerequisites, instructions, and expected outputs for the example
- Document any model-specific capabilities or limitations in examples
- Clearly list all required environment variables at the beginning of the README
- For each environment variable, explain its purpose, how to obtain it, and any default values or constraints in the README
- Include a sample .env file or instructions when multiple environment variables are needed in the README
- Document any required API keys or credentials in the README
- Provide instructions for cleaning up resources after running the example in the README
- When creating examples for specific providers, explain any provider-specific configuration in the README
- When creating examples for specific providers, document required environment variables in the README
- When creating examples for specific providers, include information about pricing or usage limits in the README
- When creating examples for specific providers, highlight unique features or capabilities in the README
- When creating examples for specific providers, compare to similar providers where appropriate in the README
Files:
examples/crewai-promptfoo-modular/README.md
examples/*/{README.md,promptfooconfig.yaml}
📄 CodeRabbit inference engine (.cursor/rules/examples.mdc)
Include placeholder values for secrets/credentials in the README or configuration files
Files:
examples/crewai-promptfoo-modular/README.md
examples/crewai-promptfoo-modular/promptfooconfig.yaml
{site/**,examples/**}
📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)
Any pull request that only touches files in 'site/' or 'examples/' directories must use the 'docs:' prefix in the PR title, not 'feat:' or 'fix:'
Files:
examples/crewai-promptfoo-modular/README.md
examples/crewai-promptfoo-modular/agents.yaml
examples/crewai-promptfoo-modular/tasks.yaml
examples/crewai-promptfoo-modular/promptfooconfig.yaml
examples/crewai-promptfoo-modular/requirements.txt
examples/crewai-promptfoo-modular/composer.py
examples/**
📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)
When modifying examples, update existing files in 'examples/' instead of adding new ones (e.g., replace outdated model IDs rather than introducing new example files)
Put examples in examples/ with a clear README.md
Files:
examples/crewai-promptfoo-modular/README.md
examples/crewai-promptfoo-modular/agents.yaml
examples/crewai-promptfoo-modular/tasks.yaml
examples/crewai-promptfoo-modular/promptfooconfig.yaml
examples/crewai-promptfoo-modular/requirements.txt
examples/crewai-promptfoo-modular/composer.py
examples/*/promptfooconfig.yaml
📄 CodeRabbit inference engine (.cursor/rules/examples.mdc)
examples/*/promptfooconfig.yaml:

- Include a working promptfooconfig.yaml (or equivalent) file in each example directory
- Always include the YAML schema reference at the top of configuration files: '# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json'
- Follow the specified field order in all configuration files: description, env (optional), prompts, providers, defaultTest (optional), scenarios (optional), tests
- Ensure all configuration files pass YAML lint validation
- When referencing external files in configuration, always use the 'file://' prefix
- Always use the latest model versions available in 2025 in configuration files
- For OpenAI, prefer models like 'openai:o3-mini' and 'openai:gpt-4o-mini' in configuration files
- For Anthropic, prefer models like 'anthropic:claude-3-7-sonnet-20250219' in configuration files
- For open-source models, use the latest versions available (e.g., latest Llama) in configuration files
- Include a mix of providers when comparing model performance in configuration files
- When demonstrating specialized capabilities (vision, audio, etc.), use models that support those features in configuration files
- Format configuration files consistently
- When creating examples for specific providers, always use the latest available model versions for that provider in configuration files
- Update model versions when new ones become available in configuration files
Files:
examples/crewai-promptfoo-modular/promptfooconfig.yaml
examples/**/requirements.txt
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
When adding Python examples, update relevant `requirements.txt` files
Files:
examples/crewai-promptfoo-modular/requirements.txt
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py:

- Use Python 3.9 or later
- Follow the Google Python Style Guide
- Use type hints to improve code readability and catch potential errors
- Use `ruff` for linting and formatting
- Run `ruff check --fix` for general linting
- Run `ruff check --select I --fix` for import sorting
- Run `ruff format` for formatting
- Keep the Python codebase simple and minimal, without unnecessary external dependencies
- When implementing custom providers, prompts, or asserts in Python, follow the promptfoo API patterns
Files:
examples/crewai-promptfoo-modular/composer.py
examples/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
Place example Python code in the `examples/` directory with appropriate documentation
Files:
examples/crewai-promptfoo-modular/composer.py
🧠 Learnings (11)

📓 Common learnings (from .cursor/rules/examples.mdc, 2025-07-18, applied to examples/crewai-promptfoo-modular/promptfooconfig.yaml; duplicates condensed):

- Include a working promptfooconfig.yaml (or equivalent) file in each example directory
- Follow the specified field order in all configuration files: description, env (optional), prompts, providers, defaultTest (optional), scenarios (optional), tests
- Always include the YAML schema reference at the top of configuration files: '# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json'
- Include a mix of providers when comparing model performance in configuration files
- Format configuration files consistently
- For trivial test cases in configuration, make them quirky and fun to increase engagement
🧬 Code graph analysis (1)

examples/crewai-promptfoo-modular/composer.py (1)

- examples/redteam-medical-agent/src/llm.js: `openai` (11-13)
🔇 Additional comments (5)

examples/crewai-promptfoo-modular/tasks.yaml (1)

`1-7`: **LGTM!** The task definition is well-structured with a clear description, expected output format (max 10 bullet points), and proper agent linkage to `trend_researcher`.

examples/crewai-promptfoo-modular/README.md (1)

`20-26`: **LGTM! Comprehensive prerequisites section.** The prerequisites are clearly listed with version requirements and explanations. The optional Ollama mention provides flexibility for users who want to avoid API costs.

examples/crewai-promptfoo-modular/agents.yaml (1)

`1-8`: **LGTM! Well-structured agent definition.** The agent definition uses appropriate template variables (`{{topic}}`, `{{input_text}}`), provides clear role and goal statements, and includes a contextual backstory. The structure integrates correctly with the composer workflow.

examples/crewai-promptfoo-modular/composer.py (1)

`16-20`: **LGTM! Clean template rendering implementation.** The `render` function provides straightforward variable substitution with proper type hints. The implementation is simple and effective for the use case.

examples/crewai-promptfoo-modular/requirements.txt (1)

`2-2`: Verified crewai 0.201.1 as the latest stable version on PyPI.
```python
def compose_messages(agent_id: str, task_id: str, vars: Dict[str, Any]):
    """Build chat messages from YAML files."""
    agents = yaml.safe_load(open(AGENTS_PATH))
    tasks = yaml.safe_load(open(TASKS_PATH))

    agent = agents[agent_id]
    task = tasks[task_id]

    role = render(agent["role"], vars)
    goal = render(agent["goal"], vars)
    backstory = render(agent["backstory"], vars)
    description = render(task["description"], vars)
    expected_output = render(task["expected_output"], vars)

    system_message = f"""Role: {role}
Goal: {goal}
Backstory: {backstory}
You must strictly follow the expected output format.
"""

    user_message = f"""Task description:
{description}

Input text:
{vars.get("input_text", "")}

Expected output:
{expected_output}
"""

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ]
```
🛠️ Refactor suggestion | 🟠 Major

**Add return type hint and error handling for missing YAML keys.**

The function is missing a return type hint (should be `List[Dict[str, str]]`) and doesn't handle cases where the YAML files are missing, malformed, or lack expected keys.
Apply this diff:
```diff
-def compose_messages(agent_id: str, task_id: str, vars: Dict[str, Any]):
+def compose_messages(agent_id: str, task_id: str, vars: Dict[str, Any]) -> list[dict[str, str]]:
     """Build chat messages from YAML files."""
-    agents = yaml.safe_load(open(AGENTS_PATH))
-    tasks = yaml.safe_load(open(TASKS_PATH))
+    try:
+        with open(AGENTS_PATH) as f:
+            agents = yaml.safe_load(f)
+        with open(TASKS_PATH) as f:
+            tasks = yaml.safe_load(f)
+    except FileNotFoundError as e:
+        raise FileNotFoundError(f"Required YAML file not found: {e.filename}")
+    except yaml.YAMLError as e:
+        raise ValueError(f"Invalid YAML format: {e}")

-    agent = agents[agent_id]
-    task = tasks[task_id]
+    if agent_id not in agents:
+        raise KeyError(f"Agent '{agent_id}' not found in {AGENTS_PATH}")
+    if task_id not in tasks:
+        raise KeyError(f"Task '{task_id}' not found in {TASKS_PATH}")
+
+    agent = agents[agent_id]
+    task = tasks[task_id]
+
+    required_agent_keys = ["role", "goal", "backstory"]
+    required_task_keys = ["description", "expected_output"]
+
+    for key in required_agent_keys:
+        if key not in agent:
+            raise KeyError(f"Missing required key '{key}' in agent '{agent_id}'")
+    for key in required_task_keys:
+        if key not in task:
+            raise KeyError(f"Missing required key '{key}' in task '{task_id}'")
```
As per coding guidelines (use type hints, follow Google Python Style Guide).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def compose_messages(agent_id: str, task_id: str, vars: Dict[str, Any]) -> list[dict[str, str]]:
    """Build chat messages from YAML files."""
    try:
        with open(AGENTS_PATH) as f:
            agents = yaml.safe_load(f)
        with open(TASKS_PATH) as f:
            tasks = yaml.safe_load(f)
    except FileNotFoundError as e:
        raise FileNotFoundError(f"Required YAML file not found: {e.filename}")
    except yaml.YAMLError as e:
        raise ValueError(f"Invalid YAML format: {e}")

    if agent_id not in agents:
        raise KeyError(f"Agent '{agent_id}' not found in {AGENTS_PATH}")
    if task_id not in tasks:
        raise KeyError(f"Task '{task_id}' not found in {TASKS_PATH}")

    agent = agents[agent_id]
    task = tasks[task_id]

    required_agent_keys = ["role", "goal", "backstory"]
    required_task_keys = ["description", "expected_output"]

    for key in required_agent_keys:
        if key not in agent:
            raise KeyError(f"Missing required key '{key}' in agent '{agent_id}'")
    for key in required_task_keys:
        if key not in task:
            raise KeyError(f"Missing required key '{key}' in task '{task_id}'")

    role = render(agent["role"], vars)
    goal = render(agent["goal"], vars)
    backstory = render(agent["backstory"], vars)
    description = render(task["description"], vars)
    expected_output = render(task["expected_output"], vars)

    system_message = f"""Role: {role}
Goal: {goal}
Backstory: {backstory}
You must strictly follow the expected output format.
"""

    user_message = f"""Task description:
{description}

Input text:
{vars.get("input_text", "")}

Expected output:
{expected_output}
"""

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ]
```
🤖 Prompt for AI Agents
In examples/crewai-promptfoo-modular/composer.py around lines 23 to 56, add a
return type hint of List[Dict[str, str]] to compose_messages and harden YAML
loading and key access: open AGENTS_PATH and TASKS_PATH with try/except to catch
FileNotFoundError and yaml.YAMLError and raise a clear ValueError, validate that
agent_id and task_id exist in the loaded dicts and that required keys
("role","goal","backstory" for agent and "description","expected_output" for
task) are present, raising informative errors if missing or malformed, and
ensure vars is typed/checked (Dict[str, Any]) before rendering so function
always returns the expected List[Dict[str, str]].
```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: "CrewAI modular agent-task evaluation (code-first)"

providers:
  - id: file://./composer.py
    label: "Composer Provider"
    config:
      agent_id: trend_researcher
      task_id: trend_identification_task
      model: gpt-4.1
      temperature: 0.2

prompts:
  - "{{input_text}}"

defaultTest:
  assert:
    - type: llm-rubric
      provider: openai:gpt-4.1
      value: |
        Evaluate the response for:
        1. Instruction adherence (bullet points, <=10)
        2. Topic focus ({{topic}})
        3. Depth (not generic)
        4. Grounding in input_text
    - type: javascript
      value: |
        // Ensure bullet points exist
        return /\n[-*•]\s/.test(String(output));

tests:
  - description: "Electric Vehicles Trend Analysis"
    options:
      provider: "Composer Provider"
    vars:
      topic: "Electric Vehicles"
      input_text: |
        The global EV market grew 35% year-over-year in 2024, driven by Asia and Europe...

  - description: "AI in Healthcare Trend Analysis"
    options:
      provider: "Composer Provider"
    vars:
      topic: "Artificial Intelligence in Healthcare"
      input_text: |
        Hospitals are rapidly adopting AI diagnostic tools and pharmaceutical companies are investing in AI drug discovery...
```
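The config's `javascript` assertion checks for a bullet marker after a newline. To sanity-check model outputs locally before running promptfoo, the same test can be reproduced in Python (the helper name is ours, not part of the example):

```python
import re

# Same pattern as the config's JS check: newline, then -, *, or •, then whitespace
BULLET_RE = re.compile(r"\n[-*•]\s")


def has_bullets(output: str) -> bool:
    """True if the text contains at least one bullet-point line."""
    return BULLET_RE.search(output) is not None


sample = "Key trends:\n- EV adoption grew 35%\n- Battery costs keep falling"
print(has_bullets(sample))  # True
print(has_bullets("A single flat paragraph with no list."))  # False
```

Keeping the regex identical to the YAML assertion avoids surprises where local checks pass but the eval's JavaScript check fails.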
**Reorder fields to match the required configuration structure.**

Per coding guidelines and learnings, the field order must be: description, env (optional), prompts, providers, defaultTest (optional), scenarios (optional), tests. Currently, `providers` appears before `prompts`.
Apply this diff to reorder:
```diff
 # yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
 description: "CrewAI modular agent-task evaluation (code-first)"

+prompts:
+  - "{{input_text}}"
+
 providers:
   - id: file://./composer.py
     label: "Composer Provider"
     config:
       agent_id: trend_researcher
       task_id: trend_identification_task
       model: gpt-4.1
       temperature: 0.2

-prompts:
-  - "{{input_text}}"
-
 defaultTest:
```
Based on learnings.
```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: "CrewAI modular agent-task evaluation (code-first)"

prompts:
  - "{{input_text}}"

providers:
  - id: file://./composer.py
    label: "Composer Provider"
    config:
      agent_id: trend_researcher
      task_id: trend_identification_task
      model: gpt-4.1
      temperature: 0.2

defaultTest:
  assert:
    - type: llm-rubric
      provider: openai:gpt-4.1
      value: |
        Evaluate the response for:
        1. Instruction adherence (bullet points, <=10)
        2. Topic focus ({{topic}})
        3. Depth (not generic)
        4. Grounding in input_text
    - type: javascript
      value: |
        // Ensure bullet points exist
        return /\n[-*•]\s/.test(String(output));

tests:
  - description: "Electric Vehicles Trend Analysis"
    options:
      provider: "Composer Provider"
    vars:
      topic: "Electric Vehicles"
      input_text: |
        The global EV market grew 35% year-over-year in 2024, driven by Asia and Europe...

  - description: "AI in Healthcare Trend Analysis"
    options:
      provider: "Composer Provider"
    vars:
      topic: "Artificial Intelligence in Healthcare"
      input_text: |
        Hospitals are rapidly adopting AI diagnostic tools and pharmaceutical companies are investing in AI drug discovery...
```
🤖 Prompt for AI Agents
In examples/crewai-promptfoo-modular/promptfooconfig.yaml around lines 1-46, the
top-level fields are out of the required order (providers appears before
prompts); reorder the YAML so the top-level keys follow: description, env (if
present), prompts, providers, defaultTest (if present), scenarios (if present),
tests — specifically move the prompts block to appear before the providers block
while preserving all existing content and indentation and leaving provider,
defaultTest and tests unchanged.
```yaml
    config:
      agent_id: trend_researcher
      task_id: trend_identification_task
      model: gpt-4.1
```
**Use a valid OpenAI model identifier.**

`gpt-4.1` is not a valid OpenAI model name. According to coding guidelines, prefer latest 2025 models like `openai:gpt-4o-mini` or `openai:o3-mini`. Note that `composer.py` defaults to `gpt-4o-mini` if no model is specified.
Apply this diff:
```diff
       agent_id: trend_researcher
       task_id: trend_identification_task
-      model: gpt-4.1
+      model: gpt-4o-mini
       temperature: 0.2
```
As per coding guidelines.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In examples/crewai-promptfoo-modular/promptfooconfig.yaml around line 10, the
model field uses an invalid identifier "gpt-4.1"; replace it with a valid OpenAI
model identifier such as "openai:gpt-4o-mini" or "openai:o3-mini" (include the
"openai:" prefix), or remove the field to let composer.py default to
gpt-4o-mini; update the value accordingly so it conforms to the coding
guidelines.
```yaml
defaultTest:
  assert:
    - type: llm-rubric
      provider: openai:gpt-4.1
```
**Use a valid OpenAI model identifier in the rubric provider.**

`openai:gpt-4.1` is not a valid model. Use a current model like `openai:gpt-4o` or `openai:gpt-4o-mini`.
Apply this diff:
```diff
   assert:
     - type: llm-rubric
-      provider: openai:gpt-4.1
+      provider: openai:gpt-4o
       value: |
```
As per coding guidelines.
```yaml
  assert:
    - type: llm-rubric
      provider: openai:gpt-4o
      value: |
```
🤖 Prompt for AI Agents
In examples/crewai-promptfoo-modular/promptfooconfig.yaml around line 19, the
rubric provider uses an invalid OpenAI model identifier `openai:gpt-4.1`;
replace it with a supported model such as `openai:gpt-4o` or
`openai:gpt-4o-mini` to comply with current OpenAI model names, updating the
provider line accordingly and saving the file.
```diff
@@ -0,0 +1,86 @@
+# crewai-modular
```
**Fix the H1 heading to match the folder name.**

The heading is `crewai-modular` but the folder name is `crewai-promptfoo-modular`. Per coding guidelines, the README.md must begin with the folder name as an H1 heading.
Apply this diff:
```diff
-# crewai-modular
+# crewai-promptfoo-modular
```
# crewai-modular | |
# crewai-promptfoo-modular |
🤖 Prompt for AI Agents
In examples/crewai-promptfoo-modular/README.md around line 1, the H1 heading
currently reads "crewai-modular" but must match the folder name; replace the
existing H1 with an H1 that exactly matches the folder name
crewai-promptfoo-modular (i.e., update the first line to be the folder name as
the H1).
```text
# Optional: Environment variable management from .env files
# Uncomment if you want to use .env files for API keys
python-dotenv>=1.1.1
```
**Inconsistent comment about python-dotenv being optional.**

The comment says "Uncomment if you want to use .env files" but the dependency is already uncommented on line 12. This contradicts the README, which instructs users to "uncomment `python-dotenv` in `requirements.txt`" for .env file support.
Apply this diff to align the comment with the actual state:
```diff
-# Optional: Environment variable management from .env files
-# Uncomment if you want to use .env files for API keys
-python-dotenv>=1.1.1
+# Environment variable management from .env files
+# Comment out if you don't need .env file support
+python-dotenv>=1.1.1
```
Alternatively, if you want to make it truly optional by default, comment it out:
```diff
-# Optional: Environment variable management from .env files
-# Uncomment if you want to use .env files for API keys
-python-dotenv>=1.1.1
+# Optional: Environment variable management from .env files
+# Uncomment the line below if you want to use .env files for API keys
+# python-dotenv>=1.1.1
```
```text
# Environment variable management from .env files
# Comment out if you don't need .env file support
python-dotenv>=1.1.1
```
🤖 Prompt for AI Agents
In examples/crewai-promptfoo-modular/requirements.txt around lines 10 to 12, the
comment states to "Uncomment if you want to use .env files" while
python-dotenv>=1.1.1 is already uncommented; update the file so the comment and
dependency state match the README by either commenting out the python-dotenv
line (prefix with #) to make it optional by default, or change the comment to
indicate the dependency is included by default — pick one approach and apply it
consistently.
cc @mldangelo |
This PR introduces a new example demonstrating how to evaluate modular agent-task prompts in the CrewAI framework using Promptfoo.
Key additions: