Conversation

Ayush7614 (Contributor)

This PR introduces a new example demonstrating how to evaluate modular agent-task prompts in the CrewAI framework using Promptfoo.

Key additions:

  • agents.yaml and tasks.yaml showing role, goal, backstory, description, and expected_output separation
  • A promptfooconfig.yaml setup that composes these modular prompts for evaluation
  • Sample test cases using llm-rubric assertions to benchmark prompt quality
  • README.md documentation explaining setup, usage, and evaluation flow
  • requirements.txt with all dependencies
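The agent/task separation listed above can be sketched in a few lines of Python. This is a hypothetical illustration, not the PR's actual code: the field names (`role`, `goal`, `backstory`, `description`, `expected_output`) come from the description above, while the inline dicts stand in for the real `agents.yaml` and `tasks.yaml`.

```python
# Hypothetical agent/task definitions mirroring the agents.yaml / tasks.yaml split
agents = {
    "trend_researcher": {
        "role": "Market Trend Researcher",
        "goal": "Identify emerging trends in {{topic}}",
        "backstory": "An analyst who distills raw reports into concrete trends.",
    }
}
tasks = {
    "trend_identification_task": {
        "description": "List the key trends found in the input text.",
        "expected_output": "A bulleted list of at most 10 trends.",
    }
}

def render(template: str, vars: dict) -> str:
    """Minimal {{var}} substitution; the real composer may use richer templating."""
    for key, value in vars.items():
        template = template.replace("{{" + key + "}}", str(value))
    return template

vars = {"topic": "Electric Vehicles"}
agent = agents["trend_researcher"]
task = tasks["trend_identification_task"]

# Compose the modular pieces into chat messages for evaluation
messages = [
    {
        "role": "system",
        "content": f"Role: {render(agent['role'], vars)}\n"
                   f"Goal: {render(agent['goal'], vars)}\n"
                   f"Backstory: {agent['backstory']}",
    },
    {
        "role": "user",
        "content": f"{task['description']}\nExpected output: {task['expected_output']}",
    },
]
print(messages[0]["content"])
```

The point of the split is that the agent persona and the task spec can be edited and evaluated independently, then recombined at composition time.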

Ayush7614 changed the title from "Add modular CrewAI + Promptfoo evaluation example (agent-task prompt testing)" to "docs: add modular CrewAI + Promptfoo evaluation example (agent-task prompt testing)" on Oct 3, 2025

coderabbitai bot commented Oct 3, 2025

📝 Walkthrough

Adds a new example directory demonstrating a CrewAI-style modular prompt setup for Promptfoo. Introduces agents.yaml and tasks.yaml to define an agent (trend_researcher) and a task (trend_identification_task). Implements composer.py to render templated fields and compose chat messages, with optional OpenAI chat completions integration driven by environment variables. Provides promptfooconfig.yaml configuring a file-based provider pointing to composer.py, default checks, and two example tests. Adds README with setup, usage, and troubleshooting, plus requirements.txt for dependencies.
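For readers unfamiliar with `file://` Python providers, Promptfoo calls a `call_api(prompt, options, context)` function in the referenced module and expects a dict with an `output` (or `error`) key. The sketch below is a minimal stand-in for a composer-style provider, with the real model call replaced by an echo; the config keys are the ones named in this PR, and the dry-run body is an assumption, not the PR's implementation.

```python
from typing import Any, Dict

def call_api(prompt: str, options: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
    """Entry point Promptfoo invokes for a file://./composer.py provider."""
    cfg = options.get("config", {}) if options else {}
    agent_id = cfg.get("agent_id", "trend_researcher")
    task_id = cfg.get("task_id", "trend_identification_task")
    test_vars = context.get("vars", {}) if context else {}
    # A real composer loads agents.yaml / tasks.yaml, composes chat
    # messages, and calls the model; this stub just echoes its inputs.
    return {"output": f"[dry-run] {agent_id}/{task_id}: {prompt} vars={test_vars}"}

result = call_api(
    "Summarize EV trends",
    {"config": {"agent_id": "trend_researcher"}},
    {"vars": {"topic": "EVs"}},
)
print(result["output"])
```

A stub like this is also handy for checking the composition logic offline before wiring in a paid API call.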

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Mixed file types (Python, YAML, Markdown) with one moderate Python module (composer.py) implementing config loading, templating, and optional API calls
  • New configuration and data files are straightforward; logic mostly centralized in composer.py
  • Heterogeneous changes require per-file context but limited overall complexity

Possibly related issues

  • Promptfoo-Crewai Integration #5146 — Implements a CrewAI-style modular prompt example with agents/tasks YAML, a composer provider, and Promptfoo integration as described in the issue.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | The description clearly outlines the example’s purpose and enumerates the key files and features added, directly reflecting the changes in agents.yaml, tasks.yaml, promptfooconfig.yaml, README.md, and requirements.txt. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Title Check | ✅ Passed | The title accurately reflects the main change by summarizing the addition of a modular CrewAI and Promptfoo evaluation example focused on agent-task prompt testing. It is concise, specific, and free of unnecessary details. A reviewer scanning the history can understand the primary purpose at a glance. |


coderabbitai bot left a comment
Actionable comments posted: 6

🧹 Nitpick comments (5)
examples/crewai-promptfoo-modular/README.md (1)

80-86: Add a cleanup section to the README.

The README is missing instructions for cleaning up resources after running the example, which is required by the coding guidelines. Consider adding a section explaining how to remove generated files, stop any running processes, or reset the environment.

Add a cleanup section after the Troubleshooting section:

## Cleanup

After running the evaluation, you may want to clean up generated files:

```bash
# Remove evaluation results
rm -rf promptfoo-outputs/

# Remove cached data (if any)
rm -rf .promptfoo/
```

If you set environment variables temporarily, unset them:

```bash
unset OPENAI_API_KEY
```

As per coding guidelines.

examples/crewai-promptfoo-modular/promptfooconfig.yaml (1)

4-11: Consider adding a mix of providers for comparison.

The configuration currently uses only one provider. Per learnings, including a mix of providers when comparing model performance helps demonstrate Promptfoo's capabilities and provides users with comparative insights.



Consider adding alternative providers to the tests section:

```yaml
providers:
  - id: file://./composer.py
    label: "Composer Provider (GPT-4o-mini)"
    config:
      agent_id: trend_researcher
      task_id: trend_identification_task
      model: gpt-4o-mini
      temperature: 0.2
  
  - id: file://./composer.py
    label: "Composer Provider (Claude)"
    config:
      agent_id: trend_researcher
      task_id: trend_identification_task
      model: claude-3-5-sonnet-20241022
      temperature: 0.2
```

Then reference different providers in the tests to show model comparison.

Based on learnings.

examples/crewai-promptfoo-modular/composer.py (3)

5-10: Handle missing OPENAI_API_KEY more gracefully at import time.

The OpenAI client is initialized at module import (line 7) without checking if OPENAI_API_KEY is set. If the key is missing, the OpenAI SDK may use a default or raise an exception, potentially causing import failures even when OpenAI isn't needed.

Apply this diff to defer client initialization:

```diff
 try:
     from openai import OpenAI
-    oai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
-    USE_OPENAI = True
+    USE_OPENAI = True
+    oai_client = None  # Initialize lazily
 except ImportError:
     USE_OPENAI = False
+    oai_client = None
```

Then in call_api, initialize the client when needed:

```diff
     try:
         if USE_OPENAI:
+            global oai_client
+            if oai_client is None:
+                api_key = os.getenv("OPENAI_API_KEY")
+                if not api_key:
+                    return {"error": "OPENAI_API_KEY environment variable not set"}
+                oai_client = OpenAI(api_key=api_key)
             response = oai_client.chat.completions.create(
```

59-79: Improve error handling specificity and consider rate limiting.

The broad exception handler on line 78 catches all exceptions and returns a generic error. Consider handling specific exception types (authentication errors, rate limits, network errors) to provide more actionable feedback to users.

Apply this diff:

```diff
     try:
         if USE_OPENAI:
+            if oai_client is None:
+                return {"error": "OpenAI client not initialized"}
             response = oai_client.chat.completions.create(
                 model=cfg.get("model", "gpt-4o-mini"),
                 messages=messages,
                 temperature=cfg.get("temperature", 0.2),
             )
             return {"output": response.choices[0].message.content}
         else:
             return {"error": "OpenAI not installed or configured."}
+    except Exception as e:
+        import openai
+        if isinstance(e, openai.AuthenticationError):
+            return {"error": "OpenAI authentication failed. Check OPENAI_API_KEY."}
+        elif isinstance(e, openai.RateLimitError):
+            return {"error": f"OpenAI rate limit exceeded: {e}"}
+        elif isinstance(e, openai.APIError):
+            return {"error": f"OpenAI API error: {e}"}
-    except Exception as e:
-        return {"error": f"Composer error: {e}"}
+        else:
+            return {"error": f"Composer error: {type(e).__name__}: {e}"}
```

1-3: Consider adding standard library and third-party import separation.

Following the Google Python Style Guide, separate standard library imports (os) from third-party imports (yaml) with a blank line.

Apply this diff:

```diff
 import os
+
 import yaml
 from typing import Any, Dict
```

As per coding guidelines (follow Google Python Style Guide).

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cc94440 and 625e1d8.

📒 Files selected for processing (6)
  • examples/crewai-promptfoo-modular/README.md (1 hunks)
  • examples/crewai-promptfoo-modular/agents.yaml (1 hunks)
  • examples/crewai-promptfoo-modular/composer.py (1 hunks)
  • examples/crewai-promptfoo-modular/promptfooconfig.yaml (1 hunks)
  • examples/crewai-promptfoo-modular/requirements.txt (1 hunks)
  • examples/crewai-promptfoo-modular/tasks.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (8)
examples/*/README.md

📄 CodeRabbit inference engine (.cursor/rules/examples.mdc)

examples/*/README.md: The README.md must begin with the folder name as an H1 heading
Every example README must include instructions on how to run it with 'npx promptfoo@latest init --example example-name'
Include a comprehensive README.md that explains the purpose, prerequisites, instructions, and expected outputs for the example
Document any model-specific capabilities or limitations in examples
Clearly list all required environment variables at the beginning of the README
For each environment variable, explain its purpose, how to obtain it, and any default values or constraints in the README
Include a sample .env file or instructions when multiple environment variables are needed in the README
Document any required API keys or credentials in the README
Provide instructions for cleaning up resources after running the example in the README
When creating examples for specific providers, explain any provider-specific configuration in the README
When creating examples for specific providers, document required environment variables in the README
When creating examples for specific providers, include information about pricing or usage limits in the README
When creating examples for specific providers, highlight unique features or capabilities in the README
When creating examples for specific providers, compare to similar providers where appropriate in the README

Files:

  • examples/crewai-promptfoo-modular/README.md
examples/*/{README.md,promptfooconfig.yaml}

📄 CodeRabbit inference engine (.cursor/rules/examples.mdc)

Include placeholder values for secrets/credentials in the README or configuration files

Files:

  • examples/crewai-promptfoo-modular/README.md
  • examples/crewai-promptfoo-modular/promptfooconfig.yaml
{site/**,examples/**}

📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)

Any pull request that only touches files in 'site/' or 'examples/' directories must use the 'docs:' prefix in the PR title, not 'feat:' or 'fix:'

Files:

  • examples/crewai-promptfoo-modular/README.md
  • examples/crewai-promptfoo-modular/agents.yaml
  • examples/crewai-promptfoo-modular/tasks.yaml
  • examples/crewai-promptfoo-modular/promptfooconfig.yaml
  • examples/crewai-promptfoo-modular/requirements.txt
  • examples/crewai-promptfoo-modular/composer.py
examples/**

📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)

When modifying examples, update existing files in 'examples/' instead of adding new ones (e.g., replace outdated model IDs rather than introducing new example files)

Put examples in examples/ with a clear README.md

Files:

  • examples/crewai-promptfoo-modular/README.md
  • examples/crewai-promptfoo-modular/agents.yaml
  • examples/crewai-promptfoo-modular/tasks.yaml
  • examples/crewai-promptfoo-modular/promptfooconfig.yaml
  • examples/crewai-promptfoo-modular/requirements.txt
  • examples/crewai-promptfoo-modular/composer.py
examples/*/promptfooconfig.yaml

📄 CodeRabbit inference engine (.cursor/rules/examples.mdc)

examples/*/promptfooconfig.yaml: Include a working promptfooconfig.yaml (or equivalent) file in each example directory
Always include the YAML schema reference at the top of configuration files: '# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json'
Follow the specified field order in all configuration files: description, env (optional), prompts, providers, defaultTest (optional), scenarios (optional), tests
Ensure all configuration files pass YAML lint validation
When referencing external files in configuration, always use the 'file://' prefix
Always use the latest model versions available in 2025 in configuration files
For OpenAI, prefer models like 'openai:o3-mini' and 'openai:gpt-4o-mini' in configuration files
For Anthropic, prefer models like 'anthropic:claude-3-7-sonnet-20250219' in configuration files
For open-source models, use the latest versions available (e.g., latest Llama) in configuration files
Include a mix of providers when comparing model performance in configuration files
When demonstrating specialized capabilities (vision, audio, etc.), use models that support those features in configuration files
Format configuration files consistently
When creating examples for specific providers, always use the latest available model versions for that provider in configuration files
Update model versions when new ones become available in configuration files

Files:

  • examples/crewai-promptfoo-modular/promptfooconfig.yaml
examples/**/requirements.txt

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

When adding Python examples, update relevant requirements.txt files

Files:

  • examples/crewai-promptfoo-modular/requirements.txt
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Use Python 3.9 or later
Follow the Google Python Style Guide
Use type hints to improve code readability and catch potential errors
Use ruff for linting and formatting
Run ruff check --fix for general linting
Run ruff check --select I --fix for import sorting
Run ruff format for formatting
Keep the Python codebase simple and minimal, without unnecessary external dependencies
When implementing custom providers, prompts, or asserts in Python, follow the promptfoo API patterns

Files:

  • examples/crewai-promptfoo-modular/composer.py
examples/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

Place example Python code in the examples/ directory with appropriate documentation

Files:

  • examples/crewai-promptfoo-modular/composer.py
🧠 Learnings (11)
📓 Common learnings
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/examples.mdc:0-0
Timestamp: 2025-07-18T17:25:46.665Z
Learning: Applies to examples/*/promptfooconfig.yaml : Include a working promptfooconfig.yaml (or equivalent) file in each example directory
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/examples.mdc:0-0
Timestamp: 2025-07-18T17:25:38.444Z
Learning: Applies to examples/*/promptfooconfig*.yaml : Include a working promptfooconfig.yaml (or equivalent) file in each example
📚 Learning: 2025-07-18T17:25:38.444Z
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/examples.mdc:0-0
Timestamp: 2025-07-18T17:25:38.444Z
Learning: Applies to examples/*/promptfooconfig*.yaml : Follow the specific field order in all configuration files: description, env (optional), prompts, providers, defaultTest (optional), scenarios (optional), tests

Applied to files:

  • examples/crewai-promptfoo-modular/promptfooconfig.yaml
📚 Learning: 2025-07-18T17:25:38.444Z
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/examples.mdc:0-0
Timestamp: 2025-07-18T17:25:38.444Z
Learning: Applies to examples/*/promptfooconfig*.yaml : Include a working promptfooconfig.yaml (or equivalent) file in each example

Applied to files:

  • examples/crewai-promptfoo-modular/promptfooconfig.yaml
📚 Learning: 2025-07-18T17:25:46.665Z
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/examples.mdc:0-0
Timestamp: 2025-07-18T17:25:46.665Z
Learning: Applies to examples/*/promptfooconfig.yaml : Include a working promptfooconfig.yaml (or equivalent) file in each example directory

Applied to files:

  • examples/crewai-promptfoo-modular/promptfooconfig.yaml
📚 Learning: 2025-07-18T17:25:38.445Z
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/examples.mdc:0-0
Timestamp: 2025-07-18T17:25:38.445Z
Learning: Applies to examples/*/promptfooconfig*.yaml : Include a mix of providers when comparing model performance in configuration files

Applied to files:

  • examples/crewai-promptfoo-modular/promptfooconfig.yaml
📚 Learning: 2025-07-18T17:25:46.665Z
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/examples.mdc:0-0
Timestamp: 2025-07-18T17:25:46.665Z
Learning: Applies to examples/*/promptfooconfig.yaml : Format configuration files consistently

Applied to files:

  • examples/crewai-promptfoo-modular/promptfooconfig.yaml
📚 Learning: 2025-07-18T17:25:46.665Z
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/examples.mdc:0-0
Timestamp: 2025-07-18T17:25:46.665Z
Learning: Applies to examples/*/promptfooconfig.yaml : Always include the YAML schema reference at the top of configuration files: '# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json'

Applied to files:

  • examples/crewai-promptfoo-modular/promptfooconfig.yaml
📚 Learning: 2025-07-18T17:25:38.445Z
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/examples.mdc:0-0
Timestamp: 2025-07-18T17:25:38.445Z
Learning: Applies to examples/*/promptfooconfig*.yaml : Format configuration files consistently

Applied to files:

  • examples/crewai-promptfoo-modular/promptfooconfig.yaml
📚 Learning: 2025-07-18T17:25:38.444Z
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/examples.mdc:0-0
Timestamp: 2025-07-18T17:25:38.444Z
Learning: Applies to examples/*/promptfooconfig*.yaml : Always include the YAML schema reference at the top of configuration files: '# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json'

Applied to files:

  • examples/crewai-promptfoo-modular/promptfooconfig.yaml
📚 Learning: 2025-07-18T17:25:38.444Z
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/examples.mdc:0-0
Timestamp: 2025-07-18T17:25:38.444Z
Learning: Applies to examples/*/promptfooconfig*.yaml : For trivial test cases in configuration, make them quirky and fun to increase engagement

Applied to files:

  • examples/crewai-promptfoo-modular/promptfooconfig.yaml
📚 Learning: 2025-07-18T17:25:46.665Z
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/examples.mdc:0-0
Timestamp: 2025-07-18T17:25:46.665Z
Learning: Applies to examples/*/promptfooconfig.yaml : Follow the specified field order in all configuration files: description, env (optional), prompts, providers, defaultTest (optional), scenarios (optional), tests

Applied to files:

  • examples/crewai-promptfoo-modular/promptfooconfig.yaml
🧬 Code graph analysis (1)
examples/crewai-promptfoo-modular/composer.py (1)
examples/redteam-medical-agent/src/llm.js (1)
  • openai (11-13)
🔇 Additional comments (5)
examples/crewai-promptfoo-modular/tasks.yaml (1)

1-7: LGTM!

The task definition is well-structured with clear description, expected output format (max 10 bullet points), and proper agent linkage to trend_researcher.

examples/crewai-promptfoo-modular/README.md (1)

20-26: LGTM! Comprehensive prerequisites section.

The prerequisites are clearly listed with version requirements and explanations. The optional Ollama mention provides flexibility for users who want to avoid API costs.

examples/crewai-promptfoo-modular/agents.yaml (1)

1-8: LGTM! Well-structured agent definition.

The agent definition uses appropriate template variables ({{topic}}, {{input_text}}), provides clear role and goal statements, and includes a contextual backstory. The structure integrates correctly with the composer workflow.

examples/crewai-promptfoo-modular/composer.py (1)

16-20: LGTM! Clean template rendering implementation.

The render function provides straightforward variable substitution with proper type hints. The implementation is simple and effective for the use case.
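The `render` function itself is not reproduced in this review comment. A plausible minimal shape consistent with the description above (straightforward `{{var}}` substitution with type hints) would be the following; this is a hypothetical reconstruction, not the PR's actual implementation:

```python
from typing import Any, Dict

def render(template: str, vars: Dict[str, Any]) -> str:
    """Replace each {{key}} placeholder with its value (hypothetical sketch)."""
    for key, value in vars.items():
        template = template.replace("{{" + key + "}}", str(value))
    return template

print(render("Trends in {{topic}}", {"topic": "Electric Vehicles"}))
# → Trends in Electric Vehicles
```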

examples/crewai-promptfoo-modular/requirements.txt (1)

2-2: Verified crewai 0.201.1 as the latest stable version on PyPI.

Comment on lines +23 to +56
```python
def compose_messages(agent_id: str, task_id: str, vars: Dict[str, Any]):
    """Build chat messages from YAML files."""
    agents = yaml.safe_load(open(AGENTS_PATH))
    tasks = yaml.safe_load(open(TASKS_PATH))

    agent = agents[agent_id]
    task = tasks[task_id]

    role = render(agent["role"], vars)
    goal = render(agent["goal"], vars)
    backstory = render(agent["backstory"], vars)
    description = render(task["description"], vars)
    expected_output = render(task["expected_output"], vars)

    system_message = f"""Role: {role}
Goal: {goal}
Backstory: {backstory}
You must strictly follow the expected output format.
"""

    user_message = f"""Task description:
{description}

Input text:
{vars.get("input_text", "")}

Expected output:
{expected_output}
"""

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ]
```

🛠️ Refactor suggestion | 🟠 Major

Add return type hint and error handling for missing YAML keys.

The function is missing a return type hint (should be List[Dict[str, str]]), and doesn't handle cases where the YAML files are missing, malformed, or lack expected keys.

Apply this diff:

```diff
-def compose_messages(agent_id: str, task_id: str, vars: Dict[str, Any]):
+def compose_messages(agent_id: str, task_id: str, vars: Dict[str, Any]) -> list[dict[str, str]]:
     """Build chat messages from YAML files."""
+    try:
-    agents = yaml.safe_load(open(AGENTS_PATH))
-    tasks = yaml.safe_load(open(TASKS_PATH))
+        with open(AGENTS_PATH) as f:
+            agents = yaml.safe_load(f)
+        with open(TASKS_PATH) as f:
+            tasks = yaml.safe_load(f)
+    except FileNotFoundError as e:
+        raise FileNotFoundError(f"Required YAML file not found: {e.filename}")
+    except yaml.YAMLError as e:
+        raise ValueError(f"Invalid YAML format: {e}")

-    agent = agents[agent_id]
-    task = tasks[task_id]
+    if agent_id not in agents:
+        raise KeyError(f"Agent '{agent_id}' not found in {AGENTS_PATH}")
+    if task_id not in tasks:
+        raise KeyError(f"Task '{task_id}' not found in {TASKS_PATH}")
+
+    agent = agents[agent_id]
+    task = tasks[task_id]
+
+    required_agent_keys = ["role", "goal", "backstory"]
+    required_task_keys = ["description", "expected_output"]
+
+    for key in required_agent_keys:
+        if key not in agent:
+            raise KeyError(f"Missing required key '{key}' in agent '{agent_id}'")
+    for key in required_task_keys:
+        if key not in task:
+            raise KeyError(f"Missing required key '{key}' in task '{task_id}'")
```

As per coding guidelines (use type hints, follow Google Python Style Guide).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change:

```python
def compose_messages(agent_id: str, task_id: str, vars: Dict[str, Any]) -> list[dict[str, str]]:
    """Build chat messages from YAML files."""
    try:
        with open(AGENTS_PATH) as f:
            agents = yaml.safe_load(f)
        with open(TASKS_PATH) as f:
            tasks = yaml.safe_load(f)
    except FileNotFoundError as e:
        raise FileNotFoundError(f"Required YAML file not found: {e.filename}")
    except yaml.YAMLError as e:
        raise ValueError(f"Invalid YAML format: {e}")

    if agent_id not in agents:
        raise KeyError(f"Agent '{agent_id}' not found in {AGENTS_PATH}")
    if task_id not in tasks:
        raise KeyError(f"Task '{task_id}' not found in {TASKS_PATH}")

    agent = agents[agent_id]
    task = tasks[task_id]

    required_agent_keys = ["role", "goal", "backstory"]
    required_task_keys = ["description", "expected_output"]

    for key in required_agent_keys:
        if key not in agent:
            raise KeyError(f"Missing required key '{key}' in agent '{agent_id}'")
    for key in required_task_keys:
        if key not in task:
            raise KeyError(f"Missing required key '{key}' in task '{task_id}'")

    role = render(agent["role"], vars)
    goal = render(agent["goal"], vars)
    backstory = render(agent["backstory"], vars)
    description = render(task["description"], vars)
    expected_output = render(task["expected_output"], vars)

    system_message = f"""Role: {role}
Goal: {goal}
Backstory: {backstory}
You must strictly follow the expected output format.
"""

    user_message = f"""Task description:
{description}

Input text:
{vars.get("input_text", "")}

Expected output:
{expected_output}
"""

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ]
```
🤖 Prompt for AI Agents
In examples/crewai-promptfoo-modular/composer.py around lines 23 to 56, add a
return type hint of List[Dict[str, str]] to compose_messages and harden YAML
loading and key access: open AGENTS_PATH and TASKS_PATH with try/except to catch
FileNotFoundError and yaml.YAMLError and raise a clear ValueError, validate that
agent_id and task_id exist in the loaded dicts and that required keys
("role","goal","backstory" for agent and "description","expected_output" for
task) are present, raising informative errors if missing or malformed, and
ensure vars is typed/checked (Dict[str, Any]) before rendering so function
always returns the expected List[Dict[str, str]].

Comment on lines 1 to 46
```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: "CrewAI modular agent-task evaluation (code-first)"

providers:
  - id: file://./composer.py
    label: "Composer Provider"
    config:
      agent_id: trend_researcher
      task_id: trend_identification_task
      model: gpt-4.1
      temperature: 0.2

prompts:
  - "{{input_text}}"

defaultTest:
  assert:
    - type: llm-rubric
      provider: openai:gpt-4.1
      value: |
        Evaluate the response for:
        1. Instruction adherence (bullet points, <=10)
        2. Topic focus ({{topic}})
        3. Depth (not generic)
        4. Grounding in input_text
    - type: javascript
      value: |
        // Ensure bullet points exist
        return /\n[-*•]\s/.test(String(output));

tests:
  - description: "Electric Vehicles Trend Analysis"
    options:
      provider: "Composer Provider"
    vars:
      topic: "Electric Vehicles"
      input_text: |
        The global EV market grew 35% year-over-year in 2024, driven by Asia and Europe...

  - description: "AI in Healthcare Trend Analysis"
    options:
      provider: "Composer Provider"
    vars:
      topic: "Artificial Intelligence in Healthcare"
      input_text: |
        Hospitals are rapidly adopting AI diagnostic tools and pharmaceutical companies are investing in AI drug discovery...
```

⚠️ Potential issue | 🟡 Minor

Reorder fields to match the required configuration structure.

Per coding guidelines and learnings, the field order must be: description, env (optional), prompts, providers, defaultTest (optional), scenarios (optional), tests. Currently, providers appears before prompts.

Apply this diff to reorder:

```diff
 # yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
 description: "CrewAI modular agent-task evaluation (code-first)"

+prompts:
+  - "{{input_text}}"
+
 providers:
   - id: file://./composer.py
     label: "Composer Provider"
     config:
       agent_id: trend_researcher
       task_id: trend_identification_task
       model: gpt-4.1
       temperature: 0.2

-prompts:
-  - "{{input_text}}"
-
 defaultTest:
```

Based on learnings.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change:

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: "CrewAI modular agent-task evaluation (code-first)"

prompts:
  - "{{input_text}}"

providers:
  - id: file://./composer.py
    label: "Composer Provider"
    config:
      agent_id: trend_researcher
      task_id: trend_identification_task
      model: gpt-4.1
      temperature: 0.2

defaultTest:
  assert:
    - type: llm-rubric
      provider: openai:gpt-4.1
      value: |
        Evaluate the response for:
        1. Instruction adherence (bullet points, <=10)
        2. Topic focus ({{topic}})
        3. Depth (not generic)
        4. Grounding in input_text
    - type: javascript
      value: |
        // Ensure bullet points exist
        return /\n[-*•]\s/.test(String(output));

tests:
  - description: "Electric Vehicles Trend Analysis"
    options:
      provider: "Composer Provider"
    vars:
      topic: "Electric Vehicles"
      input_text: |
        The global EV market grew 35% year-over-year in 2024, driven by Asia and Europe...

  - description: "AI in Healthcare Trend Analysis"
    options:
      provider: "Composer Provider"
    vars:
      topic: "Artificial Intelligence in Healthcare"
      input_text: |
        Hospitals are rapidly adopting AI diagnostic tools and pharmaceutical companies are investing in AI drug discovery...
```
🤖 Prompt for AI Agents
In examples/crewai-promptfoo-modular/promptfooconfig.yaml around lines 1-46, the
top-level fields are out of the required order (providers appears before
prompts); reorder the YAML so the top-level keys follow: description, env (if
present), prompts, providers, defaultTest (if present), scenarios (if present),
tests — specifically move the prompts block to appear before the providers block
while preserving all existing content and indentation and leaving provider,
defaultTest and tests unchanged.

```yaml
    config:
      agent_id: trend_researcher
      task_id: trend_identification_task
      model: gpt-4.1
```

⚠️ Potential issue | 🟠 Major

Use a valid OpenAI model identifier.

`gpt-4.1` is not a valid OpenAI model name. According to coding guidelines, prefer latest 2025 models like `openai:gpt-4o-mini` or `openai:o3-mini`. Note that composer.py defaults to `gpt-4o-mini` if no model is specified.

Apply this diff:

```diff
       agent_id: trend_researcher
       task_id: trend_identification_task
-      model: gpt-4.1
+      model: gpt-4o-mini
       temperature: 0.2
```

As per coding guidelines.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In examples/crewai-promptfoo-modular/promptfooconfig.yaml around line 10, the
model field uses an invalid identifier "gpt-4.1"; replace it with a valid OpenAI
model identifier such as "openai:gpt-4o-mini" or "openai:o3-mini" (include the
"openai:" prefix), or remove the field to let composer.py default to
gpt-4o-mini; update the value accordingly so it conforms to the coding
guidelines.

```yaml
defaultTest:
  assert:
    - type: llm-rubric
      provider: openai:gpt-4.1
```

⚠️ Potential issue | 🟠 Major

Use a valid OpenAI model identifier in the rubric provider.

`openai:gpt-4.1` is not a valid model. Use a current model like `openai:gpt-4o` or `openai:gpt-4o-mini`.

Apply this diff:

```diff
   assert:
     - type: llm-rubric
-      provider: openai:gpt-4.1
+      provider: openai:gpt-4o
       value: |
```

As per coding guidelines.

📝 Committable suggestion

Suggested change:

```diff
   assert:
     - type: llm-rubric
-      provider: openai:gpt-4.1
+      provider: openai:gpt-4o
       value: |
```
🤖 Prompt for AI Agents
In examples/crewai-promptfoo-modular/promptfooconfig.yaml around line 19, the
rubric provider uses an invalid OpenAI model identifier `openai:gpt-4.1`;
replace it with a supported model such as `openai:gpt-4o` or
`openai:gpt-4o-mini` to comply with current OpenAI model names, updating the
provider line accordingly and saving the file.

```diff
@@ -0,0 +1,86 @@
+# crewai-modular
```

⚠️ Potential issue | 🟡 Minor

Fix the H1 heading to match the folder name.

The heading is `crewai-modular` but the folder name is `crewai-promptfoo-modular`. Per coding guidelines, the README.md must begin with the folder name as an H1 heading.

Apply this diff:

```diff
-# crewai-modular
+# crewai-promptfoo-modular
```
📝 Committable suggestion

Suggested change:

```diff
-# crewai-modular
+# crewai-promptfoo-modular
```
🤖 Prompt for AI Agents
In examples/crewai-promptfoo-modular/README.md around line 1, the H1 heading
currently reads "crewai-modular" but must match the folder name; replace the
existing H1 with an H1 that exactly matches the folder name
crewai-promptfoo-modular (i.e., update the first line to be the folder name as
the H1).

Comment on lines +10 to +12:

```text
# Optional: Environment variable management from .env files
# Uncomment if you want to use .env files for API keys
python-dotenv>=1.1.1
```

⚠️ Potential issue | 🟡 Minor

Inconsistent comment about `python-dotenv` being optional.

The comment says "Uncomment if you want to use .env files" but the dependency is already uncommented on line 12. This contradicts the README, which instructs users to "uncomment python-dotenv in requirements.txt" for .env file support.

Apply this diff to align the comment with the actual state:

```diff
-# Optional: Environment variable management from .env files
-# Uncomment if you want to use .env files for API keys
-python-dotenv>=1.1.1
+# Environment variable management from .env files
+# Comment out if you don't need .env file support
+python-dotenv>=1.1.1
```

Alternatively, if you want to make it truly optional by default, comment it out:

```diff
-# Optional: Environment variable management from .env files
-# Uncomment if you want to use .env files for API keys
-python-dotenv>=1.1.1
+# Optional: Environment variable management from .env files
+# Uncomment the line below if you want to use .env files for API keys
+# python-dotenv>=1.1.1
```
📝 Committable suggestion

Suggested change:

```diff
-# Optional: Environment variable management from .env files
-# Uncomment if you want to use .env files for API keys
+# Environment variable management from .env files
+# Comment out if you don't need .env file support
 python-dotenv>=1.1.1
```
🤖 Prompt for AI Agents
In examples/crewai-promptfoo-modular/requirements.txt around lines 10 to 12, the
comment states to "Uncomment if you want to use .env files" while
python-dotenv>=1.1.1 is already uncommented; update the file so the comment and
dependency state match the README by either commenting out the python-dotenv
line (prefix with #) to make it optional by default, or change the comment to
indicate the dependency is included by default — pick one approach and apply it
consistently.
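If the maintainers take the truly-optional route, a common pattern (sketched here as an assumption; the PR's composer.py may handle this differently) is to import `python-dotenv` defensively so the example still runs when the package is not installed:

```python
import os

try:
    # Optional dependency: loads variables from a local .env file if present.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    # python-dotenv not installed; fall back to the plain process environment.
    pass

api_key = os.environ.get("OPENAI_API_KEY")  # None if unset either way
```

With this pattern, the requirements.txt line can stay commented out and users who want .env support simply install the package.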

@Ayush7614
Contributor Author

cc @mldangelo
