# docs: add modular CrewAI + Promptfoo evaluation example (agent-task prompt testing) #5807
base: main
Conversation
📝 Walkthrough

Adds a new example directory demonstrating a CrewAI-style modular prompt setup for Promptfoo. Introduces agents.yaml and tasks.yaml to define an agent (trend_researcher) and a task (trend_identification_task). Implements composer.py to render templated fields and compose chat messages, with optional OpenAI chat completions integration driven by environment variables. Provides promptfooconfig.yaml configuring a file-based provider pointing to composer.py, default checks, and two example tests. Adds a README with setup, usage, and troubleshooting, plus requirements.txt for dependencies.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
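The walkthrough describes composer.py as a file-based Python provider. For readers unfamiliar with that pattern, here is a minimal sketch of the `call_api` entry point promptfoo expects from a Python provider — the agent id and vars below are illustrative, and the real composer.py builds full chat messages from agents.yaml/tasks.yaml rather than echoing its inputs:

```python
from typing import Any, Dict


def call_api(prompt: str, options: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
    """Minimal promptfoo Python provider: return a dict with an 'output' key."""
    cfg = (options or {}).get("config", {})
    vars = (context or {}).get("vars", {})
    # The real composer renders agents.yaml/tasks.yaml templates here and
    # optionally calls a chat model; this stub just echoes its inputs.
    agent_id = cfg.get("agent_id", "agent")
    text = vars.get("input_text", prompt)
    return {"output": f"[{agent_id}] {text}"}


result = call_api("hello", {"config": {"agent_id": "trend_researcher"}}, {"vars": {}})
print(result["output"])  # [trend_researcher] hello
```

This mirrors the provider contract the PR's config relies on via `id: file://./composer.py`.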
Pre-merge checks: ✅ 3 checks passed.
Actionable comments posted: 6
🧹 Nitpick comments (5)
examples/crewai-promptfoo-modular/README.md (1)
`80-86`: **Add a cleanup section to the README.**

The README is missing instructions for cleaning up resources after running the example, which is required by the coding guidelines. Consider adding a section explaining how to remove generated files, stop any running processes, or reset the environment.
Add a cleanup section after the Troubleshooting section:
## Cleanup

After running the evaluation, you may want to clean up generated files:

```bash
# Remove evaluation results
rm -rf promptfoo-outputs/

# Remove cached data (if any)
rm -rf .promptfoo/
```

If you set environment variables temporarily, unset them:

```bash
unset OPENAI_API_KEY
```
As per coding guidelines.

examples/crewai-promptfoo-modular/promptfooconfig.yaml (1)

`4-11`: **Consider adding a mix of providers for comparison.**

The configuration currently uses only one provider. Per learnings, including a mix of providers when comparing model performance helps demonstrate Promptfoo's capabilities and provides users with comparative insights.

Consider adding alternative providers:

```yaml
providers:
  - id: file://./composer.py
    label: "Composer Provider (GPT-4o-mini)"
    config:
      agent_id: trend_researcher
      task_id: trend_identification_task
      model: gpt-4o-mini
      temperature: 0.2
  - id: file://./composer.py
    label: "Composer Provider (Claude)"
    config:
      agent_id: trend_researcher
      task_id: trend_identification_task
      model: claude-3-5-sonnet-20241022
      temperature: 0.2
```
Then reference different providers in the tests to show model comparison.
Based on learnings.
examples/crewai-promptfoo-modular/composer.py (3)
`5-10`: **Handle missing OPENAI_API_KEY more gracefully at import time.**

The OpenAI client is initialized at module import (line 7) without checking if `OPENAI_API_KEY` is set. If the key is missing, the OpenAI SDK may use a default or raise an exception, potentially causing import failures even when OpenAI isn't needed.

Apply this diff to defer client initialization:
```diff
 try:
     from openai import OpenAI
-    oai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
-    USE_OPENAI = True
+    USE_OPENAI = True
+    oai_client = None  # Initialize lazily
 except ImportError:
     USE_OPENAI = False
+    oai_client = None
```

Then in `call_api`, initialize the client when needed:

```diff
     try:
         if USE_OPENAI:
+            global oai_client
+            if oai_client is None:
+                api_key = os.getenv("OPENAI_API_KEY")
+                if not api_key:
+                    return {"error": "OPENAI_API_KEY environment variable not set"}
+                oai_client = OpenAI(api_key=api_key)
             response = oai_client.chat.completions.create(
```
`59-79`: **Improve error handling specificity and consider rate limiting.**

The broad exception handler on line 78 catches all exceptions and returns a generic error. Consider handling specific exception types (authentication errors, rate limits, network errors) to provide more actionable feedback to users.
Apply this diff:
```diff
     try:
         if USE_OPENAI:
+            if oai_client is None:
+                return {"error": "OpenAI client not initialized"}
             response = oai_client.chat.completions.create(
                 model=cfg.get("model", "gpt-4o-mini"),
                 messages=messages,
                 temperature=cfg.get("temperature", 0.2),
             )
             return {"output": response.choices[0].message.content}
         else:
             return {"error": "OpenAI not installed or configured."}
-    except Exception as e:
-        return {"error": f"Composer error: {e}"}
+    except Exception as e:
+        import openai
+        if isinstance(e, openai.AuthenticationError):
+            return {"error": "OpenAI authentication failed. Check OPENAI_API_KEY."}
+        elif isinstance(e, openai.RateLimitError):
+            return {"error": f"OpenAI rate limit exceeded: {e}"}
+        elif isinstance(e, openai.APIError):
+            return {"error": f"OpenAI API error: {e}"}
+        else:
+            return {"error": f"Composer error: {type(e).__name__}: {e}"}
```
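The exception-mapping pattern suggested above can be exercised without the OpenAI SDK installed. In this sketch the exception classes are local stand-ins for the SDK's real `openai.AuthenticationError` and `openai.RateLimitError`, used only to show the dispatch logic:

```python
class AuthenticationError(Exception):
    """Stand-in for openai.AuthenticationError."""


class RateLimitError(Exception):
    """Stand-in for openai.RateLimitError."""


def to_error_dict(e: Exception) -> dict:
    """Map an exception to the {'error': ...} shape the provider returns."""
    if isinstance(e, AuthenticationError):
        return {"error": "OpenAI authentication failed. Check OPENAI_API_KEY."}
    if isinstance(e, RateLimitError):
        return {"error": f"OpenAI rate limit exceeded: {e}"}
    return {"error": f"Composer error: {type(e).__name__}: {e}"}
```

Keeping this mapping in one helper makes the specific-before-generic ordering easy to test in isolation.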
`1-3`: **Consider separating standard library and third-party imports.**

Following the Google Python Style Guide, separate standard library imports (os) from third-party imports (yaml) with a blank line.
Apply this diff:
```diff
 import os
+
 import yaml
 from typing import Any, Dict
```
As per coding guidelines (follow Google Python Style Guide).
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)

- examples/crewai-promptfoo-modular/README.md (1 hunks)
- examples/crewai-promptfoo-modular/agents.yaml (1 hunks)
- examples/crewai-promptfoo-modular/composer.py (1 hunks)
- examples/crewai-promptfoo-modular/promptfooconfig.yaml (1 hunks)
- examples/crewai-promptfoo-modular/requirements.txt (1 hunks)
- examples/crewai-promptfoo-modular/tasks.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (8)
examples/*/README.md
📄 CodeRabbit inference engine (.cursor/rules/examples.mdc)
examples/*/README.md:

- The README.md must begin with the folder name as an H1 heading
- Every example README must include instructions on how to run it with 'npx promptfoo@latest init --example example-name'
- Include a comprehensive README.md that explains the purpose, prerequisites, instructions, and expected outputs for the example
- Document any model-specific capabilities or limitations in examples
- Clearly list all required environment variables at the beginning of the README
- For each environment variable, explain its purpose, how to obtain it, and any default values or constraints in the README
- Include a sample .env file or instructions when multiple environment variables are needed in the README
- Document any required API keys or credentials in the README
- Provide instructions for cleaning up resources after running the example in the README
- When creating examples for specific providers, explain any provider-specific configuration in the README
- When creating examples for specific providers, document required environment variables in the README
- When creating examples for specific providers, include information about pricing or usage limits in the README
- When creating examples for specific providers, highlight unique features or capabilities in the README
- When creating examples for specific providers, compare to similar providers where appropriate in the README
Files:
examples/crewai-promptfoo-modular/README.md
examples/*/{README.md,promptfooconfig.yaml}
📄 CodeRabbit inference engine (.cursor/rules/examples.mdc)
Include placeholder values for secrets/credentials in the README or configuration files
Files:
examples/crewai-promptfoo-modular/README.md
examples/crewai-promptfoo-modular/promptfooconfig.yaml
{site/**,examples/**}
📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)
Any pull request that only touches files in 'site/' or 'examples/' directories must use the 'docs:' prefix in the PR title, not 'feat:' or 'fix:'
Files:
examples/crewai-promptfoo-modular/README.md
examples/crewai-promptfoo-modular/agents.yaml
examples/crewai-promptfoo-modular/tasks.yaml
examples/crewai-promptfoo-modular/promptfooconfig.yaml
examples/crewai-promptfoo-modular/requirements.txt
examples/crewai-promptfoo-modular/composer.py
examples/**
📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)
When modifying examples, update existing files in 'examples/' instead of adding new ones (e.g., replace outdated model IDs rather than introducing new example files)
Put examples in examples/ with a clear README.md
Files:
examples/crewai-promptfoo-modular/README.md
examples/crewai-promptfoo-modular/agents.yaml
examples/crewai-promptfoo-modular/tasks.yaml
examples/crewai-promptfoo-modular/promptfooconfig.yaml
examples/crewai-promptfoo-modular/requirements.txt
examples/crewai-promptfoo-modular/composer.py
examples/*/promptfooconfig.yaml
📄 CodeRabbit inference engine (.cursor/rules/examples.mdc)
examples/*/promptfooconfig.yaml:

- Include a working promptfooconfig.yaml (or equivalent) file in each example directory
- Always include the YAML schema reference at the top of configuration files: '# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json'
- Follow the specified field order in all configuration files: description, env (optional), prompts, providers, defaultTest (optional), scenarios (optional), tests
- Ensure all configuration files pass YAML lint validation
- When referencing external files in configuration, always use the 'file://' prefix
- Always use the latest model versions available in 2025 in configuration files
- For OpenAI, prefer models like 'openai:o3-mini' and 'openai:gpt-4o-mini' in configuration files
- For Anthropic, prefer models like 'anthropic:claude-3-7-sonnet-20250219' in configuration files
- For open-source models, use the latest versions available (e.g., latest Llama) in configuration files
- Include a mix of providers when comparing model performance in configuration files
- When demonstrating specialized capabilities (vision, audio, etc.), use models that support those features in configuration files
- Format configuration files consistently
- When creating examples for specific providers, always use the latest available model versions for that provider in configuration files
- Update model versions when new ones become available in configuration files
Files:
examples/crewai-promptfoo-modular/promptfooconfig.yaml
examples/**/requirements.txt
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
When adding Python examples, update relevant `requirements.txt` files
Files:
examples/crewai-promptfoo-modular/requirements.txt
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py:

- Use Python 3.9 or later
- Follow the Google Python Style Guide
- Use type hints to improve code readability and catch potential errors
- Use `ruff` for linting and formatting
- Run `ruff check --fix` for general linting
- Run `ruff check --select I --fix` for import sorting
- Run `ruff format` for formatting
- Keep the Python codebase simple and minimal, without unnecessary external dependencies
- When implementing custom providers, prompts, or asserts in Python, follow the promptfoo API patterns
Files:
examples/crewai-promptfoo-modular/composer.py
examples/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
Place example Python code in the `examples/` directory with appropriate documentation
Files:
examples/crewai-promptfoo-modular/composer.py
🧠 Learnings (11)

📓 Common learnings (from .cursor/rules/examples.mdc, 2025-07-18, applied to examples/crewai-promptfoo-modular/promptfooconfig.yaml; duplicates condensed):

- Include a working promptfooconfig.yaml (or equivalent) file in each example directory
- Follow the specified field order in all configuration files: description, env (optional), prompts, providers, defaultTest (optional), scenarios (optional), tests
- Always include the YAML schema reference at the top of configuration files: '# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json'
- Include a mix of providers when comparing model performance in configuration files
- Format configuration files consistently
- For trivial test cases in configuration, make them quirky and fun to increase engagement
🧬 Code graph analysis (1)

examples/crewai-promptfoo-modular/composer.py (1)

- examples/redteam-medical-agent/src/llm.js: `openai` (11-13)
🔇 Additional comments (5)

examples/crewai-promptfoo-modular/tasks.yaml (1)

`1-7`: **LGTM!** The task definition is well-structured with a clear description, expected output format (max 10 bullet points), and proper agent linkage to `trend_researcher`.

examples/crewai-promptfoo-modular/README.md (1)

`20-26`: **LGTM! Comprehensive prerequisites section.** The prerequisites are clearly listed with version requirements and explanations. The optional Ollama mention provides flexibility for users who want to avoid API costs.

examples/crewai-promptfoo-modular/agents.yaml (1)

`1-8`: **LGTM! Well-structured agent definition.** The agent definition uses appropriate template variables (`{{topic}}`, `{{input_text}}`), provides clear role and goal statements, and includes a contextual backstory. The structure integrates correctly with the composer workflow.

examples/crewai-promptfoo-modular/composer.py (1)

`16-20`: **LGTM! Clean template rendering implementation.** The `render` function provides straightforward variable substitution with proper type hints. The implementation is simple and effective for the use case.

examples/crewai-promptfoo-modular/requirements.txt (1)

`2-2`: Verified crewai 0.201.1 as the latest stable version on PyPI.
```python
def compose_messages(agent_id: str, task_id: str, vars: Dict[str, Any]):
    """Build chat messages from YAML files."""
    agents = yaml.safe_load(open(AGENTS_PATH))
    tasks = yaml.safe_load(open(TASKS_PATH))

    agent = agents[agent_id]
    task = tasks[task_id]

    role = render(agent["role"], vars)
    goal = render(agent["goal"], vars)
    backstory = render(agent["backstory"], vars)
    description = render(task["description"], vars)
    expected_output = render(task["expected_output"], vars)

    system_message = f"""Role: {role}
Goal: {goal}
Backstory: {backstory}
You must strictly follow the expected output format.
"""

    user_message = f"""Task description:
{description}

Input text:
{vars.get("input_text", "")}

Expected output:
{expected_output}
"""

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ]
```
🛠️ Refactor suggestion | 🟠 Major

**Add return type hint and error handling for missing YAML keys.**

The function is missing a return type hint (should be `List[Dict[str, str]]`) and doesn't handle cases where the YAML files are missing, malformed, or lack expected keys.
Apply this diff:
```diff
-def compose_messages(agent_id: str, task_id: str, vars: Dict[str, Any]):
+def compose_messages(agent_id: str, task_id: str, vars: Dict[str, Any]) -> list[dict[str, str]]:
     """Build chat messages from YAML files."""
-    agents = yaml.safe_load(open(AGENTS_PATH))
-    tasks = yaml.safe_load(open(TASKS_PATH))
+    try:
+        with open(AGENTS_PATH) as f:
+            agents = yaml.safe_load(f)
+        with open(TASKS_PATH) as f:
+            tasks = yaml.safe_load(f)
+    except FileNotFoundError as e:
+        raise FileNotFoundError(f"Required YAML file not found: {e.filename}")
+    except yaml.YAMLError as e:
+        raise ValueError(f"Invalid YAML format: {e}")

-    agent = agents[agent_id]
-    task = tasks[task_id]
+    if agent_id not in agents:
+        raise KeyError(f"Agent '{agent_id}' not found in {AGENTS_PATH}")
+    if task_id not in tasks:
+        raise KeyError(f"Task '{task_id}' not found in {TASKS_PATH}")
+
+    agent = agents[agent_id]
+    task = tasks[task_id]
+
+    required_agent_keys = ["role", "goal", "backstory"]
+    required_task_keys = ["description", "expected_output"]
+
+    for key in required_agent_keys:
+        if key not in agent:
+            raise KeyError(f"Missing required key '{key}' in agent '{agent_id}'")
+    for key in required_task_keys:
+        if key not in task:
+            raise KeyError(f"Missing required key '{key}' in task '{task_id}'")
```
As per coding guidelines (use type hints, follow Google Python Style Guide).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def compose_messages(agent_id: str, task_id: str, vars: Dict[str, Any]) -> list[dict[str, str]]:
    """Build chat messages from YAML files."""
    try:
        with open(AGENTS_PATH) as f:
            agents = yaml.safe_load(f)
        with open(TASKS_PATH) as f:
            tasks = yaml.safe_load(f)
    except FileNotFoundError as e:
        raise FileNotFoundError(f"Required YAML file not found: {e.filename}")
    except yaml.YAMLError as e:
        raise ValueError(f"Invalid YAML format: {e}")

    if agent_id not in agents:
        raise KeyError(f"Agent '{agent_id}' not found in {AGENTS_PATH}")
    if task_id not in tasks:
        raise KeyError(f"Task '{task_id}' not found in {TASKS_PATH}")

    agent = agents[agent_id]
    task = tasks[task_id]

    required_agent_keys = ["role", "goal", "backstory"]
    required_task_keys = ["description", "expected_output"]

    for key in required_agent_keys:
        if key not in agent:
            raise KeyError(f"Missing required key '{key}' in agent '{agent_id}'")
    for key in required_task_keys:
        if key not in task:
            raise KeyError(f"Missing required key '{key}' in task '{task_id}'")

    role = render(agent["role"], vars)
    goal = render(agent["goal"], vars)
    backstory = render(agent["backstory"], vars)
    description = render(task["description"], vars)
    expected_output = render(task["expected_output"], vars)

    system_message = f"""Role: {role}
Goal: {goal}
Backstory: {backstory}
You must strictly follow the expected output format.
"""

    user_message = f"""Task description:
{description}

Input text:
{vars.get("input_text", "")}

Expected output:
{expected_output}
"""

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ]
```
🤖 Prompt for AI Agents
In examples/crewai-promptfoo-modular/composer.py around lines 23 to 56, add a
return type hint of List[Dict[str, str]] to compose_messages and harden YAML
loading and key access: open AGENTS_PATH and TASKS_PATH with try/except to catch
FileNotFoundError and yaml.YAMLError and raise a clear ValueError, validate that
agent_id and task_id exist in the loaded dicts and that required keys
("role","goal","backstory" for agent and "description","expected_output" for
task) are present, raising informative errors if missing or malformed, and
ensure vars is typed/checked (Dict[str, Any]) before rendering so function
always returns the expected List[Dict[str, str]].
```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: "CrewAI modular agent-task evaluation (code-first)"

providers:
  - id: file://./composer.py
    label: "Composer Provider"
    config:
      agent_id: trend_researcher
      task_id: trend_identification_task
      model: gpt-4.1
      temperature: 0.2

prompts:
  - "{{input_text}}"

defaultTest:
  assert:
    - type: llm-rubric
      provider: openai:gpt-4.1
      value: |
        Evaluate the response for:
        1. Instruction adherence (bullet points, <=10)
        2. Topic focus ({{topic}})
        3. Depth (not generic)
        4. Grounding in input_text
    - type: javascript
      value: |
        // Ensure bullet points exist
        return /\n[-*•]\s/.test(String(output));

tests:
  - description: "Electric Vehicles Trend Analysis"
    options:
      provider: "Composer Provider"
    vars:
      topic: "Electric Vehicles"
      input_text: |
        The global EV market grew 35% year-over-year in 2024, driven by Asia and Europe...

  - description: "AI in Healthcare Trend Analysis"
    options:
      provider: "Composer Provider"
    vars:
      topic: "Artificial Intelligence in Healthcare"
      input_text: |
        Hospitals are rapidly adopting AI diagnostic tools and pharmaceutical companies are investing in AI drug discovery...
```
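The config's `javascript` assertion checks for a bullet marker after a newline. To sanity-check model outputs locally before running promptfoo, the same test can be reproduced in Python (the helper name is ours, not part of the example):

```python
import re

# Same pattern as the config's JS check: newline, then -, *, or •, then whitespace
BULLET_RE = re.compile(r"\n[-*•]\s")


def has_bullets(output: str) -> bool:
    """True if the text contains at least one bullet-point line."""
    return BULLET_RE.search(output) is not None


sample = "Key trends:\n- EV adoption grew 35%\n- Battery costs keep falling"
print(has_bullets(sample))  # True
print(has_bullets("A single flat paragraph with no list."))  # False
```

Keeping the regex identical to the YAML assertion avoids surprises where local checks pass but the eval's JavaScript check fails.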
**Reorder fields to match the required configuration structure.**

Per coding guidelines and learnings, the field order must be: description, env (optional), prompts, providers, defaultTest (optional), scenarios (optional), tests. Currently, `providers` appears before `prompts`.
Apply this diff to reorder:
```diff
 # yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
 description: "CrewAI modular agent-task evaluation (code-first)"

+prompts:
+  - "{{input_text}}"
+
 providers:
   - id: file://./composer.py
     label: "Composer Provider"
     config:
       agent_id: trend_researcher
       task_id: trend_identification_task
       model: gpt-4.1
       temperature: 0.2

-prompts:
-  - "{{input_text}}"
-
 defaultTest:
```
Based on learnings.
```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: "CrewAI modular agent-task evaluation (code-first)"

prompts:
  - "{{input_text}}"

providers:
  - id: file://./composer.py
    label: "Composer Provider"
    config:
      agent_id: trend_researcher
      task_id: trend_identification_task
      model: gpt-4.1
      temperature: 0.2

defaultTest:
  assert:
    - type: llm-rubric
      provider: openai:gpt-4.1
      value: |
        Evaluate the response for:
        1. Instruction adherence (bullet points, <=10)
        2. Topic focus ({{topic}})
        3. Depth (not generic)
        4. Grounding in input_text
    - type: javascript
      value: |
        // Ensure bullet points exist
        return /\n[-*•]\s/.test(String(output));

tests:
  - description: "Electric Vehicles Trend Analysis"
    options:
      provider: "Composer Provider"
    vars:
      topic: "Electric Vehicles"
      input_text: |
        The global EV market grew 35% year-over-year in 2024, driven by Asia and Europe...

  - description: "AI in Healthcare Trend Analysis"
    options:
      provider: "Composer Provider"
    vars:
      topic: "Artificial Intelligence in Healthcare"
      input_text: |
        Hospitals are rapidly adopting AI diagnostic tools and pharmaceutical companies are investing in AI drug discovery...
```
🤖 Prompt for AI Agents
In examples/crewai-promptfoo-modular/promptfooconfig.yaml around lines 1-46, the
top-level fields are out of the required order (providers appears before
prompts); reorder the YAML so the top-level keys follow: description, env (if
present), prompts, providers, defaultTest (if present), scenarios (if present),
tests — specifically move the prompts block to appear before the providers block
while preserving all existing content and indentation and leaving provider,
defaultTest and tests unchanged.
```yaml
    config:
      agent_id: trend_researcher
      task_id: trend_identification_task
      model: gpt-4.1
```
**Use a valid OpenAI model identifier.**

`gpt-4.1` is not a valid OpenAI model name. According to coding guidelines, prefer latest 2025 models like `openai:gpt-4o-mini` or `openai:o3-mini`. Note that `composer.py` defaults to `gpt-4o-mini` if no model is specified.
Apply this diff:
```diff
       agent_id: trend_researcher
       task_id: trend_identification_task
-      model: gpt-4.1
+      model: gpt-4o-mini
       temperature: 0.2
```
As per coding guidelines.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In examples/crewai-promptfoo-modular/promptfooconfig.yaml around line 10, the
model field uses an invalid identifier "gpt-4.1"; replace it with a valid OpenAI
model identifier such as "openai:gpt-4o-mini" or "openai:o3-mini" (include the
"openai:" prefix), or remove the field to let composer.py default to
gpt-4o-mini; update the value accordingly so it conforms to the coding
guidelines.
```yaml
defaultTest:
  assert:
    - type: llm-rubric
      provider: openai:gpt-4.1
```
**Use a valid OpenAI model identifier in the rubric provider.**

`openai:gpt-4.1` is not a valid model. Use a current model like `openai:gpt-4o` or `openai:gpt-4o-mini`.
Apply this diff:
```diff
   assert:
     - type: llm-rubric
-      provider: openai:gpt-4.1
+      provider: openai:gpt-4o
       value: |
```
As per coding guidelines.
```yaml
  assert:
    - type: llm-rubric
      provider: openai:gpt-4o
      value: |
```
🤖 Prompt for AI Agents
In examples/crewai-promptfoo-modular/promptfooconfig.yaml around line 19, the
rubric provider uses an invalid OpenAI model identifier `openai:gpt-4.1`;
replace it with a supported model such as `openai:gpt-4o` or
`openai:gpt-4o-mini` to comply with current OpenAI model names, updating the
provider line accordingly and saving the file.
```diff
@@ -0,0 +1,86 @@
+# crewai-modular
```
**Fix the H1 heading to match the folder name.**

The heading is `crewai-modular` but the folder name is `crewai-promptfoo-modular`. Per coding guidelines, the README.md must begin with the folder name as an H1 heading.
Apply this diff:
```diff
-# crewai-modular
+# crewai-promptfoo-modular
```
# crewai-modular | |
# crewai-promptfoo-modular |
🤖 Prompt for AI Agents
In examples/crewai-promptfoo-modular/README.md around line 1, the H1 heading
currently reads "crewai-modular" but must match the folder name; replace the
existing H1 with an H1 that exactly matches the folder name
crewai-promptfoo-modular (i.e., update the first line to be the folder name as
the H1).
```text
# Optional: Environment variable management from .env files
# Uncomment if you want to use .env files for API keys
python-dotenv>=1.1.1
```
**Inconsistent comment about python-dotenv being optional.**

The comment says "Uncomment if you want to use .env files" but the dependency is already uncommented on line 12. This contradicts the README, which instructs users to "uncomment `python-dotenv` in `requirements.txt`" for .env file support.
Apply this diff to align the comment with the actual state:
```diff
-# Optional: Environment variable management from .env files
-# Uncomment if you want to use .env files for API keys
-python-dotenv>=1.1.1
+# Environment variable management from .env files
+# Comment out if you don't need .env file support
+python-dotenv>=1.1.1
```
Alternatively, if you want to make it truly optional by default, comment it out:
```diff
-# Optional: Environment variable management from .env files
-# Uncomment if you want to use .env files for API keys
-python-dotenv>=1.1.1
+# Optional: Environment variable management from .env files
+# Uncomment the line below if you want to use .env files for API keys
+# python-dotenv>=1.1.1
```
```text
# Environment variable management from .env files
# Comment out if you don't need .env file support
python-dotenv>=1.1.1
```
🤖 Prompt for AI Agents
In examples/crewai-promptfoo-modular/requirements.txt around lines 10 to 12, the
comment states to "Uncomment if you want to use .env files" while
python-dotenv>=1.1.1 is already uncommented; update the file so the comment and
dependency state match the README by either commenting out the python-dotenv
line (prefix with #) to make it optional by default, or change the comment to
indicate the dependency is included by default — pick one approach and apply it
consistently.
cc @mldangelo |
This PR introduces a new example demonstrating how to evaluate modular agent-task prompts in the CrewAI framework using Promptfoo.
Key additions: