A benchmark for the modern AI security agent.
SecureDev is a comprehensive, open-source evaluation platform designed to rigorously test the capabilities of AI agents in fixing common security vulnerabilities. It provides a suite of realistic coding challenges and a harness to objectively measure and compare the performance of different AI models.
- Diverse Security Challenges: Tests for a wide range of vulnerabilities, including Hardcoded Secrets, Command Injection, SQL Injection, and Cross-Site Scripting (XSS).
- Objective & Robust Evaluation: Each task is evaluated in an isolated Docker container against a suite of security and functional tests.
- Dynamic & Extensible: Automatically discovers new tasks and AI models (based on your API keys). The platform is designed to be easily extended.
- Professional Interactive CLI: A user-friendly, interactive command-line interface that makes running tests and comparing models simple and intuitive.
- Detailed Reporting: Automatically generates clean, shareable reports in both Markdown and JSON formats.
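Because reports are also emitted as JSON, they are easy to post-process in scripts. The snippet below is a sketch only: the field names (`model`, `results`, `security_passed`, `functional_passed`) are hypothetical placeholders, not SecureDev's actual report schema.

```python
# Hypothetical report structure -- the real JSON schema is defined by
# SecureDev's report generator; this only illustrates the kind of
# post-processing the JSON output format enables.
report = {
    "model": "example-model",
    "results": [
        {"task": "sql-injection", "security_passed": True, "functional_passed": True},
        {"task": "xss", "security_passed": False, "functional_passed": True},
    ],
}

# Count tasks that passed both the security and the functional tests.
passed = sum(
    r["security_passed"] and r["functional_passed"] for r in report["results"]
)
print(f"{report['model']}: {passed}/{len(report['results'])} tasks fully passed")
```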
Note: the CLI supports parallel, non-interactive runs via --parallel and -j/--workers for faster CI or bulk benchmarking — see docs/07-cli-reference.md for details.
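As an illustration, assuming the flags behave as described above (their exact syntax, and any model/task selection options, are documented in docs/07-cli-reference.md):

```bash
# Hypothetical invocation: run the benchmark non-interactively,
# executing tasks in parallel with 4 worker processes.
python run_benchmark.py --parallel -j 4
```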
- Python 3.9+
- Docker
- Git
- Clone the repository:

  ```bash
  git clone https://github.com/samcodesign/securedev-bench.git
  cd securedev-bench
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up your API keys: create a `.env` file in the project root (you can copy the example):

  ```bash
  cp .env.example .env
  ```

  Then edit `.env` and add your API keys.
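The variable names your setup needs are listed in `.env.example`. Purely as a hypothetical sketch (the key names below are placeholders, not necessarily the project's actual names):

```bash
# Placeholder key names -- copy the real variable names from .env.example
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=...
```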
Run the interactive benchmark CLI:

```bash
python run_benchmark.py
```

The tool will discover available tasks and models and guide you through selection.
For non-interactive usage and additional options:

```bash
python run_benchmark.py --help
```

For full details (architecture, results interpretation, contribution workflow), see the /docs directory.
Topics include:
- How to Add a New Task
- How to Add a New AI Provider
- Project Architecture
- CLI Reference
- Interpreting the Results
- Credibility Shield
- Contribute
Contributions are welcome. Please review the guides in /docs before submitting changes. All contributors must follow the Code of Conduct.
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.
