A benchmark for the modern AI security agent.
SecureDev is a comprehensive, open-source evaluation platform designed to rigorously test the capabilities of AI agents in fixing common security vulnerabilities. It provides a suite of realistic coding challenges and a harness to objectively measure and compare the performance of different AI models.
- Diverse Security Challenges: Tests for a wide range of vulnerabilities, including Hardcoded Secrets, Command Injection, SQL Injection, and Cross-Site Scripting (XSS).
- Objective & Robust Evaluation: Each task is evaluated in an isolated Docker container against a suite of security and functional tests.
- Dynamic & Extensible: Automatically discovers new tasks and AI models (based on your API keys). The platform is designed to be easily extended.
- Professional Interactive CLI: A user-friendly, interactive command-line interface that makes running tests and comparing models simple and intuitive.
- Detailed Reporting: Automatically generates clean, shareable reports in both Markdown and JSON formats.
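Because reports are also emitted as JSON, they are easy to post-process in scripts. The snippet below is a sketch only: the field names (`model`, `results`, `security_passed`, `functional_passed`) are hypothetical placeholders, not SecureDev's actual report schema.

```python
# Hypothetical report structure -- the real JSON schema is defined by
# SecureDev's report generator; this only illustrates the kind of
# post-processing the JSON output format enables.
report = {
    "model": "example-model",
    "results": [
        {"task": "sql-injection", "security_passed": True, "functional_passed": True},
        {"task": "xss", "security_passed": False, "functional_passed": True},
    ],
}

# Count tasks that passed both the security and the functional tests.
passed = sum(
    r["security_passed"] and r["functional_passed"] for r in report["results"]
)
print(f"{report['model']}: {passed}/{len(report['results'])} tasks fully passed")
```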
Note: the CLI supports parallel, non-interactive runs via --parallel and -j/--workers for faster CI or bulk benchmarking — see docs/07-cli-reference.md for details.
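As an illustration, assuming the flags behave as described above (their exact syntax, and any model/task selection options, are documented in docs/07-cli-reference.md):

```bash
# Hypothetical invocation: run the benchmark non-interactively,
# executing tasks in parallel with 4 worker processes.
python run_benchmark.py --parallel -j 4
```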
- Python 3.9+
- Docker
- Git
- Clone the repository:

  ```bash
  git clone https://github.com/samcodesign/securedev-bench.git
  cd securedev-bench
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up your API keys: create a `.env` file in the project root (you can copy the example):

  ```bash
  cp .env.example .env
  ```

  Then edit `.env` and add your API keys.
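The variable names your setup needs are listed in `.env.example`. Purely as a hypothetical sketch (the key names below are placeholders, not necessarily the project's actual names):

```bash
# Placeholder key names -- copy the real variable names from .env.example
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=...
```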
Run the interactive benchmark CLI:

```bash
python run_benchmark.py
```

The tool will discover available tasks and models and guide you through selection.
For non-interactive usage and additional options:

```bash
python run_benchmark.py --help
```

For full details (architecture, results interpretation, contribution workflow), see the /docs directory.
Topics include:
- How to Add a New Task
- How to Add a New AI Provider
- Project Architecture
- CLI Reference
- Interpreting the Results
- Credibility Shield
- Contribute
Contributions are welcome. Please review the guides in /docs before submitting changes. All contributors must follow the Code of Conduct.
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.
