SecureDev-Bench

A benchmark for the modern AI security agent.


SecureDev-Bench is a comprehensive, open-source evaluation platform for rigorously testing the ability of AI agents to fix common security vulnerabilities. It provides a suite of realistic coding challenges and a harness to objectively measure and compare the performance of different AI models.


Key Features

  • Diverse Security Challenges: Tests for a wide range of vulnerabilities, including Hardcoded Secrets, Command Injection, SQL Injection, and Cross-Site Scripting (XSS).
  • Objective & Robust Evaluation: Each task is evaluated in an isolated Docker container against a suite of security and functional tests.
  • Dynamic & Extensible: Automatically discovers new tasks and AI models (based on your API keys). The platform is designed to be easily extended.
  • Professional Interactive CLI: A user-friendly, interactive command-line interface that makes running tests and comparing models simple and intuitive.
  • Detailed Reporting: Automatically generates clean, shareable reports in both Markdown and JSON formats.
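The per-task Docker isolation described above can be sketched in Python. This is a hypothetical illustration, not the harness's actual interface: the image name, mount point, and test path are all made-up examples.

```python
import shlex

def build_docker_cmd(image: str, task_dir: str, timeout_s: int = 300) -> list[str]:
    """Build an isolated `docker run` invocation for one task.

    Hypothetical sketch: the image name, read-only mount, and timeout
    are assumptions about how such a harness might isolate a task.
    """
    return [
        "docker", "run",
        "--rm",                        # remove the container afterwards
        "--network", "none",           # no network access during evaluation
        "-v", f"{task_dir}:/task:ro",  # mount the task read-only
        image,
        "timeout", str(timeout_s), "pytest", "/task/tests",
    ]

# Build (but do not execute) the command for one illustrative task.
cmd = build_docker_cmd("securedev-task:latest", "/tmp/task-001")
print(shlex.join(cmd))
```

Building the command as a list (rather than a shell string) avoids quoting bugs when task paths contain spaces.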

Note: the CLI supports parallel, non-interactive runs via --parallel and -j/--workers for faster CI or bulk benchmarking — see docs/07-cli-reference.md for details.
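The parallel mode noted above can be approximated with `concurrent.futures`. This is only a sketch of the dispatch pattern, not the CLI's implementation; the task names and the stubbed runner are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def run_task(name: str) -> tuple[str, str]:
    # Placeholder for invoking one benchmark task; the real harness
    # shells out to Docker and takes far longer per task.
    return name, "pass"

tasks = ["hardcoded-secrets", "sql-injection", "xss"]  # illustrative names
with ThreadPoolExecutor(max_workers=2) as pool:  # analogous to -j 2
    results = dict(pool.map(run_task, tasks))
print(results)
```

Because each task runs in its own container, tasks are independent and parallelize cleanly; the worker count mainly trades off against local Docker resources.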

Getting Started

Prerequisites

  • Python 3.9+
  • Docker
  • Git

Installation

  1. Clone the repository:

    git clone https://github.com/samcodesign/securedev-bench.git
    cd securedev-bench
  2. Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install the required dependencies:

    pip install -r requirements.txt
  4. Set up your API keys: Create a .env file in the project root (you can copy the example):

    cp .env.example .env

    Then edit .env and add your API keys.
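If you want to sanity-check your `.env` before running, a minimal stdlib parser looks like this. The key name `OPENAI_API_KEY` is just an example; the harness itself likely loads keys with a library such as `python-dotenv` rather than code like this.

```python
import os
import tempfile

def parse_env(path: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip().strip('"')
    return env

# Demo with a throwaway file; the key name is an example only.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write('# API keys\nOPENAI_API_KEY="sk-example"\n')
    path = f.name
env = parse_env(path)
os.unlink(path)
print(env)
```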

Usage

Run the interactive benchmark CLI:

python run_benchmark.py

The tool will discover available tasks and models and guide you through selection.

For non-interactive usage and additional options:

python run_benchmark.py --help

Documentation

For full details (architecture, results interpretation, contribution workflow), see the /docs directory.


Contributing

Contributions are welcome. Please review the guides in /docs before submitting changes. All contributors must follow the Code of Conduct.

License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.
