Exposing Jailbreak Vulnerabilities in LLM Applications with ARTKIT

Automated prompt-based testing to extract passwords from the Gandalf Challenge's LLM system

As large language models (LLMs) become more widely adopted across different industries and domains, significant security risks have emerged and intensified. Several of these key concerns include breaches of data privacy, the potential for biases, and the risk of information manipulation.
Uncovering these security risks is crucial to ensuring that LLM applications remain beneficial in real-world scenarios while upholding their safety, effectiveness, and robustness.
In this project, we explore how to use the open-source ARTKIT framework to automatically evaluate security vulnerabilities of LLM applications using the popular Gandalf Challenge as an illustrative example.

gandalf_challenge.ipynb: Jupyter notebook containing the codes for the walkthrough

Special thanks to Sean Anggani, Andy Moon, Matthew Wong, Randi Griffin, and Andrea Gao!

Name	Name	Last commit message	Last commit date
Latest commit History 9 Commits
assets	assets
.env.template	.env.template
.gitignore	.gitignore
LICENSE	LICENSE
README.md	README.md
gandalf_challenge.ipynb	gandalf_challenge.ipynb
requirements.txt	requirements.txt