Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

kennethleungty/ARTKIT-Gandalf-Challenge

Open more actions menu

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Exposing Jailbreak Vulnerabilities in LLM Applications with ARTKIT

Automated prompt-based testing to extract passwords from the Gandalf Challenge's LLM system

Link to article: https://towardsdatascience.com/exposing-jailbreak-vulnerabilities-in-llm-applications-with-artkit-d2df5f56ece8

Background

  • As large language models (LLMs) become more widely adopted across different industries and domains, significant security risks have emerged and intensified. Several of these key concerns include breaches of data privacy, the potential for biases, and the risk of information manipulation.
  • Uncovering these security risks is crucial to ensuring that LLM applications remain beneficial in real-world scenarios while upholding their safety, effectiveness, and robustness.
  • In this project, we explore how to use the open-source ARTKIT framework to automatically evaluate security vulnerabilities of LLM applications using the popular Gandalf Challenge as an illustrative example.

Alt text

Files

  • gandalf_challenge.ipynb: Jupyter notebook containing the codes for the walkthrough

References

Acknowledgements

  • Special thanks to Sean Anggani, Andy Moon, Matthew Wong, Randi Griffin, and Andrea Gao!
Morty Proxy This is a proxified and sanitized view of the page, visit original site.