README.md: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
This repo accompanies the blog post, ["Automatically Jailbreaking Frontier Language Models with Investigator Agents"](https://transluce.org/jailbreaking-frontier-models).
-We provide a reference implementation of the dataset and reward function used in the blog post, but note that it is not optimized for efficiency or scalability. Unfortunately, we do not include the RL training loop, as it is tightly coupled with our internal research tooling. However, this codebase should serve as a useful starting point for those who want to train jailbreaking agents and reproduce our experiments.
+We provide a reference implementation of the dataset and reward function used in the blog post, but note that it is not optimized for efficiency or scalability. Unfortunately, we do not include the RL training loop, as it is tightly coupled with our internal research tooling. However, this codebase should serve as a useful starting point for those who wish to train jailbreaking agents and reproduce our experiments.