**Warning:** In the paper, we tested many training runs with a bonus black-box reward for attacking various API models (GPT-4.1, GPT-5, Claude Sonnet 4). We do not implement this here, but it is a simple additive bonus to the reward function in this repo (in our training runs, the bonus was up to 20 points per model exploited, scaling linearly with the response score). We caution that this can get very expensive, especially when sampling responses from flagship reasoning models. Furthermore, since sending many attempted jailbreaks to a production API service may trigger monitors for suspicious activity, it should be done with caution and in compliance with all applicable policies.
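For concreteness, a minimal sketch of what such an additive bonus might look like. The `query_model` and `judge_score` callables are hypothetical placeholders (not part of this repo), and we assume the judge scores responses on a normalized $[0, 1]$ scale; the paper's exact scoring setup is not reproduced here.

```python
from typing import Callable

MAX_BONUS_PER_MODEL = 20.0  # maximum additive bonus per exploited API model
API_MODELS = ["gpt-4.1", "gpt-5", "claude-sonnet-4"]  # example targets

def black_box_bonus(
    attack_prompt: str,
    query_model: Callable[[str, str], str],   # hypothetical: (model, prompt) -> response
    judge_score: Callable[[str], float],      # hypothetical: response -> score in [0, 1]
) -> float:
    """Additive bonus of up to MAX_BONUS_PER_MODEL per target model,
    scaling linearly with the judge's score of that model's response."""
    bonus = 0.0
    for model in API_MODELS:
        response = query_model(model, attack_prompt)  # one API call per model
        bonus += MAX_BONUS_PER_MODEL * judge_score(response)
    return bonus

# Usage sketch: total_reward = base_reward + black_box_bonus(prompt, query_model, judge_score)
```

Note that each reward evaluation issues one API call per target model, which is where the cost warning above comes from.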