Commit 2856737

adding safety check information
1 parent 2cf26e9 commit 2856737

1 file changed, +75 -2 lines changed
articles/ai-services/openai/how-to/computer-use.md

Lines changed: 75 additions & 2 deletions
@@ -252,6 +252,79 @@ When working with the Computer Use tool, you typically would perform the followi

 ## Handling conversation history

-You can use the `previous_response_id` parameter to link the current request to the previous response. We recommend using this method if you don't want to manage the conversation history on your side.
+You can use the `previous_response_id` parameter to link the current request to the previous response. Using this parameter is recommended if you don't want to manage the conversation history.

-If you do not want to use this parameter, you should make sure to include in your inputs array all the items returned in the response output of the previous request, including reasoning items if present.
+If you don't use this parameter, you should make sure to include all the items returned in the response output of the previous request in your inputs array. This includes reasoning items if present.
+
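The two options described in the changed paragraphs can be sketched in Python with plain dictionaries. This is only an illustration under stated assumptions: `build_next_request` is a hypothetical helper, not part of any SDK, and it only assembles request fields without calling the service.

```python
from typing import Any, Optional


def build_next_request(
    new_items: list[dict[str, Any]],
    previous_output: list[dict[str, Any]],
    previous_response_id: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble request fields for the next turn (hypothetical helper)."""
    if previous_response_id is not None:
        # Linked mode: reference the previous response and send only new items.
        return {
            "previous_response_id": previous_response_id,
            "input": list(new_items),
        }
    # Manual mode: replay every item from the previous response output,
    # reasoning items included, followed by the new items.
    return {"input": list(previous_output) + list(new_items)}
```

In manual mode the reasoning items are passed through untouched, which matches the requirement stated in the added text.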
+## Safety checks
+
+The API has safety checks to help protect against prompt injection and model mistakes. These checks include:
+
+* **Malicious instruction detection**: The system evaluates the screenshot image and checks if it contains adversarial content that might change the model's behavior.
+* **Irrelevant domain detection**: The system evaluates the `current_url` (if provided) and checks if the current domain is considered relevant given the conversation history.
+* **Sensitive domain detection**: The system checks the `current_url` (if provided) and raises a warning when it detects the user is on a sensitive domain.
+
+If one or more of the above checks is triggered, a safety check is raised when the model returns the next `computer_call`, with the `pending_safety_checks` parameter.
+
+```json
+"output": [
+    {
+        "type": "reasoning",
+        "id": "rs_67cb...",
+        "summary": [
+            {
+                "type": "summary_text",
+                "text": "Exploring 'File' menu option."
+            }
+        ]
+    },
+    {
+        "type": "computer_call",
+        "id": "cu_67cb...",
+        "call_id": "call_nEJ...",
+        "action": {
+            "type": "click",
+            "button": "left",
+            "x": 135,
+            "y": 193
+        },
+        "pending_safety_checks": [
+            {
+                "id": "cu_sc_67cb...",
+                "code": "malicious_instructions",
+                "message": "We've detected instructions that may cause your application to perform malicious or unauthorized actions. Please acknowledge this warning if you'd like to proceed."
+            }
+        ],
+        "status": "completed"
+    }
+]
+```
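As a rough illustration of how a client might spot raised checks in an `output` list shaped like the one above, here is a small Python sketch. It assumes the output has already been parsed into plain dictionaries; `find_pending_safety_checks` is a hypothetical name introduced for this example.

```python
from typing import Any


def find_pending_safety_checks(
    output: list[dict[str, Any]],
) -> dict[str, list[dict[str, Any]]]:
    """Map each computer_call's call_id to its non-empty pending_safety_checks."""
    pending: dict[str, list[dict[str, Any]]] = {}
    for item in output:
        # Only computer_call items carry pending_safety_checks.
        if item.get("type") != "computer_call":
            continue
        checks = item.get("pending_safety_checks") or []
        if checks:
            pending[item["call_id"]] = checks
    return pending
```

An empty result means no check was raised for that response, so the action can proceed without an acknowledgement step.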
+
+You need to pass the safety checks back as `acknowledged_safety_checks` in the next request in order to proceed.
+
+```json
+input=[
+    {
+        "type": "computer_call_output",
+        "call_id": "<call_id>",
+        "acknowledged_safety_checks": [
+            {
+                "id": "<safety_check_id>",
+                "code": "malicious_instructions",
+                "message": "We've detected instructions that may cause your application to perform malicious or unauthorized actions. Please acknowledge this warning if you'd like to proceed."
+            }
+        ],
+        "output": {
+            "type": "computer_screenshot",
+            "image_url": "<image_url>"
+        }
+    }
+],
+```
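The acknowledgement item shown above could be assembled from the checks the model returned, along the lines of this Python sketch. `build_call_output` is a hypothetical helper; the field names are taken from the JSON in this diff, and the sketch assumes the checks were already approved by a user.

```python
from typing import Any


def build_call_output(
    call_id: str,
    image_url: str,
    approved_checks: list[dict[str, Any]],
) -> dict[str, Any]:
    """Build a computer_call_output item that echoes back approved checks."""
    return {
        "type": "computer_call_output",
        "call_id": call_id,
        # Echo id, code, and message exactly as they were received.
        "acknowledged_safety_checks": [
            {"id": c["id"], "code": c["code"], "message": c["message"]}
            for c in approved_checks
        ],
        "output": {"type": "computer_screenshot", "image_url": image_url},
    }
```

The returned dictionary would be placed in the `input` list of the next request.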
+
+### Suggested safety check handling
+
+In all cases where `pending_safety_checks` are returned, actions should be handed over to the end user to confirm proper model behavior and accuracy.
+
+* `malicious_instructions` and `irrelevant_domain`: end users should review model actions and confirm that the model is behaving as intended.
+* `sensitive_domain`: ensure an end user is actively monitoring the model actions on these sites. Exact implementation of this "watch mode" can vary by application, but a potential example could be collecting user impression data on the site to make sure there is active end user engagement with the application.
