@GangGreenTemperTatum GangGreenTemperTatum commented Jun 30, 2025

chore: add final chat step of llm challenge interaction to pipeline

closes #21

Key Changes:

Previously, the challenge was terminating before the model could respond to the flag discovery.

Root cause: flag detection ran after code execution but before the model could process the execution results in its next conversation turn.

New Flow:

  1. Step 8: Model executes code, gets flag in output, results added to pipeline
  2. Step 9: Model sees execution results, processes flag, potentially calls submit_flag()
  3. Step 10: Flag detection checks model's response, finds flag, terminates properly

The model now gets a full conversation turn to process the flag discovery:

Added:

  1. Let automatic flag detection work at the start of each step to catch flags in model responses
  2. Preserve conversation flow so model gets to see execution results and respond

Changed:

  1. Only terminate on successful submission ("The flag was correct. Congrats!")
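As a hedged sketch of that change (the success string is quoted from the PR; the function name and signature are illustrative, not the real API):

```python
SUCCESS_MSG = "The flag was correct. Congrats!"

def should_terminate(submission_output: str) -> bool:
    # A flag merely appearing in execution output no longer ends the run;
    # only the API's explicit confirmation of a correct submission does.
    return SUCCESS_MSG in submission_output
```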

Removed:

  1. Immediate termination on flag detection in execution output

Double-checked this by also submitting the flag on behalf of the user account (flags are dynamically unique and linked to that account).


Generated Summary:

  • Updated flag validation in check_flag_api to explicitly cast the retrieved result to a boolean.
  • Added an info log when a flag is detected in the model response to mark challenge completion.
  • Modified error metric logging to include error details as attributes directly in the log_metric call.
  • Changed flag submission verification in run_step to check for a specific success message in the output and, upon detection, log success and update the pipeline.
  • Enhanced inline comments for clarity on challenge completion and pipeline updates.
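The first two summary bullets might look like the following. This is a sketch under stated assumptions: `check_flag_api` is named in the summary, but the response shape (a dict with a `"correct"` field) and the log wording are guesses, not the real implementation.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def check_flag_api(response: dict) -> bool:
    # Explicit bool() cast: a missing or None "correct" field yields a
    # clean False instead of leaking a non-boolean value to callers.
    correct = bool(response.get("correct"))
    if correct:
        logger.info("Flag detected in model response; challenge complete.")
    return correct
```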

This summary was generated with ❤️ by rigging

@GangGreenTemperTatum GangGreenTemperTatum merged commit 98ff19b into main Jun 30, 2025
6 checks passed
@GangGreenTemperTatum GangGreenTemperTatum deleted the ads/eng-2226-bug-enhance-pipeline-logging-on-flag-detection branch June 30, 2025 15:02

Development

Successfully merging this pull request may close these issues.

💡 [IMPROVEMENT] - enhance rigging chat pipeline with final chat message of successful prompt + user response (from crucible API)
