@GangGreenTemperTatum GangGreenTemperTatum commented Jun 30, 2025

chore: add final chat step of llm challenge interaction to pipeline

closes #21

Key Changes:

Previously, the challenge was terminating before the model could respond to the flag discovery.

Root cause: flag detection ran after code execution but before the model could process the execution results in its next conversation turn.

New Flow:

  1. Step 8: Model executes code, gets flag in output, results added to pipeline
  2. Step 9: Model sees execution results, processes flag, potentially calls submit_flag()
  3. Step 10: Flag detection checks model's response, finds flag, terminates properly

The model now gets a full conversation turn to process the flag discovery:

Added:

  1. Let automatic flag detection work at the start of each step to catch flags in model responses
  2. Preserve conversation flow so model gets to see execution results and respond

Changed:

  1. Only terminate on successful submission ("The flag was correct. Congrats!")
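As a hedged sketch of that change (the success string is quoted from the PR; the function name and signature are illustrative, not the real API):

```python
SUCCESS_MSG = "The flag was correct. Congrats!"

def should_terminate(submission_output: str) -> bool:
    # A flag merely appearing in execution output no longer ends the run;
    # only the API's explicit confirmation of a correct submission does.
    return SUCCESS_MSG in submission_output
```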

Removed:

  1. Immediate termination on flag detection in execution output

Double-checked this by also submitting the flag on behalf of the user account (flags are dynamically unique and linked to that account).


Generated Summary:

  • Updated flag validation in check_flag_api to explicitly cast the retrieved result to a boolean.
  • Added an info log when a flag is detected in the model response to mark challenge completion.
  • Modified error metric logging to include error details as attributes directly in the log_metric call.
  • Changed flag submission verification in run_step to check for a specific success message in the output and, upon detection, log success and update the pipeline.
  • Enhanced inline comments for clarity on challenge completion and pipeline updates.
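The first two summary bullets might look like the following. This is a sketch under stated assumptions: `check_flag_api` is named in the summary, but the response shape (a dict with a `"correct"` field) and the log wording are guesses, not the real implementation.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def check_flag_api(response: dict) -> bool:
    # Explicit bool() cast: a missing or None "correct" field yields a
    # clean False instead of leaking a non-boolean value to callers.
    correct = bool(response.get("correct"))
    if correct:
        logger.info("Flag detected in model response; challenge complete.")
    return correct
```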

This summary was generated with ❤️ by rigging

@GangGreenTemperTatum GangGreenTemperTatum merged commit 98ff19b into main Jun 30, 2025
6 checks passed
@GangGreenTemperTatum GangGreenTemperTatum deleted the ads/eng-2226-bug-enhance-pipeline-logging-on-flag-detection branch June 30, 2025 15:02

Development

Successfully merging this pull request may close these issues.

💡 [IMPROVEMENT] - enhance rigging chat pipeline with final chat message of successful prompt + user response (from crucible API)
