🤖 fix: Remove premature commitment from todo_write description (#471)

ammar-agent · web-flow · commit 6f81504d26bc · 2025-10-28T19:22:23.000-05:00
Fixes premature commitment issues in both `todo_write` and `status_set`
tool descriptions.

## Problem

Both tool descriptions contained prescriptive language that pressured
agents to commit to narratives before validating outcomes:

### `todo_write` issues:
- "Use this for ALL complex, multi-step plans" - forced usage before
validation
- "Before finishing your response, ensure all todos are marked as
completed" - pressured false completions
- "Update frequently as work progresses" - created distraction from
errors
- Complex structural guidance about old/recent/current/immediate/far
future work - added cognitive load

### `status_set` issue:
- "Set a final status before completing that reflects the outcome" -
ambiguous timing allowed premature success claims

## Changes

### `todo_write`:

**Removed:**
- ❌ "Use this for ALL complex, multi-step plans"
- ❌ "Before finishing your response, ensure all todos are marked as
completed"
- ❌ "Update frequently"
- ❌ Complex structural guidance

**Added:**
- ✅ "The TODO list is displayed to the user at all times" -
contextualizes importance
- ✅ "If you hit the 7-item limit, summarize older completed items into
one line" - brings back useful guidance without rigid structure
- ✅ "If work fails or approach changes, update the list to reflect
reality" - explicit permission to show failures
- ✅ "Only mark tasks complete when they actually succeed" - clear
expectation

**Kept:**
- ONE in_progress at a time
- Ordering (completed first, in_progress, pending last)
- Tense guidelines (past/present progressive/imperative)

### `status_set`:

**Changed:**
- ✅ "Set a final status after completion, only claim success when
certain (e.g., after confirming checks passed)"

This makes the sequencing explicit: do work → verify outcome → set final
status.

## Impact

Agents should:
- Validate approach before creating TODOs
- Update TODOs to reflect failures, not hide them
- Only claim success after verifying outcomes
- Focus on actual work rather than tool maintenance

## Testing

No code logic changes - these are prompt/description updates. Will
observe agent behavior in real use.

_Generated with `cmux`_
diff --git a/src/utils/tools/toolDefinitions.ts b/src/utils/tools/toolDefinitions.ts
@@ -144,26 +144,20 @@ export const TOOL_DEFINITIONS = {
   todo_write: {
     description:
       "Create or update the todo list for tracking multi-step tasks (limit: 7 items). " +
-      "Use this for ALL complex, multi-step plans to keep the user informed of progress. " +
-      "Replace the entire list on each call - the AI should track which tasks are completed. " +
-      "\n\n" +
-      "Structure the list with high precision at the center:\n" +
-      "- Old completed work: Summarize into 1 overview item (e.g., 'Set up project infrastructure (4 tasks)')\n" +
-      "- Recent completions: Keep detailed (last 1-2 items)\n" +
-      "- Current work: One in_progress item with clear description\n" +
-      "- Immediate next steps: Detailed pending items (next 2-3 actions)\n" +
-      "- Far future work: Summarize into phase items (e.g., 'Testing and polish (3 items)')\n" +
+      "The TODO list is displayed to the user at all times. " +
+      "Replace the entire list on each call - the AI tracks which tasks are completed.\n" +
       "\n" +
-      "Update frequently as work progresses. As tasks complete, older completions should be " +
-      "condensed to make room. Similarly, summarized future work expands into detailed items " +
-      "as it becomes immediate. " +
-      "\n\n" +
       "Mark ONE task as in_progress at a time. " +
       "Order tasks as: completed first, then in_progress (max 1), then pending last. " +
-      "Before finishing your response, ensure all todos are marked as completed. " +
       "Use appropriate tense in content: past tense for completed (e.g., 'Added tests'), " +
       "present progressive for in_progress (e.g., 'Adding tests'), " +
-      "and imperative/infinitive for pending (e.g., 'Add tests').",
+      "and imperative/infinitive for pending (e.g., 'Add tests').\n" +
+      "\n" +
+      "If you hit the 7-item limit, summarize older completed items into one line " +
+      "(e.g., 'Completed initial setup (3 tasks)').\n" +
+      "\n" +
+      "Update the list as work progresses. If work fails or the approach changes, update " +
+      "the list to reflect reality - only mark tasks complete when they actually succeed.",
     schema: z.object({
       todos: z.array(
         z.object({
@@ -189,7 +183,7 @@ export const TOOL_DEFINITIONS = {
       "WHEN TO SET STATUS:\n" +
       "- Set status when beginning concrete work (file edits, running tests, executing commands)\n" +
       "- Update status as work progresses through distinct phases\n" +
-      "- Set a final status before completing that reflects the outcome\n" +
+      "- Set a final status after completion, only claim success when certain (e.g., after confirming checks passed)\n" +
       "- DO NOT set status during initial exploration, file reading, or planning phases\n" +
       "\n" +
       "The status is cleared when a new user message comes in. Validate your approach is feasible \n" +