Skip to content

Commit 6f81504

Browse files
authored
🤖 fix: Remove premature commitment from todo_write description (#471)
Fixes premature commitment issues in both `todo_write` and `status_set` tool descriptions. ## Problem Both tool descriptions contained prescriptive language that pressured agents to commit to narratives before validating outcomes: ### `todo_write` issues: - "Use this for ALL complex, multi-step plans" - forced usage before validation - "Before finishing your response, ensure all todos are marked as completed" - pressured false completions - "Update frequently as work progresses" - created distraction from errors - Complex structural guidance about old/recent/current/immediate/far future work - added cognitive load ### `status_set` issue: - "Set a final status before completing that reflects the outcome" - ambiguous timing allowed premature success claims ## Changes ### `todo_write`: **Removed:** - ❌ "Use this for ALL complex, multi-step plans" - ❌ "Before finishing your response, ensure all todos are marked as completed" - ❌ "Update frequently" - ❌ Complex structural guidance **Added:** - ✅ "The TODO list is displayed to the user at all times" - contextualizes importance - ✅ "If you hit the 7-item limit, summarize older completed items into one line" - brings back useful guidance without rigid structure - ✅ "If work fails or approach changes, update the list to reflect reality" - explicit permission to show failures - ✅ "Only mark tasks complete when they actually succeed" - clear expectation **Kept:** - ONE in_progress at a time - Ordering (completed first, in_progress, pending last) - Tense guidelines (past/present progressive/imperative) ### `status_set`: **Changed:** - ✅ "Set a final status after completion, only claim success when certain (e.g., after confirming checks passed)" This makes the sequencing explicit: do work → verify outcome → set final status. ## Impact Agents should: - Validate approach before creating TODOs - Update TODOs to reflect failures, not hide them - Only claim success after verifying outcomes - Focus on actual work rather than tool maintenance ## Testing No code logic changes - these are prompt/description updates. Will observe agent behavior in real use. _Generated with `cmux`_
1 parent 91f9923 commit 6f81504

File tree

1 file changed

+10
-16
lines changed

1 file changed

+10
-16
lines changed

src/utils/tools/toolDefinitions.ts

Lines changed: 10 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -144,26 +144,20 @@ export const TOOL_DEFINITIONS = {
144144
todo_write: {
145145
description:
146146
"Create or update the todo list for tracking multi-step tasks (limit: 7 items). " +
147-
"Use this for ALL complex, multi-step plans to keep the user informed of progress. " +
148-
"Replace the entire list on each call - the AI should track which tasks are completed. " +
149-
"\n\n" +
150-
"Structure the list with high precision at the center:\n" +
151-
"- Old completed work: Summarize into 1 overview item (e.g., 'Set up project infrastructure (4 tasks)')\n" +
152-
"- Recent completions: Keep detailed (last 1-2 items)\n" +
153-
"- Current work: One in_progress item with clear description\n" +
154-
"- Immediate next steps: Detailed pending items (next 2-3 actions)\n" +
155-
"- Far future work: Summarize into phase items (e.g., 'Testing and polish (3 items)')\n" +
147+
"The TODO list is displayed to the user at all times. " +
148+
"Replace the entire list on each call - the AI tracks which tasks are completed.\n" +
156149
"\n" +
157-
"Update frequently as work progresses. As tasks complete, older completions should be " +
158-
"condensed to make room. Similarly, summarized future work expands into detailed items " +
159-
"as it becomes immediate. " +
160-
"\n\n" +
161150
"Mark ONE task as in_progress at a time. " +
162151
"Order tasks as: completed first, then in_progress (max 1), then pending last. " +
163-
"Before finishing your response, ensure all todos are marked as completed. " +
164152
"Use appropriate tense in content: past tense for completed (e.g., 'Added tests'), " +
165153
"present progressive for in_progress (e.g., 'Adding tests'), " +
166-
"and imperative/infinitive for pending (e.g., 'Add tests').",
154+
"and imperative/infinitive for pending (e.g., 'Add tests').\n" +
155+
"\n" +
156+
"If you hit the 7-item limit, summarize older completed items into one line " +
157+
"(e.g., 'Completed initial setup (3 tasks)').\n" +
158+
"\n" +
159+
"Update the list as work progresses. If work fails or the approach changes, update " +
160+
"the list to reflect reality - only mark tasks complete when they actually succeed.",
167161
schema: z.object({
168162
todos: z.array(
169163
z.object({
@@ -189,7 +183,7 @@ export const TOOL_DEFINITIONS = {
189183
"WHEN TO SET STATUS:\n" +
190184
"- Set status when beginning concrete work (file edits, running tests, executing commands)\n" +
191185
"- Update status as work progresses through distinct phases\n" +
192-
"- Set a final status before completing that reflects the outcome\n" +
186+
"- Set a final status after completion, only claim success when certain (e.g., after confirming checks passed)\n" +
193187
"- DO NOT set status during initial exploration, file reading, or planning phases\n" +
194188
"\n" +
195189
"The status is cleared when a new user message comes in. Validate your approach is feasible \n" +

0 commit comments

Comments
 (0)