Skip to content

Commit 23592e2

Browse files
authored
fix: handle array-of-objects in FlexibleStringSlice for tripleshot combine evaluations (#666)
1 parent 41f05e6 commit 23592e2

File tree

4 files changed

+76
-3
lines changed

4 files changed

+76
-3
lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
1919

2020
### Fixed
2121

22+
- **TripleShot Combine Evaluation Parse Failure** - `FlexibleStringSlice` now handles LLM judge output that writes an array of objects (e.g., `[{"description":"...","source":"attempt_1"}]`) where flat strings were expected; also improved the judge prompt to show a populated `suggested_changes` example and explicitly require plain strings
23+
2224
- **Agent Teams tmux mode** - Prevent Claude Code Agent Teams from starting in tmux mode inside Claudio by setting `teammateMode: "in-process"` in worktree settings (#664)
2325

2426
- **Agent Teams tmux mode (CLI flag)** - Pass `--teammate-mode in-process` directly on the CLI command for both start and resume, ensuring CC cannot override the setting via user-level settings or `$TMUX` auto-detection

internal/orchestrator/workflows/tripleshot/AGENTS.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,5 +5,5 @@
55
66
## Pitfalls
77

8-
- **LLM output type mismatches in sentinel files** — LLMs frequently write a plain string where the JSON schema expects `[]string` (e.g., `"suggested_changes": "fix the bug"` instead of `"suggested_changes": ["fix the bug"]`). The `Evaluation`, `AttemptEvaluationItem`, and `AdversarialReviewFile` structs use `FlexibleStringSlice` for all `[]string` fields and `FlexibleString` for `Reasoning` to tolerate this. When adding new LLM-parsed fields of type `string` or `[]string`, use these flexible types instead of bare Go types. Without this, `json.Unmarshal` fails, `VerifyWork` returns false, and the bridge retries the task — spawning a duplicate instance.
8+
- **LLM output type mismatches in sentinel files** — LLMs frequently deviate from the expected JSON types. `FlexibleStringSlice` handles three cases: a plain string (`"fix the bug"`), an array of strings (`["fix A", "fix B"]`), and an array of objects (`[{"description":"fix A","source":"attempt_1"}]`). For objects, it extracts a well-known text key (`description`, `text`, `change`, `message`, `content`, `value`) or falls back to JSON-encoding the whole object. `FlexibleString` similarly handles string-or-array. When adding new LLM-parsed fields of type `string` or `[]string`, use these flexible types instead of bare Go types. Without this, `json.Unmarshal` fails, `VerifyWork` returns false, and the bridge retries the task — spawning a duplicate instance.
99
- **Sentinel file search in subdirectories**`FindCompletionFile`, `FindEvaluationFile`, and `FindAdversarialReviewFile` all search the worktree root *and* immediate subdirectories. LLM instances sometimes write files relative to their CWD rather than the worktree root. Don't bypass `Find*File` with a direct `filepath.Join(worktree, filename)`.

internal/orchestrator/workflows/tripleshot/session_test.go

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -400,6 +400,39 @@ func TestParseEvaluationFile_FlexibleFields(t *testing.T) {
400400
wantWeaknessesLen: 1,
401401
wantFirstWeakness: "No tests",
402402
},
403+
{
404+
name: "combine with suggested_changes as array of objects",
405+
json: `{
406+
"winner_index": -1,
407+
"merge_strategy": "combine",
408+
"reasoning": "Cherry-pick from multiple attempts",
409+
"attempt_evaluations": [],
410+
"suggested_changes": [
411+
{"description": "Use Attempt 1 as the base branch", "source": "attempt_1"},
412+
{"description": "Cherry-pick ContentEquatable from Attempt 3", "source": "attempt_3"}
413+
]
414+
}`,
415+
wantStrategy: MergeStrategyCombine,
416+
wantReasoning: "Cherry-pick from multiple attempts",
417+
wantChangesLen: 2,
418+
wantFirstChange: "Use Attempt 1 as the base branch",
419+
},
420+
{
421+
name: "suggested_changes as objects with text key",
422+
json: `{
423+
"winner_index": -1,
424+
"merge_strategy": "merge",
425+
"reasoning": "Merged",
426+
"attempt_evaluations": [],
427+
"suggested_changes": [
428+
{"text": "Apply error handling from Attempt 2"}
429+
]
430+
}`,
431+
wantStrategy: MergeStrategyMerge,
432+
wantReasoning: "Merged",
433+
wantChangesLen: 1,
434+
wantFirstChange: "Apply error handling from Attempt 2",
435+
},
403436
}
404437

405438
for _, tt := range tests {

internal/orchestrator/workflows/tripleshot/types.go

Lines changed: 40 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -179,9 +179,47 @@ func (f *FlexibleStringSlice) UnmarshalJSON(data []byte) error {
179179
return nil
180180
}
181181

182+
// Try to unmarshal as an array of mixed elements (handles LLM output
183+
// that writes objects like [{"description":"...","source":"attempt_1"}]
184+
// where flat strings were expected)
185+
var mixed []any
186+
if err := json.Unmarshal(data, &mixed); err == nil && len(mixed) > 0 {
187+
result := make([]string, 0, len(mixed))
188+
for _, item := range mixed {
189+
result = append(result, stringifyItem(item))
190+
}
191+
*f = result
192+
return nil
193+
}
194+
182195
return fmt.Errorf("FlexibleStringSlice: expected string or []string, got %s", string(data))
183196
}
184197

198+
// textLikeKeys are map keys checked in order when extracting a string from an object.
199+
var textLikeKeys = []string{"description", "text", "change", "message", "content", "value"}
200+
201+
// stringifyItem converts an arbitrary JSON-decoded value into a string.
202+
// For maps, it looks for a well-known text key; otherwise it JSON-encodes the value.
203+
func stringifyItem(v any) string {
204+
switch val := v.(type) {
205+
case string:
206+
return val
207+
case map[string]any:
208+
for _, key := range textLikeKeys {
209+
if text, ok := val[key]; ok {
210+
if s, ok := text.(string); ok {
211+
return s
212+
}
213+
}
214+
}
215+
// No known text key found — marshal the whole object
216+
b, _ := json.Marshal(val)
217+
return string(b)
218+
default:
219+
return fmt.Sprintf("%v", val)
220+
}
221+
}
222+
185223
// CompletionFile represents the completion report written by an attempt
186224
type CompletionFile struct {
187225
AttemptIndex int `json:"attempt_index"`
@@ -699,7 +737,7 @@ Write your evaluation to ` + "`" + EvaluationFileName + "`" + ` using this struc
699737
"weaknesses": ["No tests", "Missing error handling"]
700738
}
701739
],
702-
"suggested_changes": []
740+
"suggested_changes": ["Use Attempt 1 as the base — it has the cleanest API surface", "Cherry-pick the retry logic from Attempt 3: src/retry.go"]
703741
}
704742
` + "```" + `
705743
@@ -708,7 +746,7 @@ Write your evaluation to ` + "`" + EvaluationFileName + "`" + ` using this struc
708746
- **merge_strategy**: "select" (use one as-is), "merge" (combine changes), or "combine" (cherry-pick specific changes)
709747
- **reasoning**: Detailed explanation of your evaluation and decision
710748
- **attempt_evaluations**: Score and analysis for each attempt (1-10 scale)
711-
- **suggested_changes**: If merge_strategy is "merge" or "combine", list the specific changes to make
749+
- **suggested_changes**: If merge_strategy is "merge" or "combine", list the specific changes to make. Each entry must be a plain string (not an object).
712750
713751
## Helpful Commands
714752

0 commit comments

Comments
 (0)