Auto-Claude : From Prompt-Driven Control to FSM + Skills for LLM-Agnostic Robustness #435

mouedarbi · 2025-12-30T21:06:18Z

mouedarbi
Dec 30, 2025

Auto-Claude Architecture Review

From Prompt-Driven Control to FSM + Skills for LLM-Agnostic Robustness

Executive Summary

Auto-Claude has evolved into a sophisticated multi-agent system capable of planning, coding, reviewing, and validating software tasks using Large Language Models (LLMs). However, its original architecture relied heavily on prompt-driven state control, where agents implicitly managed workflow phases and task states through natural language outputs.

This approach introduces structural instability and makes the system highly sensitive to the capabilities and behavior of a specific LLM. As soon as a weaker, faster, or differently aligned model is introduced (e.g. Gemini, Qwen, local models), the workflow becomes unreliable.

To address this, a refactor has been introduced based on three core principles:

Finite State Machine (FSM) as the single source of truth for workflow state
Skills as deterministic executors of actions
Prompts refactored to express intent only, never state transitions

This document explains the current situation, the problems observed, and why the FSM + Skills architecture is required to ensure long-term robustness and model independence.

1. The Original Problem: Prompt-Driven State Control

1.1 How Auto-Claude Originally Worked

In the initial design:

Each agent (Planner, Coder, QA, etc.) was driven by large, highly structured prompts.
Prompts explicitly instructed the LLM to:
- Decide when a phase was complete
- Update task or subtask status
- Signal phase transitions (planning → coding → QA)
The frontend and backend parsed textual output to infer progress and state.

1.2 Why This Is Fundamentally Unstable

This design assumes that the LLM will always:

Correctly follow the protocol
Respect ordering rules
Emit structured markers consistently
Not hallucinate or omit critical signals

In practice:

Different LLMs behave differently
Smaller or faster models fail more often
Even strong models regress under context pressure

Result: The workflow becomes LLM-dependent. Any change of model requires prompt rewrites, parser changes, and fragile workarounds.

2. Structural Root Cause

The core issue is responsibility leakage:

Responsibility	Previously handled by
Decide what to do	LLM
Decide how to do it	LLM
Decide when state changes	LLM
Execute actions	LLM via text
Enforce workflow validity	Prompts

This violates a fundamental rule of reliable systems:

State must not be controlled by probabilistic components.

3. The Corrected Architecture: FSM + Skills

3.1 Finite State Machine (FSM)

The FSM is now the only authority allowed to:

Define valid states (BACKLOG, PLANNING, IN_PROGRESS, QA, DONE, ERROR)
Define allowed transitions
Decide when a phase starts or ends
Handle recovery, rejection, retries, and failures

The FSM is deterministic, testable, and independent of the LLM.

3.2 Skills: Deterministic Action Executors

Skills are explicit backend capabilities exposed to the LLM:

FileSystemSkill (read/write/search)
PlanningSkill (plan structure manipulation)
ShellSkill (command execution)
MemorySkill (context access)

Critical rule:

Skills execute actions but never change FSM state.

Skills:

Perform one concrete operation
Return results (facts)
Do not decide next steps
Do not trigger transitions

This makes execution reliable regardless of model quality.

3.3 LLM Role After Refactor

The LLM is reduced to its optimal role:

Analyze context
Decide intent
Choose which skill to call
Produce structured outputs describing results

The LLM no longer:

Decides workflow state
Signals phase transitions
Modifies global status

This dramatically lowers the intelligence requirements needed for correctness.

4. Prompt Refactoring: From Control to Contract

4.1 Why Prompts Had to Be Refactored

Original prompts:

Instructed agents to update statuses
Included procedural workflow logic
Embedded system invariants in natural language

This made prompts:

Large
Fragile
Model-specific
Hard to validate

4.2 New Prompt Philosophy

Refactored prompts now:

Explicitly forbid state changes
Focus on scope of responsibility
Require structured JSON outputs
Express completion intent, not transitions

Example:

{
  "result": "success",
  "artifacts": ["implementation_plan.json"]
}

The FSM interprets this output and decides the next state.

5. Why This Architecture Is LLM-Agnostic

With FSM + Skills:

A strong LLM improves speed and quality
A weaker LLM still produces correct outcomes
A different LLM does not require prompt rewrites
Model upgrades or downgrades do not break workflows

Failures become:

Contained
Recoverable
Deterministic

Not catastrophic or silent.

6. Why Alternative Fixes Are Insufficient

Other approaches (parsing markers, phase protocols, frontend guards) attempt to:

Detect LLM mistakes
Correct them after the fact

These are reactive solutions.

FSM + Skills is a preventive solution:

The system cannot enter an invalid state, even if the LLM misbehaves.

7. Conclusion

Without FSM + Skills:

Auto-Claude remains fragile
Prompts must be constantly rewritten
Each new LLM integration risks regressions

With FSM + Skills:

The system becomes deterministic
Prompts become stable contracts
LLMs become interchangeable components
Long-term maintenance cost drops dramatically

Final Principle

The LLM decides intent.
Skills execute actions.
The FSM decides state.

This architecture is not an optimization — it is a requirement for any serious, multi-model, agent-based system.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Auto-Claude : From Prompt-Driven Control to FSM + Skills for LLM-Agnostic Robustness #435

Uh oh!

{{title}}

Uh oh!

Replies: 0 comments

Select a reply

Uh oh!

Uh oh!

Auto-Claude : From Prompt-Driven Control to FSM + Skills for LLM-Agnostic Robustness #435

Uh oh!

mouedarbi Dec 30, 2025

Auto-Claude Architecture Review

From Prompt-Driven Control to FSM + Skills for LLM-Agnostic Robustness

Executive Summary

1. The Original Problem: Prompt-Driven State Control

1.1 How Auto-Claude Originally Worked

1.2 Why This Is Fundamentally Unstable

2. Structural Root Cause

3. The Corrected Architecture: FSM + Skills

3.1 Finite State Machine (FSM)

3.2 Skills: Deterministic Action Executors

3.3 LLM Role After Refactor

4. Prompt Refactoring: From Control to Contract

4.1 Why Prompts Had to Be Refactored

4.2 New Prompt Philosophy

5. Why This Architecture Is LLM-Agnostic

6. Why Alternative Fixes Are Insufficient

7. Conclusion

Final Principle

Replies: 0 comments

mouedarbi
Dec 30, 2025