
LLM Blind Collaboration Experiment

This repository documents an experiment in "Clean Room Design" using Large Language Models (LLMs).

The goal was to test whether two separate AI models (Model A and Model B) could write compatible software components (a backend and a frontend) without ever seeing each other's code.

The Experiment

We tasked two models to build a Gladiator Battle Simulator in Python.

  • Model A (Backend): Wrote the Gladiator class logic.
  • Model B (Frontend): Wrote the main.py execution script.

The Constraint: Neither model was allowed to see the source code produced by the other. They had to rely entirely on a "Shared Interface Contract" provided in the prompt.
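The README does not reproduce the contract verbatim. In Python terms, such a contract can be sketched as a `typing.Protocol`; the method names below come from Phase 3 of the experiment, but the signatures, return types, and the `Gladiator` body are assumptions for illustration:

```python
# Hypothetical sketch of a "Shared Interface Contract" as a typing.Protocol.
# Method names are taken from Phase 3; signatures and bodies are assumed.
from typing import Protocol, runtime_checkable

@runtime_checkable
class GladiatorContract(Protocol):
    def modify_vitality_metric(self, delta_value: int) -> int: ...
    def validate_survival_condition(self) -> bool: ...

class Gladiator:
    """Any class with matching method names structurally satisfies the contract."""
    def __init__(self, vitality_metric: int = 100):
        self.vitality_metric = vitality_metric

    def modify_vitality_metric(self, delta_value: int) -> int:
        self.vitality_metric += delta_value
        return self.vitality_metric

    def validate_survival_condition(self) -> bool:
        return self.vitality_metric > 0

print(isinstance(Gladiator(), GladiatorContract))  # True
```

Because both models code against the same protocol rather than each other's source, either side can be regenerated independently without breaking the integration.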

Project Structure

The experiment was conducted in three phases, simulating the evolution of a prompt engineering strategy.

.
├── Test 1/                 # The Failure
│   ├── gladiator.py        # Generated by Model A
│   └── main.py             # Generated by Model B
├── Test 2/                 # The Partial Success
│   ├── gladiator.py
│   └── main.py
└── Test 3/                 # The Success (Final)
    ├── gladiator_simulation.py
    └── main.py

Methodology & Findings

Phase 1: The "Guessing Game" (Failure)

  • Approach: We gave general instructions (e.g., "create a damage function").
  • Result: CRASH.
    • Model A wrote take_damage(amount).
    • Model B called attack(target).
    • Lesson: Natural language is too ambiguous for code integration.
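A hypothetical reconstruction of the Phase 1 crash (only the names `take_damage` and `attack` come from the experiment; the bodies are illustrative):

```python
# Model A's backend: chose take_damage(amount) for the damage function.
class Gladiator:
    def __init__(self, health: int = 100):
        self.health = health

    def take_damage(self, amount: int) -> None:
        self.health -= amount

# Model B's frontend: guessed a different name for the same concept.
hero = Gladiator()
try:
    hero.attack(hero)  # no such method exists on the backend's class
except AttributeError as e:
    print(e)  # 'Gladiator' object has no attribute 'attack'
```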

Phase 2: The "API Definition" (Partial Success)

  • Approach: We provided a list of function names.
  • Result: CRASH.
    • Logic Mismatch: The frontend expected the backend to return an integer (damage value), but the backend returned a string (combat log).
    • Lesson: Sharing names is not enough; you must share Behavior and Data Types.
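A hypothetical reconstruction of the Phase 2 mismatch: the function name now matches, but the return type does not (names and bodies are illustrative):

```python
# Backend: name matches the shared list, but it returns a combat-log string.
def take_damage(amount: int) -> str:
    return f"Gladiator takes {amount} damage!"

# Frontend: assumes an integer damage value comes back.
health = 100
try:
    health -= take_damage(25)  # int minus str
except TypeError as e:
    print(e)  # unsupported operand type(s) for -=: 'int' and 'str'
```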

Phase 3: "Grammar as Code" (Success)

  • Approach: We removed all creative freedom and implemented a Strict Naming Grammar derived from ASD-STE100 (Simplified Technical English).
    • Rule: Names must follow [VERB]_[NOUN]_[SUFFIX].
    • Lookup Table: "Health" = vitality, "Check" = validate, "Math" = metric.
    • Example: modify_vitality_metric, validate_survival_condition.
  • Result: SUCCESS. Both models independently generated the exact same verbose function names and arguments. The code ran on the first try.
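A sketch of the interface both models converged on under the grammar. The class and method names follow the [VERB]_[NOUN]_[SUFFIX] rule and lookup table described above; the method bodies are illustrative, not the models' actual code:

```python
# Illustrative Phase 3 backend: every public name is derivable from the
# grammar, so the frontend can reconstruct it without seeing this file.
class GladiatorSimulation:
    def __init__(self, vitality_metric: int = 100):
        self.vitality_metric = vitality_metric

    def modify_vitality_metric(self, delta_value: int) -> int:
        """modify (verb) + vitality (noun) + metric (suffix)."""
        self.vitality_metric += delta_value
        return self.vitality_metric

    def validate_survival_condition(self) -> bool:
        """validate (verb) + survival (noun) + condition (suffix)."""
        return self.vitality_metric > 0

g = GladiatorSimulation()
g.modify_vitality_metric(-120)
print(g.validate_survival_condition())  # False: vitality fell below zero
```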

How to Run

Navigate to the strict phase folder and run the simulation:

cd "Test 3"
python main.py

Key Takeaways for AI Collaboration

  1. Eliminate Synonyms: Don't ask for "verbose names." Force a specific vocabulary (e.g., "Use 'modify', never 'change'").
  2. Enforce Style: Explicitly state snake_case or CamelCase so the names match at the call site.
  3. Encapsulation: Clearly define which file owns the math logic to prevent "Integration Hell."
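Takeaway 1 can be sketched as a controlled-vocabulary helper that maps free-form concepts onto one canonical token each, so two models cannot pick different synonyms. The table entries follow the experiment's lookup table; the helper itself is hypothetical:

```python
# Hypothetical controlled vocabulary: many synonyms, one canonical token.
CANONICAL = {
    "health": "vitality", "hp": "vitality",
    "check": "validate", "verify": "validate",
    "math": "metric", "value": "metric",
    "change": "modify", "update": "modify",
}

def build_name(verb: str, noun: str, suffix: str) -> str:
    """Assemble a [VERB]_[NOUN]_[SUFFIX] identifier from canonical tokens."""
    parts = (CANONICAL.get(w.lower(), w.lower()) for w in (verb, noun, suffix))
    return "_".join(parts)

print(build_name("Change", "Health", "Math"))  # modify_vitality_metric
```

Embedding a table like this directly in both prompts removes the naming choice from the models entirely.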

Experiment conducted by affan810.
