📝 High praise + strategic AI usage tip (emergent puzzle solving) #2386
Replies: 7 comments 1 reply
-
P.S. Below is Gemini's own musing about the method, as mini lessons learnt:
And here is what ChatGPT thinks about it all, as another post-mortem, with a Mermaid graph. Ver. 1.1
-
Wow. I love reading this, and I'm going to give your approach a try!
-
Awesome!
-
For the record, apart from the code that works but is a bit unusual (for some AIs at least), here is mine, mentioned above (its working copy is here now). Anecdote: even during this "final touch" on Gemini's proposal, many AIs are not sure what to touch, and so they fall for what I by now call the "Ulam's curse": the "Ulam's attractor" is too strong ;) so they start by dabbling with the wrong part of the code, which leads them to fill in the whole matrix yet again, something Gemini struggled hard not to do by referring to these "PM artifacts"... ver. 1.1
-
Fascinating. I guess the use of files as buckets of different relevant context is allowing the model to self-select what context to send to the next inference. That is a little like search-based RAG, but with the LLM making more qualitative decisions.
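A minimal sketch of that idea, purely illustrative: the file names and the word-overlap scoring below are my own assumptions, standing in for the qualitative selection the LLM itself performs in the actual setup.

```python
from pathlib import Path

# Hypothetical "context bucket" files; names borrowed from the tracker described below.
CONTEXT_FILES = ["PROJECT_PLAN.md", "ASSUMPTIONS.md", "RISK_REGISTER.md", "DECISION_LOG.md"]

def select_context(task: str, workdir: Path, max_chars: int = 8000) -> str:
    """Crude stand-in for the model's own choice: keep the files whose text
    shares the most words with the task, within a character budget."""
    task_words = set(task.lower().split())
    scored = []
    for name in CONTEXT_FILES:
        path = workdir / name
        if not path.exists():
            continue
        text = path.read_text(encoding="utf-8")
        overlap = len(task_words & set(text.lower().split()))
        scored.append((overlap, name, text))
    scored.sort(key=lambda item: item[0], reverse=True)  # most relevant first

    parts, used = [], 0
    for _, name, text in scored:
        if used + len(text) > max_chars:
            break
        parts.append(f"## {name}\n{text}")
        used += len(text)
    return f"Task: {task}\n\n" + "\n\n".join(parts)

# The assembled string becomes (part of) the next inference's prompt.
```

The point is only that once context lives in named files, "what to send next" becomes an explicit, inspectable decision rather than an opaque retrieval score.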
-
P.S. Oh, it seems that something more advanced has been invented too: https://github.com/modelcontextprotocol/servers/tree/main/src/sequentialthinking Update: I have since elaborated the Readme and structure: https://github.com/Manamama/Puzzles_for_AIs/tree/main (with Gemini CLI's help, sic). ver. 1.1
-
💬 In short: this is the first AI tool I’ve seen that genuinely “thinks” — or rather, has been given the tools to think.
Context:
I'm not a dev. I'm not even an "IT person". My background is in literature and philosophy, but since 2022 I've been testing the limits of AI reasoning through handmade puzzles; some (actually atypical) public ones can be seen here:
👉 Manamama/Puzzles_for_AIs
This week I installed gemini-cli on both Ubuntu and Android (yes, it runs in Termux — after minor tweaks). I then ran it against a puzzle I call the Organic Spiral problem: 👉 spiral_create.md
Most AIs — even GPT-4 or Claude — consistently fail this after dozens of attempts. Why? They overfit to known visual heuristics like Ulam's spiral, and they lack the working memory plus the strategic-level thinking needed to track divergent planning hypotheses. So I created this puzzle precisely to test strategic reasoning under (here, moderate) ambiguity.
What Gemini did differently:
What made Gemini succeed is this: the CLI lets it externalize memory by writing "PM artifact files" mid-process, i.e. "proper thinking", a step above the "think step by step" family of methods.
To support that, I introduced a mini-PMBOK-style set of artifacts — a live strategic scaffold to help Gemini think not just in code, but in branches, risks, and tests. It worked. Not perfectly, but well enough for now (I shall iterate on it soon).
I’m attaching the structure below so others can reuse or adapt it in AI project experiments. The format is intentionally lean — it’s not ceremony, just function.
📦 Strategic Tracker: ai-strategy-tracker/ (Ver. 2.0)

Folder and file structure:
- 📄 PROJECT_PLAN.md
- RISK_REGISTER.md
- 🌱 ASSUMPTIONS.md
- 🔁 DECISION_LOG.md
- 🧪 TEST_MODULES/ (optional)

How to Use:
1. PROJECT_PLAN.md — scope the goal.
2. ASSUMPTIONS.md.
3. Use RISK_REGISTER.md to mark potential failure modes.
4. DECISION_LOG.md — include reasoning.
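For anyone who wants to script the same scaffold, here is a minimal sketch under stated assumptions: the helper names and plain-file logging are mine, not a gemini-cli API, and in the actual sessions the agent itself creates and edits these files.

```python
from pathlib import Path
from datetime import datetime, timezone

TRACKER = Path("ai-strategy-tracker")
ARTIFACTS = ["PROJECT_PLAN.md", "ASSUMPTIONS.md", "RISK_REGISTER.md", "DECISION_LOG.md"]

def init_tracker() -> None:
    """Create the scaffold files once, each with a minimal heading."""
    TRACKER.mkdir(exist_ok=True)
    for name in ARTIFACTS:
        path = TRACKER / name
        if not path.exists():
            title = name.removesuffix(".md").replace("_", " ").title()
            path.write_text(f"# {title}\n\n", encoding="utf-8")

def read_artifacts() -> str:
    """Concatenate the current artifacts so they can be re-injected into the next prompt."""
    return "\n\n".join((TRACKER / name).read_text(encoding="utf-8") for name in ARTIFACTS)

def log_risk(risk: str) -> None:
    """Append a potential failure mode to RISK_REGISTER.md (step 3 above)."""
    with (TRACKER / "RISK_REGISTER.md").open("a", encoding="utf-8") as f:
        f.write(f"- [ ] {risk}\n")

def log_decision(decision: str, reasoning: str) -> None:
    """Append a dated entry to DECISION_LOG.md, including the reasoning (step 4 above)."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    with (TRACKER / "DECISION_LOG.md").open("a", encoding="utf-8") as f:
        f.write(f"- {stamp}: {decision}\n  - Reasoning: {reasoning}\n")

if __name__ == "__main__":
    init_tracker()
    log_risk("Model drifts toward the familiar Ulam-spiral heuristic instead of the stated rules.")
    log_decision("Build the spiral cell by cell instead of pre-filling the matrix.",
                 "Pre-filling is the failure mode observed in earlier attempts.")
    print(read_artifacts())  # this is what gets fed back to the model each turn
```

Re-reading `read_artifacts()` at every turn is what gives the model its externalized working memory between steps.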
Here is how much it took Gemini to solve it:
| Cumulative Stats (16 Turns) | |
| --- | ---: |
| Input Tokens | 1,234,052 |
| Output Tokens | 12,759 |
| Thoughts Tokens | 374,275 |
| Total Tokens | 1,621,086 |
| Total duration (API) | 29m 17s |
Much praise to the Gemini CLI team — this interface empowers models to behave not just like code writers, but like strategic agents. That’s rare so far.
As I wrote, I am just beginning to test it; the results are not perfect, but good enough. A random session log is here:
https://github.com/Manamama/Puzzles_for_AIs/blob/main/Tasks/tests/Gemini/2.0%20cli_spiral01.md
https://github.com/Manamama/Puzzles_for_AIs/blob/main/Tasks/tests/Gemini/2.0%20Artefacts2.zip (the set of artefacts, the "memory jog" that it self-created).
Update: the code has moved to https://github.com/Manamama/Puzzles_for_AIs/blob/main/puzzles/programming/spiral_create.md, so the hrefs above have been fixed.
Ver. 1.4