[Feedback Encouraged] - "Gemini-Cli Orchestration," Reducing token usage & API calls #10256
Justadudeinspace started this conversation in Ideas
Gemini-Cli Orchestration
Hey all,
I’ve been experimenting with using `gemini-cli` as more than a one-off prompt runner — essentially turning it into an AI orchestration engine by driving it through a structured workflow with GPT-5. Thought I’d share the approach here in case others are exploring similar patterns.

🧩 How I Orchestrate Inputs
Instead of thinking of `gemini-cli` as a “question → answer” tool, I treat it like the control panel of a build system. My workflow looks like this:

TUI Menu as Control Surface
I don’t fire off random prompts. I use the TUI input box like a patch request terminal. Every prompt is carefully written to output unified diffs (patches) against my local repo.
→ This avoids giant file rewrites and keeps model output surgical, lightweight, and easy to apply with `git apply`.

Modular Prompt Bundles
I split work into Passes.
Pass A: core utilities (feature flags, rate limiting, admin cache).
Pass B: moderation & safety.
Pass C: productivity modules (reminders, tickets, vault, RSS, menus).
Each Pass is self-contained, which keeps prompts compact, avoids repetition, and minimizes token drift.
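To make the idea concrete, a pass bundle can be as simple as a dict mapping each pass to a goal and the only files its prompts may touch. The module names below are hypothetical illustrations based on the features listed above, not actual files from the project:

```python
# Hypothetical pass manifest: each pass lists the only files its
# prompts may touch, keeping every prompt compact and self-contained.
PASSES = {
    "A": {  # core utilities
        "goal": "feature flags, rate limiting, admin cache",
        "files": ["modules/flags.py", "modules/ratelimit.py", "modules/admin.py"],
    },
    "B": {  # moderation & safety
        "goal": "moderation and safety handlers",
        "files": ["modules/moderation.py"],
    },
    "C": {  # productivity modules
        "goal": "reminders, tickets, vault, RSS, menus",
        "files": ["modules/reminders.py", "modules/tickets.py", "modules/rss.py"],
    },
}

def allowed_files(pass_id: str) -> list[str]:
    """Return the whitelist of files a given pass may modify."""
    return PASSES[pass_id]["files"]
```

Because each pass carries its own whitelist, a prompt for Pass B can never “accidentally” drift into Pass A territory.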
Micro Fix Prompts
If a patch is rejected or a single hunk fails, I don’t re-describe the whole repo.
Example micro prompt:
Unified diff only. Touch ONLY `modules/admin.py`.
Goal: add `/unban <user_id>` and ensure admin check via `get_admins_cached`.

This saves quota, reduces hallucination risk, and keeps things tight.
⚡ Reducing Token Use & API Calls
I’m on a free solo-dev trial, so token conservation is critical. Here’s how I keep things lean:

Patch-Only Workflow
GPT-5 never re-sees the entire project — only my intent + the files I allow it to touch. No “rewrite main.py from scratch.”

Unified Diff Output
Smaller deltas → fewer tokens. I apply them directly via `git apply`. The model doesn’t need to “hold” repo context across prompts.

Feature Flags Everywhere
Every new feature is OFF by default. I toggle it with `/feature <key> on|off`. This means expanded code doesn’t hammer the Bot API until I choose.

Batch Digests
For chatty features (RSS, reminders), the bot sends one digest message instead of dozens of updates. That cuts down on API calls and keeps groups uncluttered.
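The digest idea is just “buffer, then flush once.” A minimal sketch, assuming a `send` callable that wraps whatever Bot API method you use (nothing here is Telegram-specific):

```python
class DigestBuffer:
    """Collect chatty updates and emit them as one digest message."""

    def __init__(self, send, header: str = "📰 Digest"):
        self.send = send          # e.g. a wrapper around bot.send_message
        self.header = header
        self.items: list[str] = []

    def add(self, line: str) -> None:
        self.items.append(line)   # no API call here: just buffer

    def flush(self) -> int:
        """Send one message covering N buffered items; returns N."""
        if not self.items:
            return 0
        body = "\n".join(f"• {item}" for item in self.items)
        self.send(f"{self.header}\n{body}")   # single API call
        count = len(self.items)
        self.items.clear()
        return count
```

Ten RSS entries become one `send()` call instead of ten, which is exactly the API-call reduction described above.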
🛡️ Handling Hallucinations
AI hallucinations aren’t going away — but I treat them like runtime errors in a build process. Here’s how I handle them:
Strict Prompt Rules
Every prompt starts with “Output ONLY unified diff. No prose. No code fences.” → this kills filler.
Scoped File Targets
I explicitly name which files the patch may touch. No “random modules” or ghost imports.
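Both rules are easy to enforce mechanically by generating every prompt from one template, so the diff-only preamble and the file whitelist can never be forgotten. This template is my own illustration, not a gemini-cli feature:

```python
def build_patch_prompt(goal: str, files: list[str]) -> str:
    """Compose a patch request with a fixed diff-only preamble
    and an explicit whitelist of files the model may touch."""
    file_list = ", ".join(files)
    return (
        "Output ONLY unified diff. No prose. No code fences.\n"
        f"Touch ONLY: {file_list}.\n"
        f"Goal: {goal}\n"
    )
```

For example, the `/unban` micro prompt from earlier becomes `build_patch_prompt("add /unban <user_id> and ensure admin check via get_admins_cached", ["modules/admin.py"])`.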
Reject/Retry Flow
If `git apply` creates a `.rej`, I immediately fix with a micro prompt. I never rerun the whole giant prompt, so the error stays localized.

Rate Limiting
Decorators like `@rate_limit` keep the bot itself from spamming Telegram’s API if I accidentally create a loop or noisy handler.

🚀 Why This Matters
Most devs treat LLMs as autocomplete or one-shot generators. I’m treating `gemini-cli` as the command router for a continuous, incremental development process:

Controlled → I define the scope and files every time.
Low-token → diffs, not rewrites.
Incremental → passes & micro fixes, not all-in-one dumps.
Resilient → hallucinations get sandboxed and corrected quickly.
This transforms gemini-cli from “tool that answers” into “tool that builds.”
🔗 Real Projects Using This Flow
I’m applying this orchestration process to my:
Ultimate Bot — a modular, Rose-class Telegram bot with admin, productivity, safety, and orchestration features.
BLUX Lite GOLD — my orchestrator project that mimics this very workflow:
User Input → GPT-5 → Prompt → `gemini-cli` (Gemini-2.5-Pro) → Patch → Test → Apply.

Currently this orchestration loop is manual, but the goal is to have the orchestrator itself auto-handle the passes.
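One iteration of that loop can be sketched as a small driver. The `gemini -m … -p …` invocation reflects gemini-cli’s non-interactive prompt mode but flags may differ between versions, and the `ask`/`apply_patch`/`test` hooks are placeholders I invented so the loop stays testable; only the shape (prompt → diff → apply → test) comes from the workflow above:

```python
import subprocess

def ask_gemini(prompt: str, model: str = "gemini-2.5-pro") -> str:
    """Call gemini-cli non-interactively and return its stdout.
    Flag names are an assumption; check `gemini --help` on your version."""
    out = subprocess.run(
        ["gemini", "-m", model, "-p", prompt],
        capture_output=True, text=True, check=True,
    )
    return out.stdout

def run_pass(prompt: str, ask=ask_gemini, apply_patch=None, test=None) -> bool:
    """One manual-loop iteration: prompt -> diff -> apply -> test.

    All three callables are injectable, so the loop can be driven by
    any model CLI, patch applier, or test runner.
    """
    diff_text = ask(prompt)
    if not diff_text.strip().startswith(("diff", "---")):
        return False      # model broke the diff-only rule: retry
    if apply_patch is not None and not apply_patch(diff_text):
        return False      # rejected hunk -> send a micro prompt next
    if test is not None and not test():
        return False      # tests failed -> send a micro prompt next
    return True
```

A `False` return is the trigger for a micro fix prompt rather than a rerun of the full pass, which is what keeps token spend bounded.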
Without `gemini-cli`’s TUI loop, I’d be burning way more tokens and time trying to manage this manually.

💡 Curious
Is anyone else here doing orchestration like this?
Do you split your work into passes/modules too?
How do you keep patch reject rates down?
Would there be interest in a “patch-only mode” built directly into `gemini-cli` (so prompts default to diffs and can be applied in one shot)?

Thanks for the tool — it’s genuinely become the backbone of my AI-assisted development workflow.
— Solo dev, Ultimate Bot project
~JADIS