Skip to content

Allow nelson to act on cost-saving instructions#23

Open
LannyRipple wants to merge 1 commit intoharrymunro:mainfrom
LannyRipple:cost-savings
Open

Allow nelson to act on cost-saving instructions#23
LannyRipple wants to merge 1 commit intoharrymunro:mainfrom
LannyRipple:cost-savings

Conversation

@LannyRipple
Copy link

  • Add cost-savings primer in SKILL.md referencing model-selection
  • Add haiku only blocks in crew-briefing
  • Add cost-weight value in crew-roles and royal-marines. Weights and adjustments are explained in model-selection

@LannyRipple
Copy link
Author

Will collide with PR #22 because of formatting changes. Easy fix and I'll rebase when #22 goes in.

Had Claude do most of the heavy lifting and I like what we came up with. Here's an overview of what was decided and some of the motivation.


Cost-Savings Mode

What Was Added

Nelson can now operate in cost-savings mode. When the sailing orders express cost concern ("keep costs low", "be aggressive with cost savings", etc.), Nelson loads references/model-selection.md and applies weight-based model selection across the squadron instead of assigning every agent the admiral's model.

Five files were changed or created:

File Change
skills/nelson/references/model-selection.md New — weight table, threshold rule, adjustment guidance, briefing enhancements
skills/nelson/references/crew-roles.md Added cost-weight to each role definition
skills/nelson/references/royal-marines.md Added cost-weight to each marine specialisation
skills/nelson/SKILL.md Step 2 now triggers model-selection.md when cost-savings intent is detected
skills/nelson/references/admiralty-templates/crew-briefing.md Added conditional haiku-only blocks (identity anchor, output format, task decomposition)

Design Decisions

Triggering: Natural Language Inference Only

Cost-savings mode activates through the admiral's reading of the sailing orders, not through an explicit flag or parameter. This keeps the interface simple and consistent with how Nelson handles other constraints (budget, scope, compliance rules) — they are part of the sailing orders, not separate controls.

The intensity of the language matters: modest language ("stay within budget") produces modest weight pressure; emphatic language ("be aggressive") licenses the admiral to push more roles below the haiku threshold.

Weight System: Hybrid Static/Dynamic

Each crew role and marine specialisation carries a default cost-weight. The tasking agent adjusts these defaults based on the specific task before assigning a model:

  • Raise weight when the task involves judgment, edge cases, or verification that exceeds the role default.
  • Lower weight when the task is more atomic or contained than the role default suggests.

This avoids a purely static table (which would be wrong for too many edge cases) while also avoiding a fully dynamic system (which would be inconsistent and hard to reason about). The defaults are an anchor, not a ceiling.

Binary Model Selection: Haiku or Admiral's Model

The threshold is simple: weight ≤ 4 → haiku; weight ≥ 5 → admiral's model. No middle tier.

The key reason: the binary approach has a forcing function built in. When a task sits in a gray zone — too complex for haiku, but not clearly worth the admiral's model — the correct response is to decompose the task further, not to reach for an intermediate model. Decomposition produces better parallelism and clearer deliverables. A middle tier would paper over under-decomposed tasks rather than surface them.

A secondary reason: the admiral's model is already chosen by the user as the right capability level for the mission. Anything that doesn't qualify for haiku should match that ceiling. The "admiral sets the ceiling" pattern also scales cleanly — a user running sonnet as their admiral model gets haiku vs sonnet automatically; a user running opus gets haiku vs opus. Nelson doesn't need to reason about the relative capabilities of specific model versions.

No Version Strings

All agents at weight ≥ 5 inherit the admiral's model via the model parameter. No specific version (e.g., claude-sonnet-4-5) is hardcoded anywhere in the skill. This prevents the guidance from going stale as models are updated.

Briefing Enhancements Are Conditional

The three haiku-specific blocks (identity anchor, output format, task decomposition prompt) are only added when haiku is assigned. The motivation: less capable models benefit from explicit grounding — confirming they are operating as a real agent, not in a roleplay context — and from tighter output contracts. These additions would be noise in a full-capability briefing and are omitted there.

Transparency: No Announcement

Nelson does not announce that cost-savings mode is active. Users who specify cost savings understand the implication. Adding a banner or summary would add ceremony without value.


Future Considerations

Introducing a Mid-Tier Model

The current binary threshold works well when the weight distribution is bimodal — most tasks are clearly haiku-worthy or clearly not. If a user running opus notices a large class of weight-5–7 tasks that feel overserved by opus but underserved by haiku, a mid-model field in the sailing orders could allow an opt-in middle tier without complicating the common case. This would be a sailing-order-level override rather than a permanent third tier in the weight table.

The sonnet/opus performance relationship is not stable across task types or model generations, which is an additional reason to keep any mid-tier designation opt-in rather than baked into the default weight logic.

Per-Mission Weight Overrides

The weight table provides role-level defaults. A future extension could allow the sailing orders to override specific role weights directly (e.g., "treat PWOs as weight 3 for this mission"). This would let the admiral dial in cost pressure at the role level without editing the reference files.

Cost Reporting in the Captain's Log

The captain's log currently records decisions, diffs, and validation evidence. Adding a cost section — estimated or actual token usage per agent, and total mission cost — would close the feedback loop for users trying to calibrate cost-savings settings across missions.

Explicit Cost-Savings Flag

Natural language inference is the current trigger. An explicit sailing-orders field (e.g., cost-savings: aggressive) would make the intent machine-readable and reduce reliance on the admiral correctly parsing intent from free text. The trade-off is added structure in the sailing-orders template.

* Add cost-savings primer in SKILL.md referencing model-selection
* Add haiku only blocks in crew-briefing
* Add cost-weight value in crew-roles and royal-marines.  Weights and
  adjustments are explained in model-selection
@harrymunro
Copy link
Owner

Hey @LannyRipple thanks for this. All sounds good. Please fix the conflict with the main branch. By the way if you haven't already you may wish to disable Claude Code author commits - it means that on merging these PRs to main Claude gets noted as the contributor and not yourself. See: https://www.reddit.com/r/ClaudeCode/comments/1qndhpd/how_to_remove_claude_as_coauthor_from_commits/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants