
Commit 9261d67

fix(rfspec): persist results and support fire-and-forget polling
The run.sh script spawns three droid exec calls that take several minutes, but the Execute tool times out at 60s. When that happens the temp dir self-destructs and results are lost.

Changes:
- Write model outputs to persistent ~/.factory/rfspec/runs/<id>/ instead of a temp dir
- Print RFSPEC_RUN_DIR path immediately so the agent captures it before timeout
- Write a done sentinel (STATUS=complete|failed) for polling
- Update SKILL.md (v1.3.0) with fire-and-forget + poll workflow instructions
1 parent 6959152 commit 9261d67
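The done sentinel described above is a small file protocol: the producer writes a single `STATUS=complete` or `STATUS=failed` line into `<run_dir>/done` once, and pollers treat a missing file as pending. The run.sh changes are not shown on this page, so what follows is only a minimal Python sketch of the producer side under that assumption. `write_done_sentinel` is a hypothetical helper, and writing through a temporary file plus `os.replace` is a choice made here so a poller never reads a half-written sentinel; it is not necessarily what run.sh does.

```python
import os
import tempfile
from pathlib import Path


def write_done_sentinel(run_dir: Path, ok: bool) -> Path:
    """Hypothetical helper: atomically write STATUS=complete|failed to <run_dir>/done."""
    run_dir.mkdir(parents=True, exist_ok=True)
    status = "complete" if ok else "failed"
    # Create the temp file in the same directory so os.replace() is a
    # same-filesystem rename: a poller sees either no `done` or the full line.
    fd, tmp_path = tempfile.mkstemp(dir=str(run_dir), prefix=".done-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(f"STATUS={status}\n")
        os.replace(tmp_path, run_dir / "done")
    except BaseException:
        # Remove the temp file if the write or the rename failed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return run_dir / "done"
```

For example, `write_done_sentinel(Path.home() / ".factory/rfspec/runs" / run_id, ok=True)` (with `run_id` a hypothetical run identifier) would produce the file that the `cat <run_dir>/done` poll reads.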


5 files changed, +339 -63 lines changed

plugins/rfspec/commands/rfspec

Lines changed: 44 additions & 1 deletion
@@ -1,2 +1,45 @@
 #!/usr/bin/env bash
-exec "$(dirname "$0")/../skills/rfspec/scripts/run.sh" "$@"
+# Launch rfspec in background and return polling instructions immediately.
+# This avoids the Execute tool timeout killing the long-running model calls.
+
+SCRIPT_DIR="$(dirname "$0")"
+RUN_SH="${SCRIPT_DIR}/../skills/rfspec/scripts/run.sh"
+
+if [ $# -eq 0 ]; then
+  exec "$RUN_SH"
+fi
+
+# Run the script in background, capturing output to its own log.
+# run.sh prints RFSPEC_RUN_DIR=<path> as its first line, so we wait
+# just long enough to capture that, then return control to the agent.
+BGLOG=$(mktemp /tmp/rfspec-bg-XXXXXXXX)
+nohup "$RUN_SH" "$@" >"$BGLOG" 2>&1 &
+BG_PID=$!
+
+# Wait briefly for run.sh to create the output dir and print the path
+sleep 2
+
+# Extract the run dir from the early output
+RUN_DIR=$(grep -m1 'RFSPEC_RUN_DIR=' "$BGLOG" 2>/dev/null | cut -d= -f2-)
+
+echo "User prompt: $*"
+echo ""
+
+if [ -z "$RUN_DIR" ]; then
+  echo "rfspec launched (PID ${BG_PID}), but run dir not yet available."
+  echo "Check log: ${BGLOG}"
+else
+  echo "RFSPEC_RUN_DIR=${RUN_DIR}"
+fi
+
+echo ""
+echo "rfspec is running in background (PID ${BG_PID})."
+echo "Background log: ${BGLOG}"
+echo ""
+echo "Three models (Opus, GPT-5.4, Gemini) are generating competing specs."
+echo "Tell the user results will be ready in a few minutes, then poll:"
+echo ""
+echo " cat ${RUN_DIR:-<run_dir>}/done 2>/dev/null || echo PENDING"
+echo ""
+echo "When done, read results:"
+echo " Read: ${RUN_DIR:-<run_dir>}/results.md"
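The launcher's fixed `sleep 2` followed by `grep -m1 'RFSPEC_RUN_DIR=' | cut -d= -f2-` assumes run.sh prints its first line within two seconds; if it is slower, the agent only gets the `<run_dir>` placeholder and has to fall back to the background log. A bounded retry is one possible alternative. The Python sketch below is hypothetical and not part of this commit: `extract_run_dir` reimplements the same first-match, everything-after-the-first-`=` extraction, and `wait_for_run_dir` polls the log until a deadline (the 10-second default is an arbitrary choice of this sketch).

```python
import time
from pathlib import Path
from typing import Optional

MARKER = "RFSPEC_RUN_DIR="


def extract_run_dir(log_text: str) -> Optional[str]:
    """Mirror `grep -m1 'RFSPEC_RUN_DIR=' | cut -d= -f2-`.

    grep takes the first line containing the marker anywhere; cut keeps
    everything after that line's first '='.
    """
    for line in log_text.splitlines():
        if MARKER in line:
            return line.split("=", 1)[1]
    return None


def wait_for_run_dir(log_path: Path, timeout: float = 10.0,
                     interval: float = 0.25) -> Optional[str]:
    """Poll the background log until the run dir appears or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            found = extract_run_dir(log_path.read_text())
        except FileNotFoundError:
            found = None  # log not created yet
        if found is not None or time.monotonic() >= deadline:
            return found
        time.sleep(interval)
```

A caller would pass the `mktemp` log path and, on `None`, print the log location just as the shell version does when `RUN_DIR` is empty.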

plugins/rfspec/hooks/hooks.json

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+{
+  "description": "Blocks rfspec result access until the rfspec skill is loaded",
+  "hooks": {
+    "PreToolUse": [
+      {
+        "matcher": "Execute",
+        "commandRegex": "rfspec/runs/",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "python3 ${DROID_PLUGIN_ROOT}/hooks/rfspec-ready.py",
+            "timeout": 5
+          }
+        ]
+      }
+    ]
+  }
+}
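This hook entry routes Execute calls whose command matches `commandRegex` to `rfspec-ready.py`, which then pulls the run id out with `rfspec/runs/([\w-]+)`. Assuming the hook runner applies `commandRegex` as an unanchored regex search (an assumption about Droid's matcher semantics, not confirmed on this page), the Python sketch below shows which poll commands would be routed and which run directory the hook would resolve. `classify` is a hypothetical name; both patterns are copied from the diff.

```python
import re
from pathlib import Path
from typing import Optional

COMMAND_REGEX = re.compile(r"rfspec/runs/")       # from hooks.json
RUN_DIR_RE = re.compile(r"rfspec/runs/([\w-]+)")  # from rfspec-ready.py


def classify(command: str, runs_root: Path) -> Optional[Path]:
    """Return the run dir the hook would inspect, or None if it is not routed."""
    # Assumption: the hook runner does an unanchored search on the command.
    if not COMMAND_REGEX.search(command):
        return None
    m = RUN_DIR_RE.search(command)
    # The hook ignores the prefix written in the command (~, $HOME, absolute
    # path) and always resolves the id under its own runs root.
    return runs_root / m.group(1) if m else None
```

The last test case is worth noting: a poll written against a shell variable instead of the literal path printed by the launcher would not be routed through the hook at all.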
Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
+#!/usr/bin/env python3
+"""PreToolUse hook -- blocks rfspec polling commands until skill is loaded.
+
+Matches Execute commands containing rfspec/runs/. Allows if results are not
+ready yet or the rfspec skill marker exists. Blocks if results are ready but
+the skill hasn't been loaded, forcing the agent to invoke Skill: rfspec first.
+"""
+
+import json
+import logging
+import re
+import sys
+from pathlib import Path
+
+USER_FACTORY = Path.home() / ".factory"
+MARKER_DIR = USER_FACTORY / ".skill-markers"
+RFSPEC_RUNS = USER_FACTORY / "rfspec" / "runs"
+LOG_FILE = USER_FACTORY / "logs" / "hooks.log"
+
+RUN_DIR_RE = re.compile(r"rfspec/runs/([\w-]+)")
+
+
+def setup_logging():
+    LOG_FILE.parent.mkdir(parents=True, exist_ok=True)
+    logging.basicConfig(
+        filename=LOG_FILE,
+        level=logging.INFO,
+        format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
+        datefmt="%Y-%m-%d %H:%M:%S",
+    )
+    return logging.getLogger("rfspec-ready")
+
+
+def find_run_dir(command: str) -> Path | None:
+    m = RUN_DIR_RE.search(command)
+    if m:
+        return RFSPEC_RUNS / m.group(1)
+    return None
+
+
+def skill_loaded(session_id: str) -> bool:
+    if not session_id:
+        return False
+    return (MARKER_DIR / f"{session_id}-rfspec").exists()
+
+
+def main():
+    log = setup_logging()
+
+    try:
+        data = json.load(sys.stdin)
+    except json.JSONDecodeError:
+        sys.exit(0)
+
+    command = data.get("tool_input", {}).get("command", "")
+    session_id = data.get("session_id", "")
+
+    run_dir = find_run_dir(command)
+    if not run_dir:
+        sys.exit(0)
+
+    done_file = run_dir / "done"
+    if not done_file.exists():
+        log.info(f"[PreToolUse] rfspec run {run_dir.name} not done yet, allowing poll")
+        sys.exit(0)
+
+    if skill_loaded(session_id):
+        log.info(f"[PreToolUse] rfspec skill loaded, allowing access to {run_dir.name}")
+        sys.exit(0)
+
+    results_path = run_dir / "results.md"
+    log.info(
+        f"[PreToolUse] BLOCK rfspec results ready but skill not loaded, session={session_id[:8]}"
+    )
+
+    message = (
+        f"rfspec results are ready at {results_path}. "
+        f"You MUST invoke the rfspec skill to load the evaluation workflow before reading results. "
+        f'Call: Skill(skill="rfspec")'
+    )
+
+    print(
+        json.dumps(
+            {
+                "hookSpecificOutput": {
+                    "hookEventName": "PreToolUse",
+                    "permissionDecision": "deny",
+                    "permissionDecisionReason": message,
+                },
+            }
+        )
+    )
+    sys.exit(2)
+
+
+if __name__ == "__main__":
+    main()
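The hook's gate reduces to three checks applied in order: a command with no run id, a run with no `done` sentinel yet, or a session with a `<session_id>-rfspec` marker is allowed; only a finished run without the marker is denied. The sketch below restates that order as a pure Python function over caller-supplied directories, so it can be exercised without touching `~/.factory`. `decide` and its parameters are hypothetical, and how the `.skill-markers` files get created is outside this commit.

```python
import re
from pathlib import Path

RUN_DIR_RE = re.compile(r"rfspec/runs/([\w-]+)")


def decide(command: str, session_id: str, runs_root: Path, marker_dir: Path) -> str:
    """Return "allow" or "deny", checking conditions in the same order as rfspec-ready.py."""
    m = RUN_DIR_RE.search(command)
    if not m:
        return "allow"  # not an rfspec run path
    run_dir = runs_root / m.group(1)
    if not (run_dir / "done").exists():
        return "allow"  # still running: polling is fine
    if session_id and (marker_dir / f"{session_id}-rfspec").exists():
        return "allow"  # skill already loaded in this session
    return "deny"  # results ready, skill not loaded
```

One consequence of this order: the deny only fires once `done` exists, so the very `cat <run_dir>/done` poll that first observes completion is the call that gets blocked, and its deny reason is what tells the agent to load the skill.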

plugins/rfspec/skills/rfspec/SKILL.md

Lines changed: 79 additions & 21 deletions
@@ -1,6 +1,6 @@
 ---
 name: rfspec
-version: 1.2.0
+version: 1.3.0
 description: |
   Multi-model spec generation and synthesis. Use when the user wants to:
   - Get competing proposals from different AI models
@@ -17,20 +17,63 @@ Fan out a prompt to multiple models, compare their responses, and help the user
 
 ## Quick Reference
 
-| Task | Action |
-|------|--------|
-| Generate competing specs | `/rfspec <prompt>` |
-| Pick one result | Select via AskUser after comparison |
-| Synthesize results | Combine strongest elements when user chooses synthesis |
-| Save final spec | Write to `specs/active/YYYY-MM-DD-<slug>.md` |
+| Task | Action |
+| ------------------------ | ------------------------------------------------------ |
+| Generate competing specs | `/rfspec <prompt>` (background) |
+| Poll for results | Check `<run_dir>/done` sentinel |
+| Pick one result | Select via AskUser after comparison |
+| Synthesize results | Combine strongest elements when user chooses synthesis |
+| Save final spec | Write to `specs/active/YYYY-MM-DD-<slug>.md` |
 
 ## Workflow
 
-1. Run `/rfspec <user's prompt>` -- fires parallel model calls, returns labeled options (A, B, C).
-2. Evaluate the results -- see [references/evaluation-guide.md](references/evaluation-guide.md).
-3. Present the choice to the user via AskUser.
-4. Present the selected or synthesized spec via ExitSpecMode for user review.
-5. Save to `specs/active/` only after the user approves in spec mode.
+The `/rfspec` command spawns three `droid exec` calls in parallel. These take
+several minutes, far exceeding the Execute tool timeout. You MUST use the
+fire-and-forget + poll pattern.
+
+### Step 1 -- Launch (background)
+
+Run the command with `fireAndForget=true`:
+
+```
+Execute: /rfspec <user's prompt>
+fireAndForget: true
+```
+
+The script immediately prints `RFSPEC_RUN_DIR=<path>` to its log file.
+Read the log file (path printed by Execute) to capture the run directory.
+
+### Step 2 -- Poll for completion
+
+Tell the user the models are running and you will check back. Then poll:
+
+```
+Execute: cat <run_dir>/done 2>/dev/null || echo "PENDING"
+```
+
+Poll every 30-60 seconds. The sentinel contains `STATUS=complete` or
+`STATUS=failed`. While waiting, you can do other work or keep the user
+updated on progress.
+
+### Step 3 -- Read results
+
+Once `done` exists, read the results:
+
+```
+Read: <run_dir>/results.md
+```
+
+This file contains all three model outputs as markdown sections (Option A, B, C).
+
+### Step 4 -- Evaluate and present
+
+Evaluate the results -- see [references/evaluation-guide.md](references/evaluation-guide.md).
+Present the choice to the user via AskUser.
+
+### Step 5 -- Finalize
+
+Present the selected or synthesized spec via ExitSpecMode for user review.
+Save to `specs/active/` only after the user approves in spec mode.
 
 ## Saving
 
@@ -43,6 +86,19 @@ specs/active/YYYY-MM-DD-<slug>.md
 
 Where `<slug>` is a short kebab-case name derived from the topic.
 
+## Resuming from slash command
+
+If you are loading this skill after `/rfspec` already ran (the slash command told
+you to invoke `Skill: rfspec`), you already have the run directory. Pick up from
+Step 3:
+
+1. Read `<run_dir>/results.md` to get the model outputs.
+2. Follow Step 4 (evaluate and present) and Step 5 (finalize) above.
+
+The `results.md` file includes embedded agent instructions as a fallback, but
+prefer the full workflow in this document -- it covers the evaluation guide,
+saving rules, and rejection handling that the embedded version omits.
+
 ## Pitfalls
 
 - Don't summarize each option individually -- compare them against each other.
@@ -63,29 +119,31 @@ Example 1: User wants competing specs
 User says: "Get me specs from multiple models for adding a dark mode toggle"
 Actions:
 
-1. Run `/rfspec add a dark mode toggle to the settings page with persistent user preference`
-2. Read Options A, B, C
-3. Compare: "Option A uses CSS variables with a React context, Option B uses Tailwind's dark class with localStorage, Option C uses a theme provider with system preference detection."
-4. Present choice via AskUser
-Result: User picks Option B, saved to `specs/active/2026-03-06-dark-mode-toggle.md`
+1. Execute `/rfspec add a dark mode toggle ...` with `fireAndForget=true`
+2. Read the background log to get `RFSPEC_RUN_DIR`
+3. Tell user: "Models are running, I'll check back shortly."
+4. Poll `<run_dir>/done` until `STATUS=complete`
+5. Read `<run_dir>/results.md`, compare Options A, B, C
+6. Present choice via AskUser
+Result: User picks Option B, saved to `specs/active/2026-03-06-dark-mode-toggle.md`
 
 Example 2: User wants synthesis
 User says: "rfspec this: refactor the auth module to use JWT"
 Actions:
 
-1. Run `/rfspec refactor the auth module to use JWT`
-2. Compare results, noting Option A has better token rotation but Option C has cleaner middleware
+1. Launch background, poll for completion
+2. Read results, compare -- Option A has better token rotation, Option C has cleaner middleware
 3. User selects "Synthesize"
 4. Combine Option A's rotation logic with Option C's middleware structure
-Result: Synthesized spec saved to `specs/active/2026-03-06-auth-jwt-refactor.md`
+Result: Synthesized spec saved to `specs/active/2026-03-06-auth-jwt-refactor.md`
 
 Example 3: All options rejected
 User says: "None of these work, they all miss the caching layer"
 Actions:
 
 1. Ask what's missing -- user explains the Redis caching requirement
 2. Offer to re-run: `/rfspec refactor auth module to use JWT with Redis session caching`
-Result: New round of specs generated with caching addressed
+Result: New round of specs generated with caching addressed
 
 ## References
 
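Step 2 of the updated SKILL.md polls with `cat <run_dir>/done 2>/dev/null || echo PENDING`. The same loop can be expressed as a small Python helper, for example when scripting outside the agent. This is a sketch assuming the sentinel holds a single `STATUS=...` line as the commit message describes. `read_status` and `poll` are hypothetical names, the 30-second default interval follows the 30-60 second guidance in SKILL.md, and the 15-minute default timeout is an arbitrary choice of this sketch.

```python
import time
from pathlib import Path
from typing import Optional


def read_status(run_dir: Path) -> Optional[str]:
    """Return "complete"/"failed" from <run_dir>/done, or None while pending."""
    try:
        text = (run_dir / "done").read_text()
    except FileNotFoundError:
        return None  # same as the `|| echo PENDING` branch
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "STATUS":
            return value.strip()
    return None  # file exists but has no STATUS line


def poll(run_dir: Path, interval: float = 30.0, timeout: float = 900.0) -> Optional[str]:
    """Poll until the sentinel reports a status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = read_status(run_dir)
        if status is not None or time.monotonic() >= deadline:
            return status
        time.sleep(interval)
```

On `"complete"` the workflow continues with Step 3 and reads `<run_dir>/results.md`; on `"failed"`, the background log path printed by the launcher is the likeliest place to look for the cause.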