| 1 | +# 🔍 ARTIFACT SHARING INVESTIGATION: Root Cause & Fixes |
| 2 | + |
| 3 | +## Executive Summary |
| 4 | + |
| 5 | +After reviewing trace5 and the documentation, I've identified the **root cause** of the artifact accessibility issue: **artifacts created via `<artifact:create>` annotations are processed asynchronously after the AI response completes, but delegation happens immediately, which creates a race condition.** |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## 📊 Evidence from Trace5 |
| 10 | + |
| 11 | +### Timeline of Events: |
| 12 | + |
| 13 | +1. **Line 255:** Firecrawl tool call completes successfully ✅ |
| 14 | +2. **Line 280:** AI response includes `<artifact:create id="pg-goodwriting" tool="toolu_017itDUg7cwHYUjUvkvHi9ZJ" type="scraped_page" base="result" />` |
| 15 | +3. **Line 281:** AI immediately delegates back to orchestrator |
| 16 | +4. **Line 75:** Delegation response shows `parts: [{"kind":"text","text":"Task completed successfully"}]` |
| 17 | + - **CRITICAL:** NO artifact data in parts array ❌ |
| 18 | +5. **Line 295:** Orchestrator extracts metadata from TEXT message (not parts array) |
| 19 | +6. **Line 296:** Orchestrator correctly passes metadata to qualification agent |
| 20 | +7. **Line 336:** Qualification agent calls `get_reference_artifact` → **FAILS**: "Artifact not found" ❌ |
| 21 | + |
| 22 | +--- |
| 23 | + |
| 24 | +## 🎯 Root Cause Identified |
| 25 | + |
| 26 | +### The Problem: |
| 27 | + |
| 28 | +**Artifacts created via `<artifact:create>` annotations are processed asynchronously:** |
| 29 | + |
| 30 | +1. AI generates response with `<artifact:create>` annotation |
| 31 | +2. System processes annotation **asynchronously** (in background) |
| 32 | +3. Delegation happens **immediately** after AI response |
| 33 | +4. Delegation response returns **before** artifact is persisted |
| 34 | +5. Next agent tries to retrieve artifact → **NOT FOUND** (not persisted yet) |
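
The five steps above can be reproduced in miniature. This is a sketch under stated assumptions, not the Inkeep implementation: `store`, `processAnnotationAsync`, and `getReferenceArtifact` are hypothetical in-memory stand-ins, and the 50 ms delay is arbitrary.

```typescript
// Hypothetical in-memory stand-in for the real artifact store.
const store: Record<string, string> = {};

// Simulates background processing of an <artifact:create> annotation:
// the artifact only becomes visible after `delayMs`.
function processAnnotationAsync(id: string, data: string, delayMs: number): Promise<void> {
  return new Promise<void>((resolve) => {
    setTimeout(() => {
      store[id] = data;
      resolve();
    }, delayMs);
  });
}

// Stand-in for the retrieval tool: throws if the artifact is not persisted yet.
function getReferenceArtifact(id: string): string {
  const found = store[id];
  if (found === undefined) {
    throw new Error("Artifact not found");
  }
  return found;
}

async function demoRace(): Promise<{ immediate: string; afterPersist: string }> {
  // Steps 1-2: the annotation is handed off for background processing.
  const pending = processAnnotationAsync("pg-goodwriting", "scraped markdown", 50);

  // Steps 3-5: delegation proceeds at once, so the downstream read loses the race.
  let immediate = "";
  try {
    immediate = getReferenceArtifact("pg-goodwriting");
  } catch (err) {
    immediate = (err as Error).message;
  }

  // Once persistence has finished, the same read succeeds.
  await pending;
  const afterPersist = getReferenceArtifact("pg-goodwriting");
  return { immediate, afterPersist };
}
```

In this model the first read fails only because it runs before the background write lands; nothing is wrong with the artifact ID or the metadata passed along.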
| 35 | + |
| 36 | +### Evidence: |
| 37 | + |
| 38 | +- **Parts array empty:** Delegation response doesn't contain artifact metadata because artifact hasn't been created yet |
| 39 | +- **Artifact not found:** When qualification agent tries to retrieve it seconds later, artifact still doesn't exist |
| 40 | +- **Timing issue:** This is a **race condition** between artifact creation and delegation |
| 41 | + |
| 42 | +--- |
| 43 | + |
| 44 | +## 🔧 Solution Options |
| 45 | + |
| 46 | +### Option A: Wait for Artifact Processing (RECOMMENDED) |
| 47 | + |
| 48 | +**Approach:** Update the `urlToMarkdown` agent so that it does NOT delegate immediately after creating an artifact. Instead, add an explicit instruction to wait for artifact processing. |
| 49 | + |
| 50 | +**Implementation:** |
| 51 | +```text |
| 52 | +// In urlToMarkdown agent prompt - AFTER artifact creation: |
| 53 | + |
| 54 | +**WHEN COMPLETE:** |
| 55 | +- After successfully scraping content and creating artifacts, **WAIT** before delegating |
| 56 | +- **CRITICAL:** The artifact:create annotation triggers asynchronous artifact creation |
| 57 | +- **DO NOT delegate immediately** - wait for artifact to be processed |
| 58 | +- Add a brief delay or check mechanism to ensure artifact is persisted |
| 59 | +- Only delegate AFTER confirming artifact creation is complete |
| 60 | +``` |
| 61 | + |
| 62 | +**Pros:** |
| 63 | +- Simple fix |
| 64 | +- Works with current system behavior |
| 65 | +- No system changes needed |
| 66 | + |
| 67 | +**Cons:** |
| 68 | +- Requires explicit waiting logic, which a prompt instruction alone may not reliably enforce |
| 69 | +- May add latency |
| 70 | + |
| 71 | +--- |
| 72 | + |
| 73 | +### Option B: Pass Tool Result Directly (ALTERNATIVE) |
| 74 | + |
| 75 | +**Approach:** Instead of relying on artifact retrieval, pass the tool result data directly in the delegation message. |
| 76 | + |
| 77 | +**Implementation:** |
| 78 | +```text |
| 79 | +// In urlToMarkdown agent prompt: |
| 80 | + |
| 81 | +**WHEN COMPLETE:** |
| 82 | +- After scraping, create artifact AND include key data in delegation message |
| 83 | +- Format: "Scraped [URL]. Artifact ID: [id]. Tool result summary: [key points]" |
| 84 | +- This provides immediate access to data while artifact processes in background |
| 85 | +``` |
| 86 | + |
| 87 | +**Pros:** |
| 88 | +- Immediate data access |
| 89 | +- Workflow continues without delay |
| 90 | + |
| 91 | +**Cons:** |
| 92 | +- Loses artifact benefits (citations, full data access) |
| 93 | +- Not ideal for production |
| 94 | + |
| 95 | +--- |
| 96 | + |
| 97 | +### Option C: Fallback Retrieval Mechanism (ROBUST) |
| 98 | + |
| 99 | +**Approach:** Add retry logic in downstream agents to handle async artifact creation. |
| 100 | + |
| 101 | +**Implementation:** |
| 102 | +```text |
| 103 | +// In qualification agent prompt: |
| 104 | + |
| 105 | +**ARTIFACT RETRIEVAL:** |
| 106 | +- First attempt: Retrieve artifact using provided metadata |
| 107 | +- If artifact not found: Wait 2-3 seconds, retry |
| 108 | +- If still not found: Extract data from delegation message text (fallback) |
| 109 | +- Continue workflow with available data |
| 110 | +``` |
| 111 | + |
| 112 | +**Pros:** |
| 113 | +- Handles race conditions gracefully |
| 114 | +- Robust error handling |
| 115 | + |
| 116 | +**Cons:** |
| 117 | +- Adds complexity |
| 118 | +- May still fail if the artifact is never created |
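
If the retry ever moves out of the prompt and into code, it could look roughly like the sketch below. Everything here is hypothetical: `retrieveWithRetry`, its parameters, and the `RetrievalResult` type are illustrative names, and the defaults (one retry after 2.5 seconds) simply mirror the "wait 2-3 seconds, retry" instruction above.

```typescript
type RetrievalResult<T> = { value: T; source: "artifact" | "fallback" };

// Resolves after `ms` milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Tries `fetchArtifact` up to `attempts` times, pausing `delayMs` between tries.
// If every attempt fails, returns `fallback()` (for example, data parsed from
// the delegation message text) so the workflow can continue.
async function retrieveWithRetry<T>(
  fetchArtifact: () => Promise<T>,
  fallback: () => T,
  attempts: number = 2,
  delayMs: number = 2500
): Promise<RetrievalResult<T>> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const value = await fetchArtifact();
      return { value, source: "artifact" };
    } catch (err) {
      // Treat any failure as "not persisted yet"; wait only if a retry remains.
      if (attempt < attempts) {
        await sleep(delayMs);
      }
    }
  }
  return { value: fallback(), source: "fallback" };
}
```

Returning `source` alongside the value lets the caller log how often the fallback path is taken, which is one way to tell whether the race is still occurring after the fix.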
| 119 | + |
| 120 | +--- |
| 121 | + |
| 122 | +### Option D: System-Level Fix (IDEAL BUT REQUIRES SDK CHANGES) |
| 123 | + |
| 124 | +**Approach:** Ensure artifacts are persisted synchronously before delegation responses are returned. |
| 125 | + |
| 126 | +**Implementation:** |
| 127 | +- Modify Inkeep Agents SDK to process `<artifact:create>` annotations synchronously |
| 128 | +- Ensure artifacts are persisted before delegation completes |
| 129 | +- Include artifact metadata in delegation response parts array automatically |
| 130 | + |
| 131 | +**Pros:** |
| 132 | +- Fixes root cause |
| 133 | +- No prompt changes needed |
| 134 | +- Works for all agents |
| 135 | + |
| 136 | +**Cons:** |
| 137 | +- Requires SDK changes |
| 138 | +- Not immediately actionable |
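
To make the intended ordering concrete, here is a minimal sketch of "persist before delegating". It does not reflect the SDK's internals: `PendingArtifacts` and `delegateAfterPersist` are hypothetical names, and the sketch only assumes that each background artifact write can be represented as a promise.

```typescript
// Hypothetical tracker for in-flight artifact writes; not part of any real SDK.
class PendingArtifacts {
  private writes: Promise<void>[] = [];

  // Register a background write started while handling an annotation.
  track(write: Promise<void>): void {
    this.writes.push(write);
  }

  // Resolves once every tracked write has finished.
  async flush(): Promise<void> {
    const inFlight = this.writes;
    this.writes = [];
    await Promise.all(inFlight);
  }
}

// Delegation wrapper: the handoff cannot start until persistence is complete.
async function delegateAfterPersist<T>(
  pending: PendingArtifacts,
  delegate: () => Promise<T>
): Promise<T> {
  await pending.flush();
  return delegate();
}
```

With this ordering the writes themselves can stay asynchronous; only the delegation step waits, so the added latency is bounded by the slowest outstanding write.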
| 139 | + |
| 140 | +--- |
| 141 | + |
| 142 | +## ✅ Recommended Fix Plan |
| 143 | + |
| 144 | +### Phase 1: Immediate Fix (Option A + C Hybrid) |
| 145 | + |
| 146 | +**1. Update `urlToMarkdown` Agent:** |
| 147 | +- Add explicit instruction to wait after artifact creation |
| 148 | +- Include artifact metadata in delegation message text (current behavior - keep this) |
| 149 | +- Add instruction: "After creating artifact, wait briefly before delegating to ensure artifact is processed" |
| 150 | + |
| 151 | +**2. Update `qualificationAgent` (and other downstream agents):** |
| 152 | +- Add retry logic for artifact retrieval |
| 153 | +- Add fallback: If artifact not found, extract data from delegation message |
| 154 | + |
| 155 | +**3. Update Orchestrator:** |
| 156 | +- Keep current text-based extraction (works as fallback) |
| 157 | +- Add instruction: "If parts array doesn't contain artifacts, extract from text message" |
| 158 | + |
| 159 | +### Phase 2: Long-Term Fix (Option D) |
| 160 | + |
| 161 | +**Work with Inkeep team to:** |
| 162 | +- Ensure artifacts are persisted synchronously before delegation |
| 163 | +- Include artifact metadata in delegation response parts array automatically |
| 164 | +- This fixes the root cause for all agents |
| 165 | + |
| 166 | +--- |
| 167 | + |
| 168 | +## 📝 Specific Code Changes Needed |
| 169 | + |
| 170 | +### Change 1: `urlToMarkdown` Agent Prompt |
| 171 | + |
| 172 | +```text |
| 173 | +**WHEN COMPLETE:** |
| 174 | +- After successfully scraping content and creating artifacts: |
| 175 | + 1. Create artifact using `<artifact:create>` annotation |
| 176 | + 2. **WAIT:** Artifact creation happens asynchronously - wait 2-3 seconds before delegating |
| 177 | + 3. Include artifact metadata in delegation message text (as backup) |
| 178 | + 4. Delegate back to orchestrator |
| 179 | +- **CRITICAL:** The artifact:create annotation triggers background processing |
| 180 | +- Do NOT delegate immediately - give the system time to persist the artifact |
| 181 | +``` |
| 182 | + |
| 183 | +### Change 2: `qualificationAgent` Prompt |
| 184 | + |
| 185 | +```text |
| 186 | +**ARTIFACT RETRIEVAL WITH RETRY:** |
| 187 | +- First attempt: Use get_reference_artifact with provided metadata |
| 188 | +- If artifact not found (race condition): |
| 189 | + 1. Wait 2-3 seconds |
| 190 | + 2. Retry retrieval |
| 191 | + 3. If still not found, extract key data from delegation message text |
| 192 | + 4. Continue workflow with available data |
| 193 | +``` |
| 194 | + |
| 195 | +### Change 3: Orchestrator Prompt |
| 196 | + |
| 197 | +```text |
| 198 | +**ARTIFACT METADATA EXTRACTION (UPDATED):** |
| 199 | +1. FIRST: Check delegation response parts array for artifact data |
| 200 | +2. IF parts array contains artifacts: Extract metadata from parts array |
| 201 | +3. IF parts array is empty: Extract metadata from delegation message text (fallback) |
| 202 | +4. Store extracted metadata for next step |
| 203 | +5. NEVER proceed without artifact metadata |
| 204 | +``` |
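
The extraction order in Change 3 can also be expressed as code. The sketch below rests on assumptions: the `text` part shape matches what trace5 shows, but the `data` part shape is invented for illustration, and the text fallback assumes the message follows the `Artifact ID: [id]` format proposed in Option B.

```typescript
// Part shapes: the text part matches the trace; the "data" part is an assumed
// shape used for illustration only.
type Part =
  | { kind: "text"; text: string }
  | { kind: "data"; data: { artifactId: string } };

type Extraction = { artifactIds: string[]; source: "parts" | "text" };

function extractArtifactIds(parts: Part[]): Extraction {
  // Steps 1-2: prefer structured artifact data from the parts array.
  const fromParts: string[] = [];
  const textChunks: string[] = [];
  for (const part of parts) {
    if (part.kind === "data") {
      fromParts.push(part.data.artifactId);
    } else {
      textChunks.push(part.text);
    }
  }
  if (fromParts.length > 0) {
    return { artifactIds: fromParts, source: "parts" };
  }

  // Step 3: fall back to scanning the message text for "Artifact ID: <id>".
  const pattern = /Artifact ID:\s*([\w-]+)/g;
  const fromText: string[] = [];
  const text = textChunks.join("\n");
  let match: RegExpExecArray | null = pattern.exec(text);
  while (match !== null) {
    fromText.push(match[1]);
    match = pattern.exec(text);
  }

  // Step 5: never proceed without artifact metadata.
  if (fromText.length === 0) {
    throw new Error("No artifact metadata in parts array or message text");
  }
  return { artifactIds: fromText, source: "text" };
}
```

A response whose only part is `"Task completed successfully"`, as seen in trace5, would throw here rather than pass an empty result downstream.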
| 205 | + |
| 206 | +--- |
| 207 | + |
| 208 | +## 🧪 Testing Plan |
| 209 | + |
| 210 | +1. **Test artifact creation:** Verify artifact is created correctly |
| 211 | +2. **Test timing:** Confirm artifact exists before retrieval attempt |
| 212 | +3. **Test fallback:** Verify text-based extraction works when parts array is empty |
| 213 | +4. **Test retry:** Verify retry logic handles race conditions |
| 214 | +5. **End-to-end:** Run full workflow to confirm all steps complete |
| 215 | + |
| 216 | +--- |
| 217 | + |
| 218 | +## 🎯 Expected Outcome |
| 219 | + |
| 220 | +After implementing these fixes: |
| 221 | +- ✅ Artifacts are accessible to downstream agents |
| 222 | +- ✅ Workflow completes successfully |
| 223 | +- ✅ Race conditions are handled gracefully |
| 224 | +- ✅ Fallback mechanisms ensure workflow continues even if artifacts are delayed |
| 225 | + |
| 226 | +--- |
| 227 | + |
| 228 | +## 📌 Key Insights |
| 229 | + |
| 230 | +1. **Artifact creation is asynchronous:** `<artifact:create>` annotations are processed after AI response |
| 231 | +2. **Delegation is immediate:** Happens right after AI response, before artifact persists |
| 232 | +3. **Parts array is empty:** Because artifact hasn't been created yet when delegation completes |
| 233 | +4. **Text-based extraction works:** Orchestrator correctly extracts metadata from text (fallback) |
| 234 | +5. **Retrieval fails:** Because artifact doesn't exist yet when qualification agent tries to access it |
| 235 | + |
| 236 | +--- |
| 237 | + |
| 238 | +## 🔄 Next Steps |
| 239 | + |
| 240 | +1. **Immediate:** Implement Phase 1 fixes (wait logic + retry mechanism) |
| 241 | +2. **Test:** Verify fixes work with trace5 scenario |
| 242 | +3. **Monitor:** Check if artifacts are accessible after fixes |
| 243 | +4. **Long-term:** Work with Inkeep team on Option D (synchronous artifact processing) |
| 244 | + |
| 245 | +--- |
| 246 | + |
| 247 | +## 📚 References |
| 248 | + |
| 249 | +- Trace5 analysis: Lines 255-336 show the complete artifact lifecycle |
| 250 | +- Inkeep documentation: Artifacts are "automatically created" but timing is unclear |
| 251 | +- Current implementation: Text-based extraction works as fallback |
| 252 | + |