feat(copilot): Auto-save binary block outputs to workspace #11968
Conversation
When blocks produce binary outputs (PNG, JPEG, PDF, SVG), the data is now automatically saved to the user's workspace and replaced with workspace:// references. This prevents:

- Massive token waste from the LLM re-typing base64 strings (17,000+ tokens)
- Potential data corruption from truncation/hallucination
- Poor UX from slow character-by-character output

Implementation:

- New binary_output_processor.py module with hash-based deduplication
- Integration in run_block.py (single entry point for all block executions)
- Graceful degradation: failures preserve original data

Fixes SECRT-1887
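For scale (back-of-envelope arithmetic, assuming the common ~4 characters-per-token heuristic): base64 inflates binary data by a factor of 4/3, so even a modest image lands in the range quoted above.

```python
png_bytes = 50 * 1024                # a modest ~50 KB chart image
b64_chars = png_bytes * 4 // 3       # base64 expands data by 4/3 -> ~68k characters
approx_tokens = b64_chars // 4       # ~4 chars per token -> ~17k tokens
print(f"{approx_tokens:,}")          # 17,066
```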
Walkthrough

New code auto-saves large binary/text outputs from block execution to the workspace and replaces them with deterministic workspace:// references.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Client/Caller
    participant RunBlock as run_block
    participant BinProc as binary_output_processor
    participant WorkspaceMgr as WorkspaceManager
    participant FileStore as File Storage

    Client->>RunBlock: Execute block
    RunBlock->>RunBlock: Run block logic -> collect outputs
    RunBlock->>BinProc: process_binary_outputs(outputs, workspace_manager, block_name)
    BinProc->>BinProc: Traverse outputs (dicts/lists), identify saveable fields & size
    alt Large content
        BinProc->>BinProc: Compute SHA-256 hash
        alt Not cached
            BinProc->>BinProc: Decode base64 if needed
            BinProc->>WorkspaceMgr: request save (filename, data)
            WorkspaceMgr->>FileStore: write_file(filename, data)
            FileStore-->>WorkspaceMgr: success / failure
            WorkspaceMgr-->>BinProc: workspace:// reference / error
            BinProc->>BinProc: Cache reference if success
        else Cached
            BinProc->>BinProc: Reuse cached reference
        end
        BinProc->>BinProc: Replace content with workspace:// reference
    else Small or non-saveable
        BinProc->>BinProc: Leave content unchanged
    end
    BinProc-->>RunBlock: processed_outputs
    RunBlock-->>Client: BlockOutputResponse(processed_outputs)
```
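For orientation, a condensed sketch of the flow in the diagram, handling flat dicts only (the real module also recurses into nested dicts and lists; `SIZE_THRESHOLD` here is an assumed value, and the filename scheme and `wm.write_file` return shape follow the snippets quoted later in this review):

```python
import base64
import binascii
import hashlib
import uuid

SIZE_THRESHOLD = 10_000  # assumed; the real constant lives in binary_output_processor.py


async def process_binary_outputs(outputs: dict, wm, block_name: str) -> dict:
    """Replace large base64 string fields with workspace:// references (flat dicts only)."""
    cache: dict[str, str] = {}  # SHA-256 digest -> workspace:// reference (dedup cache)
    processed: dict = {}
    for field, value in outputs.items():
        if not (isinstance(value, str) and len(value) > SIZE_THRESHOLD):
            processed[field] = value  # small or non-saveable: leave unchanged
            continue
        content_hash = hashlib.sha256(value.encode()).hexdigest()
        if content_hash not in cache:
            try:
                data = base64.b64decode(value, validate=True)  # decode before writing
                ext = {"jpeg": "jpg"}.get(field, field)
                name = f"{block_name.lower().replace(' ', '_')[:20]}_{field}_{uuid.uuid4().hex[:12]}.{ext}"
                file = await wm.write_file(content=data, filename=name)
            except (binascii.Error, ValueError, OSError):
                processed[field] = value  # graceful degradation: keep the original data
                continue
            cache[content_hash] = f"workspace://{file.id}"
        processed[field] = cache[content_hash]
    return processed
```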
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (warning)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py`:
- Around line 112-118: The _decode_base64 function should perform strict base64
validation and normalize padding before decoding: if value starts with "data:"
strip the prefix as before, then normalize padding by adding '=' characters so
len(value) % 4 == 0, and call base64.b64decode(value, validate=True) to enforce
strict character checking; catch binascii.Error and ValueError and return None
on any failure so corrupted inputs are not silently decoded. Ensure these
changes are implemented inside _decode_base64 (preserving the data URI handling)
and that no exceptions escape.
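For context, the strict-mode behavior the prompt relies on is standard-library semantics:

```python
import base64
import binascii

print(base64.b64decode("aGVsbG8="))               # b'hello'
print(base64.b64decode("aGVs!bG8="))              # b'hello': '!' silently discarded by default
try:
    base64.b64decode("aGVs!bG8=", validate=True)  # strict mode rejects non-alphabet characters
except binascii.Error as exc:
    print(exc)                                    # "Only base64 data is allowed"
```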
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py
- autogpt_platform/backend/backend/api/features/chat/tools/run_block.py
- autogpt_platform/backend/backend/api/features/chat/tools/test_binary_output_processor.py
🧰 Additional context used
📓 Path-based instructions (7)
autogpt_platform/backend/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development
Files:
- autogpt_platform/backend/backend/api/features/chat/tools/run_block.py
- autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py
- autogpt_platform/backend/backend/api/features/chat/tools/test_binary_output_processor.py
autogpt_platform/backend/backend/api/features/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Update routes in '/backend/backend/api/features/' and add/update Pydantic models in the same directory for API development
When modifying API routes, update corresponding Pydantic models in the same directory and write tests alongside the route file
Files:
- autogpt_platform/backend/backend/api/features/chat/tools/run_block.py
- autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py
- autogpt_platform/backend/backend/api/features/chat/tools/test_binary_output_processor.py
autogpt_platform/backend/**/*.{py,txt}
📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)
Use `poetry run` prefix for all Python commands, including testing, linting, formatting, and migrations
Files:
- autogpt_platform/backend/backend/api/features/chat/tools/run_block.py
- autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py
- autogpt_platform/backend/backend/api/features/chat/tools/test_binary_output_processor.py
autogpt_platform/backend/backend/api/**/*.py
📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)
autogpt_platform/backend/backend/api/**/*.py: Use FastAPI for building REST and WebSocket endpoints
Use JWT-based authentication with Supabase integration
Files:
- autogpt_platform/backend/backend/api/features/chat/tools/run_block.py
- autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py
- autogpt_platform/backend/backend/api/features/chat/tools/test_binary_output_processor.py
autogpt_platform/backend/backend/**/*.py
📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)
Use Prisma ORM for database operations in PostgreSQL with pgvector for embeddings
Files:
- autogpt_platform/backend/backend/api/features/chat/tools/run_block.py
- autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py
- autogpt_platform/backend/backend/api/features/chat/tools/test_binary_output_processor.py
autogpt_platform/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Format Python code with `poetry run format`
Files:
- autogpt_platform/backend/backend/api/features/chat/tools/run_block.py
- autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py
- autogpt_platform/backend/backend/api/features/chat/tools/test_binary_output_processor.py
autogpt_platform/backend/**/*test*.py
📄 CodeRabbit inference engine (AGENTS.md)
Run `poetry run test` for backend testing (runs pytest with docker based postgres + prisma)
Files:
autogpt_platform/backend/backend/api/features/chat/tools/test_binary_output_processor.py
🧠 Learnings (6)
📚 Learning: 2026-02-04T16:50:20.494Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/CLAUDE.md:0-0
Timestamp: 2026-02-04T16:50:20.494Z
Learning: Applies to autogpt_platform/backend/backend/blocks/*.py : When adding new blocks, analyze block interfaces to ensure inputs and outputs tie well together for productive graph-based editor connections
Applied to files:
autogpt_platform/backend/backend/api/features/chat/tools/run_block.py
📚 Learning: 2026-02-04T16:50:20.494Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/CLAUDE.md:0-0
Timestamp: 2026-02-04T16:50:20.494Z
Learning: Applies to autogpt_platform/backend/backend/blocks/*.py : Never hardcode workspace checks when using `store_media_file()` - let `for_block_output` handle context adaptation automatically
Applied to files:
- autogpt_platform/backend/backend/api/features/chat/tools/run_block.py
- autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py
📚 Learning: 2026-02-04T16:49:42.476Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-02-04T16:49:42.476Z
Learning: Applies to autogpt_platform/backend/backend/blocks/**/*.py : Implement 'run' method with proper error handling in backend blocks
Applied to files:
autogpt_platform/backend/backend/api/features/chat/tools/run_block.py
📚 Learning: 2026-02-04T16:49:42.476Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-02-04T16:49:42.476Z
Learning: Applies to autogpt_platform/backend/backend/blocks/**/*.py : Inherit from 'Block' base class with input/output schemas when adding new blocks in backend
Applied to files:
autogpt_platform/backend/backend/api/features/chat/tools/run_block.py
📚 Learning: 2026-02-04T16:49:56.176Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/CLAUDE.md:0-0
Timestamp: 2026-02-04T16:49:56.176Z
Learning: Applies to autogpt_platform/backend/backend/blocks/**/*.py : Backend architecture uses Blocks in `backend/backend/blocks/` as reusable components that perform specific tasks
Applied to files:
autogpt_platform/backend/backend/api/features/chat/tools/run_block.py
📚 Learning: 2026-02-04T16:50:20.494Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/CLAUDE.md:0-0
Timestamp: 2026-02-04T16:50:20.494Z
Learning: Applies to autogpt_platform/backend/backend/blocks/*.py : Implement blocks with an async `run` method and generate unique block IDs using `uuid.uuid4()`
Applied to files:
autogpt_platform/backend/backend/api/features/chat/tools/run_block.py
🧬 Code graph analysis (3)
autogpt_platform/backend/backend/api/features/chat/tools/run_block.py (3)
- autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py (1): process_binary_outputs (20-41)
- autogpt_platform/backend/backend/data/workspace.py (1): get_or_create_workspace (19-40)
- autogpt_platform/backend/backend/util/workspace.py (1): WorkspaceManager (30-419)

autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py (2)
- autogpt_platform/backend/backend/util/workspace.py (2): WorkspaceManager (30-419), write_file (155-287)
- autogpt_platform/backend/backend/api/features/chat/tools/test_binary_output_processor.py (1): workspace_manager (13-16)

autogpt_platform/backend/backend/api/features/chat/tools/test_binary_output_processor.py (2)
- autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py (2): _decode_base64 (112-119), process_binary_outputs (20-41)
- autogpt_platform/backend/backend/util/workspace.py (1): write_file (155-287)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: types
- GitHub Check: Seer Code Review
- GitHub Check: test (3.13)
- GitHub Check: test (3.11)
- GitHub Check: test (3.12)
- GitHub Check: Check PR Status
🔇 Additional comments (7)
autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py (3)
20-41: Solid post-processing + dedup flow.
Clear, minimal traversal and cache usage; matches the intended behavior.
44-88: Recursive traversal looks correct.
Handles dict/list nesting without mutating originals; good.
91-109: Save path + graceful fallback are good.
The error handling preserves original data as intended.

autogpt_platform/backend/backend/api/features/chat/tools/test_binary_output_processor.py (3)
12-16: Good test fixture setup.
Clean AsyncMock usage for workspace writes.
19-28: Decode tests are concise and effective.
Covers raw, data URI, and invalid inputs.
31-92: Comprehensive processing coverage.
Covers size threshold, dedup, failure fallback, nesting, and lists.

autogpt_platform/backend/backend/api/features/chat/tools/run_block.py (1)
328-341: Integration looks correct.
Processing outputs before returning the response fits the intended flow.
@claude please review this PR
Claude finished @Torantulino's task in 2m 23s — View job

PR Review: Auto-save binary block outputs to workspace

Overall this is a well-scoped change that solves a real problem with the code executor block's base64 outputs.

Issues

1. Strict base64 validation (`_decode_base64`)

As CodeRabbit also flagged:

```python
def _decode_base64(value: str) -> bytes | None:
    try:
        if value.startswith("data:"):
            value = value.split(",", 1)[1] if "," in value else value
        padded = value + "=" * (-len(value) % 4)
        return base64.b64decode(padded, validate=True)
    except (binascii.Error, ValueError):
        return None
```

2. Hashing the base64 string vs. decoded content

The deduplication hash is computed on the raw base64 string (`hashlib.sha256(value.encode())`), so identical bytes in different string forms produce different hashes and will not deduplicate.

3. Missing docstrings — 25% coverage flagged by pre-merge check
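A self-contained illustration of issue 2 (the payload bytes below are a stand-in, not real PNG data):

```python
import base64
import hashlib

payload = b"\x89PNG fake image bytes"       # same underlying content
raw_b64 = base64.b64encode(payload).decode()
data_uri = "data:image/png;base64," + raw_b64

# Hashing the string form: the two representations get different digests,
# so the dedup cache would treat identical bytes as two separate files.
assert (
    hashlib.sha256(raw_b64.encode()).hexdigest()
    != hashlib.sha256(data_uri.encode()).hexdigest()
)

# Hashing the decoded bytes is stable across representations.
decoded = base64.b64decode(data_uri.split(",", 1)[1])
assert hashlib.sha256(decoded).hexdigest() == hashlib.sha256(payload).hexdigest()
```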
Observations (non-blocking)

4. Overlap with `store_media_file()`: most other image-producing blocks (screenshotone, ai_image_generator, flux_kontext, bannerbear, etc.) already use it.

5. Test file location: the test file is placed at `autogpt_platform/backend/backend/api/features/chat/tools/test_binary_output_processor.py`, alongside the module.

7. Filename truncation is somewhat arbitrary: the block name is cut to 20 characters (`[:20]`).
Summary

The architecture is sound — a single-point interception that cleanly handles the token-waste problem for code executor outputs. The main actionable item is the base64 validation fix (issue #1), which prevents silently saving corrupted binary data. The docstring coverage (issue #3) will need to be addressed for the pre-merge check to pass.
Addresses CodeRabbit review feedback:

- Add padding normalization before decoding
- Use validate=True to reject invalid characters instead of silently discarding

This prevents corrupted data from being saved to workspace.
Adds docstrings to _process_item and _process_dict to meet the 80% docstring coverage requirement.
Thanks for the thorough review! Addressed:
Declined:
Non-blocking observations acknowledged — Good points for future reference, no changes needed.
```python
    and isinstance(value, str)
    and len(value) > SIZE_THRESHOLD
):
    content_hash = hashlib.sha256(value.encode()).hexdigest()
```
Bug: Deduplication fails because it hashes the raw base64 string before decoding, causing identical binary content with different string formats (e.g., with/without a data URI prefix) to be saved as separate files.
Severity: MEDIUM
Suggested Fix
The base64 string value should be decoded into its binary form before the hash is computed. This ensures that the hash represents the actual content, not its string representation, allowing for correct deduplication across different base64 formats.
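In code, the suggested fix amounts to roughly the following sketch (it reuses the module's `_decode_base64` helper; the fallback branch for non-decodable values is an assumption):

```python
import hashlib


def _content_hash(value: str) -> str:
    """Key dedup on the decoded bytes, not on their base64 string encoding."""
    decoded = _decode_base64(value)  # module helper: strict base64 -> bytes | None
    if decoded is not None:
        return hashlib.sha256(decoded).hexdigest()
    # Not decodable as base64: fall back to hashing the raw string.
    return hashlib.sha256(value.encode()).hexdigest()
```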
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location:
autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py#L70
Potential issue: The deduplication logic in `binary_output_processor.py` computes a
SHA256 hash on the raw base64 string representation of binary data at line 70. However,
the same binary content can be represented by different strings, such as raw base64
(`"ABC..."`) or a data URI (`"data:image/png;base64,ABC..."`). Because the hashing
occurs before the string is decoded, these different representations produce different
hashes for the same underlying data. This causes the deduplication to fail, leading to
redundant storage of identical binary files in the workspace and defeating the feature's
purpose of reducing storage and token overhead.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py`:
- Around line 103-107: The filename construction uses the raw block value and
only lower()/replace(' ', '_'), which allows path separators and unsafe
characters to reach storage backends; update the filename creation in
binary_output_processor.py to sanitize the block first (e.g., strip or replace
path separators, remove or percent/underscore non-alphanumeric characters, allow
only [a-z0-9_-], and truncate to the desired length) before building filename =
f"{...}_{field}_{uuid...}.{ext}"; apply this sanitized_block when calling
wm.write_file so both LocalWorkspaceStorage and GCSWorkspaceStorage receive a
safe filename (you can reuse or call an existing sanitize_filename helper if
available, or add a small sanitizer function used by the code that creates the
filename).
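A minimal sanitizer along the lines the prompt describes (the function name is hypothetical, and the repo's existing `sanitize_filename` helper, suggested below, may be preferable):

```python
import re


def _sanitize_block_name(block: str, max_len: int = 20) -> str:
    """Collapse anything outside [a-z0-9_-] to underscores, then truncate."""
    cleaned = re.sub(r"[^a-z0-9_-]+", "_", block.lower())
    return cleaned.strip("_")[:max_len] or "block"
```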
🧹 Nitpick comments (1)
autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py (1)
64-78: Hash binary content after decoding to improve dedup.

Right now the hash is computed on the base64 string, so identical bytes encoded differently (e.g., data URI vs raw, differing padding) won’t deduplicate. Consider hashing decoded bytes for binary fields before caching.
♻️ Suggested change
```diff
-        content_hash = hashlib.sha256(value.encode()).hexdigest()
+        if key in BINARY_FIELDS:
+            decoded = _decode_base64(value)
+            if decoded is None:
+                result[key] = value
+                continue
+            content_hash = hashlib.sha256(decoded).hexdigest()
+        else:
+            content_hash = hashlib.sha256(value.encode()).hexdigest()
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py
🧰 Additional context used
📓 Path-based instructions (6)
autogpt_platform/backend/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development
Files:
autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py
autogpt_platform/backend/backend/api/features/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Update routes in '/backend/backend/api/features/' and add/update Pydantic models in the same directory for API development
When modifying API routes, update corresponding Pydantic models in the same directory and write tests alongside the route file
Files:
autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py
autogpt_platform/backend/**/*.{py,txt}
📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)
Use `poetry run` prefix for all Python commands, including testing, linting, formatting, and migrations
Files:
autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py
autogpt_platform/backend/backend/api/**/*.py
📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)
autogpt_platform/backend/backend/api/**/*.py: Use FastAPI for building REST and WebSocket endpoints
Use JWT-based authentication with Supabase integration
Files:
autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py
autogpt_platform/backend/backend/**/*.py
📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)
Use Prisma ORM for database operations in PostgreSQL with pgvector for embeddings
Files:
autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py
autogpt_platform/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Format Python code with `poetry run format`
Files:
autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/CLAUDE.md:0-0
Timestamp: 2026-02-04T16:50:20.494Z
Learning: Applies to autogpt_platform/backend/backend/blocks/*.py : Never hardcode workspace checks when using `store_media_file()` - let `for_block_output` handle context adaptation automatically
📚 Learning: 2026-02-04T16:50:20.494Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/CLAUDE.md:0-0
Timestamp: 2026-02-04T16:50:20.494Z
Learning: Applies to autogpt_platform/backend/backend/blocks/*.py : Never hardcode workspace checks when using `store_media_file()` - let `for_block_output` handle context adaptation automatically
Applied to files:
autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py
🧬 Code graph analysis (1)
autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py (2)
- autogpt_platform/backend/backend/util/workspace.py (2): WorkspaceManager (30-419), write_file (155-287)
- autogpt_platform/backend/backend/api/features/chat/tools/test_binary_output_processor.py (1): workspace_manager (13-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: types
- GitHub Check: Seer Code Review
- GitHub Check: test (3.12)
- GitHub Check: test (3.11)
- GitHub Check: test (3.13)
- GitHub Check: Check PR Status
🔇 Additional comments (4)
autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py (4)
14-18: Clear field taxonomy and threshold.

Nice, concise separation of binary vs text saveables, and the threshold is explicit.

20-41: Top-level processing flow looks solid.

Clean traversal and single-call dedup cache are easy to reason about.

44-55: Recursive item handling is straightforward.

List/dict recursion is clear and keeps non-collection items untouched.

114-123: LGTM on the strict decode guardrails.

The failure path is contained and keeps original data intact.
```python
ext = {"jpeg": "jpg"}.get(field, field)
filename = f"{block.lower().replace(' ', '_')[:20]}_{field}_{uuid.uuid4().hex[:12]}.{ext}"

file = await wm.write_file(content=content, filename=filename)
return f"workspace://{file.id}"
```
🧩 Analysis chain

🏁 Scripts executed:

```bash
find autogpt_platform/backend -name "workspace.py" -type f
rg -n "def write_file" autogpt_platform/backend --type py -A 10 | head -50
cat autogpt_platform/backend/backend/util/workspace.py | head -200
sed -n '155,250p' autogpt_platform/backend/backend/util/workspace.py
sed -n '215,240p' autogpt_platform/backend/backend/util/workspace.py
find autogpt_platform/backend -name "workspace_storage.py" -type f
rg -n "def store\|class.*Storage" autogpt_platform/backend/backend/util/workspace_storage.py -A 5 | head -80
cat autogpt_platform/backend/backend/util/workspace_storage.py | head -300
rg -n "def sanitize_filename" autogpt_platform/backend --type py -A 15
cat autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py | head -120
```
Sanitize block parameter before using in filename.
The block parameter is only cleaned with .lower().replace(' ', '_'), which leaves path separators (/, \) and other unsafe characters. While LocalWorkspaceStorage mitigates this via sanitize_filename(), GCSWorkspaceStorage passes the filename directly to blob paths without sanitization, allowing malformed blob paths in cloud deployments.
Sanitize block before constructing the filename:
Suggested fix
```diff
+from backend.util.file import sanitize_filename
+
 ext = {"jpeg": "jpg"}.get(field, field)
-filename = f"{block.lower().replace(' ', '_')[:20]}_{field}_{uuid.uuid4().hex[:12]}.{ext}"
+safe_block = sanitize_filename(block.lower())[:20]
+filename = f"{safe_block}_{field}_{uuid.uuid4().hex[:12]}.{ext}"
```

🤖 Prompt for AI Agents
In `autogpt_platform/backend/backend/api/features/chat/tools/binary_output_processor.py`
around lines 103-107: the filename construction uses the raw block value and
only lower()/replace(' ', '_'), which allows path separators and unsafe
characters to reach storage backends; update the filename creation in
binary_output_processor.py to sanitize the block first (e.g., strip or replace
path separators, remove or percent/underscore non-alphanumeric characters, allow
only [a-z0-9_-], and truncate to the desired length) before building filename =
f"{...}_{field}_{uuid...}.{ext}"; apply this sanitized_block when calling
wm.write_file so both LocalWorkspaceStorage and GCSWorkspaceStorage receive a
safe filename (you can reuse or call an existing sanitize_filename helper if
available, or add a small sanitizer function used by the code that creates the
filename).
@ntindle Thank you for confirming! Since the block parameter comes from user-controlled input (block names), sanitizing it before filename construction will help ensure safe filenames across both local and cloud storage backends.
The suggested fix using sanitize_filename from backend.util.file should handle path separators, special characters, and other unsafe inputs. Let me know if you'd like me to help with anything else related to this change! 👍
🧠 Learnings used
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/CLAUDE.md:0-0
Timestamp: 2026-02-04T16:50:20.494Z
Learning: Applies to autogpt_platform/backend/backend/blocks/*.py : Never hardcode workspace checks when using `store_media_file()` - let `for_block_output` handle context adaptation automatically
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/CLAUDE.md:0-0
Timestamp: 2026-02-04T16:50:20.494Z
Learning: Applies to autogpt_platform/backend/backend/blocks/*.py : Always use `for_block_output` format in `store_media_file()` for block outputs unless there is a specific reason not to
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/CLAUDE.md:0-0
Timestamp: 2026-02-04T16:50:20.494Z
Learning: Applies to autogpt_platform/backend/backend/blocks/*.py : When working with files in blocks, use `store_media_file()` from `backend.util.file` with appropriate `return_format` parameter: `for_local_processing` for local tools, `for_external_api` for external APIs, `for_block_output` for block outputs
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-02-04T16:49:42.476Z
Learning: Applies to autogpt_platform/backend/backend/blocks/**/*.py : Generate block UUID using 'uuid.uuid4()' when creating new blocks in backend
Summary
When CoPilot executes blocks that produce binary outputs (PDFs, images, charts), the data was previously returned as raw base64 strings. This caused massive token waste, risked corruption from truncation or hallucination, and produced slow character-by-character output.
Solution
This PR intercepts block outputs in run_block.py (the single entry point for all block executions in CoPilot) and:

- detects large base64 data in saveable output fields (png, jpeg, pdf, svg)
- saves it to the user's workspace
- replaces it with workspace:// references

Implementation

- binary_output_processor.py: new processing module with hash-based deduplication
- run_block.py: integration point
- test_binary_output_processor.py: unit tests

Key Design Decisions
Expected Impact
Testing
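A sketch of the kind of test coverage described in the reviews above (the import path, test name, and stub shape are assumptions):

```python
import pytest
from unittest.mock import AsyncMock, MagicMock

from backend.api.features.chat.tools.binary_output_processor import (
    process_binary_outputs,
)


@pytest.fixture
def workspace_manager():
    # Mirrors the AsyncMock-based fixture the review praises; the returned
    # file object exposing an `id` attribute is an assumed shape.
    wm = AsyncMock()
    wm.write_file.return_value = MagicMock(id="file-123")
    return wm


@pytest.mark.asyncio
async def test_large_base64_replaced_with_reference(workspace_manager):
    big = "QUJD" * 8_000  # valid base64, comfortably over the size threshold
    out = await process_binary_outputs({"png": big}, workspace_manager, "Test Block")
    assert out["png"] == "workspace://file-123"
```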
Fixes SECRT-1887