
Commit 57a06f7

majdyz, claude, ntindle, github-actions[bot], and Pwuts authored
fix(blocks, security): Fixes for various DoS vulnerabilities (#10798)
This PR addresses multiple critical and medium security vulnerabilities that could lead to Denial of Service (DoS) attacks. All fixes implement defense-in-depth strategies with comprehensive testing.

### Changes 🏗️

#### **Critical Security Fixes:**

1. **GHSA-m2wr-7m3r-p52c - ReDoS in CodeExtractionBlock**
   - Fixed catastrophic backtracking in the regex patterns `\s+[\s\S]*?` and `\s+(.*?)`
   - Replaced with newline-anchored patterns: `[ \t]*\n[\s\S]*?` and `[ \t]*\n(.*?)\n`
   - Files: `backend/blocks/code_extraction_block.py`

2. **GHSA-955p-gpfx-r66j - AITextSummarizerBlock Memory Amplification**
   - Added 1MB text size limit and 100 chunk maximum
   - Prevents 10K input → 50GB memory amplification attacks
   - Files: `backend/blocks/llm.py`

3. **GHSA-5cqw-g779-9f9x - RSS Feed XML Bomb DoS**
   - Added 10MB feed size limit and 30s timeout
   - Prevents deep XML parsing memory exhaustion
   - Files: `backend/blocks/rss.py`

4. **GHSA-7g34-7fvq-xxq6 - File Storage Disk Exhaustion**
   - Added 100MB per file and 1GB per execution directory limits
   - Prevents disk space exhaustion from file uploads
   - Files: `backend/util/file.py`

5. **GHSA-pppq-xx2w-7jpq - ExtractTextInformationBlock ReDoS**
   - Added 1MB text limit, 1000 match limit, and 5s timeout protection
   - Prevents lookahead pattern memory exhaustion
   - Files: `backend/blocks/text.py`

6. **GHSA-vw3v-whvp-33v5 - Docker Logging Disk Exhaustion**
   - Added log rotation limits at the Docker (10MB × 3 files) and application levels
   - Prevents unbounded log growth causing disk exhaustion
   - Files: `docker-compose.platform.yml`, `autogpt_libs/autogpt_libs/logging/config.py`

#### **Additional Security Improvements:**

7. **StepThroughItemsBlock DoS Prevention**
   - Added 10,000 item limit and 1MB input size limit
   - Prevents large iteration DoS attacks
   - Files: `backend/blocks/iteration.py`

8. **XMLParserBlock XML Bomb Prevention**
   - Added 10MB XML input size limit
   - Files: `backend/blocks/xml_parser.py`

#### **Code Quality:**

- Fixed Python 3.10 typing compatibility issues
- Added comprehensive security test suite
- All code formatted and linted

### Checklist 📋

#### For code changes:

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Created comprehensive security test suite covering all vulnerabilities
  - [x] Verified ReDoS patterns are fixed and don't cause timeouts
  - [x] Confirmed memory limits prevent amplification attacks
  - [x] Tested file size limits prevent disk exhaustion
  - [x] Validated log rotation prevents unbounded growth
  - [x] Ensured backward compatibility for normal usage

#### For configuration changes:

- [x] `docker-compose.yml` is updated with logging limits
- [x] I have included a list of my configuration changes in the PR description (under **Changes**)

### Test Plan 🧪

**Security Tests:**

1. **ReDoS Protection**: Tested with malicious regex inputs (large whitespace runs) - completes without hanging (see the timing sketch below)
2. **Memory Limits**: Verified 2MB text input gets truncated to 1MB, chunk limits enforced
3. **File Size Limits**: Confirmed 200MB files rejected, directory size limits enforced
4. **Iteration Limits**: Tested 20K item arrays rejected, large JSON strings rejected
5. **Timeout Protection**: Dangerous regex patterns time out after 5s instead of hanging

**Compatibility Tests:**

- Normal functionality preserved for all blocks
- Existing tests pass with new security limits
- Performance impact minimal for typical usage

### Security Impact 🛡️

**Before:** Multiple attack vectors could cause:

- CPU exhaustion (ReDoS attacks)
- Memory exhaustion (amplification attacks)
- Disk exhaustion (file/log bombs)
- Service unavailability

**After:** All attack vectors mitigated with:

- Input validation and size limits
- Timeout protections
- Resource quotas
- Defense-in-depth approach

All fixes maintain backward compatibility while preventing DoS attacks.

🤖 Generated with [Claude Code](https://claude.ai/code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Adds robust DoS protections across blocks (regex, memory, iteration, XML/RSS, file I/O) and enables app/Docker log rotation with comprehensive tests.
>
> - **Security hardening**:
>   - Replace unsafe regex in `backend/blocks/code_extraction_block.py` to prevent ReDoS; add safer extraction/removal patterns.
>   - Constrain LLM summarizer chunking in `backend/blocks/llm.py` (1MB cap, chunk/overlap validation, chunk count limit).
>   - Limit RSS fetching in `backend/blocks/rss.py` (scheme validation, 10MB cap, timeout, bounded read) and return empty on failure.
>   - Impose XML size limit (10MB) in `backend/blocks/xml_parser.py`.
>   - Add file upload/download limits in `backend/util/file.py` (100MB/file, 1GB dir quota) and enforce scanning before write.
>   - Enable rotating file logs in `autogpt_libs/logging/config.py` (size + backups) and Docker json-file log rotation in `docker-compose.platform.yml`.
> - **Iteration block**:
>   - Add item count/string size limits; fix yielded key for dicts; cap iterations in `backend/blocks/iteration.py`.
> - **Tests**:
>   - New `backend/blocks/test/test_security_fixes.py` covering ReDoS, timeouts, memory/size and iteration limits, XML/file constraints.
> - **Misc**:
>   - Typing fallback for `NotRequired` in `activity_status_generator.py`.
>   - Dependency updates in `backend/poetry.lock`.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 500e157. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude <[email protected]>
Co-authored-by: Nicholas Tindle <[email protected]>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <[email protected]>
Co-authored-by: Zamil Majdy <[email protected]>
Co-authored-by: Reinier van der Leer <[email protected]>
Co-authored-by: Reinier van der Leer <[email protected]>
1 parent 258bf0b commit 57a06f7
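The headline ReDoS fix is easy to sanity-check by timing, which is essentially what the test plan describes. Below is a minimal harness; the hardcoded `python` language tag and the one-second threshold are illustrative choices, not taken from the repository's test suite.

````python
import re
import time

# The fixed extraction pattern from the diff, with the language hardcoded.
SAFE_PATTERN = re.compile(r"```python[ \t]*\n(.*?)\n```", re.DOTALL | re.IGNORECASE)

# Adversarial input: an opening fence that is never closed, padded with spaces.
# The old pattern r"```python\s+(.*?)```" backtracks across the whitespace run;
# the fixed pattern requires a literal newline and rejects this input quickly.
malicious = "```python" + " " * 50_000

start = time.monotonic()
matches = SAFE_PATTERN.findall(malicious)
elapsed = time.monotonic() - start

assert matches == []   # no closing fence, so nothing is extracted
assert elapsed < 1.0   # completes promptly instead of hanging
print(f"scanned {len(malicious):,} chars in {elapsed:.4f}s")
````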

File tree

13 files changed: +667 / -123 lines

autogpt_platform/autogpt_libs/autogpt_libs/logging/config.py

Lines changed: 22 additions & 6 deletions
```diff
@@ -4,6 +4,7 @@
 import os
 import socket
 import sys
+from logging.handlers import RotatingFileHandler
 from pathlib import Path

 from pydantic import Field, field_validator
@@ -139,8 +140,13 @@ def configure_logging(force_cloud_logging: bool = False) -> None:
     print(f"Log directory: {config.log_dir}")

     # Activity log handler (INFO and above)
-    activity_log_handler = logging.FileHandler(
-        config.log_dir / LOG_FILE, "a", "utf-8"
+    # Security fix: Use RotatingFileHandler with size limits to prevent disk exhaustion
+    activity_log_handler = RotatingFileHandler(
+        config.log_dir / LOG_FILE,
+        mode="a",
+        encoding="utf-8",
+        maxBytes=10 * 1024 * 1024,  # 10MB per file
+        backupCount=3,  # Keep 3 backup files (40MB total)
     )
     activity_log_handler.setLevel(config.level)
     activity_log_handler.setFormatter(
@@ -150,8 +156,13 @@ def configure_logging(force_cloud_logging: bool = False) -> None:

     if config.level == logging.DEBUG:
         # Debug log handler (all levels)
-        debug_log_handler = logging.FileHandler(
-            config.log_dir / DEBUG_LOG_FILE, "a", "utf-8"
+        # Security fix: Use RotatingFileHandler with size limits
+        debug_log_handler = RotatingFileHandler(
+            config.log_dir / DEBUG_LOG_FILE,
+            mode="a",
+            encoding="utf-8",
+            maxBytes=10 * 1024 * 1024,  # 10MB per file
+            backupCount=3,  # Keep 3 backup files (40MB total)
         )
         debug_log_handler.setLevel(logging.DEBUG)
         debug_log_handler.setFormatter(
@@ -160,8 +171,13 @@ def configure_logging(force_cloud_logging: bool = False) -> None:
         log_handlers.append(debug_log_handler)

     # Error log handler (ERROR and above)
-    error_log_handler = logging.FileHandler(
-        config.log_dir / ERROR_LOG_FILE, "a", "utf-8"
+    # Security fix: Use RotatingFileHandler with size limits
+    error_log_handler = RotatingFileHandler(
+        config.log_dir / ERROR_LOG_FILE,
+        mode="a",
+        encoding="utf-8",
+        maxBytes=10 * 1024 * 1024,  # 10MB per file
+        backupCount=3,  # Keep 3 backup files (40MB total)
     )
     error_log_handler.setLevel(logging.ERROR)
     error_log_handler.setFormatter(AGPTFormatter(DEBUG_LOG_FORMAT, no_color=True))
```
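For context on what the handler swap buys: once the active log reaches `maxBytes`, `RotatingFileHandler` renames it to `<name>.1` (shifting older backups up) and starts a fresh file, discarding anything beyond `backupCount`. A self-contained sketch with toy sizes (the `demo_logs` directory and 1KB cap are illustrative, not the platform's configuration):

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path

log_dir = Path("demo_logs")
log_dir.mkdir(exist_ok=True)

# Same handler type as the fix, scaled down: 1KB per file, 3 backups,
# so disk usage stays capped at ~4KB no matter how much is logged.
handler = RotatingFileHandler(
    log_dir / "activity.log",
    mode="a",
    encoding="utf-8",
    maxBytes=1024,
    backupCount=3,
)
logger = logging.getLogger("rotation-demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

for i in range(200):
    logger.info("log line %d: %s", i, "x" * 50)

# After rollover: activity.log plus at most activity.log.1/.2/.3
print(sorted(p.name for p in log_dir.iterdir()))
```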

autogpt_platform/backend/backend/blocks/code_extraction_block.py

Lines changed: 4 additions & 2 deletions
````diff
@@ -90,7 +90,7 @@ async def run(self, input_data: Input, **kwargs) -> BlockOutput:
                 for aliases in language_aliases.values()
                 for alias in aliases
             )
-            + r")\s+[\s\S]*?```"
+            + r")[ \t]*\n[\s\S]*?```"
         )

         remaining_text = re.sub(pattern, "", input_data.text).strip()
@@ -103,7 +103,9 @@ def extract_code(self, text: str, language: str) -> str:
         # Escape special regex characters in the language string
         language = re.escape(language)
         # Extract all code blocks enclosed in ```language``` blocks
-        pattern = re.compile(rf"```{language}\s+(.*?)```", re.DOTALL | re.IGNORECASE)
+        pattern = re.compile(
+            rf"```{language}[ \t]*\n(.*?)\n```", re.DOTALL | re.IGNORECASE
+        )
         matches = pattern.finditer(text)
         # Combine all code blocks for this language with newlines between them
         code_blocks = [match.group(1).strip() for match in matches]
````
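The tightened pattern now requires the language tag to be followed by an actual newline, and the block to be closed by a newline plus a fence, which is what bounds the backtracking. A standalone sketch, with a hypothetical `extract_code` helper mirroring the fixed regex outside the block class:

````python
import re

def extract_code(text: str, language: str) -> list[str]:
    # Mirror of the fixed pattern: the language tag may be followed only by
    # spaces/tabs, then a required newline; the block must close with \n```.
    language = re.escape(language)
    pattern = re.compile(
        rf"```{language}[ \t]*\n(.*?)\n```", re.DOTALL | re.IGNORECASE
    )
    return [m.group(1).strip() for m in pattern.finditer(text)]

text = "intro\n```python\nprint('hi')\n```\ntext\n```python  \nx = 1\n```"
print(extract_code(text, "python"))  # ["print('hi')", 'x = 1']

# An unterminated fence padded with whitespace no longer triggers heavy backtracking:
print(extract_code("```python" + " " * 10_000, "python"))  # []
````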

autogpt_platform/backend/backend/blocks/iteration.py

Lines changed: 26 additions & 3 deletions
```diff
@@ -54,20 +54,43 @@ def __init__(self):
         )

     async def run(self, input_data: Input, **kwargs) -> BlockOutput:
+        # Security fix: Add limits to prevent DoS from large iterations
+        MAX_ITEMS = 10000  # Maximum items to iterate
+        MAX_ITEM_SIZE = 1024 * 1024  # 1MB per item
+
         for data in [input_data.items, input_data.items_object, input_data.items_str]:
             if not data:
                 continue
+
+            # Limit string size before parsing
             if isinstance(data, str):
+                if len(data) > MAX_ITEM_SIZE:
+                    raise ValueError(
+                        f"Input too large: {len(data)} bytes > {MAX_ITEM_SIZE} bytes"
+                    )
                 items = json.loads(data)
             else:
                 items = data
+
+            # Check total item count
+            if isinstance(items, (list, dict)):
+                if len(items) > MAX_ITEMS:
+                    raise ValueError(f"Too many items: {len(items)} > {MAX_ITEMS}")
+
+            iteration_count = 0
             if isinstance(items, dict):
                 # If items is a dictionary, iterate over its values
-                for item in items.values():
-                    yield "item", item
-                    yield "key", item
+                for key, value in items.items():
+                    if iteration_count >= MAX_ITEMS:
+                        break
+                    yield "item", value
+                    yield "key", key  # Fixed: should yield key, not item
+                    iteration_count += 1
             else:
                 # If items is a list, iterate over the list
                 for index, item in enumerate(items):
+                    if iteration_count >= MAX_ITEMS:
+                        break
                     yield "item", item
                     yield "key", index
+                    iteration_count += 1
```
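Stripped of the block framework, the new guards and the corrected dict output look like the sketch below; the `iterate` generator is a hypothetical stand-in for `run`, not the block's actual API.

```python
import json

MAX_ITEMS = 10_000
MAX_ITEM_SIZE = 1024 * 1024  # 1MB

def iterate(data):
    # Size-check string inputs before json.loads to bound parsing cost.
    if isinstance(data, str):
        if len(data) > MAX_ITEM_SIZE:
            raise ValueError(f"Input too large: {len(data)} bytes > {MAX_ITEM_SIZE} bytes")
        items = json.loads(data)
    else:
        items = data
    # Reject oversized collections up front.
    if isinstance(items, (list, dict)) and len(items) > MAX_ITEMS:
        raise ValueError(f"Too many items: {len(items)} > {MAX_ITEMS}")
    if isinstance(items, dict):
        for key, value in items.items():
            yield "item", value
            yield "key", key   # the fix: yield the key, not the value twice
    else:
        for index, item in enumerate(items):
            yield "item", item
            yield "key", index

print(list(iterate({"a": 1})))     # [('item', 1), ('key', 'a')]
print(list(iterate("[10, 20]")))   # [('item', 10), ('key', 0), ('item', 20), ('key', 1)]

try:
    list(iterate("0," * 600_000))  # ~1.2MB string: rejected before it is ever parsed
except ValueError as e:
    print(e)
```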

autogpt_platform/backend/backend/blocks/llm.py

Lines changed: 17 additions & 1 deletion
```diff
@@ -1404,11 +1404,27 @@ async def _run(

     @staticmethod
     def _split_text(text: str, max_tokens: int, overlap: int) -> list[str]:
+        # Security fix: Add validation to prevent DoS attacks
+        # Limit text size to prevent memory exhaustion
+        MAX_TEXT_LENGTH = 1_000_000  # 1MB character limit
+        MAX_CHUNKS = 100  # Maximum number of chunks to prevent excessive memory use
+
+        if len(text) > MAX_TEXT_LENGTH:
+            text = text[:MAX_TEXT_LENGTH]
+
+        # Ensure chunk_size is at least 1 to prevent infinite loops
+        chunk_size = max(1, max_tokens - overlap)
+
+        # Ensure overlap is less than max_tokens to prevent invalid configurations
+        if overlap >= max_tokens:
+            overlap = max(0, max_tokens - 1)
+
         words = text.split()
         chunks = []
-        chunk_size = max_tokens - overlap

         for i in range(0, len(words), chunk_size):
+            if len(chunks) >= MAX_CHUNKS:
+                break  # Limit the number of chunks to prevent memory exhaustion
             chunk = " ".join(words[i : i + max_tokens])
             chunks.append(chunk)
```
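The chunking arithmetic is easiest to see with small numbers: each chunk is a window of up to `max_tokens` words, and the loop advances by `max_tokens - overlap` words, so consecutive chunks share `overlap` words. A standalone sketch of the same logic (with the overlap clamp applied before the stride is computed, for clarity):

```python
MAX_TEXT_LENGTH = 1_000_000  # cap input characters
MAX_CHUNKS = 100             # cap number of chunks

def split_text(text: str, max_tokens: int, overlap: int) -> list[str]:
    text = text[:MAX_TEXT_LENGTH]
    if overlap >= max_tokens:                  # keep the stride positive
        overlap = max(0, max_tokens - 1)
    chunk_size = max(1, max_tokens - overlap)  # stride between windows
    words = text.split()
    chunks: list[str] = []
    for i in range(0, len(words), chunk_size):
        if len(chunks) >= MAX_CHUNKS:
            break
        chunks.append(" ".join(words[i : i + max_tokens]))
    return chunks

# 10 words, windows of 4 with overlap 2 -> stride 2; neighbours share 2 words
text = " ".join(f"w{i}" for i in range(10))
for chunk in split_text(text, max_tokens=4, overlap=2):
    print(chunk)
# w0 w1 w2 w3 / w2 w3 w4 w5 / w4 w5 w6 w7 / w6 w7 w8 w9 / w8 w9
```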

autogpt_platform/backend/backend/blocks/rss.py

Lines changed: 35 additions & 1 deletion
```diff
@@ -1,4 +1,7 @@
 import asyncio
+import logging
+import urllib.parse
+import urllib.request
 from datetime import datetime, timedelta, timezone
 from typing import Any

@@ -101,7 +104,38 @@ def __init__(self):

     @staticmethod
     def parse_feed(url: str) -> dict[str, Any]:
-        return feedparser.parse(url)  # type: ignore
+        # Security fix: Add protection against memory exhaustion attacks
+        MAX_FEED_SIZE = 10 * 1024 * 1024  # 10MB limit for RSS feeds
+
+        # Validate URL
+        parsed_url = urllib.parse.urlparse(url)
+        if parsed_url.scheme not in ("http", "https"):
+            raise ValueError(f"Invalid URL scheme: {parsed_url.scheme}")
+
+        # Download with size limit
+        try:
+            with urllib.request.urlopen(url, timeout=30) as response:
+                # Check content length if available
+                content_length = response.headers.get("Content-Length")
+                if content_length and int(content_length) > MAX_FEED_SIZE:
+                    raise ValueError(
+                        f"Feed too large: {content_length} bytes exceeds {MAX_FEED_SIZE} limit"
+                    )
+
+                # Read with size limit
+                content = response.read(MAX_FEED_SIZE + 1)
+                if len(content) > MAX_FEED_SIZE:
+                    raise ValueError(
+                        f"Feed too large: exceeds {MAX_FEED_SIZE} byte limit"
+                    )
+
+                # Parse with feedparser using the validated content
+                # feedparser has built-in protection against XML attacks
+                return feedparser.parse(content)  # type: ignore
+        except Exception as e:
+            # Log error and return empty feed
+            logging.warning(f"Failed to parse RSS feed from {url}: {e}")
+            return {"entries": []}

     async def run(self, input_data: Input, **kwargs) -> BlockOutput:
         keep_going = True
```
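Two details of `parse_feed` are worth noting: the scheme check runs before any network I/O, and `response.read(MAX_FEED_SIZE + 1)` reads at most one byte past the cap, so an oversize feed is caught by a plain length comparison without buffering an unbounded stream. A sketch of just those guards, as hypothetical helpers extracted from the fix (not functions in the codebase):

```python
import urllib.parse

MAX_FEED_SIZE = 10 * 1024 * 1024  # 10MB

def check_scheme(url: str) -> None:
    # Rejects file://, ftp://, data: etc. before any network request is made.
    scheme = urllib.parse.urlparse(url).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"Invalid URL scheme: {scheme}")

def check_size(content: bytes) -> bytes:
    # Because the fix reads at most MAX_FEED_SIZE + 1 bytes, a single length
    # comparison distinguishes "at the limit" from "over it".
    if len(content) > MAX_FEED_SIZE:
        raise ValueError(f"Feed too large: exceeds {MAX_FEED_SIZE} byte limit")
    return content

check_scheme("https://example.com/feed.xml")    # ok
try:
    check_scheme("file:///etc/passwd")
except ValueError as e:
    print(e)                                     # Invalid URL scheme: file

print(len(check_size(b"<rss/>")))                # small feed passes
try:
    check_size(b"x" * (MAX_FEED_SIZE + 1))       # one byte over the cap
except ValueError as e:
    print(e)
```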
