Fix XXE vulnerabilities in XML parsing #1364

Open
codingfrog wants to merge 1 commit into arsenetar:master from codingfrog:security/fix-xxe-vulnerabilities

codingfrog commented Feb 14, 2026

Context

During a security audit of dupeGuru, we identified XML External Entity (XXE) vulnerabilities in all 4 locations where XML files are parsed.

Vulnerability

dupeGuru uses xml.etree.ElementTree.parse() to load user-editable XML files (saved results, ignore lists, exclusion lists, directory lists). The standard library's XML parser does not disable external entity processing by default, which means a crafted XML file could:

  • Read arbitrary files from the filesystem (e.g., /etc/passwd, SSH keys, browser cookies)
  • Trigger server-side request forgery (SSRF) by making HTTP requests to internal services
  • Cause denial of service via recursive entity expansion ("billion laughs" attack)
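To give a sense of scale for the entity-expansion case, here is a minimal sketch that builds a nested-entity document and computes how large it would become if every reference were expanded. The function names and the `e0`…`eN` entity names are illustrative assumptions, not anything from dupeGuru, and the document is only constructed as a string, never parsed.

```python
def nested_entity_document(levels: int = 9, fanout: int = 10, seed: str = "lol") -> str:
    """Build a document where each entity references the previous one `fanout` times."""
    decls = [f'<!ENTITY e0 "{seed}">']
    for i in range(1, levels + 1):
        refs = f"&e{i - 1};" * fanout
        decls.append(f'<!ENTITY e{i} "{refs}">')
    dtd = "\n  ".join(decls)
    return (
        '<?xml version="1.0"?>\n'
        f"<!DOCTYPE root [\n  {dtd}\n]>\n"
        f"<root>&e{levels};</root>"
    )


def expanded_size(levels: int = 9, fanout: int = 10, seed: str = "lol") -> int:
    """Characters produced if the top-level entity were fully expanded."""
    return len(seed) * fanout ** levels


doc = nested_entity_document()
print(len(doc))         # size of the file on disk, in characters (under 1 KiB)
print(expanded_size())  # 3000000000 characters after full expansion
```

The point of the arithmetic is that file size is not a useful guard here: the growth is exponential in the nesting depth, so a file well under a kilobyte can describe gigabytes of text.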

Affected locations:

| File | Method | What it loads |
|---|---|---|
| core/directories.py:249 | load_from_file() | Saved directory scan list |
| core/results.py:231 | load_from_xml() | Saved duplicate results |
| core/ignore.py:99 | load_from_xml() | Ignore list (pairs the user chose to ignore) |
| core/exclude.py:324 | load_from_xml() | Regex exclusion patterns |

Attack scenario

  1. A user saves their dupeGuru results to XML
  2. An attacker modifies the XML file (or tricks the user into opening a crafted one) to include an external entity declaration like:
    <!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
  3. When dupeGuru loads the file, the parser resolves the entity and the file contents are embedded in the parsed data
  4. The data could be exposed through the UI or subsequent XML exports
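A file like the one in step 2 can be recognized by the entity declaration in its DOCTYPE. The following is a minimal sketch, using only the standard library's `xml.parsers.expat`, that lists the entities a document declares without building a tree. The helper name `declared_entities` and the sample payload are assumptions for illustration; this is not code from the PR.

```python
import xml.parsers.expat as expat

# Sample payload modelled on the attack scenario above.
CRAFTED = (
    b'<?xml version="1.0"?>\n'
    b'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>\n'
    b"<foo>&xxe;</foo>"
)


def declared_entities(data: bytes) -> list:
    """Return (name, system_id) for every entity declared in the document.

    system_id is None for internal entities and a URI for external ones.
    Malformed XML raises xml.parsers.expat.ExpatError.
    """
    found = []

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        found.append((name, system_id))

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(data, True)
    return found


print(declared_entities(CRAFTED))           # one external entity, "xxe"
print(declared_entities(b"<foo>ok</foo>"))  # no declarations
```

A saved results or ignore list has no reason to declare entities at all, so any non-empty result is a reasonable signal that the file has been tampered with.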

Fix

  • Add defusedxml>=0.7.1,<1.0.0 to requirements.txt. defusedxml is a widely used third-party package that hardens Python's standard XML parsers against these attacks
  • Replace ET.parse(infile) with defusedxml.ElementTree.parse(infile) in all 4 files
  • Keep xml.etree.ElementTree for XML writing (Element, SubElement, etc.) which is not vulnerable

defusedxml disables external entity processing, DTD fetching, and entity expansion by default.
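As a rough sketch of what the replacement looks like at a call site, the snippet below parses with `defusedxml.ElementTree.parse` and treats a rejected file the same way as a malformed one. The `load_root` helper, its return-`None` policy, and the `ignore_list` element name are assumptions for illustration; the actual loaders in dupeGuru may handle errors differently.

```python
import io
import xml.etree.ElementTree as ET  # still used for building and writing XML

from defusedxml import DefusedXmlException
from defusedxml.ElementTree import parse as safe_parse


def load_root(infile):
    """Parse XML from a path or file object; return the root, or None if rejected."""
    try:
        return safe_parse(infile).getroot()
    except (DefusedXmlException, ET.ParseError):
        # Assumed policy: forbidden constructs (entity declarations, external
        # references) and malformed XML are both treated as an unreadable file.
        return None


crafted = io.BytesIO(
    b'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><foo>&xxe;</foo>'
)
benign = io.BytesIO(b'<ignore_list><file path="a.txt"/></ignore_list>')

print(load_root(crafted))     # None: the entity declaration is refused
print(load_root(benign).tag)  # ignore_list
```

Because `defusedxml.ElementTree.parse` returns an ordinary ElementTree, code that walks the parsed tree should not need to change; only the import and the parse call do.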

Test plan

  • pytest core/tests/directories_test.py core/tests/results_test.py core/tests/ignore_test.py core/tests/exclude_test.py — 167 tests pass
  • Verified defusedxml installs cleanly

🤖 Generated with Claude Code

Replace xml.etree.ElementTree.parse() with defusedxml safe parser
in all XML loading operations. This prevents XML External Entity
(XXE) attacks when loading user-provided XML files.

- Add defusedxml dependency to requirements.txt
- Use safe_parse() from defusedxml in directories.py
- Use safe_parse() from defusedxml in results.py
- Use safe_parse() from defusedxml in ignore.py
- Use safe_parse() from defusedxml in exclude.py
- Maintain xml.etree for XML writing (safe operations)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>