Skip to content

Redaction Improvements#5

Merged
grounzero merged 13 commits intomainfrom
docs/installation-instructions
Nov 3, 2025
Merged

Redaction Improvements#5
grounzero merged 13 commits intomainfrom
docs/installation-instructions

Conversation

@grounzero
Copy link
Copy Markdown
Owner

This pull request releases version 1.0.6 of pfsense-redactor with critical security fixes, performance improvements, new features, and updated documentation. The most significant changes are enhanced URL redaction to prevent credential leaks in non-HTTP protocols, improved handling of URLs without hostnames, performance optimizations for domain encoding, and the addition of a redaction comment to output files. Installation instructions and test coverage have also been expanded.

Security and Redaction Improvements:

  • Extended URL regex to handle non-HTTP protocols (FTP, FTPS, SFTP, SSH, Telnet, File, SMB, NFS), ensuring credentials in all supported protocol URLs are properly redacted and not leaked. [1] [2]
  • Fixed handling of URLs without hostnames (e.g., file:///path), preserving them unchanged to avoid altering their semantics. [1] [2]

Performance and Code Quality:

  • Introduced cached IDNA encoding with @functools.lru_cache(maxsize=256) to optimize repeated domain encoding operations. [1] [2] [3] [4]
  • Added type hints for the maskers dictionary to improve code clarity and maintainability.

Features and Output:

  • Added an XML comment to output files: <!-- Redacted using pfsense-redactor v1.0.6 --> to indicate redaction provenance. [1] [2]
  • Updated all test reference XML files to include the new redaction comment. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]

Documentation and Packaging:

  • Improved installation instructions in the README.md to address externally-managed-environment errors, including pipx, virtual environment, and user-space installation options.
  • Updated version to 1.0.6 in __init__.py and pyproject.toml. [1] [2]
  • Added module-level exports control via __all__ and enforced Python 3.9+ version check at import time.

Testing:

  • Added 14 comprehensive tests for URL handling, including non-HTTP protocol redaction and hostnameless URL preservation.

For full details, see the updated CHANGELOG.md. [1] [2]

@grounzero grounzero requested a review from Copilot November 2, 2025 23:49
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR releases version 1.0.6 of pfsense-redactor, focusing on critical security fixes for URL redaction, performance improvements through caching, and enhanced documentation. The key changes address credential leaks in non-HTTP protocol URLs and incorrect handling of URLs without hostnames.

Key Changes:

  • Extended URL regex to capture non-HTTP protocols (FTP, FTPS, SFTP, SSH, Telnet, File, SMB, NFS) to prevent credential leaks
  • Added early return in _mask_url() for URLs without hostnames to preserve their semantics
  • Implemented cached IDNA encoding with @functools.lru_cache(maxsize=256) to optimize domain encoding performance

Reviewed Changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
pfsense_redactor/redactor.py Core security fixes for URL redaction, performance optimization with cached IDNA encoding, type hints added, and redaction comment feature
pfsense_redactor/__init__.py Version bump to 1.0.6, Python 3.9+ version check, and expanded module exports
tests/unit/test_url_handling.py Added 14 comprehensive tests for non-HTTP protocol URL redaction and hostnameless URL preservation
tests/reference/*/*.xml Updated all test reference files to include the new redaction comment
pyproject.toml Version updated to 1.0.6
README.md Enhanced installation instructions with alternatives for externally-managed-environment error
CHANGELOG.md Documented all changes for versions 1.0.6 and 1.0.5

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


# Should be preserved exactly, not transformed to network path
assert result == url
assert "example.com" not in result
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion is incorrect. For file:///C:/Users/admin/config.xml (a Windows local path without hostname), the result should equal the input URL exactly per line 153. The assertion assert 'example.com' not in result suggests domain masking is expected, but this contradicts the preservation requirement.

Suggested change
assert "example.com" not in result

Copilot uses AI. Check for mistakes.

# Should be preserved unchanged
assert result == url
assert "example.com" not in result
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion is incorrect. For nfs:///mnt/share (a URL without hostname), the result should equal the input URL exactly per line 172. The assertion assert 'example.com' not in result contradicts the preservation requirement.

Suggested change
assert "example.com" not in result

Copilot uses AI. Check for mistakes.

# Should be preserved unchanged
assert result == url
assert "example.com" not in result
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion is incorrect. For smb:///share (a URL without hostname), the result should equal the input URL exactly per line 181. The assertion assert 'example.com' not in result contradicts the preservation requirement.

Suggested change
assert "example.com" not in result

Copilot uses AI. Check for mistakes.
- This changed URL semantics from local filesystem to network file share
- Added early return in `_mask_url()` when `hostname` is `None` or empty
- **ENHANCEMENT**: Expanded email regex to support RFC 5322 special characters
- Now matches emails with `!#$&'*/=?^`{|}~` in local part (e.g., `user!test@example.com`)

Check notice

Code scanning / Remark-lint (reported by Codacy)

Warn for literal URLs in text. Note

[no-literal-urls] Don’t use literal URLs without angle brackets
Copy link
Copy Markdown

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pylintpython3 (reported by Codacy) found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Copy link
Copy Markdown

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pylint (reported by Codacy) found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Copy link
Copy Markdown

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prospector (reported by Codacy) found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

- Used string type aliases for Python 3.9/pylint compatibility
- Added pylint disable comments for intentional patterns
- Simplified version fallback code

All 176 tests passing on Python 3.9+
@grounzero grounzero merged commit cd9bdae into main Nov 3, 2025
33 checks passed
@grounzero grounzero deleted the docs/installation-instructions branch November 3, 2025 10:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants