Description
Summary:
The Handelsregister Python SDK fetches HTML pages from the Handelsregister and caches them locally. Currently, retrieved pages are stored in the local cache without sanitizing company names. Company names containing slashes ("/"), backslashes ("\"), or other special characters could therefore cause path traversal problems or other unintended behavior when used as filenames or identifiers in the cache.
Issue Description:
Company names that contain filesystem or URL special characters (e.g., "/", "\", "?", "%", "*", ":", "|", '"', "<", ">", ".", "&", "$") pose a risk for file storage and retrieval operations within the local cache mechanism. Specifically, characters like slashes can be interpreted by the operating system as directory separators, leading to attempts to access unintended directories or the creation of files in unexpected locations.
Potential Impact:
Security Risks: Unsanitized input might lead to directory traversal vulnerabilities, allowing malicious users to access or manipulate files outside the intended cache directory.
Functionality Issues: Special characters in company names could lead to errors in file operations, such as saving or retrieving cached pages, especially if the characters are not valid for filenames in the operating system.
Steps to Reproduce:
Use the SDK to fetch data for a company with special characters in its name (e.g., "Test/Company").
Observe the behavior of the caching mechanism when attempting to store or retrieve the HTML page associated with this company name.
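The effect can be illustrated without the SDK. The following is a minimal sketch, assuming the cache builds a filename directly from the raw company name; `naive_cache_path` and `escapes_cache_dir` are hypothetical helpers written for this report and are not part of the SDK.

```python
from pathlib import Path


def naive_cache_path(cache_dir: Path, company_name: str) -> Path:
    # Hypothetical stand-in for a cache that uses the raw name as the filename.
    return cache_dir / f"{company_name}.html"


def escapes_cache_dir(cache_dir: Path, path: Path) -> bool:
    # True if the resolved path lies outside the resolved cache directory.
    return not path.resolve().is_relative_to(cache_dir.resolve())


cache_dir = Path("cache")

# The slash is treated as a directory separator, so the file would land
# in an unexpected "Test" subdirectory instead of directly in the cache.
nested = naive_cache_path(cache_dir, "Test/Company")
print(nested.parent)

# A name containing ".." segments resolves to a location outside the cache.
traversal = naive_cache_path(cache_dir, "../../outside")
print(escapes_cache_dir(cache_dir, traversal))
```

If the SDK builds its cache paths differently, the exact symptoms may differ, but the underlying concern about separators in names stays the same.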
Suggested Fix:
Implement a sanitization function that either removes or replaces special characters in company names before using them in file paths or filenames. This function should be applied to company names as soon as they are retrieved and before any caching operations are performed. Additionally, consider a consistent scheme for character replacement to ensure that the sanitized names remain unique and recognizable.
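One possible shape for such a function is sketched below. It is an illustration under assumptions, not a proposal for the SDK's actual API: the names `sanitize_company_name` and `cache_path`, the underscore replacement, and the 12-character hash suffix are all hypothetical choices. The slug keeps the name recognizable, while the hash of the original name keeps distinct names from collapsing to the same filename.

```python
import hashlib
import re
from pathlib import Path

# Anything that is not a letter, digit, underscore, or hyphen is replaced.
# In Python 3, \w also matches non-ASCII letters, so umlauts are preserved.
_UNSAFE = re.compile(r"[^\w-]+")


def sanitize_company_name(name: str, max_length: int = 80) -> str:
    # Replace runs of unsafe characters (including "/", "\" and ".") with "_".
    slug = _UNSAFE.sub("_", name).strip("_")[:max_length] or "company"
    # A short digest of the original name distinguishes names that would
    # otherwise map to the same slug, e.g. "Test/Company" and "Test_Company".
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()[:12]
    return f"{slug}-{digest}"


def cache_path(cache_dir: Path, company_name: str) -> Path:
    # Hypothetical helper: the result always sits directly inside cache_dir.
    return cache_dir / f"{sanitize_company_name(company_name)}.html"


print(cache_path(Path("cache"), "Test/Company"))
print(cache_path(Path("cache"), "../../etc/passwd"))
```

Because dots and separators are replaced before the path is built, a name such as "../../etc/passwd" can no longer introduce ".." segments, and the resulting file stays inside the cache directory.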
Environment:
Python Version: 3.12.2
SDK Version: 1ef5e1e
Operating System: macOS Sonoma 14.4.1 / Ubuntu Linux 24.04