Remove questionable CLI options #36
Closed
- Add type hints throughout the codebase (functions, methods, variables)
- Create Company and HistoryEntry dataclasses for structured data
- Add docstrings to key functions (parse_result, pr_company_info, get_companies_in_searchresults)
- Extract SUFFIX_MAP as a module-level constant
- Use list comprehension for cell parsing
- Add null check for grid in get_companies_in_searchresults
- Use f-strings in pr_company_info

- Add custom exception hierarchy: HandelsregisterError, NetworkError, ParseError, FormError, CacheError
- Wrap network operations in try/except with NetworkError
- Handle form selection failures with FormError
- Add parse validation with ParseError for malformed HTML
- Handle cache read/write failures gracefully
- Create main() function with proper exit codes for each error type
- Add docstrings with Raises documentation
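For readers unfamiliar with the pattern, here is a minimal sketch of an exception hierarchy mapped to exit codes. The class names are taken from the commit message above; the numeric exit codes and the `run` callable passed to `main()` are assumptions made for illustration, not the project's actual values or signature.

```python
import sys
from typing import Callable


class HandelsregisterError(Exception):
    """Base class for all errors raised by the tool."""


class NetworkError(HandelsregisterError):
    """A request to the register portal failed."""


class ParseError(HandelsregisterError):
    """The returned HTML could not be parsed."""


class FormError(HandelsregisterError):
    """A form or form field could not be selected."""


class CacheError(HandelsregisterError):
    """Reading or writing the cache failed."""


# Assumed exit codes; the values the project really uses are not shown here.
EXIT_CODES = {
    NetworkError: 2,
    ParseError: 3,
    FormError: 4,
    CacheError: 5,
}


def main(run: Callable[[], None]) -> int:
    """Call `run` and translate known errors into a process exit code."""
    try:
        run()
    except HandelsregisterError as exc:
        print(f"error: {exc}", file=sys.stderr)
        # Fall back to 1 for the base class or any unmapped subclass.
        return EXIT_CODES.get(type(exc), 1)
    return 0
```

A script entry point could then end with `sys.exit(main(...))`, so each failure type is distinguishable from the shell.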
- Use SHA-256 hashing for cache filenames to prevent path traversal
- Add CacheEntry dataclass to store metadata with cached content
- Implement TTL-based cache expiration (default: 1 hour)
- Store cache as JSON with query, options, timestamp, and HTML
- Auto-delete expired or corrupted cache files
- Add _get_cache_key, _get_cache_path, _load_from_cache, _save_to_cache methods
- Include search options in cache key for proper cache invalidation
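The caching scheme described in this commit can be sketched roughly as follows. `CacheEntry` and its four fields come from the commit message; the function names `cache_path`, `save`, and `load` are hypothetical stand-ins for the private methods listed above, and the exact key format is an assumption.

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional

DEFAULT_TTL_SECONDS = 3600  # one hour, the default stated in the commit


@dataclass
class CacheEntry:
    query: str
    options: dict
    timestamp: float
    html: str


def cache_path(cache_dir: Path, query: str, options: dict) -> Path:
    """Hash query and options so raw user input never becomes part of the path."""
    raw = json.dumps({"query": query, "options": options}, sort_keys=True)
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.json"


def save(cache_dir: Path, entry: CacheEntry) -> None:
    """Write the entry as JSON next to its metadata."""
    path = cache_path(cache_dir, entry.query, entry.options)
    path.write_text(json.dumps(asdict(entry)), encoding="utf-8")


def load(
    cache_dir: Path,
    query: str,
    options: dict,
    ttl_seconds: int = DEFAULT_TTL_SECONDS,
) -> Optional[CacheEntry]:
    """Return a fresh entry, deleting expired or corrupted files on the way."""
    path = cache_path(cache_dir, query, options)
    if not path.exists():
        return None
    try:
        entry = CacheEntry(**json.loads(path.read_text(encoding="utf-8")))
    except (ValueError, TypeError):
        path.unlink()  # corrupted or structurally invalid file
        return None
    if time.time() - entry.timestamp > ttl_seconds:
        path.unlink()  # expired
        return None
    return entry
```

Because the options are part of the hashed key, the same query with different filters maps to a different file, which is what gives the cache invalidation mentioned in the last bullet.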
- Bump minimum Python version to 3.9 (3.6-3.8 are EOL)
- Remove unused mechanicalsoup dependency
- Add beautifulsoup4 as explicit dependency (was used but not declared)
- Add pytest to dev dependencies
- Add version constraints to dependencies for reproducibility
- Update tox envlist to py39, py310, py311, py312
- Add project metadata: description, license, repository, keywords
- Bump version to 0.2.0

- Replace print() debug statements with `logging` module
- Add module-level logger configuration
- Replace `if x == True:` with `if x:` (PEP 8)
- Organize imports: `stdlib` first, third-party second
- Configure logging format based on debug flag
- Enable mechanize logger in debug mode
- Use logger.debug/info/warning for appropriate log levels

- Add pytest markers: @integration and @slow for live API tests
- Skip integration tests by default (run with -m integration)
- Add conftest.py with marker configuration
- Create test fixtures: sample_search_html, mock_args, temp_cache_dir
- Add unit tests for parsing (TestParseSearchResults)
- Add unit tests for dataclasses (TestDataClasses)
- Add unit tests for cache key generation (TestCache)
- Add unit tests for suffix mapping (TestSuffixMap)
- Move live API tests to TestLiveAPI class with proper markers
- Improve test documentation and organization

- Extract SearchCache class for cache operations with configurable TTL
- Extract ResultParser class with static methods for HTML parsing
- Refactor HandelsRegister to use dependency injection for cache
- Add _create_browser() factory method for browser configuration
- Split search_company() into smaller focused methods
- Add backward-compatible aliases for deprecated functions
- Add configuration constants (BASE_URL, REQUEST_TIMEOUT)
- Improve code organization with section headers
- Add module docstring describing architecture
- Update CLI help text with examples
- Update tests to use new SearchCache class directly

- Add SearchOptions dataclass to encapsulate all search parameters
- Add STATE_CODES mapping for all 16 German states (bundesland filtering)
- Add REGISTER_TYPES list (HRA, HRB, GnR, PR, VR)
- Add RESULTS_PER_PAGE_OPTIONS (10, 25, 50, 100)
- Implement state filtering via --states CLI option
- Implement register type filtering via --register-type option
- Implement register number search via --register-number option
- Add --include-deleted flag for historical entries
- Add --similar-sounding flag for phonetic search
- Add --results-per-page option to control pagination
- Update _submit_search to set all form fields with proper error handling
- Add _build_search_options method for args to SearchOptions conversion
- Improve CLI help text with grouped arguments and examples
- Add unit tests for SearchOptions and configuration constants

- Document all new CLI arguments (--states, --register-type, etc.)
- Add state codes reference table
- Add usage examples for common scenarios
- Add testing instructions (unit vs integration tests)
- Keep original API documentation intact
- Entirely in German
- Better structure with clear sections
- Legal notices highlighted
- API parameters in clear tables
- Legal forms table added
- State (Bundesland) filter documented

- New search() function for programmatic use
- Clean Python API without argparse.Namespace
- Complete documentation with docstring and examples
- All search options available as named parameters
- Enables easy integration into other applications

- Constructor now accepts optional args (backward compatibility)
- New debug parameter for programmatic use
- New from_options() class method for SearchOptions
- New search_with_options() method as a clean API
- search_company() now delegates to search_with_options()
- German docstrings for better consistency

- Title changed to 'Handelsregister' (not just CLI)
- New section 'Verwendung als Library' (usage as a library) with examples
- Simple API (search function) documented
- Advanced API (HandelsRegister class) documented
- Return format explained with an example dictionary
- CLI documentation moved to its own section

- Import of the new search() function
- TestPublicAPI: tests for the search() function and SearchOptions
- TestHandelsRegisterClass: tests for the new initialization
  - test_init_without_args
  - test_init_with_debug
  - test_init_with_custom_cache
  - test_from_options_classmethod
  - test_search_company_requires_args
- Integration tests for search() and search_with_options()
- All 25 unit tests passed

- From Poetry to the standard PEP 621 [project] format
- hatchling as build backend
- [project.scripts] for the CLI entry point
- [tool.uv] for dev-dependencies
- pytest marker configuration added
- black configuration added

- uv.lock file created with all dependencies
- dependency-groups.dev instead of tool.uv.dev-dependencies (deprecated)
- All 25 unit tests pass with uv run pytest

- Installation with uv sync instead of poetry install
- pip alternative added
- CLI examples: uv run handelsregister instead of poetry run python
- Tests: uv run pytest instead of poetry run pytest

- Pipfile removed (Pipenv no longer used)
- poetry.lock removed (replaced by uv.lock)
- conftest.py: marker definition removed (now in pyproject.toml)
- Address: Business address with street, postal code, city, country
- Representative: Company representatives (Geschäftsführer, Vorstand, etc.)
- Owner: Company owners/shareholders (Gesellschafter)
- CompanyDetails: Extended company information combining all detail views
These models will be used to store structured data from the SI, AD, and UT detail views of the Handelsregister.

Add parser for extracting company details from SI (Strukturierter Registerinhalt) HTML views:
- Parse company name, legal form, capital, currency
- Extract business address with street, postal code, city
- Parse company purpose (Unternehmensgegenstand)
- Extract representatives (Geschäftsführer, Vorstand, Prokurist)
- Smart legal form detection with priority ordering
The parser handles various HTML table structures and text patterns commonly found in the Handelsregister detail views.

…ister:
HandelsRegister class:
- get_company_details(): Fetch details for a single company (SI/AD/UT)
- search_with_details(): Search and fetch details in one call
- _fetch_detail_page(): Handle JSF form submission for details
- _parse_details(): Route to appropriate parser
DetailsParser class:
- parse_ad(): Parse 'Aktueller Abdruck' (current printout)
- parse_ut(): Parse 'Unternehmensträger' (company owners)
- _extract_representatives_from_text(): Extract from free-form text
- _extract_owners(): Extract owner/shareholder information
Public API:
- get_details(): Simple function to fetch details for a company
The detail fetching uses the existing mechanize session to submit the JSF form with the appropriate control parameters.

- Add DETAILS_CACHE_TTL_SECONDS (24h) for longer caching of details
- SearchCache now accepts details_ttl_seconds parameter
- Cache.get() automatically uses longer TTL for 'details:' prefixed keys
- Add clear() method to remove cache files (optionally details only)
- Add get_stats() method for cache statistics
This allows company details to be cached for longer periods since register data changes infrequently, while search results still use the shorter 1-hour TTL.
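The prefix-based TTL selection might look roughly like the sketch below. `DETAILS_CACHE_TTL_SECONDS`, the `details:` prefix, and the 1-hour and 24-hour values come from the commit message; `SEARCH_TTL_SECONDS`, `ttl_for_key`, and `is_expired` are hypothetical names used only for illustration.

```python
SEARCH_TTL_SECONDS = 3600  # 1 hour for search results (assumed constant name)
DETAILS_CACHE_TTL_SECONDS = 24 * 3600  # 24 hours for detail pages
DETAILS_PREFIX = "details:"


def ttl_for_key(
    key: str,
    ttl_seconds: int = SEARCH_TTL_SECONDS,
    details_ttl_seconds: int = DETAILS_CACHE_TTL_SECONDS,
) -> int:
    """Pick the longer TTL for keys that refer to cached detail pages."""
    if key.startswith(DETAILS_PREFIX):
        return details_ttl_seconds
    return ttl_seconds


def is_expired(key: str, stored_at: float, now: float) -> bool:
    """An entry is stale once its age exceeds the TTL chosen for its key."""
    return now - stored_at > ttl_for_key(key)
```

With this split, a two-hour-old search result is considered stale while a two-hour-old details entry is still served from the cache.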
Extend the command-line interface with detail fetching options:
New CLI arguments:
- --details: Enable fetching of detailed company information
- --detail-type: Choose detail type (SI/AD/UT, default: SI)
New output function:
- pr_company_details(): Pretty-print CompanyDetails with all fields
The main() function now supports two modes:
1. Standard search (existing behavior)
2. Search with details (--details flag)
Example usage:
  handelsregister.py -s 'GASAG AG' --details --detail-type SI
  handelsregister.py -s 'Bank' --states BE --details --json

Library usage:
- Document get_details() function
- Show available detail types (SI, AD, UT)
- Add CompanyDetails response format example
CLI usage:
- Document --details and --detail-type options
- Add examples for detail fetching
- Show JSON output for details
The documentation explains how to fetch extended company information including legal form, capital, address, representatives, and owners.
…oject.toml and uv.lock
…mentation structure
- Changed site name and description to English.
- Updated theme language to English and modified toggle names for dark/light mode.
- Added i18n plugin configuration for multilingual support, including English and German translations.
- Translated navigation and documentation sections to English.
- Created a new German index file for localized documentation.
- Updated existing index file to reflect English content and structure.

- Update parse_si, parse_ad, parse_ut to accept both Company and dict
- Update CompanyDetails.from_company to accept both Company and dict
- Maintain backward compatibility for tests and existing code
- All tests passing (87 passed)

- Clean up formatting by removing extra blank lines
- No functional changes

- Use company.name instead of company.get('name', 'unknown') in error logging
- Improves efficiency and type safety
- Company is a Pydantic model, not a dict

- Create _with_retry_and_rate_limit() helper function to eliminate code duplication
- Replace 4 identical decorator stacks with single reusable decorator
- Improves maintainability and follows DRY principle
- No functional changes
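As an illustration of collapsing a repeated decorator stack into one helper, here is a self-contained sketch. The `retry` and `rate_limit` decorators below are simplified stand-ins written for this example; the decorators the project actually stacks, and the parameters of its `_with_retry_and_rate_limit()` helper, are not shown in the commit and are assumed here.

```python
import functools
import time
from typing import Callable, Tuple, Type


def retry(attempts: int, exceptions: Tuple[Type[BaseException], ...] = (Exception,)):
    """Re-run the wrapped function until it succeeds or attempts run out."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
        return wrapper
    return decorator


def rate_limit(min_interval: float):
    """Ensure at least `min_interval` seconds pass between two calls."""
    def decorator(func: Callable) -> Callable:
        last_call = float("-inf")

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal last_call
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
            last_call = time.monotonic()
            return func(*args, **kwargs)
        return wrapper
    return decorator


def with_retry_and_rate_limit(attempts: int = 3, min_interval: float = 1.0):
    """Bundle both decorators so call sites need a single line."""
    def decorator(func: Callable) -> Callable:
        # Retry is outermost, so every retried call is rate limited too.
        return retry(attempts)(rate_limit(min_interval)(func))
    return decorator
```

Each call site then carries one `@with_retry_and_rate_limit(...)` line instead of repeating the same stack, and the policy can be changed in a single place.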
- Simplify CompanyDetails.to_dict() to use model_dump() with mode='python'
- Pydantic automatically handles nested model serialization
- Optimize Company.to_dict() to use model_dump() with by_alias=True
- Reduces code duplication and improves maintainability
- No functional changes

- Remove _get_cache_key, _get_cache_path, _load_from_cache, _save_to_cache
- These were deprecated private methods that just delegated to SearchCache
- Use cache.get() and cache.set() directly instead
- Breaking change: private API removed (methods were already deprecated)

- Replace incomplete fallback with proper FormError exception
- Add original_error support to FormError for better error context
- Improves robustness by failing fast with clear error messages
- No silent failures from empty string returns

- Change parameter type from dict to Company for type safety
- Use attribute access instead of dict.get() for better performance
- Update history iteration to use HistoryEntry objects
- Breaking change: function signature changed (public API)

- Add URL context to NetworkError messages
- Include current page URL in FormError messages for debugging
- Add original_error to FormError in _navigate_to_search and _submit_search
- Improves debugging experience when errors occur

- Remove deprecated cache methods from HandelsRegister (private API)
- Change pr_company_info() signature to accept Company instead of dict (public API)
- All other changes are non-breaking optimizations and improvements

- Replace dict access patterns with attribute access (company.name instead of company['name'])
- Update return value descriptions from 'list of dicts' to 'list of Company objects'
- Fix pandas DataFrame examples to properly convert Company objects
- Update both English and German documentation files
- All examples now use Pydantic model attribute access

…ests
- Introduced a new fixture `shared_hr_client` to optimize API calls during integration tests by reusing a single instance of `HandelsRegister`
- Updated `pytest_collection_modifyitems` to skip integration tests by default unless specified
- Added critical rate limit warnings in `test_handelsregister.py` to inform users about potential API request limits and recommendations for running tests efficiently
- Improved code formatting and consistency across test file

- Update test matrix to test Python 3.9, 3.10, 3.11, and 3.12, matching the requires-python >=3.9 constraint in pyproject.toml
- Remove Python 3.7 and 3.8 which are no longer supported
- Also update GitHub Actions to latest versions (checkout@v4, setup-python@v5)

- Replace `URL | None` with `Optional[URL]` in `build_url` function to support Python 3.9
- The `|` union syntax for type hints was introduced in Python 3.10 and is therefore not compatible with Python 3.9
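A small sketch of the Python 3.9 compatible spelling. Only the `build_url` name and the `Optional[URL]` annotation come from the commit message; the `URL` alias, the parameters, and the function body are hypothetical and exist only to make the example runnable.

```python
from typing import Optional

URL = str  # hypothetical stand-in for the project's URL type


def build_url(base: URL, path: str = "") -> Optional[URL]:
    """Join base and path, or return None when no base is given.

    Annotating the return type as `URL | None` would raise a TypeError at
    function definition time on Python 3.9, because `type | None` is only
    supported from Python 3.10 onward (PEP 604).
    """
    if not base:
        return None
    return base.rstrip("/") + "/" + path.lstrip("/")
```

`Optional[URL]` and `URL | None` describe the same type, so the change affects only which interpreter versions can import the module.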
- Introduced a new workflow to deploy documentation to GitHub Pages upon successful completion of the Lint and Python tests workflows
- The workflow checks if documentation files have changed and only proceeds with deployment if both linting and testing workflows have passed
- Utilizes actions for checking workflow status, setting up Python, installing dependencies, building static site with MkDocs, and deploying to GitHub Pages

- Added a step to install `uv` for improved dependency management
- Updated the installation command to utilize `uv` for syncing and installing dependencies
- Modified the MkDocs build command to run through `uv`, ensuring a more efficient build process

- Changed site author to "BundesAPI Contributors" and updated copyright year to 2025
- Modified theme colors from deep purple and amber to indigo and blue
- Updated navigation labels for better understanding, including translations for "User Guide" and "Fetching Details"
- Enhanced clarity in other sections of the documentation structure

Changed the reference from `handelsregister.main` to `handelsregister.cli.main` in both German and English documentation files

- Introduced the alias `hrg` for the `handelsregister` CLI command
- Updated both German and English docs to reflect this change
- Updated README.md accordingly

…nd keyword matching:
- Introduced `State`, `KeywordMatch`, and `RegisterType` enums for better type safety and clarity in search options
- Updated the `search` function to accept these enums alongside string values for states and keyword options
- Modified the `SearchOptions` model to validate and handle both enum and string inputs for states and register types
- Updated and improved documentation to reflect these changes and provide usage examples for the new enums
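Accepting enums alongside plain strings could be handled along these lines. The `State` enum name comes from the commit message; the three members shown, the `normalize_states` helper, and the case-insensitive matching are assumptions for illustration, and the project's real validation lives in its `SearchOptions` model.

```python
from enum import Enum
from typing import Iterable, List, Union


class State(str, Enum):
    """Subset of state codes; the remaining members are omitted in this sketch."""
    BE = "BE"
    BY = "BY"
    HH = "HH"


def normalize_states(states: Iterable[Union[State, str]]) -> List[State]:
    """Convert a mix of enum members and code strings into enum members."""
    result: List[State] = []
    for value in states:
        if isinstance(value, State):
            result.append(value)
        else:
            # Raises ValueError for codes that are not defined on the enum.
            result.append(State(value.strip().upper()))
    return result
```

Callers can then pass `State.BE` or `"BE"` interchangeably, while an unknown code fails early with a `ValueError` instead of reaching the search form.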
These options add complexity without clear benefit:
- --results-per-page: internal implementation detail, not user-facing
- --details/--detail-type: relies on fragile HTML parsing, needs proper SI XML parser first
Also cleaned up README to match the simplified CLI
Summary
Removes CLI options that add complexity without clear user benefit:
- --results-per-page: Internal implementation detail, not a user-facing concern
- --details/--detail-type: Relies on fragile HTML parsing that breaks when the portal changes
Changes
cli.py
Testing
All 87 unit tests pass.
Notes
The details functionality will be re-added in a separate PR once we have a proper SI XML parser that doesn't rely on HTML string matching.