All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- AKN4EU Parser Article ID Extraction: Fixed a bug where AKN4EUParser was not correctly extracting xml:id attributes for articles. The AKNArticleExtractor now properly receives the id_attr='{http://www.w3.org/XML/1998/namespace}id' parameter for AKN4EU documents.
- Chained Intro Extraction for AKN4EU: New extract_content_with_chained_intro method in AKNArticleExtractor that combines list intro text (subparagraph) with all its points into a single paragraph entry, preserving the logical structure of legal provisions.
- Formex Article eID Parsing: Fixed a bug in FormexArticleStrategy where articles with IDs starting with 3 (e.g., Article 300, 333) were incorrectly parsed. The lstrip('3') call was stripping all leading 3 characters instead of just the legacy prefix. Now correctly handles both normal 3-digit article numbers (300, 333, etc.) and legacy 4-digit prefixed IDs (3001 -> 001).
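The Formex eID fix above hinges on how Python's str.lstrip behaves: it removes every leading character in the given set, not a single prefix. The sketch below reproduces the failure mode and one possible fix. The function names are hypothetical, and it assumes the ID is a bare digit string where a legacy ID is exactly four digits with a leading 3; the actual FormexArticleStrategy code may differ.

```python
def strip_legacy_prefix_buggy(eid: str) -> str:
    # str.lstrip("3") drops *all* leading '3' characters, not just one.
    return eid.lstrip("3")


def strip_legacy_prefix(eid: str) -> str:
    # Assumption: only four-digit IDs starting with '3' carry the legacy prefix.
    if len(eid) == 4 and eid.isdigit() and eid.startswith("3"):
        return eid[1:]
    return eid


for eid in ("300", "333", "3001"):
    print(eid, repr(strip_legacy_prefix_buggy(eid)), repr(strip_legacy_prefix(eid)))
```

With the buggy variant, "333" collapses to an empty string and "300" loses its leading digit, while the length check leaves three-digit article numbers untouched.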
- Released version 0.4.2 with bug fixes and improvements for Cellar standard parsing and proposal parsing.
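For the AKN4EU xml:id fix listed above, it may help to see why the fully qualified attribute name matters. The sketch below uses only the standard library's xml.etree.ElementTree on a made-up fragment (the element names and IDs are illustrative, not taken from a real AKN4EU document); it does not call any tulit API.

```python
import xml.etree.ElementTree as ET

# The reserved "xml" prefix always maps to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment; real AKN4EU documents are namespaced and much larger.
doc = ET.fromstring(
    '<body><article xml:id="art_1"/><article xml:id="art_2"/></body>'
)

# Looking up the bare name "id" finds nothing, because the attribute is namespaced.
ids_plain = [article.get("id") for article in doc.iter("article")]

# Looking up the qualified name returns the xml:id values.
ids_xml = [article.get(XML_ID) for article in doc.iter("article")]

print(ids_plain)
print(ids_xml)
```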
- End-to-End Test Suite: Comprehensive E2E tests for all clients and parsers:
- EU Cellar client tests with multiple formats and error handling
- Member state client tests (Finland, France, Germany, Italy, Luxembourg, Portugal)
- Parser E2E tests for Cellar HTML, Veneto, Legifrance JSON, AKN, Finlex XML, Formex, German legislation, Italy Normattiva, and Luxembourg
- GitHub Actions CI/CD: Automated testing workflow with coverage reporting:
- Unit test execution with Poetry
- Coverage badge generation and automatic updates
- Codecov integration for detailed coverage reports
- Client Module Enhancements:
- New regional client: VenetoClient for Italian regional legislation
- Enhanced Germany client with comprehensive RIS integration
- New Portugal ELI portal client
- Unit tests for Legilux, Malta, Normattiva, and other state clients
- Coverage Tracking: Automated coverage badge generation from coverage.xml
- Package Restructuring: Major reorganization for clarity and maintainability:
- Renamed tulit.parsers → tulit.parser (singular for consistency)
- Reorganized client modules by jurisdiction:
  - tulit.client.eu/ for EU-level clients (Cellar)
  - tulit.client.state/ for national clients (Finlex, Legifrance, etc.)
  - tulit.client.regional/ for regional clients (Veneto)
- Reorganized test structure:
  - tests/unit/ for unit tests
  - tests/e2e/ for end-to-end tests
  - Organized by module type (client, parser)
- Test Organization: Comprehensive test restructuring:
- Split tests into unit and E2E categories
- Organized client tests by jurisdiction (eu, state, regional)
- Enhanced test fixtures with shared conftest configurations
- Improved file path handling using locate_data_dir for portability
- Test Coverage: Significantly expanded test suite:
- Added unit tests for all major parser classes
- Enhanced coverage for Akoma Ntoso parsers (AKN4EU, German LegalDocML, Luxembourg)
- Comprehensive tests for HTML parsers (Cellar variants, Veneto)
- Formex4Parser tests with additional assertions
- XML parser helper tests
- Code Quality: Enhanced maintainability and structure:
- Better separation of concerns in client modules
- Improved error handling in parser implementations
- Enhanced documentation and inline comments
- Consistent import patterns across modules
- Parser Implementations:
- Removed base_dir parameter from xml.py parser
- Enhanced Akoma Ntoso parser for article and section validation
- Improved Luxembourg AKN parser functionality
- Fixed RelaxNG schema structure in XML validation tests
- Deprecated Scripts: Cleaned up obsolete test and migration scripts:
- Removed scripts/run_all_clients.py and scripts/run_all_parsers.py
- Removed old migration and structure analysis reports
- Removed obsolete HTML parsing scripts
- Cleaned up deprecated test files for better codebase clarity
- CI/CD Pipeline: Comprehensive GitHub Actions workflow for automated testing
- Test Framework: pytest-based test suite with enhanced fixtures
- Coverage Reporting: Integrated coverage tracking with badge generation
- Module Structure: Clear separation between unit and E2E tests
- Backward Compatibility: Import aliases maintained for smooth migration
- Update imports from tulit.parsers to tulit.parser (backward compatible)
- Client imports now organized by jurisdiction:
  - from tulit.client.eu.cellar import CellarClient
  - from tulit.client.state.finlex import FinlexClient
  - from tulit.client.regional.veneto import VenetoClient
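The changelog does not say how the backward-compatible import aliases are implemented. As a rough illustration only, one common technique is to register the old module name in sys.modules so that it resolves to the new module object. The names oldpkg and newpkg below are hypothetical stand-ins, not tulit modules.

```python
import importlib
import sys
import types

# Stand-in for the renamed package; "newpkg" is a hypothetical name.
new_mod = types.ModuleType("newpkg")
new_mod.parse = lambda text: text.strip()
sys.modules["newpkg"] = new_mod

# Alias: the old name now points at the very same module object.
sys.modules["oldpkg"] = sys.modules["newpkg"]

# Code written against the old name keeps working.
old = importlib.import_module("oldpkg")
print(old is new_mod)
print(old.parse("  Article 1  "))
```

A thin re-export module (an old-path file that imports from the new path) achieves the same effect and is easier to pair with a deprecation warning.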
0.4.0 - 2025-12-10
- Parser Registry System: New ParserRegistry class for dynamic parser selection and management
- Domain Models: Type-safe domain objects (Article, Citation, Recital, Chapter, ArticleChild) using dataclasses
- Text Normalization Strategies: Comprehensive normalization system with composable strategies:
  - WhitespaceNormalizer for whitespace handling
  - UnicodeNormalizer for character normalization
  - PatternReplacementNormalizer for custom replacements
  - CompositeNormalizer for combining strategies
- Article Extraction Strategies: Strategy pattern implementation for parser-specific article extraction
- Custom Exception Hierarchy: Granular error handling with ParserError, ParseError, ValidationError, ExtractionError, and FileLoadError
- XML Utilities Package: Centralized XMLNodeExtractor and XMLValidator classes
- Akoma Ntoso Package: Modular structure for Akoma Ntoso parsers:
  - AkomaNtosoParser base class
  - AKN4EUParser for EU documents
  - GermanLegalDocMLParser for German documents
  - LuxembourgAKNParser for Luxembourg documents
  - Factory functions for automatic format detection
- Cellar HTML Parsers Package: Organized EU Cellar parsers into dedicated package:
  - CellarHTMLParser for semantic XHTML
  - CellarStandardHTMLParser for simple HTML structure
  - ProposalHTMLParser for EU legislative proposals
- Comprehensive Documentation: New architecture guide and updated API documentation
- Sphinx Documentation: Integrated documentation build system
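The composable text normalization strategies listed above follow a familiar pattern: small objects with a shared normalize method, chained by a composite. The sketch below illustrates that pattern with deliberately different class names (CollapseWhitespace, UnicodeNFKC, ReplacePattern, Pipeline); these are assumptions for illustration and do not reproduce the signatures of the tulit normalizers.

```python
import re
import unicodedata
from typing import Protocol, Sequence


class Normalizer(Protocol):
    def normalize(self, text: str) -> str: ...


class CollapseWhitespace:
    """Collapse whitespace runs into single spaces and trim both ends."""

    def normalize(self, text: str) -> str:
        return re.sub(r"\s+", " ", text).strip()


class UnicodeNFKC:
    """Apply NFKC normalization (e.g. non-breaking space becomes a space)."""

    def normalize(self, text: str) -> str:
        return unicodedata.normalize("NFKC", text)


class ReplacePattern:
    """Apply one regex substitution."""

    def __init__(self, pattern: str, replacement: str) -> None:
        self._pattern = re.compile(pattern)
        self._replacement = replacement

    def normalize(self, text: str) -> str:
        return self._pattern.sub(self._replacement, text)


class Pipeline:
    """Composite: run each step in order, feeding the output forward."""

    def __init__(self, steps: Sequence[Normalizer]) -> None:
        self._steps = list(steps)

    def normalize(self, text: str) -> str:
        for step in self._steps:
            text = step.normalize(text)
        return text


pipeline = Pipeline([
    UnicodeNFKC(),
    ReplacePattern(r"\s+([,.;:])", r"\1"),  # drop space before punctuation
    CollapseWhitespace(),
])
result = pipeline.normalize("Article\u00a01  ,\n shall apply .")
print(result)
```

Because the composite exposes the same normalize method as its parts, pipelines can be nested or reordered without changing calling code.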
- Major Refactoring: Reorganized codebase following SOLID principles and design patterns
- Module Organization: Split large monolithic files into focused modules:
  - parser.py: Reduced from 825 to 315 lines
  - xml.py: Reduced from 907 to 663 lines
  - akomantoso.py: Split into 7 focused modules
  - HTML parsers organized into cellar/ package
- Import Paths: Updated structure with backward-compatible re-exports:
  - tulit.parsers.html.html_parser replaces tulit.parsers.html.xhtml
  - tulit.parsers.html.cellar package for Cellar parsers
  - tulit.parsers.xml.akomantoso package for Akoma Ntoso variants
- HTMLParser Base Class: Moved from xhtml.py to html_parser.py for clarity
- Package Documentation: Complete rewrite focusing on current architecture rather than migration
- Code Quality: Better separation of concerns and single responsibility principle
- Maintainability: Smaller, more focused modules (average 200-300 lines)
- Extensibility: Easy to add new parsers through registry and strategy patterns
- Type Safety: Domain models provide IDE autocompletion and type checking
- Error Handling: Specific exception types for better debugging
- Testing: Maintained 100% test coverage (126 tests passing)
- Design Patterns: Registry, Strategy, Factory, Template Method
- Architecture: Clean separation between parsing logic, data models, and utilities
- Backward Compatibility: All existing code works without changes through re-exports
- Documentation: Comprehensive architecture guide with examples
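To make the registry pattern named above more concrete, here is a minimal sketch of name-based parser registration and lookup. The Registry class, its register and create methods, and the two placeholder parser classes are all hypothetical; the actual ParserRegistry interface in tulit is not described in this changelog and may look quite different.

```python
from typing import Callable, Dict


class Registry:
    """Map format names to parser factories (hypothetical interface)."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}

    def register(self, name: str):
        # Used as a class decorator: @registry.register("formex")
        def decorator(cls):
            self._factories[name] = cls
            return cls
        return decorator

    def create(self, name: str):
        try:
            factory = self._factories[name]
        except KeyError:
            raise ValueError(f"no parser registered for {name!r}") from None
        return factory()


registry = Registry()


@registry.register("formex")
class FormexSketch:
    def parse(self, text: str) -> dict:
        return {"format": "formex", "length": len(text)}


@registry.register("akn")
class AknSketch:
    def parse(self, text: str) -> dict:
        return {"format": "akn", "length": len(text)}


parser = registry.create("formex")
print(parser.parse("<ACT/>"))
```

Adding a new format then only requires a new decorated class; the lookup code stays unchanged, which is the extensibility benefit the entry refers to.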
- Module import paths updated throughout test suite
- Consistent namespace handling in XML parsers
- Improved error messages in validation failures
- tulit.parsers.html.xhtml: Use tulit.parsers.html.html_parser instead (backward compatible)
0.3.2 - Previous Release
- Basic parser implementations for Formex, Akoma Ntoso, and HTML formats
- Client implementations for Cellar, Normattiva, and other legal databases
- Initial SPARQL query support
- Basic JSON export functionality