Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.4.4] - 2026-01-07

Fixed

AKN4EU Parser Article ID Extraction: Fixed bug where AKN4EUParser was not correctly extracting xml:id attributes for articles. The AKNArticleExtractor now properly receives the id_attr='{http://www.w3.org/XML/1998/namespace}id' parameter for AKN4EU documents.

Added

Chained Intro Extraction for AKN4EU: New extract_content_with_chained_intro method in AKNArticleExtractor that properly combines list intro text (subparagraph) with all its points into a single paragraph entry, preserving the logical structure of legal provisions.

[0.4.3] - 2026-01-05

Fixed

Formex Article eID Parsing: Fixed bug in FormexArticleStrategy where articles with IDs starting with 3 (e.g., Article 300, 333) were incorrectly parsed. The lstrip('3') method was stripping all leading 3 characters instead of just the legacy prefix. Now correctly handles both normal 3-digit article numbers (300, 333, etc.) and legacy 4-digit prefixed IDs (3001 -> 001).

[0.4.2] - 2026-01-04

Changed

Released version 0.4.2 with bug fixes and improvements for cellar standard parsing and proposal parsing.

[0.4.1] - 2024-12-22

Added

End-to-End Test Suite: Comprehensive E2E tests for all clients and parsers:
- EU Cellar client tests with multiple formats and error handling
- Member state client tests (Finland, France, Germany, Italy, Luxembourg, Portugal)
- Parser E2E tests for Cellar HTML, Veneto, Legifrance JSON, AKN, Finlex XML, Formex, German legislation, Italy Normattiva, and Luxembourg
GitHub Actions CI/CD: Automated testing workflow with coverage reporting:
- Unit test execution with Poetry
- Coverage badge generation and automatic updates
- Codecov integration for detailed coverage reports
Client Module Enhancements:
- New regional client: VenetoClient for Italian regional legislation
- Enhanced Germany client with comprehensive RIS integration
- New Portugal ELI portal client
- Unit tests for Legilux, Malta, Normattiva, and other state clients
Coverage Tracking: Automated coverage badge generation from coverage.xml

Changed

Package Restructuring: Major reorganization for clarity and maintainability:
- Renamed tulit.parsers → tulit.parser (singular for consistency)
- Reorganized client modules by jurisdiction:
  - tulit.client.eu/ for EU-level clients (Cellar)
  - tulit.client.state/ for national clients (Finlex, Legifrance, etc.)
  - tulit.client.regional/ for regional clients (Veneto)
- Reorganized test structure:
  - tests/unit/ for unit tests
  - tests/e2e/ for end-to-end tests
  - Organized by module type (client, parser)
Test Organization: Comprehensive test restructuring:
- Split tests into unit and E2E categories
- Organized client tests by jurisdiction (eu, state, regional)
- Enhanced test fixtures with shared conftest configurations
- Improved file path handling using locate_data_dir for portability

Improved

Test Coverage: Significantly expanded test suite:
- Added unit tests for all major parser classes
- Enhanced coverage for Akoma Ntoso parsers (AKN4EU, German LegalDocML, Luxembourg)
- Comprehensive tests for HTML parsers (Cellar variants, Veneto)
- Formex4Parser tests with additional assertions
- XML parser helper tests
Code Quality: Enhanced maintainability and structure:
- Better separation of concerns in client modules
- Improved error handling in parser implementations
- Enhanced documentation and inline comments
- Consistent import patterns across modules

Fixed

Parser Implementations:
- Removed base_dir parameter from xml.py parser
- Enhanced Akoma Ntoso parser for article and section validation
- Improved Luxembourg AKN parser functionality
- Fixed RelaxNG schema structure in XML validation tests

Removed

Deprecated Scripts: Cleaned up obsolete test and migration scripts:
- Removed scripts/run_all_clients.py and scripts/run_all_parsers.py
- Removed old migration and structure analysis reports
- Removed obsolete HTML parsing scripts
- Cleaned up deprecated test files for better codebase clarity

Technical Details

CI/CD Pipeline: Comprehensive GitHub Actions workflow for automated testing
Test Framework: pytest-based test suite with enhanced fixtures
Coverage Reporting: Integrated coverage tracking with badge generation
Module Structure: Clear separation between unit and E2E tests
Backward Compatibility: Import aliases maintained for smooth migration

Migration Notes

Update imports from tulit.parsers to tulit.parser (backward compatible)
Client imports now organized by jurisdiction:
- from tulit.client.eu.cellar import CellarClient
- from tulit.client.state.finlex import FinlexClient
- from tulit.client.regional.veneto import VenetoClient

0.4.0 - 2025-12-10

Added

Parser Registry System: New ParserRegistry class for dynamic parser selection and management
Domain Models: Type-safe domain objects (Article, Citation, Recital, Chapter, ArticleChild) using dataclasses
Text Normalization Strategies: Comprehensive normalization system with composable strategies:
- WhitespaceNormalizer for whitespace handling
- UnicodeNormalizer for character normalization
- PatternReplacementNormalizer for custom replacements
- CompositeNormalizer for combining strategies
Article Extraction Strategies: Strategy pattern implementation for parser-specific article extraction
Custom Exception Hierarchy: Granular error handling with ParserError, ParseError, ValidationError, ExtractionError, and FileLoadError
XML Utilities Package: Centralized XMLNodeExtractor and XMLValidator classes
Akoma Ntoso Package: Modular structure for Akoma Ntoso parsers:
- AkomaNtosoParser base class
- AKN4EUParser for EU documents
- GermanLegalDocMLParser for German documents
- LuxembourgAKNParser for Luxembourg documents
- Factory functions for automatic format detection
Cellar HTML Parsers Package: Organized EU Cellar parsers into dedicated package:
- CellarHTMLParser for semantic XHTML
- CellarStandardHTMLParser for simple HTML structure
- ProposalHTMLParser for EU legislative proposals
Comprehensive Documentation: New architecture guide and updated API documentation
Sphinx Documentation: Integrated documentation build system

Changed

Major Refactoring: Reorganized codebase following SOLID principles and design patterns
Module Organization: Split large monolithic files into focused modules:
- parser.py: Reduced from 825 to 315 lines
- xml.py: Reduced from 907 to 663 lines
- akomantoso.py: Split into 7 focused modules
- HTML parsers organized into cellar/ package
Import Paths: Updated structure with backward-compatible re-exports:
- tulit.parsers.html.html_parser replaces tulit.parsers.html.xhtml
- tulit.parsers.html.cellar package for Cellar parsers
- tulit.parsers.xml.akomantoso package for Akoma Ntoso variants
HTMLParser Base Class: Moved from xhtml.py to html_parser.py for clarity
Package Documentation: Complete rewrite focusing on current architecture rather than migration

Improved

Code Quality: Better separation of concerns and single responsibility principle
Maintainability: Smaller, more focused modules (average 200-300 lines)
Extensibility: Easy to add new parsers through registry and strategy patterns
Type Safety: Domain models provide IDE autocompletion and type checking
Error Handling: Specific exception types for better debugging
Testing: Maintained 100% test coverage (126 tests passing)

Technical Details

Design Patterns: Registry, Strategy, Factory, Template Method
Architecture: Clean separation between parsing logic, data models, and utilities
Backward Compatibility: All existing code works without changes through re-exports
Documentation: Comprehensive architecture guide with examples

Fixed

Module import paths updated throughout test suite
Consistent namespace handling in XML parsers
Improved error messages in validation failures

Deprecated

tulit.parsers.html.xhtml: Use tulit.parsers.html.html_parser instead (backward compatible)

0.3.2 - Previous Release

Previous Features

Basic parser implementations for Formex, Akoma Ntoso, and HTML formats
Client implementations for Cellar, Normattiva, and other legal databases
Initial SPARQL query support
Basic JSON export functionality

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Changelog

[0.4.4] - 2026-01-07

Fixed

Added

[0.4.3] - 2026-01-05

Fixed

[0.4.2] - 2026-01-04

Changed

[0.4.1] - 2024-12-22

Added

Changed

Improved

Fixed

Removed

Technical Details

Migration Notes

0.4.0 - 2025-12-10

Added

Changed

Improved

Technical Details

Fixed

Deprecated

0.3.2 - Previous Release

Previous Features

FilesExpand file tree

CHANGELOG.md

Latest commit

History

CHANGELOG.md

File metadata and controls

Changelog

[0.4.4] - 2026-01-07

Fixed

Added

[0.4.3] - 2026-01-05

Fixed

[0.4.2] - 2026-01-04

Changed

[0.4.1] - 2024-12-22

Added

Changed

Improved

Fixed

Removed

Technical Details

Migration Notes

0.4.0 - 2025-12-10

Added

Changed

Improved

Technical Details

Fixed

Deprecated

0.3.2 - Previous Release

Previous Features