Summary
The TypeAdapter import in docling_core/utils/file.py causes an UnboundLocalError when docling-core is used in compiled Python environments (Nuitka, PyInstaller, etc.), despite the import statement being syntactically correct.
Error Message
```
UnboundLocalError: cannot access local variable 'TypeAdapter' where it is not associated with a value
  File "docling_core/utils/file.py", line 82, in _get_url
    http_url: AnyHttpUrl = TypeAdapter(AnyHttpUrl).validate_python(source)
```
Affected Versions
- docling-core: 2.45.0 (confirmed)
- Python: 3.13.1
- Pydantic: 2.11.7
- Environment: Windows 11, compiled with Nuitka 2.7.13
Root Cause
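The exact cause has not been isolated. In CPython, an UnboundLocalError means the name is treated as local to the function (because it is bound somewhere in that function's body) and is read before that binding has executed. The working hypothesis is that the compiled build resolves `TypeAdapter` inside `_get_url` and `_get_local_path` as a function-local name rather than as the module-level import, since import resolution in compiled environments differs from standard Python interpretation.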
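For reference, this is how the same error message arises under the standard interpreter. It is a minimal sketch using the standard-library `fractions` module as a stand-in; the function names are hypothetical, and it is not confirmed that this is the mechanism at work in the Nuitka build.

```python
from fractions import Fraction  # module-level import, as in a normal module


def use_before_local_import():
    # The import further down binds Fraction as a *local* name for the
    # whole function body, so this call reads an unbound local variable.
    value = Fraction(1, 2)
    from fractions import Fraction  # later binding that makes the name local
    return value


def use_module_level_only():
    # No binding of Fraction inside this function: the module-level
    # import is used and the call succeeds.
    return Fraction(1, 2)


def raises_unbound_local(func):
    """Return True if calling func raises UnboundLocalError."""
    try:
        func()
    except UnboundLocalError:
        return True
    return False
```

Only the function that rebinds the name in its own body fails. A module-level import on its own does not produce this error in CPython, which is why the compiled behavior looks like a scoping difference.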
Steps to Reproduce
1. Create a Python application that uses docling (which depends on docling-core):

```python
import sys

from docling.document_converter import DocumentConverter


def process_document(file_path):
    converter = DocumentConverter()
    result = converter.convert(file_path)
    return result


if __name__ == "__main__":
    # Entry point so the compiled executable processes the path given on the command line
    process_document(sys.argv[1])
```

2. Compile with Nuitka:

```bash
python -m nuitka --standalone --follow-imports process_document.py
```

3. Run the compiled executable with any document:

```bash
./process_document.dist/process_document.exe test.pdf
```

4. Observe the UnboundLocalError at line 82 or 134 in `docling_core/utils/file.py`.
Expected Behavior
The TypeAdapter should work correctly in both interpreted and compiled environments.
Actual Behavior
UnboundLocalError occurs when TypeAdapter is referenced, despite successful import.
Proposed Fix
File: docling_core/utils/file.py
Current Code (Lines 17, 82, 134):
```python
# Line 17
from pydantic import AnyHttpUrl, TypeAdapter, ValidationError

# Line 82 (in _get_url function)
http_url: AnyHttpUrl = TypeAdapter(AnyHttpUrl).validate_python(source)

# Line 134 (in _get_local_path function)
local_path = TypeAdapter(Path).validate_python(source)
```
Fixed Code:
```python
# Line 17 - Use alias to avoid scoping issues
from pydantic import AnyHttpUrl, TypeAdapter as _TypeAdapter, ValidationError

# Line 82 - Use the aliased import
http_url: AnyHttpUrl = _TypeAdapter(AnyHttpUrl).validate_python(source)

# Line 134 - Use the aliased import
local_path = _TypeAdapter(Path).validate_python(source)
```
Why This Fix Works
The alias binds the imported class to a distinct module-level name, `_TypeAdapter`, so the two functions no longer depend on how the name `TypeAdapter` is resolved in their scope. In the tested builds this keeps the class accessible, including in the execution contexts created by compilation tools.
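One way to check the scoping explanation under the standard interpreter is to inspect which names a function's code object treats as local. The sketch below uses the standard-library `json` module and hypothetical function names. It assumes CPython's `__code__.co_varnames` attribute; a Nuitka-compiled function may not report the same information, so treat this as a diagnostic for the interpreted case only.

```python
import json


def shadowed():
    # The import below makes json a local name in this function, so the
    # line above it would raise UnboundLocalError if the function ran.
    data = json.dumps({})
    import json
    return data


def uses_global():
    # json is looked up as a module-level (global) name here.
    return json.dumps({})


def is_local_name(func, name):
    """Return True if `name` is compiled as a local variable of `func`."""
    return name in func.__code__.co_varnames
```

Under the interpreter, the same check could be applied to `_get_url` with the name `TypeAdapter`; a result of `False` there would suggest that the local binding is introduced by the compilation step rather than by the source.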
Testing
The fix has been tested in:
- Standard Python interpreter (3.13.1)
- Nuitka compiled executable
- Production Windows environment
- With various document types (PDF, DOCX, HTML)
Impact
Affected Users
- Anyone using docling in compiled/packaged applications
- Commercial applications requiring standalone executables
- Enterprise deployments using application bundlers
Severity
High - This completely blocks usage in compiled environments with no workaround except modifying the library code.
Additional Context
This issue was discovered during development of AI-Extractor, a commercial document extraction system. The bug manifests consistently in Nuitka-compiled environments but may also affect:
- PyInstaller
- cx_Freeze
- py2exe
- Any tool that modifies Python's import mechanism
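When collecting reports from other bundlers, it helps to record which kind of runtime the code is executing in. The helper below is a rough sketch with a hypothetical name, `describe_runtime`. It assumes, to the best of my knowledge, that Nuitka defines a module-level `__compiled__` name in compiled modules and that PyInstaller, cx_Freeze, and py2exe set `sys.frozen`; both assumptions should be checked against each tool's documentation.

```python
import sys


def describe_runtime(module_globals):
    """Classify the current runtime from a module's globals() dict."""
    # Assumption: Nuitka injects __compiled__ into compiled modules.
    if "__compiled__" in module_globals:
        return "nuitka"
    # Assumption: freezing tools such as PyInstaller set sys.frozen.
    if getattr(sys, "frozen", False):
        return "frozen"
    return "interpreted"


if __name__ == "__main__":
    print(f"Runtime: {describe_runtime(globals())}")
```

Passing `globals()` from the calling module keeps the helper usable whether or not the helper's own module was compiled.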
Proposed Pull Request
I can submit a PR with this fix if desired. The change is minimal (adding an alias) but resolves a critical issue for compiled deployments.
Verification Script
```python
#!/usr/bin/env python3
"""Test script to verify TypeAdapter issue and fix"""
import sys
import tempfile
from pathlib import Path


def test_typeadapter_import():
    """Test if TypeAdapter import works correctly"""
    try:
        # This mimics what docling_core does
        from pydantic import TypeAdapter

        # Test Path validation (like line 134)
        test_path = Path("/tmp/test.pdf")
        validated = TypeAdapter(Path).validate_python(test_path)
        print(f"✓ TypeAdapter works: {validated}")
        return True
    except UnboundLocalError as e:
        print(f"✗ UnboundLocalError: {e}")
        return False
    except Exception as e:
        print(f"✗ Other error: {e}")
        return False


def test_with_alias():
    """Test the proposed fix with alias"""
    try:
        from pydantic import TypeAdapter as _TypeAdapter

        test_path = Path("/tmp/test.pdf")
        validated = _TypeAdapter(Path).validate_python(test_path)
        print(f"✓ Alias fix works: {validated}")
        return True
    except Exception as e:
        print(f"✗ Alias fix failed: {e}")
        return False


if __name__ == "__main__":
    print("Testing TypeAdapter import issue...")
    print(f"Python: {sys.version}")
    print(f"Executable: {sys.executable}")
    print("-" * 50)

    # Test original approach
    print("1. Testing original import:")
    original_works = test_typeadapter_import()

    # Test fix
    print("\n2. Testing alias fix:")
    alias_works = test_with_alias()

    print("-" * 50)
    if not original_works and alias_works:
        print("CONFIRMED: Bug exists and fix works!")
    elif original_works:
        print("Cannot reproduce in this environment")
    else:
        print("Both approaches failed - different issue")
```
References
- Similar issue in other projects: pydantic/pydantic#8492 (example)
- Python scoping documentation: https://docs.python.org/3/reference/executionmodel.html
- Nuitka compilation effects: https://nuitka.net/doc/user-manual.html#differences-to-cpython
Contact
- Reporter: Greg Lamberson
- Project: AI-Extractor (https://github.com/lamco-admin/ai-extractor)
- Company: Lamco Development
Happy to provide additional testing or clarification as needed.