fix(elixir): resolve DOCX keyword extraction FunctionClauseError #313

Merged
Goldziher merged 1 commit into main from fix-issue-309 on Jan 18, 2026

Conversation

@Goldziher (Collaborator)

Fixes #309

Changes

Rust (crates/kreuzberg/src/extractors/docx.rs):

  • Parse comma-separated keywords from DOCX core properties into Vec<String>
  • Store in typed Metadata.keywords field instead of metadata.additional
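
The Rust-side change described above can be illustrated with a minimal sketch. This is not the code from docx.rs: the helper names `parse_keywords` and `metadata_from_core_keywords` are hypothetical, the `Metadata` struct is a one-field stand-in for the real type, and both the `Option<Vec<String>>` field type and the trimming/empty-entry rules are assumptions.

```rust
/// Hypothetical helper: split a comma-separated keywords string into
/// trimmed, non-empty entries. Trimming and dropping empty entries are
/// assumptions about the desired behavior, not confirmed by the PR.
fn parse_keywords(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|kw| !kw.is_empty())
        .map(str::to_string)
        .collect()
}

/// Simplified stand-in for the typed metadata struct. The real struct has
/// many more fields; the Option wrapper here is an assumption.
#[derive(Debug, Default, PartialEq)]
struct Metadata {
    keywords: Option<Vec<String>>,
}

/// Hypothetical constructor: store parsed keywords in the typed field
/// rather than as a raw string in a generic "additional" map.
/// A missing or effectively empty keywords property yields `None`.
fn metadata_from_core_keywords(raw: Option<&str>) -> Metadata {
    let keywords = raw.map(parse_keywords).filter(|kws| !kws.is_empty());
    Metadata { keywords }
}
```

Under these assumptions, a property value such as `"rust, docx ,, elixir "` becomes three clean entries, so every binding receives a list and never has to parse a delimited string itself.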

Elixir (packages/elixir/lib/kreuzberg/result.ex):

  • Add string handling clause to normalize_keywords/1
  • Parse comma-separated keyword strings into keyword map format

Tests:

  • Rust: test_docx_keywords_extraction in docx_metadata_extraction_test.rs
  • Elixir: 8 keyword parsing tests in extraction_result_test.exs

Root Cause

The DOCX extractor stored keywords as a single comma-separated string, but normalize_keywords/1 only had clauses for nil, [], and lists. A string argument matched no clause, so the call raised FunctionClauseError.
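
The failure mode can be sketched in Rust as an analogy. The actual fix lives in Elixir's normalize_keywords/1; the `RawKeywords` enum and the Rust function below are hypothetical, and the output here is a plain `Vec<String>`, not the keyword map format the Elixir binding produces. The sketch only shows the three input shapes and the string case that was previously unhandled.

```rust
/// Hypothetical model of the shapes a binding may receive for keywords.
#[derive(Debug, Clone, PartialEq)]
enum RawKeywords {
    /// No keywords present (nil in Elixir).
    Missing,
    /// Already a list of keywords (possibly empty).
    List(Vec<String>),
    /// A comma-separated string: the case that had no matching clause.
    Text(String),
}

/// Normalize every input shape to a list of keywords.
/// Rust's `match` must be exhaustive, so a forgotten variant is a compile
/// error; an Elixir multi-clause function instead raises
/// FunctionClauseError at runtime when no clause matches.
fn normalize_keywords(raw: RawKeywords) -> Vec<String> {
    match raw {
        RawKeywords::Missing => Vec::new(),
        RawKeywords::List(items) => items,
        RawKeywords::Text(text) => text
            .split(',')
            .map(str::trim)
            .filter(|kw| !kw.is_empty())
            .map(str::to_string)
            .collect(),
    }
}
```

Handling the string shape in the binding as well as in the extractor is a defensive choice: the binding keeps working even if some other source still hands it a delimited string.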

Fixed crash when extracting DOCX files with keywords metadata by
implementing proper keyword parsing at both Rust and Elixir layers.

Rust:

- Parse comma-separated keywords from DOCX core properties into Vec<String>
- Store in typed Metadata.keywords field instead of metadata.additional
- Ensures consistent data structure across all language bindings

Elixir:

- Add string handling clause to normalize_keywords/1 function
- Parse comma-separated keyword strings into expected keyword map format
- Provides defensive handling for keywords from any source

Tests:

- Added test_docx_keywords_extraction in docx_metadata_extraction_test.rs
  - Creates minimal DOCX with keywords metadata
  - Verifies parsing into Vec<String> in Metadata.keywords
- Added 8 keyword parsing tests in extraction_result_test.exs
  - Tests comma-separated strings, whitespace handling, edge cases
  - Regression tests for GitHub issue #309

Root cause:

DOCX extractor stored keywords as comma-separated strings in
metadata.additional["keywords"], but Elixir's normalize_keywords/1
only handled nil, [], and lists - causing FunctionClauseError.

@Goldziher Goldziher merged commit 835af94 into main Jan 18, 2026
22 of 57 checks passed
@Goldziher Goldziher deleted the fix-issue-309 branch January 18, 2026 12:18
Goldziher added a commit that referenced this pull request Feb 13, 2026
fix(elixir): resolve DOCX keyword extraction FunctionClauseError
Development

Successfully merging this pull request may close these issues.

bug: Kreuzberg fails to extract docx files using the Elixir binding