
Commit 5b2ba69

chore(release): v4.2.10
## Fixes

### Java Bindings
- Fix ClassCastException when deserializing nested generic collections (#355)
  - Added @JsonDeserialize annotations to PageStructure, FormattedBlock, Footnote, Attributes, PageHierarchy, PageContent, DjotContent
  - Added comprehensive JSON deserialization regression tests

### Python Bindings
- Fix Windows CLI binary missing from wheel (#349)
  - CI workflow was copying with wrong filename (kreuzberg.exe instead of kreuzberg-cli.exe)

### MIME Type Detection
- Fix DOCX/XLSX/PPTX detected as ZIP via detect_mime_type_from_bytes (#350)

### Java Bindings
- Fix format-specific metadata missing in getMetadataMap()
1 parent d77a24c commit 5b2ba69

88 files changed (+1031, -121 lines)

.github/workflows/ci-python.yaml

Lines changed: 3 additions & 2 deletions
@@ -152,10 +152,11 @@ jobs:
 echo "=== Building CLI Binary with Features ==="
 cargo build --release --package kreuzberg-cli --features all
 mkdir -p packages/python/kreuzberg
+# Copy with correct name (kreuzberg-cli) to match pyproject.toml include paths
 if [ "${{ runner.os }}" = "Windows" ]; then
-  cp target/release/kreuzberg.exe packages/python/kreuzberg/
+  cp target/release/kreuzberg.exe packages/python/kreuzberg/kreuzberg-cli.exe
 else
-  cp target/release/kreuzberg packages/python/kreuzberg/
+  cp target/release/kreuzberg packages/python/kreuzberg/kreuzberg-cli
 fi

 - name: Build FFI library
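
The rename in this hunk only helps if the binary actually ends up inside the built wheel. A minimal sketch of a post-build check follows, relying on the fact that wheels are ZIP archives. The helper name `wheel_contains` and the member path `kreuzberg/kreuzberg-cli.exe` are assumptions for illustration; they are not part of this commit, and the real layout inside the wheel may differ.

```python
import zipfile
from typing import BinaryIO, Union


def wheel_contains(wheel: Union[str, BinaryIO], member_suffix: str) -> bool:
    """Hypothetical helper: report whether any archive member ends with member_suffix.

    `wheel` may be a filesystem path or an open binary file object, since
    zipfile.ZipFile accepts both.
    """
    with zipfile.ZipFile(wheel) as archive:
        return any(name.endswith(member_suffix) for name in archive.namelist())
```

In CI this could be called on the built Windows wheel with `"kreuzberg/kreuzberg-cli.exe"` as the suffix (an assumed path), failing the job when it returns `False`.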

CHANGELOG.md

Lines changed: 8 additions & 0 deletions
@@ -9,13 +9,21 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+---
+
+## [4.2.10] - 2026-02-05
+
 ### Fixed

 #### MIME Type Detection
 - **DOCX/XLSX/PPTX files detected as ZIP via `detect_mime_type_from_bytes`**: Fixed Office Open XML files (DOCX, XLSX, PPTX) being incorrectly detected as `application/zip` when using bytes-based MIME detection. The function now inspects ZIP contents for Office format markers (`word/document.xml`, `xl/workbook.xml`, `ppt/presentation.xml`) to correctly identify these formats. (#350)

 #### Java Bindings
 - **Format-specific metadata missing in `getMetadataMap()`**: Fixed `sheet_count`, `sheet_names`, and other format-specific metadata fields not being accessible via `ExtractionResult.getMetadataMap()`. The `ResultParser.buildMetadata()` method now properly propagates flattened format metadata (e.g., Excel, PPTX) to the `Metadata.additional` map.
+- **ClassCastException when deserializing nested generic collections**: Fixed `LinkedHashMap cannot be cast to PageStructure` and similar errors when deserializing JSON with nested `List<T>` fields. Added `@JsonDeserialize(contentAs = ...)` annotations to all model classes with generic list fields (`PageStructure`, `FormattedBlock`, `Footnote`, `Attributes`, `PageHierarchy`, `PageContent`, `DjotContent`) to preserve type information during Jackson deserialization. (#355)
+
+#### Python Bindings
+- **Windows CLI binary still missing from wheel**: Fixed CI workflow copying CLI binary with wrong filename (`kreuzberg.exe` instead of `kreuzberg-cli.exe`), causing the binary to be excluded from Windows wheels despite the v4.2.9 build.py fix. The CI now copies with the correct name to match pyproject.toml include paths. (#349)

 ---
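
The MIME Type Detection entry in this changelog hunk describes inspecting ZIP contents for Office marker entries. The idea can be sketched in Python as below. This is an illustration only, not the Rust code behind `detect_mime_type_from_bytes`: the helper name `sniff_zip_mime` is hypothetical, and the marker-to-MIME mapping uses the standard registered Office Open XML media types rather than anything shown in this commit.

```python
import io
import zipfile
from typing import Optional

# Marker entries named in the changelog, mapped to the standard OOXML media types.
OOXML_MARKERS = {
    "word/document.xml": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "xl/workbook.xml": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "ppt/presentation.xml": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def sniff_zip_mime(data: bytes) -> Optional[str]:
    """Hypothetical helper: refine a ZIP detection into DOCX/XLSX/PPTX when possible.

    Returns None when the bytes do not start with a ZIP local-file header or
    the archive cannot be read, so a caller can fall back to other checks.
    """
    if not data.startswith(b"PK\x03\x04"):
        return None
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = set(archive.namelist())
    except zipfile.BadZipFile:
        return None
    # The first marker present decides the format; a plain archive has none.
    for marker, mime in OOXML_MARKERS.items():
        if marker in names:
            return mime
    return "application/zip"
```

Checking the signature alone is what produces the `application/zip` misdetection, because every OOXML file is a ZIP container; the entry listing is what distinguishes the formats.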

Cargo.lock

Lines changed: 13 additions & 13 deletions
Generated file; diff not rendered by default.

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ kreuzberg = { path = "crates/kreuzberg" }
 kreuzberg-tesseract = { path = "crates/kreuzberg-tesseract" }

 [workspace.package]
-version = "4.2.9"
+version = "4.2.10"
 edition = "2024"
 rust-version = "1.91"
 authors = ["Na'aman Hirschfeld <nhirschfeld@gmail.com>"]

README.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@
 <img src="https://img.shields.io/maven-central/v/dev.kreuzberg/kreuzberg?label=Java&color=007ec6" alt="Java">
 </a>
 <a href="https://github.com/kreuzberg-dev/kreuzberg/releases">
-<img src="https://img.shields.io/github/v/tag/kreuzberg-dev/kreuzberg?label=Go&color=007ec6&filter=v4.2.9" alt="Go">
+<img src="https://img.shields.io/github/v/tag/kreuzberg-dev/kreuzberg?label=Go&color=007ec6&filter=v4.2.10" alt="Go">
 </a>
 <a href="https://www.nuget.org/packages/Kreuzberg/">
 <img src="https://img.shields.io/nuget/v/Kreuzberg?label=C%23&color=007ec6" alt="C#">

composer.json

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 {
   "name": "kreuzberg/kreuzberg",
   "description": "High-performance document intelligence for PHP. Extract text, metadata, and structured information from PDFs, Office documents, images, and 56 formats. Powered by Rust core for 10-50x speed improvements.",
-  "version": "4.2.9",
+  "version": "4.2.10",
   "type": "php-ext",
   "license": "MIT",
   "keywords": [

crates/kreuzberg-cli/Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ keywords = ["document", "extraction", "cli", "tool", "parser"]
 categories = ["command-line-utilities", "text-processing"]

 [dependencies]
-kreuzberg = { path = "../kreuzberg", version = "4.2.9", features = ["cli", "bundled-pdfium"] }
+kreuzberg = { path = "../kreuzberg", version = "4.2.10", features = ["cli", "bundled-pdfium"] }
 clap = { workspace = true }
 tokio = { workspace = true }
 anyhow = { workspace = true }

crates/kreuzberg-node/README.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@
 <img src="https://img.shields.io/maven-central/v/dev.kreuzberg/kreuzberg?label=Java&color=007ec6" alt="Java">
 </a>
 <a href="https://github.com/kreuzberg-dev/kreuzberg/releases">
-<img src="https://img.shields.io/github/v/tag/kreuzberg-dev/kreuzberg?label=Go&color=007ec6&filter=v4.2.9" alt="Go">
+<img src="https://img.shields.io/github/v/tag/kreuzberg-dev/kreuzberg?label=Go&color=007ec6&filter=v4.2.10" alt="Go">
 </a>
 <a href="https://www.nuget.org/packages/Kreuzberg/">
 <img src="https://img.shields.io/nuget/v/Kreuzberg?label=C%23&color=007ec6" alt="C#">

crates/kreuzberg-node/npm/darwin-arm64/package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "@kreuzberg/node-darwin-arm64",
-  "version": "4.2.9",
+  "version": "4.2.10",
   "cpu": [
     "arm64"
   ],

crates/kreuzberg-node/npm/darwin-x64/package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "@kreuzberg/node-darwin-x64",
-  "version": "4.2.9",
+  "version": "4.2.10",
   "cpu": [
     "x64"
   ],
