
Commit 112c946

chore: bounding box typing parity
1 parent d15da49 commit 112c946


96 files changed (+373, -168 lines)



CHANGELOG.md

Lines changed: 5 additions & 1 deletion
@@ -7,7 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ---
 
-## [Unreleased]
+## [4.3.5]
 
 ### Added
 
@@ -22,9 +22,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **Image placeholder injection in PDF markdown output**: Image references are inserted with OCR text as blockquotes at correct vertical position matching the image's bounding box.
 - **`render_document_as_markdown_with_tables()` function**: New public function for table-aware markdown rendering that embeds tables inline at correct positions and injects image placeholders. Used internally by `render_document_as_markdown()`.
 - **`inject_image_placeholders()` function**: New post-processing function for markdown that injects `![Image description]()` placeholders and OCR text blockquotes at correct vertical positions in the content.
+- **`bounding_box` field in all language bindings**: Added `bounding_box` (optional `BoundingBox`) to `Table` and `ExtractedImage` types across all 10 language bindings: Python, TypeScript (Node/Core/WASM), Ruby, PHP, Go, Java, C#, and Elixir.
 
 ### Fixed
 
+- **Pipeline test flakiness**: Disabled post-processing in pipeline tests that don't test post-processing, fixing `test_pipeline_without_chunking` and related tests that failed due to global processor cache poisoning in parallel execution.
+- **PHP FFI bridge missing `bounding_box`**: The PHP Rust bridge (`kreuzberg-php`) was not passing `bounding_box` through for `Table` or `ExtractedImage`, causing the field to always be null despite being defined in the PHP user-facing types.
+
 - **PaddleOCR dict index offset causing wrong character recognition (#395)**: `read_keys_from_file()` was missing the CTC blank token (`#`) at index 0 and the space token at the end, causing off-by-one character mapping errors. Now matches the `get_keys()` layout used for embedded models.
 - **PaddleOCR angle classifier misfiring on short text (#395)**: Changed `use_angle_cls` default from `true` to `false`. The angle classifier can misfire on short text regions (e.g., 2-3 character table cells), rotating crops incorrectly before recognition. Users can re-enable via `PaddleOcrConfig::with_angle_cls(true)` for rotated documents.
 - **PaddleOCR excessive padding including table gridlines (#395)**: Reduced default detection padding from 50px to 10px and made it configurable via `PaddleOcrConfig::with_padding()`. Large padding on small images caused table gridlines to be included in text crops.
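The Added entries on image placeholders describe placing each image among the text according to the vertical position of its `bounding_box`. A minimal Rust sketch of that ordering idea follows. It is not Kreuzberg's implementation: `BoundingBox`, `TextBlock`, `ImageInfo`, `render_placeholder`, and `inject_placeholders` are hypothetical names, and the sketch assumes each text block already carries its top y coordinate with y growing downward from the top of the page (PDF user space grows upward by default, so real code would need to flip it).

```rust
/// Hypothetical axis-aligned box; assumes y grows downward from the page top.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BoundingBox {
    x0: f64,
    y0: f64,
    x1: f64,
    y1: f64,
}

/// Hypothetical text block tagged with the y coordinate where it starts.
struct TextBlock {
    top: f64,
    markdown: String,
}

/// Hypothetical image record; `bounding_box` is optional, as in the changelog.
struct ImageInfo {
    bounding_box: Option<BoundingBox>,
    ocr_text: Option<String>,
}

/// Renders the placeholder, followed by any OCR text as a blockquote.
fn render_placeholder(image: &ImageInfo) -> String {
    let mut out = String::from("![Image description]()");
    if let Some(text) = image.ocr_text.as_deref().filter(|t| !t.trim().is_empty()) {
        for line in text.lines() {
            out.push_str("\n> ");
            out.push_str(line);
        }
    }
    out
}

/// Merges text blocks and image placeholders in top-to-bottom order.
/// Images without a bounding box cannot be positioned, so they sort last.
fn inject_placeholders(blocks: &[TextBlock], images: &[ImageInfo]) -> String {
    // (top, rank, rendered): rank 0 for text so text wins ties at the same y.
    let mut items: Vec<(f64, u8, String)> = blocks
        .iter()
        .map(|b| (b.top, 0, b.markdown.clone()))
        .collect();
    for image in images {
        let top = image.bounding_box.map_or(f64::INFINITY, |b| b.y0);
        items.push((top, 1, render_placeholder(image)));
    }
    // sort_by is stable, so equal keys keep their input order.
    items.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.cmp(&b.1)));
    items
        .into_iter()
        .map(|(_, _, md)| md)
        .collect::<Vec<_>>()
        .join("\n\n")
}

fn main() {
    let blocks = vec![
        TextBlock { top: 10.0, markdown: "# Title".into() },
        TextBlock { top: 300.0, markdown: "Closing paragraph.".into() },
    ];
    let images = vec![ImageInfo {
        bounding_box: Some(BoundingBox { x0: 50.0, y0: 120.0, x1: 250.0, y1: 280.0 }),
        ocr_text: Some("Figure 1: revenue".into()),
    }];
    println!("{}", inject_placeholders(&blocks, &images));
}
```

Sending images with `bounding_box: None` to the end is a choice made for this sketch only; the changelog does not say how unpositioned images are handled.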
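The #395 dictionary fix concerns index alignment. A CTC recognizer reserves output index 0 for the blank symbol, so a dictionary loaded without that slot shifts every character by one. The sketch below builds the key layout the changelog describes and decodes a best-path index sequence with standard greedy CTC (collapse repeats, then drop blanks). `build_keys` and `ctc_greedy_decode` are hypothetical helpers, not the crate's `read_keys_from_file()` or `get_keys()`.

```rust
/// Builds a key table in the layout described for #395:
/// index 0 = CTC blank ("#"), then one key per dictionary line,
/// then a trailing space token. Hypothetical helper, not the crate's code.
fn build_keys(dict_contents: &str) -> Vec<String> {
    let mut keys = vec!["#".to_string()]; // CTC blank occupies index 0
    keys.extend(dict_contents.lines().map(str::to_string)); // one key per line
    keys.push(" ".to_string()); // space token at the end
    keys
}

/// Greedy (best-path) CTC decoding: collapse consecutive repeats,
/// drop blanks (index 0), and map the remaining indices to keys.
fn ctc_greedy_decode(best_path: &[usize], keys: &[String]) -> String {
    let mut out = String::new();
    let mut prev: Option<usize> = None;
    for &idx in best_path {
        if idx != 0 && Some(idx) != prev {
            if let Some(key) = keys.get(idx) {
                out.push_str(key);
            }
        }
        prev = Some(idx);
    }
    out
}

fn main() {
    let keys = build_keys("a\nb\nc\n"); // ["#", "a", "b", "c", " "]
    // Per-timestep argmax indices from a recognizer (made-up example values).
    let path = [1, 1, 0, 2, 4, 3, 3];
    println!("{:?}", ctc_greedy_decode(&path, &keys)); // "ab c"
}
```

Without the blank at index 0, index 1 would resolve to `b` instead of `a`, which is the kind of off-by-one mapping the entry reports.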
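The remaining two #395 entries change defaults: the angle classifier is now off, and detection padding drops from 50px to 10px. The sketch below uses a hypothetical `CropConfig` as a stand-in for `PaddleOcrConfig`. Its builder signatures and the `padded_crop` helper are assumptions, not the crate's API. The example shows why a large pad hurts small images: expanding a small table-cell box by 50px, clamped to a 120x80 image, produces a crop covering the entire image, gridlines included.

```rust
/// Hypothetical stand-in for the settings named in the changelog.
#[derive(Debug, Clone, Copy, PartialEq)]
struct CropConfig {
    use_angle_cls: bool,
    padding: u32,
}

impl Default for CropConfig {
    fn default() -> Self {
        // Defaults as stated for 4.3.5: classifier off, 10px padding.
        Self { use_angle_cls: false, padding: 10 }
    }
}

impl CropConfig {
    fn with_angle_cls(mut self, enabled: bool) -> Self {
        self.use_angle_cls = enabled;
        self
    }

    fn with_padding(mut self, padding: u32) -> Self {
        self.padding = padding;
        self
    }
}

/// Expands a detected box (x0, y0, x1, y1) by `padding` pixels per side,
/// clamped so the crop never extends past the image edges.
fn padded_crop(rect: (u32, u32, u32, u32), width: u32, height: u32, padding: u32) -> (u32, u32, u32, u32) {
    let (x0, y0, x1, y1) = rect;
    (
        x0.saturating_sub(padding),
        y0.saturating_sub(padding),
        x1.saturating_add(padding).min(width),
        y1.saturating_add(padding).min(height),
    )
}

fn main() {
    let cfg = CropConfig::default();
    // A small table cell detected at (40, 40)-(70, 55) in a 120x80 image.
    let cell = (40, 40, 70, 55);
    println!("default:  {:?}", padded_crop(cell, 120, 80, cfg.padding)); // (30, 30, 80, 65)
    println!("old 50px: {:?}", padded_crop(cell, 120, 80, 50)); // (0, 0, 120, 80)
    // Opting back in for rotated documents, as the changelog suggests.
    let rotated_docs = CropConfig::default().with_angle_cls(true).with_padding(20);
    println!("{:?}", rotated_docs);
}
```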

Cargo.lock

Lines changed: 14 additions & 14 deletions

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ kreuzberg-paddle-ocr = { path = "crates/kreuzberg-paddle-ocr" }
 kreuzberg-pdfium-render = { path = "crates/kreuzberg-pdfium-render" }
 
 [workspace.package]
-version = "4.3.4"
+version = "4.3.5"
 edition = "2024"
 rust-version = "1.91"
 authors = ["Na'aman Hirschfeld <nhirschfeld@gmail.com>"]

README.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@
 <img src="https://img.shields.io/maven-central/v/dev.kreuzberg/kreuzberg?label=Java&color=007ec6" alt="Java">
 </a>
 <a href="https://github.com/kreuzberg-dev/kreuzberg/releases">
-<img src="https://img.shields.io/github/v/tag/kreuzberg-dev/kreuzberg?label=Go&color=007ec6&filter=v4.3.4" alt="Go">
+<img src="https://img.shields.io/github/v/tag/kreuzberg-dev/kreuzberg?label=Go&color=007ec6&filter=v4.3.5" alt="Go">
 </a>
 <a href="https://www.nuget.org/packages/Kreuzberg/">
 <img src="https://img.shields.io/nuget/v/Kreuzberg?label=C%23&color=007ec6" alt="C#">

composer.json

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 {
 "name": "kreuzberg/kreuzberg",
 "description": "High-performance document intelligence for PHP. Extract text, metadata, and structured information from PDFs, Office documents, images, and 75 formats. Powered by Rust core for 10-50x speed improvements.",
-"version": "4.3.4",
+"version": "4.3.5",
 "type": "php-ext",
 "license": "MIT",
 "keywords": [

crates/kreuzberg-cli/Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ keywords = ["document", "extraction", "cli", "tool", "parser"]
 categories = ["command-line-utilities", "text-processing"]
 
 [dependencies]
-kreuzberg = { path = "../kreuzberg", version = "4.3.4", features = ["cli"] }
+kreuzberg = { path = "../kreuzberg", version = "4.3.5", features = ["cli"] }
 clap = { workspace = true }
 tokio = { workspace = true }
 anyhow = { workspace = true }

crates/kreuzberg-node/README.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@
 <img src="https://img.shields.io/maven-central/v/dev.kreuzberg/kreuzberg?label=Java&color=007ec6" alt="Java">
 </a>
 <a href="https://github.com/kreuzberg-dev/kreuzberg/releases">
-<img src="https://img.shields.io/github/v/tag/kreuzberg-dev/kreuzberg?label=Go&color=007ec6&filter=v4.3.4" alt="Go">
+<img src="https://img.shields.io/github/v/tag/kreuzberg-dev/kreuzberg?label=Go&color=007ec6&filter=v4.3.5" alt="Go">
 </a>
 <a href="https://www.nuget.org/packages/Kreuzberg/">
 <img src="https://img.shields.io/nuget/v/Kreuzberg?label=C%23&color=007ec6" alt="C#">

crates/kreuzberg-node/npm/darwin-arm64/package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
 "name": "@kreuzberg/node-darwin-arm64",
-"version": "4.3.4",
+"version": "4.3.5",
 "cpu": [
 "arm64"
 ],

crates/kreuzberg-node/npm/darwin-x64/package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
 "name": "@kreuzberg/node-darwin-x64",
-"version": "4.3.4",
+"version": "4.3.5",
 "cpu": [
 "x64"
 ],

crates/kreuzberg-node/npm/linux-arm-gnueabihf/package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
 "name": "@kreuzberg/node-linux-arm-gnueabihf",
-"version": "4.3.4",
+"version": "4.3.5",
 "cpu": [
 "arm"
 ],
