@@ -9,6 +9,90 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+### Added
+
+#### API
+- **POST /chunk endpoint**: New text chunking endpoint for breaking text into smaller pieces
+  - Accepts JSON body with `text`, `chunker_type` (text/markdown), and optional `config`
+  - Returns chunks with byte offsets, indices, and metadata
+  - Configuration options: `max_characters` (default: 2000), `overlap` (default: 100), `trim` (default: true)
+  - Supports both text and markdown chunking strategies
+  - Case-insensitive `chunker_type` parameter
+  - Comprehensive error handling for invalid inputs
+
+#### Core
+- **Element-based output format**: New `OutputFormat::ElementBased` option provides Unstructured.io-compatible semantic element extraction
+  - Extracts structured elements: titles, paragraphs, lists, tables, images, page breaks, headings, code blocks, block quotes, headers, footers
+  - Each element includes rich metadata: bounding boxes, page numbers, confidence scores, hierarchy information
+  - Transformation pipeline converts unified output to element-based format via `extraction::transform` module
+  - Added `Element`, `ElementType`, `ElementMetadata`, and `BoundingBox` types to core types module
+  - Supports PDF hierarchy detection for semantic heading levels
+  - Configuration via `config.output_format` field (defaults to `Unified`)
+
+#### Language Bindings
+- **Python**: Element-based output support with full type hints
+  - New `output_format` parameter in extraction config accepting `"unified"` or `"element_based"`
+  - `Element`, `ElementType`, `ElementMetadata`, `BoundingBox` types exported from `kreuzberg.types`
+  - Result includes `elements` field when using element-based format
+  - Compatible with Unstructured.io API for migration
+
+- **TypeScript/Node.js**: Element-based output with strict TypeScript interfaces
+  - `Element`, `ElementType`, `ElementMetadata`, `BoundingBox` interfaces in `@kreuzberg/core`
+  - `outputFormat: "unified" | "element_based"` configuration option
+  - Result type includes optional `elements` array
+
+- **Ruby**: Element-based output with idiomatic Ruby types
+  - `Element`, `ElementType`, `ElementMetadata`, `BoundingBox` classes in `Kreuzberg::Types`
+  - Snake_case serialization for Ruby conventions
+  - `output_format: :unified` or `:element_based` symbol-based configuration
+
+- **PHP**: Element-based output with typed classes
+  - `Element`, `ElementType`, `ElementMetadata`, `BoundingBox` classes in `Kreuzberg\Types`
+  - `outputFormat` field in extraction config
+  - `$result->elements` array when using element-based format
+
+- **Go**: Element-based output with idiomatic Go structs
+  - `Element`, `ElementType`, `ElementMetadata`, `BoundingBox` types with JSON tags
+  - `OutputFormat` field in extraction config
+  - Result struct includes `Elements` slice
+
+- **Java**: Element-based output with builder pattern
+  - `Element`, `ElementType`, `ElementMetadata`, `BoundingBox` classes with builders
+  - `outputFormat` field in `ExtractionConfig`
+  - `ExtractionResult.getElements()` method
+
+- **C#**: Element-based output with nullable reference types
+  - `Element`, `ElementType`, `ElementMetadata`, `BoundingBox` classes
+  - `OutputFormat` property in extraction config
+  - `ExtractionResult.Elements` property
+
+- **Elixir**: Element-based output with pattern matching
+  - `Kreuzberg.Element` module with typespecs
+  - `:output_format` option in config accepting `:unified` or `:element_based`
+  - Result map includes `:elements` key with element list
+
+- **WASM**: Element-based output with TypeScript definitions
+  - Element types exported to WASM TypeScript bindings
+  - `output_format` configuration option
+  - Elements accessible from extraction result
+
+#### Documentation
+- **Migration guides**: New documentation for Unstructured.io users
+  - `docs/migration/from-unstructured.md`: Step-by-step migration guide with code examples
+  - `docs/comparisons/kreuzberg-vs-unstructured.md`: Feature comparison and compatibility matrix
+- Element-based output guide: `docs/guides/element-based-output.md` covering all 11 element types
+- Type reference updates: Added `Element`, `ElementType`, `ElementMetadata`, `BoundingBox`, `OutputFormat`
+- Code snippets for element-based extraction in all 10 languages
+
+### Fixed
+
+#### Python
+- **Type exports**: Fixed missing type exports in `kreuzberg.types.__all__`
+  - Added `Element`, `ElementMetadata`, `ElementType`, `BoundingBox` to exported types
+  - Added `HtmlImageMetadata` for HTML image metadata
+  - A total of 32 public types are now exported for IDE autocomplete and type checking
+  - Resolves import failures where types were defined but not accessible
+
 ---

 ## [4.0.8] - 2026-01-17
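The `POST /chunk` entry in the changelog above lists the request fields and config defaults but shows no request. The Python sketch below is one way to picture it, under stated assumptions: `build_chunk_request` assembles a JSON body using only the field names and defaults given in the changelog, and `naive_chunks` is a hypothetical fixed-window chunker that illustrates how `max_characters`, `overlap`, and `trim` could interact and how byte offsets differ from character offsets. It is not Kreuzberg's chunking algorithm, and the output keys (`index`, `content`, `byte_start`, `byte_end`) are illustrative names, not the documented response schema.

```python
import json

# Defaults as listed in the changelog entry for POST /chunk.
DEFAULT_CONFIG = {"max_characters": 2000, "overlap": 100, "trim": True}
CHUNKER_TYPES = {"text", "markdown"}


def build_chunk_request(text, chunker_type="text", **config):
    """Build a JSON body for POST /chunk (field names taken from the changelog)."""
    # The endpoint is described as case-insensitive; normalizing here is optional.
    normalized = chunker_type.lower()
    if normalized not in CHUNKER_TYPES:
        raise ValueError(f"unsupported chunker_type: {chunker_type!r}")
    unknown = set(config) - set(DEFAULT_CONFIG)
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return json.dumps({
        "text": text,
        "chunker_type": normalized,
        "config": {**DEFAULT_CONFIG, **config},
    })


def naive_chunks(text, max_characters=2000, overlap=100, trim=True):
    """Hypothetical fixed-window chunker; NOT Kreuzberg's actual algorithm."""
    if overlap >= max_characters:
        raise ValueError("overlap must be smaller than max_characters")
    step = max_characters - overlap  # each window re-reads `overlap` characters
    chunks = []
    start = 0
    while start < len(text):
        piece = text[start:start + max_characters]
        # Count leading whitespace removed by trimming so offsets stay accurate.
        lead = len(piece) - len(piece.lstrip()) if trim else 0
        content = piece.strip() if trim else piece
        # Byte offsets are measured in UTF-8, so they differ from character
        # offsets whenever the text contains non-ASCII characters.
        byte_start = len(text[:start + lead].encode("utf-8"))
        chunks.append({
            "index": len(chunks),
            "content": content,
            "byte_start": byte_start,
            "byte_end": byte_start + len(content.encode("utf-8")),
        })
        if start + max_characters >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

A call such as `build_chunk_request("some text", "Markdown", overlap=50)` yields a body with `chunker_type` normalized to `"markdown"` and the remaining config values left at their listed defaults.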