feat: Add comprehensive Djot markup support with configurable output formats#312
Merged
The maven-gpg-plugin was attempting to sign artifacts during local builds (mvn clean install), causing setup failures when GPG is not configured. GPG signing is now skipped by default and only enabled when using the 'publish' profile, allowing the project to work out of the box for local development.
Add a new DjotExtractor that parses Djot markup documents using the jotdown crate. Djot is a modern markup language with simpler parsing rules than CommonMark.

Features:
- YAML frontmatter metadata extraction
- Table extraction as structured data
- Heading structure preservation
- Code block and link extraction
- Smart punctuation handling

The implementation follows the same pattern as the Markdown extractor, making it consistent with the existing codebase.

MIME types: text/djot, text/x-djot

Closes #262
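The frontmatter step listed above can be illustrated with a minimal sketch. It only separates a leading `---` delimited block from the document body; parsing the YAML itself is left out. The function name `split_frontmatter`, the `---` delimiter convention, and the LF-only first line are assumptions for illustration, not the code merged in this PR.

```rust
/// Hypothetical helper (not from the PR): split a leading `---` delimited
/// frontmatter block from the rest of a document.
/// Returns (frontmatter, body); frontmatter is None when no block is found.
pub fn split_frontmatter(input: &str) -> (Option<&str>, &str) {
    // Assumption: frontmatter must begin on the very first line.
    let Some(rest) = input.strip_prefix("---\n") else {
        return (None, input);
    };

    // Walk the remaining lines, tracking the byte offset of each line start,
    // until a line consisting only of `---` closes the block.
    let mut offset = 0;
    for line in rest.split_inclusive('\n') {
        if line.trim_end() == "---" {
            let yaml = &rest[..offset];
            let body = &rest[offset + line.len()..];
            return (Some(yaml), body);
        }
        offset += line.len();
    }

    // No closing delimiter: treat the whole input as body.
    (None, input)
}
```

The returned frontmatter slice could then be handed to a YAML parser, and the body slice to the Djot parser.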
Move Djot extractor to its own feature flag since it only needs jotdown and serde_yaml_ng (already a core dep), without requiring the full office feature dependencies.

- Add `djot` feature with just `dep:jotdown` + `tokio-runtime`
- Include `djot` in the `full` feature
- Update all cfg attributes from `office` to `djot`
…onfiguration

Add full djot extraction and output format support:
- Add OutputFormat enum (Plain, Markdown, Djot, Html) to ExtractionConfig
- Add --content-format CLI flag for extract and batch commands
- Add KREUZBERG_OUTPUT_FORMAT environment variable support
- Implement 100% djot feature extraction including:
  - Block elements: blockquotes, lists, code blocks, divs, sections
  - Inline elements: strong, emphasis, links, images, spans
  - Attributes system with classes, IDs, and key-value pairs
  - Footnotes, math blocks, raw content
- Add djot generation functions for output format conversion
- Create frontmatter_utils.rs for shared YAML frontmatter handling
- Wire output_format through extraction pipeline
- Add djot_content field to ExtractionResult for structured djot data

Closes #263
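To make the output format configuration above concrete, here is a rough sketch of how such an enum might be parsed and resolved. Only the variant names (Plain, Markdown, Djot, Html) come from the commit message. The accepted strings, the `Plain` default, the rule that an explicit flag value wins over the environment value, and the helper name `resolve_output_format` are all assumptions for illustration and may differ from the merged code.

```rust
use std::str::FromStr;

/// Variant names are taken from the commit message; the default is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum OutputFormat {
    #[default]
    Plain,
    Markdown,
    Djot,
    Html,
}

impl FromStr for OutputFormat {
    type Err = String;

    /// Case-insensitive parsing; the accepted spellings are assumed.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "plain" => Ok(Self::Plain),
            "markdown" => Ok(Self::Markdown),
            "djot" => Ok(Self::Djot),
            "html" => Ok(Self::Html),
            other => Err(format!("unknown output format: {other}")),
        }
    }
}

/// Hypothetical helper: pick the format from an optional CLI flag value and
/// an optional environment value. Assumed precedence: flag, then env, then default.
pub fn resolve_output_format(
    cli_flag: Option<&str>,
    env_value: Option<&str>,
) -> Result<OutputFormat, String> {
    match cli_flag.or(env_value) {
        Some(raw) => raw.parse(),
        None => Ok(OutputFormat::default()),
    }
}
```

A caller could obtain the environment value with `std::env::var("KREUZBERG_OUTPUT_FORMAT").ok()` and pass it in via `.as_deref()`. Keeping the resolution logic free of direct environment access makes it easy to test.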
Comment out broken links to non-existent benchmark pages to fix mkdocs strict mode build. Benchmark documentation will be added in the future.
- Update format count from 56 to 57
- Add djot to text & markdown formats table
- Add output format examples and configuration
- Update features documentation
Added comprehensive changelog entries for:
- Djot markup format support with full feature list
- Content output format configuration (Plain/Markdown/Djot/HTML)
- Language bindings updates for all platforms
- Documentation updates and fixes
- Clarified distinction between result_format and content_format
1f9b791 to a0e19ae
Goldziher added a commit that referenced this pull request on Feb 13, 2026:
feat: Add comprehensive Djot markup support with configurable output formats
Summary
Adds comprehensive support for Djot markup language with configurable output format conversion for all file types.
Features
.djot files with full syntax support (headings, lists, tables, code blocks, emphasis, links, images, footnotes, math expressions, smart punctuation)
Usage
kreuzberg extract document.pdf --content-format djot
KREUZBERG_OUTPUT_FORMAT=djot kreuzberg extract file.docx
kreuzberg batch *.pdf --content-format djot --format json
Changes
Test Results
✅ All 39 Djot tests passing
✅ MkDocs build successful
✅ Integration tests passing
Closes #263