feat: Add support for Markdown files #496

Merged
ali-sedaghatbaf merged 37 commits into main from md-support on Mar 27, 2026

@ali-sedaghatbaf ali-sedaghatbaf commented Mar 24, 2026

Description

This pull request is a major refactor of the document loading system in the experimental SimpleKG pipeline. It adds support for Markdown files and standardizes terminology and interfaces across the codebase and documentation. The legacy from_pdf and pdf_loader parameters are replaced by the more general from_file and file_loader, and the documentation and examples are updated to match. Backwards compatibility is preserved: the old parameter names are still accepted but emit deprecation warnings.

  • The SimpleKG pipeline now supports Markdown (.md, .markdown) files in addition to PDFs when using the default FileLoader (CHANGELOG.md).
  • The parameters and configuration keys from_pdf and pdf_loader are replaced by from_file and file_loader throughout the codebase, configuration files, and documentation. Legacy names are still accepted with a deprecation warning (user_guide_kg_builder.rst, README.md, multiple example scripts and config files).
  • The PdfLoader and related classes are moved from pdf_loader to the new data_loader module, which also introduces MarkdownLoader and a unified FileLoader. All relevant imports and documentation are updated (api.rst, example scripts).
  • The LoadedDocument type replaces PdfDocument as the standard output for data loaders, and documentation is updated to reflect this (types.rst, example scripts).
  • A new exception, UnsupportedDocumentFormatError, is introduced for unsupported file types in the document loader (exceptions.py).
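
To make the behaviour described above concrete, here is a minimal, self-contained sketch of the two patterns involved: accepting a deprecated parameter name with a warning, and choosing a loader from the file extension. It is an illustration only, not the code merged in this PR. The helper names `resolve_from_file` and `pick_loader`, the `LOADER_BY_SUFFIX` table, the default value, and the locally defined exception class are all assumptions made for the example.

```python
import warnings
from pathlib import Path
from typing import Optional


class UnsupportedDocumentFormatError(Exception):
    """Local stand-in for the exception named in this PR (not the library class)."""


# Hypothetical suffix table; the real FileLoader may dispatch differently.
LOADER_BY_SUFFIX = {
    ".pdf": "PdfLoader",
    ".md": "MarkdownLoader",
    ".markdown": "MarkdownLoader",
}


def resolve_from_file(
    from_file: Optional[bool] = None,
    from_pdf: Optional[bool] = None,
) -> bool:
    """Map the legacy 'from_pdf' name onto 'from_file', warning when it is used."""
    if from_pdf is not None:
        warnings.warn(
            "'from_pdf' is deprecated; use 'from_file' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # The new name wins if both are supplied.
        if from_file is None:
            from_file = from_pdf
    # Default chosen for this sketch only.
    return True if from_file is None else from_file


def pick_loader(path: str) -> str:
    """Return the name of the loader that would handle this path."""
    suffix = Path(path).suffix.lower()
    try:
        return LOADER_BY_SUFFIX[suffix]
    except KeyError:
        raise UnsupportedDocumentFormatError(
            f"Unsupported document format: {suffix or '(none)'}"
        ) from None
```

With this sketch, `pick_loader("notes.md")` selects the Markdown loader, `pick_loader("report.docx")` raises the format error, and `resolve_from_file(from_pdf=False)` returns `False` while emitting a `DeprecationWarning`.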

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Documentation update
  • Project configuration change

Complexity

Complexity: low

How Has This Been Tested?

  • Unit tests
  • E2E tests
  • Manual tests

Checklist

The following requirements should have been met (depending on the changes in the branch):

  • Documentation has been updated
  • Unit tests have been updated
  • E2E tests have been updated
  • Examples have been updated
  • New files have copyright header
  • CLA (https://neo4j.com/developer/cla/) has been signed
  • CHANGELOG.md updated if appropriate

@ali-sedaghatbaf ali-sedaghatbaf requested a review from a team as a code owner March 24, 2026 14:32
@ali-sedaghatbaf ali-sedaghatbaf changed the title from "Md support" to "Add support for Markdown files" Mar 24, 2026
@ali-sedaghatbaf ali-sedaghatbaf marked this pull request as draft March 24, 2026 14:33
@ali-sedaghatbaf ali-sedaghatbaf marked this pull request as ready for review March 24, 2026 15:17
@ali-sedaghatbaf ali-sedaghatbaf changed the title from "Add support for Markdown files" to "feat: Add support for Markdown files" Mar 24, 2026

@jonnylaw jonnylaw left a comment

Good stuff 🚀

A few tests are missing, and the MarkdownLoader currently raises PdfLoaderError.

@ali-sedaghatbaf ali-sedaghatbaf merged commit 99ce46c into main Mar 27, 2026
14 checks passed
@ali-sedaghatbaf ali-sedaghatbaf deleted the md-support branch March 27, 2026 07:50

3 participants