Efficiently rename header handling in ExperimentalMarkdownSyntaxTextSplitter for improved LLM understanding #26970
Unanswered
david101-hunter
asked this question in
Q&A
Replies: 0 comments
Checked other resources
Commit to Help
Example Code
Description
Background
I'm working with the ExperimentalMarkdownSyntaxTextSplitter from the LangChain library to prepare text for processing by an LLM (large language model). I want to improve how the splitter handles Markdown headers (h1 to h6) so that the LLM better understands the document structure.
In practice, my experiments show that the results are not very good yet: the LLM often gives incomplete answers when the response requires multiple steps, ...
Current Approach
Currently, I'm using the default ExperimentalMarkdownSyntaxTextSplitter without any customization. It is defined in markdown.py of langchain_text_splitters (part of the LangChain library).
Desired Outcome
I want to refine the splitter so that it better recognizes and handles headers from h1 to h6, potentially by customizing the separator patterns. I would also like to know of any other ways to improve the LLM's answers.
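To make the desired behavior concrete, here is a minimal standalone sketch of header-aware splitting. It does not use or modify ExperimentalMarkdownSyntaxTextSplitter; `split_by_headers` and `with_breadcrumb` are hypothetical helper names, and the regex only covers simple ATX-style headers (`#` to `######`). The idea is to track the h1 to h6 hierarchy while splitting and prepend it to each chunk, so the LLM sees where a chunk sits in the document.

```python
import re
from typing import Dict, List, Tuple

# Simple ATX header pattern: 1-6 '#' characters, whitespace, then the title.
# Assumption: setext headers (underlined with === or ---) are not handled.
HEADER_RE = re.compile(r"^(#{1,6})[ \t]+(.+)$")


def split_by_headers(markdown: str) -> List[Tuple[Dict[int, str], str]]:
    """Split Markdown into (header_path, body) chunks.

    header_path maps header level (1-6) to the title of the header
    that is currently open at that level.
    """
    chunks: List[Tuple[Dict[int, str], str]] = []
    path: Dict[int, str] = {}
    body: List[str] = []
    in_fence = False

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append((dict(path), text))
        body.clear()

    for line in markdown.splitlines():
        # Lines inside ``` fences are never treated as headers.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # A new header closes every header at the same or a deeper level.
            for stale in [k for k in path if k >= level]:
                del path[stale]
            path[level] = match.group(2).strip()
        else:
            body.append(line)
    flush()
    return chunks


def with_breadcrumb(path: Dict[int, str], text: str) -> str:
    """Prefix a chunk with its header trail, e.g. '[Guide > Install]'."""
    crumb = " > ".join(path[level] for level in sorted(path))
    return f"[{crumb}]\n{text}" if crumb else text
```

Each chunk produced this way carries its full header trail, which can be embedded or passed to the LLM along with the text. Whether the same effect is achievable through the LangChain splitter's own options is exactly what this question is asking.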
Question
How can I modify the ExperimentalMarkdownSyntaxTextSplitter to improve its handling of Markdown headers (h1 to h6)? Specifically:
System Info
Additional Information