Description
Checked other resources
- This is a feature request, not a bug report or usage question.
- I added a clear and descriptive title that summarizes the feature request.
- I used the GitHub search to find a similar feature request and didn't find it.
- I checked the LangChain documentation and API reference to see if this feature already exists.
- This is not related to the langchain-community package.
Package (Required)
- langchain
- langchain-openai
- langchain-anthropic
- langchain-classic
- langchain-core
- langchain-cli
- langchain-model-profiles
- langchain-tests
- langchain-text-splitters
- langchain-chroma
- langchain-deepseek
- langchain-exa
- langchain-fireworks
- langchain-groq
- langchain-huggingface
- langchain-mistralai
- langchain-nomic
- langchain-ollama
- langchain-perplexity
- langchain-prompty
- langchain-qdrant
- langchain-xai
- Other / not sure / general
Feature Description
I would like to request adding support for a MySQL language text splitter in langchain-text-splitters. Currently, the class RecursiveCharacterTextSplitter includes support for several languages, but MySQL syntax is not covered. MySQL users often need structured splitting based on keywords such as SELECT, UPDATE, INSERT, DELETE, and procedural blocks like BEGIN…END, which could greatly improve parsing and processing use cases within data workflows and SQL analysis tasks.
Use Case
Once MySQL language splitter support is available, users will be able to use it as follows:
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter
MYSQL_CODE = """
SELECT * FROM users;
SELECT * FROM orders;
"""
mysql_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.MYSQL,
chunk_size=30,
chunk_overlap=0
)
mysql_docs = mysql_splitter.create_documents([MYSQL_CODE])
print(mysql_docs)
# [Document(metadata={}, page_content='SELECT * FROM users;'), Document(metadata={}, page_content='SELECT * FROM orders;')]
Proposed Solution
Add MySQL text splitter support in RecursiveCharacterTextSplitter by introducing the language type Language.MYSQL and defining MySQL-specific separators, enabling users to split SQL text more effectively.
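To illustrate how MySQL-specific separators could drive the split, here is a minimal standard-library sketch. It is not the library's implementation and not the code in the PR: the `MYSQL_SEPARATORS` list and the `split_sql` helper are hypothetical names chosen for this example, and the sketch skips the merging of small pieces that a real recursive splitter would typically do.

```python
import re

# Hypothetical separator list, ordered from coarsest to finest.
# Illustrative only; not taken from the PR.
MYSQL_SEPARATORS = [
    r"\n(?=CREATE\s)",
    r"\n(?=SELECT\s)",
    r"\n(?=INSERT\s)",
    r"\n(?=UPDATE\s)",
    r"\n(?=DELETE\s)",
    r"\n(?=BEGIN\b)",
    r"\n\n",
    r"\n",
    r" ",
]


def split_sql(text, chunk_size, separators=MYSQL_SEPARATORS):
    """Recursively split text, trying each separator in order.

    A piece that still exceeds chunk_size once all separators are
    exhausted is returned as-is (oversized) rather than cut mid-token.
    """
    text = text.strip()
    if not text:
        return []
    if len(text) <= chunk_size or not separators:
        return [text]

    sep, rest = separators[0], separators[1:]
    # SQL keywords are case-insensitive, so match them that way.
    pieces = [p for p in re.split(sep, text, flags=re.IGNORECASE) if p.strip()]
    if len(pieces) == 1:
        # This separator did not apply; fall through to the next one.
        return split_sql(text, chunk_size, rest)

    chunks = []
    for piece in pieces:
        chunks.extend(split_sql(piece, chunk_size, rest))
    return chunks


MYSQL_CODE = """
SELECT * FROM users;
SELECT * FROM orders;
"""

print(split_sql(MYSQL_CODE, chunk_size=30))
# ['SELECT * FROM users;', 'SELECT * FROM orders;']
```

The lookahead patterns keep each keyword at the start of its chunk instead of consuming it, which is why both statements above come out intact.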
Alternatives Considered
No response
Additional Context
Related pull request: #34028
I have already submitted a pull request to the LangChain repository, PR #34028, titled feat(text-splitters): add MySQL language support to RecursiveCharacterTextSplitter. In that PR, I have completed the code contribution related to adding Language.MYSQL and updating the from_language() method, without removing or modifying any unrelated code. I also added a new unit test case. Following the contribution guidelines, I ran all required tests locally, and all CI checks have passed on the PR page.
However, the PR appears to be stuck at the CodeQL scanning stage and the CodeQL workflow has not started at all. As I am a new contributor, I am unable to determine the cause or request a review. I have described the problem in detail in the comments of PR #34028.
I apologize for submitting a PR before opening an issue. I hope the LangChain maintainers will consider and accept this issue and the proposed solution. Thank you!