Add MySQL language text splitter support #34058

@makkruo

Checked other resources

  • This is a feature request, not a bug report or usage question.
  • I added a clear and descriptive title that summarizes the feature request.
  • I used the GitHub search to find a similar feature request and didn't find it.
  • I checked the LangChain documentation and API reference to see if this feature already exists.
  • This is not related to the langchain-community package.

Package (Required)

  • langchain
  • langchain-openai
  • langchain-anthropic
  • langchain-classic
  • langchain-core
  • langchain-cli
  • langchain-model-profiles
  • langchain-tests
  • langchain-text-splitters
  • langchain-chroma
  • langchain-deepseek
  • langchain-exa
  • langchain-fireworks
  • langchain-groq
  • langchain-huggingface
  • langchain-mistralai
  • langchain-nomic
  • langchain-ollama
  • langchain-perplexity
  • langchain-prompty
  • langchain-qdrant
  • langchain-xai
  • Other / not sure / general

Feature Description

I would like to request a MySQL language text splitter in langchain-text-splitters. RecursiveCharacterTextSplitter currently supports several languages, but MySQL syntax is not among them. MySQL users often need to split text along statement keywords such as SELECT, UPDATE, INSERT, and DELETE, and along procedural blocks like BEGIN…END. Splitting on these boundaries would keep statements intact in each chunk, which helps data workflows and SQL analysis tasks.

Use Case

Once MySQL language splitter support is available, users will be able to use it as follows:

from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

MYSQL_CODE = """
SELECT * FROM users;
SELECT * FROM orders;
"""
# Language.MYSQL is the enum member proposed in this issue; it does not exist yet.
mysql_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.MYSQL,
    chunk_size=30,
    chunk_overlap=0,
)
mysql_docs = mysql_splitter.create_documents([MYSQL_CODE])
print(mysql_docs)
# Expected output:
# [Document(metadata={}, page_content='SELECT * FROM users;'), Document(metadata={}, page_content='SELECT * FROM orders;')]

Proposed Solution

Add MySQL text splitter support in RecursiveCharacterTextSplitter by introducing the language type Language.MYSQL and defining MySQL-specific separators, enabling users to split SQL text more effectively.
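To make the idea of "MySQL-specific separators" concrete, here is a minimal standard-library sketch of keyword-based recursive splitting. The separator list `MYSQL_SEPARATORS` and the helper `split_sql` are hypothetical names chosen for illustration; they are not the separators or code from PR #34028, and the sketch does not merge small pieces back together the way RecursiveCharacterTextSplitter does.

```python
import re

# Hypothetical separators, ordered from coarse to fine. These are assumptions
# for illustration only, not the list used in PR #34028. Matching is
# case-sensitive, so lowercase keywords are not treated as boundaries here.
MYSQL_SEPARATORS = [
    "\nSELECT ", "\nINSERT ", "\nUPDATE ", "\nDELETE ",
    "\nBEGIN", "\n\n", "\n", " ",
]


def split_sql(text, chunk_size, separators=MYSQL_SEPARATORS):
    """Recursively split text until every chunk fits within chunk_size."""
    stripped = text.strip()
    if len(stripped) <= chunk_size:
        return [stripped] if stripped else []
    for i, sep in enumerate(separators):
        if sep in text:
            # A zero-width lookahead keeps the separator (and therefore the
            # keyword) at the start of the piece that follows it.
            pieces = re.split(f"(?={re.escape(sep)})", text)
            chunks = []
            for piece in pieces:
                # Only finer separators are tried on each piece.
                chunks.extend(split_sql(piece, chunk_size, separators[i + 1:]))
            return chunks
    # No separator left: fall back to fixed-width slices.
    return [stripped[j:j + chunk_size] for j in range(0, len(stripped), chunk_size)]


MYSQL_CODE = """
SELECT * FROM users;
SELECT * FROM orders;
"""
print(split_sql(MYSQL_CODE, chunk_size=30))
# ['SELECT * FROM users;', 'SELECT * FROM orders;']
```

Placing the statement keywords ahead of the generic newline and space separators is what keeps each statement whole whenever it fits within chunk_size.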

Alternatives Considered

No response

Additional Context

Related pull request: #34028

I have already submitted a pull request to the LangChain repository, PR #34028, titled feat(text-splitters): add MySQL language support to RecursiveCharacterTextSplitter. In that PR, I have completed the code contribution related to adding Language.MYSQL and updating the from_language() method, without removing or modifying any unrelated code. I also added a new unit test case. Following the contribution guidelines, I ran all required tests locally, and all CI checks have passed on the PR page.

However, the PR appears to be stuck at the CodeQL scanning stage: the CodeQL workflow has not started at all. As a new contributor, I cannot determine the cause or request a review. I have described the problem in detail in the comments of PR #34028.

I apologize for submitting a PR before opening an issue. I hope the LangChain maintainers will consider and accept this issue and the proposed solution. Thank you!


Labels

  • feature request: request for an enhancement / additional functionality
  • text-splitters: related to the package `text-splitters`
