Skip to content

Allow setting the recursion limit for sql parsing#14756

Merged
alamb merged 4 commits intoapache:mainfrom
pydantic:sql_recursion_limit_config
Feb 28, 2025
Merged

Allow setting the recursion limit for sql parsing#14756
alamb merged 4 commits intoapache:mainfrom
pydantic:sql_recursion_limit_config

Conversation

@cetra3
Copy link
Contributor

@cetra3 cetra3 commented Feb 19, 2025

Which issue does this PR close?

No issue, just running into this in production.

Rationale for this change

At the moment there isn't a clean way to set the recursion limit on the sql parser.

What changes are included in this PR?

This PR allows a config option, datafusion.sql_parser.recursion_limit which allows overriding the recursion limit.

Are these changes tested?

No, it's just a config option.

Are there any user-facing changes?

There will be some changes to public facing API. I'm also not sure if this is the best approach here, as DFParser might need a deeper refactor to keep things clean.

@github-actions github-actions bot added sql SQL Planner core Core DataFusion crate common Related to common crate labels Feb 19, 2025
@jatin510
Copy link
Contributor

I see that a recursion limit of 50 is introduced for SQL parsing.
Is this based on specific performance benchmarks or potential stack overflow risks?
Also, is there any reference to similar constraints in databases like PostgreSQL, or is this a DataFusion-specific safeguard?

@cetra3

@cetra3
Copy link
Contributor Author

cetra3 commented Feb 19, 2025

@jatin510 This number is actually the default in sqlparser: https://github.com/apache/datafusion-sqlparser-rs/blob/b482562618caa3efa89c2f42f87472b00a270926/src/parser/mod.rs#L187

So it will essentially be the exact same behaviour, but configurable.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Feb 20, 2025
@cetra3 cetra3 force-pushed the sql_recursion_limit_config branch 3 times, most recently from fd42361 to b29e5e2 Compare February 21, 2025 04:26
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Feb 21, 2025
@adriangb
Copy link
Contributor

I'll note that we encountered this when parsing a dynamically generated expression containing a lot of ANDs and ORs. I believe these sorts of expressions are parsed recursively, so it's pretty easy to hit 50 levels of recursion if each AND is a level.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cetra3 -- this change makes sense to me. Thank you 🙏

To merge this PR I think we should:

  1. try and improve the API (I left comments about this)
  2. Add a test (so that if we break this in the future accidentally we'll know)

A good test would be a sqllogictest that runs a query that passes and then cranks the limit down to something low and then runs the same query again and show it errors

I'll note that we encountered this when parsing a dynamically generated expression containing a lot of ANDs and ORs. I believe these sorts of expressions are parsed recursively, so it's pretty easy to hit 50 levels of recursion if each AND is a level.

@adriangb I think you are saying that this is a good change, right?


let recursion_limit = self.config.options().sql_parser.recursion_limit;

let mut statements = DFParser::parse_sql_with_dialect_limit(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API for DFParser is already somewhat tough.

Rather than adding a new method here, could you make this a builder style instead, so ths would look something like this?

Suggested change
let mut statements = DFParser::parse_sql_with_dialect_limit(
let mut statements = DFParser::new_with_dialect(sql, dialect.with_ref())
.with_recursion_limit(recursion_limit)
.parse_statements()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've adjusted the PR to include a new DFParserBuilder that accomplishes this, but I've tried to keep the original methods as close as possible for backwards compat

}

/// Same as `sqlparser`
const DEFAULT_RECURSION_LIMIT: usize = 50;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adriangb
Copy link
Contributor

@adriangb I think you are saying that this is a good change, right?

Yes I was just explaining why this is necessary - it's not immediately obvious how parsing SQL can be so recursive.

@cetra3 cetra3 force-pushed the sql_recursion_limit_config branch 2 times, most recently from 9ffaea5 to 15dbe2e Compare February 27, 2025 02:31
@cetra3 cetra3 force-pushed the sql_recursion_limit_config branch from 15dbe2e to b36073d Compare February 27, 2025 02:41
@cetra3
Copy link
Contributor Author

cetra3 commented Feb 27, 2025

@alamb I've added a test and a builder struct, let me know if you want further changes

@alamb alamb mentioned this pull request Feb 28, 2025
10 tasks
alamb
alamb previously approved these changes Feb 28, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @cetra3

In order to try and minimize downstream API breakages following https://datafusion.apache.org/library-user-guide/api-health.html I think we should

  1. put back DFParser::new() and DFParser::new_with_dialect marked at deprecated
  2. Adding some doc strings and examples to DFParserBuilder

Here is a proposed PR with those changes to your fork:

@alamb alamb self-requested a review February 28, 2025 02:49
@alamb alamb dismissed their stale review February 28, 2025 02:49

Clicked wrong button

Add some examples, restore old APIs and deprecate
@alamb
Copy link
Contributor

alamb commented Feb 28, 2025

Thanks again @cetra3

@alamb alamb merged commit ac13687 into apache:main Feb 28, 2025
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate core Core DataFusion crate documentation Improvements or additions to documentation sql SQL Planner sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants