Skip to content

Conversation

@romanb
Copy link
Contributor

@romanb romanb commented Apr 2, 2025

This PR is a follow-up to #1741. In particular, it addresses the parsing of IF/ELSE statements for MSSQL. These are syntactically quite different from the already supported IF ... THEN ... ELSEIF ... END IF statements, see the language reference.

The following are the main changes in this PR:

  • IfStatement is now an enum with two variants.
  • Statement parsing is overridden for the MSSQL dialect in order to parse IF statements differently for MSSQL.
  • AST nodes for IF/CASE statements are slightly refactored to accommodate the new functionality and produce correct spans via attached tokens.
  • Tests are updated and extended.

Roman Borschel added 3 commits April 2, 2025 11:40
These are syntactically quite different from the already supported
IF ... THEN ... ELSEIF ... END IF statements. Hence IfStatement is now
an enum with two variants and statement parsing is overridden for
the MSSQL dialect in order to parse IF statements differently for MSSQL.

Thereby fix spans for if/case AST nodes by including start/end tokens,
if present.
src/ast/mod.rs Outdated
/// ```
///
/// [MSSQL](https://learn.microsoft.com/en-us/sql/t-sql/language-elements/if-else-transact-sql?view=sql-server-ver16)
MsSqlIfElse {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to merge the representation to be shared by both? Something like this roughly I'm thinking

enum StatementBlock {
  Expr(Expr),
  Statements(Vec<Statements>),
  Begin(Vec<Statements>)
}

pub struct IfStatement {
    pub condition: Expr,
    pub if_block: StatementBlock,
    pub elseif_blocks: Vec<StatementBlock>,
    pub else_block: Option<StatementBlock>,
}

Ideally we avoid introducing dialect specific nodes in the AST

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your feedback. I pushed further changes to avoid the dialect-specific AST nodes.

Copy link
Contributor

@iffyio iffyio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @romanb! Left a minor comment, otherwise this looks good to me!

src/ast/spans.rs Outdated
has_end_case: _,
case_token: AttachedToken(start),
end_case_token: AttachedToken(end),
..
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this and ConditionalStatement Spanned impl could we explicitly list the remaining fields so that the let match is exhaustive? So that we're forced to revisit this part of the code if new fields are added/modified

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I did that! Thanks again for the review.

Copy link
Contributor

@iffyio iffyio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @romanb!
cc @alamb

@iffyio iffyio merged commit 0d2976d into apache:main Apr 6, 2025
9 checks passed
@romanb romanb deleted the mssql-if-else branch April 7, 2025 08:28
@alamb
Copy link
Contributor

alamb commented Apr 7, 2025

Amazing -- thank you @romanb and @iffyio

@aharpervc
Copy link
Contributor

aharpervc commented Apr 7, 2025

This is a great improvement! However, I noticed a difficulty with my code. This SQL fails to parse:

declare @x bit = 1

if @x = 1
begin
  select 1
end

It works if you add a semi-colon for the declare statement, though. Either syntax executes as expected in SQL Server directly.

What would it take to make it parse okay even without the semi-colon?

@romanb
Copy link
Contributor Author

romanb commented Apr 7, 2025

This is a great improvement! However, I noticed a difficulty with my code. This SQL fails to parse:

declare @x bit = 1

if @x = 1
begin
  select 1
end

It works if you add a semi-colon for the declare statement, though. Either syntax executes as expected in SQL Server directly.

What would it take to make it parse okay even without the semi-colon?

To parse multiple top-level statements for MSSQL with optional semicolons, you can use the Parser API as follows:

let mut stmts = Vec::new();
loop {
    if let Token::EOF = parser.peek_token_ref().token {
        break;
    }
    stmts.push(parser.parse_statement()?);
    while let Token::SemiColon = parser.peek_token_ref().token {
        parser.advance_token();
    }
}

I also did it similarly in this PR when parsing the statements within BEGIN and END of a conditional for MSSQL. In contrast, Parser#parse_statements() (note the plural) always expects semicolon-separated statements.

@aharpervc
Copy link
Contributor

Interesting. Of course, I should have also shared my Rust code. I'm doing Parser::parse_sql(&dialect, sql_text) where the dialect is SQL Server & the sql_text is loaded from a file.

Also, I was trying to simplify the example SQL to a minimum reproducible example, but my original scenario is this kind of code inside of a SP.

For example:

create or alter procedure test()
as
begin
  declare @x bit = 1

  if @x = 1
  begin
    select 1
  end
end

This also demonstrates the difficulty. I will adjust my Rust logic to the approach you suggested and see if I can make it work -- thank you!

I still view this as a very surprising library behavior, though, especially since SQL Server itself accepts/parses/executes the identical file without issue.

@romanb
Copy link
Contributor Author

romanb commented Apr 7, 2025

@aharpervc I see your difficulty. While working on this PR I did notice that a more general custom parsing for begin/end for the MSSQL dialect is probably necessary as well (e.g. by overriding parse_begin in the MSSQL dialect), but I did not want to expand the scope of this PR unnecessarily.

If your example statement for the stored procedure does not parse, I would suggest to open an issue as a starting point. The problem of parsing multiple top-level statements with optional semicolons for MSSQL may also warrant another issue - I'm personally just sticking to the code I mentioned above to work around that problem for now, not least because even in SQL server, the use of semicolons is very much encouraged (quoting from the docs: Although the semicolon isn't required for most statements in this version of SQL Server, it will be required in a future version.)

Realistically though, I doubt the enforcement of semicolons will ever happen in SQL server and it is definitely desirable for this library to be fully capable of parsing with optional semicolons when using the MSSQL dialect out-of-the-box. I can only suggest to open new issues (or even PRs, if possible) for the remaining problems.

@aharpervc
Copy link
Contributor

If your example statement for the stored procedure does not parse, I would suggest to open an issue as a starting point.

Done: #1800

ayman-sigma pushed a commit to sigmacomputing/sqlparser-rs that referenced this pull request Apr 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants