Skip to content

Conversation

seve-martinez
Copy link
Contributor

Snowflake, Databricks, Postgres, and others all support some from of table sampling.

This commonly takes on the form

SELECT ...
FROM ... 
{ SAMPLE | TABLESAMPLE } [ samplingMethod ]

where

samplingMethod ::= { { BERNOULLI | ROW } ( { <probability> | <num> ROWS } ) |
                     { SYSTEM | BLOCK } ( <probability> ) [ { REPEATABLE | SEED } ( <seed> ) ] }

Different dialects support one, the other, or both sample keywords and only some of the methods. The current PR follows the snowflake paradigm.

Outstanding questions

  • This method is not currently called anywhere. Review of the Parser code would make me think it should be part of the parse_table_factor() logic since it always trails a relation. However this would then require each variant of TableFactor that supports SAMPLE to have a placeholder for it. Is that the right approach?

@iffyio
Copy link
Contributor

iffyio commented Dec 6, 2024

Hi @seve-martinez! It sounds like the parsing can be done while parsing the select body instead? Probably here before looking for a WHERE clause?

@yoavcloud
Copy link
Contributor

I did not notice this PR when I started working on parsing the SAMPLE option, but since then it was merged: #1580

I can share that we used to parse the SAMPLE option as part of the TableFactor but it wasn't ideal, and moving it to the select made more sense.

@seve-martinez
Copy link
Contributor Author

I did not notice this PR when I started working on parsing the SAMPLE option, but since then it was merged: #1580

I can share that we used to parse the SAMPLE option as part of the TableFactor but it wasn't ideal, and moving it to the select made more sense.

Oh awesome! Thanks for taking care of it, I wasn't in love with my solution and what you have there looks very sound. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants