Blog on Extending SQL to Create Your Own SQL Dialects #97
base: main
I have a feeling August is going to be a month of some amazing blog content.
Yep
Have a look at this one too, @alamb.
Thanks @Adez017 for the good content 👍 I have one suggestion about how the journey of SQL is introduced. The other parts look great to me 👍
Co-authored-by: Jax Liu <[email protected]>
I have made the changes as per your suggestion, @goldmedal. Please take a look.
👍
Can we move forward with merging this, @goldmedal @alamb?
I plan to read / work on this PR later today.
I am starting to check this one out.
Thanks for the adjustments, @alamb. I really appreciate your time and effort.
Thank you @Adez017 -- this is a great start to a blog. I pushed a few more links to the intro and background.
I think it would be best if we could improve the examples before publishing this. Please let me know what you think.
I have two major comments:
Apply to the motivating example?
The post says:
DataFusion provides an excellent example of custom SQL dialect implementation in their [sql_dialect.rs] example. Let's break down how it works and then apply the pattern to our ATTACH DATABASE use case.
I didn't see any mention of, or examples for, how to use the ATTACH DATABASE syntax.
The COPY TO parser example is strange
I tried the COPY TO example SQL, and it basically worked without a custom parser:
> create table source_table as values (1);
0 row(s) fetched.
Elapsed 0.002 seconds.
> COPY source_table TO 'file.fasta' STORED AS FASTA;
Error during planning: There is no registered file format with ext fasta
(I would expect a parser error for a statement.) I know this is not anything introduced by this post.
What would you think about updating the sql_parser.rs example to do something more exciting, such as either:
- Actually implementing the motivating example from this blog post
  CREATE EXTERNAL CATALOG my_catalog
  STORED AS PARQUET
  LOCATION 's3://my-bucket/data/'
  OPTIONS (
    'aws.region' = 'us-west-2',
    'catalog.type' = 'hive_metastore'
  );
- Implementing the ATTACH and DETACH statements from DuckDB?
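To make the first option concrete, here is a minimal, std-only Rust sketch of what recognizing the proposed statement could involve: a small tokenizer plus a hand-written top-down parser that fills in a struct. Everything here is hypothetical: `CreateExternalCatalog`, `parse_create_external_catalog`, and the exact grammar are invented for illustration, string literals are assumed to contain no escaped quotes, and it does not use the sqlparser-rs or DataFusion APIs. A real version would presumably build on DataFusion's `DFParser` and sqlparser-rs, as the upstream `sql_dialect.rs` example appears to do.

```rust
use std::collections::BTreeMap;

/// Hypothetical AST node for the proposed DDL (not a sqlparser-rs or DataFusion type).
#[derive(Debug, PartialEq)]
pub struct CreateExternalCatalog {
    pub name: String,
    pub stored_as: String,
    pub location: String,
    pub options: BTreeMap<String, String>,
}

#[derive(Debug)]
enum Token {
    Word(String), // keyword or bare identifier
    Str(String),  // single-quoted literal ('' escapes are not handled in this sketch)
    Punct(char),  // one of ( ) , =
}

fn tokenize(sql: &str) -> Result<Vec<Token>, String> {
    // Without escape support, an odd number of quotes means an unterminated literal.
    if sql.matches('\'').count() % 2 == 1 { return Err("unterminated string literal".into()); }
    let mut tokens = Vec::new();
    let mut chars = sql.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_alphanumeric() || c == '_' {
            let mut word = String::new();
            while let Some(&c) = chars.peek().filter(|c| c.is_alphanumeric() || **c == '_') {
                word.push(c);
                chars.next();
            }
            tokens.push(Token::Word(word));
        } else if c == '\'' {
            chars.next(); // skip the opening quote; take_while also consumes the closing one
            tokens.push(Token::Str(chars.by_ref().take_while(|&ch| ch != '\'').collect()));
        } else if "(),=".contains(c) {
            chars.next();
            tokens.push(Token::Punct(c));
        } else {
            return Err(format!("unexpected character {c:?}"));
        }
    }
    Ok(tokens)
}

// Token cursor with small "expect the next token to be X" helpers.
struct Parser(std::iter::Peekable<std::vec::IntoIter<Token>>);
impl Parser {
    fn ident(&mut self) -> Result<String, String> {
        match self.0.next() { Some(Token::Word(w)) => Ok(w), t => Err(format!("expected identifier, found {t:?}")) }
    }
    fn literal(&mut self) -> Result<String, String> {
        match self.0.next() { Some(Token::Str(s)) => Ok(s), t => Err(format!("expected string literal, found {t:?}")) }
    }
    fn punct(&mut self, p: char) -> Result<(), String> {
        match self.0.next() { Some(Token::Punct(c)) if c == p => Ok(()), t => Err(format!("expected {p:?}, found {t:?}")) }
    }
    /// Consumes each keyword in order, case-insensitively.
    fn keywords(&mut self, kws: &[&str]) -> Result<(), String> {
        for kw in kws {
            let w = self.ident()?;
            if !w.eq_ignore_ascii_case(kw) { return Err(format!("expected {kw}, found {w}")); }
        }
        Ok(())
    }
}

/// Parses the hypothetical `CREATE EXTERNAL CATALOG` statement proposed in the review.
pub fn parse_create_external_catalog(sql: &str) -> Result<CreateExternalCatalog, String> {
    // Strip an optional trailing semicolon, then tokenize.
    let mut p = Parser(tokenize(sql.trim_end().trim_end_matches(';'))?.into_iter().peekable());
    p.keywords(&["CREATE", "EXTERNAL", "CATALOG"])?;
    let name = p.ident()?;
    p.keywords(&["STORED", "AS"])?;
    let stored_as = p.ident()?.to_uppercase();
    p.keywords(&["LOCATION"])?;
    let location = p.literal()?;
    let mut options = BTreeMap::new();
    if matches!(p.0.peek(), Some(Token::Word(w)) if w.eq_ignore_ascii_case("OPTIONS")) {
        p.0.next();
        p.punct('(')?;
        loop {
            let key = p.literal()?;
            p.punct('=')?;
            options.insert(key, p.literal()?);
            match p.0.next() {
                Some(Token::Punct(',')) => {}
                Some(Token::Punct(')')) => break,
                t => return Err(format!("expected ',' or ')', found {t:?}")),
            }
        }
    }
    match p.0.next() {
        None => Ok(CreateExternalCatalog { name, stored_as, location, options }),
        Some(t) => Err(format!("unexpected trailing token {t:?}")),
    }
}
```

Parsing is only half of the story: actually registering the resulting catalog would still need planning and execution code that this sketch leaves out.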
Thanks @alamb, I think it's a great idea that we can follow up on, and it would be a great start.
By the way, I think I mixed up some things from DuckDB somewhere, I guess.
So let's come up with a plan @Adez017 -- would you like me to take a shot at updating the sql parser example, or would you? Do you have any preference on what we should show (
I would suggest that you do it, @alamb, as you have more experience with this, and as for preferences, I think we should move forward with
Any updates, @alamb?
Not yet -- sorry -- I am currently planning to help get I just don't want to publish a blog on the DataFusion site that is confusing -- I think we have a history of high-quality content, so if we are going to publish something I want it to be really compelling. This one has the potential, but I think it needs some more work -- at the very least we need to resolve the discrepancy between the motivating example and what is actually shown.
Sure, I am happy to help wherever needed. Please let me know, @alamb.
I was thinking some more last night -- maybe a good example would be "you want to implement a SQL dialect where the FROM clause is first, so instead of You wanted to implement 🤔 I think custom DDL / statements are likely to be the most common use cases though 🤔
Logically, it makes sense. Most of the time we use DDL commands, and the ability to create custom DDL would make a great example. Much appreciated, @alamb.
Hi @alamb, just checking in. If you need any help, let me know, as you have a lot on your plate.
Hi @alamb, any updates? Just curious about this one.
Hi @Adez017 -- I am not likely to be able to spend much time on this project for a while. To be published, at minimum I think this blog needs to be updated to actually implement the motivating example (or perhaps change the motivating example so it matches the examples upstream). As I mentioned, I think we could make the upstream example significantly more compelling, either with custom query syntax or perhaps by implementing the motivating example in the blog. However, that is a larger project which I don't have time for. Please feel free to work on those changes if you have time. I would most appreciate it.
@theirix made a PR with a pretty interesting example: apache/datafusion#17633
I will say up front that I don't usually review these blog posts, so I'm not as familiar with the general standards and such, but I'll leave some notes below from my review.
I agree with above comments that the motivating example should be consistent throughout the post; it makes more sense if we structure this like a case study with one specific example that we follow from top to bottom, instead of introducing different examples.
Some other minor notes:
- Do we need to mention sqlparser-rs anywhere? There's lots of mention of DataFusion parsing, but I believe sqlparser-rs does a lot of the heavy lifting here, unless it's under the umbrella of the DataFusion keyword.
- For the conclusion, I think we should use a consistent call to action, like so:
datafusion-site/content/blog/2025-09-21-custom-types-using-metadata.md
Lines 327 to 341 in 31f9668
    ## Get Involved

    The DataFusion team is an active and engaging community and we would love to have you join
    us and help the project.

    Here are some ways to get involved:

    * Learn more by visiting the [DataFusion] project page.
    * Try out the project and provide feedback, file issues, and contribute code.
    * Work on a [good first issue].
    * Reach out to us via the [communication doc].

    [DataFusion]: https://datafusion.apache.org/index.html
    [good first issue]: https://github.com/apache/datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22
    [communication doc]: https://datafusion.apache.org/contributor-guide/communication.html
Hi @alamb @goldmedal, I have drafted the blog on this topic and would appreciate your review and suggestions.