@Adez017
Contributor

@Adez017 Adez017 commented Jul 26, 2025

Hi @alamb @goldmedal, I have drafted the blog post on this topic. Could you review it and share your suggestions?

@alamb
Contributor

alamb commented Jul 26, 2025

I have a feeling August is going to be a month of some amazing blog content

@Adez017
Contributor Author

Adez017 commented Jul 26, 2025

Yep

@Adez017
Contributor Author

Adez017 commented Jul 26, 2025

Please have a look at this one as well, @alamb.

@goldmedal goldmedal self-requested a review July 28, 2025 02:44

@goldmedal goldmedal left a comment

Thanks @Adez017 for the good content 👍 I have one suggestion about how the journey of the SQL is introduced. The other parts look great to me 👍

@Adez017
Contributor Author

Adez017 commented Jul 28, 2025

I have made the changes as per your suggestion, @goldmedal. Please take a look.

@goldmedal goldmedal left a comment

👍

@Adez017
Contributor Author

Adez017 commented Jul 28, 2025

Can we move forward with merging, @goldmedal @alamb?

@alamb
Contributor

alamb commented Jul 28, 2025

I plan to read / work on this PR later today

@alamb
Contributor

alamb commented Jul 29, 2025

I am starting to check this one out

@Adez017
Contributor Author

Adez017 commented Jul 29, 2025

Thanks for the adjustments, @alamb. I highly appreciate your time and effort.

Contributor

@alamb alamb left a comment

Thank you @Adez017 -- this is a great start to a blog. I pushed a few more links to the intro and background

I think it would be best if we could improve the examples before publishing this. Please let me know what you think.

I have two major comments:

Apply to the motivating example?

The post says:

DataFusion provides an excellent example of custom SQL dialect implementation in their [sql_dialect.rs] example. Let's break down how it works and then apply the pattern to our ATTACH DATABASE use case.

I didn't see any mention / examples of how to use the ATTACH DATABASE syntax

The COPY TO parser example is strange

I tried the example for COPY TO in the sql and it basically worked without a custom parser:

> create table source_table as values (1);
0 row(s) fetched.
Elapsed 0.002 seconds.

> COPY source_table TO 'file.fasta' STORED AS FASTA;
Error during planning: There is no registered file format with ext fasta

(I would expect a parser error for a statement) I know this is not anything introduced by this post.

What would you think about updating the sql_parser.rs example to do something more exciting, such as either:

  1. Actually implementing the motivating example from this blog post
CREATE EXTERNAL CATALOG my_catalog
STORED AS PARQUET
LOCATION 's3://my-bucket/data/'
OPTIONS (
  'aws.region' = 'us-west-2',
  'catalog.type' = 'hive_metastore'
);
  2. Implementing the ATTACH and DETACH statements from DuckDB?

https://duckdb.org/docs/stable/sql/statements/attach.html
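
To make the ATTACH / DETACH suggestion concrete, here is a minimal, std-only sketch of the general pattern: peek at the leading keyword, handle the custom statement yourself, and hand everything else to the stock parser unchanged. It is an illustration under stated assumptions, not the DataFusion or sqlparser-rs API. `CustomStatement` and `parse_custom` are hypothetical names, and the grammar is reduced to `ATTACH '<path>' AS <alias>` and `DETACH <alias>`, which is much narrower than DuckDB's real syntax.

```rust
/// Statements this sketch understands. `Other` stands in for
/// "pass the SQL to the regular parser unchanged".
#[derive(Debug, PartialEq)]
enum CustomStatement {
    Attach { path: String, alias: String },
    Detach { alias: String },
    Other(String),
}

/// Hypothetical helper (not part of DataFusion or sqlparser-rs):
/// recognizes `ATTACH '<path>' AS <alias>` and `DETACH <alias>`.
/// Splitting on whitespace keeps the sketch short, so paths that
/// contain spaces are not supported.
fn parse_custom(sql: &str) -> Result<CustomStatement, String> {
    let trimmed = sql.trim().trim_end_matches(';').trim();
    let tokens: Vec<&str> = trimmed.split_whitespace().collect();
    let keyword = tokens.first().map(|t| t.to_ascii_uppercase());

    match keyword.as_deref() {
        Some("ATTACH") => {
            if tokens.len() != 4 || !tokens[2].eq_ignore_ascii_case("AS") {
                return Err(format!("expected ATTACH '<path>' AS <alias>, got: {trimmed}"));
            }
            // The path must be a single-quoted string literal.
            let path = tokens[1]
                .strip_prefix('\'')
                .and_then(|p| p.strip_suffix('\''))
                .ok_or_else(|| "path must be a single-quoted string".to_string())?;
            Ok(CustomStatement::Attach {
                path: path.to_string(),
                alias: tokens[3].to_string(),
            })
        }
        Some("DETACH") => {
            if tokens.len() != 2 {
                return Err(format!("expected DETACH <alias>, got: {trimmed}"));
            }
            Ok(CustomStatement::Detach { alias: tokens[1].to_string() })
        }
        // Anything else is left for the regular SQL parser.
        _ => Ok(CustomStatement::Other(trimmed.to_string())),
    }
}

fn main() {
    for sql in ["ATTACH 'sales.db' AS sales;", "DETACH sales;", "SELECT 1"] {
        println!("{:?}", parse_custom(sql));
    }
}
```

Unlike the COPY TO case above, a malformed `ATTACH` here fails at parse time rather than during planning. A real implementation would work on the tokenizer's output instead of whitespace-split strings.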

@Adez017
Contributor Author

Adez017 commented Jul 29, 2025

Thanks @alamb, I think it's a great idea that we can follow up on, and it would be a great start.

@Adez017
Contributor Author

Adez017 commented Jul 29, 2025

By the way, I think I mixed up some things from DuckDB somewhere, I guess.

@alamb
Contributor

alamb commented Jul 29, 2025

So let's come up with a plan @Adez017 -- would you like me to take a shot at updating the sql parser example or would you? Do you have any preference on what we should show (CREATE EXTERNAL CATALOG or ATTACH or something else?)

@Adez017
Contributor Author

Adez017 commented Jul 29, 2025

I would suggest that you do it, @alamb, as you have more experience with this. As for preference, I think we should move forward with ATTACH, since as far as I know it has not been covered anywhere.

@Adez017
Contributor Author

Adez017 commented Jul 30, 2025

Any updates, @alamb?

@alamb
Contributor

alamb commented Jul 30, 2025

Not yet -- sorry -- I am currently planning to help get

I just don't want to publish a blog on the DataFusion site that is confusing -- I think we have a history of high quality content so if we are going to publish something I want it to be really compelling.

This one has the potential, but I think it needs some more work -- at the very least we need to resolve the discrepancy between the motivating example and what is actually shown

@Adez017
Contributor Author

Adez017 commented Jul 31, 2025

Sure, I am happy to help wherever needed. Please let me know, @alamb.

@alamb
Contributor

alamb commented Aug 1, 2025

I was thinking some more last night -- maybe a good example would be: you want to implement a SQL dialect where the FROM clause comes first, so instead of

SELECT * FROM table

you want to implement

FROM table SELECT *

🤔

I think custom DDL / statements are likely to be the most common use cases though 🤔
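
For reference, the FROM-first idea can be sketched as a tiny rewrite step that runs before the regular parser. This is only an illustration under stated assumptions: `rewrite_from_first` is a hypothetical helper, not something that exists in DataFusion or sqlparser-rs, and it works on whitespace-split tokens, so it does not handle subqueries or string literals that contain the word SELECT.

```rust
/// Hypothetical pre-processing step: turns `FROM <tables> SELECT <exprs>`
/// into `SELECT <exprs> FROM <tables>` so a standard SQL parser can read it.
/// Statements that do not start with FROM are returned unchanged
/// (apart from trimming whitespace and a trailing semicolon).
fn rewrite_from_first(sql: &str) -> String {
    let trimmed = sql.trim().trim_end_matches(';').trim();
    let tokens: Vec<&str> = trimmed.split_whitespace().collect();

    let starts_with_from = tokens
        .first()
        .map_or(false, |t| t.eq_ignore_ascii_case("FROM"));
    if !starts_with_from {
        return trimmed.to_string();
    }

    // Find the first SELECT keyword and swap the two halves around it.
    match tokens.iter().position(|t| t.eq_ignore_ascii_case("SELECT")) {
        Some(idx) => {
            let from_part = tokens[..idx].join(" ");
            let select_part = tokens[idx..].join(" ");
            format!("{select_part} {from_part}")
        }
        // No SELECT found: leave the statement for the parser to reject.
        None => trimmed.to_string(),
    }
}

fn main() {
    println!("{}", rewrite_from_first("FROM table SELECT *"));
}
```

A string rewrite like this is the cheapest way to prototype new query syntax. A production version would extend the parser itself so that error messages and source positions stay accurate.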

@Adez017
Contributor Author

Adez017 commented Aug 1, 2025

That makes sense. Most of the time we use DDL commands, so showing a modular way to create custom DDL would be a great example. Much appreciated, @alamb.

@Adez017
Contributor Author

Adez017 commented Aug 3, 2025

Hi @alamb, just checking in. If you need any help, let me know, as I know you have a lot on your plate.

@Adez017
Contributor Author

Adez017 commented Aug 4, 2025

Hi @alamb, any updates? Just curious about this one.

@alamb
Contributor

alamb commented Aug 4, 2025

Hi @Adez017 -- I am not likely to be able to spend much time on this project for a while

To be published, at minimum I think this blog needs to be updated to actually implement the motivating example (or perhaps change the motivating example so it matches the examples upstream)

As I mentioned, I think we could make the upstream example significantly more compelling either with custom query syntax, or perhaps implementing the motivating example in the blog. However, that is a larger project which I don't have time for

Please feel free to work on those changes if you have time. I would most appreciate it

@alamb
Contributor

alamb commented Sep 23, 2025

@theirix made a PR with a pretty interesting example: apache/datafusion#17633

@Jefffrey Jefffrey left a comment

I will say upfront I don't usually review these blog posts so I'm not as familiar with the general standards and such, but I'll leave some notes below from my review.

I agree with above comments that the motivating example should be consistent throughout the post; it makes more sense if we structure this like a case study with one specific example that we follow from top to bottom, instead of introducing different examples.

Some other minor notes:

  • Do we need to mention sqlparser-rs anywhere? There's lots of mention of DataFusion parsing but I believe sqlparser-rs does a lot of the heavy lifting here, unless it's under the umbrella of the DataFusion keyword
  • For the conclusion, I think we should use a consistent call to action, like so:

## Get Involved
The DataFusion team is an active and engaging community and we would love to have you join
us and help the project.
Here are some ways to get involved:
* Learn more by visiting the [DataFusion] project page.
* Try out the project and provide feedback, file issues, and contribute code.
* Work on a [good first issue].
* Reach out to us via the [communication doc].
[DataFusion]: https://datafusion.apache.org/index.html
[good first issue]: https://github.com/apache/datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22
[communication doc]: https://datafusion.apache.org/contributor-guide/communication.html
