Skip to content

Conversation

lutter
Copy link
Collaborator

@lutter lutter commented Sep 19, 2025

This PR picks up the work done in PR #5382. Its main purpose is to help evaluate how such an interface would work. There's no expectation that this will ever make it into a graph-node release at this point.

Documentation can be found in docs/implementation/sql-interface.md in this PR

This is a continuation of PR #5799 but had to be a new PR because of commit signing

lutter and others added 17 commits September 18, 2025 16:53
1. Do not use CTE's to inject a view of a table at a certain block. Instead
   rewrite the 'from' clause
2. Do not turn bytea columns into string columns since that is hugely
   wasteful
That setup makes it much easier to add more tests that check that we scrub
dangerous constructs from SQL
The wrapping with to_jsonb is closely tied to how we run the query
This is to keep Postgres from complaining about reserved words as column
names, e.g., the column `to` needs to appear as `"to"` in a query
Copy link
Member

@isum isum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very nice! Blocking because I have found one security issue that needs to be fixed

))
.await?;

let query_hash = req.query_hash();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hash is not very useful in this context, since the SQL query can be manipulated in a lot of ways to produce a different hash every time.

It is definetly not a problem for testing, but we should normalize the SQL queries before hashing them if we plan to use this feature in production

impl VisitorMut for Validator<'_> {
type Break = Error;

fn pre_visit_statement(&mut self, _statement: &mut Statement) -> ControlFlow<Self::Break> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter name does not need to start with _

] }
sqlparser = "0.46.0"
slog = { version = "2.7.0", features = ["release_max_level_trace", "max_level_trace"] }
sqlparser = { version = "0.46.0", features = ["visitor"] }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest version (0.59.0) has an additional variant for SetExpr for DELETE statements. It's probably worth updating and covering the new variant in the validator.

Comment on lines +133 to +140
// Add common table expressions to the set of known tables
if let Some(ref with) = query.with {
self.ctes.extend(
with.cte_tables
.iter()
.map(|cte| cte.alias.name.value.to_lowercase()),
);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nested queries might have their own scoped CTEs what are not available to parent queries.
CTEs are never removed from the list when a query scope ends and this is a problem, because nested CTEs allow parent queries to access arbitrary tables.

This query passes the validation:
SELECT *, (WITH pg_user AS (SELECT 1) SELECT 1) AS one FROM pg_user;

Also, maybe the pre_ and post_ access usage should be revisited to ensure consistent validations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants