@mattnibs
Collaborator

This commit adds back 0-based index/slice expressions to the language. It also introduces pragma declarations that allow users to specify 1-based or 0-based indexes for a given scope.

@mattnibs mattnibs force-pushed the 0based-index branch 2 times, most recently from 9634d87 to 35d381b Compare November 10, 2025 20:04
@mattnibs mattnibs force-pushed the 0based-index branch 3 times, most recently from eb51434 to ba97cea Compare November 10, 2025 20:26
This commit adds back 0-based index/slice expressions to the language.
It also introduces pragma declarations that allow users to specify
1-based or 0-based indexes for a given scope.

# Conflicts:
#	compiler/parser/parser.go
@mattnibs mattnibs merged commit 8d07a38 into main Nov 10, 2025
5 checks passed
@mattnibs mattnibs deleted the 0based-index branch November 10, 2025 21:38
@chrismo

chrismo commented Nov 10, 2025

👍🏻 I like this much more than the prior PR (#6327)

@chrismo

chrismo commented Nov 10, 2025

I am still curious what this behavior will look like when pragma is set to 1?

values SUBSTRING(this FOR 3) => "foob"

@philrz
Contributor

philrz commented Nov 11, 2025

@chrismo: I suspect you may have already done some of your own testing between the merge of this PR and the related #6351. But the answer to the question in your comment is that for this specific example the result is the same regardless of the pragma setting:

$ super -version
Version: d84d65640

$ echo '"foobar"' | super -c "values SUBSTRING(this FOR 3)" -
"foo"

$ echo '"foobar"' | super -c "pragma index_base = 1 values SUBSTRING(this FOR 3)" -
"foo"

That happens to match what traditional SQL systems do as well.

$ duckdb -c "SELECT SUBSTRING(s FOR 3) FROM (SELECT 'foobar' AS s);"
┌────────────────────────────────────────────┐
│ main."substring"(s, 1, CAST(3 AS INTEGER)) │
│                  varchar                   │
├────────────────────────────────────────────┤
│ foo                                        │
└────────────────────────────────────────────┘

$ psql postgres -c "SELECT SUBSTRING(s FOR 3) FROM (SELECT 'foobar' AS s);"
 substring 
-----------
 foo
(1 row)

I think this makes sense since, as it's used here, the concept of "index base" doesn't really come into play, since the query is just asking for the first three characters of the string. In this regard, I think you were certainly correct to call out that unexpected 4-character "foob" result as a likely bug in that closed-not-merged #6327. I'm not sure if there was a reason at the time for the behavior that we may have thought was defensible, but it's all in the past now.

Where the pragma still has an impact is when we include the "start position" (to use the term from the SQL spec), since it behaves like an index. Per #6351, we decided for now to let it be affected by the pragma setting. That means the default behavior does diverge from SQL, but strict backward compatibility with SQL is why we added the pragma.

$ duckdb -c "SELECT SUBSTRING(s FROM 2 FOR 3) FROM (SELECT 'foobar' AS s);"
┌───────────────────────────┐
│ main."substring"(s, 2, 3) │
│          varchar          │
├───────────────────────────┤
│ oob                       │
└───────────────────────────┘

$ psql postgres -c "SELECT SUBSTRING(s FROM 2 FOR 3) FROM (SELECT 'foobar' AS s);"
 substring 
-----------
 oob
(1 row)

$ super -c "SELECT SUBSTRING(s FROM 2 FOR 3) FROM (SELECT 'foobar' AS s);"
{"SUBSTRING(s FROM 2 FOR 3)":"oba"}

$ super -c "pragma index_base = 1 SELECT SUBSTRING(s FROM 2 FOR 3) FROM (SELECT 'foobar' AS s);"
{"SUBSTRING(s FROM 2 FOR 3)":"oob"}

Since you showed an interest in this topic and your feedback has been helpful, I'll flag you down on community Slack to give you a bit more detail about the approaches we considered here and see if you have any other reactions.

@chrismo

chrismo commented Nov 11, 2025

> unexpected 4-character "foob" result as a likely bug in that closed-not-merged #6327

Yeah, that clearly seemed to be a bug. Glad to see that it didn't propagate into this PR :)

@chrismo

chrismo commented Nov 11, 2025

> I think this makes sense since, as it's used here, the concept of "index base" doesn't really come into play, since the query is just asking for the first three characters of the string.

Yeah, I dunno if I agree. 🤔 I guess this is the sort of scenario that led y'all to attempt "1-based in SQL, 0-based elsewhere" ... It seems inconsistent as just FOR 3 still implies an index_base?
