Skip to content

Conversation

Djfe
Copy link

@Djfe Djfe commented May 31, 2025

I split both changes so 09901dd can be applied later to docs folder 1.2

Both commits should be applied to the folder stable as well.

Disclaimer: This was verified using the Python API and SQL commands.
I could not find the code that maps NULL to SQLNULL. The DuckDB code base uses SQLNULL and TIMESTAMP_TZ instead of NULL and TIMESTAMPTZ like the external SQL API.

Djfe added 2 commits May 31, 2025 12:38
improve documentation of the csv_reader option auto_type_candidates
- rename SQLNULL to NULL, add NULL to default set of candidate types
- reorder candidate types by their weight of specificity [0]
- add additional explanations

Sources:
[0] https://github.com/duckdb/duckdb/blob/dcf0e1c8936d74be48fd1cc0309638117b43aa47/src/execution/operator/csv_scanner/util/csv_reader_options.cpp#L523-L530
[1] https://github.com/duckdb/duckdb/blob/dcf0e1c8936d74be48fd1cc0309638117b43aa47/src/include/duckdb/execution/operator/csv_scanner/csv_reader_options.hpp#L82-L86

Signed-off-by: Felix Baumann <[email protected]>
add TIMESTAMPTZ to default set of auto_type_candidates

DuckDB 1.3.0 added support for TIMESTAMPTZ in type detection
[0] duckdb/duckdb@a3bc569

Signed-off-by: Felix Baumann <[email protected]>
@Djfe
Copy link
Author

Djfe commented May 31, 2025

Right, I forgot to mention: NULL doesn't appear to make a difference to auto_type_candidates. Though adding it doesn't hurt either.
Even empty lines are of type VARCHAR for some reason (despite SQLNULL having the heighest weight of specificity).
And I cannot cast VARCHAR columns to type NULL which means I'm not even sure, it makes sense to mention that it belongs to the auto_type_candidates since it might serve no purpose.

EDIT:
It might be worth mentioning, that DECIMAL doesn't detect precision and defaults to DECIMAL(18,6). You need to set (prec, scale) manually.
And it only supports one of these DECIMAL statements (it chooses the last one provided and ignores all others). So listing several and letting DuckDB choose required precision is not possible atm/yet.
duckdb/duckdb#11639 (comment)

Mytherin added a commit to duckdb/duckdb that referenced this pull request Jun 2, 2025
correct explanation about specificity by inverting it

SQLNULL has the highest specificity not the lowest VARCHAR is the
fallback and has therefore the lowest specificity

See
https://github.com/duckdb/duckdb/blob/dcf0e1c8936d74be48fd1cc0309638117b43aa47/src/execution/operator/csv_scanner/util/csv_reader_options.cpp#L523-L530

I was confused by the code comment in the header file while improving
the documentation since it contradicts the actual specifity weights in
the cpp file.
duckdb/duckdb-web#5459
@szarnyasg szarnyasg requested a review from pdet June 3, 2025 13:19
@@ -130,7 +130,7 @@ Usage example:
SELECT * FROM read_csv('csv_file.csv', auto_type_candidates = ['BIGINT', 'DATE']);
```

The default value for the `auto_type_candidates` option is `['SQLNULL', 'BOOLEAN', 'BIGINT', 'DOUBLE', 'TIME', 'DATE', 'TIMESTAMP', 'VARCHAR']`.
The default value for the `auto_type_candidates` option is `['NULL', 'BOOLEAN', 'TIME', 'DATE', 'TIMESTAMP', 'TIMESTAMPTZ', 'BIGINT', 'DOUBLE', 'VARCHAR']`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order doesn't really matter here, since it's internally sorted when the option is set. Still, this might make it more readable.

Copy link
Contributor

@pdet pdet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I'm also fine with adding the info about the DECIMAL type.

@Djfe
Copy link
Author

Djfe commented Jun 4, 2025

How does autodetection of type NULL look like? Or does that rather concern single rows and not a full column?
Should we drop it from documentation or is it related to the read_csv parameter nullstr?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants