docs/data/csv: auto_type_candidates #5459
improve documentation of the csv_reader option auto_type_candidates

- rename SQLNULL to NULL, add NULL to default set of candidate types
- reorder candidate types by their weight of specificity [0]
- add additional explanations

Sources:
[0] https://github.com/duckdb/duckdb/blob/dcf0e1c8936d74be48fd1cc0309638117b43aa47/src/execution/operator/csv_scanner/util/csv_reader_options.cpp#L523-L530
[1] https://github.com/duckdb/duckdb/blob/dcf0e1c8936d74be48fd1cc0309638117b43aa47/src/include/duckdb/execution/operator/csv_scanner/csv_reader_options.hpp#L82-L86

Signed-off-by: Felix Baumann <[email protected]>
add TIMESTAMPTZ to default set of auto_type_candidates

DuckDB 1.3.0 added support for TIMESTAMPTZ in type detection [0]

[0] duckdb/duckdb@a3bc569

Signed-off-by: Felix Baumann <[email protected]>
Right, I forgot to mention: EDIT: |
correct explanation about specificity by inverting it

SQLNULL has the highest specificity, not the lowest.
VARCHAR is the fallback and therefore has the lowest specificity.

See https://github.com/duckdb/duckdb/blob/dcf0e1c8936d74be48fd1cc0309638117b43aa47/src/execution/operator/csv_scanner/util/csv_reader_options.cpp#L523-L530

I was confused by the code comment in the header file while improving the documentation, since it contradicts the actual specificity weights in the cpp file.

duckdb/duckdb-web#5459
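To make the "most specific first, VARCHAR as fallback" idea concrete, here is a minimal toy sketch in plain Python. It is not DuckDB's sniffer: the parsers are simplified stand-ins, the function name `detect_type` is hypothetical, and only a subset of the candidate types is modeled (NULL, TIME, TIMESTAMP and TIMESTAMPTZ are left out). The only thing taken from this PR is the relative order of the candidates.

```python
from datetime import datetime


def _is_boolean(value):
    return value.lower() in ("true", "false")


def _is_date(value):
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def _is_bigint(value):
    try:
        return -2**63 <= int(value) < 2**63
    except ValueError:
        return False


def _is_double(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


# Ordered from most to least specific, following the order of the default
# candidate list proposed in this PR. The checks themselves are assumptions
# made for illustration and do not mirror DuckDB's casting rules.
CANDIDATE_CHECKS = [
    ("BOOLEAN", _is_boolean),
    ("DATE", _is_date),
    ("BIGINT", _is_bigint),
    ("DOUBLE", _is_double),
    ("VARCHAR", lambda value: True),  # fallback: accepts every value
]


def detect_type(values):
    """Return the first (most specific) candidate that accepts all values."""
    for type_name, accepts in CANDIDATE_CHECKS:
        if all(accepts(value) for value in values):
            return type_name
    return "VARCHAR"
```

In this model a column such as `["1", "2.5"]` fails the BIGINT check and lands on DOUBLE, while `["1", "abc"]` falls all the way through to VARCHAR.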
@@ -130,7 +130,7 @@ Usage example:
 SELECT * FROM read_csv('csv_file.csv', auto_type_candidates = ['BIGINT', 'DATE']);
 ```
 
-The default value for the `auto_type_candidates` option is `['SQLNULL', 'BOOLEAN', 'BIGINT', 'DOUBLE', 'TIME', 'DATE', 'TIMESTAMP', 'VARCHAR']`.
+The default value for the `auto_type_candidates` option is `['NULL', 'BOOLEAN', 'TIME', 'DATE', 'TIMESTAMP', 'TIMESTAMPTZ', 'BIGINT', 'DOUBLE', 'VARCHAR']`.
The order doesn't really matter here, since it's internally sorted when the option is set. Still, this might make it more readable.
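The point about internal sorting can be illustrated with a small toy model in plain Python. This is a sketch under stated assumptions, not DuckDB's code: `SPECIFICITY_ORDER` simply reuses the order of the default list proposed in this PR, the helper name `normalize_candidates` is hypothetical, and types outside that list (which DuckDB may also accept) are rejected here for simplicity.

```python
# Most specific first, least specific (the VARCHAR fallback) last.
# Assumption: this mirrors the order of the default list in the proposed docs.
SPECIFICITY_ORDER = [
    "NULL", "BOOLEAN", "TIME", "DATE", "TIMESTAMP",
    "TIMESTAMPTZ", "BIGINT", "DOUBLE", "VARCHAR",
]
RANK = {name: position for position, name in enumerate(SPECIFICITY_ORDER)}


def normalize_candidates(candidates):
    """Sort user-supplied candidate types by specificity (hypothetical helper)."""
    names = [candidate.upper() for candidate in candidates]
    unknown = [name for name in names if name not in RANK]
    if unknown:
        raise ValueError(f"unsupported candidate types in this sketch: {unknown}")
    return sorted(names, key=RANK.__getitem__)
```

With this model, `normalize_candidates(['BIGINT', 'DATE'])` and `normalize_candidates(['DATE', 'BIGINT'])` give the same list, which is why the order in the documentation affects readability only.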
LGTM, I'm also fine with adding the info about the DECIMAL type.
How does autodetection of type |
I split both changes so 09901dd can be applied later to docs folder 1.2
Both commits should be applied to the folder stable as well.
Disclaimer: This was verified using the Python API and SQL commands.
I could not find the code that maps NULL to SQLNULL. The DuckDB code base uses SQLNULL and TIMESTAMP_TZ internally, whereas the external SQL API uses NULL and TIMESTAMPTZ.