Open
Description
What happens?
When I call read_csv (which runs sniff_csv implicitly) with the arguments "header=True, delim=';', sample_size=10241", it fails with an "unable to detect csv format" error.
- I can put several different lines at line 20241 and they all trigger it. The lines look fine and are no different from any others.
- When I reduce the sample_size to 10240, the import works again.
- When I then remove lines 10236-10240 from the original file and keep the sample_size at 10240, it also works.
This suggests to me that 10240 is an upper limit for the sample before format detection goes wrong, yet the default sample_size is 20xxx.
Unfortunately, I can't provide the sample file.
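Since the real file is confidential, a stand-in could be built for experimenting. The following is only a minimal sketch under assumptions: the column names, the row contents, and the helper names `write_synthetic_csv` and `drop_lines` are all made up for illustration and are not taken from the original data. A synthetic file like this may well not trigger the error, because the cause probably depends on the actual content.

```python
import csv


def write_synthetic_csv(path, n_rows, delimiter=";"):
    """Write a header line plus n_rows of made-up data rows (hypothetical schema)."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter=delimiter, lineterminator="\n")
        writer.writerow(["id", "name", "amount"])
        for i in range(1, n_rows + 1):
            writer.writerow([i, f"item {i}", f"{i * 1.5:.2f}"])


def drop_lines(src, dst, first, last):
    """Copy src to dst, skipping the 1-based lines first..last (inclusive).

    Line 1 is the header, matching how the line numbers are counted in the
    bullet points above.
    """
    with open(src, encoding="utf-8", newline="") as fin, \
            open(dst, "w", encoding="utf-8", newline="") as fout:
        for lineno, line in enumerate(fin, start=1):
            if not (first <= lineno <= last):
                fout.write(line)
```

For example, `write_synthetic_csv("synthetic.csv", 25000)` followed by `drop_lines("synthetic.csv", "trimmed.csv", 10236, 10240)` would mirror the line-removal step described above; both file names are placeholders.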
The DuckDB version is "v1.2.0".
To Reproduce
self._db.execute(f"CREATE OR REPLACE TABLE {all_table_name} AS SELECT * FROM read_csv('{self.event.tmp_file_path}', header=True, delim=';', sample_size=10241)")
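To narrow down where detection starts to fail, the same read_csv call could be repeated with different sample_size values. This is a rough sketch, not the code from my application: the helper name `try_sample_sizes` is hypothetical, `con` is assumed to be a DuckDB connection (anything with an `execute(sql)` method that returns a result with `fetchall()`), and it counts rows instead of creating a table so that nothing is written to the database.

```python
def try_sample_sizes(con, csv_path, sample_sizes):
    """Run read_csv once per sample_size and record the outcome.

    Returns a dict mapping each sample_size to None on success,
    or to the error message if the query raised an exception.
    """
    results = {}
    # Escape single quotes so the path is safe inside the SQL string literal.
    quoted = csv_path.replace("'", "''")
    for size in sample_sizes:
        sql = (
            f"SELECT count(*) FROM read_csv('{quoted}', "
            f"header=True, delim=';', sample_size={int(size)})"
        )
        try:
            con.execute(sql).fetchall()
            results[size] = None
        except Exception as exc:  # broad on purpose: we only want the message
            results[size] = str(exc)
    return results


# Example usage (assumes the duckdb package is installed; "data.csv" is a placeholder):
#   import duckdb
#   for size, err in try_sample_sizes(duckdb.connect(), "data.csv", [10240, 10241]).items():
#       print(size, "ok" if err is None else err)
```

With my file I would expect 10240 to succeed and 10241 to report the detection error, based on the behaviour described above.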
OS:
Linux
DuckDB Package Version:
1.2.0
Python Version:
3.13.7
Full Name:
Michel
Affiliation:
Acme
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
No - I cannot share the data sets because they are confidential
Did you include all code required to reproduce the issue?
Yes, I have
Did you include all relevant configuration to reproduce the issue?
Yes, I have