duckdb sniff_csv fails with > 10240 lines #113

@ddm-michel

What happens?

When I call read_csv with the arguments "header=True, delim=';', sample_size=10241" (which runs sniff_csv implicitly), it fails with an "unable to detect csv format" error.

  • I can put several different rows at line 10241, and each one triggers the error. The rows look fine and no different from any others.
  • When I reduce the sample_size to 10240, the import works again.
  • When I then remove lines 10236-10240 from the original file and keep sample_size at 10240, it also works.

This suggests to me that 10240 is an upper limit for the sample size before format detection goes wrong, even though the default is higher (20xxx).
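The threshold above was found by hand. For anyone trying to narrow it down on their own file, a binary search over sample_size might save some time. This is only a sketch: `first_failing_sample_size` and `make_duckdb_probe` are hypothetical helper names, not part of DuckDB, and the search assumes there is a single switch-over point between a working and a failing sample_size.

```python
from typing import Callable


def first_failing_sample_size(loads_ok: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the smallest sample_size in (lo, hi] for which loads_ok is False.

    Assumes loads_ok(lo) is True, loads_ok(hi) is False, and that the result
    flips exactly once between them (an assumption, not a known property).
    """
    if not loads_ok(lo) or loads_ok(hi):
        raise ValueError("need loads_ok(lo) == True and loads_ok(hi) == False")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if loads_ok(mid):
            lo = mid  # still works, the threshold is above mid
        else:
            hi = mid  # fails, the threshold is at or below mid
    return hi


def make_duckdb_probe(path: str) -> Callable[[int], bool]:
    """Build a probe that runs read_csv with the options from this report."""
    import duckdb  # imported lazily so the search helper itself has no dependency

    def probe(sample_size: int) -> bool:
        try:
            duckdb.sql(
                f"SELECT count(*) FROM read_csv('{path}', header=True, "
                f"delim=';', sample_size={sample_size})"
            ).fetchall()
            return True
        except duckdb.Error:
            # Any DuckDB error counts as a failure here, not only sniffing errors.
            return False

    return probe
```

With a real file, something like `first_failing_sample_size(make_duckdb_probe(path), 1, 20000)` would report the first sample_size that fails, provided 1 works and 20000 does not.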

Unfortunately, I can't share the sample file.

The DuckDB version is "v1.2.0".

To Reproduce

self._db.execute(f"CREATE OR REPLACE TABLE {all_table_name} AS SELECT * FROM read_csv('{self.event.tmp_file_path}', header=True, delim=';', sample_size=10241)")
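Since the original file is confidential, a synthetic file might help others check whether the behaviour depends on the data or only on the row count. The sketch below is an assumption-laden stand-in: the column names, row contents, and the helper names `write_synthetic_csv` and `try_read` are all made up, and there is no guarantee that this file triggers the same error.

```python
import csv


def write_synthetic_csv(path: str, n_rows: int) -> None:
    """Write a header plus n_rows semicolon-delimited rows of made-up data."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=";")
        writer.writerow(["id", "name", "amount"])  # hypothetical columns
        for i in range(n_rows):
            writer.writerow([i, f"item_{i}", f"{i * 0.5:.2f}"])


def try_read(path: str, sample_size: int) -> str:
    """Run the read_csv call from this report; return 'ok' or the error text."""
    import duckdb  # requires the duckdb package

    con = duckdb.connect()  # in-memory database
    try:
        con.execute(
            "CREATE OR REPLACE TABLE t AS SELECT * FROM "
            f"read_csv('{path}', header=True, delim=';', sample_size={sample_size})"
        )
        return "ok"
    except duckdb.Error as exc:
        return str(exc)
    finally:
        con.close()
```

Writing, say, 12000 rows with `write_synthetic_csv` and then comparing `try_read(path, 10240)` with `try_read(path, 10241)` would show whether the two sample sizes behave differently on non-confidential data.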

OS:

Linux

DuckDB Package Version:

1.2.0

Python Version:

3.13.7

Full Name:

Michel

Affiliation:

Acme

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

No - I cannot share the data sets because they are confidential

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration to reproduce the issue?

  • Yes, I have
