csv.Sniffer._guess_delimiter() iterates over all ASCII on each line #137627

@maurycy

Bug report

Bug description:

I know there have been discussions about rewriting csv.Sniffer, and that they appear to be on hold to avoid breaking code that relies on its current behavior.

I believe it's possible, though, to still make it significantly faster.

The current code iterates over 127 ASCII characters and counts their occurrences on each line, even if they're not present:

for char in ascii:

This is highly inefficient. We can count only the characters actually present on each line and backfill zeros for the rest.
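To illustrate the idea, here is a minimal sketch, not the actual `csv.py` code. The function names `frequency_table_naive` and `frequency_table_sparse` are hypothetical, and the table shape (character mapped to `{count: number of lines with that count}`) is an assumption about what the loop builds. The first function scans every ASCII character on every line; the second counts only the characters that occur and fills in the zero counts afterwards.

```python
from collections import Counter

# Assumed candidate set: the 127 ASCII characters mentioned above.
ASCII = [chr(c) for c in range(127)]


def frequency_table_naive(lines):
    """Count every ASCII character on every line, present or not."""
    table = {}
    for line in lines:
        for char in ASCII:
            meta = table.setdefault(char, {})
            freq = line.count(char)  # a full scan of the line per character
            meta[freq] = meta.get(freq, 0) + 1
    return table


def frequency_table_sparse(lines):
    """Count only the characters present, then backfill zero counts."""
    table = {}
    if not lines:
        return table
    for line in lines:
        # One pass over the line; only characters that occur are visited.
        for char, freq in Counter(line).items():
            if ord(char) < 127:  # stay within the same candidate set
                meta = table.setdefault(char, {})
                meta[freq] = meta.get(freq, 0) + 1
    total = len(lines)
    for char in ASCII:
        meta = table.setdefault(char, {})
        seen = sum(meta.values())  # lines on which char occurred
        if seen < total:
            meta[0] = total - seen  # remaining lines had a count of zero
    return table


sample = ["a,b,c", "1,2,3", "x;y,z"]
assert frequency_table_naive(sample) == frequency_table_sparse(sample)
```

The two tables compare equal as dictionaries, but their key insertion order can differ. Any code that depends on iteration order, for example when breaking ties between equally common counts, would need to be checked separately before swapping one approach for the other.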

CPython versions tested on:

3.15

Operating systems tested on:

macOS

Labels: stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)