[BUGFIX] .txt file with commas in URLs #342

Open
bkj wants to merge 1 commit into rom1504:main from jataware:bugfix_comma

bkj commented Aug 23, 2023

The current implementation seems to fail when the URLs in a .txt input file have commas in them. This modification seems to fix the bug.

(Disclaimer: I am not 100% sure I am passing data the way that's intended ... if I'm not, my mistake and please correct me!)

@clairej12

I am also still having this issue. I get a CSV parse error like the one below if the URL has one or more commas:
pyarrow.lib.ArrowInvalid: CSV parse error: Expected 1 columns, got 2: http://1.bp.blogspot.com/-xf8FZNbm-O4/UVbWL6XBdOI/AAAAAAAAFYs/1IPtSSmYZiI/s640/Big+sage,+Lantana ...

rom1504 (Owner) commented Oct 8, 2023

Let's instead use a separator that never occurs in URLs.
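
One way to read this suggestion is to pick a delimiter character that does not appear anywhere in the input. The sketch below is only an illustration of that idea: `pick_unused_delimiter` and the candidate list are hypothetical names and are not part of the project.

```python
from typing import Iterable, Sequence

# Hypothetical candidates, in order of preference: tab, then the ASCII
# unit separator and record separator control characters.
CANDIDATES = ("\t", "\x1f", "\x1e")


def pick_unused_delimiter(
    lines: Iterable[str], candidates: Sequence[str] = CANDIDATES
) -> str:
    """Return the first candidate that occurs in none of the lines."""
    remaining = list(candidates)
    for line in lines:
        # Drop every candidate that shows up in this line.
        remaining = [c for c in remaining if c not in line]
        if not remaining:
            break
    if not remaining:
        raise ValueError("every candidate delimiter occurs in the input")
    return remaining[0]


urls = [
    "http://example.com/a.jpg",
    "http://example.com/Big+sage,+Lantana.jpg",
]
delimiter = pick_unused_delimiter(urls)
print(repr(delimiter))
```

Scanning the file costs one extra pass over the input; a fixed delimiter that URLs are not expected to contain avoids that pass but relies on the input being well formed.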

MaxyLee commented Nov 4, 2024

Got the same issue. I changed the delimiter from ',' to '\t' and it works (line 100 of reader.py):

df = csv_pa.read_csv(file, read_options=csv_pa.ReadOptions(column_names=["url"]), parse_options=csv_pa.ParseOptions(delimiter="\t"))

Status: Waiting for user input