Hello, I have a pretty large dataset (> 2 TB) split into six files.
I assumed that the jsonl files were encoded in UTF-8. However, some characters are apparently not valid UTF-8, and this causes R to fail when I specify the encoding.
Not specifying the encoding results in a garbled full_text output.
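
One way to narrow this down, as a minimal sketch and not something tested on files of this size: stream each file in chunks, flag the lines that are not valid UTF-8 with base R's `validUTF8()`, and substitute the offending bytes with `iconv(..., sub = "byte")` before parsing. The helper names `find_invalid_utf8` and `repair_utf8`, the `path` argument, and the chunk size are hypothetical and only for illustration.

```r
# Hypothetical helper: stream a JSONL file in chunks and return the line
# numbers of lines that are not valid UTF-8. Only one chunk is held in memory.
find_invalid_utf8 <- function(path, chunk_size = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  bad <- numeric(0)
  offset <- 0  # numeric, so the line count can exceed the integer range
  repeat {
    lines <- readLines(con, n = chunk_size, warn = FALSE)
    if (length(lines) == 0L) break
    bad <- c(bad, offset + which(!validUTF8(lines)))
    offset <- offset + length(lines)
  }
  bad
}

# Hypothetical helper: replace bytes that are not valid UTF-8 with their
# hex code in angle brackets, so the result is valid UTF-8.
repair_utf8 <- function(x) {
  iconv(x, from = "UTF-8", to = "UTF-8", sub = "byte")
}
```

Running `find_invalid_utf8()` on one file first would show whether the bad bytes are rare (a few corrupted records) or systematic (the file is really in another encoding such as Latin-1), which changes what the right fix is.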