[File] recognize encoding for remote resource #37
Open
Description
Original: datopian/datahub.io#105
File = require('data.js').File
// loading ISO8859 resource:
> file = File.load('https://raw.githubusercontent.com/frictionlessdata/test-data/master/files/csv/encodings/iso8859.csv')
> file.encoding
'utf-8'
Acceptance criteria
File.load('https://raw.githubusercontent.com/frictionlessdata/test-data/master/files/csv/encodings/iso8859.csv').encoding == 'ISO-8859-1'
File.load('https://raw.githubusercontent.com/frictionlessdata/test-data/master/files/csv/encodings/western-macos-roman.csv').encoding == <macOS-roman-or-so>
Tasks
- add test
- implement encoding detection
Analysis
We need to change this method:
class FileRemote extends File {
  ...
  get encoding() {
    return DEFAULT_ENCODING
  }
}
Analysis update
The encoding getter should:
- connect to the remote resource
- fetch a small portion of the raw data
- try to detect the encoding from that sample
I tried to implement this scheme using chardet.detectFileSync(), but it only works with files: any argument is treated as a file name.
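The three steps above can be sketched with Node built-ins only. This is a rough illustration, not the data.js implementation: `readHead`, `guessEncoding`, and `detectStreamEncoding` are hypothetical names, the 4096-byte sample size is an arbitrary assumption, and `guessEncoding` is a deliberately naive placeholder (valid UTF-8, else assume ISO-8859-1) standing in for a real detection library.

```javascript
const { Readable } = require('stream')

// Collect at most maxBytes from the start of a readable stream, then stop reading.
// (Hypothetical helper; maxBytes = 4096 is an arbitrary sample size.)
async function readHead(stream, maxBytes = 4096) {
  const chunks = []
  let size = 0
  for await (const chunk of stream) {
    const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk)
    chunks.push(buf)
    size += buf.length
    if (size >= maxBytes) break // leaving the loop early destroys the stream
  }
  return Buffer.concat(chunks).subarray(0, maxBytes)
}

// Placeholder detector (assumption, not a real charset detector):
// if the sample decodes as strict UTF-8, report 'utf-8', otherwise guess 'ISO-8859-1'.
function guessEncoding(sample) {
  try {
    // stream: true tolerates a multi-byte character cut off at the end of the sample
    new TextDecoder('utf-8', { fatal: true }).decode(sample, { stream: true })
    return 'utf-8'
  } catch (err) {
    return 'ISO-8859-1'
  }
}

// Read a small sample from the stream and run the detector on it.
async function detectStreamEncoding(stream, maxBytes = 4096) {
  return guessEncoding(await readHead(stream, maxBytes))
}
```

Because reading from a remote stream is asynchronous, a synchronous `get encoding()` getter probably cannot do this directly; the value would likely have to be detected up front or exposed through a promise.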
Possible solutions:
- save part of the remote resource to a local temp file, then use chardet.detectFileSync(temp)
- use some other library that can detect the encoding from a remote stream
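A minimal sketch of the temp-file option, assuming the sample bytes have already been fetched into a Buffer. `detectViaTempFile` is a hypothetical helper; the detector is passed in as a function that takes a file path, so the block does not depend on any particular library.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Write `sample` to a throwaway temp file, run a path-based detector on it,
// and always remove the temp directory afterwards.
function detectViaTempFile(sample, detectFileSync) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'encoding-'))
  const tmpPath = path.join(dir, 'sample')
  try {
    fs.writeFileSync(tmpPath, sample)
    return detectFileSync(tmpPath)
  } finally {
    fs.rmSync(dir, { recursive: true, force: true })
  }
}
```

With chardet installed, the call would presumably look like `detectViaTempFile(sample, chardet.detectFileSync)`. The cost is an extra disk write and cleanup per detection, which is why a buffer- or stream-based detector may be preferable if one is available.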