Add support for LZO input files #104

`hadoop-lzo.jar` is preinstalled on EMR. To split indexed LZO files when reading them (one Spark partition per block instead of one partition per file), we need to read them with `sc.newAPIHadoopFile()` rather than `sc.textFile()`, which is what we currently use.
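A minimal PySpark sketch of the proposed change follows. It assumes the job reads its input through a single helper (`read_text_input` is a hypothetical name) and that LZO inputs can be recognised by a `.lzo` suffix. The input format class is the one shipped in `hadoop-lzo.jar` (`com.hadoop.mapreduce.LzoTextInputFormat`); keys are byte offsets (`LongWritable`) and values are lines (`Text`). Only files that already have a `.lzo.index` next to them are split; un-indexed files are still read as a single partition. The same `newAPIHadoopFile` call exists on the Scala `SparkContext` if the project is in Scala.

```python
# Hypothetical helper: dispatch on the path suffix (an assumption, not
# existing project code) and read LZO input through hadoop-lzo's
# LzoTextInputFormat so indexed files are split per block.
LZO_INPUT_FORMAT = "com.hadoop.mapreduce.LzoTextInputFormat"
KEY_CLASS = "org.apache.hadoop.io.LongWritable"
VALUE_CLASS = "org.apache.hadoop.io.Text"


def read_text_input(sc, path):
    """Return an RDD of text lines read from `path`.

    `sc` is a PySpark SparkContext; hadoop-lzo must be on the classpath
    (it is preinstalled on EMR).
    """
    if path.endswith(".lzo"):
        # Yields (byte offset, line) pairs; keep only the lines.
        pairs = sc.newAPIHadoopFile(path, LZO_INPUT_FORMAT, KEY_CLASS, VALUE_CLASS)
        return pairs.values()
    # Non-LZO input keeps the current behaviour.
    return sc.textFile(path)
```

The suffix check is the simplest possible dispatch: a directory path without a `.lzo` suffix would still go through `sc.textFile()`, so an explicit configuration option may be preferable in practice.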

More info can be found [here](https://github.com/aws-samples/emr-bootstrap-actions/blob/master/spark/examples/reading-lzo-files.md).
