Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 1 addition & 19 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,7 @@ Parquet is a very active project, and new features are being added quickly. Here

* Type-specific encoding
* Hive integration (deprecated)
* Pig integration
* Pig integration (deprecated)
* Cascading integration (deprecated)
* Crunch integration
* Apache Arrow integration
Expand Down Expand Up @@ -132,24 +132,6 @@ See the APIs:
* [Record conversion API](https://github.com/apache/parquet-java/tree/master/parquet-column/src/main/java/org/apache/parquet/io/api)
* [Hadoop API](https://github.com/apache/parquet-java/tree/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/api)

## Apache Pig integration
A [Loader](https://github.com/apache/parquet-java/blob/master/parquet-pig/src/main/java/org/apache/parquet/pig/ParquetLoader.java) and a [Storer](https://github.com/apache/parquet-java/blob/master/parquet-pig/src/main/java/org/apache/parquet/pig/ParquetStorer.java) are provided to read and write Parquet files with Apache Pig

Storing data into Parquet in Pig is simple:
```
-- options you might want to fiddle with
SET parquet.page.size 1048576 -- default. this is your min read/write unit.
SET parquet.block.size 134217728 -- default. your memory budget for buffering data
SET parquet.compression lzo -- or you can use none, gzip, snappy
STORE mydata into '/some/path' USING parquet.pig.ParquetStorer;
```
Reading in Pig is also simple:
```
mydata = LOAD '/some/path' USING parquet.pig.ParquetLoader();
```

If the data was stored using Pig, things will "just work". If the data was stored using another method, you will need to provide the Pig schema equivalent to the data you stored (you can also write the schema to the file footer while writing it -- but that's pretty advanced). We will provide a basic automatic schema conversion soon.

## Hive integration

Hive integration is provided via the [parquet-hive](https://github.com/apache/parquet-java/tree/master/parquet-hive) sub-project.
Expand Down
95 changes: 0 additions & 95 deletions parquet-pig-bundle/pom.xml

This file was deleted.

248 changes: 0 additions & 248 deletions parquet-pig-bundle/src/main/resources/META-INF/LICENSE

This file was deleted.

18 changes: 0 additions & 18 deletions parquet-pig-bundle/src/main/resources/org/apache/parquet/bundle

This file was deleted.

Loading