## Apache Pig integration

A [Loader](https://github.com/apache/parquet-java/blob/master/parquet-pig/src/main/java/org/apache/parquet/pig/ParquetLoader.java) and a [Storer](https://github.com/apache/parquet-java/blob/master/parquet-pig/src/main/java/org/apache/parquet/pig/ParquetStorer.java) are provided to read and write Parquet files with Apache Pig.
Storing data into Parquet in Pig is simple:
```
-- options you might want to fiddle with
SET parquet.page.size 1048576 -- default. this is your min read/write unit.
SET parquet.block.size 134217728 -- default. your memory budget for buffering data
SET parquet.compression lzo -- or you can use none, gzip, snappy
STORE mydata into '/some/path' USING parquet.pig.ParquetStorer;
```
Reading in Pig is also simple:
```
mydata = LOAD '/some/path' USING parquet.pig.ParquetLoader();
```
If the data was stored using Pig, things will "just work". If the data was stored using another method, you will need to provide the Pig schema equivalent to the data you stored (you can also write the schema to the file footer while writing it -- but that's pretty advanced). We will provide a basic automatic schema conversion soon.
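When the Pig schema has to be supplied by hand, it can be passed to the loader as a Pig schema string. The snippet below is only a sketch: it assumes a `ParquetLoader` constructor that accepts a requested schema, and the field names are hypothetical.

```
-- a sketch only: assumes ParquetLoader accepts a Pig schema string;
-- the field names below are hypothetical
mydata = LOAD '/some/path'
         USING parquet.pig.ParquetLoader('id:long, name:chararray, score:double');
DESCRIBE mydata;
```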
## Hive integration
Hive integration is provided via the [parquet-hive](https://github.com/apache/parquet-java/tree/master/parquet-hive) sub-project.