Merged
Changes from 1 commit
@@ -107,7 +107,7 @@
@RunWith(Parameterized.class)
public class ParquetRewriterTest {

-private final int numRecord = 100000;
+private final int numRecord = 10000;
Member
This may result in a single page per column chunk. Could you try the following:

Contributor Author

@wgtmac: Thanks for your comment. Do you know how I can run a single test in the repository after making my changes? I want to run only ParquetRewriterTest, but I haven't found a good way to do that with mvn.

Member

cd ~/Projects/parquet-java   # replace with your project root directory
cd parquet-hadoop
mvn test -Dtest=org.apache.parquet.hadoop.rewrite.ParquetRewriterTest

Contributor Author

As far as I understand, the relevant configuration parameter is `parquet.page.row.count.limit`, so I have set it to `numRecord / 5`. Hope it looks good!
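A rough sketch of why lowering the row-count limit addresses the single-page concern. It assumes pages are cut only by `parquet.page.row.count.limit`, that its default is 20,000 rows (`ParquetProperties.DEFAULT_PAGE_ROW_COUNT_LIMIT` in parquet-java), and that each column chunk holds all rows of one row group. The 1 MiB default page size limit and row-group boundaries are ignored. `PageCountSketch` and `expectedPagesPerChunk` are hypothetical names for illustration, not part of parquet-java.

```java
// Hypothetical helper, not part of parquet-java: estimates pages per column
// chunk when pages are cut only by the row-count limit (page byte size ignored).
public class PageCountSketch {

    // Assumed default value of parquet.page.row.count.limit.
    static final int DEFAULT_PAGE_ROW_COUNT_LIMIT = 20_000;

    // Ceiling division: a final, partially filled page still counts as a page.
    static int expectedPagesPerChunk(int numRecords, int pageRowCountLimit) {
        if (numRecords <= 0 || pageRowCountLimit <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        return (numRecords + pageRowCountLimit - 1) / pageRowCountLimit;
    }

    public static void main(String[] args) {
        int numRecord = 10000;
        // Default limit: 10,000 rows fit under 20,000, so one page per chunk.
        System.out.println(expectedPagesPerChunk(numRecord, DEFAULT_PAGE_ROW_COUNT_LIMIT)); // 1
        // Limit set to numRecord / 5 = 2,000 rows: five pages per chunk.
        System.out.println(expectedPagesPerChunk(numRecord, numRecord / 5)); // 5
    }
}
```

Under the same assumptions, the previous `numRecord = 100000` with the default limit also gives 5 pages per chunk. So setting the limit to `numRecord / 5` roughly preserves the multi-page layout the test exercised before while writing a tenth of the rows.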

private final Configuration conf = new Configuration();
private final ParquetConfiguration parquetConf = new PlainParquetConfiguration();
private final ParquetProperties.WriterVersion writerVersion;