Add optional output directory compression for inference runs #544

ntnn19 · 2025-10-25T11:08:08Z

Overview

This PR introduces an option to compress the inference output directory, which is useful for large-scale prediction workloads that generate extensive results.

Compression only occurs after a full inference run completes.
If only the data pipeline is executed, no compression takes place, even if the compression flag is set to true.
The default, --compress_output_dir=false, preserves existing behavior.

Motivation

Inference runs can generate large output directories, especially in high-throughput scenarios. This option helps users:

Reduce storage usage by compressing completed inference outputs.
Run data preprocessing (data_pipeline) without inference or compression when only staged preparation is needed.
Retain backward compatibility when compression is disabled (default setting).

Key Changes

Added --compress_output_dir flag to the pipeline configuration (default: false).
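For readers who want to see how a boolean flag of this shape can accept `--compress_output_dir=true` or `--compress_output_dir=false`, here is a minimal standard-library sketch. It is not the code in this PR: the `parse_bool` and `build_parser` helpers are hypothetical, and `argparse` is used only as a stand-in for whatever flag library the pipeline actually uses.

```python
import argparse


def parse_bool(text: str) -> bool:
    """Convert a textual flag value such as 'true' or 'false' to a bool."""
    lowered = text.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {text!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a single illustrative flag (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--compress_output_dir",
        type=parse_bool,
        default=False,  # default keeps the existing uncompressed behavior
        help="Compress the output directory after a full inference run.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.compress_output_dir)
```

With this sketch, omitting the flag yields `False`, so existing invocations behave as before.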

Behavioral logic:

Mode	Inference	force_output_dir	compress_output_dir	Result
Default run	✅	false	false	Normal uncompressed run
Default run	✅	false	true	Compression after inference
Default run	✅	true	false	Normal uncompressed run
Default run	✅	true	true	Compression after inference
Separate pipeline → inference	✅	false	false	Normal uncompressed run
Separate pipeline → inference	✅	false	true	Compression after inference
Separate pipeline → inference	✅	true	false	Forced uncompressed run
Separate pipeline → inference	✅	true	true	Compression after inference of the forced directory
Data-only pipeline	❌	false	false	Data pipeline runs; no compression
Data-only pipeline	❌	false	true	Data pipeline runs; no compression
Data-only pipeline	❌	true	false	Data pipeline runs; no compression
Data-only pipeline	❌	true	true	Data pipeline runs; no compression
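
The table above reduces to one rule: compression happens only when inference ran and the flag is true, regardless of force_output_dir. A minimal sketch of that rule follows, assuming a gzip-compressed tar archive written next to the job directory and the original directory removed afterwards (inferred from the `2PV7.tar.gz` entries in the tree listing further down). The function names `should_compress`, `archive_output_dir`, and `finalize_run` are hypothetical and are not taken from the PR's diff.

```python
import pathlib
import shutil
import tarfile


def should_compress(run_inference: bool, compress_output_dir: bool) -> bool:
    """Compress only after a full inference run with the flag enabled."""
    return run_inference and compress_output_dir


def archive_output_dir(output_dir, remove_original: bool = True) -> pathlib.Path:
    """Write <output_dir>.tar.gz beside output_dir and return the archive path."""
    output_dir = pathlib.Path(output_dir)
    # Append the suffix by hand so job names containing dots stay intact.
    archive_path = output_dir.parent / (output_dir.name + ".tar.gz")
    with tarfile.open(str(archive_path), "w:gz") as archive:
        # arcname keeps paths inside the archive relative to the job name.
        archive.add(str(output_dir), arcname=output_dir.name)
    if remove_original:
        # Assumption: the uncompressed directory is deleted once archived.
        shutil.rmtree(output_dir)
    return archive_path


def finalize_run(output_dir, run_inference: bool, compress_output_dir: bool):
    """Return the archive path if compression happened, else the directory."""
    if should_compress(run_inference, compress_output_dir):
        return archive_output_dir(output_dir)
    return pathlib.Path(output_dir)
```

A data-only run calls `finalize_run(..., run_inference=False, ...)` and gets the untouched directory back, which matches the last four rows of the table.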

Impact

✅ No breaking changes; default behavior unchanged

✅ Enables disk-saving via post-inference compression

✅ Explicitly avoids compression during data-only runs

Checklist

New flag and logic documented
Tested with both true/false values in multiple configurations
Defaults preserved for existing workflows

Example Output Directory Structure

To illustrate the effect of the new --compress_output_dir flag, here is a snippet of the output directory tree generated during testing:

tree results/
results/
├── rule_DATA_PIPELINE
│   ├── compress_false_inference_false_data_pipeline_true
│   │   └── 2PV7
│   │       └── 2PV7_data.json
│   ├── compress_false_inference_true_data_pipeline_false
│   │   └── 2PV7
│   │       ├── 2PV7_confidences.json
│   │       ├── 2PV7_data.json
│   │       ├── 2PV7_model.cif
│   │       ├── 2PV7_ranking_scores.csv
│   │       ├── 2PV7_summary_confidences.json
│   │       ├── seed-1_sample-0
│   │       │   ├── 2PV7_seed-1_sample-0_confidences.json
│   │       │   ├── 2PV7_seed-1_sample-0_model.cif
│   │       │   └── 2PV7_seed-1_sample-0_summary_confidences.json
│   │       ├── seed-1_sample-1
│   │       │   ├── 2PV7_seed-1_sample-1_confidences.json
│   │       │   ├── 2PV7_seed-1_sample-1_model.cif
│   │       │   └── 2PV7_seed-1_sample-1_summary_confidences.json
│   │       ├── seed-1_sample-2
│   │       │   ├── 2PV7_seed-1_sample-2_confidences.json
│   │       │   ├── 2PV7_seed-1_sample-2_model.cif
│   │       │   └── 2PV7_seed-1_sample-2_summary_confidences.json
│   │       ├── seed-1_sample-3
│   │       │   ├── 2PV7_seed-1_sample-3_confidences.json
│   │       │   ├── 2PV7_seed-1_sample-3_model.cif
│   │       │   └── 2PV7_seed-1_sample-3_summary_confidences.json
│   │       ├── seed-1_sample-4
│   │       │   ├── 2PV7_seed-1_sample-4_confidences.json
│   │       │   ├── 2PV7_seed-1_sample-4_model.cif
│   │       │   └── 2PV7_seed-1_sample-4_summary_confidences.json
│   │       └── TERMS_OF_USE.md
│   ├── compress_true_inference_false_data_pipeline_true
│   │   └── 2PV7
│   │       └── 2PV7_data.json
│   └── compress_true_inference_true_data_pipeline_false
│       └── 2PV7.tar.gz
├── rule_DATA_PIPELINE_PLUS_INFERENCE
│   ├── compress_false_inference_true_data_pipeline_true
│   │   └── 2PV7
│   │       ├── 2PV7_confidences.json
│   │       ├── 2PV7_data.json
│   │       ├── 2PV7_model.cif
│   │       ├── 2PV7_ranking_scores.csv
│   │       ├── 2PV7_summary_confidences.json
│   │       ├── seed-1_sample-0
│   │       │   ├── 2PV7_seed-1_sample-0_confidences.json
│   │       │   ├── 2PV7_seed-1_sample-0_model.cif
│   │       │   └── 2PV7_seed-1_sample-0_summary_confidences.json
│   │       ├── seed-1_sample-1
│   │       │   ├── 2PV7_seed-1_sample-1_confidences.json
│   │       │   ├── 2PV7_seed-1_sample-1_model.cif
│   │       │   └── 2PV7_seed-1_sample-1_summary_confidences.json
│   │       ├── seed-1_sample-2
│   │       │   ├── 2PV7_seed-1_sample-2_confidences.json
│   │       │   ├── 2PV7_seed-1_sample-2_model.cif
│   │       │   └── 2PV7_seed-1_sample-2_summary_confidences.json
│   │       ├── seed-1_sample-3
│   │       │   ├── 2PV7_seed-1_sample-3_confidences.json
│   │       │   ├── 2PV7_seed-1_sample-3_model.cif
│   │       │   └── 2PV7_seed-1_sample-3_summary_confidences.json
│   │       ├── seed-1_sample-4
│   │       │   ├── 2PV7_seed-1_sample-4_confidences.json
│   │       │   ├── 2PV7_seed-1_sample-4_model.cif
│   │       │   └── 2PV7_seed-1_sample-4_summary_confidences.json
│   │       └── TERMS_OF_USE.md
│   └── compress_true_inference_true_data_pipeline_true
│       └── 2PV7.tar.gz
└── rule_INFERENCE
    ├── compress_false_inference_true_data_pipeline_false
    │   └── 2PV7
    │       ├── 2PV7_confidences.json
    │       ├── 2PV7_data.json
    │       ├── 2PV7_model.cif
    │       ├── 2PV7_ranking_scores.csv
    │       ├── 2PV7_summary_confidences.json
    │       ├── seed-1_sample-0
    │       │   ├── 2PV7_seed-1_sample-0_confidences.json
    │       │   ├── 2PV7_seed-1_sample-0_model.cif
    │       │   └── 2PV7_seed-1_sample-0_summary_confidences.json
    │       ├── seed-1_sample-1
    │       │   ├── 2PV7_seed-1_sample-1_confidences.json
    │       │   ├── 2PV7_seed-1_sample-1_model.cif
    │       │   └── 2PV7_seed-1_sample-1_summary_confidences.json
    │       ├── seed-1_sample-2
    │       │   ├── 2PV7_seed-1_sample-2_confidences.json
    │       │   ├── 2PV7_seed-1_sample-2_model.cif
    │       │   └── 2PV7_seed-1_sample-2_summary_confidences.json
    │       ├── seed-1_sample-3
    │       │   ├── 2PV7_seed-1_sample-3_confidences.json
    │       │   ├── 2PV7_seed-1_sample-3_model.cif
    │       │   └── 2PV7_seed-1_sample-3_summary_confidences.json
    │       ├── seed-1_sample-4
    │       │   ├── 2PV7_seed-1_sample-4_confidences.json
    │       │   ├── 2PV7_seed-1_sample-4_model.cif
    │       │   └── 2PV7_seed-1_sample-4_summary_confidences.json
    │       └── TERMS_OF_USE.md
    └── compress_true_inference_true_data_pipeline_false
        └── 2PV7.tar.gz

32 directories, 68 files
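
To check what a compressed result such as `2PV7.tar.gz` contains without unpacking it, something like the following sketch can be used. It assumes the archive is a gzip-compressed tar file, as the `.tar.gz` extension suggests; the helper name `list_archive_files` is hypothetical, and the internal layout of the archive (for example whether paths start with `2PV7/`) is not stated in this PR.

```python
import tarfile


def list_archive_files(archive_path: str) -> list:
    """Return the sorted names of regular files stored in a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        # Skip directory entries so only result files are reported.
        return sorted(m.name for m in archive.getmembers() if m.isfile())


if __name__ == "__main__":
    # Hypothetical path, modeled on the tree listing above.
    path = "results/rule_INFERENCE/compress_true_inference_true_data_pipeline_false/2PV7.tar.gz"
    for name in list_archive_files(path):
        print(name)
```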

google-cla · 2025-10-25T11:08:13Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

added an option to compress the output directory

765b5ac
