feat(job-orchestration): Allow compression jobs to specify S3 object keys via `S3InputConfig`. #1407

LinZhihao-723 · 2025-10-12T13:50:05Z

Description

NOTE: This is not a breaking change, so existing S3 compression jobs should continue to work.

This was originally a part of #1383. However, we decided to remove the compression interface change and make a clean PR for updating the job input config.

This PR updates S3InputConfig to accept a list of S3 object keys. Before this PR, the input config could only take a key prefix and ingest all logs under that prefix. With this PR, users can instead specify a list of keys under the given prefix. Note that these keys have the following constraints:

- All the keys must share the same key-prefix, as specified in the base S3Config.
- Duplicate keys are not allowed.
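
The two constraints above can be checked in a single pass over a sorted copy of the keys. The following is a minimal sketch rather than the code in this PR; the function name `check_keys` is hypothetical, and the empty-list check is an assumption based on the validator added to `S3InputConfig`.

```python
from typing import List


def check_keys(key_prefix: str, keys: List[str]) -> List[str]:
    """Return a sorted copy of `keys` after checking the constraints (hypothetical helper)."""
    if len(keys) == 0:
        raise ValueError("The key list is empty.")

    # Sort a copy so the caller's list is left untouched.
    sorted_keys = sorted(keys)
    for idx, key in enumerate(sorted_keys):
        if not key.startswith(key_prefix):
            raise ValueError(f"Key `{key}` doesn't start with the prefix `{key_prefix}`.")
        # After sorting, duplicate keys are adjacent, so one comparison is enough.
        if idx > 0 and key == sorted_keys[idx - 1]:
            raise ValueError(f"Duplicate key found: `{key}`.")
    return sorted_keys
```

Sorting first makes the duplicate check a comparison with the previous element, and the sorted result can be reused for metadata retrieval.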

To get metadata for the compression job partition, we take advantage of the shared prefix and do the following:

1. Sort the keys in lexical order.
2. Use the HeadObject API to get the metadata for the first key in the sorted list.
3. Use the ListObjectsV2 API to get the metadata for the remaining keys, using the key-prefix for prefix filtering and the first key (from the previous step) as the StartAfter parameter.

In this way, we obtain metadata in batches to improve performance.
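
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the implementation in `s3_utils.py`: the function name `get_sizes_for_keys` is hypothetical, `s3_client` is assumed to behave like a boto3 S3 client (`head_object` and the `list_objects_v2` paginator), and the listing is assumed to return keys in lexical order.

```python
from typing import Any, List, Tuple


def get_sizes_for_keys(
    s3_client: Any, bucket: str, key_prefix: str, keys: List[str]
) -> List[Tuple[str, int]]:
    """Return (key, size) for every requested key, in lexical order (hypothetical helper)."""
    # Step 1: sort a copy of the keys.
    sorted_keys = sorted(keys)
    first_key = sorted_keys[0]

    # Step 2: a single HeadObject call covers the first key.
    head = s3_client.head_object(Bucket=bucket, Key=first_key)
    results = [(first_key, head["ContentLength"])]

    wanted = iter(sorted_keys[1:])
    expected = next(wanted, None)
    if expected is None:
        return results

    # Step 3: list everything after the first key and pick out the requested keys.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=key_prefix, StartAfter=first_key):
        if expected is None:
            break  # every key was found; stop paging early
        for obj in page.get("Contents", []):
            if expected is None:
                break
            if obj["Key"] > expected:
                # The listing is sorted, so the expected key cannot appear later.
                raise ValueError(f"Key `{expected}` doesn't exist in bucket `{bucket}`.")
            if obj["Key"] == expected:
                results.append((obj["Key"], obj["Size"]))
                expected = next(wanted, None)

    if expected is not None:
        raise ValueError(f"Key `{expected}` doesn't exist in bucket `{bucket}`.")
    return results
```

Each listing page returns metadata for many objects at once, so the number of requests grows with the number of pages between the first and last key rather than with the number of keys.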

Checklist

The PR satisfies the contribution guidelines.
This is a breaking change and that has been indicated in the PR title, OR this isn't a breaking change.
Necessary docs have been updated, OR no docs need to be updated.

Validation performed

I used this ingestor prototype for testing: https://github.com/LinZhihao-723/LogIngestorServer with the following steps:

1. Build and start the CLP package from this branch (configured to use S3 input).
2. Start the ingestor server with:
   ```
   cargo run --release -- --db-url "mysql://clp-user:$CLP_DB_PASSWORD@localhost:3306/clp-db"
   ```
   Remember to replace the password placeholder with the actual password.
3. Install the log-generation script from: https://github.com/LinZhihao-723/LogIngestorServer/tree/main/scripts
4. Follow the README to create a generation job and upload logs to S3:
   ```
   python3 generate_and_upload.py \
     --s3-endpoint "https://s3.us-east-2.amazonaws.com" \
     --s3-key "$AWS_S3_KEY_ID" \
     --s3-secret "$AWS_S3_SECRET_KEY" \
     --s3-bucket "$BUCKET_NAME" \
     --num-files 10
   ```
   Remember to replace the credential and bucket placeholders with valid values. (Reviewers can use the lzh-test bucket for testing; YScope developers should have valid credentials to access this bucket.)
5. The generation script will print an ingestion prefix. Use this prefix to create a scanner job:
   ```
   curl -v -u "$AWS_S3_KEY_ID:$AWS_S3_SECRET_KEY" \
   "http://127.0.0.1:8080/create?region=us-east-2&bucket=$BUCKET_NAME&dataset=test&key_prefix=$PREFIX"
   ```
   Remember to replace the credentials, the bucket name, and the prefix with valid values.
6. Read the ingestor's logs to ensure log files are successfully created and scanned from S3.
7. Read the ingestor's logs to ensure exactly one compression job is submitted.
8. Use CLP's WebUI to check that the compression job finished successfully.

Summary by CodeRabbit

New Features
- Job inputs can now optionally include an explicit list of S3 object keys.
- Metadata retrieval supports both prefix-based discovery and targeted lookups by provided keys, with efficient pagination.
Bug Fixes
- Key-based lookups now surface clearer, descriptive errors for missing, duplicate or out-of-order keys.
- Improved handling of directory-like keys and head-object errors.
Documentation
- Updated descriptions to reflect new retrieval modes, input validation rules, and error behaviours.

coderabbitai · 2025-10-12T13:50:15Z

Walkthrough

Adds conditional dispatch in s3_get_object_metadata to handle prefix-based listing or an explicit ordered list of keys; introduces helpers for per-prefix iteration, per-key validation and head_object calls, paginated listing with StartAfter, and botocore ClientError → ValueError conversion. Adds optional non-empty keys to S3InputConfig.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| S3 metadata retrieval refactor `components/clp-py-utils/clp_py_utils/s3_utils.py` | Adds `botocore` and `Generator` imports. Refactors `s3_get_object_metadata` to dispatch on `s3_input_config.keys`. Adds helpers: `_s3_get_object_metadata_from_single_prefix` (iterates via `_iter_s3_objects`, skips keys ending with `/`), `_s3_get_object_metadata_from_keys` (validates keys: non-empty, prefix match, no duplicates, enforces ordering; iterates and uses `_s3_get_object_metadata_from_key`), `_s3_get_object_metadata_from_key` (wraps `head_object`, converts `botocore.ClientError` → `ValueError`), and `_iter_s3_objects` (paginates `list_objects_v2`, supports `StartAfter`, yields `(key, size)`). Docstrings and error propagation updated; public function signature unchanged. |
| Job config: S3 input keys support `components/job-orchestration/job_orchestration/scheduler/job_config.py` | Adds optional `keys: Optional[List[str]]` to `S3InputConfig` and a validator that rejects an empty list (raises `ValueError`) while preserving existing API surface otherwise. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller as Caller
  participant API as s3_get_object_metadata
  participant Prefix as _s3_get_object_metadata_from_single_prefix
  participant Keys as _s3_get_object_metadata_from_keys
  participant Head as _s3_get_object_metadata_from_key
  participant S3 as Amazon S3

  Caller->>API: s3_get_object_metadata(bucket, key_prefix, s3_input_config)
  alt s3_input_config.keys is None
    API->>Prefix: call prefix-based iterator
    Prefix->>S3: ListObjectsV2 (paged, StartAfter)
    S3-->>Prefix: pages of objects
    Prefix-->>API: yields FileMetadata[] (skips "/" keys)
  else keys provided
    API->>Keys: call keys-based flow
    Keys->>Keys: validate keys (non-empty, prefix match, no duplicates, ordering)
    Keys->>Head: for each key -> head_object wrapper
    Head->>S3: HeadObject(key)
    S3-->>Head: metadata or ClientError
    Head-->>Keys: metadata or raise ValueError
    Keys-->>API: FileMetadata[] or raise ValueError (missing/out-of-order)
  end
  API-->>Caller: return FileMetadata[] or raise error

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The pull request title "Add support for accepting a list of S3 object keys for compression through `S3InputConfig`" directly corresponds to the main changes in the changeset. The raw summary confirms that the job_config.py file adds a new optional field `keys: Optional[List[str]]` to `S3InputConfig`, and the s3_utils.py file implements the supporting logic to handle both key-prefix and explicit keys scenarios. The title is concise, specific, and clearly communicates the primary feature being added without vague terminology. It accurately reflects what a teammate would understand as the core intent when scanning the pull request history. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7965b80 and beeac70.

📒 Files selected for processing (1)

components/clp-py-utils/clp_py_utils/s3_utils.py (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

components/clp-py-utils/clp_py_utils/s3_utils.py (1)

components/clp-py-utils/clp_py_utils/core.py (1)

FileMetadata (7-22)

🪛 Ruff (0.14.0)

components/clp-py-utils/clp_py_utils/s3_utils.py

4-4: Import from collections.abc instead: Generator

Import from collections.abc

(UP035)

4-4: typing.Dict is deprecated, use dict instead

(UP035)

4-4: typing.List is deprecated, use list instead

(UP035)

4-4: typing.Set is deprecated, use set instead

(UP035)

4-4: typing.Tuple is deprecated, use tuple instead

(UP035)

401-401: Unnecessary list() call (rewrite as a literal)

Rewrite as a literal

(C408)

429-429: Avoid specifying long messages outside the exception class

(TRY003)

434-434: Avoid specifying long messages outside the exception class

(TRY003)

436-436: Avoid specifying long messages outside the exception class

(TRY003)

438-438: Avoid specifying long messages outside the exception class

(TRY003)

455-455: Avoid specifying long messages outside the exception class

(TRY003)

469-471: Avoid specifying long messages outside the exception class

(TRY003)

492-495: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: lint-check (macos-15)
GitHub Check: rust-checks (macos-15)
GitHub Check: rust-checks (ubuntu-24.04)
GitHub Check: rust-checks (ubuntu-22.04)
GitHub Check: package-image

🔇 Additional comments (4)

components/clp-py-utils/clp_py_utils/s3_utils.py (4)

256-279: LGTM! Clean dispatch logic.

The conditional routing based on s3_input_config.keys presence correctly maintains backward compatibility while enabling the new key-list mode. The updated docstring appropriately references both execution paths.

389-405: LGTM! Prefix-based metadata retrieval correctly extracted.

The helper cleanly encapsulates prefix-based iteration and appropriately delegates to _iter_s3_objects. The logic for skipping directory-like placeholders is handled in the iterator.

408-472: LGTM! Robust validation and efficient key-based retrieval.

The validation logic comprehensively checks for empty lists, prefix mismatches, duplicates, and invalid trailing slashes. The iteration strategy is sound:

head_object fetches metadata for the first sorted key.

Paginated list_objects_v2 with StartAfter efficiently retrieves remaining keys.

Early exit optimizes for cases where all keys are found before the end of the listing.

Missing keys are detected and reported with clear error messages.

The sorted key handling avoids mutating the caller's list while enabling efficient sequential validation.

474-495: LGTM! Proper error handling for single-key metadata retrieval.

The wrapper correctly uses head_object to fetch metadata for a single key and converts botocore.exceptions.ClientError into a ValueError with a descriptive message. The use of exception chaining (from e) preserves the original error context for debugging.
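
The conversion-with-chaining pattern described in the last comment can be sketched without an S3 dependency. This is only an illustration: `FakeClientError` is a hypothetical stand-in for `botocore.exceptions.ClientError`, and `fetch` stands in for the `head_object` call.

```python
from typing import Callable


class FakeClientError(Exception):
    """Hypothetical stand-in for botocore.exceptions.ClientError."""


def head_or_value_error(fetch: Callable[[str], int], key: str) -> int:
    """Call `fetch(key)` and convert a client error into a ValueError."""
    try:
        return fetch(key)
    except FakeClientError as e:
        # `from e` stores the original error in `__cause__`, so the traceback
        # still shows what the client reported.
        raise ValueError(f"Failed to read metadata for key `{key}`: {e}") from e
```

Without `from e`, Ruff's B904 rule would flag the `raise`, since the new error could not be told apart from one raised while handling the original.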

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6623829 and 0a0bdcd.

📒 Files selected for processing (2)

components/clp-py-utils/clp_py_utils/s3_utils.py (3 hunks)
components/job-orchestration/job_orchestration/scheduler/job_config.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

components/clp-py-utils/clp_py_utils/s3_utils.py (1)

components/clp-py-utils/clp_py_utils/core.py (1)

FileMetadata (7-22)

🪛 Ruff (0.14.0)

components/job-orchestration/job_orchestration/scheduler/job_config.py

39-39: Missing return type annotation for classmethod validate_keys

(ANN206)

41-41: Avoid specifying long messages outside the exception class

(TRY003)

components/clp-py-utils/clp_py_utils/s3_utils.py

403-403: Unnecessary list() call (rewrite as a literal)

Rewrite as a literal

(C408)

442-442: Avoid specifying long messages outside the exception class

(TRY003)

447-447: Avoid specifying long messages outside the exception class

(TRY003)

449-449: Avoid specifying long messages outside the exception class

(TRY003)

453-453: Unnecessary list() call (rewrite as a literal)

Rewrite as a literal

(C408)

477-477: Avoid specifying long messages outside the exception class

(TRY003)

491-493: Avoid specifying long messages outside the exception class

(TRY003)

514-517: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

514-517: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: package-image

components/clp-py-utils/clp_py_utils/s3_utils.py

coderabbitai · 2025-10-14T22:57:43Z

components/job-orchestration/job_orchestration/scheduler/job_config.py

+    @field_validator("keys")
+    @classmethod
+    def validate_keys(cls, value):
+        if value is not None and len(value) == 0:
+            raise ValueError("Keys cannot be an empty list")
+        return value


🛠️ Refactor suggestion | 🟠 Major

Annotate the validator return type to keep Ruff happy

Ruff’s ANN206 rule flags this classmethod because it lacks a return-type annotation. Adding -> Optional[List[str]] keeps lint clean and documents intent.

-    def validate_keys(cls, value):
+    def validate_keys(cls, value: Optional[List[str]]) -> Optional[List[str]]:

🧰 Tools

🪛 Ruff (0.14.0)

39-39: Missing return type annotation for classmethod validate_keys

(ANN206)

41-41: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents

In components/job-orchestration/job_orchestration/scheduler/job_config.py around lines 37 to 42, the @field_validator classmethod validate_keys lacks a return-type annotation which trips Ruff ANN206; add an explicit return type like -> Optional[List[str]] to the method signature and, if not already imported, import Optional and List from typing (or use from typing import Optional, List) so the validator is properly annotated and the linter is satisfied.

junhaoliao

the overall logic lgtm and is consistent with what we discussed offline. i posted some questions about refactoring and edge case handling

components/clp-py-utils/clp_py_utils/s3_utils.py

Co-authored-by: Junhao Liao <[email protected]>

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

components/clp-py-utils/clp_py_utils/s3_utils.py (1)
444-449: Fix in-place mutation of the keys list.

The keys.sort() call mutates the list attached to s3_input_config.keys, causing a visible side effect for the caller. Use a sorted copy instead to keep the config immutable.

Apply this diff to fix the mutation issue:
-    keys.sort()
-    for idx, key in enumerate(keys):
+    sorted_keys = sorted(keys)
+    for idx, key in enumerate(sorted_keys):
         if not key.startswith(key_prefix):
             raise ValueError(f"Key `{key}` doesn't start with the specified prefix `{key_prefix}`.")
-        if idx > 0 and key == keys[idx - 1]:
+        if idx > 0 and key == sorted_keys[idx - 1]:
             raise ValueError(f"Duplicate key found: `{key}`.")
Then update line 451 to use sorted_keys:
-    key_iterator = iter(keys)
+    key_iterator = iter(sorted_keys)
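
The difference between the two approaches in the diff above is easy to see in isolation. The following is a small sketch with hypothetical function names, not code from this PR:

```python
from typing import List


def order_in_place(keys: List[str]) -> List[str]:
    """Sorts the caller's list itself: the side effect flagged in the review."""
    keys.sort()
    return keys


def order_on_copy(keys: List[str]) -> List[str]:
    """Returns a new sorted list and leaves the caller's list as it was."""
    return sorted(keys)


config_keys = ["logs/b", "logs/a"]
ordered = order_on_copy(config_keys)
# `config_keys` keeps its original order; only `ordered` is sorted.
```

Because `sorted()` builds a new list, the keys stored on the config object keep the order the caller supplied.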

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0a0bdcd and 771eeed.

📒 Files selected for processing (1)

components/clp-py-utils/clp_py_utils/s3_utils.py (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

components/clp-py-utils/clp_py_utils/s3_utils.py (1)

components/clp-py-utils/clp_py_utils/core.py (1)

FileMetadata (7-22)

🪛 Ruff (0.14.0)

components/clp-py-utils/clp_py_utils/s3_utils.py

403-403: Unnecessary list() call (rewrite as a literal)

Rewrite as a literal

(C408)

442-442: Avoid specifying long messages outside the exception class

(TRY003)

447-447: Avoid specifying long messages outside the exception class

(TRY003)

449-449: Avoid specifying long messages outside the exception class

(TRY003)

453-453: Unnecessary list() call (rewrite as a literal)

Rewrite as a literal

(C408)

477-477: Avoid specifying long messages outside the exception class

(TRY003)

491-493: Avoid specifying long messages outside the exception class

(TRY003)

514-517: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

514-517: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (4)

components/clp-py-utils/clp_py_utils/s3_utils.py (4)

7-7: LGTM!

The botocore import is necessary for exception handling in the new helper functions.

256-278: LGTM!

The branching logic correctly delegates to the appropriate helper function based on whether explicit keys are provided. The updated docstring accurately documents the propagated exceptions.

451-458: LGTM once the mutation issue is fixed.

The first-key handling logic is correct, using head_object for efficient single-key retrieval and appropriately returning early when only one key is provided.

460-483: LGTM!

The pagination logic correctly validates that all provided keys exist in the bucket, efficiently using StartAfter to skip already-processed objects. The directory placeholder handling is consistent with the prefix-based approach.

components/clp-py-utils/clp_py_utils/s3_utils.py

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

components/clp-py-utils/clp_py_utils/s3_utils.py (1)

436-447: Do not mutate the caller-provided key list in place

keys.sort() reorders S3InputConfig.keys directly. Callers pass this config expecting it to remain untouched, so this side effect may leak into subsequent code paths. Work on a sorted copy instead (e.g., sorted_keys = sorted(keys)) and iterate over that copy.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 771eeed and f6f918c.

📒 Files selected for processing (1)

components/clp-py-utils/clp_py_utils/s3_utils.py (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

components/clp-py-utils/clp_py_utils/s3_utils.py (1)

components/clp-py-utils/clp_py_utils/core.py (1)

FileMetadata (7-22)

🪛 Ruff (0.14.0)

components/clp-py-utils/clp_py_utils/s3_utils.py

4-4: Import from collections.abc instead: Generator

Import from collections.abc

(UP035)

4-4: typing.Dict is deprecated, use dict instead

(UP035)

4-4: typing.List is deprecated, use list instead

(UP035)

4-4: typing.Set is deprecated, use set instead

(UP035)

4-4: typing.Tuple is deprecated, use tuple instead

(UP035)

403-403: Unnecessary list() call (rewrite as a literal)

Rewrite as a literal

(C408)

434-434: Avoid specifying long messages outside the exception class

(TRY003)

439-439: Avoid specifying long messages outside the exception class

(TRY003)

441-441: Avoid specifying long messages outside the exception class

(TRY003)

445-445: Unnecessary list() call (rewrite as a literal)

Rewrite as a literal

(C408)

462-462: Avoid specifying long messages outside the exception class

(TRY003)

476-478: Avoid specifying long messages outside the exception class

(TRY003)

499-502: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

499-502: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: package-image
GitHub Check: rust-checks (ubuntu-24.04)
GitHub Check: rust-checks (ubuntu-22.04)
GitHub Check: lint-check (ubuntu-24.04)

components/clp-py-utils/clp_py_utils/s3_utils.py

coderabbitai

Actionable comments posted: 4

♻️ Duplicate comments (1)

components/clp-py-utils/clp_py_utils/s3_utils.py (1)
389-407: Prefer literal list initialisation.

Use [] for consistency and to satisfy Ruff C408.
-    file_metadata_list: List[FileMetadata] = list()
+    file_metadata_list: List[FileMetadata] = []

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a4f1c7f and 57fddc5.

📒 Files selected for processing (1)

components/clp-py-utils/clp_py_utils/s3_utils.py (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

components/clp-py-utils/clp_py_utils/s3_utils.py (1)

components/clp-py-utils/clp_py_utils/core.py (1)

FileMetadata (7-22)

🪛 Ruff (0.14.0)

components/clp-py-utils/clp_py_utils/s3_utils.py

4-4: Import from collections.abc instead: Generator

Import from collections.abc

(UP035)

4-4: typing.Dict is deprecated, use dict instead

(UP035)

4-4: typing.List is deprecated, use list instead

(UP035)

4-4: typing.Set is deprecated, use set instead

(UP035)

4-4: typing.Tuple is deprecated, use tuple instead

(UP035)

403-403: Unnecessary list() call (rewrite as a literal)

Rewrite as a literal

(C408)

431-431: Avoid specifying long messages outside the exception class

(TRY003)

436-436: Avoid specifying long messages outside the exception class

(TRY003)

438-438: Avoid specifying long messages outside the exception class

(TRY003)

440-440: Avoid specifying long messages outside the exception class

(TRY003)

444-444: Unnecessary list() call (rewrite as a literal)

Rewrite as a literal

(C408)

457-457: Avoid specifying long messages outside the exception class

(TRY003)

466-466: Unnecessary list() call (rewrite as a literal)

Rewrite as a literal

(C408)

471-473: Avoid specifying long messages outside the exception class

(TRY003)

494-497: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: package-image
GitHub Check: rust-checks (ubuntu-24.04)
GitHub Check: rust-checks (macos-15)
GitHub Check: rust-checks (ubuntu-22.04)
GitHub Check: lint-check (macos-15)

🔇 Additional comments (1)

components/clp-py-utils/clp_py_utils/s3_utils.py (1)

476-498: HeadObject wrapper: LGTM.

Good conversion and exception chaining.

components/clp-py-utils/clp_py_utils/s3_utils.py

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (4)

components/clp-py-utils/clp_py_utils/s3_utils.py (4)
401-405: Use [] over list() for initialisation (minor).

Stylistic consistency and aligns with Ruff C408.
-    file_metadata_list: List[FileMetadata] = list()
+    file_metadata_list: List[FileMetadata] = []
269-269: Apply adaptive retries for metadata retrieval (parity with put/delete).

Use the same boto3 Config with retries to improve robustness of Head/List calls.
-    s3_client = _create_s3_client(s3_input_config.region_code, s3_input_config.aws_authentication)
+    boto3_config = Config(retries=dict(total_max_attempts=3, mode="adaptive"))
+    s3_client = _create_s3_client(
+        s3_input_config.region_code, s3_input_config.aws_authentication, boto3_config
+    )
497-499: PEP 604 union breaks Python 3.9; use Optional[str].

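Evaluating str | None in an annotation requires Python 3.10+. This package targets >=3.9, so on 3.9 this will raise a TypeError when the function is defined at import time (unless annotation evaluation is deferred with from __future__ import annotations).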
-def _iter_s3_objects(
-    s3_client: boto3.client, bucket: str, key_prefix: str, start_from: str | None = None
-) -> Generator[Tuple[str, int], None, None]:
+def _iter_s3_objects(
+    s3_client: boto3.client, bucket: str, key_prefix: str, start_from: Optional[str] = None
+) -> Generator[Tuple[str, int], None, None]:
412-417: Docstring missing param for key_prefix.

Add the entry for clarity.
-    :param bucket:
-    :param keys:
+    :param bucket:
+    :param key_prefix:
+    :param keys:

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 57fddc5 and 7965b80.

📒 Files selected for processing (1)

components/clp-py-utils/clp_py_utils/s3_utils.py (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

components/clp-py-utils/clp_py_utils/s3_utils.py (1)

components/clp-py-utils/clp_py_utils/core.py (1)

FileMetadata (7-22)

🪛 Ruff (0.14.0)

components/clp-py-utils/clp_py_utils/s3_utils.py

4-4: Import from collections.abc instead: Generator

Import from collections.abc

(UP035)

4-4: typing.Dict is deprecated, use dict instead

(UP035)

4-4: typing.List is deprecated, use list instead

(UP035)

4-4: typing.Set is deprecated, use set instead

(UP035)

4-4: typing.Tuple is deprecated, use tuple instead

(UP035)

401-401: Unnecessary list() call (rewrite as a literal)

Rewrite as a literal

(C408)

428-428: Avoid specifying long messages outside the exception class

(TRY003)

433-433: Avoid specifying long messages outside the exception class

(TRY003)

435-435: Avoid specifying long messages outside the exception class

(TRY003)

437-437: Avoid specifying long messages outside the exception class

(TRY003)

441-441: Unnecessary list() call (rewrite as a literal)

Rewrite as a literal

(C408)

454-454: Avoid specifying long messages outside the exception class

(TRY003)

463-463: Unnecessary list() call (rewrite as a literal)

Rewrite as a literal

(C408)

468-470: Avoid specifying long messages outside the exception class

(TRY003)

491-494: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: package-image

components/clp-py-utils/clp_py_utils/s3_utils.py

junhaoliao

lgtm. sorry for the delay in the review

for the title, how about:

feat(job-orchestration): Allow compression jobs to specify S3 object keys via `S3InputConfig`.

or

feat(job-orchestration): Support specifying explicit S3 object keys in `S3InputConfig` for compression.

LinZhihao-723 added 8 commits October 1, 2025 22:25

WIP.

3efacb5

Fix compress script.

433ca9e

WIP

58af081

Implementation fixed.

8080ee5

Merge oss-main.

033b3f8

Merged by mistakes.

fd21e56

Fix my bug.

af281ab

Revert compression interface.

4b63bcb

LinZhihao-723 and others added 2 commits October 12, 2025 10:18

Fix key validation.

b2e14c6

Merge branch 'main' into s3-input-config-update

0a0bdcd

LinZhihao-723 marked this pull request as ready for review October 14, 2025 22:50

LinZhihao-723 requested a review from a team as a code owner October 14, 2025 22:50

LinZhihao-723 requested a review from junhaoliao October 14, 2025 22:50

coderabbitai bot reviewed Oct 14, 2025

junhaoliao requested changes Oct 15, 2025

LinZhihao-723 and others added 2 commits October 14, 2025 22:42

Apply suggestion from @junhaoliao

710164e

Co-authored-by: Junhao Liao <[email protected]>

Apply suggestion from @junhaoliao

771eeed

Co-authored-by: Junhao Liao <[email protected]>

coderabbitai bot reviewed Oct 15, 2025

coderabbitai bot mentioned this pull request Oct 15, 2025

Add proper type hints for boto3 S3 client using boto3-stubs #1420

Open

LinZhihao-723 added 2 commits October 15, 2025 00:16

Implement iter.

f6f918c

Sort on a copy.

a98e995

coderabbitai bot reviewed Oct 15, 2025

LinZhihao-723 and others added 4 commits October 15, 2025 00:21

Fix rabit's comment.

a4f1c7f

Merge branch 'main' into s3-input-config-update

9f547d7

Apply code review comments (offline).

57fddc5

Fix docstring.

7965b80

LinZhihao-723 requested a review from junhaoliao October 15, 2025 20:30

coderabbitai bot reviewed Oct 15, 2025

LinZhihao-723 and others added 3 commits October 15, 2025 23:08

Apply AI comments.

bfd82bb

Fix comment.

b14a1c2

Merge branch 'main' into s3-input-config-update

beeac70

junhaoliao approved these changes Oct 16, 2025

LinZhihao-723 changed the title ~~feat(job-orchestration): Add support for accepting a list of S3 object keys for compression through S3InputConfig.~~ feat(job-orchestration): Allow compression jobs to specify S3 object keys via S3InputConfig. Oct 16, 2025

LinZhihao-723 merged commit e6b4a20 into y-scope:main Oct 16, 2025
24 checks passed

2 participants
