feat(job-orchestration): Read compression input metadata from DB for ingestor jobs (addresses #2018) #2082
Open
jonathan-imanu wants to merge 11 commits into y-scope:main from jonathan-imanu:comp_scheduler_eliminate_s3_trip
Changes from 8 commits
Commits (11 commits)
ec37878 feat(job-orchestration): Read compression input metadata from DB for … (jonathan-imanu)
4322e2f Merge branch 'main' into comp_scheduler_eliminate_s3_trip (jonathan-imanu)
e7d64de feat: add helper to fetch metadata from table (jonathan-imanu)
833c33e fix: rename conf. & remove helper (jonathan-imanu)
de34c46 fix: apply suggestions from code review (jonathan-imanu)
89b5d6a fix: add duplicate id check (jonathan-imanu)
8c77e31 fix: wrong check in is_s3_based_input (jonathan-imanu)
a7cde90 fix: validate duplicate ids with pydantic (jonathan-imanu)
d0fd88f fix: rename validator after adding additional func (jonathan-imanu)
0e3808e fix: apply suggestions from code review (jonathan-imanu)
957dee3 Merge branch 'main' into comp_scheduler_eliminate_s3_trip (jonathan-imanu)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -26,7 +26,10 @@ | |
| fetch_existing_datasets, | ||
| ) | ||
| from clp_py_utils.compression import validate_path_and_get_info | ||
| from clp_py_utils.core import read_yaml_config_file | ||
| from clp_py_utils.core import ( | ||
| FileMetadata, | ||
| read_yaml_config_file, | ||
| ) | ||
| from clp_py_utils.s3_utils import s3_get_object_metadata | ||
| from clp_py_utils.sql_adapter import SqlAdapter | ||
| from pydantic import ValidationError | ||
|
|
@@ -38,13 +41,15 @@ | |
| from job_orchestration.scheduler.constants import ( | ||
| CompressionJobStatus, | ||
| CompressionTaskStatus, | ||
| INGESTED_S3_OBJECT_METADATA_TABLE_NAME, | ||
| SchedulerType, | ||
| ) | ||
| from job_orchestration.scheduler.job_config import ( | ||
| ClpIoConfig, | ||
| FsInputConfig, | ||
| InputType, | ||
| S3InputConfig, | ||
| S3ObjectMetadataInputConfig, | ||
| ) | ||
| from job_orchestration.scheduler.scheduler_data import ( | ||
| CompressionJob, | ||
|
|
@@ -183,6 +188,57 @@ def _process_s3_input( | |
| paths_to_compress_buffer.add_file(object_metadata) | ||
|
|
||
|
|
||
| def _process_s3_object_metadata_input( | ||
| s3_object_metadata_input_config: S3ObjectMetadataInputConfig, | ||
| paths_to_compress_buffer: PathsToCompressBuffer, | ||
| db_context: DbContext, | ||
| ) -> None: | ||
| """ | ||
| Fetches S3 object metadata rows from the `INGESTED_S3_OBJECT_METADATA_TABLE_NAME` table for the | ||
| given `s3_object_metadata_ids` and `ingestion_job_id`, and adds the metadata to | ||
| `paths_to_compress_buffer`. | ||
|
|
||
| :param s3_object_metadata_input_config: | ||
| :param paths_to_compress_buffer: | ||
| :param db_context: | ||
| :raises RuntimeError: If no rows are found, or if any requested metadata_id is missing. | ||
| """ | ||
| s3_object_metadata_ids = s3_object_metadata_input_config.s3_object_metadata_ids | ||
| ingestion_job_id = s3_object_metadata_input_config.ingestion_job_id | ||
|
|
||
| placeholders = ", ".join(["%s"] * len(s3_object_metadata_ids)) | ||
| query = ( | ||
| f"SELECT `id`, `key`, `size` FROM {INGESTED_S3_OBJECT_METADATA_TABLE_NAME} " | ||
| f"WHERE id IN ({placeholders}) AND ingestion_job_id = %s" | ||
| ) | ||
| params = (*s3_object_metadata_ids, ingestion_job_id) | ||
| db_context.cursor.execute(query, params) | ||
|
||
| metadata_list = db_context.cursor.fetchall() | ||
| if len(metadata_list) == 0: | ||
| raise RuntimeError( | ||
| f"No rows found in {INGESTED_S3_OBJECT_METADATA_TABLE_NAME} for the given " | ||
| f"s3_object_metadata_ids and ingestion_job_id {ingestion_job_id}." | ||
| ) | ||
| # Validate that all requested IDs are present. | ||
|
||
| returned_ids = {row["id"] for row in metadata_list} | ||
| requested_ids = set(s3_object_metadata_ids) | ||
| missing_ids = requested_ids - returned_ids | ||
| if len(missing_ids) > 0: | ||
| raise RuntimeError( | ||
| f"Missing metadata rows in {INGESTED_S3_OBJECT_METADATA_TABLE_NAME} for " | ||
| f"ingestion_job_id {ingestion_job_id}: {sorted(missing_ids)}." | ||
| ) | ||
|
|
||
| for metadata in metadata_list: | ||
|
Member
For each key, we should also ensure it contains the expected prefix specified in
||
| if not metadata["key"].startswith(s3_object_metadata_input_config.key_prefix): | ||
| raise RuntimeError( | ||
| f"Metadata key {metadata['key']} does not start with the key prefix " | ||
| f"{s3_object_metadata_input_config.key_prefix}." | ||
| ) | ||
| file_metadata = FileMetadata(path=Path(metadata["key"]), size=int(metadata["size"])) | ||
| paths_to_compress_buffer.add_file(file_metadata) | ||
|
|
||
|
|
||
| def _write_user_failure_log( | ||
| title: str, | ||
| content: list[str], | ||
|
|
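For readers who want to try the lookup logic in `_process_s3_object_metadata_input` outside the scheduler, here is a minimal sketch of the same three steps: build one placeholder per requested ID, check that every requested ID came back, and check each key against the prefix. It is not the PR's code. It assumes an in-memory `sqlite3` database (so the placeholder is `?` rather than `%s`), a hypothetical table name and schema, and a stand-in `FileMetadata` dataclass; `fetch_file_metadata` and `make_demo_connection` are illustrative names only. Unlike the PR, it rejects an empty ID list up front, since `IN ()` would not be valid SQL, and it folds the "no rows" case into the missing-ID check.

```python
import sqlite3
from dataclasses import dataclass
from pathlib import Path

# Hypothetical table name, standing in for INGESTED_S3_OBJECT_METADATA_TABLE_NAME.
TABLE_NAME = "ingested_s3_object_metadata"


@dataclass(frozen=True)
class FileMetadata:
    """Stand-in for clp_py_utils.core.FileMetadata (assumed fields: path, size)."""

    path: Path
    size: int


def fetch_file_metadata(conn, metadata_ids, ingestion_job_id, key_prefix):
    """Returns one FileMetadata per requested ID, or raises if any check fails."""
    if len(metadata_ids) == 0:
        raise ValueError("metadata_ids must not be empty.")

    # One bound placeholder per ID; the IDs themselves never enter the SQL text.
    placeholders = ", ".join(["?"] * len(metadata_ids))
    query = (
        f'SELECT "id", "key", "size" FROM {TABLE_NAME} '
        f"WHERE id IN ({placeholders}) AND ingestion_job_id = ?"
    )
    rows = conn.execute(query, (*metadata_ids, ingestion_job_id)).fetchall()

    # Every requested ID must be present for this ingestion job.
    missing_ids = set(metadata_ids) - {row["id"] for row in rows}
    if missing_ids:
        raise RuntimeError(
            f"Missing metadata rows for ingestion_job_id {ingestion_job_id}: "
            f"{sorted(missing_ids)}."
        )

    files = []
    for row in rows:
        # Reject keys that fall outside the configured prefix.
        if not row["key"].startswith(key_prefix):
            raise RuntimeError(
                f"Metadata key {row['key']} does not start with the key prefix {key_prefix}."
            )
        files.append(FileMetadata(path=Path(row["key"]), size=int(row["size"])))
    return files


def make_demo_connection():
    """Builds an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # Allows row["id"]-style access.
    conn.execute(
        f'CREATE TABLE {TABLE_NAME} ("id" INTEGER PRIMARY KEY, "key" TEXT NOT NULL, '
        f'"size" INTEGER NOT NULL, ingestion_job_id INTEGER NOT NULL)'
    )
    conn.executemany(
        f"INSERT INTO {TABLE_NAME} VALUES (?, ?, ?, ?)",
        [(1, "logs/a.log", 10, 7), (2, "logs/b.log", 20, 7), (3, "other/c.log", 30, 8)],
    )
    return conn
```

Calling `fetch_file_metadata(make_demo_connection(), [1, 2], 7, "logs/")` returns two entries, while requesting ID 3 under ingestion job 7 raises because that row belongs to a different job.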
@@ -321,6 +377,22 @@ def search_and_schedule_new_tasks( | |
| }, | ||
| ) | ||
| return | ||
| elif input_type == InputType.S3_OBJECT_METADATA.value: | ||
| try: | ||
| _process_s3_object_metadata_input( | ||
| input_config, paths_to_compress_buffer, db_context | ||
| ) | ||
| except Exception as err: | ||
| logger.exception("Failed to process S3 object metadata input for job %s", job_id) | ||
| update_compression_job_metadata( | ||
| db_context, | ||
| job_id, | ||
| { | ||
| "status": CompressionJobStatus.FAILED, | ||
| "status_msg": f"S3 object metadata input failure: {err}", | ||
| }, | ||
| ) | ||
| return | ||
| else: | ||
| logger.error(f"Unsupported input type {input_type}") | ||
| update_compression_job_metadata( | ||
|
|
||
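The new `S3_OBJECT_METADATA` branch in `search_and_schedule_new_tasks` follows a catch, log, and mark-failed pattern. The sketch below isolates that pattern so it can be exercised without a database. It is an illustration under assumptions, not the scheduler's code: `run_input_step`, the `FAILED` constant, and the `update_job_metadata` callback are hypothetical stand-ins for the real call to `update_compression_job_metadata` with `CompressionJobStatus.FAILED`.

```python
import logging

logger = logging.getLogger("scheduler_sketch")

# Hypothetical stand-in for CompressionJobStatus.FAILED.
FAILED = "FAILED"


def run_input_step(job_id, process_input, update_job_metadata):
    """
    Runs `process_input`; on any exception, logs it and records a failure for the job.

    :param job_id: Identifier of the job being scheduled.
    :param process_input: Zero-argument callable that loads the job's inputs.
    :param update_job_metadata: Callable taking (job_id, fields); assumed to persist the fields.
    :return: True if the step succeeded, False if the job was marked as failed.
    """
    try:
        process_input()
    except Exception as err:
        # Keep the traceback in the scheduler log, and a short message on the job record.
        logger.exception("Failed to process S3 object metadata input for job %s", job_id)
        update_job_metadata(
            job_id,
            {
                "status": FAILED,
                "status_msg": f"S3 object metadata input failure: {err}",
            },
        )
        return False
    return True
```

Returning a boolean here plays the role of the early `return` in the diff: the caller skips scheduling tasks for a job whose inputs could not be loaded.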