Lal/dataset api v5 prod 14582 #58
Conversation
Pull Request Overview
This PR adds version-compatibility support for the cosmotech-api package, keeping backward compatibility with API v4 while supporting the newer v5. The changes implement version-aware functionality to handle breaking changes between the two API versions.
Key changes:
- Added a semantic version utility to detect the installed cosmotech-api version at runtime (a minimal sketch follows this list)
- Removed credentials parameter handling and updated function signatures across multiple modules
- Implemented version-specific dataset download functions for v4 and v5 compatibility
- Added comprehensive test skipping for version-specific functionality
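
The semver utility itself lives in cosmotech/coal/utils/semver.py and its code is not shown in this summary. As a rough illustration only, runtime detection and v4/v5 dispatch can be sketched as below; every name here (`get_semver`, `download_dataset`, the `_v4`/`_v5` helpers) and the use of `importlib.metadata` are assumptions for illustration, not the PR's actual code.

```python
# Minimal sketch, NOT the PR's implementation: detect the installed
# cosmotech-api version and dispatch to a version-specific code path.
import re
from importlib.metadata import PackageNotFoundError, version


def get_semver(package_name: str) -> tuple[int, int, int] | None:
    """Return (major, minor, patch) of an installed package, or None if absent."""
    try:
        raw = version(package_name)
    except PackageNotFoundError:
        return None
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", raw)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def _download_dataset_v4(dataset_id: str) -> dict:
    # Stand-in for the legacy v4 download path.
    return {"id": dataset_id, "api": "v4"}


def _download_dataset_v5(dataset_id: str) -> dict:
    # Stand-in for the newer v5 download path.
    return {"id": dataset_id, "api": "v5"}


def download_dataset(dataset_id: str) -> dict:
    """Route to the implementation matching the installed major version."""
    semver = get_semver("cosmotech-api")
    if semver is not None and semver[0] >= 5:
        return _download_dataset_v5(dataset_id)
    return _download_dataset_v4(dataset_id)
```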
Reviewed Changes
Copilot reviewed 26 out of 27 changed files in this pull request and generated 4 comments.
File | Description
---|---
cosmotech/coal/utils/semver.py | New utility for semantic version detection of packages
cosmotech/coal/utils/decorator.py | New timing decorator for performance monitoring (a minimal sketch follows this table)
cosmotech/coal/cosmotech_api/runner/datasets.py | Version-aware dataset download with separate v4/v5 implementations
cosmotech/coal/cosmotech_api/dataset/download/file.py | Refactored file processing with the new decorator and extracted helper functions
Multiple test files | Updated to remove credentials parameters and add version-specific test skipping
requirements.past.txt | New dependency file for legacy API support
pyproject.toml | Added past-dependencies configuration
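
The decorator's code is also not shown in this summary. Assuming it simply logs elapsed wall-clock time, a minimal sketch could look like the following; the name `timed` and the logging setup are illustrative, not taken from decorator.py.

```python
# Minimal sketch, NOT the PR's implementation: a timing decorator that
# logs how long each call to the wrapped function takes.
import functools
import logging
import time

LOGGER = logging.getLogger(__name__)


def timed(func):
    """Log the wall-clock duration of every call to the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            LOGGER.debug("%s took %.3f s", func.__name__, time.perf_counter() - start)
    return wrapper


@timed
def example_download():
    # Hypothetical decorated function, used here only to show the syntax.
    time.sleep(0.1)
```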
LOGGER.debug(T("coal.services.dataset.processing_text").format(file_name=target_file)) | ||
with open(target_file, "r") as _file: | ||
current_filename = os.path.basename(target_file) | ||
content[current_filename] = "".join(line for line in _file) |
Copilot AI commented on Aug 19, 2025:
The text processing function removes newlines by using "".join() instead of "\n".join(). This changes the original file content structure and may cause data loss.
Suggested change:

```diff
- content[current_filename] = "".join(line for line in _file)
+ content[current_filename] = _file.read()
```
```python
tmp_dataset_dir_path = Path(tmp_dataset_dir)
for part in dataset.parts:
    part_file_path = tmp_dataset_dir_path / part.source_name
    data_part = dataset_api_instance.download_dataset_part(organization_id, workspace_id, dataset_id, part.id)
```
Copilot AI commented on Aug 19, 2025:
The method call uses positional arguments instead of keyword arguments, which could lead to parameter mismatch if the API signature changes. Consider using keyword arguments for clarity and robustness.
Suggested change:

```diff
- data_part = dataset_api_instance.download_dataset_part(organization_id, workspace_id, dataset_id, part.id)
+ data_part = dataset_api_instance.download_dataset_part(
+     organization_id=organization_id,
+     workspace_id=workspace_id,
+     dataset_id=dataset_id,
+     dataset_part_id=part.id,
+ )
```
```python
    with open(part_file_path, "wb") as binary_file:
        binary_file.write(data_part)

    if read_files:
```
Copilot AI commented on Aug 19, 2025:
The file content reading is performed inside the loop for each dataset part, but the read_files check happens after file writing. Consider moving this check before the file writing operation to avoid unnecessary disk I/O when read_files is False.
Suggested change:

```diff
- if read_files:
+ if read_files:
+     part_file_path = tmp_dataset_dir_path / part.source_name
+     data_part = dataset_api_instance.download_dataset_part(organization_id, workspace_id, dataset_id, part.id)
+     with open(part_file_path, "wb") as binary_file:
+         binary_file.write(data_part)
```
LOGGER.debug(T("coal.services.dataset.processing_json").format(file_name=target_file)) | ||
with open(target_file, "r") as _file: | ||
current_filename = os.path.basename(target_file) | ||
content[current_filename] = json.load(_file) |
Copilot AI commented on Aug 19, 2025:
[nitpick] Using the same variable name '_file' for both the function parameter and the file handle in the with statement creates confusion. Consider using a more descriptive name like 'json_file' for the file handle.
Suggested change:

```diff
- content[current_filename] = json.load(_file)
+ with open(target_file, "r") as json_file:
+     current_filename = os.path.basename(target_file)
+     content[current_filename] = json.load(json_file)
```
Everything looks OK to me to merge into the feature branch. We will probably need to update the action that runs the tests so it runs against both the 3.x and 5.x versions of cosmotech-api.
The cosmotech-api v5 library is not released yet, so testing will require either installing a pre-release or pinning a fixed version. A pre-release is too risky, and a pinned version is too static.
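
As context for the multi-version testing concern above, one common pattern is to skip version-specific tests at collection time with a pytest marker. A minimal sketch, assuming pytest; the helper `_api_major` and the marker name are invented for illustration and are not taken from this repository:

```python
# Minimal sketch, NOT this repository's test code: skip tests that need
# cosmotech-api >= 5 when an older (or no) version is installed.
import pytest
from importlib.metadata import PackageNotFoundError, version


def _api_major() -> int:
    """Best-effort major version of cosmotech-api; 0 if unavailable."""
    try:
        return int(version("cosmotech-api").split(".")[0])
    except (PackageNotFoundError, ValueError):
        return 0


requires_api_v5 = pytest.mark.skipif(
    _api_major() < 5,
    reason="requires cosmotech-api >= 5",
)


@requires_api_v5
def test_download_dataset_v5():
    # A v5-only test body would go here.
    ...
```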