Support functionalities to enhance task traceability with metadata for dependency search. by tyzerrr · Pull Request #450 · m3dev/gokart

tyzerrr · 2025-03-05T09:24:15Z

Related works

What does PR do?

In this Pull Request, I implement a metadata attribution feature that enables searching for tasks dependent on specific tasks executed with a given parameter set.

Why is this needed?

Gokart caches the execution results and parameter states of each task in GCS. As shown in the Related Works section, various metadata are attached to each GCS object to enhance traceability. A common use case is searching for tasks that depend on a specific task executed with a given parameter set. Currently, Gokart does not support searching and tracing task dependencies from GCS metadata. This PR introduces this functionality.

Prerequisites

The focus of this PR is embedding the necessary metadata to allow the CLI to search for specific dependencies. The search functionality itself will be implemented on the CLI side (CLI: https://github.com/TlexCypher/gcs-metadog).

Checklist

CI is passing
Code formatting follows project standards.
Necessary tests have been added.
Existing tests pass.

tyzerrr · 2025-03-05T09:27:26Z

+K = TypeVar('K')
+
+
+def map_flattenable_items(items: FlattenableItems[T], func: Callable[[T], K]) -> FlattenableItems[K]:


If we could use both generics and isinstance at the same time, the code would look like this:

def map_flattenable_items(items: FlattenableItems[T], func: Callable[[T], K]) -> FlattenableItems[K]:
    if isinstance(items, dict):
        return {k: map_flattenable_items(v, func) for k, v in items.items()}
    if isinstance(items, str):
        return items
    if isinstance(items, Iterable[T]):
        return [map_flattenable_items(i, func) for i in items]
    return func(items)

mamo3gr · 2025-03-06T02:38:46Z

+        @dataclass
+        class _RequiredTaskOutput:
+            task_name: str
+            output_path: str
+
+        _required_task_outputs = map_flattenable_items(
+            self.requires(),
+            func=lambda task: map_flattenable_items(
+                task.output(), func=lambda output: _RequiredTaskOutput(task_name=task.get_task_family(), output_path=output.path())
+            ),
+        )
+        required_task_outputs: dict[str, str] | None = None
+        if isinstance(_required_task_outputs, list):
+            required_task_outputs = {r.task_name: r.output_path for r in _required_task_outputs}
+        elif isinstance(_required_task_outputs, dict):
+            required_task_outputs = _required_task_outputs
+        else:
+            required_task_outputs = (
+                {_required_task_outputs.task_name: _required_task_outputs.output_path} if isinstance(_required_task_outputs, _RequiredTaskOutput) else None
+            )


[imo]
It would be more readable if this section were extracted into a method that returns required_task_outputs.
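One way such an extraction could look, as a rough sketch. The helper name collect_required_task_outputs and the standalone RequiredTaskOutput dataclass are hypothetical, and the input is assumed to be an already-mapped structure of RequiredTaskOutput values nested in lists, tuples, or dicts. Unlike the diff above, this version also walks nested containers; it is not the method that was merged.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class RequiredTaskOutput:
    # Hypothetical stand-in for the dataclass shown in the diff.
    task_name: str
    output_path: str


def collect_required_task_outputs(items: Any) -> dict[str, str] | None:
    """Flatten nested RequiredTaskOutput values into {task_name: output_path}."""
    collected: dict[str, str] = {}

    def _walk(node: Any) -> None:
        if isinstance(node, RequiredTaskOutput):
            collected[node.task_name] = node.output_path
        elif isinstance(node, dict):
            for value in node.values():
                _walk(value)
        elif isinstance(node, (list, tuple)):
            for value in node:
                _walk(value)
        # Anything else (for example None) is ignored.

    _walk(items)
    # Return None rather than an empty dict when nothing was found.
    return collected or None
```

With a helper like this, the calling code would reduce to a single assignment, which is the readability gain the comment asks for.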

mamo3gr · 2025-03-06T02:59:59Z

+        lock_at_dump: bool = True,
+        task_params: dict[str, str] | None = None,
+        custom_labels: dict[str, Any] | None = None,
+        required_task_outputs: dict[str, str] | None = None,


[imo]
This parameter seems to be just metadata, but its name may suggest that it affects the behavior of the method or an attribute of the class. It would be better to rename it to avoid that confusion.

mamo3gr · 2025-03-06T03:04:53Z

LGTM
I left some comments on code readability; whether to apply them is up to you.

tyzerrr · 2025-03-06T04:31:16Z

@mamo3gr Thank you for your thoughtful comments. I'm gonna deal with all of them.

kitagry · 2025-03-06T04:34:03Z

+    if isinstance(items, str):
+        return items  # type: ignore


In this case, T is str, so func should be applied to it.

Suggested change

-    if isinstance(items, str):
-        return items  # type: ignore
+    if isinstance(items, str):
+        return func(items)  # type: ignore

kitagry · 2025-03-06T04:36:49Z

+K = TypeVar('K')
+
+
+def map_flattenable_items(items: FlattenableItems[T], func: Callable[[T], K]) -> FlattenableItems[K]:


https://docs.python.org/3.13/library/functions.html#map

Python's built-in map is defined as map(function, iterable), so this function should follow the same argument order.

Suggested change

-def map_flattenable_items(items: FlattenableItems[T], func: Callable[[T], K]) -> FlattenableItems[K]:
+def map_flattenable_items(func: Callable[[T], K], items: FlattenableItems[T]) -> FlattenableItems[K]:

kitagry · 2025-03-06T04:40:26Z

+    if isinstance(items, str):
+        return items  # type: ignore
+    if isinstance(items, Iterable):
+        return [map_flattenable_items(i, func) for i in items]


When a tuple[T] is passed, it should return a tuple[K], but this implementation does not handle that case.

Also, could you add a test case?
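A minimal sketch of how the points raised in this review could fit together: func comes first (as in the built-in map), a str is treated as a leaf and passed to func, and a tuple input yields a tuple. The FlattenableItems alias below is a simplified stand-in, not gokart's actual definition, and the body is an illustration rather than the code merged in this PR.

```python
from __future__ import annotations

from collections.abc import Iterable
from typing import Any, Callable, TypeVar

T = TypeVar('T')
K = TypeVar('K')

# Simplified stand-in for gokart's FlattenableItems (an assumption for this sketch):
# a leaf, or an arbitrarily nested dict / list / tuple of leaves.
FlattenableItems = Any


def map_flattenable_items(func: Callable[[T], K], items: FlattenableItems) -> FlattenableItems:
    """Apply func to every leaf while keeping the container structure."""
    if isinstance(items, dict):
        return {key: map_flattenable_items(func, value) for key, value in items.items()}
    if isinstance(items, str):
        # str is itself iterable, so it must be handled before the Iterable branch.
        return func(items)
    if isinstance(items, tuple):
        # Preserve tuples instead of turning them into lists.
        return tuple(map_flattenable_items(func, item) for item in items)
    if isinstance(items, Iterable):
        return [map_flattenable_items(func, item) for item in items]
    return func(items)
```

Test cases for such a function would typically cover a bare leaf, a str leaf, a tuple, and a mixed nested structure, since those are the branches that are easy to get wrong.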

kitagry · 2025-03-06T04:42:08Z

-                continue
-            merged_labels[label_name] = label_value
+        merged_labels: dict[str, str] = {}
+        for normalized_label in normalized_labels_list[:]:


Suggested change

-        for normalized_label in normalized_labels_list[:]:
+        for normalized_label in normalized_labels_list:

tyzerrr · 2025-04-17T09:33:15Z

@kitagry Sorry for the late response.
I have applied your suggested changes.
Could you review this PR again?

mski-iksm · 2025-04-21T15:35:02Z

+from gokart.required_task_output import RequiredTaskOutput
+from gokart.utils import map_flattenable_items
+
+if sys.version_info < (3, 13):


Maybe this part is not needed?

hiro-o918 · 2025-04-22T00:44:26Z

+local_temporary_directory=./resource/tmp
+
+[core]
+logging_conf_file=logging.ini


[nits]
Add a newline at the end of the file.

hiro-o918 · 2025-04-22T00:45:16Z

            copy.deepcopy(original_metadata),
            task_params,
            custom_labels,
+            required_task_outputs if required_task_outputs else None,


This seems redundant.

Suggested change

-            required_task_outputs if required_task_outputs else None,
+            required_task_outputs,

hiro-o918 · 2025-04-22T00:52:52Z

        # Instead, users are expected to search using the labels they provided.
        # Therefore, in the event of a key conflict, the value registered by the user-provided labels will take precedence.
-        _merged_labels = GCSObjectMetadataClient._merge_custom_labels_and_task_params_labels(normalized_task_params_labels, normalized_custom_labels)
+        normalized_labels = (


[imo]
I prefer this for readability.

Suggested change

-        normalized_labels = (
+        normalized_labels = [normalized_custom_labels, normalized_task_params_labels]
+        if required_task_outputs:
+            normalized_labels.append({'__required_task_outputs': json.dumps(GCSObjectMetadataClient._get_serialized_string(required_task_outputs))})
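A small sketch of the list-building style suggested here. It assumes the dependency label should only be appended when required_task_outputs is non-empty, and it passes a plain dict to json.dumps in place of GCSObjectMetadataClient._get_serialized_string, whose behavior is not shown in this thread. The function name build_label_sources is hypothetical.

```python
from __future__ import annotations

import json


def build_label_sources(
    normalized_custom_labels: dict[str, str],
    normalized_task_params_labels: dict[str, str],
    required_task_outputs: dict[str, str] | None = None,
) -> list[dict[str, str]]:
    # Custom labels come first so that they can take precedence on key conflicts.
    sources = [normalized_custom_labels, normalized_task_params_labels]
    if required_task_outputs:
        # Metadata values are stored as strings, so the mapping is JSON-encoded.
        sources.append({'__required_task_outputs': json.dumps(required_task_outputs)})
    return sources
```

Building the list step by step keeps the optional third source visible as a plain conditional rather than hiding it inside a parenthesized expression.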

mski-iksm · 2025-04-22T01:27:16Z

+        merged_labels: dict[str, str] = {}
+        for normalized_label in normalized_labels_list[:]:
+            for label_name, label_value in normalized_label.items():
+                if len(label_value) == 0:


[MUST] This code may fail, since it seems to assume that label_value is a str.

I would prefer checking that it is a str first and then checking the length, like this:

isinstance(label_value, str) and len(label_value) == 0

Thank you for reviewing my code!
In my opinion, type checking is not necessary, because GCSObjectMetadataClient._normalize_labels converts all values stored in the dictionary into strings.
So label_value is always a str.

@TlexCypher
Then maybe the input normalized_labels_list: list[dict[str, Any]] should be normalized_labels_list: list[dict[str, str]] ?

@TlexCypher Could you check this comment?

If you have confirmed that label_value is a str, you should use str instead of Any.

I fixed this in 0b06455.

mski-iksm · 2025-04-22T01:32:10Z

-                continue
-            merged_labels[label_name] = label_value
+        merged_labels: dict[str, str] = {}
+        for normalized_label in normalized_labels_list[:]:


[weak-IMO]

for normalized_label in normalized_labels_list:
    for label_name, label_value in normalized_label.items():
        if len(label_value) == 0:

I found this part a bit difficult to understand, since it is deeply nested.

It may improve if you extract the for label_name, label_value in... part into a separate function and apply it with functools.reduce().

That said, the current code is OK. :)

Thank you for the great suggestion!

For this specific task of merging labels, the simple nested loop is likely more readable and Pythonic than functools.reduce.

While reduce can be used, in this scenario the straightforward nested loop (or perhaps the alternative 'flattening' approach) probably offers better clarity and maintainability.

What do you think?

I preferred the reduce approach because it expresses the purpose of building merged_labels up front, which makes the code easier for a first-time reader to understand.

merged_labels = reduce(...)

With the nested loop, you need to read down to L.147 to understand why merged_labels is being built.

However, both approaches are OK, since this is a relatively small loop nest. :)

You're right.
I made some changes to use functools.reduce.
Thank you for your help!
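For reference, the reduce-based shape discussed in this thread might look roughly like the sketch below. The helper names are hypothetical, and the merge rule (skip empty values, first source wins on a key conflict) is an assumption inferred from the diff context, not a copy of the merged code.

```python
from __future__ import annotations

from functools import reduce


def _merge_one(merged: dict[str, str], labels: dict[str, str]) -> dict[str, str]:
    """Fold one label dict into the accumulator."""
    for label_name, label_value in labels.items():
        # Skip empty values and keys that an earlier source already provided.
        if len(label_value) == 0 or label_name in merged:
            continue
        merged[label_name] = label_value
    return merged


def merge_labels(normalized_labels_list: list[dict[str, str]]) -> dict[str, str]:
    # The intent (building merged labels) is stated in a single expression.
    return reduce(_merge_one, normalized_labels_list, {})
```

The inner loop still exists, but it lives in a named helper, so the top-level function reads as one statement of what is being built.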

mski-iksm

@TlexCypher
I've made some comments but mainly LGTM! Thank you for your contribution!

hirosassa

Commented!

hirosassa

Looks good!

hirosassa · 2025-04-28T04:27:41Z

@kitagry Hey, could you check this PR again?

kitagry

Sorry for the late review. Could you change the type of custom_labels to dict[str, str]?

kitagry · 2025-04-28T08:56:27Z

+        merged_labels: dict[str, str] = {}
+        for normalized_label in normalized_labels_list[:]:
+            for label_name, label_value in normalized_label.items():
+                if len(label_value) == 0:


@TlexCypher Could you check this comment?

If you have confirmed that label_value is a str, you should use str instead of Any.

tyzerrr · 2025-04-28T17:05:26Z

@kitagry Sorry for the late reply.
I have checked the last comment about the type nit.
I will fix that.

荒木太一 added 6 commits March 4, 2025 13:42

WIP: End to implement the logic to gather the required task output path.

79a2881

WIP: success to add output path in nest mode, but some other case should be handled.

0cfe7ee

WIP: no ci apply.

3eee422

feat: fix to pass labels and has_seen_keys.

ec3bf4f

feat: fix conflicts

22a69d0

CI: apply ruff and mypy

08e3f59

tyzerrr marked this pull request as draft March 5, 2025 09:24

tyzerrr commented Mar 5, 2025

feat: add implementation of nest mode.

9b19a1c

tyzerrr changed the title ~~WIP: Feat/nestmode~~ Support functionalities to add metadata to enable searching for tasks dependent on specific tasks executed with a given parameter set. Mar 5, 2025

tyzerrr changed the title ~~Support functionalities to add metadata for searching tasks dependent on specific tasks executed with a given parameter set.~~ Support functionalities to enhance task traceability with metadata for dependency search. Mar 5, 2025

tyzerrr marked this pull request as ready for review March 5, 2025 12:28

tyzerrr mentioned this pull request Mar 5, 2025

Feat/nest mode tyzerrr/gcs-metadog#1

Merged

mamo3gr reviewed Mar 6, 2025

kitagry requested changes Mar 6, 2025

feat: deal with kitagry comments.

accbf1d

tyzerrr requested a review from kitagry March 6, 2025 07:58

kitagry reviewed Mar 6, 2025

Comment thread gokart/task.py Outdated

荒木太一 added 6 commits March 6, 2025 18:21

feat: Remove CLI dependencies.

6719f4d

feat: remove redundant statements.

0bcc16c

feat: change serialization expression for single FlattenableItems[RequiredTaskOutput]]

5c41035

CI: fix test and apply CI.

0b951ab

feat: fix mypy error.

10795a2

feat: refactoring make _list_flatten inner function.

32b4343

tyzerrr requested review from kitagry and mamo3gr March 6, 2025 10:17

tyzerrr added 3 commits April 17, 2025 18:28

feat: convert map object to list, any iterable objects that would be hashed should be list.

27b1abd

Merge remote-tracking branch 'origin/master' into feat/nestmode

5ac1c4d

Merge remote-tracking branch 'origin/feat/nestmode' into feat/nestmode

f4479da

tyzerrr requested a review from kitagry April 17, 2025 09:32

mski-iksm reviewed Apr 21, 2025

hiro-o918 reviewed Apr 22, 2025

mski-iksm reviewed Apr 22, 2025

tyzerrr added 2 commits April 23, 2025 07:40

feat: add new line to end of param.ini

e71833b

feat: remove redundant expressions

46aabcf

tyzerrr requested review from hiro-o918 and mski-iksm April 23, 2025 22:24

Merge branch 'master' into feat/nestmode

7bde3b0

hirosassa approved these changes Apr 25, 2025

Comment thread gokart/gcs_obj_metadata_client.py Outdated

tyzerrr added 2 commits April 28, 2025 12:23

feat: use yiled to make memory efficient and use functools.reduce to get great readability.

4c44cea

Merge remote-tracking branch 'origin/feat/nestmode' into feat/nestmode

dd6a629

tyzerrr requested a review from hirosassa April 28, 2025 03:25

hirosassa approved these changes Apr 28, 2025

Merge branch 'master' into feat/nestmode

d884c79

kitagry approved these changes Apr 28, 2025

tyzerrr added 2 commits April 29, 2025 02:02

feat: fix type of normalized_labeles_list

f1418f8

Merge remote-tracking branch 'origin/feat/nestmode' into feat/nestmode

6a1c4c2

kitagry force-pushed the feat/nestmode branch from 3a60c45 to 3f58ad4 Compare April 29, 2025 06:30

chore: change custom_labels type

0b06455

kitagry force-pushed the feat/nestmode branch from 3f58ad4 to 0b06455 Compare April 29, 2025 06:32

kitagry merged commit 42a7284 into m3dev:master Apr 29, 2025
8 checks passed
