SNOW-2367850: task integration example update #250
base: main
Conversation
sfc-gh-dhung left a comment:
Remember this is a public-facing sample, so please make sure the code quality is high. It's especially important for the code to be simple and readable, with self-documenting variable/function names and sufficient comments for non-experts to understand.
```python
index = int(os.environ.get("SNOWFLAKE_JOB_INDEX", 0))

# Only head node saves and returns results
if index != 0:
    print(f"Worker node (index {index}) - exiting")
    exit(0)
```
Why is this necessary? ML Job/CR takes care of multi-node management, the driver script only gets run on the head node
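A sketch of the simplification this implies, assuming the driver script is indeed only executed on the head node (the `SNOWFLAKE_JOB_INDEX` variable comes from the snippet above; whether it is needed at all is the open question here):

```python
import os

# If ML Jobs only runs this driver script on the head node, the early-exit
# guard can be dropped; the index is at most useful for logging.
index = int(os.environ.get("SNOWFLAKE_JOB_INDEX", 0))
print(f"Running driver on node index {index}")
# ...continue straight into saving and returning results...
```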
Sometimes I get an error:
```
ValueError: Model is not trained yet. Please call fit first.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ajiang/PycharmProjects/sf-samples/samples/ml/ml_jobs/e2e_task_graph/src/pipeline_local.py", line 134, in <module>
    run_pipeline(
  File "/Users/ajiang/PycharmProjects/sf-samples/samples/ml/ml_jobs/e2e_task_graph/src/pipeline_local.py", line 65, in run_pipeline
    model_obj = job.result()["model_obj"]
  File "/Users/ajiang/.pyenv/versions/3.10.18/lib/python3.10/site-packages/snowflake/ml/_internal/telemetry.py", line 611, in wrap
    return ctx.run(execute_func_with_statement_params)
  File "/Users/ajiang/.pyenv/versions/3.10.18/lib/python3.10/site-packages/snowflake/ml/_internal/telemetry.py", line 576, in execute_func_with_statement_params
    result = func(*args, **kwargs)
  File "/Users/ajiang/.pyenv/versions/3.10.18/lib/python3.10/site-packages/snowflake/ml/jobs/job.py", line 288, in result
    return cast(T, self._result.get_value())
  File "/Users/ajiang/.pyenv/versions/3.10.18/lib/python3.10/site-packages/snowflake/ml/jobs/_interop/results.py", line 47, in get_value
    self._raise_exception(ex, wrap_exceptions)
  File "/Users/ajiang/.pyenv/versions/3.10.18/lib/python3.10/site-packages/snowflake/ml/jobs/_interop/results.py", line 26, in _raise_exception
    raise RuntimeError(f"Job execution failed with error: {exception!r}") from exception
RuntimeError: Job execution failed with error: ValueError('Model is not trained yet. Please call fit first.')
```
Do you have any suggestions to resolve it?
And this doesn't happen if you remove these lines? What is the stack trace you see in the job itself?
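One way to check that (a sketch; it assumes the `MLJob` handle returned by the submission is still in scope and that your snowflake-ml-python version exposes `get_logs()`):

```python
# Inspect the failed job itself rather than the client-side RuntimeError:
print(job.status)      # e.g. FAILED
print(job.get_logs())  # driver logs from inside the job, including the real traceback
```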
```python
    # Load the datasets
    serialized = json.loads(ctx.get_predecessor_return_value("PREPARE_DATA"))

except Exception as e:
```
use a more specific exception type
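A sketch of what narrower handling could look like, assuming the failure mode being guarded against is malformed JSON from the predecessor task (the exact exception raised when no task context is available would need to be confirmed against the TaskContext API):

```python
import json

try:
    serialized = json.loads(ctx.get_predecessor_return_value("PREPARE_DATA"))
except json.JSONDecodeError as e:
    print(f"Failed to parse dataset info from PREPARE_DATA: {e}")
    raise
```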
| print(f"Error loading dataset info: {e}") | ||
| parser = argparse.ArgumentParser() | ||
| parser.add_argument("--dataset-info", type=str, required=True) | ||
| args = parser.parse_args() | ||
| serialized = json.loads(args.dataset_info) |
Having argparse in an except block seems like a terrible pattern
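One possible restructuring (a sketch; variable names are illustrative) that decides the input source up front instead of parsing CLI arguments inside an exception handler:

```python
import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument(
    "--dataset-info", type=str, default=None,
    help="JSON dataset info; omit when running inside a task graph",
)
args = parser.parse_args()

if args.dataset_info is not None:
    # Standalone invocation: dataset info is passed on the command line
    dataset_info = json.loads(args.dataset_info)
else:
    # Task-graph invocation: read the predecessor task's return value
    dataset_info = json.loads(ctx.get_predecessor_return_value("PREPARE_DATA"))
```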
```python
artifact_dir = config.artifact_dir

# Load the datasets
serialized = json.loads(ctx.get_predecessor_return_value("PREPARE_DATA"))
```
What is `serialized` referring to here? Can you use a more meaningful name?
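For example, naming the payload for what it actually holds makes the following steps self-documenting (the downstream helper here is hypothetical):

```python
dataset_info = json.loads(ctx.get_predecessor_return_value("PREPARE_DATA"))
train_df, test_df = load_datasets(dataset_info)  # hypothetical downstream use
```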
```python
if not hasattr(model_obj, 'feature_weights'):
    model_obj.feature_weights = None
```
What is this for?
```python
# NOTE: Remove `target_instances=2` to run training on a single node
# See https://docs.snowflake.com/en/developer-guide/snowflake-ml/ml-jobs/distributed-ml-jobs
@remote(COMPUTE_POOL, stage_name=JOB_STAGE, target_instances=2)
```
One of the main points of this sample is to demonstrate how easy it is to convert a local pipeline by pushing certain steps down into ML Jobs. Needing to write a separate script file that we submit_file() just for this conversion severely weakens that story. Why can't we just keep using a @remote() decorated function? @remote(...) should convert the function into an MLJobDefinition which we can use directly in pipeline_dag without needing an explicit MLJobDefinition.register() call.
That is, currently @remote does not create a job definition; it creates a job directly. We have only merged the PR for phase one, and phase two is still in review.
Let's hold off on merging this until @remote is ready then
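For reference, a sketch of the phase-one behavior described above: calling a @remote-decorated function submits a job immediately and returns a job handle, rather than producing a reusable job definition (function and variable names are illustrative):

```python
from snowflake.ml.jobs import remote

@remote(COMPUTE_POOL, stage_name=JOB_STAGE, target_instances=2)
def train_model(dataset_info: dict) -> dict:
    ...  # training logic runs inside the ML Job

job = train_model(dataset_info)  # phase one: creates and submits a job directly
result = job.result()            # blocks until completion, returns the dict above
```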
```python
metrics = {**train_metrics, **test_metrics}
if artifact_dir:
    model_pkl = cp.dumps(model_obj)
    model_path = os.path.join(config.artifact_dir, "model.pkl")
```
Use `artifact_dir` here. If the code above changes such that `artifact_dir` is derived from something other than `config.artifact_dir`, this will break.
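i.e. something like (a sketch of the suggested change):

```python
if artifact_dir:
    model_pkl = cp.dumps(model_obj)
    model_path = os.path.join(artifact_dir, "model.pkl")  # reuse the local variable
```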
| "model_obj": model_obj, | ||
| "metrics": metrics, | ||
| } | ||
| __return__= result_dict No newline at end of file |
Why not set `__return__` for Task-initiated jobs as well?
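A sketch of what that might look like: build the result once and set `__return__` unconditionally, so ML Job and Task-initiated runs expose the same payload (this assumes `result_dict` is built the same way in both modes):

```python
result_dict = {
    "model_obj": model_obj,
    "metrics": metrics,
}
# Set __return__ regardless of whether the script was launched as an ML Job
# or by a task, so job.result() and task consumers see the same payload.
__return__ = result_dict
```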