New project: Bank Subscription Prediction #206

htahir1 · 2025-05-15T15:37:34Z

No description provided.

dagshub · 2025-05-15T15:37:38Z

Join the discussion on DagsHub!

review-notebook-app · 2025-05-15T15:37:40Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

Copilot

Pull Request Overview

This PR introduces a new ZenML project for Bank Subscription Prediction using XGBoost, with components for data loading, cleaning, preprocessing, splitting, model training, evaluation, and configuration management.

- Added utility functions for model operations
- Implemented multiple ZenML pipeline steps and a training pipeline definition
- Included YAML configuration files and extended project documentation

Reviewed Changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| bank_subscription_prediction/utils/model_utils.py | Added utility functions for calculating scale_pos_weight and retrieving feature importances |
| bank_subscription_prediction/steps/model_trainer.py | Created a step for training an XGBoost classifier with feature selection |
| bank_subscription_prediction/steps/model_evaluator.py | Developed model evaluation steps with interactive HTML visualizations |
| bank_subscription_prediction/steps/data_splitter.py | Added a step to split data into training and test sets with stratification support |
| bank_subscription_prediction/steps/data_preprocessor.py | Implemented data preprocessing by handling categorical variables and unnecessary columns |
| bank_subscription_prediction/steps/data_loader.py | Introduced a data loader that downloads data automatically if not found locally |
| bank_subscription_prediction/steps/data_cleaner.py | Added data cleaning to drop missing values and adjust column types |
| bank_subscription_prediction/run.py | Created a main script to run the entire training pipeline using click CLI |
| bank_subscription_prediction/configs/ | Provided YAML configurations for baseline, more trees, and deeper trees experiments |
| bank_subscription_prediction/README.md | Updated documentation detailing project setup, structure, and usage |

Copilot · 2025-05-16T08:25:55Z

bank_subscription_prediction/utils/model_utils.py

+        pos_count = class_counts[1]
+        return neg_count / pos_count
+    else:
+        print("Warning: Could not calculate scale_pos_weight. Using default value 1.")


Consider replacing the print statement with a logging call (e.g., logger.warning) for more consistent and configurable warning output in production.

Suggested change

-        print("Warning: Could not calculate scale_pos_weight. Using default value 1.")
+        logging.warning("Could not calculate scale_pos_weight. Using default value 1.")
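For context, here is a minimal sketch of how a class-weight helper with a logged fallback might look. Only the `neg_count / pos_count` ratio and the warning text come from the diff; the function name `calculate_scale_pos_weight`, the use of `collections.Counter`, and the exact guard condition are assumptions for illustration, not the code in `model_utils.py`.

```python
import logging
from collections import Counter
from typing import Iterable

# Module-level logger, so the application decides where warnings go.
logger = logging.getLogger(__name__)


def calculate_scale_pos_weight(labels: Iterable[int]) -> float:
    """Return negatives / positives for binary labels (hypothetical helper)."""
    class_counts = Counter(labels)
    neg_count = class_counts.get(0, 0)
    pos_count = class_counts.get(1, 0)
    if neg_count > 0 and pos_count > 0:
        return neg_count / pos_count
    # Fallback mirrors the message in the diff, routed through logging.
    logger.warning("Could not calculate scale_pos_weight. Using default value 1.")
    return 1.0
```

Calling `calculate_scale_pos_weight([0, 0, 0, 1])` yields a ratio of 3.0, while a label list with only one class falls back to 1.0 and emits the warning.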

Copilot · 2025-05-16T08:25:56Z

bank_subscription_prediction/steps/data_splitter.py

+    if stratify_col and stratify_col in df.columns:
+        stratify_data = df[stratify_col]
+    elif stratify_col:
+        print(f"Warning: Stratification column '{stratify_col}' not found. Proceeding without stratification.")


Replace the print statement with a logging warning to ensure consistency with the rest of the project’s logging practices.

Suggested change

-        print(f"Warning: Stratification column '{stratify_col}' not found. Proceeding without stratification.")
+        logging.warning(f"Stratification column '{stratify_col}' not found. Proceeding without stratification.")
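The stratification fallback in this diff can be sketched in isolation as follows. This is an illustrative sketch only: the helper name `resolve_stratify_data` is hypothetical, and a plain dict of columns stands in for the DataFrame so the example has no third-party dependencies. It also passes the column name as a logging argument rather than through an f-string, which is a stylistic variation on the suggestion above, not what the PR uses.

```python
import logging
from typing import Mapping, Optional, Sequence

logger = logging.getLogger(__name__)


def resolve_stratify_data(
    columns: Mapping[str, Sequence],
    stratify_col: Optional[str],
) -> Optional[Sequence]:
    """Return the column to stratify on, or None if it is unavailable."""
    if stratify_col and stratify_col in columns:
        return columns[stratify_col]
    if stratify_col:
        # Lazy %-style formatting: the message is only built if it is emitted.
        logger.warning(
            "Stratification column '%s' not found. Proceeding without stratification.",
            stratify_col,
        )
    return None
```

The returned value (or `None`) could then be handed to whatever splitting routine the step uses, so a missing column degrades to an unstratified split instead of raising.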

Copilot · 2025-05-16T08:25:56Z

bank_subscription_prediction/pipelines/training_pipeline.py

+        y_test=y_test
+    )
+
+    print("Bank subscription training pipeline completed.") 


Use a logging statement (e.g., logger.info) instead of print to maintain consistent logging practices across the pipeline.

Suggested change

-    print("Bank subscription training pipeline completed.")
+    logger.info("Bank subscription training pipeline completed.")
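To illustrate why a logger call is more configurable than `print`, the sketch below sends the same completion message through a named logger and shows that the logger level decides whether it appears. The logger name, the `announce_completion` and `capture` helpers, and the format string are assumptions made for this example; the PR itself only shows the `logger.info` call.

```python
import io
import logging

# Hypothetical logger name; the project may obtain its logger differently.
logger = logging.getLogger("bank_subscription_prediction.training_pipeline")


def announce_completion() -> None:
    logger.info("Bank subscription training pipeline completed.")


def capture(level: int) -> str:
    """Run announce_completion() at the given level and return what was logged."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    try:
        announce_completion()
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()
```

With the level set to `logging.INFO` the message is captured; with `logging.WARNING` it is filtered out, which a bare `print` cannot offer without code changes.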

bank_subscription_prediction/README.md

bank_subscription_prediction/configs/baseline.yaml

bank_subscription_prediction/configs/deeper_trees.yaml

bank_subscription_prediction/configs/more_trees.yaml

bank_subscription_prediction/requirements.txt

bank_subscription_prediction/run.py

bank_subscription_prediction/steps/data_loader.py

strickvl · 2025-05-16T12:01:21Z

bank_subscription_prediction/Dockerfile.codespace

+# Install dependencies with uv and cache optimization
+RUN --mount=type=cache,target=/root/.cache/uv \
+    uv pip install --system \
+    "zenml[server]>=0.50.0" \


i.e. >=0.50.0

strickvl

Wait, you also need to update the README.md.

Initial commit

c0b35e9

strickvl changed the title ~~New project~~ New project: Bank Subscription Prediction May 16, 2025

strickvl added the enhancement (New feature or request) and internal labels May 16, 2025

strickvl requested a review from Copilot May 16, 2025 08:24

Copilot AI reviewed May 16, 2025

strickvl requested changes May 16, 2025

htahir1 and others added 9 commits May 16, 2025 11:02

reformatted

041afe5

Add "fpr" to typos dictionary

f1735ae

Refacotr

8e252d3

Update bank_subscription_prediction/requirements.txt

b00d8c9

Co-authored-by: Alex Strick van Linschoten <[email protected]>

Refacotr

4ef4b15

Add production-ready MLOps pipeline for predicting bank term deposits

5797dcf

Add Dockerfile for bank_subscription_prediction model

81f68ff

Remove seaborn from dependencies

Seaborn is no longer needed f…or the project.

d4a450a

Add numpy, click, and pyarrow to requirements

2bd2bd9

htahir1 requested a review from strickvl May 16, 2025 11:59

Add evaluation visualization image to README.md

ed8bb2c

strickvl reviewed May 16, 2025

Update ZenML server version to v0.80.0 in Dockerfile

0c6efc9

htahir1 requested a review from strickvl May 16, 2025 12:06

strickvl approved these changes May 16, 2025

htahir1 merged commit 58b08f2 into main May 16, 2025
3 of 4 checks passed

strickvl requested changes May 16, 2025

strickvl deleted the project/predict_financial_timeseries branch May 16, 2025 12:07

3 participants