Skip to content

Conversation

@htahir1
Copy link
Contributor

@htahir1 htahir1 commented May 15, 2025

No description provided.

@dagshub
Copy link

dagshub bot commented May 15, 2025

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@strickvl strickvl changed the title New project New project: Bank Subscription Prediction May 16, 2025
@strickvl strickvl added enhancement New feature or request internal labels May 16, 2025
@strickvl strickvl requested a review from Copilot May 16, 2025 08:24
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a new ZenML project for Bank Subscription Prediction using XGBoost, with components for data loading, cleaning, preprocessing, splitting, model training, evaluation, and configuration management.

  • Added utility functions for model operations
  • Implemented multiple ZenML pipeline steps and a training pipeline definition
  • Included YAML configuration files and extended project documentation

Reviewed Changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
bank_subscription_prediction/utils/model_utils.py Added utility functions for calculating scale_pos_weight and retrieving feature importances
bank_subscription_prediction/steps/model_trainer.py Created a step for training an XGBoost classifier with feature selection
bank_subscription_prediction/steps/model_evaluator.py Developed model evaluation steps with interactive HTML visualizations
bank_subscription_prediction/steps/data_splitter.py Added a step to split data into training and test sets with stratification support
bank_subscription_prediction/steps/data_preprocessor.py Implemented data preprocessing by handling categorical variables and unnecessary columns
bank_subscription_prediction/steps/data_loader.py Introduced a data loader that downloads data automatically if not found locally
bank_subscription_prediction/steps/data_cleaner.py Added data cleaning to drop missing values and adjust column types
bank_subscription_prediction/run.py Created a main script to run the entire training pipeline using click CLI
bank_subscription_prediction/configs/ Provided YAML configurations for baseline, more trees, and deeper trees experiments
bank_subscription_prediction/README.md Updated documentation detailing project setup, structure, and usage

pos_count = class_counts[1]
return neg_count / pos_count
else:
print("Warning: Could not calculate scale_pos_weight. Using default value 1.")
Copy link

Copilot AI May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider replacing the print statement with a logging call (e.g., logger.warning) for more consistent and configurable warning output in production.

Suggested change
print("Warning: Could not calculate scale_pos_weight. Using default value 1.")
logging.warning("Could not calculate scale_pos_weight. Using default value 1.")

Copilot uses AI. Check for mistakes.
if stratify_col and stratify_col in df.columns:
stratify_data = df[stratify_col]
elif stratify_col:
print(f"Warning: Stratification column '{stratify_col}' not found. Proceeding without stratification.")
Copy link

Copilot AI May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace the print statement with a logging warning to ensure consistency with the rest of the project’s logging practices.

Suggested change
print(f"Warning: Stratification column '{stratify_col}' not found. Proceeding without stratification.")
logging.warning(f"Stratification column '{stratify_col}' not found. Proceeding without stratification.")

Copilot uses AI. Check for mistakes.
y_test=y_test
)

print("Bank subscription training pipeline completed.") No newline at end of file
Copy link

Copilot AI May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a logging statement (e.g., logger.info) instead of print to maintain consistent logging practices across the pipeline.

Suggested change
print("Bank subscription training pipeline completed.")
logger.info("Bank subscription training pipeline completed.")

Copilot uses AI. Check for mistakes.
@htahir1 htahir1 requested a review from strickvl May 16, 2025 11:59
# Install dependencies with uv and cache optimization
RUN --mount=type=cache,target=/root/.cache/uv \
uv pip install --system \
"zenml[server]>=0.50.0" \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e. >=0.50.0

@htahir1 htahir1 requested a review from strickvl May 16, 2025 12:06
@htahir1 htahir1 merged commit 58b08f2 into main May 16, 2025
3 of 4 checks passed
Copy link
Contributor

@strickvl strickvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait you need also to update the README.md

@strickvl strickvl deleted the project/predict_financial_timeseries branch May 16, 2025 12:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request internal

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants