New project: Bank Subscription Prediction #206
Pull Request Overview
This PR introduces a new ZenML project for Bank Subscription Prediction using XGBoost, with components for data loading, cleaning, preprocessing, splitting, model training, evaluation, and configuration management.
- Added utility functions for model operations
- Implemented multiple ZenML pipeline steps and a training pipeline definition
- Included YAML configuration files and extended project documentation
Reviewed Changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| bank_subscription_prediction/utils/model_utils.py | Added utility functions for calculating scale_pos_weight and retrieving feature importances |
| bank_subscription_prediction/steps/model_trainer.py | Created a step for training an XGBoost classifier with feature selection |
| bank_subscription_prediction/steps/model_evaluator.py | Developed model evaluation steps with interactive HTML visualizations |
| bank_subscription_prediction/steps/data_splitter.py | Added a step to split data into training and test sets with stratification support |
| bank_subscription_prediction/steps/data_preprocessor.py | Implemented data preprocessing by handling categorical variables and unnecessary columns |
| bank_subscription_prediction/steps/data_loader.py | Introduced a data loader that downloads data automatically if not found locally |
| bank_subscription_prediction/steps/data_cleaner.py | Added data cleaning to drop missing values and adjust column types |
| bank_subscription_prediction/run.py | Created a main script to run the entire training pipeline using click CLI |
| bank_subscription_prediction/configs/ | Provided YAML configurations for baseline, more trees, and deeper trees experiments |
| bank_subscription_prediction/README.md | Updated documentation detailing project setup, structure, and usage |
| pos_count = class_counts[1] | ||
| return neg_count / pos_count | ||
| else: | ||
| print("Warning: Could not calculate scale_pos_weight. Using default value 1.") |
Copilot AI, May 16, 2025:
Consider replacing the print statement with a logging call (e.g., logger.warning) for more consistent and configurable warning output in production.
Suggested change:
- print("Warning: Could not calculate scale_pos_weight. Using default value 1.")
+ logging.warning("Could not calculate scale_pos_weight. Using default value 1.")
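As a rough illustration of this suggestion, the helper could look something like the sketch below. The function name `calculate_scale_pos_weight`, its signature, and the zero-count check are assumptions made for illustration; only the `neg_count / pos_count` ratio and the warning text come from the diff above.

```python
import logging
from collections import Counter
from typing import Iterable

# Module-level logger, so the application decides handlers and levels.
logger = logging.getLogger(__name__)


def calculate_scale_pos_weight(labels: Iterable[int]) -> float:
    """Return neg_count / pos_count for binary labels (hypothetical helper).

    Falls back to 1.0 and logs a warning when either class is missing.
    """
    class_counts = Counter(labels)
    neg_count = class_counts.get(0, 0)
    pos_count = class_counts.get(1, 0)
    if neg_count > 0 and pos_count > 0:
        return neg_count / pos_count
    # Logged instead of printed, so it can be filtered or redirected.
    logger.warning("Could not calculate scale_pos_weight. Using default value 1.")
    return 1.0
```

With a logger in place, the warning can be silenced, redirected, or formatted through the standard `logging` configuration instead of always going to stdout.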
| if stratify_col and stratify_col in df.columns: | ||
| stratify_data = df[stratify_col] | ||
| elif stratify_col: | ||
| print(f"Warning: Stratification column '{stratify_col}' not found. Proceeding without stratification.") |
Copilot AI, May 16, 2025:
Replace the print statement with a logging warning to ensure consistency with the rest of the project’s logging practices.
Suggested change:
- print(f"Warning: Stratification column '{stratify_col}' not found. Proceeding without stratification.")
+ logging.warning(f"Stratification column '{stratify_col}' not found. Proceeding without stratification.")
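One way the stratification check might be isolated with a logger is sketched below. The helper name `resolve_stratify_column` and its signature are hypothetical and do not appear in the PR; the branching mirrors the `if` / `elif` shown in the diff. The sketch passes the column name as a logging argument rather than through an f-string, which is an optional variation on the suggested change.

```python
import logging
from typing import Optional, Sequence

logger = logging.getLogger(__name__)


def resolve_stratify_column(
    columns: Sequence[str], stratify_col: Optional[str]
) -> Optional[str]:
    """Return stratify_col if it is a known column, else None (hypothetical helper)."""
    if stratify_col and stratify_col in columns:
        return stratify_col
    if stratify_col:
        # %-style arguments defer string formatting until the record is emitted.
        logger.warning(
            "Stratification column '%s' not found. Proceeding without stratification.",
            stratify_col,
        )
    return None
```

A caller would pass something like `list(df.columns)` and use the returned name to decide whether to stratify the split.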
| y_test=y_test | ||
| ) | ||
| print("Bank subscription training pipeline completed.") | ||
Copilot AI, May 16, 2025:
Use a logging statement (e.g., logger.info) instead of print to maintain consistent logging practices across the pipeline.
Suggested change:
- print("Bank subscription training pipeline completed.")
+ logger.info("Bank subscription training pipeline completed.")
Co-authored-by: Alex Strick van Linschoten
…or the project.</footer>
| # Install dependencies with uv and cache optimization | ||
| RUN --mount=type=cache,target=/root/.cache/uv \ | ||
| uv pip install --system \ | ||
| "zenml[server]>=0.50.0" \ |
Really?
i.e. >=0.50.0
strickvl left a comment:
Wait, you also need to update the README.md.