| 1 | +--- |
| 2 | +title: CRUP Final Project |
| 3 | +nav_order: 3 |
| 4 | +layout: home |
| 5 | +--- |
| 6 | + |
| 7 | +# <span style="color: #397DFF; font-weight: 350">CRUP Fall 2025 Final Project</span> |
| 8 | + |
| 9 | +## <span style="color: #397DFF">Project Overview</span> |
| 10 | +You will design and execute a complete Machine Learning project that answers a real-world question and present your findings through an interactive website. This project integrates both machine learning and software engineering skills. |
| 11 | + |
| 12 | +## <span style="color: #397DFF">Core Requirements</span> |
| 13 | +- Formulate a specific, real-world problem that can be solved with **Machine Learning** |
| 14 | +- Build and train an **ML model** using a real dataset |
| 15 | +- Create an **interactive website** that explains your process and allows users to interact with your model |
| 16 | + |
| 17 | +--- |
| 18 | + |
| 19 | +# <span style="color: #397DFF; font-weight: 350">Part A: Machine Learning Component</span> |
| 20 | + |
| 21 | +## <span style="color: #397DFF">1. Problem Formulation & Dataset Selection</span> |
| 22 | +Your task is to identify a clear, answerable question that requires machine learning to solve. |
| 23 | + |
| 24 | +### Dataset Requirements |
| 25 | +- Must be a real dataset from a source such as **Kaggle** or **Hugging Face**
| 26 | +- Your project must be one of the following: |
| 27 | +  - **Classification** (e.g., “Will this customer churn?”)
| 28 | +  - **Regression** (e.g., “What will this house sell for?”)
| 29 | + |
| 30 | +--- |
| 31 | + |
| 32 | +## <span style="color: #397DFF">2. Complete ML Pipeline Implementation</span> |
| 33 | +You must implement all stages of a professional ML workflow. |
| 34 | + |
| 35 | +### a) Data Acquisition and Cleaning |
| 36 | +- Download and load your dataset |
| 37 | +- Perform **Exploratory Data Analysis (EDA)** with visualizations and summary statistics |
| 38 | +- Handle missing values (remove, impute, etc.) |
| 39 | +- Encode categorical variables (e.g., one-hot encoding or label encoding) |
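The two cleaning steps above (imputing missing values and one-hot encoding) can be sketched without any libraries. This is a minimal illustration, not a required implementation: the `rows` data, the column names, and the helper functions `impute_mean` and `one_hot` are all hypothetical. In a real project a library such as pandas would usually handle this.

```python
from statistics import mean

# Hypothetical toy rows; None marks a missing value.
rows = [
    {"age": 34, "plan": "basic"},
    {"age": None, "plan": "premium"},
    {"age": 50, "plan": "basic"},
]

def impute_mean(rows, column):
    """Replace missing values in a numeric column with the mean of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if r[column] == cat else 0
        encoded.append(new_row)
    return encoded

cleaned = one_hot(impute_mean(rows, "age"), "plan")
for row in cleaned:
    print(row)
```

If you impute with a statistic such as the mean, compute it on the training split only and reuse it for validation and test data, so that information from held-out rows does not leak into training.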
| 40 | + |
| 41 | +### b) Model Training and Hyperparameter Tuning |
| 42 | +- Try multiple models appropriate for your task |
| 43 | +- Train each model |
| 44 | +- Tune hyperparameters using **Grid Search**, **Random Search**, etc. |
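As a rough sketch of what grid search does, the function below tries every combination of candidate values and keeps the best-scoring one. The names `grid_search`, `toy_score`, `depth`, and `learning_rate` are assumptions made for illustration. In a real project `score_fn` would train a model on the training split and return its validation score.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Try every combination in param_grid; return (best_params, best_score).

    score_fn(params) must return a validation score where higher is better.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical stand-in for "fit on the training split, score on validation".
def toy_score(params):
    return -(params["depth"] - 4) ** 2 - abs(params["learning_rate"] - 0.1)

grid = {"depth": [2, 4, 8], "learning_rate": [0.01, 0.1, 1.0]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

Random search follows the same loop but samples a fixed number of combinations instead of enumerating all of them, which scales better when the grid is large.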
| 45 | + |
| 46 | +### c) Rigorous Model Evaluation |
| 47 | + |
| 48 | +#### For Classification |
| 49 | +- **F1 Score** |
| 50 | +- **AUC** (area under the ROC curve)
| 51 | +- **Accuracy** (use carefully: it can look strong on imbalanced classes even when the model is poor)
| 52 | +- **Confusion Matrix** |
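To make the classification metrics concrete, here is a dependency-free sketch of how each one is computed for a binary task. The function names are illustrative and the example labels are made up. In practice, a library such as scikit-learn provides equivalent, better-tested functions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary task."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall; 0.0 when there are no positives at all."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up imbalanced example: a model that always predicts the majority class.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
print(accuracy(y_true, y_pred))  # 0.8
print(f1_score(y_true, y_pred))  # 0.0
```

The example shows why accuracy needs care: predicting the majority class every time scores 80% accuracy while never finding a single positive case, which the F1 score exposes.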
| 53 | + |
| 54 | +#### For Regression |
| 55 | +- **R-squared (R²)** |
| 56 | +- **MAE** (mean absolute error)
| 57 | +- **RMSE** (root mean squared error)
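The three regression metrics can be written out in a few lines each. This is a minimal sketch with illustrative function names and made-up numbers, intended only to show what each metric measures.

```python
from math import sqrt

def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors, in the target's units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: like MAE, but penalizes large errors more heavily."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Share of the target's variance explained, relative to always predicting the mean."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up targets and predictions.
y_true = [3, 5, 7, 9]
y_pred = [2, 5, 8, 9]
print(mae(y_true, y_pred), rmse(y_true, y_pred), r_squared(y_true, y_pred))
```

Reporting MAE or RMSE alongside R² is useful because they are expressed in the target's own units (for example, dollars), which is easier to explain to a non-ML audience.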
| 58 | + |
| 59 | +### d) Artifact Preservation |
| 60 | +- Save your trained model (e.g., using `pickle` or `torch.save`) |
| 61 | +- You will load this model into your website |
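A save-and-reload round trip with `pickle` might look like the sketch below. The `trained` dictionary is a hypothetical stand-in for a real trained model, and `save_model` and `load_model` are illustrative helper names. A fitted scikit-learn estimator can be pickled the same way.

```python
import pickle
import tempfile
from pathlib import Path

def save_model(model, path):
    """Serialize the model object to a binary file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)

def load_model(path):
    """Load a model saved by save_model.

    Only unpickle files you created yourself: loading a pickle can run arbitrary code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical learned parameters standing in for a trained model.
trained = {"weights": [0.4, -1.2], "bias": 0.1}

# A temporary directory keeps the example from leaving files behind;
# in your project you would save to a fixed path inside your repository.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"
    save_model(trained, model_path)
    restored = load_model(model_path)

print(restored == trained)  # True
```

Save any preprocessing state (imputation values, category lists, scalers) together with the model, since the website must transform user input exactly the way the training data was transformed.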
| 62 | + |
| 63 | +--- |
| 64 | + |
| 65 | +# <span style="color: #397DFF; font-weight: 350">Part B: Software Engineering Component</span> |
| 66 | + |
| 67 | +## <span style="color: #397DFF">1. Public-Facing Website</span> |
| 68 | +- Built using **React** |
| 69 | +- Serves as both documentation and an interactive demonstration
| 70 | + |
| 71 | +## <span style="color: #397DFF">2. Required Website Content (Documentation)</span> |
| 72 | + |
| 73 | +### a) Central Problem & Real-World Impact |
| 74 | +Explain: |
| 75 | +- What question you're answering |
| 76 | +- Why it matters |
| 77 | +- Who benefits |
| 78 | +- What real-world decisions your model could influence |
| 79 | + |
| 80 | +### b) Data Source & Nature |
| 81 | +Include: |
| 82 | +- Dataset link |
| 83 | +- What each row represents |
| 84 | +- Features included |
| 85 | +- Number of examples |
| 86 | +- Any limitations or biases |
| 87 | + |
| 88 | +### c) ML Methodology |
| 89 | +Clarify: |
| 90 | +- Which algorithms you tried |
| 91 | +- Which you chose |
| 92 | +- Why you chose it |
| 93 | +- What hyperparameters you tuned |
| 94 | + |
| 95 | +### d) Final Performance Metrics |
| 96 | +Report: |
| 97 | +- Your final evaluation metrics |
| 98 | +- A direct answer to your core question |
| 99 | +- Limitations and failure modes
| 100 | + |
| 101 | +--- |
| 102 | + |
| 103 | +## <span style="color: #397DFF">3. Interactive Component (MANDATORY)</span>
| 104 | + |
| 105 | +Your website must include at least **one interactive ML-powered element**. |
| 106 | + |
| 107 | +### Acceptable Options |
| 108 | +- **Prediction Form** (user enters input → model predicts) |
| 109 | +- **Slider-Based Dynamic Prediction** |
| 110 | +- **Interactive Visualizations** |
| 111 | +- **Comparative Predictions (What-if Analysis)** |
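One possible way to connect a prediction form to a model is a small JSON endpoint that the React front end calls. The sketch below is only one option and rests on several assumptions: that the model runs in a separate Python process, that the hypothetical `MODEL` dictionary stands in for the saved model, and that the feature names `sqft` and `bedrooms` match your dataset. It uses only Python's standard library. A web framework would be a reasonable alternative.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for the model you would load from disk at startup.
MODEL = {"weights": {"sqft": 150.0, "bedrooms": 10000.0}, "bias": 50000.0}

def predict(features):
    """Linear prediction from a dict of numeric features."""
    return MODEL["bias"] + sum(
        w * float(features[name]) for name, w in MODEL["weights"].items()
    )

def handle_predict(body):
    """Turn a JSON request body (bytes) into (status_code, response_dict)."""
    try:
        return 200, {"prediction": predict(json.loads(body))}
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, a missing feature, or a non-numeric value.
        return 400, {"error": f"bad input: {exc}"}

class PredictHandler(BaseHTTPRequestHandler):
    def _cors_headers(self):
        # Lets a front end served from another origin call this endpoint.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Methods", "POST, OPTIONS")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")

    def do_OPTIONS(self):
        # Answer the browser's CORS preflight for JSON POST requests.
        self.send_response(204)
        self._cors_headers()
        self.end_headers()

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self._cors_headers()
        self.end_headers()
        self.wfile.write(data)

def serve(port=8000):
    """Start the server; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the endpoint locally. The form, slider, or what-if component can then send the user's inputs as a JSON POST request and display the returned `prediction`. Keeping `handle_predict` separate from the HTTP handler makes the prediction logic testable without starting a server.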
| 112 | + |
| 113 | +--- |
| 114 | + |
| 115 | +# <span style="color: #397DFF; font-weight: 350">Part C: Deadlines & Deliverables</span> |
| 116 | + |
| 117 | +## <span style="color: #397DFF">📅 Deadlines</span> |
| 118 | +- **Research Proposal** — *Due: End of Thanksgiving Break* |
| 119 | +  - One paragraph
| 120 | +  - Includes central question, dataset, and approach
| 121 | +- **Final Project** — *Due: Before Banquet* |
| 122 | +  - Full ML pipeline
| 123 | +  - Fully functional website
| 124 | +  - Complete documentation
| 125 | + |
| 126 | +## <span style="color: #397DFF">✅ Deliverables Checklist</span> |
| 127 | +- [ ] **Research Proposal** |
| 128 | +- [ ] **ML Solution** |
| 129 | +  - Dataset acquired & cleaned
| 130 | +  - Multiple models compared
| 131 | +  - Best model selected
| 132 | +  - Model evaluated
| 133 | +  - Model saved
| 134 | +- [ ] **Website Component** |
| 135 | +  - React website (public-facing)
| 136 | +  - Full documentation
| 137 | +  - At least one interactive component
| 138 | +  - Accessible, clear design
| 139 | + |
| 140 | +--- |
| 141 | + |
| 142 | +# <span style="color: #397DFF; font-weight: 350">🌟 Exemplary Projects for Inspiration</span> |
| 143 | +- https://llm-attacks.org |
| 144 | +- https://thinkingmachines.ai/blog/modular-manifolds |
| 145 | +- Distill-style explorations: |
| 146 | +  - Feature Visualization
| 147 | +  - Activation Atlas
| 148 | +  - Handwriting with Neural Networks
| 149 | +  - Building Blocks of Interpretability
| 150 | +- https://distill.pub |
| 151 | + |
| 152 | +--- |
| 153 | + |
| 154 | +# <span style="color: #397DFF; font-weight: 350">Tips for Success</span> |
| 155 | +- Choose a **focused** question |
| 156 | +- Select a **manageable dataset** |
| 157 | +- Document continuously |
| 158 | +- Build the interactive component early |
| 159 | +- Make explanations accessible to non-ML audiences |
| 160 | + |
| 161 | +Good luck! 🚀 |