Commit 825b623

final proj
1 parent 0209f76 commit 825b623

1 file changed: +161 -0 lines changed

docs/final_project.md

@@ -0,0 +1,161 @@
---
title: CRUP Final Project
nav_order: 3
layout: home
---

# <span style="color: #397DFF; font-weight: 350">CRUP Fall 2025 Final Project</span>

## <span style="color: #397DFF">Project Overview</span>
You will design and execute a complete Machine Learning project that answers a real-world question and present your findings through an interactive website. This project integrates both machine learning and software engineering skills.

## <span style="color: #397DFF">Core Requirements</span>
- Formulate a specific, real-world problem that can be solved with **Machine Learning**
- Build and train an **ML model** using a real dataset
- Create an **interactive website** that explains your process and allows users to interact with your model

---

# <span style="color: #397DFF; font-weight: 350">Part A: Machine Learning Component</span>

## <span style="color: #397DFF">1. Problem Formulation & Dataset Selection</span>
Your task is to identify a clear, answerable question that requires machine learning to solve.

### Dataset Requirements
- Use a real dataset (e.g., from **Kaggle** or **Hugging Face**)
- Your project must be one of the following:
  - **Classification** (e.g., “Will this customer churn?”)
  - **Regression** (e.g., “What will this house sell for?”)

---

## <span style="color: #397DFF">2. Complete ML Pipeline Implementation</span>
You must implement all stages of a professional ML workflow.

### a) Data Acquisition and Cleaning
- Download and load your dataset
- Perform **Exploratory Data Analysis (EDA)** with visualizations and summary statistics
- Handle missing values (remove, impute, etc.)
- Encode categorical variables (e.g., one-hot encoding or label encoding)
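
As a minimal sketch of the last two steps, the example below fills missing numeric values with the column median and one-hot encodes a categorical column, using only the Python standard library. The rows, the column names `age` and `plan`, and the helpers `impute_numeric` and `one_hot` are all hypothetical; a data-frame library such as pandas offers equivalent operations for real datasets.

```python
from statistics import median

# Hypothetical toy rows standing in for a real dataset; None marks a missing value.
rows = [
    {"age": 34, "plan": "basic"},
    {"age": None, "plan": "premium"},
    {"age": 52, "plan": "basic"},
    {"age": 41, "plan": None},
]


def impute_numeric(rows, column):
    """Replace missing numeric values with the median of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def one_hot(rows, column, missing_label="unknown"):
    """Expand a categorical column into one 0/1 column per category."""
    values = [r[column] if r[column] is not None else missing_label for r in rows]
    categories = sorted(set(values))
    encoded = []
    for row, value in zip(rows, values):
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(value == category)
        encoded.append(new_row)
    return encoded


cleaned = one_hot(impute_numeric(rows, "age"), "plan")
for row in cleaned:
    print(row)
```

Median imputation is only one option; whether to remove or impute rows depends on how much data is missing and why.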

### b) Model Training and Hyperparameter Tuning
- Try multiple models appropriate for your task
- Train each model
- Tune hyperparameters using a method such as **Grid Search** or **Random Search**
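
To illustrate what a grid search does, here is a small sketch that tries every combination of two hyperparameters for a hand-written k-nearest-neighbours classifier and keeps the combination with the best validation accuracy. The toy data, the function names `knn_predict` and `accuracy`, and the grid values are assumptions made for illustration; in a real project a library routine such as scikit-learn's `GridSearchCV` or `RandomizedSearchCV` would typically replace the loop.

```python
from collections import Counter
from itertools import product


def knn_predict(train, point, k, p):
    """Majority vote among the k nearest training points.

    Distance is the sum of |a - b|**p; the p-th root is skipped because it
    does not change the ordering of neighbours.
    """
    by_distance = sorted(
        train,
        key=lambda item: sum(abs(a - b) ** p for a, b in zip(item[0], point)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def accuracy(train, valid, k, p):
    """Fraction of validation points that are classified correctly."""
    hits = sum(knn_predict(train, x, k, p) == y for x, y in valid)
    return hits / len(valid)


# Hypothetical toy data as (features, label); one point near the first
# cluster is deliberately mislabeled to act as noise.
train = [
    ((1.0, 1.0), 0), ((1.5, 2.0), 0), ((2.0, 1.0), 0), ((1.3, 1.3), 1),
    ((6.0, 6.0), 1), ((7.0, 6.5), 1), ((6.5, 7.5), 1),
]
valid = [((1.2, 1.4), 0), ((6.8, 6.9), 1), ((2.2, 1.8), 0), ((5.9, 6.4), 1)]

# The grid: every combination of these values is evaluated.
grid = {"k": [1, 3, 5], "p": [1, 2]}

results = []
for k, p in product(grid["k"], grid["p"]):
    results.append(({"k": k, "p": p}, accuracy(train, valid, k, p)))

# max() keeps the first combination that reaches the highest score.
best_params, best_score = max(results, key=lambda r: r[1])
print(best_params, best_score)
```

Because the validation split is used to pick the hyperparameters, the final evaluation should be reported on separate data that played no part in tuning.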

### c) Rigorous Model Evaluation

#### For Classification
- **F1 Score**
- **AUC**
- **Accuracy** (use carefully; it can be misleading when classes are imbalanced)
- **Confusion Matrix**
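
The sketch below computes these four quantities by hand for a binary problem, mainly to show how they relate to each other. The labels, predictions, scores, and function names are hypothetical; AUC is taken here to mean the area under the ROC curve, computed as the probability that a random positive example is scored above a random negative one.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn


def f1(y_true, y_pred):
    """Harmonic mean of precision and recall, written in terms of counts."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced example: only 2 positives among 10 cases.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # the model finds one of the two positives
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.35, 0.9]

print("confusion (tp, fp, fn, tn):", confusion_counts(y_true, y_pred))
print("accuracy:", accuracy(y_true, y_pred))
print("F1:", round(f1(y_true, y_pred), 3))
print("AUC:", roc_auc(y_true, scores))
```

In this example accuracy is 0.9 even though the model misses half of the positive cases, which the F1 score of about 0.67 makes visible.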

#### For Regression
- **R-squared (R²)**
- **MAE** (Mean Absolute Error)
- **RMSE** (Root Mean Squared Error)
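
For reference, the three regression metrics can be written in a few lines of plain Python. This is a sketch on made-up numbers (the values in `y_true` and `y_pred` and the function names are hypothetical), intended to show the definitions rather than replace a library implementation.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the errors, in the target's units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: like MAE, but large errors weigh more."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares around the mean)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical targets and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 8.0, 9.5]

print("MAE:", mae(y_true, y_pred))
print("RMSE:", round(rmse(y_true, y_pred), 4))
print("R^2:", round(r_squared(y_true, y_pred), 4))
```

MAE and RMSE are expressed in the units of the target, while R² is unitless, so reporting at least one of each kind makes results easier to interpret.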

### d) Artifact Preservation
- Save your trained model (e.g., using `pickle` or `torch.save`)
- You will load this model into your website
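
A minimal save-and-reload round trip with `pickle` might look like the sketch below. The `trained` dictionary is a hypothetical stand-in for a real fitted model object, and the `predict` helper and file name `model.pkl` are likewise assumptions. Note that loading a pickle file can execute arbitrary code, so only load files you created or trust.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a trained model: learned parameters of a linear model.
trained = {"weights": [0.5, -1.2], "bias": 3.0, "feature_names": ["sqft", "age"]}


def predict(params, features):
    """Apply the linear model described by params to one feature vector."""
    return params["bias"] + sum(w * x for w, x in zip(params["weights"], features))


# Save the artifact to disk (a temporary directory is used here for the demo).
model_path = Path(tempfile.mkdtemp()) / "model.pkl"
with open(model_path, "wb") as f:
    pickle.dump(trained, f)

# Later, for example when serving predictions, load it back.
with open(model_path, "rb") as f:
    restored = pickle.load(f)

# The restored artifact should behave exactly like the original.
print(predict(restored, [2.0, 1.0]) == predict(trained, [2.0, 1.0]))
```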

---

# <span style="color: #397DFF; font-weight: 350">Part B: Software Engineering Component</span>

## <span style="color: #397DFF">1. Public-Facing Website</span>
- Built using **React**
- Serves as both documentation and an interactive demonstration

## <span style="color: #397DFF">2. Required Website Content (Documentation)</span>

### a) Central Problem & Real-World Impact
Explain:
- What question you're answering
- Why it matters
- Who benefits
- What real-world decisions your model could influence

### b) Data Source & Nature
Include:
- Dataset link
- What each row represents
- Features included
- Number of examples
- Any limitations or biases

### c) ML Methodology
Clarify:
- Which algorithms you tried
- Which you chose
- Why you chose it
- What hyperparameters you tuned

### d) Final Performance Metrics
Report:
- Your final evaluation metrics
- A direct answer to your core question
- Limitations and failure modes

---

## <span style="color: #397DFF">3. Interactive Component (MANDATORY)</span>

Your website must include at least **one interactive ML-powered element**.

### Acceptable Options
- **Prediction Form** (user enters input → model predicts)
- **Slider-Based Dynamic Prediction**
- **Interactive Visualizations**
- **Comparative Predictions (What-if Analysis)**
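
The assignment does not say how the React front end should reach the Python model. One possible arrangement, assumed here purely for illustration, is a small Python backend that receives the form values as JSON and returns a prediction as JSON. The sketch below shows only that request-handling step; `handle_prediction`, `PARAMS`, and the feature names `sqft` and `bedrooms` are hypothetical, and in a real project the parameters would come from the saved model file.

```python
import json

# Hypothetical parameters; a real project would load these from the saved model.
PARAMS = {"weights": {"sqft": 150.0, "bedrooms": 10000.0}, "bias": 50000.0}


def handle_prediction(request_body: str) -> str:
    """Turn a JSON form submission into a JSON prediction or a JSON error."""
    try:
        form = json.loads(request_body)
        # Require every feature the model expects and coerce it to a number.
        features = {name: float(form[name]) for name in PARAMS["weights"]}
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
        return json.dumps({"error": f"invalid input: {exc}"})

    prediction = PARAMS["bias"] + sum(
        PARAMS["weights"][name] * value for name, value in features.items()
    )
    return json.dumps({"prediction": prediction})


# A valid submission, as a prediction form might send it.
response = handle_prediction('{"sqft": 1000, "bedrooms": 3}')
print(response)

# A what-if comparison: same house with one more bedroom.
what_if = handle_prediction('{"sqft": 1000, "bedrooms": 4}')
print(what_if)

# Malformed input yields an error message instead of an exception.
print(handle_prediction('{"sqft": "abc"}'))
```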

---

# <span style="color: #397DFF; font-weight: 350">Part C: Deadlines & Deliverables</span>

## <span style="color: #397DFF">📅 Deadlines</span>
- **Research Proposal** (*Due: End of Thanksgiving Break*)
  - One paragraph
  - Includes central question, dataset, and approach
- **Final Project** (*Due: Before Banquet*)
  - Full ML pipeline
  - Fully functional website
  - Complete documentation

## <span style="color: #397DFF">✅ Deliverables Checklist</span>
- [ ] **Research Proposal**
- [ ] **ML Solution**
  - Dataset acquired & cleaned
  - Multiple models compared
  - Best model selected
  - Model evaluated
  - Model saved
- [ ] **Website Component**
  - React website (public-facing)
  - Full documentation
  - At least one interactive component
  - Accessible, clear design

---

# <span style="color: #397DFF; font-weight: 350">🌟 Exemplary Projects for Inspiration</span>
- https://llm-attacks.org
- https://thinkingmachines.ai/blog/modular-manifolds
- Distill-style explorations:
  - Feature Visualization
  - Activation Atlas
  - Handwriting with Neural Networks
  - Building Blocks of Interpretability
  - https://distill.pub

---

# <span style="color: #397DFF; font-weight: 350">Tips for Success</span>
- Choose a **focused** question
- Select a **manageable dataset**
- Document continuously
- Build the interactive component early
- Make explanations accessible to non-ML audiences

Good luck! 🚀
