Commit a2cc8cd

Update README.md
1 parent: 1f832bf

1 file changed (+167, -53 lines)

README.md

Lines changed: 167 additions & 53 deletions
@@ -1,53 +1,167 @@

Deleted lines 1-53 (previous README.md):

# 📊 Analytical Models in Excel

This repository showcases a range of analytical models implemented directly in Microsoft Excel, demonstrating both statistical understanding and advanced spreadsheet proficiency. As a data analytics student, I created this workbook as a portfolio piece that illustrates my hands-on ability to analyze data, build predictive models, and present findings in a clean, structured format, all using Excel's native features.

---

## 📁 Workbook Overview

The workbook includes the following sheets:

| **Sheet Name** | **Description** |
| --- | --- |
| **Linear Regression** | Predicts exam scores from study hours and exam preparation. Includes manual calculation of coefficients, performance metrics, and a summary output. |
| **Principal Component Analysis** | Manual calculation of the covariance matrix, eigenvalues and eigenvectors, and principal components for a standardized feature set. |
| **Decision Tree Calculation** | A rule-based decision tree example using categorical features (e.g., likes ice cream/chocolate). Uses entropy-based logic with branching conditions. |
| **Logistic Model Evaluation** | Detailed logistic regression model with multiple predictors. Contains odds ratios, interpretations of coefficients, predicted probabilities, and error types (Type I/II). |
| **Vector Prediction Decision Tree** | Manual calculation of an overfit vector prediction decision tree, compared with a code-generated one. |
| **K-Fold Cross-Validation** | Performs 5-fold cross-validation on a machine failure dataset. Calculates fold-based splits, tracks performance, and helps evaluate model generalizability. |
| **Logistic Regression** | Manual implementation of logistic regression for binary classification (machine working vs. not working). Includes the logit function, probabilities, and log-likelihoods. |
| **K-Nearest Neighbors** | Basic setup for KNN classification, with a graphical representation. |

---

## 🔧 Skills Demonstrated

* **Analytical Thinking**: Applying core data science concepts in spreadsheet format
* **Excel Mastery**: Advanced use of formulas, named ranges, logical structures, formatting, and charts
* **Model Interpretation**: Clear presentation of each model's outputs, performance metrics, and decision logic
* **Self-contained Execution**: All computations are done manually within Excel; no external tools or code are required

---

## 🧠 Why Excel?

While programming languages like Python and R are industry standards for analytics, Excel remains an invaluable tool, especially in business environments. This project shows that substantial analytical work can be done in Excel alone, which makes the underlying concepts more transparent and accessible.

---

## 🚀 How to Use

1. Open the `Analytical_Models_In_Excel.xlsx` file.
2. Navigate through the tabs to explore each model.
3. Use in-sheet comments, formula breakdowns, and labeled sections to follow the logic step by step.

---

## 📬 Feedback & Collaboration

If you're a student, analyst, or recruiter reviewing this portfolio, feel free to reach out! I'm always open to feedback, collaboration, or internship opportunities in data analytics, machine learning, or related fields.

**Author:** Jishen Harilal
**LinkedIn:** www.linkedin.com/in/jishen-harilal
**Contact:** jishen2108@gmail.com

Added lines 1-167 (new README.md):

[![Releases](https://img.shields.io/badge/Releases-Download%20Workbook-blue?logo=github)](https://github.com/zhaa-kun/analytical-models-in-excel/releases)

# Spreadsheet Machine Learning: Excel Models for Core Analytics Demo 📊🔧

## Short description

- A curated Excel workbook that demonstrates core data analysis techniques. It uses spreadsheet formulas, structured sheets, and clear formatting to show how regression, classification, dimensionality reduction, and validation work without code.

## Topics

- analytics · cross-validation · data-analysis · data-analytics · data-science-portfolio · data-visualization · decision-trees · excel · excel-models · knn · linear-regression · logistic-regression · machine-learning · no-code-machine-learning · pca · predictive-modeling · spreadsheet-models · statistical-analysis

![Excel Icon](https://upload.wikimedia.org/wikipedia/commons/7/73/Microsoft_Office_Excel_%282018%E2%80%93present%29.svg)

## Why this repo

- Use it to teach model logic inside a spreadsheet.
- Show model mechanics step by step in presentations or classes.
- Audit model math in cells rather than in black-box code.
- Share a portfolio piece that highlights Excel modeling skills.

## What you will find

- A single, well-structured Excel workbook (.xlsx) with multiple sheets.
- Worked examples and small datasets embedded in the workbook.
- Clean layout for inputs, calculations, and outputs.
- Visuals: charts and conditional formatting for model insight.
- Reusable templates for experiments.

## Get the workbook

- Download the .xlsx file from the Releases page, open it in Excel, and step through the sheets.
- Releases: https://github.com/zhaa-kun/analytical-models-in-excel/releases

## Structure of the workbook (sheet by sheet)

- README (sheet): Quick navigation and a short guide.
- Data: Small sample datasets for each demo, with feature and target columns.
- Preprocess: Missing-value handling, scaling, and simple encoding done with formulas.
- Linear Regression (OLS): Full OLS via matrix algebra in formulas, with residual plots and diagnostics.
- Logistic Regression: Logit link fitted by iterative updates (Newton-Raphson) with log-likelihood tracking.
- k-NN: Distance matrix, neighbor selection, tie rules, and a performance table.
- Decision Tree (simple): Split metrics (Gini, entropy), the split selection process, and a manual tree diagram built from cells.
- PCA: Covariance matrix, eigendecomposition via characteristic-polynomial approximation, and a variance-explained table.
- Cross-Validation: k-fold splits via index formulas, aggregated metrics, and a bias-variance illustration.
- Model Comparison: Side-by-side metrics and a chart of AUC, RMSE, accuracy, and explained variance.
- Notes (calc): Key formula references and named ranges for the cells that hold hyperparameters.
- Visuals: Chart examples and interactive controls (data validation lists, slider-style cells).
- Tests: Small suites of formula checks that validate expected numeric results.
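
The Cross-Validation sheet is described as assigning folds with index formulas. As a minimal Python sketch of that idea (Python is used only to cross-check the logic; the workbook itself needs no code), the helpers below assume a simple round-robin rule equivalent to a hypothetical sheet formula `=MOD(ROW_INDEX-1, k)+1`, where `ROW_INDEX` is an illustrative name. The workbook's actual formula may differ.

```python
def assign_folds(n_rows: int, k: int) -> list[int]:
    """Assign 1-based row indices to folds 1..k in round-robin order.

    Equivalent to a hypothetical sheet formula =MOD(ROW_INDEX-1, k)+1.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    return [(i - 1) % k + 1 for i in range(1, n_rows + 1)]


def fold_split(n_rows: int, k: int, test_fold: int) -> tuple[list[int], list[int]]:
    """Return (train_rows, test_rows) as 1-based row indices for one fold."""
    folds = assign_folds(n_rows, k)
    train = [i for i, f in enumerate(folds, start=1) if f != test_fold]
    test = [i for i, f in enumerate(folds, start=1) if f == test_fold]
    return train, test


def mean_metric(per_fold: list[float]) -> float:
    """Aggregate a per-fold metric (e.g., accuracy) by simple averaging."""
    return sum(per_fold) / len(per_fold)


if __name__ == "__main__":
    # 10 rows and 5 folds: every row is tested exactly once, 2 rows per fold.
    for fold in range(1, 6):
        train_rows, test_rows = fold_split(10, 5, fold)
        print(f"fold {fold}: test rows {test_rows}, {len(train_rows)} training rows")
```

Round-robin assignment assumes the rows are already in random order; if the data is sorted by the target, folds can end up unbalanced, which is the problem stratified splits address.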

## Models and methods covered

- Linear Regression: Ordinary Least Squares; matrix solution with X'X inversion using spreadsheet functions. Diagnostics: R², adjusted R², standard errors, t-statistics, and residual plots.
- Logistic Regression: Iterative parameter updates and probability output. Model fit via log-likelihood. Metrics: accuracy, precision, recall, and ROC data points.
- k-Nearest Neighbors (k-NN): Euclidean distance matrix, vote aggregation, and weighted-vote variants.
- Decision Tree (tiny): Manual split search with impurity reduction and depth control.
- Principal Component Analysis (PCA): Center the data, compute the covariance matrix, and derive principal components and explained variance.
- Cross-Validation: k-fold splits, stratified splits for the classification demos, and metric aggregation.
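
The decision tree sheets search for splits by impurity reduction (Gini or entropy). Here is a minimal Python sketch of that search for one numeric feature. It assumes candidate thresholds at midpoints between sorted unique values and a rule that sends `x <= t` to the left branch. Both are common conventions, not necessarily the workbook's.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum over classes of p_c^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum over classes of p_c * log2(p_c)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_threshold(xs, ys, impurity=gini):
    """Return (threshold, gain) for the split with the largest impurity reduction.

    Gain = parent impurity - size-weighted impurity of the two children.
    Returns (None, 0.0) if no split reduces impurity.
    """
    parent = impurity(ys)
    values = sorted(set(xs))
    n = len(ys)
    best = (None, 0.0)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # midpoint candidate, like one row of a split table
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        child = len(left) / n * impurity(left) + len(right) / n * impurity(right)
        gain = parent - child
        if gain > best[1]:
            best = (t, gain)
    return best


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    ys = [0, 0, 0, 1, 1, 1]
    print("Gini:", best_threshold(xs, ys))
    print("Entropy:", best_threshold(xs, ys, impurity=entropy))
```

Each loop iteration corresponds to one candidate row in a split table; depth control would repeat the same search inside each child.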

## How the sheets show the work

- Inputs live at the top of each sheet.
- Calculations sit in a dedicated block with named ranges.
- Outputs and charts sit in a right-hand column for quick review.
- Key steps use simple formulas so any user can trace a number from input to output.
- Comments and short cell notes explain the formula intent.

## Sample workflows

- Fit a linear model
  1. Open the workbook and go to the Linear Regression sheet.
  2. Review the Inputs block and change the design matrix if needed.
  3. Watch the matrix algebra section compute the coefficients.
  4. Check the residual diagnostics and scatter plots.
- Train and test a classifier
  1. Use the Cross-Validation sheet to set k (the number of folds).
  2. The sheet splits the data using index math and validation formulas.
  3. Go to the Logistic Regression or k-NN sheet and view the metrics per fold.
  4. Inspect the aggregated metrics on the Model Comparison sheet.
- Run PCA
  1. Center and scale the features in Preprocess.
  2. Open the PCA sheet to see the covariance matrix and component scores.
  3. Use the Visuals sheet to plot explained variance.
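
Step 1 of the PCA workflow centers and scales the features. Below is a minimal Python sketch of column-wise z-scoring. It assumes the sample standard deviation, the same convention as Excel's STDEV.S; combined with AVERAGE and STANDARDIZE, that is one way to do the same thing in-sheet. The Preprocess sheet may use a different convention, such as population standard deviation.

```python
from statistics import mean, stdev


def standardize_columns(rows: list[list[float]]) -> list[list[float]]:
    """Z-score each column: (x - column mean) / column sample standard deviation."""
    stats = []
    for col in zip(*rows):
        s = stdev(col)  # sample standard deviation (n - 1 denominator)
        if s == 0:
            raise ValueError("a constant column cannot be scaled")
        stats.append((mean(col), s))
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


if __name__ == "__main__":
    data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    print(standardize_columns(data))
```

After this step, every column has mean 0 and standard deviation 1. The covariance matrix of the standardized data is then the correlation matrix, so no single feature dominates the components because of its units.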

## Key formulas and Excel features used

- INDEX, MATCH, OFFSET, and INDIRECT for lookups and dynamic references.
- MMULT, MINVERSE, and TRANSPOSE for matrix math.
- SUMPRODUCT for dot products and weighted sums.
- IF, COUNTIFS, and SUMIFS for logic and grouped aggregates.
- Conditional formatting for residuals and outlier flags.
- Chart types: scatter, line, and bar for metrics and diagnostics.
- Data validation lists for toggling between model variants.
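
MMULT, MINVERSE, and TRANSPOSE together are enough for OLS via the normal equations, beta = (X'X)^-1 X'y. In a sheet, this could be a single array formula such as `=MMULT(MINVERSE(MMULT(TRANSPOSE(X_design),X_design)),MMULT(TRANSPOSE(X_design),y_target))`. Here `X_design` (with a leading column of 1s for the intercept) and `y_target` are hypothetical named ranges. The Python sketch below mirrors each of those functions so the arithmetic can be checked outside Excel. It uses Gauss-Jordan elimination for the inverse, which is not necessarily how MINVERSE works internally.

```python
def transpose(a):
    """Rows become columns, like Excel's TRANSPOSE."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Matrix product, like Excel's MMULT."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (stand-in for MINVERSE)."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(X, y):
    """beta = (X'X)^-1 X'y; X must include a column of 1s for the intercept."""
    Xt = transpose(X)
    Y = [[v] for v in y]  # y as a column vector
    beta = matmul(inverse(matmul(Xt, X)), matmul(Xt, Y))
    return [b[0] for b in beta]


def r_squared(X, y, beta):
    """1 - SS_residual / SS_total."""
    preds = [sum(b * x for b, x in zip(beta, row)) for row in X]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    X = [[1, 1], [1, 2], [1, 3], [1, 4]]  # intercept column + one predictor
    y = [3, 5, 7, 9]                      # exactly y = 1 + 2x
    beta = ols(X, y)
    print("beta:", beta, "R^2:", r_squared(X, y, beta))
```

The toy data lies exactly on y = 1 + 2x, so the coefficients come out as (1, 2), up to floating-point rounding, and R² is 1. That makes it a quick sanity check for the sheet's own matrix block.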

## Teaching tips

- Freeze panes on calculation blocks so learners can track the formula flow.
- Use cell colors for inputs, calculations, and outputs to set expectations.
- Step through iterations (logistic Newton steps) by copying intermediate columns.
- Use the Tests sheet to run quick checks after changes.
- Replace the sample data with your own dataset; the workbook uses named ranges for easy swapping.
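
You can preview the Newton steps outside Excel before building them in the sheet. This is a minimal Python sketch. It assumes a single predictor plus an intercept, so the 2x2 matrix X'WX can be inverted in closed form. Each entry of `history` plays the role of one copied block of intermediate columns. The steps are undamped, so the sketch also assumes the classes are not perfectly separable; with separable data, the coefficients diverge.

```python
from math import exp, log


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + exp(-z))


def log_likelihood(b0, b1, xs, ys):
    """Sum of y*log(p) + (1 - y)*log(1 - p), with p clamped away from 0 and 1."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(b0 + b1 * x), 1e-12), 1 - 1e-12)
        total += y * log(p) + (1 - y) * log(1 - p)
    return total


def newton_logistic(xs, ys, iterations=10):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson; return one row per step."""
    b0 = b1 = 0.0
    history = [(0, b0, b1, log_likelihood(b0, b1, xs, ys))]
    for step in range(1, iterations + 1):
        ps = [sigmoid(b0 + b1 * x) for x in xs]
        # Gradient of the log-likelihood: X'(y - p)
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Information matrix X'WX with weights w = p(1 - p)
        w = [p * (1 - p) for p in ps]
        h00 = sum(w)
        h01 = sum(wi * x for wi, x in zip(w, xs))
        h11 = sum(wi * x * x for wi, x in zip(w, xs))
        det = h00 * h11 - h01 * h01
        # Newton update: beta += (X'WX)^-1 X'(y - p), via the closed-form 2x2 inverse
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
        history.append((step, b0, b1, log_likelihood(b0, b1, xs, ys)))
    return history


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    ys = [0, 0, 1, 0, 1, 1]  # overlapping classes, so the MLE exists
    for step, b0, b1, ll in newton_logistic(xs, ys):
        print(f"step {step}: b0={b0:.4f} b1={b1:.4f} log-likelihood={ll:.4f}")
```

Watching the log-likelihood column flatten out across steps is the convergence signal the sheet tracks. A damped variant would scale each update by a step size below 1 whenever the log-likelihood drops.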

## Performance notes

- The workbook handles small to medium datasets that fit on one sheet.
- Large datasets may slow spreadsheet recalculation; use smaller samples for teaching.
- Matrix inversion with MINVERSE can be numerically unstable when X'X is nearly singular (for example, with near-collinear predictors); the Linear Regression sheet includes a small conditioning demo.
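
To see why near-collinearity hurts MINVERSE, here is a minimal Python sketch with two almost identical predictors. The numbers are made up, and there is no intercept so that X'X stays 2x2. The sketch compares the condition number of X'X with that of X'X + λI, a ridge-style stabilizer. λ = 1 is an arbitrary illustrative choice, not the value used in the workbook.

```python
from math import sqrt


def normal_equations(x1, x2, y, ridge=0.0):
    """Build X'X + ridge*I and X'y for two predictors (no intercept)."""
    s11 = sum(a * a for a in x1) + ridge
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2) + ridge
    xty = [sum(a * t for a, t in zip(x1, y)), sum(b * t for b, t in zip(x2, y))]
    return [[s11, s12], [s12, s22]], xty


def solve_2x2(a, b):
    """Solve a @ beta = b with the closed-form 2x2 inverse."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        raise ValueError(f"matrix is numerically singular (det={det:.3g})")
    return [(a[1][1] * b[0] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def condition_number_2x2(a):
    """Largest / smallest eigenvalue of a symmetric positive-definite 2x2 matrix."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = sqrt((a[0][0] - a[1][1]) ** 2 + 4 * a[0][1] * a[1][0])
    lam_max = (tr + disc) / 2
    lam_min = det / lam_max  # eigenvalues multiply to det; avoids cancellation
    return lam_max / lam_min


if __name__ == "__main__":
    x1 = [1.0, 2.0, 3.0, 4.0]
    x2 = [1.01, 1.99, 3.02, 3.98]  # nearly a copy of x1
    y = [2.1, 3.9, 6.2, 7.8]
    for lam in (0.0, 1.0):
        a, b = normal_equations(x1, x2, y, ridge=lam)
        print(f"ridge={lam}: cond={condition_number_2x2(a):,.0f} beta={solve_2x2(a, b)}")
```

With these numbers, the unregularized coefficients swing to roughly (-8, 10) even though both predictors track y similarly. Adding λ = 1 pulls both coefficients just under 1 and cuts the condition number from roughly 120,000 to about 61. That is the shrinkage effect the ridge stabilizer is meant to show.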

## Examples and visuals

- Residual plot: shows the gap between predicted and actual values and highlights heteroskedastic patterns.
- ROC pseudo-curve: sorted thresholds with TPR/FPR computed in-sheet.
- PCA scree plot: bar chart of explained variance per component.
- Decision split table: candidate thresholds with their impurity metrics.
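
The ROC pseudo-curve is built by sorting thresholds and computing TPR and FPR at each one. In-sheet, COUNTIFS over the score and label columns could produce the TP and FP counts. Below is a minimal Python sketch of the same sweep. It assumes a case counts as positive when its score is at or above the threshold. It also adds a trapezoid-rule AUC, which is one common way to get an AUC figure; the Model Comparison sheet may compute AUC differently.

```python
def roc_points(scores, labels):
    """Sweep thresholds from high to low and return (FPR, TPR) pairs.

    A case is predicted positive when its score >= threshold; labels are 0/1.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative case")
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC polyline by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
    labels = [1, 1, 0, 1, 0, 0]
    pts = roc_points(scores, labels)
    print(pts)
    print("AUC:", auc_trapezoid(pts))
```

For this toy data, the AUC is 8/9. That matches the share of (positive, negative) pairs in which the positive case has the higher score, which makes it a handy check on an in-sheet version.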

## Contributing

- Pull requests are welcome. Use clear issue descriptions and attach small test data.
- Suggested contributions:
  - Add a new model sheet (for example, SVM or Lasso) using cell formulas.
  - Improve numeric stability for the matrix math.
  - Add workbook macros for automation (kept separate from the core formulas).
  - Improve visuals or add tutorial steps in separate sheets.

## Releases and download

- The workbook is packaged in Releases. Download the .xlsx from the Releases page and open it in Excel to run the demos.
- Releases page: https://github.com/zhaa-kun/analytical-models-in-excel/releases
- If the link does not work in your environment, check the Releases section on the repository page.

## License

- MIT License. See the LICENSE file in the repo for terms.

## Contact

- File issues on GitHub for bugs or feature requests.
- Use pull requests for changes to workbook structure or new model sheets.

## Examples of classroom exercises

- Exercise 1: OLS step trace
  - Goal: Reproduce the coefficient for a single predictor using cell-by-cell math.
  - Task: Follow the matrix build, compute X'X, invert it, and obtain beta.
  - Deliverable: A screenshot of the cell ranges with matching coefficient values.
- Exercise 2: Evaluate k-NN sensitivity
  - Goal: Compare accuracy across k = 1, 3, 5.
  - Task: Use the k-NN sheet and the cross-validation split to record fold metrics.
  - Deliverable: A table and chart of accuracy vs. k.
- Exercise 3: Bias-variance sketch
  - Goal: Visualize training vs. validation error as model complexity changes.
  - Task: Use the Decision Tree sheet, vary the depth, and record RMSE or error rates.
  - Deliverable: A line chart of errors across depths.
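
You can cross-check Exercise 2 outside Excel. Below is a minimal Python sketch of k-NN with Euclidean distance and majority voting on a made-up two-cluster dataset. Ties in the vote go to the class whose member is nearest. That is one possible tie rule, not necessarily the one used on the k-NN sheet.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    # most_common lists equal counts in first-seen order, and neighbors were
    # added nearest-first, so a tied vote goes to the class with the closest member.
    return votes.most_common(1)[0][0]


def accuracy_for_k(train_X, train_y, test_X, test_y, k):
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


if __name__ == "__main__":
    train_X = [(0, 0), (0, 1), (1, 0), (4, 4), (5, 5), (5, 6), (6, 5)]
    train_y = ["A", "A", "A", "A", "B", "B", "B"]  # (4, 4) is a stray "A"
    test_X = [(4.2, 4.2), (0.5, 0.5)]
    test_y = ["B", "A"]
    for k in (1, 3, 5):
        print(f"k={k}: accuracy={accuracy_for_k(train_X, train_y, test_X, test_y, k):.2f}")
```

Here k = 1 is fooled by the stray point at (4, 4), while k = 3 and k = 5 classify both test points correctly. This small-scale version of the sensitivity the exercise asks students to chart.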

## Advanced notes for readers who want depth

- Numerical conditioning: The Linear Regression sheet includes an example of how near-collinearity affects MINVERSE results. It uses a small ridge stabilizer formula to show shrinkage.
- Logistic convergence: The sheet logs Newton-Raphson steps and shows the change in log-likelihood, which helps teach convergence and step damping.
- PCA eigen approximation: Excel has no built-in eigendecomposition function, so a full eigendecomposition is hard to do in raw Excel. The sheet uses power-iteration-style approximations and orthogonal deflation to extract components.
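
The power-iteration approach in the last note fits in a few lines of Python. This is a minimal sketch that assumes Hotelling deflation, which subtracts λvv' after each component. That is one common form of deflation; the PCA sheet's exact steps may differ. Explained variance is each eigenvalue divided by the trace of the covariance matrix.

```python
from math import sqrt


def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of the columns of rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def power_iteration(m, iterations=500):
    """Dominant eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    d = len(m)
    # Uneven start vector: less likely to be orthogonal to the target eigenvector.
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v  # nothing left to extract
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * m[i][j] * v[j] for i in range(d) for j in range(d))
    return eigenvalue, v


def principal_components(cov, n_components):
    """Extract components one at a time, deflating with m - lambda * v v'."""
    m = [row[:] for row in cov]
    d = len(m)
    components = []
    for _ in range(n_components):
        lam, v = power_iteration(m)
        components.append((lam, v))
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components


def explained_variance(cov, components):
    """Share of total variance (the trace of cov) captured by each component."""
    total = sum(cov[i][i] for i in range(len(cov)))
    return [lam / total for lam, _ in components]


if __name__ == "__main__":
    cov = [[2.0, 1.0], [1.0, 2.0]]  # eigenvalues 3 and 1
    comps = principal_components(cov, 2)
    print(comps)
    print(explained_variance(cov, comps))
```

Because each deflation step reuses the previous approximation, errors accumulate from one component to the next. In practice, that makes the first few components the most trustworthy.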

## Screenshots

- Add your own screenshots to issues or PRs. The workbook includes a Visuals sheet with sample charts that you can capture.

## Repository topics (for search)

analytics, cross-validation, data-analysis, data-analytics, data-science-portfolio, data-visualization, decision-trees, excel, excel-models, knn, linear-regression, logistic-regression, machine-learning, manual-calculations, no-code-machine-learning, pca, predictive-modeling, spreadsheet-models, statistical-analysis

## Credits

- Built to show in-sheet model mechanics and clear workbook design.
- Inspired by classroom needs for transparent model math.

## Legal and data hygiene

- Remove sensitive data before uploading new workbook versions.
- Use sample data sheets for public demonstrations.

## Quick start checklist

- Download the release from the Releases page and open the .xlsx file in Excel.
- Read the README sheet inside the workbook to find the demo you want.
- Run the checks on the Tests sheet to validate the formulas.
- Modify inputs and watch the outputs update.

## Additional resources

- Excel function reference: see Microsoft's official documentation for advanced formula behavior and matrix formula examples.
- For numeric stability and linear algebra background, refer to standard statistics and linear algebra texts.

Thank you for exploring the workbook.
