1 | | -# 📊 Analytical Models in Excel |
2 | | - |
3 | | -This repository showcases a range of analytical models implemented directly in Microsoft Excel, demonstrating both statistical understanding and advanced spreadsheet proficiency. As a data analytics student, I created this workbook to serve as a portfolio piece illustrating my hands-on ability to analyze data, build predictive models, and present findings in a clean, structured format - all using Excel's native features. |
4 | | - |
5 | | ---- |
6 | | - |
7 | | -## 📁 Workbook Overview |
8 | | - |
9 | | -The workbook includes the following sheets: |
10 | | - |
11 | | -| **Sheet Name** | **Description** | |
12 | | -| ----------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | |
13 | | -| **Linear Regression** | Predicts exam scores based on study hours and exam preparation. Includes manual calculation of coefficients, performance metrics, and a summary output. | |
14 | | -| **Principal Component Analysis** | Manual calculation of covariance matrix, eigenvalues and eigenvectors, and principal components for a standardized feature set. | |
15 | | -| **Decision Tree Calculation** | A rule-based decision tree example using categorical features (e.g. likes ice cream/chocolate). Uses entropy-based logic with branching conditions. | |
16 | | -| **Logistic Model Evaluation** | Detailed logistic regression model using multiple predictors. Contains odds ratios, interpretations of coefficients, predicted probabilities, and error types (Type I/II). | |
17 | | -| **Vector Prediction Decision Tree** | Manual calculation for an overfit vector prediction decision tree compared to a code generated one. | |
18 | | -| **K-Fold Cross-Validation** | Performs 5-fold cross-validation on a machine failure dataset. Calculates fold-based splits, tracks performance, and helps evaluate model generalizability. | |
19 | | -| **Logistic Regression** | Manual implementation of logistic regression on binary classification (machine working vs not). Includes logit function, probabilities, and log-likelihoods. | |
20 | | -| **K-Nearest Neighbors** | Basic setup for KNN classification. Graphical representation included. | |
21 | | - |
22 | | ---- |
23 | | - |
24 | | -## 🔧 Skills Demonstrated |
25 | | - |
26 | | -* **Analytical Thinking**: Applying core data science concepts in spreadsheet format |
27 | | -* **Excel Mastery**: Advanced use of formulas, named ranges, logical structures, formatting, and charts |
28 | | -* **Model Interpretation**: Clear presentation of each model's outputs, performance metrics, and decision logic |
29 | | -* **Self-contained Execution**: All computations are done manually within Excel — no external tools or code required |
30 | | - |
31 | | ---- |
32 | | - |
33 | | -## 🧠 Why Excel? |
34 | | - |
35 | | -While programming languages like Python and R are industry standards for analytics, Excel remains an invaluable tool — especially in business environments. This project demonstrates how deep analytical work can be achieved even in Excel, making concepts more transparent and accessible. |
36 | | - |
37 | | ---- |
38 | | - |
39 | | -## 🚀 How to Use |
40 | | - |
41 | | -1. Open the `Analytical_Models_In_Excel.xlsx` file. |
42 | | -2. Navigate through the tabs to explore each model. |
43 | | -3. Use in-sheet comments, formula breakdowns, and labeled sections to follow the logic step-by-step. |
44 | | - |
45 | | ---- |
46 | | - |
47 | | -## 📬 Feedback & Collaboration |
48 | | - |
49 | | -If you're a student, analyst, or recruiter reviewing this portfolio - feel free to reach out! I'm always open to feedback, collaboration, or internship opportunities in data analytics, machine learning, or related fields. |
50 | | - |
51 | | -**Author:** Jishen Harilal |
52 | | -**LinkedIn:** www.linkedin.com/in/jishen-harilal |
53 | | -**Contact:** jishen2108@gmail.com |
| 1 | +[Releases](https://github.com/zhaa-kun/analytical-models-in-excel/releases)
| 2 | + |
| 3 | +# Spreadsheet Machine Learning: Excel Models for Core Analytics Demo 📊🔧 |
| 4 | + |
| 5 | +Short description |
| 6 | +- A curated Excel workbook that shows core data analysis techniques. It uses spreadsheet formulas, structured sheets, and clear formatting to show how regression, classification, dimensionality reduction, and validation work without code. |
| 7 | + |
| 8 | +Badges |
| 9 | +- Topics: analytics · cross-validation · data-analysis · data-analytics · data-science-portfolio · data-visualization · decision-trees · excel · excel-models · knn · linear-regression · logistic-regression · machine-learning · no-code-machine-learning · pca · predictive-modeling · spreadsheet-models · statistical-analysis |
| 10 | + |
| 13 | + |
| 14 | +Why this repo |
| 15 | +- Use it to teach model logic inside a spreadsheet. |
| 16 | +- Show model mechanics step-by-step for presentations or classes. |
| 17 | +- Audit model math in cells, not in black-box code. |
| 18 | +- Share a portfolio piece that highlights Excel modeling skills. |
| 19 | + |
| 20 | +What you will find |
| 21 | +- A single, well-structured Excel workbook (.xlsx) with multiple sheets. |
| 22 | +- Worked examples and small datasets embedded in the workbook. |
| 23 | +- Clean layout for inputs, calculations, and outputs. |
| 24 | +- Visuals: charts and conditional formatting for model insight. |
| 25 | +- Reusable templates for experiments. |
| 26 | + |
| 27 | +Get the workbook |
| 28 | +- Download the release file from the Releases page, open it in Excel, and step through the sheets.
| 29 | +- Releases: https://github.com/zhaa-kun/analytical-models-in-excel/releases |
| 30 | + |
| 31 | +Structure of the workbook (sheet-by-sheet) |
| 32 | +- README (sheet): Quick navigation and short guide. |
| 33 | +- Data: Small sample datasets for each demo. Columns include features and targets. |
| 34 | +- Preprocess: Missing value handling, scaling, and simple encoding done with formulas. |
| 35 | +- Linear Regression (OLS): Full OLS via matrix algebra with formulas. Includes residual plots and diagnostics. |
| 36 | +- Logistic Regression: Logit link implemented with iterative update (Newton-Raphson) and log-likelihood tracking. |
| 37 | +- k-NN: Distance matrix, neighbor selection, tie rules, and performance table. |
| 38 | +- Decision Tree (simple): Split metrics (Gini, entropy), split selection process, and manual tree diagram using cells. |
| 39 | +- PCA: Covariance matrix, approximate eigen decomposition via power iteration with deflation, variance explained table.
| 40 | +- Cross-Validation: k-fold split by index formulas, aggregate metrics, and bias-variance illustration. |
| 41 | +- Model Comparison: Side-by-side metrics and a chart for AUC, RMSE, accuracy, and explained variance. |
| 42 | +- Notes (calc): Key formula references and aliases to cells that hold hyperparameters. |
| 43 | +- Visuals: Chart examples and interactive controls (data validation lists, slider-style cells). |
| 44 | +- Tests: Small suites of formula checks that validate expected numeric results. |
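
The split metrics named for the Decision Tree sheet can be sketched outside the spreadsheet as follows. This is a minimal illustration, not the workbook's actual formulas: the function names and the toy labels are hypothetical, and every group is assumed to be non-empty.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_reduction(parent, left, right, metric=entropy):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * metric(left) + (len(right) / n) * metric(right)
    return metric(parent) - weighted

# Hypothetical toy labels: a split that separates the classes perfectly.
parent = ["yes", "yes", "no", "no"]
left, right = ["yes", "yes"], ["no", "no"]
gain = impurity_reduction(parent, left, right)
```

A candidate split with a larger impurity reduction is preferred; swapping `metric=gini` gives the Gini-based version of the same comparison.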
| 45 | + |
| 46 | +Models and methods covered |
| 47 | +- Linear Regression: Ordinary Least Squares; matrix solution with X'X inversion using spreadsheet functions. Diagnostics: R², adjusted R², standard errors, t-stats, residual plots. |
| 48 | +- Logistic Regression: Iterative parameter update and probability output. Model fit via log-likelihood. Metrics: accuracy, precision, recall, ROC data points. |
| 49 | +- k-Nearest Neighbors (k-NN): Euclidean distance matrix, vote aggregation, and distance-weighted voting variants.
| 50 | +- Decision Tree (tiny): Manual split search with impurity reduction and depth control. |
| 51 | +- Principal Component Analysis (PCA): Center data, compute covariance, and derive principal components and explained variance. |
| 52 | +- Cross-Validation: k-fold, stratified split for classification demos, and metric aggregation. |
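
As a rough companion to the k-NN entry above, the distance, neighbor selection, and voting steps might look like this in code. It is a sketch under stated assumptions: the tie rule (alphabetically first label wins) and the inverse-distance weighting are illustrative choices, not confirmed details of the workbook, and the data points are made up.

```python
import math
from collections import defaultdict

def knn_predict(train_X, train_y, query, k=3, weighted=False):
    """Classify `query` by a vote among its k nearest training points."""
    # Euclidean distance from the query to every training row, nearest first.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = defaultdict(float)
    for d, label in dists[:k]:
        # Assumed weighting: inverse distance; plain voting counts 1 per neighbor.
        votes[label] += 1.0 / (d + 1e-9) if weighted else 1.0
    # Assumed tie rule: among equal vote totals, the alphabetically first label.
    return max(sorted(votes), key=lambda lab: votes[lab])

# Hypothetical two-cluster data.
train_X = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
train_y = ["A", "A", "B", "B"]
near_a = knn_predict(train_X, train_y, (1.2, 1.4), k=3)
near_b = knn_predict(train_X, train_y, (5.5, 5.2), k=3, weighted=True)
```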
| 53 | + |
| 54 | +How the sheets show the work |
| 55 | +- Inputs live at the top of each sheet. |
| 56 | +- Calculations sit in a dedicated block with named ranges. |
| 57 | +- Outputs and charts sit in a right-hand column for quick review. |
| 58 | +- Key steps use simple formulas so any user can trace a number from input to output. |
| 59 | +- Comments and short cell notes explain the formula intent. |
| 60 | + |
| 61 | +Sample workflows |
| 62 | +- Fit a linear model |
| 63 | + 1. Open the workbook and go to the Linear Regression sheet. |
| 64 | + 2. Review the Inputs block and change the design matrix if needed. |
| 65 | + 3. Watch the matrix algebra section compute coefficients. |
| 66 | + 4. Check residual diagnostics and scatter plots. |
| 67 | +- Train and test a classifier |
| 68 | + 1. Use the Cross-Validation sheet to set k (folds). |
| 69 | + 2. The sheet splits data using index math and validation formulas. |
| 70 | + 3. Go to the Logistic Regression or k-NN sheet and view metrics per fold.
| 71 | + 4. Inspect aggregated metrics on the Model Comparison sheet. |
| 72 | +- Run PCA |
| 73 | + 1. Center and scale features in Preprocess. |
| 74 | + 2. Open the PCA sheet to see covariance and component scores.
| 75 | + 3. Use the Visuals sheet to plot explained variance. |
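
The "index math" used for fold splitting in the classifier workflow can be pictured as a helper column that assigns each row to a fold by its position. The sketch below assumes a modulo rule, comparable to a `MOD(ROW()-offset, k)` column; whether the workbook uses exactly this rule is an assumption, and `kfold_indices` is a hypothetical name.

```python
def kfold_indices(n_rows, k):
    """Return k (train, test) index pairs, assigning row i to fold i % k."""
    fold_of = [i % k for i in range(n_rows)]
    splits = []
    for fold in range(k):
        # Rows in the current fold are held out; all other rows train the model.
        test = [i for i, f in enumerate(fold_of) if f == fold]
        train = [i for i, f in enumerate(fold_of) if f != fold]
        splits.append((train, test))
    return splits

splits = kfold_indices(n_rows=10, k=5)
first_train, first_test = splits[0]
```

Each row lands in exactly one test fold, so per-fold metrics can be averaged without counting any row twice.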
| 76 | + |
| 77 | +Key formulas and Excel features used |
| 78 | +- INDEX, MATCH, OFFSET, INDIRECT for lookups and dynamic references.
| 79 | +- MMULT, MINVERSE, TRANSPOSE for matrix math. |
| 80 | +- SUMPRODUCT for dot products and weighted sums. |
| 81 | +- IF, COUNTIFS, SUMIFS for logic and grouped aggregates. |
| 82 | +- Conditional formatting for residuals and outlier flags.
| 83 | +- Chart types: scatter, line, bar for metrics and diagnostics.
| 84 | +- Data validation lists for toggles between model variants.
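
To make the matrix functions above concrete: the OLS coefficients are the solution of the normal equations, which in Excel can be written as one array formula such as `=MMULT(MINVERSE(MMULT(TRANSPOSE(X),X)),MMULT(TRANSPOSE(X),y))`, where `X` and `y` are assumed named ranges. The sketch below mirrors those three functions in plain Python; the helper names and the tiny dataset are hypothetical.

```python
def transpose(a):
    """Counterpart of TRANSPOSE."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Counterpart of MMULT."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse(a):
    """Counterpart of MINVERSE: Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(X, y):
    """beta = (X'X)^-1 X'y, with the intercept column already included in X."""
    Xt = transpose(X)
    beta = matmul(inverse(matmul(Xt, X)), matmul(Xt, [[v] for v in y]))
    return [b[0] for b in beta]

# Hypothetical data lying exactly on y = 1 + 2x; the first column is the intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
beta = ols(X, y)
```

Explicit inversion is used here only because it matches the spreadsheet functions; it is not the most numerically robust way to solve least squares.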
| 85 | + |
| 86 | +Teaching tips |
| 87 | +- Freeze panes on calculation blocks so learners track formula flow. |
| 88 | +- Use cell colors for input, calc, and output to set expectations. |
| 89 | +- Step through iterations (logistic Newton steps) by copying intermediate columns. |
| 90 | +- Use the Tests sheet to run quick checks after changes. |
| 91 | +- Replace sample data with your dataset. The workbook uses named ranges for easy swap. |
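
For the tip about stepping through logistic Newton iterations, a small reference implementation can help learners check the intermediate columns. This sketch assumes one predictor plus an intercept, a zero starting point, and a made-up non-separable dataset; the workbook's own layout and starting values may differ.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood of the data under intercept b0 and slope b1."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def newton_logistic(xs, ys, steps=8):
    """Newton-Raphson updates for (b0, b1), logging the log-likelihood per step."""
    b0, b1 = 0.0, 0.0
    history = [log_likelihood(b0, b1, xs, ys)]
    for _ in range(steps):
        ps = [sigmoid(b0 + b1 * x) for x in xs]
        # Gradient of the log-likelihood.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # 2x2 information matrix (negative Hessian) built from weights p(1-p).
        ws = [p * (1 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        det = h00 * h11 - h01 * h01
        # beta <- beta + H^-1 g, using the closed-form 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
        history.append(log_likelihood(b0, b1, xs, ys))
    return b0, b1, history

# Hypothetical, non-separable data (classes overlap, so the MLE is finite).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1, history = newton_logistic(xs, ys)
```

Each entry of `history` corresponds to one copied block of intermediate columns, so the change in log-likelihood between steps can be compared directly.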
| 92 | + |
| 93 | +Performance notes |
| 94 | +- The workbook handles small to medium datasets that fit in one sheet. |
| 95 | +- Large datasets may slow spreadsheet calculations. Use smaller samples for teaching. |
| 96 | +- Matrix inversion via MINVERSE can show numerical issues when X'X is near-singular; the Linear Regression sheet has a small demo of conditioning.
| 97 | + |
| 98 | +Examples and visuals |
| 99 | +- Residual plot: plots residuals against predicted values and highlights heteroskedastic patterns.
| 100 | +- ROC pseudo-curve: sorted thresholds and TPR/FPR computed in-sheet. |
| 101 | +- PCA scree plot: bar chart of explained variance per component. |
| 102 | +- Decision split table: shows candidate thresholds with impurity metrics. |
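
The ROC pseudo-curve described above comes down to counting true and false positives at each threshold, which in a sheet could be done with COUNTIFS. The following sketch shows the same counting in code; the scores and labels are hypothetical, and treating every distinct score as a threshold is an assumption.

```python
def roc_points(scores, labels):
    """Return (threshold, FPR, TPR) per distinct score, highest threshold first."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        # A row is predicted positive when its score is at or above the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points

# Hypothetical predicted probabilities and true classes (1 = positive).
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
points = roc_points(scores, labels)
```

Plotting the FPR column against the TPR column as a scatter chart with connecting lines gives the pseudo-curve.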
| 103 | + |
| 104 | +Contributing |
| 105 | +- Pull requests welcome. Use clear issue descriptions and attach small test data. |
| 106 | +- Suggested contributions: |
| 107 | + - Add new model sheet (SVM, Lasso) using cell formulas. |
| 108 | + - Improve numeric stability for matrix math. |
| 109 | + - Add workbook macros for automation (kept separate from core formulas). |
| 110 | + - Improve visuals or add tutorial steps in separate sheets. |
| 111 | + |
| 112 | +Releases and download |
| 113 | +- The workbook is packaged in Releases. Download the .xlsx from the releases page and open it in Excel to run the demos. |
| 114 | +- Releases page: https://github.com/zhaa-kun/analytical-models-in-excel/releases |
| 115 | +- If the link does not work in your environment, check the Releases section on the repository page. |
| 116 | + |
| 117 | +License |
| 118 | +- MIT License. See LICENSE file in the repo for terms. |
| 119 | + |
| 120 | +Contact |
| 121 | +- File issues on GitHub for bugs or feature requests. |
| 122 | +- Use pull requests for changes to workbook structure or new model sheets. |
| 123 | + |
| 124 | +Examples of classroom exercises |
| 125 | +- Exercise 1 — OLS step trace |
| 126 | + - Goal: Reproduce the coefficient for a single predictor using cell-by-cell math.
| 127 | + - Task: Follow the matrix build, compute X'X, invert it, and obtain beta.
| 128 | + - Deliverable: A screenshot of cell ranges with matching coefficient values. |
| 129 | +- Exercise 2 — Evaluate k-NN sensitivity |
| 130 | + - Goal: Compare accuracy across k = 1,3,5. |
| 131 | + - Task: Use the k-NN sheet and the cross-validation split to record fold metrics. |
| 132 | + - Deliverable: Table and chart of accuracy vs k. |
| 133 | +- Exercise 3 — Bias-variance sketch |
| 134 | + - Goal: Visualize training vs validation error as model complexity changes. |
| 135 | + - Task: Use the Decision Tree sheet, vary depth, and record RMSE or error rates.
| 136 | + - Deliverable: Line chart of errors across depths. |
| 137 | + |
| 138 | +Advanced notes for readers who want depth |
| 139 | +- Numerical conditioning: The Linear Regression sheet has an example that shows how near-collinearity affects MINVERSE results. It uses a small ridge stabilizer formula to show shrinkage. |
| 140 | +- Logistic convergence: The sheet logs Newton-Raphson steps and shows change in log-likelihood. This helps teach convergence and step damping. |
| 141 | +- PCA eigen approximation: Full eigen decomposition is hard in raw Excel. The sheet uses power-iteration style approximations and orthogonal deflation for component extraction. |
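
The power-iteration and deflation idea mentioned for PCA can be sketched as below. This is an illustration under assumptions, not the sheet's formulas: the covariance matrix is made up, the start vector and iteration count are arbitrary choices, and the method assumes a symmetric matrix with distinct leading eigenvalues.

```python
import math

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def power_iteration(a, iters=500):
    """Approximate the dominant eigenpair of a symmetric matrix."""
    n = len(a)
    # Asymmetric start vector, to avoid starting orthogonal to the target.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = matvec(a, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v'Av for the unit-length vector v.
    eigenvalue = sum(x * y for x, y in zip(v, matvec(a, v)))
    return eigenvalue, v

def deflate(a, eigenvalue, v):
    """Remove the found component: A - lambda * v v'."""
    n = len(a)
    return [[a[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]

# Hypothetical 2x2 covariance matrix.
cov = [[4.0, 2.0], [2.0, 3.0]]
l1, v1 = power_iteration(cov)
l2, v2 = power_iteration(deflate(cov, l1, v1))
explained = [l1 / (l1 + l2), l2 / (l1 + l2)]
```

The eigenvalues should sum to the trace of the covariance matrix, which gives a quick in-sheet check on the approximation.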
| 142 | + |
| 143 | +Screenshots |
| 144 | +- Add your own screenshots to issues or PRs. The workbook includes a Visuals sheet with sample charts that you can capture. |
| 145 | + |
| 146 | +Repository topics (for search) |
| 147 | +analytics, cross-validation, data-analysis, data-analytics, data-science-portfolio, data-visualization, decision-trees, excel, excel-models, knn, linear-regression, logistic-regression, machine-learning, manual-calculations, no-code-machine-learning, pca, predictive-modeling, spreadsheet-models, statistical-analysis |
| 148 | + |
| 149 | +Credits |
| 150 | +- Built to show in-sheet model mechanics and clear workbook design. |
| 151 | +- Inspired by classroom needs for transparent model math. |
| 152 | + |
| 153 | +Legal and data hygiene |
| 154 | +- Remove sensitive data before uploading new workbook versions. |
| 155 | +- Use sample data sheets for public demonstrations. |
| 156 | + |
| 157 | +Quick start checklist |
| 158 | +- Download the release from the Releases page and open the .xlsx file in Excel. |
| 159 | +- Read the README sheet inside the workbook to find the demo you want. |
| 160 | +- Run tests on the Tests sheet to validate formulas. |
| 161 | +- Modify inputs and watch outputs update. |
| 162 | + |
| 163 | +Additional resources |
| 164 | +- Excel function docs and matrix formula examples: see the official Microsoft documentation for advanced formula behavior.
| 165 | +- For numeric stability and linear algebra background, refer to standard statistics and linear algebra texts. |
| 166 | + |
| 167 | +Thank you for exploring the workbook. |