
Commit 4ccea17

Update classification, clustering, and loss functions documentation by removing unnecessary emoji and adding mathematical formulas for accuracy, precision, recall, F1-score, specificity, and logistic regression loss. Enhance clarity and completeness of content.
1 parent a0312db commit 4ccea17

File tree: 3 files changed, +26 −6 lines changed

docs/machine_learning/classification.md

Lines changed: 22 additions & 2 deletions
```diff
@@ -1,5 +1,5 @@
 !!! note
-    This page is still not complete and new sections might get added later. That said, the existing content is ready to be consumed. 🍔 :wink:
+    This page is still not complete and new sections might get added later. That said, the existing content is ready to be consumed. 🍔
 
 ## Introduction
 
```
```diff
@@ -16,14 +16,27 @@ Classification metrics are used to evaluate the performance of a classification
 
 1. **Accuracy**: Accuracy is the most basic classification metric, measuring the ratio of correctly predicted instances to the total number of instances. It provides an overall measure of the model's correctness. However, it may not be suitable for imbalanced datasets, where one class significantly outnumbers the others.
 
+    $${\displaystyle \mathrm {Accuracy} ={\frac {TP+TN}{TP+TN+FP+FN}}}$$
+
 2. **Precision**: Precision is the ratio of true positive predictions to the total number of positive predictions made by the model. High precision indicates that when the model predicts a positive class, it is likely to be correct.
 
+    $${\displaystyle \mathrm {Precision} ={\frac {TP}{TP+FP}}}$$
+
 3. **Recall (Sensitivity or True Positive Rate)**: Recall is the ratio of true positive predictions to the total number of actual positive instances in the dataset. It measures the model's ability to capture all positive instances. High recall means that the model can find most of the positive cases.
 
-4. **F1-Score**: The F1-Score is the harmonic mean of precision and recall. It balances both metrics and is particularly useful when you need to consider the trade-off between precision and recall. It's a good overall measure of a model's performance. Please be aware of `average` params in the [Sklearn implementation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html). Set the param to `macro` in case of imbalanced dataset, as it will compute the score for each class and then perform unweighted average i.e. giving each class equal importance, no matter their frequency. Setting it to `weighted` is similar to `macro`, but now the average will be weighted. Setting to `micro` will lead to computing the numbers for complete data without considering any class.
+    $${\displaystyle \mathrm {Recall} ={\frac {TP}{TP+FN}}}$$
+
+4. **F1-Score**: The F1-Score is the harmonic mean of precision and recall. It balances both metrics and is particularly useful when you need to consider the trade-off between precision and recall. It's a good overall measure of a model's performance.
+
+    !!! Note
+        Please be aware of `average` params in the [Sklearn implementation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html). Set the param to `macro` in case of imbalanced dataset, as it will compute the score for each class and then perform unweighted average i.e. giving each class equal importance, no matter their frequency. Setting it to `weighted` is similar to `macro`, but now the average will be weighted. Setting to `micro` will lead to computing the numbers for complete data without considering any class.
+
+    $${\displaystyle \mathrm {F1}_{score} ={\frac {2}{\frac {1}{\mathrm {Precision}}+\frac {1}{\mathrm {Recall}}}}}$$
 
 5. **Specificity (True Negative Rate)**: Specificity measures the model's ability to correctly identify negative instances. It is the ratio of true negative predictions to the total number of actual negative instances. It is particularly relevant when false negatives are costly.
 
+    $${\displaystyle \mathrm {Specificity} ={\frac {TN}{TN+FP}}}$$
+
 6. **ROC Curve and AUC**: The Receiver Operating Characteristic (ROC) curve is a graphical representation of the model's performance across different thresholds. The Area Under the ROC Curve (AUC) quantifies the overall performance of the model, with a higher AUC indicating better discrimination between classes.
 
 7. **Confusion Matrix**: A confusion matrix is a table that summarizes the model's predictions compared to the actual labels, breaking down true positives, true negatives, false positives, and false negatives. It provides detailed insights into the model's performance.
```
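The metric formulas added in this hunk can be sanity-checked with a small sketch in plain Python (the confusion-matrix counts below are made up purely for illustration):

```python
# Hypothetical confusion-matrix counts for a binary classifier.
TP, TN, FP, FN = 40, 30, 10, 20

accuracy = (TP + TN) / (TP + TN + FP + FN)   # overall correctness
precision = TP / (TP + FP)                   # how trustworthy positive predictions are
recall = TP / (TP + FN)                      # sensitivity / true positive rate
specificity = TN / (TN + FP)                 # true negative rate
f1 = 2 / (1 / precision + 1 / recall)        # harmonic mean of precision and recall

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} specificity={specificity:.3f} f1={f1:.3f}")
```

In practice the equivalent functions in `sklearn.metrics` (e.g. `accuracy_score`, `precision_score`, `f1_score`) would be used instead of hand-rolled arithmetic; the point here is only that each formula follows directly from the four confusion-matrix counts.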
```diff
@@ -50,6 +63,13 @@ While there are many classification algorithms, here are some of the most common
 
 - Logistic Regression is a widely used classification model that is particularly effective for binary classification problems. It works by modeling the relationship between the input features and the probability of belonging to a particular class. It does this by fitting a logistic curve to the data, which allows it to output probabilities that an instance belongs to a specific class. [Logistic Regression is a linear model](interview_questions.md#even-though-sigmoid-function-is-non-linear-why-is-logistic-regression-considered-a-linear-classifier), which means it assumes a linear relationship between the input features and the log-odds of the class probabilities. It's simple, interpretable, and computationally efficient, making it a good choice for problems with a large number of features.
 
+- The formula for Logistic Regression is shown below,
+
+    $${\displaystyle \mathrm {LogisticRegression_loss}(i) = -(y_i \log(\hat{y_i})+(1-y_i) \log(1-\hat{y_i}))}$$
+
+    where, $y_i$ is the actual class and $\hat{y_i}$ is the predicted class
+
+
 ### Decision Tree
 
 - A Decision Tree is a versatile and interpretable machine learning model used for both classification and regression tasks. It is a tree-like structure where each internal node represents a feature, each branch represents a decision rule based on that feature, and each leaf node represents the predicted outcome or value. Decision Trees are particularly well-suited for tasks where the decision-making process can be represented as a series of logical if-then-else conditions.
```
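The `macro`/`weighted`/`micro` behaviour described in the F1-Score note can be reproduced without sklearn. A minimal sketch on a made-up imbalanced label set (labels and helper names are mine, not from the docs):

```python
from collections import Counter

# Made-up imbalanced labels: class 0 is common, class 1 is rare.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

def f1_for(cls):
    # Per-class F1 = 2*TP / (2*TP + FP + FN), treating `cls` as the positive class.
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0

classes = sorted(set(y_true))
support = Counter(y_true)
f1s = {c: f1_for(c) for c in classes}

# macro: unweighted mean -> the rare class counts as much as the common one
macro = sum(f1s.values()) / len(classes)
# weighted: mean weighted by class frequency -> dominated by the common class
weighted = sum(f1s[c] * support[c] for c in classes) / len(y_true)
# micro: pool TP/FP/FN over all classes before computing F1; in single-label
# classification every error is one FP (predicted class) and one FN (true class)
tp = sum(t == p for t, p in zip(y_true, y_pred))
errors = sum(t != p for t, p in zip(y_true, y_pred))
micro = 2 * tp / (2 * tp + errors + errors)

print(macro, weighted, micro)
```

On this toy data `macro` is pulled down by the poorly-predicted rare class, while `weighted` and `micro` stay close to plain accuracy, which is exactly the trade-off the note warns about.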

docs/machine_learning/clustering.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,5 +1,5 @@
 !!! note
-    This page is still not complete and new sections might get added later. That said, the existing content is ready to be consumed. 🍔 :wink:
+    This page is still not complete and new sections might get added later. That said, the existing content is ready to be consumed. 🍔
 
 ## Introduction
 
```
docs/machine_learning/loss_functions.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -1,5 +1,5 @@
 !!! note
-    This page is still not complete and new sections might get added later. That said, the existing content is ready to be consumed. 🍔 :wink:
+    This page is still not complete and new sections might get added later. That said, the existing content is ready to be consumed. 🍔
 
 ## Introduction
 
```
```diff
@@ -36,13 +36,13 @@ $${\displaystyle \operatorname {MSE_cost} ={\frac {1}{n}}\sum _{i=1}^{n}\operatorname {MSE_loss}(i)}$$
 
 ### Cross entropy loss
 
-- Cross entropy loss is used for classification tasks. It is a simplication of Kullback–Leibler divergence that is used to compute the difference between two probability distributions *(here the model's prediction and true one)*. For binary classification the formula is shown below, ($y$ is the actual class and $\hat{y}$ is the predicted class)
+- Cross entropy loss is used for classification tasks. It is a simplification of Kullback–Leibler divergence that is used to compute the difference between two probability distributions *(here the model's prediction and true one)*. For binary classification the formula is shown below, ($y$ is the actual class and $\hat{y}$ is the predicted class)
 
 $${\displaystyle \operatorname {CrossEntropy_loss}(i) = -(y_i \log(\hat{y_i})+(1-y_i) \log(1-\hat{y_i}))}$$
 
 $${\displaystyle \operatorname {CrossEntropy_cost} ={\frac {1}{n}}\sum _{i=1}^{n}\operatorname {CrossEntropy_loss}(i)}$$
 
-- Let's go through the different possibilities,
+- For binary classification, $y_i$ can be either 0 or 1. Let's go through the different possibilities,
     - if $y_i=1$,
         - the loss function reduces to only the left part i.e. $-y_i \log(\hat{y_i})$
         - now to have a small loss, model would want the $\log(\hat{y_i})$ to be large *(bcoz of negative sign)*.
```
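The case analysis in this hunk can be checked numerically. A minimal sketch of the per-sample binary cross entropy (function and sample values are mine, for illustration only):

```python
import math

def cross_entropy_loss(y, y_hat):
    # Per-sample binary cross entropy: -(y*log(y_hat) + (1-y)*log(1-y_hat)).
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# When y_i = 1 the (1 - y_i) term vanishes, leaving -log(y_hat):
# a confident correct prediction gives a small loss, a confident wrong one a large loss.
print(cross_entropy_loss(1, 0.9))
print(cross_entropy_loss(1, 0.1))

# The cost averages the per-sample losses over the dataset.
samples = [(1, 0.8), (0, 0.2), (1, 0.9)]
cost = sum(cross_entropy_loss(y, p) for y, p in samples) / len(samples)
print(cost)
```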
