# Learn: Gini Impurity and Best Split in Decision Trees

## Overview
A core concept in Decision Trees (and, by extension, Random Forests) is how the model chooses where to split the data at each node. A popular splitting criterion is **Gini Impurity**.

In this task, you will implement:
- Gini impurity computation
- Finding the best feature and threshold to split on based on impurity reduction

This helps build the foundation for how trees grow in a Random Forest.

---
## Gini Impurity
For a set of samples with class labels \( y \) drawn from \( k \) classes, the Gini Impurity is defined as:

$$
G(y) = 1 - \sum_{i=1}^{k} p_i^2
$$

where \( p_i \) is the proportion of samples belonging to class \( i \).

A pure node (all one class) has \( G = 0 \), and higher values indicate more class diversity.
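As a concrete starting point, here is a minimal sketch of the impurity computation in NumPy; the function name `gini_impurity` and its signature are illustrative, not prescribed by the task:

```python
import numpy as np

def gini_impurity(y):
    """Gini impurity G(y) = 1 - sum_i p_i^2 for a 1-D array of class labels."""
    if len(y) == 0:
        return 0.0  # convention: treat an empty subset as pure
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()      # class proportions p_i
    return 1.0 - np.sum(p ** 2)

print(gini_impurity(np.array([1, 1, 1, 1])))  # 0.0 (pure node)
print(gini_impurity(np.array([0, 1, 0, 1])))  # 0.5 (maximally mixed, 2 classes)
```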
---
## Gini Gain for a Split
Given a feature and a threshold that split the dataset into left and right subsets, the Gini gain is the impurity reduction achieved by the split:

$$
\Delta G = G(y) - \frac{n_{\text{left}}}{n} G(y_{\text{left}}) - \frac{n_{\text{right}}}{n} G(y_{\text{right}})
$$

where \( n_{\text{left}} \) and \( n_{\text{right}} \) are the subset sizes, \( n = n_{\text{left}} + n_{\text{right}} \), and the best split is the one that maximizes this gain.
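One way to realize this search, reusing the `gini_impurity` sketch above, is an exhaustive scan over features and candidate thresholds; the midpoint strategy and all names here are illustrative assumptions, not requirements of the task:

```python
import numpy as np
# assumes gini_impurity from the sketch above is in scope

def best_split(X, y):
    """Scan every feature and candidate threshold; return the
    (feature_index, threshold) pair with the highest Gini gain,
    or (None, None) if no split reduces impurity."""
    n_samples, n_features = X.shape
    parent_gini = gini_impurity(y)
    best_gain, best_feature, best_threshold = 0.0, None, None

    for feature in range(n_features):
        # Only midpoints between consecutive distinct values can
        # change how the samples are partitioned.
        values = np.unique(X[:, feature])
        thresholds = (values[:-1] + values[1:]) / 2.0
        for threshold in thresholds:
            mask = X[:, feature] <= threshold
            left, right = y[mask], y[~mask]
            # Weighted average impurity of the two children.
            child_gini = (len(left) * gini_impurity(left)
                          + len(right) * gini_impurity(right)) / n_samples
            gain = parent_gini - child_gini
            if gain > best_gain:
                best_gain, best_feature, best_threshold = gain, feature, threshold

    return best_feature, best_threshold

# Tiny check: one feature whose midpoint 6.5 separates the classes perfectly.
X = np.array([[2.0], [3.0], [10.0], [11.0]])
y = np.array([0, 0, 1, 1])
print(best_split(X, y))  # expected: (0, 6.5)
```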