Add Gaussian Mixture Model (GMM) Algorithm for Clustering #13637
Automated review generated by algorithms-keeper. If there's any problem regarding this review, please open an issue about it.
algorithms-keeper commands and options
algorithms-keeper actions can be triggered by commenting on this PR:
@algorithms-keeper review to trigger the checks for only added pull request files
@algorithms-keeper review-all to trigger the checks for all the pull request files, including the modified files. As we cannot post review comments on lines not part of the diff, this command will post all the messages in one comment.
NOTE: Commands are in beta and so this feature is restricted only to a member or owner of the organization.
    Gaussian Mixture Model implemented using the Expectation-Maximization algorithm.
    """

    def __init__(self, n_components=2, max_iter=100, tol=1e-4, seed=None):
Please provide return type hint for the function: __init__. If the function does not return a value, please provide the type hint as: def function() -> None:
Please provide type hint for the parameter: n_components
Please provide type hint for the parameter: max_iter
Please provide type hint for the parameter: tol
Please provide type hint for the parameter: seed
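One way these requests could be addressed is sketched below. The class name `GaussianMixture` is hypothetical (it is not visible in this excerpt), and treating `seed` as an optional integer is an assumption based on its default of `None`.

```python
from typing import Optional


class GaussianMixture:
    """Hypothetical class name; only the __init__ signature appears in the diff."""

    def __init__(
        self,
        n_components: int = 2,
        max_iter: int = 100,
        tol: float = 1e-4,
        seed: Optional[int] = None,  # assumption: seed is an int or None
    ) -> None:
        # Store the hyperparameters unchanged; fitting happens elsewhere.
        self.n_components = n_components
        self.max_iter = max_iter
        self.tol = tol
        self.seed = seed
```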
        self.covariances_ = None
        self.log_likelihoods_ = []

    def _initialize_parameters(self, X):
As there is no test file in this pull request nor any test function or class in the file machine_learning/gaussian_mixture_model.py, please provide doctest for the function _initialize_parameters
Please provide return type hint for the function: _initialize_parameters. If the function does not return a value, please provide the type hint as: def function() -> None:
Please provide type hint for the parameter: X
Please provide descriptive name for the parameter: X
        )
        self.weights_ = np.ones(self.n_components) / self.n_components

    def _e_step(self, X):
As there is no test file in this pull request nor any test function or class in the file machine_learning/gaussian_mixture_model.py, please provide doctest for the function _e_step
Please provide return type hint for the function: _e_step. If the function does not return a value, please provide the type hint as: def function() -> None:
Please provide type hint for the parameter: X
Please provide descriptive name for the parameter: X
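A minimal sketch of an E-step that would satisfy these checks (type hints, a descriptive parameter name, and a doctest). It is written as a standalone function rather than the PR's method, and the helper `gaussian_density` and the parameter name `data` are illustrative assumptions, not taken from the pull request.

```python
import numpy as np
from numpy.typing import NDArray


def gaussian_density(
    data: NDArray[np.float64],
    mean: NDArray[np.float64],
    covariance: NDArray[np.float64],
) -> NDArray[np.float64]:
    """Multivariate normal density of each row of `data` (hypothetical helper)."""
    n_features = data.shape[1]
    diff = data - mean
    # Solve covariance @ x = diff.T instead of forming an explicit inverse.
    solved = np.linalg.solve(covariance, diff.T).T
    exponent = -0.5 * np.sum(diff * solved, axis=1)
    norm = np.sqrt((2.0 * np.pi) ** n_features * np.linalg.det(covariance))
    return np.exp(exponent) / norm


def e_step(
    data: NDArray[np.float64],
    weights: NDArray[np.float64],
    means: NDArray[np.float64],
    covariances: NDArray[np.float64],
) -> NDArray[np.float64]:
    """
    Return responsibilities of shape (n_samples, n_components).

    >>> data = np.array([[0.0], [10.0]])
    >>> means = np.array([[0.0], [10.0]])
    >>> covariances = np.array([[[1.0]], [[1.0]]])
    >>> resp = e_step(data, np.array([0.5, 0.5]), means, covariances)
    >>> resp.argmax(axis=1).tolist()
    [0, 1]
    >>> bool(np.allclose(resp.sum(axis=1), 1.0))
    True
    """
    densities = np.column_stack(
        [
            weights[k] * gaussian_density(data, means[k], covariances[k])
            for k in range(len(weights))
        ]
    )
    # Normalize each row so the responsibilities of a sample sum to one.
    return densities / densities.sum(axis=1, keepdims=True)
```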
        responsibilities /= responsibilities.sum(axis=1, keepdims=True)
        return responsibilities

    def _m_step(self, X, responsibilities):
As there is no test file in this pull request nor any test function or class in the file machine_learning/gaussian_mixture_model.py, please provide doctest for the function _m_step
Please provide return type hint for the function: _m_step. If the function does not return a value, please provide the type hint as: def function() -> None:
Please provide type hint for the parameter: X
Please provide descriptive name for the parameter: X
Please provide type hint for the parameter: responsibilities
    def _m_step(self, X, responsibilities):
        """Update weights, means, and covariances"""
        n_samples, n_features = X.shape
        Nk = responsibilities.sum(axis=0)
Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: Nk
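A possible rename is sketched below, with `Nk` replaced by `component_counts`. Only the `Nk` line and the `1e-6` regularization term appear in the diff; the remaining update formulas are the standard EM M-step and are an assumption about what the PR does. The function is standalone and the names `m_step` and `data` are illustrative.

```python
import numpy as np
from numpy.typing import NDArray

FloatArray = NDArray[np.float64]


def m_step(
    data: FloatArray, responsibilities: FloatArray
) -> tuple[FloatArray, FloatArray, FloatArray]:
    """Update weights, means, and covariances from the responsibilities."""
    n_samples, n_features = data.shape
    # Effective number of samples per component (named `Nk` in the diff).
    component_counts = responsibilities.sum(axis=0)
    weights = component_counts / n_samples
    means = (responsibilities.T @ data) / component_counts[:, np.newaxis]
    covariances = np.empty((len(component_counts), n_features, n_features))
    for k in range(len(component_counts)):
        diff = data - means[k]
        weighted = responsibilities[:, k, np.newaxis] * diff
        covariances[k] = weighted.T @ diff / component_counts[k]
        # Add small regularization term for numerical stability
        covariances[k] += np.eye(n_features) * 1e-6
    return weights, means, covariances
```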
            # Add small regularization term for numerical stability
            self.covariances_[k] += np.eye(n_features) * 1e-6

    def _compute_log_likelihood(self, X):
As there is no test file in this pull request nor any test function or class in the file machine_learning/gaussian_mixture_model.py, please provide doctest for the function _compute_log_likelihood
Please provide return type hint for the function: _compute_log_likelihood. If the function does not return a value, please provide the type hint as: def function() -> None:
Please provide type hint for the parameter: X
Please provide descriptive name for the parameter: X
        log_likelihood = np.sum(np.log(np.sum(total_pdf, axis=1) + 1e-12))
        return log_likelihood

    def fit(self, X):
As there is no test file in this pull request nor any test function or class in the file machine_learning/gaussian_mixture_model.py, please provide doctest for the function fit
Please provide return type hint for the function: fit. If the function does not return a value, please provide the type hint as: def function() -> None:
Please provide type hint for the parameter: X
Please provide descriptive name for the parameter: X
        print(f"{TAG}Training complete. Final log-likelihood: {log_likelihood:.4f}")

    def predict(self, X):
As there is no test file in this pull request nor any test function or class in the file machine_learning/gaussian_mixture_model.py, please provide doctest for the function predict
Please provide return type hint for the function: predict. If the function does not return a value, please provide the type hint as: def function() -> None:
Please provide type hint for the parameter: X
Please provide descriptive name for the parameter: X
        responsibilities = self._e_step(X)
        return np.argmax(responsibilities, axis=1)

    def plot_results(self, X):
As there is no test file in this pull request nor any test function or class in the file machine_learning/gaussian_mixture_model.py, please provide doctest for the function plot_results
Please provide return type hint for the function: plot_results. If the function does not return a value, please provide the type hint as: def function() -> None:
Please provide type hint for the parameter: X
Please provide descriptive name for the parameter: X
for more information, see https://pre-commit.ci
warnings.filterwarnings("ignore")

TAG = "GAUSSIAN-MIXTURE/ "
FloatArray: TypeAlias = NDArray[np.float64]
Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: FloatArray
Describe your change:
This pull request adds a new implementation of the Gaussian Mixture Model (GMM) algorithm under the machine_learning/ directory.
The GMM is an unsupervised learning algorithm that represents data as a mixture of multiple Gaussian distributions.
It uses the Expectation–Maximization (EM) algorithm to iteratively estimate each component’s mean, covariance, and mixing probability.
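The EM iteration described above can be sketched roughly as follows. This is not the code from the pull request: the function names `weighted_densities` and `fit_gmm`, the initialization scheme (means drawn from random samples, identity covariances, uniform weights), and the convergence test are assumptions. The default hyperparameters, the `1e-6` covariance regularization, and the `1e-12` term inside the logarithm mirror the fragments visible in the diff.

```python
import numpy as np


def weighted_densities(data, weights, means, covariances):
    """Return weights[k] * N(x | mean_k, cov_k), shape (n_samples, n_components)."""
    n_samples, n_features = data.shape
    densities = np.empty((n_samples, len(weights)))
    for k in range(len(weights)):
        diff = data - means[k]
        solved = np.linalg.solve(covariances[k], diff.T).T
        exponent = -0.5 * np.sum(diff * solved, axis=1)
        norm = np.sqrt((2.0 * np.pi) ** n_features * np.linalg.det(covariances[k]))
        densities[:, k] = weights[k] * np.exp(exponent) / norm
    return densities


def fit_gmm(data, n_components=2, max_iter=100, tol=1e-4, seed=None):
    """Fit a Gaussian mixture with EM; returns weights, means, covariances, history."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = data.shape
    # Assumed initialization: random samples as means, identity covariances.
    means = data[rng.choice(n_samples, n_components, replace=False)].copy()
    covariances = np.array([np.eye(n_features) for _ in range(n_components)])
    weights = np.ones(n_components) / n_components
    log_likelihoods = []

    for _ in range(max_iter):
        # E-step: responsibility of each component for each sample.
        densities = weighted_densities(data, weights, means, covariances)
        totals = densities.sum(axis=1, keepdims=True)
        log_likelihoods.append(float(np.sum(np.log(totals + 1e-12))))
        responsibilities = densities / totals

        # M-step: re-estimate mixing weights, means, and covariances.
        component_counts = responsibilities.sum(axis=0)
        weights = component_counts / n_samples
        means = (responsibilities.T @ data) / component_counts[:, None]
        for k in range(n_components):
            diff = data - means[k]
            weighted = responsibilities[:, k, None] * diff
            covariances[k] = weighted.T @ diff / component_counts[k]
            covariances[k] += np.eye(n_features) * 1e-6

        # Stop once the log-likelihood (computed before this update) has settled.
        if len(log_likelihoods) > 1:
            if abs(log_likelihoods[-1] - log_likelihoods[-2]) < tol:
                break

    return weights, means, covariances, log_likelihoods
```

On two well-separated clusters, a call such as `fit_gmm(data, n_components=2, seed=0)` would be expected to recover one mean near each cluster centre, though EM only guarantees convergence to a local optimum.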
Key Highlights:
References:
Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 1–22.
Checklist: