An interactive Machine Learning web application built with Streamlit that predicts whether a mushroom is edible or poisonous using multiple classification algorithms.
The app allows users to:
- Select different machine learning classifiers
- Adjust model hyperparameters
- Train models interactively
- Evaluate performance using multiple visual metrics
The application demonstrates how machine learning models can be deployed through a simple and intuitive web interface without requiring users to write any code.
This project implements a binary classification system using the Mushroom dataset. The application trains different machine learning models and provides an interface where users can experiment with various algorithms and hyperparameters.
The workflow includes:
- Loading and preprocessing the dataset
- Encoding categorical features
- Splitting data into training and testing sets
- Training different classification models
- Evaluating models using multiple performance metrics
- Visualizing model performance
All interactions are handled through a Streamlit UI sidebar, allowing users to control model behavior dynamically.
Users can choose from three different machine learning algorithms:
- Support Vector Machine (SVM)
- Logistic Regression
- Random Forest
The app allows users to tune important parameters for each algorithm.
SVM Parameters:
- C: Regularization parameter
- Kernel: Linear or RBF kernel
- Gamma: Kernel coefficient
Logistic Regression Parameters:
- C: Regularization strength
- max_iter: Maximum number of training iterations
Random Forest Parameters:
- n_estimators: Number of trees
- max_depth: Maximum depth of trees
- bootstrap: Bootstrap sampling toggle
Users can visualize different evaluation metrics after training a model:
- Confusion Matrix
- ROC Curve
- Precision-Recall Curve
The app also displays:
- Accuracy
- Precision
- Recall
Users can optionally view the raw encoded dataset directly inside the application.
- Select a classifier from the sidebar.
- Adjust the hyperparameters.
- Select the evaluation metrics to display.
- Click Classify to train the model.
- View performance metrics and plots.
This project uses the Mushroom Dataset from the UCI Machine Learning Repository.
Dataset Source: https://archive.ics.uci.edu/dataset/73/mushroom
The dataset contains descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms.
Each sample is labeled as:
- Edible
- Poisonous
The dataset contains categorical features describing mushroom characteristics such as cap shape, odor, gill size, habitat, and spore print color.
Before training the models, the dataset undergoes preprocessing steps:
The dataset is loaded using Pandas from the CSV file.
pd.read_csv('dataset/mushrooms.csv')
Since all features are categorical, they are converted into numeric values using LabelEncoder. Each column is encoded separately.
The dataset is split into:
- 70% training data
- 30% testing data
Using:
train_test_split(test_size=0.3, random_state=0)
Fixing random_state makes the split identical on every run, so results are reproducible.
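The loading, encoding, and splitting steps described above can be sketched roughly as follows. This is a minimal illustration rather than the app's actual code: a small invented DataFrame stands in for dataset/mushrooms.csv, and the column names (`class`, `odor`, `cap-shape`) and the helpers `encode_columns` and `split_features_target` are assumptions introduced for the example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def encode_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every column label-encoded separately."""
    encoded = df.copy()
    for col in encoded.columns:
        # A fresh encoder per column, so each column gets its own integer codes.
        encoded[col] = LabelEncoder().fit_transform(encoded[col])
    return encoded


def split_features_target(df: pd.DataFrame, target: str = "class"):
    """Split an encoded frame into 70% training and 30% testing data."""
    y = df[target]
    x = df.drop(columns=[target])
    return train_test_split(x, y, test_size=0.3, random_state=0)


# Toy stand-in for pd.read_csv('dataset/mushrooms.csv'); the values are invented.
toy = pd.DataFrame({
    "class": list("epepepepep"),
    "odor": list("anfnafnanf"),
    "cap-shape": list("xbxbxxbbxb"),
})
encoded = encode_columns(toy)
x_train, x_test, y_train, y_test = split_features_target(encoded)
```

LabelEncoder assigns codes in sorted order of the labels, so in this toy frame `e` becomes 0 and `p` becomes 1.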
The app supports three classifiers implemented with Scikit-learn.
Support Vector Machine (SVM): A powerful supervised learning algorithm used for classification tasks. The implementation allows configuration of:
- Regularization parameter (C)
- Kernel type (linear, rbf)
- Kernel coefficient (gamma)
Model used: sklearn.svm.SVC
Logistic Regression: A linear model commonly used for binary classification. Configurable parameters:
- Regularization strength (C)
- Maximum training iterations (max_iter)
Model used: sklearn.linear_model.LogisticRegression
Random Forest: An ensemble learning method that builds multiple decision trees and combines their predictions. Configurable parameters:
- Number of trees (n_estimators)
- Maximum tree depth (max_depth)
- Bootstrap sampling (bootstrap)
Model used: sklearn.ensemble.RandomForestClassifier
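One way the three classifiers and their configurable parameters could be wired together is shown below. The helper `build_classifier` and the name strings it accepts are hypothetical, and the fallback values are scikit-learn's own defaults, not necessarily the defaults the app uses.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC


def build_classifier(name: str, **params):
    """Return an unfitted scikit-learn estimator for the chosen name."""
    if name == "Support Vector Machine (SVM)":
        return SVC(
            C=params.get("C", 1.0),
            kernel=params.get("kernel", "rbf"),
            gamma=params.get("gamma", "scale"),
        )
    if name == "Logistic Regression":
        return LogisticRegression(
            C=params.get("C", 1.0),
            max_iter=params.get("max_iter", 100),
        )
    if name == "Random Forest":
        return RandomForestClassifier(
            n_estimators=params.get("n_estimators", 100),
            max_depth=params.get("max_depth", None),
            bootstrap=params.get("bootstrap", True),
        )
    raise ValueError(f"Unknown classifier: {name}")


# Example: a forest of 50 trees, each at most 5 levels deep.
model = build_classifier("Random Forest", n_estimators=50, max_depth=5)
```

The returned estimator is then trained with `model.fit(x_train, y_train)` and used with `model.predict(x_test)`, as with any scikit-learn classifier.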
The application evaluates model performance using several metrics.
Accuracy measures the proportion of correctly classified samples.
Precision measures how many predicted positives are actually positive.
Recall measures how many actual positives are correctly predicted.
The confusion matrix displays counts of: True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN).
Implemented using: ConfusionMatrixDisplay
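As a small worked example of the metrics above, the sketch below computes them on ten invented labels. It assumes that class 1 is the positive class (for example, poisonous after alphabetical label encoding); the arrays are made up for illustration and are not results from the app.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Invented ground truth and predictions: 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# For binary labels, ravel() yields the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total = 7 / 10
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 5
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4
```

Here three of the four actual positives are found (recall 0.75), but two negatives are also flagged as positive, which lowers precision to 0.6.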
The Receiver Operating Characteristic curve shows the trade-off between:
- True Positive Rate
- False Positive Rate
Implemented using: RocCurveDisplay
The Precision-Recall curve visualizes the balance between precision and recall.
Implemented using: PrecisionRecallDisplay
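The display classes draw curves from points that can also be computed directly. The sketch below calculates those points without plotting, using four invented scores; it illustrates what the curves represent and is not taken from the app's code.

```python
from sklearn.metrics import precision_recall_curve, roc_auc_score, roc_curve

# Invented labels and classifier scores (higher score = more likely positive).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# ROC: false positive rate vs. true positive rate at each score threshold.
fpr, tpr, roc_thresholds = roc_curve(y_true, scores)

# Area under the ROC curve: the fraction of (positive, negative) pairs
# in which the positive sample receives the higher score.
auc = roc_auc_score(y_true, scores)

# Precision-recall pairs at each score threshold.
precision, recall, pr_thresholds = precision_recall_curve(y_true, scores)
```

In this toy data, three of the four positive/negative pairs are ranked correctly, so the area under the ROC curve is 0.75.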
The app UI is divided into two main sections.
The main page displays:
- Application title
- Model results
- Evaluation plots
- Dataset preview
The sidebar controls the application:
- Classifier selection
- Hyperparameter tuning
- Metric selection
- Run model button
- Show dataset toggle
Binary-Classification-Web-App
├── app.py
├── dataset
│ └── mushrooms.csv
├── requirements.txt
└── README.md
Python, Streamlit, Scikit-learn, Pandas, NumPy, Matplotlib
1. Clone the repository.
git clone https://github.com/Avik-Das-567/Binary-Classification-Web-App.git
2. Navigate to the project directory.
cd Binary-Classification-Web-App
3. Install dependencies.
pip install -r requirements.txt
4. Run the Streamlit application.
streamlit run app.py
5. Open in browser.
Streamlit will automatically launch the app at:
http://localhost:8501
To improve performance, the app caches dataset loading and preprocessing using Streamlit’s caching decorator.
@st.cache_data(persist=True)
This prevents reloading and reprocessing the dataset every time the UI updates.
Models are trained only when the user clicks the "Classify" button, allowing users to experiment with parameters without retraining unnecessarily.
Users can select which evaluation metrics to display. The application dynamically generates the corresponding plots.