OptiPulse

Comprehensive Machine Learning Solutions for Customer Analytics, Network Optimization, Predictive Maintenance, and Business Intelligence.

OptiPulse offers machine learning solutions for customer analytics, network optimization, predictive maintenance, and business intelligence in the telecommunications sector. Users can clone the repository and run the application locally to work with tools for anomaly detection in telecommunication data, covering data preparation, processing, model training, and visualization. The documentation walks through importing the necessary libraries, setting up the working directory, loading datasets, and the full workflow for identifying and visualizing anomalies. The project is intended as a resource for telecom operators who want to improve network reliability and customer service through analytics.

Clone the repository to your local machine:

git clone https://github.com/RJ-Hossan/OptiPulse.git

Then change into the cloned folder and run the following command:

streamlit run app.py

Task 1: Telecom Anomaly Detection Documentation

This documentation explains each step of the anomaly detection process, covering data preparation, processing, model training, and visualization.

1. Importing Necessary Libraries

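import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor  # used in step 10
from sklearn.svm import OneClassSVM  # used in step 12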

os: Used to navigate directories.
pandas: For data manipulation and analysis.
numpy: For numerical operations.
matplotlib and seaborn: For data visualization.
scikit-learn: For preprocessing and implementing anomaly detection algorithms.

2. Setting the Working Directory

base_dir = r"C:\Users\ASUS\Downloads\Telecom Anomaly Detection"
os.chdir(base_dir)

Changes the current working directory to the location where the data and scripts are stored.
Ensures seamless loading and saving of files within the same directory.
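
The hard-coded Windows path above only exists on the original author's machine. A minimal sketch of a more portable setup follows, assuming the data folder is supplied through an environment variable. TELECOM_DATA_DIR is a hypothetical name chosen for this illustration and is not part of the project.

```python
import os
from pathlib import Path

# TELECOM_DATA_DIR is a hypothetical environment variable used for illustration;
# if it is unset, fall back to the directory the script is run from.
base_dir = Path(os.environ.get("TELECOM_DATA_DIR", Path.cwd()))

# Build explicit file paths from base_dir instead of calling os.chdir().
data_path = base_dir / "telecom_anomaly.csv"
print(f"Dataset expected at: {data_path}")
```

Passing data_path to pd.read_csv keeps the process working directory unchanged, which tends to be easier to reason about when the script is launched from different locations.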

3. Loading the Dataset

data_file = 'telecom_anomaly.csv'
data = pd.read_csv(data_file)

Reads the dataset telecom_anomaly.csv into a DataFrame for further processing.
Assumes the file contains columns relevant to the problem domain, such as network parameters.
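
Because the later steps assume specific column names, it can help to check for them right after loading. The sketch below uses a small in-memory CSV with made-up values as a stand-in for telecom_anomaly.csv; the column names are the ones listed in step 5.

```python
import io
import pandas as pd

# Columns the later steps rely on (taken from the feature list in step 5).
required = ["Latency", "Packet_Loss_Rate", "Signal_Strength",
            "Interference_Level", "Energy_Efficiency"]

# Stand-in for telecom_anomaly.csv; the values are invented for illustration.
sample_csv = io.StringIO(
    "Latency,Packet_Loss_Rate,Signal_Strength,Interference_Level,Energy_Efficiency\n"
    "12.5,0.01,-70,0.2,0.85\n"
    "48.0,0.09,-95,0.7,0.40\n"
)
data = pd.read_csv(sample_csv)

# Fail early with a clear message if an expected column is absent.
missing = [col for col in required if col not in data.columns]
if missing:
    raise KeyError(f"Dataset is missing expected columns: {missing}")
print(data.shape)
```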

4. Handling Missing Values

if data.isnull().sum().sum() > 0:
    print("Filling missing values with column means...")
    # numeric_only=True skips text columns, which have no mean
    data.fillna(data.mean(numeric_only=True), inplace=True)

Checks for missing values in the dataset.
If missing values are found, fills them with the mean of their respective numeric columns; gaps in non-numeric columns are left untouched.
Prints a message for transparency.
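
A minimal sketch of mean imputation on a toy frame, assuming the dataset may also contain text columns. In pandas 2.0 and later, calling mean() on a frame with text columns raises a TypeError unless numeric_only=True is passed. The column Cell_ID is hypothetical and only serves to show that non-numeric data is left alone.

```python
import numpy as np
import pandas as pd

# Toy frame with one numeric gap and one text column (illustrative values;
# Cell_ID is a hypothetical column, not taken from the project dataset).
df = pd.DataFrame({
    "Latency": [10.0, np.nan, 30.0],
    "Cell_ID": ["A1", "B2", "C3"],
})

# Compute means for numeric columns only, then fill gaps in those columns.
numeric_means = df.mean(numeric_only=True)
df = df.fillna(numeric_means)
print(df["Latency"].tolist())  # [10.0, 20.0, 30.0]
```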

5. Selecting Features for Anomaly Detection

features = ['Latency', 'Packet_Loss_Rate', 'Signal_Strength', 'Interference_Level', 'Energy_Efficiency']
X = data[features]

Specifies the features (columns) relevant for detecting anomalies.
Extracts these features into a new DataFrame X for preprocessing and model training.

6. Standardizing the Features

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

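Standardizes the features to have zero mean and unit variance.
Standardization matters most for distance-based and kernel-based algorithms such as Local Outlier Factor and One-Class SVM. Isolation Forest is tree-based and largely insensitive to feature scaling, but using the same scaled matrix for all three methods keeps the pipeline consistent.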
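
Standardization rescales each feature as z = (x - mean) / std, computed per column. The sketch below, on a small made-up array, checks that StandardScaler matches this formula when the population standard deviation is used.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two illustrative features on very different scales.
X = np.array([[10.0, 0.01], [20.0, 0.02], [30.0, 0.06]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Equivalent manual computation: (x - mean) / population standard deviation.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, X_manual))
```

When new data arrives later, scaler.transform (not fit_transform) applies the means and deviations learned here, so both sets share one scale.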

7. Anomaly Detection Using Isolation Forest

iso_forest = IsolationForest(contamination=0.1, random_state=42)
y_pred_iso = iso_forest.fit_predict(X_scaled)
data['Anomaly_Isolation_Forest'] = y_pred_iso

Isolation Forest:
- A tree-based algorithm that isolates anomalies by randomly splitting data.
- contamination=0.1: Assumes 10% of the data are anomalies.
- random_state=42: Ensures reproducibility.
Adds a new column Anomaly_Isolation_Forest to the DataFrame with anomaly labels (1 for normal, -1 for anomaly).
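
To make the label convention and the effect of contamination concrete, here is a small sketch on synthetic data (not the telecom dataset). The data layout and variable names are assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# 90 points near the origin plus 10 far-away points (synthetic data).
normal = rng.normal(loc=0.0, scale=1.0, size=(90, 2))
outliers = rng.normal(loc=8.0, scale=0.5, size=(10, 2))
X_demo = np.vstack([normal, outliers])

iso = IsolationForest(contamination=0.1, random_state=42)
labels = iso.fit_predict(X_demo)

# fit_predict returns 1 for inliers and -1 for anomalies.
n_anomalies = int((labels == -1).sum())
print(f"Flagged {n_anomalies} of {len(labels)} points")

# Lower decision_function values indicate more anomalous points.
scores = iso.decision_function(X_demo)
```

The contamination value sets the score threshold, so roughly that share of the training rows is flagged whether or not they are true anomalies. If the real anomaly rate is unknown, the continuous scores are often more informative than the hard labels.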

8. Saving the Results

output_file = 'telecom_anomaly_with_labels.csv'
data.to_csv(output_file, index=False)
print(f"Processed data with anomaly labels saved to {output_file}.")

Saves the updated DataFrame, including anomaly labels, to a new CSV file.
Prints a confirmation message with the file name.

9. Visualization for Isolation Forest

plt.figure(figsize=(10, 6))
sns.scatterplot(
    x=data['Latency'],
    y=data['Packet_Loss_Rate'],
    hue=data['Anomaly_Isolation_Forest'],
    palette={1: 'blue', -1: 'red'}
)
plt.title('Anomaly Detection using Isolation Forest')
plt.xlabel('Latency')
plt.ylabel('Packet Loss Rate')
plot_file = 'anomaly_detection_plot.png'
plt.savefig(plot_file)
print(f"Anomaly detection plot saved to {plot_file}.")
plt.show()

Creates a scatter plot of Latency vs. Packet Loss Rate, highlighting anomalies in red and normal points in blue.
Saves the plot as anomaly_detection_plot.png for further analysis.

10. Anomaly Detection Using Local Outlier Factor (LOF)

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
y_pred_lof = lof.fit_predict(X_scaled)
data['Anomaly_LOF'] = y_pred_lof

Local Outlier Factor:
- Identifies anomalies by measuring the local density deviation of data points.
- n_neighbors=20: Considers 20 nearest neighbors.
- contamination=0.1: Assumes 10% of the data are anomalies.
Adds a new column Anomaly_LOF with anomaly labels (1 for normal, -1 for anomaly).
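
The sketch below runs LOF on synthetic data to show the per-point score that sits behind the labels. The data and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# A dense cluster plus a few widely scattered points (synthetic data).
dense = rng.normal(0.0, 0.5, size=(95, 2))
scattered = rng.uniform(-6.0, 6.0, size=(5, 2))
X_demo = np.vstack([dense, scattered])

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
labels = lof.fit_predict(X_demo)

# negative_outlier_factor_ is close to -1 for inliers and lower for outliers.
lof_scores = lof.negative_outlier_factor_
most_anomalous = np.argsort(lof_scores)[:5]
print(most_anomalous)
```

In this default mode LOF only labels the data it was fitted on. Scoring unseen rows requires constructing the estimator with novelty=True and then calling predict.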

11. Visualization for LOF

plt.figure(figsize=(10, 6))
sns.scatterplot(x=data['Latency'], y=data['Signal_Strength'], hue=data['Anomaly_LOF'], palette={1: 'blue', -1: 'red'})
plt.title('Anomaly Detection using LOF')
plt.show()

Visualizes anomalies detected by LOF using Latency and Signal Strength as axes.
Highlights anomalies in red and normal points in blue.

12. Anomaly Detection Using One-Class SVM

svm = OneClassSVM(nu=0.1, kernel='rbf', gamma='scale')
y_pred_svm = svm.fit_predict(X_scaled)
data['Anomaly_SVM'] = y_pred_svm

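One-Class SVM:
- An unsupervised algorithm that learns a boundary around the normal data and flags points outside it as outliers.
- nu=0.1: Upper bound on the fraction of training errors and lower bound on the fraction of support vectors; here it plays a role similar to contamination (about 10% anomalies).
- kernel='rbf': Uses a radial basis function kernel.
- gamma='scale': Sets the kernel coefficient to 1 / (n_features * X.var()).
Adds a new column Anomaly_SVM with anomaly labels (1 for normal, -1 for anomaly).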
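
A short sketch on synthetic data showing how nu relates to the share of flagged points. The fraction should land near nu, though the solver does not guarantee an exact match. The data here is an assumption for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
# Synthetic, already roughly standardized data.
X_demo = rng.normal(0.0, 1.0, size=(200, 2))

svm = OneClassSVM(nu=0.1, kernel="rbf", gamma="scale")
labels = svm.fit_predict(X_demo)

# Share of training points that fall outside the learned boundary.
outlier_fraction = float((labels == -1).mean())
print(f"Fraction flagged as outliers: {outlier_fraction:.2f}")
```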

13. Visualization for One-Class SVM

plt.figure(figsize=(10, 6))
sns.scatterplot(x=data['Energy_Efficiency'], y=data['Throughput'], hue=data['Anomaly_SVM'], palette={1: 'blue', -1: 'red'})
plt.title('Anomaly Detection using One-Class SVM')
plt.show()

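Visualizes anomalies detected by One-Class SVM using Energy Efficiency and Throughput as axes.
Throughput is not in the features list from step 5, so it must exist as a column in the dataset; it serves only as a plot axis and is not a model input.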

14. Pair Plots for Comparison

sns.pairplot(data, hue="Anomaly_Isolation_Forest", vars=features, palette={1: 'blue', -1: 'red'})
plt.suptitle('Pair Plot for Isolation Forest Anomalies', y=1.02)
plt.show()

sns.pairplot(data, hue="Anomaly_LOF", vars=features, palette={1: 'blue', -1: 'red'})
plt.suptitle('Pair Plot for LOF Anomalies', y=1.02)
plt.show()

sns.pairplot(data, hue="Anomaly_SVM", vars=features, palette={1: 'blue', -1: 'red'})
plt.suptitle('Pair Plot for One-Class SVM Anomalies', y=1.02)
plt.show()

Creates pair plots to compare the anomalies detected by each method across all features.
Highlights anomalies in red and normal points in blue for visual analysis of relationships.
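
Besides visual comparison, the three label columns can be compared numerically. The sketch below counts how many detectors flag each row and derives a majority-vote flag. The label values are made up, and the voting rule is an illustrative choice rather than part of the original workflow.

```python
import pandas as pd

# Illustrative labels in the same 1 / -1 convention used above.
labels = pd.DataFrame({
    "Anomaly_Isolation_Forest": [1, -1, -1, 1, -1],
    "Anomaly_LOF": [1, -1, 1, 1, -1],
    "Anomaly_SVM": [1, -1, -1, -1, 1],
})

# Count how many detectors flagged each row.
votes = (labels == -1).sum(axis=1)

# Majority vote: flagged by at least two of the three detectors.
consensus = votes >= 2
print(votes.tolist())      # [0, 3, 2, 1, 2]
print(consensus.tolist())  # [False, True, True, False, True]
```

Rows flagged by all three methods are usually the strongest candidates for manual review, while rows flagged by only one may reflect that method's particular bias.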

Task 2: Customer Satisfaction Analysis Documentation

Overview

This task focuses on performing regression analysis to quantify factors influencing customer satisfaction. By understanding what drives satisfaction and dissatisfaction, Robi Axiata can implement targeted strategies to enhance customer experiences, reduce churn, and drive business growth.

Objectives

Identify key factors affecting customer satisfaction.
Uncover drivers of dissatisfaction to mitigate negative experiences.
Enable data-driven decision-making for service improvement.

Dataset

Source: Network parameters from Robi Axiata's systems.
Features: 35 features including service usage data, customer interactions, and feedback scores.
Size: 402,547 data points.

Methodology

Data Preprocessing:
- Handle missing values and outliers.
- Normalize or standardize numerical data so that features on different scales contribute comparably.
Feature Selection:
- Utilize correlation analysis to identify significant predictors of satisfaction.
Model Development:
- Linear Regression: Establishes a baseline for understanding linear relationships.
- Decision Tree: Captures non-linear patterns and interactions between variables.
- Random Forest: Enhances prediction accuracy through ensemble learning, reducing overfitting.
- Gradient Boosting: Builds trees sequentially, each one fitted to the errors of the ensemble so far.
Evaluation:
- Use MAE (Mean Absolute Error), MSE (Mean Squared Error), and the R² score to assess model performance.
- Visualize results to interpret model effectiveness in predicting customer satisfaction.
Implementation:
- Deploy the best-performing model to predict satisfaction levels for new customer data.
- Continuously update the model with new data to improve accuracy.
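
The model comparison described above can be sketched as follows. This uses synthetic data with a known linear relationship as a stand-in for the satisfaction dataset; the feature count, hyperparameters, and train/test split are assumptions for illustration, not the project's actual settings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the satisfaction data: 3 features, noisy linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(max_depth=5, random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

# Fit each model and record the three metrics named in the Evaluation step.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mean_squared_error(y_test, pred),
        "R2": r2_score(y_test, pred),
    }
    print(name, results[name])
```

On data like this, where the target is linear by construction, the linear baseline is expected to do well. On real satisfaction data the ranking may differ, which is the reason for comparing several model families.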

Visualization

Distribution of customer satisfaction indices.
Pair plots of selected features against satisfaction scores.
Performance metrics visualization on validation and test datasets.

Task 3: Customer Segmentation Documentation

Overview

Customer segmentation aims to categorize customers into distinct groups based on shared characteristics. This strategic approach helps in personalizing marketing efforts, optimizing resource allocation, and enhancing overall customer satisfaction.

Objectives

Divide customers into behaviorally and demographically similar groups.
Tailor marketing and service approaches to meet the specific needs of each segment.

Dataset

Source: Network parameters and customer demographic data from Robi Axiata.
Features: Usage patterns, payment history, service options, and customer demographics.
Preprocessing Steps:
- Handling missing values.
- Encoding categorical features.
- Scaling data to prepare for clustering.

Methodology

Exploratory Data Analysis:
- Understand the underlying structure and relationships within the data.
- Identify anomalies or distinctive patterns across customer behaviors.
Clustering:
- K-Means Clustering: Efficiently partitions customers into k distinct clusters based on feature similarity.
- Silhouette Analysis: Validates the appropriateness of the number of clusters.
- PCA (Principal Component Analysis): Reduces dimensionality while preserving as much of the variance in the data as possible.
Evaluation:
- Assess clusters' quality and distinctiveness using silhouette scores.
- Perform cluster profiling to understand the characteristics of each group.
Implementation:
- Apply clustering insights to customize marketing strategies and service offerings.
- Monitor and adjust the segmentation strategy based on customer feedback and changing market conditions.
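
The clustering steps above can be sketched as follows: scale the data, choose k by silhouette score, and project to two dimensions with PCA for plotting. The data is synthetic, and the range of k and other settings are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: three well-separated groups in four dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [6, 6, 0, 0], [0, 6, 6, 6]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.5, size=(100, 4)) for c in centers])
X_scaled = StandardScaler().fit_transform(X)

# Try several values of k and keep the one with the highest silhouette score.
scores = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    scores[k] = silhouette_score(X_scaled, km.fit_predict(X_scaled))
best_k = max(scores, key=scores.get)

# Project to two principal components, e.g. for a 2-D scatter plot of clusters.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
print(best_k, pca.explained_variance_ratio_.round(3))
```

Silhouette scores range from -1 to 1, with higher values indicating tighter, better-separated clusters. Real customer data is rarely this cleanly separated, so the scores are best read together with cluster profiling.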

Visualization

Distribution charts for each cluster.
Correlation heatmaps to understand feature interdependencies.
Silhouette analysis charts to evaluate clustering performance.

These documentation sections for Tasks 1, 2, and 3 provide a framework for executing and assessing anomaly detection, customer satisfaction analysis, and customer segmentation, respectively, using data-driven approaches at Robi Axiata.
