Comprehensive Machine Learning Solutions for Customer Analytics, Network Optimization, Predictive Maintenance, and Business Intelligence.
This repository offers machine learning solutions for customer analytics, network optimization, predictive maintenance, and business intelligence in the telecommunications sector. Users can clone the repository and run the application locally to work with tools for anomaly detection in telecommunication data, covering data preparation, processing, model training, and visualization. The documentation walks through importing the necessary libraries, setting up the working directory, loading datasets, and the full workflow for identifying and visualizing anomalies. The project is intended as a resource for telecom operators who want to improve network reliability and customer service through analytics.
Clone the repo to your local machine:
git clone https://github.com/RJ-Hossan/OptiPulse.git
Then change into the cloned folder and run the following command there:
streamlit run app.py
Telecom Anomaly Detection Documentation
This documentation explains each step of the anomaly detection process, covering data preparation, processing, model training, and visualization.
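import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor  # needed for the LOF step
from sklearn.svm import OneClassSVM  # needed for the One-Class SVM step
- os: Used to navigate directories.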
- pandas: For data manipulation and analysis.
- numpy: For numerical operations.
- matplotlib and seaborn: For data visualization.
- scikit-learn: For preprocessing and implementing anomaly detection algorithms.
base_dir = r"C:\Users\ASUS\Downloads\Telecom Anomaly Detection"
os.chdir(base_dir)
- Changes the current working directory to the location where the data and scripts are stored.
- Ensures seamless loading and saving of files within the same directory.
data_file = 'telecom_anomaly.csv'
data = pd.read_csv(data_file)
- Reads the dataset telecom_anomaly.csv into a DataFrame for further processing.
- Assumes the file contains columns relevant to the problem domain, such as network parameters.
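Because the later steps select columns by name, a quick check right after loading can surface a mismatched file early. The following is a minimal sketch, not part of the original workflow: the helper name missing_columns is hypothetical, and a small in-memory frame with made-up values stands in for the CSV.

```python
import pandas as pd

# Columns that later steps of this walkthrough select by name.
EXPECTED_COLUMNS = [
    "Latency", "Packet_Loss_Rate", "Signal_Strength",
    "Interference_Level", "Energy_Efficiency",
]

def missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names absent from df (hypothetical helper)."""
    return [col for col in expected if col not in df.columns]

# Stand-in for pd.read_csv('telecom_anomaly.csv'); the values are made up.
sample = pd.DataFrame({"Latency": [12.0, 15.5], "Packet_Loss_Rate": [0.1, 0.3]})
absent = missing_columns(sample)
if absent:
    print(f"Missing expected columns: {absent}")
```

On the real file, the same check would be called on the DataFrame returned by pd.read_csv before any feature selection.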
if data.isnull().sum().sum() > 0:
    print("Filling missing values with column means...")
    # numeric_only=True avoids errors when the file also has text columns
    data.fillna(data.mean(numeric_only=True), inplace=True)
- Checks for missing values in the dataset.
- If missing values are found, fills them with the mean of their respective columns to maintain consistency.
- Prints a message for transparency.
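Mean imputation is only defined for numeric columns, so it can help to see what it does on a frame that also holds text. This is a small sketch on made-up data; Cell_ID is a hypothetical non-numeric column that is not part of the documented dataset.

```python
import numpy as np
import pandas as pd

# Made-up frame with one gap per numeric column plus a text column.
df = pd.DataFrame({
    "Latency": [10.0, np.nan, 30.0],
    "Signal_Strength": [-70.0, -80.0, np.nan],
    "Cell_ID": ["A", "B", "C"],  # hypothetical non-numeric column
})

# Restrict the imputation to numeric columns and fill each gap
# with the mean of its own column.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

remaining_gaps = int(df.isnull().sum().sum())
print(df)
print(f"Remaining missing values: {remaining_gaps}")
```

The text column is left untouched, and each numeric gap takes the mean of the observed values in that column.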
features = ['Latency', 'Packet_Loss_Rate', 'Signal_Strength', 'Interference_Level', 'Energy_Efficiency']
X = data[features]
- Specifies the features (columns) relevant for detecting anomalies.
- Extracts these features into a new DataFrame X for preprocessing and model training.
scaler = StandardScaler()
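X_scaled = scaler.fit_transform(X)
- Standardizes the features to have zero mean and unit variance.
- Standardization matters most for distance-based algorithms such as Local Outlier Factor and One-Class SVM. Isolation Forest is tree-based and largely insensitive to feature scale, but using the same scaled matrix for all three detectors keeps the pipeline simple.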
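Standardization replaces each value x with z = (x - mean) / std, computed per column. The sketch below checks that property on synthetic data; the two columns and their scales are invented for illustration and do not come from the telecom dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two synthetic features on very different scales (illustrative only).
X_demo = np.column_stack([
    rng.normal(50, 10, size=200),       # a latency-like magnitude
    rng.normal(0.02, 0.005, size=200),  # a loss-rate-like magnitude
])

scaler = StandardScaler()
X_demo_scaled = scaler.fit_transform(X_demo)

# After scaling, every column should have mean ~0 and standard deviation ~1.
col_means = X_demo_scaled.mean(axis=0)
col_stds = X_demo_scaled.std(axis=0)
print("means:", col_means.round(6))
print("stds: ", col_stds.round(6))
```

Without this step, a distance computed over both columns would be dominated by the larger-scale feature.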
iso_forest = IsolationForest(contamination=0.1, random_state=42)
y_pred_iso = iso_forest.fit_predict(X_scaled)
data['Anomaly_Isolation_Forest'] = y_pred_iso
- Isolation Forest:
  - A tree-based algorithm that isolates anomalies by randomly splitting data.
  - contamination=0.1: Assumes 10% of the data are anomalies.
  - random_state=42: Ensures reproducibility.
- Adds a new column Anomaly_Isolation_Forest to the DataFrame with anomaly labels (1 for normal, -1 for anomaly).
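Besides the 1 / -1 labels, the fitted model exposes a continuous score that can be used to rank points. The following is a sketch on synthetic data, assuming a dense cluster plus a few far-away points; none of these values come from the telecom dataset.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic data: 190 points around the origin and 10 planted far away.
normal_points = rng.normal(0, 1, size=(190, 2))
planted_outliers = rng.uniform(6, 10, size=(10, 2))
X_demo = np.vstack([normal_points, planted_outliers])

model = IsolationForest(contamination=0.1, random_state=42)
labels = model.fit_predict(X_demo)        # 1 = normal, -1 = anomaly
scores = model.decision_function(X_demo)  # lower means more anomalous

n_flagged = int((labels == -1).sum())
print(f"Flagged {n_flagged} of {len(X_demo)} points")
print(f"Mean score, planted outliers: {scores[-10:].mean():.3f}")
print(f"Mean score, normal points:    {scores[:190].mean():.3f}")
```

Note that contamination fixes the share of points that get flagged, so roughly 10% are labeled anomalous whether or not that many true anomalies exist.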
output_file = 'telecom_anomaly_with_labels.csv'
data.to_csv(output_file, index=False)
print(f"Processed data with anomaly labels saved to {output_file}.")
- Saves the updated DataFrame, including anomaly labels, to a new CSV file.
- Prints a confirmation message with the file name.
plt.figure(figsize=(10, 6))
sns.scatterplot(
    x=data['Latency'],
    y=data['Packet_Loss_Rate'],
    hue=data['Anomaly_Isolation_Forest'],
    palette={1: 'blue', -1: 'red'}
)
plt.title('Anomaly Detection using Isolation Forest')
plt.xlabel('Latency')
plt.ylabel('Packet Loss Rate')
plot_file = 'anomaly_detection_plot.png'
plt.savefig(plot_file)
print(f"Anomaly detection plot saved to {plot_file}.")
plt.show()
- Creates a scatter plot of Latency vs. Packet Loss Rate, highlighting anomalies in red and normal points in blue.
- Saves the plot as anomaly_detection_plot.png for further analysis.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
y_pred_lof = lof.fit_predict(X_scaled)
data['Anomaly_LOF'] = y_pred_lof
- Local Outlier Factor:
  - Identifies anomalies by measuring the local density deviation of data points.
  - n_neighbors=20: Considers 20 nearest neighbors.
  - contamination=0.1: Assumes 10% of the data are anomalies.
- Adds a new column Anomaly_LOF with anomaly labels.
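To make "local density deviation" concrete, the sketch below fits LOF on a tight synthetic cluster with a single distant point and reads the per-point factor from the fitted estimator. The data is invented for illustration.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(1)

# 100 points in a tight cluster plus one isolated point at (5, 5).
X_demo = np.vstack([rng.normal(0, 0.5, size=(100, 2)), [[5.0, 5.0]]])

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
labels = lof.fit_predict(X_demo)

# scikit-learn stores the negated factor; flip the sign so that values
# near 1 mean "as dense as the neighbors" and larger values mean sparser.
lof_scores = -lof.negative_outlier_factor_
most_anomalous = int(np.argmax(lof_scores))
print(f"Most anomalous row: {most_anomalous}, LOF = {lof_scores[most_anomalous]:.2f}")
```

The isolated point sits in a much sparser neighborhood than its 20 nearest neighbors, which is what drives its factor well above 1.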
plt.figure(figsize=(10, 6))
sns.scatterplot(x=data['Latency'], y=data['Signal_Strength'], hue=data['Anomaly_LOF'], palette={1: 'blue', -1: 'red'})
plt.title('Anomaly Detection using LOF')
plt.show()
- Visualizes anomalies detected by LOF using Latency and Signal Strength as axes.
- Highlights anomalies in red and normal points in blue.
svm = OneClassSVM(nu=0.1, kernel='rbf', gamma='scale')
y_pred_svm = svm.fit_predict(X_scaled)
data['Anomaly_SVM'] = y_pred_svm
- One-Class SVM:
  - A one-class classifier that learns a boundary around the normal data and labels points outside it as outliers.
  - nu=0.1: Upper bound on the fraction of training errors and lower bound on the fraction of support vectors (here, about 10% anomalies are expected).
  - kernel='rbf': Uses a radial basis function kernel.
- Adds a new column Anomaly_SVM with anomaly labels.
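The effect of nu is easiest to see by varying it on the same data. This is a sketch on synthetic Gaussian data, with a hypothetical helper named flagged_fraction; the exact fractions depend on the data and are not asserted here.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(7)
X_demo = rng.normal(0, 1, size=(300, 2))  # synthetic, no planted outliers

def flagged_fraction(nu):
    """Fit a One-Class SVM with the given nu and return the share labeled -1."""
    model = OneClassSVM(nu=nu, kernel="rbf", gamma="scale")
    labels = model.fit_predict(X_demo)
    return float((labels == -1).mean())

fractions = {nu: flagged_fraction(nu) for nu in (0.05, 0.1, 0.3)}
for nu, frac in fractions.items():
    print(f"nu={nu}: {frac:.1%} of points flagged")
```

A larger nu allows more training points outside the boundary, so more points are labeled as anomalies even when the data itself is unchanged.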
plt.figure(figsize=(10, 6))
sns.scatterplot(x=data['Energy_Efficiency'], y=data['Throughput'], hue=data['Anomaly_SVM'], palette={1: 'blue', -1: 'red'})
plt.title('Anomaly Detection using One-Class SVM')
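plt.show()
- Visualizes anomalies detected by One-Class SVM using Energy Efficiency and Throughput as axes. Throughput is not in the features list above, so it must be present as a separate column in the dataset.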
sns.pairplot(data, hue="Anomaly_Isolation_Forest", vars=features, palette={1: 'blue', -1: 'red'})
plt.suptitle('Pair Plot for Isolation Forest Anomalies', y=1.02)
plt.show()
sns.pairplot(data, hue="Anomaly_LOF", vars=features, palette={1: 'blue', -1: 'red'})
plt.suptitle('Pair Plot for LOF Anomalies', y=1.02)
plt.show()
sns.pairplot(data, hue="Anomaly_SVM", vars=features, palette={1: 'blue', -1: 'red'})
plt.suptitle('Pair Plot for One-Class SVM Anomalies', y=1.02)
plt.show()
- Creates pair plots to compare the anomalies detected by each method across all features.
- Highlights anomalies in red and normal points in blue for visual analysis of relationships.
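Alongside the visual comparison, the three label columns can be compared numerically by counting how many detectors flag each row. The sketch below uses synthetic data and a simple two-of-three vote; the vote threshold is an illustrative assumption, not something the original workflow prescribes.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic stand-in for X_scaled: a dense cluster plus 20 distant points.
X_demo = np.vstack([
    rng.normal(0, 1, size=(180, 3)),
    rng.uniform(5, 9, size=(20, 3)),
])

detectors = {
    "Anomaly_Isolation_Forest": IsolationForest(contamination=0.1, random_state=42),
    "Anomaly_LOF": LocalOutlierFactor(n_neighbors=20, contamination=0.1),
    "Anomaly_SVM": OneClassSVM(nu=0.1, kernel="rbf", gamma="scale"),
}
labels = pd.DataFrame(
    {name: det.fit_predict(X_demo) for name, det in detectors.items()}
)

# Count how many of the three detectors flag each row (-1 means anomaly).
labels["votes"] = (labels == -1).sum(axis=1)
consensus = labels[labels["votes"] >= 2]  # assumed threshold: 2 of 3 agree
print(labels["votes"].value_counts().sort_index())
print(f"Rows flagged by at least two detectors: {len(consensus)}")
```

Rows flagged by only one detector are worth inspecting separately, since they often reflect the particular assumptions of that method.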
This task focuses on performing regression analysis to quantify factors influencing customer satisfaction. By understanding what drives satisfaction and dissatisfaction, Robi Axiata can implement targeted strategies to enhance customer experiences, reduce churn, and drive business growth.
- Identify key factors affecting customer satisfaction.
- Uncover drivers of dissatisfaction to mitigate negative experiences.
- Enable data-driven decision-making for service improvement.
- Source: Network parameters from Robi Axiata's systems.
- Features: 35 features including service usage data, customer interactions, and feedback scores.
- Size: 402,547 data points.
- Data Preprocessing:
  - Handle missing values and outliers.
  - Normalize or standardize numerical data so that features on different scales are comparable.
- Feature Selection:
  - Use correlation analysis to identify significant predictors of satisfaction.
- Model Development:
  - Linear Regression: Establishes a baseline for understanding linear relationships.
  - Decision Tree: Captures non-linear patterns and interactions between variables.
  - Random Forest: Improves prediction accuracy through ensemble learning, reducing overfitting.
  - Gradient Boosting: Builds trees sequentially, each one correcting the errors of the previous ones.
- Evaluation:
  - Use MAE (Mean Absolute Error), MSE (Mean Squared Error), and R² score to assess model performance.
  - Visualize results to interpret model effectiveness in predicting customer satisfaction.
- Implementation:
  - Deploy the best-performing model to predict satisfaction levels for new customer data.
  - Continuously update the model with new data to improve accuracy.
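The model development and evaluation steps above can be sketched as follows. The Robi Axiata data is not available here, so a synthetic 35-feature regression problem from scikit-learn stands in for it; the sample size, split ratio, and default hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the 35-feature satisfaction dataset.
X, y = make_regression(n_samples=500, n_features=35, n_informative=8,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    pred = model.predict(X_test_s)
    results[name] = {
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mean_squared_error(y_test, pred),
        "R2": r2_score(y_test, pred),
    }

for name, metrics in results.items():
    print(f"{name:18s} MAE={metrics['MAE']:.2f} "
          f"MSE={metrics['MSE']:.2f} R2={metrics['R2']:.3f}")
```

Because the synthetic target is linear by construction, the linear baseline does well here; on real satisfaction data the ranking of the four models may differ.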
- Distribution of customer satisfaction indices.
- Pair plots of selected features against satisfaction scores.
- Performance metrics visualization on validation and test datasets.
Customer segmentation aims to categorize customers into distinct groups based on shared characteristics. This strategic approach helps in personalizing marketing efforts, optimizing resource allocation, and enhancing overall customer satisfaction.
- Divide customers into behaviorally and demographically similar groups.
- Tailor marketing and service approaches to meet the specific needs of each segment.
- Source: Network parameters and customer demographic data from Robi Axiata.
- Features: Usage patterns, payment history, service options, and customer demographics.
- Preprocessing Steps:
  - Handling missing values.
  - Encoding categorical features.
  - Scaling data to prepare for clustering.
- Exploratory Data Analysis:
  - Understand the underlying structure and relationships within the data.
  - Identify anomalies or distinctive patterns across customer behaviors.
- Clustering:
  - K-Means Clustering: Partitions customers into k distinct clusters based on feature similarity.
  - Silhouette Analysis: Validates the choice of the number of clusters.
  - PCA (Principal Component Analysis): Reduces dimensionality while preserving most of the variance in the data.
- Evaluation:
  - Assess the quality and distinctiveness of clusters using silhouette scores.
  - Perform cluster profiling to understand the characteristics of each group.
- Implementation:
  - Apply clustering insights to customize marketing strategies and service offerings.
  - Monitor and adjust the segmentation strategy based on customer feedback and changing market conditions.
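The clustering steps above can be sketched as follows, using synthetic blobs in place of the customer data. The range of k values tried, the number of features, and the use of two principal components for plotting are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled customer feature matrix.
X, _ = make_blobs(n_samples=400, centers=4, n_features=6,
                  cluster_std=1.0, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Silhouette analysis: score each candidate k and keep the best one.
scores = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    scores[k] = silhouette_score(X_scaled, km.fit_predict(X_scaled))
best_k = max(scores, key=scores.get)

# Final model with the selected k, plus a 2-D PCA projection for plotting.
final = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X_scaled)
coords = PCA(n_components=2).fit_transform(X_scaled)

for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print(f"Selected k = {best_k}")
```

The arrays coords and final.labels_ can then be passed to a scatter plot to inspect how well the segments separate, and cluster profiling can group the original features by final.labels_.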
- Distribution charts for each cluster.
- Correlation heatmaps to understand feature interdependencies.
- Silhouette analysis charts to evaluate clustering performance.
Together, the documentation sections for Tasks 1, 2, and 3 provide a framework for carrying out and assessing anomaly detection, customer satisfaction analysis, and customer segmentation using data-driven approaches at Robi Axiata.