This project performs several forms of outlier detection on streaming metrics, using Apache Kafka and Apache Spark as the primary tools for metric storage and analysis.
pip3 install kafka-python
pip3 install pyspark
pip3 install matplotlib
pip3 install pandas
pip3 install scikit-learn
pip3 install numpy
pip3 install psutil
git clone https://github.com/SiddharthRajaraman/streamingMetricsAnomolyDetection.git
kubectl apply -f kafkaConfig/zookeeper.yaml
kubectl apply -f kafkaConfig/kafkaBroker.yaml
kubectl port-forward <NAME OF KAFKA-BROKER POD> 9092
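Once the port-forward is running, it can help to confirm that something is listening on the forwarded port before starting the producer. The following is a minimal sketch using only the Python standard library. The host `localhost` and the helper name `broker_reachable` are assumptions for illustration and are not part of this repository. A successful TCP connection only shows that the port is open, not that the broker is healthy.

```python
import socket


def broker_reachable(host="localhost", port=9092, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper, not part of this repository.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False
```

Calling `broker_reachable()` with the defaults checks `localhost:9092`, the port forwarded by the command above.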
By default, the Kafka producer sends local CPU metrics every 0.5 seconds.
python3 producerFiles/producer.py
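For orientation, a producer of this kind can be sketched roughly as follows. This is not the contents of producerFiles/producer.py. The topic name `cpu-metrics`, the JSON message layout, and the function names are all assumptions made for illustration. Only the 0.5 second interval and the use of kafka-python and psutil come from this README.

```python
import json
import time


def encode_sample(cpu_percent, timestamp):
    """Serialize one CPU reading as UTF-8 JSON bytes (assumed message format)."""
    payload = {"timestamp": timestamp, "cpu_percent": cpu_percent}
    return json.dumps(payload).encode("utf-8")


def run_producer(topic="cpu-metrics", bootstrap="localhost:9092", interval=0.5):
    """Send local CPU usage to Kafka every `interval` seconds until Ctrl+C."""
    # Imported here so encode_sample works without these packages installed.
    import psutil
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=bootstrap)
    try:
        while True:
            # interval=None is non-blocking and reports usage since the
            # previous call; the very first reading is 0.0 and can be ignored.
            reading = psutil.cpu_percent(interval=None)
            producer.send(topic, encode_sample(reading, time.time()))
            time.sleep(interval)
    except KeyboardInterrupt:
        pass
    finally:
        # Deliver any buffered messages before shutting down.
        producer.flush()
        producer.close()
```

Keeping serialization in its own function makes the message format easy to check without a running broker.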
python3 consumerFiles/consumerDBSCAN.py
python3 consumerFiles/consumerKmeans.py
python3 consumerFiles/consumerQuartile.py
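As an illustration of the quartile approach, the sketch below flags values outside the conventional 1.5 x IQR fences. It is an assumption that consumerQuartile.py uses this rule. The function names and the multiplier `k` are hypothetical, and the example runs on a plain Python list rather than on a Kafka or Spark stream.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # quantiles with n=4 returns the three quartile cut points (Q1, median, Q3).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if v < lower or v > upper]
```

For the CPU readings `[10, 12, 11, 13, 12, 11, 95]`, the fences are 8.75 and 14.75, so `find_outliers` returns `[95]`.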