DeFi Sentinel is an advanced cryptocurrency fraud detection system that employs distributed computing and sophisticated analytics to identify suspicious activity on decentralized exchanges (DEXs). The project implements a multi-layered approach, combining rule-based detection, machine learning, and graph-based network analysis, to provide robust protection against evolving fraud techniques in the cryptocurrency space.
- Developed three complementary pipelines for comprehensive cryptocurrency fraud detection
- Successfully implemented a modular architecture enabling detection at multiple levels of sophistication
- Delivered an end-to-end system deployed on Google Cloud Platform to ensure scalability and real-time processing
- Created an integrated data warehouse for unified analysis and visualization of detected anomalies
The system consists of three specialized, complementary detection pipelines:
The first pipeline applies established rules to identify known fraud patterns in historical blockchain data (a sketch of one such rule follows this list):
- Flash Trade Detection: Identifies rapid buying and selling of tokens to manipulate prices
- Same Token Swap Analysis: Detects circular trades that create artificial volume
- Rapid Transaction Sequencing: Spots unusual patterns of high-frequency trades that may indicate market manipulation
- Implemented using: Apache Spark on Google Dataproc for distributed batch processing of blockchain data
- Trigger mechanism: Cloud Functions automatically initiate processing on schedule or when new data arrives
- Significance: Provides baseline detection of established fraud patterns with minimal false positives
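As an illustration, here is a minimal PySpark sketch of how one such rule could be expressed. The input path, column names (wallet, block_timestamp), and the 20-swaps-per-minute threshold are illustrative assumptions, not the repository's actual schema or tuning:

```python
# Minimal sketch of a rule-based check: flag wallets that execute an
# unusually dense burst of swaps inside a short sliding window.
from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.appName("rapid-sequencing-sketch").getOrCreate()

# Assumed input: one row per swap, with a wallet address and event timestamp.
swaps = spark.read.parquet("gs://your-bucket/swaps/")  # hypothetical path

# Count each wallet's swaps in the 60 seconds leading up to each trade.
w = (Window.partitionBy("wallet")
           .orderBy(F.col("block_timestamp").cast("long"))
           .rangeBetween(-60, 0))

flagged = (swaps
           .withColumn("swaps_last_60s", F.count("*").over(w))
           .filter(F.col("swaps_last_60s") >= 20))  # placeholder threshold

flagged.write.mode("overwrite").parquet("gs://your-bucket/alerts/rapid_sequencing/")
```

In a Dataproc deployment, a job like this would be submitted by the Cloud Function trigger on schedule or on data arrival.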
The second pipeline applies machine learning for real-time anomaly detection (a sketch follows this list):
- Isolation Forest Algorithm: Uses statistical anomaly detection to identify outlier transactions
- Real-time Processing: Analyzes transaction data as it occurs on the blockchain
- Temporal Pattern Recognition: Identifies unusual timing patterns that deviate from normal trading behavior
- Implemented using: Apache Beam on Google Dataflow with containerized processing
- Significance: Enables detection of previously unknown fraud patterns and adapts to evolving threats
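For illustration, here is a minimal Apache Beam sketch of this approach using scikit-learn's IsolationForest. The topic name, feature fields (amount_usd, gas_price), and the in-line model fitting are assumptions made to keep the sketch self-contained; they are not the pipeline in rules.py:

```python
# Sketch of streaming anomaly scoring with Beam + scikit-learn.
import json
import numpy as np
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from sklearn.ensemble import IsolationForest

class ScoreTransaction(beam.DoFn):
    def setup(self):
        # A real pipeline would load a model trained offline; fitting on
        # random data here only keeps the sketch runnable end to end.
        rng = np.random.default_rng(0)
        self.model = IsolationForest(random_state=0).fit(rng.normal(size=(500, 2)))

    def process(self, element):
        tx = json.loads(element)
        features = [[tx["amount_usd"], tx["gas_price"]]]  # assumed fields
        if self.model.predict(features)[0] == -1:  # -1 marks an outlier
            yield tx

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    (p
     | beam.io.ReadFromPubSub(topic="projects/YOUR_PROJECT/topics/transactions")
     | beam.Map(lambda msg: msg.decode("utf-8"))
     | beam.ParDo(ScoreTransaction())
     | beam.Map(print))  # downstream: write flagged transactions to BigQuery
```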
The third pipeline focuses on relationship analysis across wallets and transactions (see the sketch after this list):
- PageRank Analysis: Identifies influential addresses in the transaction network
- Community Detection: Discovers related trading groups that may be collaborating
- KMeans Clustering: Groups wallets based on behavioral features (e.g., in/out-degree, volume ratios) and flags addresses far from their cluster centroids as anomalous, surfacing unusual behavior patterns across the network
- Implemented using: GraphFrames on Apache Spark for graph analytics
- Significance: Reveals sophisticated fraud schemes that involve multiple coordinated accounts and complex transaction patterns
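A minimal GraphFrames sketch of the graph side of this pipeline (PageRank plus label-propagation communities). The toy vertices, edges, and column names are invented for illustration, not the schema used by network_analysis.py:

```python
# Sketch of transaction-graph analysis with GraphFrames.
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.appName("network-analysis-sketch").getOrCreate()

# Assumed shape: wallets as vertices, transfers as directed edges.
vertices = spark.createDataFrame([("0xabc",), ("0xdef",), ("0x123",)], ["id"])
edges = spark.createDataFrame(
    [("0xabc", "0xdef", 1.5), ("0xdef", "0x123", 1.5), ("0x123", "0xabc", 1.5)],
    ["src", "dst", "amount_eth"])  # note: this toy data forms a circular flow

g = GraphFrame(vertices, edges)

# PageRank surfaces disproportionately influential addresses.
ranks = g.pageRank(resetProbability=0.15, maxIter=10).vertices

# Label propagation gives a cheap community assignment for spotting
# potentially coordinated trading groups.
communities = g.labelPropagation(maxIter=5)

ranks.orderBy("pagerank", ascending=False).show()
```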
This repository contains the following key components:
- spark-job.py: Main PySpark application that implements the rule-based fraud detection algorithms
- cloud_functions/: Code for the Cloud Function that triggers the Dataproc job
- rules.py: Apache Beam pipeline for real-time fraud detection using Isolation Forest
- Dockerfile: Builds the Docker container for the Beam pipeline
- gcp_docker.sh: Script that deploys the Docker container to GCR and creates the Dataflow Flex Template
- network_analysis.py: PySpark application using GraphFrames for network analysis
- terraform.tf: Infrastructure as Code for deploying the complete system
- schema.json: Schema for the BigQuery streaming data table
- schema_batch_processing.json: Schema for the BigQuery rule-based batch processing data table
- Data Ingestion: Transaction data from the Ethereum blockchain enters the system via (a minimal publisher sketch follows this flow):
  - Historical data dumps for batch processing
  - Real-time Pub/Sub streams for live analysis
- Processing: Each pipeline processes data independently:
  - Rule-based: Scheduled Dataproc jobs analyze historical transaction batches
  - ML-based: Continuous Dataflow jobs process streaming data
  - Graph-based: Scheduled analysis of transaction networks
- Integration: All detections are consolidated in BigQuery for:
  - Cross-pipeline correlation of findings
  - Aggregated reporting and visualization
  - Historical analysis of detection patterns
- Alert Generation: High-confidence fraud detections generate alerts for further investigation
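For the real-time ingestion path, here is a minimal sketch of publishing one transaction into the Pub/Sub topic. The project ID, topic name, and message fields are assumptions, not the system's actual configuration:

```python
# Sketch: publish a single transaction into the real-time ingestion topic.
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("YOUR_PROJECT", "transactions")  # assumed names

tx = {
    "tx_hash": "0xabc123...",   # illustrative message shape
    "wallet": "0xdef456...",
    "token": "WETH",
    "amount_usd": 12500.0,
}

# Pub/Sub payloads are bytes; the streaming pipeline decodes and parses them.
future = publisher.publish(topic_path, json.dumps(tx).encode("utf-8"))
print(f"Published message {future.result()}")
```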
DeFi Sentinel addresses a critical need in the cryptocurrency ecosystem:
- Comprehensive Coverage: Detects fraud at multiple levels, from simple patterns to complex network schemes
- Adaptability: Combines fixed rules with machine learning to handle both known and emerging fraud patterns
- Scalability: Built on cloud infrastructure to handle growing transaction volumes
- Integration: Provides a unified view across different detection methods
- Real-time Capabilities: Enables timely reaction to suspicious activities
- Reduced False Positives: The multi-pipeline approach improves overall accuracy by corroborating detections across methods (see the query sketch after this list)
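As a sketch of what cross-pipeline corroboration could look like once detections land in BigQuery, the query below joins hypothetical per-pipeline alert tables on wallet and time proximity. The dataset, table, and column names are assumptions, not the repository's schema:

```python
# Sketch: find wallets flagged by both the rule-based and ML pipelines
# within an hour of each other (all table/column names are hypothetical).
from google.cloud import bigquery

client = bigquery.Client(project="YOUR_PROJECT")

query = """
    SELECT r.wallet, COUNT(*) AS corroborations
    FROM `YOUR_PROJECT.defi_sentinel.rule_based_alerts` AS r
    JOIN `YOUR_PROJECT.defi_sentinel.ml_alerts` AS m
      ON r.wallet = m.wallet
     AND ABS(TIMESTAMP_DIFF(r.detected_at, m.detected_at, MINUTE)) <= 60
    GROUP BY r.wallet
    ORDER BY corroborations DESC
"""

for row in client.query(query).result():
    print(row.wallet, row.corroborations)
```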
To run this project in your own GCP environment:
- Clone this repository and change into its directory.
- Install requirements
pip install -r requirements.txt
- Add a terraform.tfvars file defining the project_id and region variables (for example, project_id = "my-project" and region = "us-central1")
- Zip the Cloud Function source for the batch processing pipeline trigger (the -j flag keeps main.py at the archive root, as Cloud Functions expects):
zip -j cloud_functions.zip Batch_processing/cloud_functions/main.py Batch_processing/cloud_functions/requirements.txt
- Deploy the container to GCR and create the Dataflow Flex Template by running the commands in the beam_custom_pipeline/gcp_docker.sh file.
- Deploy the infrastructure using Terraform:
terraform init
terraform plan -var-file="terraform.tfvars"
terraform apply -var-file="terraform.tfvars"
- Monitor pipeline execution through the GCP Console
- Run the local script for streaming data:
python live_stream_data.py
- Run the local script for batch data processing:
python Fetch.py
- Run the local script for visualization:
python visualization.py
- Google Cloud Platform account with billing enabled
- The following GCP services enabled:
- Cloud Storage
- Pub/Sub
- Dataflow
- Dataproc
- BigQuery
- Cloud Functions
- Cloud Scheduler
- Apache Spark 3.0+
- Python 3.8+
- Apache Beam 2.40+
This project is licensed under the MIT License - see the LICENSE file for details.