This project focuses on analyzing on-chain transaction behavior of wallet addresses interacting with the Compound V2 Protocol to assess their risk levels. The objective is to develop a risk scoring system from scratch that evaluates wallet activities and assigns a risk score between 0 and 1000.
- Extract transaction data for provided wallet addresses.
- Fetch transaction data from the Compound V2 Protocol.
- Engineer meaningful features indicative of wallet risk.
- Develop a scalable scoring methodology.
- Deliver a final CSV file mapping wallet IDs to their respective risk scores.
- Python (Pandas, NumPy)
- Jupyter Notebook / Python Script
- Seaborn, Matplotlib
- Scikit-learn (KMeans, MinMaxScaler)
├── README.md
├── analysis.md
├── compound_wallet_risk.ipynb / .py
├── wallet_id.csv
├── wallet_transactions_compound.csv
├── wallet_risk_scores.csv
└── figures/
    ├── risk_score_distribution OF WalletsS.png
    └── avg_tx_per_day_by_scor
1. Clone this repository to your local machine.
2. Open compound_wallet_risk.ipynb in Jupyter Notebook or run the .py file.
3. Execute the script to generate wallet risk scores.
4. The output CSV wallet_risk_scores.csv will be created in the root folder.
- Wallet List: The assignment provided a Google Sheet containing 103 wallet addresses.
- Protocol: Compound V2
- API Used: Covalent API (Chain ID: 1 - Ethereum Mainnet)
- Tool Used: Python (requests library) to loop through wallet addresses and fetch transaction data.
- Approach: For each wallet, we fetched historical on-chain transaction data using Covalent's /transactions_v2/ endpoint.
- Fields Collected: tx_hash, block_signed_at, gas_spent, gas_price, fees_paid, to_address, successful (and more).

Data was collected and saved locally to prevent repeated API calls when restarting the kernel.

Download: Fetched_transactional_data (Google Drive)
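
The collection step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the base URL, the `key` query parameter, the `data.items` response shape, and the helper names (`build_url`, `extract_rows`, `fetch_wallet`, `load_or_fetch`) are assumptions that should be checked against the Covalent documentation and your own API key setup.

```python
import csv
from pathlib import Path

import requests

# Assumption: Covalent's v1 REST base URL and Ethereum Mainnet chain ID.
BASE_URL = "https://api.covalenthq.com/v1"
CHAIN_ID = 1
FIELDS = ["tx_hash", "block_signed_at", "gas_spent", "gas_price",
          "fees_paid", "to_address", "successful"]


def build_url(wallet: str) -> str:
    """Build the transactions_v2 URL for one wallet address."""
    return f"{BASE_URL}/{CHAIN_ID}/address/{wallet}/transactions_v2/"


def extract_rows(wallet: str, payload: dict) -> list:
    """Keep only the fields of interest from one decoded JSON response."""
    # Assumption: transactions are listed under payload["data"]["items"].
    items = (payload.get("data") or {}).get("items") or []
    return [{"wallet_id": wallet, **{f: item.get(f) for f in FIELDS}}
            for item in items]


def fetch_wallet(wallet: str, api_key: str, get=requests.get) -> list:
    """Fetch one wallet's transactions; `get` is injectable for testing."""
    resp = get(build_url(wallet), params={"key": api_key}, timeout=30)
    resp.raise_for_status()
    return extract_rows(wallet, resp.json())


def load_or_fetch(wallets, api_key, cache_path, get=requests.get) -> list:
    """Reuse the local CSV if it exists; otherwise call the API and save it."""
    path = Path(cache_path)
    if path.exists():
        # Note: values read back from the CSV cache are strings.
        with path.open(newline="") as fh:
            return list(csv.DictReader(fh))
    rows = [row for w in wallets for row in fetch_wallet(w, api_key, get=get)]
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["wallet_id", *FIELDS])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Caching to a file means a kernel restart costs one CSV read instead of one API call per wallet.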
Per wallet, the following behavioral features were engineered:
| Feature | Description |
|---|---|
| avg_tx_per_day | Average number of transactions per active day. |
| avg_gas_spent_per_tx | Average gas spent per transaction. |
| fees_paid_per_tx | Average transaction fees paid per transaction. |
| success_ratio | Ratio of successful transactions to total transactions. |
| destination_diversity | Ratio of unique destination addresses per total transactions. |
| avg_fee_per_day | Average transaction fees paid per active day. |
| total_transactions | Total number of transactions made. |
| total_gas_spent | Total gas used across all transactions. |
| avg_gas_price | Average gas price used. |
| total_fees_paid | Total fees paid across transactions. |
| unique_destinations | Number of unique recipients. |
| total_successful_transaction | Count of successful transactions. |
| active_days | Unique days the wallet was active. |
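
The features in the table can be derived from the raw transaction rows with a single groupby. The sketch below is one possible implementation, not necessarily the one in the notebook. It assumes a `wallet_id` column identifying the wallet, a boolean `successful` column, and numeric (or numeric-string) gas and fee columns; the function name `engineer_features` is hypothetical.

```python
import pandas as pd


def engineer_features(tx: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-transaction rows into one feature row per wallet.

    Assumed columns: wallet_id, tx_hash, block_signed_at, gas_spent,
    gas_price, fees_paid, to_address, successful (boolean).
    """
    tx = tx.copy()
    # Calendar day (UTC) of each transaction, used to count active days.
    tx["day"] = pd.to_datetime(tx["block_signed_at"], utc=True).dt.date
    for col in ("gas_spent", "gas_price", "fees_paid"):
        tx[col] = tx[col].astype(float)

    feats = tx.groupby("wallet_id").agg(
        total_transactions=("tx_hash", "size"),
        total_gas_spent=("gas_spent", "sum"),
        avg_gas_price=("gas_price", "mean"),
        total_fees_paid=("fees_paid", "sum"),
        unique_destinations=("to_address", "nunique"),
        total_successful_transaction=("successful", "sum"),
        active_days=("day", "nunique"),
    )

    # Ratio features built from the totals above.
    n_tx = feats["total_transactions"]
    days = feats["active_days"]
    feats["avg_tx_per_day"] = n_tx / days
    feats["avg_gas_spent_per_tx"] = feats["total_gas_spent"] / n_tx
    feats["fees_paid_per_tx"] = feats["total_fees_paid"] / n_tx
    feats["success_ratio"] = feats["total_successful_transaction"] / n_tx
    feats["destination_diversity"] = feats["unique_destinations"] / n_tx
    feats["avg_fee_per_day"] = feats["total_fees_paid"] / days
    return feats.reset_index()
```

For instance, a wallet with three transactions spread over two days, two of them successful, gets `avg_tx_per_day = 1.5` and `success_ratio ≈ 0.667`.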
- Method: Min-Max Normalization.
- Applied Min-Max Normalization to scale all features between 0 and 1.
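
Min-Max normalization maps each feature value x to (x - min) / (max - min), so the smallest value in a column becomes 0 and the largest becomes 1. A minimal sketch using scikit-learn's `MinMaxScaler` is shown below; the helper name `scale_features` and the choice to leave non-feature columns (such as the wallet ID) untouched are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def scale_features(feats: pd.DataFrame, feature_cols: list) -> pd.DataFrame:
    """Return a copy of `feats` with `feature_cols` rescaled to [0, 1]."""
    values = MinMaxScaler().fit_transform(feats[feature_cols].astype(float))
    # assign() replaces each feature column and leaves other columns as-is.
    return feats.assign(
        **{col: values[:, i] for i, col in enumerate(feature_cols)}
    )
```

Scaling matters here because KMeans relies on Euclidean distance: without it, large-magnitude columns such as total_gas_spent would dominate ratio features such as success_ratio.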
- Algorithm Used: KMeans Clustering (n_clusters = 5).
- KMeans was applied to the engineered features to group wallets by behavior.
- A score metric was computed as:
score_metric = success_ratio + fees_paid_per_tx + avg_fee_per_day - destination_diversity
- Clusters were ranked based on this metric and assigned scores:
- Rank 1 → 1000 (most reliable cluster)
- Rank 2 → 750
- Rank 3 → 500
- Rank 4 → 250
- Rank 5 → 100 (most risky cluster)
- Each wallet was assigned a Risk Score based on the cluster it belonged to.
- Wallet-wise risk scores exported to wallet_risk_scores.csv.
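
The clustering and scoring steps above can be sketched as follows. Several details are assumptions, since the text does not pin them down: each cluster's metric is taken as the mean `score_metric` of its member wallets, the cluster with the highest metric is treated as Rank 1, and `n_init` and `random_state` are illustrative settings. The function name `score_wallets` is hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Rank-to-score mapping from the list above (Rank 1 = most reliable).
RANK_TO_SCORE = {1: 1000, 2: 750, 3: 500, 4: 250, 5: 100}


def score_wallets(scaled: pd.DataFrame, feature_cols: list,
                  n_clusters: int = 5, random_state: int = 42) -> pd.DataFrame:
    """Cluster wallets on scaled features and map cluster rank to a score."""
    out = scaled.copy()
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    out["cluster"] = km.fit_predict(out[feature_cols])

    out["score_metric"] = (
        out["success_ratio"]
        + out["fees_paid_per_tx"]
        + out["avg_fee_per_day"]
        - out["destination_diversity"]
    )
    # Assumption: a cluster's metric is the mean over its member wallets.
    cluster_metric = out.groupby("cluster")["score_metric"].mean()
    # Highest metric gets rank 1; ties are broken by cluster order.
    ranks = cluster_metric.rank(ascending=False, method="first").astype(int)
    # RANK_TO_SCORE covers ranks 1-5 only, so keep n_clusters at 5.
    out["risk_score"] = out["cluster"].map(ranks).map(RANK_TO_SCORE)
    return out
```

The result can then be exported with something like `scored[["wallet_id", "risk_score"]].to_csv("wallet_risk_scores.csv", index=False)`. Because scores are assigned per cluster, every wallet in a cluster receives the same score, so the output contains at most five distinct values.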
- Output example
- Risk Score Distribution of Wallets: a bar chart showing the number of wallets in each risk score category.
- Average Transactions per Day by Risk Score.

Steps involved:
- Loaded wallet addresses from a CSV file.
- Used Covalent API's /transactions_v2/ endpoint to fetch all historical transactions for each wallet.
- Implemented API request handling (with retries for failures).
- Combined all transactions into a single CSV file (wallet_transactions_compound.csv) for downstream analysis.
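
The retry handling mentioned above could look like the sketch below. The exponential backoff, the retry count, and the helper name `fetch_with_retries` are assumptions; the original only states that failed requests were retried. The `fetch` argument is any callable that takes a wallet address and returns its transactions.

```python
import time


def fetch_with_retries(fetch, wallet, max_retries=3, base_delay=1.0,
                       sleep=time.sleep):
    """Call fetch(wallet), retrying on any exception with growing delays."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return fetch(wallet)
        except Exception as exc:  # broad on purpose: network and HTTP errors
            last_error = exc
            if attempt < max_retries - 1:
                # Wait 1x, 2x, 4x ... base_delay before the next attempt.
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(
        f"giving up on {wallet} after {max_retries} attempts"
    ) from last_error
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.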
- Features chosen to reflect wallet behavior, risk, reliability, and engagement.
- Activity Indicators: (avg_tx_per_day, active_days)
- Financial Indicators: (fees_paid_per_tx, avg_fee_per_day, total_gas_spent)
- Transaction Efficiency: (success_ratio)
- Behavioral Risk Factors: (destination_diversity)
These features collectively show whether a wallet is consistently active, whether its transactions reliably succeed, and whether its activity is concentrated on a few destinations or spread across many.
- Used KMeans clustering on scaled behavioral features.
- Ranked clusters using a custom score metric.
- Scores mapped from 100 (high risk) to 1000 (low risk)
- success_ratio: Indicates reliability. Wallets with a high success ratio are more likely to execute transactions properly, showing stable and responsible behavior.
- fees_paid_per_tx: Reflects willingness to prioritize transaction execution. Paying higher fees can suggest greater seriousness and intent in interactions.
- avg_fee_per_day: Reflects consistent protocol usage and cost commitment.
- destination_diversity: Very high diversity can imply risky patterns such as bot-like behavior, which is why it is subtracted in the score metric.
These indicators are chosen to reflect responsible and consistent wallet behavior in interacting with DeFi protocols.
