Out of 713 wallets analyzed:
- 87% were classified as low-risk (score 0–4)
- 58 wallets (≈8%) were medium-risk (score 5–9)
- Only 8 wallets (≈1.1%) were high-risk (score ≥10)
One wallet scored 28 due to a high transaction count and repeated trading pairs.
This suggests that the model is conservative: it flags a wallet only when several red flags occur together.
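The score bands above can be expressed as a small bucketing routine. The following is a minimal sketch, assuming integer scores and the thresholds stated in the list (0–4, 5–9, ≥10). The function names `risk_tier` and `tier_shares` and the example scores are hypothetical and are not taken from the study.

```python
from collections import Counter


def risk_tier(score: int) -> str:
    """Map a numeric risk score to the tier names used in this report."""
    if score >= 10:
        return "high"
    if score >= 5:
        return "medium"
    return "low"


def tier_shares(scores):
    """Return {tier: (wallet count, percent of all wallets)}."""
    counts = Counter(risk_tier(s) for s in scores)
    total = len(scores)
    return {tier: (n, round(100 * n / total, 1)) for tier, n in counts.items()}


# Hypothetical scores for illustration only, not the study's data.
example_scores = [0, 2, 3, 4, 4, 6, 9, 10, 28, 1]
print(tier_shares(example_scores))
```

Computing counts and percentages from the same list keeps the two figures consistent with each other when the summary is regenerated.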
Figure: Risk Score Distribution

Graph-Based Risk Structures
Using NetworkX, transaction graphs revealed several laundering-like structures:
- Circular flows: NFT transfers that return to the sender
- Strongly Connected Components (SCCs): wallets forming closed mutual loops
- Star-shaped dispersal hubs: a single wallet sending to many others
Wallets involved in these structures had risk scores 4.6× higher than average.
This suggests that transaction structure and behavior scoring reinforce each other.
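One way to look for the three structures with NetworkX is sketched below. It assumes transfers are modeled as a directed graph with an edge from sender to receiver. The edge list and the `MIN_FANOUT` threshold are hypothetical choices for illustration, not the study's data or parameters.

```python
import networkx as nx

# Hypothetical (sender, receiver) transfers; not taken from the dataset.
transfers = [
    ("A", "B"), ("B", "C"), ("C", "A"),                   # circular flow
    ("H", "W1"), ("H", "W2"), ("H", "W3"), ("H", "W4"),  # dispersal hub
    ("C", "H"),
]
G = nx.DiGraph(transfers)

# Circular flows: directed cycles that return to the starting wallet.
cycles = list(nx.simple_cycles(G))

# Strongly connected components with more than one wallet (closed loops).
sccs = [c for c in nx.strongly_connected_components(G) if len(c) > 1]

# Star-shaped hubs: wallets sending to at least MIN_FANOUT distinct wallets.
MIN_FANOUT = 4  # assumed threshold, tune for the dataset
hubs = [wallet for wallet, degree in G.out_degree() if degree >= MIN_FANOUT]

print("cycles:", cycles)
print("SCCs:", sccs)
print("hubs:", hubs)
```

Enumerating simple cycles can be expensive on large graphs, so filtering to the non-trivial SCCs first and searching for cycles only inside them is a common way to limit the work.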
Statistical Validation
Three validation methods were applied to the risk scoring system:
- K-means Cluster Analysis
  - Wallets were grouped based on behavioral features
  - High-risk wallets fell into a distinct cluster with average score 9.07
  - 100% of wallets with scores >10 were in this cluster
- Temporal Consistency Testing
  - Data was split into 3 time periods
  - Average consistency across periods: only 5.6
  - Suggests burst-like laundering activity, not long-term trading
- Simulated Expert Rating Correlation
  - Manual scoring of top wallets was compared to model outputs
  - Achieved Pearson correlation of 0.96, showing strong agreement



These validation methods indicate that the scoring model is not arbitrary: it aligns with natural behavioral groupings, exposes time-bound laundering bursts, and closely tracks the simulated expert ratings.
