Multi-Objective Variational Autoencoder for Blockchain Forensics: Detecting and Attributing Lazarus APT Group Wallets
Abstract:
The exponential growth of blockchain-based financial crimes necessitates advanced analytical frameworks capable of distinguishing between legitimate and illicit cryptocurrency activities. This paper presents a deep learning architecture based on an Advanced Variational Autoencoder (VAE) with multi-objective learning for Ethereum wallet classification. The model is trained on 116 behavioral indicators capturing graph topology, temporal dynamics, and transaction flow characteristics. The architecture integrates self-attention mechanisms, residual connections, and a dual-objective loss combining reconstruction and classification. Trained on a balanced dataset of 15260 Ethereum wallets (7603 Lazarus; 7657 non-Lazarus), the model achieves 98.998% accuracy, 99.128% precision, 98.862% recall, and 99.947% AUC. The non-Lazarus cohort includes 3930 normal/licit, 202 mixer, 1141 nested-service, and 2384 bridge users, of which 940 are identified as illicit. Mixer and nested-service users are treated as illicit but non-Lazarus, while the remaining bridge users are considered behavioral. The framework captures complex temporal patterns, cross-chain bridge interactions, gas optimization strategies, and …
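As an illustration of the dual-objective loss described in the abstract, the following minimal sketch combines a reconstruction term with a classification term for a single wallet. It is not the authors' implementation. The mean-squared-error reconstruction, the standard VAE KL regularizer, the binary cross-entropy classifier loss, and the weights `beta` and `gamma` are all assumptions made for illustration; the abstract states only that reconstruction and classification objectives are combined.

```python
import math


def dual_objective_loss(x, x_hat, mu, log_var, logit, label,
                        beta=1.0, gamma=1.0):
    """Per-sample loss: reconstruction + beta * KL + gamma * classification.

    x, x_hat    : input feature vector and its reconstruction (same length)
    mu, log_var : latent Gaussian parameters produced by the encoder
    logit       : raw classifier output for the Lazarus class
    label       : 1 for Lazarus, 0 for non-Lazarus
    beta, gamma : hypothetical weights, not values taken from the paper
    """
    # Reconstruction term (assumed): mean squared error over the features.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

    # KL divergence between N(mu, exp(log_var)) and N(0, I), the usual VAE
    # regularizer (assumed; the abstract does not spell it out).
    kl = -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                    for m, lv in zip(mu, log_var))

    # Binary cross-entropy on the logit, written in a numerically stable form.
    bce = max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

    total = recon + beta * kl + gamma * bce
    return total, {"recon": recon, "kl": kl, "bce": bce}
```

In this sketch `x` would be the 116-dimensional behavioral feature vector of one wallet. Raising `gamma` relative to the other terms pushes the latent space toward class separation at the expense of reconstruction fidelity, which is the trade-off a multi-objective VAE has to balance.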