Version 1

Ethereum Transaction Datasets for Training and Evaluating LLMs, ML, and DL Models

Anonymous Author(s).

Description

Primary Corpus (Unlabeled). We sourced the complete set of Ethereum transaction traces for calendar year 2024 (blocks 18,908,895 to 21,525,890). The raw dataset comprised 429,745,984 transactions. Most of these are simple value transfers or contract interactions with negligible fees, which are less indicative of sophisticated attack patterns. To focus on transactions with substantial economic weight and to keep computation feasible, we kept only transactions with a miner fee (including priority fee) of at least 0.01 ETH (approximately 18 USD at the time of analysis). This filtering yielded a final training corpus of 1,074,346 transactions (about 0.25% of the raw set) for unsupervised representation learning and detector training.

Evaluation Benchmark (Labeled). To evaluate generalization to novel threats, we constructed a separate, manually verified benchmark. We collected 439 transactions from 2023, 2024, and 2025 that were absent from the primary corpus. Each transaction was investigated via block explorers, security reports (e.g., Rekt News), and community analysis to assign a ground-truth label: malicious (confirmed exploit, hack, or scam) or benign. The benchmark covers diverse attack vectors (e.g., price oracle manipulation, logic bugs, phishing, flash loan attacks, reentrancy, access control vulnerabilities) across various protocols, providing a rigorous test of out-of-sample detection.

We have restricted access to the dataset until our research paper is accepted. In the meantime, please contact us via this anonymous email ([email protected]) or access the dataset here: https://anonymous.4open.science/r/tx-lens-artifacts-EB4B/README.md
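The fee filter used to build the primary corpus can be illustrated with a minimal sketch. It is not the authors' pipeline: the `TxReceipt` record, its field names, and the sample hashes are hypothetical, and the sketch assumes the fee is computed as gas used times the effective gas price (which includes the priority fee). The description does not say exactly how the fee was derived, so treat that formula as an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

WEI_PER_ETH = 10**18
# 0.01 ETH threshold taken from the dataset description, expressed in wei.
MIN_FEE_WEI = WEI_PER_ETH // 100


@dataclass(frozen=True)
class TxReceipt:
    """Hypothetical minimal record; field names are illustrative only."""
    tx_hash: str
    gas_used: int
    effective_gas_price_wei: int  # per-gas price paid, priority fee included


def fee_wei(tx: TxReceipt) -> int:
    # Assumption: total fee = gas used * effective gas price.
    return tx.gas_used * tx.effective_gas_price_wei


def filter_high_fee(
    txs: Iterable[TxReceipt], min_fee_wei: int = MIN_FEE_WEI
) -> Iterator[TxReceipt]:
    """Yield only transactions whose fee meets the threshold (inclusive)."""
    for tx in txs:
        if fee_wei(tx) >= min_fee_wei:
            yield tx


GWEI = 10**9
sample = [
    TxReceipt("0xaaa", 21_000, 20 * GWEI),   # plain transfer, 0.00042 ETH
    TxReceipt("0xbbb", 500_000, 30 * GWEI),  # 0.015 ETH
    TxReceipt("0xccc", 200_000, 50 * GWEI),  # exactly 0.01 ETH
]
kept = [tx.tx_hash for tx in filter_high_fee(sample)]
print(kept)
```

Integer wei arithmetic is used throughout so that the comparison at the 0.01 ETH boundary is exact, which floating-point ETH values would not guarantee.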


Publication Details

DOI

Publisher: Zenodo

Assigned Domain

Subfield: General Social Sciences
Field: Social Sciences
Domain: Social Sciences
Confidence Score: 40%
Source: Scholar Data Model

Keywords

Blockchain, Transactions, anomaly detection