Automated Author Profile

Loporchio, Matteo

University of Pisa
0000-0002-9806-6475

Current S-Index

8.2

Sum of Dataset Indices for all datasets

Average Dataset Index per Dataset

1.0

Average Dataset Index per dataset

Total Datasets

8

Total datasets for this author

Average FAIR Score

36.8%

Average FAIR Score per dataset

Total Citations

0

Total citations to the author's datasets

Total Mentions

2

Total mentions of the author's datasets

S-Index Interpretation

S-Index Over Time

Cumulative Citations Over Time

Cumulative Mentions Over Time

Datasets

ERC-1155 Transfer Events (Version: 1.0.0)

Introduction

With the introduction of smart contracts, the Ethereum blockchain has become a reference platform for tokens, digital assets that can be owned and transferred between users. Tokens can represent a wide range of assets beyond traditional cryptocurrencies (e.g., voting rights or digital trading cards). Tokens that are indistinguishable and interchangeable with each other, such as company shares, are called fungible, while those that have distinct and unique properties and value, such as digital collectibles, are referred to as non-fungible (often abbreviated as NFTs).

The ERC-1155 standard, introduced in 2018, defines a set of common rules for the management and exchange of tokens. The standard allows for managing multiple tokens, both fungible and non-fungible, within a single smart contract. Moreover, it introduces the possibility of performing batch transfers, reducing transaction costs and enabling a more efficient use of blockchain resources. All ERC-1155 token transfers, whether in single or batch mode, are permanently recorded on the Ethereum blockchain as events and are publicly accessible, thus allowing for analysis of contract activity and user interactions.

Dataset description

This repository contains information about the transfer events emitted by ERC-1155 contracts on the Ethereum blockchain. Such events are recorded every time there is a transfer of tokens from one user to another. Note that the standard distinguishes between TransferSingle events, where a single token is transferred, and TransferBatch events, which represent the transfer of multiple tokens. In a batch transfer, tokens of any nature (i.e., fungible and non-fungible) can be exchanged, even mixed together.

The dataset contains all events included in the Ethereum blocks ranging from height 0 to 21 525 890 (included), thus covering the time period between July 30th, 2015 03:26:13 PM UTC and December 31st, 2024 11:59:59 PM UTC.

Events are stored in chronological order (i.e., from the oldest to the most recent) in a JSON file compressed with the gzip utility. The JSON file is newline-delimited, meaning that each line of the file corresponds to a single event (i.e., a single JSON object). To understand the structure of each line, consider the following example event.

    {
       "address":"0xd0e4847359ae76c2786d242e5f45c4f6f1abd752",
       "transaction_index":40,
       "log_index":32,
       "transaction_hash":"0x223600ba642f4dc6644e5eb4b0a02a6f67589ee4802be640b373fdd30bb00ff4",
       "block_number":6930510,
       "block_timestamp":"2018-12-22 04:21:31",
       "type":"SINGLE",
       "operator":"0x6e8a8a0de641161b306cd548710c6175546faf76",
       "from":"0x0000000000000000000000000000000000000000",
       "to":"0x463def03f98b328a75051ee5ebe9a6235de4ac59",
       "token_ids":[
          "32"
       ],
       "amounts":[
          "100000000"
       ]
    }

The JSON object includes the following fields.

  • “address” represents the Ethereum address of the smart contract that emitted the event.
  • “transaction_index” indicates the position, within the block, of the transaction that produced the event.
  • “log_index” corresponds to the position of the event in the list of all events emitted by the transaction.
  • “transaction_hash” is the cryptographic hash of the transaction that produced the event. The details of the transaction used in the example are also visible through any explorer service, such as Etherscan: https://etherscan.io/tx/0x223600ba642f4dc6644e5eb4b0a02a6f67589ee4802be640b373fdd30bb00ff4.
  • The “block_number” and “block_timestamp” fields represent the height and the date of addition of the block in which the transaction that produced the event is present.
  • The “type” field indicates whether the event is of type TransferSingle (in which case it will have the value “SINGLE”) or if it is a TransferBatch (marked by the designation “BATCH”).
  • The “operator” address represents the account that is allowed to perform the transfer. Note that this account may not correspond to the actual owner of the tokens to be transferred.
  • The “from” address represents the token holder, namely the user whose token balance is decreased.
  • The “to” address corresponds to the token recipient, i.e., the user whose token balance is increased.
  • The “token_ids” and “amounts” arrays describe the tokens being transferred. Specifically, the i-th element of “token_ids” represents the numerical identifier of the i-th token involved in the transfer, while the i-th element in the “amounts” array describes the quantity transferred for that token. In the case of transfers in single mode, both “token_ids” and “amounts” will have a length of 1.

Cite this work

If the information contained in the dataset has been useful for your work, please cite the following article.

M. Loporchio, D. Di Francesco Maesa, A. Bernasconi, and L. Ricci, “Analyzing ERC-1155 Adoption: A Study of the Multi-token Ecosystem,” Studies in Computational Intelligence. Springer Nature Switzerland, pp. 385–397, 2025. doi: 10.1007/978-3-031-82427-2_32.

Funding

This work was partially supported by Project Awesome: Analysis Framework for Web3 SOcial MEdia, project code: 2022MAWEZA, under the National Recovery and Resilience Plan funded by the European Union - NextGenerationEU, MISSION 4 COMPONENT 2, INVESTMENT N.1.1 CALL PRIN 2022 D.D. 104 02-02-2022, CUP N.I53D23003680006, and by Project DLT-FRUIT: A user centered framework for facilitating DLTs FRUITion, project code: P2022NZPJA, under the National Recovery and Resilience Plan funded by the European Union - NextGenerationEU, MISSION 4 COMPONENT 2, INVESTMENT N.1.1 CALL PRIN 2022 PNRR D.D. n. 1409 14/09/2022, CUP N.I53D23006100001.
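
As an illustration of the ERC-1155 event format described in this entry, the following minimal Python sketch reads a gzip-compressed, newline-delimited JSON stream and pairs each token identifier with its amount. The function names (iter_events, transferred_tokens, is_mint) are hypothetical and not part of the dataset. The sketch assumes that “token_ids” and “amounts” hold decimal strings, as in the example event, and it treats a transfer from the zero address as a mint, following the ERC-1155 standard. To keep it self-contained, it compresses the example event in memory rather than opening the real archive.

```python
import gzip
import io
import json
from typing import Iterator, TextIO

ZERO_ADDRESS = "0x" + "0" * 40


def iter_events(lines: TextIO) -> Iterator[dict]:
    """Yield one event (a dict) per non-empty line of a newline-delimited JSON stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def transferred_tokens(event: dict) -> list[tuple[int, int]]:
    """Pair the i-th token identifier with the i-th amount (assumed decimal strings)."""
    return [(int(t), int(a)) for t, a in zip(event["token_ids"], event["amounts"])]


def is_mint(event: dict) -> bool:
    """ERC-1155 signals minting with a transfer whose sender is the zero address."""
    return event["from"] == ZERO_ADDRESS


# In-memory stand-in for the real archive: the example event, gzip-compressed.
sample = {
    "address": "0xd0e4847359ae76c2786d242e5f45c4f6f1abd752",
    "transaction_index": 40,
    "log_index": 32,
    "transaction_hash": "0x223600ba642f4dc6644e5eb4b0a02a6f67589ee4802be640b373fdd30bb00ff4",
    "block_number": 6930510,
    "block_timestamp": "2018-12-22 04:21:31",
    "type": "SINGLE",
    "operator": "0x6e8a8a0de641161b306cd548710c6175546faf76",
    "from": "0x0000000000000000000000000000000000000000",
    "to": "0x463def03f98b328a75051ee5ebe9a6235de4ac59",
    "token_ids": ["32"],
    "amounts": ["100000000"],
}
compressed = gzip.compress((json.dumps(sample) + "\n").encode("utf-8"))

# With the real file, pass its path to gzip.open instead of the BytesIO object.
with gzip.open(io.BytesIO(compressed), mode="rt", encoding="utf-8") as stream:
    events = list(iter_events(stream))

for event in events:
    print(event["type"], is_mint(event), transferred_tokens(event))
```

Reading the stream line by line keeps memory use flat, which matters for a file that covers more than 21 million blocks.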

Authors

  • Loporchio, Matteo
0 Citations, 0 Mentions, 13% FAIR, 0.3 Dataset Index
10.5281/zenodo.14901527 (August 2025)

ERC-1155 Transfer Events (Version: 1.0.0)

Authors

  • Loporchio, Matteo
0 Citations, 0 Mentions, 13% FAIR, 0.3 Dataset Index
10.5281/zenodo.14901528 (August 2025)

ERC-20 and ERC-721 Transfer events (Version: 1.0)

Description

This repository contains information about the Transfer events emitted by ERC-20 and ERC-721 contracts on the Ethereum blockchain. Specifically, the events in the repository are those included in the first 15 million blocks of the chain, i.e., from block 0 (added on July 30th, 2015, 15:26:13 GMT) to block 14,999,999 (added on June 21st, 2022, 02:28:10 GMT).

All files in the repository are written in the CSV (Comma-Separated Values) format. For space reasons, they have also been compressed using the XZ utility. The dataset consists of the following files.

block_timestamps_0-14999999.csv.xz contains information about the timestamps of individual blocks. Specifically, each row in the file represents a block and consists of 2 fields.

  • The first field represents the block identifier (i.e., its height).
  • The second field is the Unix epoch timestamp indicating the instant at which the block was added to the blockchain.

The erc20_transfers.csv.xz (resp. erc721_transfers.csv.xz) file contains a row for each ERC-20 (resp. ERC-721) Transfer event. The description of each transfer includes 4 fields.

  • Identifier of the block in which the transfer occurred.
  • Numeric identifier of the contract that produced the event.
  • Numeric identifier of the sender of the transfer.
  • Numeric identifier of the recipient of the transfer.

The erc20_contracts.csv.xz (resp. erc721_contracts.csv.xz) file contains the Ethereum addresses of the contracts that have raised at least one ERC-20 (resp. ERC-721) Transfer event within the considered blocks. More precisely, each row of these files represents a contract and consists of 2 fields.

  • Ethereum address of the contract.
  • Numerical identifier used to represent the contract. Specifically, the identifiers listed in erc20_contracts.csv.xz are used inside erc20_transfers.csv.xz, while those of erc721_contracts.csv.xz are used in the erc721_transfers.csv.xz file.

The erc20_addresses.csv.xz (resp. erc721_addresses.csv.xz) file contains the Ethereum addresses of the participants involved in at least one ERC-20 (resp. ERC-721) Transfer event within the considered blocks. More precisely, each row of these files is composed of 2 fields.

  • Ethereum address of the participant.
  • Numerical identifier used to represent the participant. The identifiers contained in erc20_addresses.csv.xz are used in the erc20_transfers.csv.xz file, while those of erc721_addresses.csv.xz are used in the erc721_transfers.csv.xz file.

The erc20_values_dec.csv.xz file contains the amounts of all ERC-20 token transfers. The file includes a row for each ERC-20 transfer and the n-th row contains the amount of fungible tokens exchanged in the n-th transfer of the erc20_transfers.csv.xz file.

The erc721_values_dec.csv.xz file has one row for each ERC-721 token transfer. The n-th row of this file contains the identifier of the non-fungible token exchanged in the n-th transfer of the erc721_transfers.csv.xz file.

Cite this work

If the data included in this repository have been useful, please cite the following articles in your work.

    @article{LoporchioMBR24,
      title = "Comparing Ethereum fungible and non-fungible tokens: an analysis of transfer networks",
      author = "Loporchio, Matteo and Di Francesco Maesa, Damiano and Bernasconi, Anna and Ricci, Laura",
      journal = "Applied Network Science",
      volume = "9",
      number = "1",
      pages = "72",
      year = 2024
    }

    @incollection{Loporchio2024-gb,
      title = "Analysis and characterization of {ERC-20} token network topologies",
      booktitle = "Complex Networks & Their Applications {XII}",
      author = "Loporchio, Matteo and Di Francesco Maesa, Damiano and Bernasconi, Anna and Ricci, Laura",
      publisher = "Springer Nature Switzerland",
      pages = "344--355",
      series = "Studies in computational intelligence",
      year = 2024,
      address = "Cham"
    }

Funding

This work was partially supported by Project Awesome: Analysis Framework for Web3 SOcial MEdia, project code: 2022MAWEZA, under the National Recovery and Resilience Plan funded by the European Union - NextGenerationEU, MISSION 4 COMPONENT 2, INVESTMENT N.1.1 CALL PRIN 2022 D.D. 104 02-02-2022, CUP N.I53D23003680006, and by Project DLT-FRUIT: A user centered framework for facilitating DLTs FRUITion, project code: P2022NZPJA, under the National Recovery and Resilience Plan funded by the European Union - NextGenerationEU, MISSION 4 COMPONENT 2, INVESTMENT N.1.1 CALL PRIN 2022 PNRR D.D. n. 1409 14/09/2022, CUP N.I53D23006100001.

References

[1] M. Loporchio, D. Di Francesco Maesa, A. Bernasconi, and L. Ricci, “Analysis and Characterization of ERC-20 Token Network Topologies,” Complex Networks & Their Applications XII. Springer Nature Switzerland, pp. 344–355, 2024. https://doi.org/10.1007/978-3-031-53472-0_29
[2] M. Loporchio, D. Di Francesco Maesa, A. Bernasconi, and L. Ricci, “Comparing Ethereum Fungible and Non-Fungible Tokens: an Analysis of Transfer Networks,” Applied Network Science, vol. 9, no. 1. Springer Science and Business Media LLC, Nov. 20, 2024. https://doi.org/10.1007/s41109-024-00682-8
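
The ERC-20 and ERC-721 files of this entry refer to contracts and participants through numeric identifiers, so a transfer row only becomes readable after joining it with the mapping files. The following Python sketch shows one way to do that join and to pair the n-th transfer with the n-th row of the values file. It is a sketch under stated assumptions: the files are taken to be comma-separated with no header row and with columns in the order listed in the description. The function names and the miniature in-memory files are hypothetical, and the addresses in them are placeholders, not real data.

```python
import csv
import io
import lzma


def load_id_map(stream):
    """Read (address, numeric id) rows and return a dict mapping id -> address."""
    return {int(row[1]): row[0] for row in csv.reader(stream) if row}


def resolve_transfers(transfers, values, contracts, addresses):
    """Yield (block, contract, sender, recipient, value) with real addresses.

    The n-th row of `values` is paired with the n-th row of `transfers`.
    """
    for row, value_row in zip(csv.reader(transfers), csv.reader(values)):
        block, contract_id, sender_id, recipient_id = (int(v) for v in row)
        yield (block, contracts[contract_id], addresses[sender_id],
               addresses[recipient_id], int(value_row[0]))


def xz_text(data: bytes):
    """Open xz-compressed bytes as text; with real files use lzma.open(path, "rt")."""
    return lzma.open(io.BytesIO(data), mode="rt", encoding="utf-8", newline="")


# Made-up miniature files (assumed layout: no header, comma-separated).
contracts_xz = lzma.compress(b"0xc0ffee,0\n")
addresses_xz = lzma.compress(b"0xaaa,0\n0xbbb,1\n")
transfers_xz = lzma.compress(b"100,0,0,1\n101,0,1,0\n")
values_xz = lzma.compress(b"500\n250\n")

with xz_text(contracts_xz) as f:
    contracts = load_id_map(f)
with xz_text(addresses_xz) as f:
    addresses = load_id_map(f)
with xz_text(transfers_xz) as t, xz_text(values_xz) as v:
    resolved = list(resolve_transfers(t, v, contracts, addresses))

for record in resolved:
    print(record)
```

Python integers have arbitrary precision, so converting the amounts with int() does not overflow even for very large token values. If the real files turn out to carry a header row, it should be skipped before the join.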

Authors

  • Loporchio, Matteo
0 Citations, 0 Mentions, 73% FAIR, 1.8 Dataset Index
10.5281/zenodo.10644076 (May 2024)

ERC-20 and ERC-721 Transfer events (Version: 1.0)

Authors

  • Loporchio, Matteo
0 Citations, 0 Mentions, 13% FAIR, 0.3 Dataset Index
10.5281/zenodo.10644077 (May 2024)

BF skip indexes for Ethereum (Version: 1.0)

General information

This repository includes all data needed to reproduce the experiments presented in [1].

The paper describes the BF skip index, a data structure based on Bloom filters [2] that can be used for answering inter-block queries on blockchains efficiently. The article also includes a historical analysis of logsBloom filters included in the Ethereum block headers, as well as an experimental analysis of the proposed data structure. The latter was conducted using the data set of events generated by the CryptoKitties Core contract, a popular decentralized application launched in 2017 (and also one of the first applications based on NFTs).

In this description, we use the following abbreviations (also adopted throughout the paper) to denote two different sets of Ethereum blocks.

  • D1: set of all Ethereum blocks between height 0 and 14999999.
  • D2: set of all Ethereum blocks between height 14000000 and 14999999.

Moreover, in accordance with the terminology adopted in the paper, we define the set of keys of a block as the set of all contract addresses and log topics of the transactions in the block. As defined in [3], log topics comprise event signature digests and the indexed parameters associated with the event occurrence.

Data set description

  • filters_ones_0-14999999.csv.xz: Compressed CSV file containing the number of ones for each logsBloom filter in D1.
  • receipt_stats_0-14999999.csv.xz: Compressed CSV file containing statistics about all transaction receipts in D1.
  • Approval.csv: CSV file containing the Approval event occurrences for the CryptoKitties Core contract in D2.
  • Birth.csv: CSV file containing the Birth event occurrences for the CryptoKitties Core contract in D2.
  • Pregnant.csv: CSV file containing the Pregnant event occurrences for the CryptoKitties Core contract in D2.
  • Transfer.csv: CSV file containing the Transfer event occurrences for the CryptoKitties Core contract in D2.
  • events.xz: Compressed binary file containing information about all contract events in D2.
  • keys.xz: Compressed binary file containing information about all keys in D2.

File structure

We now describe the structure of the files included in this repository.

filters_ones_0-14999999.csv.xz is a compressed CSV file with 15 million rows (one for each block in D1) and 3 columns. Note that it is not necessary to decompress this file, as the provided code is capable of processing it directly in its compressed form. The columns have the following meaning.

  • blockId: the identifier of the block.
  • timestamp: timestamp of the block.
  • numOnes: number of bits set to 1 in the logsBloom filter of the block.

receipt_stats_0-14999999.csv.xz is a compressed CSV file with 15 million rows (one for each block in D1) and 5 columns. As for the previous file, it is not necessary to decompress this file.

  • blockId: the identifier of the block.
  • txCount: number of transactions included in the block.
  • numLogs: number of event logs included in the block.
  • numKeys: number of keys included in the block.
  • numUniqueKeys: number of distinct keys in the block (useful as the same key may appear multiple times).

All CSV files related to the CryptoKitties Core events (i.e., Approval.csv, Birth.csv, Pregnant.csv, Transfer.csv) have the same structure. They consist of 1 million rows (one for each block in D2) and 2 columns, namely:

  • blockId: identifier of the block.
  • numOcc: number of event occurrences in the block.

events.xz is a compressed binary file describing all unique event occurrences in the blocks of D2. The file contains 1 million data chunks (i.e., one for each Ethereum block). Note that this file only records unique event occurrences in each block: if an event from a contract is triggered more than once within the same block, the corresponding chunk contains only one sequence for it. Each chunk includes the following information.

  • blockId: identifier of the block (4 bytes).
  • numEvents: number of event occurrences in the block (4 bytes).
  • A list of numEvents sequences, each made up of 52 bytes. A sequence represents an event occurrence and is the concatenation of two fields, namely:
      • Address of the contract triggering the event (20 bytes).
      • Event signature digest (32 bytes).

keys.xz is a compressed binary file describing all unique keys in the blocks of D2. As for the previous file, duplicate keys only appear once. The file contains 1 million data chunks, each representing an Ethereum block and including the following information.

  • blockId: identifier of the block (4 bytes).
  • numAddr: number of unique contract addresses (4 bytes).
  • numTopics: number of unique topics (4 bytes).
  • A sequence of numAddr addresses, each represented using 20 bytes.
  • A sequence of numTopics topics, each represented using 32 bytes.

Notes

For space reasons, some of the files in this repository have been compressed using the XZ compression utility. Unless otherwise specified, these files need to be decompressed before they can be read. Please make sure you have an application installed on your system that is capable of decompressing such files.

Cite this work

If the data included in this repository have been useful, please cite the following article in your work.

    @article{loporchio2025skip,
      title={Skip index: Supporting efficient inter-block queries and query authentication on the blockchain},
      author={Loporchio, Matteo and Bernasconi, Anna and Di Francesco Maesa, Damiano and Ricci, Laura},
      journal={Future Generation Computer Systems},
      volume={164},
      pages={107556},
      year={2025},
      publisher={Elsevier}
    }

References

[1] Loporchio, Matteo et al. "Skip index: supporting efficient inter-block queries and query authentication on the blockchain." Future Generation Computer Systems 164 (2025): 107556. https://doi.org/10.1016/j.future.2024.107556
[2] Bloom, Burton H. "Space/time trade-offs in hash coding with allowable errors." Communications of the ACM 13.7 (1970): 422-426.
[3] Wood, Gavin. "Ethereum: A secure decentralised generalised transaction ledger." Ethereum project yellow paper 151.2014 (2014): 1-32.
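
The chunk layout given for events.xz in this entry (a 4-byte block identifier, a 4-byte event count, then one 52-byte sequence per event) can be decoded with a short Python sketch such as the one below. The description does not state the byte order or signedness of the two integers, so the sketch assumes big-endian unsigned values and exposes the byte order as a parameter; this should be checked against the code provided with the repository. The function name iter_event_chunks and the in-memory sample chunk are hypothetical.

```python
import io
import lzma
import struct

ADDRESS_LEN = 20   # contract address, in bytes
DIGEST_LEN = 32    # event signature digest, in bytes
SEQUENCE_LEN = ADDRESS_LEN + DIGEST_LEN  # 52 bytes per event occurrence


def iter_event_chunks(stream, byteorder=">"):
    """Yield (block_id, [(address, digest), ...]) for each chunk in the stream.

    `byteorder` is a struct prefix: ">" for big-endian (an assumption, not
    stated in the data set description) or "<" for little-endian.
    """
    header = struct.Struct(byteorder + "II")  # blockId, numEvents (4 bytes each)
    while True:
        raw = stream.read(header.size)
        if not raw:
            return  # clean end of file
        if len(raw) < header.size:
            raise ValueError("truncated chunk header")
        block_id, num_events = header.unpack(raw)
        events = []
        for _ in range(num_events):
            seq = stream.read(SEQUENCE_LEN)
            if len(seq) < SEQUENCE_LEN:
                raise ValueError("truncated event sequence")
            events.append((seq[:ADDRESS_LEN], seq[ADDRESS_LEN:]))
        yield block_id, events


# Made-up sample: one chunk for block 14000000 holding a single event.
address = bytes(range(ADDRESS_LEN))
digest = bytes(range(DIGEST_LEN))
sample = lzma.compress(struct.pack(">II", 14000000, 1) + address + digest)

# With the real file, use lzma.open("events.xz", "rb") instead.
with lzma.open(io.BytesIO(sample), "rb") as stream:
    chunks = list(iter_event_chunks(stream))

for block_id, events in chunks:
    print(block_id, [(a.hex(), d.hex()) for a, d in events])
```

The same pattern extends to keys.xz by reading three 4-byte integers per chunk, followed by numAddr 20-byte addresses and numTopics 32-byte topics.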

Authors

  • Loporchio, Matteo
0 Citations, 0 Mentions, 13% FAIR, 0.3 Dataset Index
10.5281/zenodo.7957141 (May 2023)

BF skip indexes for Ethereum (Version: 1.0)

General informationThis repository includes all data needed to reproduce the experiments presented in [1].The paper describes the BF skip index, a data structure based on Bloom filters [2] that can be used for answering inter-block queries on blockchains efficiently. The article also includes a historical analysis of logsBloom filters included in the Ethereum block headers, as well as an experimental analysis of the proposed data structure. The latter was conducted using the data set of events generated by the CryptoKitties Core contract, a popular decentralized application launched in 2017 (and also one of the first applications based on NFTs).In this description, we use the following abbreviations (also adopted throughout the paper) to denote two different sets of Ethereum blocks.D1: set of all Ethereum blocks between height 0 and 14999999.D2: set of all Ethereum blocks between height 14000000 and 14999999.Moreover, in accordance with the terminology adopted in the paper, we define the set of keys of a block as the set of all contract addresses and log topics of the transactions in the block. 
As defined in [3], log topics comprise event signature digests and the indexed parameters associated with the event occurrence.

Data set description

File | Description
filters_ones_0-14999999.csv.xz | Compressed CSV file containing the number of ones for each logsBloom filter in D1.
receipt_stats_0-14999999.csv.xz | Compressed CSV file containing statistics about all transaction receipts in D1.
Approval.csv | CSV file containing the Approval event occurrences for the CryptoKitties Core contract in D2.
Birth.csv | CSV file containing the Birth event occurrences for the CryptoKitties Core contract in D2.
Pregnant.csv | CSV file containing the Pregnant event occurrences for the CryptoKitties Core contract in D2.
Transfer.csv | CSV file containing the Transfer event occurrences for the CryptoKitties Core contract in D2.
events.xz | Compressed binary file containing information about all contract events in D2.
keys.xz | Compressed binary file containing information about all keys in D2.

File structure

We now describe the structure of the files included in this repository.

filters_ones_0-14999999.csv.xz is a compressed CSV file with 15 million rows (one for each block in D1) and 3 columns. This file does not need to be decompressed, as the provided code can process it directly in its compressed form. The columns have the following meaning.

  • blockId: identifier of the block.
  • timestamp: timestamp of the block.
  • numOnes: number of bits set to 1 in the logsBloom filter of the block.

receipt_stats_0-14999999.csv.xz is a compressed CSV file with 15 million rows (one for each block in D1) and 5 columns. As with the previous file, it does not need to be decompressed. The columns have the following meaning.

  • blockId: identifier of the block.
  • txCount: number of transactions included in the block.
  • numLogs: number of event logs included in the block.
  • numKeys: number of keys included in the block.
  • numUniqueKeys: number of distinct keys in the block (useful because the same key may appear multiple times).

All CSV files related to the CryptoKitties Core events (i.e., Approval.csv, Birth.csv, Pregnant.csv, Transfer.csv) have the same structure. They consist of 1 million rows (one for each block in D2) and 2 columns, namely:

  • blockId: identifier of the block.
  • numOcc: number of event occurrences in the block.

events.xz is a compressed binary file describing all unique event occurrences in the blocks of D2. The file contains 1 million data chunks (i.e., one for each Ethereum block). Note that this file only records unique event occurrences in each block: if an event from a contract is triggered more than once within the same block, the corresponding chunk contains only one sequence for it. Each chunk includes the following information.

  • blockId: identifier of the block (4 bytes).
  • numEvents: number of event occurrences in the block (4 bytes).
  • A list of numEvents sequences, each made up of 52 bytes. A sequence represents an event occurrence and is the concatenation of two fields, namely:
      • Address of the contract triggering the event (20 bytes).
      • Event signature digest (32 bytes).

keys.xz is a compressed binary file describing all unique keys in the blocks of D2. As with the previous file, duplicate keys appear only once. The file contains 1 million data chunks, each representing an Ethereum block and including the following information.

  • blockId: identifier of the block (4 bytes).
  • numAddr: number of unique contract addresses (4 bytes).
  • numTopics: number of unique topics (4 bytes).
  • A sequence of numAddr addresses, each represented using 20 bytes.
  • A sequence of numTopics topics, each represented using 32 bytes.

Notes

For space reasons, some of the files in this repository have been compressed using the XZ compression utility. Unless otherwise specified, these files need to be decompressed before they can be read. Please make sure that an application capable of decompressing such files is installed on your system.

Cite this work

If the data included in this repository have been useful, please cite the following article in your work.

@article{loporchio2025skip,
  title={Skip index: Supporting efficient inter-block queries and query authentication on the blockchain},
  author={Loporchio, Matteo and Bernasconi, Anna and Di Francesco Maesa, Damiano and Ricci, Laura},
  journal={Future Generation Computer Systems},
  volume={164},
  pages={107556},
  year={2025},
  publisher={Elsevier}
}

References

[1] Loporchio, Matteo, et al. "Skip index: supporting efficient inter-block queries and query authentication on the blockchain." Future Generation Computer Systems 164 (2025): 107556. https://doi.org/10.1016/j.future.2024.107556
[2] Bloom, Burton H. "Space/time trade-offs in hash coding with allowable errors." Communications of the ACM 13.7 (1970): 422-426.
[3] Wood, Gavin. "Ethereum: A secure decentralised generalised transaction ledger." Ethereum project yellow paper 151.2014 (2014): 1-32.
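
As an illustration of the events.xz chunk layout given in the file structure description (blockId, numEvents, then numEvents sequences of 52 bytes), the following Python sketch iterates over the chunks of an already opened binary stream. It is not the code shipped with the repository. The function name read_event_chunks is hypothetical, and the description does not state how the 4-byte integers are encoded, so unsigned big-endian values are assumed here; the byte order is exposed as a parameter in case that assumption is wrong.

```python
import io
import lzma
import struct
from typing import BinaryIO, Iterator, List, Tuple

ADDRESS_LEN = 20  # contract address, in bytes
DIGEST_LEN = 32   # event signature digest, in bytes
SEQ_LEN = ADDRESS_LEN + DIGEST_LEN  # 52 bytes per event occurrence

Event = Tuple[bytes, bytes]  # (contract address, event signature digest)


def read_event_chunks(
    stream: BinaryIO, byte_order: str = ">"
) -> Iterator[Tuple[int, List[Event]]]:
    """Yield (blockId, events) for each chunk in the stream.

    ASSUMPTION: blockId and numEvents are unsigned 4-byte integers,
    big-endian by default (">"); pass "<" for little-endian.
    """
    header = struct.Struct(byte_order + "II")
    while True:
        raw = stream.read(header.size)
        if not raw:
            return  # clean end of file
        if len(raw) < header.size:
            raise ValueError("truncated chunk header")
        block_id, num_events = header.unpack(raw)
        events: List[Event] = []
        for _ in range(num_events):
            seq = stream.read(SEQ_LEN)
            if len(seq) < SEQ_LEN:
                raise ValueError("truncated event sequence")
            events.append((seq[:ADDRESS_LEN], seq[ADDRESS_LEN:]))
        yield block_id, events


if __name__ == "__main__":
    # Synthetic two-block sample (not real data), compressed in memory.
    sample = (
        struct.pack(">II", 7, 1) + bytes(ADDRESS_LEN) + bytes(DIGEST_LEN)
        + struct.pack(">II", 8, 0)
    )
    with lzma.open(io.BytesIO(lzma.compress(sample)), "rb") as stream:
        for block_id, events in read_event_chunks(stream):
            print(block_id, len(events))
```

To read the actual file without decompressing it on disk first, the same function can be fed with lzma.open("events.xz", "rb"). A reader for keys.xz would follow the same pattern with a three-field header (blockId, numAddr, numTopics) followed by the address and topic sequences.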

Authors

  • Loporchio, Matteo
0 Citations | 1 Mention | 13% FAIR | 0.8 Dataset Index
10.5281/zenodo.7957140 | May 2023

Bitcoin dust transactions (Version: 1.0.0)

General information

This repository contains data regarding Bitcoin dust transactions. In the Bitcoin protocol, dust refers to small amounts of currency that are lower than the fee required to spend them in a transaction. The repository comprises all transactions with at least one dust output or input. According to our definition, an output (or input) is considered dust if the associated amount is between 1 and 545 satoshis (where 1 satoshi = 10^-8 bitcoin). For more details about the definition of dust, see [1]. All dust transactions have been extracted from the first 479,970 blocks of the Bitcoin blockchain, thus covering the time period between January 3rd, 2009 18:15 GMT and August 10th, 2017 18:03 GMT.

Data set description

File | Description
txs | A text file containing a representation of all Bitcoin transactions that create and consume dust. See the description below for more information about the structure of this file.
txs_addr_map.csv | A CSV file that maps numeric address identifiers to real Bitcoin addresses. This file comprises all addresses appearing in the txs data set.
labels.csv | A CSV file containing categorical entity labels for Bitcoin addresses that appeared in transactions between 2010 and 2018. This file has been derived from the Entity-Address data set [2, 3] (see also: https://github.com/Maru92/EntityAddressBitcoin).
outputs_spent_stats.csv | A CSV file containing statistics about all spent outputs in the first 479,970 blocks of the Bitcoin blockchain. The file describes the durations of dust and non-dust outputs. The duration is defined as the difference between the height of the block where the output is spent and the height of the block where it was created.
cluster_sizes_*.csv | These CSV files contain information about clusters of addresses induced by Bitcoin transactions. They have been used for the clustering analysis presented in [4]. See this GitHub repository for more information.

Transaction representation

The txs file contains a textual representation of dust transactions in the Bitcoin blockchain. Each row of the file corresponds to a transaction and is represented as a sequence of fields info:inputs:outputs with the following meaning.

The info section contains general information about the transaction. It is represented as a list of comma-separated fields, namely: timestamp,blockId,txId,isCoinbase,fee,approxSize. The meaning of the fields is the following:

  • timestamp represents the Unix timestamp of the block containing the transaction.
  • blockId represents the height of the block containing the transaction.
  • txId is a numeric value that uniquely identifies the transaction.
  • isCoinbase is equal to 1 if the transaction is a coinbase transaction, 0 otherwise.
  • fee denotes the transaction fee, expressed in satoshis (i.e., the smallest bitcoin denomination).
  • approxSize denotes the approximate size of the transaction (expressed in bytes).

The inputs section contains a sequence of (0 or more) transaction inputs separated by a semicolon. Each input, in turn, is represented as a comma-separated string addrId,amount,prevTxId,offset where:

  • addrId represents the numeric identifier of the spending address;
  • amount is the amount of value associated with the input (expressed in satoshis);
  • prevTxId represents the numeric identifier of the transaction that created the output that is currently being spent;
  • offset represents the position, among all outputs of prevTxId, of the output that is currently being spent.

The outputs section contains a sequence of (1 or more) transaction outputs separated by a semicolon. Each output, in turn, is represented as a comma-separated string addrId,amount,scriptType where:

  • addrId represents the numeric identifier of the receiving address;
  • amount is the amount of value associated with the output (expressed in satoshis);
  • scriptType is a numeric identifier representing the type of the script associated with the output (i.e., 0=UNKNOWN; 1=P2PK; 2=P2PKH; 3=P2SH; 4=RETURN; 5=EMPTY).

Data analysis

Data included in this repository have been employed for the analyses presented in [4, 5]. This GitHub repository contains several tools, written in Java and Python, for analyzing the data.

Cite this work

If the data included in this repository have been useful, please cite the following article in your work.

@article{loporchio2023bitcoin,
  title={Is Bitcoin gathering dust? An analysis of low-amount Bitcoin transactions},
  author={Loporchio, Matteo and Bernasconi, Anna and Di Francesco Maesa, Damiano and Ricci, Laura},
  journal={Applied Network Science},
  volume={8},
  number={1},
  pages={1--28},
  year={2023},
  publisher={SpringerOpen}
}

References

[1] Pérez-Solà, Cristina, et al. "Another coin bites the dust: an analysis of dust in UTXO-based cryptocurrencies." Royal Society open science 6.1 (2019): 180817.
[2] Jourdan, Marc, et al. "Characterizing entities in the bitcoin blockchain." 2018 IEEE international conference on data mining workshops (ICDMW). IEEE, 2018.
[3] Jourdan, Marc, et al. "A probabilistic model of the bitcoin blockchain." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 2019.
[4] Loporchio, Matteo, et al. "Is Bitcoin gathering dust? An analysis of low-amount Bitcoin transactions." Applied Network Science 8.1 (2023): 1-28.
[5] Loporchio, Matteo, et al. "An Analysis of Bitcoin Dust Through Authenticated Queries." Complex Networks and Their Applications XI: Proceedings of The Eleventh International Conference on Complex Networks and their Applications: COMPLEX NETWORKS 2022 - Volume 2. Cham: Springer International Publishing, 2023.
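
To make the info:inputs:outputs row layout of the txs file more concrete, here is a minimal Python sketch that parses a single row and flags dust amounts. It is independent of the Java and Python tools mentioned in the description. The names parse_tx, is_dust, and the dataclasses are hypothetical, the sample rows are made up for illustration, every field is assumed to be an integer, and the 1 to 545 satoshi range is assumed to be inclusive at both ends.

```python
from dataclasses import dataclass
from typing import List

# ASSUMPTION: the dust range "between 1 and 545 satoshis" is inclusive.
DUST_MIN, DUST_MAX = 1, 545


@dataclass
class TxInput:
    addr_id: int
    amount: int      # satoshis
    prev_tx_id: int
    offset: int


@dataclass
class TxOutput:
    addr_id: int
    amount: int      # satoshis
    script_type: int  # 0=UNKNOWN, 1=P2PK, 2=P2PKH, 3=P2SH, 4=RETURN, 5=EMPTY


@dataclass
class Transaction:
    timestamp: int
    block_id: int
    tx_id: int
    is_coinbase: bool
    fee: int
    approx_size: int
    inputs: List[TxInput]
    outputs: List[TxOutput]


def _parse_items(section: str, cls):
    """Split a semicolon-separated section into records of type cls."""
    return [cls(*map(int, item.split(","))) for item in section.split(";") if item]


def parse_tx(line: str) -> Transaction:
    """Parse one row of the form info:inputs:outputs.

    ASSUMPTION: all fields are integers and ':' only separates sections.
    """
    info, inputs, outputs = line.strip().split(":")
    timestamp, block_id, tx_id, is_coinbase, fee, approx_size = map(int, info.split(","))
    return Transaction(
        timestamp, block_id, tx_id, bool(is_coinbase), fee, approx_size,
        _parse_items(inputs, TxInput), _parse_items(outputs, TxOutput),
    )


def is_dust(amount: int) -> bool:
    """Return True if the amount falls in the assumed dust range."""
    return DUST_MIN <= amount <= DUST_MAX


if __name__ == "__main__":
    # Made-up row: one input, two outputs.
    row = "1500000000,479000,12345,0,226,225:17,1000,12000,1:18,500,2;19,274,2"
    tx = parse_tx(row)
    dust_outputs = [o for o in tx.outputs if is_dust(o.amount)]
    print(tx.tx_id, len(tx.inputs), len(dust_outputs))
```

The empty-item filter in _parse_items lets the same code handle rows whose inputs section is empty, which the format allows (0 or more inputs).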

Authors

  • Loporchio, Matteo
  • Di Francesco Maesa, Damiano
0 Citations | 1 Mention | 77% FAIR | 2.4 Dataset Index
10.5281/zenodo.7696454 | March 2023

Bitcoin dust transactions (Version: 1.0.0)


Authors

  • Loporchio, Matteo
  • Di Francesco Maesa, Damiano
0 Citations | 0 Mentions | 77% FAIR | 1.9 Dataset Index
10.5281/zenodo.7696453 | March 2023