Scholar Data

Datasets

LO2: Microservice Dataset of Logs and Metrics

LO2 dataset

This is the data repository for the LO2 dataset. Here is an overview of the contents.

lo2-data.zip
This is the main dataset. This is the completely unedited output of our data collection process. Note that the uncompressed size is around 540 GB. For more information, see the paper and the data-appendix in this repository.

lo2-sample.zip
This is a sample that contains the data used for preliminary analysis. It contains only service logs and the most relevant metrics for the first 100 runs. Furthermore, the metrics are combined on a run level to a single csv to make them easier to utilize.

data-appendix.pdf
This document contains further details and stats about the full dataset. These include file size distributions, empty file analysis, log type analysis and the appearance of an unknown file.

lo2-scripts.zip
Various scripts for processing the data to create the sample, to conduct the preliminary analysis and to create the statistics seen in the data-appendix.

csv_generator.py, csv_merge*.py: These scripts create and combine the metrics into csv files. They need to be run in order. Merging runs to global is very memory intensive.
findempty.py: Finds empty files in the folders. As some are expected to be empty, it also counts the unexpected ones. Used in data-appendix.
loglead_lo2.py: Script for the preliminary analysis of the logs for error detection. Requires LogLead version 1.2.1.
logstats.py: Counts log lines and their type. Used for creating the figure of number of lines per type and service.
node_exporter_metrics.txt: Metric descriptions exported from Prometheus (text file).
pca.py: The Principal Component Analysis script used for preliminary analysis.
reduce_logs.py: Very important for fair analysis as in the beginning of the files there are some initialization rows that leak information regarding correctness.
requirements.txt: Required Python libraries to run the scripts.
sizedist.py: Creating distributions of file sizes per filename for the data-appendix.

Version v3: Updated data appendix introduction, added another stage in the log analysis process in loglead_lo2.py
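The empty-file check described for findempty.py can be illustrated with a short sketch. This is not the script shipped in lo2-scripts.zip. The function name find_empty_files, the expected_empty parameter and the example file name are hypothetical, and it assumes that a file counts as expected-empty purely on the basis of its file name.

```python
import os
from collections import Counter


def find_empty_files(root, expected_empty=frozenset()):
    """Walk `root` and count zero-byte files.

    `expected_empty` is a hypothetical set of file names that are
    allowed to be empty; every other empty file is reported as unexpected.
    """
    counts = Counter()
    unexpected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) != 0:
                continue  # non-empty files are not of interest here
            if name in expected_empty:
                counts["expected"] += 1
            else:
                counts["unexpected"] += 1
                unexpected.append(path)
    return counts, sorted(unexpected)
```

Calling find_empty_files("lo2-data", expected_empty={"some-file.log"}) on an extracted copy would return the two counts together with the sorted paths of the unexpected empty files. Both the directory name and the file name in that call are placeholders.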

Authors

Bakhtin, Alexander ;
Nyyssölä, Jesse ;
Wang, Yuqing ;
Ahmad, Noman ;
Ping, Ke ;
Esposito, Matteo ;
Mäntylä, Mika ;
Taibi, Davide

1 Citation, 0 Mentions, 13% FAIR, 0.7 Dataset Index

10.5281/zenodo.14257989, February 2025

LO2: Microservice Dataset of Logs and Metrics

Authors

Bakhtin, Alexander ;
Nyyssölä, Jesse ;
Wang, Yuqing ;
Ahmad, Noman ;
Ping, Ke ;
Esposito, Matteo ;
Mäntylä, Mika ;
Taibi, Davide

0 Citations, 0 Mentions, 73% FAIR, 1.8 Dataset Index

10.5281/zenodo.14938118, February 2025

Automated Author Profile
Wang, Yuqing
University of Helsinki
0000-0003-0175-005x

