Published on 08 May 2025

Phylogenetically compressed 661k collection (.tar.xz) - High quality genomes

Truong, Tam

Description

Phylogenetically compressed 661k collection (high-quality genomes only), with XZ as the low-level compressor. More information can be found at https://brinda.eu/mof/.

Batching is done with the GTDB-Tk classification from https://bakrep.computational.bio/search.

Criteria for genomes:
Size between 100 kbp and 15 Mbp
Fewer than 2,000 contigs
N50 greater than 5,000
Completeness ≥ 90%
Contamination ≤ 5%

These criteria yield 631,561 assemblies, compared to the 639,981 reported in the original 661k paper.

Due to file count limitations on Zenodo, the 284 batches are grouped into 99 tar files. Metadata is provided to indicate which tar file contains each batch.

Citation:

Břinda, K., Lima, L., Pignotti, S. et al. Efficient and robust search of microbial genomes via phylogenetic compression. Nat Methods 22, 692–697 (2025). https://doi.org/10.1038/s41592-025-02625-2

Grace A. Blackwell et al., “Exploring Bacterial Diversity via a Curated and Searchable Snapshot of Archived DNA Sequences,” PLOS Biology 19, no. 11 (November 9, 2021): e3001421, https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001421.

Linda Fenske et al., “BakRep – a Searchable Large-Scale Web Repository for Bacterial Genomes, Characterizations and Metadata,” Microbial Genomics 10, no. 10 (2024): 001305, https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.001305.
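
The genome criteria in the description can be read as a simple per-assembly filter. The sketch below is illustrative only and is not the code used to build this dataset. The class name, field names, and function name are hypothetical, and two points are assumptions that the description does not settle: the size bounds are treated as inclusive, and N50 is taken to be in base pairs.

```python
from dataclasses import dataclass


@dataclass
class AssemblyStats:
    """Per-assembly statistics (hypothetical field names)."""
    genome_size_bp: int
    n_contigs: int
    n50_bp: int  # assumption: N50 is expressed in base pairs
    completeness_pct: float
    contamination_pct: float


def is_high_quality(stats: AssemblyStats) -> bool:
    """Return True if the assembly meets all five stated criteria."""
    return (
        # Size between 100 kbp and 15 Mbp (assumption: bounds inclusive)
        100_000 <= stats.genome_size_bp <= 15_000_000
        # Fewer than 2,000 contigs
        and stats.n_contigs < 2_000
        # N50 greater than 5,000
        and stats.n50_bp > 5_000
        # Completeness >= 90%
        and stats.completeness_pct >= 90.0
        # Contamination <= 5%
        and stats.contamination_pct <= 5.0
    )


# Made-up example values, not taken from the dataset.
example = AssemblyStats(
    genome_size_bp=4_600_000,
    n_contigs=85,
    n50_bp=120_000,
    completeness_pct=99.2,
    contamination_pct=0.4,
)
passes = is_high_quality(example)
```

An assembly failing any single criterion is excluded, which is how the collection goes from the 639,981 assemblies of the original paper to 631,561.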

Metrics

Dataset Index

0.8

FAIR Score

73%

Citations

0

Mentions

0

Publication Details

DOI

Publisher

Zenodo

Assigned Domain

Subfield

Molecular Biology

Field

Biochemistry, Genetics and Molecular Biology

Domain

Life Sciences

Confidence Score

45%

Source

Scholar Data Model

Normalization Factors

FT

30.77

CTw

1.00

MTw

1.00