Published on 08 May 2025
Phylogenetically compressed 661k collection (.tar.xz) - High quality genomes
Description
Phylogenetically compressed 661k collection (high-quality genomes only) with XZ as a low-level compressor. More information can be found on https://brinda.eu/mof/.

Batching is done with the GTDBk classification from: https://bakrep.computational.bio/search.

Criteria for genomes:
- Size between 100 kbp and 15 Mbp
- Fewer than 2,000 contigs
- N50 greater than 5,000
- Completeness ≥ 90%
- Contamination ≤ 5%

These criteria yield 631,561 assemblies, compared to the 639,981 reported in the original 661k paper.

Due to file count limitations on Zenodo, the 284 batches are grouped into 99 tar files. Metadata is provided to indicate which tar file contains each batch.

Citation:

Břinda, K., Lima, L., Pignotti, S. et al. Efficient and robust search of microbial genomes via phylogenetic compression. Nat Methods 22, 692–697 (2025). https://doi.org/10.1038/s41592-025-02625-2

Grace A. Blackwell et al., “Exploring Bacterial Diversity via a Curated and Searchable Snapshot of Archived DNA Sequences,” PLOS Biology 19, no. 11 (November 9, 2021): e3001421, https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001421.

Linda Fenske et al., “BakRep – a Searchable Large-Scale Web Repository for Bacterial Genomes, Characterizations and Metadata,” Microbial Genomics 10, no. 10 (2024): 001305, https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.001305.
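The genome criteria stated above can be read as a simple per-assembly filter. The following Python sketch is only an illustration of that reading, not the pipeline used to build the collection. The `AssemblyStats` record and its field names are hypothetical and do not come from the dataset metadata. It also assumes that the size bounds are inclusive and that N50 is measured in base pairs, neither of which the description states.

```python
from dataclasses import dataclass


@dataclass
class AssemblyStats:
    # Hypothetical record: field names are illustrative, not the dataset's schema.
    size_bp: int          # total assembly length in base pairs
    n_contigs: int        # number of contigs
    n50: int              # N50, assumed to be in base pairs
    completeness: float   # percent
    contamination: float  # percent


def is_high_quality(a: AssemblyStats) -> bool:
    """Return True if the assembly meets all five stated criteria."""
    return (
        100_000 <= a.size_bp <= 15_000_000  # 100 kbp to 15 Mbp (bounds assumed inclusive)
        and a.n_contigs < 2_000             # fewer than 2,000 contigs
        and a.n50 > 5_000                   # N50 greater than 5,000
        and a.completeness >= 90.0          # completeness >= 90%
        and a.contamination <= 5.0          # contamination <= 5%
    )
```

An assembly must pass all five checks to be kept. For example, an assembly with exactly 2,000 contigs would be rejected under this reading, because the criterion is "fewer than 2,000".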
Publication Details
Subfield: Molecular Biology
Field: Biochemistry, Genetics and Molecular Biology
Domain: Life Sciences
Confidence Score: 45%
Source: Scholar Data Model