Version v.1.0

MACREL software benchmark data set: Simulated metagenomes with sequencing quality, errors profile and abundance distributions derived from real samples

Santos-Junior, Celio Dias;Pan, Shaojun;Zhao, Xing-Ming;Coelho, Luis Pedro

Description

These metagenomes were used to benchmark the FACS pipeline and were modeled on the NGLess benchmark dataset (doi.org/10.5281/zenodo.2560288). They were simulated with ART (ART-bin-MountRainier-2016.06.05) using real abundance profiles (.abund files) available elsewhere, with proGenomes' representative contigs as reference genomes. Metagenomes of 40, 60 and 80 million reads are available, based on the reference genomes and abundances of the following samples:

SAMEA2466916 SAMEA2466953 SAMEA2466965 SAMEA2621107 SAMEA2621229 SAMEA2621247
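
As a rough illustration of how an abundance profile can drive a simulation of a fixed size (40, 60 or 80 million reads), the sketch below splits a total read budget across genomes in proportion to their abundance. The two-column, tab-separated input layout and the function name allocate_reads are assumptions made for this example; they are not the actual .abund format or the authors' procedure.

```shell
#!/bin/sh
# Sketch: split a total read budget across genomes in proportion to abundance.
# Assumption: input has two tab-separated columns (genome id, abundance weight);
# this layout is hypothetical and may not match the real .abund files.

allocate_reads() {
    total=$1   # total number of reads to simulate, e.g. 40000000
    awk -F '\t' -v total="$total" '
        { id[NR] = $1; ab[NR] = $2; sum += $2 }
        END {
            # Round each share to the nearest whole read.
            for (i = 1; i <= NR; i++)
                printf "%s\t%d\n", id[i], int(total * ab[i] / sum + 0.5)
        }'
}

# Toy profile with three hypothetical genomes and a 40 M read budget.
printf 'genomeA\t5\ngenomeB\t3\ngenomeC\t2\n' | allocate_reads 40000000
```

Because each share is rounded independently, the per-genome counts may not sum exactly to the requested total for arbitrary profiles.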
To convert the metagenomes from the CRAM format back to FASTQ files:
 ## 1. converting from cram to bam format:
 samtools view -b -T refgenome.fa -o file.bam file.cram
 ## 2. sorting the bam file:
 samtools sort -n file.bam -o input_sorted.bam # sort reads by identifier-name (-n)
 ## 3. converting from bam to fastq format:
 bedtools bamtofastq -i input_sorted.bam -fq output_r1.fastq -fq2 output_r2.fastq
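
The three steps above can be wrapped in a small function so they are applied consistently to each downloaded file. This is only a sketch: the sample prefix, the refgenome.fa path, the output file names and the DRY_RUN switch are placeholders introduced for this example. By default it prints the commands instead of running them, so it can be checked without samtools or bedtools installed.

```shell
#!/bin/sh
# Sketch: print (dry run) or execute the three CRAM -> FASTQ steps for one file.
# DRY_RUN is a hypothetical switch for this example: 1 = print only, 0 = execute.
DRY_RUN=${DRY_RUN:-1}

run() {
    # Echo the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

cram_to_fastq() {
    prefix=$1   # input is expected at "$prefix.cram"
    ref=$2      # reference FASTA passed to samtools via -T

    # 1. CRAM -> BAM
    run samtools view -b -T "$ref" -o "$prefix.bam" "$prefix.cram" || return 1
    # 2. sort by read name (-n) so that mates are adjacent
    run samtools sort -n -o "$prefix.sorted.bam" "$prefix.bam" || return 1
    # 3. BAM -> paired FASTQ
    run bedtools bamtofastq -i "$prefix.sorted.bam" \
        -fq "${prefix}_r1.fastq" -fq2 "${prefix}_r2.fastq"
}

# Dry run with placeholder names: prints the three commands.
cram_to_fastq sample refgenome.fa
```

Setting DRY_RUN=0 before calling cram_to_fastq runs the commands for real; the intermediate BAM files are kept so that a failed step can be inspected.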


Metrics

Dataset Index

1.8

FAIR Score

73%

Citations

0

Mentions

0


Publication Details

DOI

Publisher

Zenodo

Assigned Domain

Subfield

Cancer Research

Field

Biochemistry, Genetics and Molecular Biology

Domain

Life Sciences

Confidence Score

85%

Source

Open Alex

Keywords

metagenome, simulation, human guts, FACS, benchmark

Normalization Factors

FT

13.46

CTw

1.00

MTw

1.00