Published on 01 January 2021
Supporting data for "Vulcan: Improved long-read mapping and structural variant calling via dual-mode alignment"
Description
Long-read sequencing has enabled unprecedented surveys of structural variation across the entire human genome. To maximize the potential of long-read sequencing in this context, novel mapping methods have emerged that have primarily focused on either speed or accuracy. Various heuristics and scoring schemas have been implemented in widely used read mappers (minimap2 and NGMLR) to optimize for speed or accuracy, which have variable performance across different genomic regions and for specific structural variants. Our hypothesis is that constraining read mapping to the use of a single gap penalty across distinct mutational hotspots reduces read alignment accuracy and impedes structural variant detection.
We tested our hypothesis by implementing a read mapping pipeline called Vulcan that uses two distinct gap penalty modes, which we refer to as dual-mode alignment. The high-level idea is that Vulcan first maps reads with a fast mapper (e.g., minimap2), uses the normalized edit distance of the mapped reads to identify those that are poorly aligned, and then realigns them with NGMLR, a long-read mapper that is more accurate but computationally more expensive. In support of our hypothesis, we show that Vulcan improves the alignments of Oxford Nanopore Technologies (ONT) long reads for both simulated and real datasets. These improvements, in turn, lead to more accurate structural variant calling on human genome datasets compared to either read mapping method alone.
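The read-selection step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not Vulcan's actual implementation: the function names, the input format (read name mapped to an edit distance and an aligned length, as might be taken from the NM tag and alignment span of a SAM record), the definition of normalized edit distance as edit distance divided by aligned length, and the keep_fraction cutoff are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def normalized_edit_distance(edit_distance: int, aligned_length: int) -> float:
    """Edit distance per aligned base (assumed definition for this sketch)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return edit_distance / aligned_length


def split_reads(
    reads: Dict[str, Tuple[int, int]],
    keep_fraction: float = 0.9,  # hypothetical cutoff, not taken from the source
) -> Tuple[List[str], List[str]]:
    """Partition reads into (keep, realign) by normalized edit distance.

    `reads` maps a read name to (edit_distance, aligned_length) from the
    first, fast mapping pass. The best-scoring `keep_fraction` of reads keep
    their alignment; the rest are flagged for realignment with the slower,
    more accurate mapper.
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")
    # Sort from lowest (best) to highest (worst) normalized edit distance.
    scored = sorted(
        (normalized_edit_distance(nm, length), name)
        for name, (nm, length) in reads.items()
    )
    n_keep = int(len(scored) * keep_fraction)
    keep = [name for _, name in scored[:n_keep]]
    realign = [name for _, name in scored[n_keep:]]
    return keep, realign


# Toy input: read name -> (edit distance, aligned length).
first_pass = {
    "read1": (5, 1000),
    "read2": (50, 1000),
    "read3": (300, 1000),
    "read4": (10, 1000),
}
keep, realign = split_reads(first_pass, keep_fraction=0.75)
```

With this toy input, the three reads with the lowest normalized edit distance keep their first-pass alignment, and read3 is flagged for realignment. In a real pipeline the flagged reads would be written out and passed to the second mapper, and the two sets of alignments would then be merged.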
Vulcan is the first long-read mapping framework that combines two distinct gap penalty modes, resulting in improved structural variant recall and precision. Vulcan is open-source and available under the MIT License at https://gitlab.com/treangenlab/vulcan
PLEASE NOTE: Most of the files associated with this dataset are hosted in cold storage. Please contact us via email with the details of the dataset and any particular files you would like to download, and we will be happy to make them available to you.
Citations (1)
- https://doi.org/10.1093/gigascience/giab063 (DataCite MDC)
Cited on 01 September 2021
Weight: 1.00
Mentions (0)
No mentions found
Publication Details
Subfield: Computational Theory and Mathematics
Field: Computer Science
Domain: Physical Sciences
Confidence Score: 45%
Source: Scholar Data Model