Published on 14 September 2020

Data for "Learning the language of viral evolution and escape"

Hie, Brian

Description

Training data from:
Influenza A HA protein sequences from the NIAID Influenza Research Database (IRD) (http://www.fludb.org)
HIV-1 Env protein sequences from the Los Alamos National Laboratory (LANL) HIV database (https://www.hiv.lanl.gov)
Coronaviridae spike protein sequences from the Virus Pathogen Resource (ViPR) database (https://www.viprbrc.org/brc/home.spg?decorator=corona)
SARS-CoV-2 Spike protein sequences from NCBI Virus (https://www.ncbi.nlm.nih.gov/labs/virus/vssi/)
SARS-CoV-2 Spike and other Betacoronavirus spike protein sequences from GISAID (https://www.gisaid.org/)

Datasets for fitness and escape validation:
Fitness single-residue DMS of HA H1 WSN33 from Doud and Bloom (2016)
Fitness combinatorial DMS of antigenic site B in six HA H3 strains from Wu et al. (2020)
Fitness single-residue DMS of Env BF520 and BG505 from Haddox et al. (2018)
ACE2 binding affinity combinatorial DMS of Spike from Starr et al. (2020)
Escape single-residue DMS of HA H1 WSN33 from Doud et al. (2018)
Escape single-residue DMS of HA H3 Perth09 from Lee et al. (2019)
Escape single-residue DMS of Env BG505 from Dingens et al. (2019)
Escape mutations of Spike from Baum et al. (2020)
Escape single-residue DMS of Spike from Greaney et al. (2020)

Metrics

Dataset Index

0.8

FAIR Score

73%

Citations

0

Mentions

0

Publication Details

DOI

Publisher

Zenodo

Assigned Domain

Subfield

Molecular Biology

Field

Biochemistry, Genetics and Molecular Biology

Domain

Life Sciences

Confidence Score

82%

Source

OpenAlex

Normalization Factors

FT

30.77

CTw

1.00

MTw

1.00