Scholar Data

Datasets

Data for Extremely sparse models of linkage disequilibrium in ancestrally diverse association studies

Data from Extremely sparse models of linkage disequilibrium in ancestrally diverse association studies (2023). This includes linkage disequilibrium graphical models (LDGMs) created from high-coverage 1000 Genomes Project sequencing data.

This dataset consists of LDGM precision matrices, LDGM graphical models of SNPs, and lists of SNPs, all split into 1,361 approximately independent LD blocks across the genome. The dataset additionally contains genotype information from chromosomes 21 and 22, inferred tree sequences of the high-coverage 1000 Genomes Project data, summary statistics from four traits in the UK Biobank, and UK Biobank correlation matrices from chromosomes 21 and 22. All genomic data is in the GRCh38 build.

The data can be cited as follows:

Pouria Salehi Nowbandegani, Anthony Wilder Wohns, Jenna L. Ballard, Eric S. Lander, Alex Bloemendal, Benjamin M. Neale, and Luke J. O’Connor. Extremely sparse models of linkage disequilibrium in ancestrally diverse association studies. Nat Genet. (2023) DOI: 10.1038/s41588-023-01487-8

The directory contains .tar.gz files, which can be extracted and unzipped with:

$ tar -xvf FILENAME.tar.gz
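For scripted workflows, the same extraction step can be sketched with Python's standard tarfile module. The helper name extract_archive and the destination directory are illustrative assumptions, not part of the dataset's own tooling; mode "r:*" is used so that the helper works whether or not the archive is gzip-compressed.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Extract a .tar or .tar.gz archive into dest_dir and return member names.

    Hypothetical helper for illustration; it is not shipped with the dataset.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression (gzip or none) automatically.
    with tarfile.open(archive_path, "r:*") as archive:
        names = archive.getnames()
        archive.extractall(dest)
    return names
```

Calling extract_archive("FILENAME.tar.gz", "data") would unpack the archive into a data/ directory, mirroring the tar command above.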

All LD block files are named by chromosome and start/end basepair coordinates.

1kg_nygc_trios_removed_All_pops_geno_ids_pops.csv: This file contains 5008 rows, 2 for each individual in the 1000 Genomes Project. Each row contains the individual ID of the 1000 Genomes individual, and the ancestry group and continental ancestry group that the individual was assigned to. Rows correspond to columns in .genos files.

AFR/AMR/EAS/EUR/SAS.precision.tar.gz: Precision matrices for the relevant ancestry group for each LD block. Edge lists contain one row for each non-zero entry of the precision matrix. There are no column names.

genos_chr21_22.tar.gz: For the 40 LD blocks on chromosomes 21-22, .genos files are 0/1 matrices with dimension number-of-SNPs by number-of-samples. Each LD matrix contains one column for each row in the SNP list files, and one row for each row in the sample ID files.

ldgms.tar.gz: 1361 LDGMs (*.edgelist files). Edge lists contain one row for each non-zero entry of the LDGM adjacency matrix. There is one LDGM edge list for each LD block. Each row represents an edge, as a tuple (index_1, index_2, entry). For the LDGM adjacency matrices, the entry is the edge weight, where 0 represents a strong dependency and, for example, 6 represents a weak dependency.

snplists_GRch38positions.tar.gz: 1361 *.snplist files, each of which contains information on the SNPs in one LD block. Each SNP list is an n x 11 table (n = number of SNPs). The columns are:

    index: non-unique indices, starting at zero, that correspond to rows and columns of the LDGMs. There can be multiple SNPs for a single index, which occurs when the corresponding mutations occur on the same brick of the bricked tree sequence. SNPs with the same index have high (nearly perfect) LD.
    anc_alleles: ancestral allele
    deriv_alleles: derived allele
    EUR: allele frequency of derived allele in EUR samples
    EAS: allele frequency of derived allele in EAS samples
    AMR: allele frequency of derived allele in AMR samples
    SAS: allele frequency of derived allele in SAS samples
    AFR: allele frequency of derived allele in AFR samples
    site_ids: unique identifier of each SNP, mostly as RSIDs
    position: GRCh38 position of SNP
    swap: indicates strandness swap

ukb.tar: Correlation matrices and SNP lists for SNPs in the UK Biobank.

    correlation_matrices/: Correlation matrices for SNPs in the UK Biobank, computed by Weissbrod et al. 2020 Nat Genet; they can be downloaded by following the instructions here.
    snplists/: List of SNPs, in the *.snplist format, included in the UK Biobank

tree_seqs.tar: Contains 22 tree sequences inferred by tsinfer from the 30x 1000 Genomes Project data. Tree sequences can be unzipped with tszip.

Summary statistics: There are four summary statistics files, obtained from https://alkesgroup.broadinstitute.org/UKBB/, and computed by Loh et al. 2018 Nat Genet.

    Phenotype               Heritability estimate   Effective sample size   Number of SNPs
    Height                  0.570                   650K                    12 Million
    Body mass index         0.303                   500K                    12 Million
    Cardiovascular disease  0.155                   450K                    12 Million
    Type 2 diabetes         0.073                   450K                    12 Million
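The edge-list layout described above, with one (index_1, index_2, entry) row per non-zero entry and no header, can be read into a sparse structure with the standard library alone. The following is a minimal sketch under stated assumptions: the fields are taken to be comma-separated and each off-diagonal entry is taken to be listed once, neither of which is confirmed by the description. The function name read_edgelist and the three sample rows are hypothetical and are not drawn from the dataset.

```python
import csv
import io
from collections import defaultdict


def read_edgelist(handle, delimiter=","):
    """Parse rows of (index_1, index_2, entry) into a symmetric dict-of-dicts.

    Assumptions (not confirmed by the dataset description): fields are
    comma-separated, and each off-diagonal entry appears only once.
    """
    matrix = defaultdict(dict)
    for row in csv.reader(handle, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        i, j, value = int(row[0]), int(row[1]), float(row[2])
        matrix[i][j] = value
        matrix[j][i] = value  # mirror the entry so lookups work in both orders
    return dict(matrix)


# Hypothetical three-row example, not taken from the dataset.
example = io.StringIO("0,0,1.5\n0,1,-0.25\n1,1,2.0\n")
precision = read_edgelist(example)
```

The zero-based keys of the resulting dictionary correspond to the index column of the matching *.snplist file, so several SNPs may map to the same key. For a real file, the handle would come from open() on an extracted *.edgelist path, and the delimiter should be checked against the file contents first.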

Authors

Nowbandegani, Pouria Salehi ;
Wohns, Anthony Wilder ;
Ballard, Jenna ;
Lander, Eric ;
Bloemendal, Alex ;
Neale, Ben ;
O'Connor, Luke

1 Citation, 0 Mentions, 79% FAIR, 0.7 Dataset Index

10.5281/zenodo.8157131, 2023

Data for Extremely sparse models of linkage disequilibrium in ancestrally diverse association studies

$ tar -xvf FILENAME.tar.gz

Authors

Nowbandegani, Pouria Salehi ;
Wohns, Anthony Wilder ;
Ballard, Jenna ;
Lander, Eric ;
Bloemendal, Alex ;
Neale, Ben ;
O'Connor, Luke

0 Citations, 0 Mentions, 79% FAIR, 0.3 Dataset Index

10.5281/zenodo.8157130, 2023

Automated Author Profile
Neale, Ben
Broad Institute of MIT and Harvard, Massachusetts General Hospital
0000-0003-1513-6077

