Automated Author Profile
Neale, Ben
Broad Institute of MIT and Harvard, Massachusetts General Hospital
ORCID: 0000-0003-1513-6077
Current S-Index
Sum of Dataset Indices for all datasets
Average Dataset Index per Dataset
Average Dataset Index per dataset
Total Datasets
Total datasets for this author
Average FAIR Score
Average FAIR Score per dataset
Total Citations
Total citations to the author's datasets
Total Mentions
Total mentions of the author's datasets
S-Index Interpretation
The S-Index (Sharing Index) is a comprehensive metric that represents the cumulative impact of all your datasets. It is calculated as the sum of Dataset Index scores across all your claimed datasets.
What it means:
- A higher S-Index indicates greater overall impact of your datasets relative to typical datasets in their fields of research
- The S-Index grows as you add more datasets or as existing datasets gain more citations and mentions
- It provides a single number to track your research data impact over time
Current S-Index: 1.0 (sum of the Dataset Index scores of 2 datasets)
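The calculation described above can be sketched in a few lines. This is only an illustration of the stated definition (S-Index as the sum of Dataset Index scores); the per-dataset values below are hypothetical, chosen so that they sum to the reported 1.0, and the function name `s_index` is not part of any published tool.

```python
def s_index(dataset_indices):
    """Return the S-Index: the sum of per-dataset Dataset Index scores."""
    return sum(dataset_indices)


# Hypothetical per-dataset scores (the profile reports only the total of 1.0
# across 2 datasets, not the individual values).
hypothetical_indices = [0.5, 0.5]

total = s_index(hypothetical_indices)
average = total / len(hypothetical_indices)  # Average Dataset Index per dataset
```

Adding a dataset, or raising any existing Dataset Index score, increases the total; this is why the S-Index grows with new datasets and new citations or mentions.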
S-Index Over Time
Cumulative Citations Over Time
Cumulative Mentions Over Time
Datasets
Data from Extremely sparse models of linkage disequilibrium in ancestrally diverse association studies (2023). This includes linkage disequilibrium graphical models (LDGMs) created from high-coverage 1000 Genomes Project sequencing data. This dataset consists of LDGM precision matrices, LDGM graphical models of SNPs, and lists of SNPs, all split into 1,361 approximately independent LD blocks across the genome. The dataset additionally contains genotype information from chromosomes 21 and 22, inferred tree sequences of high-coverage 1000 Genomes Project data, summary statistics from four traits in the UK Biobank, and UK Biobank correlation matrices from chromosomes 21 and 22. All genomic data is in the GRCh38 build.

The data can be cited as follows: Pouria Salehi Nowbandegani, Anthony Wilder Wohns, Jenna L. Ballard, Eric S. Lander, Alex Bloemendal, Benjamin M. Neale, and Luke J. O’Connor. Extremely sparse models of linkage disequilibrium in ancestrally diverse association studies. Nat Genet. (2023) DOI: 10.1038/s41588-023-01487-8

The directory contains .tar.gz files, which can be extracted and unzipped with:
$ tar -xvf FILENAME.tar.gz

All LD block files are named by chromosome and start/end basepair coordinates.

- 1kg_nygc_trios_removed_All_pops_geno_ids_pops.csv: The file contains 5008 rows, 2 for each individual in the 1000 Genomes Project. Each row contains the individual ID of the 1000 Genomes individual, and the ancestry group and continental ancestry group that individual was assigned to. Rows correspond to columns in .genos files.
- AFR/AMR/EAS/EUR/SAS.precision.tar.gz: Precision matrices for the relevant ancestry group for each LD block. Edge lists contain one row for each non-zero entry of the precision matrix. There are no column names.
- genos_chr21_22.tar.gz: For the 40 LD blocks on chromosomes 21-22, .genos files are 0/1 matrices, with dimension number-of-SNPs by number-of-samples. Each LD matrix contains one column for each row in the SNP list files, and one row for each row in the sample ID files.
- ldgms.tar.gz: 1361 LDGMs (*.edgelist files). Edge lists contain one row for each non-zero entry of the LDGM adjacency matrix. There is one LDGM edge list for each LD block. Each row represents an edge, as a tuple (index_1, index_2, entry). For the LDGM adjacency matrices, the entry is the edge weight, where 0 represents a strong dependency and e.g. 6 represents a weak dependency.
- snplists_GRch38positions.tar.gz: 1361 *.snplist files, each of which contains information on the SNPs in each LD block. Each SNP list is an n x 11 table (n = number of SNPs), one for each LD block. The columns are:
  - index: these non-unique indices, starting at zero, correspond to rows and columns of the LDGMs. There can be multiple SNPs for a single index, which occurs when the corresponding mutations occur on the same brick of the bricked tree sequence. SNPs with the same index have high (nearly perfect) LD.
  - anc_alleles: ancestral allele
  - deriv_alleles: derived allele
  - EUR: allele frequency of derived allele in EUR samples
  - EAS: allele frequency of derived allele in EAS samples
  - AMR: allele frequency of derived allele in AMR samples
  - SAS: allele frequency of derived allele in SAS samples
  - AFR: allele frequency of derived allele in AFR samples
  - site_ids: unique identifier of each SNP, mostly as RSIDs
  - position: GRCh38 position of SNP
  - swap: indicates strandness swap
- ukb.tar: Correlation matrices and SNP lists for SNPs in the UK Biobank.
  - correlation_matrices/: Correlation matrices for SNPs in the UK Biobank, computed by Weissbrod et al. 2020 Nat Genet and can be downloaded by following the instructions here.
  - snplists/: List of SNPs in the *.snplist format included in the UK Biobank
- tree_seqs.tar: Contains 22 tree sequences inferred by tsinfer from the 30x 1000 Genomes Project data. Tree sequences can be unzipped with tszip.
- Summary statistics: There are four summary statistics files, obtained from https://alkesgroup.broadinstitute.org/UKBB/, and computed by Loh et al. 2018 Nat Genet.

| Phenotype | Heritability estimate | Effective sample size | Number of SNPs |
| Height | 0.570 | 650K | 12 Million |
| Body mass index | 0.303 | 500K | 12 Million |
| Cardiovascular disease | 0.155 | 450K | 12 Million |
| Type 2 diabetes | 0.073 | 450K | 12 Million |
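The edge-list and SNP-list layouts described above lend themselves to a short parsing sketch. The following is a minimal, standard-library-only illustration, not the authors' own tooling. It assumes the files are comma-delimited and that .snplist files carry a header row with the column names listed above; neither detail is stated in the description, so check a real file first. The function names and the toy data are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_edgelist(handle, delimiter=","):
    """Parse rows of (index_1, index_2, entry) into a sparse dict.

    Assumes no header row, as the description says there are no column names.
    """
    entries = {}
    for row in csv.reader(handle, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        i, j, value = int(row[0]), int(row[1]), float(row[2])
        entries[(i, j)] = value
    return entries


def snps_by_index(handle, delimiter=","):
    """Group site_ids by their non-unique LDGM index.

    Assumes a header row naming the 'index' and 'site_ids' columns.
    """
    groups = defaultdict(list)
    for record in csv.DictReader(handle, delimiter=delimiter):
        groups[int(record["index"])].append(record["site_ids"])
    return dict(groups)


# Made-up toy data for illustration only; not taken from the dataset.
toy_edgelist = io.StringIO("0,0,2.5\n0,1,-0.75\n1,1,1.5\n")
toy_snplist = io.StringIO(
    "index,anc_alleles,deriv_alleles,EUR,EAS,AMR,SAS,AFR,site_ids,position,swap\n"
    "0,A,G,0.1,0.2,0.1,0.3,0.4,rs1,100,0\n"
    "0,C,T,0.1,0.2,0.1,0.3,0.4,rs2,150,0\n"
    "1,G,A,0.5,0.4,0.5,0.6,0.2,rs3,200,0\n"
)

precision = read_edgelist(toy_edgelist)
groups = snps_by_index(toy_snplist)

# Matrix dimension implied by the largest index seen in the edge list.
n = max(max(i, j) for i, j in precision) + 1
```

Grouping by index matters because several SNPs can share one row and column of an LDGM; in the toy data, the two SNPs with index 0 map to the same matrix position. For real analyses a sparse-matrix library would be a more practical container than a dict.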
Authors
- Nowbandegani, Pouria Salehi ;
- Wohns, Anthony Wilder ;
- Ballard, Jenna ;
- Lander, Eric ;
- Bloemendal, Alex ;
- Neale, Ben ;
- O'Connor, Luke
Data from Extremely sparse models of linkage disequilibrium in ancestrally diverse association studies (2023). This second record carries the same description and file listing as the first dataset entry above.
Authors
- Nowbandegani, Pouria Salehi ;
- Wohns, Anthony Wilder ;
- Ballard, Jenna ;
- Lander, Eric ;
- Bloemendal, Alex ;
- Neale, Ben ;
- O'Connor, Luke