Scholar Data

Datasets

Rare-Variant Extensions of the Transmission Disequilibrium Test: Application to Autism Exome Sequence Data

Many population-based rare-variant (RV) association tests, which aggregate variants across a region, have been developed to analyze sequence data. A drawback of analyzing population-based data is that it is difficult to adequately control for population substructure and admixture, and spurious associations can occur. For RVs, this problem can be substantial, because the spectrum of rare variation can differ greatly between populations. A solution is to analyze parent-child trio data using the transmission disequilibrium test (TDT), which is robust to population substructure and admixture. We extended the TDT to test for RV associations using four commonly used methods. We demonstrate that for all RV-TDT methods, using proper analysis strategies, type I error is well controlled even when there are high levels of population substructure or admixture. For trio data, unlike for population-based data, RV allele-counting association methods will lead to inflated type I errors. However, type I errors can be properly controlled by obtaining p values empirically through haplotype permutation. The power of the RV-TDT methods was evaluated and compared to the analysis of case-control data with a number of genetic and disease models. The RV-TDT was also used to analyze exome data from 199 Simons Simplex Collection autism trios, and an association was observed with variants in ABCA7. Given the problem of adequately controlling for population substructure and admixture in RV association studies and the growing number of sequence-based trio studies, the RV-TDT is extremely beneficial for elucidating the involvement of RVs in the etiology of complex traits.
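For readers unfamiliar with the test being extended, the following is a minimal sketch of the classical single-variant TDT only. It is not the RV-TDT described above, which aggregates variants across a region and obtains p values by haplotype permutation. The function names and the example counts are hypothetical and are not taken from the dataset.

```python
from math import erfc, sqrt


def tdt_statistic(transmitted, untransmitted):
    """Classical single-variant TDT chi-square statistic (1 degree of freedom).

    transmitted:   times heterozygous parents passed the variant allele
                   to the affected child
    untransmitted: times heterozygous parents passed the other allele
    Both counts are illustrative inputs, not values from the dataset.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    return (transmitted - untransmitted) ** 2 / total


def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square variable with 1 df."""
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(stat / 2.0))


# Hypothetical counts: 30 transmissions versus 10 non-transmissions.
stat = tdt_statistic(30, 10)
pval = chi2_1df_pvalue(stat)
print(stat, pval)
```

Because only transmissions from heterozygous parents are counted, each parent serves as its own matched control, which is the source of the robustness to population substructure noted in the abstract.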

Authors

Eichler, E ;
Shendure, J ;
Nickerson, D ;
Krumm, N ;
Kan, M ;
Li, B ;
Santos-Cortez, R ;
Hooker, S ;
Wang, G ;
Smith, J ;
O'Roak, B ;
He, Z ;
Leal, Suzanne

0 Citations, 0 Mentions, 31% FAIR, 0.3 Dataset Index

10.15154/1177023, January 2016

Genomic data from Escherichia coli O104:H4 isolate TY-2482

The May 2011 outbreak of an E. coli infection in Europe resulted in serious concerns about the potential appearance of a new deadly strain of bacteria, Escherichia coli O104:H4 TY-2482. In response to this situation, and immediately after the reports of deaths, the University Medical Centre Hamburg-Eppendorf and BGI-Shenzhen worked together to sequence the bacterium and assess its human health risk.

The bacterium's genome was first sequenced using Life Technologies' Ion Torrent sequencing platform. According to the results of the draft assembly, the estimated genome size of this new E. coli strain is about 5.2 Mb. Sequence analysis indicated this bacterium is an EHEC serotype O104 E. coli strain. Comparative analysis showed that this bacterium has 93% sequence similarity with the EAEC 55989 E. coli strain, which was isolated in the Central African Republic and is known to cause serious diarrhea. This strain of E. coli, however, has also acquired specific sequences that appear to be similar to those involved in the pathogenicity of hemorrhagic colitis and hemolytic-uremic syndrome. The acquisition of these genes may have occurred through horizontal gene transfer.

To maximize its utility to the research community and aid those fighting the epidemic, this genomic data was released into the public domain under a CC0 license.

To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to genomic data from the 2011 E. coli outbreak. This work is published from China.

Authors

Li, Dongfang ;
Xi, Feng ;
Zhao, Meiru ;
Chen, Wentong ;
Cao, S ;
Xu, R ;
Wang, G ;
Wang, J ;
Zhang, Zhaoxi ;
Li, Yin ;
Cui, C ;
Chang, C ;
Cui, C ;
Luo, Y ;
Qin, Junjie ;
Li, Shenghui ;
Li, Junhua ;
Peng, Yangqing ;
Pu, Fei ;
Sun, Y ;
Chen, Y ;
Zong, Y ;
Ma, X ;
Yang, Xianwei ;
Cen, Zhong ;
Song, Yajun ;
Zhao, Xiangna ;
Chen, F ;
Yin, X ;
Rohde, Holger ;
Liang, Y ;
Li, Yingrui ;
The Escherichia coli O104:H4 TY-2482 Isolate Genome Sequencing Consortium

13 Citations, 0 Mentions, 31% FAIR, 6.7 Dataset Index

10.5524/100001, January 2011

Automated Author Profile
Wang, G

Wang, G

Current S-Index

Average Dataset Index per Dataset

Total Datasets

Average FAIR Score

Total Citations

Total Mentions

S-Index Interpretation

S-Index Over Time

Cumulative Citations Over Time

Cumulative Mentions Over Time

Datasets

Rare-Variant Extensions of the Transmission Disequilibrium Test: Application to Autism Exome Sequence Data

Genomic data from Escherichia coli O104:H4 isolate TY-2482
