Automated Author Profile
Yu, Chang
Current S-Index
Sum of Dataset Indices for all datasets
Average Dataset Index per Dataset
Average Dataset Index per dataset
Total Datasets
Total datasets for this author
Average FAIR Score
Average FAIR Score per dataset
Total Citations
Total citations to the author's datasets
Total Mentions
Total mentions of the author's datasets
S-Index Interpretation
The S-Index (Sharing Index) is a metric of the cumulative impact of all your datasets. It is calculated as the sum of the Dataset Index scores of all your claimed datasets.
What it means:
- A higher S-Index indicates greater overall impact of your datasets relative to typical datasets in their fields of research
- The S-Index grows as you add more datasets or as existing datasets gain more citations and mentions
- It provides a single number to track your research data impact over time
Current S-Index: 19.4 (sum of the Dataset Index scores of 13 datasets)
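As a minimal sketch of the arithmetic described above, the S-Index is a plain sum of Dataset Index scores, and the average Dataset Index divides that sum by the number of datasets. This page does not describe how each Dataset Index is computed, so the scores are treated as given inputs. The function names and example scores are hypothetical.

```python
from typing import Sequence


def s_index(dataset_indices: Sequence[float]) -> float:
    """S-Index: the sum of the Dataset Index scores of all claimed datasets."""
    return float(sum(dataset_indices))


def average_dataset_index(dataset_indices: Sequence[float]) -> float:
    """Average Dataset Index per dataset: the S-Index divided by the dataset count."""
    if len(dataset_indices) == 0:
        raise ValueError("at least one dataset is required")
    return s_index(dataset_indices) / len(dataset_indices)


if __name__ == "__main__":
    # Hypothetical Dataset Index scores for three datasets.
    scores = [1.0, 2.5, 0.5]
    print(s_index(scores))  # 4.0
    print(average_dataset_index(scores))
```

Applied to this profile's figures, an S-Index of 19.4 over 13 datasets corresponds to an average Dataset Index of about 1.49.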
S-Index Over Time
Cumulative Citations Over Time
Cumulative Mentions Over Time
Datasets
This dataset is from a study of genome-wide association (GWA) and genomic prediction of biomass yield and 14 yield-component traits in Miscanthus sacchariflorus. We evaluated a diversity panel of 590 accessions of M. sacchariflorus grown over four years at one subtropical and three temperate locations and genotyped with 268,109 single nucleotide polymorphisms (SNPs).
Authors
- Njuguna, Joyce ;
- Clark, Lindsay ;
- Lipka, Alexander ;
- Anzoua, Kossonou ;
- Bagmet, Larisa ;
- Chebukin, Pavel ;
- Dwiyanti, Maria ;
- Dzyubenko, Elena ;
- Dzyubenko, Nicolay ;
- Ghimire, Bimal ;
- Jin, Xiaoli ;
- Johnson, Douglas ;
- Nagano, Hironori ;
- Peng, Junhua ;
- Petersen, Karen ;
- Sabitov, Andrey ;
- Seong, Eun ;
- Yamada, Toshihiko ;
- Yoo, Ji ;
- Yu, Chang ;
- Zhao, Hu ;
- Long, Stephen ;
- Sacks, Erik
This dataset contains all data used in the paper "Impact of genotype-calling methodologies on genome-wide association and genomic prediction in polyploids". It includes genotype and phenotype data from two autotetraploid species, Miscanthus sacchariflorus and Vaccinium corymbosum, that were used for genome-wide association studies and genomic prediction, as well as the scripts used in the analysis. In this V2, two files with raw data were added: "Miscanthus_sacchariflorus_RADSeq.vcf" is the VCF file with the raw SNP calls for the Miscanthus sacchariflorus data, which were used for genotype calling with the six genotype-calling methods; "Blueberry_data_read_depths.RData" is an RData file with the read-depth data used for genotype calling in the blueberry dataset.
Authors
- Njuguna, Joyce ;
- Clark, Lindsay ;
- Lipka, Alexander ;
- Anzoua, Kossonou ;
- Bagmet, Larisa ;
- Chebukin, Pavel ;
- Dwiyanti, Maria ;
- Dzyubenko, Elena ;
- Dzyubenko, Nicolay ;
- Ghimire, Bimal ;
- Jin, Xiaoli ;
- Johnson, Douglas ;
- Kjeldsen, Jens ;
- Nagano, Hironori ;
- Oliveira, Ivone ;
- Peng, Junhua ;
- Petersen, Karen ;
- Sabitov, Andrey ;
- Seong, Eun ;
- Yamada, Toshihiko ;
- Yoo, Ji ;
- Yu, Chang ;
- Zhao, Hu ;
- Munoz, Patricio ;
- Long, Stephen ;
- Sacks, Erik
This dataset contains all data used in the paper "Impact of genotype-calling methodologies on genome-wide association and genomic prediction in polyploids". It includes genotype and phenotype data from two autotetraploid species, Miscanthus sacchariflorus and Vaccinium corymbosum, that were used for genome-wide association studies and genomic prediction, as well as the scripts used in the analysis.
Authors
- Njuguna, Joyce ;
- Clark, Lindsay ;
- Lipka, Alexander ;
- Anzoua, Kossonou ;
- Bagmet, Larisa ;
- Chebukin, Pavel ;
- Dwiyanti, Maria ;
- Dzyubenko, Elena ;
- Dzyubenko, Nicolay ;
- Ghimire, Bimal ;
- Jin, Xiaoli ;
- Johnson, Douglas ;
- Kjeldsen, Jens ;
- Nagano, Hironori ;
- Oliveira, Ivone ;
- Peng, Junhua ;
- Petersen, Karen ;
- Sabitov, Andrey ;
- Seong, Eun ;
- Yamada, Toshihiko ;
- Yoo, Ji ;
- Yu, Chang ;
- Zhao, Hu ;
- Munoz, Patricio ;
- Long, Stephen ;
- Sacks, Erik
Figure S1: Scree plots showing the proportion of variation explained (percentage, %; Y-axis) by each principal component (PC; X-axis) in (A) Miscanthus sinensis and (B) Miscanthus sacchariflorus. These PCs are from a principal component analysis conducted on 5,140 genome-wide markers.
Figure S2: Phenotypic distribution of individuals in the study populations. Distributions of Miscanthus sinensis (blue), Miscanthus sacchariflorus (green), and the 09F2 population (orange) for basal circumference (Bcirc; cm), compressed circumference (Ccirc; cm), culm length (CmL; cm), diameter of basal internode (DBI; mm), days to first heading (HD1; days), and yield (Yld; g/plant). The median of each population is shown as a solid line in that population's color. The trait values of the parental lines are shown as broken lines: blue for ‘Cosmopolitan Revert’ (M. sinensis) and green for ‘Robustus’ (M. sacchariflorus).
Figure S3: Barplots showing the narrow-sense heritability (Y-axis) for basal circumference (Bcirc; cm), compressed circumference (Ccirc; cm), culm length (CmL; cm), days to first heading (HD1; days), and yield (Yld; g/plant) (X-axis), color-coded by the three populations considered in this study: Miscanthus sinensis (Msi), Miscanthus sacchariflorus (Msa), and the F2 breeding population (09F2).
Figure S4: Principal component (PC) analysis of the Miscanthus sacchariflorus and Miscanthus sinensis diversity panels. Open circles represent individuals plotted along PC1 (X-axis) and PC2 (Y-axis) in (A) M. sacchariflorus and (B) M. sinensis. These PCs are from a principal component analysis conducted on 5,140 genome-wide markers. Diamonds represent the parents ‘Robustus’ (M. sacchariflorus) and ‘Cosmopolitan Revert’ (M. sinensis), which were used to develop the interspecific F2 population (09F2). Individuals are color-coded by genetic clusters identified in previous analyses of these data (Clark et al. 2014, 2018).
Figure S5: Heatmap of genetic relatedness, estimated from 5,140 genome-wide markers, among accessions in the Miscanthus sinensis and Miscanthus sacchariflorus diversity panels and the 09F2 breeding population. The heatmap is shown for (A) all three populations, (B) Msi only, (C) Msa only, and (D) the 09F2 breeding population only.
Figure S6: Distribution of individuals selected by CDmean on the Miscanthus sinensis (Msi) principal component axes for basal circumference (Bcirc), compressed circumference (Ccirc), culm length (CmL), days to first heading (HD1), and yield (Yld). The X-axis on each graph is principal component (PC) 1, while the Y-axis is PC2. Both PCs are from a principal component analysis of 5,140 genome-wide markers. The individuals selected by the CDmean procedure are colored, and the Msi parent of the 09F2 population, “Cosmopolitan Revert”, is indicated by a diamond.
Figure S7: Distribution of individuals selected by CDmean on the Miscanthus sacchariflorus (Msa) principal component axes for basal circumference (Bcirc), compressed circumference (Ccirc), culm length (CmL), days to first heading (HD1), and yield (Yld). The X-axis on each graph is principal component (PC) 1, while the Y-axis is PC2. Both PCs are from a principal component analysis of 5,140 genome-wide markers. The individuals selected by the CDmean procedure are colored, and the Msa parent of the 09F2 population, “Robustus”, is indicated by a diamond.
Figure S8: Linkage disequilibrium decay curves for (A) the Miscanthus sinensis diversity panel and Miscanthus sacchariflorus within a 50 kilobase (kb) window, (B) the Miscanthus sinensis diversity panel, Miscanthus sacchariflorus, and the 09F2 breeding population within a 250 kb window, and (C) the Miscanthus sinensis diversity panel, Miscanthus sacchariflorus, and the 09F2 breeding population within a 2,000 kb window. On each graph, the X-axis is the physical distance between marker pairs and the Y-axis is the squared Pearson correlation between the markers.
Figure S9: Comparison of diversity panels and F2 populations as genomic selection (GS) training sets for predictions in simulated F2 populations. To predict each F2 population, either the diversity panels or a stratified random sample of the remaining 49 F2 populations was used as the training set. Thus, each boxplot represents the distribution of prediction accuracies (Y-axis) across 50 simulated interspecific F2 populations for traits with contrasting genetic architectures (X-axis). Boxplots are colored by the approach used to train the GS model: Msa (all 598 individuals in the Miscanthus sacchariflorus panel), Msi (all 538 individuals in the Miscanthus sinensis panel), F2.6H (600 randomly selected individuals from the 50 simulated F2 populations), Msi.Msa (sum of the genomic estimated breeding values, or GEBVs, estimated from the Msi and Msa panels), F2.1K (1,200 randomly selected individuals from the 50 simulated F2 populations), MM.F6H (sum of the GEBVs estimated from the Msi and Msa panels, and 600 randomly selected individuals from the 50 simulated F2 populations), MM.F1K (sum of the GEBVs estimated from the Msi and Msa panels, and 1,000 randomly selected individuals from the 50 simulated F2 populations), F2.10K (all individuals (n = 10,800) in the 50 simulated F2 populations), and MM.F9K (sum of the GEBVs estimated from the Msi and Msa panels, and 10,800 individuals from the 50 simulated F2 populations).
Traits were simulated under five scenarios: D.QTN (completely different QTNs in Msi and Msa, with the same effect sizes), D.QTN.Msa (different QTNs in Msi and Msa, with large-effect QTNs in Msa and small-effect QTNs in Msi), D.QTN.Msi (different QTNs in Msi and Msa, with large-effect QTNs in Msi and small-effect QTNs in Msa), P.QTN (50% of the QTNs shared between Msi and Msa and 50% different), and S.QTN (the same QTNs with the same effect sizes in Msi and Msa). All simulated traits had 20 additive QTNs, 0 dominance QTNs, and 0 epistatic QTNs; heritabilities are as presented in Table 2. The white dots represent the mean of each distribution.
Table S1: Summary statistics of phenotypic least squares means within Miscanthus sinensis and Miscanthus sacchariflorus diversity panels and the 09F2 breeding population.
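Figure S8 plots linkage disequilibrium as the squared Pearson correlation (r²) between marker pairs. A minimal Python sketch of that statistic follows, assuming each marker is coded as a numeric allele dosage per individual (for example 0 to 4 in an autotetraploid) with no missing values. The function name is hypothetical and this is not the authors' analysis pipeline.

```python
from typing import Sequence


def ld_r2(marker_a: Sequence[float], marker_b: Sequence[float]) -> float:
    """Squared Pearson correlation between two markers scored on the same individuals."""
    n = len(marker_a)
    if n != len(marker_b) or n < 2:
        raise ValueError("markers must have the same length (at least 2)")
    mean_a = sum(marker_a) / n
    mean_b = sum(marker_b) / n
    # Sums of cross-products and squares; the 1/n factors cancel in r^2.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(marker_a, marker_b))
    var_a = sum((a - mean_a) ** 2 for a in marker_a)
    var_b = sum((b - mean_b) ** 2 for b in marker_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r^2 is undefined for a monomorphic marker")
    return cov * cov / (var_a * var_b)


if __name__ == "__main__":
    # Hypothetical tetraploid dosages (0-4) for a few individuals.
    print(ld_r2([0, 1, 2, 3, 4], [0, 1, 2, 3, 4]))  # 1.0
    print(ld_r2([0, 1, 2, 3], [0, 1, 0, 1]))  # 0.2
```

Decay curves like those in Figure S8 come from computing r² for many marker pairs and plotting it against the physical distance separating each pair.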
Authors
- Olatoye, Marcus O. ;
- Clark, Lindsay V. ;
- Labonte, Nicholas R. ;
- Dong, Hongxu ;
- Dwiyanti, Maria S. ;
- Anzoua, Kossanou ;
- Brummer, Joe E. ;
- Ghimire, Bimal ;
- Glowacka, Katarzyna ;
- Heo, Kweon ;
- Jin, Xiaoli ;
- Nagano, Hironori ;
- Peng, Junhua ;
- Yu, Chang ;
- Yoo, Ji ;
- Zhao, Hua ;
- Long, Stephen P. ;
- Yamada, Toshihiko ;
- Lipka, Alexander E.
Figure S1: Scree plots showing the proportion of variation explained (percentage, %; Y-axis) by each principal component (PC; X-axis) in (A) Miscanthus sinensis and (B) Miscanthus sacchariflorus. These PCs are from a principal component analysis conducted on 5,140 genome-wide markers.
Figure S2: Phenotypic distribution of individuals in the study populations. Distributions of Miscanthus sinensis (blue), Miscanthus sacchariflorus (green), and the 09F2 population (orange) for basal circumference (Bcirc; cm), compressed circumference (Ccirc; cm), culm length (CmL; cm), diameter of basal internode (DBI; mm), days to first heading (HD1; days), and yield (Yld; g/plant). The median of each population is shown as a solid line in that population's color. The trait values of the parental lines are shown as broken lines: blue for ‘Cosmopolitan Revert’ (M. sinensis) and green for ‘Robustus’ (M. sacchariflorus).
Figure S3: Barplots showing the narrow-sense heritability (Y-axis) for basal circumference (Bcirc; cm), compressed circumference (Ccirc; cm), culm length (CmL; cm), days to first heading (HD1; days), and yield (Yld; g/plant) (X-axis), color-coded by the three populations considered in this study: Miscanthus sinensis (Msi), Miscanthus sacchariflorus (Msa), and the F2 breeding population (09F2).
Figure S4: Principal component (PC) analysis of the Miscanthus sacchariflorus and Miscanthus sinensis diversity panels. Open circles represent individuals plotted along PC1 (X-axis) and PC2 (Y-axis) in (A) M. sacchariflorus and (B) M. sinensis. These PCs are from a principal component analysis conducted on 5,140 genome-wide markers. Diamonds represent the parents ‘Robustus’ (M. sacchariflorus) and ‘Cosmopolitan Revert’ (M. sinensis), which were used to develop the interspecific F2 population (09F2). Individuals are color-coded by genetic clusters identified in previous analyses of these data (Clark et al. 2014, 2018).
Figure S5: Heatmap of genetic relatedness, estimated from 5,140 genome-wide markers, among accessions in the Miscanthus sinensis and Miscanthus sacchariflorus diversity panels and the 09F2 breeding population. The heatmap is shown for (A) all three populations, (B) Msi only, (C) Msa only, and (D) the 09F2 breeding population only.
Figure S6: Distribution of individuals selected by CDmean on the Miscanthus sinensis (Msi) principal component axes for basal circumference (Bcirc), compressed circumference (Ccirc), culm length (CmL), days to first heading (HD1), and yield (Yld). The X-axis on each graph is principal component (PC) 1, while the Y-axis is PC2. Both PCs are from a principal component analysis of 5,140 genome-wide markers. The individuals selected by the CDmean procedure are colored, and the Msi parent of the 09F2 population, “Cosmopolitan Revert”, is indicated by a diamond.
Figure S7: Distribution of individuals selected by CDmean on the Miscanthus sacchariflorus (Msa) principal component axes for basal circumference (Bcirc), compressed circumference (Ccirc), culm length (CmL), days to first heading (HD1), and yield (Yld). The X-axis on each graph is principal component (PC) 1, while the Y-axis is PC2. Both PCs are from a principal component analysis of 5,140 genome-wide markers. The individuals selected by the CDmean procedure are colored, and the Msa parent of the 09F2 population, “Robustus”, is indicated by a diamond.
Figure S8: Linkage disequilibrium decay curves for (A) the Miscanthus sinensis diversity panel and Miscanthus sacchariflorus within a 50 kilobase (kb) window, (B) the Miscanthus sinensis diversity panel, Miscanthus sacchariflorus, and the 09F2 breeding population within a 250 kb window, and (C) the Miscanthus sinensis diversity panel, Miscanthus sacchariflorus, and the 09F2 breeding population within a 2,000 kb window. On each graph, the X-axis is the physical distance between marker pairs and the Y-axis is the squared Pearson correlation between the markers.
Figure S9: Comparison of diversity panels and F2 populations as genomic selection (GS) training sets for predictions in simulated F2 populations. To predict each F2 population, either the diversity panels or a stratified random sample of the remaining 49 F2 populations was used as the training set. Thus, each boxplot represents the distribution of prediction accuracies (Y-axis) across 50 simulated interspecific F2 populations for traits with contrasting genetic architectures (X-axis). Boxplots are colored by the approach used to train the GS model: Msa (all 598 individuals in the Miscanthus sacchariflorus panel), Msi (all 538 individuals in the Miscanthus sinensis panel), F2.6H (600 randomly selected individuals from the 50 simulated F2 populations), Msi.Msa (sum of the genomic estimated breeding values, or GEBVs, estimated from the Msi and Msa panels), F2.1K (1,200 randomly selected individuals from the 50 simulated F2 populations), MM.F6H (sum of the GEBVs estimated from the Msi and Msa panels, and 600 randomly selected individuals from the 50 simulated F2 populations), MM.F1K (sum of the GEBVs estimated from the Msi and Msa panels, and 1,000 randomly selected individuals from the 50 simulated F2 populations), F2.10K (all individuals (n = 10,800) in the 50 simulated F2 populations), and MM.F9K (sum of the GEBVs estimated from the Msi and Msa panels, and 10,800 individuals from the 50 simulated F2 populations).
Traits were simulated under five scenarios: D.QTN (completely different QTNs in Msi and Msa, with the same effect sizes), D.QTN.Msa (different QTNs in Msi and Msa, with large-effect QTNs in Msa and small-effect QTNs in Msi), D.QTN.Msi (different QTNs in Msi and Msa, with large-effect QTNs in Msi and small-effect QTNs in Msa), P.QTN (50% of the QTNs shared between Msi and Msa and 50% different), and S.QTN (the same QTNs with the same effect sizes in Msi and Msa). All simulated traits had 20 additive QTNs, 0 dominance QTNs, and 0 epistatic QTNs; heritabilities are as presented in Table 2. The white dots represent the mean of each distribution.
Table S1: Summary statistics of phenotypic least squares means within Miscanthus sinensis and Miscanthus sacchariflorus diversity panels and the 09F2 breeding population.
Authors
- Olatoye, Marcus O. ;
- Clark, Lindsay V. ;
- Labonte, Nicholas R. ;
- Dong, Hongxu ;
- Dwiyanti, Maria S. ;
- Anzoua, Kossanou ;
- Brummer, Joe E. ;
- Ghimire, Bimal ;
- Glowacka, Katarzyna ;
- Heo, Kweon ;
- Jin, Xiaoli ;
- Nagano, Hironori ;
- Peng, Junhua ;
- Yu, Chang ;
- Yoo, Ji ;
- Zhao, Hua ;
- Long, Stephen P. ;
- Yamada, Toshihiko ;
- Lipka, Alexander E.
The propensity score (PS) method is widely used to estimate the average treatment effect (TE) in observational studies. However, it is generally confined to binary treatment assignment. To extend it to multi-level treatments, Imbens proposed the generalized propensity score, the conditional probability of receiving a particular level of treatment given pre-treatment variables. The average TE can then be estimated by conditioning solely on the generalized PS under the assumption of weak unconfoundedness. In the present work, we adopted this approach and conducted extensive simulations to evaluate the performance of several methods that use the generalized PS, including subclassification, matching, inverse probability of treatment weighting (IPTW), and covariate adjustment. IPTW had the best overall performance of these methods. We then applied the methods to a retrospective cohort study of 228,876 pregnant women, assessing the impact of exposure during pregnancy to different types of antidepressant medication (no exposure, selective serotonin reuptake inhibitor (SSRI) only, non-SSRI only, and both) on several important infant outcomes (birth weight, gestational age, preterm labor, and respiratory distress).
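The IPTW estimator summarized above can be sketched in Python, assuming the generalized propensity score r(t, x), the probability of receiving treatment level t given covariates x, has already been estimated (for example with a multinomial logistic regression, not shown). Each subject is weighted by 1 / r(T_i, X_i) for the level it actually received, and the mean outcome for each level is the weighted average within that level (the normalized, or Hájek, form). This is a generic textbook version, not necessarily the exact estimator used in the study; the function names and toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def iptw_level_means(
    treatment: Sequence[Hashable],
    outcome: Sequence[float],
    gps_received: Sequence[float],
) -> Dict[Hashable, float]:
    """Weighted mean outcome per treatment level.

    gps_received[i] is the estimated generalized propensity score of the level
    that subject i actually received, i.e. r(T_i, X_i).
    """
    if not (len(treatment) == len(outcome) == len(gps_received)):
        raise ValueError("inputs must have the same length")
    weighted_sum: Dict[Hashable, float] = defaultdict(float)
    weight_total: Dict[Hashable, float] = defaultdict(float)
    for level, y, r in zip(treatment, outcome, gps_received):
        if not 0.0 < r <= 1.0:
            raise ValueError("generalized propensity scores must lie in (0, 1]")
        w = 1.0 / r  # inverse probability of treatment weight
        weighted_sum[level] += w * y
        weight_total[level] += w
    return {level: weighted_sum[level] / weight_total[level] for level in weight_total}


def average_treatment_effect(
    means: Dict[Hashable, float], level: Hashable, reference: Hashable
) -> float:
    """Average TE of one treatment level relative to a reference level."""
    return means[level] - means[reference]


if __name__ == "__main__":
    # Hypothetical toy data covering two of the four exposure levels.
    exposure = ["none", "none", "SSRI"]
    outcome = [2.0, 4.0, 10.0]
    gps = [0.25, 0.5, 1.0]
    means = iptw_level_means(exposure, outcome, gps)
    print(average_treatment_effect(means, "SSRI", "none"))
```

Normalizing by the summed weights within each level keeps every level's estimate on the outcome scale even when the estimated scores do not average out exactly.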
Authors
- Nian, Hui ;
- Yu, Chang ;
- Ding, Juan ;
- Wu, Huiyun ;
- Dupont, William D. ;
- Brunwasser, Steve ;
- Gebretsadik, Tebeb ;
- Hartert, Tina V. ;
- Wu, Pingsheng
The propensity score (PS) method is widely used to estimate the average treatment effect (TE) in observational studies. However, it is generally confined to binary treatment assignment. To extend it to multi-level treatments, Imbens proposed the generalized propensity score, the conditional probability of receiving a particular level of treatment given pre-treatment variables. The average TE can then be estimated by conditioning solely on the generalized PS under the assumption of weak unconfoundedness. In the present work, we adopted this approach and conducted extensive simulations to evaluate the performance of several methods that use the generalized PS, including subclassification, matching, inverse probability of treatment weighting (IPTW), and covariate adjustment. IPTW had the best overall performance of these methods. We then applied the methods to a retrospective cohort study of 228,876 pregnant women, assessing the impact of exposure during pregnancy to different types of antidepressant medication (no exposure, selective serotonin reuptake inhibitor (SSRI) only, non-SSRI only, and both) on several important infant outcomes (birth weight, gestational age, preterm labor, and respiratory distress).
Authors
- Nian, Hui ;
- Yu, Chang ;
- Ding, Juan ;
- Wu, Huiyun ;
- Dupont, William D. ;
- Brunwasser, Steve ;
- Gebretsadik, Tebeb ;
- Hartert, Tina V. ;
- Wu, Pingsheng
Quality control (QC) and preprocessing are essential steps in sequencing data analysis to ensure accurate results. However, existing tools do not offer a satisfactory solution that combines comprehensive functions, a suitable architecture, and highly scalable acceleration. In this article, we present SOAPnuke, a tool with a broad set of functions for a QC-Preprocess-QC workflow and a MapReduce acceleration framework. Four modules with different preprocessing functions are designed for datasets from genomic, small RNA (sRNA), Digital Gene Expression (DGE), and metagenomic experiments, respectively. As a workflow-like tool, SOAPnuke centralizes its processing functions in one executable and predefines their order, avoiding the need to reformat files when switching between tools. Furthermore, the MapReduce framework provides high scalability by distributing all processing work across an entire compute cluster. We benchmarked SOAPnuke against other tools by preprocessing the ~30x NA12878 dataset published by GIAB. Running standalone, SOAPnuke struck a balance between resource usage and performance. When accelerated with MapReduce on 16 worker nodes, SOAPnuke ran at ~5.7 times the speed of the fastest other tool.
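To illustrate the map/reduce split described above, here is a toy Python sketch in which each chunk of reads is filtered independently (the map step) and the per-chunk QC counts are merged (the reduce step). The filters (maximum N fraction and minimum mean Phred+33 quality) and their thresholds are hypothetical placeholders, not SOAPnuke's actual options or defaults, and the chunks run sequentially here rather than on a cluster.

```python
from functools import reduce
from typing import List, Sequence, Tuple

Read = Tuple[str, str]  # (sequence, Phred+33 quality string)
Stats = Tuple[int, int]  # (reads in, reads kept)

MAX_N_FRACTION = 0.1  # hypothetical threshold
MIN_MEAN_QUALITY = 20.0  # hypothetical threshold


def passes_filter(read: Read) -> bool:
    """Keep a read if it has few Ns and a high enough mean base quality."""
    seq, qual = read
    if not seq or len(seq) != len(qual):
        return False
    n_fraction = seq.upper().count("N") / len(seq)
    mean_quality = sum(ord(c) - 33 for c in qual) / len(qual)
    return n_fraction <= MAX_N_FRACTION and mean_quality >= MIN_MEAN_QUALITY


def map_chunk(chunk: Sequence[Read]) -> Tuple[List[Read], Stats]:
    """Map step: preprocess one chunk and report its QC counts."""
    kept = [read for read in chunk if passes_filter(read)]
    return kept, (len(chunk), len(kept))


def merge_stats(a: Stats, b: Stats) -> Stats:
    """Reduce step: combine QC counts from two chunks."""
    return a[0] + b[0], a[1] + b[1]


def run(chunks: Sequence[Sequence[Read]]) -> Tuple[List[Read], Stats]:
    # On a cluster, each map_chunk call would run on a separate worker node.
    mapped = [map_chunk(chunk) for chunk in chunks]
    reads = [read for kept, _ in mapped for read in kept]
    stats = reduce(merge_stats, (s for _, s in mapped), (0, 0))
    return reads, stats


if __name__ == "__main__":
    chunks = [
        [("ACGT", "IIII"), ("NNNN", "IIII")],
        [("ACGA", "####")],
    ]
    print(run(chunks))  # ([('ACGT', 'IIII')], (3, 1))
```

Because each chunk is processed without reference to the others, the map step parallelizes trivially and only the small per-chunk counts need to be combined.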
Authors
- Chen, Yuxin ;
- Chen, Yongsheng ;
- Shi, Chunmei ;
- Huang, Zhibo ;
- Zhang, Yong ;
- Li, Shengkang ;
- Li, Yan ;
- Ye, Jia ;
- Yu, Chang ;
- Li, Zhuo ;
- Zhang, Xiuqing ;
- Wang, Jian ;
- Yang, Huanming ;
- Fang, Lin ;
- Chen, Qiang
SOAPdenovo2 is the latest de novo genome assembly package from BGI's SOAP (short oligonucleotide analysis package) suite of tools (homepage: http://soap.genomics.org.cn/). Compared with SOAPdenovo1, this new version has a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closure, and is optimized for large genomes.
Using new sequencing data from the diploid genome of YH (Homo sapiens), the first sequenced Han Chinese individual, an updated assembly was produced (see dataset: doi:10.5524/100038), with contig and scaffold N50 values 3-fold and 50-fold longer, respectively, than in the first published version. Genome coverage increased from 81.16% to 93.91%, and memory consumption at its peak was about two-thirds lower.
Benchmarking with the Assemblathon1 and GAGE datasets shows that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo1 and is competitive with other assemblers in both assembly length and accuracy.
To help readers reproduce these findings, configured packages with compressed pipelines containing all the necessary shell scripts and tools are available from the BGI FTP server (ftp://public.genomics.org.cn/BGI/SOAPdenovo2).
The latest version of SOAPdenovo2 is available from Sourceforge: http://soapdenovo2.sourceforge.net/
These pipelines are available from our data platform as Galaxy workflows: http://galaxy.cbiit.cuhk.edu.hk/
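The contig and scaffold N50 values cited above follow the standard length-weighted definition: N50 is the length L such that sequences of length L or longer contain at least half of the total assembled bases. A minimal Python sketch of that definition follows; the function name is hypothetical and is not part of SOAPdenovo2.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a set of contig or scaffold lengths."""
    if not lengths or any(length <= 0 for length in lengths):
        raise ValueError("lengths must be a non-empty list of positive integers")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```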
Authors
- Luo, Ruibang ;
- Liu, Binghang ;
- Xie, Yinlong ;
- Li, Zhenyu ;
- Huang, Weihua ;
- Yuan, Jianying ;
- He, Guangzhu ;
- Chen, Yanxiang ;
- Pan, Qi ;
- Liu, Yunjie ;
- Tang, Jingbo ;
- Wu, Gengxiong ;
- Zhang, Hao ;
- Shi, Yujian ;
- Liu, Yong ;
- Yu, Chang ;
- Wang, Bo ;
- Lu, Yao ;
- Han, Changlei ;
- Cheung, David ;
- Yiu, Siu-Ming ;
- Peng, Shaoliang ;
- Zhu, Xiaoqian ;
- Liu, Guangming ;
- Liao, Xiangke ;
- Li, Yingrui ;
- Yang, Huanming ;
- Wang, Jian ;
- Lam, Tak-Wah, W ;
- Wang, Jun
The methylome reported and analyzed here was generated from the same sample of peripheral blood mononuclear cells (PBMCs) from a consented donor (Homo sapiens) whose genome was deciphered in the YH project. YH is an anonymous male Han Chinese individual who has no known genetic diseases, and whose genome also serves as an Asian reference genome. Nuclear DNA was extracted and subjected to unbiased, whole-genome bisulfite sequencing (BS-seq) using the Illumina Genome Analyzer. In total, 103.5 Gbp of paired-end sequence data were generated. Of these, 70.4 Gbp (68%) were successfully aligned to either strand of the YH genome with an average mismatch rate of 1.3%, resulting in an average sequencing depth of 12.3-fold per DNA strand or a 24.7-fold overall depth. Of the 18,962,679 CpGs present in the unique haploid part (2.21 Gb) of the YH reference genome sequence, approximately 99.86% were covered by at least one unambiguously mapped read of quality score >14 on either strand, and 92.62% were unambiguously covered on both strands.
Authors
- Li, Yingrui ;
- Zhu, Jingde ;
- Tian, Geng ;
- Li, Ning ;
- Li, Qibin ;
- Ye, Mingzhi ;
- Zheng, Hancheng ;
- Yu, Jian ;
- Wu, Honglong ;
- Sun, Jihua ;
- Zhang, Hongyu ;
- Chen, Quan ;
- Luo, Ruibang ;
- Chen, Minfeng ;
- He, Yinghua ;
- Jin, Xin ;
- Zhang, Qinghui ;
- Yu, Chang ;
- Zhou, Guangyu ;
- Sun, Jinfeng ;
- Huang, Yebo ;
- Zheng, Huisong ;
- Cao, Hongzhi ;
- Zhou, Xiaoyu ;
- Guo, Shicheng ;
- Hu, Xueda ;
- Li, Xin ;
- Kristiansen, Karsten ;
- Bolund, Lars ;
- Xu, Jiujin ;
- Wang, Wen ;
- Yang, Huanming ;
- Wang, Jian ;
- Li, Ruiqiang ;
- Beck, Stephan ;
- Wang, Jun ;
- Zhang, Xiuqing