Scholar Data

Supporting material for: De novo assembly of a haplotype-resolved human genome.

Here we provide the first de novo haplotype-resolved diploid genome sequence of an Asian individual using a unique de novo assembly pipeline. Our pipeline uses fosmid pooling and whole genome shotgun strategies, based on next generation sequencing (NGS) technology. The assembled genome contains 5.15 Gb, with a haplotype N50 of 484 kb. This haplotype-resolved genome represents the most complete genome assembly so far. Our analysis further identified previously undetected indels and novel coding sequences, and thus provides the most complete representation of an individual's genetic variation.
We generated ~614,850 fosmid clones ranging from 20 kb to 80 kb, with a mean of 36 kb. Approximately 30 fosmid clones were pooled, and each pool had one or two DNA libraries sequenced using the HiSeq 2000. In total, 1,712 Gb of raw sequence data was generated for all the pooled fosmid libraries. Please see the linked paper for assembly pipeline details. We then analysed the newly generated haplotype-resolved diploid genome (HDG) for SNPs, INDELs, inversions and translocations, identifying 3,580,000 SNPs, 762,000 short INDELs (<50 bp), 30,000 long INDELs, 111 inversions and 168 translocations.

Authors

Cao, Hongzhi ;
Wu, Honglong ;
Luo, Ruibang ;
Huang, Shujia ;
Sun, Yuhui ;
Tong, Xin ;
Xie, Yinlong ;
Liu, Binghang ;
Yang, Hailong ;
Zheng, Hancheng ;
Li, Jian ;
Li, Bo ;
Wang, Yu ;
Yang, Fang ;
Sun, Peng ;
Liu, Siyang ;
Gao, Peng ;
Huang, Haodong ;
Sun, Jing ;
Chen, Dan ;
He, Guangzhu ;
Huang, Weihua ;
Huang, Zheng ;
Li, Yue ;
Tellier, Laurent, CAM ;
Liu, Xiao ;
Feng, Qiang ;
Xu, Xun ;
Zhang, Xiuqing ;
Bolund, Lars ;
Krogh, Anders ;
Kristiansen, Karsten ;
Goodman, Laurie ;
Drmanac, Radoje ;
Drmanac, Snezana, A ;
Luo, Qiong ;
Li, Songgang ;
Wang, Jian ;
Yang, Huanming ;
Li, Yingrui ;
Wong, Gane, Ka-Shu ;
Wang, Jun

1 Citation | 0 Mentions | 31% FAIR | 1.2 Dataset Index

10.5524/100096 | January 2014

Hepatocellular carcinoma genomic data from the Asia Cancer Research Group.

Hepatocellular carcinoma (HCC) is one of the most common solid tumors worldwide and represents the third leading cause of cancer deaths. Hepatitis B virus (HBV) is a major etiologic agent, leading to an increased risk of developing HCC, in particular in those with acute liver disease and cirrhosis. The Asian Cancer Research Group (ACRG) is an independent, not-for-profit company established to accelerate research and improve treatment for patients affected with the most commonly diagnosed cancers in Asia.
With HBV being endemic in China and Southeast Asia, and high levels of HCC being a result, the ACRG have studied the events and effects of HBV integration in the HCC genome. To do this, massively parallel sequencing was carried out in a cohort of 88 Chinese patients diagnosed with HCC who underwent curative primary hepatectomy or liver transplantation at Queen Mary Hospital (Pokfulam, Hong Kong). All patients gave written informed consent to use both tumor (T) and non-tumor (N) liver tissues for the study.
Genomic DNA was purified for at least 30-fold coverage paired-end (PE) sequencing, and PE reads were mapped to the human reference genome (UCSC build hg19) and HBV (NC_003977). Two sequencing libraries with different insert sizes (200 bp and 800 bp) were constructed for each genomic DNA sample. Paired-end sequencing with a 90 bp read length was performed on the HiSeq 2000 sequencer according to the manufacturer's instructions. Raw gene expression profiling data of these human HCC samples have also been deposited in GEO under the accession number GSE25097.

Authors

Kan, Zhengyan ;
Zheng, Hancheng ;
Liu, Xiao ;
Li, Shuyu ;
Barber, Thomas, D ;
Gong, Zhuolin ;
Gao, H ;
Hao, Ke ;
Willard, M, D ;
Xu, Jiangchun ;
Hauptschein, R ;
Rejto, P, A ;
Fernandez, J ;
Wang, Guan ;
Zhang, Qinghui ;
Wang, B ;
Chen, Ronghua ;
Wang, Jun ;
Lee, Nikki, P ;
Lee, Wah, H ;
Ariyaratne, Pramila, N ;
Tennakoon, Chandana ;
Mulawadi, Fabianus, H ;
Wong, Kwong, F ;
Liu, Angela, M ;
Chan, Kwong, L ;
Hu, Yujie ;
Chou, Wen-Chi ;
Buser, Carolyn ;
Zhou, Wei ;
Lin, Zhao ;
Peng, Z ;
Yi, K ;
Chen, S ;
Li, L ;
Fan, X ;
Yang, J ;
Ye, R ;
Ju, J ;
Wang, K ;
Estrella, H ;
Deng, S ;
Wulur, I, H ;
Liu, J ;
Ehsani, M, E ;
Zhang, Chunsheng ;
Loboda, A ;
Sung, Wing-Kin ;
Aggarwal, Amit ;
Poon, Ronnie, T ;
Fan, Sheung, T ;
Wang, Jun ;
Hardwick, James ;
Reinhard, Christoph ;
Dai, Hongyue ;
Li, Yingrui ;
Luk, John, M ;
Mao, Mao ;
The Asian Cancer Research Group

9 Citations | 0 Mentions | 31% FAIR | 4.5 Dataset Index

10.5524/100034 | January 2012

Updated genome assembly of YH: the first diploid genome sequence of a Han Chinese individual (version 2, 07/2012)

Updated genomic data from the YH (Homo sapiens) diploid genome, the first sequenced Han Chinese individual and a representative of the Asian population. The genomic DNA used in this study came from an anonymous male Han Chinese individual who has no known genetic diseases.
The original version of the YH genome was assembled based on 3.3 billion reads using the Illumina Genome Analyzer (see dataset doi:10.5524/100015). This latest (as of 07/2012) and improved version of the YH genome was assembled based on 2.1 billion reads using the Illumina HiSeq 2000. A total of 202 G nucleotides of data was obtained using 100 bp paired-end reads with insert sizes ranging from 180 bp to 40 kbp, and the genome was sequenced to 67.5-fold average coverage. The latest version of SOAPdenovo2 was used to reassemble, improve and update the previously assembled genome (tools and pipelines available here: doi:10.5524/100044). By aligning the short reads with SOAP, 177 G nucleotides were mapped onto the NCBI reference genome and 99.99% of the genome was covered. The raw sequences, assemblies and relevant tools are released for public use under a CC0 license.
More information about the YH genome can be viewed at: http://yh.genomics.org.cn/

Authors

Wang, Jun ;
Li, Yingrui ;
Luo, R ;
Liu, B ;
Xie, Y ;
Li, Zhuo ;
Fang, Xiaodong ;
Zheng, Hancheng ;
Qin, Junjie ;
Yang, Bin ;
Yu, C ;
Ni, Peixiang ;
Li, Ning ;
Guo, Guangwu ;
Ye, Jia ;
Fang, Lin ;
Su, Yeyang ;
Asan ;
Zheng, Hongkun ;
Kristiansen, Karsten ;
Wong, Gane, Ka-Shu ;
Nielsen, Rasmus ;
Durbin, Richard ;
Bolund, Lars ;
Zhang, Xiuqing ;
Li, Songgang ;
Yang, Huanming ;
Wang, Jian

6 Citations | 0 Mentions | 31% FAIR | 3.3 Dataset Index

10.5524/100038 | January 2012

Genomic data from the giant panda (Ailuropoda melanoleuca).

The giant panda (Ailuropoda melanoleuca) is considered a symbol of China and is a much loved animal all around the world. It is also one of the world's most endangered species, making it a flagship species for conservation efforts. As the first fully sequenced Ursidae and the second fully sequenced carnivore after the dog, the whole genome sequence and annotation data provide an unparalleled amount of information to aid in understanding the genetic and biological underpinnings of this unique species, and will help contribute to disease control and conservation efforts.
In 2008, BGI completed a first draft of the genome sequence of a three-year-old female giant panda named Jingjing, who was used as a model for the 2008 Olympics in Beijing, China (doi: 10.1038/nature08696). Using second-generation Illumina GA sequencing data, the first de novo genome assembly was created using short-read sequencing technology. Here you will find the giant panda genome sequence assembly as well as annotation information, such as gene structure and function, non-coding RNAs, and repeat elements. Also presented is polymorphism information detected in the diploid genome, including SNPs, indels, and structural variations (SVs). The assembly was done using SOAPdenovo software, and the panda genome data is visualized via MapView, which is powered by the Google Web Toolkit.

Authors

Li, Ruiqiang ;
Fan, Wei ;
Tian, Geng ;
Zhu, Hongmei ;
He, Lin ;
Cai, Jing ;
Huang, Quanfei ;
Cai, Qingle ;
Li, Bo ;
Bai, Yinqi ;
Zhang, Zhihe ;
Zhang, Yaping ;
Wang, Wen ;
Li, Jun ;
Wei, Fuwen ;
Li, Heng ;
Jian, Min ;
Li, Jianwen ;
Zhang, Zhaolei ;
Nielsen, Rasmus ;
Li, Dawei ;
Gu, Wanjun ;
Yang, Zhentao ;
Xuan, Zhaoling ;
Ryder, Oliver, A ;
Leung, Frederick, Chi-Ching ;
Zhou, Yan ;
Cao, Jianjun ;
Sun, Xiao ;
Fu, Yonggui ;
Fang, Xiaodong ;
Guo, Xiaosen ;
Wang, Bo ;
Hou, Rong ;
Shen, Fujun ;
Mu, Bo ;
Ni, Peixiang ;
Lin, Runmao ;
Qian, Wubin ;
Wang, Guodong ;
Yu, Chang ;
Nie, Wenhui ;
Wang, Jinhuan ;
Wu, Zhigang ;
Liang, Huiqing ;
Min, Jiumeng ;
Wu, Qi ;
Cheng, Shifeng ;
Ruan, Jue ;
Wang, Mingwei ;
Shi, Zhongbin ;
Wen, Ming ;
Liu, Binghang ;
Ren, Xiaoli ;
Zheng, Huisong ;
Dong, Dong ;
Cook, Kathleen ;
Shan, Gao ;
Zhang, Hao ;
Kosiol, Carolin ;
Xie, Xueying ;
Lu, Zuhong ;
Zheng, Hancheng ;
Li, Yingrui ;
Steiner, Cynthia, C ;
Lam, Tommy, Tsan-Yuk ;
Lin, Siyuan ;
Zhang, Qinghui ;
Li, Guoqing ;
Tian, Jing ;
Gong, Timing ;
Liu, Hongde ;
Zhang, Dejin ;
Fang, Lin ;
Ye, Chen ;
Zhang, Juanbin ;
Hu, Wenbo ;
Xu, Anlong ;
Ren, Yuanyuan ;
Zhang, Guojie ;
Bruford, Michael, W ;
Li, Qibin ;
Ma, Lijia ;
Guo, Yiran ;
An, Na ;
Hu, Yujie ;
Zheng, Yang ;
Shi, Yongyong ;
Li, Zhiqiang ;
Liu, Qing ;
Chen, Yanling ;
Zhao, Jing ;
Qu, Ning ;
Zhao, Shancen ;
Tian, Feng ;
Wang, Xiaoling ;
Wang, Haiyin ;
Xu, Lizhi ;
Liu, Xiao ;
Vinar, Tomas ;
Wang, Yajun ;
Lam, Tak-Wah ;
Yiu, Siu-Ming ;
Liu, Shiping ;
Zhang, Hemin ;
Li, Desheng ;
Huang, Yan ;
Wang, Xia ;
Yang, Guohua ;
Jiang, Zhi ;
Wang, Junyi ;
Qin, Nan ;
Li, Li ;
Li, Jingxiang ;
Bolund, Lars ;
Kristiansen, Karsten ;
Wong, Gane, Ka-Shu ;
Olson, Maynard ;
Zhang, Xiuqing ;
Li, Songgang ;
Yang, Huanming ;
Wang, Jian ;
Wang, Jun

1 Citation | 0 Mentions | 31% FAIR | 1.1 Dataset Index

10.5524/100004 | January 2011

Genomic data for the domestic cucumber (Cucumis sativus var. sativus L.).

Here we present genomic data for the domestic cucumber (Cucumis sativus var. sativus L.). The cucumber is a member of the Cucurbitaceae or cucurbit family, a family of great agricultural and horticultural importance that also includes species such as melons, gourds and squashes. A biologically interesting as well as economically relevant species, it is used as a model system for plant sex determination and vascular biology studies.
The domestic cucumber has seven pairs of chromosomes and a haploid genome of 367 Mb, a relatively small genome for the Cucurbitaceae family. The genome was sequenced and assembled with N50 contig and scaffold sizes of 19.8 kb and 1.14 Mb, respectively. Using the genetic map, 72.8% of the assembled sequences were anchored onto the 7 chromosomes. A total of 26,682 genes were predicted in the current cucumber genome.
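For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions, not values or code from this dataset.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), NOT taken from the cucumber assembly.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

In this toy case the total is 300, so the running sum reaches the halfway mark of 150 at the second-longest contig.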

Authors

Huang, Sanwen ;
Li, Ruiqiang ;
Zhang, Zhonghua ;
Li, Li ;
Gu, Xingfang ;
Fan, Wei ;
Lucas, William, J ;
Wang, Xiaowu ;
Xie, Bingyan ;
Ni, Peixiang ;
Ren, Yuanyuan ;
Zhu, Hongmei ;
Li, Jun ;
Lin, Kui ;
Jin, Weiwei ;
Fei, Zhangjun ;
Li, Guangcun ;
Staub, Jack ;
Kilian, Andrzej ;
van der Vossen, Edwin, AG ;
Wu, Yang ;
Guo, Jie ;
He, Jun ;
Jia, Zhiqi ;
Ren, Yi ;
Tian, Geng ;
Lu, Yao ;
Ruan, Jue ;
Qian, Wubin ;
Wang, Mingwei ;
Huang, Quanfei ;
Li, Bo ;
Xuan, Zhaoling ;
Cao, Jianjun ;
Asan ;
Wu, Zhigang ;
Zhang, Juanbin ;
Cai, Qingle ;
Bai, Yinqi ;
Zhao, Bowen ;
Han, Yonghua ;
Li, Ying ;
Li, Xuefeng ;
Wang, Shenhao ;
Shi, Qiuxiang ;
Liu, Shiqiang ;
Cho, Won, Kyong ;
Kim, Jae-Yean ;
Xu, Yong ;
Heller-Uszynska, Katarzyna ;
Miao, Han ;
Cheng, Zhouchao ;
Zhang, Shengping ;
Wu, Jian ;
Yang, Yuhong ;
Kang, Houxiang ;
Li, Man ;
Liang, Huiqing ;
Ren, Xiaoli ;
Shi, Zhongbin ;
Wen, Ming ;
Jian, Min ;
Yang, Hailong ;
Zhang, Guojie ;
Yang, Zhentao ;
Chen, Rui ;
Liu, Shifang ;
Li, Jianwen ;
Ma, Lijia ;
Liu, Hui ;
Zhou, Yan ;
Zhao, Jing ;
Fang, Xiaodong ;
Li, Guoqing ;
Fang, Lin ;
Li, Yingrui ;
Liu, Dongyuan ;
Zheng, Hongkun ;
Zhang, Yong ;
Qin, Nan ;
Li, Zhuo ;
Yang, Guohua ;
Yang, Shuang ;
Bolund, Lars ;
Kristiansen, Karsten ;
Zheng, Hancheng ;
Li, Shaochuan ;
Zhang, Xiuqing ;
Yang, Huanming ;
Wang, Jian ;
Sun, Rifei ;
Zhang, Baoxi ;
Jiang, Shuzhi ;
Wang, Jun ;
Du, Yongchen ;
Li, Songgang

5 Citations | 0 Mentions | 31% FAIR | 2.9 Dataset Index

10.5524/100025 | January 2011

DNA methylome of human peripheral blood mononuclear cells from the YH Han Chinese individual.

The methylome reported and analyzed here was generated from the same sample of peripheral blood mononuclear cells (PBMCs) from a consented donor (Homo sapiens) whose genome was deciphered in the YH project. YH is an anonymous male Han Chinese individual who has no known genetic diseases, and whose genome also serves as an Asian reference genome.
Nuclear DNA was extracted and subjected to unbiased, whole-genome bisulfite sequencing (BS-seq) using the Illumina Genome Analyzer. In total, 103.5 Gbp of paired-end sequence data were generated. Of these, 70.4 Gbp (68%) were successfully aligned to either strand of the YH genome with an average mismatch rate of 1.3%, resulting in an average sequencing depth of 12.3-fold per DNA strand or a 24.7-fold overall depth. Of the 18,962,679 CpGs present in the unique haploid part (2.21 Gb) of the YH reference genome sequence, approximately 99.86% were covered by at least one unambiguously mapped read of quality score >14 on either strand, and 92.62% were unambiguously covered on both strands.
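The alignment figures above can be cross-checked with simple arithmetic. The sketch below assumes that overall depth equals aligned bases divided by the length of the sequence being covered; the "implied" genome size it derives is an inference from that assumption and is not a figure stated in the dataset description. Variable names are illustrative.

```python
# Figures quoted in the dataset description.
raw_gbp = 103.5        # total paired-end sequence generated
aligned_gbp = 70.4     # sequence aligned to either strand
overall_depth = 24.7   # stated overall sequencing depth (fold)

# Fraction of raw data that aligned (the description rounds this to 68%).
aligned_fraction = aligned_gbp / raw_gbp

# Assumption: depth = aligned bases / covered length, so the covered
# length implied by the stated depth is aligned bases / depth.
implied_genome_gb = aligned_gbp / overall_depth

# Two strands share the overall depth roughly equally.
per_strand_depth = overall_depth / 2

print(f"aligned fraction:    {aligned_fraction:.1%}")
print(f"implied genome size: {implied_genome_gb:.2f} Gb")
print(f"per-strand depth:    {per_strand_depth:.2f}-fold")
```

The per-strand value computed this way agrees with the stated 12.3-fold to within rounding.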

Authors

Li, Yingrui ;
Zhu, Jingde ;
Tian, Geng ;
Li, Ning ;
Li, Qibin ;
Ye, Mingzhi ;
Zheng, Hancheng ;
Yu, Jian ;
Wu, Honglong ;
Sun, Jihua ;
Zhang, Hongyu ;
Chen, Quan ;
Luo, Ruibang ;
Chen, Minfeng ;
He, Yinghua ;
Jin, Xin ;
Zhang, Qinghui ;
Yu, Chang ;
Zhou, Guangyu ;
Sun, Jinfeng ;
Huang, Yebo ;
Zheng, Huisong ;
Cao, Hongzhi ;
Zhou, Xiaoyu ;
Guo, Shicheng ;
Hu, Xueda ;
Li, Xin ;
Kristiansen, Karsten ;
Bolund, Lars ;
Xu, Jiujin ;
Wang, Wen ;
Yang, Huanming ;
Wang, Jian ;
Li, Ruiqiang ;
Beck, Stephan ;
Wang, Jun ;
Zhang, Xiuqing

1 Citation | 0 Mentions | 13% FAIR | 0.7 Dataset Index

10.5524/100014 | January 2011

Genome sequence of YH: the first diploid genome sequence of a Han Chinese individual.

Genomic data from the YH (Homo sapiens) genome, the first diploid genome sequence of a Han Chinese individual, a representative of the Asian population. The genomic DNA used in this study came from an anonymous male Han Chinese individual who has no known genetic diseases.
The YH genome was assembled based on 3.3 billion reads using the Illumina Genome Analyzer. We obtained 117.7 G nucleotides of data, and the genome was sequenced to 36-fold average coverage. By aligning the short reads with SOAP, 102.9 G nucleotides were mapped onto the NCBI reference genome and 99.97% of the genome was covered. The raw sequences, alignments, consensus genome, variants and relevant tools are released for public use under a CC0 license.

Authors

Wang, Jun ;
Wang, Wei ;
Li, Ruiqiang ;
Li, Yingrui ;
Tian, Geng ;
Goodman, Laurie ;
Fan, Wei ;
Zhang, Junqing ;
Li, Jun ;
Zhang, Juanbin ;
Guo, Yiran ;
Feng, Binxiao ;
Li, Heng ;
Lu, Yao ;
Fang, Xiaodong ;
Liang, Huiqing ;
Du, Zhenglin ;
Li, Dong ;
Zhao, Yiqing ;
Hu, Yujie ;
Yang, Zhenzhen ;
Zheng, Hancheng ;
Hellmann, Ines ;
Inouye, Michael ;
Pool, John ;
Yi, Xin ;
Zhao, Jing ;
Duan, Jinjie ;
Zhou, Yan ;
Qin, Junjie ;
Ma, Lijia ;
Li, Guoqing ;
Yang, Zhentao ;
Zhang, Guojie ;
Yang, Bin ;
Yu, Chang ;
Liang, Fang ;
Li, Wenjie ;
Li, Shaochuan ;
Li, Dawei ;
Ni, Peixiang ;
Ruan, Jue ;
Li, Qibin ;
Zhu, Hongmei ;
Liu, Dongyuan ;
Lu, Zhike ;
Li, Ning ;
Guo, Guangwu ;
Zhang, Jianguo ;
Ye, Jia ;
Fang, Lin ;
Hao, Qin ;
Chen, Quan ;
Liang, Yu ;
Su, Yeyang ;
Asan ;
Ping, Cuo ;
Yang, Shuang ;
Chen, Fang ;
Li, Li ;
Zhou, Ke ;
Zheng, Hongkun ;
Ren, Yuanyuan ;
Yang, Ling ;
Gao, Yang ;
Yang, Guohua ;
Li, Zhuo ;
Feng, Xiaoli ;
Kristiansen, Karsten ;
Wong, Gane, Ka-Shu ;
Nielsen, Rasmus ;
Durbin, Richard ;
Bolund, Lars ;
Zhang, Xiuqing ;
Li, Songgang ;
Yang, Huanming ;
Wang, Jian

2 Citations | 0 Mentions | 31% FAIR | 1.5 Dataset Index

10.5524/100015 | January 2011

Automated Author Profile
Zheng, Hancheng

Datasets

Supporting material for: De novo assembly of a haplotype-resolved human genome.

Hepatocellular carcinoma genomic data from the Asia Cancer Research Group.

Updated genome assembly of YH: the first diploid genome sequence of a Han Chinese individual (version 2, 07/2012)

Genomic data from the giant panda (Ailuropoda melanoleuca).

Genomic data for the domestic cucumber (Cucumis sativus var. sativus L.).

DNA methylome of human peripheral blood mononuclear cells from the YH Han Chinese individual.

Genome sequence of YH: the first diploid genome sequence of a Han Chinese individual.
