Scholar Data

Datasets

Supporting material for: De novo assembly of a haplotype-resolved human genome.

Here we provide the first de novo haplotype-resolved diploid genome sequence of an Asian individual, produced with a unique de novo assembly pipeline. The pipeline combines fosmid pooling and whole-genome shotgun strategies, both based on next-generation sequencing (NGS) technology. The assembled genome contains 5.15 Gb, with a haplotype N50 of 484 kb. This haplotype-resolved genome represents the most complete genome assembly so far. Our analysis further identified previously undetected indels and novel coding sequences, and thus provides the most complete representation of an individual's genetic variation.
We generated ~614,850 fosmid clones ranging from 20 kb to 80 kb, with a mean of 36 kb. Approximately 30 fosmid clones were pooled, and each pool had one or two DNA libraries sequenced on the HiSeq 2000. In total, 1,712 Gb of raw sequence data was generated for all the pooled fosmid libraries. Please see the linked paper for assembly pipeline details. We then analysed the newly generated haplotype-resolved diploid genome (HDG) for SNPs, indels, inversions and translocations, and identified 3,580,000 SNPs, 762,000 short indels (<50 bp), 30,000 long indels, 111 inversions and 168 translocations.
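
Several records on this page quote an N50 value (for example, the haplotype N50 of 484 kb above). As a reminder of what that statistic means, here is a minimal Python sketch of the usual definition: the length L such that sequences of length L or longer together cover at least half of the total bases. The function name `n50` and the toy lengths are illustrative assumptions; this is not code from the assembly pipeline described in the linked paper.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence (or block) lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total number of bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() needs at least one length")

    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half
        # of the total is the N50.
        if running >= half_total:
            return length
    # Only reachable with unusual input (e.g. negative lengths).
    return ordered[-1]


# Toy example with made-up lengths (not values from any dataset here).
toy_lengths = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_lengths)
```

With the toy lengths above the total is 290, so the running sum first reaches half of it (145) at the second-longest sequence, giving an N50 of 70. The same function could be applied to contig, scaffold, or haplotype block lengths, depending on which N50 is being reported.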

Authors

Cao, Hongzhi ;
Wu, Honglong ;
Luo, Ruibang ;
Huang, Shujia ;
Sun, Yuhui ;
Tong, Xin ;
Xie, Yinlong ;
Liu, Binghang ;
Yang, Hailong ;
Zheng, Hancheng ;
Li, Jian ;
Li, Bo ;
Wang, Yu ;
Yang, Fang ;
Sun, Peng ;
Liu, Siyang ;
Gao, Peng ;
Huang, Haodong ;
Sun, Jing ;
Chen, Dan ;
He, Guangzhu ;
Huang, Weihua ;
Huang, Zheng ;
Li, Yue ;
Tellier, Laurent, CAM ;
Liu, Xiao ;
Feng, Qiang ;
Xu, Xun ;
Zhang, Xiuqing ;
Bolund, Lars ;
Krogh, Anders ;
Kristiansen, Karsten ;
Goodman, Laurie ;
Drmanac, Radoje ;
Drmanac, Snezana, A ;
Luo, Qiong ;
Li, Songgang ;
Wang, Jian ;
Yang, Huanming ;
Li, Yingrui ;
Wong, Gane, Ka-Shu ;
Wang, Jun

1 Citation | 0 Mentions | 31% FAIR | 1.2 Dataset Index

10.5524/100096 (2014)

Assemblathon 2 assemblies.

Assemblathon 2 is a genome assembly contest where participating teams attempted to assemble genomes for three vertebrate species using a mixture of next-generation sequencing data. In total, 43 assemblies were submitted for three species (15 for bird, 16 for fish, and 12 for snake). These assemblies were assessed using a wide variety of statistical approaches as well as using experimental data from Fosmid sequences and optical maps.

Authors

Bradnam, Keith, R ;
Fass, Joseph, N ;
Alexandrov, Anton ;
Baranay, Paul ;
Bechner, Michael ;
Birol, Inanç ;
Boisvert, Sébastien ;
Chapman, Jarrod, A ;
Chapuis, Guillaume ;
Chikhi, Rayan ;
Chitsaz, Hamidreza ;
Chou, Wen-Chi ;
Corbeil, Jacques ;
Del Fabbro, Cristian ;
Docking, T.Roderick, R ;
Durbin, Richard ;
Earl, Dent ;
Emrich, Scott ;
Fedotov, Pavel ;
Fonseca, Nuno, A ;
Ganapathy, Ganeshkumar ;
Gibbs, Richard, A ;
Gnerre, Sante ;
Godzaridis, Élénie ;
Goldstein, Steve ;
Haimel, Matthias ;
Hall, Giles ;
Haussler, David ;
Hiatt, Joseph, B ;
Ho, Isaac ;
Howard, Jason, T ;
Hunt, Martin ;
Jackman, Shaun, D ;
Jaffe, David, B ;
Jarvis, Erich, D ;
Jiang, Huaiyang ;
Kazakov, Sergey ;
Kersey, Paul, J ;
Kitzman, Jacob, O ;
Knight, James, R ;
Koren, Sergey ;
Lam, Tak-Wah ;
Lavenier, Dominique ;
Laviolette, François ;
Li, Yingrui ;
Li, Zhenyu ;
Liu, Binghang ;
Liu, Yue ;
Luo, Ruibang ;
MacCallum, Iain ;
MacManes, Matthew, D ;
Maillet, Nicolas ;
Melnikov, Sergey ;
Naquin, Delphine ;
Ning, Zemin ;
Otto, Thomas, D ;
Paten, Benedict ;
Paulo, Octávio, S ;
Phillippy, Adam, M ;
Pina-Martins, Francisco ;
Place, Michael ;
Przybylski, Dariusz ;
Qin, Xiang ;
Qu, Carson ;
Ribeiro, Filipe, J ;
Richards, Stephen ;
Rokhsar, Daniel, S ;
Ruby, J.Graham ;
Scalabrin, Simone ;
Schatz, Michael, C ;
Schwartz, David, C ;
Sergushichev, Alexey ;
Sharpe, Ted ;
Shaw, Timothy, I ;
Shendure, Jay ;
Shi, Yujian ;
Simpson, Jared, T ;
Song, Henry ;
Tsarev, Fedor ;
Vezzi, Francesco ;
Vicedomini, Riccardo ;
Vieira, Bruno, M ;
Wang, Jun ;
Worley, Kim, C ;
Yin, Shuangye ;
Yiu, Siu-Ming ;
Yuan, Jianying ;
Zhang, Guojie ;
Zhang, Hao ;
Zhou, Shiguo ;
Korf, Ian, F

6 Citations | 1 Mention | 31% FAIR | 3.8 Dataset Index

10.5524/100060 (2013)

Software and supporting material for “SOAPdenovo2: An empirically improved memory-efficient short read de novo assembly”

SOAPdenovo2 is the latest de novo genome assembly package from BGI's SOAP (short oligonucleotide analysis package) suite of tools (homepage here: http://soap.genomics.org.cn/). Compared to SOAPdenovo1, this new version has a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closure, and is optimized for large genomes.
Using new sequencing data from the YH (Homo sapiens) diploid genome, that of the first sequenced Han Chinese individual, an updated assembly was produced (see dataset here: doi:10.5524/100038), with contig and scaffold N50 values 3-fold and 50-fold longer, respectively, than those of the first published version. The genome coverage increased from 81.16% to 93.91%, and memory consumption was ~2/3 lower at the point of peak memory usage.
Benchmarking with Assemblathon1 and GAGE datasets shows that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo1 and is competitive with other assemblers on both assembly length and accuracy.
To help readers reproduce these findings, configured packages with the compressed pipelines containing all of the necessary shell scripts and tools are available from the BGI FTP server (ftp://public.genomics.org.cn/BGI/SOAPdenovo2).
The latest version of SOAPdenovo2 is available from Sourceforge: http://soapdenovo2.sourceforge.net/
These pipelines are available from our data platform as Galaxy workflows: http://galaxy.cbiit.cuhk.edu.hk/

Authors

Luo, Ruibang ;
Liu, Binghang ;
Xie, Yinlong ;
Li, Zhenyu ;
Huang, Weihua ;
Yuan, Jianying ;
He, Guangzhu ;
Chen, Yanxiang ;
Pan, Qi ;
Liu, Yunjie ;
Tang, Jingbo ;
Wu, Gengxiong ;
Zhang, Hao ;
Shi, Yujian ;
Liu, Yong ;
Yu, Chang ;
Wang, Bo ;
Lu, Yao ;
Han, Changlei ;
Cheung, David ;
Yiu, Siu-Ming ;
Peng, Shaoliang ;
Xiaoqian, Zhu ;
Liu, Guangming ;
Liao, Xiangke ;
Li, Yingrui ;
Yang, Huanming ;
Wang, Jian ;
Lam, Tak-Wah, W ;
Wang, Jun

7 Citations | 0 Mentions | 31% FAIR | 3.6 Dataset Index

10.5524/100044 (2012)

Genomic data from the giant panda (Ailuropoda melanoleuca).

The giant panda (Ailuropoda melanoleuca) is considered a symbol of China and is a much loved animal all around the world. It is also one of the world's most endangered species, making it a flagship species for conservation efforts. The giant panda is the first fully sequenced member of Ursidae and the second fully sequenced carnivore after the dog; its whole genome sequence and annotation data provide an unparalleled amount of information to aid in understanding the genetic and biological underpinnings of this unique species, and will help contribute to disease control and conservation efforts.
In 2008, BGI completed a first draft of the genome sequence of a three-year-old female giant panda named Jingjing, who was used as a model for the 2008 Olympics in Beijing, China (doi: 10.1038/nature08696). Using second-generation Illumina GA sequencing data, the first de novo genome assembly was created using short-read sequencing technology. Here you will find the giant panda genome sequence assembly as well as annotation information, such as gene structure and function, non-coding RNAs, and repeat elements. Also presented is polymorphism information detected in the diploid genome, including SNPs, indels, and structural variations (SVs). The assembly was done using SOAPdenovo software, and the panda genome data is visualized via MapView, which is powered by the Google Web Toolkit.

Authors

Li, Ruiqiang ;
Fan, Wei ;
Tian, Geng ;
Zhu, Hongmei ;
He, Lin ;
Cai, Jing ;
Huang, Quanfei ;
Cai, Qingle ;
Li, Bo ;
Bai, Yinqi ;
Zhang, Zhihe ;
Zhang, Yaping ;
Wang, Wen ;
Li, Jun ;
Wei, Fuwen ;
Li, Heng ;
Jian, Min ;
Li, Jianwen ;
Zhang, Zhaolei ;
Nielsen, Rasmus ;
Li, Dawei ;
Gu, Wanjun ;
Yang, Zhentao ;
Xuan, Zhaoling ;
Ryder, Oliver, A ;
Leung, Frederick, Chi-Ching ;
Zhou, Yan ;
Cao, Jianjun ;
Sun, Xiao ;
Fu, Yonggui ;
Fang, Xiaodong ;
Guo, Xiaosen ;
Wang, Bo ;
Hou, Rong ;
Shen, Fujun ;
Mu, Bo ;
Ni, Peixiang ;
Lin, Runmao ;
Qian, Wubin ;
Wang, Guodong ;
Yu, Chang ;
Nie, Wenhui ;
Wang, Jinhuan ;
Wu, Zhigang ;
Liang, Huiqing ;
Min, Jiumeng ;
Wu, Qi ;
Cheng, Shifeng ;
Ruan, Jue ;
Wang, Mingwei ;
Shi, Zhongbin ;
Wen, Ming ;
Liu, Binghang ;
Ren, Xiaoli ;
Zheng, Huisong ;
Dong, Dong ;
Cook, Kathleen ;
Shan, Gao ;
Zhang, Hao ;
Kosiol, Carolin ;
Xie, Xueying ;
Lu, Zuhong ;
Zheng, Hancheng ;
Li, Yingrui ;
Steiner, Cynthia, C ;
Lam, Tommy, Tsan-Yuk ;
Lin, Siyuan ;
Zhang, Qinghui ;
Li, Guoqing ;
Tian, Jing ;
Gong, Timing ;
Liu, Hongde ;
Zhang, Dejin ;
Fang, Lin ;
Ye, Chen ;
Zhang, Juanbin ;
Hu, Wenbo ;
Xu, Anlong ;
Ren, Yuanyuan ;
Zhang, Guojie ;
Bruford, Michael, W ;
Li, Qibin ;
Ma, Lijia ;
Guo, Yiran ;
An, Na ;
Hu, Yujie ;
Zheng, Yang ;
Shi, Yongyong ;
Li, Zhiqiang ;
Liu, Qing ;
Chen, Yanling ;
Zhao, Jing ;
Qu, Ning ;
Zhao, Shancen ;
Tian, Feng ;
Wang, Xiaoling ;
Wang, Haiyin ;
Xu, Lizhi ;
Liu, Xiao ;
Vinar, Tomas ;
Wang, Yajun ;
Lam, Tak-Wah ;
Yiu, Siu-Ming ;
Liu, Shiping ;
Zhang, Hemin ;
Li, Desheng ;
Huang, Yan ;
Wang, Xia ;
Yang, Guohua ;
Jiang, Zhi ;
Wang, Junyi ;
Qin, Nan ;
Li, Li ;
Li, Jingxiang ;
Bolund, Lars ;
Kristiansen, Karsten ;
Wong, Gane, Ka-Shu ;
Olson, Maynard ;
Zhang, Xiuqing ;
Li, Songgang ;
Yang, Huanming ;
Wang, Jian ;
Wang, Jun

1 Citation | 0 Mentions | 31% FAIR | 1.1 Dataset Index

10.5524/100004 (2011)

Genomic data from Chinese cabbage (Brassica rapa).

Available here is genomic data for the polyploid plant Brassica rapa ssp. pekinensis line Chiifu-401-42, a Chinese cabbage. As there are several oil and vegetable crop species in the Brassica family, this genome is of great agricultural relevance. It also provides an important resource for studying the evolution of polyploid genomes.
The Brassica rapa Genome Sequencing Project Consortium assembled a 283.8 Mb genome estimated to cover >98% of the gene space. Using 72X coverage of paired short-read sequences generated by Illumina GA II technology and 199,452 BAC-end sequences, 159 super scaffolds were produced, representing 90% of the assembled sequences with an N50 scaffold size of 1.97 Mb. Using genetic mapping of 1,427 markers in B. rapa, ten pseudochromosomes that included 90% of the assembly were produced. A total of 41,174 protein-coding genes in the B. rapa genome were modeled, and the genome was found to have undergone genome triplication.

Authors

Wang, Xiaowu ;
Wang, Hanzhong ;
Wang, Jun ;
Sun, Rifei ;
Wu, Jian ;
Liu, Shengyi ;
Bai, Yinqi ;
Mun, Jeong-Hwan ;
Bancroft, Ian ;
Cheng, Feng ;
Huang, Sanwen ;
Li, Xixiang ;
Hua, Wei ;
Wang, Junyi ;
Wang, Xiyin ;
Freeling, Michael ;
Pires, J.Chris ;
Paterson, Andrew, H ;
Chalhoub, Boulos ;
Wang, Bo ;
Hayward, Alice ;
Sharpe, Andrew, G ;
Park, Beom-Seok ;
Weisshaar, Bernd ;
Liu, Binghang ;
Li, Bo ;
Liu, Bo ;
Tong, Chaobo ;
Song, Chi ;
Duran, Christopher ;
Peng, Chunfang ;
Geng, Chunyu ;
Koh, Chushin ;
Lin, Chuyu ;
Edwards, David ;
Mu, Desheng ;
Shen, Di ;
Soumpourou, Eleni ;
Li, Fei ;
Fraser, Fiona ;
Conant, Gavin ;
Lassalle, Gilles ;
King, Graham, J ;
Bonnema, Guusje ;
Tang, Haibao ;
Wang, Haiping ;
Belcram, Harry ;
Zhou, Heling ;
Hirakawa, Hideki ;
Abe, Hiroshi ;
Guo, Hui ;
Wang, Hui ;
Jin, Huizhe ;
Parkin, Isobel, AP ;
Batley, Jacqueline ;
Kim, Jeong-Sun, S ;
Just, Jérémy ;
Li, Jianwen ;
Xu, Jiaohui ;
Deng, Jie ;
Kim, Jin, A ;
Li, Jingping ;
Yu, Jingyin ;
Meng, Jinling ;
Wang, Jinpeng ;
Min, Jiumeng ;
Poulain, Julie ;
Wang, Jun ;
Hatakeyama, Katsunori ;
Wu, Kui ;
Wang, Li ;
Fang, Lu ;
Trick, Martin ;
Links, Matthew, G ;
Zhao, Meixia ;
Jin, Mina ;
Ramchiary, Nirala ;
Drou, Nizar ;
Berkman, Paul, J ;
Cai, Qingle ;
Huang, Quanfei ;
Li, Ruiqiang ;
Tabata, Satoshi ;
Cheng, Shifeng ;
Zhang, Shu ;
Zhang, Shujiang ;
Huang, Shunmou ;
Sato, Shusei ;
Sun, Silong ;
Kwon, Soo-Jin, J ;
Choi, Su-Ryun, R ;
Lee, Tae-Ho, H ;
Fan, Wei ;
Zhao, Xiang ;
Tan, Xu ;
Xu, Xun ;
Wang, Yan ;
Qiu, Yang ;
Yin, Ye ;
Li, Yingrui ;
Du, Yongchen ;
Liao, Yongcui ;
Lim, Yongpyo ;
Narusaka, Yoshihiro ;
Wang, Yupeng ;
Wang, Zhenyi ;
Li, Zhenyu ;
Wang, Zhiwen ;
Xiong, Zhiyong ;
Zhang, Zhonghua ;
Brassica rapa Genome Sequencing Project Consortium

1 Citation | 0 Mentions | 31% FAIR | 1.1 Dataset Index

10.5524/100021 (2011)

Automated Author Profile
Liu, Binghang
