Automated Author Profile
Zhang, Xiuqing

Current S-Index

30.0

Sum of Dataset Indices for all datasets

Average Dataset Index per Dataset

1.8

Average Dataset Index per dataset

Total Datasets

Total datasets for this author

Average FAIR Score

31.8%

Average FAIR Score per dataset

Total Citations

Total citations to the author's datasets

Total Mentions

Total mentions of the author's datasets

S-Index Interpretation

The S-Index (Sharing Index) is a comprehensive metric that represents the cumulative impact of all your datasets. It is calculated as the sum of Dataset Index scores across all your claimed datasets.

What it means:

A higher S-Index indicates greater overall impact of your datasets relative to typical datasets in their fields of research
The S-Index grows as you add more datasets or as existing datasets gain more citations and mentions
It provides a single number to track your research data impact over time

Current S-Index: 30.0 (sum of the Dataset Index scores of 17 datasets)
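
The arithmetic behind these figures can be sketched as follows. This is a minimal illustration, not the profile's actual implementation: the function and variable names are hypothetical, and the Dataset Index values are the rounded figures shown for the ten datasets listed on this page, so the partial sum covers only part of the reported total for all 17 datasets.

```python
# Rounded Dataset Index values of the ten datasets listed on this page.
listed_dataset_indices = [7.4, 1.0, 1.3, 1.1, 1.0, 1.2, 1.1, 3.3, 1.1, 2.5]


def s_index(dataset_indices):
    """Sum of Dataset Index scores, following the definition given above."""
    return round(sum(dataset_indices), 1)


# Contribution of the listed datasets to the S-Index.
partial_s_index = s_index(listed_dataset_indices)

# Figures reported on the profile for all claimed datasets.
reported_s_index = 30.0
total_datasets = 17

# Average Dataset Index per dataset, rounded to one decimal as displayed.
average_index = round(reported_s_index / total_datasets, 1)

print(partial_s_index, average_index)
```

Under these assumptions, the ten listed datasets contribute 21.0 to the reported S-Index of 30.0, and dividing 30.0 by 17 datasets reproduces the displayed average of 1.8.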


Charts: S-Index Over Time; Cumulative Citations Over Time; Cumulative Mentions Over Time

Datasets

Supporting data for "SOAPnuke: A MapReduce Acceleration supported Software for integrated Quality Control and Preprocessing of High-Throughput Sequencing Data"

Quality Control (QC) and preprocessing are essential steps for sequencing data analysis to ensure the accuracy of results. However, existing tools cannot provide a satisfying solution with integrated comprehensive functions, proper architectures and highly-scalable acceleration. In this article, we demonstrate SOAPnuke as a tool with abundant functions for a QC-Preprocess-QC workflow and MapReduce acceleration framework. Four modules with different preprocessing functions are designed for processing datasets from genomic, small RNA (sRNA), Digital Gene Expression (DGE) and metagenomic experiments respectively. As a workflow-like tool, SOAPnuke centralizes processing functions in one executable and predefines their order to avoid the necessity of reformatting different files when switching tools. Furthermore, the MapReduce framework enables large scalability to distribute all the processing work to an entire compute cluster. We conducted a benchmark in which SOAPnuke and other tools were used to preprocess a ~30x NA12878 dataset published by GIAB. The standalone operation of SOAPnuke struck a balance between resource occupancy and performance. When accelerated on 16 working nodes with MapReduce, SOAPnuke achieved ~5.7 times the fastest speed of the other tools.

Authors

Chen, Yuxin ;
Chen, Yongsheng ;
Shi, Chunmei ;
Huang, Zhibo ;
Zhang, Yong ;
Li, Shengkang ;
Li, Yan ;
Ye, Jia ;
Yu, Chang ;
Li, Zhuo ;
Zhang, Xiuqing ;
Wang, Jian ;
Yang, Huanming ;
Fang, Lin ;
Chen, Qiang

20 Citations | 0 Mentions | 31% FAIR | 7.4 Dataset Index

10.5524/100373 | January 2017

Supporting data for "PSSMHCpan: a novel PSSM based software for predicting class I peptide-HLA binding affinity"

Predicting peptide binding affinity with human leukocyte antigen (HLA) is a crucial step in developing powerful antitumor vaccines for cancer immunotherapy. Currently available methods work quite well in predicting peptide binding affinity with HLA alleles such as HLA-A0201, HLA-A0101, and HLA-B0702 in terms of sensitivity and specificity. However, quite a few types of HLA alleles that are present in the majority of human populations, including HLA-A0202, HLA-A0203, HLA-A6802, HLA-B5101, HLA-B5301, HLA-B5401 and HLA-B5701, still cannot be predicted with satisfactory accuracy using currently available methods. Further, the most popular current methods for predicting peptide binding affinity are inefficient in identifying neoantigens from large quantities of whole genome and transcriptome sequencing data.
Here we present a Position Specific Scoring Matrix (PSSM) based software called PSSMHCpan to accurately and efficiently predict peptide binding affinity with a broad coverage of HLA class I alleles. We evaluated the performance of PSSMHCpan by analyzing 10-fold cross-validation on a training database containing 87 HLA alleles and obtained an average area under receiver operating characteristic curve (AUC) of 0.94 and accuracy ACC of 0.85. In an independent dataset (Peptide Database of Cancer Immunity) evaluation, PSSMHCpan is substantially better than popularly used NetMHC-4.0, NetMHCpan-3.0, PickPocket, Nebula, and SMM with a sensitivity of 0.90, as compared to 0.74, 0.81, 0.77, 0.24 and 0.79. In addition, PSSMHCpan is more than 197 times faster than NetMHC-4.0, NetMHCpan-3.0, PickPocket, sNebula and SMM when predicting neoantigens from 661,263 peptides from a breast tumor sample. Finally, we built a neoantigen prediction pipeline and identified 117,017 neoantigens from 467 cancer samples of various cancers from TCGA.
PSSMHCpan is superior to currently available methods in predicting peptide binding affinity with a broad coverage of HLA class I alleles.

Authors

Liu, Geng ;
Li, Dongli ;
Li, Zhang ;
Qiu, Si ;
Li, Wenhui ;
Chao, Cheng-chi ;
Yang, Naibo ;
Li, Handong ;
Cheng, Zhen ;
Song, Xin ;
Cheng, Le ;
Zhang, Xiuqing ;
Wang, Jian ;
Yang, Huanming ;
Ma, Kun ;
Hou, Yong ;
Li, Bo

2 Citations | 0 Mentions | 31% FAIR | 1.0 Dataset Index

10.5524/100282 | January 2017

Supporting data for "Full-length single cell RNA-seq applied to a viral human cancer: Applications to HPV expression and splicing analysis in HeLa S3 cells".

Viral infection causes multiple forms of human cancer, and human papillomavirus (HPV) infection is the primary factor in cervical carcinomas. Single-cell RNA-seq studies highlight the tumor heterogeneity of most cancers, but virally induced tumors have not been studied. HeLa is a well characterized HPV+ cervical cancer cell line. We developed a new high-throughput platform to prepare single-cell RNA on a nanoliter scale based on a customized microwell chip. Using this method, we successfully amplified full-length transcripts of 669 single HeLa S3 cells, 40 of which were randomly selected to perform single-cell RNA sequencing. On the basis of these data, we obtained a comprehensive understanding of the heterogeneity of HeLa S3 cells in terms of gene expression, alternative splicing, and gene fusions. Furthermore, by co-expression analysis we can identify a high diversity of HPV-18 gene expression and splicing at the single-cell level. In addition to providing a characterization of the transcriptome of HeLa S3 cells at the single-cell level, our study demonstrates the power of single-cell RNA-seq analysis of virally infected cells and cancers.

Authors

Wu, Liang ;
Zhang, Xiaolong ;
Zhao, Zhikun ;
Wang, Ling ;
Li, Bo ;
Li, Guibo ;
Dean, Michael ;
Yu, Qichao ;
Wang, Yanhui ;
Lin, Xinxin ;
Rao, Weijian ;
Mei, Zhanlong ;
Li, Yang ;
Jiang, Runze ;
Yang, Huan ;
Li, Fuqiang ;
Xie, Guoyun ;
Xu, Liqin ;
Wu, Kui ;
Zhang, Jie ;
Chen, Jianghao ;
Wang, Ting ;
Kristiansen, Karsten ;
Zhang, Xiuqing ;
Li, Yingrui ;
Yang, Huanming ;
Wang, Jian ;
Hou, Yong ;
Xu, Xun

2 Citations | 0 Mentions | 31% FAIR | 1.3 Dataset Index

10.5524/100160 | January 2015

Supporting data for "Sparse whole-genome sequencing identifies two loci for major depressive disorder".

Major depressive disorder (MDD) is one of the most frequently encountered forms of mental illness and a leading cause of disability worldwide. However, due to the high heterogeneity of the disease, no robustly replicated genome loci have been identified. We collected more than 12,000 samples from 45 cities in China in collaboration with local hospitals. Most of the data were collected from 2007 to 2010 in different batches. Here, we performed low-coverage whole-genome sequencing of 5,303 Chinese women with recurrent MDD selected to reduce phenotypic heterogeneity, and 5,337 controls screened to exclude MDD. The sequencing reads were aligned to the human reference genome GRCh37 and an average sequencing depth of 1.7x was achieved. Based on this dataset, we identified about 32 million SNPs and analyzed their association with MDD. The availability of this large dataset will be very helpful for genetic studies of other complex traits in the Chinese population. The data available here are BAM files storing the mapping results for each sample.

Authors

Cai, Na ;
Bigdeli, Tim, B ;
Kretzschmar, Warren ;
Li, Yihan ;
Liang, Jieqin ;
Song, Li ;
Hu, Jingchu ;
Li, Qibin ;
Jin, Wei ;
Hu, Zhenfei ;
Wang, Guangbiao ;
Wang, Linmao ;
Qian, Puyi ;
Liu, Yuan ;
Jiang, Tao ;
Lu, Yao ;
Zhang, Xiuqing ;
Yin, Ye ;
Li, Yingrui ;
Xu, Xun ;
Gan, Xiangchao ;
Reimers, Mark ;
Webb, Todd ;
Riley, Brien ;
Bacanu, Silviu ;
Peterson, Roseann, E ;
Chen, Yiping ;
Zhong, Hui ;
Liu, Zhengrong ;
Wang, Gang ;
Sun, Jing ;
Sang, Hong ;
Jiang, Guoqing ;
Zhou, Xiaoyan ;
Li, Yi ;
Li, Yi ;
Zhang, Wei ;
Wang, Xueyi ;
Fang, Xiang ;
Pan, Runde ;
Miao, Guodong ;
Zhang, Qiwen ;
Hu, Jian ;
Yu, Fengyu ;
Du, Bo ;
Sang, Wenhua ;
Li, Keqing ;
Chen, Guibing ;
Cai, Min ;
Yang, Lijun ;
Yang, Donglin ;
Ha, Baowei ;
Hong, Xiaohong ;
Deng, Hong ;
Li, Gongying ;
Li, Kan ;
Song, Yan ;
Gao, Shugui ;
Zhang, Jinbei ;
Gan, Zhaoyu ;
Meng, Huaqing ;
Pan, Jiyang ;
Gao, Chengge ;
Zhang, Kerang ;
Sun, Ning ;
Li, Youhui ;
Niu, Qihui ;
Zhang, Yutang ;
Liu, Tieqiao ;
Hu, Chunmei ;
Zhang, Zhen ;
Lv, Luxian ;
Dong, Jicheng ;
Wang, Xiaoping ;
Tao, Ming ;
Wang, Xumei ;
Xia, Jing ;
Rong, Han ;
He, Qiang ;
Liu, Tiebang ;
Huang, Guoping ;
Mei, Qiyi ;
Shen, Zhenming ;
Liu, Ying ;
Shen, Jianhua ;
Tian, Tian ;
Liu, Xiaojuan ;
Wu, Wenyuan ;
Gu, Danhua ;
Fu, Guangyi ;
Shi, Jianguo ;
Chen, Yunchun ;
Gao, Jingfang ;
Liu, Lanfen ;
Wang, Lina ;
Yang, Fuzhong ;
Cong, Enzhao ;
Marchini, Jonathan ;
Yang, Huanming ;
Wang, Jian ;
Shi, Shenxun ;
Mott, Richard ;
Xu, Qi ;
Wang, Jun ;
Kendler, Kenneth, S ;
Flint, Jonathan

2 Citations | 0 Mentions | 31% FAIR | 1.1 Dataset Index

10.5524/100155 | January 2015

Supporting materials for: "Clinical outcome of preimplantation genetic diagnosis and screening using next generation sequencing".

A total of 395 couples were subjected to IVF-PGD treatment, including 129 couples with an NGS-based test and 266 couples with a SNP array based test for the detection of embryonic chromosomal abnormalities. The NGS test was performed using low coverage whole genome sequencing on the HiSeq 2000 platform. The SNP array test used the Affymetrix Gene Chip Mapping Nsp I 262K. The average age of patients was 32.1 years (age range 20-44 years).

Due to the sensitive nature of this dataset it is being hosted in the secure restricted access database European Genome-Phenome Archive at the EBI. It has been assigned the accession number EGAD00001001037.
To gain access to this dataset you will need to apply for permission from the CITIC Xiangya Hospital and BGI PGD/PGS Data Access Committee (DAC).
There are two forms available to download from the GigaDB FTP server (below); both should be completed and emailed to Dr Yueqiu Tan, who is the named representative of the CITIC Xiangya Hospital and BGI PGD/PGS DAC.
After sending the forms to the DAC you will be contacted either by the DAC to decline your application or by the EGA with login details if your application is approved. This process can take several days.

Authors

Tan, Yueqiu ;
Yin, Xuyang ;
Zhang, Shuoping ;
Jiang, Hui ;
Tan, Ke ;
Li, Kian ;
Xiong, Bo ;
Gong, Fei ;
Zhang, Chunlei ;
Pan, Xiaoyu ;
Chen, Fang ;
Chen, Shengpei ;
Gong, Chun ;
Lu, Changfu ;
Luo, Keli ;
Gu, Yifan ;
Zhang, Xiuqing ;
Wang, Wei ;
Xu, Xun ;
Vajta, Gabor ;
Bolund, Lars ;
Yang, Huanming ;
Lu, Guangxiu ;
Du, Yutao ;
Ge, Lin

1 Citation | 0 Mentions | 31% FAIR | 1.0 Dataset Index

10.5524/100112 | January 2014

Supporting material for: De novo assembly of a haplotype-resolved human genome.

Here we provide the first de novo haplotype-resolved diploid genome sequence of an Asian individual using a unique de novo assembly pipeline. Our pipeline uses fosmid pooling and whole genome shotgun strategies, based on next generation sequencing (NGS) technology. The assembled genome contains 5.15 Gb, with a haplotype N50 of 484 kb. This haplotype-resolved genome represents the most complete genome assembly so far. Our analysis further identified previously undetected indels and novel coding sequences, and thus provides the most complete representation of an individual's genetic variation.
We generated ~614,850 fosmid clones ranging from 20 kb to 80 kb with a mean of 36 kb. Approximately 30 fosmid clones were pooled, and each pool had one or two DNA libraries sequenced using HiSeq 2000. In total, 1,712 Gb of raw sequence data was generated for all the pooled fosmid libraries. Please see the linked paper for assembly pipeline details. We then analysed the newly generated haploid-resolved diploid genome (HDG) for SNPs, INDELs, inversions and translocations, identifying 3,580,000 SNPs, 762,000 short INDELs (<50 bp), 30,000 long INDELs, 111 inversions and 168 translocations.

Authors

Cao, Hongzhi ;
Wu, Honglong ;
Luo, Ruibang ;
Huang, Shujia ;
Sun, Yuhui ;
Tong, Xin ;
Xie, Yinlong ;
Liu, Binghang ;
Yang, Hailong ;
Zheng, Hancheng ;
Li, Jian ;
Li, Bo ;
Wang, Yu ;
Yang, Fang ;
Sun, Peng ;
Liu, Siyang ;
Gao, Peng ;
Huang, Haodong ;
Sun, Jing ;
Chen, Dan ;
He, Guangzhu ;
Huang, Weihua ;
Huang, Zheng ;
Li, Yue ;
Tellier, Laurent, CAM ;
Liu, Xiao ;
Feng, Qiang ;
Xu, Xun ;
Zhang, Xiuqing ;
Bolund, Lars ;
Krogh, Anders ;
Kristiansen, Karsten ;
Goodman, Laurie ;
Drmanac, Radoje ;
Drmanac, Snezana, A ;
Luo, Qiong ;
Li, Songgang ;
Wang, Jian ;
Yang, Huanming ;
Li, Yingrui ;
Wong, Gane, Ka-Shu ;
Wang, Jun

1 Citation | 0 Mentions | 31% FAIR | 1.2 Dataset Index

10.5524/100096 | January 2014

Single cell whole-exome sequences of bladder cancer from an individual.

This dataset contains single-cell and whole-tissue sequencing and annotation data from a muscle-invasive bladder transitional cell carcinoma from one individual. The data available includes: single-cell whole-exome sequences from 55 individual cells, including 44 from the tumor and 11 from normal adjacent tissue; whole-tissue DNA sequence data from this cancer and the matched normal. Additional data includes alignments, SNP calling, and high confidence somatic mutation calling and their allelic frequencies.

Authors

Li, Yingrui ;
Xu, Xun ;
Song, Luting ;
Hou, Yong ;
Li, Zesong ;
Wu, Kui ;
Wu, Hanjie ;
Liang, Jie ;
Jian, Min ;
Li, Jingxiang ;
Zhang, Xiuqing ;
Wang, Jian ;
Yang, Huanming ;
Wang, Jun

1 Citation | 0 Mentions | 31% FAIR | 1.1 Dataset Index

10.5524/100037 | January 2012

Updated genome assembly of YH: the first diploid genome sequence of a Han Chinese individual (version 2, 07/2012)

Updated genomic data from the YH (Homo sapiens) diploid genome, the first sequenced Han Chinese individual, a representative of the Asian population. The genomic DNA used in this study came from an anonymous male Han Chinese individual who has no known genetic diseases. The original version of the YH genome was assembled based on 3.3 billion reads using the Illumina Genome Analyzer (see dataset doi:10.5524/100015). This latest (as of 07/2012) and improved version of the YH genome was assembled based on 2.1 billion reads using the Illumina HiSeq2000. A total of 202G nucleotides of data was obtained using 100 bp-long paired end reads with an insert size ranging from 180 bp to 40 kbp, and the genome was sequenced to 67.5-fold average coverage. The latest version of SOAPdenovo2 was used to reassemble, improve and update the previously assembled genome (tools and pipelines available here: doi:10.5524/100044). By aligning the short reads with SOAP, 177G nucleotides were mapped onto the NCBI reference genome and 99.99% of the genome was covered. The raw sequences, assemblies and relevant tools are released for public use under a CC0 license. More information about the YH genome can be viewed at: http://yh.genomics.org.cn/

Authors

Wang, Jun ;
Li, Yingrui ;
Luo, R ;
Liu, B ;
Xie, Y ;
Li, Zhuo ;
Fang, Xiaodong ;
Zheng, Hancheng ;
Qin, Junjie ;
Yang, Bin ;
Yu, C ;
Ni, Peixiang ;
Li, Ning ;
Guo, Guangwu ;
Ye, Jia ;
Fang, Lin ;
Su, Yeyang ;
Asan ;
Zheng, Hongkun ;
Kristiansen, Karsten ;
Wong, Gane, Ka-Shu ;
Nielsen, Rasmus ;
Durbin, Richard ;
Bolund, Lars ;
Zhang, Xiuqing ;
Li, Songgang ;
Yang, Huanming ;
Wang, Jian

6 Citations | 0 Mentions | 31% FAIR | 3.3 Dataset Index

10.5524/100038 | January 2012

Genomic data from an extinct Palaeo-Eskimo.

Available here is the genome of a male individual from an extinct Palaeo-Eskimo culture, the first known group of Homo sapiens to settle in Greenland. The DNA sample was obtained from ~4,000-year-old permafrost-preserved hair, and was shown to have very low modern DNA contamination. The diploid genome was sequenced to an average depth of 20x using Illumina GAII sequencing platforms, with 79% recovery. Correctly indexed reads were mapped to the human genome (hg18) with a suffix array-based method that allows for residual primer trimming. Sequencing yielded a total of 3.5 billion reads.

Authors

Rasmussen, Morten ;
Li, Yingrui ;
Lindgreen, Stinus ;
Pedersen, Jakob, Skou ;
Albrechtsen, Anders ;
Moltke, Ida ;
Metspalu, Mait ;
Metspalu, Ene ;
Kivisild, Toomas ;
Gupta, Ramneek ;
Bertalan, Marcelo ;
Nielsen, Kasper ;
Gilbert, M.Thomas, P ;
Wang, Yong ;
Raghavan, Maanasa ;
Campos, Paula, F ;
Kamp, Hanne, Munkholm ;
Wilson, Andrew, S ;
Gledhill, Andrew ;
Tridico, Silvana ;
Bunce, Michael ;
Lorenzen, Eline, D ;
Binladen, Jonas ;
Guo, Xiaosen ;
Zhao, Jing ;
Zhang, Xiuqing ;
Zhang, Hao ;
Li, Zhuo ;
Chen, Minfeng ;
Orlando, Ludovic ;
Kristiansen, Karsten ;
Bak, Mads ;
Tommerup, Niels ;
Bendixen, Christian ;
Pierre, Tracey, L ;
Gronnow, Bjarne ;
Meldgaard, Morten ;
Andreasen, Claus ;
Fedorova, Sardana, A ;
Osipova, Ludmila, P ;
Higham, Thomas, FG ;
Ramsey, Christopher, Bronk ;
Hansen, Thomas, VO ;
Nielsen, Finn, C ;
Crawford, Michael, H ;
Brunak, Søren ;
Sicheritz-Ponten, Thomas ;
Villems, Richard ;
Nielsen, Rasmus ;
Krogh, Anders ;
Wang, Jun ;
Willerslev, Eske

1 Citation | 0 Mentions | 31% FAIR | 1.1 Dataset Index

10.5524/100026 | January 2011

Genomic data from the crab-eating macaque/cynomolgus monkey (Macaca fascicularis).

The crab-eating macaque (Macaca fascicularis), also known as the Java macaque or long-tailed macaque, is a species of primate found throughout Southeast Asia. Due to the frequent use of the genus Macaca in scientific research, the sequence of the crab-eating macaque furthers our understanding of how it differs from other macaque species, like the Chinese rhesus macaque and the Indian rhesus macaque. This is especially relevant considering the recent trend of using crab-eating macaques (CE) and Chinese rhesus macaques rather than the Indian rhesus macaque as laboratory models. The DNA sample for genome sequencing and analyses was from a female CE that was a captive-bred descendant of a CE from Vietnam. The genome was sequenced on the Illumina GAIIx platform, and we obtained 162 Gb of high-quality sequence, representing 54-fold coverage. The sequencing data were processed with Illumina custom computational pipelines. The genome was de novo assembled using the SOAPdenovo program, which is based on the de Bruijn graph algorithm. The total size of the assembled genome was about 2.85 Gb, providing 54-fold coverage on average. The scaffolds were assigned to the chromosomes according to the synteny displayed with the Indian rhesus macaque and human genome sequences. About 92% of the CE scaffolds could be placed onto chromosomes.

Authors

Yan, Guangmei ;
Zhang, Guojie ;
Fang, Xiaodong ;
Zhang, Yanfeng ;
Li, Cai ;
Ling, Fei ;
Cooper, David, N ;
Li, Qiye ;
Li, Yan ;
van Gool, Alain, J ;
Du, Hongli ;
Chen, Jiesi ;
Chen, Ronghua ;
Zhang, Pei ;
Huang, Zhiyong ;
Thompson, John, R ;
Meng, Yuhuan ;
Bai, Yinqi ;
Wang, Jufang ;
Zhuo, Min ;
Wang, Tao ;
Huang, Ying ;
Wei, Liqiong ;
Li, Jianwen ;
Wang, Zhiwen ;
Hu, Haofu ;
Le, Liang ;
Stenson, Peter, D ;
Li, Bo ;
Liu, Xiaoming ;
Ball, Edward, V ;
An, Na ;
Huang, Quanfei ;
Zhang, Yong ;
Fan, Wei ;
Zhang, Xiuqing ;
Li, Yingrui ;
Wang, Wen ;
Katze, Michael, G ;
Su, Bing ;
Nielsen, Rasmus ;
Yang, Huanming ;
Wang, Jun ;
Wang, Xiaoning ;
Wang, Jian

5 Citations | 0 Mentions | 31% FAIR | 2.5 Dataset Index

10.5524/100003 | January 2011
