Scholar Data

Datasets

Supporting data for "halSynteny: a fast, easy-to-use conserved synteny block construction method for multiple whole-genome alignments"

Large-scale sequencing projects provide high-quality full-genome data that can be used to reconstruct the chromosomal exchanges and rearrangements that disrupt conserved syntenic blocks. The highest resolution of cross-species homology can be obtained from whole-genome, reference-free alignments. Very large multiple alignments of full-genome sequences stored in a binary format demand an accurate and efficient computational approach to synteny block construction.
halSynteny performs efficient processing of pairwise alignment blocks for any pair of genomes in the alignment. The tool is part of the HAL comparative genomics suite and is designed to build synteny blocks for multi-hundred-way, reference-free vertebrate alignments built with the Cactus system.
halSynteny enables accurate and rapid identification of synteny in multiple full-genome alignments. The method is implemented in C++11 as a component of the halTools software and released under the MIT license. The package is available at https://github.com/ComparativeGenomicsToolkit/hal/.

Authors

Krasheninnikova, Ksenia ;
Diekhans, Mark ;
Armstrong, Joel ;
Dievskii, Aleksei ;
Paten, Benedict ;
O’Brien, Stephen, J

1 Citation | 0 Mentions | 31% FAIR | 1.1 Dataset Index

10.5524/100740 | January 2020

High quality chimpanzee reference genome (Pan_tro_3.0) from hybrid assembly approach

The chimpanzee is arguably the most important species for the study of human origins. A key resource for these studies is a high quality reference genome assembly. The current iteration of the chimpanzee reference genome assembly (Pan_tro_2.1.4) is highly fragmented, with more than 183,000 contigs and incorporating over 159,000 gaps, with a genome-wide contig N50 of 51 Kbp.
In this work we produce an extensive and diverse array of sequencing datasets to rapidly assemble a new chimpanzee reference that surpasses previous iterations in bases represented and organized in large scaffolds. We show substantial improvements over the Pan_tro_2.1.4 version by several metrics: increased contiguity by >750% and 300% on contigs and scaffolds, respectively; and closure of 77% of the gaps in the Pan_tro_2.1.4 assembly, with the closed gaps spanning >850 Kbp of novel coding sequence based on RNASeq data. We furthermore report over 2,700 genes that had putatively erroneous frame-shift predictions relative to human in Pan_tro_2.1.4 and show a substantial increase in the annotation of repetitive elements.
We apply a simple 3-way hybrid approach to considerably improve the reference genome assembly for the chimpanzee, providing a valuable resource to study human origins. We furthermore produced extensive sequencing datasets that are all derived from the same cell line, generating a broad non-human benchmark dataset.

Authors

Kuderna, Lukas, FK ;
Tomlinson, Chad ;
Hillier, LaDeana, W ;
Tran, Annabel ;
Fiddes, Ian ;
Armstrong, Joel ;
Laayouni, Hafid ;
Gordon, David ;
Huddleston, John ;
Perez, Raquel, Garcia ;
Povolotskaya, Inna ;
Armero, Aitor, Serres ;
Garrido, Jessica, Gomez ;
Ho, Daniel ;
Ribeca, Paolo ;
Alioto, Tyler ;
Green, Richard, E ;
Paten, Benedict ;
Navarro, Arcadi ;
Bertranpetit, Jaume ;
Herrero, Javier ;
Eichler, Evan, E ;
Sharp, Andrew, J ;
Feuk, Lars ;
Warren, Wesley, C ;
Marques-Bonet, Tomas

2 Citations | 0 Mentions | 31% FAIR | 1.0 Dataset Index

10.5524/100327 | January 2017

The genome of the American alligator (Alligator mississippiensis).

The American alligator (Alligator mississippiensis), as the name suggests, is endemic to North America, specifically to the southeastern United States, where it inhabits wetlands on the Atlantic coast from North Carolina to Florida, and along the northern Gulf of Mexico west to Texas.
These data have been produced as part of the Crocodilian Genomes Project. Genomic DNA was isolated using blood from two wild-caught individuals. Briefly, the alligator data consisted of Illumina sequences from five libraries ranging from 5.5 to 88.7x coverage. These reads were assembled using AllPaths-LG with default parameters. Legacy data from 21 fully sequenced BACs, 1309 BAC-end read pairs, and RNASeq data, were also used to aid the assembly.

Authors

Green, Richard, E ;
Braun, Edward, L ;
Armstrong, Joel ;
Earl, Dent ;
Nguyen, Ngan ;
Hickey, Glenn ;
Vandewege, Michael, W ;
St John, John, A ;
Capella-Gutierrez, Salvador ;
Castoe, Todd, A ;
Kern, Colin ;
Fujita, Matthew, K ;
Opazo, Juan, C ;
Jurka, Jerzy ;
Kojima, Kenji, K ;
Caballero, Juan ;
Hubley, Robert, M ;
Smit, Arian, F ;
Platt, Roy, N ;
Lavoie, Christine, A ;
Ramakodi, Meganathan, P ;
Finger Jr., John, W ;
Suh, Alexander ;
Isberg, Sally, R ;
Miles, Lee ;
Chong, Amanda, Y ;
Jaratlerdsiri, Weerachai ;
Gongora, Jaime ;
Moran, Christopher ;
Iriarte, Andres ;
McCormack, John ;
Burgess, Shane, C ;
Edwards, Scott, V ;
Lyons, Eric ;
Williams, Christina ;
Breen, Matthew ;
Howard, Jason, T ;
Gresham, Cathy, R ;
Peterson, Daniel, G ;
Schmitz, Jurgen ;
Pollock, David, D ;
Haussler, David ;
Triplett, Eric, W ;
Zhang, Guojie ;
Irie, Naoki ;
Jarvis, Erich, D ;
Brochu, Christopher, A ;
Schmidt, Carl, J ;
McCarthy, Fiona, M ;
Faircloth, Brant, C ;
Hoffmann, Federico, G ;
Glenn, Travis, C ;
Gabaldon, Toni ;
Paten, Benedict ;
Ray, David, A

1 Citation | 0 Mentions | 31% FAIR | 1.1 Dataset Index

10.5524/100126 | January 2014

The genome of the Indian gharial (Gavialis gangeticus).

The Indian gharial (Gavialis gangeticus), also known as the gavial or the fish-eating crocodile, is a crocodilian of the family Gavialidae, native to the Indian Subcontinent. The global gharial population is estimated at fewer than 235 individuals, which are threatened by loss of riverine habitat, depletion of fish resources and use of fishing nets.
These data have been produced as part of the Crocodilian Genomes Project. Genomic DNA was isolated using blood from an adult female that died of natural causes in 2010. Briefly, the gharial data consisted of Illumina reads from three libraries ranging from 50 to 170x coverage. These reads were assembled using SOAPdenovo v2.04.

Authors

Green, Richard, E ;
Braun, Edward, L ;
Armstrong, Joel ;
Earl, Dent ;
Nguyen, Ngan ;
Hickey, Glenn ;
Vandewege, Michael, W ;
St John, John, A ;
Capella-Gutierrez, Salvador ;
Castoe, Todd, A ;
Kern, Colin ;
Fujita, Matthew, K ;
Opazo, Juan, C ;
Jurka, Jerzy ;
Kojima, Kenji, K ;
Caballero, Juan ;
Hubley, Robert, M ;
Smit, Arian, F ;
Platt, Roy, N ;
Lavoie, Christine, A ;
Ramakodi, Meganathan, P ;
Finger Jr., John, W ;
Suh, Alexander ;
Isberg, Sally, R ;
Miles, Lee ;
Chong, Amanda, Y ;
Jaratlerdsiri, Weerachai ;
Gongora, Jaime ;
Moran, Christopher ;
Iriarte, Andres ;
McCormack, John ;
Burgess, Shane, C ;
Edwards, Scott, V ;
Lyons, Eric ;
Williams, Christina ;
Breen, Matthew ;
Howard, Jason, T ;
Gresham, Cathy, R ;
Peterson, Daniel, G ;
Schmitz, Jurgen ;
Pollock, David, D ;
Haussler, David ;
Triplett, Eric, W ;
Zhang, Guojie ;
Irie, Naoki ;
Jarvis, Erich, D ;
Brochu, Christopher, A ;
Schmidt, Carl, J ;
McCarthy, Fiona, M ;
Faircloth, Brant, C ;
Hoffmann, Federico, G ;
Glenn, Travis, C ;
Gabaldon, Toni ;
Paten, Benedict ;
Ray, David, A

1 Citation | 0 Mentions | 31% FAIR | 1.1 Dataset Index

10.5524/100128 | January 2014

The genome of the saltwater crocodile (Crocodylus porosus).

Saltwater crocodiles (Crocodylus porosus) are the largest reptilian species alive today and, as the name suggests, they show a high tolerance to salinity.
Native to much of Australasia and parts of Asia, they are found mostly in coastal waters or around rivers. They are strong swimmers and can be found very far from land.
These data have been produced as part of the Crocodilian Genomes Project. Genomic DNA was isolated using blood from "Errol", a 4.65-meter-long male, sampled on 25 October 2007 at Darwin Crocodile Farm, Noonamah, Northern Territory, Australia. Errol was relocated from the wild (Mud Island, Daly River, Northern Territory) after being deemed a problem crocodile in 1981, and was housed individually within a purpose-built facility in the tourist section of the farm. He currently resides at the Fort Worth Zoo, Texas, USA.
Briefly, the crocodile data consisted of Illumina reads from three libraries ranging from 21.6 to 90.2x coverage. These reads were assembled using AllPaths-LG with default parameters. Legacy sequence data from 360 MHC region BAC assemblies as well as RNASeq data were used to aid the assembly.

Authors

Green, Richard, E ;
Braun, Edward, L ;
Armstrong, Joel ;
Earl, Dent ;
Nguyen, Ngan ;
Hickey, Glenn ;
Vandewege, Michael, W ;
St John, John, A ;
Capella-Gutierrez, Salvador ;
Castoe, Todd, A ;
Kern, Colin ;
Fujita, Matthew, K ;
Opazo, Juan, C ;
Jurka, Jerzy ;
Kojima, Kenji, K ;
Caballero, Juan ;
Hubley, Robert, M ;
Smit, Arian, F ;
Platt, Roy, N ;
Lavoie, Christine, A ;
Ramakodi, Meganathan, P ;
Finger Jr., John, W ;
Suh, Alexander ;
Isberg, Sally, R ;
Miles, Lee ;
Chong, Amanda, Y ;
Jaratlerdsiri, Weerachai ;
Gongora, Jaime ;
Moran, Christopher ;
Iriarte, Andres ;
McCormack, John ;
Burgess, Shane, C ;
Edwards, Scott, V ;
Lyons, Eric ;
Williams, Christina ;
Breen, Matthew ;
Howard, Jason, T ;
Gresham, Cathy, R ;
Peterson, Daniel, G ;
Schmitz, Jurgen ;
Pollock, David, D ;
Haussler, David ;
Triplett, Eric, W ;
Zhang, Guojie ;
Irie, Naoki ;
Jarvis, Erich, D ;
Brochu, Christopher, A ;
Schmidt, Carl, J ;
McCarthy, Fiona, M ;
Faircloth, Brant, C ;
Hoffmann, Federico, G ;
Glenn, Travis, C ;
Gabaldon, Toni ;
Paten, Benedict ;
Ray, David, A

1 Citation | 0 Mentions | 31% FAIR | 1.0 Dataset Index

10.5524/100127 | January 2014

Annotation and analysis of three crocodilian genomes.

Crocodilians are important model organisms in fields as diverse as developmental biology, osmoregulation, cardiophysiology, paleoclimatology, sex determination, population genetics, paleobiogeography, and functional morphology. Crocodilians, birds, dinosaurs, and pterosaurs comprise a monophyletic group known as the archosaurs. Crocodilians and birds are the only extant members and thus crocodilians (alligators, caimans, crocodiles, and gharials) are the closest living relatives of all birds.
To provide context for the diversifications of archosaurs, we generated draft genomes of three crocodilians, Alligator mississippiensis (the American alligator), Crocodylus porosus (the saltwater crocodile), and Gavialis gangeticus (the Indian gharial). We generated high-coverage Illumina sequence data from paired-end and mate-pair libraries from each species. The assembly strategy for each taxon differed due to varying legacy data and developments in library preparation methods during the course of the project. Gene annotation was accomplished using a combination of RNASeq data and homology-based analyses. We identified 23,323 protein-coding genes in the alligator compared to 13,321 and 14,043 in crocodile and gharial, respectively. Transposable elements (TEs) were identified de novo in all three crocodilians and analyses resulted in a library of 1269 different TEs.

Authors

Green, Richard, E ;
Braun, Edward, L ;
Armstrong, Joel ;
Earl, Dent ;
Nguyen, Ngan ;
Hickey, Glenn ;
Vandewege, Michael, W ;
St John, John, A ;
Capella-Gutierrez, Salvador ;
Castoe, Todd, A ;
Kern, Colin ;
Fujita, Matthew, K ;
Opazo, Juan, C ;
Jurka, Jerzy ;
Kojima, Kenji, K ;
Caballero, Juan ;
Hubley, Robert, M ;
Smit, Arian, F ;
Platt, Roy, N ;
Lavoie, Christine, A ;
Ramakodi, Meganathan, P ;
Finger Jr., John, W ;
Suh, Alexander ;
Isberg, Sally, R ;
Miles, Lee ;
Chong, Amanda, Y ;
Jaratlerdsiri, Weerachai ;
Gongora, Jaime ;
Moran, Christopher ;
Iriarte, Andres ;
McCormack, John ;
Burgess, Shane, C ;
Edwards, Scott, V ;
Lyons, Eric ;
Williams, Christina ;
Breen, Matthew ;
Howard, Jason, T ;
Gresham, Cathy, R ;
Peterson, Daniel, G ;
Schmitz, Jurgen ;
Pollock, David, D ;
Haussler, David ;
Triplett, Eric, W ;
Zhang, Guojie ;
Irie, Naoki ;
Jarvis, Erich, D ;
Brochu, Christopher, A ;
Schmidt, Carl, J ;
McCarthy, Fiona, M ;
Faircloth, Brant, C ;
Hoffmann, Federico, G ;
Glenn, Travis, C ;
Gabaldon, Toni ;
Paten, Benedict ;
Ray, David, A

2 Citations | 0 Mentions | 31% FAIR | 1.7 Dataset Index

10.5524/100125 | January 2014

Assemblathon 2 assemblies.

Assemblathon 2 is a genome assembly contest where participating teams attempted to assemble genomes for three vertebrate species using a mixture of next-generation sequencing data. In total, 43 assemblies were submitted for three species (15 for bird, 16 for fish, and 12 for snake). These assemblies were assessed using a wide variety of statistical approaches as well as using experimental data from Fosmid sequences and optical maps.

Authors

Bradnam, Keith, R ;
Fass, Joseph, N ;
Alexandrov, Anton ;
Baranay, Paul ;
Bechner, Michael ;
Birol, Inanç ;
Boisvert, Sébastien ;
Chapman, Jarrod, A ;
Chapuis, Guillaume ;
Chikhi, Rayan ;
Chitsaz, Hamidreza ;
Chou, Wen-Chi ;
Corbeil, Jacques ;
Del Fabbro, Cristian ;
Docking, T.Roderick, R ;
Durbin, Richard ;
Earl, Dent ;
Emrich, Scott ;
Fedotov, Pavel ;
Fonseca, Nuno, A ;
Ganapathy, Ganeshkumar ;
Gibbs, Richard, A ;
Gnerre, Sante ;
Godzaridis, Élénie ;
Goldstein, Steve ;
Haimel, Matthias ;
Hall, Giles ;
Haussler, David ;
Hiatt, Joseph, B ;
Ho, Isaac ;
Howard, Jason, T ;
Hunt, Martin ;
Jackman, Shaun, D ;
Jaffe, David, B ;
Jarvis, Erich, D ;
Jiang, Huaiyang ;
Kazakov, Sergey ;
Kersey, Paul, J ;
Kitzman, Jacob, O ;
Knight, James, R ;
Koren, Sergey ;
Lam, Tak-Wah ;
Lavenier, Dominique ;
Laviolette, François ;
Li, Yingrui ;
Li, Zhenyu ;
Liu, Binghang ;
Liu, Yue ;
Luo, Ruibang ;
MacCallum, Iain ;
MacManes, Matthew, D ;
Maillet, Nicolas ;
Melnikov, Sergey ;
Naquin, Delphine ;
Ning, Zemin ;
Otto, Thomas, D ;
Paten, Benedict ;
Paulo, Octávio, S ;
Phillippy, Adam, M ;
Pina-Martins, Francisco ;
Place, Michael ;
Przybylski, Dariusz ;
Qin, Xiang ;
Qu, Carson ;
Ribeiro, Filipe, J ;
Richards, Stephen ;
Rokhsar, Daniel, S ;
Ruby, J.Graham ;
Scalabrin, Simone ;
Schatz, Michael, C ;
Schwartz, David, C ;
Sergushichev, Alexey ;
Sharpe, Ted ;
Shaw, Timothy, I ;
Shendure, Jay ;
Shi, Yujian ;
Simpson, Jared, T ;
Song, Henry ;
Tsarev, Fedor ;
Vezzi, Francesco ;
Vicedomini, Riccardo ;
Vieira, Bruno, M ;
Wang, Jun ;
Worley, Kim, C ;
Yin, Shuangye ;
Yiu, Siu-Ming ;
Yuan, Jianying ;
Zhang, Guojie ;
Zhang, Hao ;
Zhou, Shiguo ;
Korf, Ian, F

6 Citations | 1 Mention | 31% FAIR | 3.8 Dataset Index

10.5524/100060 | January 2013

Automated Author Profile
Paten, Benedict
0000-0001-8863-3539
