Center for Agricultural Biotechnology, University of Maryland Biotechnology Institute, College Park, Maryland 20742, USA

Datasets

Data from: More taxa or more characters revisited: combining data from nuclear protein-encoding genes for phylogenetic analyses of Noctuoidea (Insecta: Lepidoptera) (Version: 1)

A central question concerning data collection strategy for molecular phylogenies has been, is it better to increase the number of characters or the number of taxa sampled to improve the robustness of a phylogeny estimate? A recent simulation study concluded that increasing the number of taxa sampled is preferable to increasing the number of nucleotide characters, if taxa are chosen specifically to break up long branches. We explore this hypothesis by using empirical data from noctuoid moths, one of the largest superfamilies of insects. Separate studies of two nuclear genes, elongation factor-1α (EF-1α) and dopa decarboxylase (DDC), have yielded similar gene trees and high concordance with morphological groupings for 49 exemplar species. However, support levels were quite low for nodes deeper than the subfamily level. We tested the effects on phylogenetic signal of (1) increasing the taxon sampling by nearly 60%, to 77 species, and (2) combining data from the two genes in a single analysis. Surprisingly, the increased taxon sampling, although designed to break up long branches, generated greater disagreement between the two gene data sets and decreased support levels for deeper nodes. We appear to have inadvertently introduced new long branches, and breaking these up may require a yet larger taxon sample. Sampling additional characters (combining data) greatly increased the phylogenetic signal. To contrast the potential effect of combining data from independent genes with collection of the same total number of characters from a single gene, we simulated the latter by bootstrap augmentation of the single-gene data sets. Support levels for combined data were at least as high as those for the bootstrap-augmented data set for DDC and were much higher than those for the augmented EF-1α data set. This supports the view that in obtaining additional sequence data to solve a refractory systematic problem, it is prudent to take them from an independent gene.

Authors

Mitchell, Andrew;
Mitter, Charles;
Regier, Jerome C.

1 Citation | 0 Mentions | 77% FAIR | 2.0 Dataset Index

DOI: 10.5061/dryad.5582009
