Scholar Data

Single-copy orthologous genes used for Ricefish phylogeny

Ortholog set

We generated a reference set of 8390 single-copy protein-coding genes from OrthoDB v9.1 (Waterhouse et al., 2013) for the following species: Austrofundulus limnaeus, Cyprinodon variegatus, Fundulus heteroclitus, Kryptolebias marmoratus, Nothobranchius furzeri, Oryzias latipes, O. melastigma, Poecilia formosa, P. latipinna, P. mexicana, P. reticulata and Xiphophorus maculatus (NCBI accession numbers in Table S7). The hierarchical split was set to Actinopterygii (ID 7898). We used the script “make-ogs-corresponding.pl” to check for inconsistencies between the amino acid sequences and the corresponding nucleotide sequences, and removed 96 problematic genes (Table S7).

Identification of orthologs for transcripts and genome and alignment of single-copy genes

Ortholog identification among 16 ricefish species and four outgroups (DS1, supplementary Table S1a) was carried out with Orthograph v0.7.1 (Petersen et al., 2017). The forward search for candidate transcripts was run with default settings. In the best reciprocal hit search, candidate orthologs required at least one hit in either O. latipes or O. melastigma, and we allowed concatenation of hits that met the criteria and did not overlap. Both max-blast-searches and blast-max-hits were set to 50. “U” (selenocysteine) in the amino acid sequences was replaced by “X” to avoid issues in downstream analyses. The results of the orthology prediction were summarized for all species using a custom Perl script included with the Orthograph package. Only orthologs present in all species were aligned, on the amino acid level, using MAFFT v7.221 with the L-INS-i algorithm (Katoh & Standley, 2013). Following Misof et al. (2014), we identified 915 orthologs containing outlier sequences and removed them from further analysis. We used the amino acid alignments as a blueprint to generate the corresponding nucleotide alignments with a modified version of PAL2NAL v14 (Misof et al., 2014; Suyama et al., 2006).
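The back-translation step can be illustrated with a minimal Python sketch of the general idea behind PAL2NAL-style codon alignment: each residue in the aligned protein consumes the next codon of the unaligned coding sequence, and each alignment gap becomes a triplet gap, so the nucleotide alignment follows the protein alignment column by column. This is not the modified PAL2NAL used in the study. It assumes plain strings, '-' as the only gap symbol and coding sequences of exactly three nucleotides per residue (stop codon removed); a production tool would also check that each codon translates to its residue, which this sketch skips. The function names are hypothetical.

```python
# Minimal sketch of protein-guided codon alignment (the idea behind PAL2NAL).
# Assumptions: plain strings, '-' is the only gap symbol, and each CDS has
# exactly three nucleotides per residue (stop codon already removed).
# Function names are hypothetical; no translation check is performed.


def back_translate(aligned_protein: str, cds: str) -> str:
    """Thread an unaligned CDS onto its aligned protein sequence."""
    residues = aligned_protein.replace("-", "")
    if len(cds) != 3 * len(residues):
        raise ValueError("CDS length must be three times the residue count")
    # Split the CDS into consecutive codons, one per residue.
    codons = iter([cds[i:i + 3] for i in range(0, len(cds), 3)])
    out = []
    for aa in aligned_protein:
        # A protein gap becomes a triplet gap; a residue takes the next codon.
        out.append("---" if aa == "-" else next(codons))
    return "".join(out)


def codon_alignment(protein_aln: dict[str, str],
                    cds: dict[str, str]) -> dict[str, str]:
    """Back-translate every taxon of a protein alignment."""
    if protein_aln.keys() != cds.keys():
        raise ValueError("protein alignment and CDS set list different taxa")
    return {taxon: back_translate(seq, cds[taxon])
            for taxon, seq in protein_aln.items()}
```

For example, `codon_alignment({"a": "MK-L"}, {"a": "ATGAAACTG"})` returns `{"a": "ATGAAA---CTG"}`: the gap in the protein alignment is carried over as one gapped codon, which keeps every nucleotide column in codon register.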
To check each amino acid alignment for ambiguously aligned regions, we ran ALISCORE v2.0 with the maximal number of possible sequence pairs selected for analysis (option -r) (Kück et al., 2010; Misof et al., 2014; Misof & Misof, 2009). Sites flagged for masking were removed from the amino acid alignments using ALICUT v2.3 (Kück, 2009) and, correspondingly, from the nucleotide alignments. All further analyses used only the nucleotide data set.
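Because every amino acid column corresponds to one codon, removing masked sites from both alignments amounts to dropping column i from the protein alignment and columns 3i to 3i+2 from the codon alignment. The Python sketch below shows that mapping under stated assumptions: the masked positions are given as a set of 0-based amino acid column indices, and the ALISCORE and ALICUT file formats are not reproduced. The function name `cut_columns` is hypothetical.

```python
# Sketch: remove masked amino-acid columns and the matching codon columns.
# Assumption: `masked` holds 0-based amino-acid column indices (for example
# parsed from an ALISCORE-style result); real file formats are not handled.


def cut_columns(protein_aln: dict[str, str], codon_aln: dict[str, str],
                masked: set[int]) -> tuple[dict[str, str], dict[str, str]]:
    """Drop masked columns; amino-acid column i maps to codon columns 3i..3i+2."""
    n_cols = len(next(iter(protein_aln.values())))
    keep = [i for i in range(n_cols) if i not in masked]
    new_prot: dict[str, str] = {}
    new_nuc: dict[str, str] = {}
    for taxon, prot in protein_aln.items():
        nuc = codon_aln[taxon]
        # Both alignments must be rectangular and in codon register.
        if len(prot) != n_cols or len(nuc) != 3 * n_cols:
            raise ValueError(f"alignment length mismatch for {taxon}")
        new_prot[taxon] = "".join(prot[i] for i in keep)
        new_nuc[taxon] = "".join(nuc[3 * i:3 * i + 3] for i in keep)
    return new_prot, new_nuc
```

For example, `cut_columns({"a": "MK-L"}, {"a": "ATGAAA---CTG"}, {1})` returns `({"a": "M-L"}, {"a": "ATG---CTG"})`, so the protein and nucleotide alignments stay in step after masking.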

Authors

Flury, Jana M. ;
Meusemann, Karen ;
Martin, Sebastian ;
Hilgers, Leon ;
Spanke, Tobias ;
Böhne, Astrid ;
Herder, Fabian ;
Mokodongan, Daniel F. ;
Altmüller, Janine ;
Wowor, Daisy ;
Misof, Bernhard ;
Nolte, Arne W. ;
Schwarzer, Julia

0 Citations | 0 Mentions | 13% FAIR | 0.1 Dataset Index

DOI: 10.5281/zenodo.7993644 (July 2023)

Also listed as DOI: 10.5281/zenodo.7993643 (July 2023)

Assembly data files Oryzias dopingdopingensis

Identification and masking of repetitive elements in the genome sequence of O. dopingdopingensis was performed with the following set of bioinformatic tools. Low-complexity regions were masked with the DUST algorithm as implemented in dustmasker (version 1.0.0, part of BLAST+ 2.9.0; Altschul et al., 1990; Camacho et al., 2009; DUST: Kuzio et al., unpublished, described in Morgulis et al., 2006). Tandem repeats were identified with Tandem Repeats Finder (TRF version 4.09; Benson, 1999). A species-specific de novo repeat library was built with RepeatModeler v1.0.11 (http://www.repeatmasker.org/RepeatModeler/). Repeat elements were located in the genome sequence using RepeatMasker (version 4.1.0; http://www.repeatmasker.org) with both the de novo library and the Danio rerio library. The results of all four repeat analyses were merged, and the genome was soft-masked with bedtools 2.29.2 (Quinlan & Hall, 2010). All repeat-masking steps were performed with scripts provided by the SIGENAE platform, following the workflow of Feron et al. (2020).

For gene identification, the masked genome was annotated with funannotate (Palmer & Stajich, 2019). Sequences were sorted by length with ‘funannotate sort’, followed by gene prediction with ‘funannotate predict’. No RNA-seq-based training was performed because no RNA-seq data were available for this species. Additional external evidence from transcripts and proteins was provided. As transcript evidence, we used gene predictions from Oryzias latipes (NCBI BioProject: PRJNA183868; Assembly: GCF_002234675.1; Kasahara et al., 2007) and Oryzias melastigma (NCBI BioProject: PRJNA401159; Assembly: ASM292280v2; Kim et al., 2018).
As protein evidence, we used a protein set from Oryzias javanicus (NCBI BioProject: PRJNA505405; Assembly: GCA_003999625.1; Lee et al., 2020), manually annotated reference sequences from the UniProt Knowledgebase (UniProtKB/Swiss-Prot, Release 2020_02 of 22 April 2020, 562,253 entries; Apweiler et al., 2004) and a set of orthologous sequences generated in this study. In addition, the de novo gene predictors were trained with the BUSCO actinopterygii_odb10 dataset. Gene prediction yielded a total of 56,658 genes.
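The repeat-masking step above (pooling the results of several repeat analyses, merging them and soft-masking the genome) can be sketched in Python as follows. It illustrates the general idea only and is not the SIGENAE scripts or bedtools: repeat intervals from several sources (for example dustmasker, TRF and RepeatMasker hits) are pooled, overlapping or touching intervals are merged, and covered bases are written in lower case, so they remain in the assembly but can be recognised as repeats by downstream tools. Intervals are assumed to be 0-based, half-open and on a single contig; the function names are hypothetical.

```python
# Sketch: pool repeat intervals from several tools, merge them, soft-mask.
# Assumptions: 0-based, half-open (BED-style) intervals on one contig.
# Illustrative only; this is not the SIGENAE scripts or bedtools.


def merge_intervals(*sources: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Union of all intervals; overlapping or touching intervals are merged."""
    pooled = sorted(iv for src in sources for iv in src)
    merged: list[list[int]] = []
    for start, end in pooled:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend current block
        else:
            merged.append([start, end])  # start a new block
    return [(s, e) for s, e in merged]


def soft_mask(seq: str, intervals: list[tuple[int, int]]) -> str:
    """Lower-case bases inside the intervals; no bases are removed."""
    chars = list(seq)
    for start, end in intervals:
        for i in range(start, min(end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)
```

For example, `merge_intervals([(0, 3)], [(2, 5), (10, 12)], [(5, 6)])` gives `[(0, 6), (10, 12)]`, and soft-masking `"ACGTACGTACGTAC"` with that result gives `"acgtacGTACgtAC"`. Lower-casing rather than replacing bases with N keeps the sequence available to gene predictors that treat soft-masked regions differently.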

Authors

Flury, Jana M. ;
Meusemann, Karen ;
Martin, Sebastian ;
Hilgers, Leon ;
Spanke, Tobias ;
Böhne, Astrid ;
Herder, Fabian ;
Mokodongan, Daniel F. ;
Altmüller, Janine ;
Wowor, Daisy ;
Misof, Bernhard ;
Nolte, Arne W. ;
Schwarzer, Julia

0 Citations | 0 Mentions | 69% FAIR | 0.7 Dataset Index

DOI: 10.5281/zenodo.8064517 (July 2023)

Also listed as DOI: 10.5281/zenodo.8064516 (July 2023)

Automated Author Profile
Misof, Bernhard
Leibniz Institute for the Analysis of Biodiversity Change, Bonn, Germany


Datasets

Single-copy orthologous genes used for Ricefish phylogeny

Assembly data files Oryzias dopingdopingensis
