Automated Author Profile
Misof, Bernhard
Leibniz Institute for the Analysis of Biodiversity Change, Bonn, Germany
Current S-Index
Sum of Dataset Indices for all datasets
Average Dataset Index per Dataset
Average Dataset Index per dataset
Total Datasets
Total datasets for this author
Average FAIR Score
Average FAIR Score per dataset
Total Citations
Total citations to the author's datasets
Total Mentions
Total mentions of the author's datasets
S-Index Interpretation
The S-Index (Sharing Index) is a comprehensive metric that represents the cumulative impact of all your datasets. It is calculated as the sum of Dataset Index scores across all your claimed datasets.
What it means:
- A higher S-index indicates greater overall impact of your datasets relative to typical datasets in their fields of research
- The S-Index grows as you add more datasets or as existing datasets gain more citations and mentions
- It provides a single number to track your research data impact over time
Current S-Index: 1.8 (sum of the Dataset Index scores of 4 datasets)
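The calculation described above (S-Index as the sum of Dataset Index scores, plus the per-dataset average) can be sketched as follows. This is a minimal illustration, not the site's implementation: the function names are hypothetical, and the four example scores are invented so that they sum to 1.8, since the page does not list the individual Dataset Index values.

```python
from typing import Sequence


def s_index(dataset_indices: Sequence[float]) -> float:
    """Return the S-Index: the sum of the per-dataset Dataset Index scores."""
    return sum(dataset_indices)


def average_dataset_index(dataset_indices: Sequence[float]) -> float:
    """Return the mean Dataset Index, or 0.0 when there are no datasets."""
    if not dataset_indices:
        return 0.0
    return s_index(dataset_indices) / len(dataset_indices)


# Hypothetical scores for four datasets (assumption: the real values are not shown).
example_scores = [0.5, 0.4, 0.6, 0.3]

print(round(s_index(example_scores), 1))
print(round(average_dataset_index(example_scores), 2))
```

Adding a dataset, or raising the score of an existing one, increases the sum, which matches the statement that the S-Index grows with more datasets or more citations and mentions.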
S-Index Over Time
Cumulative Citations Over Time
Cumulative Mentions Over Time
Datasets
Ortholog set
We generated a reference set consisting of 8390 single-copy protein-coding genes derived from OrthoDB v.9.1 (Waterhouse et al., 2013) available for the following species: Austrofundulus limnaeus, Centrocoris variegatus, Fundulus heteroclitus, Kryptolebias marmoratus, Nothobranchius furzeri, Oryzias latipes, O. melastigma, Poecilia formosa, P. latipinna, P. mexicana, P. reticulata and Xiphophorus maculatus (NCBI accession numbers in Table S7). The hierarchical split was set to Actinopterygii (ID 7898). We used the script “make-ogs-corresponding.pl” to check for inconsistencies between the amino acid sequences and the corresponding nucleotide sequences and removed 96 problematic genes (Tab. S7).
Identification of orthologs for transcripts and genome and alignment of single-copy genes
Ortholog identification among 16 ricefish species and four outgroups (DS1, supplementary table Tab. S1a) was carried out with Orthograph v0.7.1 (Petersen et al., 2017). The forward search for candidate transcripts was left at default settings. For the best reciprocal hit, ortholog candidate genes needed at least one hit in either O. latipes or O. melastigma, and we allowed concatenation of hits if they met the criteria and did not overlap. Both max-blast-searches and blast-max-hits were set to 50. “U” in the amino acid sequences was changed to “X” to avoid issues in downstream analysis. The results of the orthology prediction were summarized for all species using a custom Perl script that comes with the Orthograph package. Only orthologs with all species present were aligned, using MAFFT v7.221 with the L-INS-i algorithm at the amino acid level (Katoh & Standley, 2013). 915 orthologs with outliers were identified according to Misof et al. (2014) and removed from further analysis. We used the amino acid alignments as a blueprint to generate corresponding nucleotide alignments with a modified version of Pal2Nal v14 (Misof et al., 2014; Suyama et al., 2006).
To check each amino acid alignment for ambiguously aligned regions, we ran ALISCORE v2.0 with the maximal number of possible sequence pairs selected for analysis (-r) (Kück et al., 2010; Misof et al., 2014; Misof & Misof, 2009). Sites that needed masking were cut from the amino acid alignments, and correspondingly from the nucleotide alignments, using ALICUT v2.3 (Kück, 2009). For further analyses we proceeded only with the nucleotide-level data set.
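Two of the filtering steps described above, replacing “U” with “X” in amino acid sequences and keeping only orthologs for which every species is present, can be illustrated with a short Python sketch. It is not the Orthograph or custom Perl code used in the study; the function names, the data layout (ortholog ID mapped to species mapped to sequence) and the toy sequences are assumptions made for illustration.

```python
from typing import Dict, Set

# Assumed layout: ortholog ID -> {species name -> amino acid sequence}
OrthologTable = Dict[str, Dict[str, str]]


def replace_u_with_x(seq: str) -> str:
    """Replace 'U' (one-letter code for selenocysteine) with 'X' (unknown residue)."""
    return seq.replace("U", "X")


def complete_orthologs(table: OrthologTable, species: Set[str]) -> OrthologTable:
    """Keep only orthologs that contain a sequence for every required species."""
    return {og: seqs for og, seqs in table.items() if species <= set(seqs)}


# Toy data (hypothetical IDs and sequences).
table = {
    "OG1": {"sp_a": "MKUL", "sp_b": "MKCL"},
    "OG2": {"sp_a": "MAAL"},  # sp_b missing, so this ortholog is dropped
}
kept = complete_orthologs(table, {"sp_a", "sp_b"})
cleaned = {
    og: {sp: replace_u_with_x(seq) for sp, seq in seqs.items()}
    for og, seqs in kept.items()
}
print(cleaned)
```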
Authors
- Flury, Jana M.
- Meusemann, Karen
- Martin, Sebastian
- Hilgers, Leon
- Spanke, Tobias
- Böhne, Astrid
- Herder, Fabian
- Mokodongan, Daniel F.
- Altmüller, Janine
- Wowor, Daisy
- Misof, Bernhard
- Nolte, Arne W.
- Schwarzer, Julia
Ortholog set
We generated a reference set consisting of 8390 single-copy protein-coding genes derived from OrthoDB v.9.1 (Waterhouse et al., 2013) available for the following species: Austrofundulus limnaeus, Centrocoris variegatus, Fundulus heteroclitus, Kryptolebias marmoratus, Nothobranchius furzeri, Oryzias latipes, O. melastigma, Poecilia formosa, P. latipinna, P. mexicana, P. reticulata and Xiphophorus maculatus (NCBI accession numbers in Table S7). The hierarchical split was set to Actinopterygii (ID 7898). We used the script “make-ogs-corresponding.pl” to check for inconsistencies between the amino acid sequences and the corresponding nucleotide sequences and removed 96 problematic genes (Tab. S7).
Identification of orthologs for transcripts and genome and alignment of single-copy genes
Ortholog identification among 16 ricefish species and four outgroups (DS1, supplementary table Tab. S1a) was carried out with Orthograph v0.7.1 (Petersen et al., 2017). The forward search for candidate transcripts was left at default settings. For the best reciprocal hit, ortholog candidate genes needed at least one hit in either O. latipes or O. melastigma, and we allowed concatenation of hits if they met the criteria and did not overlap. Both max-blast-searches and blast-max-hits were set to 50. “U” in the amino acid sequences was changed to “X” to avoid issues in downstream analysis. The results of the orthology prediction were summarized for all species using a custom Perl script that comes with the Orthograph package. Only orthologs with all species present were aligned, using MAFFT v7.221 with the L-INS-i algorithm at the amino acid level (Katoh & Standley, 2013). 915 orthologs with outliers were identified according to Misof et al. (2014) and removed from further analysis. We used the amino acid alignments as a blueprint to generate corresponding nucleotide alignments with a modified version of Pal2Nal v14 (Misof et al., 2014; Suyama et al., 2006).
To check each amino acid alignment for ambiguously aligned regions, we ran ALISCORE v2.0 with the maximal number of possible sequence pairs selected for analysis (-r) (Kück et al., 2010; Misof et al., 2014; Misof & Misof, 2009). Sites that needed masking were cut from the amino acid alignments, and correspondingly from the nucleotide alignments, using ALICUT v2.3 (Kück, 2009). For further analyses we proceeded only with the nucleotide-level data set.
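The idea of cutting masked sites from an amino acid alignment and "correspondingly" from the codon-based nucleotide alignment can be sketched as below. This is only an illustration of the column bookkeeping, not ALICUT itself: the function name, the dictionary layout, and the use of 0-based amino acid column indices are assumptions, and the nucleotide alignment is assumed to hold exactly three nucleotide columns per amino acid column.

```python
from typing import Dict, Iterable, Tuple

Alignment = Dict[str, str]  # assumed layout: sequence name -> aligned sequence


def cut_columns(
    aa_aln: Alignment, nt_aln: Alignment, masked_cols: Iterable[int]
) -> Tuple[Alignment, Alignment]:
    """Remove masked amino acid columns and the matching codon triplets."""
    masked = set(masked_cols)
    aa_out: Alignment = {}
    nt_out: Alignment = {}
    for name, aa_seq in aa_aln.items():
        nt_seq = nt_aln[name]
        if len(nt_seq) != 3 * len(aa_seq):
            raise ValueError(f"{name}: nucleotide length is not 3x amino acid length")
        keep = [i for i in range(len(aa_seq)) if i not in masked]
        aa_out[name] = "".join(aa_seq[i] for i in keep)
        # Amino acid column i corresponds to nucleotide columns 3i .. 3i+2.
        nt_out[name] = "".join(nt_seq[3 * i : 3 * i + 3] for i in keep)
    return aa_out, nt_out


# Toy alignment (hypothetical sequences); column 1 is treated as ambiguously aligned.
aa = {"s1": "MKL", "s2": "M-L"}
nt = {"s1": "ATGAAACTG", "s2": "ATG---CTG"}
aa_cut, nt_cut = cut_columns(aa, nt, [1])
print(aa_cut, nt_cut)
```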
Authors
- Flury, Jana M.
- Meusemann, Karen
- Martin, Sebastian
- Hilgers, Leon
- Spanke, Tobias
- Böhne, Astrid
- Herder, Fabian
- Mokodongan, Daniel F.
- Altmüller, Janine
- Wowor, Daisy
- Misof, Bernhard
- Nolte, Arne W.
- Schwarzer, Julia
Identification and masking of repetitive elements in the genome sequence of O. dopingdopingensis were performed with the following bioinformatic tools. Nucleotides were masked using the DUST algorithm (Kuzio et al., unpublished, but described in Morgulis et al., 2006) with dustmasker (version 1.0.0, part of blast+ 2.9.0; Altschul et al., 1990; Camacho et al., 2009). Tandem repeats were identified with Tandem Repeats Finder (trf version 4.09) (Benson, 1999). A species-specific de novo repeat library was built with RepeatModeler v1.0.11 (http://www.repeatmasker.org/RepeatModeler/). Repeat elements were located in the genome sequence using RepeatMasker (version 4.1.0) (http://www.repeatmasker.org) with the de novo and Danio rerio libraries. The information from all four repeat analyses was merged and the genome was softmasked with bedtools (2.29.2) (Quinlan & Hall, 2010; PMID: 20110278; PMCID: PMC2832824). All steps of masking repetitive regions were performed with scripts provided by the sigenae platform, following the workflow of Feron et al. (2020). For the identification of genes, the masked genome was annotated with funannotate (Palmer & Stajich, 2019). The sequences were sorted by length with the ‘funannotate sort’ function, followed by gene prediction with ‘funannotate predict’. No training based on RNA-Seq data was performed since none was available for this species. Additional external evidence from transcripts and proteins was added. As transcript evidence, gene predictions from Oryzias latipes (NCBI BioProject: PRJNA183868; Assembly: GCF_002234675.1) (Kasahara et al., 2007) and Oryzias melastigma (NCBI BioProject: PRJNA401159; Assembly: ASM292280v2) (Kim et al., 2018) were used.
As protein evidence, a protein set from Oryzias javanicus (NCBI BioProject: PRJNA505405; Assembly: GCA_003999625.1) (Lee et al., 2020), manually annotated reference sequences from the UniProt Knowledgebase (UniProtKB; Release 2020_02 (22-Apr-2020), UniProtKB/Swiss-Prot with 562,253 entries) (Apweiler et al., 2004) and a set of orthologous sequences generated in this study were used. Furthermore, the de novo gene predictors were trained with the BUSCO dataset actinopterygii_odb10. Gene prediction resulted in a total of 56658 genes.
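The merging and soft-masking step described above (combining repeat intervals from several analyses and lower-casing the covered bases) can be illustrated in plain Python. This is a conceptual sketch only, not the bedtools or sigenae code used in the study; the function names are hypothetical, and intervals are assumed to be 0-based and half-open, as in BED files.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # assumed 0-based, half-open [start, end)


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def softmask(seq: str, intervals: Iterable[Interval]) -> str:
    """Return seq with bases inside the merged intervals converted to lower case."""
    parts: List[str] = []
    pos = 0
    for start, end in merge_intervals(intervals):
        parts.append(seq[pos:start])          # unmasked stretch, left unchanged
        parts.append(seq[start:end].lower())  # repeat region, soft-masked
        pos = end
    parts.append(seq[pos:])
    return "".join(parts)


# Toy example: intervals pooled from several hypothetical repeat analyses.
repeats = [(2, 4), (3, 6), (8, 9)]
print(merge_intervals(repeats))
print(softmask("ACGTACGTAC", repeats))
```

Soft-masking keeps the original bases recoverable (by upper-casing), unlike hard-masking, which replaces them with N.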
Authors
- Flury, Jana M.
- Meusemann, Karen
- Martin, Sebastian
- Hilgers, Leon
- Spanke, Tobias
- Böhne, Astrid
- Herder, Fabian
- Mokodongan, Daniel F.
- Altmüller, Janine
- Wowor, Daisy
- Misof, Bernhard
- Nolte, Arne W.
- Schwarzer, Julia
Identification and masking of repetitive elements in the genome sequence of O. dopingdopingensis were performed with the following bioinformatic tools. Nucleotides were masked using the DUST algorithm (Kuzio et al., unpublished, but described in Morgulis et al., 2006) with dustmasker (version 1.0.0, part of blast+ 2.9.0; Altschul et al., 1990; Camacho et al., 2009). Tandem repeats were identified with Tandem Repeats Finder (trf version 4.09) (Benson, 1999). A species-specific de novo repeat library was built with RepeatModeler v1.0.11 (http://www.repeatmasker.org/RepeatModeler/). Repeat elements were located in the genome sequence using RepeatMasker (version 4.1.0) (http://www.repeatmasker.org) with the de novo and Danio rerio libraries. The information from all four repeat analyses was merged and the genome was softmasked with bedtools (2.29.2) (Quinlan & Hall, 2010; PMID: 20110278; PMCID: PMC2832824). All steps of masking repetitive regions were performed with scripts provided by the sigenae platform, following the workflow of Feron et al. (2020). For the identification of genes, the masked genome was annotated with funannotate (Palmer & Stajich, 2019). The sequences were sorted by length with the ‘funannotate sort’ function, followed by gene prediction with ‘funannotate predict’. No training based on RNA-Seq data was performed since none was available for this species. Additional external evidence from transcripts and proteins was added. As transcript evidence, gene predictions from Oryzias latipes (NCBI BioProject: PRJNA183868; Assembly: GCF_002234675.1) (Kasahara et al., 2007) and Oryzias melastigma (NCBI BioProject: PRJNA401159; Assembly: ASM292280v2) (Kim et al., 2018) were used.
As protein evidence, a protein set from Oryzias javanicus (NCBI BioProject: PRJNA505405; Assembly: GCA_003999625.1) (Lee et al., 2020), manually annotated reference sequences from the UniProt Knowledgebase (UniProtKB; Release 2020_02 (22-Apr-2020), UniProtKB/Swiss-Prot with 562,253 entries) (Apweiler et al., 2004) and a set of orthologous sequences generated in this study were used. Furthermore, the de novo gene predictors were trained with the BUSCO dataset actinopterygii_odb10. Gene prediction resulted in a total of 56658 genes.
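As a rough illustration of what sorting sequences by length before gene prediction means, the sketch below orders named sequences from longest to shortest. It does not reproduce ‘funannotate sort’: the function name and record layout are hypothetical, and the longest-first order is an assumption, since the description only states that sequences were sorted by length.

```python
from typing import Dict, List, Tuple


def sort_by_length(records: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (name, sequence) pairs ordered from longest to shortest sequence."""
    # sorted() is stable, so sequences of equal length keep their input order.
    return sorted(records.items(), key=lambda item: len(item[1]), reverse=True)


# Toy scaffolds (hypothetical names and sequences).
scaffolds = {"scaf_b": "ACGT", "scaf_a": "ACGTACGTAC", "scaf_c": "AC"}
for name, seq in sort_by_length(scaffolds):
    print(name, len(seq))
```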
Authors
- Flury, Jana M.
- Meusemann, Karen
- Martin, Sebastian
- Hilgers, Leon
- Spanke, Tobias
- Böhne, Astrid
- Herder, Fabian
- Mokodongan, Daniel F.
- Altmüller, Janine
- Wowor, Daisy
- Misof, Bernhard
- Nolte, Arne W.
- Schwarzer, Julia