Version 1.0.0

Gene Annotations of 49 Bacillariophyta Genome Assemblies

Nenasheva, Natalia; Pitzschel, Clara; Webster, Cynthia; Hoff, Katharina; Hart, Alex; Wegrzyn, Jill; Bengtsson, Mia

Description

Genome Annotation Files of 49 Bacillariophyta Genome Assemblies

The genome annotation files (in GFF3 format) in this folder pertain to the manuscript "Annotation of protein-coding genes in 49 diatom genomes (Bacillariophyta clade)" by Natalia Nenasheva, Clara Pitzschel, Cynthia Webster, Alex Hart, Jill Wegrzyn, Mia M. Bengtsson, and Katharina J. Hoff. For questions about this data set, please contact [email protected].

The following GFF3 files with structural and functional genome annotation are included in the compressed archive Bacillariophyta_annotations.tar.gz:

Asterionella_formosa_ncbi.gff
Asterionellopsis_glacialis_ncbi.gff
Bacterosira_constricta_ncbi.gff
Chaetoceros_muellerii_ncbi.gff
Conticribra_guillardii_ncbi.gff
Conticribra_weissflogii_ncbi.gff
Craspedostauros_australis_ncbi.gff
Cyclostephanos_invisitatus_ncbi.gff
Cyclostephanos_tholiformis_ncbi.gff
Cyclotella_atomus_ncbi.gff
Cyclotella_baltica_ncbi.gff
Cyclotella_choctawhatcheeana_ncbi.gff
Cyclotella_cryptica_ncbi.gff
Cylindrotheca_fusiformis_ncbi.gff
Detonula_confervacea_ncbi.gff
Discostella_pseudostelligera_ncbi.gff
Discostella_stelligera_ncbi.gff
Discostella_stelligeroides_ncbi.gff
Epithemia_pelagica_ncbi.gff
Fistulifera_pelliculosa_ncbi.gff
Fistulifera_solaris_ncbi.gff
Fragilaria_radians_ncbi.gff
Fragilariopsis_cylindrus_ncbi.gff
Licmophora_abbreviata_ncbi.gff
Mediolabrus_comicus_ncbi.gff
Nitzschia_palea_ncbi.gff
Nitzschia_putrida_ncbi.gff
Porosira_glacialis_ncbi.gff
Psammoneis_japonica_ncbi.gff
Pseudo-nitzschia_multiseries_ncbi.gff
Pseudo-nitzschia_pungens_ncbi.gff
Skeletonema_costatum_ncbi.gff
Skeletonema_marinoi_ncbi.gff
Skeletonema_menzelii_ncbi.gff
Skeletonema_potamos_ncbi.gff
Skeletonema_tropicum_ncbi.gff
Stephanocyclus_meneghinianus_ncbi.gff
Stephanodiscus_minutulus_ncbi.gff
Stephanodiscus_triporus_ncbi.gff
Thalassiosira_allenii_ncbi.gff
Thalassiosira_delicatula_ncbi.gff
Thalassiosira_exigua_ncbi.gff
Thalassiosira_gravida_ncbi.gff
Thalassiosira_livingstoniorum_ncbi.gff
Thalassiosira_mediterranea_ncbi.gff
Thalassiosira_oceanica_ncbi.gff
Thalassiosira_ordinaria_ncbi.gff
Thalassiosira_pacifica_ncbi.gff
Thalassiosira_profunda_ncbi.gff

To extract the dataset, execute the following command:

    tar -xvf Bacillariophyta_annotations.tar.gz

Genome Assemblies

The annotation files in this folder pertain to genome assemblies that are publicly available at NCBI Datasets (https://www.ncbi.nlm.nih.gov/datasets/). We used the following versions:

Asterionella formosa GCA_002256025.1
Asterionellopsis glacialis GCA_014885115.2
Bacterosira constricta GCA_037356235.1
Chaetoceros muellerii GCA_019693545.1
Conticribra guillardii GCA_036939335.1
Conticribra weissflogii GCA_036940025.1
Craspedostauros australis GCA_026770025.1
Cyclostephanos invisitatus GCA_036939675.1
Cyclostephanos tholiformis GCA_036939975.1
Cyclotella atomus GCA_036939935.1
Cyclotella baltica GCA_036939635.1
Cyclotella choctawhatcheeana GCA_036939855.1
Cyclotella cryptica GCA_013187285.1
Cylindrotheca fusiformis GCA_019693525.1
Detonula confervacea GCA_036939415.1
Discostella pseudostelligera GCA_036940085.1
Discostella stelligera GCA_036939735.1
Discostella stelligeroides GCA_036939555.1
Epithemia pelagica GCA_946965045.2
Fistulifera pelliculosa GCA_026008555.1
Fistulifera solaris GCA_030295235.1
Fragilaria radians GCA_900642245.1
Fragilariopsis cylindrus GCA_900095095.1
Licmophora abbreviata GCA_900291995.1
Mediolabrus comicus GCA_036940125.1
Nitzschia palea GCA_019593585.1
Nitzschia putrida GCA_016586335.1
Porosira glacialis GCA_036939395.1
Psammoneis japonica GCA_008632985.1
Pseudo-nitzschia multiseries GCA_037355745.1
Pseudo-nitzschia pungens GCA_037355855.1
Skeletonema costatum GCA_018806925.1
Skeletonema marinoi GCA_030544225.1
Skeletonema menzelii GCA_036940005.1
Skeletonema potamos GCA_036940105.1
Skeletonema tropicum GCA_037178625.1
Stephanocyclus meneghinianus GCA_036940045.1
Stephanodiscus minutulus GCA_036939435.1
Stephanodiscus triporus GCA_036939755.1
Thalassiosira allenii GCA_036939655.1
Thalassiosira delicatula GCA_036939835.1
Thalassiosira exigua GCA_036939895.1
Thalassiosira gravida GCA_037356215.1
Thalassiosira livingstoniorum GCA_036939595.1
Thalassiosira mediterranea GCA_036939795.1
Thalassiosira oceanica GCA_019693575.1
Thalassiosira ordinaria GCA_036939695.1
Thalassiosira pacifica GCA_036939875.1
Thalassiosira profunda GCA_036939355.1

Converting to Protein FASTA and Coding Sequences FASTA

To save storage space at Zenodo, we did not upload the protein FASTA and coding sequence FASTA files. They can easily be generated from the genome FASTA file in combination with the respective GFF3 file. To do this, you can use the following commands:

    # assume that genome.fasta is your respective genome FASTA file downloaded from NCBI Datasets
    sed '/^>/ s/ .*//' genome.fasta > genome_short_headers.fasta
    # assume that file.gff is the respective GFF3 file
    getAnnoFastaFromJoingenes.py -g genome_short_headers.fasta -3 file.gff -o nameStem

This will produce the following files: nameStem.aa (protein FASTA file) and nameStem.codingseq (coding sequence FASTA file).

The getAnnoFastaFromJoingenes.py script is available at https://raw.githubusercontent.com/Gaius-Augustus/Augustus/master/scripts/getAnnoFastaFromJoingenes.py . It is part of the AUGUSTUS software package.

License

The genome annotation files are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
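A note on the header-shortening step in the conversion instructions: the sed call keeps only the first word of each FASTA header, presumably so that the sequence names match the first column of the GFF3 file. The following is a minimal sketch of how one might check that on toy data before running getAnnoFastaFromJoingenes.py. The contig names, the toy GFF3 lines, and the helper files fasta_ids.txt and gff_ids.txt are invented for illustration and are not part of the dataset.

```shell
# Toy inputs; all sequence names and features below are invented for illustration.
export LC_ALL=C                      # same collation for sort and comm
workdir=$(mktemp -d)

# NCBI-style FASTA headers: sequence name, then a free-text description.
cat > "$workdir/genome.fasta" <<'EOF'
>contig_1 Example diatom isolate X, whole genome shotgun sequence
ACGTACGTAC
>contig_2 Example diatom isolate X, whole genome shotgun sequence
GGGTTTAAAC
EOF

# Tab-separated toy GFF3; contig_3 is deliberately absent from the FASTA file.
printf '##gff-version 3\n' > "$workdir/file.gff"
printf 'contig_1\ttoy\tgene\t1\t9\t.\t+\t.\tID=g1\n' >> "$workdir/file.gff"
printf 'contig_3\ttoy\tgene\t1\t9\t.\t+\t.\tID=g2\n' >> "$workdir/file.gff"

# Same sed call as in the description: on header lines only,
# delete everything from the first space onward.
sed '/^>/ s/ .*//' "$workdir/genome.fasta" > "$workdir/genome_short_headers.fasta"

# Collect the sequence names from the shortened FASTA headers.
grep '^>' "$workdir/genome_short_headers.fasta" | sed 's/^>//' | sort -u > "$workdir/fasta_ids.txt"

# Collect the sequence names from column 1 of the GFF3 file, skipping comment lines.
grep -v '^#' "$workdir/file.gff" | cut -f1 | sort -u > "$workdir/gff_ids.txt"

# Names that occur in the GFF3 file but have no FASTA record.
missing=$(comm -13 "$workdir/fasta_ids.txt" "$workdir/gff_ids.txt")
echo "Sequence names in the GFF3 file without a FASTA record: ${missing:-none}"
```

With the toy data above, the last line reports contig_3. On real data, replacing the toy files with the downloaded genome FASTA file and the matching GFF3 file should report "none" if the assembly version matches the annotation.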

Publication Details

DOI

Publisher

Zenodo

Keywords

genome annotation, genes, functional annotation, diatoms
