Gene Annotations of 49 Bacillariophyta Genome Assemblies
Description
Contact: [email protected].

Manuscript
The data hosted here is associated with the preprint https://doi.org/10.48550/arXiv.2410.05467

Files
The following GFF3 files with structural and functional genome annotation are included in the compressed archive Bacillariophyta_annotations.tar.gz:

Asterionella_formosa.gff3
Asterionellopsis_glacialis.gff3
Bacterosira_constricta.gff3
Chaetoceros_muellerii.gff3
concatenated_output.gff3
Conticribra_guillardii.gff3
Conticribra_weissflogii.gff3
Craspedostauros_australis.gff3
Cyclostephanos_invisitatus.gff3
Cyclostephanos_tholiformis.gff3
Cyclotella_atomus.gff3
Cyclotella_baltica.gff3
Cyclotella_choctawhatcheeana.gff3
Cyclotella_cryptica.gff3
Cylindrotheca_fusiformis.gff3
Detonula_confervacea.gff3
Discostella_pseudostelligera.gff3
Discostella_stelligera.gff3
Discostella_stelligeroides.gff3
Epithemia_pelagica.gff3
Fistulifera_pelliculosa.gff3
Fistulifera_solaris.gff3
Fragilaria_radians.gff3
Fragilariopsis_cylindrus.gff3
Licmophora_abbreviata.gff3
Mediolabrus_comicus.gff3
Nitzschia_palea.gff3
Nitzschia_putrida.gff3
Porosira_glacialis.gff3
Psammoneis_japonica.gff3
Pseudo-nitzschia_multiseries.gff3
Pseudo-nitzschia_pungens.gff3
Skeletonema_costatum.gff3
Skeletonema_marinoi.gff3
Skeletonema_menzelii.gff3
Skeletonema_potamos.gff3
Skeletonema_tropicum.gff3
Stephanocyclus_meneghinianus.gff3
Stephanodiscus_minutulus.gff3
Stephanodiscus_triporus.gff3
Thalassiosira_allenii.gff3
Thalassiosira_delicatula.gff3
Thalassiosira_exigua.gff3
Thalassiosira_gravida.gff3
Thalassiosira_livingstoniorum.gff3
Thalassiosira_mediterranea.gff3
Thalassiosira_oceanica.gff3
Thalassiosira_ordinaria.gff3
Thalassiosira_pacifica.gff3
Thalassiosira_profunda.gff3

To extract the dataset, execute the following command:

    tar -xvf Bacillariophyta_annotations.tar.gz

Genome Assemblies
The annotation files in this folder pertain to genome assemblies that are publicly available at NCBI Datasets (https://www.ncbi.nlm.nih.gov/datasets/).
We used the following versions:

Asterionella formosa GCA_002256025.1
Asterionellopsis glacialis GCA_014885115.2
Bacterosira constricta GCA_037356235.1
Chaetoceros muellerii GCA_019693545.1
Conticribra guillardii GCA_036939335.1
Conticribra weissflogii GCA_036940025.1
Craspedostauros australis GCA_026770025.1
Cyclostephanos invisitatus GCA_036939675.1
Cyclostephanos tholiformis GCA_036939975.1
Cyclotella atomus GCA_036939935.1
Cyclotella baltica GCA_036939635.1
Cyclotella choctawhatcheeana GCA_036939855.1
Cyclotella cryptica GCA_013187285.1
Cylindrotheca fusiformis GCA_019693525.1
Detonula confervacea GCA_036939415.1
Discostella pseudostelligera GCA_036940085.1
Discostella stelligera GCA_036939735.1
Discostella stelligeroides GCA_036939555.1
Epithemia pelagica GCA_946965045.2
Fistulifera pelliculosa GCA_026008555.1
Fistulifera solaris GCA_030295235.1
Fragilaria radians GCA_900642245.1
Fragilariopsis cylindrus GCA_900095095.1
Licmophora abbreviata GCA_900291995.1
Mediolabrus comicus GCA_036940125.1
Nitzschia palea GCA_019593585.1
Nitzschia putrida GCA_016586335.1
Porosira glacialis GCA_036939395.1
Psammoneis japonica GCA_008632985.1
Pseudo-nitzschia multiseries GCA_037355745.1
Pseudo-nitzschia pungens GCA_037355855.1
Skeletonema costatum GCA_018806925.1
Skeletonema marinoi GCA_030544225.1
Skeletonema menzelii GCA_036940005.1
Skeletonema potamos GCA_036940105.1
Skeletonema tropicum GCA_037178625.1
Stephanocyclus meneghinianus GCA_036940045.1
Stephanodiscus minutulus GCA_036939435.1
Stephanodiscus triporus GCA_036939755.1
Thalassiosira allenii GCA_036939655.1
Thalassiosira delicatula GCA_036939835.1
Thalassiosira exigua GCA_036939895.1
Thalassiosira gravida GCA_037356215.1
Thalassiosira livingstoniorum GCA_036939595.1
Thalassiosira mediterranea GCA_036939795.1
Thalassiosira oceanica GCA_019693575.1
Thalassiosira ordinaria GCA_036939695.1
Thalassiosira pacifica GCA_036939875.1
Thalassiosira profunda GCA_036939355.1

Converting to Protein FASTA and Coding Sequences FASTA
To save storage space on Zenodo, we did not upload the protein FASTA and coding sequence FASTA files. They can easily be generated from the genome FASTA file in combination with the respective GFF3 file. To do this, you can use the following commands:

    # assume that genome.fasta is your respective genome FASTA file downloaded from NCBI Datasets
    sed '/^>/ s/ .*//' genome.fasta > genome_short_headers.fasta
    # assume that file.gff is the respective GFF3 file
    getAnnoFastaFromJoingenes.py -g genome_short_headers.fasta -3 file.gff -o nameStem

This will produce the following files: nameStem.aa (protein FASTA file) and nameStem.codingseq (coding sequence FASTA file). The getAnnoFastaFromJoingenes.py script is available at https://raw.githubusercontent.com/Gaius-Augustus/Augustus/master/scripts/getAnnoFastaFromJoingenes.py . It is part of the AUGUSTUS software package.

Release notes
This release contains a gene set in which single-exon genes were filtered using the results of an OrthoFinder run that excluded genes on contigs suspected to be contaminants or horizontal gene transfer candidates. As a result, the gene and transcript counts have changed compared to the previous release.

License
The genome annotation files are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
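Worked example of the header-shortening step
The sed command in the conversion instructions above trims each FASTA header down to its first whitespace-delimited token. The likely reason, stated here as an assumption rather than something the description spells out, is that headers in FASTA files from NCBI carry a free-text description after the sequence ID, whereas the first column of a GFF3 file holds the bare ID only. The following minimal sketch runs the same sed expression on toy data and then checks that every sequence ID used in the GFF3 file has a matching FASTA header. The sequence IDs, descriptions, and the "example" source column are made up for illustration, and the ID check is an optional extra, not part of the original instructions.

```shell
#!/bin/sh
# Toy data only: IDs, descriptions, and features below are invented for illustration.
workdir=$(mktemp -d)

# A FASTA file with NCBI-style headers: sequence ID, a space, then a description.
cat > "$workdir/genome.fasta" <<'EOF'
>seqA.1 Example species strain X, whole genome shotgun sequence
ACGTACGTAC
>seqB.1 Example species strain X, whole genome shotgun sequence
GGCCTTAAGG
EOF

# A minimal GFF3 file whose first column refers to the bare sequence IDs.
printf '##gff-version 3\n' > "$workdir/file.gff"
printf 'seqA.1\texample\tgene\t1\t9\t.\t+\t.\tID=g1\n' >> "$workdir/file.gff"
printf 'seqB.1\texample\tgene\t2\t10\t.\t-\t.\tID=g2\n' >> "$workdir/file.gff"

# Same sed expression as in the instructions: on header lines only,
# delete everything from the first space to the end of the line.
sed '/^>/ s/ .*//' "$workdir/genome.fasta" > "$workdir/genome_short_headers.fasta"

# Collect the unique sequence IDs from the shortened FASTA and from the GFF3 file.
fasta_ids=$(grep '^>' "$workdir/genome_short_headers.fasta" | sed 's/^>//' | sort -u)
gff_ids=$(grep -v '^#' "$workdir/file.gff" | cut -f1 | sort -u)

# Report any GFF3 sequence ID that has no exact match among the FASTA headers.
missing=""
for id in $gff_ids; do
  if ! printf '%s\n' "$fasta_ids" | grep -qxF -- "$id"; then
    missing="$missing $id"
  fi
done

if [ -z "$missing" ]; then
  echo "all GFF3 sequence IDs found in FASTA"
else
  echo "GFF3 sequence IDs without a FASTA header:$missing"
fi
```

To apply the check to real data, replace the two toy files with a genome FASTA file downloaded from NCBI Datasets and the matching GFF3 file from the archive. If any IDs are reported as missing, the genome file is probably not the assembly version listed above.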
Citations (1)
- https://doi.org/10.1038/s41597-025-05306-z (MDC, OpenAlex)
Cited on 11 June 2025