Published on 05 September 2024

Version 7

Rule-based deconstruction and reconstruction of diterpene libraries: Categorizing foundational patterns & unravelling the structural landscape

Mathieu, Davis;Schlecht, Nicholas;van Aalst, Marvin;Shebek, Kevin;Busta, Luke;Babineau, Nicole;Ebenhöh, Oliver;Hamberger, Björn

Description

Terpenoids make up the largest class of specialized metabolites, with over 180,000 currently reported across all kingdoms of life. Their synthesis highlights one of nature's most choreographed enzymatic, non-reversible chemistries, leading to an extensive range of structural functionality and diversity. Current terpenoid repositories provide a seemingly endless playground of information regarding structure, sourcing, and synthesis. Efforts here investigate entries for the 20-carbon diterpenoid variants and deconstruct the complex patterns into simple, categorical groups. This deconstruction approach reduces over 60,000 unique compound entries to fewer than 1,000 categorical structures. Furthermore, over 75% of all diversity can be represented by just 25 structures. Diterpenoid diversity was mapped at an atomic scale, across the total compound landscape, and distributed throughout the tree of life. Additionally, these core structures provide guidelines for predicting how this diversity first originates via the mechanisms catalyzed by diterpene synthases. Over 95% of diterpenoid structures rely on cyclization. Here, a reconstructive approach based on known biochemical rules is applied to model the birth of compound diversity. This computational synthesis validates previously identified reaction products and pathways, and enables the prediction of trajectories for synthesizing real and theoretical compounds. Applied to the diterpene landscape, this deconstructive and reconstructive approach provides a modular, flexible, and easy-to-use toolset for categorically simplifying otherwise complex or hidden patterns.

Metrics

Dataset Index

1.8

FAIR Score

69%

Citations

1

Mentions

0

Publication Details

DOI

Publisher

Dryad

Assigned Domain

Subfield

Organic Chemistry

Field

Chemistry

Domain

Physical Sciences

Confidence Score

47%

Source

Scholar Data Model

Keywords

FOS: Computer and information sciences; Bioinformatics; Secondary metabolites; Terpenes; Computational chemistry

Normalization Factors

FT

15.38

CTw

1.00

MTw

1.00