Sánchez-Cruz, Norberto

Fragment Library of Natural Products and Compound Databases for Drug Discovery

Natural products and semi-synthetic compounds continue to be a significant source of drug candidates for a broad range of diseases, including COVID-19, the cause of the current pandemic. Besides being attractive sources of bioactive compounds for further development or optimization, natural products are excellent sources of unique substructures for fragment-based drug discovery inspired by natural products. To this end, fragment libraries are required that can be incorporated into automated drug design pipelines. However, public fragment libraries based on extensive collections of natural products are still scarce. Herein we report the generation and analysis of a fragment library of natural products derived from a database with more than 400,000 compounds. We also report fragment libraries of food chemical databases and other compound data sets of interest in drug discovery, including compound libraries relevant for COVID-19 drug discovery. The fragment libraries were characterized in terms of contents and diversity.

Supporting information contains:
COCONUT_COMPOUNDS.csv, FooDB_COMPOUNDS.csv, DCM_COMPOUNDS.csv, CAS_COMPOUNDS.csv, 3CLP_COMPOUNDS.csv. All data sets contain the curated structures and the following information: identification number (ID), simplified molecular input line entry system (Smiles), Average Molecular Weight (AMW), number of carbons, oxygens, nitrogens, heavy atoms, aliphatic rings, aromatic rings, heterocycles, bridgehead atoms, fraction of sp3 carbon atoms and chiral carbons, and a list of fragments generated from each compound.
FRAGMENTS_COCONUT.csv, FRAGMENTS_FooDB.csv, FRAGMENTS_DCM.csv, FRAGMENTS_CAS.csv, FRAGMENTS_3CLP.csv. All libraries contain the structures generated (Fragments) from each compound library (Dataset) and the following information: number of compounds in the data set that contain the fragment (Count) and the corresponding fraction of compounds (Proportion), Average Molecular Weight (AMW), number of carbons, oxygens, nitrogens, heavy atoms, aliphatic rings, aromatic rings, heterocycles, bridgehead atoms, fraction of sp3 carbon atoms and chiral carbons.
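To make the relation between the per-compound fragment lists and the Count and Proportion columns concrete, the following is a minimal Python sketch. It assumes that Count is the number of compounds containing a fragment at least once and that Proportion is Count divided by the total number of compounds in the data set; the exact conventions used to build the files are not stated here. The function name `fragment_stats`, the compound IDs, and the fragment SMILES are hypothetical toy values, not taken from the files.

```python
from collections import Counter


def fragment_stats(compound_fragments):
    """Map each fragment to (count, proportion) given {compound ID: fragment list}."""
    n_compounds = len(compound_fragments)
    counts = Counter()
    for fragments in compound_fragments.values():
        # set() so a fragment repeated within one compound is counted once
        counts.update(set(fragments))
    # Assumption: the denominator is every compound, including those with no fragments
    return {frag: (n, n / n_compounds) for frag, n in counts.items()}


# Toy data set (hypothetical IDs and fragment SMILES)
toy = {
    "CPD1": ["c1ccccc1", "C1CCOC1"],
    "CPD2": ["c1ccccc1", "c1ccccc1"],
    "CPD3": ["C1CCNCC1"],
    "CPD4": [],
}
stats = fragment_stats(toy)
for fragment, (count, proportion) in sorted(stats.items()):
    print(fragment, count, proportion)
```

If the real files exclude compounds that yield no fragments from the denominator, only the `n_compounds` line would need to change.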

Authors

Chávez-Hernández, Ana Luisa ;
MEDINA-FRANCO, JOSÉ LUIS ;
Sánchez-Cruz, Norberto

2 Citations | 0 Mentions | 85% FAIR | 2.8 Dataset Index

10.6084/m9.figshare.13064231.v1 | 2020

Supporting Information for "A Fragment Library of Natural Products and its Comparative Chemoinformatic Characterization"

COCONUT_Compounds.sdf, ChEMBL_Compounds.csv and REAL_Compounds.csv contain the curated structures of drug-like subsets from those major compound data sets. All files contain the following information for each compound: identification number (ID), simplified molecular input line entry system (Smiles), Average Molecular Weight (AMW), partition coefficient octanol/water (SlogP), number of hydrogen bond donors (HBD), number of hydrogen bond acceptors (HBA), number of rotatable bonds (RB), topological polar surface area (TPSA), fraction of sp3 carbons (FractionCSP3), fraction of chiral carbons (FractionCC), number of generated fragments (NFragments) and a list of the fragments obtained, if any (LFragments).
COCONUT_Fragments.sdf, ChEMBL_Fragments.csv and REAL_Fragments.csv contain the structures generated from the respective compound data sets. All files include the following information for each fragment: identification number (ID), source collection (Data Set), simplified molecular input line entry system (Fragment), whether the fragment belongs to a single data set (Unique) or to all three data sets (Overlapped), number of compounds containing that fragment in the data set (Counts) and the corresponding fraction of compounds (Proportion), fraction of sp3 carbons (FractionCSP3), fraction of chiral carbons (FractionCC), number of heavy atoms (NumHeavyAtoms), number of oxygen atoms (NumO), number of nitrogen atoms (NumN), number of bridgehead atoms (NumBridgeHead), number of spiro atoms (NumSpiro), number of rings (NumRings), number of aromatic rings (NumArRings), number of aliphatic rings (NumAlRings), number of heterocycles (NumHet), number of aromatic heterocycles (NumArHet) and number of aliphatic heterocycles (NumAlHet).
SB-DFPs.csv contains Statistical-Based Database Fingerprints for the COCONUT and REAL data sets. The file includes the value of each bit of a 1024-bit Morgan fingerprint of radius 2, as implemented in RDKit, as well as the empirical minimum and maximum Tanimoto similarity values used for scaling the data (MinSimilarity and MaxSimilarity).
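For readers unfamiliar with the similarity measure mentioned above, here is a small Python sketch of the Tanimoto coefficient for binary fingerprints, followed by a min-max rescaling step. The rescaling formula is an assumption about how MinSimilarity and MaxSimilarity might be applied, since the description only says they are "used for scaling"; the function names, bit indices, and the 0.1 and 0.9 bounds are hypothetical and not taken from SB-DFPs.csv.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def scale_similarity(sim, min_sim, max_sim):
    """Min-max rescale a similarity value (assumed form of the scaling)."""
    if max_sim == min_sim:
        raise ValueError("max_sim must differ from min_sim")
    return (sim - min_sim) / (max_sim - min_sim)


# Toy fingerprints: indices of the bits set to 1 in a 1024-bit vector
fp_query = {1, 5, 9, 300}
fp_reference = {1, 5, 42, 300, 1023}

sim = tanimoto(fp_query, fp_reference)  # 3 shared bits out of 6 distinct bits
scaled = scale_similarity(sim, min_sim=0.1, max_sim=0.9)
print(sim, scaled)
```

Representing a fingerprint as a set of on-bit indices keeps the example free of external dependencies; with RDKit installed, its own fingerprint and similarity functions would normally be used instead.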

Authors

MEDINA-FRANCO, JOSÉ LUIS ;
Chávez-Hernández, Ana Luisa ;
Sánchez-Cruz, Norberto

0 Citations | 0 Mentions | 85% FAIR | 0.3 Dataset Index

10.6084/m9.figshare.11997951.v4 | 2020

Fragment Library of Natural Products and Compound Databases for Drug Discovery

Authors

Chávez-Hernández, Ana Luisa ;
MEDINA-FRANCO, JOSÉ LUIS ;
Sánchez-Cruz, Norberto

0 Citations | 0 Mentions | 85% FAIR | 1.8 Dataset Index

10.6084/m9.figshare.13064231 | 2020

Supporting Information for "A Fragment Library of Natural Products and its Comparative Chemoinformatic Characterization"

Authors

MEDINA-FRANCO, JOSÉ LUIS ;
Chávez-Hernández, Ana Luisa ;
Sánchez-Cruz, Norberto

0 Citations | 0 Mentions | 85% FAIR | 0.3 Dataset Index

10.6084/m9.figshare.11997951 | 2020

Supporting Information for "A Fragment Library of Natural Products and its Comparative Chemoinformatic Characterization"

Authors

MEDINA-FRANCO, JOSÉ LUIS ;
Sánchez-Cruz, Norberto

0 Citations | 0 Mentions | 85% FAIR | 1.8 Dataset Index

10.6084/m9.figshare.11997951.v1 | 2020

Supporting Information for "A Fragment Library of Natural Products and its Comparative Chemoinformatic Characterization"

COCONUT_Compounds.sdf, ChEMBL_Compounds.csv and REAL_Compounds.csv contain the curated structures of drug-like subsets from those major compound data sets. All files contain the following information for each compound: identification number (ID), simplified molecular input line entry system (Smiles), Average Molecular Weight (AMW), partition coefficient octanol/water (SlogP), number of hydrogen bond donors (HBD), number of hydrogen bond acceptors (HBA), number of rotatable bonds (RB), topological polar surface area (TPSA), fraction of sp3 carbons (FractionCSP3), fraction of chiral carbons (FractionCC), number of generated fragments (NFragments) and a list of the fragments obtained, if any (LFragments).
COCONUT_Fragments.sdf, ChEMBL_Fragments.csv and REAL_Fragments.csv contain the structures generated from the respective compound data sets. All files include the following information for each fragment: identification number (ID), source collection (Data Set), simplified molecular input line entry system (Fragment), uniqueness (Unique), number of compounds containing that fragment in the data set (Counts) and the corresponding fraction of compounds (Proportion), fraction of sp3 carbons (FractionCSP3) and fraction of chiral carbons (FractionCC).
SB-DFPs contains Statistical-Based Database Fingerprints for the COCONUT and REAL data sets. The file includes the value of each bit of a 1024-bit Morgan fingerprint of radius 2, as implemented in RDKit, as well as the empirical minimum and maximum Tanimoto similarity values used for scaling the data (MinSimilarity and MaxSimilarity).

Authors

MEDINA-FRANCO, JOSÉ LUIS ;
Sánchez-Cruz, Norberto

0 Citations | 0 Mentions | 85% FAIR | 0.3 Dataset Index

10.6084/m9.figshare.11997951.v2 | 2020

Supporting Information for "A Fragment Library of Natural Products and its Comparative Chemoinformatic Characterization"

COCONUT_Compounds.sdf, ChEMBL_Compounds.csv and REAL_Compounds.csv contain the curated structures of drug-like subsets from those major compound data sets. All files contain the following information for each compound: identification number (ID), simplified molecular input line entry system (Smiles), Average Molecular Weight (AMW), partition coefficient octanol/water (SlogP), number of hydrogen bond donors (HBD), number of hydrogen bond acceptors (HBA), number of rotatable bonds (RB), topological polar surface area (TPSA), fraction of sp3 carbons (FractionCSP3), fraction of chiral carbons (FractionCC), number of generated fragments (NFragments) and a list of the fragments obtained, if any (LFragments).
COCONUT_Fragments.sdf, ChEMBL_Fragments.csv and REAL_Fragments.csv contain the structures generated from the respective compound data sets. All files include the following information for each fragment: identification number (ID), source collection (Data Set), simplified molecular input line entry system (Fragment), uniqueness (Unique), number of compounds containing that fragment in the data set (Counts) and the corresponding fraction of compounds (Proportion), fraction of sp3 carbons (FractionCSP3) and fraction of chiral carbons (FractionCC).
SB-DFPs.csv contains Statistical-Based Database Fingerprints for the COCONUT and REAL data sets. The file includes the value of each bit of a 1024-bit Morgan fingerprint of radius 2, as implemented in RDKit, as well as the empirical minimum and maximum Tanimoto similarity values used for scaling the data (MinSimilarity and MaxSimilarity).

Authors

MEDINA-FRANCO, JOSÉ LUIS ;
Sánchez-Cruz, Norberto ;
Chávez-Hernández, Ana Luisa

0 Citations | 0 Mentions | 15% FAIR | 0.3 Dataset Index

10.6084/m9.figshare.11997951.v3 | 2020

BIOFACQUIM_V2.sdf

This file contains the chemical structures of 531 compounds in SDF format, along with the following information: identification number (ID), compound name, simplified molecular input line entry system (SMILES), reference (with the name of the journal, digital object identifier (DOI) number and publication year), kingdom (Plantae or Fungi), genus, species, geographical location of the collection of the natural product and the biological activity if any. Any commercial or free software capable of reading SDF files will open the data sets supplied.
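As a rough illustration of how the annotations in an SDF file can be read without dedicated software, the sketch below extracts the data fields of each record using only the Python standard library. It relies on the general SDF layout (records terminated by `$$$$`, data items introduced by a `> <NAME>` header and ended by a blank line) and ignores the structure block itself. The function name `read_sdf_fields`, the sample record, and the field names `ID` and `Kingdom` are hypothetical; the actual tags in BIOFACQUIM_V2.sdf may be spelled differently.

```python
def read_sdf_fields(text):
    """Yield one dict of data fields per record found in SDF-formatted text."""
    for record in text.split("$$$$"):
        if not record.strip():
            continue  # skip the empty remainder after the last terminator
        fields = {}
        lines = record.splitlines()
        i = 0
        while i < len(lines):
            line = lines[i]
            if line.startswith(">") and "<" in line:
                # Header looks like "> <NAME>"; the value runs until a blank line
                name = line[line.index("<") + 1 : line.rindex(">")]
                i += 1
                value = []
                while i < len(lines) and lines[i].strip():
                    value.append(lines[i])
                    i += 1
                fields[name] = "\n".join(value)
            i += 1
        yield fields


# Toy single-atom record with hypothetical field names
sample = """\
toy-1
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <ID>
TOY_001

> <Kingdom>
Plantae

$$$$
"""
records = list(read_sdf_fields(sample))
print(records)
```

This only recovers the text annotations; to work with the chemical structures themselves, a cheminformatics toolkit that reads SDF files would be the more appropriate choice.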

Authors

MEDINA-FRANCO, JOSÉ LUIS ;
Sánchez-Cruz, Norberto ;
Pilón-Jimenez, B. Angélica

1 Citation | 0 Mentions | 85% FAIR | 2.6 Dataset Index

10.6084/m9.figshare.11312702.v1 | 2019

BIOFACQUIM_V2.sdf

Authors

MEDINA-FRANCO, JOSÉ LUIS ;
Sánchez-Cruz, Norberto ;
Pilón-Jimenez, B. Angélica

3 Citations | 0 Mentions | 85% FAIR | 3.2 Dataset Index

10.6084/m9.figshare.11312702 | 2019

Automated Author Profile
Sánchez-Cruz, Norberto
0000-0003-2707-3966
