Automated Author Profile
Sánchez-Cruz, Norberto
0000-0003-2707-3966
Current S-Index
Sum of Dataset Indices for all datasets
Average Dataset Index per Dataset
Average Dataset Index per dataset
Total Datasets
Total datasets for this author
Average FAIR Score
Average FAIR Score per dataset
Total Citations
Total citations to the author's datasets
Total Mentions
Total mentions of the author's datasets
S-Index Interpretation
The S-Index (Sharing Index) is a comprehensive metric that represents the cumulative impact of all your datasets. It is calculated as the sum of Dataset Index scores across all your claimed datasets.
What it means:
- A higher S-index indicates greater overall impact of your datasets relative to typical datasets in their fields of research
- The S-Index grows as you add more datasets or as existing datasets gain more citations and mentions
- It provides a single number to track your research data impact over time
Current S-Index: 12.9 (sum of the Dataset Index scores of 9 datasets)
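The definition above reduces to a plain sum. The following is a minimal sketch, assuming the per-dataset Dataset Index scores are available as a list of numbers; the function names and the example scores are hypothetical and are not this profile's actual values.

```python
from statistics import fmean


def s_index(dataset_indices):
    """S-Index: sum of the Dataset Index scores of all claimed datasets."""
    return sum(dataset_indices)


def average_dataset_index(dataset_indices):
    """Average Dataset Index per dataset (0.0 when no datasets are claimed)."""
    return fmean(dataset_indices) if dataset_indices else 0.0


# Hypothetical scores for three datasets, for illustration only.
example_scores = [2.0, 1.5, 0.5]
print(s_index(example_scores))                # 4.0
print(average_dataset_index(example_scores))  # 1.3333333333333333
```

Applied to the figures reported here, an S-Index of 12.9 over 9 datasets corresponds to an average Dataset Index of about 1.43 per dataset.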
S-Index Over Time
Cumulative Citations Over Time
Cumulative Mentions Over Time
Datasets
Natural products and semi-synthetic compounds continue to be a significant source of drug candidates for a broad range of diseases, including the current pandemic caused by COVID-19. Besides being attractive sources of bioactive compounds for further development or optimization, natural products are excellent sources of unique substructures for fragment-based drug discovery inspired by natural products. To this end, fragment libraries are required that can be incorporated into automated drug design pipelines. However, public fragment libraries based on extensive collections of natural products are still scarce. Herein we report the generation and analysis of a fragment library of natural products derived from a database with more than 400,000 compounds. We also report fragment libraries of food chemical databases and other compound data sets of interest in drug discovery, including compound libraries relevant for COVID-19 drug discovery. The fragment libraries were characterized in terms of contents and diversity.
Supporting information contains:
COCONUT_COMPOUNDS.csv, FooDB_COMPOUNDS.csv, DCM_COMPOUNDS.csv, CAS_COMPOUNDS.csv, 3CLP_COMPOUNDS.csv. All datasets contain the curated structures and the following information: identification number (ID), simplified molecular input line entry system (Smiles), Average Molecular Weight (AMW), number of carbons, oxygens, nitrogens, heavy atoms, aliphatic rings, aromatic rings, heterocycles, bridgehead atoms, fraction of sp3 carbon atoms and chiral carbons, and a list of fragments generated from each compound.
FRAGMENTS_COCONUT.csv, FRAGMENTS_FooDB.csv, FRAGMENTS_DCM.csv, FRAGMENTS_CAS.csv, FRAGMENTS_3CLP.csv. All libraries contain the structures generated (Fragments) from each compound library (Dataset) and the following information: number of compounds in a dataset that contain the fragment (Count) and the corresponding fraction of compounds (Proportion), Average Molecular Weight (AMW), number of carbons, oxygens, nitrogens, heavy atoms, aliphatic rings, aromatic rings, heterocycles, bridgehead atoms, fraction of sp3 carbon atoms and chiral carbons.
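The Count and Proportion columns described above can be derived from the per-compound fragment lists. The sketch below is illustrative only: it assumes that a fragment is counted once per compound and that Proportion is taken over all compounds in the library, neither of which is stated in the description. The function name, compound IDs, and fragment SMILES are hypothetical.

```python
from collections import Counter


def fragment_statistics(compound_fragments):
    """Return {fragment: (count, proportion)} for a compound library.

    compound_fragments maps a compound ID to the list of fragment SMILES
    generated from it. A fragment is counted once per compound, even if it
    occurs several times in that compound's list (assumption).
    """
    n_compounds = len(compound_fragments)
    counts = Counter()
    for fragments in compound_fragments.values():
        counts.update(set(fragments))
    return {frag: (n, n / n_compounds) for frag, n in counts.items()}


# Hypothetical toy library; IDs and fragment SMILES are illustrative only.
library = {
    "CPD1": ["c1ccccc1", "CO"],
    "CPD2": ["c1ccccc1"],
    "CPD3": ["CO", "CO", "CN"],
    "CPD4": [],
}
stats = fragment_statistics(library)
print(stats["c1ccccc1"])  # (2, 0.5)
print(stats["CN"])        # (1, 0.25)
```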
Authors
- Chávez-Hernández, Ana Luisa
- MEDINA-FRANCO, JOSÉ LUIS
- Sánchez-Cruz, Norberto
COCONUT_Compounds.sdf, ChEMBL_Compounds.csv and REAL_Compounds.csv contain the curated structures of drug-like subsets from those major compound data sets. All files contain the following information for each compound: identification number (ID), simplified molecular input line entry system (Smiles), Average Molecular Weight (AMW), octanol/water partition coefficient (SlogP), number of hydrogen bond donors (HBD), number of hydrogen bond acceptors (HBA), number of rotatable bonds (RB), topological polar surface area (TPSA), fraction of sp3 carbons (FractionCSP3), fraction of chiral carbons (FractionCC), number of generated fragments (NFragments) and a list of the fragments obtained, if any (LFragments).
COCONUT_Fragments.sdf, ChEMBL_Fragments.csv and REAL_Fragments.csv contain the structures generated from the respective compound data sets. All files include the following information for each fragment: identification number (ID), source collection (Data Set), simplified molecular input line entry system (Fragment), whether the fragment belongs to only one data set (Unique) or to all three (Overlapped), number of compounds in the data set containing that fragment (Counts) and the corresponding fraction of compounds (Proportion), fraction of sp3 carbons (FractionCSP3), fraction of chiral carbons (FractionCC), number of heavy atoms (NumHeavyAtoms), number of oxygen atoms (NumO), number of nitrogen atoms (NumN), number of bridgehead atoms (NumBridgeHead), number of spiro atoms (NumSpiro), number of rings (NumRings), number of aromatic rings (NumArRings), number of aliphatic rings (NumAlRings), number of heterocycles (NumHet), number of aromatic heterocycles (NumArHet) and number of aliphatic heterocycles (NumAlHet).
SB-DFPs.csv contains Statistical-Based Database Fingerprints for the COCONUT and REAL data sets. The file includes the value of each bit of a Morgan fingerprint of radius 2 (1024 bits) according to the RDKit algorithm, as well as the empirical minimum and maximum Tanimoto similarity values used for scaling the data (MinSimilarity and MaxSimilarity).
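How MinSimilarity and MaxSimilarity might be used can be sketched as follows. The description does not give the scaling formula, so the min-max normalization here is an assumption, as are the function names, the on-bit indices, and the bounds. Fingerprints are represented as sets of on-bit indices rather than generated with RDKit, which keeps the example self-contained.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def scale_similarity(value, min_similarity, max_similarity):
    """Min-max scale a similarity to [0, 1] using empirical bounds (assumed scheme)."""
    if max_similarity <= min_similarity:
        raise ValueError("max_similarity must exceed min_similarity")
    scaled = (value - min_similarity) / (max_similarity - min_similarity)
    # Clamp values that fall outside the empirical range.
    return min(1.0, max(0.0, scaled))


# Hypothetical on-bit indices of a 1024-bit fingerprint and hypothetical bounds.
query_bits = {3, 17, 256, 900}
database_bits = {3, 17, 512}
raw = tanimoto(query_bits, database_bits)
print(raw)                              # 0.4
print(scale_similarity(raw, 0.0, 0.8))  # 0.5
```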
Authors
- MEDINA-FRANCO, JOSÉ LUIS
- Chávez-Hernández, Ana Luisa
- Sánchez-Cruz, Norberto
Natural products and semi-synthetic compounds continue to be a significant source of drug candidates for a broad range of diseases, including the current pandemic caused by COVID-19. Besides being attractive sources of bioactive compounds for further development or optimization, natural products are excellent sources of unique substructures for fragment-based drug discovery inspired by natural products. To this end, fragment libraries are required that can be incorporated into automated drug design pipelines. However, public fragment libraries based on extensive collections of natural products are still scarce. Herein we report the generation and analysis of a fragment library of natural products derived from a database with more than 400,000 compounds. We also report fragment libraries of food chemical databases and other compound data sets of interest in drug discovery, including compound libraries relevant for COVID-19 drug discovery. The fragment libraries were characterized in terms of contents and diversity.
Supporting information contains:
COCONUT_COMPOUNDS.csv, FooDB_COMPOUNDS.csv, DCM_COMPOUNDS.csv, CAS_COMPOUNDS.csv, 3CLP_COMPOUNDS.csv. All datasets contain the curated structures and the following information: identification number (ID), simplified molecular input line entry system (Smiles), Average Molecular Weight (AMW), number of carbons, oxygens, nitrogens, heavy atoms, aliphatic rings, aromatic rings, heterocycles, bridgehead atoms, fraction of sp3 carbon atoms and chiral carbons, and a list of fragments generated from each compound.
FRAGMENTS_COCONUT.csv, FRAGMENTS_FooDB.csv, FRAGMENTS_DCM.csv, FRAGMENTS_CAS.csv, FRAGMENTS_3CLP.csv. All libraries contain the structures generated (Fragments) from each compound library (Dataset) and the following information: number of compounds in a dataset that contain the fragment (Count) and the corresponding fraction of compounds (Proportion), Average Molecular Weight (AMW), number of carbons, oxygens, nitrogens, heavy atoms, aliphatic rings, aromatic rings, heterocycles, bridgehead atoms, fraction of sp3 carbon atoms and chiral carbons.
Authors
- Chávez-Hernández, Ana Luisa
- MEDINA-FRANCO, JOSÉ LUIS
- Sánchez-Cruz, Norberto
COCONUT_Compounds.sdf, ChEMBL_Compounds.csv and REAL_Compounds.csv contain the curated structures of drug-like subsets from those major compound data sets. All files contain the following information for each compound: identification number (ID), simplified molecular input line entry system (Smiles), Average Molecular Weight (AMW), octanol/water partition coefficient (SlogP), number of hydrogen bond donors (HBD), number of hydrogen bond acceptors (HBA), number of rotatable bonds (RB), topological polar surface area (TPSA), fraction of sp3 carbons (FractionCSP3), fraction of chiral carbons (FractionCC), number of generated fragments (NFragments) and a list of the fragments obtained, if any (LFragments).
COCONUT_Fragments.sdf, ChEMBL_Fragments.csv and REAL_Fragments.csv contain the structures generated from the respective compound data sets. All files include the following information for each fragment: identification number (ID), source collection (Data Set), simplified molecular input line entry system (Fragment), whether the fragment belongs to only one data set (Unique) or to all three (Overlapped), number of compounds in the data set containing that fragment (Counts) and the corresponding fraction of compounds (Proportion), fraction of sp3 carbons (FractionCSP3), fraction of chiral carbons (FractionCC), number of heavy atoms (NumHeavyAtoms), number of oxygen atoms (NumO), number of nitrogen atoms (NumN), number of bridgehead atoms (NumBridgeHead), number of spiro atoms (NumSpiro), number of rings (NumRings), number of aromatic rings (NumArRings), number of aliphatic rings (NumAlRings), number of heterocycles (NumHet), number of aromatic heterocycles (NumArHet) and number of aliphatic heterocycles (NumAlHet).
SB-DFPs.csv contains Statistical-Based Database Fingerprints for the COCONUT and REAL data sets. The file includes the value of each bit of a Morgan fingerprint of radius 2 (1024 bits) according to the RDKit algorithm, as well as the empirical minimum and maximum Tanimoto similarity values used for scaling the data (MinSimilarity and MaxSimilarity).
Authors
- MEDINA-FRANCO, JOSÉ LUIS
- Chávez-Hernández, Ana Luisa
- Sánchez-Cruz, Norberto
COCONUT_Compounds.sdf, ChEMBL_Compounds.csv and REAL_Compounds.csv contain the curated structures of drug-like subsets from those major compound data sets. All files contain the following information for each compound: identification number (ID), simplified molecular input line entry system (Smiles), Average Molecular Weight (AMW), octanol/water partition coefficient (SlogP), number of hydrogen bond donors (HBD), number of hydrogen bond acceptors (HBA), number of rotatable bonds (RB), topological polar surface area (TPSA), fraction of sp3 carbons (FractionCSP3), fraction of chiral carbons (FractionCC), number of generated fragments (NFragments) and a list of the fragments obtained, if any (LFragments).
COCONUT_Fragments.sdf, ChEMBL_Fragments.csv and REAL_Fragments.csv contain the structures generated from the respective compound data sets. All files include the following information for each fragment: identification number (ID), source collection (Data Set), simplified molecular input line entry system (Fragment), uniqueness (Unique), number of compounds in the data set containing that fragment (Counts) and the corresponding fraction of compounds (Proportion), fraction of sp3 carbons (FractionCSP3) and fraction of chiral carbons (FractionCC).
Example row: FRAG8718983,COCONUT,*O,False,19145,0.1006894955795497,0.0,0.0
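The comma-separated record that accompanies this description can be mapped onto the listed columns. This is a minimal sketch, assuming the fields appear in the same order as in the description and that Unique is stored as the text "True" or "False"; the function name is hypothetical.

```python
import csv
import io

# Column names as listed in the description; their order is an assumption.
COLUMNS = ["ID", "Data Set", "Fragment", "Unique", "Counts",
           "Proportion", "FractionCSP3", "FractionCC"]


def parse_fragment_row(line):
    """Parse one comma-separated fragment record into a typed dictionary."""
    values = next(csv.reader(io.StringIO(line)))
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    row = dict(zip(COLUMNS, values))
    row["Unique"] = row["Unique"] == "True"
    row["Counts"] = int(row["Counts"])
    for key in ("Proportion", "FractionCSP3", "FractionCC"):
        row[key] = float(row[key])
    return row


record = parse_fragment_row(
    "FRAG8718983,COCONUT,*O,False,19145,0.1006894955795497,0.0,0.0"
)
print(record["Fragment"], record["Counts"], record["Unique"])  # *O 19145 False
# Counts / Proportion estimates the number of compounds in the source data set.
print(round(record["Counts"] / record["Proportion"]))
```

Dividing Counts by Proportion is a quick consistency check: every row from the same data set should imply the same total number of compounds.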
Authors
- MEDINA-FRANCO, JOSÉ LUIS
- Sánchez-Cruz, Norberto
COCONUT_Compounds.sdf, ChEMBL_Compounds.csv and REAL_Compounds.csv contain the curated structures of drug-like subsets from those major compound data sets. All files contain the following information for each compound: identification number (ID), simplified molecular input line entry system (Smiles), Average Molecular Weight (AMW), octanol/water partition coefficient (SlogP), number of hydrogen bond donors (HBD), number of hydrogen bond acceptors (HBA), number of rotatable bonds (RB), topological polar surface area (TPSA), fraction of sp3 carbons (FractionCSP3), fraction of chiral carbons (FractionCC), number of generated fragments (NFragments) and a list of the fragments obtained, if any (LFragments).
COCONUT_Fragments.sdf, ChEMBL_Fragments.csv and REAL_Fragments.csv contain the structures generated from the respective compound data sets. All files include the following information for each fragment: identification number (ID), source collection (Data Set), simplified molecular input line entry system (Fragment), uniqueness (Unique), number of compounds in the data set containing that fragment (Counts) and the corresponding fraction of compounds (Proportion), fraction of sp3 carbons (FractionCSP3) and fraction of chiral carbons (FractionCC).
SB-DFPs contains Statistical-Based Database Fingerprints for the COCONUT and REAL data sets. The file includes the value of each bit of a Morgan fingerprint of radius 2 (1024 bits) according to the RDKit algorithm, as well as the empirical minimum and maximum Tanimoto similarity values used for scaling the data (MinSimilarity and MaxSimilarity).
Authors
- MEDINA-FRANCO, JOSÉ LUIS
- Sánchez-Cruz, Norberto
COCONUT_Compounds.sdf, ChEMBL_Compounds.csv and REAL_Compounds.csv contain the curated structures of drug-like subsets from those major compound data sets. All files contain the following information for each compound: identification number (ID), simplified molecular input line entry system (Smiles), Average Molecular Weight (AMW), octanol/water partition coefficient (SlogP), number of hydrogen bond donors (HBD), number of hydrogen bond acceptors (HBA), number of rotatable bonds (RB), topological polar surface area (TPSA), fraction of sp3 carbons (FractionCSP3), fraction of chiral carbons (FractionCC), number of generated fragments (NFragments) and a list of the fragments obtained, if any (LFragments).
COCONUT_Fragments.sdf, ChEMBL_Fragments.csv and REAL_Fragments.csv contain the structures generated from the respective compound data sets. All files include the following information for each fragment: identification number (ID), source collection (Data Set), simplified molecular input line entry system (Fragment), uniqueness (Unique), number of compounds in the data set containing that fragment (Counts) and the corresponding fraction of compounds (Proportion), fraction of sp3 carbons (FractionCSP3) and fraction of chiral carbons (FractionCC).
SB-DFPs.csv contains Statistical-Based Database Fingerprints for the COCONUT and REAL data sets. The file includes the value of each bit of a Morgan fingerprint of radius 2 (1024 bits) according to the RDKit algorithm, as well as the empirical minimum and maximum Tanimoto similarity values used for scaling the data (MinSimilarity and MaxSimilarity).
Authors
- MEDINA-FRANCO, JOSÉ LUIS
- Sánchez-Cruz, Norberto
- Chávez-Hernández, Ana Luisa
This file contains the chemical structures of 531 compounds in SDF format, along with the following information: identification number (ID), compound name, simplified molecular input line entry system (SMILES), reference (with the name of the journal, digital object identifier (DOI) number and publication year), kingdom (Plantae or Fungi), genus, species, geographical location of the collection of the natural product and the biological activity if any. Any commercial or free software capable of reading SDF files will open the data sets supplied.
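For readers without a cheminformatics toolkit, the per-compound annotations in an SDF file can be read with plain text processing. The sketch below relies only on the general SDF layout (records terminated by "$$$$", data items introduced by "> <FIELD>" lines) and skips the structure block. The function name, the field names ID and SMILES, and the sample record are hypothetical, since the actual tag names in this file are not given in the description.

```python
def read_sdf_properties(sdf_text):
    """Return one {field: value} dictionary per record of an SDF string.

    Only the data items ("> <FIELD>" headers followed by value lines) are
    read; the molfile connection table of each record is skipped.
    """
    records = []
    for block in sdf_text.split("$$$$"):
        if not block.strip():
            continue
        props, field, value_lines = {}, None, []
        for line in block.splitlines():
            if line.startswith(">") and "<" in line and ">" in line[1:]:
                if field is not None:
                    props[field] = "\n".join(value_lines).strip()
                start = line.index("<") + 1
                field, value_lines = line[start:line.index(">", start)], []
            elif field is not None:
                value_lines.append(line)
        if field is not None:
            props[field] = "\n".join(value_lines).strip()
        records.append(props)
    return records


# Hypothetical single-record SDF (water) with two illustrative data fields.
sample = """\
example
  hypothetical

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <ID>
1

> <SMILES>
O

$$$$
"""
print(read_sdf_properties(sample))  # [{'ID': '1', 'SMILES': 'O'}]
```

A dedicated SDF reader is still preferable when the chemical structures themselves are needed, because this sketch ignores the atom and bond blocks.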
Authors
- MEDINA-FRANCO, JOSÉ LUIS
- Sánchez-Cruz, Norberto
- Pilón-Jimenez, B. Angélica
This file contains the chemical structures of 531 compounds in SDF format, along with the following information: identification number (ID), compound name, simplified molecular input line entry system (SMILES), reference (with the name of the journal, digital object identifier (DOI) number and publication year), kingdom (Plantae or Fungi), genus, species, geographical location of the collection of the natural product and the biological activity if any. Any commercial or free software capable of reading SDF files will open the data sets supplied.
Authors
- MEDINA-FRANCO, JOSÉ LUIS
- Sánchez-Cruz, Norberto
- Pilón-Jimenez, B. Angélica