Scholar Data

Supporting data for "The probability of edge existence due to node degree: a baseline for network-based predictions"

Important tasks in biomedical discovery such as predicting gene functions, gene-disease associations, and drug repurposing opportunities are often framed as network edge prediction. The number of edges connecting to a node, termed degree, can vary greatly across nodes in real biomedical networks, and the distribution of degrees varies between networks. If degree strongly influences edge prediction, then imbalance or bias in the distribution of degrees could lead to nonspecific or misleading predictions. We introduce a network permutation framework to quantify the effects of node degree on edge prediction. Our framework decomposes performance into the proportions attributable to degree and the network's specific connections. We discover that performance attributable to factors other than degree is often only a small portion of overall performance. Degree's predictive performance diminishes when the networks used for training and testing, despite measuring the same biological relationships, were generated using distinct techniques and hence have large differences in degree distribution. We introduce the permutation-derived edge prior as the probability that an edge exists based only on degree. The edge prior shows excellent discrimination and calibration for 20 biomedical networks (16 bipartite, 3 undirected, 1 directed), with AUROCs frequently exceeding 0.85. Researchers seeking to predict new or missing edges in biological networks should use the edge prior as a baseline to identify the fraction of performance that is nonspecific because of degree. We released our methods as an open-source Python package (https://github.com/hetio/xswap/).
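To make the idea of a permutation-derived edge prior concrete, the following is a minimal sketch, not the API of the xswap package. It assumes a small bipartite network with made-up node names, permutes it with simple degree-preserving edge swaps, and estimates the prior for each node pair as the fraction of permuted networks in which that pair is connected. The function names `xswap_permute` and `edge_prior`, and the choice of ten attempted swaps per edge, are illustrative assumptions.

```python
import random
from collections import Counter


def xswap_permute(edges, n_swaps, rng):
    """Return a degree-preserving permutation of unique bipartite edges.

    Each attempted swap picks two edges (a, b) and (c, d) and rewires them
    to (a, d) and (c, b), which leaves every node's degree unchanged.
    """
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new_1, new_2 = (a, d), (c, b)
        # Skip swaps that would duplicate an existing edge. This also
        # rejects pairs that share a source or a target node.
        if new_1 in edge_set or new_2 in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {new_1, new_2}
        edges[i], edges[j] = new_1, new_2
    return edges


def edge_prior(edges, n_permutations=200, seed=0):
    """Estimate P(edge exists | degrees) as a frequency over permutations."""
    rng = random.Random(seed)
    counts = Counter()
    current = list(edges)
    for _ in range(n_permutations):
        # Assumption: ten attempted swaps per edge between samples.
        current = xswap_permute(current, n_swaps=10 * len(current), rng=rng)
        counts.update(current)
    return {pair: count / n_permutations for pair, count in counts.items()}


# Toy bipartite network (hypothetical gene and disease labels).
edges = [
    ("g1", "d1"), ("g1", "d2"), ("g1", "d3"),
    ("g2", "d1"), ("g3", "d1"), ("g4", "d2"),
]
priors = edge_prior(edges)
for pair, prior in sorted(priors.items()):
    print(pair, round(prior, 2))
```

In this toy network, g1 has degree 3 and only three target nodes exist, so every permutation keeps g1 connected to all of them and those pairs receive a prior of 1. Pairs between low-degree nodes receive smaller priors, which is the degree effect that the baseline is meant to capture.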

Authors

Zietz, Michael ;
Himmelstein, Daniel, S ;
Kloster, Kyle ;
Williams, Christopher ;
Nagle, Michael, W ;
Greene, Casey, S

1 Citation, 0 Mentions, 31% FAIR, 1.2 Dataset Index

10.5524/102479 (2023)

Supporting data for "Hetnet connectivity search provides rapid insights into how two biomedical entities are related"

Hetnets, short for heterogeneous networks, contain multiple node and relationship types and offer a way to encode biomedical knowledge. One such example, Hetionet, connects 11 types of nodes (including genes, diseases, drugs, pathways, and anatomical structures) with over 2 million edges of 24 types. Previous work has demonstrated that supervised machine learning methods applied to such networks can identify drug repurposing opportunities. However, a training set of known relationships does not exist for many types of node pairs, even when it would be useful to examine how nodes of those types are meaningfully connected. For example, users may be curious not only how metformin is related to breast cancer, but also how the GJA1 gene might be involved in insomnia. We developed a new procedure, termed hetnet connectivity search, that proposes important paths between any two nodes without requiring a supervised gold standard. The algorithm behind connectivity search identifies types of paths that occur more frequently than would be expected by chance (based on node degree alone). We find that predictions are broadly similar to those from previously described supervised approaches for certain node type pairs. Scoring of individual paths is based on the most specific paths of a given type. Several optimizations were required to precompute significant instances of node connectivity at the scale of large knowledge graphs. We implemented the method on Hetionet and provide an online interface at https://het.io/search. We provide an open-source implementation of these methods in our new Python package named hetmatpy.
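As a rough illustration of what "paths of a given type" means, the sketch below enumerates paths between two nodes that follow one fixed sequence of edge types in a toy hetnet. It covers only the path-enumeration step, not the comparison against chance or the scoring described above, and it does not use hetmatpy. All node names, edge types, and the `find_paths` helper are hypothetical.

```python
from collections import defaultdict

# Toy hetnet as (source, edge_type, target) triples. Every node and edge
# here is made up for illustration and is not taken from Hetionet.
edges = [
    ("compound_A", "binds", "gene_1"),
    ("compound_A", "binds", "gene_2"),
    ("compound_B", "binds", "gene_2"),
    ("gene_1", "associates", "disease_X"),
    ("gene_2", "associates", "disease_X"),
    ("gene_2", "associates", "disease_Y"),
]

# Index targets by (source, edge_type) for fast lookup.
adjacency = defaultdict(list)
for source, edge_type, target in edges:
    adjacency[(source, edge_type)].append(target)


def find_paths(start, end, edge_types):
    """Return all paths from start to end that follow edge_types in order."""
    paths = [[start]]
    for edge_type in edge_types:
        # Extend every partial path by one edge of the required type.
        paths = [
            path + [neighbor]
            for path in paths
            for neighbor in adjacency.get((path[-1], edge_type), [])
        ]
    return [path for path in paths if path[-1] == end]


paths = find_paths("compound_A", "disease_X", ["binds", "associates"])
for path in paths:
    print(" -> ".join(path))
```

A full connectivity search would then ask whether the number of such paths is larger than expected from node degree alone, for example by repeating the count on degree-preserving permutations of the network.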

Authors

Himmelstein, Daniel, S ;
Zietz, Michael ;
Rubinetti, Vincent ;
Kloster, Kyle ;
Heil, Benjamin, J ;
Alquaddoomi, Faisal ;
Hu, Dongbo ;
Nicholson, David, N ;
Hao, Yun ;
Sullivan, Blair, D ;
Nagle, Michael, W ;
Greene, Casey, S

1 Citation, 0 Mentions, 31% FAIR, 0.7 Dataset Index

10.5524/102389 (2023)

Additional file 2 of Expanding a database-derived biomedical knowledge graph via multi-relation extraction from biomedical abstracts

Additional file 2.

Authors

Nicholson, David N. ;
Himmelstein, Daniel S. ;
Greene, Casey S.

0 Citations, 0 Mentions, 85% FAIR, 0.1 Dataset Index

10.6084/m9.figshare.21358089 (2022)

Additional file 2 of Expanding a database-derived biomedical knowledge graph via multi-relation extraction from biomedical abstracts

Additional file 2.

Authors

Nicholson, David N. ;
Himmelstein, Daniel S. ;
Greene, Casey S.

0 Citations, 0 Mentions, 85% FAIR, 0.1 Dataset Index

10.6084/m9.figshare.21358089.v1 (2022)

Additional file 3 of Compressing gene expression data using multiple latent space dimensionalities learns complementary biological representations

Neutrophil and Monocyte Gene Sets. Entrez gene IDs and gene symbols for two xCell gene signatures (Neutrophil_HPCA_2 and Monocyte_FANTOM_2). Associated with Fig. 6.

Authors

Way, Gregory P. ;
Zietz, Michael ;
Rubinetti, Vincent ;
Himmelstein, Daniel S. ;
Greene, Casey S.

1 Citation, 0 Mentions, 48% FAIR, 0.9 Dataset Index

10.6084/m9.figshare.12285506 (2020)

Additional file 4 of Compressing gene expression data using multiple latent space dimensionalities learns complementary biological representations

Model coefficients for predicting TP53 loss of function. Using all compressed features in the model implicates compressed features with cancer hallmark signatures. Associated with Fig. 7.

Authors

Way, Gregory P. ;
Zietz, Michael ;
Rubinetti, Vincent ;
Himmelstein, Daniel S. ;
Greene, Casey S.

1 Citation, 0 Mentions, 48% FAIR, 0.9 Dataset Index

10.6084/m9.figshare.12285512.v1 (2020)

Additional file 3 of Compressing gene expression data using multiple latent space dimensionalities learns complementary biological representations

Neutrophil and Monocyte Gene Sets. Entrez gene IDs and gene symbols for two xCell gene signatures (Neutrophil_HPCA_2 and Monocyte_FANTOM_2). Associated with Fig. 6.

Authors

Way, Gregory P. ;
Zietz, Michael ;
Rubinetti, Vincent ;
Himmelstein, Daniel S. ;
Greene, Casey S.

1 Citation, 0 Mentions, 85% FAIR, 0.5 Dataset Index

10.6084/m9.figshare.12285506.v1 (2020)

Additional file 6 of Compressing gene expression data using multiple latent space dimensionalities learns complementary biological representations

Hetnetpy metaedge summary. Network summary of edge and node counts for each gene set collection.

Authors

Way, Gregory P. ;
Zietz, Michael ;
Rubinetti, Vincent ;
Himmelstein, Daniel S. ;
Greene, Casey S.

1 Citation, 0 Mentions, 85% FAIR, 0.5 Dataset Index

10.6084/m9.figshare.12285518.v1 (2020)

Additional file 6 of Compressing gene expression data using multiple latent space dimensionalities learns complementary biological representations

Hetnetpy metaedge summary. Network summary of edge and node counts for each gene set collection.

Authors

Way, Gregory P. ;
Zietz, Michael ;
Rubinetti, Vincent ;
Himmelstein, Daniel S. ;
Greene, Casey S.

1 Citation, 0 Mentions, 48% FAIR, 0.9 Dataset Index

10.6084/m9.figshare.12285518 (2020)

Additional file 5 of Compressing gene expression data using multiple latent space dimensionalities learns complementary biological representations

Tissue types and counts for TARGET, TCGA, and GTEx.

Authors

Way, Gregory P. ;
Zietz, Michael ;
Rubinetti, Vincent ;
Himmelstein, Daniel S. ;
Greene, Casey S.

1 Citation, 0 Mentions, 56% FAIR, 1.7 Dataset Index

10.6084/m9.figshare.12285515.v1 (2020)

Automated Author Profile
Himmelstein, Daniel
0000-0002-3012-7446
