Automated Author Profile
Himmelstein, Daniel
0000-0002-3012-7446
Current S-Index
Sum of Dataset Indices for all datasets
Average Dataset Index per Dataset
Average Dataset Index per dataset
Total Datasets
Total datasets for this author
Average FAIR Score
Average FAIR Score per dataset
Total Citations
Total citations to the author's datasets
Total Mentions
Total mentions of the author's datasets
S-Index Interpretation
The S-Index (Sharing Index) is a comprehensive metric that represents the cumulative impact of all your datasets. It is calculated as the sum of Dataset Index scores across all your claimed datasets.
What it means:
- A higher S-index indicates greater overall impact of your datasets relative to typical datasets in their fields of research
- The S-Index grows as you add more datasets or as existing datasets gain more citations and mentions
- It provides a single number to track your research data impact over time
Current S-Index: 69.2 (sum of the Dataset Index scores of 43 datasets)
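As a rough illustration of the calculation described above, the S-Index is the sum of per-dataset Dataset Index scores, and the average Dataset Index is presumably that sum divided by the number of datasets. On the profile's own figures, 69.2 across 43 datasets would work out to roughly 1.61 per dataset. The function names and the sample scores below are hypothetical and are not taken from the profile.

```python
def s_index(dataset_indices):
    """Sum of Dataset Index scores across all claimed datasets."""
    return sum(dataset_indices)


def average_dataset_index(dataset_indices):
    """Mean Dataset Index per dataset (assumed to be sum / count)."""
    if not dataset_indices:
        return 0.0
    return s_index(dataset_indices) / len(dataset_indices)


# Hypothetical Dataset Index scores for three datasets (illustrative only).
scores = [2.5, 0.5, 1.0]
total = s_index(scores)
mean = average_dataset_index(scores)
print(f"S-Index: {total:.1f}, average per dataset: {mean:.2f}")
```

Because the index is a plain sum, adding a dataset or raising any single dataset's score can only increase it, which matches the growth behaviour listed above.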
S-Index Over Time
Cumulative Citations Over Time
Cumulative Mentions Over Time
Datasets
Important tasks in biomedical discovery such as predicting gene functions, gene-disease associations, and drug repurposing opportunities are often framed as network edge prediction. The number of edges connecting to a node, termed degree, can vary greatly across nodes in real biomedical networks, and the distribution of degrees varies between networks. If degree strongly influences edge prediction, then imbalance or bias in the distribution of degrees could lead to nonspecific or misleading predictions. We introduce a network permutation framework to quantify the effects of node degree on edge prediction. Our framework decomposes performance into the proportions attributable to degree and the network's specific connections. We discover that performance attributable to factors other than degree is often only a small portion of overall performance. Degree's predictive performance diminishes when the networks used for training and testing, despite measuring the same biological relationships, were generated using distinct techniques and hence have large differences in degree distribution. We introduce the permutation-derived edge prior as the probability that an edge exists based only on degree. The edge prior shows excellent discrimination and calibration for 20 biomedical networks (16 bipartite, 3 undirected, 1 directed), with AUROCs frequently exceeding 0.85. Researchers seeking to predict new or missing edges in biological networks should use the edge prior as a baseline to identify the fraction of performance that is nonspecific because of degree. We released our methods as an open-source Python package (https://github.com/hetio/xswap/).
Authors
- Zietz, Michael ;
- Himmelstein, Daniel, S ;
- Kloster, Kyle ;
- Williams, Christopher ;
- Nagle, Michael, W ;
- Greene, Casey, S
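The permutation-derived edge prior described in the degree-bias abstract above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the xswap package's API: the function names, parameters, and toy edge list are hypothetical, the network is assumed to be bipartite, and the prior is estimated per node pair rather than per pair of degrees.

```python
import random
from collections import Counter


def xswap_permute(edges, n_swaps, rng):
    """Degree-preserving permutation of a bipartite edge list via edge swaps."""
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new_1, new_2 = (a, d), (c, b)
        # Reject swaps that would create a duplicate edge; this also
        # covers the cases where the two edges share an endpoint.
        if new_1 in edge_set or new_2 in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {new_1, new_2}
        edges[i], edges[j] = new_1, new_2
    return edges


def edge_prior(edges, n_perms=200, swaps_per_perm=None, seed=0):
    """Fraction of permuted networks in which each node pair is connected."""
    rng = random.Random(seed)
    swaps = swaps_per_perm or 10 * len(edges)
    counts = Counter()
    current = list(edges)
    for _ in range(n_perms):
        current = xswap_permute(current, swaps, rng)
        counts.update(current)
    return {pair: n / n_perms for pair, n in counts.items()}


# Toy gene-disease network (hypothetical): g1 and d1 are high-degree hubs.
edges = [("g1", "d1"), ("g1", "d2"), ("g1", "d3"), ("g2", "d1"), ("g3", "d1")]
permuted = xswap_permute(edges, n_swaps=50, rng=random.Random(0))
prior = edge_prior(edges, n_perms=200, seed=0)
for pair, p in sorted(prior.items()):
    print(pair, round(p, 2))
```

Each swap replaces edges (a, b) and (c, d) with (a, d) and (c, b), so every node keeps its degree while the specific connections are shuffled. A pair's prior then reflects only how connected its two endpoints are, which is the baseline the abstract recommends comparing predictions against.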
Hetnets, short for heterogeneous networks, contain multiple node and relationship types and offer a way to encode biomedical knowledge. One such example, Hetionet, connects 11 types of nodes (including genes, diseases, drugs, pathways, and anatomical structures) with over 2 million edges of 24 types. Previous work has demonstrated that supervised machine learning methods applied to such networks can identify drug repurposing opportunities. However, a training set of known relationships does not exist for many types of node pairs, even when it would be useful to examine how nodes of those types are meaningfully connected. For example, users may be curious not only how metformin is related to breast cancer, but also how the GJA1 gene might be involved in insomnia. We developed a new procedure, termed hetnet connectivity search, that proposes important paths between any two nodes without requiring a supervised gold standard. The algorithm behind connectivity search identifies types of paths that occur more frequently than would be expected by chance (based on node degree alone). We find that predictions are broadly similar to those from previously described supervised approaches for certain node type pairs. Scoring of individual paths is based on the most specific paths of a given type. Several optimizations were required to precompute significant instances of node connectivity at the scale of large knowledge graphs. We implemented the method on Hetionet and provide an online interface at https://het.io/search. We provide an open-source implementation of these methods in our new Python package named hetmatpy.
Authors
- Himmelstein, Daniel, S ;
- Zietz, Michael ;
- Rubinetti, Vincent ;
- Kloster, Kyle ;
- Heil, Benjamin, J ;
- Alquaddoomi, Faisal ;
- Hu, Dongbo ;
- Nicholson, David, N ;
- Hao, Yun ;
- Sullivan, Blair, D ;
- Nagle, Michael, W ;
- Greene, Casey, S
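The connectivity search abstract above looks for path types that occur more often than chance would predict from node degree alone. The sketch below illustrates that idea in its simplest form: count two-step paths between a source and a target, then compare the count with counts from degree-preserving permuted networks. It is not the hetmatpy implementation, whose actual scoring is not described on this page. All names, the toy data, and the null counts are hypothetical.

```python
def path_count(first_hop, second_hop, source, target):
    """Count two-step paths source -> intermediate -> target."""
    return sum(
        1
        for mid in first_hop.get(source, ())
        if target in second_hop.get(mid, ())
    )


def empirical_p_value(observed, null_counts):
    """Share of null counts at least as large as observed (add-one smoothed)."""
    exceed = sum(1 for count in null_counts if count >= observed)
    return (exceed + 1) / (len(null_counts) + 1)


# Hypothetical toy hetnet: compound -> gene edges and gene -> disease edges.
compound_binds_gene = {"C1": {"G1", "G2", "G3"}}
gene_associates_disease = {"G1": {"D1"}, "G2": {"D1", "D2"}, "G3": {"D2"}}

observed = path_count(compound_binds_gene, gene_associates_disease, "C1", "D1")

# Hypothetical path counts for the same node pair in nine permuted networks.
null_counts = [0, 1, 0, 0, 2, 0, 1, 0, 0]
p_value = empirical_p_value(observed, null_counts)
print(observed, p_value)
```

In practice the null counts would come from many degree-preserving permutations of each edge type, so that a small p-value points to connectivity beyond what the endpoints' degrees would explain.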
Additional file 2.
Authors
- Nicholson, David N. ;
- Himmelstein, Daniel S. ;
- Greene, Casey S.
Additional file 2.
Authors
- Nicholson, David N. ;
- Himmelstein, Daniel S. ;
- Greene, Casey S.
Neutrophil and Monocyte Gene Sets. Entrez gene IDs and gene symbols for two xCell gene signatures (Neutrophil_HPCA_2 and Monocyte_FANTOM_2). Associated with Fig. 6.
Authors
- Way, Gregory P. ;
- Zietz, Michael ;
- Rubinetti, Vincent ;
- Himmelstein, Daniel S. ;
- Greene, Casey S.
Model coefficients for predicting TP53 loss of function. Using all compressed features in the model implicates compressed features with cancer hallmark signatures. Associated with Fig. 7.
Authors
- Way, Gregory P. ;
- Zietz, Michael ;
- Rubinetti, Vincent ;
- Himmelstein, Daniel S. ;
- Greene, Casey S.
Neutrophil and Monocyte Gene Sets. Entrez gene IDs and gene symbols for two xCell gene signatures (Neutrophil_HPCA_2 and Monocyte_FANTOM_2). Associated with Fig. 6.
Authors
- Way, Gregory P. ;
- Zietz, Michael ;
- Rubinetti, Vincent ;
- Himmelstein, Daniel S. ;
- Greene, Casey S.
Hetnetpy metaedge summary. Network summary of edge and node counts for each gene set collection.
Authors
- Way, Gregory P. ;
- Zietz, Michael ;
- Rubinetti, Vincent ;
- Himmelstein, Daniel S. ;
- Greene, Casey S.
Hetnetpy metaedge summary. Network summary of edge and node counts for each gene set collection.
Authors
- Way, Gregory P. ;
- Zietz, Michael ;
- Rubinetti, Vincent ;
- Himmelstein, Daniel S. ;
- Greene, Casey S.
Tissue types and counts for TARGET, TCGA, and GTEx.
Authors
- Way, Gregory P. ;
- Zietz, Michael ;
- Rubinetti, Vincent ;
- Himmelstein, Daniel S. ;
- Greene, Casey S.