Automated Organization Profile

School of Information, University of Michigan

Current S-Index

16.8

Sum of Dataset Indices for all datasets

Average Dataset Index per Dataset

1.9

Average Dataset Index per dataset

Total Datasets

9

Total datasets in this organization

Average FAIR Score

67.1%

Average FAIR Score per dataset

Total Citations

2

Total citations to the organization's datasets

Total Mentions

6

Total mentions of the organization's datasets
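
As a quick consistency check, the summary figures above can be recomputed from the per-dataset values shown in the Datasets listing. The sketch below is illustrative only: the variable names are assumptions, and the numbers are copied from this page in listing order.

```python
# Per-dataset values copied from the Datasets listing on this page, in order.
dataset_indices = [0.3, 1.5, 1.5, 0.8, 1.6, 1.6, 5.7, 1.9, 1.9]
citations = [0, 0, 0, 0, 0, 0, 2, 0, 0]
mentions = [0, 0, 0, 0, 0, 0, 6, 0, 0]

# S-Index: sum of Dataset Indices for all datasets (rounded to one decimal).
s_index = round(sum(dataset_indices), 1)

# Average Dataset Index per dataset, rounded to one decimal as displayed.
average_index = round(s_index / len(dataset_indices), 1)

total_citations = sum(citations)
total_mentions = sum(mentions)

print(s_index, average_index, total_citations, total_mentions)
# prints: 16.8 1.9 2 6
```

The per-dataset FAIR percentages are shown rounded, so averaging the displayed values gives 67.0% rather than the 67.1% reported above; the difference is presumably due to rounding.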

S-Index Interpretation

[Charts: S-Index Over Time; Cumulative Citations Over Time; Cumulative Mentions Over Time]

Datasets

SUSTAINABLESIGNALS: An AI Approach for Inferring Consumer Product Sustainability

The everyday consumption of household goods is a significant source of environmental pollution. As people increasingly shop online, this affords an opportunity to provide consumers with actionable feedback on the social and environmental impact of potential purchases. In our work, we explore the following questions on Amazon: (a) do consumers bring up the environment in their reviews, either directly or through related topics? (b) do they tend to bring up the environment when they are satisfied or dissatisfied with a product? (c) in what granular context do they bring up the environment?

To address these questions, we designed an annotation task and recruited knowledgeable students to annotate consumer product reviews. This dataset comprises annotations for 779 individual reviews, with each review corresponding to a distinct product.

In our paper, we propose a machine learning method using these annotations that can discover signals of sustainability and infer a product's sustainability score. Our model and code are released at https://github.com/Sabina321/sustainable_signals. The data is for the following paper: Tong Lin, Tianliang Xu, Amit Zac, and Sabina Tomkins. SUSTAINABLESIGNALS: An AI Approach for Inferring Consumer Product Sustainability. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023. AI and Social Good Track.

Authors

  • Lin, Tong
  • Xu, Tianliang
  • Tomkins, Sabina
  • Zac, Amit
0 Citations, 0 Mentions, 15% FAIR, 0.3 Dataset Index
10.3886/jw31-2j38 (2023)

Older Adults and Social Engagement During Social Distancing (Version: v0)

Interviews with older adults about their social distancing, social connectedness, and technology use during COVID-19.

Authors

  • Brewer, Robin
0 Citations, 0 Mentions, 69% FAIR, 1.5 Dataset Index
10.3886/e154469 (2021)

Older Adults and Social Engagement During Social Distancing (Version: v1)

Interviews with older adults about their social distancing, social connectedness, and technology use during COVID-19.

Authors

  • Brewer, Robin
0 Citations, 0 Mentions, 69% FAIR, 1.5 Dataset Index
10.3886/e154469v1 (2021)

Dataset for "Matching in the large: An experimental study" (Version: 1)

We compare the performance of the Boston Immediate Acceptance (IA) and Gale-Shapley Deferred Acceptance (DA) mechanisms in a laboratory setting where we increase the number of participants per match. In our experiment, we first increase the number of students per match from 4 to 40; when we do so, participant truth-telling increases under DA but decreases under IA, leading to a decrease in efficiency under both mechanisms. Furthermore, we find that DA remains more stable than IA, regardless of scale. We then further increase the number of participants per match to 4,000 through the introduction of robots. When robots report their preferences truthfully, we find that scale has no effect on human best response behavior. By contrast, when we program the robots to draw their strategies from the distribution of empirical human strategies, we find that our increase in scale increases human ex-post best responses under both mechanisms.
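
For readers unfamiliar with the DA mechanism named in this description, here is a minimal textbook-style sketch of student-proposing deferred acceptance. It is not taken from the dataset or the authors' experiment code; the function name, the toy preferences, and the capacities are all hypothetical, and it assumes every school ranks every student who might apply to it.

```python
from collections import deque


def deferred_acceptance(student_prefs, school_prefs, capacities):
    """Student-proposing deferred acceptance.

    Returns a dict mapping each student to a school, or None if unmatched.
    """
    # rank[school][student] = position in the school's priority list (lower is better)
    rank = {
        school: {student: pos for pos, student in enumerate(order)}
        for school, order in school_prefs.items()
    }
    next_choice = {student: 0 for student in student_prefs}
    held = {school: [] for school in school_prefs}  # tentative acceptances
    free = deque(student_prefs)

    while free:
        student = free.popleft()
        prefs = student_prefs[student]
        if next_choice[student] >= len(prefs):
            continue  # list exhausted: the student stays unmatched
        school = prefs[next_choice[student]]
        next_choice[student] += 1

        # The school tentatively holds its best applicants up to capacity.
        held[school].append(student)
        held[school].sort(key=rank[school].__getitem__)
        if len(held[school]) > capacities[school]:
            free.append(held[school].pop())  # reject the lowest-priority applicant

    matching = {student: None for student in student_prefs}
    for school, students in held.items():
        for student in students:
            matching[student] = school
    return matching


# Hypothetical toy market: three students, two schools with one seat each.
student_prefs = {"s1": ["A", "B"], "s2": ["A", "B"], "s3": ["B", "A"]}
school_prefs = {"A": ["s2", "s1", "s3"], "B": ["s1", "s3", "s2"]}
capacities = {"A": 1, "B": 1}

matching = deferred_acceptance(student_prefs, school_prefs, capacities)
print(matching)  # {'s1': 'B', 's2': 'A', 's3': None}
```

Acceptances stay tentative until no student has a proposal left to make, which is the key difference from Immediate Acceptance, where each round's acceptances are final.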

Authors

  • Chen, Yan
  • Jiang, Ming
  • Kesten, Onur
  • Robin, Stéphane
  • Zhu, Min
0 Citations, 0 Mentions, 73% FAIR, 0.8 Dataset Index
10.3886/e103521v1 (2020)

Data and Code for: Matching in the Large: An Experimental Study (Version: V0)

We compare the performance of the Boston Immediate Acceptance (IA) and Gale-Shapley Deferred Acceptance (DA) mechanisms in a laboratory setting where we increase the number of participants per match. In our experiment, we first increase the number of students per match from 4 to 40; when we do so, participant truth-telling increases under DA but decreases under IA, leading to a decrease in efficiency under both mechanisms. Furthermore, we find that DA remains more stable than IA, regardless of scale. We then further increase the number of participants per match to 4,000 through the introduction of robots. When robots report their preferences truthfully, we find that scale has no effect on human best response behavior. By contrast, when we program the robots to draw their strategies from the distribution of empirical human strategies, we find that our increase in scale increases human ex-post best responses under both mechanisms.

Authors

  • Chen, Yan
  • Jiang, Ming
  • Kesten, Onur
  • Robin, Stéphane
  • Zhu, Min
0 Citations, 0 Mentions, 73% FAIR, 1.6 Dataset Index
10.3886/e103521 (2020)

Data and Code for: Matching in the Large: An Experimental Study (Version: 2)

We compare the performance of the Boston Immediate Acceptance (IA) and Gale-Shapley Deferred Acceptance (DA) mechanisms in a laboratory setting where we increase the number of participants per match. In our experiment, we first increase the number of students per match from 4 to 40; when we do so, participant truth-telling increases under DA but decreases under IA, leading to a decrease in efficiency under both mechanisms. Furthermore, we find that DA remains more stable than IA, regardless of scale. We then further increase the number of participants per match to 4,000 through the introduction of robots. When robots report their preferences truthfully, we find that scale has no effect on human best response behavior. By contrast, when we program the robots to draw their strategies from the distribution of empirical human strategies, we find that our increase in scale increases human ex-post best responses under both mechanisms.

Authors

  • Chen, Yan
  • Jiang, Ming
  • Kesten, Onur
  • Robin, Stéphane
  • Zhu, Min
0 Citations, 0 Mentions, 73% FAIR, 1.6 Dataset Index
10.3886/e103521v2 (2020)

Hybrid Approaches to Detect Comments Violating Macro Norms on Reddit (Version: 2.0)

[Content warning: Files may contain instances of highly inflammatory and offensive content.]

This dataset was generated as an extension of our CSCW 2018 paper: Eshwar Chandrasekharan, Mattia Samory, Shagun Jhaver, Hunter Charvat, Amy Bruckman, Cliff Lampe, Jacob Eisenstein, and Eric Gilbert. 2018. The Internet’s Hidden Rules: An Empirical Study of Reddit Norm Violations at Micro, Meso, and Macro Scales. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 32.

Description: Working with over 2M removed comments collected from 100 different communities on Reddit (subreddit names listed in data/study-subreddits.csv), we identified 8 macro norms, i.e., norms that are widely enforced on most parts of Reddit. We extracted these macro norms by employing a hybrid approach (classification, topic modeling, and open-coding) on comments identified to be norm violations within at least 85 out of the 100 study subreddits. Finally, we labelled over 40K Reddit comments removed by moderators according to the specific type of macro norm being violated, and make this dataset publicly available (also available on GitHub).

For each of the labeled topics, we identified the top 5000 removed comments that were best fit by the LDA topic model. In this way, we identified over 5000 removed comments that are examples of each type of macro norm violation described in the paper. The removed comments were sorted by their topic fit, stored into respective files based on the type of norm violation they represent, and are made available on this repo.

Here we make the following datasets publicly available:
* 1 file containing the log of over 2M removed comments obtained from the top 100 subreddits between May 2016 and March 2017, after filtering out the following comments: 1) comments by u/AutoModerator, 2) replies to removed comments (i.e., children of the poisoned tree; refer to the paper for more information), and 3) non-readable comments (not UTF-8 encoded).
* 8 files, each containing 5000+ removed comments obtained from Reddit, stored in data/macro-norm-violations/ and split into different files based on the macro norm they violated. Each new line in the files represents a comment that was posted on Reddit between May 2016 and March 2017 and subsequently removed by subreddit moderators for violating community norms. All comments were preprocessed using the script in code/preprocessing-reddit-comments.py in order to do the following: 1. remove new lines, 2. convert text to lowercase, and 3. strip numbers and punctuation from comments.

Description of the 1 file containing over 2M removed comments from 100 subreddits:
"reddit-removal-log.csv" - All comments that were removed from the 100 study subreddits during the study period described above (post-filtering).

Descriptions of each file containing 5059 comments (that were removed from Reddit, and preprocessed) violating macro norms, present in data/macro-norm-violations/:
"macro-norm-violations-n10-t0-misogynistic-slurs.csv" - Comments that use misogynistic slurs.
"macro-norm-violations-n15-t2-hatespeech-racist-homophobic.csv" - Comments containing hate speech that is racist or homophobic.
"macro-norm-violations-n10-t3-opposing-political-views-trump.csv", "macro-norm-violations-n15-t10-opposing-political-views-trump.csv" - Comments with opposing political views around Trump (depends on originating sub).
"macro-norm-violations-n10-t4-verbal-attacks-on-Reddit.csv" - Comments containing verbal attacks on Reddit or specific subreddits.
"macro-norm-violations-n10-t5-porno-links.csv" - Comments with pornographic links.
"macro-norm-violations-n10-t8-personal-attacks.csv", "macro-norm-violations-n10-t9-personal-attacks.csv" - Comments containing personal attacks.
"macro-norm-violations-n15-t3-abusing-and-criticisizing-mods.csv" - Comments abusing and criticizing moderators.
"macro-norm-violations-n15-t9-namecalling-claiming-other-too-sensitive.csv" - Comments with name-calling, or claiming that the other person is too sensitive.

More details about the dataset can be found on arXiv: https://arxiv.org/abs/1904.03596

Authors

  • Chandrasekharan, Eshwar
  • Samory, Mattia
  • Gilbert, Eric
2 Citations, 6 Mentions, 77% FAIR, 5.7 Dataset Index
10.5281/zenodo.3338698 (2019)
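
The three preprocessing steps listed in the description of this dataset (remove new lines, lowercase, strip numbers and punctuation) can be sketched as follows. This is a hedged illustration, not the contents of code/preprocessing-reddit-comments.py: the function name is hypothetical, only ASCII punctuation is stripped, and the final whitespace collapse is an added assumption.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess_comment(text: str) -> str:
    """Apply the three steps named in the dataset description to one comment."""
    # 1. Remove new lines (replace them with spaces so words do not fuse).
    text = text.replace("\r", " ").replace("\n", " ")
    # 2. Convert text to lowercase.
    text = text.lower()
    # 3. Strip numbers and punctuation.
    text = re.sub(r"\d+", "", text)
    text = text.translate(_PUNCT_TABLE)
    # Assumption (not stated in the source): collapse leftover runs of whitespace.
    return " ".join(text.split())


print(preprocess_comment("Hello,\nWorld! 123 times"))  # hello world times
```

Because each line in the released files holds one comment, removing new lines first keeps the one-comment-per-line layout intact after preprocessing.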

Hybrid Approaches to Detect Comments Violating Macro Norms on Reddit (Version: 2.0)

[Content warning: Files may contain instances of highly inflammatory and offensive content.]

This dataset was generated as an extension of our CSCW 2018 paper: Eshwar Chandrasekharan, Mattia Samory, Shagun Jhaver, Hunter Charvat, Amy Bruckman, Cliff Lampe, Jacob Eisenstein, and Eric Gilbert. 2018. The Internet’s Hidden Rules: An Empirical Study of Reddit Norm Violations at Micro, Meso, and Macro Scales. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 32.

Description: Working with over 2M removed comments collected from 100 different communities on Reddit (subreddit names listed in data/study-subreddits.csv), we identified 8 macro norms, i.e., norms that are widely enforced on most parts of Reddit. We extracted these macro norms by employing a hybrid approach (classification, topic modeling, and open-coding) on comments identified to be norm violations within at least 85 out of the 100 study subreddits. Finally, we labelled over 40K Reddit comments removed by moderators according to the specific type of macro norm being violated, and make this dataset publicly available (also available on GitHub).

For each of the labeled topics, we identified the top 5000 removed comments that were best fit by the LDA topic model. In this way, we identified over 5000 removed comments that are examples of each type of macro norm violation described in the paper. The removed comments were sorted by their topic fit, stored into respective files based on the type of norm violation they represent, and are made available on this repo.

Here we make the following datasets publicly available:
* 1 file containing the log of over 2M removed comments obtained from the top 100 subreddits between May 2016 and March 2017, after filtering out the following comments: 1) comments by u/AutoModerator, 2) replies to removed comments (i.e., children of the poisoned tree; refer to the paper for more information), and 3) non-readable comments (not UTF-8 encoded).
* 8 files, each containing 5000+ removed comments obtained from Reddit, stored in data/macro-norm-violations/ and split into different files based on the macro norm they violated. Each new line in the files represents a comment that was posted on Reddit between May 2016 and March 2017 and subsequently removed by subreddit moderators for violating community norms. All comments were preprocessed using the script in code/preprocessing-reddit-comments.py in order to do the following: 1. remove new lines, 2. convert text to lowercase, and 3. strip numbers and punctuation from comments.

Description of the 1 file containing over 2M removed comments from 100 subreddits:
"reddit-removal-log.csv" - All comments that were removed from the 100 study subreddits during the study period described above (post-filtering).

Descriptions of each file containing 5059 comments (that were removed from Reddit, and preprocessed) violating macro norms, present in data/macro-norm-violations/:
"macro-norm-violations-n10-t0-misogynistic-slurs.csv" - Comments that use misogynistic slurs.
"macro-norm-violations-n15-t2-hatespeech-racist-homophobic.csv" - Comments containing hate speech that is racist or homophobic.
"macro-norm-violations-n10-t3-opposing-political-views-trump.csv", "macro-norm-violations-n15-t10-opposing-political-views-trump.csv" - Comments with opposing political views around Trump (depends on originating sub).
"macro-norm-violations-n10-t4-verbal-attacks-on-Reddit.csv" - Comments containing verbal attacks on Reddit or specific subreddits.
"macro-norm-violations-n10-t5-porno-links.csv" - Comments with pornographic links.
"macro-norm-violations-n10-t8-personal-attacks.csv", "macro-norm-violations-n10-t9-personal-attacks.csv" - Comments containing personal attacks.
"macro-norm-violations-n15-t3-abusing-and-criticisizing-mods.csv" - Comments abusing and criticizing moderators.
"macro-norm-violations-n15-t9-namecalling-claiming-other-too-sensitive.csv" - Comments with name-calling, or claiming that the other person is too sensitive.

More details about the dataset can be found on arXiv: https://arxiv.org/abs/1904.03596

Authors

  • Chandrasekharan, Eshwar
  • Samory, Mattia
  • Gilbert, Eric
0 Citations, 0 Mentions, 77% FAIR, 1.9 Dataset Index
10.5281/zenodo.2541449 (2019)

Hybrid Approaches to Detect Comments Violating Macro Norms on Reddit (Version: 1.0)

This dataset was generated as an extension of our CSCW 2018 paper: Eshwar Chandrasekharan, Mattia Samory, Shagun Jhaver, Hunter Charvat, Amy Bruckman, Cliff Lampe, Jacob Eisenstein, and Eric Gilbert. 2018. The Internet’s Hidden Rules: An Empirical Study of Reddit Norm Violations at Micro, Meso, and Macro Scales. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 32.

Description: Working with over 2.8M removed comments collected from 100 different communities on Reddit (subreddit names listed in data/study-subreddits.csv), we identified 8 macro norms, i.e., norms that are widely enforced on most parts of Reddit. We extracted these macro norms by employing a hybrid approach (classification, topic modeling, and open-coding) on comments identified to be norm violations within at least 85 out of the 100 study subreddits. Finally, we labelled over 40K Reddit comments removed by moderators according to the specific type of macro norm being violated, and make this dataset publicly available (also available on GitHub).

For each of the labeled topics, we identified the top 5000 removed comments that were best fit by the LDA topic model. In this way, we identified over 5000 removed comments that are examples of each type of macro norm violation described in the paper. The removed comments were sorted by their topic fit, stored into respective files based on the type of norm violation they represent, and are made available on this repo.

8 files, each containing 5000+ removed comments obtained from Reddit, are stored in data/macro-norm-violations/ and are split into different files based on the macro norm they violated. Each new line in the files represents a comment that was posted on Reddit between May 2016 and March 2017 and subsequently removed by subreddit moderators for violating community norms. All comments were preprocessed using the script in code/preprocessing-reddit-comments.py in order to do the following: 1. remove new lines, 2. convert text to lowercase, and 3. strip numbers and punctuation from comments.

Descriptions of each file containing 5059 comments (that were removed from Reddit, and preprocessed) violating macro norms, present in data/macro-norm-violations/:
"macro-norm-violations-n10-t0-misogynistic-slurs.csv" - Comments that use misogynistic slurs.
"macro-norm-violations-n15-t2-hatespeech-racist-homophobic.csv" - Comments containing hate speech that is racist or homophobic.
"macro-norm-violations-n10-t3-opposing-political-views-trump.csv", "macro-norm-violations-n15-t10-opposing-political-views-trump.csv" - Comments with opposing political views around Trump (depends on originating sub).
"macro-norm-violations-n10-t4-verbal-attacks-on-Reddit.csv" - Comments containing verbal attacks on Reddit or specific subreddits.
"macro-norm-violations-n10-t5-porno-links.csv" - Comments with pornographic links.
"macro-norm-violations-n10-t8-personal-attacks.csv", "macro-norm-violations-n10-t9-personal-attacks.csv" - Comments containing personal attacks.
"macro-norm-violations-n15-t3-abusing-and-criticisizing-mods.csv" - Comments abusing and criticizing moderators.
"macro-norm-violations-n15-t9-namecalling-claiming-other-too-sensitive.csv" - Comments with name-calling, or claiming that the other person is too sensitive.

More details about the dataset can be found on arXiv: https://arxiv.org/abs/1904.03596

Authors

  • Chandrasekharan, Eshwar
  • Gilbert, Eric
0 Citations, 0 Mentions, 77% FAIR, 1.9 Dataset Index
10.5281/zenodo.2541450 (2019)