Published on 22 May 2022
Right for the Right Reason: Evidence Extraction for Trustworthy Tabular Reasoning
Description
This repository contains resources developed for the paper: Gupta, V., Zhang, S., Vempala, A., He, Y., Choji, T., Srikumar, V., "Right for the Right Reason: Evidence Extraction for Trustworthy Tabular Reasoning", in Proceedings of the Association for Computational Linguistics 2022 (ACL '22), May 2022. It includes the relevant-row markings for the train set of the InfoTabS dataset (https://infotabs.github.io/), Gupta et al. 2020 [1].

We followed the protocol of Gupta et al. (2022) [2], which annotated the development and test sets (alpha1, alpha2, alpha3): one table and three distinct hypotheses formed a HIT. We divided the tasks equally into 110 batches, each batch containing 51 HITs and each HIT three examples. In total, we collected 81,282 annotations from 90 distinct annotators. Twenty-five annotators completed over 1,000 tasks each, corresponding to 87.75% of the examples, indicating a long-tail distribution of the annotations. Overall, 16,248 training-set table-hypothesis pairs were successfully labeled with evidence rows. On average, we obtain an 89.49% F1-score, with equal precision and recall, for annotation agreement against the majority vote (see the sketch after the references). The repository also includes the annotation template used on the MTurk platform for crowdsourcing. The cited datasets were used in this work.

Files to access the annotations follow the structure below:

annotation_batches
- batches_test: final result ".csv" files for all the development and test set batches (taken from Gupta et al. 2022)
- batches_train: our annotated result ".csv" files for all the train set batches
- README.md: readme describing the annotation batch details
- main_template_row_relevant.html: the annotation template used for each HIT, i.e., marking the relevant rows for each instance
- annotation_stats.md: details of the annotation statistics
- release_mturk: details of the released batches, i.e., the csv for each corresponding released batch

Files to recreate the annotation statistics and pre-processed data:
- results_test: the pre-processed batch csv for each development and test set batch; the integrated file computes the agreement statistics over all batches (taken from Gupta et al. 2022)
- results_train: same as results_test, except that it contains the pre-processed batch csv for the train set batches
- scripts: the scripts needed to create the csv files in results_test and results_train; each script's title denotes the statistic it computes
- src: python files used by the scripts to compute the relevant statistics

References:
[1] InfoTabS: Inference on Tables as Semi-structured Data. Vivek Gupta, Maitrey Mehta, Pegah Nokhiz, Vivek Srikumar. ACL 2020.
[2] Is My Model Using the Right Evidence? Systematic Probes for Examining Evidence-Based Tabular Reasoning. Vivek Gupta, Riyaz A. Bhat, Atreya Ghosal, Manish Shrivastava, Maneesh Singh, Vivek Srikumar. TACL 2022, presented at ACL 2022.
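The agreement statistic above is computed by the released scripts. As an illustration only, the following is a minimal sketch of how a per-annotator F1 against the majority vote could be computed; the file name and the column names example_id and relevant_rows are hypothetical, since the actual schema is defined by the batch csv files.

```python
# Minimal sketch (not the released script): agreement as F1 against the
# majority vote. Assumes each csv row holds one annotator's judgment, with
# hypothetical columns 'example_id' and 'relevant_rows' (a comma-separated
# list of table-row indices marked as relevant).
import pandas as pd
from collections import Counter

def majority_vote(row_sets):
    """Table rows marked relevant by a strict majority of annotators."""
    counts = Counter(r for rows in row_sets for r in rows)
    return {r for r, c in counts.items() if c > len(row_sets) / 2}

def f1(pred, gold):
    """Set-level F1 between one annotator's rows and the majority rows."""
    if not pred and not gold:
        return 1.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    p, r = tp / len(pred), tp / len(gold)
    return 2 * p * r / (p + r)

df = pd.read_csv("results_train/batch_001.csv")  # hypothetical file name
scores = []
for _, group in df.groupby("example_id"):
    row_sets = [set(str(v).split(",")) for v in group["relevant_rows"]]
    gold = majority_vote(row_sets)
    scores.extend(f1(rows, gold) for rows in row_sets)

print(f"Mean annotator-vs-majority F1: {sum(scores) / len(scores):.4f}")
```

Averaging this per-annotator F1 over all examples is one way to arrive at a single agreement number such as the 89.49% reported above; the exact aggregation used here is implemented in the scripts directory.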
Publication Details
Publisher: Zenodo