Published on 15 May 2024

Humans vs ChatGPT texts on TOEFL questions and HC3 dataset

Javier Conde; Pedro Reviriego; Elena Merino-Gómez; Gonzalo Martínez; José Alberto Hernández

Description

Human-ChatGPT (gpt4-o) comparison corpus. It extends the HC3 dataset and the ChatGPT Generated Text Detection corpus. The original datasets include the questions together with the human and ChatGPT3.5 answers; this dataset extends them with answers from gpt4-o. Each line of each file is the gpt4-o answer to one of the questions.

HC3 dataset:
- finance
- medicine
- computing
- open questions

ChatGPT Generated Text Detection corpus:
- toefl

Program.py: Python script to lemmatize, POS-tag, and clean the human/ChatGPT texts.

Paper: Playing with Words: Comparing the Vocabulary and Lexical Richness of ChatGPT and Humans

Cite:
@misc{reviriego2023playing,
      title={Playing with Words: Comparing the Vocabulary and Lexical Richness of ChatGPT and Humans},
      author={Pedro Reviriego and Javier Conde and Elena Merino-Gómez and Gonzalo Martínez and José Alberto Hernández},
      year={2023},
      eprint={2308.07462},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
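Since the description states that each line of a file holds one gpt4-o answer, a file can be read line by line. The following is a minimal sketch, not the authors' Program.py: the function names are hypothetical, and it assumes the files are UTF-8 text and that answer i lines up with question i in the original dataset (the description suggests this ordering but does not spell it out).

```python
from pathlib import Path


def load_answers(path):
    """Read one answer per line, preserving file order."""
    with Path(path).open(encoding="utf-8") as handle:
        # Strip only the trailing newline; blank lines are kept so that
        # line positions are not shifted.
        return [line.rstrip("\n") for line in handle]


def pair_with_questions(questions, answers):
    """Pair question i with answer i (assumed alignment, see lead-in)."""
    if len(questions) != len(answers):
        # A length mismatch means the assumed alignment cannot hold.
        raise ValueError(
            f"{len(questions)} questions but {len(answers)} answers"
        )
    return list(zip(questions, answers))
```

Raising on a length mismatch is a deliberate choice here: silently truncating with zip would hide a file that does not match the question list.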

Metrics

Dataset Index: 1.8
FAIR Score: 73%
Citations: 0
Mentions: 0

Publication Details

DOI

Publisher

Zenodo

Assigned Domain

Subfield: Artificial Intelligence
Field: Computer Science
Domain: Physical Sciences
Confidence Score: 45%
Source: Scholar Data Model

Normalization Factors

FT: 13.46
CTw: 1.00
MTw: 1.00