SPIKE-QA: A 50K size English dataset for SLM

SPIKE-QA is a human-prompted QA dataset generated by the GPT4o-small model. The author collected and merged the generated data with a Python script. It contains 50,262 question-answer pairs with no temporal or conversational context: each sample is a single, independent question and its answer (zero-shot). Topics range from basic science such as physics, chemistry, and math to complex generation problems and everyday chat.

The dataset is distributed as a set of Excel tables, each with two columns named "Question" and "Answer". The file SPIKE-QA.csv contains the complete dataset in CSV format. The data were collected by prompting GPT to return its output as a list of tuples, like lis=[("Question1", "Answer1"), ("Question2", "Answer2"), ...], which a Python script then transformed into tables.

At this size, the dataset is probably too small to pre-train an LLM from scratch and seems best suited to fine-tuning, although paraphrasing the samples might be one way to turn it into a larger, more useful resource. The dataset could also be used for model evaluation, thanks to its topic diversity and the varied lengths of its samples. Above all, it is accessible: as a plain CSV file, it is easy for beginners to practice with.

Copyright reserved by the author (ORCID: 0009-0002-1449-2803). An alternative DOI for this dataset is 10.34740/kaggle/dsv/10346351.
