Strengthening assessment integrity in the era of generative AI: evidence from a large-scale study

The advent of ChatGPT in November 2022 immediately raised alarm about the integrity of academic assessment, especially in distance learning. The difficulty of detecting generative AI (GenAI) misuse has led some educators to advocate a return to in-person exams, proctoring and viva voce examinations, even though these may undermine inclusion; others see a solution in assessment redesign. This large-scale empirical study at a UK university, based on 590 student answers and 354 AI-generated answers, provides evidence on markers’ ability to detect GenAI scripts and on whether some assessment types are more robust than others against GenAI misuse. Seventeen assessment types were tested, using 59 example questions spanning 17 disciplines at FHEQ levels 3 to 6. Synthesising quantitative and qualitative analysis, the research found that training improved markers’ ability to detect GenAI answers but also increased false positives. This finding strengthens the case for assessment redesign rather than reliance on detection. The research also found that some assessment types were more robust against GenAI, particularly those that rely on higher-level skills, align with ‘authentic’ assessment, require application of course materials and are supported by clear and rigorously applied marking rubrics. The findings suggest evidence-based, practical ways to redesign assessment for the GenAI era.
