Published on 22 October 2021

Replication Package for the Paper: "Understanding Code Smell Detection via Code Review: A Study of the OpenStack and Qt Communities"

Xiaofeng Han; Amjed Tahir; Peng Liang; Steve Counsell; Kelly Blincoe; Bing Li; Yajing Luo

Description

This repository contains the data and results from the paper "Understanding Code Smell Detection via Code Review: A Study of the OpenStack and Qt Communities", submitted to the ICPC 2021 special issue of the Empirical Software Engineering Journal, 2021. The replication package contains the following two folders:

1) data folder

The data folder contains the following four folders, which are organized by research question (RQ).

RQ1: The RQ1 folder contains the 1,536 retrieved code reviews that discuss code smells. Each review includes four parts: Code Change URL, Code Smell, Code Smell Discussion, and Source Code URL.

RQ2: The RQ2 folder contains the coded data for RQ2, called Data Labeling & Encoding for RQ2.mx18. It contains the results of data labeling and encoding for RQ2, which were analyzed with the MAXQDA tool.

RQ3 and RQ5: This folder contains two files.
Extracted data for RQ3.1.xlsx: this file contains the extracted data (i.e., specific refactoring actions suggested by reviewers) for RQ3.1.
Data Labeling & Encoding for RQ3 and RQ5.mx18: this file contains the extracted data for RQ3 (excluding the specific refactoring actions in RQ3.1) and RQ5.

RQ4: The RQ4 folder contains the extracted data for RQ4, called Extracted data for RQ4.xlsx.

Note: The mx18 files can be opened with MAXQDA 18 or later, which can be downloaded from https://www.maxqda.com/. You may also use the free 14-day trial version of MAXQDA 2018, which can be downloaded from https://www.maxqda.com/trial.

2) scripts folder

The scripts folder contains the Python scripts that were used to search for code smell terms, together with the list of code smell terms.

keyword.txt contains the keywords associated with code smells, such as "smell", "duplication", and "dead".
get_changes.py gets code changes from OpenStack and Qt.
get_comments.py gets the review comments for each code change.
keywords_search.py searches for review comments that contain at least one keyword.
random_select.py randomly selects review comments that do not contain any keyword.
keywords_improve.py improves the keyword-based mining approach.
tools.py supports the keyword improvement process.
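
The keyword search and random selection steps described above can be illustrated with a minimal sketch. This is not the code shipped in the scripts folder: the function names, the three sample keywords (taken from the examples quoted above), and the choice of case-insensitive substring matching are all assumptions made for illustration.

```python
import random
import re

# Hypothetical sample only; the full list is in keyword.txt in the package.
KEYWORDS = ["smell", "duplication", "dead"]


def build_pattern(keywords):
    """Compile one case-insensitive regex that matches any keyword as a substring."""
    return re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)


def split_comments(comments, keywords):
    """Return (hits, misses): comments with at least one keyword, and the rest."""
    pattern = build_pattern(keywords)
    hits, misses = [], []
    for comment in comments:
        (hits if pattern.search(comment) else misses).append(comment)
    return hits, misses


def sample_misses(misses, k, seed=0):
    """Randomly pick up to k keyword-free comments; the seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(misses, min(k, len(misses)))
```

Substring matching is the simplest choice but over-matches (for example, "dead" also matches "deadline"), which is one reason a keyword-based approach may need a manual check of the hits and a sample of the keyword-free comments.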

Metrics

Dataset Index

0.3

FAIR Score

13%

Citations

0

Mentions

0

Publication Details

DOI

Publisher

Zenodo

Assigned Domain

Subfield

Information Systems

Field

Computer Science

Domain

Physical Sciences

Confidence Score

94%

Source

OpenAlex

Keywords

Code Review; Code Smell; Mining Software Repositories; Empirical Study

Normalization Factors

FT

13.46

CTw

1.00

MTw

1.00