RQ1
RQ1 asks: What is the state of the art in the research area of code review? As stated in the introduction, this question concerns the topics that are researched most often, the results of that research, and the research methods, tools, and datasets that are used. Each of these topics is discussed in the answer below.
Research methods
First, let us consider the research methods generally used in the papers included in this survey. The majority of these papers perform quantitative research [18, 19, 21, 53, 75, 131, 171, 181, 183, 198], and some qualitative research has been done as well [11, 29, 75, 131, 171, 184]. The quantitative research mostly examines open-source projects, while the qualitative research often also considers closed-source projects. This is probably because the development history, and hence the code history, is far easier to access for open-source projects than for closed-source ones. The qualitative research mainly relies on interviews, mostly with developers. This is probably because developers of proprietary projects are more convenient to reach, for example because they often work in one place.
All research considered in this survey is empirical and observational: no experimental setups were created specifically for the purpose of the research; instead, all studies examined existing development situations.
Research subjects
The surveyed papers broadly consider four research subjects, namely factors that the code review process influences, factors that influence the code review process, general characteristics of the code review process, and the performance of tools assisting the code review process. These subjects will be discussed below.
Factors that the code review process influences: Bacchelli and Bird [11] found that code improvement is the most prevalent outcome of code reviews, followed by code understanding within the development team and social communication within the team. They note that finding defects is not a prevalent outcome of code reviews, contrary to what most participants expect. Concretely, they found that only 14% of code review comments were about finding defects, while as many as 44% of developers indicated finding defects as their main motivation for doing code reviews.
Somewhat to the contrary, McIntosh et al. [131] find that low participation in code reviews leads to a significant increase in post-release defects, which suggests that reviews in which developers show much activity do help in finding defects. They additionally find that review coverage, the share of code that has been reviewed, also influences the number of post-release defects, though not as much as review participation. Shimagaki et al. [171] performed a replication of the study by McIntosh et al. at Sony Mobile. Their results were the same for review coverage, and partly the same for review participation: contrary to McIntosh et al., they found that the reviewing time and discussion length metrics for review participation did not contribute significantly to the number of post-release defects.
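To make the review coverage metric concrete, the following is a minimal Python sketch; the change representation and function name are illustrative assumptions, not the exact operationalization used by McIntosh et al. [131].

```python
# Minimal, illustrative computation of review coverage for one module.
# Representing a module's history as (change_id, was_reviewed) pairs is
# a simplifying assumption, not the setup of McIntosh et al. [131].
def review_coverage(changes):
    """Return the share of changes that went through code review."""
    if not changes:
        return 0.0
    reviewed = sum(1 for _, was_reviewed in changes if was_reviewed)
    return reviewed / len(changes)

# Hypothetical example: 3 of 4 changes to a module were reviewed.
print(review_coverage([(1, True), (2, True), (3, False), (4, True)]))  # 0.75
```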
Thongtanunam et al. [182] back up the claim that code reviews help prevent defects by showing that code review activity can be used to identify defect-prone software modules. They also find that developers who do not often author code changes in a given part of the code, but review it frequently, can still deliver good code reviews; only when a developer neither authors nor reviews much can code quality decrease significantly.
Factors that influence the code review process: To start, Baysal et al. [18] found that the experience of the writer of a patch is the main influence on the outcome of the review (i.e., whether the patch is accepted or not). Gousios et al. [75] do not fully agree with this in the context of GitHub pull requests: they found that only 13% of pull requests are rejected for technical reasons, and as many as 53% due to aspects of the distributed nature of pull-based development. Thongtanunam et al. [181] add that a low number of reviewers for a submitter's prior patches and a long time since the last modification of the files touched by the patch make it difficult to find reviewers for a patch; Gousios et al. agree with the latter in the context of pull requests [75]. Although this does not mean a patch gets closed immediately, the effect may be the same in the long run. Contrary to what one would expect, they also find that the presence of test code in a patch does not influence the decision to merge it.
Related to this, some submitted patches may simply take a long time to be reviewed. Baysal et al. [18] attribute this to patch size, the component the patch targets, the organizational affiliation of the patch writer, the writer's experience, and the amount of past activity of the reviewer. Regarding the last point, Bacchelli and Bird [11] note that the reviewer's understanding of the code under review is an important challenge. Gousios et al. [75] add that the size of the project, its test coverage, and the project's track record of accepting external contributions are also relevant. Thongtanunam et al. [183] add that not being able to find a reviewer for a patch can significantly increase the time required to merge it, by 12 days longer on average; in their research, 4%-30% of reviews had this problem, depending on the project. Thongtanunam et al. [181] also note that if a previous patch of a submitter took long to review, a new patch is very likely to have the same problem, and that a patch takes longer to merge if its purpose is to introduce new features. According to Gousios et al. [75], most patches are merged or rejected within one day.
Thongtanunam et al. [181] also found that short patch descriptions, small code churn, and a short discussion length decrease the chance that a patch will be discussed. Czerwonka, Greiler, and Tilford [53] add that when the number of changed files exceeds 20, the amount of useful feedback decreases.
Characteristics of the code review process: Beller et al. [21] found that 75% of the changes made to code under review are related to the evolvability of the system, and only 25% to its functionality. They also note that the amount of code churn, the number of changed files, and the task type determine the number of changes made while a patch is under review. According to Gousios et al. [75], most patches are small, with most being less than 20 lines long (in the context of pull requests). They also note that discussions are only 3 comments long on average. Beller et al. [21] add that 78-90% of the changes made during review are triggered by such comments; the source of the remaining changes is not known to them.
Another interesting point is that only 14% of the repositories on GitHub actually use pull requests on GitHub [75]. This may not readily generalize to the share of changes that is code reviewed across all projects, but it is a rather low number nevertheless. Thongtanunam et al. [182] add that most developers who only do reviews are core developers, from which one could infer that most patch submissions (and also pull requests) come from external contributors. Taken together, this suggests that projects are not yet very open to external contributions.
Performance of tools assisting in the code review process: Two tools are proposed in the surveyed papers: RevFinder [183] and cHRev [198]. Both tools aim to automatically recommend reviewers for a patch submission, in order to speed up patch processing. RevFinder works by looking at the paths of files that reviewers have reviewed previously: using several string comparison techniques, it recommends the reviewer whose file path review history is most similar to the file paths of the current patch submission. cHRev improves on this by also considering how often a reviewer has reviewed changes to a certain component, and how recently. It achieves much better accuracy than RevFinder: based on empirical evaluation, RevFinder recommends correct reviewers with a median rank of 4 (i.e., a correct reviewer candidate typically appears at position 4 in the recommendation list), while for cHRev fewer than 2 recommendations are needed to find a good reviewer candidate.
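To illustrate the core idea behind such tools, the following is a minimal Python sketch of path-similarity-based reviewer recommendation in the spirit of RevFinder. It is not the authors' implementation: the similarity measure is simplified to a common path prefix (RevFinder combines several string comparison techniques), and all names and data are hypothetical.

```python
from collections import defaultdict

def path_similarity(path_a, path_b):
    """Number of leading path components two file paths share.

    A simplified stand-in for the string comparison techniques
    RevFinder combines.
    """
    common = 0
    for x, y in zip(path_a.split("/"), path_b.split("/")):
        if x != y:
            break
        common += 1
    return common

def recommend_reviewers(new_patch_files, review_history, top_n=5):
    """Rank past reviewers by how similar the file paths they reviewed
    are to the file paths of the new patch.

    review_history: list of (reviewer, [reviewed file paths]) pairs.
    """
    scores = defaultdict(float)
    for reviewer, reviewed_files in review_history:
        for new_file in new_patch_files:
            for old_file in reviewed_files:
                scores[reviewer] += path_similarity(new_file, old_file)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical usage:
history = [
    ("alice", ["src/net/http/client.py", "src/net/http/server.py"]),
    ("bob", ["docs/guide.md", "src/ui/button.py"]),
]
print(recommend_reviewers(["src/net/http/proxy.py"], history))
# -> ['alice', 'bob']
```

A cHRev-style variant could additionally weight each past review by its frequency and recency, for example by multiplying each similarity contribution with a decay factor based on the review's age.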
RQ2
Regarding the application of code review in industry, we collected information from three perspectives, namely popularity, variety, and choice of tools. From the collected papers we know that around one fourth of the researched companies regard code review as a regular process, and, based on an analysis by companies selling code review tools, about 60 percent of respondents practice tool-based code review [15]. Most teams use one specialized review tool; one third of the teams choose generic software development tools, like ticket systems and version control systems; and some development teams indicate that no tool is used in their reviews [15]. Considering the variety of tools for code review, we find two groups. For some teams, no specialized review software is used; instead, these teams use a combination of an IDE, a source code management system (SCM), and a ticket system/bug tracker. For others, many dedicated review tools were used or mentioned: Gerrit, Crucible, Stash, GitHub pull requests, Upsource, Collaborator, ReviewClipse, and CodeFlow [16].
In conclusion, enterprises apply various code review methods depending on their expectations and requirements. We also find that the different tools are not directly comparable, as the research notes that they target different teams, projects, and metrics; it is hard to say that one tool is generally better than another. Overall, we find that code review is commonly applied in industry and is a good way to safeguard software quality.
RQ3
RQ3 asks: What are the future challenges in the area of code review? This concerns both research challenges and challenges for use in practice.
Since the concept of modern code review was introduced in 2013 [11], many researchers have spent effort exploring it. According to references [21, 131], modern code review can be regarded as a lightweight variant of formal code inspection. Formal inspection, however, mandates strict review criteria and has been proven to improve software quality. Therefore, at this stage, many papers aim to increase the understanding of modern code review and to figure out how it improves software quality. These studies apply qualitative and quantitative methods to investigate its practical application and impact, and they identify several challenges and suggested improvements.
- Future research challenges
Firstly, further exploration of modern code review is still needed. Many studies suggest that a better understanding of modern code review would be helpful to future research. As an example, reference [53] states: "Due to its costs, code reviewing practice is a topic deserving to be better understood, systematized and applied to software engineering workflow with more precision than the best practice currently prescribes."
Specifically, properties of modern code review such as code ownership could be explored, inspired by reference [131], which proposed a workflow to quantitatively study the relationship between code review coverage and software quality.
In reference [11], awareness and learning during code review are cited as motivations for code review by developers. Future research could investigate these aspects more explicitly.
Building on the progress in understanding modern code review, researchers also propose several topics that could be explored to obtain more findings.
Bacchelli and Bird [11] suggest further research on code comprehension during code review. According to the paper, such research has so far been done with new developers in mind, but it would also be applicable to code reviews. The authors note that IDEs often include tools for code comprehension, but code review tools do not.
According to reference [53], prior research has neglected the impact of undocumented changes on code review. Future research could focus on this and determine whether undocumented changes make a difference.
The authors of reference [75] propose research on the effects of the democratization of the development process, which occurs for example through the use of pull requests. Democratization could, for example, lead to a substantially stronger commons ecosystem.
They also suggest research on the formation of teams and management hierarchies in open-source projects, and on the motives of developers to work in a highly transparent workspace, as prior work does not take these issues into consideration.
Furthermore, reference [19] calls for research on how best to interpret empirical software engineering research in the light of contextual factors. Understanding the reasons behind observable developer behaviour requires an understanding of contexts, processes, and organizational and individual factors, which helps to assess their influence on code review and its outcome.
- Future challenges in practice
By now, the code review process has been adopted both in industry and in open-source communities. In reference [11], the authors propose future research on automating code review tasks, mainly low-level tasks such as checking boundary conditions or catching common mistakes.
Similarly, the authors of reference [29] suggest exploring automatic ways to classify and assess the usefulness of review comments. This was explicitly requested by an interviewee and is still an open challenge for CodeFlow, an in-house code review tool. They also propose research on methods to automatically recommend reviewers for changes in the system.
In reference [75], ways of managing tasks in the pull-based development model could be explored, in order to increase efficiency and readability.
The same paper also gives the example of a tool that would suggest whether a pull request can be merged or not, since this can be predicted with fairly high accuracy. The development of such tools, which could help the core team of a project prioritize its work, can therefore be explored.
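As an illustration of what such a prioritization aid might look like, the following is a minimal sketch of a merge-outcome classifier; the features, data, and model choice are hypothetical assumptions, not the prediction setup evaluated by Gousios et al. [75].

```python
# A minimal, hypothetical sketch of predicting pull-request merge
# outcomes from simple features. Requires scikit-learn; the feature set
# and data are illustrative only, not the setup of Gousios et al. [75].
from sklearn.ensemble import RandomForestClassifier

# Each row: [lines changed, files changed, author's prior merged PRs,
#            discussion comments]; label: 1 = merged, 0 = rejected.
X = [
    [12, 1, 30, 2],
    [450, 25, 1, 14],
    [8, 1, 12, 0],
    [900, 40, 0, 22],
]
y = [1, 0, 1, 0]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Predict for a new pull request: a small change by an experienced author.
print(model.predict([[20, 2, 15, 1]]))  # e.g. [1] -> likely mergeable
```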
Several code review tools, such as CodeFlow, ReDA and RevFinder, can still be explored further. According to reference [183], further research could focus on how RevFinder works in practice: how effectively and practically it helps developers by recommending code reviewers when deployed in a live development environment. According to reference [184], the authors aim to develop a live code review monitoring dashboard based on ReDA. They also aim to create a more portable version of ReDA that is compatible with other tools supporting the MCR process.