Chapter 7 Code Review

7.1 Motivation

Code review is the manual assessment of proposed code changes by developers other than the author, with the goal of improving code quality and reducing the number of software defects. As a common software engineering practice, code review is applied in industry and in many open-source projects.

Concerning research, code review has recently become a popular topic, judging by the number of published papers. As noted in reference [11], the concept of modern code review was proposed in 2013. Since then, many researchers have not only explored the modern code review process and its impact on software quality, but also proposed methods to improve the application of code review.

We collected and read the relevant papers; this survey chapter presents an overview of the current research progress in code review.

7.2 Research protocol

This section describes the review protocol used for the systematic review presented in this chapter. The protocol has been set up following Kitchenham’s guidelines for systematic literature reviews [103].

7.2.1 Research questions

The goal of the review is to summarize the state of the art and identify future challenges in the code review area. The research questions are as follows:

  • RQ1: What is the state of the art in the research area of code review? This question focuses on frequently researched topics, the results of that research, and the research methods, tools and datasets that are used.
  • RQ2: What is the current state of practice in the area of code review? This concerns tools and techniques that are developed and used in practice, both by open-source projects and by commercial companies.
  • RQ3: What are future challenges in the area of code review? This concerns both research challenges and challenges for use in practice.

7.2.2 Search process

The search process consists of the following steps:

  • A Google Scholar search using the search query “modern code review” OR “modern code reviews”. Google Scholar sorts the result list by decreasing relevance; we consider the results in that order.
  • A general Google search for non-scientific reports (e.g., blog posts) and implemented code review tools. For this, the search queries code review and code review tools are used, respectively. The result list is considered in order.
  • All papers in the initial seed provided by the course instructor will be considered.
  • All papers referenced by already collected papers will be considered. Papers found through this rule are themselves excluded from this step of the search process. In other words, we do not apply this rule recursively.

From now on, we refer to elements of all four categories listed above collectively as resources.

7.2.3 Inclusion criteria

From the scientific literature, the following types of papers will be considered:

Papers researching recent code review:

  • concepts,
  • methodologies,
  • tools and platforms,
  • and experiments concerning the preceding.

From non-scientific resources, all resources discussing recent tools and techniques used in practice will be considered.

7.2.4 Exclusion criteria

Resources published before 2008 will be excluded from the study, so that the survey reflects only the state of the art of the field.

7.2.5 Primary study selection process

We will select a number of candidate resources based on the criteria stated above. Any person participating in the review can select a resource as a candidate.

From all candidates, the resources that will actually be reviewed are then selected; this, too, can be done by any person participating in the review. All resources that are candidates but are not selected for actual review must be explicitly rejected, with accompanying reasoning, by at least two persons participating in the review.

7.2.6 Data collection

The following data will be collected from each considered resource:

  • Source (for example, the blog website or specific journal)
  • Year published
  • Type of resource
  • Author(s) and organization(s)
  • Summary of the resource (at most 100 words)
  • Data for answering RQ1:
    • Sub-topic of research
    • Research method
    • Tools used for research
    • Datasets used for research
    • Research questions and their answers
  • Data for answering RQ2:
    • Tools used
    • Company/organization using the tool
    • Evaluation of the tool
  • Data for answering RQ3:
    • Future research challenges posed
    • Challenges for use in practice

All data will be collected by one person participating in the review and checked by another.
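
To make the extraction form above concrete, the following sketch models one row of the form as a small data structure. It is a minimal illustration in Python; the field names are our own choices and are not prescribed by the protocol.

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class ResourceRecord:
        """One extraction-form entry; field names are illustrative."""
        source: str                       # e.g., journal name or blog website
        year_published: int
        resource_type: str                # e.g., "paper", "blog post", "tool"
        authors_and_organizations: List[str]
        summary: str                      # at most 100 words
        # Data for RQ1
        sub_topic: Optional[str] = None
        research_method: Optional[str] = None
        research_tools: List[str] = field(default_factory=list)
        datasets: List[str] = field(default_factory=list)
        questions_and_answers: List[str] = field(default_factory=list)
        # Data for RQ2
        tools_used: List[str] = field(default_factory=list)
        using_organization: Optional[str] = None
        tool_evaluation: Optional[str] = None
        # Data for RQ3
        research_challenges: List[str] = field(default_factory=list)
        practice_challenges: List[str] = field(default_factory=list)

Structuring the form this way makes the cross-check between the collecting person and the checking person straightforward, since both work from the same fields.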

7.3 Candidate resources

In the appendix, all candidates collected using the described search process are presented. The “In survey” column in the tables indicates whether the paper was ultimately included in the survey or excluded for some reason. If a paper has been excluded, the reason is stated in the section Excluded papers.

7.3.1 Initial seed

Table 1 in the appendix lists all initial seed papers provided by the course instructor that conform to the stated criteria. They are ordered alphabetically by the first author’s name, and then by publication year.

7.3.2 Google Scholar

Table 2 in the appendix lists all candidates collected through the Google Scholar search described in the search process. They are ordered alphabetically by the first author’s name, and then by publication year. Note that, as described in the search process section, papers are considered in order of their position in the search results. The Search date and Result number columns indicate the date on which the search was executed and the position in the search result list, respectively.

7.3.3 By reference

Table 3 in the appendix lists all candidates found through being referenced by another paper we collected.

7.4 Answers

7.4.1 RQ1

RQ1 asks: What is the state of the art in the research area of code review? As stated in the introduction section, this question concerns frequently researched topics, the results of that research, and the research methods, tools and datasets that are used. Each of these aspects is discussed below.

7.4.1.1 Research methods

First, let us consider the research methods generally used by the papers we incorporated in the survey. The majority of the papers perform quantitative research [18, 19, 21, 53, 75, 131, 171, 181, 183, 198], and some qualitative research has also been done [11, 29, 75, 131, 171, 184]. The quantitative research mainly targets open-source projects, while the qualitative research often also considers closed-source projects. This is probably because the development history, and hence also the code history, is far easier to access for open-source projects than for closed-source ones. The qualitative research mainly relies on interviews, mostly with developers; this is probably because developers of proprietary projects are more convenient to reach, for example because they often work in one place.

All research considered in this survey was done empirically: no explicit experimental setups were created for the purpose of the research, but all studies were performed on existing situations.

7.4.1.2 Tools and datasets

All papers that do quantitative research use some tools for processing the data. None of the papers in this survey use pre-built tools; all of them created custom tools to process the data [18, 19, 21, 53, 75, 171, 181–183, 198]. This mostly concerns custom code in the R programming language, created for one specific use case only. Some of the papers make the source code of their tools available online so that others can use it; the paper by Gousios et al. [75] does this, for example.

As for the datasets that are used: a few open-source projects are used very often for quantitative research, notably Qt [131, 181–183, 195], Android [181, 183, 184, 195, 198], OpenStack [181–183, 195], LibreOffice [183, 195], and Eclipse [195, 198]. Apart from these, mostly other open-source projects have been used, along with a few closed-source projects, for example from Sony Mobile [171] or from Microsoft [198]. What is used from these projects is often only the data from the code reviewing system, possibly along with the code under review. As stated above, the reason that open-source projects are used much more often as datasets is probably that they are much easier to access.
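
As an illustration of how such code reviewing data can be collected, the sketch below fetches merged changes from a Gerrit server via its REST API, the review system used by Qt and OpenStack among others; the server URL is hypothetical. Gerrit prefixes its JSON responses with the line )]}' to prevent cross-site script inclusion, which the sketch strips before parsing.

    import json
    import urllib.request

    def fetch_merged_changes(base_url: str, limit: int = 100) -> list:
        """Fetch up to `limit` merged changes from a Gerrit REST endpoint."""
        url = f"{base_url}/changes/?q=status:merged&n={limit}"
        with urllib.request.urlopen(url) as resp:
            body = resp.read().decode("utf-8")
        # Drop Gerrit's XSSI-protection prefix line before parsing the JSON.
        return json.loads(body.split("\n", 1)[1])

    # Hypothetical server URL; a real study would point at the public
    # Gerrit instance of one of the projects listed above.
    for change in fetch_merged_changes("https://gerrit.example.org")[:3]:
        print(change["project"], change["subject"])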

On another note, the paper by Yang et al. [195] introduces a well-structured new dataset, intended both for research and as a benchmark on which code review tools can be tested and compared. It aims to create a clearer research environment that way.

7.4.1.3 Research subjects

The surveyed papers broadly consider four research subjects, namely factors that the code review process influences, factors that influence the code review process, general characteristics of the code review process, and the performance of tools assisting the code review process. These subjects will be discussed below.

Factors that the code review process influences: Bacchelli and Bird [11] found that code improvement is the most prevalent result of code reviews, followed by code understanding among the development team and social communication within the development team. They note that finding errors is not a prevalent result of code reviews, contrary to what most participants expect from them. In numbers, they find that 14% of code review comments were about finding defects, while as much as 44% of the developers indicate finding defects as their main motivation for doing code reviews.

Somewhat to the contrary, McIntosh et al. [131] find that low participation in code reviews does lead to a significant increase in post-release defects, which suggests that reviews in which developers show much activity actually help in finding defects. They additionally find that review coverage, the share of code that has been reviewed, also influences the number of post-release defects, though not as strongly as review participation. Shimagaki et al. [171] performed a replication of the aforementioned study by McIntosh et al. at Sony Mobile. Their results were the same for review coverage, and partly the same for review participation: contrary to McIntosh et al., they found that the reviewing-time and discussion-length metrics for review participation did not contribute significantly to the number of post-release defects.
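
To make these metrics concrete, the sketch below computes a simple review-coverage ratio and two review-participation indicators from hypothetical change records. The record fields and computations are our own illustrative simplifications, not the exact operationalizations used in [131] or [171].

    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class Change:
        """A committed change, possibly linked to a review (illustrative)."""
        reviewed: bool        # did the change go through code review?
        num_comments: int     # discussion length, in review comments
        review_hours: float   # time the review stayed open, in hours

    def review_coverage(changes: List[Change]) -> float:
        """Share of changes that were reviewed (cf. review coverage in [131])."""
        return sum(c.reviewed for c in changes) / len(changes)

    def participation(changes: List[Change]) -> Dict[str, float]:
        """Average discussion length and reviewing time over reviewed changes."""
        reviewed = [c for c in changes if c.reviewed]
        return {
            "avg_comments": sum(c.num_comments for c in reviewed) / len(reviewed),
            "avg_review_hours": sum(c.review_hours for c in reviewed) / len(reviewed),
        }

    history = [Change(True, 5, 48.0), Change(True, 0, 1.5), Change(False, 0, 0.0)]
    print(review_coverage(history))   # 0.667: one of three changes unreviewed
    print(participation(history))     # low averages hint at rubber-stamp reviews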

Thongtanunam et al. [182] back up the claim that code reviews help prevent defects in software, by stating that code review activity can help to identify defect-prone software modules. They also state that developers who do not often author code changes in the relevant part of the code, but review a lot, can still deliver good code reviews. Only when a developer neither authors nor reviews much can code quality decrease significantly.

Factors that influence the code review process: To start, Baysal et al. [18] found that mainly the experience of the writer of a patch influences the outcome of the review (i.e., whether it is accepted or not). Gousios et al. [75] do not fully agree with this in the context of GitHub pull requests: they found that only 13% of pull requests are rejected for technical reasons, and as much as 53% due to aspects of the distributed nature of pull requests. Thongtanunam et al. [181] add that a low number of reviewers for prior patches of a submitter, and a long time since the last modification of the files the patch touches, make it difficult to find reviewers for a patch; the latter is also agreed upon by Gousios et al. in the context of pull requests [75]. Although this does not mean a patch gets closed immediately, the effect may be the same in the long run. Contrary to what one would expect, they also find that the presence of test code in the patch does not influence the decision to merge it.

Related to this, some submitted patches may simply take a long time to be reviewed. Baysal et al. [18] attribute this to the patch size, the component the patch is for, the organizational affiliation and experience of the patch writer, and the amount of past activity of the reviewer. Regarding the last point, Bacchelli and Bird [11] note that understanding the code under review is an important challenge for the reviewer. Gousios et al. [75] add that the size of the project, its test coverage, and the project’s track record on accepting external contributions are also relevant. Thongtanunam et al. [183] add that not being able to find a reviewer for a patch can significantly increase the time required to merge it, by 12 days longer on average; in their research, 4%-30% of reviews had this problem, depending on the project. Thongtanunam et al. [181] also note that if a previous patch of a submitter took long to review, a new patch is very likely to have the same problem, and that a patch takes longer to merge if its purpose is to introduce new features. According to Gousios et al. [75], most patches are merged or rejected within one day.

Thongtanunam et al. [181] also found that short patch descriptions, small code churn, and a small discussion length decrease the chance that a patch will be discussed. Czerwonka, Greiler, and Tilford [53] add that when the number of changed files exceeds 20, the amount of useful feedback decreases.

Characteristics of the code review process: Beller et al. [21] found that 75% of the changes made to code under review relate to the evolvability of the system, and only 25% to its functionality. They also note that the amount of code churn, the number of changed files, and the task type determine the number of changes made while a patch is under review. According to Gousios et al. [75], most patches are not that big, most being less than 20 lines long (in the context of pull requests). They also note that discussions are only 3 comments long on average. Beller et al. [21] note that 78-90% of the changes made during review are triggered by such comments; the source of the remaining changes is unknown to them.

Another interesting point is that only 14% of the repositories on GitHub actually use pull requests [75]. This may not be readily generalizable to the share of changes being code reviewed in all projects, but it is a quite low number nevertheless. Thongtanunam et al. [182] add that most developers who only do reviews are core developers, from which one could infer that most patch submissions (and also pull requests) come from external contributors. Taken together, this suggests that projects are not yet very open to external contributions.

Performance of tools assisting in the code review process: Two tools are proposed in the surveyed papers: RevFinder [183] and cHRev [198]. Both aim to automatically recommend reviewers for a patch submission, in order to speed up patch processing. RevFinder looks at the paths of files that reviewers reviewed previously and recommends the reviewer whose file path review history most resembles that of the current patch submission, using several string comparison techniques. cHRev improves on this by also considering how often, and how recently, a reviewer has reviewed changes for a certain component, and achieves much better accuracy. In empirical evaluation, RevFinder recommends correct reviewers with a median rank of 4 (i.e., a correct reviewer typically appears at position 4 in the recommendation list), whereas for cHRev fewer than 2 recommendations are needed to find a correct reviewer.
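
To illustrate the core idea behind such recommenders, the following sketch ranks past reviewers by the similarity between the file paths they previously reviewed and the file paths of a new patch. It uses a single, simple similarity measure (the normalized common path prefix), whereas RevFinder [183] combines several string comparison techniques and cHRev [198] additionally weights how often and how recently a reviewer was active; all names and data here are hypothetical.

    from collections import defaultdict
    from typing import Dict, List

    def path_similarity(a: str, b: str) -> float:
        """Length of the common path-component prefix, normalized by the
        longer path. One of several measures a RevFinder-style tool uses."""
        ca, cb = a.split("/"), b.split("/")
        common = 0
        for x, y in zip(ca, cb):
            if x != y:
                break
            common += 1
        return common / max(len(ca), len(cb))

    def recommend_reviewers(patch_files: List[str],
                            history: Dict[str, List[str]]) -> List[str]:
        """Rank reviewers by total similarity of their review history
        (reviewer -> previously reviewed file paths) to the new patch."""
        scores: Dict[str, float] = defaultdict(float)
        for reviewer, reviewed_paths in history.items():
            for new_path in patch_files:
                for old_path in reviewed_paths:
                    scores[reviewer] += path_similarity(new_path, old_path)
        return sorted(scores, key=scores.get, reverse=True)

    history = {"alice": ["src/net/http.c", "src/net/tcp.c"],
               "bob":   ["docs/readme.md", "src/ui/window.c"]}
    print(recommend_reviewers(["src/net/udp.c"], history))  # ['alice', 'bob']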

7.4.2 RQ2

When it comes to the application of code review in industry, we collect information from three perspectives, namely popularity, variety, and choice of tools. From the collected papers, we know that around one fourth of the researched companies regard code review as a regular process, and about 60 percent of respondents implement tool-based code review, based on an analysis covering a range of companies [15]. Most teams use one specialized review tool; one third of the teams use generic software development tools, such as ticket systems and version control systems; and some development teams indicate that no tool is used in their reviews [15]. Considering the variety of code review tools, we find two groups. Some teams use no specialized review software, and instead combine an IDE, a source code management (SCM) system, and a ticket system or bug tracker. Other teams use or mention a range of dedicated review tools: Gerrit, Crucible, Stash, GitHub pull requests, Upsource, Collaborator, ReviewClipse and CodeFlow [16].

In conclusion, enterprises apply various methods for code review, based on their expectations and requirements. Additionally, we find that different tools are not very comparable, as the research notes that they serve different teams, projects and metrics; it is hard to say that one tool is generally better than another. We found that code review is commonly applied in industry, and that it is a good way to safeguard software quality.

7.4.3 RQ3

RQ3 asks: What are future challenges in the area of code review? This concerns both research challenges and challenges for use in practice.

Since the concept of modern code review was proposed in 2013 [11], plenty of researchers have spent effort on exploring it. According to references [21, 131], modern code review can be regarded as a lightweight variant of formal code inspection; code inspection, however, mandates strict review criteria and has been shown to improve software quality. Therefore, at this stage, many papers aim at increasing the understanding of modern code review and figuring out how it improves software quality. In these studies, qualitative and quantitative methods are applied to find out the practical application and impact, and several challenges and suggested improvements have been identified.

  • Future research challenges

Firstly, further exploration of modern code review is still needed. Many studies suggest that a deeper understanding of modern code review will benefit future research. As an example, reference [53] states: “Due to its costs, code reviewing practice is a topic deserving to be better understood, systematized and applied to software engineering workflow with more precision than the best practice currently prescribes.”

Specifically, some properties of modern code review, such as code ownership, can be explored further, inspired by reference [131], which proposes a workflow to quantitatively study the relationship between code review coverage and software quality.

In reference [11], awareness and learning during code review are cited as motivations for code review by developers. Future work could study these aspects more explicitly.

Inspired by the progress in understanding modern code review, researchers also propose several topics that can be explored to obtain further findings.

Bacchelli and Bird [11] suggest further research on code comprehension during code review. According to the paper, prior research on code comprehension has focused on developers new to a code base, but the results would also be applicable to code reviews. The authors note that IDEs often include tools for code comprehension, but code review tools do not.

According to reference [53], prior research has neglected the impact of undocumented changes on code review. Future research can focus on this and determine whether undocumented changes make a difference.

The authors of reference [75] propose to research the effect of the democratization of the development process, which occurs for example through the use of pull requests. Democratization could, for example, lead to a substantially stronger commons ecosystem.

They also suggest research on the formation of teams and management hierarchies in open-source projects, and on the motives of developers to work in a highly transparent workspace, as prior work does not take these issues into consideration.

Furthermore, reference [19] calls for research on how best to interpret empirical software engineering research in the light of contextual factors. Understanding the reasons behind observable developer behaviour requires an understanding of contexts, processes, and organizational and individual factors, which helps in assessing their influence on code review and its outcome.

  • Future challenges in practice

So far, the code review process has been adopted both in industry and in open-source communities. In reference [11], the authors propose future research on automating code review tasks, mainly low-level tasks such as checking boundary conditions or catching common mistakes.

Similarly, the authors of reference [29] suggest exploring automatic ways to classify and assess the usefulness of review comments. This was specifically requested by an interviewee and is still an open challenge for CodeFlow, an in-house code review tool. They also propose research on methods to automatically recommend reviewers for changes in the system.

In reference [75], ways of managing tasks in the pull-based development model can be explored, in order to increase efficiency and readability.

The same paper also gives the example of a tool that would suggest whether a pull request can be merged or not, since this can be predicted with fairly high accuracy. More generally, the development of tools that help the core team of a project prioritize their work can be explored.

Several code review tools, such as CodeFlow, ReDA and RevFinder, can still be studied further. According to reference [183], further research can focus on how RevFinder works in practice, in terms of how effectively and practically it helps developers by recommending code reviewers when deployed in a live development environment. According to reference [184], the authors aim to develop a live code review monitoring dashboard based on ReDA. They also aim to create a more portable version of ReDA that is compatible with other tools supporting the modern code review (MCR) process.

7.5 Conclusions

While answering RQ1, we found that code reviews can improve social aspects of development and improve code robustness against errors. Additionally, finding a reviewer and the experience of the reviewer significantly influence the code review process, along with technical and non-technical aspects of the submitter and the patch, such as experience or patch description length. On another note, most changes made during code review are done for maintainability rather than for resolving errors, and the number of comments made during a review is often small. Last, we discussed two tools for recommending reviewers, which aim to reduce the time needed to find appropriate reviewers.

As for RQ2, we noticed that tools like Gerrit, Crucible, Stash, GitHub pull requests, Upsource, Collaborator, ReviewClipse and CodeFlow are commonly used in practice. Most research indicates that code review helps share knowledge and develop software with fewer defects. Ranking the various tools is not very meaningful, as code review itself matters far more for software quality than the choice of a specific tool.

Concerning RQ3, we found that, as the concept of modern code review was proposed only five years ago, most studies aim at increasing the understanding of code review and its impact on software quality. However, prior work is still incomplete, and some properties of the process have not yet been taken into consideration. Therefore, further study of modern code review is still needed to obtain more useful information, and several topics building on prior work can be explored to obtain further findings. On the practical side, researchers propose further applications of code review, such as automating code review tasks, which can increase efficiency; existing code review tools can also be improved based on how they are currently applied.

References

[11] Bacchelli, A. and Bird, C. 2013. Expectations, outcomes, and challenges of modern code review. Proceedings of the 2013 International Conference on Software Engineering (2013), 712–721.

[15] Baum, T. et al. 2017. The choice of code review process: A survey on the state of the practice. International Conference on Product-Focused Software Process Improvement (2017), 111–127.

[16] Baum, T. et al. 2016. A faceted classification scheme for change-based industrial code review processes. Software Quality, Reliability and Security (QRS), 2016 IEEE International Conference on (2016), 74–85.

[18] Baysal, O. et al. 2016. Investigating technical and non-technical factors influencing modern code review. Empirical Software Engineering. 21, 3 (2016), 932–959.

[19] Baysal, O. et al. 2013. The influence of non-technical factors on code review. Reverse Engineering (WCRE), 2013 20th Working Conference on (2013), 122–131.

[21] Beller, M. et al. 2014. Modern code reviews in open-source projects: Which problems do they fix? Proceedings of the 11th Working Conference on Mining Software Repositories (2014), 202–211.

[29] Bird, C. et al. 2015. Lessons learned from building and deploying a code review analytics platform. Proceedings of the 12th Working Conference on Mining Software Repositories (2015), 191–201.

[53] Czerwonka, J. et al. 2015. Code reviews do not find bugs: How the current code review best practice slows us down. Proceedings of the 37th International Conference on Software Engineering, Volume 2 (2015), 27–28.

[75] Gousios, G. et al. 2014. An exploratory study of the pull-based software development model. Proceedings of the 36th International Conference on Software Engineering (2014), 345–355.

[103] Kitchenham, B. 2007. Guidelines for performing systematic literature reviews in software engineering. Keele University; University of Durham.

[131] McIntosh, S. et al. 2014. The impact of code review coverage and code review participation on software quality: A case study of the Qt, VTK, and ITK projects. Proceedings of the 11th Working Conference on Mining Software Repositories (2014), 192–201.

[171] Shimagaki, J. et al. 2016. A study of the quality-impacting practices of modern code review at Sony Mobile. Software Engineering Companion (ICSE-C), IEEE/ACM International Conference on (2016), 212–221.

[181] Thongtanunam, P. et al. 2017. Review participation in modern code review. Empirical Software Engineering. 22, 2 (2017), 768–817.

[182] Thongtanunam, P. et al. 2016. Revisiting code ownership and its relationship with software quality in the scope of modern code review. Proceedings of the 38th International Conference on Software Engineering (2016), 1039–1050.

[183] Thongtanunam, P. et al. 2015. Who should review my code? A file location-based code-reviewer recommendation approach for modern code review. Software Analysis, Evolution and Reengineering (SANER), 2015 IEEE 22nd International Conference on (2015), 141–150.

[184] Thongtanunam, P. et al. 2014. ReDA: A web-based visualization tool for analyzing modern code review dataset. Software Maintenance and Evolution (ICSME), 2014 IEEE International Conference on (2014), 605–608.

[195] Yang, X. et al. 2016. Mining the modern code review repositories: A dataset of people, process and product. Proceedings of the 13th International Conference on Mining Software Repositories (2016), 460–463.

[198] Zanjani, M.B. et al. 2016. Automatically recommending peer reviewers in modern code review. IEEE Transactions on Software Engineering. 42, 6 (2016), 530–543.