Appendix

A.1 Appendix to Chapter 3 (Build Analytics)

Table 10.1: Selected papers
Paper with reference | Source | RQ | Notes
1. Bird et al. 2017 [28] | Initial seed | RQ1 | 1
2. Beller et al. 2017 [24] | Initial seed | RQ3 | -
3. Rausch et al. 2017 [160] | Initial seed | RQ2 | -
4. Beller et al. TravisTorrent 2017 [25] | Initial seed | RQ2 | -
5. Pinto et al. 2018 [149] | Initial seed | RQ3 | -
6. Zhao et al. 2017 [200] | Initial seed | RQ3 | 2
7. Widder et al. 2018 [193] | Initial seed | RQ1 | -
8. Hilton et al. 2016 [86] | Initial seed | RQ2 | -
9. Vassallo et al. 2017 [188] | Ref 2 | - | 3
11. Hassan and Wang 2018 [82] | Ref 4 | RQ1 | -
12. Vassallo et al. 2018 [187] | Ref 2,3 | RQ1, RQ3 | -
13. Zampetti et al. 2017 [197] | Ref by 12 | - | 3
14. Baltes et al. 2018 [12] | GScholar Search | RQ1, RQ3 | 4
15. Bisong et al. 2017 [30] | GScholar Search | RQ1, RQ3 | 5
16. Santolucito et al. 2018 [166] | GScholar Search | RQ1 | 4
17. Ni and Li 2018 [140] | GScholar Search | RQ1 | 6
18. Fowler and Foemmel 2006 [68] | GScholar Search | RQ2 | 7
19. Stolberg 2009 [176] | GScholar Search | RQ2 | 7
20. Vasilescu et al. 2014 [186] | GScholar Search | RQ2 | 7

Notes

  1. US patent owned by Microsoft.
  2. Collaboration between universities in China, The Netherlands and The USA.
  3. Not included in this survey as it did not introduce a new technique or practice.
  4. Using search term “Github Continuous Integration”.
  5. Using search term “Predicting build time”.
  6. Using search term “Predicting build failures”.
  7. Using search term “Current practices in Continuous Integration”.

A.2 Appendix to Chapter 5 (Ecosystem Analytics)

Selected | Author(s) | Title | Year | Keywords
- | Abate, Di Cosmo, Boender, Zacchiroli [2] | Strong dependencies between software components | 2009 | -
- | Abate, Di Cosmo [1] | Predicting upgrade failures using dependency analysis | 2011 | -
+ | Abdalkareem, Nourry, Wehaibi, Mujahid, Shihab [3] | Why do developers use trivial packages? An empirical case study on NPM | 2017 | JavaScript; Node.js; Code Reuse; Empirical Studies
+ | Bogart, Kästner, Herbsleb, Thung [32] | How to break an API: cost negotiation and community values in three software ecosystems | 2016 | Software ecosystems; Dependency management; semantic versioning; Collaboration; Qualitative research
+ | Claes, Mens, Di Cosmo, Vouillon [45] | A historical analysis of Debian package incompatibilities | 2015 | debian, conflict, empirical, analysis, software, evolution, distribution, package, dependency, maintenance
+ | Constantinou, Mens [47] | An empirical comparison of developer retention in the RubyGems and NPM software ecosystems | 2017 | Software ecosystem, Socio-technical interaction, Software evolution, Empirical analysis, Survival analysis
+ | Hejderup, van Deursen, Gousios [84] | Software Ecosystem Call Graph for Dependency Management | 2018 | -
+ | Kikas, Gousios, Dumas, Pfahl [99] | Structure and evolution of package dependency networks | 2017 | -
+ | Kula, German, Ouni, Ishio, Inoue [106] | Do developers update their library dependencies? | 2017 | Software reuse, Software maintenance, Security vulnerabilities
- | Mens, Claes, Grosjean, Serebrenik [132] | Studying Evolving Software Ecosystems based on Ecological Models | 2013 | Coral Reef, Natural Ecosystem, Open Source Software, Ecological Model, Software Project
+ | Raemaekers, van Deursen, Visser [156] | Semantic versioning and impact of breaking changes in the Maven repository | 2017 | Semantic versioning, Breaking changes, Software libraries
+ | Robbes, Lungu, Röthlisberger [161] | How do developers react to API deprecation? The case of a Smalltalk ecosystem | 2012 | Ecosystems, Mining Software Repositories, Empirical Studies
+ | Trockman [185] | Adding sparkle to social coding: An empirical study of repository badges in the npm ecosystem | 2018 | -

Table 4.1: Papers provided by MSc. Joseph Hejderup. The first column indicates whether the paper in that row will be used: a ‘+’ means it will be used, a ‘-’ means it will not.

Paper Reference | Reason not selected
[2] | This paper seems to delve more into one software project itself, whereas we are more interested in the relationship between different software projects
[1] | Similarly to [2], we are more interested in the relationship between different software projects
[132] | We were in doubt over this one; it could be useful, but we were not convinced that it was. Since we already had a lot of material, we decided not to use it

Table 4.2: Papers from the initial seed that were not selected for the literature survey, along with a specification of the reason why this is the case.

Author(s) | Title | Year | Keywords | Query Used
Decan, Mens, Grosjean [55] | An empirical comparison of dependency network evolution in seven software packaging ecosystems | 2018 | Software repository mining, Software ecosystem, Package manager, Dependency network, Software evolution | “software ecosystems” AND “empirical analysis”
[61] | Software engineering beyond the project – Sustaining software ecosystems | 2014 | - | engineering software ecosystems
Hora, Robbes, Valente, Anquetil, Etien, Ducasse [87] | How do developers react to API evolution? A large-scale empirical study | 2016 | API evolution, API deprecation, Software ecosystem, Empirical study | “software ecosystem” AND “empirical”
Izquierdo, Gonzalez-Barahona, Kurth, Robles [90] | Software Development Analytics for Xen: Why and How | 2018 | Companies, Ecosystems, Software, Measurement, Object recognition, Monitoring, Virtualization | software ecosystem analytics
Jansen [91] | Measuring the Health of Open Source Software Ecosystems: Beyond the Scope of Project Health | 2014 | - | “open source software ecosystems”
Kula, German, Ishio, Inoue [105] | An exploratory study on library aging by monitoring client usage in a software ecosystem | 2017 | - | “software ecosystem” AND “analysis”
Malloy, Power [119] | An empirical analysis of the transition from Python 2 to Python 3 | 2018 | Python programming, Programming language evolution, Compliance | “software ecosystem” AND “empirical”
Manikas [121] | Revisiting software ecosystems Research: A longitudinal literature study | 2016 | Software ecosystems; Longitudinal literature study; Software ecosystem maturity | “Software ecosystems” OR “Dependency management” OR “semantic version”
Rajlich [159] | Software evolution and maintenance | 2014 | - | Software Evolution and Maintenance
Teixeira, Robles, Gonzalez-Barahona [178] | Lessons learned from applying social network analysis on an industrial Free/Libre/Open Source Software Ecosystem | 2015 | Social network analysis, Open source, Open-coopetition, Software ecosystems, Business models, Homophily, Cloud computing, OpenStack | “software ecosystem analytics”

Table 4.3: Papers selected from searches using Google Scholar. The column “Query Used” describes which of the queries was used to retrieve the paper.

Author(s) | Title | Year | Keywords | Referenced In
Bavota, Canfora, Di Penta, Oliveto, Panichella [17] | How the Apache community upgrades dependencies: an evolutionary study | 2014 | Software Ecosystems · Project dependency upgrades · Mining software repositories | [106]
Blincoe, Harrison, Damian [31] | Ecosystems in GitHub and a method for ecosystem identification using reference coupling | 2015 | Reference Coupling, Ecosystems, Technical Dependencies, GitHub, cross-reference | [47]
Cox, Bouwers, Eekelen, Visser [50] | Measuring Dependency Freshness in Software Systems | 2015 | software metrics, software maintenance | [99]
Decan, Mens, Claes [54] | An empirical comparison of dependency issues in OSS packaging ecosystems | 2017 | - | [3], [47], [55]
Dietrich, Jezek, Brada [60] | Broken Promises - An Empirical Study into Evolution Problems in Java Programs Caused by Library Upgrades | 2014 | - | [156]
Malloy, Power [120] | Quantifying the transition from Python 2 to 3: an empirical study of Python applications | 2017 | Python, programming language evolution, language features | [119]
McDonnell, Ray, Kim [126] | An empirical study of API stability and adoption in the Android ecosystem | 2013 | - | [121]

Table 4.4: Papers selected which are referenced in previously selected papers. The column “Referenced In” describes in which selected paper the paper is referenced.

Reference | Explored topic(s) | Research method(s) | Tool(s) | Dataset(s) | Ecosystem(s) | Conclusion
[3] | Use of trivial packages | Quantitative, Qualitative (Statistics over data, survey) | - | npm, GitHub | npm | Trivial packages are used because they are assumed to be well implemented and tested (only 45% actually have tests) and because they increase productivity. The quantitative analysis showed that about 10% of Node.js projects use trivial packages and that 16.8% of the packages in npm are trivial.
[17] | Upgrading of dependencies | Quantitative, Qualitative (Statistics, Looking through mailing lists) | - | Apache | Apache | Dependencies tend to be upgraded when a new release contains bug fixes but no API changes
[32] | Attitude towards breaking changes in different ecosystems | Qualitative (Interviews) | - | - | Eclipse, CRAN, npm | There are numerous ways of dealing with breaking changes, and the ecosystem plays an essential role in which way is chosen.
[45] | Debian package incompatibilities | Quantitative | - | Debian i386 testing / stable | Debian | -
[50] | Metric for dependency freshness of a system | Qualitative / Quantitative (Statistics, Interviews) | - | Maven | Maven | A metric has been defined and validated on Maven
[54] | Dependency issues in OSS packaging ecosystems | Quantitative analysis (Survival analysis, statistics) | - | Eclipse, CRAN, npm | Eclipse, CRAN, npm | In all ecosystems, dependency updates result in issues; however, the problems and solutions do vary.
[99] | Package dependency networks | Quantitative (Statistics) | - | npm, RubyGems, Crates.io | npm, RubyGems, Crates.io | All ecosystems are growing, and over time ecosystems become less dependent on a single popular package.
[161] | Developers’ response to API deprecation | Quantitative (Statistics) | - | Squeak, Pharo | Squeak, Pharo | API changes can have a large impact on an ecosystem. Projects take a long time to adapt to an API change.
[55] | Quantitative empirical analysis of differences and similarities between the evolution of 7 varying ecosystems | Survival analysis | - | libraries.io | Cargo, CPAN, CRAN, npm, NuGet, Packagist, RubyGems | Package updates, which may cause dependent packages to fail, are done on average every few months. Many packages in the analyzed package dependency networks were found to have a high number of transitive reverse dependencies, implying that a package failure can affect a large number of other packages in the ecosystem (a small sketch of computing transitive reverse dependencies follows Table 4.5).
[61] | A holistic understanding of observed and reported practices as a starting point to devise specific support for development in software ecosystems | Qualitative interview study | - | - | - | The main contribution of this article is the presentation of common features of product development and evolution in four companies, even though their size, kind of software, and business models differ.
[90] | Code review analysis | Virtualization of process | - | Xen GitHub data | Xen | Analysis of code review has led to more reviews and a more thoughtful and participatory review process. Providing easy access for new software developers joining the OSS project is also very important.

Table 4.5: Papers and findings for RQ1.
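
The notion of transitive reverse dependencies used for [55] can be made concrete with a small sketch. It is purely illustrative (plain Python with hypothetical package names), not tooling from any of the surveyed papers: it collects every package that directly or indirectly depends on a given package by traversing the inverted dependency graph.

from collections import deque

def transitive_reverse_dependencies(dependencies, package):
    """dependencies maps each package to the set of packages it depends on."""
    # Invert the edges: reverse_deps[p] = packages that depend directly on p.
    reverse_deps = {}
    for pkg, deps in dependencies.items():
        for dep in deps:
            reverse_deps.setdefault(dep, set()).add(pkg)
    # Breadth-first traversal over the reversed edges.
    seen, queue = set(), deque([package])
    while queue:
        current = queue.popleft()
        for dependent in reverse_deps.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen

# Toy example: a failure in "left-pad" can affect both packages below.
deps = {"app": {"left-pad", "express"}, "express": {"left-pad"}, "left-pad": set()}
print(transitive_reverse_dependencies(deps, "left-pad"))   # {'app', 'express'}

The larger this set is for a package, the more of the ecosystem a failure in that package can affect, which is exactly the observation reported for [55].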

Reference | Explored topic(s) | Research method(s) | Dataset(s) | Ecosystem(s) | Conclusion
[87] | Exploratory study aimed at observing API evolution and its impact | Empirical study | 3,600 distinct systems | Pharo | After API changes, clients need time to react and often do not react at all. Replacements cannot be resolved in a uniform manner throughout the ecosystem. API changes and deprecations can present different characteristics.
[106] | Impact of security advisories on library migration | Empirical study | 4,600 GitHub software projects and 2,700 library dependencies | GitHub, Maven | Currently, developers do not actively update their libraries, leading to security risks.
[47] | Empirical comparison of developer retention in the RubyGems and NPM software ecosystems | Measurement of frequency and intensity of activity and retention | GitHub | NPM and RubyGems | Developers are more likely to abandon an ecosystem when they: 1) do not communicate with other developers, 2) do not have strong social and technical activity intensity, 3) communicate or commit less frequently, and 4) do not communicate or commit for a longer period of time.
[156] | To what degree versioning schemes provide information signals about breaking changes (a small semantic-versioning sketch follows Table 4.6) | Snapshot analysis | More than 100,000 jar files from Maven Central | Maven | 1) Minor or major does not matter: both introduce breaking changes in about 33% of releases, 2) breaking changes have a significant impact and need to be fixed before an upgrade, 3) bigger libraries introduce more breaking changes; maturity is not a factor
[119] | Analysis of the transition from Python 2 to Python 3 | Empirical impact study | 51 applications in the Qualitas suite for Python | Python | Most Python developers choose to maintain compatibility with both Python 2 and 3, thereby using only a subset of the Python 3 language; they do not use the new features, but instead limit themselves to a language that is no longer under active development.
[60] | Evolution problems in Java programs caused by library upgrades | Empirical study | Qualitas corpus (Java OSS) | Java | Library upgrades currently cause a lot of problems, but some relatively minor changes to development tools and the language could be very effective.

Table 4.6: Papers and findings for RQ2.
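
The semantic versioning convention underlying the finding for [156] can be illustrated with a minimal sketch (illustrative only, not tooling from the paper): given two version strings, it classifies the upgrade as a major, minor, or patch bump, which is the signal whose reliability the paper questions.

def bump_type(old_version, new_version):
    """Classify a version change under semantic versioning (MAJOR.MINOR.PATCH)."""
    old_major, old_minor, _ = (int(part) for part in old_version.split("."))
    new_major, new_minor, _ = (int(part) for part in new_version.split("."))
    if new_major != old_major:
        return "major"   # allowed to contain breaking changes
    if new_minor != old_minor:
        return "minor"   # should only add backwards-compatible functionality
    return "patch"       # should only contain backwards-compatible fixes

print(bump_type("1.4.2", "2.0.0"))   # major
print(bump_type("1.4.2", "1.5.0"))   # minor

The finding for [156] is that, in practice, minor releases introduced breaking changes about as often as major releases, so this signal alone is not reliable.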

Reference | Open Challenges Found
[3] | Examine the relationship between team experience, project maturity, and the usage of trivial packages
[3] | Compare the use of code snippets on Q&A sites and trivial packages
[3] | How to manage and help developers choose the best packages
[55] | Findings for one ecosystem cannot necessarily be generalized to another
[55] | Transitive dependencies are very frequent, meaning that package failures can affect a large number of other packages in the ecosystem
[91] | Determining the health of a system from an ecosystem perspective rather than at project level is needed to decide which systems to use. This paper provides an initial approach, but much more research could and should be done to determine system health.

Table 4.7: Papers and the open challenges they identify.

A.3 Appendix to Chapter 6 (Release Engineering Analytics)

This appendix contains material that was part of our process during this literature survey. It is not directly needed to answer the research questions, but is still relevant for validating our survey. It comprises our project timetable, the research protocol in full detail, and the raw data extracted from the selected studies.

A.3.1 Project Timetable

The literature review was conducted over the course of four weeks. We worked iteratively and planned for four weekly milestones.

Milestone 1 (deadline 16/9/18):

  • Develop the search strategy
  • Collect initial publications

Milestone 2 (deadline 23/9/18):

  • Write full research protocol

Milestone 3 (deadline 30/9/18):

  • Collect additional literature according to the protocol
  • Perform data extraction

Milestone 4 (deadline 7/10/18):

  • Perform data synthesis
  • Write final version of the chapter

A.3.2 Research Protocol

In this appendix, we will describe in detail how we applied the protocol for performing systematic literature reviews by Kitchenham [104]. In order, we will go over the search strategy, study selection, study quality assessment, and data extraction. The last subsection will list which studies we included in this review and which we have found, but excluded from the review for a specific reason.

A.3.2.1 Search Strategy

Since release engineering is a relatively new research topic, we took an exploratory approach and collected any literature revolving around release engineering from the perspective of software analytics. This helped us determine a narrower scope for our survey, which subsequently allowed us to find additional literature that fits this scope.

At the start of this project, we were provided with an initial seed of five papers as a starting point for our literature survey [4, 48, 49, 97, 98].

We collected other publications using two search engines: Scopus and Google Scholar. Each of these search engines covers several databases, such as the ACM Digital Library, Springer, IEEE Xplore, and ScienceDirect. The main query that we constructed to retrieve publications is displayed in Figure 1; a small sketch of how this query is assembled follows the figure.

TITLE-ABS-KEY(
  (
    "continuous release" OR "rapid release" OR "frequent release"
    OR "quick release" OR "speedy release" OR "accelerated release"
    OR "agile release" OR "short release" OR "shorter release"
    OR "lightning release" OR "brisk release" OR "hasty release"
    OR "compressed release" OR "release length" OR "release size"
    OR "release cadence" OR "release frequency"
    OR "continuous delivery" OR "rapid delivery" OR "frequent delivery"
    OR "fast delivery" OR "quick delivery" OR "speedy delivery"
    OR "accelerated delivery" OR "agile delivery" OR "short delivery"
    OR "lightning delivery" OR "brisk delivery" OR "hasty delivery"
    OR "compressed delivery" OR "delivery length" OR "delivery size"
    OR "delivery cadence" OR "continuous deployment" OR "rapid deployment"
    OR "frequent deployment" OR "fast deployment" OR "quick deployment"
    OR "speedy deployment" OR "accelerated deployment" OR "agile deployment"
    OR "short deployment" OR "lightning deployment" OR "brisk deployment"
    OR "hasty deployment" OR "compressed deployment" OR "deployment length"
    OR "deployment size" OR "deployment cadence"
  ) AND (
    "release schedule" OR "release management" OR "release engineering"
    OR "release cycle" OR "release pipeline" OR "release process"
    OR "release model" OR "release strategy" OR "release strategies"
    OR "release infrastructure"
  )
  AND software
) AND (
  LIMIT-TO(SUBJAREA, "COMP") OR LIMIT-TO(SUBJAREA, "ENGI")
)
AND PUBYEAR AFT 2014

Figure 1. Query used for retrieving release engineering publications via Scopus.
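
To show how the query in Figure 1 decomposes, the sketch below (illustrative only, with abbreviated term lists; the full lists are in Figure 1) assembles the boolean TITLE-ABS-KEY string from a list of release-cadence adjectives and a list of release-process phrases:

cadence_adjectives = ["continuous", "rapid", "frequent", "quick", "agile", "short"]
artifacts = ["release", "delivery", "deployment"]
process_terms = ["release schedule", "release management", "release engineering",
                 "release cycle", "release pipeline", "release process"]

cadence_clause = " OR ".join(
    f'"{adjective} {artifact}"' for artifact in artifacts for adjective in cadence_adjectives
)
process_clause = " OR ".join(f'"{term}"' for term in process_terms)

query = (
    f"TITLE-ABS-KEY(({cadence_clause}) AND ({process_clause}) AND software) "
    'AND (LIMIT-TO(SUBJAREA, "COMP") OR LIMIT-TO(SUBJAREA, "ENGI")) '
    "AND PUBYEAR AFT 2014"
)
print(query)

The first clause pairs every cadence adjective with release, delivery, and deployment; the second clause requires that the publication also mention the release process; the remaining filters limit the subject area and the publication year, as in Figure 1.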

In addition to querying search engines as described above, we analyzed the references related to the retrieved papers. These reference lists were obtained from Google Scholar and from the References section of the papers themselves. We selected all papers on release engineering that cite or are cited by the initial set of papers. Using this approach, we found six additional papers. The results of the reference analysis are listed in Table 1.

Table 1. Papers found indirectly by investigating citations of/by other papers.

Starting point | Type | Result
[172] | has cited | [153], [125]
[97] | is cited by | [154], [177]
[125] | is cited by | [163], [41]
[108] | has cited | [107]

All the papers that were found were stored in a custom-built web-based tool for conducting literature reviews. The source code of this tool is published in a GitHub repository. The tool was hosted on a virtual private server, so that all retrieved publications were stored centrally and were accessible to all reviewers.

A.3.2.2 Study Selection

We selected the studies to include in the survey with the aid of the aforementioned tool for storing the papers. In this tool, it is possible to label papers with tags and to leave comments and ratings. Every paper is reviewed against the inclusion and exclusion criteria; based on this, the tool allows us to filter out all papers that appear not to be relevant for this literature survey.
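
As a hypothetical sketch of this tag-based filtering (the field names below are illustrative, not the tool’s actual schema):

from dataclasses import dataclass, field

@dataclass
class Paper:
    title: str
    year: int
    tags: set = field(default_factory=set)        # e.g. {"include"} or {"exclude"}
    comments: list = field(default_factory=list)  # reviewer remarks

def relevant(papers):
    """Keep papers that reviewers tagged for inclusion and not for exclusion."""
    return [p for p in papers if "include" in p.tags and "exclude" not in p.tags]

papers = [
    Paper("Modern Release Engineering in a Nutshell", 2016, {"include"}),
    Paper("Some unrelated paper", 2010, {"exclude"}),
]
print([p.title for p in relevant(papers)])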

We used only one exclusion criterion: studies published before 2014 are not included in our survey (this is enforced by our search query). The inclusion criteria are as follows:

  • The study must show (at least) one release engineering technique.
  • The study must not merely present a release engineering technique, but also analyze its performance compared to other techniques.

The last subsection of this appendix lists which studies were selected and which were discarded.

A.3.2.3 Study Quality Assessment

Following Kitchenham [104], the quality of a paper will be assessed by the evidence it provides, using the scale below. All levels of quality in this scale will be accepted, except for level 5 (evidence obtained from expert opinion).

  1. Evidence obtained from at least one properly-designed randomised controlled trial.
  2. Evidence obtained from well-designed pseudo-randomised controlled trials (i.e. non-random allocation to treatment).
  3. Comparative studies in a real-world setting:
    1. Evidence obtained from comparative studies with concurrent controls and allocation not randomised, cohort studies, case-control studies or interrupted time series with a control group.
    2. Evidence obtained from comparative studies with historical control, two or more single arm studies, or interrupted time series without a parallel control group.
  4. Experiments in artificial settings:
    1. Evidence obtained from a randomised experiment performed in an artificial setting.
    2. Evidence obtained from case series, either post-test or pre-test/post-test.
    3. Evidence obtained from a quasi-random experiment performed in an artificial setting.
  5. Evidence obtained from expert opinion based on theory or consensus.

Also, the studies will be examined to see if they contain any type of bias. For this, the same types of biases will be used as described by Kitchenham [104]:

  • Selection/Allocation bias: Systematic difference between comparison groups with respect to treatment.
  • Performance bias: Systematic difference in the conduct of comparison groups apart from the treatment being evaluated.
  • Measurement/Detection bias: Systematic difference between the groups in how outcomes are ascertained.
  • Attrition/Exclusion bias: Systematic differences between comparison groups in terms of withdrawals or exclusions of participants from the study sample.

The studies will be labeled by their quality level and possible biases. This information can be used during the Data Synthesis phase to weigh the importance of individual studies [104].

A.3.2.4 Data Extraction

To accurately capture the information contributed by each publication in our survey, we will use a systematic approach to extracting data. To guide this process, we will use a data extraction form which describes which aspects of a publication are crucial to record. Besides general publication information (title, author, etc.), the form contains questions that are based on our defined research questions. Furthermore, the form contains a section for quantitative research, where aspects such as population and evaluation are documented. The form that is used for this is shown below, followed by a small sketch of how a completed form could be stored as a structured record:

General information:

- Name of person extracting data:
- Date form completed (dd/mm/yyyy):
- Publication title:
- Author information:
- Publication type:
- Conference/Journal:
- Type of study:

What practices in release engineering does this publication mention?

Are these practices to be classified under dated, state of the art or state of
the practice? Why?

What open challenges in release engineering does this publication mention?

What research gaps does this publication contain?

Are these research gaps filled by any other publications in this survey?

Quantitative research publications:

- Study start date:
- Study end date or duration:
- Population description:
- Method(s) of recruitment of participants:
- Sample size:
- Evaluation/measurement description:
- Outcomes:
- Limitations:
- Future research:

Notes:
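
As referenced above, a completed form could be kept as a structured record for the synthesis phase. The dictionary below is a minimal sketch that mirrors the form fields; the storage format is an assumption for illustration, not part of the protocol:

extraction_record = {
    "extractor": "",
    "date_completed": "",            # dd/mm/yyyy
    "publication_title": "",
    "authors": [],
    "publication_type": "",
    "venue": "",                     # conference or journal
    "study_type": "",
    "practices": [],
    "practice_classification": "",   # dated / state of the art / state of the practice
    "open_challenges": [],
    "research_gaps": [],
    "gaps_filled_by": [],
    "quantitative": {
        "start_date": "",
        "end_date_or_duration": "",
        "population": "",
        "recruitment": "",
        "sample_size": "",
        "evaluation": "",
        "outcomes": [],
        "limitations": [],
        "future_research": [],
    },
    "notes": "",
}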

A.3.2.5 Data Synthesis

To summarize the contributions and limitations of each included publication, we will apply a descriptive synthesis approach. In this part of our survey, we will compare the data extracted from the included publications. Publications with similar findings will be grouped and evaluated, and differences between groups of publications will be structured and elaborated on. In doing so, we will compare them on specifics such as study type, time of publication, and study quality.

If the extracted data allows for a structured tabular visualization of similarities and differences between publications, this will serve as an additional form of synthesis. However, this depends on the publications that are ultimately included in this survey.

A.3.2.6 Included and Excluded Studies

Included:

Excluded:

  • [98] has been excluded, because it presents the same results as [97], while the latter is more extensive because it is a journal article instead of a conference article.
  • [95] has been excluded, because we could not obtain the actual paper since it has not yet been officially released.

A.3.3 Raw Extracted Data

A.3.3.1 Understanding the impact of rapid releases on software quality – The Case of Firefox

Reference: [97]

General information:

  • Name of person extracting data: Maarten Sijm
  • Date form completed: 27-09-2018
  • Author information: Foutse Khomh, Bram Adams, Tejinder Dhaliwal, Ying Zou
  • Publication type: Paper in Conference Proceedings
  • Conference: Mining Software Repositories (MSR)
  • Type of study: Quantitative, empirical case study

What practices in release engineering does this publication mention?

  • Changing from traditional to rapid release cycles in Mozilla Firefox

Are these practices to be classified under dated, state of the art or state of the practice? Why?

  • State of the practice, because they study Firefox and Firefox is still using rapid release cycles. However, it is dated because the data is six years old.

What open challenges in release engineering does this publication mention?

  • More case studies are needed

What research gaps does this publication contain?

  • More case studies are needed

Are these research gaps filled by any other publications in this survey?

Quantitative research publications:

  • Study start date: 01-01-2010 (Firefox 3.6)
  • Study end date or duration: 20-12-2011 (Firefox 9.0)
  • Population description: Mozilla Wiki, VCS, Crash Repository, Bug Repository
  • Method(s) of recruitment of participants: N/A (case study)
  • Sample size: 25 alpha versions, 25 beta versions, 29 minor versions and 7 major versions. Amount of bugs/commits/etc. is not specified.
  • Evaluation/measurement description: Wilcoxon rank sum test (a minimal sketch of this kind of comparison follows this list)
  • Outcomes:
    • With shorter release cycles, users do not experience significantly more post-release bugs
    • Bugs are fixed faster
    • Users experience these bugs earlier during software execution (the program crashes earlier)
  • Limitations: Results are specific to Firefox
  • Future research: More case studies are needed
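
As referenced above, the comparison relies on a Wilcoxon rank sum test. A minimal sketch of this kind of test, using scipy’s Mann-Whitney U implementation (which is equivalent) and made-up numbers rather than the study’s data:

from scipy.stats import mannwhitneyu

# Hypothetical post-release bug counts per version under each release model.
traditional_releases = [120, 95, 130, 110, 101]
rapid_releases = [115, 99, 125, 108, 103]

statistic, p_value = mannwhitneyu(traditional_releases, rapid_releases, alternative="two-sided")
print(f"U = {statistic}, p = {p_value:.3f}")   # p >= 0.05: no significant difference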

A.3.3.2 On the influence of release engineering on software reputation

Reference: [153]

General information:

  • Name of person extracting data: Maarten Sijm
  • Date form completed: 27-09-2018
  • Author information: Christian Plewnia, Andrej Dyck, Horst Lichter
  • Publication type: Paper in Conference Proceedings
  • Conference: 2nd International Workshop on Release Engineering
  • Type of study: Quantitative, empirical case study on multiple software systems

What practices in release engineering does this publication mention?

  • Rapid releases

Are these practices to be classified under dated, state of the art or state of the practice? Why?

  • Dated practice, data is from before 2014

What open challenges in release engineering does this publication mention?

  • Identifying software reputation can better be done using a qualitative study.

What research gaps does this publication contain?

  • Identifying software reputation can better be done using a qualitative study.

Are these research gaps filled by any other publications in this survey?

Quantitative research publications:

  • Study start date: Q3 2008
  • Study end date or duration: Q4 2013
  • Population description: Chrome, Firefox, Internet Explorer
  • Method(s) of recruitment of participants: N/A (case study)
  • Sample size: 3 browsers
  • Evaluation/measurement description: No statistical analysis, just presenting market share results
  • Outcomes:
    • Chrome’s market share increased after adopting rapid releases
    • Firefox’s market share decreased after adopting rapid releases
    • IE’s market share decreased
  • Limitations:
    • Identifying software reputation can better be done using a qualitative study.
  • Future research:
    • Identifying software reputation can better be done using a qualitative study.

A.3.3.3 On rapid releases and software testing: a case study and a semi-systematic literature review

Reference: [125]

General information:

  • Name of person extracting data: Maarten Sijm
  • Date form completed: 28-09-2018
  • Author information: Mäntylä, Mika V. and Adams, Bram and Khomh, Foutse and Engström, Emelie and Petersen, Kai
  • Publication type: Journal/Magazine Article
  • Journal: Empirical Software Engineering
  • Type of study: Empirical case study and semi-systematic literature review

What practices in release engineering does this publication mention?

  • Impact of rapid releases on testing effort

Are these practices to be classified under dated, state of the art or state of the practice? Why?

  • State of the practice for the case study
  • State of the art for the literature review

What open challenges in release engineering does this publication mention?

  • Future work should focus on empirical studies of these factors that complement the existing qualitative observations and perceptions of rapid releases.

What research gaps does this publication contain?

  • See open challenges

Are these research gaps filled by any other publications in this survey?

Quantitative research publications:

  • Study start date: June 2006 (Firefox 2.0)
  • Study end date or duration: June 2012 (Firefox 13.0)
  • Population description: System-level test execution data
  • Method(s) of recruitment of participants: N/A (case study)
  • Sample size: 1,547 unique test cases, 312,502 executions, performed by 6,058 individuals on 2,009 software builds, 22 OS versions and 78 locales.
  • Evaluation/measurement description: Wilcoxon rank-sum test, Cliff’s delta, Cohen’s Kappa for Firefox Research Question (FF-RQ) 5.
  • Outcomes (FF-RQs; RR = rapid release; TR = traditional release):
    1. RRs perform more test executions per day, but these tests focus on a smaller subset of the test case corpus.
    2. RRs have fewer testers, but they have a higher workload.
    3. RRs test fewer, but larger builds.
    4. RRs test fewer platforms in total, but test each supported platform more thoroughly.
    5. RRs have higher similarity of test suites and testers within a release series than TRs had.
    6. RR testing happens closer to the release date and is more continuous, yet these findings were not confirmed by the QA engineer.
  • Limitations:
    • Study measures correlation, not causation
    • Not generalizable, as it is a case study on FF
  • Future research: More empirical studies

Semi-systematic literature survey:

  • Study date: Unknown (before 2015)
  • Population description: Papers with main focus on:
    • Rapid Releases (RRs)
    • Aspect of software engineering largely impacted by RRs
    • An agile, lean or open source process having results of RRs
    • Excluding: opinion papers without empirical data on RRs
  • Method(s) of recruitment of participants: Scopus queries
  • Sample size: 24 papers
  • Outcomes:
    • Evidence is scarce. Often RRs are implemented as part of agile adoption. This makes it difficult to separate the impact of RRs from other process changes.
    • Originates from several software development paradigms: Agile, FOSS, Lean, internet-speed software development
    • Prevalence
      • Practiced in many software engineering domains, not just web applications
      • Between 23% and 83% of practitioners do RRs
    • (Perceived) Problems:
      • Increased technical debt
      • RRs are in conflict with high reliability and high test coverage
      • Customers might be displeased with RRs (many updates)
      • Time-pressure / Deadline oriented work
    • (Perceived) Benefits:
      • Rapid feedback leading to increased quality focus of the devs and testers
      • Easier monitoring of progress and quality
      • Customer satisfaction
      • Shorter time-to-market
      • Continuous work / testing
    • Enablers:
      • Sequential development where multiple releases are under work simultaneously
      • Tools for automated testing and efficient deployment
      • Involvement of product management and productive customers
  • Limitations:
    • Not all papers that present results about RRs have “rapid release” mentioned in the abstract.
  • Future research:
    • Systematically search for agile and lean adoption papers

A.3.3.4 Release management in free and open source software ecosystems

Reference: [154]

General information:

  • Name of person extracting data: Maarten Sijm
  • Date form completed: 28-09-2018
  • Author information: Germán Poo-Caamaño
  • Publication type: PhD Thesis
  • Type of study: Empirical case study on two large-scale FOSSs: GNOME and OpenStack

What practices in release engineering does this publication mention?

  • Communication in release engineering

Are these practices to be classified under dated, state of the art or state of the practice? Why?

  • State of the practice, because case study

What open challenges in release engineering does this publication mention?

  • Is the ecosystem [around the studied software] shrinking or expanding?
  • How have communications in the ecosystem changed over time?

What research gaps does this publication contain?

  • More case studies are needed

Are these research gaps filled by any other publications in this survey?

Quantitative research publications (GNOME):

  • Study start date: January 2009 (GNOME 2.x)
  • Study end date or duration: August 2011 (GNOME 3.x)
  • Population description: Mailing lists
  • Method(s) of recruitment of participants: GNOME’s website recommends this channel of communication. IRC is also recommended, but its history is not stored.
  • Sample size: 285 mailing lists, 6947 messages, grouped into 945 discussions.
  • Evaluation/measurement description: Counting
  • Outcomes:
    • Developers also communicate via blogs, bug trackers, conferences, and hackfests.
    • The Release Team has direct contact with almost all participants in the mailing list
    • The tasks of the Release Team:
      • defining requirements of GNOME releases
      • coordinating and communicating with projects and teams
      • shipping a release within defined quality and time specifications
    • Major challenges of the Release Team:
      • coordinate projects and teams of volunteers without direct power over them
      • keep the build process manageable
      • monitor for unplanned changes
      • monitor for changes during the stabilization phase
      • test the GNOME release
  • Limitations:
    • Only mailing list was investigated, other channels were not
    • Possible subjective bias in manually categorizing email subjects
    • Not very generalizable, as it’s just one case study
  • Future research:
    • Fix the limitations

Quantitative research publications (OpenStack):

  • Study start date: May 2012
  • Study end date or duration: July 2014
  • Population description: Mailing lists
  • Method(s) of recruitment of participants: Found on OpenStack’s website
  • Sample size: 47 mailing lists, 24,643 messages, grouped into 7,650 discussions. Filtered data: 14,486 messages grouped into 2,682 discussions.
  • Evaluation/measurement description: Counting
  • Outcomes:
    • Developers communicate via email, blogs, launchpad, wiki, gerrit, face-to-face, IRC, video-conferences, and etherpad.
    • Project Team Leaders and the Release Team members are the key players in the communication and coordination across projects in the context of release management
    • The tasks for the Release Team and Project Team Leaders:
      • defining the requirements of an OpenStack release
      • coordinating and communicating with projects and teams to reach the objectives of each milestone
      • coordinating feature freeze exceptions at the end of a release
      • shipping a release within defined quality and time specifications
    • Major challenges of these teams:
      • coordinate projects and teams without direct power over them
      • keep everyone informed and engaged
      • decide what becomes part of the integrated release
      • monitor changes
      • set priorities in cross-project coordination
      • overcome limitations of the communication infrastructure
  • Limitations:
    • Only studies mailing list, to compare with GNOME case study
    • Possible subjective bias in manually categorizing email subjects
    • Not very generalizable, as it’s just one case study
  • Future research:
    • Fix the limitations

Notes:

  • Since there are two case studies, the results become a bit more generalizable
  • The author set up a theory that encapsulates the communication and coordination regarding release management in FOSS ecosystems, and can be summarized as:
    1. The size and complexity of the integrated product is constrained by the release managers’ capacity
    2. The release management should reach the whole ecosystem to increase awareness and participation
    3. The release managers need social and technical skills

A.3.3.5 Release Early, Release Often and Release on Time. An Empirical Case Study of Release Management

Reference: [177]

General information:

  • Name of person extracting data: Maarten Sijm
  • Date form completed: 28-09-2018
  • Author information: Jose Teixeira
  • Publication type: Paper in Conference Proceedings
  • Conference: Open Source Systems: Towards Robust Practices
  • Type of study: Empirical case study

What practices in release engineering does this publication mention?

  • Shifting towards rapid releases in OpenStack

Are these practices to be classified under dated, state of the art or state of the practice? Why?

  • State of the practice, because it is a recent case study on OpenStack

What open challenges in release engineering does this publication mention?

  • More case studies are needed.

What research gaps does this publication contain?

  • More case studies are needed.

Are these research gaps filled by any other publications in this survey?

Quantitative research publications:

  • Study start date: Not specified
  • Study end date or duration: Not specified
  • Population description: Websites and blogs
  • Method(s) of recruitment of participants: Random clicking through OpenStack websites
  • Sample size: Not specified
  • Evaluation/measurement description: Not specified
  • Outcomes:
    • OpenStack releases in a cycle of six months
    • The release management process is a hybrid of feature-based and time-based
    • Having a time-based release strategy is a challenging cooperative task involving multiple people and technology
  • Limitations:
    • Study is not completed yet, these are preliminary results
  • Future research:
    • Not indicated

A.3.3.6 Kanbanize the release engineering process

Reference: [96]

General information:

  • Name of person extracting data: Jesse Tilro
  • Date form completed: 29-09-2018
  • Author information: Kerzazi, N. and Robillard, P.N.
  • Publication type: Paper in Conference Proceedings
  • Journal: 2013 1st International Workshop on Release Engineering, RELENG 2013 - Proceedings
  • Type of study: Action research

What practices in release engineering does this publication mention?

  • Following principles of the Kanban agile software development life-cycle model that implicitly describe the release process
  • (Switching to) more frequent (daily) release cycles
  • (Transitioning to) a structured release process

Are these practices to be classified under dated, state of the art or state of the practice? Why?

  • Either dated or state of the practice, not sure. Would have to do some additional research on the adoption of Kanban

What open challenges in release engineering does this publication mention?

  • Release effectiveness: minimize system failure and customer impact
  • Problems with releasing encountered in practice

What research gaps does this publication contain?

Are these research gaps filled by any other publications in this survey?

Quantitative research publications:

  • Study start date:
  • Study end date or duration:
  • Population description:
  • Method(s) of recruitment of participants:
  • Sample size:
  • Evaluation/measurement description:
  • Outcomes:
  • Limitations:
  • Future research:

Notes:

A.3.3.7 Is it safe to uplift this patch? An empirical study on mozilla firefox

Reference: [37]

General information:

  • Name of person extracting data: Jesse Tilro
  • Date form completed: 29-09-2018
  • Author information: Castelluccio, M. and An, L. and Khomh, F.
  • Publication type: Paper in Conference Proceedings
  • Journal: Proceedings - 2017 IEEE International Conference on Software Maintenance and Evolution, ICSME 2017
  • Type of study: Case study, both quantitative (data analysis) and qualitative (interviews)

What practices in release engineering does this publication mention?

  • Patch uplift (meaning the promotion of patches from development directly to a stabilization channel, potentially skipping several channels)

Are these practices to be classified under dated, state of the art or state of the practice? Why?

  • State of the practice: case study of what is being done in the field, quite recently (2017).

What open challenges in release engineering does this publication mention?

  • Exploring possibilities to leverage this research by building classifiers capable of automatically assessing the risk associated with patch uplift candidates and recommending patches that can be uplifted safely.
  • Validate and extend results of this study for generalizability.

What research gaps does this publication contain?

  • The study aimed to fill two gaps identified in the literature:
    • How do urgent patches in rapid release models affect software quality (in terms of fault proneness)?
    • How can the reliability of the integration of urgent patches be improved?

Are these research gaps filled by any other publications in this survey?

  • The paper itself

Quantitative research publications:

  • Study start date:
  • Study end date or duration:
  • Population description:
  • Method(s) of recruitment of participants:
  • Sample size:
  • Evaluation/measurement description:
  • Outcomes:
  • Limitations:
  • Future research:

Notes:

A.3.3.8 Systematic literature review on the impacts of agile release engineering practices

Reference: [94]

General information:

  • Name of person extracting data: Jesse Tilro
  • Date form completed: 29-09-2018
  • Author information: Karvonen, T. and Behutiye, W. and Oivo, M. and Kuvaja, P.
  • Publication type: Journal/Magazine Article
  • Journal: Information and Software Technology
  • Type of study: Systematic literature review

What practices in release engineering does this publication mention?

  • Agile release engineering (ARE) practices
    • Continuous integration (CI)
    • Continuous delivery (CD)
    • Rapid Release (RR)
    • Continuous deployment
    • DevOps (similar to CD, congruent with release engineering practices)

Are these practices to be classified under dated, state of the art or state of the practice? Why?

  • State of the art, for it concerns a state of the art report and was published recently (2017).

What open challenges in release engineering does this publication mention?

  • Claims that modern release engineering practices allow for software to be delivered faster and cheaper should be further empirically validated.
  • This analysis could be extended with industry case studies, to develop a checklist for analyzing company and ecosystem readiness for continuous delivery and continuous deployment.
  • The comprehensive reporting of the context and of how the practice is implemented, instead of merely referring to usage of the practice, should be considered by future research.
  • Different stakeholders’ points of view, such as customer perceptions regarding practices require further research.
  • Research on DevOps would be highly relevant for release engineering and the continuous software engineering research domain.
  • Future research on the impact of RE practices could benefit from more extensive use of quantitative methodologies from case studies, and the combination of quantitative with qualitative (e.g. interviews) methods.

What research gaps does this publication contain?

  • Refer to challenges

Are these research gaps filled by any other publications in this survey?

Quantitative research publications:

  • Study start date: N/A
  • Study end date or duration: N/A
  • Population description: N/A
  • Method(s) of recruitment of participants: N/A
  • Sample size: N/A
  • Evaluation/measurement description: N/A
  • Outcomes: N/A
  • Limitations: N/A
  • Future research: N/A

Notes:

A.3.3.9 Abnormal Working Hours: Effect of Rapid Releases and Implications to Work Content

Reference: [44]

General information:

  • Name of person extracting data: Jesse Tilro
  • Date form completed: 29-09-2018
  • Author information: Claes, M. and Mantyla, M. and Kuutila, M. and Adams, B.
  • Publication type: Paper in Conference Proceedings
  • Journal: IEEE International Working Conference on Mining Software Repositories
  • Type of study: Quantitative case study

What practices in release engineering does this publication mention?

  • Faster release cycles

Are these practices to be classified under dated, state of the art or state of the practice? Why?

What open challenges in release engineering does this publication mention?

  • Future research might further study the impact of time pressure and work patterns - indirectly release practices - on software developers.

What research gaps does this publication contain?

Are these research gaps filled by any other publications in this survey?

Quantitative research publications:

  • Study start date: first data item 2012-12-21
  • Study end date or duration: last data item 2016-01-03
  • Population description: N/A
  • Method(s) of recruitment of participants: N/A
  • Sample size: 145,691 bug tracker contributors (1.8% with timezone information), 11.11 million comments (53% by authors with timezone information)
  • Evaluation/measurement description: measure distributions on number of comments per day of the week and time of the day, before and after transition to rapid release cycles. Test distribution difference using Mann-Whitney U test and test effect size using Cohen’s d and Cliff’s delta. Also evaluate general development of number of comments, working day against weekend and day against night.
  • Outcomes:
    1. Switching to rapid releases has reduced the amount of work performed outside of office hours. (Supported by results in psychology.)
    2. Thus, rapid release cycles seem to have a positive effect on occupational health.
    3. Comments posted during the weekend contained more technical terms.
    4. Comments posted during weekdays contained more positive and polite vocabulary.
  • Limitations:
  • Future research:

Notes:

A.3.3.10 Does the release cycle of a library project influence when it is adopted by a client project?

Reference: [69]

General information:

  • Name of person extracting data: Jesse Tilro
  • Date form completed: 29-09-2018
  • Author information: Fujibayashi, D. and Ihara, A. and Suwa, H. and Kula, R.G. and Matsumoto, K.
  • Publication type: Paper in Conference Proceedings
  • Journal: SANER 2017 - 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering
  • Type of study: Quantitative study

What practices in release engineering does this publication mention?

  • Rapid release cycles

Are these practices to be classified under dated, state of the art or state of the practice? Why?

  • State of the art and practice: practitioners currently practice it, researchers currently research it.

What open challenges in release engineering does this publication mention?

  • Gaining an understanding of the effect of a library’s release cycle on its adoption.

What research gaps does this publication contain?

  • First step towards solving the above challenge.

Are these research gaps filled by any other publications in this survey?

  • This paper

Quantitative research publications:

  • Study start date: 21-07-2016 (data extraction)
  • Study end date or duration:
  • Population description:
  • Method(s) of recruitment of participants:
  • Sample size: 23 libraries, 415 client projects
  • Evaluation/measurement description: Scott-Knott test to group libraries with similar release cycles.
  • Outcomes:
    1. There is a relationship between release cycle of a library project and the time for clients to adopt it: quicker release seems to be associated with quicker adoption.
  • Limitations:
    • Small sample size
    • Not controlled for many factors
    • No statistical significance tests?
  • Future research:

Notes:

  • Very short, probably not very strong evidence, refer to limitations
  • Nice that the focus is libraries here, very interesting population because most studies focus on end-user targeting software systems

A.3.3.11 Rapid releases and patch backouts: A software analytics approach

Reference: [172]

General information:

  • Name of person extracting data: Jesse Tilro
  • Date form completed: 29-09-2018
  • Author information: Souza, R. and Chavez, C. and Bittencourt, R.A.
  • Publication type: Journal/Magazine Article
  • Journal: IEEE Software
  • Type of study: Quantitative case study (Mozilla Firefox)

What practices in release engineering does this publication mention?

  • Rapid release
  • Backing out of broken patches (patch backouts)
  • Stabilization channels / monitored integration repository

Are these practices to be classified under dated, state of the art or state of the practice? Why?

  • State of the practice (case study)

What open challenges in release engineering does this publication mention?

  • How rapid release cycles affect code integration, where patch backouts are a proxy for studying code integration
  • Integrate backout rate analysis in an analytics tool to provide release engineers with up-to-date information on the process

Quantitative research publications:

  • Study start date: first data item 30 June 2009
  • Study end date or duration: last data item 17 September 2013
  • Population description:
  • Method(s) of recruitment of participants:
  • Sample size: 43,198 bug fixes; no further sample sizes of the raw data are mentioned anywhere, unfortunately. (Data from the Mozilla Firefox project.)
  • Evaluation/measurement description: Associate commit log, bug reports and releases. Classify backouts. Measure the rate of backouts against all fixed bugs, per month and per release strategy period. Test for statistical significance using Fisher’s exact test and Wilcoxon signed-rank test (a minimal sketch of such a test follows this list).
  • Outcomes:
    1. Absolute numbers of bug fixes and backouts increased under rapid releases (probably the increase in regular contributors played a role, cannot conclude anything about workload.)
    2. Backout rate increased under rapid releases (sheriff managed integration repositories may have increased the prevalence of backout culture)
    3. A higher early backout rate and a lower late backout rate indicate a shift towards earlier problem detection (the proportion of early backouts rose from 57% to 88%). The time-to-backout also dropped.
  • Limitations:
    • Sample size not mentioned
    • Quite trivial statistics
  • Future research:
    • Integrate backout rate analysis in an analytics tool to provide release engineers with up-to-date information on the process
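
As referenced above, a minimal sketch of the kind of significance test used here, with Fisher’s exact test on a 2x2 table of backed-out versus kept bug fixes (the counts are made up, not the study’s data):

from scipy.stats import fisher_exact

#                       backed out, not backed out   (hypothetical counts)
traditional_releases = [150, 4850]
rapid_releases = [450, 5550]

odds_ratio, p_value = fisher_exact([traditional_releases, rapid_releases])
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3g}")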

Interview triangulation

  • Explanations of quantitative outcomes:
    • larger code base and more products -> more conflicts
    • evolution of automated testing toolset -> earlier and more backouts
    • sheriff managed integration repos -> earlier and more backouts
  • Explanations of impact
    • cultural shift reduced testing efforts beforehand, and higher early backout rate eventually reduced the effort to integrate bug fixes for developers
    • given the many stabilization channels and the rarity of very late backouts both in traditional and rapid release cycles, changes in backouts do not seem to influence users’ perception of quality (even though frequent update notifications and broken compatibilities caused upset users)

Notes:

  • Also reviews existing literature well.
  • Treats transitional period from traditional to rapid releases as a separate period.

A.3.3.12 Comparison of release engineering practices in a large mature company and a startup

Reference: [108]

General information:

  • Name of person extracting data: Jesse Tilro
  • Date form completed: 29-09-2018
  • Author information: Laukkanen, E. and Paasivaara, M. and Itkonen, J. and Lassenius, C.
  • Publication type: Journal/Magazine Article
  • Journal: Empirical Software Engineering
  • Type of study: Case study (2 cases)

What practices in release engineering does this publication mention?

  • Continuous Integration (mainly)
  • Code review
  • Internal Verification Scope
  • Domain Expert Testing
  • Testing with customers

Are these practices to be classified under dated, state of the art or state of the practice? Why?

What open challenges in release engineering does this publication mention?

  • The results in this study can be verified by additional case studies or even surveys to close the gap of empirical research on release engineering

Quantitative research publications:

  • Study start date:
  • Study end date or duration:
  • Data acquisition period: 22 weeks (BigCorp) and 24 weeks (SmallCorp)
  • Population description:
  • Method(s) of recruitment of participants:
  • Sample size: 1889 builds (BigCorp) and 760 builds (SmallCorp)
  • Evaluation/measurement description:
  • Outcomes:
    • High internal quality standards combined with the large distributed organizational context of BigCorp slowed the verification process down and therefore had a negative impact on release capability
    • In SmallCorp, the lack of internal verification measures due to a lack of resources was mitigated by code review, disciplined CI and external verification by customers in customer environments. This allowed for fast release capability and gaining feedback from production.
    • Variables
      • Multiple customers -> High quality standards
      • High quality standards -> Complex CI
      • High quality standards -> Slow Verification
      • Complex CI -> Undisciplined CI
      • Large distributed organization -> Undisciplined CI
      • Undisciplined CI -> Slow verification
      • Slow verification -> Slow release capability
  • Limitations:
    • Only a case study, so difficult to generalize
  • Future research:

Notes:

  • Quantitative results triangulated with interviews

A.3.3.13 Modern Release Engineering in a Nutshell

Reference: [4]

General information:

  • Name of person extracting data: Nels Numan
  • Date form completed (dd/mm/yyyy): 28/09/2018
  • Publication title: Modern Release Engineering in a Nutshell
  • Author information: Bram Adams and Shane McIntosh
  • Journal: 23rd International Conference on Software Analysis, Evolution, and Reengineering (2016)
  • Publication type: Conference paper
  • Type of study: Survey

What practices in release engineering does this publication mention?

  • Branching and merging
    • Software teams rely on Version Control Systems
    • Quality assurance activities like code reviews are used before doing a merge or even allowing a code change to be committed into a branch
    • Keep branches short-lived and merge often. If this is impossible, a rebase can be done.
    • “trunk-based development” can be applied to eliminate most branches below the master branch.
    • Feature toggles are used to provide isolation for new features when branches are absent (a minimal feature-toggle sketch follows this list).
  • Building and testing
    • To help assess build and test conflicts, many projects also provide “try” servers to development teams, which automatically run a build and test process referred to as CI.
    • The CI process often does not run the full test suite, but a representative subset.
    • The more intensive tests, such as integration, system, or performance tests, typically run nightly or on weekends.
  • Build system:
    • GNU Make is the most popular file-based build system technology. Ant is the prototypical task-based build system technology. Lifecycle-based build technologies like Maven consider the build system of a project to have a sequence of standard build activities that together form a “build lifecycle.”
    • “Reproducible builds” require that, for a given feature and hardware configuration of the code base, every build invocation yields bit-to-bit identical build results.
  • Infrastructure-as-code
    • Containers or virtual machines are used to deploy new versions of the system for testing or even production.
    • It has been recommended to store infrastructure code in a separate VCS repository from the source code, in order to restrict access to the infrastructure code.
  • Deployment
    • The term “dark launching” refers to deploying new features without releasing them to the public: parts of the system automatically call the hidden features in a way that is invisible to end users.
    • “Blue-green deployment” deploys the next software version on a copy of the production environment, which becomes the main environment on release.
    • In “canary deployment” a prospective release of the software system is loaded onto a subset of the production environments for only a subset of users.
    • “A/B testing” deploys alternative A of a feature to the environment of a subset of the user base, while alternative B is deployed to the environment of another subset.
  • Release
    • Once a deployed version of a system is released, the release engineers monitor telemetry data and crash logs to track the performance and quality of releases. Several frameworks and applications have been introduced for this.
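
To make the feature-toggle practice above concrete, here is a minimal Python sketch of a toggle with a percentage-based rollout. The toggle names, registry, and bucketing scheme are invented for illustration and are not taken from the paper; the same user-bucketing idea also underlies canary releases and A/B testing, only applied at the deployment level.

```python
import hashlib

# Hypothetical in-memory toggle registry; real systems typically back this with a
# configuration service so toggles can be flipped without redeploying.
TOGGLES = {
    "new_checkout_flow": {"enabled": True, "rollout_percent": 10},
}

def is_enabled(toggle_name: str, user_id: str) -> bool:
    """Return True if the toggle is on for this user. Hashing the user id into a
    0-99 bucket gives each user a stable assignment across requests."""
    toggle = TOGGLES.get(toggle_name)
    if not toggle or not toggle["enabled"]:
        return False
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < toggle["rollout_percent"]

def checkout(user_id: str) -> str:
    # The new code path stays merged on the mainline but dark until the toggle is widened.
    if is_enabled("new_checkout_flow", user_id):
        return "new checkout flow"
    return "legacy checkout flow"

print(checkout("user-42"))
```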

Are these practices to be classified under dated, state of the art or state of the practice? Why?

  • The majority of these practices are classified by the paper as state of the practice, but state of the art practices are also mentioned.

What open challenges in release engineering does this publication mention?

  • Branching and merging
    • No methodology or insight exists on how to empirically validate the best branching structure for a given organization or project, and what results in the smallest amount of merge conflicts.
    • Release engineers need to pay particular attention to conflicts and incompatibilities caused by evolving library and API dependencies.
  • Building and testing
    • Speeding up CI might be the major concern of practitioners. This speed-up can be achieved by predicting whether a code change will break the build, or by “chunking” code changes into groups and compiling and testing each group only once (a sketch of the chunking idea follows this list).
    • “Green builds” are slowly becoming an issue, in the sense that frequent triggering of the CI server consumes energy.
    • Security of the release engineering pipeline in general, and the CI server in particular, has also become a major concern.
  • Release
    • Qualitative studies are not only essential to understand the rationale behind quantitative findings, but also to identify design patterns and best practices for build systems.
      • How can developers make their builds more maintainable and of higher quality?
      • What refactorings should be performed for which build system anti-patterns?
    • Identification and resolution of build bugs, i.e., source code or build specification changes that cause build breakage, possibly on a subset of the supported platforms.
    • Basic tools have a hard time determining what part of the system is necessary to build.
    • Studies on non-GNU Make build systems are missing.
    • Apart from identifying bottlenecks, such approaches should also suggest concrete refactorings of the build system specifications or source code.
  • Infrastructure-as-code
    • Research on differences between infrastructure languages is lacking.
    • Best practices and design patterns for infrastructure-as-code need to be documented.
    • Qualitative analysis of infrastructure code will be necessary to understand how developers address different infrastructure needs.
    • Quantitative analysis of the version control and bug report systems can then help to determine which patterns were beneficial in terms of maintenance effort and/or quality.
  • Deployment
    • More empirical studies can be done to answer questions like these:
      • Is blue-green deployment the fastest means to deploy a new version of a web app?
      • Are A/B testing and dark launching worth the investment and risk?
      • Should one use containers or virtual machines for a medium-sized web app in order to meet application performance and robustness criteria?
      • If an app is part of a suite of apps built around a common database, should each app be deployed in a different container?
    • Better tools for quality assurance are required to prevent showstopper bugs from slipping through and requiring re-deployment of a mobile app version (with corresponding vetting). These include:
      • Defect prediction (either file- or commit-based)
      • Smarter/safer update mechanisms
      • Tools for improving code review
      • Generating tests
      • Filtering and interpreting crash reports
      • Prioritization and triaging of defect reports
  • Release
    • More research is needed on determining which code change is the right one to trigger one of these releases, or whether a canary release is good enough to be rolled out to another data centre.
    • Questions such as the following should be investigated:
      • Should one release on all platforms at the same time?
      • In the case of defects, which platform should receive priority?
      • Should all platforms use the same version numbering, or should that be feature-dependent?
      • Research on continuous delivery and rapid releases in other systems should be explored.
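
As a rough illustration of the “chunking” idea mentioned under building and testing above, the following Python sketch batches changes, runs one build per batch, and bisects a failing batch to isolate the offending change. The `run_build` callback is a hypothetical stand-in for a real CI invocation; this is not the algorithm of any particular tool discussed in the survey.

```python
from typing import Callable, List

def find_breaking_changes(changes: List[str],
                          run_build: Callable[[List[str]], bool]) -> List[str]:
    """Build a whole batch of changes at once; only when the batch fails,
    bisect it to isolate the offending change(s). On a green batch this
    costs a single build instead of one build per change."""
    if not changes:
        return []
    if run_build(changes):
        return []                      # whole batch is green
    if len(changes) == 1:
        return changes                 # isolated a culprit
    mid = len(changes) // 2
    return (find_breaking_changes(changes[:mid], run_build) +
            find_breaking_changes(changes[mid:], run_build))

# Toy stand-in for a real CI invocation: pretend change "c3" breaks the build.
breaks = lambda batch: "c3" not in batch
print(find_breaking_changes(["c1", "c2", "c3", "c4"], breaks))  # ['c3']
```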

What research gaps does this publication contain?

  • As is common with surveys, it does not reflect the current state of the field. More quantitative and qualitative research has been published since, which could not be included.

Are these research gaps filled by any other publications in this survey?

  • An example of further research that expands on this study is [49]

A.3.3.14 The Impact of Switching to a Rapid Release Cycle on the Integration Delay of Addressed Issues

Reference: [49]

General information:

  • Name of person extracting data: Nels Numan
  • Date form completed (dd/mm/yyyy): 28/09/2018
  • Publication title: The Impact of Switching to a Rapid Release Cycle on the Integration Delay of Addressed Issues
  • Author information: Daniel Alencar da Costa, Shane McIntosh, Uira Kulesza, Ahmed E. Hassan
  • Journal: 13th Working Conference on Mining Software Repositories (2016)
  • Publication type: Conference paper
  • Type of study: Empirical study

What practices in release engineering does this publication mention?

  • To give a context to the study, the paper describes the concept of traditional releases, rapid releases, their differences, and how issue reports are structured.

Are these practices to be classified under dated, state of the art or state of the practice? Why?

  • State of the practice. The paper describes common practices that were in use at the time of the publication.

What open challenges in release engineering does this publication mention?

  • The study mentions that comparing systems with different release structures is difficult since one has to distinguish to what extent the results are due to the release strategy and which are due to intricacies of the systems or organization itself.

What research gaps does this publication contain?

  • The main gap in this study is the specificity of the data. Only Mozilla has been considered, and external factors such as other organizational challenges which could have an effect on release time could not be included. More research that looks further into comparing this case to that of other organizations is needed.

Are these research gaps filled by any other publications in this survey?

Quantitative research publications:

  • Study start date: Used data starts from 1999
  • Study end date or duration: Used data ends in 2010
  • Population description: The paper describes its data collection approach in multiple steps. It collected the date and version number of each Firefox release and used tags within the VCS to link issue IDs to releases. Issues that are potential false positives are discarded: IDs with fewer than five digits, issues that refer to tests instead of bug fixes, and potential IDs that are actually file names. Since the commit logs are linked to the VCS tags, the paper can link the issue IDs found in these commit logs to the releases that correspond to those tags (a rough sketch of this kind of log mining follows this list).
  • Method(s) of recruitment of participants: Firefox release history wiki and VCS logs
  • Sample size: 72,114 issue reports from the Firefox system (34,673 for traditional releases and 37,441 for rapid releases)
  • Evaluation/measurement description: The paper aims to answer three research questions:
    • Are addressed issues integrated more quickly in rapid releases?
      • Approach: Using beanplots to compare the distributions, the paper first examines the lifetime of issues in traditional and rapid releases. Next, it looks at the time span of the triaging, fixing, and integration phases within the lifetime of an issue.
    • Why can traditional releases integrate addressed issues more quickly?
      • Approach: the paper groups traditional and rapid releases into major and minor releases and studies their integration delay through beanplots, Mann-Whitney-Wilcoxon tests, Cliff’s delta, and MAD.
    • Did the change in the release strategy have an impact on the characteristics of delayed issues?
      • Approach: the paper builds linear regression models for both release approaches. It first estimates the degrees of freedom that can be spent on the models. Second, it checks for metrics that are highly correlated using Spearman rank correlation tests and performs a redundancy check to remove redundant metrics. The paper then assesses the fit of its models using the ROC area and the Brier score: the ROC area evaluates the degree of discrimination achieved by the model, and the Brier score evaluates the accuracy of probabilistic predictions. The metrics used include reporter experience, resolver experience, issue severity, issue priority, project queue rank, number of impacted files, and fix time. A full list of metrics can be found in Table 2 of the paper.
  • Outcomes:
    • Are addressed issues integrated more quickly in rapid releases?
      • Results: There is no significant difference between traditional and rapid releases regarding issue lifetime.
    • Why can traditional releases integrate addressed issues more quickly?
      • Results: Minor-traditional releases tend to have less integration delay than major/minor-rapid releases.
    • Did the change in the release strategy have an impact on the characteristics of delayed issues?
      • Results: The models achieve a Brier score of 0.05-0.16 and ROC areas of 0.81-0.83. Traditional releases prioritize the integration of backlog issues, while rapid releases prioritize the integration of issues of the current release cycle.
  • Limitations: Defects in the tools that were developed to perform the data collection and evaluation could have an effect on the outcomes. Furthermore, the way that issue IDs are linked to releases may not represent the total addressed issues per release. The results cannot be generalized as the evaluation was solely done on the Firefox system.
  • Future research: Further research can look into applying the same evaluation strategy to other organizations that switched from traditional to rapid release.
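
The following Python sketch illustrates the kind of commit-log mining described in the population description above: issue IDs are extracted from commit messages linked to release tags, and candidate IDs with fewer than five digits are discarded as likely false positives. The regular expression, commit messages, and tag names are invented placeholders, not the authors' actual tooling or data.

```python
import re

# Hypothetical commit log entries: (release_tag, commit_message)
commits = [
    ("FIREFOX_5_0_RELEASE", "Bug 658123 - fix crash in layout engine"),
    ("FIREFOX_5_0_RELEASE", "Add test 4321 for bug 658123"),   # 4-digit id is filtered out
    ("FIREFOX_6_0_RELEASE", "Bug 670001 - update readme.txt"),
]

# At least five digits, mirroring the paper's false-positive heuristic.
ISSUE_ID = re.compile(r"\b(\d{5,})\b")

def issues_per_release(commit_log):
    """Map each release tag to the set of issue IDs mentioned in its commits."""
    mapping = {}
    for release, message in commit_log:
        for issue_id in ISSUE_ID.findall(message):
            mapping.setdefault(release, set()).add(issue_id)
    return mapping

print(issues_per_release(commits))
# {'FIREFOX_5_0_RELEASE': {'658123'}, 'FIREFOX_6_0_RELEASE': {'670001'}}
```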

Notes:

A.3.3.15 An Empirical Study of Delays in the Integration of Addressed Issues

Reference: [48]

General information:

  • Name of person extracting data: Nels Numan
  • Date form completed (dd/mm/yyyy): 29/09/18
  • Publication title: An Empirical Study of Delays in the Integration of Addressed Issues
  • Author information: Daniel Alencar da Costa, Surafel Lemma Abebe, Shane McIntosh, Uira Kulesza, Ahmed E. Hassan
  • Journal: 2014 IEEE International Conference on Software Maintenance and Evolution
  • Publication type: Conference paper
  • Type of study: Empirical study

What practices in release engineering does this publication mention?

  • This publication discusses the usage of issue tracking systems, and what the term issue means to form a context around the study.

Are these practices to be classified under dated, state of the art or state of the practice? Why?

  • State of the practice.

What open challenges in release engineering does this publication mention?

  • The results based on the investigated open source projects may not be generalizable and replication of the study is required on a larger set of projects to form a more general conclusion. Another challenge is finding metrics that are truly correlated with the integration delay of issues.

What research gaps does this publication contain?

  • See the answer to the previous question (open challenges).

Are these research gaps filled by any other publications in this survey?

Quantitative research publications:

  • Study start date:
  • Used data start dates:
    • ArgoUML: 18/08/2003
    • Eclipse: 03/11/2003
    • Firefox: 05/06/2012
  • Used data end dates:
    • ArgoUML: 15/12/2011
    • Eclipse: 12/02/2007
    • Firefox: 04/02/2014
  • Population description:
  • Method(s) of recruitment of participants: The data was collected from both ITSs and VCSs of the studied systems.
  • Sample size: 20,995 issues from ArgoUML, Eclipse and Firefox projects
  • Evaluation/measurement description:
    • How long are addressed issues typically delayed by the integration process?
      • Approach: models are created using metrics from four dimensions: reporter, issue, project, and history. Please refer to Table 2 in the paper for all of the metrics considered. The models are trained using the random forest technique and evaluated using precision, recall, F-measure, and ROC area (a toy sketch of such a model and these metrics follows this list).
  • Outcomes:
    • How long are addressed issues typically delayed by the integration process?
      • Addressed issues are usually delayed in a rapid release cycle. Many delayed issues were addressed well before the releases from which they were omitted.
    • Can we accurately predict when an addressed issue will be integrated?
      • The prediction models achieve a weighted average precision between 0.59 to 0.88 and a recall between 0.62 to 0.88, with ROC areas of above 0.74. The models achieve better F-measure values than Zero-R.
    • What are the most influential attributes for estimating integration delay?
      • The integrator workload has a bigger influence on integration delay than the other attributes. Severity and priority have little influence on issue integration delay.
  • Limitations: See open challenges.
  • Future research: See open challenges.
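
As a toy illustration of the modelling and evaluation setup described above (random forest models scored with precision, recall, F-measure, and ROC area), the following sketch trains a classifier on synthetic data with scikit-learn. The features and labels are fabricated stand-ins, not the metrics from Table 2 of the paper, so the printed numbers only demonstrate the workflow.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for issue metrics (e.g. integrator workload, severity, priority).
X = rng.normal(size=(2000, 3))
# Toy label: issues with a high first feature ("workload") are more likely to be delayed.
y = (X[:, 0] + rng.normal(scale=0.5, size=2000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)
prob = model.predict_proba(X_test)[:, 1]

print("precision", precision_score(y_test, pred))
print("recall   ", recall_score(y_test, pred))
print("F-measure", f1_score(y_test, pred))
print("ROC area ", roc_auc_score(y_test, prob))
```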

Notes:

A.3.3.16 Towards Definitions for Release Engineering and DevOps

Reference: [63]

General information:

  • Name of person extracting data: Nels Numan
  • Date form completed (dd/mm/yyyy): 30/09/2018
  • Publication title: Towards Definitions for Release Engineering and DevOps
  • Author information: Andrej Dyck, Ralf Penners, Horst Lichter
  • Journal:
  • Publication type:
  • Type of study: Survey

What practices in release engineering does this publication mention?

  • This paper talks about approaches to improve the collaboration between development and IT operations teams, in order to streamline software engineering processes. The paper proposes definitions for release engineering and DevOps.

Are these practices to be classified under dated, state of the art or state of the practice? Why?

  • Not applicable.

What open challenges in release engineering does this publication mention?

  • The paper mentions that a uniform definition that is valid for many situations is difficult to find, and that further research is needed.

What research gaps does this publication contain?

  • This paper aims to form a uniform definition for release engineering and DevOps, in collaboration with experts. It is unclear how many experts were consulted for this definition, and more consultations and research could be done to further improve the definition.

Are these research gaps filled by any other publications in this survey?

Quantitative research publications:

  • Study start date:
  • Study end date or duration:
  • Population description:
  • Method(s) of recruitment of participants:
  • Sample size:
  • Evaluation/measurement description:
  • Outcomes:
  • Limitations:
  • Future research:

Notes:

A.3.3.17 Continuous deployment of software intensive products and services: A systematic mapping study

Reference: [163]

General information:

  • Name of person extracting data: Nels Numan
  • Date form completed (dd/mm/yyyy): 30/09/18
  • Publication title: Continuous deployment of software intensive products and services: A systematic mapping study
  • Author information: Pilar Rodríguez, Alireza Haghighatkhah, Lucy Ellen Lwakatare, Susanna Teppola, Tanja Suomalainen, Juho Eskeli, Teemu Karvonen, Pasi Kuvaja, June M. Verner, Markku Oivo
  • Journal:
  • Publication type:
  • Type of study: Systematic mapping study

What practices in release engineering does this publication mention?

  • This paper discusses the development of continuous deployment over the years up to June 2014. It performs a systematic mapping study to identify, classify, and analyze primary studies related to continuous deployment. The paper reports the following major points:
    • Almost all primary studies refer in one way or another to accelerating the release cycle by shortening the release cadence and turning it into a continuous flow.
    • Some reviewed publications claim that accelerating the release cycle can make it harder to perform re-engineering activities.
    • CD challenges and changes traditional planning towards continuous planning in order to achieve fast and frequent releases.
    • Tighter integration between planning and execution is required in order to achieve a more holistic view on planning in CD.
    • It is important for the engineering and QA teams to ensure backward compatibility of enhancements, so that users perceive only improvements rather than experience any loss of functionality.
    • Code change activities tend to focus more on bug fixing and maintenance than functionality expansion.
    • The architecture must be robust enough to allow the organization to invest its resources in offensive initiatives such as new functionality, product enhancements and innovation rather than defensive efforts such as bug fixes.
    • A major challenge in CD is to retain the balance between speed and quality. Some approaches reviewed by this study propose a focus on measuring and monitoring source code and architectural quality.
    • To avoid issues such as duplicated testing efforts and slow feedback loops it is important to make all testing activities transparent to individual developers.

What open challenges in release engineering does this publication mention?

  • Continuous and rapid experimentation is an emerging research topic with many possibilities for future work. This is why it’s important to keep up with the newly contributed studies and add them to future reviews to compare their findings.

What research gaps does this publication contain?

Notes:

A.3.3.18 Frequent Releases in Open Source Software: A Systematic Review

Reference: [41]

General information:

  • Name of person extracting data: Nels Numan
  • Date form completed (dd/mm/yyyy): 30/09/18
  • Publication title: Frequent Releases in Open Source Software: A Systematic Review
  • Author information: Antonio Cesar Brandão Gomes da Silva, Glauco de Figueiredo Carneiro, Fernando Brito e Abreu and Miguel Pessoa Monteiro
  • Journal: Information
  • Publication type: Journal
  • Type of study: Survey

What practices in release engineering does this publication mention?

  • This paper discusses the development of frequent software releases in open source software over the years. It performs a systematic review to identify, classify, and analyze primary studies related to frequent releases. The paper finds:
    • There are two main motivations for implementing frequent software releases in OSS projects: project attractiveness (increasing the number of participants) and maintaining or increasing market share.
    • Four main strategies are adopted by practitioners to implement frequent software releases in the context of OSS projects: time-based release, automated release, test-driven development and continuous delivery/deployment.
    • The main positive points associated with rapid releases are: quick return on customer needs, rapid delivery of new features, quick bug fixes, immediate release of security patches, increased efficiency, entry of new collaborators, and greater focus on quality on the part of developers and testers.
    • The main negative points associated with rapid releases are: reliability of new versions, increase in “technical debt”, pressure felt by employees, and community dependence.

Are these practices to be classified under dated, state of the art or state of the practice? Why?

  • The practices discussed are a combination of state of the art and state of the practice approaches.

What open challenges in release engineering does this publication mention?

  • A meta-model for mining open source code bases to gather data that allows assessing the quality of projects adopting the frequent release approach.

What research gaps does this publication contain?

Are these research gaps filled by any other publications in this survey?

A.4 Appendix to Chapter 7 (Code Review)

A.4.1 Extracted data

This section contains data extracted from all resources included in the survey, according to the Data collection section of the review protocol. Note that if some data could not be collected, it is explicitly stated.

The resources are listed in alphabetical order of first author name, and then by year published.

A.4.1.1 Expectations, outcomes, and challenges of modern code review

Reference: [11]

Summary

This paper describes research on the goals and actual effects of code review. Interviews and experiments were conducted with software practitioners.

One of the main conclusions is that the main effect of doing code reviews is that everyone involved understands the code better. This is opposed to what the goal of code reviews generally is: discovering errors.

For answering RQ1:

  • Sub-topic: in practice; tools
  • Research method: empirical; qualitative
  • Tools: N/A
  • Datasets: Data collected from interviews, surveys and code reviews

Research questions and answers:

  • What are the motivations and expectations for modern code review? Do they change from managers to developers and testers? The top motivation for code reviews is finding defects, closely followed by code improvement. There does not seem to be a large difference between managers, developers and testers.
  • What are the actual outcomes of modern code review? Do they match the expectations? Code improvements are the most frequently observed outcome of code review, followed by code understanding and social communication. The outcomes do not match the expectations well. For example, only 14% of the researched review comments were about code defects, while about 44% of respondents chose finding defects as the main motivation for doing code review.
  • What are the main challenges experienced when performing modern code reviews relative to the expectations and outcomes? The main challenge is by far understanding the code under review. This occurs, for example, when a developer has to review code that is not part of the system they work on daily.

For answering RQ2:

  • Tools used: CodeFlow, a reviewing tool. It is not publicly available.
  • Company/organization: Microsoft
  • Evaluation: At the time of this paper, CodeFlow still focused mainly on finding defects, and not on the more frequently occurring outcomes of code review.

For answering RQ3:

Future research challenges:

  • Research on automating code review tasks. This mainly concerns low-level tasks, like checking boundary conditions or catching common mistakes.
  • Research on code comprehension during code review. According to the authors research has been done on this with new developers in mind, but it would also be applicable to code reviews. The authors note that IDEs often include tools for code comprehension, but code review tools do not.
  • Research on awareness and learning during code review. Those two aspects were cited as motivations for code review by developers. Future research could research these aspects more explicitly.

A.4.1.2 A Faceted Classification Scheme for Change-Based Industrial Code Review Processes

Reference: [16]

Summary The broad research questions answered in this article are: How is code review performed in industry today? Which commonalities and variations exist between code review processes of different teams and companies? The article describes a classification scheme for change-based code review processes in industry. This scheme is based on descriptions of the code review processes of eleven companies, obtained from interviews with software engineering professionals that were performed during a Grounded Theory study.

A.4.1.3 The Choice of Code Review Process: A Survey on the State of the Practice

Reference: [15]

Summary This paper, published in 2017, is trying to answer 3 RQs. Firstly, how prevalent is change-based review in the industry? Secondly, does the chance that code review remains in use increase if code review is embedded into the process (and its supporting tools) so that it does not require a conscious decision to do a review? Thirdly, are the intended and acceptable levels of review effects a mediator in determining the code review process?

A.4.1.4 The influence of non-technical factors on code review

Reference: [19]

Summary This paper focuses on the influence of several non-technical factors on code review response time and outcome. An empirical study of the code review process of WebKit, a large open source project, is described. Specifically, the authors replicated some previously studied factors and examined several additional factors that had not been explored before.

For answering RQ1:

  • Sub-topic: open-source, impact
  • Research method: empirical study
  • Tools: WebKit
  • Datasets: WebKit code review data extracted from Bugzilla.

Research questions and answers:

  • What factors can influence how long it takes for a patch to be reviewed? The organizational and personal factors influence review timeliness. Some factors that influenced the time required to review a patch, such as the size of the patch itself or the part of the code base being modified, are unsurprising and are likely related to the technical complexity of a given change. The most influential factors of the code review process on review time are the organization a patch writer is affiliated with and their level of participation within the project.

  • What factors influence the outcome of the review process? The organizational and personal factors influence the likelihood of a patch being accepted. The most influential factors of the code review process on patch acceptance are the organization a patch writer is affiliated with and their level of participation within the project.

For answering RQ3:

Future research challenges:

  • Research on how best to interpret empirical software engineering research in light of contextual factors. Understanding the reasons behind observable developer behaviour requires an understanding of the contexts, processes, and organizational and individual factors that can influence code review and its outcome.

Notes:

This paper has an extended version [18].

A.4.1.5 Investigating technical and non-technical factors influencing modern code review

Reference: [18]

Summary:

This article primarily discusses non-technical factors that influence the code review process, such as review experience, the amount of contributions to a project, and company affiliation.

It is found that the most important factors influencing the code review process, in terms of both review time and patch acceptance, are the organization affiliation of the patch writer and the amount of participation of the patch writer in the project.

For answering RQ1:

  • Sub-topic: non-technical
  • Research method: empirical; quantitative
  • Tools: Custom
  • Datasets: WebKit reviews, Google Blink reviews

Research questions and answers:

  • What factors can influence how long it takes for a patch to be reviewed? “Based on the results of two empirical studies, we found that both technical (patch size and component) , as well as non-technical (organization, patch writer experience, and reviewer activity) factors affect review timeliness when studying the effect of individual variables. While priority appears to influence review time for WebKit, we were not able to confirm this for Blink.”

  • What factors influence the outcome of the review process? “Our findings from both studies suggest that patch writer experience affects code review outcome. For the WebKit project, factors like priority, organization, and review queue also have an effect on the patch acceptance.”

For answering RQ2:

  • Tools: N/A
  • Company/Organization: N/A
  • Evaluation: N/A

Notes:

This paper has a shorter version [19].

For answering RQ3:

Future research challenges:

Not stated

A.4.1.6 Modern code reviews in open-source projects: Which problems do they fix?

Reference: [21]

Summary The paper investigates what kinds of problems are solved by code reviews. The conclusion is that 75% of the changes are improvements to the evolvability of the code, and 25% concern functional aspects.

It has also been researched which part of the review comments is actually followed up by an action, and which part of the edits after a review are actually caused by review comments.

For answering RQ1:

  • Sub-topic: impact,changes
  • Research method: empirically explore; change classification
  • Tools: R
  • Datasets: documented history of ConQAT and GROMACS

Research questions and answers:

  • Which types of changes occur in code under review? 75% of changes are related to the evolvability of the system, and only 25% to its functionality.
  • What triggered the changes occurring in code under review? 78-90% of the triggers are review comments; the remaining 10-22% are ‘undocumented’.
  • What influences the number of changes in code under review? Code churn, number of changed files and task type are the most important factors influencing the number of changes.

A.4.1.7 Lessons learned from building and deploying a code review analytics platform

Reference: [29]

Summary:

A code review analytics platform developed and used at Microsoft is discussed. The paper mainly presents what users of the system think of it and how its use influences development teams. One of the conclusions is that, in general, the platform has a positive influence on development teams and their products.

For answering RQ2:

  • Tools used: CodeFlow, CodeFlow Analytics
  • Company/organization using the tool: Microsoft
  • Evaluation of the tool: CodeFlow has already had a positive impact on development teams because of its simplicity, low barrier for feedback and flexible support of Microsoft’s disparate engineering systems. But some challenges, such as dealing with branches and linking reviews to commits, remain to be addressed.

As for CodeFlow Analytics: the tool is being used increasingly throughout Microsoft, with different teams using it for different purposes. It is, for example, effectively used to create dashboards with code review evaluation information, or for examining past reviews in detail. However, some parts of the tool still need improvement in terms of user-friendliness, for example because some functionality is difficult to find.

For answering RQ3:

Future research challenges:

  • Research on an automatic way to classify and assess the usefulness of comments. This was specifically requested by an interviewee and is still an open challenge regarding CodeFlow.
  • Research on many aspects of code review based on data from CodeFlow Analytics or other similar tools.
  • Research on methods to automatically recommend reviewers for changes in the system.

A.4.1.8 Software Reviews: The State of the Practice

Reference: [43]

Summary To investigate how industry carries out software reviews and in what forms, this paper conducted a two-part survey in 2002, the first part based on a national initiative in Germany and the second involving companies worldwide. Additionally, this paper also includes some fundamental concepts of code review, such as the functionalities of code review.

A.4.1.9 Code reviews do not find bugs: how the current code review best practice slows us down

Reference: [53]

Summary As code review has many uses and benefits, the authors hope to find out whether the current code review methods are sufficiently efficient. They also research whether other methods may be more efficient. With experience gained at Microsoft and with support of data, the authors posit (1) that code reviews often do not find functionality issues that should block a code submission; (2) that effective code reviews should be performed by people with a specific set of skills; and (3) that the social aspect of code reviews cannot be ignored.

For answering RQ1:

  • Sub-topic: impact
  • Research method: empirical
  • Tools: not mentioned
  • Datasets: data collected from engineering systems

Research questions and answers:

  • In what situations do code reviews provide more value than others? Unlike inspections, code reviews do not require participants to be in the same place, nor do they happen at a fixed, prearranged time. Aligning with the distributed nature of many projects, code reviews are asynchronous and frequently support geographically distributed reviewers.
  • What is the value of consistency of applying code reviews equally to all code changes? Code review usefulness is negatively correlated with the size of a code review. With 20 or more changed files, the more files there are in a single review, the lower the overall rate of useful feedback.

For answering RQ3:

Future research challenges:

  • Research on undocumented changes in code review, because prior research has neglected them.

  • Due to its costs, code reviewing practice is a topic deserving to be better understood, systematized and applied to software engineering workflow with more precision than the best practice currently prescribes.

A.4.1.10 Design and code inspections to reduce errors in program development

Reference: [67]

Summary This paper describes a method to thoroughly check code quality after each step of the development process, in a heavyweight manner. It does not really concern agile development.

The authors state that these methods do not affect the developing process negatively, and that they work well for improving software quality.

A.4.1.11 An exploratory study of the pull-based software development model

Reference: [75]

Summary This article focuses on how much pull requests are being used and how they are used, focusing on GitHub. For example, it is concluded that pull requests are not used that much, that pull requests are merged quickly after they have been submitted, and that a pull request not being merged is most often not caused by technical errors in the pull request.

For answering RQ1:

  • Sub-topic: open-source, in practice
  • Research method: empirical; qualitative for finding out reasons for closing pull request, rest quantitative.
  • Tools: Custom developed tools, available online
  • Datasets: GHTorrent dataset, along with data collected by authors. The last is also available online

Research questions and answers:

  • How popular is the pull based development model? “14% of repositories are using pull requests on Github. Pull requests and shared repositories are equally used among projects. Pull request usage is increasing in absolute numbers, even though the proportion of repositories using pull requests has decreased slightly.”
  • What are the lifecycle characteristics of pull requests? “Most pull requests are less than 20 lines long and processed (merged or discarded) in less than 1 day. The discussion spans on average to 3 comments, while code reviews affect the time to merge a pull request. Inclusion of test code does not affect the time or the decision to merge a pull request. Pull requests receive no special treatment, irrespective whether they come from contributors or the core team.”
  • What factors affect the decision and the time required to merge a pull request? “The decision to merge a pull request is mainly influenced by whether the pull request modifies recently modified code. The time to merge is influenced by the developer’s previous track record, the size of the project and its test coverage and the project’s openness to external contributions.”
  • Why are some pull requests not merged? “53% of pull requests are rejected for reasons having to do with the distributed nature of pull based development. Only 13% of the pull requests are rejected due to technical reasons.”

For answering RQ2:

  • Tools used: GitHub PR system
  • Company/organization: Several open-source projects
  • Evaluation: N/A

For answering RQ3:

Future research challenges:

  • More research is needed on drive-by commits, which the paper loosely defines as commits added to a repository through a PR by a user that has never contributed to the repository and hence does so for the first time. Often this new contributor also has created a fork for the sole purpose of creating this PR. More research is needed on accurately defining drive-by commits and on assessing their implications.
  • More research is needed on the effect of the democratization of the development process, which occurs for example through the use of pull requests. Democratization could for example lead to a substantially stronger commons ecosystem.
  • Validating the used models on data from different sources and on projects on different languages.
  • Research on the motives of developers to work in a highly transparent workspace.
  • Research on formation of teams and management hierarchies with respect to open-source projects.
  • Research on novel code review practices.
  • Research on ways to managing tasks in the pull-based development model.

Challenges in practice:

  • Development of tools to help the core team of a project with prioritizing their work. The paper gives as an example a tool which would suggest whether a pull request can be merged or not, because this can be predicted with fairly high accuracy.
  • Development of tools that would suggest categories of improvement for pull request, for example by suggesting that more documentation needs to be added.

A.4.1.12 The impact of code review coverage and code review participation on software quality: A case study of the qt, vtk, and itk projects

Reference: [131]

Summary This paper focuses on the influence of doing light-weight code reviews on software quality. In particular, the effect of review coverage (the part of the code that has been reviewed) and review participation (a measure for how much reviewers are involved in the review process) are being assessed.

It turns out that both aspects improve software quality when they are higher. Review participation is the most influential. According to the authors there are other aspects, which they have not looked into, that are of significant importance for the review process.

For answering RQ1:

  • Sub-topic: open-source, in practice, impact
  • Research method: qualitative for finding out the impact of code review coverage and code review participation on software quality; rest quantitative.
  • Tools: N/A
  • Datasets: Data extracted from Qt, VTK and ITK code review dataset and necessary metrics including version control metrics, coverage metrics and participation metrics.

Research questions and answers:

  • Is there a relationship between code review coverage and post-release defects? Although review coverage is negatively associated with software quality in our models, several defect-prone components have high coverage rates, suggesting that other properties of the code review process are at play.

  • Is there a relationship between code review participation and post-release defects? Lack of participation in code review has a negative impact on software quality. Reviews without discussion are associated with higher post-release defect counts, suggesting that the amount of discussion generated during review should be considered when making integration decisions.

For answering RQ2:

  • Tools: Gerrit
  • Company/Organization: N/A
  • Evaluation: N/A

For answering RQ3:

Future research challenges:

  • Research on other properties of modern code review such as code ownership. Inspired by this paper, other properties of modern code review can also be explored.

Notes:

There exists an extended and improved version of this paper [130]. Only the original version of the paper has been included in this survey.

A.4.1.13 A Study of the Quality-Impacting Practices of Modern Code Review at Sony Mobile

Reference: [171]

Summary First the study by McIntosh et al. [130] is replicated in a proprietary setting at Sony Mobile. A qualitative study, including interviews, is also done with the question “Why are certain reviewing practices associated with better software quality?”

The results from this study are the same as those from the replicated study for RQ1, but not for RQ2. Also, the findings of the quantitative study are supported by the qualitative study.

For answering RQ1:

  • Sub-topic:
  • Research method: replication: empirical, quantitative; qualitative
  • Tools: N/A
  • Datasets: Review data from Sony Mobile

Research questions and answers:

  • Is there a relationship between code review coverage and post-release defects? “Although our review coverage model outperforms our baseline model, of the three studied review coverage metrics, only the proportion of In-House contributions contributes significantly to our model fits. Comparison with previous work. Similar to the prior work [130], we find that Reviewed Commit and Reviewed Churn provide little explanatory power, suggesting that other reviewing factors are at play.”

  • Is there a relationship between code review participation and post-release defects? “Our review participation model also outperforms our baseline model. Of the studied review participation metrics, only the measure of accumulated effort to improve code changes (Patch Sd) and the rate of author self-verification (Self Verify) contribute significantly to our model fits. Comparison with previous work. Unlike the prior work [130], code reviewing time and discussion length did not provide exploratory power to the Sony Mobile model”

For answering RQ2:

  • Tools: Gerrit
  • Company/Organization: Sony Mobile
  • Evaluation: N/A

For answering RQ3:

Future research challenges:

Not stated

A.4.1.14 ReDA: A Web-based Visualization Tool for Analyzing Modern Code Review Dataset

Reference: [184]

Summary:

This paper introduces ReDA, a web-based visualization tool for code review datasets. It processes data from Gerrit, presents statistics about the data, visualizes it, and points the user towards possible problems occurring during the review process. It was tested briefly on some open-source projects.

For answering RQ1:

  • Sub-topic: visualization; tools
  • Research method: qualitative; empirical
  • Tools: ReDA
  • Datasets: Android code review data

Research questions and answers:

N/A

For answering RQ2:

  • Tools: N/A
  • Company/Organization: N/A
  • Evaluation: N/A

For answering RQ3:

Future research challenges:

The authors aim to develop a live code review monitoring dashboard based on ReDA. They also aim to create a more portable version of ReDA that is also compatible with other tools supporting the MCR process.

A.4.1.15 Who should review my code? A file location-based code-reviewer recommendation approach for modern code review

Reference: [183]

Summary:

This paper presents (1) research on how often a reviewer cannot be found for a code change and the influence of this on the time it takes to process a code change, (2) a tool (RevFinder) for automatically suggesting reviewers based on files reviewed previously, and (3) an empirical evaluation of that tool on four open-source projects.

Of the researched projects, up to 30% of the code changes have problems finding a reviewer. These reviews take on average 12 days longer. Also, it is found that RevFinder works 3 to 4 times better than an existing tool.
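
The core idea behind RevFinder, recommending reviewers whose previously reviewed file paths resemble the files of the new change, can be sketched as follows. This is a deliberately simplified illustration with an invented path-similarity measure and toy data; it is not the actual RevFinder algorithm or its string-comparison techniques.

```python
from collections import defaultdict

def path_similarity(a: str, b: str) -> float:
    """Fraction of leading path components two file paths share."""
    pa, pb = a.split("/"), b.split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return common / max(len(pa), len(pb))

def recommend_reviewers(new_files, past_reviews, top_n=3):
    """past_reviews: list of (reviewer, reviewed_file_path) pairs.
    Scores each past reviewer by path similarity to the new change's files."""
    scores = defaultdict(float)
    for reviewer, reviewed_path in past_reviews:
        for new_path in new_files:
            scores[reviewer] += path_similarity(reviewed_path, new_path)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

history = [
    ("alice", "ui/widgets/button.cpp"),
    ("bob", "net/http/socket.cpp"),
    ("alice", "ui/widgets/label.cpp"),
]
print(recommend_reviewers(["ui/widgets/menu.cpp"], history))  # ['alice', 'bob']
```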

For answering RQ1:

  • Sub-topic: reviewers; tools
  • Research method: quantitative; empirical
  • Tools: Custom
  • Datasets: Custom: Gerrit review data from Android, OpenStack, Qt and LibreOffice

Research questions and answers:

  • How do reviews with code-reviewer assignment problem impact reviewing time? “4%-30% of reviews have code-reviewer assignment problem. These reviews significantly take 12 days longer to approve a code change. A code-reviewer recommendation tool is necessary in distributed software development to speed up a code review process.”
  • Does RevFinder accurately recommend code-reviewers? “RevFinder correctly recommended 79% of reviews with a top-10 recommendation. RevFinder is 4 times more accurate than ReviewBot. This indicates that leveraging a similarity of previously reviewed file path can accurately recommend code-reviewers.”
  • Does RevFinder provide better ranking of recommended code-reviewers? “RevFinder recommended the correct code-reviewers with a median rank of 4. The code-reviewers ranking of RevFinder is 3 times better than that of ReviewBot, indicating that RevFinder provides a better ranking of correct code-reviewers.”

For answering RQ2:

  • Tools: Gerrit
  • Company/Organization: Google (Android), OpenStack, Qt, The Document Foundation (LibreOffice)
  • Evaluation: N/A

For answering RQ3:

Future research challenges:

Researching how RevFinder works in practice, in terms of how effectively and practically it helps developers in recommending code-reviewers, when deployed in a live development environment.

A.4.1.16 Revisiting code ownership and its relationship with software quality in the scope of modern code review

Reference: [182]

Summary:

This paper researches the effect code reviews have on code ownership. This question is answered by looking at two open-source projects. It was found that many contributors do not submit code changes for a specific module, but still do a substantial amount of reviewing. It was also found that code containing post-release errors has often been reviewed or authored by people who neither author nor review often.

For answering RQ1:

  • Sub-topic: code ownership
  • Research method: empirical; quantitative
  • Tools: R; Custom
  • Datasets: Review dataset from Hamasaki et al. [79]. Code dataset from the Qt system from McIntosh et al. [131]. Amended with custom datasets for Qt and OpenStack.

Research questions and answers:

  • How do code authoring and reviewing contributions differ? “The developers who only contribute to a module by reviewing code changes account for the largest set of contributors to that module. Moreover, 18%-50% of these review-only developers are documented core developers of the studied systems, suggesting that code ownership heuristics that only consider authorship activity are missing the activity of these major contributors.”

  • Should code review activity be used to refine traditional code ownership heuristics? “Many minor authors are major reviewers who actually make large contributions to the evolution of modules by reviewing code changes. Code review activity can be used to refine traditional code ownership heuristics to more accurately identify the defect-prone modules.”

  • Is there a relationship between review-specific and review-aware code ownership heuristics and defect-proneness? “Even when we control for several confounding factors, the proportion of developers in the minor author & minor reviewer category shares a strong relationship with defectproneness. Indeed, modules with a larger proportion of developers without authorship or reviewing expertise are more likely to be defect-prone.”

For answering RQ2:

  • Tools: Gerrit
  • Company/Organization: The Qt, OpenStack, VTK and ITK projects
  • Evaluation: N/A

A.4.1.17 Review participation in modern code review

Reference: [181]

Summary This paper discusses the factors that influence review participation in code review. Previous studies identified that review participation influences the code review process significantly, but did not study the factors that actually influence review participation.

It was most importantly found that “(…) the review participation history, the description length, the number of days since the last modification of files, the past involvement of an author, and the past involvement of reviewers share a strong relationship with the likelihood that a patch will suffer from poor review participation.”

For answering RQ1:

  • Sub-topic: review participation
  • Research method: empirical; quantitative
  • Tools: N/A
  • Datasets: Review data for the Android, Qt and OpenStack projects

Research questions and answers:

  • What patch characteristics share a relationship with the likelihood of a patch not being selected by reviewers? “We find that the number of reviewers of prior patches, the number of days since the last modification of the patched files share a strong increasing relationship with the likelihood that a patch will have at least one reviewer. The description length is also a strong indicator of a patch that is likely to not be selected by reviewers.”

  • What patch characteristics share a relationship with the likelihood of a patch not being discussed? “We find that the description length, churn, and the discussion length of prior patches share an increasing relationship with the likelihood that a patch will be discussed. We also find that the past involvement of reviewers shares an increasing relationship with the likelihood. On the other hand, the past involvement of an author shares an inverse relationship with the likelihood.”

  • What patch characteristics share a relationship with the likelihood of a patch receiving slow initial feedback? “We find that the feedback delay of prior patches shares a strong relationship with the likelihood that a patch will receive slow initial feedback. Furthermore, a patch is likely to receive slow initial feedback if its purpose is to introduces new features.”

For answering RQ2:

  • Tools: Gerrit
  • Company/Organization: Android, Qt and OpenStack
  • Evaluation: N/A

For answering RQ3:

Future research challenges:

The paper notes that it assumes that the review process is the same for a whole project, even for larger projects. Future work should examine whether there are differences in review processes across subsystems.

A.4.1.18 Mining the Modern Code Review Repositories: A Dataset of People, Process and Product

Reference: [195]

Summary:

This paper introduces a dataset that has been systematically collected from review data from several projects. The subject projects are OpenStack, LibreOffice, AOSP, Qt and Eclipse. The dataset is made public for the purpose of doing further research using it. Also, tools may be tested on the data in the dataset, in order to have one benchmark dataset to compare different tools.

For answering RQ1:

  • Sub-topic: tools; dataset
  • Research method: N/A
  • Tools: N/A
  • Datasets: Review data from the OpenStack, LibreOffice, AOSP, Qt and Eclipse projects

Research questions and answers: N/A

For answering RQ2:

  • Tools: Gerrit
  • Company/Organization: OpenStack, LibreOffice, AOSP, Qt, Eclipse
  • Evaluation: N/A

For answering RQ3:

Future research challenges:

Research using the dataset that has been created, and tests of tools on the dataset.

A.4.1.19 Automatically recommending peer reviewers in modern code review

Reference: [198]

Summary:

This paper introduces cHRev, a reviewer recommendation approach that, according to the paper, works better in most circumstances than RevFinder, introduced by Thongtanunam et al. [183]. It recommends reviewers based on their previous review activity. For this it notably uses the frequency of reviews for a specific part of the system and how recent the reviewing activity was.
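
As a toy illustration of combining review frequency and recency, the following sketch scores past reviewers of the changed files with an exponentially decaying weight on older reviews. The data, half-life parameter, and scoring function are invented for illustration; this is not the actual cHRev model.

```python
from collections import defaultdict
from datetime import date

def score_reviewers(review_history, changed_files, today, half_life_days=90):
    """review_history: list of (reviewer, file_path, review_date) triples.
    Each past review of a changed file contributes a weight that halves
    every `half_life_days`, so frequent and recent reviewers score highest."""
    scores = defaultdict(float)
    for reviewer, path, review_date in review_history:
        if path in changed_files:
            age = (today - review_date).days
            scores[reviewer] += 0.5 ** (age / half_life_days)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

history = [
    ("alice", "ui/button.cpp", date(2018, 9, 1)),
    ("alice", "ui/button.cpp", date(2018, 3, 1)),
    ("bob",   "ui/button.cpp", date(2018, 9, 20)),
]
print(score_reviewers(history, {"ui/button.cpp"}, today=date(2018, 10, 1)))
```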

For answering RQ1:

  • Sub-topic: reviewer recommendation
  • Research method: quantitative; empirical
  • Tools: Custom
  • Datasets: Reviewing data for Mylyn, Eclipse, Android, and MS Office

Research questions and answers:

  • What is the accuracy of cHRev in recommending reviewers on real software systems across closed and open source projects? “cHRev makes accurate reviewer recommendations in terms of precision and recall. On average, less than two recommendations are needed to find the first correct reviewer in both closed and open source systems.”

  • How do the accuracies of cHRev (trained from the code review history), REVFINDER (also, trained from the code review history, albeit differently), xFinder (trained from the commit history), and RevCom (trained from a combination of the code review and commit histories) compare in recommending code reviewers? “cHRev performs much better than REVFINDER which is based on reviewers of files with similar names and paths and xFinder which relies on source code repository data, and cHRev is statistically equivalent to RevCom which requires both past reviews and commits.”

For answering RQ2:

  • Tools: Gerrit; CodeFlow
  • Company/Organization: CodeFlow by Microsoft; Gerrit by the other three projects
  • Evaluation: N/A

For answering RQ3:

Future research challenges:

The authors plan to include textual analysis of review comments and additional measures of reviewers’ contributions and impact in their approach.

A.4.2 Excluded papers

The following papers have been excluded from the survey. These papers are candidates, but have not been added to the final survey for the stated reason.

  • [46]: This book is not accessible via the TU Delft subscription of Safari Books Online, and hence we could not read it to include it in the survey.
  • [130]: This is an extended and improved version of a paper already included in the survey. Because of time constraints we will not reconsider this version.
  • [67]: This paper does not conform to our exclusion criterion saying that it should be published in 2008 or later.

A.4.3 Table 1

Title Year Reference In survey? (Y/N)
Expectations, outcomes, and challenges of modern code review 2013 [11] Y
Modern code reviews in open-source projects: Which problems do they fix? 2014 [21] Y
Lessons learned from building and deploying a code review analytics platform 2015 [29] Y
An exploratory study of the pull-based software development model 2014 [75] Y
The impact of code review coverage and code review participation on software quality: A case study of the qt, vtk, and itk projects 2014 [131] Y

A.4.4 Table 2

Title Year Reference Search date Result number In survey? (Y/N)
Investigating technical and non-technical factors influencing modern code review 2016 [18] 29-09-2018 9 Y
Modern code review 2010 [46] 25-09-2018 1 N
An empirical study of the impact of modern code review practices on software quality 2016 [130] 25-09-2018 4 N
A Study of the Quality-Impacting Practices of Modern Code Review at Sony Mobile 2016 [171] 29-09-2018 11 Y
Reda: A web-based visualization tool for analyzing modern code review dataset 2014 [184] 29-09-2018 8 Y
Who should review my code? A file location-based code-reviewer recommendation approach for modern code review 2015 [183] 29-09-2018 5 Y
Revisiting code ownership and its relationship with software quality in the scope of modern code review 2016 [182] 29-09-2018 6 Y
Review participation in modern code review 2017 [181] 29-09-2018 10 Y
Mining the Modern Code Review Repositories: A Dataset of People, Process and Product 2016 [195] 29-09-2018 12 Y
Automatically recommending peer reviewers in modern code review 2016 [198] 29-09-2018 7 Y

A.4.5 Table 3

Title Year Reference In survey? (Y/N)
A Faceted Classification Scheme for Change-Based Industrial Code Review Processes 2016 [16] Y
The Choice of Code Review Process: A Survey on the State of the Practice 2017 [15] Y
The influence of non-technical factors on code review 2013 [19] Y
Impact of peer code review on peer impression formation: A survey 2013 [33] N
Software Reviews: The State of the Practice 2003 [43] N
Code reviews do not find bugs: how the current code review best practice slows us down 2015 [53] Y

References

[28] Bird, C. and Zimmermann, T. 2017. Predicting software build errors. Google Patents.

[24] Beller, M. et al. 2017. Oops, my tests broke the build: An explorative analysis of travis ci with github. Mining software repositories (msr), 2017 ieee/acm 14th international conference on (2017), 356–367.

[160] Rausch, T. et al. 2017. An empirical analysis of build failures in the continuous integration workflows of java-based open-source software. Proceedings of the 14th international conference on mining software repositories (2017), 345–355.

[25] Beller, M. et al. 2017. Travistorrent: Synthesizing travis ci and github for full-stack research on continuous integration. Proceedings of the 14th international conference on mining software repositories (2017), 447–450.

[149] Pinto, G. et al. 2018. Work practices and challenges in continuous integration: A survey with travis ci users. (2018).

[200] Zhao, Y. et al. 2017. The impact of continuous integration on other software development practices: A large-scale empirical study. Proceedings of the 32nd ieee/acm international conference on automated software engineering (2017), 60–71.

[193] Widder, D.G. et al. 2018. I’m leaving you, travis: A continuous integration breakup story. (2018).

[86] Hilton, M. et al. 2016. Usage, costs, and benefits of continuous integration in open-source projects. Proceedings of the 31st ieee/acm international conference on automated software engineering (2016), 426–437.

[188] Vassallo, C. et al. 2017. A tale of ci build failures: An open source and a financial organization perspective. Software maintenance and evolution (icsme), 2017 ieee international conference on (2017), 183–193.

[82] Hassan, F. and Wang, X. 2018. HireBuild: An automatic approach to history-driven repair of build scripts. Proceedings of the 40th international conference on software engineering (2018), 1078–1089.

[187] Vassallo, C. et al. 2018. Un-break my build: Assisting developers with build repair hints. (2018).

[197] Zampetti, F. et al. 2017. How open source projects use static code analysis tools in continuous integration pipelines. Mining software repositories (msr), 2017 ieee/acm 14th international conference on (2017), 334–344.

[12] Baltes, S. et al. 2018. (No) influence of continuous integration on the commit activity in github projects. arXiv preprint arXiv:1802.08441. (2018).

[30] Bisong, E. et al. 2017. Built to last or built too fast?: Evaluating prediction models for build times. Proceedings of the 14th international conference on mining software repositories (2017), 487–490.

[166] Santolucito, M. et al. 2018. Statically verifying continuous integration configurations. arXiv preprint arXiv:1805.04473. (2018).

[140] Ni, A. and Li, M. 2018. ACONA: Active online model adaptation for predicting continuous integration build failures. Proceedings of the 40th international conference on software engineering: Companion proceedings (2018), 366–367.

[68] Fowler, M. and Foemmel, M. 2006. Continuous integration. ThoughtWorks. http://www.thoughtworks.com/ContinuousIntegration.pdf. 122, (2006), 14.

[176] Stolberg, S. 2009. Enabling agile testing through continuous integration. Agile conference, 2009. agile’09. (2009), 369–374.

[186] Vasilescu, B. et al. 2014. Continuous integration in a social-coding world: Empirical evidence from github. Software maintenance and evolution (icsme), 2014 ieee international conference on (2014), 401–405.

[2] Abate, P. et al. 2009. Strong dependencies between software components. 2009 3rd international symposium on empirical software engineering and measurement (Oct. 2009).

[1] Abate, P. and Cosmo, R.D. 2011. Predicting upgrade failures using dependency analysis. 2011 IEEE 27th international conference on data engineering workshops (Apr. 2011).

[3] Abdalkareem, R. et al. 2017. Why do developers use trivial packages? An empirical case study on npm. Proceedings of the 2017 11th joint meeting on foundations of software engineering - ESEC/FSE 2017 (2017).

[32] Bogart, C. et al. 2016. How to break an API: Cost negotiation and community values in three software ecosystems. Proceedings of the 2016 24th ACM SIGSOFT international symposium on foundations of software engineering - FSE 2016 (2016).

[45] Claes, M. et al. 2015. A historical analysis of debian package incompatibilities. 2015 IEEE/ACM 12th working conference on mining software repositories (May 2015).

[47] Constantinou, E. and Mens, T. 2017. An empirical comparison of developer retention in the RubyGems and npm software ecosystems. Innovations in Systems and Software Engineering. 13, 2-3 (Aug. 2017), 101–115.

[84] Hejderup, J. et al. 2018. Software ecosystem call graph for dependency management. Proceedings of the 40th international conference on software engineering new ideas and emerging results - ICSE-NIER 18 (2018).

[99] Kikas, R. et al. 2017. Structure and evolution of package dependency networks. 2017 IEEE/ACM 14th international conference on mining software repositories (MSR) (May 2017).

[106] Kula, R.G. et al. 2017. Do developers update their library dependencies? Empirical Software Engineering. 23, 1 (May 2017), 384–417.

[132] Mens, T. et al. 2013. Studying evolving software ecosystems based on ecological models. Evolving software systems. Springer Berlin Heidelberg. 297–326.

[156] Raemaekers, S. et al. 2017. Semantic versioning and impact of breaking changes in the maven repository. Journal of Systems and Software. 129, (Jul. 2017), 140–158.

[161] Robbes, R. et al. 2012. How do developers react to API deprecation? Proceedings of the ACM SIGSOFT 20th international symposium on the foundations of software engineering - FSE 12 (2012).

[185] Trockman, A. 2018. Adding sparkle to social coding. Proceedings of the 40th international conference on software engineering companion proceedings - ICSE 18 (2018).

[55] Decan, A. et al. 2018. An empirical comparison of dependency network evolution in seven software packaging ecosystems. Empirical Software Engineering. (Feb. 2018).

[61] Dittrich, Y. 2014. Software engineering beyond the project sustaining software ecosystems. Information and Software Technology. 56, 11 (Nov. 2014), 1436–1456.

[87] Hora, A. et al. 2016. How do developers react to API evolution? A large-scale empirical study. Software Quality Journal. 26, 1 (Oct. 2016), 161–191.

[90] Izquierdo, D. et al. 2018. Software development analytics for xen: Why and how. IEEE Software. (2018), 1–1.

[91] Jansen, S. 2014. Measuring the health of open source software ecosystems: Beyond the scope of project health. Information and Software Technology. 56, 11 (Nov. 2014), 1508–1519.

[105] Kula, R.G. et al. 2017. An exploratory study on library aging by monitoring client usage in a software ecosystem. 2017 IEEE 24th international conference on software analysis, evolution and reengineering (SANER) (Feb. 2017).

[119] Malloy, B.A. and Power, J.F. 2018. An empirical analysis of the transition from python 2 to python 3. Empirical Software Engineering. (Jul. 2018).

[121] Manikas, K. 2016. Revisiting software ecosystems research: A longitudinal literature study. Journal of Systems and Software. 117, (Jul. 2016), 84–103.

[159] Rajlich, V. 2014. Software evolution and maintenance. Proceedings of the on future of software engineering - FOSE 2014 (2014).

[178] Teixeira, J. et al. 2015. Lessons learned from applying social network analysis on an industrial free/libre/open source software ecosystem. Journal of Internet Services and Applications. 6, 1 (Jul. 2015).

[17] Bavota, G. et al. 2014. How the apache community upgrades dependencies: An evolutionary study. Empirical Software Engineering. 20, 5 (Sep. 2014), 1275–1317.

[31] Blincoe, K. et al. 2015. Ecosystems in GitHub and a method for ecosystem identification using reference coupling. 2015 IEEE/ACM 12th working conference on mining software repositories (May 2015).

[50] Cox, J. et al. 2015. Measuring dependency freshness in software systems. 2015 IEEE/ACM 37th IEEE international conference on software engineering (May 2015).

[54] Decan, A. et al. 2017. An empirical comparison of dependency issues in OSS packaging ecosystems. 2017 IEEE 24th international conference on software analysis, evolution and reengineering (SANER) (Feb. 2017).

[60] Dietrich, J. et al. 2014. Broken promises: An empirical study into evolution problems in java programs caused by library upgrades. 2014 software evolution week - IEEE conference on software maintenance, reengineering, and reverse engineering (CSMR-WCRE) (Feb. 2014).

[120] Malloy, B.A. and Power, J.F. 2017. Quantifying the transition from python 2 to 3: An empirical study of python applications. 2017 ACM/IEEE international symposium on empirical software engineering and measurement (ESEM) (Nov. 2017).

[126] McDonnell, T. et al. 2013. An empirical study of API stability and adoption in the android ecosystem. 2013 IEEE international conference on software maintenance (Sep. 2013).

[104] Kitchenham, B. 2004. Procedures for performing systematic reviews. Keele, UK, Keele University. 33, 2004 (2004), 1–26.

[4] Adams, B. and McIntosh, S. 2016. Modern release engineering in a nutshell–why researchers should care. Software analysis, evolution, and reengineering (saner), 2016 ieee 23rd international conference on (2016), 78–90.

[48] Costa, D.A. da et al. 2014. An empirical study of delays in the integration of addressed issues. 2014 ieee international conference on software maintenance and evolution (2014), 281–290.

[49] Costa, D.A. da et al. 2016. The impact of switching to a rapid release cycle on the integration delay of addressed issues - an empirical study of the mozilla firefox project. 2016 ieee/acm 13th working conference on mining software repositories (msr) (2016), 374–385.

[97] Khomh, F. et al. 2015. Understanding the impact of rapid releases on software quality. Empirical Software Engineering. 20, 2 (2015), 336–373.

[98] Khomh, F. et al. 2012. Do faster releases improve software quality?: An empirical case study of mozilla firefox. Proceedings of the 9th ieee working conference on mining software repositories (Piscataway, NJ, USA, 2012), 179–188.

[95] Kaur, A. and Vig, V. 2019. On understanding the release patterns of open source java projects. Advances in Intelligent Systems and Computing. 711, (2019), 9–18.

[96] Kerzazi, N. and Robillard, P. 2013. Kanbanize the release engineering process. 2013 1st International Workshop on Release Engineering, RELENG 2013 - Proceedings (2013), 9–12.

[37] Castelluccio, M. et al. 2017. Is it safe to uplift this patch? An empirical study on mozilla firefox. Proceedings - 2017 IEEE International Conference on Software Maintenance and Evolution, ICSME 2017 (2017), 411–421.

[94] Karvonen, T. et al. 2017. Systematic literature review on the impacts of agile release engineering practices. Information and Software Technology. 86, (2017), 87–100.

[44] Claes, M. et al. 2017. Abnormal working hours: Effect of rapid releases and implications to work content. IEEE International Working Conference on Mining Software Repositories (2017), 243–247.

[69] Fujibayashi, D. et al. 2017. Does the release cycle of a library project influence when it is adopted by a client project? SANER 2017 - 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering (2017), 569–570.

[172] Souza, R. et al. 2015. Rapid releases and patch backouts: A software analytics approach. IEEE Software. 32, 2 (2015), 89–96.

[108] Laukkanen, E. et al. 2018. Comparison of release engineering practices in a large mature company and a startup. Empirical Software Engineering. (2018), 1–43.

[63] Dyck, A. et al. 2015. Towards definitions for release engineering and devops. Release engineering (releng), 2015 ieee/acm 3rd international workshop on (2015), 3–3.

[153] Plewnia, C. et al. 2014. On the influence of release engineering on software reputation. 2nd international workshop on release engineering (Mountain View, CA, USA, 2014).

[125] Mäntylä, M.V. et al. 2015. On rapid releases and software testing: A case study and a semi-systematic literature review. Empirical Software Engineering. 20, 5 (2015), 1384–1425.

[154] Poo-Caamaño, G. 2016. Release management in free and open source software ecosystems.

[177] Teixeira, J. 2017. Release early, release often and release on time. An empirical case study of release management. Open source systems: Towards robust practices (Cham, 2017), 167–181.

[163] Rodríguez, P. et al. 2017. Continuous deployment of software intensive products and services: A systematic mapping study. Journal of Systems and Software. 123, (2017), 263–291.

[41] Cesar Brandão Gomes da Silva, A. et al. 2017. Frequent releases in open source software: A systematic review. Information. 8, 3 (2017), 109.

[107] Laukkanen, E. et al. 2017. Problems, causes and solutions when adopting continuous delivery—A systematic literature review. Information and Software Technology. 82, (2017), 55–79.

[11] Bacchelli, A. and Bird, C. 2013. Expectations, outcomes, and challenges of modern code review. Proceedings of the 2013 international conference on software engineering (2013), 712–721.

[16] Baum, T. et al. 2016. A faceted classification scheme for change-based industrial code review processes. Software quality, reliability and security (qrs), 2016 ieee international conference on (2016), 74–85.

[15] Baum, T. et al. 2017. The choice of code review process: A survey on the state of the practice. International conference on product-focused software process improvement (2017), 111–127.

[19] Baysal, O. et al. 2013. The influence of non-technical factors on code review. Reverse engineering (wcre), 2013 20th working conference on (2013), 122–131.

[18] Baysal, O. et al. 2016. Investigating technical and non-technical factors influencing modern code review. Empirical Software Engineering. 21, 3 (2016), 932–959.

[21] Beller, M. et al. 2014. Modern code reviews in open-source projects: Which problems do they fix? Proceedings of the 11th working conference on mining software repositories (2014), 202–211.

[29] Bird, C. et al. 2015. Lessons learned from building and deploying a code review analytics platform. Proceedings of the 12th working conference on mining software repositories (2015), 191–201.

[43] Ciolkowski, M. et al. 2003. Software reviews: The state of the practice. IEEE Software. 6 (2003), 46–51.

[53] Czerwonka, J. et al. 2015. Code reviews do not find bugs: How the current code review best practice slows us down. Proceedings of the 37th international conference on software engineering-volume 2 (2015), 27–28.

[67] Fagan, M. 2002. Design and code inspections to reduce errors in program development. Software pioneers. Springer. 575–607.

[75] Gousios, G. et al. 2014. An exploratory study of the pull-based software development model. Proceedings of the 36th international conference on software engineering (2014), 345–355.

[131] McIntosh, S. et al. 2014. The impact of code review coverage and code review participation on software quality: A case study of the qt, vtk, and itk projects. Proceedings of the 11th working conference on mining software repositories (2014), 192–201.

[130] McIntosh, S. et al. 2016. An empirical study of the impact of modern code review practices on software quality. Empirical Software Engineering. 21, 5 (2016), 2146–2189.

[171] Shimagaki, J. et al. 2016. A study of the quality-impacting practices of modern code review at sony mobile. Software engineering companion (icse-c), ieee/acm international conference on (2016), 212–221.

[184] Thongtanunam, P. et al. 2014. ReDA: A web-based visualization tool for analyzing modern code review dataset. Software maintenance and evolution (icsme), 2014 ieee international conference on (2014), 605–608.

[183] Thongtanunam, P. et al. 2015. Who should review my code? A file location-based code-reviewer recommendation approach for modern code review. Software analysis, evolution and reengineering (saner), 2015 ieee 22nd international conference on (2015), 141–150.

[182] Thongtanunam, P. et al. 2016. Revisiting code ownership and its relationship with software quality in the scope of modern code review. Proceedings of the 38th international conference on software engineering (2016), 1039–1050.

[79] Hamasaki, K. et al. 2013. Who does what during a code review? Datasets of oss peer review repositories. Proceedings of the 10th working conference on mining software repositories (2013), 49–52.

[181] Thongtanunam, P. et al. 2017. Review participation in modern code review. Empirical Software Engineering. 22, 2 (2017), 768–817.

[195] Yang, X. et al. 2016. Mining the modern code review repositories: A dataset of people, process and product. Proceedings of the 13th international conference on mining software repositories (2016), 460–463.

[198] Zanjani, M.B. et al. 2016. Automatically recommending peer reviewers in modern code review. IEEE Transactions on Software Engineering. 42, 6 (2016), 530–543.

[46] Cohen, J. 2010. Modern code review. Making Software: What Really Works, and Why We Believe It. (2010), 329–336.

[33] Bosu, A. and Carver, J.C. 2013. Impact of peer code review on peer impression formation: A survey. Empirical software engineering and measurement, 2013 acm/ieee international symposium on (2013), 133–142.