Chapter 1 What is Software Analytics?

Modern software development

Modern software development can be seen as a process comprising many smaller, looping, isolated processes. Each loop is triggered by an input and produces output that triggers other processes. Moreover, each process has a different granularity level, and processes can be distributed within a team or even among members of different teams. The developers may

  • Software production begins with new feature requests, developed in co-operation with the application clients/stakeholders. In agile environments, a requirements team will develop and iterate over user stories, which are then transformed into feature requests that developers have to work on.

  • A developer loop is what we typically associate with software development. In response to a feature request, a single developer edits code to satisfy the requirements, builds it on their local workstation (if necessary), tests it either automatically or manually, and then creates a changeset in the form of a pull request or a patch to be integrated with the project mainline.

  • Within the context of a developer team, developers review and/or test each other’s changesets. They are assisted in this process by Continuous Integration, testing infrastructures and other automated or semi-automated tools. Reviewers might request changes from the originating developer. The end result of this loop is a feature merge to the project’s mainline development.

  • Then, the release team takes over. Given an incoming feature, they update the documentation and release the software (i.e., produce a new software version and store it as an immutable artifact). Where DevOps is practiced, the released version is deployed to production. After a released version is produced, the maintenance phase begins: incoming bugs are received and, after initial analysis, are triaged and forwarded to developers for fixing.

Modern software teams use a variety of tools to accomplish these tasks:

  • Version control systems (such as Git) are the heart of the collaboration infrastructure of modern software projects. Not only do they store the software code, they also act as triggers to all

  • Issue management tools (such as Jira and GitHub issues) not only keep bug reports and new feature requests, but they are increasingly used as project planning infrastructure (e.g., for backlog management) and for soliciting user feedback on new planned features.

  • Continuous Integration serves the dual role of both executing software tests and, in many cases, triggering the deployment of the built artifacts on testing or production environments.

  • Various teams use custom tools to statically analyze code (e.g., Fortify), to assist with code review (e.g., Gerrit) and to communicate (e.g., Slack).

  • Infrastructure as code tools (like Puppet and Ansible) automate the creation of deployment environments.

  • After deployment into production, tools such as the ELK stack store application runtime logs, while app stores collect user reviews.

All these tools maintain records of developer activity at every stage of development. This information can be extracted, analyzed and linked across tools and across projects. The results of the analysis can be used to identify process bottlenecks, to train models that assist developers in their workflows and to help software teams make data-driven decisions.
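As a minimal illustration of linking records across tools, the sketch below joins issue-tracker records with version-control records and derives a simple lead-time measure that could hint at a process bottleneck. The record layout, the field names (`id`, `opened`, `merged`, `message`) and the sample data are all hypothetical, and the sketch assumes that commit messages refer to issues as `#<number>`; real tools expose this information through their own export formats.

```python
import re
from datetime import datetime
from statistics import median

# Hypothetical records standing in for data exported from an issue tracker
# and a version control system; all field names are assumptions.
issues = [
    {"id": 101, "opened": datetime(2024, 1, 2)},
    {"id": 102, "opened": datetime(2024, 1, 5)},
    {"id": 103, "opened": datetime(2024, 1, 9)},
]
commits = [
    {"sha": "a1f3", "merged": datetime(2024, 1, 4),
     "message": "Fix login timeout (#101)"},
    {"sha": "b2e4", "merged": datetime(2024, 1, 19),
     "message": "Add CSV export, closes #102"},
    {"sha": "c3d5", "merged": datetime(2024, 1, 20),
     "message": "Refactor build scripts"},
]

# Assumed convention: commit messages mention issues as "#<number>".
ISSUE_REF = re.compile(r"#(\d+)")


def link_commits_to_issues(issues, commits):
    """Return (issue, commit) pairs where the commit message mentions the issue id."""
    by_id = {issue["id"]: issue for issue in issues}
    links = []
    for commit in commits:
        for ref in ISSUE_REF.findall(commit["message"]):
            issue = by_id.get(int(ref))
            if issue is not None:
                links.append((issue, commit))
    return links


def lead_times_in_days(links):
    """Days between an issue being opened and a linked commit being merged."""
    return {
        issue["id"]: (commit["merged"] - issue["opened"]).days
        for issue, commit in links
    }


links = link_commits_to_issues(issues, commits)
lead_times = lead_times_in_days(links)
print(lead_times)                   # {101: 2, 102: 14}
print(median(lead_times.values()))  # 8.0
```

In this toy data set, issue 102 takes far longer than issue 101 to reach the mainline, and issue 103 has no linked commit at all. Both observations are the kind of signal a team might investigate further.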

Within this context, various works have attempted to define what software analytics is. A selection of definitions can be seen in the table below.

| Ref | Author | Definition |
|---|---|---|
| [81] | Hassan | [Software Intelligence] offers software practitioners (not just developers) up-to-date and pertinent information to support their daily decision-making processes. |
| [36] | Buse | The idea of analytics is to leverage potentially large amounts of data into real and actionable insights. |
| [199] | Zhang | Software analytics is to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services. |
| [133] | Menzies | Software analytics is analytics on software data for managers and software engineers with the aim of empowering software development individuals and teams to gain and share insight from their data to make better decisions. |

In our process- and feedback-centric view:

The overarching goal of software analytics is to exploit the wealth of information in software repositories to optimize production, delivery and runtime operation of software.

References

[81] Hassan, A.E. and Xie, T. 2010. Software intelligence: The future of mining software engineering data. Proceedings of the FSE/SDP workshop on future of software engineering research (New York, NY, USA, 2010), 161–166.

[36] Buse, R.P. and Zimmermann, T. 2010. Analytics for software development. Proceedings of the FSE/SDP workshop on future of software engineering research (New York, NY, USA, 2010), 77–80.

[199] Zhang, D. et al. 2011. Software analytics as a learning case in practice: Approaches and experiences. Proceedings of the international workshop on machine learning technologies in software engineering (New York, NY, USA, 2011), 55–58.

[133] Menzies, T. and Zimmermann, T. 2013. Software analytics: So what? IEEE Software. 30, 4 (July 2013), 31–37.