From Data Tsunami to Actionable Insights
The data available about open source projects can feel like a tsunami, with wave after wave of overwhelming data. But there are ways to make it more manageable by finding and focusing on the metrics that matter most to you. One of the ways the open source CHAOSS project overcomes the data tsunami is by using techniques from data science to group metrics into sets that are more useful together than apart. This session will highlight how people can use data science to generate meaningful insights about their open source software communities.
This talk will start with a discussion of how to approach the tsunami of data using data-science-based approaches to move from data points toward insight and wisdom about your open source software. The first (and largest) step in a data science workflow is data collection and preprocessing. With the data already in a relational database populated by Augur, the heavy lifting is done, which lets data scientists focus on the analysis. The next section will discuss how collections of related metrics can be used to understand an aspect of your community more holistically than individual metrics can. The final portion of the talk will include examples of how to interpret the data from these collections of metrics to move beyond analysis and find tangible ways to make your open source community even better.
We’ll walk through examples using the open source Augur and 8Knot toolchain to show what is possible with structured data, a Python stack, and metrics informed by the CHAOSS project's metrics models.
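As a rough illustration of what a "collection of related metrics" over structured data might look like, here is a minimal Python sketch. It uses an in-memory SQLite database as a stand-in for a populated relational database. The table names, column names, sample rows, and the `metric_collection` helper are all hypothetical and chosen for illustration; they are not Augur's actual schema, 8Knot's code, or a CHAOSS metrics model.

```python
import sqlite3

# Hypothetical stand-in for a populated relational database.
# Table and column names are illustrative, NOT Augur's real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE commits (repo TEXT, author TEXT);
CREATE TABLE issues (repo TEXT, closed INTEGER);
INSERT INTO commits VALUES
  ('alpha', 'ann'), ('alpha', 'bob'), ('alpha', 'ann'),
  ('beta', 'cy');
INSERT INTO issues VALUES
  ('alpha', 1), ('alpha', 1), ('alpha', 0), ('alpha', 1),
  ('beta', 0), ('beta', 1);
""")


def metric_collection(conn):
    """Return {repo: {metric_name: value}}, grouping related metrics per repo."""
    # Activity metrics: total commits and number of distinct commit authors.
    activity = conn.execute(
        "SELECT repo, COUNT(*), COUNT(DISTINCT author) "
        "FROM commits GROUP BY repo"
    ).fetchall()
    # Responsiveness metric: share of issues that have been closed.
    responsiveness = conn.execute(
        "SELECT repo, AVG(closed) FROM issues GROUP BY repo"
    ).fetchall()

    result = {
        repo: {"commits": n_commits, "contributors": n_authors}
        for repo, n_commits, n_authors in activity
    }
    for repo, close_ratio in responsiveness:
        result.setdefault(repo, {})["issue_close_ratio"] = close_ratio
    return result


summary = metric_collection(conn)
for repo, metrics in sorted(summary.items()):
    print(repo, metrics)
```

Reading the three numbers side by side is the point of the sketch: a repository with many commits but a single contributor and a low issue close ratio tells a different story than any one of those figures would alone.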
The audience will walk away with tips and techniques for making sense of those waves of data, using collections of metrics and data science to produce actionable insights about their open source software.