Text mining (also known as text processing) refers to methods for analysing pieces of unstructured text in order to identify meaningful sequences in them, represent their meaning and classify them into classes related to their content. Examples of text mining techniques are sentiment analysis, emotion detection, keyword extraction, meaning representation and document classification.
In parallel to software development, Open Source Software (OSS) communities generate a wealth of textual messages to facilitate communication while working collaboratively. This text is persisted in bug trackers, repositories and email archives and is rarely exploited, although it can answer questions such as:
- How vibrant is the community that develops and uses an OSS project?
- What level of support would you receive as a novice or an experienced user?
- How quickly are bugs addressed and fixed, depending on their severity?
- How do users feel at the end of a discussion thread about an issue that they came across while using an OSS project?
This information can help developers decide which OSS project best suits their purposes, out of the large variety of projects available. Moreover, while developing, being able to search bug reports, comments and discussion threads associated with the software being developed can speed up debugging and development.
CrossMiner/Scava is a platform that analyses code, documentation, online discussions and issue trackers related to OSS projects, extracts knowledge and injects it into the Eclipse IDE at the moment developers need it to make design decisions. This allows them to reduce the effort they spend on knowledge acquisition and to increase the quality of their code. CrossMiner is a project funded by the European Commission’s Horizon 2020 Programme. I am involved in CrossMiner as the leader of the text mining team at Edge Hill University, United Kingdom.
What will I learn?
This talk will briefly introduce how text mining methods work, and how they can be applied to collections of textual communication messages associated with OSS development. We will discuss how the processed outcomes of text mining can enrich text before it is stored in search engines, where it is available to developers while programming. Finally, the talk will introduce the CrossMiner/Scava platform and Eclipse plugin and present some use cases and results.
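To give a flavour of what enriching text before indexing can look like, here is a minimal Python sketch. It is not the CrossMiner/Scava implementation: the function `enrich`, the tiny word lists and the output field names are all hypothetical, chosen only for illustration. It annotates one message with frequency-based keywords and a lexicon-based sentiment label, producing a record that a search engine could index alongside the raw text.

```python
import json
import re
from collections import Counter

# Hypothetical toy lexicons, for illustration only (not CrossMiner/Scava resources).
POSITIVE = {"thanks", "great", "fixed", "works", "solved"}
NEGATIVE = {"crash", "broken", "fails", "error", "bug"}
STOPWORDS = {"the", "a", "an", "is", "it", "on", "in", "to", "and",
             "for", "this", "that", "with", "of", "when", "i"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def enrich(message, top_k=3):
    """Return the message plus simple text mining annotations."""
    tokens = tokenize(message)
    content_words = [t for t in tokens if t not in STOPWORDS]
    # Keyword extraction: the most frequent non-stopword tokens.
    keywords = [word for word, _ in Counter(content_words).most_common(top_k)]
    # Sentiment analysis: count of positive words minus count of negative words.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        sentiment = "positive"
    elif score < 0:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    return {"text": message, "keywords": keywords, "sentiment": sentiment}


# The enriched record is plain JSON, ready to be sent to a search index.
doc = enrich("The parser fails with an error when the parser reads empty input")
print(json.dumps(doc, indent=2))
```

Once such annotations are stored with each message, a developer could filter discussion threads by sentiment or search them by extracted keyword rather than by raw text alone. Real systems would replace the word lists with trained models.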
Audience
The talk is intended for software designers and developers who have an interest in mining data, text in particular, to improve software design and decision-making. There are no particular prerequisites for attending the talk.