Online communities have had a significant impact on the research landscape. Users worldwide now generate vast amounts of data, which makes it easier for academics to conduct empirical studies. Two of the most popular online communities for developers are StackOverflow and GitHub. In this post, we discuss how these two communities can be used for empirical research and outline a method to extract data from them. We also highlight three papers that use these communities to conduct empirical studies on microservice development.
StackOverflow is a question-and-answer platform where developers ask and answer technical questions related to programming. GitHub, on the other hand, is a code-hosting platform where developers collaborate on software projects. Both communities generate vast amounts of data that can be useful for empirical research.
The first step to extracting data from these communities is to identify keywords and queries that could help us find the data we need. It involves understanding the research questions and objectives and formulating queries that could capture relevant information. Researchers may use a combination of keywords, operators, and filters to refine their queries and narrow down the search results.
For StackOverflow, we can use the search bar to find posts related to a specific topic. We can use keywords related to the research question, such as "microservice development" or "software architecture." We can then filter the results by criteria such as the number of views, votes, and comments. For example, we can select posts with a high number of views or votes to focus on the content the community has found most relevant.
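The same keyword-and-filter search can also be scripted rather than typed into the search bar. The sketch below is one possible way to do it, assuming the public Stack Exchange API (its /search/advanced endpoint) is used instead of the website. The function name build_stackoverflow_query and the threshold values are illustrative assumptions, not part of the method described in this post. The code only builds the request URL; sending it and paging through results is left out.

```python
from urllib.parse import urlencode

# Assumption: the Stack Exchange API is used in place of the website search bar.
API_URL = "https://api.stackexchange.com/2.3/search/advanced"


def build_stackoverflow_query(keywords, min_votes=None, min_views=None,
                              tagged=None, page_size=50):
    """Return a search URL for StackOverflow posts matching the keywords.

    build_stackoverflow_query is a hypothetical helper name used for
    illustration only.
    """
    params = {
        "site": "stackoverflow",
        "q": keywords,          # free-text keywords from the research question
        "sort": "votes",        # rank results by score
        "order": "desc",
        "pagesize": page_size,  # the API caps this at 100
    }
    if min_votes is not None:
        # With sort=votes, "min" is a lower bound on the post score.
        params["min"] = min_votes
    if min_views is not None:
        # "views" is the minimum view count a question must have.
        params["views"] = min_views
    if tagged:
        # Multiple tags are separated by semicolons.
        params["tagged"] = ";".join(tagged)
    return f"{API_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_stackoverflow_query("microservice development",
                                    min_votes=10, min_views=1000,
                                    tagged=["microservices"]))
```

The resulting URL can be fetched with any HTTP client; the API returns JSON and enforces request quotas, so a real study would need to handle paging and rate limits.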
For GitHub, we can use the search bar to find repositories related to a specific topic. We can use keywords related to the research question, such as "microservices" or "code annotations." We can then filter the results by criteria such as the number of stars, forks, and issues. For example, we can select repositories with at least a given number of stars or forks to ensure that we are analysing the most popular repositories.
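A repository search with star and fork thresholds can be expressed with GitHub's search qualifiers. The following is a minimal sketch, assuming the GitHub REST search endpoint for repositories is used; the helper names build_github_query and build_github_search_url and the threshold values are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlencode

GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_github_query(keywords, min_stars=None, min_forks=None, language=None):
    """Combine keywords with search qualifiers into one query string."""
    terms = [keywords]
    if min_stars is not None:
        terms.append(f"stars:>={min_stars}")   # popularity filter
    if min_forks is not None:
        terms.append(f"forks:>={min_forks}")   # reuse filter
    if language:
        terms.append(f"language:{language}")   # restrict to one language
    return " ".join(terms)


def build_github_search_url(query, per_page=30):
    """Return the search URL, most-starred repositories first."""
    params = {"q": query, "sort": "stars", "order": "desc", "per_page": per_page}
    return f"{GITHUB_SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    query = build_github_query("microservices", min_stars=100,
                               min_forks=20, language="Java")
    print(query)
    print(build_github_search_url(query))
```

The same query string also works when pasted into the GitHub search bar, which makes it easy to check a filter by hand before automating it.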
Once we have identified the relevant data, we categorise and analyse it. For StackOverflow, we can categorise the posts based on their content. For example, we can categorise posts based on covered topics such as deployment, scalability, or security. We can also categorise posts based on the type of content: questions, answers, or comments. Then, we analyse the content of each category to identify patterns and trends.
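As an illustration of the categorisation step, the sketch below assigns posts to topics by simple keyword matching and counts how often each topic appears. The keyword lists, the function names, and the sample posts are all assumptions made up for this example; a real study would more likely rely on manual coding or a topic model.

```python
from collections import Counter

# Hypothetical keyword lists for the topics mentioned above.
TOPIC_KEYWORDS = {
    "deployment": ("deploy", "docker", "kubernetes", "container"),
    "scalability": ("scal", "load balanc", "throughput"),
    "security": ("security", "authentication", "oauth", "token"),
}


def categorise_post(text):
    """Return every topic whose keywords occur in the text."""
    lowered = text.lower()
    matched = [
        topic
        for topic, keywords in TOPIC_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]
    return matched or ["uncategorised"]


def count_topics(posts):
    """Count how many posts fall into each topic."""
    counts = Counter()
    for post in posts:
        text = post["title"] + " " + post.get("body", "")
        counts.update(categorise_post(text))
    return counts


if __name__ == "__main__":
    # Invented sample posts, for illustration only.
    sample_posts = [
        {"title": "How do I deploy microservices with Docker?"},
        {"title": "Securing service-to-service calls with OAuth tokens"},
        {"title": "Scaling a message queue consumer"},
        {"title": "What is a bounded context?"},
    ]
    for topic, count in count_topics(sample_posts).most_common():
        print(topic, count)
```

A post can match several topics at once, which is usually what we want when a question touches both deployment and security.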
For GitHub, we can categorise repositories by type of project (web, mobile, or desktop applications) and by programming language (for example Java, Python, or JavaScript). Then, we analyse the code in each category to identify patterns and trends.
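Grouping repositories by project type and language can be sketched as follows. The field names full_name, language, and topics mirror those returned by GitHub's repository search; the mapping from topics to project types, the function names, and the sample repositories are hypothetical and only serve to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical mapping from repository topics to a project type.
PROJECT_TYPE_TOPICS = {
    "web": {"web", "webapp", "rest-api", "spring-boot"},
    "mobile": {"android", "ios", "mobile"},
    "desktop": {"desktop", "electron", "qt"},
}


def project_type(repo):
    """Guess the project type from the repository's topics."""
    topics = set(repo.get("topics") or [])
    for ptype, markers in PROJECT_TYPE_TOPICS.items():
        if topics & markers:
            return ptype
    return "other"


def group_repositories(repos):
    """Group repository names by (project type, language)."""
    groups = defaultdict(list)
    for repo in repos:
        language = repo.get("language") or "unknown"
        groups[(project_type(repo), language)].append(repo["full_name"])
    return dict(groups)


if __name__ == "__main__":
    # Invented sample repositories, for illustration only.
    sample_repos = [
        {"full_name": "acme/orders", "language": "Java", "topics": ["spring-boot"]},
        {"full_name": "acme/shop-app", "language": "Kotlin", "topics": ["android"]},
        {"full_name": "acme/billing", "language": "Java", "topics": ["rest-api"]},
        {"full_name": "acme/notes", "language": None, "topics": []},
    ]
    for key, names in group_repositories(sample_repos).items():
        print(key, names)
```

Each group can then be analysed separately, for example by cloning only the repositories in the ("web", "Java") group.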
Three papers that utilise StackOverflow and GitHub to conduct empirical studies on microservice development are:
The first paper is "An Empirical Study on Microservice Software Development." In this paper, the authors use StackOverflow posts to identify the top concerns in microservice development. They identify the most frequently asked questions related to microservice development, such as how to implement service discovery, handle data consistency, and deploy microservices in a containerised environment. The authors then analyse the content of these posts to identify the main challenges and issues faced by developers when building microservices.
The second paper is "Semantics-Driven Learning for Microservice Annotations." In this paper, the authors use GitHub to extract code fragments to learn the relation between code fragments and annotations. They use these code fragments to train a model that can suggest annotations for microservices based on the code. They then evaluate the model using a search engine that can suggest annotations based on the code.
The third paper is "Mining the Limits of Granularity for Microservice Annotations." In this paper, the authors use GitHub to extract operations from the code of various microservices. By analysing that code, they identify patterns and similarities and cluster the operations based on their behaviour. They then mine these clusters to identify granularity limits for other operations with similar behaviour.
In conclusion, StackOverflow and GitHub are valuable resources for empirical research in academia. By using these online communities, researchers can access a vast amount of data generated by the developer community. To extract data from these communities, researchers need to identify the relevant keywords and queries and filter the results based on various criteria. Once the data is collected, it needs to be categorised and analysed to identify patterns and trends. The three papers discussed in this post demonstrate how StackOverflow and GitHub can be used to conduct empirical studies on microservice development, highlighting the potential of these communities for future research in this field.
References:
Ramírez, F., Mera-Gómez, C., Bahsoon, R., & Zhang, Y. (2022, November). Mining the Limits of Granularity for Microservice Annotations. In Service-Oriented Computing: 20th International Conference, ICSOC 2022, Seville, Spain, November 29–December 2, 2022, Proceedings (pp. 273-281). Cham: Springer Nature Switzerland.
Ramírez, F., Mera-Gómez, C., Chen, S., Bahsoon, R., & Zhang, Y. (2022, November). Semantics-Driven Learning for Microservice Annotations. In Service-Oriented Computing: 20th International Conference, ICSOC 2022, Seville, Spain, November 29–December 2, 2022, Proceedings (pp. 255-263). Cham: Springer Nature Switzerland.
Ramírez, F., et al. (2021). An Empirical Study on Microservice Software Development. In 2021 IEEE/ACM Joint 9th International Workshop on Software Engineering for Systems-of-Systems and 15th Workshop on Distributed Software Development, Software Ecosystems and Systems-of-Systems (SESoS/WDES). IEEE.