Abstract:
Bug triage is essential for assigning bugs to developers efficiently based on their past
experience. Without this crucial process, experienced developers may be inundated with
assignments, while newer developers may be underutilized. Furthermore, improper bug
distribution among different developer types can lead to various issues, including delays,
errors, decreased capacity, and diminished job satisfaction. Previous bug triaging methods
often do not account for newly joined developers, making them ineffective in recommending
these developers for bug assignments. Consequently, these methods lead to improper task
allocation, denying new team members valuable learning opportunities during bug resolution.
Furthermore, prior research tends to overlook workload distribution among different
developer categories, neglecting the need to balance bug assignments among experienced
developers, newcomers, and those with varying skill levels. To address these issues, there is a
need for an automated bug triaging technique that not only includes new developers but also
prioritizes workload distribution among different developer categories. Therefore, this study
introduces a novel bug triaging strategy that combines two pivotal models: Bug Solving
Developer Recommendation Model (BSDRM) and Developer Scheduler (DevSched).
The first model, BSDRM, forms the core of automated bug triaging.
BSDRM uses Machine Learning (ML) algorithms and historical bug
reports to suggest developers for specific bug resolution tasks. To achieve this,
Eclipse, Mozilla, and NetBeans datasets are aggregated and split into training and testing sets.
Subsequently, a sentence embedding model is built from the training set to generate a
developer-specific word repository, while the test set is transformed into a vocabulary
list using an embedding model. BSDRM identifies eligible developers by matching their
developer-specific word repository with the bug report vocabulary list via K-Nearest
Neighbour (KNN) analysis. These developers are then categorized into three groups:
experienced, newly experienced, and fresh graduate developers, using a classification
model comprising several ML algorithms: Decision Tree (DT), Extra Tree (ET), AdaBoost
(AdC), Bagging Classifier (BC), Gradient Boosting (GB), KNN, Nearest Centroid (NC),
Bernoulli Naïve Bayes (BNB), Multinomial Naïve Bayes (MNB), Complement Naïve
Bayes (CoNB), Gaussian Naïve Bayes (GNB), Logistic Regression (LR), Perceptron (Pr),
and Multi-Layer Perceptron (MLP). Notably, the Bagging Classifier achieves 96.59%
accuracy in classifying developers by experience level.
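
The matching step described above can be illustrated with a minimal sketch. It is not the BSDRM implementation: it assumes plain bag-of-words counts in place of the sentence embedding, cosine similarity as the nearness measure, and a small hypothetical set of developer word repositories. The function names and sample data are illustrative only.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (an assumed stand-in for the sentence embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nearest_developers(bug_text, repositories, k=2):
    """Return the k developers whose word repositories are closest to the bug report."""
    bug_vec = vectorize(bug_text)
    scored = [(cosine(bug_vec, vectorize(words)), dev)
              for dev, words in repositories.items()]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [dev for _, dev in scored[:k]]

# Hypothetical developer-specific word repositories.
repositories = {
    "alice": "editor crash null pointer syntax highlighting",
    "bob": "network timeout socket connection proxy",
    "carol": "editor font rendering layout",
}

print(nearest_developers("editor crash on syntax highlighting", repositories))
```

In this toy example the two developers whose repositories mention editor-related terms rank ahead of the developer whose history is network-related.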
Alongside BSDRM, this study introduces the second model, DevSched, which
balances developer workloads. DevSched factors in workload
distribution, developer proficiency, and bug characteristics. It generates multiple developer
profiles based on historical bug reports and assigns bugs to developers by assessing the
highest similarity between bug vectors and developer corpora. DevSched also dynamically
adjusts developer workloads and refines their ratings based on performance. The study
utilizes bug reports from Eclipse, Mozilla, and NetBeans to evaluate developer performance
in the bug-triaging process. DevSched assigns and balances bugs among the
developer categories, yielding significantly lower standard deviations for the Eclipse,
NetBeans, and Mozilla datasets than conventional bug distribution processes. This
process is repeated for each bug, supporting balanced resource allocation and timely
resolution of critical issues.
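
The effect of workload balancing on the standard deviation can be shown with a small sketch. It is not the DevSched algorithm: it assumes precomputed similarity scores, a hypothetical per-developer cap (`max_load`), and the population standard deviation of assigned bug counts as the balance measure. Rating refinement is not modelled.

```python
from statistics import pstdev

def assign(bugs, developers, max_load=None):
    """Assign each bug to the most similar developer, optionally under a load cap.

    bugs: list of {developer: similarity} dicts (assumed precomputed).
    Returns the number of bugs assigned to each developer.
    """
    load = {dev: 0 for dev in developers}
    for sims in bugs:
        candidates = [d for d in developers
                      if max_load is None or load[d] < max_load]
        if not candidates:  # every developer is at the cap; consider all again
            candidates = list(developers)
        # Prefer higher similarity, then lower current load.
        chosen = max(candidates, key=lambda d: (sims.get(d, 0.0), -load[d]))
        load[chosen] += 1
    return load

# Hypothetical data: six bugs that all resemble the senior developer's history.
developers = ["senior", "mid", "junior"]
bugs = [{"senior": 0.9, "mid": 0.6, "junior": 0.4}] * 6

unbalanced = assign(bugs, developers)            # similarity only
balanced = assign(bugs, developers, max_load=2)  # similarity with a load cap

print(unbalanced, pstdev(list(unbalanced.values())))
print(balanced, pstdev(list(balanced.values())))
```

With similarity alone, all six bugs go to the senior developer. With the assumed cap of two bugs per developer, each developer receives two bugs and the standard deviation of the load falls to zero.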
Together, the two models are intended to enhance bug resolution efficiency, optimize
developer workloads, and ensure that both experienced and newer developers are used
judiciously in the bug triaging process.