Abstract:
Natural Language Processing (NLP) encompasses a multitude of practical applications,
including Information Retrieval, Information Extraction, Machine Translation, Text
Simplification, Sentiment Analysis, Text Summarization, Spam Filtering, Auto-prediction,
Auto-correction, Speech Recognition, Question Answering, and Natural Language Generation.
Many of these applications are essentially classification tasks, which can be performed by
machine learning models. Ensemble techniques in machine learning combine multiple models to
achieve better predictive performance than any of the individual models alone. This thesis
explores the application of ensemble learning techniques to improve classification performance
in NLP tasks.
Various ensemble learning techniques, including bagging, boosting, random forest, and voting,
are investigated experimentally. Each ensemble method is built from common base models such
as Support Vector Machines (SVM), Naive Bayes, Decision Trees, and K-Nearest Neighbors
(KNN). Performance is measured with evaluation metrics commonly used in NLP classification
tasks, including accuracy, precision, recall, and F1-score, as well as the time complexity of
the algorithms.
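As an illustrative sketch only, and not the thesis's own code, a hard-voting ensemble over these four base models, evaluated with the metrics above, could be assembled in scikit-learn roughly as follows; the toy sentiment data and hyperparameters are placeholders:

    # Minimal sketch: hard-voting ensemble of SVM, Naive Bayes, Decision Tree, KNN.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.ensemble import VotingClassifier
    from sklearn.svm import LinearSVC
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score, precision_recall_fscore_support

    # Toy sentiment data standing in for a real NLP dataset.
    train_texts = ["great movie", "terrible plot", "loved the acting",
                   "boring and slow", "fantastic direction", "awful dialogue"]
    train_labels = [1, 0, 1, 0, 1, 0]
    test_texts = ["great acting", "slow and awful"]
    test_labels = [1, 0]

    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(train_texts)
    X_test = vectorizer.transform(test_texts)

    # Hard voting: each base model casts one vote per example.
    ensemble = VotingClassifier(
        estimators=[
            ("svm", LinearSVC()),
            ("nb", MultinomialNB()),
            ("tree", DecisionTreeClassifier(random_state=0)),
            ("knn", KNeighborsClassifier(n_neighbors=3)),
        ],
        voting="hard",
    )
    ensemble.fit(X_train, train_labels)
    pred = ensemble.predict(X_test)

    acc = accuracy_score(test_labels, pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        test_labels, pred, average="binary")
    print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")

Note that with an even number of voters, hard voting can produce ties, which motivates the tie-breaking modification described below.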
The findings of the thesis suggest that ensemble methods, especially boosting, generally
outperform traditional machine learning methods on NLP classification tasks. The thesis also
presents modifications to two ensemble methods: majority voting is extended to handle the
case where a tie occurs, and bagging is modified to use a different type of sampling. Both
modifications improve performance on the evaluated datasets.
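The abstract does not specify the thesis's tie-breaking rule; purely as a hypothetical illustration, one plausible rule defers a tied vote to the base model with the highest validation accuracy:

    # Hypothetical sketch of tie-aware majority voting (not the thesis's rule):
    # ties are broken by deferring to the model with the best validation accuracy.
    from collections import Counter

    def majority_vote_with_tiebreak(votes, model_val_accuracies):
        """votes: one predicted label per base model, in the same order as
        model_val_accuracies (each model's validation accuracy)."""
        counts = Counter(votes)
        ranked_labels = counts.most_common()
        best_count = ranked_labels[0][1]
        tied = [label for label, c in ranked_labels if c == best_count]
        if len(tied) == 1:
            return tied[0]
        # Tie: take the vote of the strongest model among the tied labels.
        by_strength = sorted(range(len(votes)),
                             key=lambda i: model_val_accuracies[i], reverse=True)
        for i in by_strength:
            if votes[i] in tied:
                return votes[i]
        return tied[0]  # defensive fallback

    # Example: four models vote 2-2; the strongest model (0.91) decides.
    print(majority_vote_with_tiebreak(["pos", "neg", "pos", "neg"],
                                      [0.91, 0.84, 0.78, 0.80]))  # -> "pos"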
Overall, this research provides a comprehensive overview of ensemble learning algorithms and
their application to improving classification performance in NLP tasks, supported by
theoretical discussion, case studies, and experimental results.