Abstract:
Natural Language Processing (NLP) encompasses a multitude of practical applications,
including Information Retrieval, Information Extraction, Machine Translation, Text
Simplification, Sentiment Analysis, Text Summarization, Spam Filtering, Auto-prediction,
Auto-correction, Speech Recognition, Question Answering, and Natural Language Generation.
Many of these applications are essentially classification tasks, which can be performed by
machine learning models. Ensemble techniques in machine learning combine multiple models to
achieve better predictive performance than any of the individual models alone. This thesis
explores the application of ensemble learning techniques to improve classification performance
in NLP tasks.
Various ensemble learning techniques, including bagging, boosting, random forest, and voting,
are investigated experimentally. Each ensemble method is built from common base models such
as Support Vector Machines (SVM), Naive Bayes, Decision Trees, and K-Nearest Neighbors
(KNN). Performance is measured with evaluation metrics commonly used in NLP classification
tasks, including accuracy, precision, recall, and F1-score, as well as the time complexity of
the algorithms.
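As an illustrative sketch only, and not the thesis's own code, a hard-voting ensemble over these four base models, evaluated with the metrics above, could be assembled in scikit-learn roughly as follows; the toy sentiment data and hyperparameters are placeholders:

    # Minimal sketch: hard-voting ensemble of SVM, Naive Bayes, Decision Tree, KNN.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.ensemble import VotingClassifier
    from sklearn.svm import LinearSVC
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score, precision_recall_fscore_support

    # Toy sentiment data standing in for a real NLP dataset.
    train_texts = ["great movie", "terrible plot", "loved the acting",
                   "boring and slow", "fantastic direction", "awful dialogue"]
    train_labels = [1, 0, 1, 0, 1, 0]
    test_texts = ["great acting", "slow and awful"]
    test_labels = [1, 0]

    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(train_texts)
    X_test = vectorizer.transform(test_texts)

    # Hard voting: each base model casts one vote per example.
    ensemble = VotingClassifier(
        estimators=[
            ("svm", LinearSVC()),
            ("nb", MultinomialNB()),
            ("tree", DecisionTreeClassifier(random_state=0)),
            ("knn", KNeighborsClassifier(n_neighbors=3)),
        ],
        voting="hard",
    )
    ensemble.fit(X_train, train_labels)
    pred = ensemble.predict(X_test)

    acc = accuracy_score(test_labels, pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        test_labels, pred, average="binary")
    print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")

Note that with an even number of voters, hard voting can produce ties, which motivates the tie-breaking modification described below.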
The findings of the thesis suggest that ensemble methods, especially boosting, generally
outperform traditional machine learning methods on NLP classification tasks. The thesis also
presents modifications to two ensemble methods: majority voting is extended to handle the
case where a tie occurs, and bagging is modified to use a different type of sampling. Both
modifications improve performance on the evaluated datasets.
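The abstract does not specify the thesis's tie-breaking rule; purely as a hypothetical illustration, one plausible rule defers a tied vote to the base model with the highest validation accuracy:

    # Hypothetical sketch of tie-aware majority voting (not the thesis's rule):
    # ties are broken by deferring to the model with the best validation accuracy.
    from collections import Counter

    def majority_vote_with_tiebreak(votes, model_val_accuracies):
        """votes: one predicted label per base model, in the same order as
        model_val_accuracies (each model's validation accuracy)."""
        counts = Counter(votes)
        ranked_labels = counts.most_common()
        best_count = ranked_labels[0][1]
        tied = [label for label, c in ranked_labels if c == best_count]
        if len(tied) == 1:
            return tied[0]
        # Tie: take the vote of the strongest model among the tied labels.
        by_strength = sorted(range(len(votes)),
                             key=lambda i: model_val_accuracies[i], reverse=True)
        for i in by_strength:
            if votes[i] in tied:
                return votes[i]
        return tied[0]  # defensive fallback

    # Example: four models vote 2-2; the strongest model (0.91) decides.
    print(majority_vote_with_tiebreak(["pos", "neg", "pos", "neg"],
                                      [0.91, 0.84, 0.78, 0.80]))  # -> "pos"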
Overall, this research provides a comprehensive overview of ensemble learning algorithms and
their application to improving classification performance in NLP tasks, supported by
theoretical discussion, case studies, and experimental results.