Abstract:
The objective of this research is to propose a new method for automatic summarization of Bangla news documents. Existing English text summarization systems may not be directly applicable to Bangla because of the complexities of the language in its grammatical rules, sentence structure, varying placement of subject and object, etc. Research on Bangla language processing is also difficult because there are hardly any automated tools to facilitate it. To address these challenges, a new approach to Bangla news document summarization is presented here that introduces pronoun replacement and an improved version of sentence ranking. The major parts of this approach are (i) preprocessing the input document, (ii) word tagging, (iii) pronoun replacement, and (iv) sentence ranking. Pronoun replacement is applied here for the first time to minimize dangling pronouns in the summary. After pronoun replacement, sentences are ranked by considering (i) term frequency, (ii) sentence frequency, (iii) numerical figures (presented in words and digits), and (iv) title words. If two sentences have at least 60% cosine similarity, the frequency of the larger sentence is increased and the smaller sentence is removed, which eliminates redundancy. Moreover, the first sentence is given special consideration when it contains any title word. Numerical figures are identified from both words and digits so that the importance of sentences can be assessed despite the variety of forms a numerical figure can take in Bangla. To develop the proposed method, 3000 news documents were analyzed and several Bangla grammar books were studied. The effect of each incorporated feature is demonstrated through a step-by-step performance analysis. In the evaluation of the proposed method, the F-measure scores for ROUGE-1 and ROUGE-2 were 0.6003 and 0.5708, respectively, and the accuracy of pronoun replacement was 71.80%.
The proposed method reduces dangling pronouns in the summary by 89.75% compared with the latest Bangla text summarization system. The text summarization performance of the proposed method is 9.39% (based on ROUGE-1 F-measure score) and 12.52% (based on ROUGE-2 F-measure score) better than that of the latest existing method.
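
The redundancy-elimination step described above (merging sentences with at least 60% cosine similarity) might be sketched roughly as follows. This is only an illustrative sketch, not the implementation used in this work: the function names `cosine_similarity` and `merge_redundant` are hypothetical, tokens are obtained by simple whitespace splitting, similarity is computed on raw term counts, "larger" is taken to mean "more tokens", and the kept sentence is assumed to absorb the frequency count of the removed one.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity of two token lists, using raw term counts as vectors."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_redundant(sentences, threshold=0.6):
    """Return (sentence, sentence_frequency) pairs after merging near-duplicates.

    Assumption: when two sentences meet the threshold, the one with more
    tokens is kept and takes over the frequency count of the one removed.
    """
    tokenized = [s.split() for s in sentences]  # assumed whitespace tokenizer
    freq = [1] * len(sentences)
    removed = [False] * len(sentences)

    for i in range(len(sentences)):
        if removed[i]:
            continue
        for j in range(i + 1, len(sentences)):
            if removed[j]:
                continue
            if cosine_similarity(tokenized[i], tokenized[j]) < threshold:
                continue
            if len(tokenized[i]) >= len(tokenized[j]):
                keep, drop = i, j
            else:
                keep, drop = j, i
            freq[keep] += freq[drop]
            removed[drop] = True
            if drop == i:
                break  # sentence i is gone; move on to the next i

    return [(sentences[k], freq[k]) for k in range(len(sentences)) if not removed[k]]
```

The resulting sentence frequency could then serve as one of the ranking features alongside term frequency, numerical figures, and title words; how those features are weighted and combined is not specified in this abstract and is left out of the sketch.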