Theses, Dissertations & Reports
Permanent URI for this community: http://dspace.iiuc.ac.bd/handle/88203/66
Item: “BANGLA TEXT CLASSIFICATION THROUGH MACHINE LEARNING ALGORITHM” (International Islamic University Chittagong, Department of Computer Science and Engineering, 2023-07) Tarek, Ashraf Uddin; Chowdhury, Satirtha; Hossain, Mosharaf
The increasing number of social media users and e-commerce platforms has a great impact on people's daily lives. People share their emotions and opinions through social media platforms. These emotions and comments have become an important subject of analysis because, based on them, business firms plan what to produce and consumers decide what to buy. A lot of work has been carried out to analyze sentiment or emotion in the English language. Due to the complexity of the Bangla language, little work has been done on it, but in recent years several researchers have been carrying out research on Bangla. In this paper we conduct sentiment analysis (SA) on the Bangla language using machine learning (ML) models. Most previous work divides sentiment into three categories; in this paper we divide it into four categories, namely strongly positive, positive, negative, and strongly negative. We use the logistic regression (LR), decision tree (DT), random forest (RF), linear support vector machine (LSVM), and kernel support vector machine (KSVM) algorithms, along with a confusion matrix. For the kernel SVM we mainly used the Gaussian radial basis function (RBF) kernel. The sentiments are converted to NumPy arrays so that they can be used in machine learning. Since the NumPy arrays are numeric, we train our models on these data to predict whether a given sentiment is positive or negative. The data used are all raw (primary) data collected from different microblog websites and social media platforms.
The total number of raw data items is 10851; most were collected from Facebook and YouTube because of their popularity in our country, and some were collected from Twitter as well. All the ML models were first applied to the single words of the sentences, known as unigram feature analysis; based on single words, the logistic regression model and RBF SVM provide the highest accuracies, 71.26% and 71.91% respectively. With two-word (bigram) features each model performs almost as it does with unigrams; LR again shows the highest accuracy, but the accuracy drops slightly for RBF SVM. The accuracies for LR and RBF SVM are 72.04% and 68.79% respectively. We then applied the models to three-word sequences, known as trigram feature analysis, where LR achieved the highest accuracy of 70.87%. Most papers use SentiWordNet to assess the polarity of sentiments, but in this paper we use word-by-word analysis, which is hardly seen in any paper. This paper will help business firms as well as consumers make their decisions and will serve as a guideline for new researchers on this topic.
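
The unigram, bigram, and trigram feature analysis described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the toy comments, the label-to-integer mapping, and the helper names (`word_ngrams`, `build_vocabulary`, `vectorize`) are all assumptions made for illustration, and whitespace tokenisation is assumed in place of whatever preprocessing the thesis used.

```python
from collections import Counter


def word_ngrams(sentence, n):
    """Return the word n-grams of a whitespace-tokenised sentence."""
    tokens = sentence.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(sentences, n):
    """Map every n-gram seen in the corpus to a column index."""
    vocab = {}
    for sentence in sentences:
        for gram in word_ngrams(sentence, n):
            vocab.setdefault(gram, len(vocab))
    return vocab


def vectorize(sentence, vocab, n):
    """Build the n-gram count vector of one sentence over a fixed vocabulary."""
    counts = Counter(word_ngrams(sentence, n))
    row = [0] * len(vocab)
    for gram, count in counts.items():
        if gram in vocab:
            row[vocab[gram]] = count
    return row


# Hypothetical four-class encoding; the thesis does not state its mapping.
LABELS = {
    "strongly negative": 0,
    "negative": 1,
    "positive": 2,
    "strongly positive": 3,
}

# Hypothetical toy comments, not taken from the thesis data set.
corpus = [
    ("খুব ভালো পণ্য", "strongly positive"),
    ("ভালো পণ্য", "positive"),
    ("খারাপ পণ্য", "negative"),
    ("খুব খারাপ পণ্য", "strongly negative"),
]

texts = [text for text, _ in corpus]
y = [LABELS[label] for _, label in corpus]

# One feature matrix per n-gram order: 1 = unigram, 2 = bigram, 3 = trigram.
features = {}
for n in (1, 2, 3):
    vocab = build_vocabulary(texts, n)
    features[n] = [vectorize(text, vocab, n) for text in texts]
    print(f"{n}-gram vocabulary size: {len(vocab)}")
```

The lists in `features[n]` and `y` are plain numeric data, so they could be converted with `numpy.array` and passed to a classifier; in scikit-learn, for instance, `CountVectorizer(ngram_range=(n, n))` produces comparable count features, and `LogisticRegression` or `SVC(kernel="rbf")` would correspond to the LR and RBF SVM models named above. Note that two-word comments yield no trigrams, so their trigram rows are all zeros.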