“BANGLA TEXT CLASSIFICATION THROUGH MACHINE LEARNING ALGORITHM”

Tarek, Ashraf Uddin; Chowdhury, Satirtha; Hossain, Mosharaf

dc.contributor.author	Tarek, Ashraf Uddin
dc.contributor.author	Chowdhury, Satirtha
dc.contributor.author	Hossain, Mosharaf
dc.date.accessioned	2024-03-27T08:17:00Z
dc.date.available	2024-03-27T08:17:00Z
dc.date.issued	2023-07
dc.description	A Dissertation Submitted in Fulfilment of the Requirements for the Degree of Bachelor of Science (B.Sc.) Page: 1-62	en_US
dc.description.abstract	The growing number of social media users and e-commerce platforms has a great impact on people's daily lives. People share their emotions and opinions through social media platforms. These emotions and comments have become an important subject of analysis because business firms plan what to produce, and consumers decide what to buy, based on them. Much work has been carried out on sentiment and emotion analysis in English. Owing to the complexity of the Bangla language, little work has been done on it, although in recent years several researchers have carried out research based on Bangla. In this paper we conduct sentiment analysis (SA) on the Bangla language using machine learning (ML) models. Most existing work divides sentiment into three categories; in this paper we divide it into four, namely strongly positive, positive, negative, and strongly negative. We use the logistic regression (LR), decision tree (DT), random forest (RF), linear support vector machine (LSVM), and kernel support vector machine (KSVM) algorithms, together with a confusion matrix. For the kernel SVM we mainly use the Gaussian radial basis function (RBF) kernel. The sentiments are converted to NumPy arrays so that they can be used in machine learning. Since a NumPy array is numeric, we train our models on these data to predict whether any given sentiment is positive or negative. The data used are all raw (primary) data collected from different microblog websites and social media platforms. The total number of raw data items is 10851, most of them collected from Facebook and YouTube because of their popularity in our country; some were collected from Twitter as well.
All the ML models are first applied to the single words of the sentences, known as unigram feature analysis. Based on single words, the logistic regression model and the RBF SVM provide the highest accuracies, 71.26% and 71.91% respectively. With two words (bigrams), each model performs almost as it does with unigrams; LR again shows the highest accuracy, while the accuracy of the RBF SVM drops slightly. The bigram accuracies for LR and RBF SVM are 72.04% and 68.79% respectively. We then apply the models to three words, defined as trigram feature analysis, where LR achieves the highest accuracy, 70.87%. Most papers use SentiWordNet to assess the polarity of sentiments, but in this paper we use word-by-word analysis, which is hardly seen in other papers. This paper will help business firms as well as consumers make their decisions and will serve as a guideline for new researchers on this topic.	en_US
dc.identifier.uri	http://dspace.iiuc.ac.bd:8080/xmlui/handle/123456789/8135
dc.publisher	International Islamic University Chittagong, Department of Computer Science and Engineering	en_US
dc.subject	Facebook	en_US
dc.subject	YouTube	en_US
dc.subject	Twitter	en_US
dc.subject	Unigram	en_US
dc.subject	Bigram	en_US
dc.subject	Trigram	en_US
dc.subject	Machine learning Model	en_US
dc.subject	Primary data	en_US
dc.subject	Feature	en_US
dc.title	“BANGLA TEXT CLASSIFICATION THROUGH MACHINE LEARNING ALGORITHM”	en_US
dc.type	Thesis	en_US
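
The unigram, bigram, and trigram feature analysis described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the thesis's actual code: the function names (`word_ngrams`, `build_vocabulary`, `vectorize`), the whitespace tokenization, and the two toy comments are hypothetical, and plain Python lists stand in for the NumPy arrays the abstract mentions.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of text as space-joined strings."""
    tokens = text.split()  # assumption: simple whitespace tokenization
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(texts, n):
    """Map each distinct n-gram in the corpus to a column index."""
    vocab = {}
    for text in texts:
        for gram in word_ngrams(text, n):
            vocab.setdefault(gram, len(vocab))
    return vocab


def vectorize(text, vocab, n):
    """Count vector of text over vocab; n-grams not in vocab are ignored."""
    counts = Counter(word_ngrams(text, n))
    row = [0] * len(vocab)
    for gram, count in counts.items():
        if gram in vocab:
            row[vocab[gram]] = count
    return row


# Hypothetical toy comments, not taken from the thesis dataset.
comments = ["খুব ভালো পণ্য", "খুব খারাপ পণ্য"]

unigram_vocab = build_vocabulary(comments, 1)
bigram_vocab = build_vocabulary(comments, 2)

# One numeric row per comment; rows like these could be fed to a classifier.
unigram_rows = [vectorize(c, unigram_vocab, 1) for c in comments]
bigram_rows = [vectorize(c, bigram_vocab, 2) for c in comments]
```

Each row is a numeric representation of one comment, which is the kind of input a classifier such as LR or an RBF SVM would be trained on; changing `n` from 1 to 2 or 3 switches between the unigram, bigram, and trigram settings.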

Files

Original bundle

Name: ID C191011, C191020, C183032.pdf
Size: 1.43 MB
Format: Adobe Portable Document Format

Collections

Bachelor of Computer Science and Engineering (CSE)