Clustering as a Catalyst for Big Data Classification (CC-BC)

Resource Type: Conference
Authors: Halder, Mithun; Shopnil, Shayanta; Arafat, Yeasir; Chowdhury, Md Muzadded; Hossain Jobayer, Sayed; Farid, Dewan Md.
Source: 2023 26th International Conference on Computer and Information Technology (ICCIT), pp. 1-6, Dec. 2023
Subject: Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Fields, Waves and Electromagnetics
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Big Data
Predictive models
Data mining
Labeling
Task analysis
Regression tree analysis
Random forests
Classification
Clustering
Auto labelling
Ensemble learning

Abstract

In supervised learning, data classification is the task of categorizing data to support data mining and informed decision-making. The central aim of a classification model is to accurately predict the category of both familiar and unfamiliar instances. Classification models in machine learning are usually trained on datasets whose instances are labeled. This paper explores an alternative way of constructing classification models, based on the similarities among instances rather than on labels annotated by experts. Labeling data is resource-intensive and time-consuming, and it is especially challenging for the large datasets known as big data. The proposed methodology clusters the big data and develops classifiers based on these clusters, bypassing the predefined class labels. This approach enhanced the performance of the classifier. Moreover, the generated clusters can be associated with the relevant class labels, introducing a link between unsupervised clustering and the supervised classification task. To validate our proposed approach, we gathered a diverse collection of datasets from Kaggle. For experimental analysis, we applied three widely recognized decision tree induction algorithms, ID3 (Iterative Dichotomiser 3), C4.5 (an extension of the ID3 algorithm), and CART (Classification & Regression Tree), together with the Naive Bayes classifier and ensemble classifiers (Random Forest, Bagging, Boosting). The outcomes of our investigation shed light on the potential of leveraging instance clustering for classification tasks, potentially revolutionizing the conventional paradigms of supervised learning in the domain of data mining and decision support.
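
The cluster-then-classify idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, so k-means with a farthest-first initialisation is assumed here; a small Gaussian Naive Bayes stands in for the classifiers listed; and the majority-vote step that ties clusters to class labels, along with every function name, the toy data, and the labels "low" and "high", is hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

def kmeans(points, k, iters=100):
    """Lloyd's k-means (assumed clustering step); returns a cluster id per point."""
    # Farthest-first initialisation keeps the sketch deterministic.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    assign = None
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        new_assign = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_assign == assign:
            break  # converged: assignments no longer change
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign

def fit_gaussian_nb(points, labels):
    """Gaussian Naive Bayes: per-label prior, feature means, and floored variances."""
    groups = defaultdict(list)
    for p, y in zip(points, labels):
        groups[y].append(p)
    model = {}
    for y, rows in groups.items():
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        variances = [max(sum((v - m) ** 2 for v in col) / len(rows), 1e-6)
                     for col, m in zip(cols, means)]
        model[y] = (len(rows) / len(points), means, variances)
    return model

def predict_nb(model, p):
    """Return the label with the highest log posterior for point p."""
    def log_post(y):
        prior, means, variances = model[y]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(p, means, variances))
    return max(model, key=log_post)

def map_clusters_to_classes(cluster_ids, known_labels):
    """Majority vote over a few annotated instances (index -> class label)."""
    votes = defaultdict(Counter)
    for i, y in known_labels.items():
        votes[cluster_ids[i]][y] += 1
    return {c: cnt.most_common(1)[0][0] for c, cnt in votes.items()}

# Toy data (hypothetical): two well-separated 2-D groups, no labels used for training.
rng = random.Random(42)
points = ([(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
          + [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(50)])

cluster_ids = kmeans(points, k=2)                 # unsupervised step
model = fit_gaussian_nb(points, cluster_ids)      # classifier trained on cluster ids
cluster_to_class = map_clusters_to_classes(cluster_ids, {0: "low", 50: "high"})

for query in [(0.5, -0.3), (9.5, 10.2)]:
    print(query, "->", cluster_to_class[predict_nb(model, query)])
```

In this sketch the classifier never sees expert labels during training; only two annotated instances are used afterwards to name the clusters, which mirrors the link between clustering and classification that the abstract describes.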
