Comparing Rotation Forest Model And Enhanced Random Forest Model On Imbalanced Data (Application To Classification Of Poverty Households In Sampang Regency, 2019)

Ari Shobri Bukhari, Khairil Anwar Notodiputro, Bagus Sartono

Abstract


The first priority of the SDGs is poverty eradication (no poverty). In the Indonesian context, poverty is strongly associated with employment in the agriculture sector. For example, in the 2019 Susenas data for Sampang Regency, more than 80% of the sample households categorized as poor have a household head who works in agriculture. Poverty alleviation efforts begin with identifying and classifying poor households. However, data on poor households are usually unbalanced and require special handling in analysis. This study uses classification models that are widely used in data science today, namely Random Forest and two of its extensions (Rotation Forest and Enhanced Random Forest), to classify poor and non-poor households. The results show that the forest-based models studied perform poorly when applied to unbalanced data, so an approach such as resampling is needed before classification. This study cannot conclude which forest model is the most robust to unbalanced data, or which model is best suited to resampling techniques, but the results show that resampling improves the quality of the classification results, especially sensitivity.
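As a minimal illustration of why sensitivity is the measure at stake, the sketch below shows one possible resampling step (random oversampling of the minority class) and a sensitivity calculation. It is not the procedure used in the study: the abstract does not say which resampling technique was applied, and the function names, the toy data, and the "poor" / "non-poor" labels are assumptions made for this example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def sensitivity(y_true, y_pred, positive="poor"):
    """Share of truly positive cases that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy data: 2 poor households, 8 non-poor households.
rows = [[i] for i in range(10)]
labels = ["poor"] * 2 + ["non-poor"] * 8

bal_rows, bal_labels = random_oversample(rows, labels)
class_counts = Counter(bal_labels)

# A classifier that always predicts the majority class is accurate
# on most rows but never detects a poor household.
y_true = ["poor", "poor", "non-poor", "non-poor"]
majority_pred = ["non-poor"] * 4
majority_sensitivity = sensitivity(y_true, majority_pred)
```

In practice, resampling of this kind would normally be applied to the training split only, so that the evaluation data keep the original class proportions.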

 

 


Keywords


Poverty household classification, unbalanced data, random forest, rotation forest, enhanced random forest, resampling technique






DOI: https://doi.org/10.21776/ub.jepa.2023.007.02.44
