THE EFFECT OF SMOTE ON THE PERFORMANCE OF THE RANDOM FOREST AND GRADIENT BOOSTING ALGORITHMS IN PREDICTING STROKE
Abstract
Stroke is a disease that can strike suddenly and cause progressive brain damage when blood flow in the brain is disrupted by non-traumatic causes. Common symptoms include numbness in the limbs and impaired communication. Stroke is the second leading cause of death and the third leading cause of disability worldwide. Machine learning-based prediction can help identify early signs of stroke, supporting prevention and early intervention. This study compares the performance of the Random Forest and Gradient Boosting algorithms in predicting stroke. With the SMOTE method applied to address class imbalance in the dataset, the Random Forest model performs better, achieving an accuracy of 95.5%, a precision of 78.8%, a recall of 93.1%, and an F1-score of 84.2%. In conclusion, the Random Forest algorithm outperforms Gradient Boosting in predicting stroke and shows significant potential for supporting early detection and medical decision-making.
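As a rough illustration of the two ideas the abstract relies on, the sketch below shows how SMOTE-style oversampling creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, and how accuracy, precision, recall, and F1-score follow from confusion-matrix counts. It is a minimal sketch and not the study's implementation: the function names `smote_oversample` and `classification_metrics`, the toy data, the choice of `k`, and the use of Euclidean distance on numeric features are all assumptions made for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Hypothetical helper: assumes numeric feature tuples and Euclidean distance.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority points, nearest to the base point first.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # random position on the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def classification_metrics(tp, fp, fn, tn):
    """Compute the four metrics named in the abstract from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data (hypothetical): 4 minority points against 12 majority points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
n_majority = 12
synthetic = smote_oversample(minority, n_new=n_majority - len(minority), k=3)
print(len(minority) + len(synthetic), "minority samples after oversampling")

# Hypothetical confusion-matrix counts, not the study's results.
print(classification_metrics(tp=8, fp=2, fn=2, tn=88))
```

In common practice, oversampling of this kind is applied only to the training split, so that the test set keeps the original class distribution and the reported metrics are not inflated by synthetic samples.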