Data Management Analysis for Predicting Stroke using RapidMiner
Keywords:
stroke, data mining, decision tree
Abstract
Stroke is known as the second leading cause of death. Because of this, data mining techniques are already being used to predict which patients may have a stroke. In this study, we apply data mining techniques in RapidMiner to find information and patterns related to stroke in a dataset obtained from Kaggle. Three techniques are used: classification with decision trees, association rule mining with the FP-Growth algorithm, and clustering with the k-Means algorithm. The dataset is processed using the operators provided in RapidMiner. The results show that, because the data are unbalanced, the decision tree model was able to predict only 68.75% of patients as having a stroke. With association rule mining, we found that most attributes in the dataset are not strongly associated with each other. With clustering, we were able to group patients and found that most patients who had a stroke fell into a group with an average age of 58, a BMI of 31, and an average glucose level of 201.
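To illustrate why unbalanced data can limit how well a classifier identifies stroke patients, the following minimal Python sketch compares overall accuracy with recall for the stroke class. The labels, counts, and function names are hypothetical and are not taken from the study, which used RapidMiner operators rather than code; the abstract also does not state which metric its 68.75% figure refers to, so recall is used here only as one plausible example.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for one class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            counts["tp" if pred == positive else "fn"] += 1
        else:
            counts["fp" if pred == positive else "tn"] += 1
    return counts


def recall(counts):
    """Share of actual positive cases that the model found."""
    denom = counts["tp"] + counts["fn"]
    return counts["tp"] / denom if denom else 0.0


def accuracy(counts):
    """Share of all cases that the model labelled correctly."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


# Hypothetical toy labels (NOT the study's data): 95 non-stroke, 5 stroke.
y_true = ["no"] * 95 + ["stroke"] * 5
# Hypothetical predictions: one false alarm, and only 2 of 5 stroke cases found.
y_pred = ["no"] * 94 + ["stroke"] * 1 + ["stroke"] * 2 + ["no"] * 3

counts = confusion_counts(y_true, y_pred, positive="stroke")
print(f"accuracy = {accuracy(counts):.2f}")
print(f"stroke recall = {recall(counts):.2f}")
```

In this toy case the overall accuracy looks high because the majority class dominates, while recall for the rare stroke class stays low. This is the kind of gap that unbalanced data can produce.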
Copyright (c) 2024 Hendy Tannady, Johanes Fernandes Andry, Kosasi Kosasi
This work is licensed under a Creative Commons Attribution 4.0 International License.