
Haiyan Xie

Background: Coronavirus disease 2019 (COVID-19), which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has been declared a pandemic. We aimed to predict the death outcome of COVID-19 using data mining methods. Material and Methods: A total of 72,390 laboratory-confirmed cases recorded between February 10, 2020 and August 17, 2020 were included in the study, according to the Centers for Disease Control and Prevention (CDC) data. To identify the important and influential variables for predicting COVID-19 mortality among our demographic and clinical factors, we used the random forest method. The death outcome was then predicted using logistic regression, once with all variables and once with the selected variables. Results: Among all patients, 2,150 (2.97%) cases experienced the death outcome. The association between disease outcome (survivor or deceased) and variables including age, gender, US worker, developed, race, all signs and symptoms, abnormal chest X-ray, ARDS, hospitalization, ICU admission, and intubation was significant (p-value < 0.001). The median value of the mean decrease in accuracy was 42.10, and the variables age group, ARDS, fever, sex, cough, race, subjective fever, abnormal chest X-ray, diarrhea, intubation, dyspnea, and myalgia were selected as the important factors for predicting the death outcome. Logistic regression with all variables and with the selected variables both achieved an AUC of 0.96. Other performance criteria differed slightly between the two models. Conclusion: Using data mining models, the death of patients with COVID-19 can be predicted with high accuracy.
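The two quantities the abstract reports, a median importance score and an AUC, can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes (the abstract does not say so explicitly) that variables were kept when their importance score exceeded the median, and it computes AUC as the probability that a deceased case receives a higher predicted risk than a survivor. The function names `select_above_median` and `auc`, the variable names, and all numbers are hypothetical and invented for illustration only.

```python
from statistics import median


def select_above_median(importance):
    """Keep variables whose importance score is strictly above the median score.

    Assumption: the abstract reports a median importance but does not state
    the exact selection rule, so this cutoff is only one plausible reading.
    """
    cutoff = median(importance.values())
    return [name for name, score in importance.items() if score > cutoff]


def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive case has the higher
    score; tied pairs count as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical importance scores (not values from the study).
toy_importance = {"var_a": 90.0, "var_b": 55.0, "var_c": 20.0, "var_d": 10.0}
selected = select_above_median(toy_importance)

# Hypothetical outcomes (1 = deceased, 0 = survivor) and predicted risks.
toy_labels = [0, 0, 1, 0, 1, 1]
toy_scores = [0.05, 0.20, 0.35, 0.40, 0.80, 0.90]
toy_auc = auc(toy_labels, toy_scores)
```

The pairwise AUC loop is quadratic in the number of cases, which is fine for a toy example; for a data set the size of the one described here, a rank-based implementation from an established library would be the more practical choice.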

Predicting The Mortality Risk In Covid-19 With Clinical Characteristics And Laboratory Outcome Characteristics Using Data Mining Methods