Abstract

Understanding data on the novel coronavirus (COVID-19) pandemic and modeling such data over time are crucial for decision making in managing, fighting, and controlling the spread of this emerging disease. This thesis examines some aspects of exploratory analysis and modeling of COVID-19 data obtained from the Florida Department of Health (FDOH). In particular, the work is devoted to data collection, preparation, description, and modeling of COVID-19 cases and deaths reported by FDOH between March 12, 2020, and April 30, 2021. For modeling data on both cases and deaths, this thesis uses an autoregressive integrated moving average (ARIMA) time series model. The "IDENTIFY" statement of SAS PROC ARIMA suggests a few competing models, each with suggested values of the parameters p (the order of the autoregressive component), d (the order of differencing), and q (the order of the moving average component). All suggested models are then compared using AIC (Akaike Information Criterion), SBC (Schwarz Bayesian Criterion), and MAE (Mean Absolute Error) values, and the models with the smaller values of these criteria are chosen as the best fitting. To evaluate the performance of the model selected under this approach, the procedure is repeated using the first six months of data to forecast the next 7 days, then nine months of data to forecast the next 7 days, and finally all reported FDOH data from March 12, 2020, to April 30, 2021, to forecast future data. The exploratory data analysis suggests more COVID-19 cases among females than males and more deaths among males than females; these findings are taken into account by evaluating the final models by gender for both the case and death data reported by FDOH. Under the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) criteria, the gender-specific models appear to perform better than models based on gender-aggregated data.
The fitted models predicted the future numbers of confirmed cases and deaths reasonably well. Given similarities in reported COVID-19 data, the proposed modeling approach can be applied to data for the USA, other states, and countries around the world.
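
The comparison criteria named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's SAS code: the function names are hypothetical, the AIC and SBC formulas are the standard textbook definitions (assumed here, not quoted from the thesis), and the 7-day holdout values and candidate forecasts are invented purely for illustration.

```python
import math

def mae(actual, forecast):
    """Mean Absolute Error over a holdout window."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root Mean Squared Error over a holdout window."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def aic(log_likelihood, k):
    """Akaike Information Criterion; k is the number of estimated parameters."""
    return -2.0 * log_likelihood + 2.0 * k

def sbc(log_likelihood, k, n):
    """Schwarz Bayesian Criterion; n is the number of observations used in the fit."""
    return -2.0 * log_likelihood + k * math.log(n)

def rank_models(actual, forecasts):
    """Score each candidate's forecasts and return the name with the smallest MAE."""
    scores = {name: (mae(actual, f), rmse(actual, f)) for name, f in forecasts.items()}
    best = min(scores, key=lambda name: scores[name][0])
    return best, scores

# Invented 7-day holdout counts and forecasts (illustrative only, not FDOH data).
holdout = [100, 110, 120, 130, 125, 115, 105]
candidates = {
    "ARIMA(1,1,1)": [102, 108, 123, 128, 126, 113, 107],
    "ARIMA(2,1,0)": [90, 120, 110, 140, 115, 125, 95],
}

best, scores = rank_models(holdout, candidates)
for name, (m, r) in scores.items():
    print(f"{name}: MAE={m:.3f} RMSE={r:.3f}")
print("Smallest MAE:", best)
```

In this sketch the candidate with the smaller holdout MAE is preferred, mirroring the "smaller is better" rule described above; in practice the AIC and SBC values would come from the fitted models' log-likelihoods rather than from forecasts.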

Notes

If this is your thesis or dissertation and you want to learn how to access it, or want more information about readership statistics, contact us at STARS@ucf.edu.

Graduation Date

2021

Semester

Summer

Advisor

Uddin, Nizam

Degree

Master of Science (M.S.)

College

College of Sciences

Department

Statistics & Data Science

Degree Program

Statistical Computing; Data Mining

Format

application/pdf

Identifier

CFE0008732;DP0025463

URL

https://purls.library.ucf.edu/go/DP0025463

Language

English

Release Date

August 2021

Length of Campus-only Access

None

Access Status

Masters Thesis (Open Access)
