Modelling COVID-19 Individual Risks in Sweden Using Spatial Information, Statistics and Machine Learning

Type
Master's Thesis
Program
Complex adaptive systems (MPCAS), MSc
Published
2024
Author
Fu, Lukas
Abstract
The Covid-19 pandemic lasted a little over two years and caused severe social and economic disruption worldwide. Using data on individual-level and DeSO-level (demographic statistical area) covariates for the population of Sweden, sourced from Statistics Sweden and the Public Health Agency of Sweden, this project aims to model individual Covid-19 risk with machine learning algorithms and to extract feature importances from the fitted models. The models tested are logistic regression, random forest, support vector machines, and neural networks; Shapley values were additionally evaluated for the random forest to gain more insight into how each feature relates to the prediction. The logistic regression and random forest models both yielded feature importances that mix individual and DeSO features: age, level of education, and living conditions, for both the DeSO and the individual, along with the individual's income and occupation, showed high importance. The support vector machine and neural network models did not produce useful results because of computational limitations. The large size of the data set was a consistent hindrance, as many issues stemmed from computational cost, and many of the optimization improvements in this project focus on handling that cost. Further research may involve optimizing the performance of the presented or alternative models, and may also extend to a more thorough analysis of the spatial and temporal dependencies of disease cases. While the results of this project may not be particularly significant on their own, the project may still provide a basis for future developments in pandemic data analysis.
Subject/keywords
Modelling, Machine Learning, Neural Network, Logistic Regression, Random Forest, Support Vector Machine, SHAP, COVID-19