RESEARCH ARTICLE


Machine Learning Model for Predicting Number of COVID-19 Cases in Countries with Low Number of Tests



Samy Hashim¹, Sally Farooq¹, Eleni Syriopoulos¹, Kai de la Lande Cremer¹, Alexander Vogt¹, Nol de Jong¹, Victor L. Aguado¹, Mihai Popescu¹, Ashraf K. Mohamed¹, Muhamed Amin¹,*
¹ Department of Sciences, University College Groningen, Hoendiepskade 23/24 9718 BG Groningen, Netherlands


Creative Commons License
© 2022 Hashim et al.

open-access license: This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: https://creativecommons.org/licenses/by/4.0/legalcode. This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

* Address correspondence to this author at the Department of Sciences, University College Groningen, Hoendiepskade 23/24 9718 BG Groningen, Netherlands; E-mail: m.a.a.amin@rug.nl


Abstract

Background:

The COVID-19 pandemic has presented a series of new challenges to governments and healthcare systems. Testing is an important method for monitoring and controlling the spread of COVID-19. Yet because the resources available to rich and poor countries differ sharply, not every country is able to employ widespread testing.

Methods and Objective:

Here, we develop machine learning models, based on multilinear regression and neural networks, for predicting the prevalence of COVID-19 cases in a country. The models are trained on data from US states and tested against the reported infections in European countries. They use four features: Number of tests, Population Percentage, Urban Population, and Gini index.
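
The train-on-one-region, test-on-another workflow described above can be illustrated with a minimal multilinear regression sketch. This is not the authors' code: the function names `fit_multilinear` and `predict_multilinear` are hypothetical, the four feature columns are random placeholders rather than the real test, population, urbanization, and Gini data, and ordinary least squares via NumPy is assumed as the fitting method.

```python
import numpy as np

def fit_multilinear(X, y):
    """Ordinary least squares with an intercept.

    Returns the coefficient vector [b0, b1, ..., bk], where b0 is the intercept.
    """
    A = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_multilinear(coef, X):
    """Apply fitted coefficients to a new feature matrix."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

# Synthetic placeholder data (NOT the paper's data): four features per region.
rng = np.random.default_rng(0)
true_coef = np.array([0.5, 2.0, -1.0, 0.3, 0.7])  # assumed, for illustration only

X_train = rng.normal(size=(50, 4))   # stands in for the training regions
y_train = true_coef[0] + X_train @ true_coef[1:]

X_test = rng.normal(size=(30, 4))    # stands in for the held-out regions
y_test = true_coef[0] + X_test @ true_coef[1:]

coef = fit_multilinear(X_train, y_train)
y_pred = predict_multilinear(coef, X_test)
```

Because the placeholder targets are noise-free, the fitted coefficients recover `true_coef`; with real data the held-out predictions would instead be compared with the reported case counts.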

Results:

The population and the number of tests have the strongest correlation with the number of infections. The model was then tested on data from European countries, for which the correlation coefficient between the actual and predicted cases (R²) was found to be 0.88 for the multilinear regression model and 0.91 for the neural network model.
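
The abstract does not state which definition of R² was used, so the sketch below shows two common ones as an assumption: the coefficient of determination and the squared Pearson correlation. The function names and the example numbers are hypothetical and are not taken from the paper.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

def pearson_r_squared(y_true, y_pred):
    """Square of the Pearson correlation between actual and predicted values."""
    r = np.corrcoef(y_true, y_pred)[0, 1]
    return r ** 2

# Hypothetical values, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r_squared(actual, predicted)  # 1 - 0.10 / 5.0 = 0.98
```

The two definitions can disagree: predictions that are perfectly correlated with the actual values but systematically offset give a squared Pearson correlation of 1 while the coefficient of determination can be much lower, so it matters which one a reported R² refers to.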

Conclusion:

The model predicts that the actual prevalence of COVID-19 infection in countries where the number of tests is less than 10% of their populations is at least 26 times greater than the reported numbers.

Keywords: Machine learning, Model, COVID-19 cases, Healthcare systems, Testing, RNA viruses.