Regression Model For Predicting Breast Cancer Patients Using Integrated Genomic Data In Kenya: A Case Of Kenyatta National Hospital
Abstract
Cancer is a heterogeneous disease, and cancer-related deaths continue to increase worldwide.
Early diagnosis and prognosis of a cancer type have become a necessity in cancer research, as
they can facilitate the subsequent clinical management of patients. The importance of
classifying cancer patients into high- or low-risk groups has led many research teams, from
the biomedical and bioinformatics fields, to study the application of machine learning (ML)
methods. The main objective of the study was to
develop a regression model for predicting breast cancer patients using integrated genomic data.
The specific objectives were to review the literature on factors for predicting breast cancer
patients using integrated genomic data, to develop the regression model, and to test and
validate that model. Data was obtained online
through the OpenML site. Information abstracted from the data obtained was used for
assessing breast cancer patients. The researcher utilized the Kenyatta National Hospital dataset,
which includes 44,000 cancer patients. This formed the target population used in the study. The
population was narrowed down to 1,172 new cases recorded between January 2017 and June 2019.
Analysis was conducted by reviewing the literature, assessing the details, and testing and
validating the model for predicting cancer patients using integrated genomic data; a machine
learning model was applied. Inferential data analysis was used in reviewing the literature.
In this case, the data was summarized into points in a constructive manner. The analysis was
vital in forming the basis of quantitative data analysis. Regression analysis was used in the
identification of supervised learning models and their influence on the topic. Additionally,
regression analysis was employed as a predictive modeling technique that assesses the
relationship between the variables. The research findings established that, for factors
influencing breast cancer prognosis, screening appropriate predictors as independent variables
is an important step in model construction. In this case, demographic risk factors are
important in the creation of a breast cancer (BC) risk prediction model. Additionally, it was
found that combinations of genetic variants and demographic risk factors yielded a higher risk
prediction accuracy than the individual demographic risk factors. Age, disease stage, grade, tumor size,
race, marital status, number of nodes, histology, number of positive nodes and primary site
code have been entered into many predictive models as predictors, given that these factors
represent key risk factors for onset and survival in breast cancer. The researcher proposed an
ML approach to efficiently combine genetic variants with BC risk factors related to both
familial history and oestrogen metabolism and to search for optimal interactions among them.
According to the research, the choice of the most appropriate algorithm depends on many
parameters, including the types of data collected, the size of the data samples, the time
limitations, and the type of prediction outcomes. Therefore, it was recommended that new
methods be studied in future cancer modeling to overcome these limitations.