Regression Model For Predicting Breast Cancer Patients Using Integrated Genomic Data In Kenya: A Case Of Kenyatta National Hospital

Bundi, Doreen

Date

2021

Abstract

Cancer has been characterized as a heterogeneous disease, and cancer-related deaths are increasing worldwide. The early diagnosis and prognosis of a cancer type have become a necessity in cancer research, as they can facilitate the subsequent clinical management of patients. The importance of classifying cancer patients into high- or low-risk groups has led many research teams, from the biomedical and bioinformatics fields, to study the application of machine learning (ML) methods. The main objective of the study was to develop a regression model for predicting breast cancer patients using integrated genomic data. The specific objectives were to review the literature on factors used to predict breast cancer patients using integrated genomic data, to develop the regression model, and to test and validate it. Data were obtained online through the OpenML site. Information abstracted from the data obtained was used for assessing breast cancer patients. The researcher utilized the Kenyatta National Hospital dataset, which includes 44,000 cancer patients. This formed the target population used in the study. The population was narrowed down to 1,172 new cases between January 2017 and June 2019. Analysis was conducted by reviewing the literature, assessing the details, and testing and validating a machine learning model for predicting cancer patients using integrated genomic data. Inferential data analysis was used in reviewing the literature. In this case, the data were summarized into points in a constructive manner. The analysis was vital in forming the basis of quantitative data analysis. Regression analysis was used in the identification of supervised learning models and their influence on the topic.
Additionally, regression analysis was employed as a predictive modeling technique that assesses the relationship between the variables. The research findings established that, given the many factors influencing breast cancer prognosis, screening appropriate predictors as independent variables is an important step in model construction. In this case, demographic risk factors are important in the creation of a breast cancer (BC) risk prediction model. It was also found that combinations of genetic variants and demographic risk factors yielded a higher risk prediction accuracy than the individual demographic risk factors alone. Age, disease stage, grade, tumor size, race, marital status, number of nodes, histology, number of positive nodes and primary site code have been entered into many predictive models as predictors, given that these factors represent key risk factors for onset and survival in breast cancer. The researcher proposed an ML approach to efficiently combine genetic variants with BC risk factors related to both familial history and oestrogen metabolism and to search for optimal interactions among them. According to the research, the choice of the most appropriate algorithm depends on many parameters, including the types of data collected, the size of the data samples, the time limitations and the type of prediction outcomes. Therefore, it was recommended that future work on cancer modeling study new methods for overcoming these limitations.
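
The comparison described in the abstract, a model built on demographic and clinical risk factors alone versus one that also includes genetic variants, can be illustrated with a minimal sketch. This is not the thesis's actual model or data. It assumes a plain logistic regression fitted by gradient descent, and the feature names (`age`, `tumor_size`, `variant`), the coefficients used to generate labels, and the helper names are all hypothetical. The data are synthetic and exist only to make the example runnable.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent.

    Returns the weights as [bias, w_1, ..., w_d].
    """
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, x):
    # Predicted probability of the positive (high-risk) class.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def accuracy(w, X, y):
    # Fraction of rows whose thresholded prediction matches the label.
    hits = sum((predict_proba(w, xi) >= 0.5) == (yi == 1) for xi, yi in zip(X, y))
    return hits / len(y)


def make_synthetic(n=400, seed=0):
    """Generate SYNTHETIC rows [age, tumor_size, variant]; not real patient data."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        age = rng.random()         # hypothetical age, scaled to [0, 1]
        tumor_size = rng.random()  # hypothetical tumor size, scaled to [0, 1]
        variant = 1.0 if rng.random() < 0.3 else 0.0  # hypothetical variant carrier flag
        # Assumed generating model, chosen only for illustration.
        logit = -2.0 + 1.5 * age + 1.0 * tumor_size + 2.5 * variant
        labels.append(1 if rng.random() < sigmoid(logit) else 0)
        rows.append([age, tumor_size, variant])
    return rows, labels


X, y = make_synthetic()
X_demo = [row[:2] for row in X]  # demographic/clinical factors only

w_demo = fit_logistic(X_demo, y)
w_comb = fit_logistic(X, y)      # demographic/clinical factors plus the variant flag

acc_demo = accuracy(w_demo, X_demo, y)
acc_comb = accuracy(w_comb, X, y)
print(f"demographic only: {acc_demo:.3f}")
print(f"with genetic variant: {acc_comb:.3f}")
```

The accuracies here are measured on the training data, so they say nothing about generalization. A real study would evaluate on held-out patients and would likely use an established library rather than a hand-written optimizer.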

URI

https://repository.kcau.ac.ke/handle/123456789/1302

Collections

Faculty of Computing and Information Management [113]