A Model For Predicting Students’ Academic Performance In Public Secondary Schools In Kitui West Constituency
Abstract
In the present era of data deluge, institutions have accumulated huge amounts of data in their
databases. Educational institutions all over the world are no exception, having accumulated
large amounts of data, in various forms and formats, in their educational management
information system databases. This accumulation has given rise to two research fields,
educational data mining and learning analytics, which seek to discover hidden knowledge
(insights) that can greatly improve operations in educational institutions. Such knowledge
includes, but is not limited to, predicting students’ performance and dropout, and discovering
students’ interests, which could help avert the student unrest common in various institutions. This study
seeks to take advantage of this opportunity by developing a model, using a dataset obtained
from public secondary schools in Kitui West Constituency, that can be used to predict students’
academic performance. Researchers around the world have attempted to address this problem.
Although such studies achieved some level of success, various limitations, discussed in detail
in the empirical review, militated against the performance of the earlier models. A desk
research methodology was used to extract relevant secondary data from
various schools’ departments within Kitui West Constituency. The data was then preprocessed,
including feature selection, after which the cleaned dataset was loaded into a staging data lake
in Hadoop. Data was queried from the data lake into Python using PySpark, where the data
analysis procedures took place. A dataset consisting of the optimal subset of features was used
to train four machine-learning algorithms: Gradient boosting classifier, Random forest
classifier, Decision tree classifier and Deep Neural Network classifier. Generally, Decision tree and Random
forest classifiers registered the best performance overall, with an accuracy of 97%. After
stratified k-fold cross-validation, however, the Decision tree classifier proved more stable,
with an average accuracy of 97% compared to 93% for the Random forest classifier. The Decision
tree classifier was therefore recommended for deployment in predicting students’ academic
performance, owing to its reliable accuracy and relatively good precision in predicting the
study’s target group.
The developed model places students into two groups: PASS and FAIL. The aim is to prompt
intervention from various stakeholders to reduce dismal performance among public secondary
schools in Kitui West Constituency.
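
The stratified k-fold comparison described above can be illustrated with a minimal sketch. It assumes scikit-learn, which the abstract does not name as the library used. The data here is synthetic (generated with `make_classification`) and stands in for the student records. The fold count, class balance, hyperparameters and the names `X`, `y`, `models` and `results` are all illustrative assumptions, so the printed scores say nothing about the study's reported results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the student dataset (NOT the study's data):
# 500 rows, 10 numeric features, binary label (assumed 1 = PASS, 0 = FAIL).
X, y = make_classification(
    n_samples=500,
    n_features=10,
    n_informative=5,
    weights=[0.3, 0.7],  # assumed class imbalance, for illustration only
    random_state=42,
)

# Stratified folds keep the PASS/FAIL ratio roughly equal in every split.
# Five folds is an assumption; the abstract does not state the fold count.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Two of the four classifiers named in the abstract, with default settings.
models = {
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    # One accuracy score per fold; the mean and spread indicate stability.
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = scores
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")
```

Comparing the per-fold mean and standard deviation, rather than a single train/test accuracy, is one way to judge which classifier is more stable across different splits of the data.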