    •   KCA University Repository Home
    • Theses and Dissertations
    • Faculty of Computing and Information Management

    A Data Pipeline Architecture For Classification Of Potential Claimants In Reunification Of Unclaimed Financial Assets

    Fulltext (2.392Mb)
    Downloads: 191
    Date
    2021
    Author
    Mudambo, Nick A
    Abstract
    As data grows exponentially, organizations are leveraging technology to generate knowledge that supports decision making. However, storing and processing data through traditional data pipeline architectures presents a risk of a single point of failure. The Unclaimed Financial Assets Authority, like other organizations, has faced such challenges when reunifying unclaimed financial assets because it could not harness and process data received from disintegrated systems. The main aim of this study was to develop a modern data pipeline architecture for classifying potential claimants in the reunification of unclaimed assets. The target population was potential claimants who had registered on the various platforms provided by the Authority, together with records submitted by holders between July 1, 2020 and November 1, 2020. Secondary data was extracted from the various platforms and systems. The dataset comprised 210,587 records for potential claimants and 1,378,953 records from holders' reports. Data cleaning was done using Python's Pandas library. A use-modify-create development approach was used to design and implement the proposed data pipeline architecture for classifying potential claimants, building on the Lambda architecture and a data lake approach. This approach facilitated activities such as ingestion into a Hadoop data lake. PySpark was used to transform the data through a MapReduce approach before a classification algorithm was applied. HiBench was used to evaluate the implemented architecture, and its micro-benchmark metrics were used to refine it. The major findings were the high utilization of allocated resources by Non-DFS storage and Non-Heap memory, which calls for management and monitoring to avoid out-of-storage and out-of-memory issues. The study recommends the Neural Network algorithm for classification, with an accuracy of 94.27% and an F1-score of 1. It also recommends using micro-benchmark workloads to indicate where the CPU requires optimization and where disk I/O utilization is heavy. A further comparative study that includes other ML techniques, using different datasets, evaluation metrics, and HiBench workloads, is recommended.
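The abstract mentions that the data was transformed through a MapReduce approach before classification. As a rough illustration of that pattern only, the sketch below aggregates a few made-up holder records in plain Python. It is not the thesis implementation: the record fields (`holder`, `asset_type`, `amount`) and all function names are hypothetical, and the standard library stands in for PySpark so the example runs anywhere.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical holder records; field names and values are illustrative only
# and are not taken from the thesis dataset.
records = [
    {"holder": "Bank A", "asset_type": "dormant_account", "amount": 1200.0},
    {"holder": "Bank B", "asset_type": "dormant_account", "amount": 300.0},
    {"holder": "Insurer C", "asset_type": "unpaid_policy", "amount": 5000.0},
    {"holder": "Bank A", "asset_type": "uncashed_cheque", "amount": 75.5},
]


def map_phase(record):
    """Emit one (key, value) pair per input record."""
    return record["asset_type"], record["amount"]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(values):
    """Fold all values that share a key into a single total."""
    return reduce(lambda a, b: a + b, values)


def total_by_asset_type(rows):
    """Run the map, shuffle, and reduce steps over an iterable of records."""
    pairs = map(map_phase, rows)
    grouped = shuffle(pairs)
    return {key: reduce_phase(values) for key, values in grouped.items()}


totals = total_by_asset_type(records)
```

In PySpark, the same shape is usually written as `rdd.map(...)` followed by `rdd.reduceByKey(...)`, with the grouping step handled by the framework across the cluster. Whether the thesis used that exact form is not stated in the abstract.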
    URI
    http://repository.kca.ac.ke/handle/123456789/546
    Collections
    • Faculty of Computing and Information Management [112]

    Copyright © 2020  | KCA University Library | Off-Campus Access |