PUBLICATIONS

Contributions

DATA SCIENCE PROJECTS

Data4Governance Challenge at CcHUB, Lagos, Nigeria

Title: The Impact of Flood on Agriculture: A case study of Ibadan city

Date: February, 2020

CcHUB, in partnership with the World Bank, the Bill and Melinda Gates Foundation, the European Union, the Korea International Cooperation Agency, and the Department for International Development, hosted the Data4Governance challenge. The event was an 8-day design-for-development hackathon that ran from February 17 to February 26, 2020. Our team, Data Findars, worked on Impacts of Flood Hazards on Agriculture and Settlement. Our solution, which placed among the top 8, can be found here, and the executive report here.

Data Science Capstone Project for Microsoft Professional Programme in Data Science

Title: Mortgage Loan Approvals Prediction from Government Data.

Date: May - June, 2019

The project examines how demographics, location, property type, lender, and other factors relate to whether a mortgage application in the United States was accepted or denied. We trained a CatBoost model on 500,000 mortgage loan applications and found that lender, applicant income, loan purpose, loan amount, and state code have a significant effect on mortgage loan approval. The predictions, whose code can be found here, achieved a public score of 0.7330 against a benchmark of 0.7350. The executive report can be read here.

Date: March - April, 2019

A data-driven challenge organized by Deep Learning IndabaX Morocco tested ML skills on a real-world problem. The GitHub repository, which can be found here, applies a range of machine learning models to predict the class labels of heart disease data. We evaluated each model with the log loss metric and used the model with the lowest log loss to predict the class labels on the validation dataset. Our submission under GBG-IXM was ranked 1st among IXM groups.
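The selection step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: it assumes a binary label, and the model names, labels, and predicted probabilities are hypothetical placeholders.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; y_prob holds the predicted P(class = 1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def select_best_model(y_true, predictions):
    """Return the name of the lowest-log-loss candidate and all scores."""
    scores = {name: log_loss(y_true, probs) for name, probs in predictions.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical held-out labels and predicted probabilities (illustrative only).
y_holdout = [1, 0, 1, 1, 0]
candidate_predictions = {
    "model_a": [0.9, 0.2, 0.8, 0.7, 0.1],
    "model_b": [0.6, 0.4, 0.6, 0.5, 0.5],
}

best_name, all_scores = select_best_model(y_holdout, candidate_predictions)
print(best_name, round(all_scores[best_name], 4))
```

Log loss penalizes confident wrong probabilities more heavily than plain accuracy does, which is presumably why it suits a challenge scored on predicted class probabilities.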

A Comprehensive Empirical Demonstration of the No Free Lunch Theorem (NFLT) in Statistical Machine Learning

Date: November 2018 - February 2019

In this project, we provided a comprehensive empirical demonstration of the NFLT by comparing the predictive performance of a wide variety of machine learning algorithms/methods on a wide variety of qualitatively and quantitatively different datasets. Based on the overall ranking of the methods and their corresponding learning machines, our results provide strong empirical evidence in favor of the NFLT. The GitHub repository can be found here.
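One way to picture an overall ranking of methods across datasets is an average-rank comparison, sketched below. This is an assumption about the general approach rather than the project's actual procedure; the method names, dataset names, and error values are hypothetical.

```python
def rank_methods(errors):
    """Rank methods within one dataset (1 = lowest error); ties share the average rank."""
    ordered = sorted(errors.items(), key=lambda kv: kv[1])
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over every method tied with the one at position i.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared_rank = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = shared_rank
        i = j + 1
    return ranks


def average_ranks(results):
    """Average each method's per-dataset rank over all datasets."""
    per_dataset = [rank_methods(errs) for errs in results.values()]
    methods = per_dataset[0].keys()
    return {m: sum(r[m] for r in per_dataset) / len(per_dataset) for m in methods}


# Hypothetical test errors per dataset and method (illustrative only).
results = {
    "dataset_1": {"method_a": 0.10, "method_b": 0.20, "method_c": 0.30},
    "dataset_2": {"method_a": 0.25, "method_b": 0.15, "method_c": 0.20},
    "dataset_3": {"method_a": 0.40, "method_b": 0.35, "method_c": 0.30},
}

overall = average_ranks(results)
winners = {name: min(errs, key=errs.get) for name, errs in results.items()}
print(overall)
print(winners)
```

In this toy data a different method wins on each dataset, which is the pattern the NFLT leads one to expect: no single method ranks first everywhere.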


Statistical Automation of Tax Analysis

Date: May - November, 2018 

This project used tidyverse and other R packages to build an automated data portal for the Rwanda Revenue Authority, covering revenue collection statistics, registration statistics, HR statistics, cost statistics, etc.