HDSC’ 22 Premier Project: Real Life Machine Learning Topics

Jan 11, 2022


Our objective of raising the next generation of problem solvers is one of the many reasons we ensure that every intern gets the chance to apply their learning to real-world problems. Since the first machine learning project, the checkers-playing program written by IBM's Arthur Samuel in 1952, many sectors of the world’s economy have adapted well to advances in machine learning and artificial intelligence, including health, finance, security, hospitality, and real estate.

Interns are encouraged to complete real-life machine learning projects as part of their internship with Hamoye. This reinforces their learning and strengthens their grasp of the concepts covered in the internship. As Tim Sanders said,

“Education without application is just entertainment”.

HDSC22 Premiere Projects

Details of the premiere projects are provided here. You can search for your project details using the project code assigned to your project group.


  • Scope: Health and Medicare
  • Topic: COVID-19 Clinical Trials dataset: Database of COVID-19 related clinical studies being conducted worldwide
  • Project Description:

Since its emergence in 2019, COVID-19 has been the most talked-about subject. The dataset consists of clinical trials related to COVID-19 and can be useful for answering some of the most frequently asked questions about the success of those trials. See more about the provided dataset here.


  • Scope: Health and Medicare
  • Topic: US Hospital Customer Satisfaction 2016–2020: Is Patient Satisfaction Correlated With Overall Hospital Performance?
  • Project Description:

One of the many criteria by which a hospital is ranked is the satisfaction of its patients. Using data collected from the United States’ Centers for Medicare and Medicaid Services (CMS), it is possible to analyse how satisfied US patients are. See more about the provided datasets here.
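As a rough illustration of the question in the topic, the sketch below computes a Pearson correlation coefficient between a satisfaction score and a performance rating. The numbers and variable names are made up for illustration and are not taken from the CMS datasets; the real column names and scales will differ.

```python
import numpy as np


def pearson_r(x, y):
    """Return the Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.size < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    # np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r.
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical values: one entry per hospital (NOT real CMS data).
satisfaction = [62.0, 70.5, 74.0, 81.0, 88.5]  # assumed patient satisfaction scores
performance = [2, 3, 3, 4, 5]                  # assumed overall performance ratings

r = pearson_r(satisfaction, performance)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 or -1 suggests a strong linear relationship, but it says nothing about causation, so a real analysis would also need to account for confounders such as hospital size or location.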


  • Scope: Health and Medicare
  • Topic: Mortality by age and state in Germany (2016–2021)
  • Project Description:

Death rates in Germany, collected over a five-year period, are grouped by age, gender, and state. See more about the provided datasets here.


  • Scope: Health and Medicare
  • Topic: Breast Cancer Analysis & Prediction
  • Project Description:

According to the World Health Organization, breast cancer is currently the most common type of cancer worldwide, with 2.26 million cases recorded in 2020. Using collated data and applying machine learning techniques, we can seek to understand the chances of a person being diagnosed with breast cancer. See more about the provided dataset here


  • Scope: Health and Medicare
  • Topic: Blindness Detection
  • Project Description:

Reports indicate that 43 million people are living with blindness and 295 million with moderate-to-severe visual impairment globally. Of these cases, 77% are preventable or treatable. The objective here is to detect blindness and help put measures in place to prevent or treat it, and yes, machine learning can help. See more about the provided dataset here.


  • Scope: Health and Medicare
  • Topic: Classification & Prediction of Dementia
  • Project Description:

Dementia, the deterioration of a person's cognitive function, is a syndrome that no one wants to be diagnosed with. This project aims to predict the chances that a person will have dementia and the type they are most likely to have. See more about the provided dataset here.


  • Scope: Trade, agriculture and finance
  • Topic: Consumer Finance Complaints (Bureau of Consumer Financial Protection)
  • Project Description:

Financial institutions receive complaints about services and transactions every day, and these must be attended to. We have collected complaints made by customers between 2011 and 2019 about multiple products and services in the financial sector, such as credit reports, student loans, and money transfers. See more about the provided dataset here.


  • Scope: Trade, agriculture and finance
  • Topic: New York Stock Exchange
  • Project Description:

The provided dataset focuses on the historical prices of S&P 500 companies, for fundamental and technical analysis. See more about the provided dataset here.


  • Scope: Trade, agriculture and finance
  • Topic: E-Finance Research Dataset (1981–2019)
  • Project Description:

The world has gone electronic, and the finance industry with it. We have provided data from 1981 to 2019 that can be analysed to uncover trends and guide further developments in the e-finance industry. See more about the provided dataset here.


  • Scope: Trade, agriculture and finance
  • Topic: Japan Trade Statistics
  • Project Description:

Japan is currently the 4th largest goods trading partner of the United States, with $183.6 billion in total (two-way) goods trade during 2020. The dataset records Japan’s international trade and the types of goods traded, by month and year. See more about the provided dataset here.


  • Scope: Trade, agriculture and finance
  • Topic: Foreign Exchange Rates 2000–2019
  • Project Description:

The growth of international trade and financial development has long been tied to the exchange rate. Moreover, in developing countries, the input structure of production depends on imported capital and intermediate goods, so an increase in exchange rates makes imported production inputs more expensive and thus negatively affects economic growth. Data is provided for analysis and insights. See more about the provided dataset here.


  • Scope: Trade, agriculture and finance
  • Topic: Prediction Of Gold Rates Using ML Techniques
  • Project Description:

How important is it to forecast the rise and fall of gold prices? Here we have datasets collected from 2011 to 2019 that can help answer that question. See more about the provided datasets here.


  • Scope: Trade, agriculture and finance
  • Topic: Credit to agriculture in Brazil and world
  • Project Description:

The Credit to Agriculture dataset provides data on the amount of loans provided by the private/commercial banking sector to producers in agriculture, forestry and fisheries, including household producers, cooperatives, and agro-businesses. The dataset also provides statistics on the total credit to all industries and indicators on the share of credit to agricultural producers. See more about the provided dataset here.


  • Scope: Trade, agriculture and finance
  • Topic: Agricultural Survey of African Farm Households
  • Project Description:

This is a survey of 9500+ households, used to study the impact of climate change on agriculture. The data was collected from randomly chosen families and households in districts that are representative of key agro-climatic zones and farming systems. See more about the provided dataset here.


  • Scope: Crime & Population
  • Topic: Crime Against Women 2001–2014 (India)
  • Project Description:

In the 14 years from 2001 to 2014, many crimes were perpetrated against women in India, ranging from rape to dowry deaths. Data on these crimes can be found here.


  • Scope: Weather
  • Topic: Air Quality in Madrid (2001–2018) Dataset
  • Project Description:

Authorities in Madrid, Spain have been forced to take critical measures to combat the continuous deterioration of air quality in the city. One such measure is the prohibition of cars in the city center. The dataset, collected over an 18-year period, can help answer critical questions about the causes, effects, and possible solutions to air pollution in Madrid and other parts of the world. See more about the dataset here.


  • Scope: Weather
  • Topic: Historical Hourly Weather Data 2012–2017
  • Project Description:

Weather data is an excellent way to illustrate basic signal processing concepts such as filtering, the Fourier transform, auto-correlation, and cross-correlation. The dataset contains five years of high-temporal-resolution (hourly) measurements of various weather attributes, such as temperature, humidity, and air pressure, from about 30 cities across the US and Canada. See more about the provided dataset here.
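To make the signal processing concepts named above concrete, here is a minimal Python sketch that applies a moving-average filter, a Fourier transform, and auto-correlation to an hourly series. It uses a synthetic temperature signal generated in the code (a daily sine cycle plus noise), not the provided dataset, and every function name is a hypothetical helper introduced for illustration.

```python
import numpy as np


def synthetic_hourly_temperature(days=30, seed=0):
    """Generate a made-up hourly temperature series: daily cycle plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    hours = np.arange(days * 24)
    daily_cycle = 5.0 * np.sin(2 * np.pi * hours / 24)
    noise = rng.normal(0.0, 1.0, size=hours.size)
    return 15.0 + daily_cycle + noise


def moving_average(signal, window=24):
    """Filtering: smooth the signal with a simple moving-average (box) filter."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="valid")


def dominant_period_hours(signal):
    """Fourier transform: return the period (in hours) of the strongest frequency."""
    centered = signal - signal.mean()
    spectrum = np.abs(np.fft.rfft(centered))
    freqs = np.fft.rfftfreq(centered.size, d=1.0)  # cycles per hour
    peak = np.argmax(spectrum[1:]) + 1  # skip the zero-frequency bin
    return 1.0 / freqs[peak]


def autocorrelation(signal, lag):
    """Auto-correlation: normalized correlation of the signal with itself at a positive lag."""
    centered = signal - signal.mean()
    return float(np.dot(centered[:-lag], centered[lag:]) / np.dot(centered, centered))


temps = synthetic_hourly_temperature()
smoothed = moving_average(temps, window=24)
print("dominant period (hours):", dominant_period_hours(temps))
print("autocorrelation at lag 24:", autocorrelation(temps, 24))
print("autocorrelation at lag 12:", autocorrelation(temps, 12))
```

With real data, the same functions could be applied to one city's temperature column after handling missing values. Cross-correlation between two cities would follow the same pattern as the auto-correlation helper, using two different series.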


  • Scope: Jobs and Career
  • Topic: Online Job Postings
  • Project Description:

The labor market is continuously evolving, with technology now widely used to advertise job openings. This creates a need to understand the demand for certain professions and job titles. There is also a need to identify the skills most frequently required by employers, track how the distribution of necessary skills changes over time, and make recommendations to job seekers and employers. See more about the provided dataset here.


  • Scope: Jobs and Career
  • Topic: Stack Overflow Developer Survey 2020
  • Project Description:

This dataset provides insights into the job satisfaction, education, opinions on software, and other experiences of developers from over 180 countries. See more about the provided dataset here.


  • Scope: Jobs and Career
  • Topic: Are Your Employees Burning Out?
  • Project Description:

How can you tell if workers in your company are exhausted? The dataset in this project can be used to analyse and predict the burnout rate of each employee and its influence on their overall mental health. See more about the provided dataset here.


  • Scope: Sports
  • Topic: Men’s Professional Basketball: Stats on players, teams, and coaches in men’s pro basketball leagues, 1937–2012
  • Project Description:

Men’s professional basketball is one of the most popular sports in the world. The dataset includes stats on players, coaches, and teams in men’s professional basketball leagues from 1937 to 2012. See more about the provided dataset here.


  • Scope: Sports
  • Topic: Premier League matches 2014–2020
  • Project Description:

Football is very unpredictable, but coaches and managers might be able to increase their chances of winning through a strategic game plan and the ability to exploit opponents’ weaknesses. The dataset for this project includes 6 top teams who took the Premier League trophies between 2014 and 2020. See more about the provided dataset here.


  • Scope: Sports
  • Topic: Football Data: Expected Goals and Other Metrics
  • Project Description:

Looking through data from six of the top leagues across Europe can help you predict certain outcomes, such as which teams are likely to score a goal, and identify the characteristics of each league. See more about the provided dataset here.


  • Scope: Crime & Population
  • Topic: Crime Against Women 2001–2014 (India)
  • Project Description:

Crimes committed against women (such as rape, domestic violence, and marginalization) have been a subject of conversation among world leaders, governments, and international organizations. Here, data from 2001 to 2014 is provided for analysis and inference. See more about the provided dataset here.


  • Scope: Crime & Population
  • Topic: Gun Violence
  • Project Description:

The worldwide rise in the lack of gainful employment, among many other factors, has fueled gun violence among young people. A comprehensive dataset compiled from 2013 to 2018 offers some insight into this problem. See more about the provided dataset here.


  • Scope: Crime & Population
  • Topic: Violence Against Women and Girls
  • Project Description:

A closer look at gender equality and the education of women and girls has revealed a staggering amount of violence against females. The provided dataset contains information about women from over 70 countries. See more about the provided dataset here.


  • Scope: Crime & Population
  • Topic: Global Population Estimates
  • Project Description:

Almost all government policies and economic decisions depend on population estimates. This dataset provides information about gender and demography for analysis and estimation. See more about the provided dataset here.


  • Scope: Education
  • Topic: Academic Ranking of World Universities Analytics
  • Project Description:

The performance of a university determines its rank on the global stage, and many factors influence each institution’s position. This dataset can provide insights into university performance, considering factors such as location, quality of faculty, facilities, research output, and alumni employment. Find the dataset here.


  • Scope: Education
  • Topic: University Rankings for years 2018, 2019 and 2020
  • Project Description:

Education is important, and so are educational institutions, which is why many factors go into how universities are ranked globally. Data from a 3-year period provides the background for analysing these factors. See more about the provided datasets here.


  • Scope: Others
  • Topic: Divorce Prediction: Uncover what makes relationships last or break
  • Project Description:

Maldives has a divorce rate of 10.97 per 1,000 inhabitants per year, Belarus 4.63, and the US 4.34, and the numbers continue to rise across different regions of the world. With advances in data exploration, it may be possible to predict whether a marriage will fail or succeed. See more about the provided dataset here.


  • Scope: News
  • Topic: News Category Dataset
  • Project Description:

News items are vital pieces of information circulated via the media. However, they are more digestible when organized into categories such as education, politics, business, and crime. See more about the provided dataset here.


  • Scope: Road Safety
  • Topic: Deadly Traffic Accidents in the UK (2015)
  • Project Description:

To this day, road accidents remain incredibly common: approximately 336 occur each day. To begin to understand why road accidents are still such a problem on UK roads, one needs to know when, where, and why they happen. See more about the provided dataset here.




Our mission is to develop an army of creative problem solvers using an innovative approach to internships.