HDSC Winter ’22 Capstone Project Presentation: Malaria Detection Via Blood Sample Images Using CNNs

A Project by Team LSTM

May 4, 2022

As part of our data science internship at Hamoye, we were tasked with building a machine learning model that helps detect malaria from blood sample images.

Why do we need to work on this?

Malaria is usually diagnosed by a microbiologist who examines a blood smear under a microscope. It is readily treatable when it is diagnosed early and followed by appropriate treatment.

Computer-assisted diagnostics are on the rise because they can serve as an effective primary test when no microbiologist is available. Deep learning, in which a neural network is trained on examples in a way loosely inspired by the human brain, is a natural fit for this kind of image-based task.

Because of the large number of malaria patients, traditional diagnosis by experienced microbiologists requires a great deal of manpower and money, which makes regular screening difficult. An automated malaria detection system based on blood sample images is therefore of paramount importance.

Related work

In recent years, machine learning has become one of the most widely used techniques in medical image analysis, classification, and object detection, and it has produced strong results there. Convolutional Neural Networks (CNNs) in particular are increasingly popular for medical image analysis and have proven highly effective.

Advances in algorithms for medical image processing have led to multiple automated malaria detection techniques, most of them in the area of image classification. We sought to expand on this with a high-recall, low-latency image classification and object detection algorithm that not only classifies cell images but also localizes the cells in the image.

Introduction

Malaria is one of the ten diseases with the highest mortality rates globally. According to the 2021 report released by the WHO, there were 241 million cases of malaria infection and 627,000 deaths worldwide in 2020. The numbers are estimated to keep rising, leading to even higher mortality.

There are many obstacles to bringing these numbers down. One is late detection, which can result from a lack of access to testing facilities and the overburdening of the few available resources, especially in rural and underdeveloped societies. To help overcome this challenge, we created a machine learning model that detects malaria automatically, trained on thousands of images that were collected in rural areas and then processed.

*Figure 1. Malaria-related death graphs*

Objective

To build a low-latency classification and object detection model that detects malaria parasite infection from microscopic images of blood samples.

Dataset

The malaria dataset is made publicly available by the Lister Hill National Center for Biomedical Communications at the U.S. National Library of Medicine. The copy used for this project was sourced from Kaggle:

https://www.kaggle.com/datasets/iarunava/cell-images-for-detecting-malaria

Total number of images: 27,560 | Classes: ‘Parasitized’: 13,780, ‘Uninfected’: 13,780

Methodology

A. Data Analysis and Preprocessing

Because our dataset consists of images, we applied image processing to it using the OpenCV library. Image analysis is the extraction of meaningful information from images, mainly from digital images by means of digital image processing techniques. The model needs inputs of the same size, shape, and format, so we resized every image to (64, 64).

After resizing, we converted the images from RGB to grayscale in preparation for edge detection, and applied a Gaussian blur to reduce image noise.

We then applied the Canny edge detector, an operator that uses a multi-stage algorithm to detect a wide range of edges, with threshold values of 80 and 160. The resulting edge maps were converted back to RGB for further processing, and addWeighted was used to superimpose the detected edges on the image so that its edges stand out more sharply.

*Figure 2. Original image and the image after preprocessing*

B. Model Implementation

Classification Model

For malaria detection we used one of the Convolutional Neural Network architectures: DenseNet. Within each dense block of a DenseNet, every layer receives the feature maps of all the layers before it as input, hence the name Densely Connected Convolutional Network.
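To make the dense connectivity concrete, the short sketch below counts how many input channels each layer sees as feature maps accumulate. The configuration (growth rate 32, blocks of 6, 12, 24 and 16 layers, 64 initial channels, transitions that halve the channel count) is the published DenseNet-121 layout and is an assumption here, as are the helper names; the post does not list these values.

```python
def dense_block_channels(in_channels, num_layers, growth_rate):
    """Input channels seen by each layer of one dense block, plus the block output.

    Every layer adds growth_rate feature maps that are concatenated
    with everything produced before it.
    """
    per_layer_inputs = [in_channels + i * growth_rate for i in range(num_layers)]
    out_channels = in_channels + num_layers * growth_rate
    return per_layer_inputs, out_channels


def densenet_feature_channels(block_layers=(6, 12, 24, 16), growth_rate=32,
                              init_channels=64, compression=0.5):
    """Output channel count of each dense block (assumed DenseNet-121 settings)."""
    channels = init_channels
    outputs = []
    for index, num_layers in enumerate(block_layers):
        _, channels = dense_block_channels(channels, num_layers, growth_rate)
        outputs.append(channels)
        if index < len(block_layers) - 1:
            # A transition layer between blocks shrinks the channel count.
            channels = int(channels * compression)
    return outputs


print(dense_block_channels(64, 6, 32)[0])  # [64, 96, 128, 160, 192, 224]
print(densenet_feature_channels())         # [256, 512, 1024, 1024]
```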

Model training and validation

For training and validation, the dataset was split into 80% and 20% respectively. The hyperparameters used in model implementation are given below:

After training, the model reached a training accuracy of 95.67% and a validation accuracy of 94.09%.
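For reference, the 80/20 split mentioned above might be done along these lines. This is a sketch under assumptions: the post does not say whether the split was stratified by class or which seed was used, and `stratified_split` and the file names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_fraction=0.2, seed=42):
    """Split (path, label) pairs into train/validation lists, class by class."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))

    rng = random.Random(seed)          # fixed seed so the split is repeatable
    train, val = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_val = int(round(len(items) * val_fraction))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val


# Hypothetical file names standing in for the two class folders.
samples = [(f"Parasitized/img_{i}.png", "Parasitized") for i in range(50)]
samples += [(f"Uninfected/img_{i}.png", "Uninfected") for i in range(50)]
train, val = stratified_split(samples)
print(len(train), len(val))  # 80 20
```

Splitting per class keeps the two classes balanced in both subsets, which matches the balanced dataset described earlier.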

Comparison Table

Class Activation Mapping

Class Activation Maps (CAM) are a powerful tool used in computer vision for classification tasks. They allow the scientist to inspect the image being categorized and understand which parts or pixels of that image contributed most to the final output of the model, that is, which regions of the image were relevant to a particular class.
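As a rough illustration of the idea, a class activation map can be computed as a weighted sum of the last convolutional layer's feature maps, using the final dense layer's weights for the class of interest. The NumPy sketch below shows that computation on toy data; the function name, the toy arrays, and the clipping of negative values are assumptions for display purposes and do not reproduce what the Keract library does internally.

```python
import numpy as np


def class_activation_map(feature_maps, class_weights):
    """Weighted sum of feature maps for one class, scaled to [0, 1].

    feature_maps: array of shape (H, W, K) from the last convolutional layer.
    class_weights: array of shape (K,), the dense-layer weights for one class.
    """
    cam = np.tensordot(feature_maps, class_weights, axes=([2], [0]))
    cam = np.maximum(cam, 0)   # display choice: show only positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy 2x2 feature maps with 3 channels (not real model activations).
features = np.zeros((2, 2, 3))
features[0, 0, 0] = 1.0
features[1, 1, 1] = 4.0
features[0, 1, 2] = 3.0
weights = np.array([1.0, 0.5, -1.0])

heatmap = class_activation_map(features, weights)
print(heatmap)
```

The heatmap is then usually resized to the input image size and overlaid on it.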

Furthermore, this helps improve the overall explainability of the model. For CAM we used the Keract library; the output can be seen below:

Classification to Object detection Model

Object detection is a computer vision method that identifies and detects objects within an image. Specifically, object detection draws bounding boxes around these detected objects, which allow us to determine where the objects are in a given scene.

Object detection therefore lets us draw bounding boxes around the malaria-parasitized cells in an image that contains them. Converting an image classification task into object detection requires creating a pyramid of scaled versions of the image and passing a sliding window across each scaled image in the pyramid.

The purpose of the pyramid of scaled versions is to let the detector find objects of different sizes. The sliding window detects the presence of objects (malaria parasites) in each scaled image, and further processing places a bounding box around each object. It is therefore similar to image localization, except that image localization only works for one object in an image, while object detection can detect several objects in an image.
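The pyramid and sliding-window steps above can be sketched in plain Python by enumerating the candidate boxes they produce. The scale factor of 1.5, the 64x64 window, the step of 32 pixels, and all function names are assumptions chosen for illustration; the post does not give the values the team used.

```python
def pyramid_sizes(width, height, scale=1.5, min_size=(64, 64)):
    """Yield (width, height) for each pyramid level, largest first."""
    while width >= min_size[0] and height >= min_size[1]:
        yield width, height
        width, height = int(width / scale), int(height / scale)


def sliding_windows(width, height, window=(64, 64), step=32):
    """Yield (x, y) top-left corners, left to right, then top to bottom."""
    for y in range(0, height - window[1] + 1, step):
        for x in range(0, width - window[0] + 1, step):
            yield x, y


def candidate_boxes(width, height, window=(64, 64), step=32, scale=1.5):
    """Map every window at every level back to original-image coordinates."""
    boxes = []
    for level_w, level_h in pyramid_sizes(width, height, scale, window):
        fx, fy = width / level_w, height / level_h
        for x, y in sliding_windows(level_w, level_h, window, step):
            boxes.append((int(x * fx), int(y * fy),
                          int((x + window[0]) * fx),
                          int((y + window[1]) * fy)))
    return boxes


boxes = candidate_boxes(256, 256)
print(list(pyramid_sizes(256, 256)))  # [(256, 256), (170, 170), (113, 113), (75, 75)]
print(len(boxes))                     # 70
```

In a full pipeline, each candidate region would be cropped, resized to the classifier's input size, and kept as a detection only if the classifier labels it as parasitized with enough confidence.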

*Figure 4: Input Image for object detection and Output image after applying sliding window*

The input image is a blood sample containing malaria parasites (the deep purple dots). When it passes through the object detection model, the model draws bounding boxes around every parasitized cell in the image.

The red box, called the sliding window, moves across the image from left to right and top to bottom, looking for objects in the image. The output looks busy because there are many parasitized cells in the image.

You can see that there is a bounding box around each cell in the image. This confirms two things. First, the image classification model does a good job of ‘knowing’ which cells are parasitized. Second, the object detection model does a good job of taking the output of the image classification model and efficiently applying bounding boxes around each parasitized cell.

C. Result and Deployment

We trained the best performing model for 30 epochs and obtained a precision and recall of 95%.
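For readers less familiar with these metrics, precision and recall can be computed from true and predicted labels as in the sketch below. The label lists are made-up toy data, not the project's results, and treating ‘Parasitized’ as the positive class is an assumption.

```python
def precision_recall(y_true, y_pred, positive="Parasitized"):
    """Precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels for illustration only.
y_true = ["Parasitized"] * 4 + ["Uninfected"] * 4
y_pred = ["Parasitized", "Parasitized", "Parasitized", "Uninfected",
          "Parasitized", "Uninfected", "Uninfected", "Uninfected"]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)  # 0.75 0.75
```

For a screening tool, recall matters most because a missed infection (a false negative) is the costlier error.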

Taking GPU time-out periods into consideration, we measured the inference time of the selected model by averaging the inference times for 10 different images. We obtained an average inference time of 2.12 seconds, which we believe is fast enough to noticeably speed up the diagnostic workflow.
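A measurement like the one described could be taken with a small helper such as the following. This is a sketch, not the team's benchmark: `fake_predict` is a hypothetical stand-in for the real model call, and the untimed warm-up call is an assumed way of keeping one-off start-up costs out of the average.

```python
import time


def average_inference_time(predict, inputs, warmup=1):
    """Mean wall-clock seconds per call to predict over the given inputs."""
    for item in inputs[:warmup]:
        predict(item)                      # warm-up calls are not timed
    timings = []
    for item in inputs:
        start = time.perf_counter()
        predict(item)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


def fake_predict(image):
    """Hypothetical stand-in for model inference on one image."""
    time.sleep(0.01)
    return "Uninfected"


images = [f"image_{i}" for i in range(10)]
mean_seconds = average_inference_time(fake_predict, images)
print(f"{mean_seconds:.3f} s per image")
```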

Conclusion

Detecting and examining malaria parasites early is one of the most effective ways to reduce malaria-related deaths. In this project, a computer vision algorithm was used to detect the malaria parasite in cell image samples. Ten different Convolutional Neural Network architectures were implemented via pre-training in search of a model with high precision and recall; the DenseNet-121 model performed best of the ten and was selected for inference and deployment.

To enrich the dataset fed to the deep network, the images were pre-processed with OpenCV’s Canny edge detection to highlight the edges of relevant features in the image, which in turn is intended to improve the model’s feature extraction. The resulting system performs both classification and object detection on the given data.

The model reached a training accuracy of 95.67% and a validation accuracy of 94.09%, with an inference latency of 2.12 seconds. Early detection remains challenging because of the lack of access to testing facilities and the overburdening of the few available resources, especially in rural and underdeveloped societies. To help overcome this challenge, we developed web-based and mobile-based applications using Streamlit and TensorFlow Lite.