Using Diet Analysis to Predict and Prevent Child Malnutrition

4 min read · Sep 25


HDSC Spring ’23 Capstone Project by Team Pyspark


Child malnutrition is a persistent global concern that harms both the physical and cognitive development of children. It takes several forms: stunting (low height for age), wasting (low weight for height), underweight (low weight for age), overweight (high weight for height), obesity, and micronutrient deficiencies. Each reflects a different way in which inadequate or unbalanced nutrition affects health and well-being. In this article, we explore how machine learning and deep learning can model dietary patterns to predict these malnutrition indicators.

Relevant Literature:

In [1], the Seaborn team employed both deep learning regression and machine learning regression techniques, including Linear Regression, Random Forest, Decision Tree, Polynomial Regression, Ridge Regression, and Lasso Regression, for their model development and testing. Their research aimed to identify and predict major risk factors for stunting, wasting, and underweight using machine learning algorithms, with the goal of reducing child malnutrition.

They analyzed dietary variables, such as exclusive breastfeeding and early initiation of solid foods, to predict the prevalence of malnutrition indicators like stunting, wasting, and overweight in children. Malnutrition can result from either a deficiency or an excess of macronutrients and micronutrients, yet the previous cohort’s work did not consider the contribution of micronutrients.

The Dataset

The dataset for this project was obtained from the UNICEF data warehouse, which provides malnutrition data from 1970 to 2022 covering 346 geographic regions and 608 malnutrition indicators. To align with our study’s emphasis on dietary analysis for malnutrition prediction, we extracted a customized dataset from this warehouse that includes only nutrition-related features [2].


We preprocessed the dataset using lambda functions and pivot tables, then conducted exploratory data analysis to gain insights from the data. This analysis allowed us to answer questions such as which 10 countries are most affected by stunting.

Figure 1. Top 10 countries affected by severe wasting
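The preprocessing and ranking steps described above could look roughly like the following pandas sketch. The column names (`country`, `year`, `indicator`, `value`), the sample values, and the helper `top_countries` are all hypothetical and chosen for illustration; they are not taken from the UNICEF extract or from the project code.

```python
import pandas as pd

# Hypothetical long-format extract: one row per country, year, and indicator.
# Values are made up and stored as strings, as raw exports often are.
raw = pd.DataFrame({
    "country":   ["A", "A", "B", "B", "C", "C"],
    "year":      [2020, 2020, 2020, 2020, 2020, 2020],
    "indicator": ["stunting", "wasting"] * 3,
    "value":     ["30.1", "7.2", "12.0 ", "3.1", "41.2", "9.8"],
})

# A lambda cleans each raw value: strip whitespace, then convert to float.
raw["value"] = raw["value"].apply(lambda v: float(str(v).strip()))

# Pivot so that each indicator becomes its own column,
# with one row per country and year.
wide = raw.pivot_table(
    index=["country", "year"],
    columns="indicator",
    values="value",
    aggfunc="mean",
).reset_index()


def top_countries(frame, indicator, n=10):
    """Return the n countries with the highest mean value of an indicator."""
    return frame.groupby("country")[indicator].mean().nlargest(n)


top_stunting = top_countries(wide, "stunting")
print(top_stunting)
```

The same helper can rank countries by any other indicator column, for example `top_countries(wide, "wasting")`.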

Our approach comprises feature engineering, model selection, and performance evaluation. To address multicollinearity in the data, we applied Principal Component Analysis. We employed a range of machine learning models, including ElasticNet, XGBoost, Linear Regression, and Gradient Boosting. In addition, we developed deep learning architectures, such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN).
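To illustrate how Principal Component Analysis removes multicollinearity, here is a minimal scikit-learn sketch. The data is synthetic, with one feature built as a near copy of another, and the 95% variance threshold is an assumption for illustration rather than a setting reported by the project.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for dietary features (not the UNICEF data).
# The second column is almost a copy of the first, so the two are collinear.
base = rng.normal(size=(200, 1))
features = np.hstack([
    base,
    base + 0.05 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

# PCA is sensitive to scale, so standardize the features first.
scaled = StandardScaler().fit_transform(features)

# Keep as many components as needed to explain 95% of the variance
# (an assumed threshold).
pca = PCA(n_components=0.95)
components = pca.fit_transform(scaled)

# Principal components are uncorrelated with one another by construction.
corr = np.corrcoef(components, rowvar=False)
print(components.shape)
print(pca.explained_variance_ratio_)
```

Because the resulting components are uncorrelated, they can be passed to a regression model without the instability that strongly correlated inputs cause.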

Results and Implications:

Using a time series forecasting model, we projected the data five years ahead, which allowed us to anticipate how the various burdens of malnutrition may develop.

Figure 2. Global Annual Underweight forecast
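The article does not name the forecasting model, so the sketch below substitutes the simplest possible one, a straight-line trend fitted with NumPy and extended five years ahead. The function `forecast_linear_trend` and the prevalence values are hypothetical and serve only to show the shape of a five-year projection.

```python
import numpy as np


def forecast_linear_trend(years, values, horizon=5):
    """Fit a straight line to (years, values) and extend it `horizon` years."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    # Least-squares fit of a degree-1 polynomial: value = slope * year + intercept.
    slope, intercept = np.polyfit(years, values, deg=1)
    future_years = np.arange(years[-1] + 1, years[-1] + 1 + horizon)
    return future_years.astype(int), slope * future_years + intercept


# Made-up annual prevalence values in percent, for illustration only.
history_years = np.arange(2013, 2023)
history_values = 20.0 - 0.4 * (history_years - 2013)

future_years, future_values = forecast_linear_trend(history_years, history_values)
for year, value in zip(future_years, future_values):
    print(year, round(float(value), 2))
```

A real forecast would likely use a dedicated time series model and report uncertainty intervals; a linear trend is only a baseline.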

This research highlights the effectiveness of AI models in predicting malnutrition indicators. Model performance was assessed using Mean Squared Error (MSE), where lower values indicate better predictions. ElasticNet was the top performer, followed by CNN and Gradient Boosting. Additionally, time series forecasting provided insight into potential trends, supporting proactive policy development and intervention strategies.
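A comparison of models by MSE can be sketched with scikit-learn as follows. The data is synthetic, the hyperparameters are assumptions, and XGBoost and the neural networks are left out to keep the example dependency-light, so the ranking printed here says nothing about the project's actual results.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic features and target (not the UNICEF data):
# a linear signal plus a small amount of noise.
X = rng.normal(size=(300, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + 0.1 * rng.normal(size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hyperparameters below are illustrative assumptions.
models = {
    "ElasticNet": ElasticNet(alpha=0.01),
    "LinearRegression": LinearRegression(),
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
}

# MSE is the mean of squared differences between true and predicted values.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = mean_squared_error(y_test, model.predict(X_test))

# Lower MSE is better, so sort in ascending order.
for name, mse in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {mse:.4f}")
```

Evaluating every model on the same held-out split keeps the MSE values directly comparable.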

To enhance accessibility and understanding of malnutrition data, an interactive dashboard was created. Developed in Python using Plotly and Dash, this dashboard serves as a valuable resource for governments, healthcare professionals, and communities. It enables users to visualize trends and take informed actions to combat malnutrition.

Figure 3. Time series Dashboard


Our study exemplifies the transformative potential of AI in addressing the intricate challenge of child malnutrition. By integrating machine learning and deep learning, we elevate predictive accuracy and shed light on the complex relationship between diet and malnutrition. These insights play a pivotal role in devising evidence-based strategies to mitigate malnutrition’s adverse impact on global child health.

Our machine learning and deep learning models have demonstrated superior performance compared to the previous cohort’s work by the Seaborn team. Additionally, we have conducted extensive diet analysis, considering a broader spectrum of diet-related features. Furthermore, we have predicted malnutrition indicators linked to micronutrient deficiencies. The introduction of interactive dashboards enhances user-friendliness.

To combat malnutrition effectively, interventions should not only aim to improve access to nutritious food but also prioritize educating individuals about the significance of a balanced diet and adopting healthy lifestyle choices. These dashboards serve as valuable tools for governments, healthcare professionals, and communities to gain a better understanding of the situation and facilitate education efforts.


  1. Team Seaborn, “Machine Learning for Malnutrition Risk Prediction,” HDSC 2023 Winter Cohort.
  2. UNICEF, “Data Warehouse,” UNICEF DATA.



