Team DataFusion HDSC Fall 2023 Premiere Project Documentation
The African continent has witnessed numerous conflicts over the years, which have had devastating effects on the lives of its citizens and the economy. Understanding the scale of these conflicts is crucial in developing effective strategies to mitigate them and prevent their escalation to an international level. This project aims to develop a machine learning model to predict the scale of conflicts in Africa as either national or international, based on available features. This project has the potential to provide valuable insights into the causes and prevention of conflicts in Africa, leading to improved conflict resolution and peacebuilding efforts on the continent.
The objectives of this project are to:
- Develop a machine learning model to predict whether a conflict is of national or international scale based on the available features.
- Identify the key features that contribute significantly to the prediction of conflict scale.
- Investigate the factors that differentiate national-scale conflicts from international-scale ones, providing insight into why some conflicts take on international significance.
- Evaluate the performance of the machine learning model using appropriate evaluation metrics and techniques.
- Explore potential strategies and recommendations for stakeholders to mitigate conflicts and prevent their escalation to an international scale.
The steps taken are illustrated with the flowchart below:
The dataset was obtained from Kaggle via this link
This analysis focuses on the African Conflict Dataset, covering 1997 to 2020. The dataset contains 65,535 individual observations, each described by 29 variables.
The following procedures were used to prepare the data:
- Data collection: The dataset contains variables such as event ID, event date, event type, actors involved, location, latitude, longitude, and fatalities. Some columns have missing values, including actor2, admin3, associated actors, and event notes.
- Data transformation: The dates in the data were converted to datetime, and the latitude and longitude columns were converted to float.
- Data visualization: Here, information was displayed using bar charts, box plots, pie charts, heatmaps, and other visual aids to facilitate clear and simple interpretation.
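The data transformation step above can be sketched as follows. This is a minimal illustration, not the team's actual code: the column names (`event_date`, `latitude`, `longitude`), the date format, and the toy rows are all assumptions, since the source does not state them.

```python
import pandas as pd

# Toy rows standing in for the real dataset.
# Column names and the date format are assumed for illustration.
raw = pd.DataFrame({
    "event_date": ["1997-01-15", "2020-06-30", "not a date"],
    "latitude": ["-8.838", "36.75", "4.05"],
    "longitude": ["13.234", "3.04", "bad"],
})

def transform(df):
    """Return a copy with dates parsed and coordinates cast to float."""
    out = df.copy()
    # errors="coerce" turns unparseable values into NaT instead of raising
    out["event_date"] = pd.to_datetime(
        out["event_date"], format="%Y-%m-%d", errors="coerce"
    )
    for col in ("latitude", "longitude"):
        # Unparseable coordinates become NaN
        out[col] = pd.to_numeric(out[col], errors="coerce").astype(float)
    return out

clean = transform(raw)
print(clean.dtypes)
```

Coercing bad values to missing rather than failing makes it easy to count how many rows could not be converted before deciding how to handle them.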
We conducted exploratory data analysis to gain a deeper understanding of the gathered data, to provide a more comprehensive view of the data, and to identify and understand patterns that could help explain unexpected results.
As a result of this analysis, several insights were generated.
1. Battles (28.8%) and violence against civilians (28.6%) were the event types that contributed most to fatalities, indicating the prevalence of armed conflict and human rights abuses. Protests and riots (over 30%) were also common, pointing to widespread social and political unrest.
2. The year 2020 had the highest count, with 10,413 reported events. The years 2016–2019 also recorded large numbers of events. These years may reflect periods of heightened conflict, or incidents that received substantial attention and reporting.
3. The dataset captures the key actors involved in conflicts and events across Africa, shedding light on their impact on people's lives. Where actor information is missing, the circumstances surrounding an event remain unclear. The dataset also shows that certain actors are prominent in specific regions, such as civilians in the Democratic Republic of Congo, protesters in Algeria, and civilians facing conflict repercussions in Burundi.
Figure 2: Distribution of Top Event Actors
1. The year 1999 had the highest number of reported deaths, accounting for approximately 39.5% of total fatalities.
2. Angola had the most reported deaths of any country, accounting for approximately 35.6% of total fatalities. By region, Eastern Africa and Middle Africa recorded the highest numbers of fatalities, with totals of 126,929 and 239,064 respectively. These numbers are a stark reminder of the human cost of these conflicts.
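Percentages like the ones reported above are typically computed by grouping and normalising. The sketch below shows one way to do that with pandas. The column names (`year`, `country`, `fatalities`), the helper `fatality_share`, and the toy numbers are hypothetical and do not reproduce the project's figures.

```python
import pandas as pd

# Toy data for illustration only; values are invented.
events = pd.DataFrame({
    "year": [1999, 1999, 2016, 2020, 2020],
    "country": ["Angola", "Angola", "Burundi", "Algeria", "Algeria"],
    "fatalities": [60, 20, 10, 5, 5],
})

def fatality_share(df, by):
    """Percentage of total fatalities per group, largest first."""
    totals = df.groupby(by)["fatalities"].sum()
    return (100 * totals / totals.sum()).sort_values(ascending=False)

# Number of reported events per year
events_per_year = events["year"].value_counts().sort_index()

# Share of total fatalities by year and by country
share_by_year = fatality_share(events, "year")
share_by_country = fatality_share(events, "country")

print(events_per_year)
print(share_by_year)
print(share_by_country)
```

The same helper works for any grouping column, for example an event type or region column if the dataset provides one.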
Distribution of Event Types by Region
Figure 2: Geospatial distribution of Event types by Fatalities
The heatmap reveals the areas with the highest density of conflicts, highlighting the regions that require immediate attention and intervention. This visual representation helps policymakers and organizations identify the hotspots and allocate resources effectively.
Figure 3: Frequency Distribution of Source Scale Conflict Report
To facilitate the creation and validation of the model, the dataset was divided into two parts: the training dataset and the testing dataset.
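A split of this kind might look like the sketch below, using scikit-learn's `train_test_split`. The source does not state the split ratio or whether the split was stratified, so the 80/20 ratio, the stratification, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
# Imbalanced binary labels, e.g. 0 = national, 1 = international (assumed encoding)
y = np.array([0] * 80 + [1] * 20)

# Hold out 20% for testing (assumed ratio). stratify=y keeps the class
# proportions the same in both parts; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
print("positive share (train):", y_train.mean())
print("positive share (test):", y_test.mean())
```

Stratifying is worth considering when one class is much rarer than the other, because a purely random split can leave the test set with very few examples of the rare class.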
During model evaluation, we compared several supervised machine learning algorithms: Logistic Regression, Nearest Neighbor, Gaussian Naïve Bayes, Decision Tree, and Random Forest. Of these, the Random Forest classifier performed best, with the highest accuracy score of 86%.
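A comparison of this kind can be sketched with scikit-learn as shown below. This is not the project's code: the data is synthetic (`make_classification`), the hyperparameters are library defaults or arbitrary choices, and "Nearest Neighbor" is assumed to mean `KNeighborsClassifier`. The scores it prints will not match the 86% reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the conflict dataset.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The five model families named in the text; settings are assumptions.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Nearest Neighbor": KNeighborsClassifier(),
    "Gaussian Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training part and score it on the held-out part.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")

# Impurity-based feature importances from the fitted Random Forest,
# one way to look at which features drive the prediction.
importances = models["Random Forest"].feature_importances_
print(importances.round(3))
```

Accuracy alone can be misleading if one conflict scale is much more common than the other, so metrics such as precision, recall, or F1 may be worth reporting alongside it.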
The train/test split method was employed to validate the model.