PREDICTING CONFLICT HOTSPOTS IN THE WORLD

TEAM LUDWIG, HDSC Spring ’24 Cohort

Aug 8, 2024

Courage Siameh, Duncan Munene Karugu, Clinton Ahiwe Onuoha, Tawakalitu Balogun, John Michael, Chukwuma Nwachukwu, Chidiebere Nnadiegbulam, Feranmi Oyedare, Rashidat Sikiru, Seun Damilare Keshinro, Muhammed Balogun, Emmanuel Okeke

Abstract

This research paper explores the use of machine learning algorithms and large language models (LLMs) to forecast areas prone to conflict and to suggest mitigation strategies. The study draws on historical data from the ACLED data bank and uses a RandomForest algorithm to predict conflict events. An LLM, specifically Google’s Gemini-1.5-Flash, is then used to generate proactive intervention strategies from those predictions. The findings highlight the importance of considering various factors, including economic, political, social, and environmental variables, in conflict prediction models. The study also emphasizes the need for future research to incorporate diverse data sources, explore interdependencies among predictors, and expand geographic and contextual coverage to develop more comprehensive and universally applicable models for conflict prevention and resolution.

Introduction

1. Background

Predicting conflict hotspots has become a critical aspect of global security and peacekeeping efforts. With the proliferation of data and advancements in technology, it has become possible to analyze patterns and predict areas that are likely to experience conflicts. This project aims to harness these capabilities to identify potential conflict zones before they escalate.

2. Importance of the Project Topic

The ability to predict conflict hotspots is critically important for several reasons. Firstly, it allows for proactive intervention, potentially preventing violence and saving lives. For example, if peacekeeping organizations such as the United Nations can predict conflict hotspots, they can use diplomacy, mediation and conciliation to head off such outbreaks, making peace enforcement and the protection of lives more effective. Secondly, it aids governments and international organizations in allocating resources more efficiently, ensuring that humanitarian aid reaches those in need before a conflict escalates. Lastly, understanding and predicting conflicts can contribute to long-term strategies for peacebuilding and stability, ultimately fostering global security and prosperity.

3. Project Aim and Objectives

The primary aim of this project is to utilize LLMs to generate mitigation strategies based on the model’s predictions. Specific objectives include:

Data Collection and Integration: Gathering historical conflict data from the ACLED data bank to identify the frequencies and types of conflict over time, the hotspots for the most frequent events, the impact of each event and the potential variables that will aid in conflict resolution.
Model Development: Utilizing generative AI, advanced machine learning algorithms and statistical methods to analyze the data and predict future hotspots.
Validation and Testing: Testing the model against historical data to ensure its accuracy and reliability.
Implementation and Dissemination: Providing actionable insights for policymakers and stakeholders.

Literature Review

1. Quantitative Methods

Quantitative approaches dominate the field of conflict prediction, leveraging statistical and machine learning techniques to analyze large datasets. Hegre et al. (2013) employed logistic regression models to predict civil wars, incorporating variables such as GDP per capita, population size, and political regime type. Goldstein et al. (2014) utilized random forest models to forecast conflict events using socio-economic and political indicators.

The advent of big data has significantly enhanced predictive capabilities. Researchers like Hendrix and Salehyan (2012) have utilized social media data, satellite imagery, and real-time news reports, providing timely and granular information that allows for more accurate and dynamic predictions.

2. Qualitative Methods

Qualitative methods are essential for understanding the context-specific factors driving conflicts. Case studies and ethnographic research provide detailed narratives and uncover local dynamics that quantitative models may overlook. For instance, Kalyvas (2006) explored the logic of violence in civil wars, offering in-depth insights into micro-level mechanisms and individual motivations.

Mixed methods, combining quantitative and qualitative approaches, offer a comprehensive understanding of conflict dynamics. Cederman et al. (2013) integrated quantitative analysis of ethnic power relations with qualitative case studies, enhancing the understanding of conditions under which ethnic groups mobilize for violence.

3. Case Studies

Sub-Saharan Africa has been a primary focus for conflict prediction due to its history of civil wars and political instability. Raleigh and Hegre (2009) analyzed the spatial distribution of conflicts in the region, highlighting the importance of political exclusion and ethnic grievances.

The Middle East presents unique challenges for conflict prediction, with its complex interplay of political, religious, and socio-economic factors. Gleditsch and Ward (2013) studied the Arab Spring uprisings, identifying key predictors such as autocratic governance, youth bulges, and unemployment rates.

In South Asia, ongoing insurgencies and diverse political landscapes offer valuable insights into conflict prediction. Urdal (2008) examined the impact of youth bulges on political violence, finding a significant correlation between large youth cohorts and increased likelihood of conflict.

4. Emerging Trends

Recent studies have started to explore the link between climate change and conflict. Burke et al. (2009) found that rising temperatures and changing precipitation patterns exacerbate resource scarcity, leading to increased competition and violence. This highlights the necessity of incorporating environmental variables into conflict prediction models.

5. Identifying Gaps in the Literature

Despite the advances in conflict prediction, several gaps remain. First, many studies focus on specific regions or conflict types, which may not be generalizable to other contexts. There is a need for more comparative studies that analyze different regions and conflict types to develop universally applicable models.

Second, the interplay between various predictors, such as economic, political, social, and environmental factors, is often underexplored. Most models consider these factors in isolation, missing out on the complex interdependencies that drive conflicts.

Third, there is a notable gap in the application of large language models (LLMs) alongside predictive models in identifying conflict-prone nations. LLMs, with their advanced natural language processing capabilities, can generate mitigation strategies for conflict-prone nations. However, their integration into conflict prediction frameworks remains largely unexplored.

This research aims to address these gaps by:

Comparative Analysis: Conducting comparative studies across different regions and conflict types to develop more generalizable predictive models. This approach will provide insights into universal predictors of conflict as well as region-specific dynamics.
Multifactorial Analysis: Examining the interplay between various predictors, including economic, social, and other important factors, to capture the complex interdependencies that drive conflicts. By doing so, this research will develop more comprehensive models that better reflect the realities of conflict dynamics.
Incorporating Large Language Models: Utilizing LLMs to generate mitigation strategies based on the model’s predictions. This novel approach aims to bridge the gap between qualitative and quantitative methods, providing richer and more robust mitigation strategies for conflict-prone nations.

Methodology

The datasets utilized for this project were sourced from the ACLED data bank. Six regional datasets were merged into a single dataset to ensure consistency and comprehensive coverage. The six datasets are:

Africa_1997–2024_Jun28
Asia-Pacific_2018–2024_Jun28
Europe-Central-Asia_2018–2024_Jun28
LatinAmerica_2018–2024_Jun28
MiddleEast_2015–2024_Jun28
USA_Canada_2022_2024_Jun28
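The unification step can be sketched with pandas. This is a minimal illustration under stated assumptions, not the team's actual code: the small in-memory frames stand in for the regional ACLED exports, and the column names (`event_date`, `region`, `event_type`, `fatalities`) and the `unify` helper are hypothetical.

```python
import pandas as pd

# Tiny stand-ins for two of the regional exports (hypothetical rows and columns).
africa = pd.DataFrame({
    "event_date": ["1997-01-05", "2020-03-11"],
    "region": ["Africa", "Africa"],
    "event_type": ["Battles", "Protests"],
    "fatalities": [12, 0],
})
middle_east = pd.DataFrame({
    "event_date": ["2015-06-01"],
    "region": ["Middle East"],
    "event_type": ["Explosions/Remote violence"],
    "fatalities": [4],
})


def unify(frames):
    """Stack regional frames into one dataset and derive a year column."""
    combined = pd.concat(frames, ignore_index=True)
    combined["event_date"] = pd.to_datetime(combined["event_date"])
    combined["year"] = combined["event_date"].dt.year
    return combined


conflicts = unify([africa, middle_east])
print(conflicts.shape)
```

In practice each frame would come from reading one regional file, and the same call would stack all six.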

Data Cleaning

Data cleaning was performed by removing unnecessary columns from the data to ensure efficiency. The data types of a few features were corrected to the appropriate format. Missing values were handled with the SimpleImputer from the sklearn library: they were replaced with the mean for numerical data and with the most frequent value for categorical data. Duplicates in the dataset were also removed to ensure data accuracy.
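The cleaning steps described above could look roughly like the sketch below. It assumes hypothetical column names and a hypothetical `clean` helper. The only parts taken from the text are the use of `SimpleImputer` with the mean and most-frequent strategies, the dropping of unneeded columns, and the removal of duplicates.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data with gaps; "notes" stands in for an unneeded column (hypothetical).
raw = pd.DataFrame({
    "fatalities": [2.0, np.nan, 8.0, 8.0, 6.0, 6.0],
    "event_type": ["Battles", "Battles", "Protests", "Protests", np.nan, "Battles"],
    "notes": ["a", "b", "c", "d", "e", "f"],
})


def clean(df, drop_cols, num_cols, cat_cols):
    """Drop unneeded columns, impute missing values, then remove duplicates."""
    out = df.drop(columns=drop_cols).copy()
    # Numerical gaps are filled with the column mean.
    out[num_cols] = SimpleImputer(strategy="mean").fit_transform(out[num_cols])
    # Categorical gaps are filled with the most frequent value.
    cat_imputer = SimpleImputer(strategy="most_frequent")
    out[cat_cols] = cat_imputer.fit_transform(out[cat_cols].astype(object))
    return out.drop_duplicates().reset_index(drop=True)


cleaned = clean(raw, ["notes"], ["fatalities"], ["event_type"])
print(cleaned)
```

Imputing before de-duplicating follows the order given in the text. Rows that become identical only after imputation are also dropped in this sketch.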

Data Analysis/Exploration Technique

Exploratory data analysis was conducted to gain insights into the data, identify patterns, and understand the distribution of conflicts across different regions. The exploration was carried out with the pandas library. Eight research questions were formulated to guide the analysis:

How has the frequency of conflicts changed over the years?
Which region of the world has experienced the highest number of conflicts?
How has the distribution of conflicts across different regions evolved over the years?
What is the rate of conflict in each region? Which regions have the highest and lowest rates of conflict?
How has the human cost of conflicts, in terms of fatalities, changed over the years?
What are the most common types of events that incite conflicts, and what is their relative frequency?
How do the conflict types vary across different regions of the world?
How do the types of conflict events vary across different regions of the world?
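Several of these questions reduce to simple pandas aggregations. The sketch below uses a toy frame with assumed column names (`year`, `region`, `fatalities`). It shows the kind of grouping involved rather than the exact code behind the figures.

```python
import pandas as pd

# Toy event records (hypothetical values).
events = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020, 2020],
    "region": ["Africa", "Asia", "Africa", "Africa", "Middle East"],
    "fatalities": [5, 0, 2, 7, 1],
})

# Frequency of conflicts per year.
events_per_year = events.groupby("year").size()

# Number of conflicts per region, sorted from highest to lowest.
events_per_region = events["region"].value_counts()

# Region-by-year table of event counts.
by_region_year = events.groupby(["region", "year"]).size().unstack(fill_value=0)

# Human cost per year, measured as total fatalities.
fatalities_per_year = events.groupby("year")["fatalities"].sum()

print(by_region_year)
```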

Data Visualization

1. How has the frequency of conflicts changed over the years?

Figure 3.3.1: Number of conflicts over the years

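From Figure 3.3.1, the number of conflicts appears to have increased steadily from 1997 to 2024, with a notable spike in 2020. Part of this rise reflects coverage: only the Africa dataset extends back to 1997, while the other regional datasets begin between 2015 and 2022.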

2. Which region of the world has experienced the highest number of conflicts?

Figure 3.3.2: Region with the highest frequency

Africa seems to have the highest frequency of conflicts, followed by the Middle East and Asia as seen in Figure 3.3.2 above.

3. How has the distribution of conflicts across different regions evolved over the years?

Figure 3.3.3: Number of Conflicts by Region and Year

From Figure 3.3.3, aside from Africa consistently showing the highest number of conflicts, the Middle East saw a significant increase from 2015 to 2020. Asia experienced a significant increase in conflict between 2016 and 2023. The USA/Canada region recorded a relatively low conflict count over the years compared to the other regions.

4. What is the rate of conflict in each region? Which regions have the highest and lowest rates of conflict?

Figure 3.3.4: Rate of conflicts in each region

Africa, Asia, the Middle East and Latin America have the highest conflict counts, at 270572, 256666, 202991, and 148986 events respectively. USA/Canada and Europe have significantly lower counts, at 20286 and 98228 respectively, as seen in Figure 3.3.4.

5. How has the human cost of conflicts, in terms of fatalities, changed over the years?

Figure 3.3.5: Fatalities experienced over the years

Figure 3.3.5 shows how the human cost of conflicts, in terms of fatalities, has evolved over the years, with periods of escalation and relative decline.

Trend Analysis:

The number of fatalities due to conflicts shows a general increasing trend from 1997 to 2024, with substantial fluctuations.
The early years (1997–2010) had relatively low fatality counts with occasional spikes.
From 2011 onwards, there is a steady and significant increase, peaking in 2018.
Post-2018, the fatalities remain high but show a slight decreasing trend towards 2024.

Key Peaks and Drops:

Major peaks in 1999, 2018, and subsequent high numbers in 2019–2023.
Notable drops in 2006 and again in 2024 after a period of high fatalities.

Recent Changes:

Fatalities in 2024 are lower than in the peak years but remain elevated compared to the early years. Note that the 2024 data extend only to June 28, so the year is incomplete.

6. What are the most common types of events that incite conflicts, and what is their relative frequency?

Figure 3.3.6: Percentage Distribution of Causes of Conflicts [Event type]

In Figure 3.3.6 above, protests and battles are the most common types of conflict events, each accounting for over 22% of total events. Violence against civilians and explosions/remote violence are also significant, at 18.3% and 15.9% respectively, while riots and strategic developments, although less frequent, still make up over 10% each of the event types.
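A percentage distribution like the one in Figure 3.3.6 can be computed by normalizing value counts. The snippet below is a sketch on made-up data. The `event_type` column name is an assumption, and the resulting shares are not the article's figures.

```python
import pandas as pd

# Ten toy events (hypothetical), so each event is 10% of the total.
event_type = pd.Series(
    ["Protests"] * 4 + ["Battles"] * 3 + ["Riots"] * 2 + ["Strategic developments"],
    name="event_type",
)

# Share of each event type as a percentage of all events.
share = event_type.value_counts(normalize=True).mul(100).round(1)
print(share)
```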

Figure 3.3.7: Frequency of Sub-Event Types

From Figure 3.3.7 above, armed clashes and peaceful protests are the most frequent sub-event types. Various forms of attacks and violent demonstrations are also common while the least frequent sub-event types include chemical weapon usage, agreements, and establishment of headquarters or bases.

7. How do the conflict types vary across different regions of the world?

Figure 3.3.8: Frequency of Event-Type by Region

In Figure 3.3.8, the event types observed are battles, explosions/remote violence, protests, riots, strategic developments, and violence against civilians. The regional patterns are:

Africa: High in Battles, Riots, and Violence against civilians.
Asia: High in Protests, considerable in Battles and Violence against civilians.
Europe: Balanced across Protests and Riots, fewer in other types.
Latin America: High in Protests.
Middle East: High in Explosions/Remote violence and Battles.
USA/Canada: Lower counts across all event types.
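A region-by-event-type breakdown of this kind is a cross-tabulation. The sketch below uses toy rows and assumed column names (`region`, `event_type`) to show the pattern. The same call with a disorder-type column would give the view in the next question.

```python
import pandas as pd

# Toy event records (hypothetical values).
events = pd.DataFrame({
    "region": ["Africa", "Africa", "Africa", "Asia", "Asia", "Asia",
               "Middle East", "Middle East", "Middle East"],
    "event_type": ["Battles", "Battles", "Protests", "Protests", "Protests", "Battles",
                   "Explosions/Remote violence", "Explosions/Remote violence", "Battles"],
})

# Rows are regions, columns are event types, cells are event counts.
table = pd.crosstab(events["region"], events["event_type"])

# Most frequent event type within each region.
dominant = table.idxmax(axis=1)
print(table)
print(dominant)
```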

8. How do the types of conflict events vary across different regions of the world?

Figure 3.3.9: Number of Events by Region and Disorder Type

The disorder types shown in Figure 3.3.9 are demonstrations, political violence, political violence & demonstrations, and strategic developments. The regional patterns are:

Africa: Leading in Political violence, followed by Demonstrations.
Asia: Leading in Demonstrations and significant in Political violence.
Europe: Noticeable counts in Demonstrations and Political violence.
Latin America: High in Demonstrations.
Middle East: Significant in Political violence.
USA/Canada: Generally lower across all disorder types.

Data Modeling and Evaluation

In this research project, a RandomForest classifier was used to build a predictive model of conflict hotspots, with event_type as the target. After preprocessing, the data was divided into X features (independent variables) and the y feature (dependent variable). The independent variables are fatalities, log_fatalities, region_encoded, sub_event_encoded, and region_sub_event_interaction, while the dependent variable is event_type. The data was then split into training and testing sets: the training set accounted for 80% of the data and the testing set for the remaining 20%. After the model’s evaluation, an LLM was integrated to generate mitigation strategies based on the model’s predictions, using Google’s generative AI API (Gemini-1.5-Flash).
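The modeling setup can be sketched as follows. The feature names and the 80/20 split come from the text. Everything else is an assumption made for illustration: the synthetic data, `np.log1p` for log_fatalities, `LabelEncoder` for the encoded columns, the interaction defined as a product of the two encodings, and the hyperparameters.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Synthetic stand-in for the cleaned ACLED data (hypothetical values).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "region": rng.choice(["Africa", "Asia", "Middle East"], size=n),
    "sub_event_type": rng.choice(["Armed clash", "Peaceful protest", "Attack"], size=n),
    "fatalities": rng.poisson(2, size=n),
})
# Toy target: each sub-event maps to one event type (an assumption for the demo).
event_of = {"Armed clash": "Battles", "Peaceful protest": "Protests",
            "Attack": "Violence against civilians"}
df["event_type"] = df["sub_event_type"].map(event_of)


def build_features(frame):
    """Build the five features named in the text (encodings are assumed)."""
    out = pd.DataFrame(index=frame.index)
    out["fatalities"] = frame["fatalities"]
    out["log_fatalities"] = np.log1p(frame["fatalities"])
    out["region_encoded"] = LabelEncoder().fit_transform(frame["region"])
    out["sub_event_encoded"] = LabelEncoder().fit_transform(frame["sub_event_type"])
    # Assumed definition: product of the two encoded columns.
    out["region_sub_event_interaction"] = out["region_encoded"] * out["sub_event_encoded"]
    return out


X = build_features(df)
y = df["event_type"]

# 80% training data, 20% testing data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

Because the toy target is fully determined by the sub-event, the score printed here says nothing about the accuracy reported in the article.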

Result

Following evaluation on the test data, the RandomForest model performed well, with an accuracy of 85%. Precision, recall and F1 score were also used to evaluate the model’s performance, and a confusion matrix was used to identify the number of true positives, false positives, true negatives and false negatives. From the feature importances of the model, it was discovered that region_event_interaction had the greatest impact on the model, followed by the encoded event_type. Overall, the model performs well and generalizes well to the held-out test data.
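The evaluation metrics named above are all available in scikit-learn. The sketch below fits a classifier on randomly generated data purely to show the calls. The feature names are attached to random columns as labels only, so the numbers it prints are unrelated to the article's results.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Labels borrowed from the text; the underlying columns are random (assumption).
feature_names = ["fatalities", "log_fatalities", "region_encoded",
                 "sub_event_encoded", "region_sub_event_interaction"]
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, n_classes=3, n_clusters_per_class=1,
                           random_state=0)
X = pd.DataFrame(X, columns=feature_names)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Overall accuracy plus per-class precision, recall and F1.
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

# Rows are true classes, columns are predicted classes.
matrix = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])

# Feature importances, largest first.
importances = pd.Series(model.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)

print(report)
print(matrix)
print(importances)
```

For a multi-class target such as event_type, true and false positives and negatives are read per class: the diagonal of the matrix holds the correct predictions, and accuracy is the diagonal sum divided by the total.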

Figure 4.1: Classification Report

Figure 4.2: Confusion Matrix

Figure 4.3: Model’s Feature Importance

Figure 4.4: A Snippet of the LLM Integration
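The LLM step might be wired up along the following lines. This is a hedged sketch, not the code shown in Figure 4.4: the prompt wording, the helper names `build_prompt` and `suggest_mitigation`, and the `GOOGLE_API_KEY` environment variable are all assumptions. It also assumes the `google-generativeai` Python package, which is imported lazily so the prompt helper runs without it.

```python
import os


def build_prompt(region, predicted_event_type):
    """Turn a model prediction into a plain-text prompt (hypothetical wording)."""
    return (
        "You are assisting a peacebuilding analyst.\n"
        f"Region: {region}\n"
        f"Predicted conflict event type: {predicted_event_type}\n"
        "Suggest three concise, practical mitigation strategies."
    )


def suggest_mitigation(region, predicted_event_type, api_key=None):
    """Send the prompt to Gemini-1.5-Flash and return the generated text.

    Requires the google-generativeai package and a valid API key.
    """
    import google.generativeai as genai  # lazy import: only needed for the API call

    genai.configure(api_key=api_key or os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content(build_prompt(region, predicted_event_type))
    return response.text


# The prompt can be inspected without calling the API.
prompt = build_prompt("Africa", "Battles")
print(prompt)
```

Keeping prompt construction separate from the API call makes the text sent to the model easy to review and test before any request is made.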

Conclusion

This research paper aimed to harness the power of machine learning and large language models (LLMs) to forecast conflict-prone regions and propose mitigation strategies. The study was driven by the increasing importance of predicting conflict hotspots to enable proactive interventions and efficient resource allocation, ultimately contributing to global security and stability.

The project achieved its objectives by first gathering comprehensive historical conflict data from various regions using the ACLED data bank. The data were cleaned, analyzed, and visualized to identify patterns and trends in conflict occurrences. A predictive model using the RandomForest algorithm was developed, achieving an accuracy of 85%. The model’s feature importance highlighted the significant impact of region-event interaction and event types on conflict prediction.

Furthermore, the integration of LLMs, specifically Google’s Gemini-1.5-Flash, provided an approach to generating actionable mitigation strategies based on model predictions. This integration bridged the gap between quantitative data analysis and qualitative strategy development, offering richer and more robust solutions for policymakers and stakeholders.

Recommendations

Future research should focus on enhancing data integration by incorporating diverse data sources such as social media data and news reports to improve predictive accuracy. Additionally, exploring the complex interdependencies between economic, political, social, and environmental factors can provide a deeper understanding of conflict dynamics. Expanding the geographic and contextual scope of the study to include more regions and specific conflict contexts will aid in developing universally applicable predictive models. Furthermore, integrating advanced large language models (LLMs) tailored to generate detailed and context-specific intervention strategies can offer better insights. Finally, the inclusion of climate data is crucial, as it increasingly impacts conflict dynamics, and understanding these effects can aid in more comprehensive conflict prevention and resolution strategies.

References

Burke, M. B., Miguel, E., Satyanath, S., Dykema, J. A., & Lobell, D. B. (2009). Warming increases the risk of civil war in Africa. Proceedings of the National Academy of Sciences, 106(49), 20670–20674.

Cederman, L.-E., Gleditsch, K. S., & Buhaug, H. (2013). Inequality, grievances, and civil war. Cambridge University Press.

Collier, P., & Hoeffler, A. (2004). Greed and grievance in civil war. Oxford Economic Papers, 56(4), 563–595.

Gleditsch, K. S., & Ward, M. D. (2013). Forecasting is difficult, especially about the future: Using contentious issues to forecast interstate disputes. Journal of Peace Research, 50(1), 17–31.

Goldstein, J. S., & Pevehouse, J. C. (2014). International Relations. Pearson Higher Ed.

Hegre, H., Karlsen, J., Nygård, H. M., Strand, H., & Urdal, H. (2013). Predicting armed conflict, 2010–2050. International Studies Quarterly, 57(2), 250–270.

Hendrix, C. S., & Salehyan, I. (2012). Climate change, rainfall, and social conflict in Africa. Journal of Peace Research, 49(1), 35–50.

Kalyvas, S. N. (2006). The logic of violence in civil war. Cambridge University Press.

Raleigh, C., & Hegre, H. (2009). Population size, concentration, and civil war. A Geographically Disaggregated Analysis of African States, 1960–2002, 45(3), 369–398.

Urdal, H. (2008). Population, resources, and political violence: A subnational study of India 1956–2002. Journal of Conflict Resolution, 52(4), 590–617.