Team InsightIQ Project Documentation
HDSC Fall 2023 Cohort
Energy is the lifeblood of modern infrastructure. It powers industry, fuels transportation, and lights communities, so the quality and accessibility of energy resources are closely tied to infrastructure development. As the world confronts an infrastructure deficit that is further complicated by pressing environmental concerns, a fundamental transformation becomes not just desirable but imperative. This project examines the relationship between energy and infrastructure under the overarching theme of “Infrastructure Deficit: AI’s Transformative role in Planning.”
The problem at hand is the urgent need to transition from non-renewable to renewable energy sources to address infrastructure deficits sustainably. Non-renewable energy exacerbates environmental challenges and hampers infrastructure growth. It’s critical to understand renewable energy trends, barriers, and the path to eliminating non-renewables.
Aim of the Project
The project aims to analyse global energy data and offer data-driven insights for accelerating the transition to renewable energy. It seeks to facilitate informed infrastructure planning that aligns with sustainability, reducing our dependence on non-renewable sources.
The dataset was obtained from Kaggle. It contains the following tables:
- Renewable Total Power Generation: This dataset ranks energy sources in annual energy consumption, with tidal waves leading, followed by hydro, wind, and biofuel. Renewable waste and geothermal sources contribute less, providing an overview of their importance.
- Non-Renewables Total Power Generation: Focusing on non-renewable sources from 1990 to 2020, it covers Coal, Natural Gas, Nuclear, Oil, and more. It reveals how different countries rely on these sources and their potential environmental impacts.
- Top 20 Countries Power Generation: Highlights the top 20 countries in renewable energy adoption, focusing on Hydro, Biofuel, Solar PV, and Geothermal. This dataset shows their commitment to sustainable energy and global trends in adopting cleaner alternatives.
- Renewable Power Generation (1997–2017): Tracks renewable energy’s growth between 1990 and 2017 in categories like Hydro, Biofuel, Solar PV, and Geothermal. It demonstrates the increasing role of renewables and their environmental benefits.
- Country_Consumption_TWH: Spans electricity consumption from 1990 to 2020 for numerous countries. It offers insights into global energy consumption trends, revealing shifts from non-renewable to renewable sources.
- Continent_Consumption_TWH: Covers continental electricity consumption from 1990 to 2020, providing a comprehensive view of global energy consumption trends on a continental scale. It’s essential for assessing the transition from non-renewable to renewable energy sources worldwide.
Exploratory Data Analysis
In the first phase of the exploratory data analysis (EDA), univariate analyses were conducted to understand each column as a feature contributing to the overall narrative. In the second phase, bivariate and multivariate analyses examined relationships across two or more of the datasets, giving a deeper view of the story the data tells.
The key data visualizations include:
- Contribution of renewable energy sources to the total Energy Consumption value for each country.
- Top countries generating renewable energy (TWH).
- Top countries by adoption of each energy source.
- Renewable energy generation over Time (Renewable Energy trends analysis for some selected countries).
- Energy Consumption in various countries, continents and regions of the world.
- Yearly Energy consumption growth.
- Geospatial analysis of the top 20 countries in Energy growth.
DATA CLEANING AND PREPROCESSING
Transitioning from yearly to daily granularity in energy consumption data marks a pivotal step in our analysis, unlocking opportunities for in-depth exploration. This process involves breaking down annual values into daily increments, enabling us to discern subtle day-to-day variations in consumption patterns. The shift enhances our understanding and equips us with tools for more accurate time series forecasts.
This transition results in a dataset with significantly more data points, allowing predictive models to attune to daily fluctuations in energy usage. This precision improves prediction accuracy and aids decision-making in the energy sector. After upsampling, we fill any remaining gaps in the data using statistical techniques to ensure completeness and integrity.
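The yearly-to-daily step can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `yearly_to_daily`, the sample values, and the choice of time-based linear interpolation to fill the new daily rows are all assumptions.

```python
import pandas as pd


def yearly_to_daily(yearly: pd.Series) -> pd.Series:
    """Upsample a year-indexed series to daily frequency.

    Assumption: the new daily rows are filled by linear (time-based)
    interpolation between the 1 January anchors of consecutive years.
    """
    s = yearly.copy()
    # Turn integer years into timestamps at 1 January of each year.
    s.index = pd.to_datetime(s.index.astype(str), format="%Y")
    # Insert one row per calendar day (NaN for the new rows), then fill them.
    daily = s.resample("D").asfreq()
    return daily.interpolate(method="time")


# Hypothetical yearly consumption values (TWh) for one country.
yearly = pd.Series([100.0, 110.0, 130.0], index=[1990, 1991, 1992])
daily = yearly_to_daily(yearly)
print(daily.head())
```

Interpolating keeps each yearly value as the level on 1 January; dividing each annual total by the number of days in the year would be a different, equally plausible reading of "daily increments".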
We meticulously evaluate the datetime column for quality, exploring chronology, seasonality, and orderliness. This assessment provides insights into the time-dependent dynamics of energy consumption. Statistical tests, such as the Augmented Dickey-Fuller test, assess data stationarity. Log transformations and time series analysis methods extract valuable patterns, enriching our dataset for informed decisions in the energy sector.
Our feature engineering journey begins by extracting new seasonality attributes, such as year, month, day, and prevailing season. These attributes reveal a cyclical nature, and we use sine and cosine transformations to impart this cyclical knowledge to our models.
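As an illustration of the cyclical encoding, the sketch below derives calendar attributes from a datetime index and maps the month onto a circle with sine and cosine. The column names, the 0 to 3 season coding (meteorological northern-hemisphere seasons), and the sample frame are assumptions.

```python
import numpy as np
import pandas as pd


def add_cyclical_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add year/month/day/season columns plus a sine/cosine month encoding."""
    out = df.copy()
    month = out.index.month.to_numpy()
    out["year"] = out.index.year
    out["month"] = month
    out["day"] = out.index.day
    # Assumed coding: 0=winter (Dec-Feb), 1=spring, 2=summer, 3=autumn.
    out["season"] = month % 12 // 3
    # Map months onto the unit circle so December and January end up adjacent.
    angle = 2 * np.pi * (month - 1) / 12
    out["month_sin"] = np.sin(angle)
    out["month_cos"] = np.cos(angle)
    return out


idx = pd.date_range("2020-01-01", periods=366, freq="D")
frame = pd.DataFrame({"consumption": np.arange(366.0)}, index=idx)
features = add_cyclical_features(frame)
print(features[["month", "season", "month_sin", "month_cos"]].head())
```

The pair (`month_sin`, `month_cos`) is used instead of the raw month number because a raw 1 to 12 scale would place December and January eleven units apart.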
A cornerstone of our time series analysis is decomposing the data into level, trend, seasonality, and residual noise. This powerful model enhances our understanding and forecasting capabilities. Utilizing the `seasonal_decompose()` function in the `statsmodels` library, we dissect the data to unveil its structures and patterns.
To gain deeper insights, we apply a shift, or lag, to each variable, exploring correlations and relationships. Analyzing these shifts enhances our comprehension of dynamic interplay within the dataset, strengthening our ability to make informed decisions in time series analysis and forecasting.
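The lagging step can be illustrated with `pandas.Series.shift`. The helper name `make_lag_frame`, the chosen lags, and the toy values below are hypothetical.

```python
import pandas as pd


def make_lag_frame(series: pd.Series, lags: list) -> pd.DataFrame:
    """Return the series alongside lagged copies of itself."""
    frame = pd.DataFrame({"y": series})
    for lag in lags:
        # shift(lag) moves values forward in time, so row t holds y at t - lag.
        frame[f"lag_{lag}"] = series.shift(lag)
    return frame


series = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])
lagged = make_lag_frame(series, [1, 2])
# Correlation of the current value with each lag (NaN rows are ignored pairwise).
lag_corr = lagged.corr()["y"]
print(lagged)
print(lag_corr)
```

The first `lag` rows of each lagged column are NaN because no earlier observation exists; they are usually dropped before model fitting.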
We applied various methods to fit and train the model, starting with Model_1. For this model, we employed `auto_arima`, a routine that automatically searches for the best-fitting ARIMA order and is known for its speed and low memory usage.
The dataset was divided into training and validation sets in an 85:15 ratio. We fine-tuned the hyperparameters, initializing them with the following values:
`start_p=1, start_q=1, test='adf', max_p=3, max_q=3, m=1, d=None, seasonal=False, start_P=0, D=0, trace=True, error_action='ignore', suppress_warnings=True, stepwise=True`
After fitting, the search selected ARIMA(0,1,0)(0,0,0) with an intercept as the best model. The total fit time for this model was 11.891 seconds.
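The 85:15 split and the listed settings can be put together roughly as follows. This sketch assumes the `auto_arima` function comes from the `pmdarima` package, which accepts these keyword arguments; the helper names and the placeholder data are hypothetical, and the fit itself is wrapped in a function so that the split can be tried without `pmdarima` installed.

```python
import numpy as np

# Settings quoted in the text, collected as keyword arguments.
AUTO_ARIMA_KWARGS = dict(
    start_p=1, start_q=1, test="adf", max_p=3, max_q=3, m=1, d=None,
    seasonal=False, start_P=0, D=0, trace=True, error_action="ignore",
    suppress_warnings=True, stepwise=True,
)


def chronological_split(values, train_fraction=0.85):
    """Split a series into train/validation parts without shuffling."""
    cut = int(round(len(values) * train_fraction))
    return values[:cut], values[cut:]


def fit_auto_arima(train):
    """Fit a model on the training part (assumes pmdarima is installed)."""
    # Imported here so the split helper above works without pmdarima.
    from pmdarima import auto_arima
    return auto_arima(train, **AUTO_ARIMA_KWARGS)


series = np.arange(100, dtype=float)  # placeholder for one country's series
train, valid = chronological_split(series)
print(len(train), len(valid))
```

The split is chronological because shuffling a time series would leak future observations into the training set.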
In addition, we visualized the model’s diagnostic plots, which can provide insights into its performance and any areas for improvement.
Examining the diagnostic plots, we observed the following:
Top left: The residual errors appear to fluctuate around a mean of zero and maintain a fairly consistent variance between 0 and 2, indicating a reasonably constant level of error.
Top right: The density plot suggests that the distribution of the residual errors is approximately normal with a mean close to zero, indicating a good fit.
Bottom left: The blue dots do not all align closely with the red line, implying that the distribution is slightly skewed, albeit not significantly.
Bottom right: The Correlogram, also known as the ACF (Autocorrelation Function) plot, reveals that the residual errors exhibit some degree of autocorrelation, suggesting that there might be patterns or dependencies in the data that the model has not fully captured. This information is important for further model improvement.
To assess the performance of the individual country models, we calculated three key evaluation metrics: Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE). These metrics provided insights into the model’s accuracy and allowed for the identification of areas for improvement.
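The three metrics can be computed directly with NumPy, as in this small sketch. The function name `evaluate_forecast` and the sample numbers are illustrative only.

```python
import numpy as np


def evaluate_forecast(actual, predicted) -> dict:
    """Return MAE, MSE and RMSE for a pair of equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    mse = np.mean(errors ** 2)
    return {
        "MAE": float(np.mean(np.abs(errors))),  # average absolute miss
        "MSE": float(mse),                      # penalises large misses more
        "RMSE": float(np.sqrt(mse)),            # same units as the data
    }


# Hypothetical validation values and forecasts.
metrics = evaluate_forecast([10, 12, 14], [11, 12, 11])
print(metrics)
```

RMSE is always at least as large as MAE; a wide gap between the two points to a few large errors rather than many small ones.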
In the pursuit of enhancing model performance, several key improvements were implemented:
1. Optimizing Intercepts: Efforts were made to optimize the model’s intercepts, fine-tuning this parameter to improve its ability to capture the data’s underlying patterns and trends.
2. Dropping Redundant and Low-Importance Columns: Columns that contributed minimally to the predictive power of the model were identified and removed. This not only streamlined the model but also reduced the risk of overfitting, potentially leading to improved accuracy.
3. Reducing Noise: Pre-processing techniques that introduced noise or extraneous information were minimized, focusing the model on the most relevant and significant features. This refinement process aimed to enhance the model’s ability to make accurate predictions by reducing unnecessary complexity.
These enhancements collectively aimed to boost the model’s performance and increase the accuracy of energy consumption forecasts for each country.
SAVING THE MODEL
In our univariate time series modelling approach, each country’s energy consumption data was treated as a separate model. This means that we trained a distinct model for each country using its corresponding energy consumption column and datetime intervals. Once trained, each model was saved as a pickle file with a .pkl extension, ensuring that the models are preserved and can be easily accessed and utilized for future analyses and predictions.
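Saving one pickle per country might look like the sketch below. The helper names, the file naming scheme (`<country>.pkl`), and the plain dictionaries standing in for fitted models are assumptions; any picklable model object would work the same way.

```python
import pickle
import tempfile
from pathlib import Path


def save_models(models: dict, out_dir) -> list:
    """Write each country's model to <out_dir>/<country>.pkl."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for country, model in models.items():
        path = out_dir / f"{country}.pkl"
        with path.open("wb") as fh:
            pickle.dump(model, fh)
        paths.append(path)
    return paths


def load_model(country: str, out_dir):
    """Read a previously saved model back from disk."""
    with (Path(out_dir) / f"{country}.pkl").open("rb") as fh:
        return pickle.load(fh)


# Placeholder objects standing in for fitted per-country models.
demo_models = {"Japan": {"order": (0, 1, 0)}, "Brazil": {"order": (0, 1, 0)}}
with tempfile.TemporaryDirectory() as tmp:
    saved_names = sorted(p.name for p in save_models(demo_models, tmp))
    restored = load_model("Japan", tmp)
print(saved_names, restored)
```

Pickle files should only be loaded from trusted sources, and a model generally needs to be unpickled with the same library version that created it.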
One of the primary objectives of this project is to visualize and determine the energy consumption trends for each country. To achieve this, models were developed for each country, and these models were used to forecast energy consumption for a period of 10 years, spanning from 2020 to 2030. This extensive forecasting exercise serves several important purposes:
1. Setting Energy Goals: By projecting energy consumption over a 10-year period, it becomes possible to establish clear energy goals for each country. These goals can be aligned with sustainability targets and energy efficiency objectives.
2. Expectations Management: The forecasts provide a basis for managing expectations regarding energy consumption. Understanding the likely trajectory of energy usage helps policymakers, businesses, and consumers prepare for future energy needs.
3. Continuous Improvement: Monitoring energy consumption trends over the coming decade allows for ongoing assessment and improvement of energy policies and practices. By identifying potential areas of growth or reduction in energy consumption, countries can adapt and refine their strategies accordingly.
The results of these 10-year energy consumption forecasts were visualized using line plots, making it easier to interpret and communicate the expected trends and patterns for each country. This visual representation is a valuable tool for decision-makers and stakeholders in the energy sector.
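For a model of the form ARIMA(0,1,0) with an intercept, the point forecast has a simple closed form: the last observed value plus the estimated drift times the number of steps ahead. The sketch below reproduces that calculation by hand rather than calling the fitted model; the function name, the synthetic history, and the use of the mean first difference as the drift estimate are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def drift_forecast(history: pd.Series, end: str) -> pd.Series:
    """Point forecast of a random walk with drift up to the date `end`."""
    # The intercept of the differenced series is estimated by the mean daily step.
    drift = history.diff().mean()
    start = history.index[-1] + pd.Timedelta(days=1)
    future_index = pd.date_range(start, end, freq="D")
    steps = np.arange(1, len(future_index) + 1)
    values = history.iloc[-1] + drift * steps
    return pd.Series(values, index=future_index, name="forecast")


# Synthetic daily history for 2020 with a constant step of 0.5 per day.
idx = pd.date_range("2020-01-01", "2020-12-31", freq="D")
history = pd.Series(100.0 + 0.5 * np.arange(len(idx)), index=idx)
forecast = drift_forecast(history, "2030-12-31")
print(forecast.iloc[[0, -1]])
```

Calling `forecast.plot()` would draw the kind of line plot described above, provided matplotlib is installed. Because the forecast of such a model is a straight line, its uncertainty grows with the horizon, which is worth keeping in mind when reading a ten-year projection.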
The energy consumption forecasts for several countries, including China, Brazil, Italy, Sweden, Kazakhstan, Colombia, Japan, and Taiwan, highlight the critical challenges and opportunities they face in the coming decade. These forecasts demonstrate that economic development, population growth, and rising living standards are driving increased energy demand. However, the transition to more sustainable energy sources and enhanced energy efficiency measures is crucial to mitigate the environmental and economic impacts of rising consumption. It is evident that countries pursuing net-zero scenarios show a path towards a more secure, environmentally responsible, and economically resilient future. Monitoring progress towards these targets will be essential to ensure a sustainable energy future for these nations.
This project is dedicated to addressing the critical nexus of energy and infrastructure, recognizing the urgent need for a transition from non-renewable to renewable energy sources. By analysing global energy data, identifying leading countries in renewable energy adoption, forecasting the path to global renewable energy goals, and pinpointing barriers to rapid adoption, it aims to provide actionable recommendations for sustainable infrastructure development. This multifaceted approach aligns with the overarching goal of creating a more resilient, environmentally responsible, and sustainable global infrastructure landscape.
GitHub: Insight_IQ HDSC ’23 Premiere Project