Researchers and demographers often use statistical models and projections based on current trends and historical data. These models may take into account a variety of factors that can influence birth rates, such as age, race and ethnicity, education, income, and fertility preferences.
It should be noted that birth rate predictions are not infallible, as they rely on models and assumptions that may not always be reflective of real-world outcomes.
- AIM AND OBJECTIVES
The aim of this project is to predict future birth rates. These predictions can in turn provide valuable information for policy makers, healthcare providers, and others who are interested in understanding demographic trends and planning for the future.
- FLOW PROCESS
- Data sourcing
The project dataset was obtained from Kaggle.
This dataset contains two files listed below:
- US_births_1994–2003_CDC_NCHS
- US_births_2000–2014_SSA
Both datasets share the following features:
- year
- month
- date_of_month
- day_of_week
- births
- Data Processing
The data processing phase of this project involved several steps to prepare the dataset for our analysis.
The first step was to merge the two files into a single DataFrame using the .merge() method. Once merged, we cleaned the data by checking for missing values with the .isna() method and by identifying and removing duplicates with the .duplicated() and .drop_duplicates() methods.
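The combining and cleaning steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the text names `.merge()`, but because both files share the same columns this sketch substitutes `pd.concat` to stack them. The function name `combine_births`, the `KEYS` list, and the choice to keep the SSA row for overlapping days (the two files both cover 2000 to 2003) are assumptions made for the example.

```python
import pandas as pd

# Columns that together identify one calendar day.
KEYS = ["year", "month", "date_of_month"]

def combine_births(cdc: pd.DataFrame, ssa: pd.DataFrame):
    """Stack both sources, then report and resolve missing values and duplicates."""
    combined = pd.concat([cdc, ssa], ignore_index=True)
    report = {
        # Total number of missing cells across all columns.
        "missing": int(combined.isna().sum().sum()),
        # Number of rows whose calendar day already appeared earlier.
        "duplicate_days": int(combined.duplicated(subset=KEYS).sum()),
    }
    # Assumption: for days present in both files, keep the later (SSA) row.
    clean = combined.drop_duplicates(subset=KEYS, keep="last")
    return clean.sort_values(KEYS).reset_index(drop=True), report
```

Calling `combine_births(cdc_df, ssa_df)` returns the cleaned frame together with a small report, so the number of missing values and overlapping days can be inspected before anything is dropped.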
Next, we checked for the stationarity of the data using the Augmented Dickey-Fuller (ADF) test to determine if the time series data has a stable mean and variance over time.
Finally, we used the stats method to identify and remove any outliers that may distort the analysis results. These steps are essential to ensure the accuracy and reliability of the data used for forecasting the US birth rate using time series analysis.
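The text does not say which statistic was used for outlier removal. One common choice is a z-score filter, sketched here under that assumption; the function name `drop_outliers` and the threshold of 3 standard deviations are hypothetical. With `ddof=0` the result matches what `scipy.stats.zscore` computes by default.

```python
import pandas as pd

def drop_outliers(df: pd.DataFrame, column: str = "births", threshold: float = 3.0) -> pd.DataFrame:
    """Keep rows whose z-score in `column` is within +/- threshold."""
    # Population standard deviation (ddof=0), as in scipy.stats.zscore.
    z = (df[column] - df[column].mean()) / df[column].std(ddof=0)
    return df[z.abs() <= threshold]
```

Because daily births differ systematically between weekdays and weekends, a single global threshold can flag days that are simply unusual calendar days rather than data errors, so it is worth inspecting the flagged rows before dropping them.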
- Data Analysis
The time series plot of the birth rates dataset reveals a clear seasonal pattern, with birth rates peaking in July, August, and September.
This pattern may be linked to factors such as warm weather and longer daylight hours, and to conceptions around holidays such as Thanksgiving and Christmas roughly nine months earlier. Furthermore, the time series plot indicates a gradual increase in birth rates over time, which could be attributed to various factors such as changes in societal attitudes towards family planning and improvements in healthcare access.
The box plots provided further insight into the distribution of the birth rate data across different time periods. The plots show that the highest number of births typically occurs during the third quarter of the year, with September having the highest median number of births. Furthermore, most births tend to occur on weekdays rather than weekends, with the median number of births on weekdays being higher than on weekends.
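The medians that the box plots display can also be computed directly. The sketch below assumes a DataFrame with the `month`, `day_of_week`, and `births` columns listed earlier; the helper names are hypothetical, and the coding of `day_of_week` as 1 = Monday through 7 = Sunday is an assumption that should be checked against the dataset's documentation.

```python
import pandas as pd

def median_births(df: pd.DataFrame, by: str) -> pd.Series:
    """Median daily births per group, largest first (e.g. by='month')."""
    return df.groupby(by)["births"].median().sort_values(ascending=False)

def weekday_vs_weekend(df: pd.DataFrame) -> pd.Series:
    """Median daily births on weekdays versus weekends."""
    # Assumption: day_of_week is coded 1 = Monday ... 7 = Sunday.
    kind = df["day_of_week"].ge(6).map({True: "weekend", False: "weekday"})
    return df.groupby(kind)["births"].median()

def plot_boxes(df: pd.DataFrame, by: str = "month"):
    """Box plot of daily births per group (requires matplotlib)."""
    return df.boxplot(column="births", by=by)
```

For example, `median_births(df, "month").head(3)` lists the three months with the highest median, which is the same ranking read off the box plots.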
The heat map was another useful visualization tool that helped identify patterns across different variables in the dataset. In this case, the heat map shows that the month of February has the lowest number of births, followed by November and April. On the other hand, the highest number of births occurred between the years 2006 and 2008, particularly during the month of August.
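A month-by-year heat map is usually drawn from a pivot table. The sketch below shows one way to build it, assuming the heat map plotted total births per month and year; the function names and the use of a sum (rather than a mean) are assumptions.

```python
import pandas as pd

def births_by_month_and_year(df: pd.DataFrame) -> pd.DataFrame:
    """Rows are months, columns are years, cells are total births."""
    return df.pivot_table(index="month", columns="year", values="births", aggfunc="sum")

def plot_heatmap(table: pd.DataFrame):
    """Draw the pivot table as a heat map (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    image = ax.imshow(table.to_numpy(), aspect="auto")
    ax.set_xticks(range(len(table.columns)))
    ax.set_xticklabels(table.columns, rotation=90)
    ax.set_yticks(range(len(table.index)))
    ax.set_yticklabels(table.index)
    fig.colorbar(image, ax=ax, label="births")
    return fig
```

With monthly totals, shorter months such as February will tend to look low simply because they have fewer days; switching `aggfunc` to `"mean"` compares average daily births instead.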
Finally, the lowest numbers of births fall on certain dates, such as December 25 and July 4, which suggests that these dates may be associated with cultural or societal factors that discourage births. The observation that Friday the 13th has a higher frequency of births than weekends is an interesting finding that may warrant further investigation into any cultural or superstitious beliefs surrounding this date.
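Date-level comparisons like these can be reproduced by deriving calendar flags from the `year`, `month`, and `date_of_month` columns. The following is a small sketch with hypothetical helper names; it derives the weekday from the calendar date itself, so it does not depend on how `day_of_week` is coded in the files.

```python
import pandas as pd

def add_calendar_flags(df: pd.DataFrame) -> pd.DataFrame:
    """Add a real date column plus Friday-the-13th and weekend flags."""
    dates = pd.to_datetime(df[["year", "month"]].assign(day=df["date_of_month"]))
    return df.assign(
        date=dates,
        # pandas counts Monday as 0, so Friday is 4.
        friday_13th=(dates.dt.day == 13) & (dates.dt.dayofweek == 4),
        weekend=dates.dt.dayofweek >= 5,
    )

def lowest_calendar_days(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Calendar days (month, day) with the lowest average births."""
    return df.groupby(["month", "date_of_month"])["births"].mean().nsmallest(n)
```

Grouping the flagged frame, for instance `flagged.groupby("friday_13th")["births"].mean()`, then gives the averages needed to compare Friday the 13th against weekends or other Fridays.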
Overall, these visualizations and observations provide valuable insights into the birth rate data and can inform the selection of appropriate machine learning features and algorithms for predicting future birth rates.
- Machine Learning Forecasting
Two models were used in forecasting this time series data:
- Autoregressive Integrated Moving Average (ARIMA)
- Facebook Prophet
ARIMA MODEL
The auto_arima() function from the pmdarima package is used to determine the best set of (p, d, q) orders for an ARIMA model based on the input time series data. The ‘m’ parameter is set to 12, indicating that the data has a seasonal pattern with a period of 12 months.
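The order search described above might be set up as in this sketch. Since `m=12` implies a monthly series, the sketch first aggregates the daily counts to monthly totals, which is an assumption about the preprocessing. The helper names `monthly_births` and `search_orders` are hypothetical, and options other than `m=12` (such as `trace` and `suppress_warnings`) are illustrative rather than taken from the project.

```python
import pandas as pd

def monthly_births(daily: pd.DataFrame) -> pd.Series:
    """Aggregate daily counts into a month-start-indexed series of totals."""
    month_start = pd.to_datetime(daily[["year", "month"]].assign(day=1)).rename("month_start")
    return daily.groupby(month_start)["births"].sum().asfreq("MS")

def search_orders(series: pd.Series):
    """Let pmdarima search for ARIMA orders with a 12-month seasonal period."""
    import pmdarima as pm  # imported lazily; only needed for the search

    return pm.auto_arima(
        series,
        seasonal=True,
        m=12,                    # seasonal period of 12 months, as in the text
        trace=True,              # print each candidate model and its score
        suppress_warnings=True,
    )
```

The fitted object returned by `search_orders` exposes the selected orders as `.order` and `.seasonal_order`, and its score through `.aic()`.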
By minimizing the residuals, the model is better able to accurately predict the values of the time series. The AIC (Akaike Information Criterion) is a measure of the quality of the model and takes into account both the goodness of fit and the complexity of the model. A lower AIC score indicates a better model fit.
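For reference, the standard definition of the AIC makes the trade-off between fit and complexity explicit. Here $k$ is the number of estimated parameters and $\hat{L}$ is the maximized value of the model's likelihood:

```latex
\mathrm{AIC} = 2k - 2\ln(\hat{L})
```

Adding parameters raises the first term, so an extra term in the model only lowers the AIC if it improves the likelihood enough to compensate.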
The overall goal is to find the ARIMA model with the best balance of accuracy and simplicity, as represented by the optimal values of p, d, and q.
The resulting best order values were determined to be (12, 1, 1), with a seasonal period of 12. These values were used to create an instance of the ‘ARIMA’ model, which was fit to the training data using the ‘fit()’ method.
The order (12, 1, 1) represents an ARIMA model with 12 autoregressive terms, 1 difference order, and 1 moving average term. The difference order of 1 indicates that the first difference of the time series was used to make the data stationary.
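Written out, an ARIMA(12, 1, 1) model takes the following form, using the usual textbook notation (the symbols are not from the original project): $y_t$ is the observed series, $y'_t$ its first difference, $\phi_i$ the autoregressive coefficients, $\theta_1$ the moving average coefficient, $c$ a constant, and $\varepsilon_t$ the error term.

```latex
\begin{align}
y'_t &= y_t - y_{t-1} \\
y'_t &= c + \sum_{i=1}^{12} \phi_i \, y'_{t-i} + \theta_1 \varepsilon_{t-1} + \varepsilon_t
\end{align}
```

The first line is the single differencing step (d = 1); the second combines the 12 lagged values (p = 12) with one lagged error (q = 1).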
FACEBOOK PROPHET MODEL
Facebook Prophet is a time series forecasting model developed by Facebook’s Core Data Science team. Prophet is built on a decomposable additive model that combines a non-linear trend, seasonality (yearly, weekly, and daily), and holiday effects.
An instance of the Prophet class from the Prophet library is created and assigned to a variable, with the ‘interval_width’ parameter set to 0.95, which sets the uncertainty intervals for the forecasted values to 95%.
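A minimal sketch of this setup is shown below. Prophet expects a DataFrame with a `ds` (date) column and a `y` (value) column, so the series is reshaped first. The helper names, the 24-period horizon, and the month-start frequency are assumptions; only `interval_width=0.95` comes from the text.

```python
import pandas as pd

def to_prophet_frame(series: pd.Series) -> pd.DataFrame:
    """Reshape a date-indexed series into the ds/y layout Prophet expects."""
    return pd.DataFrame({"ds": series.index, "y": series.to_numpy()})

def forecast_with_prophet(series: pd.Series, periods: int = 24, freq: str = "MS"):
    """Fit Prophet with 95% uncertainty intervals and forecast future periods."""
    from prophet import Prophet  # imported lazily; only needed for the fit

    model = Prophet(interval_width=0.95)
    model.fit(to_prophet_frame(series))
    future = model.make_future_dataframe(periods=periods, freq=freq)
    forecast = model.predict(future)
    # yhat is the point forecast; yhat_lower and yhat_upper bound the interval.
    return model, forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]
```

Passing the returned forecast to `model.plot(...)` draws the observed points, the forecast line, and the shaded uncertainty band.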
The visualization of the Facebook Prophet model shows the actual values of the time series data and the forecasted values for the future time periods, with uncertainty intervals.
- RESULT
Our analysis of birth rate data using both the ARIMA and Prophet models provided valuable insights into the seasonality and trend patterns of the data we worked with:
The ARIMA model revealed a gradual increase in the birth rate over time, while also indicating a clear seasonality pattern.
This seasonality pattern was further confirmed by the Prophet model, which also predicted a gradual increase in birth rates in the future.
- CONCLUSION
Insights from time series forecasting can have practical applications in various fields:
- In policy making, forecasting can guide the development of policies that promote family formation or increase women's access to healthcare when birth rates are projected to decline.
- In business planning, forecasting can help companies anticipate consumer demand during specific periods, enabling them to plan and stock up accordingly.
- Forecasting can also assist organizations in efficiently allocating resources.
For example, hospitals could use a forecast of higher birth rates in the summer months to schedule additional staff or allocate more resources to the maternity ward.