HDSC Winter ’22 Capstone Project: Politics and Governance

HamoyeHQ
May 4, 2022

Election, COVID, and Demographic Data by County: What Factors Influenced the USA 2020 Election?

A Project by Team Arima

The US presidential election takes place every four years on the Tuesday after the first Monday in November. To be eligible, candidates must be at least 35 years old, be natural-born citizens of the United States, and have been residents of the US for at least 14 years.

Traditionally, candidates announce their intention to run for president in the year before the election. Because no national authority conducts the election, local authorities organize it with the help of thousands of administrators.

Various factors have been documented to affect the electoral process in the USA, including voter sentiment, the candidates' political parties, and the current economic situation, to mention a few. In the 2020 election, the process was further affected by the COVID-19 pandemic, which was then in a peak transmission phase. With this in mind, we set out to identify and model the factors that influenced the 2020 US election.

Aims and Objectives

This project aims to understand the factors that contributed to or influenced the outcome of the 2020 US election. This article walks through the project workflow followed by the members of Team Arima. GitHub link

Data collection

The dataset used for this project was downloaded from Kaggle. It compiles county-level statistics for the 2016 and 2020 elections, COVID cases and deaths, demographic figures, and economic figures. The dataset can be accessed from this link.

Data importation and pre-processing

After being imported into a Jupyter notebook, the raw data was examined thoroughly. It contains both categorical and numerical variables, in 4867 rows and 50 columns.
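A first inspection of this kind can be sketched with pandas. This is a minimal illustration, not the team's notebook code: the inline CSV, its column names, and its values are made-up stand-ins for the downloaded Kaggle file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded CSV (names and values are made up).
# With the real file this would be pd.read_csv("<path to the downloaded file>").
raw_csv = io.StringIO(
    "county,state,votes_2016,votes_2020,cases,deaths,total_pop\n"
    "County A,XX,1000,1200,50,1,20000\n"
    "County B,XX,4000,4500,300,6,90000\n"
    "County C,YY,,800,15,0,\n"
)
df = pd.read_csv(raw_csv)

# Size of the table as (rows, columns).
print(df.shape)

# Separate numerical columns from the remaining (categorical) ones.
numerical_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = [c for c in df.columns if c not in numerical_cols]
print("categorical:", categorical_cols)
print("numerical:", numerical_cols)

# Missing values per column, a first hint at what cleaning is needed.
print(df.isna().sum())
```

On the real dataset, the same calls would report the row and column counts and show which columns contain gaps.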

Data cleaning

This step involved deciding how to deal with incomplete, inaccurate, and irrelevant records in the source data. The 2016 voting data was missing for 1522 counties. We dropped these counties because they cannot be used to predict how vote share changed in response to COVID and other factors. Some columns were also converted to percentages to remove any implicit dependence on other variables.
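The two cleaning steps described here can be sketched as follows. The column names (`pct_dem_2016`, `cases`, `total_pop`, and so on) and all values are hypothetical, and treating the 2016-to-2020 change in Democratic vote share as the target is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Toy data with hypothetical column names; all values are made up.
df = pd.DataFrame(
    {
        "county": ["A", "B", "C", "D"],
        "pct_dem_2016": [0.35, np.nan, 0.60, 0.48],
        "pct_dem_2020": [0.38, 0.41, 0.63, 0.50],
        "cases": [500, 120, 3000, 800],
        "deaths": [10, 2, 45, 16],
        "total_pop": [20000, 5000, 150000, 40000],
    }
)

# Drop counties without a 2016 result: no vote shift can be computed for them.
clean = df.dropna(subset=["pct_dem_2016"]).copy()

# Express raw counts as a percentage of population so that they no longer
# scale with county size.
clean["cases_pct"] = 100 * clean["cases"] / clean["total_pop"]
clean["deaths_pct"] = 100 * clean["deaths"] / clean["total_pop"]

# Assumed target: change in Democratic vote share between the two elections.
clean["dem_shift"] = clean["pct_dem_2020"] - clean["pct_dem_2016"]

print(clean[["county", "cases_pct", "deaths_pct", "dem_shift"]])
```

Dividing by population is one way to remove the dependence of case and death counts on county size; other normalizations would serve the same purpose.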

Exploratory Data Analysis (EDA)

Exploratory data analysis is performed to gain insight into a dataset through its summary statistics and through visualizations of the relationships among its variables. Here, scatterplot maps were plotted to determine which regions of the USA were predominantly Republican and which were predominantly Democratic during the 2020 election.

Here, we could see that cities with large populations lean Democratic, whereas rural areas lean Republican.

Looking at the effect of the death rate on the Democratic vote shift, we saw that counties Trump won by a massive margin had a higher death rate than counties Joe Biden won in a relative landslide. It had been speculated that the salience of the pandemic would be a major problem for Trump's campaign, because an overwhelming number of voters judged that he had mishandled the crisis.

Looking at the effect of unemployment on voting percentage, we saw a weak trend between the unemployment rate and Biden's vote percentage: counties with higher unemployment rates appear to have a higher Democratic vote share.
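One way to put a number on a trend like this is the Pearson correlation coefficient. The sketch below uses made-up values and hypothetical column names, so the coefficient it prints says nothing about the real data.

```python
import pandas as pd

# Made-up county-level values, for illustration only.
toy = pd.DataFrame(
    {
        "unemployment_pct": [3.1, 4.5, 5.2, 6.8, 7.4, 9.0],
        "biden_pct": [31.0, 44.0, 39.0, 52.0, 47.0, 58.0],
    }
)

# Pearson correlation: +1 is a perfect increasing linear relationship,
# 0 is no linear relationship, -1 is a perfect decreasing one.
r = toy["unemployment_pct"].corr(toy["biden_pct"])
print(f"Pearson r = {r:.2f}")
```

A coefficient close to zero on the real data would correspond to the weak relationship seen in a scatterplot.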

The plot above shows that counties with higher percentages of men are Republican-dominated. This suggests that men are more inclined than women to vote Republican.

Finally, we could see from the plot above that Trump is more popular in White and Latino communities. This divide seems to have deepened in the 2020 election. Trump had a majority in almost 70% of White-dominant counties, and he became considerably more popular in the Latino community than he was in 2016. African Americans have always been loyal Democrats. However, the Democrats' popularity among African Americans seems to have decreased in recent elections, and the same goes for Asian communities. The Native American vote share also saw a large jump in favor of Biden in the 2020 election.

The vote in DC has always been heavily Democratic; no Republican has ever won its electoral votes. As the chart shows, the same trend continued in the 2020 election. Wyoming and West Virginia are Republican strongholds, and Republicans won them by overwhelming majorities in both elections. Biden won back the much-fabled "blue wall" states of Pennsylvania, Wisconsin, and Michigan, which had defected to Trump in the 2016 election. He also broke through in two traditionally Republican states, Arizona and Georgia, which have gradually shifted blue in the Trump era. Trump's winning margin increased in Florida, which was considered a pivotal swing state. Is Florida a Republican state now?

Model training

Five different models were trained on the dataset: linear regression, lasso regression, random forest regression, gradient boosting regression, and an artificial neural network (ANN). The cleaned and processed dataset was first separated into a target variable and a set of feature variables. The features were then filtered by removing columns that were unlikely to be useful for training. Each model was defined and fitted to the processed training data, and predictions and errors were then computed on the validation set. Mean squared error (MSE) and the R² score were chosen as the metrics for evaluating the models.
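The workflow above can be sketched with scikit-learn. This is an illustration under stated assumptions, not the team's code: the data is synthetic, the hyperparameters are arbitrary, and scikit-learn's `MLPRegressor` stands in for the ANN, whose actual framework and architecture are not described here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned county table: 300 rows, 5 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X @ np.array([0.5, -0.3, 0.0, 0.2, 0.1]) + rng.normal(scale=0.1, size=300)

# Hold out 20% of the rows as a validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=0.01),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    # Assumption: a small multilayer perceptron stands in for the ANN.
    # Inputs are standardized first, which neural networks usually need.
    "ann": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
    ),
}

# Fit each model on the training split and score it on the validation split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    results[name] = {
        "mse": mean_squared_error(y_val, pred),
        "r2": r2_score(y_val, pred),
    }

for name, scores in results.items():
    print(f"{name:18s} MSE={scores['mse']:.4f}  R2={scores['r2']:.3f}")
```

A lower MSE and an R² closer to 1 indicate a better fit on the validation set. Keeping the models in one dictionary makes it easy to compare them on identical splits.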

Results

The best results were obtained from the linear regression, gradient boosting regression, and ANN models.

In hyperparameter tuning, the best result was obtained at a learning rate of 0.005.
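A learning-rate search of this kind can be sketched with scikit-learn's `GridSearchCV`. Which model was tuned is not stated, so applying the search to gradient boosting is an assumption, as are the synthetic data and the candidate grid. The rate selected on this toy data need not match the value reported above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for the processed dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Hypothetical grid of candidate learning rates.
param_grid = {"learning_rate": [0.005, 0.01, 0.05, 0.1]}

# 3-fold cross-validation; scikit-learn maximizes scores, so MSE is negated.
search = GridSearchCV(
    GradientBoostingRegressor(n_estimators=200, random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X, y)

best_lr = search.best_params_["learning_rate"]
print("best learning rate:", best_lr)
print("cross-validated MSE:", -search.best_score_)
```

With a fixed number of trees, very small learning rates can underfit, so in practice the learning rate is often tuned together with the number of estimators.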
