Analysis of Congressional Speeches in the 43rd-111th United States Congresses
By Adam Behnke, James Mahler, Parvathi Mayyappan, Youna Song
Advised by Dani Nedal and Zach Branson
The Data
We were given over 30 GB of text data from speeches in the United States House of Representatives and the United States Senate. The speeches span the 43rd to the 111th Congresses (1873 to 2010) and are grouped by Congress number, with each Congress lasting two years. The data include the age, gender, political party, and state of each speaker, as well as the text of the speeches themselves. Most of our analysis examines how different countries are talked about, including the number of times each country is mentioned, and compares this information over time and between political parties.
Overview of Application
- Exploratory Analysis: This tool allows users to select a country or a set of countries and see a time series plot showing the number of times the selected countries were mentioned in each Congress. This allows the user to see how certain countries become more or less crucial to United States foreign policy over time, as well as compare the importance of different countries over time.
- Heatmap of Country Mentions: This application allows users to select a Congress, and it will display an interactive world heatmap showing the number of times each country was mentioned in the particular Congress. There are also filters that allow you to display this map for just one of the political parties, or to visualize the differences in the number of mentions between the two political parties. Countries that are black and labeled N/A on the heatmap are countries that did not yet exist in the selected time period. Plots may take several seconds to load.
- Wordclouds: This allows users to see which words are most often mentioned together with certain countries. We also include a tab to compare and contrast the words the two political parties are using when talking about countries, as well as a tab to compare the words being used with certain countries over time. We have chosen a small number of relevant countries for these features due to storage and time limitations.
- Keyness Analysis: This tool allows users to analyze the relationship between country mentions and political party. There are two ways to do this: users can select a specific Congress, and all relevant countries and their associated probabilities of party membership will be displayed, or users can select a country or set of countries and see how these probabilities change over time.
Analysis of the Mentions of Each Country
The following plot supports analysis and comparison of the number of times that countries were mentioned across the 43rd to 111th Congresses. The count feature chooses between two counting methods: the total number of times a particular country was mentioned in each Congress, or the number of speeches in each Congress that mentioned that country. The line type feature chooses between showing the actual number of mentions and showing a trend line over those mentions, computed by local polynomial regression (loess) fitting. Finally, the mention counts can be filtered to those of a particular political party.
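The two counting methods and the trend line can be sketched as follows. This is a minimal illustration, not the application's code: the function names are hypothetical, matching is a plain case-insensitive word match, and the smoother is a simplified local linear fit with tricube weights rather than the exact loess settings used for the plot, which are not stated here.

```python
import math
import re

def count_mentions(speeches, country, per_speech=False):
    """Count mentions of `country` in a list of speech texts for one Congress.

    per_speech=False counts every occurrence; per_speech=True counts each
    speech at most once, however many times it names the country.
    """
    pattern = re.compile(r"\b" + re.escape(country) + r"\b", re.IGNORECASE)
    total = 0
    for text in speeches:
        hits = len(pattern.findall(text))
        total += min(hits, 1) if per_speech else hits
    return total

def local_linear_smooth(xs, ys, span=0.75):
    """Loess-style trend: a weighted straight-line fit around each x value.

    Assumption: local linear fit with tricube weights over the nearest
    `span` fraction of points (a simplification of full loess).
    """
    n = len(xs)
    k = min(n, max(3, math.ceil(span * n)))
    fitted = []
    for x0 in xs:
        nearest = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:k]
        dmax = max(abs(xs[i] - x0) for i in nearest) or 1.0
        ws = [(1 - (abs(xs[i] - x0) / dmax) ** 3) ** 3 for i in nearest]
        sw = sum(ws)
        mx = sum(w * xs[i] for w, i in zip(ws, nearest)) / sw
        my = sum(w * ys[i] for w, i in zip(ws, nearest)) / sw
        sxx = sum(w * (xs[i] - mx) ** 2 for w, i in zip(ws, nearest))
        sxy = sum(w * (xs[i] - mx) * (ys[i] - my) for w, i in zip(ws, nearest))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted
```

Here `xs` would be Congress numbers and `ys` the per-Congress counts returned by `count_mentions`; a smaller `span` follows the raw counts more closely, while a larger one gives a smoother trend.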
Heatmap of Country Mentions
NOTE: Please allow several seconds for plots to load.
Here we can see the number of times each country was mentioned during a particular Congress. We have attempted to account for country name aliases, such as the capital of each country, to get the most accurate counts possible. We have put the number of mentions on a log scale to make the visualization more robust to outliers. You can also select a specific political party to view just that party's number of mentions, or you can select 'diff' to view a heatmap of the difference between Republicans and Democrats. The 'diff' plot uses the difference in each party's proportion of a country's mentions (i.e. if half of a country's mentions were from Republicans and half from Democrats, the value on the map will be zero). This heatmap is also interactive; you can hover your cursor over a country to see the country's name and data. The 'counting method' option allows you to either count every single mention of a country throughout a Congress, or to count only the number of speeches a country was mentioned in (i.e. if China is mentioned 10 times in one speech, it would only count as one mention). Countries shown in black did not exist during the selected time frame. The earliest period for which map data was available is 1947-1949 (the 80th Congress), so that is the earliest Congress available for this tool.
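The alias matching, log scaling, and 'diff' value described above can be sketched as follows. The function names, the alias list, the `log10(1 + n)` transform, and the Republican-minus-Democrat sign convention are all assumptions made for illustration; the text specifies only that a log scale is used and that an even split between the parties maps to zero.

```python
import math
import re

def count_with_aliases(text, aliases):
    """Count occurrences of any alias (e.g. country name or capital) in a speech."""
    pattern = r"\b(?:" + "|".join(re.escape(a) for a in aliases) + r")\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def log_scaled(mentions):
    """Assumed transform: log10(1 + n), so zero mentions map to zero."""
    return math.log10(1 + mentions)

def party_share_diff(rep_mentions, dem_mentions):
    """Difference between each party's share of a country's mentions.

    Assumed sign: positive means more Republican mentions. Returns None
    when neither party mentioned the country.
    """
    total = rep_mentions + dem_mentions
    if total == 0:
        return None
    return (rep_mentions - dem_mentions) / total
```

With these definitions the 'diff' value ranges from -1 (all mentions from Democrats) to 1 (all mentions from Republicans), and a country mentioned 5 times by each party gets a value of 0.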
Country Mentions
In this tab we can explore, in the form of a wordcloud, the words spoken in the context of mentions of several countries. After selecting the country and the number of top spoken words to remove, you can hover over words in the wordcloud to see the exact frequency with which each word was spoken.
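One way to build the frequencies behind such a wordcloud is to count the words that appear within a fixed window around each country mention, then drop the most frequent ones. The sketch below assumes a simple regex tokenizer and a five-word window on each side; the function name, the window size, and the tokenization are illustrative assumptions, not the application's actual settings.

```python
import re
from collections import Counter

def context_counts(speeches, country, window=5, drop_top=0):
    """Count words within `window` tokens of each mention of `country`.

    drop_top removes the N most frequent context words, mirroring the
    'number of top spoken words to remove' control. Overlapping windows
    count a shared word once per mention.
    """
    target = country.lower()
    counts = Counter()
    for text in speeches:
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo = max(0, i - window)
            for t in tokens[lo:i + window + 1]:
                if t != target:
                    counts[t] += 1
    ranked = counts.most_common()  # sorted from most to least frequent
    return dict(ranked[drop_top:])
```

Removing the top words is a quick way to suppress very common terms such as "the" that would otherwise dominate the cloud.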
Comparing Various Country Mentions by Party
In this tab we can compare and contrast what is said about various countries by members of different political parties. The first plot compares the words that are distinctive to each party in the context of the country. In this plot, blue represents the Democratic party's contextual words and red the Republican party's contextual words. The second wordcloud shows the words that were used by both parties in reference to that country.
Comparing Various Country Mentions Over Time
In this tab we can compare the contextual words around country mentions in Congress over different periods of time. The first plot shows the words that are unique to each time period. The first time period is in light blue and the second in dark blue. The second wordcloud shows the most frequent words shared by the two time periods.
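Both comparison tabs, by party and by time period, can be thought of as splitting two sets of context-word counts into words unique to each group and words common to both. The sketch below uses a strict set-based split, which is an assumption; the application may instead rank words by how strongly they favor one group. The function name and sample counts are hypothetical.

```python
def split_vocab(counts_a, counts_b):
    """Split two word-count dicts into (only in A, only in B, shared).

    Shared words carry the combined count from both groups.
    """
    only_a = {w: c for w, c in counts_a.items() if w not in counts_b}
    only_b = {w: c for w, c in counts_b.items() if w not in counts_a}
    shared = {w: counts_a[w] + counts_b[w] for w in counts_a if w in counts_b}
    return only_a, only_b, shared

# Made-up counts for two groups (two parties or two time periods).
group_a = {"trade": 4, "rights": 3, "tariff": 1}
group_b = {"trade": 2, "security": 5}
only_a, only_b, shared = split_vocab(group_a, group_b)
```

The first two results would feed the comparison plot and the third the shared-words wordcloud.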
Keyness Analysis of Country Mentions on Political Party
Here we present an analysis of the keyness of country mentions for political party. The first plot illustrates the relationship between mentioning particular countries in a speech and being a Democrat or Republican, for each Congress's corresponding two-year period. The relationship is expressed as a probability, which is computed by a logistic regression model fit to the countries mentioned in each political party's speeches in a particular Congress. The second plot allows us to analyze this relationship over time by visualizing, for each Congress, the probability that a speaker was a Democrat given that they mentioned a certain country.
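To make the probability concrete, the sketch below fits a small logistic regression by gradient ascent, using one binary indicator per country (1 if the speech mentions it) to predict whether the speaker is a Democrat. The feature encoding, the fitting procedure, the function names, and the toy data are all assumptions for illustration; the exact model specification used in the application is not described here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit intercept + weights by gradient ascent on the mean log-likelihood.

    rows: lists of 0/1 country indicators; labels: 1 = Democrat, 0 = Republican.
    """
    n, d = len(rows), len(rows[0])
    w = [0.0] * (d + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for x, y in zip(rows, labels):
            z = w[0] + sum(w[j + 1] * x[j] for j in range(d))
            err = y - sigmoid(z)
            grad[0] += err
            for j in range(d):
                grad[j + 1] += err * x[j]
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    return w

def prob_democrat(w, x):
    """Probability that the speaker is a Democrat given the indicators x."""
    return sigmoid(w[0] + sum(w[j + 1] * x[j] for j in range(len(x))))

# Toy data for one Congress; columns are hypothetical [china, cuba] indicators.
rows = [[1, 0]] * 4 + [[0, 1]] * 4 + [[0, 0]] * 4
labels = [1, 1, 1, 0] + [1, 0, 0, 0] + [1, 1, 0, 0]
weights = fit_logistic(rows, labels)
p_china = prob_democrat(weights, [1, 0])
p_cuba = prob_democrat(weights, [0, 1])
```

In this toy data three of the four speeches that mention only the first country come from Democrats, so the fitted probability for that country is above one half. Repeating the fit for each Congress would give a series of probabilities of the kind the second plot shows over time.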