Exploration of Wording in Headlines


4.032 | Data Visualization | Spring 2024

Every day, roughly 20.9 million newspapers and magazines circulate among readers with a huge variety of political stances, religious beliefs, life experiences, and identities. Yet amid the turmoil of current events and the many arguments about media bias, one form of bias is less often discussed: how headline wording, and how often a word is mentioned, can shape people’s thoughts and beliefs.


Here, I break down headlines from the 2016-2020 election period and examine how wording and various trends emerged during President Trump’s term. Datasets were obtained primarily from Harvard’s Dataverse, a CNN headlines dataset on Kaggle, web scraping of various news websites, the Tyndall Report, and fivethirtyeight.


The interactive timeline above breaks down articles from 2016 to 2020, showing the top thirteen meaningful words in headlines during that range and when they were most frequently used. Click on any of the words, or hover over any point on the timeline, to preview the headlines published that day. Below is a condensed graph of the same words and a time series of their scaled frequencies over time.
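To make the frequency computation concrete, here is a minimal sketch of how a word-frequency timeline like this could be built. The file name, stopword list, and monthly peak-scaling below are illustrative assumptions, not the exact pipeline behind the graph.

```python
from collections import Counter
import pandas as pd

# Hypothetical dataset with "date" and "headline" columns.
df = pd.read_csv("headlines.csv", parse_dates=["date"])

# A tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "and", "for", "is", "with"}

def meaningful_words(headline: str) -> list[str]:
    # Lowercase, tokenize on whitespace, keep alphabetic non-stopwords.
    return [w for w in headline.lower().split()
            if w.isalpha() and w not in STOPWORDS]

# Top 13 meaningful words across all headlines.
counts = Counter(w for h in df["headline"] for w in meaningful_words(h))
top_words = [w for w, _ in counts.most_common(13)]

# Monthly frequency of each top word, scaled by its own peak for comparison.
df["words"] = df["headline"].map(meaningful_words)
monthly = (
    df.explode("words")
      .query("words in @top_words")
      .groupby([pd.Grouper(key="date", freq="M"), "words"])
      .size()
      .unstack(fill_value=0)
)
scaled = monthly / monthly.max()  # each word's series divided by its own peak
```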

Here’s an example of how wording in headlines can change the perception of blame or wrongdoing: when a headline says something like “[X] group massacres [Y] civilians,” it is natural for us to assign blame to group X. On the other hand, a headline like “Thousands of [Y] killed” assigns no blame. More quantitatively, we can count how often words like “slaughter” or “horrific,” for example, are used to describe group [X] or group [Y] deaths. Alternatively, we can look at the difference in the amount of coverage of the two groups’ death tolls to see how larger media trends can shape public perception without any obvious bias from individual reporters or news sources. (A good article on this for a very active current problem is here.)
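As a rough sketch of that quantitative check, one could count how often emotive words co-occur with each group’s name in headlines. The word list, group labels, and `headlines` variable below are all placeholders, not part of the actual analysis.

```python
# Emotive words to track; an illustrative set, not an exhaustive lexicon.
EMOTIVE = {"slaughter", "massacre", "horrific", "brutal"}

def emotive_mentions(headlines: list[str], group: str) -> int:
    """Count headlines that mention `group` alongside an emotive word."""
    count = 0
    for h in headlines:
        words = set(h.lower().split())
        if group.lower() in words and words & EMOTIVE:
            count += 1
    return count

# e.g. compare emotive_mentions(headlines, "x") vs emotive_mentions(headlines, "y")
```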

For the 2016-2020 election period, the wording and its impacts are less severe, but there is still a lot to learn about how bias can shape perception. Here, we look at what we can learn from more active news events and see whether we can draw causal relations between coverage and public opinion.


Looking at graph one, there actually seems to be quite a weak correlation between the frequency of the word "trump" in headlines and his approval rating over time.
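For reference, a correlation like this can be checked in a few lines of pandas and SciPy. The file names and weekly granularity below are assumptions; the approval series could come from a tracker like fivethirtyeight’s, mentioned above.

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical inputs: weekly counts of "trump" in headlines, and weekly approval.
freq = pd.read_csv("trump_word_freq.csv", parse_dates=["week"])       # columns: week, count
approval = pd.read_csv("approval_ratings.csv", parse_dates=["week"])  # columns: week, approval

# Align the two series on week, then compute the Pearson correlation.
merged = freq.merge(approval, on="week")
r, p = pearsonr(merged["count"], merged["approval"])
print(f"Pearson r = {r:.2f} (p = {p:.3f})")  # a small |r| would match the weak trend in graph one
```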

Here, we compare the top news topics of 2016: how much they were covered (in minutes on screen), which TV news outlets covered them, and so on. It's interesting to see that Trump's TV coverage more than doubled Clinton's, yet the time seemed to be split evenly among the three news outlets: CBS, ABC, and NBC.
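A chart like this is essentially a grouped bar plot per network. The sketch below shows one way to draw it in matplotlib; the minute values are placeholders, not the actual Tyndall Report figures.

```python
import matplotlib.pyplot as plt
import numpy as np

networks = ["CBS", "ABC", "NBC"]
trump_minutes = [400, 420, 410]    # hypothetical values for illustration
clinton_minutes = [190, 200, 195]  # hypothetical values for illustration

# Grouped bars: one pair per network.
x = np.arange(len(networks))
width = 0.35
fig, ax = plt.subplots()
ax.bar(x - width / 2, trump_minutes, width, label="Trump")
ax.bar(x + width / 2, clinton_minutes, width, label="Clinton")
ax.set_xticks(x)
ax.set_xticklabels(networks)
ax.set_ylabel("Coverage (minutes on screen)")
ax.legend()
plt.show()
```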

Finally, here are some more graphs I think would be interesting to explore, but might have some limitations:

Sentiment graph using TextBlob analysis, plotted on polarity and subjectivity, with any zeroed-out values removed. As you can tell, there's a lot of variance, and not much can be gathered from the headlines alone. A next step would be to analyze the wording of entire articles.
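For reference, the TextBlob pass looks roughly like the sketch below. Treating "zeroed values" as points where both polarity and subjectivity are zero (TextBlob's output for text it can't score) is my assumption here.

```python
from textblob import TextBlob

def sentiment_points(headlines: list[str]) -> list[tuple[float, float]]:
    """Return (polarity, subjectivity) pairs, skipping fully zeroed scores."""
    points = []
    for h in headlines:
        s = TextBlob(h).sentiment  # namedtuple: (polarity, subjectivity)
        if s.polarity != 0 or s.subjectivity != 0:
            points.append((s.polarity, s.subjectivity))
    return points
```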