Exploration of Wording in Headlines


4.032 | Data Visualization | Spring 2024

Every day, roughly 20.9 million newspapers and magazines circulate among readers with a huge variety of political stances, religious beliefs, life experiences, and identities. Yet amid the turmoil of current events and the many arguments about media bias, one form of bias is less often discussed: how headline wording, and how often a word is mentioned, can shape people’s thoughts and beliefs.


Here, I break down headlines from the 2016-2020 election period and examine how wording and various trends emerged during President Trump’s term. Datasets were obtained primarily from Harvard’s Dataverse, a CNN headlines dataset on Kaggle, web scraping of various news websites, the Tyndall Report, and fivethirtyeight.


The interactive timeline above breaks down articles from 2016 to 2020, showing the top thirteen meaningful words in headlines during that range and when they were most frequently used. Click on any of the words, or hover over any point on the timeline, to preview the headlines published that day. Below is a condensed graph of the same words and a time series of their scaled frequencies over time.
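To make the frequency computation concrete, here is a minimal sketch of how a word-frequency timeline like this could be built. The file name, stopword list, and monthly peak-scaling below are illustrative assumptions, not the exact pipeline behind the graph.

```python
from collections import Counter
import pandas as pd

# Hypothetical dataset with "date" and "headline" columns.
df = pd.read_csv("headlines.csv", parse_dates=["date"])

# A tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "and", "for", "is", "with"}

def meaningful_words(headline: str) -> list[str]:
    # Lowercase, tokenize on whitespace, keep alphabetic non-stopwords.
    return [w for w in headline.lower().split()
            if w.isalpha() and w not in STOPWORDS]

# Top 13 meaningful words across all headlines.
counts = Counter(w for h in df["headline"] for w in meaningful_words(h))
top_words = [w for w, _ in counts.most_common(13)]

# Monthly frequency of each top word, scaled by its own peak for comparison.
df["words"] = df["headline"].map(meaningful_words)
monthly = (
    df.explode("words")
      .query("words in @top_words")
      .groupby([pd.Grouper(key="date", freq="M"), "words"])
      .size()
      .unstack(fill_value=0)
)
scaled = monthly / monthly.max()  # each word's series divided by its own peak
```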

Here’s an example of how wording in headlines can change the perception of blame or wrongdoing: when a headline says something like “[X] group massacres [Y] civilians,” it is natural for us to assign blame to group X. On the other hand, a headline like “Thousands of [Y] killed” assigns no blame. More quantitatively, we can count how often words like “slaughter” or “horrific,” for example, are used to describe group [X] or group [Y] deaths. Alternatively, we can look at the difference in the amount of coverage of the two groups’ death tolls to see how larger media trends can shape public perception without any obvious bias from individual reporters or news sources. (A good article on this for a very active current problem is here.)
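As a rough sketch of that quantitative check, one could count how often emotive words co-occur with each group’s name in headlines. The word list, group labels, and `headlines` variable below are all placeholders, not part of the actual analysis.

```python
# Emotive words to track; an illustrative set, not an exhaustive lexicon.
EMOTIVE = {"slaughter", "massacre", "horrific", "brutal"}

def emotive_mentions(headlines: list[str], group: str) -> int:
    """Count headlines that mention `group` alongside an emotive word."""
    count = 0
    for h in headlines:
        words = set(h.lower().split())
        if group.lower() in words and words & EMOTIVE:
            count += 1
    return count

# e.g. compare emotive_mentions(headlines, "x") vs emotive_mentions(headlines, "y")
```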

For the 2016-2020 election period, the wording and its impacts are less severe, but there is still a lot to learn about how bias can shape perception. Here, we look at what we can learn from more active news events and see whether we can draw causal relations between coverage and public opinion.


Looking at graph one, there actually seems to be quite a weak correlation between the frequency of the word "trump" in headlines and his approval rating over time.
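For reference, a correlation like this can be checked in a few lines of pandas and SciPy. The file names and weekly granularity below are assumptions; the approval series could come from a tracker like fivethirtyeight’s, mentioned above.

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical inputs: weekly counts of "trump" in headlines, and weekly approval.
freq = pd.read_csv("trump_word_freq.csv", parse_dates=["week"])       # columns: week, count
approval = pd.read_csv("approval_ratings.csv", parse_dates=["week"])  # columns: week, approval

# Align the two series on week, then compute the Pearson correlation.
merged = freq.merge(approval, on="week")
r, p = pearsonr(merged["count"], merged["approval"])
print(f"Pearson r = {r:.2f} (p = {p:.3f})")  # a small |r| would match the weak trend in graph one
```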

Here, we compare the top news topics of 2016: how much they were covered (in minutes on screen), which TV news outlets covered them, and so on. It's interesting to see that Trump's TV coverage more than doubled Clinton's, yet the time seemed to be split evenly among the three news outlets: CBS, ABC, and NBC.
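A chart like this is essentially a grouped bar plot per network. The sketch below shows one way to draw it in matplotlib; the minute values are placeholders, not the actual Tyndall Report figures.

```python
import matplotlib.pyplot as plt
import numpy as np

networks = ["CBS", "ABC", "NBC"]
trump_minutes = [400, 420, 410]    # hypothetical values for illustration
clinton_minutes = [190, 200, 195]  # hypothetical values for illustration

# Grouped bars: one pair per network.
x = np.arange(len(networks))
width = 0.35
fig, ax = plt.subplots()
ax.bar(x - width / 2, trump_minutes, width, label="Trump")
ax.bar(x + width / 2, clinton_minutes, width, label="Clinton")
ax.set_xticks(x)
ax.set_xticklabels(networks)
ax.set_ylabel("Coverage (minutes on screen)")
ax.legend()
plt.show()
```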

Finally, here are some more graphs I think would be interesting to explore, but might have some limitations:

Sentiment graph using TextBlob analysis, plotted on polarity and subjectivity, with any zeroed-out values removed. As you can tell, there's a lot of variance, and not much can be gathered from the headlines alone. A next step would be to analyze the wording of entire articles.
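For reference, the TextBlob pass looks roughly like the sketch below. Treating "zeroed values" as points where both polarity and subjectivity are zero (TextBlob's output for text it can't score) is my assumption here.

```python
from textblob import TextBlob

def sentiment_points(headlines: list[str]) -> list[tuple[float, float]]:
    """Return (polarity, subjectivity) pairs, skipping fully zeroed scores."""
    points = []
    for h in headlines:
        s = TextBlob(h).sentiment  # namedtuple: (polarity, subjectivity)
        if s.polarity != 0 or s.subjectivity != 0:
            points.append((s.polarity, s.subjectivity))
    return points
```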