Russia, Twitter & Authenticity: Establishing Credibility Metrics

Team Members

Tim Groot - Sophie Minihold - Jessica Robinson - Manuel Schneider - Joanna Sleigh - Dydimus Zengenene

Summary of Key Findings

In this project we first established a systematic way of assessing features of tweets, accounts, and account activity, which enabled the creation of a credibility metric. Second, applying this metric to 2017 tweets by accounts believed to be tied to Russia’s Internet Research Agency (IRA), we found that high engagement is correlated with credible-looking content. Thus, the tweets by Twitter accounts suspected of belonging to the IRA are a post-truth achievement in the sense that:

  • At first glance, most tweets look credible.

  • Credibility scores are related to the retweet count.

  • The most credible-looking tweets circulated the most.

1. Introduction

Twitter is an important social network site that users increasingly rely on as a news source (Kwak, Lee, Park & Moon, 2010). A major area of concern is that, alongside information from traditional news sources, mis/dis-information is being spread on Twitter (Lazer, Baum, Benkler, Berinsky, Greenhill, Menczer et al., 2018). One recent example is the disinformation campaign run by Russia during the run-up to the election of U.S. President Donald Trump, a topic of significant public discourse (Allcott & Gentzkow, 2017; Knight Foundation, 2019). Concurrently, the term 'fake news' (used to describe a variety of materials, including both misinformation and deliberate hoaxes) has entered the mainstream (Phillips, 2018). In response, counter-campaigns ranging from fact-checking to media literacy have increasingly been implemented at national and grassroots levels. Yet as Marwick (2018) notes, these approaches have not stemmed the flow of so-called 'fake news' and its ilk on Twitter. Given this context, three questions arise: What makes tweets credible? What is meant by credibility? And how can it be assessed?

1.1 Understanding credibility on Twitter

What is credibility? The Oxford English Dictionary (2019) defines 'credibility' as the objective and subjective components of the believability of a source or message. It is both objective, based on facts and evidence, and subjective, based on opinions and feelings. Credibility is also closely related to concepts of trust, quality, and authority, as well as persuasion. The process of establishing credibility entails users making judgements. Some of these are made consciously, after much consideration, while others are more intuitive and based on appearance (Lazar, Meiselwitz and Feng, 2007). Credibility is thus situation-specific and culturally bound.

On social media, and specifically in the Twittersphere, information credibility is difficult to judge. This is partly due to the absence of a filtering mechanism that ensures good quality of information (such as the peer review process that academic journals use). It is also often impossible to trace information back to a reliable source, such as a newspaper. In addition, tweets are by nature social, with their social value signified by the number of retweets and favorites a tweet receives and by the user's friend-to-follower ratio. The credibility of a tweet can therefore be judged not just according to its content, but also by its popularity and the grooming and influence of its author. Some scholars have thus conflated credibility with engagement metrics (Menchen‐Trevino and Hargittai, 2010). In doing so, they acknowledge that the amplification process is not simply organic, but is dictated by Twitter’s algorithm, which facilitates the propagation of heavily engaged-with tweets (Lee, 2014; Patel, 2014; Stein, 2015). Marres (2018) argues that fake-news mitigation strategies that aim to verify ‘The Truth’, such as fact-checking sites, are insufficient in part because they do not address this algorithmic selection process. They therefore fail to address a key component of the post-truth climate: on a social network site such as Twitter, a claim to authenticity is made via profile quality and consistency.

1.2 The game of credibility in disinformation campaigns

What does credibility look like in the context of a disinformation campaign? In this project, we were concerned with developing a credibility metric specific to our data set of Twitter accounts associated with Russia’s Internet Research Agency (IRA), which is accused of propagating 'fake news' and disinformation internationally (Twitter, 2018). We assumed that, because successful disinformation agents must also play by Twitter’s rules, they may even 'game' the algorithm in a way that technically makes them ‘real’ or ‘authentic’ and, arguably, credible. We identified features and signals, at both the account and tweet level, that successful disinformation efforts may have in common. Our hope was to come one step closer to a near real-time tool that scans for disinformation campaigns on Twitter more reliably in our heightened post-truth climate.

Our approach drew upon the work of Gupta, Kumaraguru, Castillo & Meier (2014), who developed TweetCred, a tool that assesses the credibility of Twitter content by attributing credibility scores based on various characteristics. We took a similar approach, but built a credibility metric from a specific set of misinformation data.

2. Initial Data Sets

The data set used in this project came directly from Twitter:

On 17 October 2018, in a blog post entitled ‘Enabling further research of information operations on Twitter’, Twitter released data sets containing: “3,841 accounts affiliated with the IRA, originating in Russia, and 770 other accounts, potentially originating in Iran. They include more than 10 million Tweets and more than 2 million images, GIFs, videos, and Periscope broadcasts, including the earliest on-Twitter activity from accounts connected with these campaigns, dating back to 2009.”

3. Research Questions

  • RQ1: How can the credibility of a tweet and user be determined?

  • RQ2: To what extent can qualitative and quantitative features of tweet text, users, and account activity predict how much a tweet is retweeted?

4. Methodology

The Twitter data sets contained more than 10 million tweets that the company believes were created by accounts connected to the IRA. The data records how many favorites and retweets each tweet received. In addition, it contains fields on a number of features of each tweet and the user who created it, for example the tweet text, hyperlinks in the tweet, the user’s location and profile description, when the account was created, and other metadata. Our goal in this project was to investigate what made these tweets credible, and indeed whether it is even possible to predict which tweets will be perceived as credible.

4.1 Data collection and sampling

We accessed the data through the Digital Methods Initiative’s Twitter Capture and Analysis Toolset (TCAT). Although the data goes back to 2009, in this project we focused on the data from 2017. This was a period of emerging revelations about misinformation campaigns, when users would theoretically be more on guard. Yet the IRA accounts continued to successfully propagate content on the platform that was perceived as credible.

To make our sample, we ordered users during this period by retweet count and then selected 15 users to study: five from the top of the list (the most successful), five from the middle, and five from the bottom. The topmost three and bottommost three users were deliberately omitted to minimize the effect of outliers. Our 15 user accounts comprised 13,371 tweets, from which a random sample of 498 tweets was then chosen for further manual analysis.
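
The sampling procedure described above can be sketched roughly as follows. This is an illustrative reconstruction, not the code used in the project. The function names, the `user` field, the way the "middle" group is centred, and the random seed are all assumptions introduced for the example. It also assumes each user's retweet count is a single total for the period.

```python
import random


def select_users(retweets_by_user, trim=3, per_group=5):
    """Rank users by retweet count, drop `trim` users from each end,
    then take `per_group` users from the top, middle, and bottom.

    `retweets_by_user` maps a user id to a retweet total (assumed shape).
    """
    ranked = sorted(retweets_by_user, key=retweets_by_user.get, reverse=True)
    trimmed = ranked[trim:len(ranked) - trim]  # omit outliers at both ends
    if len(trimmed) < 3 * per_group:
        raise ValueError("not enough users for three non-overlapping groups")
    # Assumption: the middle group is centred in the trimmed ranking.
    mid_start = (len(trimmed) - per_group) // 2
    return {
        "top": trimmed[:per_group],
        "middle": trimmed[mid_start:mid_start + per_group],
        "bottom": trimmed[-per_group:],
    }


def sample_tweets(tweets, selected_users, k=498, seed=0):
    """Draw a random sample of up to `k` tweets written by the selected users.

    Each tweet is assumed to be a dict with a hypothetical `user` key.
    """
    pool = [t for t in tweets if t["user"] in selected_users]
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return rng.sample(pool, min(k, len(pool)))
```

Passing the union of the three groups to `sample_tweets` yields the tweets for manual coding. Fixing the seed is a design choice made here so that the same sample can be drawn again. The report does not state how its random sample was generated.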