Privacy and Twitter:

(Geo)Locating the Debate

Team Members

David Moats

Ben Koeck

Michael Van Der Haagen

Vera Franz

Kevin Deluca

Tommaso Elli

Adrian Bertoli

Introduction

In this project, we attempted to take stock of the debate on privacy and surveillance a year after the Edward Snowden NSA leaks. On 3 June 2013, NSA contractor Edward Snowden met with journalists Glenn Greenwald and Laura Poitras and leaked confidential NSA files to them. Revelations about the US government's extensive data collection were published on the 5th of June, and further revelations about the collection of Verizon customers’ phone records were published on the 6th. This caused a massive stir on social and mainstream media, but we were curious whether, after the initial buzz, the issue of surveillance was still politicised one year on.

This topic and data set were examined the previous summer in <a href="https://wiki.digitalmethods.net/Dmi/DetectingTheSocials" target="out">Detecting the Socials</a>, but the focus there was on extracting forms of sociality through social media rather than on the substantive debate itself. What the previous project did find was that before Snowden, the privacy query was focused on celebrity and personal topics, but it became highly politicised and specific following the leaks. It was particularly interesting to look at this topic a year later because of the possibility of anniversary commemorations and also Glenn Greenwald’s book tour.

Research Questions

1. Characterising the debate: June 2013 and now

How has the debate developed one year on? Is it driven more by news or by professional campaigners? Or is it, in contrast, an individual, personal issue? There were also sub-issues such as whistleblower advocacy, legal reform in the US, and other ways of scoring political points off the controversy. Which of these was most prevalent then, and which is now?

2. Who are the debate leaders (institutions or individuals)?

Along the same lines, which actors, institutions or sources are driving the debate? Is the UK, compared with the rest of Europe, merely reproducing US sources?

3. Geolocate the debate: US versus UK?

Finally, our topic expert, Vera Franz, was particularly interested in whether the topic of privacy had remained international or had come to focus more on US campaigns aimed at changing policy. Others were curious whether this was a regional issue: city versus rural, or Red State versus Blue State.

Methodology

For this project we used the TCAT data set Privacy, which queries for ‘privacy’, ‘surveillance’ and ‘facebookprivacy’. Another data set scraped for a different set of terms around PRISM, the NSA and Snowden, but it was not consistent across both periods and would also not contain the more personal, less politicised elements of the debate.

We divided the data set into three weeks for each year: before, during and after the Snowden leaks in 2013, and before, during and after the anniversary in 2014.

Week 1: 31.05.2013 - 05.06.2013

Week 2: 06.06.2013 - 12.06.2013

Week 3: 13.06.2013 - 20.06.2013

Week 1: 31.05.2014 - 05.06.2014

Week 2: 06.06.2014 - 12.06.2014

Week 3: 13.06.2014 - 20.06.2014
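The six study windows above can be written down as a lookup table so that every tweet in an export falls into exactly one window. A minimal sketch, assuming each tweet's timestamp has already been converted to a UTC `datetime.date` and that both listed dates are inclusive; `assign_week` is a hypothetical helper, not part of TCAT.

```python
from datetime import date

# Study windows as listed above; both endpoints are treated as inclusive (assumption).
WEEKS = {
    (2013, 1): (date(2013, 5, 31), date(2013, 6, 5)),
    (2013, 2): (date(2013, 6, 6), date(2013, 6, 12)),
    (2013, 3): (date(2013, 6, 13), date(2013, 6, 20)),
    (2014, 1): (date(2014, 5, 31), date(2014, 6, 5)),
    (2014, 2): (date(2014, 6, 6), date(2014, 6, 12)),
    (2014, 3): (date(2014, 6, 13), date(2014, 6, 20)),
}


def assign_week(day):
    """Return (year, week) for a tweet date, or None if it falls outside every window."""
    for key, (start, end) in WEEKS.items():
        if start <= day <= end:
            return key
    return None


print(assign_week(date(2013, 6, 6)))  # (2013, 2)
```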

We established subgroups based on specific digital objects, users and hashtags, which we investigated using networks and qualitative content analysis. We used co-hashtag networks to profile the debate and mention networks to profile the debate actors.

Simultaneously, we investigated the possibilities for locating the debate using different types of geographic traces: geocoded coordinates, time zones, language and UTC offset.

Hashtag Profiling

From the co-hashtag networks, we could clearly discern a shift in the framing and agenda setting of the issue, as #privacy became attached to more political, as opposed to personal, hashtags. In the second and third weeks, party political hashtags emerged, such as #tcot (top conservatives on Twitter) and #lcot (top liberals), along with other causes not specific to privacy, such as #ows (Occupy Wall Street).

[Figure: co-hashtag network (1)]

[Figure: co-hashtag network (2)]
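One hedged way to make the shift from personal to political framing measurable is to compare, week by week, which hashtags most often co-occur with #privacy. The function `top_cohashtags` and the toy tweets are illustrative assumptions, not the project's data or code.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def top_cohashtags(texts, focus="privacy", n=5):
    """Return the n hashtags that most often appear alongside #focus, with counts."""
    counts = Counter()
    for text in texts:
        tags = {t.lower() for t in HASHTAG.findall(text)}
        if focus in tags:
            counts.update(tags - {focus})  # count each co-hashtag once per tweet
    return counts.most_common(n)


# Toy tweets, invented for illustration only.
week_1 = ["New #facebook settings #privacy", "#privacy tips for #facebook users"]
week_2 = ["#privacy #nsa #tcot", "#NSA overreach #privacy #ows"]
print(top_cohashtags(week_1))  # [('facebook', 2)]
print(top_cohashtags(week_2))
```

Comparing the lists for successive weeks gives a simple numerical trace of #privacy moving from personal neighbours such as #facebook towards political ones.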

User Profiling

Based on the mention networks and qualitative analyses of the profiles of key actors, we established the following codings of actors:
  • Snowden and Whistleblower advocates
  • Advocacy / legislative reform
  • News / tech
  • Politics
  • Personal Privacy
  • Commercial (privacy software)

Aside from the committed privacy and security activists, we also found a high proportion of both mainstream media and established political actors who ‘jumped on the bandwagon’ to score political points, often along party lines.
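Claims about the proportion of mentions going to media or political actors can be checked once accounts are coded. The sketch assumes a hand-made dictionary from account to category; the four codings shown are illustrative placeholders using accounts named in this report, not the project's actual coding sheet, and `category_shares` is a hypothetical helper.

```python
from collections import Counter

# Illustrative coding of a few accounts named in this report; the project's
# actual codings came from qualitative reading of profiles and are not reproduced.
CODES = {
    "guardian": "News / tech",
    "nytimes": "News / tech",
    "senrandpaul": "Politics",
    "barackobama": "Politics",
}


def category_shares(mentioned_accounts, codes=CODES):
    """Share of all mentions received by each actor category ('Uncoded' if not coded)."""
    counts = Counter(codes.get(acct.lower(), "Uncoded") for acct in mentioned_accounts)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


print(category_shares(["guardian", "NYTimes", "senrandpaul", "someone_else"]))
# {'News / tech': 0.5, 'Politics': 0.25, 'Uncoded': 0.25}
```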

These were the key, most mentioned actors for the weeks studied:

[Figure: most mentioned actors, 2013 weeks]

2013 Data

Week 1: eff, privacyint, txitua, cyberrights, bettybrowser, privacycamp, canaryorg, aclu, ioerror, privacysurgeon, mashable, liberationtech. Not included because not directly relevant: sirbasstoven, paulbernaluk, copyright_italy

Week 2: guardian, ggreenwald, algore, nytimes, senrandpaul, guardianus, kimdotcom, anonyops, eff, barackobama, ioerror, trevortimm, Thomas_drake, normative, democracynow. Not included: kimjongnumberun, damienfahey, stephenathome

Week 3: thomasdrake, ggreenwald, eff, privacyint, aclu, ioerror, csoghoian, guardian, washingtonpost, liberationtech, trevortimm, barackobama, youranonnews

[Figure: most mentioned actors, 2014 weeks]

2014 Data

Week 1: Very activist and campaign oriented; campaigners clearly dominate the discussion (fightfortheftr, youowntheweb, eff, youranonnews). News coverage builds towards the anniversary: guardian, tech sites such as wired, ap, guardiantech.

Week 2: Less campaign based, more activist based (privacyint, privatelocknet, doctorow, arusbridger). The Guardian is mentioned more than in week 1, and the discussion moves into more specific topics. Other important actors are fightfortheftr, flemingjude, eff and paulnemitz.

Week 3: The campaigning and activist hype is over. We are left with experts and clear sub-conversations, with a Canadian (mgeist) / European (casparbowden, tom_watson, paulnemitz) / US (oaklandprivacy) divide, and a transpartisan public policy debate between liberals and conservatives in the US.

Before Snowden, the issue space was very activist based, whereas in weeks 2 and 3 there is enormous noise, particularly from news-oriented actors. Week 2 also had several clear leaders in the discussion, as did week 3. Within the trending topic, we found that spam accounts and unrelated actors were using the hashtag to promote other products or services.

In 2014 discussion rose before the actual anniversary, but the peak quickly faded, after which distinct discussions formed and could be discerned more clearly.

In terms of the presumed noise from spam bots and promotional sources, we noticed a marked discrepancy between the rate of tweeting and the number of mentions received. At one end of the scale were actors who only received mentions but did not participate (public figures like President Obama), and at the other end were mainly bots that tweeted frequently but were ignored by most other users. In future, a tweet-to-mention ratio could be used to filter out some of the less relevant accounts.
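The tweet-to-mention ratio suggested above might be sketched as follows. Assumptions: one list of tweet authors and one list of mentioned accounts per period, add-one smoothing so that accounts never mentioned do not divide by zero, and an arbitrary threshold of 10; `promo_bot` is a made-up account.

```python
from collections import Counter


def tweet_mention_ratio(authors, mentioned):
    """Tweets sent per mention received for every account seen.

    authors: one entry per tweet (the account that sent it).
    mentioned: one entry per mention (the account that was mentioned).
    Add-one smoothing in the denominator avoids division by zero.
    """
    sent = Counter(a.lower() for a in authors)
    received = Counter(m.lower() for m in mentioned)
    return {acct: sent[acct] / (received[acct] + 1) for acct in set(sent) | set(received)}


def likely_noise(ratios, threshold=10.0):
    """Accounts that tweet far more than they are mentioned (arbitrary cut-off)."""
    return sorted(acct for acct, r in ratios.items() if r >= threshold)


authors = ["promo_bot"] * 20 + ["eff"] * 5
mentioned = ["eff"] * 40 + ["barackobama"] * 30
ratios = tweet_mention_ratio(authors, mentioned)
print(likely_noise(ratios))  # ['promo_bot']
```

Accounts at the other end of the scale (a ratio near zero, such as barackobama here) are the mentioned-but-silent public figures described above, so a filter of this kind would probably only drop the high end.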


Locating the debate

Having established key actors and hashtags, we then attempted to bring them together using location. We noticed that in week 3 of 2014 there was a clear geographic dimension to the debates in the data set:

EU: tom_watson, greenjennyjones, robevansgdn, raycorrigan, paulbernaluk, casparbowden, julianhuppert, jamesbruk, gurchetangrewal, cyberseckent, arusbridger, glynmoody, ianbrownoii, cnil, netizenrights (Italy subcluster: roccopanettait, annamasera, montecitorio)

Canada: mgeist, veritas_truth_, lawscribes, heatherrenwick, minpetermckay, citizenlab, josh_wingrove, privacylawyer, canadacjfe, jennbarrigar, chuddles11, patondabak, pdmcleod, caparsons, cippic, mcinnescooper

USA: repzoelofgren, repthomasmassie, pmocek, kevinbankston, senrandpaul, mostrolenk, 4yourfreedom, elizabeth_joh, joshgerstein, seattleprivacy, oaklandprivacy, astepanovich, a_greenberg

Australia: Asher_Wolf

We then used this subset of actors to show the change in their top hashtag use before and after Snowden, using RAW bubble diagrams:
[Figure: RAW bubble diagram of top hashtag use by the national actor subsets (1)]

[Figure: RAW bubble diagram of top hashtag use by the national actor subsets (2)]

We then compared hashtag usage across the different national debates:

[Figure: hashtag usage across the national debates]

One other key event in the 2014 weeks was the publication of Glenn Greenwald’s book No Place to Hide. We decided to test the viability of using geocoding data (primarily generated by smartphones) to locate discussions of Greenwald on his book tour. As the map below shows, the tweets to some extent follow the path of his book tour down the East Coast of the United States, but there is also significant international commentary as the tour goes on. This is because the book tour is not just a localised event: it also includes TV appearances, so it is never fully grounded. Anonymous and WikiLeaks founder Julian Assange both commented on the book.

[Figure: map of geocoded tweets about Greenwald during the book tour]

We were sceptical of the way place is mapped through Twitter, so as an experiment we also compared geocoding to the self-reported place listed on user profiles. Self-reported places were assigned a geocoordinate and connected to geocoded tweets, and both were placed on a map using Gephi. The results are striking: many users who list their location as Alaska are actually tweeting from elsewhere in the US or from South East Asia, possibly because Alaska appears first in a drop-down menu.

[Figure: self-reported profile locations connected to geocoded tweets, mapped in Gephi]
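One way to quantify the mismatch between self-reported places and geocoded tweets, assuming both have already been turned into latitude/longitude pairs, is the great-circle (haversine) distance between them. This is not what the project did (it compared the two visually in Gephi); the 500 km tolerance, the helper names and the rounded Anchorage and New York coordinates are assumptions for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def location_mismatch(profile_point, tweet_point, tolerance_km=500.0):
    """True if a geotagged tweet lies further than tolerance_km from the profile's geocoded place."""
    return haversine_km(*profile_point, *tweet_point) > tolerance_km


anchorage = (61.22, -149.90)  # a profile reading "Alaska", geocoded (approximate)
new_york = (40.71, -74.01)    # where the geotagged tweet was sent (approximate)
print(location_mismatch(anchorage, new_york))  # True
```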

We finally attempted to locate the debate using another piece of Twitter data: time zones (expressed relative to UTC). This could be seen as more trustworthy than self-reported location and more ubiquitous than geocoding. With a combination of language data and time zones, most countries, and regions within larger countries, could be identified. In the map below, we used a user-hashtag bipartite network of the 2013 data set and plotted the users by time zone, allowing the hashtags they used to gather around the relevant time zones. Languages other than English were colour coded.

[Figure: user-hashtag bipartite network of the 2013 data set, users plotted by time zone]

As the map shows, the debate centres on the East Coast of the US and the UK. #surveillance, #privacy and #prism sit in the Atlantic between the two, while #nsa is firmly located in the US debate. The US debate also includes #tcot and #teaparty, while #bigbrother is more closely linked to Britain, home of Orwell.
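A minimal sketch of combining time zone, UTC offset and language into coarse region labels. The field names follow Twitter's user-profile data of the time (`time_zone`, `utc_offset` in seconds, `lang`), but the mapping table, region labels and fallback rules are assumptions for illustration, not the rules used to draw the map.

```python
# Illustrative mapping from Twitter profile time_zone strings to coarse regions.
# The strings and region labels are assumptions chosen for this sketch.
TZ_REGION = {
    "Eastern Time (US & Canada)": "US East",
    "Central Time (US & Canada)": "US Central",
    "Pacific Time (US & Canada)": "US West",
    "London": "UK",
    "Edinburgh": "UK",
    "Amsterdam": "Central Europe",
    "Berlin": "Central Europe",
}


def locate(time_zone, lang, utc_offset=None):
    """Coarse region from time zone, falling back to the UTC offset (in seconds).

    Non-English interface languages are appended so they can be colour coded.
    """
    region = TZ_REGION.get(time_zone)
    if region is None and utc_offset is not None:
        region = f"UTC{utc_offset / 3600:+g}"  # e.g. -14400 s -> "UTC-4"
    if region is None:
        region = "unknown"
    if lang and not lang.lower().startswith("en"):
        region = f"{region} ({lang})"
    return region


print(locate("Eastern Time (US & Canada)", "en"))  # US East
print(locate("Berlin", "de"))                       # Central Europe (de)
print(locate(None, "en", utc_offset=-14400))        # UTC-4
```

Because a single time zone such as Eastern Time spans both the US and Canada, these labels only narrow a user down to a band of longitude, which fits the view that such regions should be allowed to spill into one another.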

Conclusions

The privacy debate can usefully be located thematically using the established methods of co-hashtag analysis and social network analysis, although more work needs to be done in both methods to separate concerted advocacy from spam and news coverage. Locating the debate geographically is, however, more challenging given the data Twitter makes available, and this area needs further work.

Of the methods we attempted, we learned that geocoding to specific locations may be unhelpful because it misunderstands the internationally distributed character of issues on Twitter. Time zone plus language gives a less rigid understanding of regions, which may be helpful so long as regions are allowed to spill into each other (as in the final diagram) rather than being treated as separate. But each of these methods produces a different notion of place, which cannot be resolved into the rigid logic of latitude and longitude.
Topic revision: r2 - 19 Jan 2017, UjangJr
Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.