Track the Trackers Workshop

by Anne Helmond & Alexei Miagkov (Ghostery) for the Digital Methods Summerschool 2013

by Anne Helmond and Carolin Gerlitz for the Digital Methods Summerschool 2012

Introduction

In this short workshop you will learn how to map the tracking ecology related to a set of websites using the DMI Tracker Tracker tool and Gephi. The Tracker Tracker tool was conceived at the Digital Methods Winterschool 2012 in January. It is built on top of the anti-tracking plugin Ghostery (www.ghostery.com) and allows you to identify the invisible web: devices that track user activities online and the services associated with them. To prepare for this workshop we recommend reading the related projects and materials listed below. Please download and install Gephi from https://gephi.org/ before the workshop starts so you can also learn how to visualize your results.

1. The Wall Street Journal - What They Know

The whole file: http://online.wsj.com/public/page/what-they-know-digital-privacy.html

Marketers are spying on Internet users -- observing and remembering people's clicks, and building and selling detailed dossiers of their activities and interests. The Wall Street Journal's What They Know series documents the new, cutting-edge uses of this Internet-tracking technology. The Journal analyzed the tracking files installed on people's computers by the 50 most popular U.S. websites, plus WSJ.com. The Journal also built an "exposure index" -- to determine the degree to which each site exposes visitors to monitoring -- by studying the tracking technologies they install and the privacy policies that guide their use.

http://blogs.wsj.com/wtk/

The Tracking Ecosystem

Surfing the Web kickstarts a process that passes information about you and your interests to tracking companies and advertisers. See how it works.

http://graphicsweb.wsj.com/documents/divSlider/ecosystems100730.html

Tracking the Trackers: Our Method

To determine the prevalence of Internet tracking technologies, The Wall Street Journal analyzed the 50 most-visited U.S. websites, as ranked by the comScore Media Metrix report from October 2009.

http://online.wsj.com/article/SB10001424052748703977004575393121635952084.html

What Facebook Apps Know

The Wall Street Journal analyzed 100 of the most used applications that connect to Facebook's social-networking platform to see what data they sought from people. See what permissions they ask users to grant them.

How Grabby Are Your Facebook Apps?

http://online.wsj.com/article/SB10001424052702303302504577328363924309098.html

2. DEFCON 19: Tracking the Trackers: How Our Browsing History Is Leaking into the Cloud (Video)

Speaker: Brian Kennish, founder of Disconnect. What companies and organizations are collecting our web-browsing activity? How complete is their data? Do they have personally-identifiable information? What do they do with the data?

The speaker, an ex-Google and DoubleClick engineer, will answer these questions by detailing the research he did for The Wall Street Journal (http://j.mp/tttwsj) and CNN (http://j.mp/tttcnn), talking about the crawler he built to collect reverse-tracking data, and launching a tool you can use to do your own research.

http://www.youtube.com/watch?v=BK_E3Bjpe0E

3. Mozilla’s Collusion. Discover who’s tracking you online (Browser add-on)

Collusion is an experimental add-on for Firefox that allows you to see all the third parties that are tracking your movements across the Web. It will show, in real time, how that data creates a spider-web of interaction between companies and other trackers.

http://www.mozilla.org/en-US/collusion/

4. Gary Kovacs: Tracking the trackers (Video)

As you surf the Web, information is being collected about you. Web tracking is not 100% evil -- personal data can make your browsing more efficient; cookies can help your favorite websites stay in business. But, says Gary Kovacs, it's your right to know what data is being collected about you and how it affects your online life. He unveils a Firefox add-on to do just that.

Gary Kovacs is the CEO of the Mozilla Corporation, where he directs the development of Firefox.

http://www.ted.com/talks/gary_kovacs_tracking_the_trackers.html

5. Tracking the trackers (The Guardian crowdsourcing project)

Tracking the trackers: What are cookies? An introduction to web tracking

What exactly are web cookies and what do they do? This guide gives you an introduction to help you understand more about our Tracking the Trackers project.

http://www.guardian.co.uk/technology/2012/apr/23/cookies-and-web-tracking-intro

Tracking the trackers: who are the companies monitoring us online? - interactive

The red circles are the top ten most prolific tracking companies found in our Tracking the Trackers data. Click on their names to discover more about them. The blue circles are the 100 most popular websites that use them. Guardian readers helped to collect information on more than 7,000 websites and the services they use that employ tracking.

http://www.guardian.co.uk/technology/interactive/2012/apr/23/tracking-trackers-companies-following-online

6. Tracking the Trackers: Where Everybody Knows Your Username (Background reading)

Background on Third-Party Web Tracking and Anonymity by The Center for Internet and Society at Stanford Law School, a leader in the study of the law and policy around the Internet and other emerging technologies.

http://cyberlaw.stanford.edu/node/6740

Methodology and early results: http://cyberlaw.stanford.edu/node/6694

7. Cookie Consensus

To capture on-line behavior, thousands of HTTP cookies are sent daily to web browsers to identify users and gather statistical knowledge about tastes and habits. The cookie consensus website hosts a collection of cookies that Andrea Fiore received while surfing through the first 50 entries of the Alexa directory of News sites. In the future it will also host a software that will give the users the capability to create their own cookie collections.

http://vimeo.com/26760589

8. Track the Trackers (tactical media project)

“TRACK-THE-TRACKERS---” is a network installation consisting of tactical media components. The work makes use of existing personal technologies in conjunction with the satellite GPS infrastructure to provide participants with an expanded audible (not a visual) experience of the proliferation of video surveillance in the urban public sphere. The internet platform http://www.t-t-trackers.net serves as an exchange point for these coordinates: the data gathered in this way is made available to the other participants by uploading it to the internet platform. The project prompts participants to think beyond the protection of their own private sphere and to invest in the public sphere.

http://www.t-t-trackers.net/

9. visipisi

This page will tell you what websites you have recently visited! In other words, it will partially access your browser's history without your permission. This implies that any website on the Internet can do this. I got the inspiration from Michal's code after reading about it on HN.

http://oxplot.github.com/visipisi/visipisi.html

10. Twitter is tracking you on the web

In a blog post today announcing Twitter's new tailored suggestions system is something that has left me shocked: an overt admission by Twitter that it is transparently tracking your movements around the web. Othman Laraki, on the Twitter blog:

“These tailored suggestions are based on accounts followed by other Twitter users and visits to websites in the Twitter ecosystem. We receive visit information when sites have integrated Twitter buttons or widgets, similar to what many other web companies — including LinkedIn, Facebook and YouTube — do when they’re integrated into websites. By recognizing which accounts are frequently followed by people who visit popular sites, we can recommend those accounts to others who have visited those sites within the last ten days.”

http://dcurt.is/twitter-is-tracking-you-on-the-web

11. Panopticlick (EFF)

Is your browser configuration rare or unique? If so, web sites may be able to track you, even if you limit or disable cookies. Panopticlick tests your browser to see how unique it is based on the information it will share with sites it visits. Click below and you will be given a uniqueness score, letting you see how easily identifiable you might be as you surf the web.

https://panopticlick.eff.org/
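
One way to read such a uniqueness score is as information content: if one in N browsers shares your configuration, that configuration carries log2(N) bits of identifying information. The following is a minimal sketch of that arithmetic only. The function name and the sample numbers are illustrative assumptions, not Panopticlick's own code or data.

```python
import math

def surprisal_bits(matching_browsers: int, total_browsers: int) -> float:
    """Bits of identifying information in a configuration shared by
    `matching_browsers` out of `total_browsers` observed browsers."""
    if not 0 < matching_browsers <= total_browsers:
        raise ValueError("need 0 < matching_browsers <= total_browsers")
    return -math.log2(matching_browsers / total_browsers)

# Hypothetical sample: a configuration seen once among 2**20 browsers,
# and one shared by half of all browsers.
rare = surprisal_bits(1, 2**20)
common = surprisal_bits(2**19, 2**20)
print(rare, common)
```

The rarer the configuration, the more bits it reveals, and the easier it becomes to recognize the same browser again without any cookie.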

12. "Do Not Track" standards for the Web

W3C Standards: http://www.w3.org/QA/2011/09/do_not_track_standards_for_the.html

Mozilla Firefox offers a Do Not Track feature that lets you express a preference not to be tracked by websites. When the feature is enabled, Firefox will tell advertising networks and other websites and applications that you want to opt-out of tracking for purposes like behavioral advertising.
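
Technically, the preference is expressed as an extra HTTP request header, DNT: 1. A minimal sketch using Python's standard library shows what such a request looks like. The URL is a placeholder, and the request is only built, never sent.

```python
import urllib.request

def build_dnt_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the DNT: 1 header."""
    return urllib.request.Request(url, headers={"DNT": "1"})

# Placeholder URL for illustration.
req = build_dnt_request("http://example.com/")

# urllib stores header names capitalized ("Dnt"); HTTP header names are
# case-insensitive, so this is still the Do Not Track header.
print(req.get_header("Dnt"))
```

Whether a receiving website or advertising network honors the header is up to that party; the header only states the user's preference.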

http://dnt.mozilla.org/

13. Cookiesuchmaschine (cookie search engine)

To use the cookie search, simply enter a web address (URL) in the input field. The cookie search engine visits the given address and tries to determine which cookies are set during the visit. Evaluating the address can take several seconds. The results are presented in two parts: first-party cookies and third-party cookies.

The search can be run with different browser simulations. You can choose the desired browser and whether it evaluates any JavaScript present at the target address.

Based on: Java, HtmlUnit, Tomcat

The cookie search engine uses session cookies to store any settings you make regarding browser and JavaScript for the duration of the visit. When your session ends, these cookies are automatically deleted by your browser.

In short: this cookie search engine finds which cookies are set when visiting a particular website. Enter a URL and hit ‘cookies suchen’ (search cookies).

http://b-versio.verbraucher-sicher-online.de/jcookie/index.jsp
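
The split between first-party and third-party cookies can be sketched as a comparison between the site of the visited page and the domain each cookie is set for. This is an illustration under a loud assumption, not the tool's implementation: the helper registrable_guess is hypothetical and simply takes the last two labels of a host name, which is wrong for suffixes such as co.uk (a careful implementation would consult the Public Suffix List). All domains below are placeholders.

```python
from urllib.parse import urlsplit

def registrable_guess(host: str) -> str:
    """Naive guess at the site a host belongs to: its last two labels.
    Assumption for illustration only; fails for suffixes like co.uk."""
    return ".".join(host.lower().lstrip(".").split(".")[-2:])

def classify_cookies(page_url: str, cookie_domains: list) -> dict:
    """Sort cookie domains into first-party and third-party relative
    to the page that was visited."""
    page_site = registrable_guess(urlsplit(page_url).hostname or "")
    result = {"first_party": [], "third_party": []}
    for domain in cookie_domains:
        same_site = registrable_guess(domain) == page_site
        result["first_party" if same_site else "third_party"].append(domain)
    return result

# Placeholder page and cookie domains.
example = classify_cookies(
    "http://www.example.com/news",
    [".example.com", "ads.tracker.example.net"],
)
print(example)
```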

14. FourthParty

FourthParty is an open-source platform for measuring dynamic web content.

We implemented FourthParty as an extension to Mozilla Firefox. It currently instruments the browser APIs for HTTP traffic, DOM windows, cookies, and resource loads. FourthParty also instruments JavaScript API calls on the window, navigator, and screen objects using getters, setters, and ECMAScript proxies [18]. All events are logged to a SQLite database.

Mayer, Jonathan R., and John C. Mitchell. 2012. “Third-party Web Tracking: Policy and Technology.” In Security and Privacy (SP), 2012 IEEE Symposium On, 413–427. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6234427.

http://fourthparty.info/

15. Track the Trackers (DMI)

The cloud seems to be a buzzword; what it refers to can be difficult to grasp. This project aims to make (some parts of) the cloud tangible. The project focuses on devices that track users online and shows their encounters with the cloud, both those that require active participation of the user (through widgets) and those that are automated (through tags, web bugs, pixels and beacons). For this purpose we have repurposed Ghostery, a browser plugin that informs users which companies are present on the websites they visit, to build a custom tool for tracker detection. We focus on automated tracking devices, which operate by default once a user requests a website, and on widgets, including social buttons, which require user action to set further data transmission in motion. We use a wide definition of “tracker”, including a number of devices that allow for user-data collection, such as internal tracking devices, bugs, widgets, external analytic services and further interfaces to the cloud.

The newly developed tool also allows us to create connections among websites, defining relations based on their connection to the same tracking devices, giving insight into the fluidity of content. In short, by repurposing the Ghostery tool we are able to characterize different collections of URLs. We are further interested in studying tracking ecologies in a number of URL collections, issue spaces or web spheres, to see if there are specific trackers at work in particular countries, whether data-protective countries or web spheres deploy fewer tracking devices and whether countries like Iran use trackers from major US corporations. On top of that we are interested in which trackers are at work in the news sphere and in specific issue spaces, such as health/addiction sites, adults' and children's sites, privacy-concerned sites and technology blogs.
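
The idea of relating websites through the trackers they have in common can be sketched as a small two-mode (site-tracker) network. The sketch below is an illustration, not the Tracker Tracker tool's code: the site names, the detection results and both function names are hypothetical.

```python
from itertools import combinations

# Hypothetical detection results: website -> trackers found on it.
detections = {
    "news.example": {"Google Analytics", "Facebook Connect"},
    "blog.example": {"Google Analytics"},
    "shop.example": {"Facebook Connect", "Twitter Button"},
}

def site_tracker_edges(detections):
    """Undirected edges of the two-mode network: one per (site, tracker)."""
    return sorted(
        (site, tracker)
        for site, trackers in detections.items()
        for tracker in trackers
    )

def shared_tracker_links(detections):
    """Relate two sites when they embed at least one tracker in common."""
    links = {}
    for a, b in combinations(sorted(detections), 2):
        shared = detections[a] & detections[b]
        if shared:
            links[(a, b)] = sorted(shared)
    return links

edges = site_tracker_edges(detections)
links = shared_tracker_links(detections)
print(len(edges), links)
```

In this toy data the news site is linked to the blog through Google Analytics and to the shop through Facebook Connect, while blog and shop share no tracker and stay unconnected.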

The wider aim of the project is to help explicate and make more concrete the abstract claims of ongoing dataveillance in the back-end by providing detailed insights into the ecology and economy of tracking.

https://wiki.digitalmethods.net/Dmi/DmiWinterSchool2012TrackingTheTrackers

DMI Projects using the Track the Trackers tool:

Visualizing Facebook’s Alternative Fabric of the Web

On March 9, Carolin Gerlitz and I presented our paper Reworking the fabric of the web: The Like economy at the Unlike Us conference in Amsterdam. We showed the outcome of some empirical work, building on a previous Winterschool project with the Digital Methods Initiative called Track the Trackers. For Unlike Us we visualized the relative presence of Facebook trackers in the top 1000 Alexa as a way to make visible the alternative fabric of the web Facebook is creating. More information about the tool and method to create these maps can be found on the Tracker Tracker tool wiki page.

http://www.annehelmond.nl/2012/03/12/visualizing-facebooks-alternative-fabric-of-the-web/

Track the Trackers and Watch the Watchers

The two tools have a similar intention but they differ in their approach. Track the Trackers allows you to characterize a predefined set of URLs while Collusion allows you to monitor the trackers involved in your individual browsing behavior.

http://www.annehelmond.nl/2012/02/29/track-the-trackers-and-watch-the-watchers/

Trackers on Dutch political websites

By Anne Helmond. A map of the use of trackers such as beacons, cookies, plugins, widgets and analytics on the websites of Dutch political parties.

http://www.annehelmond.nl/2012/06/11/trackers-gebruikt-op-de-websites-van-nederlandse-politieke-partijen-in-kaart-gebracht/

Exercise: Visualizing the trackers in a specific sourceset

Methodology:
  1. Collect a set of sources or choose from these predefined URL lists.
  2. Enter links in the Tracker Tracker tool (max 100 links per batch): https://tools.digitalmethods.net/beta/trackerTracker/
  3. Settings for frontpage only: only look at specified pages.
  4. Save all files.
  5. Open the .gexf file in Gephi
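
Because the tool accepts at most 100 links per batch, a longer source set has to be split before it is pasted in. A minimal sketch of that preparation step follows; the function name and the example URLs are hypothetical, and the tool itself is not called.

```python
def make_batches(urls, size=100):
    """Strip whitespace, drop blanks and duplicates (keeping first
    occurrences in order), then split into batches of at most `size`."""
    cleaned, seen = [], set()
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            cleaned.append(url)
    return [cleaned[i:i + size] for i in range(0, len(cleaned), size)]

# Hypothetical source set of 250 URLs plus one blank line and one duplicate.
source_set = [f"http://site{i}.example/" for i in range(250)]
source_set += ["", "http://site0.example/"]

batches = make_batches(source_set)
print([len(batch) for batch in batches])
```

Each batch can then be pasted into the tool separately, and the saved files kept side by side for later comparison.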

Steps for Gephi:
  1. Open as UNdirected graph
  2. Force Atlas 2, LinLog mode: yes, Prevent Overlap: yes
  3. Ranking > Nodes > Degree > Size/Weight (Red Diamond), Min size: 3 Max size: 30 (you can play with these settings)
    If this does not work due to a Gephi bug, try the following: calculate "average weighted degree" in the statistics panel and then use "weighted degree" to rank the nodes
  4. In Data Laboratory: Create a boolean column from regex match > Label.
    Title: FB. Regular expression: .*[Ff]acebook.*
    This adds a new column that flags all labels containing Facebook or facebook. The flagged nodes then appear in Partition > Nodes, so that you can color only the detected Facebook nodes or select them in the Data Laboratory.
  5. Color. Go to Data Laboratory > Select the two Facebook coded nodes > Right Click - Edit all nodes > Color > RGB > 0,51,255
    All other nodes: 204,204,204
    Google Green: 49,182,57
    Twitter blue: 64,153,255
    Facebook blue: 59,89,152 (not bright enough for map/beamer)
  6. Or, instead of steps 4 and 5, select nodes in the Data Laboratory, right-click them and edit the node color.
  7. In Preview: Default Straight, Show Edges, Color: source
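
Two of the steps above can be mirrored outside Gephi as a sanity check: the regular-expression flag on node labels and the ranking of node size by degree. The sketch below is not Gephi code. It assumes the size ranking interpolates linearly between the minimum and maximum size, and the labels and degrees are made up. The pattern uses the character class [Ff]; inside square brackets a | is a literal character, not an "or".

```python
import re

# Pattern for the boolean column, written with the class [Ff].
FACEBOOK = re.compile(r".*[Ff]acebook.*")

def flag_facebook(labels):
    """Return {label: True/False}, like the boolean FB column."""
    return {label: FACEBOOK.fullmatch(label) is not None for label in labels}

def scale_sizes(degrees, min_size=3.0, max_size=30.0):
    """Map degrees linearly onto [min_size, max_size] (assumed behavior)."""
    low, high = min(degrees.values()), max(degrees.values())
    if high == low:
        return {node: min_size for node in degrees}
    span = max_size - min_size
    return {
        node: min_size + (degree - low) * span / (high - low)
        for node, degree in degrees.items()
    }

# Hypothetical node labels with their degrees.
degrees = {"Facebook Connect": 10, "facebook.com": 4, "Google Analytics": 1}
flags = flag_facebook(degrees)
sizes = scale_sizes(degrees)
for label in degrees:
    print(label, flags[label], sizes[label])
```

With these made-up degrees the best-connected node gets size 30, the least connected size 3, and the node with degree 4 lands at 12.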

Literature

Arnold Roosendaal’s paper ‘Facebook Tracks and Traces Everyone: Like This!’ on Social Science Research Network

Gerlitz, Carolin, and Anne Helmond. 2013. “The Like Economy: Social Buttons and the Data-intensive Web.” New Media & Society (February 4). doi:10.1177/1461444812472322. http://nms.sagepub.com/content/early/2013/02/03/1461444812472322.

Anne Helmond (NL) and Carolin Gerlitz (UK) - Reworking the fabric of the web: The Like economy from network cultures on Vimeo.

Slides from the workshop

Topic revision: 20 Jun 2013, AnneHelmond