In this short workshop you will learn how to map the tracking ecology related to a set of websites using the DMI Tracker Tracker tool and Gephi. The Tracker Tracker tool was conceived at the Digital Methods Winterschool in January 2012. It is built on top of the anti-tracking plugin Ghostery (www.ghostery.com) and lets you identify the invisible web: the devices that track user activities online and the services associated with them. To prepare for this workshop, we recommend reading the related projects and materials listed below. Please download and install Gephi from https://gephi.org/ before the workshop starts so you can also learn how to visualize your results.
Related projects and materials
1. The Wall Street Journal - What They Know
The whole file: http://online.wsj.com/public/page/what-they-know-digital-privacy.html
Marketers are spying on Internet users -- observing and remembering people's clicks, and building and selling detailed dossiers of their activities and interests. The Wall Street Journal's What They Know series documents the new, cutting-edge uses of this Internet-tracking technology. The Journal analyzed the tracking files installed on people's computers by the 50 most popular U.S. websites, plus WSJ.com. The Journal also built an "exposure index" -- to determine the degree to which each site exposes visitors to monitoring -- by studying the tracking technologies they install and the privacy policies that guide their use.
http://blogs.wsj.com/wtk/
The Tracking Ecosystem
Surfing the Web kickstarts a process that passes information about you and your interests to tracking companies and advertisers. See how it works.
http://graphicsweb.wsj.com/documents/divSlider/ecosystems100730.html
Tracking the Trackers: Our Method
To determine the prevalence of Internet tracking technologies, The Wall Street Journal analyzed the 50 most-visited U.S. websites, as ranked by the comScore Media Metrix report from October 2009.
http://online.wsj.com/article/SB10001424052748703977004575393121635952084.html
What Facebook Apps Know
The Wall Street Journal analyzed 100 of the most used applications that connect to Facebook's social-networking platform to see what data they sought from people. See what permissions they ask users to grant them. How Grabby Are Your Facebook Apps? http://online.wsj.com/article/SB10001424052702303302504577328363924309098.html
2. DEFCON 19: Tracking the Trackers: How Our Browsing History Is Leaking into the Cloud (Video)
Speaker: Brian Kennish Founder of Disconnect. What companies and organizations are collecting our web-browsing activity? How complete is their data? Do they have personally-identifiable information? What do they do with the data?
The speaker, an ex-Google and DoubleClick engineer, will answer these questions by detailing the research he did for The Wall Street Journal (http://j.mp/tttwsj) and CNN (http://j.mp/tttcnn), talking about the crawler he built to collect reverse-tracking data, and launching a tool you can use to do your own research.
3. Mozilla’s Collusion. Discover who’s tracking you online (Browser add-on)
Collusion is an experimental add-on for Firefox and allows you to see all the third parties that are tracking your movements across the Web. It will show, in real time, how that data creates a spider-web of interaction between companies and other trackers.
4. Gary Kovacs: Tracking the trackers (Video)
As you surf the Web, information is being collected about you. Web tracking is not 100% evil -- personal data can make your browsing more efficient; cookies can help your favorite websites stay in business. But, says Gary Kovacs, it's your right to know what data is being collected about you and how it affects your online life. He unveils a Firefox add-on to do just that.
Gary Kovacs is the CEO of the Mozilla Corporation, where he directs the development of Firefox.
5. Tracking the trackers (The Guardian crowdsourcing project)
To capture online behavior, thousands of HTTP cookies are sent daily to web browsers to identify users and gather statistical knowledge about tastes and habits. The cookie consensus website hosts a collection of cookies that Andrea Fiore received while surfing through the first 50 entries of the Alexa directory of News sites. In the future it will also host software that lets users create their own cookie collections.
8. Track the Trackers (tactical media project)
“TRACK-THE-TRACKERS” is a network installation consisting of tactical media components. The work makes use of existing personal technologies in conjunction with the satellite GPS infrastructure to provide participants with an expanded audible (not a visual) experience of the proliferation of video surveillance in the urban public sphere. The internet platform http://www.t-t-trackers.net serves as an exchange point for these coordinates: the data gathered in this way is made available to the other participants by uploading it to the internet platform. The project prompts participants to think beyond the protection of their own private sphere and to invest in the public sphere.
This page will tell you what websites you have recently visited! In other words, it will partially access your browser's history without your permission. This implies that any website on the Internet can do this. I got the inspiration from Michal's code after reading about it on HN.
10. Twitter is tracking you on the web
In a blog post today announcing Twitter's new tailored suggestions system is something that has left me shocked: an overt admission by Twitter that it is transparently tracking your movements around the web. Othman Laraki, on the Twitter blog:
“These tailored suggestions are based on accounts followed by other Twitter users and visits to websites in the Twitter ecosystem. We receive visit information when sites have integrated Twitter buttons or widgets, similar to what many other web companies — including LinkedIn, Facebook and YouTube — do when they’re integrated into websites. By recognizing which accounts are frequently followed by people who visit popular sites, we can recommend those accounts to others who have visited those sites within the last ten days.”
11. Panopticlick (EFF)
Is your browser configuration rare or unique? If so, web sites may be able to track you, even if you limit or disable cookies. Panopticlick tests your browser to see how unique it is based on the information it will share with sites it visits. Click below and you will be given a uniqueness score, letting you see how easily identifiable you might be as you surf the web.
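One common way to express such a uniqueness score is as bits of identifying information: the rarer a configuration, the more bits it reveals. The sketch below assumes that measure and uses made-up counts; the function name is hypothetical and this is not Panopticlick's own code.

```python
import math


def surprisal_bits(matching_browsers: int, total_browsers: int) -> float:
    """Bits of identifying information revealed by a browser configuration
    that `matching_browsers` out of `total_browsers` observed browsers share."""
    if not 0 < matching_browsers <= total_browsers:
        raise ValueError("need 0 < matching_browsers <= total_browsers")
    return math.log2(total_browsers / matching_browsers)


# Hypothetical counts: one browser in 1,024 shares this configuration.
print(surprisal_bits(1, 1024))  # 10.0
```

A configuration shared by every browser yields 0 bits; each halving of the matching share adds one bit.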
To use the cookie search, enter a web address (URL) in the input field and hit ‘cookies suchen’. The cookie search engine visits the given address and tries to determine which cookies are set during the visit. Evaluating the address can take several seconds. The results are presented in two parts: first-party cookies and third-party cookies.
Based on: Java, HtmlUnit, Tomcat
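The split into first-party and third-party cookies can be illustrated with a small sketch. It assumes a naive rule (a cookie is first-party when its domain equals the visited host or is a parent domain of it) and hypothetical example domains. It is not the cookie search engine's own logic, and a careful implementation would consult the Public Suffix List.

```python
from urllib.parse import urlsplit


def cookie_party(page_url: str, cookie_domain: str) -> str:
    """Classify a cookie as first-party or third-party for the visited page.

    Assumption: a plain domain-suffix comparison is good enough for a demo.
    """
    host = (urlsplit(page_url).hostname or "").lower()
    domain = cookie_domain.lstrip(".").lower()
    if host == domain or host.endswith("." + domain):
        return "first-party"
    return "third-party"


# Hypothetical cookies observed while visiting one page.
page = "http://www.example.com/news"
for domain in [".example.com", "doubleclick.net"]:
    print(domain, "->", cookie_party(page, domain))
```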
FourthParty is an open-source platform for measuring dynamic web content.
Mayer, Jonathan R., and John C. Mitchell. 2012. “Third-party Web Tracking: Policy and Technology.” In Security and Privacy (SP), 2012 IEEE Symposium On, 413–427. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6234427.
15. Track the Trackers (DMI)
“The cloud” has become a buzzword, and what it refers to can be difficult to grasp. This project aims to make (some parts of) the cloud tangible. It focuses on devices that track users online and shows users' encounters with the cloud, both those that require the user's active participation (through widgets) and those that are automated (through tags, web bugs, pixels and beacons). For this purpose we have repurposed Ghostery, a browser plugin that informs users which companies are present on the websites they visit, to build a custom tool for tracker detection. We focus on automated tracking devices, which operate by default as soon as a user requests a website, and on widgets, including social buttons, which require user action to set further data transmission in motion. We use a wide definition of “tracker” that includes a number of devices that allow for user-data collection, such as internal tracking devices, bugs, widgets, external analytics services and further interfaces to the cloud.
The newly developed tool also allows us to create connections among websites, defining relations based on their connection to the same tracking devices, which gives insight into the fluidity of content. In short, by repurposing the Ghostery tool we are able to characterize different collections of URLs. We are further interested in studying tracking ecologies in a number of URL collections, issue spaces or web spheres, to see whether specific trackers are at work in particular countries, whether data-protective countries or web spheres deploy fewer tracking devices, and whether countries like Iran use trackers from major US corporations. On top of that, we are interested in which trackers are at work in the news sphere and in specific issue spaces, such as health/addiction sites, adults' and children's sites, privacy-concerned sites and technology blogs.
The wider aim of the project is to explicate and make more concrete the rather abstract claims of ongoing data-veillance in the back-end by providing detailed insights into the ecology and economy of tracking.
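The idea of relating websites through the tracking devices they share can be sketched in a few lines. The detection results and function names below are hypothetical illustrations, not output or code from the DMI tool.

```python
from itertools import combinations

# Hypothetical detection results: site -> set of trackers found on it.
detections = {
    "news-a.example": {"Google Analytics", "Facebook Connect", "DoubleClick"},
    "news-b.example": {"Google Analytics", "Twitter Button"},
    "blog-c.example": {"Facebook Connect", "DoubleClick"},
}


def site_tracker_edges(detections):
    """Bipartite edge list: one (site, tracker) pair per detection."""
    return sorted(
        (site, tracker)
        for site, trackers in detections.items()
        for tracker in trackers
    )


def shared_tracker_links(detections):
    """Site-to-site links weighted by the number of trackers both sites use."""
    links = {}
    for a, b in combinations(sorted(detections), 2):
        shared = detections[a] & detections[b]
        if shared:
            links[(a, b)] = len(shared)
    return links


print(shared_tracker_links(detections))
```

The bipartite edge list is the kind of site-to-tracker network that can be explored in Gephi; the weighted site-to-site links are a projection of it.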
DMI Projects using the Track the Trackers tool:
Visualizing Facebook’s Alternative Fabric of the Web
Settings for frontpage only: only look at specified pages.
Save all files.
Open the .gexf file in Gephi
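GEXF is an XML graph format that Gephi reads. As a rough sketch of what a minimal file of this kind contains, the snippet below writes an undirected GEXF 1.2 graph from a list of (site, tracker) pairs. The function name and example labels are assumptions, and the file produced by the Tracker Tracker tool may carry more attributes than shown here.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"


def build_gexf(edges):
    """Return a minimal undirected GEXF 1.2 document for (source, target) label pairs."""
    gexf = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(
        gexf, "graph", {"mode": "static", "defaultedgetype": "undirected"}
    )
    nodes_el = ET.SubElement(graph, "nodes")
    edges_el = ET.SubElement(graph, "edges")

    ids = {}  # label -> node id, assigned in order of first appearance
    for source, target in edges:
        for label in (source, target):
            if label not in ids:
                ids[label] = str(len(ids))
                ET.SubElement(nodes_el, "node", {"id": ids[label], "label": label})

    for i, (source, target) in enumerate(edges):
        ET.SubElement(
            edges_el,
            "edge",
            {"id": str(i), "source": ids[source], "target": ids[target]},
        )
    return ET.tostring(gexf, encoding="unicode")


# Hypothetical detections: two sites that both load the same tracker.
example = [
    ("news-a.example", "Google Analytics"),
    ("news-b.example", "Google Analytics"),
]
with open("trackers_sketch.gexf", "w", encoding="utf-8") as handle:
    handle.write(build_gexf(example))
```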
Steps for Gephi
1. Open as an undirected graph.
2. Layout: Force Atlas 2, LinLog mode: yes, Prevent Overlap: yes.
3. Ranking > Nodes > Degree > Size/Weight (red diamond), Min size: 3, Max size: 30 (you can play with these settings). If this does not work due to a Gephi bug, try the following: calculate "average weighted degree" in the Statistics panel and then use "weighted degree" to rank the nodes.
4. In Data Laboratory: Create a boolean column from regex match > Label. Title: FB. Regular expression: .*[Ff]acebook.* This adds a new column that flags all labels containing Facebook or facebook. These will then appear in Partition > Nodes so that you can color only the detected FB nodes, or select them in Data Laboratory.
5. Color. Go to Data Laboratory > select the two Facebook-coded nodes > Right Click > Edit all nodes > Color > RGB > 0,51,255. All other nodes: 204,204,204. Google green: 49,182,57. Twitter blue: 64,153,255. Facebook blue: 59,89,152 (not bright enough for map/beamer).
6. Or, instead of steps 4 and 5, you can select nodes in the Data Laboratory, right-click them and edit the node color.
7. In Preview: Default Straight, Show Edges, Color: source.
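To see what the regex column and the size ranking do, the steps above can be imitated outside Gephi. This is an illustrative sketch only: the helper names are hypothetical, the size ranking is assumed to be a linear interpolation between Min size and Max size, and the pattern uses [Ff] because a | inside a character class matches a literal pipe rather than meaning "or".

```python
import re

# Matches any label that contains "Facebook" or "facebook".
FB_PATTERN = re.compile(r".*[Ff]acebook.*")


def is_facebook(label):
    """Boolean flag comparable to the FB column created from the regex match."""
    return FB_PATTERN.fullmatch(label) is not None


def scale_size(degree, min_degree, max_degree, min_size=3.0, max_size=30.0):
    """Map a node degree onto the size range (assumed linear)."""
    if max_degree == min_degree:
        return min_size
    share = (degree - min_degree) / (max_degree - min_degree)
    return min_size + share * (max_size - min_size)


def rgb_to_hex(r, g, b):
    """Convert an RGB triple such as 0,51,255 to a hex color string."""
    return "#{:02x}{:02x}{:02x}".format(r, g, b)


# Hypothetical node labels with their degrees.
nodes = {"Facebook Connect": 10, "facebook.com": 4, "Twitter Button": 1}
for label, degree in nodes.items():
    color = rgb_to_hex(0, 51, 255) if is_facebook(label) else rgb_to_hex(204, 204, 204)
    print(label, scale_size(degree, 1, 10), color)
```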