Digital Methods Summer School 2009: Studying the Web with the Web?


DMI Summer School

Device keywords: Twitter, Wikipedia, Google and the Internet Archive
Issue keywords: Iran election, Climate change, Human rights, Early blogosphere

We look at Google results and see society, instead of Google. That is a shorthand way of saying that we see hierarchies of institutions and of issues in the ranked lists that are returned after our queries. Type a query into Google.nl, and you are returned not just top Websites in the Netherlands but also significant societal actors.

But the question that is often asked is, where does social research end, and Google studies begin? Isn’t it Google that determines the rankings? Surely Google has more to do with the hierarchies than societal dynamics. Can Google ever be removed from the picture when one is using it to perform research? These questions are of course classic ones more generally about the possibility of ever being able to isolate phenomena dependent on a context for them to exist. However, this question should also be put to Web studies more generally. Can one only study the Web with the Web?

The research projects presented are attempts at a Web studies in which the devices themselves are a part of the analysis, and one is always aware of the significance of the question of where Google studies ends and social research with Google begins. It is a question and approach that may also be applied to other devices such as Twitter, Wikipedia and the Internet Archive.

(1) Twitter studies: De-banalizing tweets

What do the contents of Twitter tell us about what is happening on the ground and online in Iran? As media attention has faded in the aftermath of the election crisis there, do the tweets continue unabated? Are they mainly banal or highly substantive, when filtered? In the project “#iranelection RT”, Twitter is made into a machine that provides accounts of the situation in Iran, including eyewitness reports of the protests and beatings. There are acts of solidarity, warnings and cries for help.

During “Iran’s Twitter Revolution” (as described in a piece in The Nation), some Twitter users, in an act of solidarity with a defeated candidate, tinted their profile pictures green (Berman, 2009). The question is whether other media spaces were colored green, too, or more to the point: How writerly are the different media spaces, from the news and the blogosphere to platforms such as YouTube, Flickr and Facebook? Another aspect of “Iran’s Twitter Revolution” concerns the online platforms that were accessible or inaccessible in Iran, as followed by Internet censorship researchers. Whilst press and other accounts of the crisis often discussed the impact of the software and its users, we found that the three platforms mentioned most in the news about the Iran elections (Twitter, Facebook and YouTube) were not accessible in Iran.

(2) Wikipedia studies: Identifying societal controversies

Wikipedia, the collaboratively authored online encyclopedia, would not normally be thought of as a site for the analysis of societal controversies. As an encyclopedia, with a dedicated policy of “neutral point of view,” Wikipedia could be described as a controversy defusing device, where controversial matters are sent to discussion pages, and controversial articles are ‘forked’, or split into two subject matters, e.g., the article on climate change and a forked article called scientific opinion on climate change. With multiple forkings, a subject matter begins to develop an ecology, accessible to the user through hyperlinks. How may one analyse the links and show such an ecology? How may it be put to use? One means of using the results of controversy analysis in Wikipedia is to identify the scope as well as the heatedness of matters of concern. Another way to study controversy with Wikipedia is to show which articles have current issues or problems, i.e., carry a banner (or template) indicating that the article is a possible hoax, or that it represents a particular partisan view. There are hundreds of such warnings, and, in the project, they are put to use for the identification of particular controversies. Put differently, can Wikipedia lead us to current societal controversies, and also show whether and when they are still open or have reached closure?
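The banner-based approach can be illustrated with a short sketch. It is not the project's actual tooling: it assumes that an article's wikitext is already available as a string, and the watch list of template names (`hoax`, `pov`, `disputed`) is a hypothetical selection made for the example.

```python
import re

# Hypothetical watch list of dispute banners (lowercased template names).
DISPUTE_TEMPLATES = {"hoax", "pov", "disputed"}

# Matches the name of a template call such as {{POV|date=July 2009}} or {{Hoax}}.
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")


def find_dispute_banners(wikitext):
    """Return the sorted names of watch-listed banners found in the wikitext."""
    found = set()
    for match in TEMPLATE_NAME.finditer(wikitext):
        name = match.group(1).strip().lower()
        if name in DISPUTE_TEMPLATES:
            found.add(name)
    return sorted(found)
```

Run over a set of articles, such a function would give a rough list of pages that currently carry a warning; deciding which banners actually signal a societal controversy remains an interpretive step.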

(3) Google studies: Reinterpreting engine results

What kinds of findings may be made by reinterpreting search engine results, especially the rankings of sites for particular queries? What kinds of findings can be made by comparing results across the many ‘local’ versions of Google, such as the new Palestinian one? In the project “Rights Types”, we entered the word rights in various languages into the local Googles, in order to obtain hierarchies of rights types per country. Are there distinctive rights that rise to the top in Finland, the Netherlands, France, Italy, Switzerland, Germany, Austria, Sweden, Russia, Japan, Canada, the U.K., Australia, the Philippines and Ivory Coast? Everyman’s rights in Finland, prostitutes’ rights in the Netherlands, computer programmers’ rights in Japan, the rights of athletes in doping cases in Austria: as read from Google results, countries could be said to have distinctive concerns compared to other countries. Additionally, we looked into human rights more specifically across the various Google versions, and asked whether the results returned local pages or more global ones. Which countries have well-developed content providers for human rights issues, and which rely on non-local, perhaps establishment sources? Can Google be made to show which human rights actors are dominant information providers per country? One of the findings is that countries lacking a local human rights sphere, such as Algeria and Lebanon, default to a ‘global’ human rights sphere. The second finding is that there are multiple human rights source spheres. Important information sources shared in the Middle East are different from those shared in Western Europe.
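The question of whether results are local or global can be approximated by looking at the hosts of the returned URLs. The sketch below is an illustration under stated assumptions, not the project's method: it works on a list of result URLs that has already been collected (no search engine is queried), it treats a host as ‘local’ only if it ends in the given country-code top-level domain, and the `example.*` URLs in the test data are placeholders.

```python
from urllib.parse import urlparse


def local_share(result_urls, country_tld):
    """Fraction of result URLs whose host ends in the country-code TLD.

    Assumption: the TLD alone decides whether a source counts as 'local'.
    """
    if not result_urls:
        return 0.0
    suffix = "." + country_tld.lower().lstrip(".")
    local = 0
    for url in result_urls:
        host = (urlparse(url).hostname or "").lower()
        if host.endswith(suffix):
            local += 1
    return local / len(result_urls)
```

The TLD is a crude proxy, since local organizations also publish under .org or .com; a fuller analysis would need to classify sources by hand or by registration data.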

(4) Internet archive studies: Conjuring a past state of the Web

At the Internet Archive, the user types in a URL, and receives a list of dates when that URL has been saved. Click an archived page, and surf historical pages and their links. Of interest is how the Internet Archive handles time, because it cannot present all sites from the same day. Indeed, the Internet Archive “jump-cuts” through time, loading the page closest in time to the one you have surfed away from. If a page is unavailable, it connects to the live Web. In all, the archive interface is built for single URL searches, and browsing. There have been special collections created, however, where relevant URLs around an event (U.S. elections, September 11) are stored, and the collections are searchable. In the U.S. context there are approximately 25 special collections, most of which have been created through a technique called “Web sphere” analysis, which is largely an editorial approach to building URL lists. Special software is then used to capture the URL, and also to annotate it. It is largely an expert undertaking, and building a special collection requires resources. We would like to contribute to historical Web research first by noting the two historiographical approaches available to date, the study of the history of a single site (in a kind of autobiographical approach) and the history of an event (“event-based” historiography). Is it also possible to study particular periods of history? In a case study, the early blogosphere was analyzed, using the Eatonweb portal page from 15 August 2000 as the starting point. All pages linked from Eatonweb were captured, and the dates of the archived pages were analyzed so as to create incremental "snapshots," whereby the highest percentage of archived blogs were included in each date range. We thereby conjured a past state of the Web, i.e., the early blogosphere. For the historian, of special interest, too, is the amount of material available for a particular period, or past state of the early blogosphere. We found that some 80% of the early blogosphere (as defined by Eatonweb) is archived, with pages ranging from 1996 to 2001 (and later). The relative importance of pages (including the missing ones) can be analyzed through hyperlink analysis, showing (on cluster graphs) each site's presence in the early blogosphere. In all, to single site histories and event-based special collections, the work shown here adds a historiographical approach: periods, or "past states of the Web". It also contributes a means to collect and analyze them, both in terms of the period's relative completeness and in terms of the relevance of each site (including the missing ones) within the collection.
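One way to picture the "snapshot" step is as a search for the date range that covers the largest share of sites. The sketch below is an assumption about how this could be done, not the procedure used in the case study: it presumes that the capture dates for each URL have already been gathered from the archive, it uses a hypothetical fixed-length window, and the URLs in the test data are made up.

```python
from datetime import date, timedelta


def best_snapshot_window(captures, window_days):
    """Find the window start that covers the largest share of URLs.

    captures maps each URL to a list of capture dates (empty if unarchived).
    Returns (start_date, share_of_urls_with_a_capture_in_the_window).
    Only capture dates are tried as window starts, which is sufficient:
    any window can be slid forward to its first capture without losing coverage.
    """
    all_dates = sorted({d for dates in captures.values() for d in dates})
    best_start, best_share = None, 0.0
    for start in all_dates:
        end = start + timedelta(days=window_days)
        covered = sum(
            1 for dates in captures.values()
            if any(start <= d < end for d in dates)
        )
        share = covered / len(captures)
        if share > best_share:  # ties keep the earliest window
            best_start, best_share = start, share
    return best_start, best_share
```

The returned share also serves as a simple measure of the period's relative completeness, since URLs with no captures count against it.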


The Digital Methods Summer School is dedicated to learning and developing research techniques for studying societal conditions and cultural change with the Internet.

The Digital Methods Summer School, held at Media Studies, University of Amsterdam, is directed by Richard Rogers, and session facilitators include Sabine Niederer, Michael Stevenson and Esther Weltevrede. It is open to New Media staff and graduates, PhD candidates and highly motivated undergraduate students. Participants: Vera Bekema, Anat Ben-David, Erik Borra, Liliana Bounegru, Chris Castiglione, Marieke van Dijk, Martin Feuz, Andrea Fiore, Carolin Gerlitz, Anne Helmond, Niels Kerssens, Marijn Koolen, Simon Marschall, Koen Martens, Bram Nijhof, Kimberley Spreeuwenberg, Elena Tiis, Tjerk Timan, Laura van der Vlies, Marijn de Vries Hoogerwerff and Veysel Yuce.

DMI Summer School 2009


The Digital Methods Initiative (DMI) is organizing its 3rd Summer School, an intensive program where we learn and develop research techniques for studying societal conditions and cultural change with the Internet. The DMI Summer School is open to PhD candidates, motivated M.A.’s as well as advanced B.A. students. The Summer School meets physically Mondays and Fridays, and runs from 29 June to 21 August. There is a long Summer holiday period where there are no meetings (18 July to 15 August).

During the DMI Summer School participants will actively engage in empirical research projects, employing Web-specific software tools, such as scrapers and crawlers. The Summer School has four units: old and new media formats that organize attention; Wikipedia as social controversy indicator; Google as national Web-maker; and Conjuring a past state of the Web with the Internet Archive. The Summer School concludes with a final event where the four research projects are presented.

The DMI Summer School is a component of the Digital Methods Initiative (DMI), directed by Richard Rogers, Chair, New Media & Digital Culture, Media Studies, University of Amsterdam, and supported by the Mondriaan Interregeling.

A list of Summer School participants can be found here.

Summer School Sessions 2009

The Issue Day as Media Attention Format, 29 June – 3 July

Sessions facilitated by Richard Rogers

Non-governmental and inter-governmental organizations strive to organize attention for social issues in a variety of ways, including through celebrity endorsement as well as ‘calendrical work.’ Great effort is made by especially non-governmental organizations to have one or more of their issues placed onto official issue calendars, the most well known of which is the U.N.’s, though there are national governmental and other issue calendars, too. To take the U.N.’s at the beginning of the month of December, for example, World AIDS Day is on the 1st of December, the International Day for the Abolition of Slavery is on the 2nd, the International Day of Disabled Persons is on the 3rd, the International Day Against Corruption is on the 9th, and Human Rights Day is on the 10th. Having an issue day is a particular media format that draws attention to it for at least one day of the year. The question concerns the kind of attention organized by an issue day, compared to that of other attention formats, both old and new media-style.

Please see the Issue Day as Media Attention Format project page.

Opening session slides

Required Readings

Wikipedia as a Space of Controversy, 6 – 10 July

Sessions facilitated by Sabine Niederer

The Wikipedia project has been the source of musings about wisdom of crowds and collaborative knowledge, as well as the subject of criticism about the demise of the expert and reliability of encyclopedic knowledge. Wikipedia is a topic of controversy, but on another level, is also a place where controversies are born, logged and archived.

Wikipedia has a strict protocol of editing and upkeep, making use of technical tools and content agents to deal with vandalism and dubious content. A next step in Wikipedia controversy research could put Wikipedia to use as a space for societal controversy diagnostics.

Looking at Wikipedia, what can be said about the state of a controversy? What are controversy indicators and measuring instruments on Wikipedia? What are new methods of controversy mapping, when looking at the aspects of technicity of Wikipedia content, such as the Wikipedia editing protocol, activity stats, discussion pages, ranking (through WikiRank), language Wikipedias and recent changes feeds? In this unit we will identify these controversy indicators, and will try to develop methods for identifying and mapping controversies through Wikipedia.

Wikipedia as a Space of Controversy Projects Page

Opening session slides: Niederer_WikipediaDMIsummer09.pdf

Required Readings

  • Halavais, Alex and Derek Lackaff (2008). “An Analysis of Topical Coverage of Wikipedia,” Journal of Computer-Mediated Communication 13: 429–440.
  • Kittur, Aniket and Robert E. Kraut (2008). “Harnessing the Wisdom of Crowds in Wikipedia: Quality through Coordination” Proceedings of the ACM 2008 Conference on Computer Supported Cooperative Work, ACM: New York, USA, pp. 37-46.
  • Niederer, Sabine (2009). “Wikipedia and the Composition of the Crowd,” unpublished ms.
  • Reagle, Joseph M. (2008). In Good Faith: Wikipedia and the Pursuit of the Universal Encyclopedia, Dissertation New York University, pp. 70-102.
  • Venturini, Tommaso (2009). "Diving in Magma: How to Explore Controversies with Actor-Network Theory," forthcoming in Public Understanding of Science.

The Nationality of Issues. Repurposing Google for Internet Research, 13 – 17 July

Sessions facilitated by Esther Weltevrede

Since 2000 the idea of the Internet as universal medium has been met with the ‘national turn,’ where the Web arguably has become pluralised; ‘Webs’ increasingly manifest themselves nationally. This turn is witnessed and theorized through different approaches, including language (Lovink 2009), communication flows (Halavais 1998) and the distribution of the adoption of users (Shirky 2008). One also may view the nationalizing of content in light of legislation (e.g., intellectual property), particular state policies (e.g., Internet censorship) and local sensibilities about what counts as good results. Another view on the national turn is how devices serve content tailored to a location. Here, the focus is on the national Webs arranged by Google, largely through IP-to-geo technology. A device-centric approach to localized Google services draws attention to which Google services make use of IP-to-geo (e.g. Google, Google Books, Google Trends, Google News) and how each employs IP-to-geo more specifically. How is Google organizing the Web nationally? Does location overdetermine the results presented by Google? What are the effects of localized Google services on information cultures? In order to answer these questions, we propose to compare engine results for queries across national Webs, as organized by Google. Are particular sources privileged in particular countries? Do such privileged source sets tell us more about a country’s information culture, or about Google’s means of delivering content nationally?
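A comparison of source sets across national Webs could start from a simple overlap measure. The following sketch is only one possible operationalization and rests on assumptions: the result URLs for the same query in two local Googles have already been collected, sources are identified by host name, and the Jaccard index is our choice of measure rather than one named in the project description.

```python
from urllib.parse import urlparse


def hosts(result_urls):
    """Return the set of host names appearing in a list of result URLs."""
    found = set()
    for url in result_urls:
        host = urlparse(url).hostname
        if host:
            found.add(host.lower())
    return found


def source_overlap(results_a, results_b):
    """Jaccard index of the host sets of two result lists (0.0 to 1.0)."""
    a, b = hosts(results_a), hosts(results_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A low overlap between two countries for the same query suggests distinct source sets, though it cannot by itself say whether the difference stems from the information culture or from the engine's localization.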

Introduction to the required readings:

How local or global is an issue? The location of an issue can be studied in a number of ways. In his book News Analysis, Teun van Dijk made a geography of news by critically studying the international coverage of national news in the press (1988). One might also think about the location of an issue from a web perspective, for instance in terms of where an issue is happening and where it is discussed (Marres & Rogers 2008). This approach treats the web as a site of research for far more than online culture alone; it asks how cultural and societal conditions can be studied with the web. How can Google be repurposed as a research tool to map the nationality of issues from a web perspective? Siva Vaidhyanathan introduces some issues to take into account when repurposing Google for national web studies.


Opening session slides

Project: Nationality of Issues

Project: BorderedSources

Required Readings

Google and Geo-IP services: Advanced search howto's

    1. Search Trends 2007 (54 mb)
    2. RSS
    3. Hacking with Google
    4. Forensic Surfing
    5. Now Find That Hidden Web
    6. Google's Spam Guide
  • Halavais, A. M. C. (2008). Search Engine Society. Blackwell Publishers (excerpt).
  • Jeanneney, Jean Noël and Teresa Lavender Fagan (2007). Google and the Myth of Universal Knowledge. A View from Europe. University of Chicago Press (excerpt).
  • Reviews of Google books:
Search engine monitoring & internet censorship research
National webs
  • Goldsmith, J. L., and T. Wu (2006). Who Controls the Internet? Illusions of a Borderless World. Oxford University Press.
  • Halavais, A. M. C. (2000). “National Borders on the World Wide Web.” New Media & Society.
  • Lovink, Geert (2009). “Internet, Globalization and the Politics of Language.” Pre-publication.
  • Weltevrede, Esther. Thinking Nationally With the Web. A Medium-specific Approach to the National Turn in Web Archiving. Master Thesis, Media Studies University of Amsterdam, 2009.
Google & Search Blogs

Digital Methods for the Internet Archive, 17 – 21 August

Sessions facilitated by Michael Stevenson

How might a Digital Methods approach and related tools be deployed in the work of the historian of Web documents, specifically those collected by the Internet Archive? As Web archivists have stressed, the website exhibits characteristics of both 'live' and 'permanent' media, leading to difficulties in the archiving process (Foot and Schneider, 2004; Brügger, 2009). At the same time, the presence of natively digital forms of description and organization (e.g. URLs, meta-tags and hyperlinks) may significantly affect the manner in which the archive is navigated, as well as the process by which the historian selects and analyzes data. Put differently, the problems of archive inconsistency and incompleteness have been exacerbated on the Web, but the specificity of the Internet Archive and the documents it has collected may also make innovative methods possible.

How does archive work differ online? A historical project is typically begun by identifying domains of interest (e.g. newspapers and policy documents), and querying the relevant library databases for a particular keyword. Using the Internet Archive's Wayback Machine, by contrast, one is asked to enter a URL. In archive research, the reconstruction of context is a matter of following implicit links. An archived source will ideally 'suggest' the next query - some name, concept or event to be followed up on in subsequent visits. With the Internet Archive, such traces may be explicit; moreover, one may take a cue from digital ranking algorithms (such as those used by search engines) and use hyperlinks to indicate reputation.
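The idea of using hyperlinks to indicate reputation can be sketched as a count of inlinks within a corpus. This is a deliberately simple stand-in for the ranking algorithms alluded to above, not a reimplementation of any of them; the input format (a mapping from each site to the sites it links to) and the site names in the test data are assumptions made for the example.

```python
from collections import Counter


def inlink_counts(outlinks):
    """Count how many distinct corpus sites link to each site.

    outlinks maps a site to the collection of sites it links to.
    Link targets outside the corpus (e.g. unarchived sites) are counted too.
    """
    counts = Counter({site: 0 for site in outlinks})
    for source, targets in outlinks.items():
        for target in set(targets):  # count each source at most once
            if target != source:  # ignore self-links
                counts[target] += 1
    return counts
```

Calling `inlink_counts(links).most_common(10)` would then list the ten most linked-to sites, including sites that are linked to but missing from the archive.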

In these sessions, the focus is on the development of blogging in the period 1994-1999. In addition to expanding an initial corpus of archived blogs (retrieved via a directory created by Brigitte Eaton in 1999), questions include how the medium was defined socially and technically by users, as well as how it was represented at the time in other domains (such as newspaper and magazine articles). The aim is to develop methods for locating these definitions, in forms ranging from keywords and 'About' sections to manifestos and magazine features, and for gauging their relative currency or reputation among the early blogging community.

Project Pages

Profiling the Archived Blogosphere


Wayback Web Collections

Early Blog Features

Required Readings

  • Blood, Rebecca (2002). “Introduction.” We’ve Got Blog: How Weblogs Are Changing Our Culture. Ed. John Rodzvilla. Cambridge, MA: Perseus Publishing. ix-xiii.
  • --- (2000). “Weblogs: A History and Perspective.” We’ve Got Blog: How Weblogs Are Changing Our Culture. Ed. John Rodzvilla. Cambridge, MA: Perseus Publishing, pp. 7-16.
  • Brügger, Niels (2009). “Website History and the Website as an Object of Study.” New Media & Society 11.1-2: 115.
  • Clark, Joe (2002). “Deconstructing ‘You’ve Got Blog’.” We’ve Got Blog: How Weblogs Are Changing Our Culture. Ed. John Rodzvilla. Cambridge, MA: Perseus Publishing, pp. 57-68.
  • Mead, Rebecca (2000). “You’ve Got Blog.” We’ve Got Blog: How Weblogs Are Changing Our Culture. Ed. John Rodzvilla. Cambridge, MA: Perseus Publishing, pp. 47-56.
  • Powazek, Derek M (2000). “What the Hell Is a Weblog and Why Won’t They Leave Me Alone?.” We’ve Got Blog: How Weblogs Are Changing Our Culture. Ed. John Rodzvilla. Cambridge, MA: Perseus Publishing, pp. 3-6.
  • Schneider, Steven and Kirsten Foot (2004). “The Web as an Object of Study.” New Media & Society 6.1: 114-122.
  • Stevenson, Michael (2009). "Of the Web: Early Reflections on Blogging." Unpublished ms.

Technical Facilities

Scripts and Tools

A list of DMI tools can be found in the tools section. You can also take a look at DMI Methods or start from the homepage to see what we have done previously.

Shared Folder

Share info
Topic revision: r37 - 29 Oct 2012, ErikBorra