You are here: Foswiki>Dmi Web>ToolDatabase>ToolGoogleScraper (05 Feb 2010, RichardRogers)Edit Attach

Googlescraper (Search Engine Scraper)

Launch tool

Batch queries Google. Query the resonance of a particular term, or a series of terms, in a set of Websites.

Instructions

The Google Scraper has been deprecated and replaced by two new tools:

The Search Engine Scraper, if you want to scrape and analyze overall search results for a given query or set of queries.
The Lippmannian Device, if you want to analyze query results on a per-site basis.

Overview

The Googlescraper (also known as the Lippmannian Device) queries Google and makes the results available for further analysis. In the top text box, place the source set, in this case a list of URLs. In the bottom text box, place key words. Google will be asked if each keyword occurs in each URL. Results are displayed as a tag cloud and an html table. They also are written to a text file which you can access at the bottom or through previous results.

Harvester feature: In the top box, you may also place a combination of URLs and text, and the URLs will be fetched out of the text and queried for the key words placed in the bottom box. Detailed instructions of use and use cases are available.

To merge divided scrapes from the same project together, use the Lippmannian Merge tool.

Sample project

The Googlescraper can be used for a number of specific research projects, including censorship research, and source distance research. The most common use of the tool is researching the presence as well as the ranking of particular sources within Google engine results. A sample project is this tag cloud, which visually presents unique hosts from top 100 URLs returned from the query of

Course units using this tool

The Ordering Device
The Engine Source Distance Research (the significance of source ranking in search engine returns) Researching the presence as well as the ranking of particular s...

The Spheres
The Spheres Spheres as way of thinking about the Web Thinking of the Web in terms of spheres refers initially to the name of one of the most well known, the bl...

Other projects using this tool

Climate Change Skeptics
Introduction To what extent are climate change 'skeptics' present in the climate change spaces on the Web? The question is posed in order to gain insight into wh...

Dmi About
The Digital Methods Initiative About Us The Digital Methods Initiative (DMI) is one of Europe's leading Internet Studies research groups. Comprised of new media...

Dmi Protocols
Protocols devised by the DMI This page is being replaced gradually by our new research protocols and methods page. Hyperlink Analysis * Perfom an issue craw...

Firefox Tool Bar
DMI Tools firefox extension The DMI toolbar is a Firefox extension that provides extra functionality to the DMI tools. Currently it provides off loading of HTTP r...

Issue Image Analysis
Issue Animals Research With climate change, animals become endangered. Global warming as well as global cooling threatens the habitat of species, as animals migr...

Nofollow
Nofollow / Indexing Issues in the Blogosphere Introduction: Indexing and Ranking Search engine critiques generally focus on either the allocation of pages to be ...

Protocol Redistributed Content Discovery
1) Derive issue related sites known to be blocked in a country. 2) Query list of sites for a controversial subject matter or name in Google Scraper. Retain teaser...

Protocol Surfer Rerouting
1) Familiarize oneself with the content of a set of blocked Websites, e.g., women's issues sites blocked in China.2) Query Web for the key words or issue language...

Summer School 2007
Digital Methods Summer School 2007: New Objects of Study 2010 2009 2008 2007 How does one do research online? What are the new objects of study, and how do ...

Test Home
DMI Tools Digital Methods Project Overview FAQ Tag Cloud Introduction The Digital Methods Initiative is a contribution to doing research into the "nati...

Tool Google Scraper FAQ
Google Scraper FAQ What does the Google Scraper actually do? The Google Scraper is a piece of software which allows one to batch query Google. It allows a user t...

Tool Harvester How To
Input text in the harvester to extract URLs. Tip: On a website, view source. Copy and paste source code into harvester in order to extract the URLs (or embedded l...

Tool Issue Network Cloud How To
Enter the URL of an Issuecrawler xml file. The xml source file URL looks like this: http://www.issuecrawler.net/files/inm_316224.xml The xml source file URL is lo...

Tool Lippmannian Device Sample Project
The Lippmannian Device can be used for a number of specific research projects, including censorship research, and source distance research. The most common use of...

Tool Search Engine Scraper
Main.KoenMartens 05 Dec 2008

I	Attachment	Action	Size	Date	Who
png	google_scraper.png	manage	9 K	05 Dec 2008 - 14:41	AnneHelmond
png	syn_bio_venter.png	manage	15 K	12 Dec 2008 - 14:42	EstherWeltevrede
png	syn_bio_venter_def.png	manage	51 K	12 Dec 2008 - 15:15	EstherWeltevrede

Topic revision: r15 - 05 Feb 2010, RichardRogers

Digital Methods

Course

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Foswiki? Send feedback