Personalized Search

Experiment Protocol

Purpose

Nicholas Negroponte, discussing the 'Daily Me' (1995), describes the decline of the shared experience and of a media-generated society or imagined community [B. Anderson]. This 'shared experience' means that if everyone reads the same media, everyone lives the same experience. With increasing personalization on the Web, the online experience is no longer shared, which may eventually lead to a fragmented society of individuals consuming highly personalized items. Does personalized search contribute to the end of this shared experience?
Drawing on these thoughts, one can look at search engine results in the same way. Results are increasingly meant to be familiar. The question is: do search engines return more and more results that are familiar, and are these results therefore better? It is important to define the term 'familiar'. Does it mean that people receive the findings they expect? Or does Google return results that are familiar to a certain group, or for a certain query? An example is that Google often returns Wikipedia as the first result for a query. Do users now expect this result? And would Wikipedia be expected to fall in the rankings if its popularity dropped? Is personalization a return to old search engine logic? Do search engines favor the information culture over the knowledge culture?

One assumption that springs from these questions is that, with personalized search, Google is returning to the original state of search. As a search engine, Google was 'special' in its form of ranking: it did not show the results that people liked to see, but worked with a 'fair' system. Is Google now changing its ways to make search results more relevant to users?

Research questions:

How may we characterize the impact of personalized search on Google search results?
Will the same query made by different individuals return highly personalized results?
Or does personalized search have little impact on search engine results?
Does personalized search perhaps influence the ads far more than the search results?

We would like to conduct an experiment to determine the effects of "personalization" on search engine results. To do so, we follow the general outline of an experimental procedure.

1) Create a Google user account, and regularly launch a series of queries revolving around a particular subject area. Click particular results (see below for specifics).

2) On another machine, regularly launch the same series of queries as above. (This is the control machine.)

3) Track the difference in ranking of selected Web pages, i.e., those 'clicked' in the personalized search, and those same pages in the regular Google results (on the control machine).
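The comparison in step 3 could be sketched as follows. This is a minimal illustration, not part of the protocol itself: it assumes the result pages from both machines have already been saved as ordered lists of URLs, and the function names rank_of and rank_differences are hypothetical.

```python
from typing import Dict, List, Optional


def rank_of(url: str, results: List[str]) -> Optional[int]:
    """Return the 1-based rank of url in an ordered result list, or None if absent."""
    try:
        return results.index(url) + 1
    except ValueError:
        return None


def rank_differences(
    clicked: List[str], personalized: List[str], control: List[str]
) -> Dict[str, Optional[int]]:
    """For each clicked page, return control rank minus personalized rank.

    A positive value means the page sits closer to the top in the
    personalized results than on the control machine. None means the
    page is missing from at least one of the two lists.
    """
    diffs: Dict[str, Optional[int]] = {}
    for url in clicked:
        p = rank_of(url, personalized)
        c = rank_of(url, control)
        diffs[url] = None if p is None or c is None else c - p
    return diffs
```

A page that is ranked 1 in the personalized results and 3 on the control machine gets a difference of 2; a page absent from either list is reported as None rather than being given an invented rank.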

Experiment Description

According to google.com/psearch, personalized search returns results that are "based on the things you've searched for on Google and the sites you've visited."

a) web search history

b) visited search results

The level of "personalization" of the results must be built up, as your personalized Google is trained. Google: "You might not notice a big impact on your search results early on, but they should steadily improve over time the more you use Web History." Here the question concerns "improvement," as discussed in terms of 'familiarity' above. We are interested in whether the sites that we visited become the 'familiar' ones.

It is important to keep in mind that other factors may influence the search results. In order to avoid other variables affecting our results in the experiment, we should seek to neutralize any effects resulting from other means of tracking usage. Thus the control machine will:

not have the Google Toolbar installed

not engage in any Web History management

have no cookies at the outset, but accept cookies from the start of the experiment

be based in the same geographical location as the personalized Google machine

be queried at google.com with the language set to en (English)

have preferences set to 100 results and moderate SafeSearch (the default filter)
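To keep the control queries identical from run to run, the language and result-count settings could be fixed in the query URL rather than set by hand. The sketch below is an assumption, not part of the protocol: it relies on the widely used q, hl and num URL parameters of Google web search, whose behaviour may change, and the helper name control_query_url is hypothetical. The SafeSearch filter is left at its default via the preferences, as described above.

```python
from urllib.parse import urlencode

# Assumed URL parameters for the control machine (not confirmed by the protocol):
#   hl  = interface language ("en" for English)
#   num = number of results per page (100)
CONTROL_SETTINGS = {"hl": "en", "num": 100}


def control_query_url(query: str) -> str:
    """Build a google.com search URL carrying the fixed control settings."""
    params = {"q": query, **CONTROL_SETTINGS}
    return "https://www.google.com/search?" + urlencode(params)


print(control_query_url("fur"))
```

Building every URL from one settings dictionary means a change in the control configuration is made in a single place and is visible in the saved query log.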

Protocol I

This protocol is designed to isolate the impact of personalized search in a particular issue area: fur. If one visits only fur-friendly sites, do these sites rise in the results page for the query fur? If one visits only anti-fur sites, do these sites rise in the results page, to the detriment of the fur-friendly sites? That is, can we cleanse the results so that they become "one-sided"?

Query: fur

Person A
wants only fur
move anti-fur sites out of the fur space

Person B
wants only anti-fur
move fur-friendly sites out of the fur space (replace them with anti-fur)
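One way to judge whether the results for person A or person B have become "one-sided" is to measure what share of a results page belongs to each side. The sketch below assumes that hand-made lists of fur-friendly and anti-fur hostnames exist; the hostnames used in the test data, and the names host_of and side_shares, are hypothetical.

```python
from typing import Dict, List, Set
from urllib.parse import urlparse


def host_of(url: str) -> str:
    """Return the lower-cased hostname of url, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def side_shares(results: List[str], sides: Dict[str, Set[str]]) -> Dict[str, float]:
    """Return, per side, the fraction of results whose host is on that side's list."""
    counts = {side: 0 for side in sides}
    for url in results:
        host = host_of(url)
        for side, hosts in sides.items():
            if host in hosts:
                counts[side] += 1
    total = len(results)
    return {side: (n / total if total else 0.0) for side, n in counts.items()}
```

Comparing these shares between person A, person B and the control machine over time would show whether one side is gradually pushed out of the fur space.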

The experimental protocol should include a description of the frequency of queries and visits.

Protocol II

Whilst the first protocol concerned the feedback loops that may be created in one set of search engine results, here we would like to go a step further. Are we able to train a personalized Google to behave in the manner of an "ideological machine"?

Here we take a set of what may be termed evangelical keywords, querying them repeatedly in personalized Google as well as on the control machine. We would like Christian-friendly results eventually to predominate over any other type of result. Take, for example, the term "faith." After querying personalized Google for a large set of evangelical keywords, we would like the results for "faith" to deal with the Christian faith, as opposed to, say, any faith, and more specifically to recommend Christian music over George Michael's song Faith.

Three mechanisms for researching Google's feedback:

1. Click on a random result.
2. Click on the first result [logged in / with cookies].
3. Click on the last result [logged out / without cookies].

[List of queries soon available]

To start the research, we decided to try to create a Christian profile. To understand the engine logic, tests need to be performed to show how, when and how fast results change under specific search selection procedures. Such procedures are, for instance, those mentioned above: selecting the first, the last, a random or no result for a particular search query per system.
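The four selection procedures (first, last, random, none) can be written down explicitly, so that each profile applies exactly one of them. This is a sketch under the assumption that a results page is available as an ordered list of URLs; the function name select_result and the procedure labels are hypothetical.

```python
import random
from typing import List, Optional

PROCEDURES = ("first", "last", "random", "none")


def select_result(
    results: List[str], procedure: str, rng: Optional[random.Random] = None
) -> Optional[str]:
    """Return the result a profile should click, or None if it clicks nothing.

    results   -- ordered list of result URLs for one query
    procedure -- one of PROCEDURES
    rng       -- optional seeded generator, so random picks can be reproduced
    """
    if procedure not in PROCEDURES:
        raise ValueError(f"unknown procedure: {procedure!r}")
    if not results or procedure == "none":
        return None
    if procedure == "first":
        return results[0]
    if procedure == "last":
        return results[-1]
    return (rng or random).choice(results)
```

Passing a seeded random.Random instance makes the random profile repeatable, which helps when a run has to be audited afterwards.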

The method used to create the list of queries is as follows:

Copy the source code of 'Links to Various Evangelical Christian Resources' and extract clean URLs using the Harvester

Chop the URLs to unique hosts using the Analyse tool

Run the result through Issuediscovery

Clean the Issuediscovery list, removing non-specific terms
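The first two steps (extracting URLs and reducing them to unique hosts) can be approximated in a few lines. The sketch below is a stand-in for illustration only; it does not reproduce how the Harvester or the Analyse tool actually work, and the names extract_urls and unique_hosts are hypothetical. It only picks up absolute http(s) links found in href attributes.

```python
import re
from typing import List
from urllib.parse import urlparse

# Matches absolute http(s) URLs inside href="..." or href='...'
HREF_PATTERN = re.compile(r'href=["\'](https?://[^"\']+)["\']', re.IGNORECASE)


def extract_urls(html: str) -> List[str]:
    """Return all absolute URLs found in href attributes, in page order."""
    return HREF_PATTERN.findall(html)


def unique_hosts(urls: List[str]) -> List[str]:
    """Reduce URLs to their lower-cased hostnames, keeping first-seen order."""
    seen = set()
    hosts = []
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host and host not in seen:
            seen.add(host)
            hosts.append(host)
    return hosts
```

The resulting host list is what would then be passed on to the issue discovery and cleaning steps.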

While formulating the list of queries, some questions and problems arose. Is the content of the results pages important?
Fur/anti-fur will be subject to the Web dynamics surrounding the issue, but will provide the results with an extra layer of research data. Hot issues like the US election will provide interesting search results, but are subject to the changing dynamics of the issue space on the Web. Opposing queries will provide a means to test between the different profiles.
This can be captured in the discussion of which reagents (queries) the personalization protocol should use: generic queries (i.e. www, http, url) versus opposing queries (man/woman, fur/anti-fur). Points to focus on are that generic queries are not subject to fluctuations in popularity on the Web, and that search results for 'www' are stable.
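The claim that results for a generic query such as 'www' are stable can be checked on the control machine before the query is used as a reagent. One possible measure, chosen here for illustration and not prescribed by the protocol, is the Jaccard overlap between the result sets of consecutive runs; the names jaccard and stability are hypothetical.

```python
from typing import List, Sequence


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Share of results that two runs have in common (1.0 = identical sets)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def stability(runs: List[Sequence[str]]) -> float:
    """Mean Jaccard overlap between each run and the run that follows it."""
    pairs = list(zip(runs, runs[1:]))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

This measure ignores the order of results; it only indicates whether the same pages keep appearing. A query whose control results fluctuate strongly would make it hard to attribute later changes to personalization.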

Procedure

0. determine whether being logged in or logged out makes any difference to personalization and, if so, how much

1. prepare a list of queries

2. search for these queries in a clean browser

Three options are possible:

3.1 search for first, last and one random result on three different machines

3.2 search on result number [begin with 50]

3.3 search by hostname list

3.3.1. query one term from the list

3.3.2. check whether search results are in the specified hostname list (of Christian websites)

3.3.3. select a result existing in the list

3.3.4. track the position of those hostnames

3.3.5. record whether they change, and how
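Steps 3.3.2 to 3.3.5 could be sketched as follows, assuming each run of a query is stored as an ordered list of result URLs and the hostname list is a set of lower-cased hosts. The function names listed_positions and position_changes, and the hostnames in the test data, are hypothetical.

```python
from typing import Dict, List, Set
from urllib.parse import urlparse


def listed_positions(results: List[str], hostnames: Set[str]) -> Dict[str, int]:
    """Map each listed hostname found in results to its best 1-based position."""
    positions: Dict[str, int] = {}
    for position, url in enumerate(results, start=1):
        host = urlparse(url).netloc.lower()
        # Keep only the first (highest-ranked) occurrence of each listed host.
        if host in hostnames and host not in positions:
            positions[host] = position
    return positions


def position_changes(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return after minus before for hosts present in both runs.

    A negative value means the host moved up the results page.
    """
    return {host: after[host] - before[host] for host in before if host in after}
```

Hosts that appear in only one of the two runs are left out of position_changes; they can be read off by comparing the keys of the two position dictionaries.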

4. as a placebo, the same thing needs to be done by a profile that selects random results and by one query-only profile

Predictions

Before the actual research is carried out, predictions can already be made.

We assume that the first-ranked Google result will move down when the last result is consistently selected for the list of queries, and that certain terms such as "faith" will return more Christian sites instead of, for instance, Hindu sites. When a profile is biased using opposing terms, this should favour one side over the other.

Notes

This Firefox add-on tries to de-personalize search using the same strategy: SquiggleSR is a Firefox add-on which sends generated queries to search engines in order to deceive them and protect the user's privacy. RSS feed titles and search engine statistics are used to create coherent and personalized queries.

Topic revision: r5 - 25 Aug 2008, ErikBorra

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.