This page is a draft.
The 'raw data' format file is a database dump of a particular network, with some uninteresting fields left out. All fields are written comma-separated to a text file. This page describes every heading and field in that file.
im_network
Provides a description of the network.
| Field | Description |
| id | The network id |
| schedule_id | Id of the schedule that generated this map |
| schedule_index | Chronological position of the network within the series (0 = the initial map, not actually produced by the scheduler) |
| crawl_queued | Time at which the request to crawl this network was sent |
| crawl_start | Time at which the crawl of this network started |
| crawl_end | Time at which the crawl of this network finished |
| crawl_timeouts | Number of timeouts during the crawl |
| page_downloads | Number of pages downloaded during the crawl |
| excluded_pages | Number of pages excluded during the crawl |
| num_starting_points | Total number of starting points |
| starting_point_privilege | Currently expected values are 0 (do not privilege starting points) or 2 (privilege starting points) |
| iterations | Number of iterations of the algorithm. Expected values are 1, 2 or 3 |
| depth | Depth to which each site is crawled |
| co_link_analysis | Type of co-link analysis: 1 = by site; 2 = by page |
| exclusion_list | List of sites to exclude, in XML |
| title | Title of the network |
| minimum_diversity | Minimum number of domain categories the network must contain |
| required_authority | Number of inward links a node must receive to be included in the network |
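As a rough illustration of how the coded im_network fields might be interpreted, here is a minimal Python sketch. It assumes the dump contains a header row naming the fields, which this page does not confirm. The sample record, its values and the helper `describe_network` are invented for the example, and only a subset of the fields is shown.

```python
import csv
import io

# Hypothetical sample: a header row followed by one im_network record.
# The layout (header row, column order) and all values are assumptions.
SAMPLE = """id,schedule_id,schedule_index,iterations,depth,co_link_analysis,starting_point_privilege
17,4,0,2,3,1,2
"""

# Code-to-meaning mappings taken from the field descriptions above.
CO_LINK_ANALYSIS = {1: "by site", 2: "by page"}
STARTING_POINT_PRIVILEGE = {0: False, 2: True}


def describe_network(row):
    """Convert the coded integer fields of one im_network row into readable values."""
    return {
        "id": int(row["id"]),
        # schedule_index 0 marks the initial map, not produced by the scheduler.
        "is_initial_map": int(row["schedule_index"]) == 0,
        "iterations": int(row["iterations"]),
        "depth": int(row["depth"]),
        "co_link_analysis": CO_LINK_ANALYSIS[int(row["co_link_analysis"])],
        "privilege_starting_points": STARTING_POINT_PRIVILEGE[
            int(row["starting_point_privilege"])
        ],
    }


network = describe_network(next(csv.DictReader(io.StringIO(SAMPLE))))
print(network)
```

An unexpected code (for example a starting_point_privilege other than 0 or 2) raises a `KeyError` here, which may be preferable to silently guessing its meaning.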
im_site
Provides a description of the host.
| Column | Description |
| id | The site id (site_id) |
| url | URL to be linked to when the map is rendered, usually the homepage |
| host | Host of the URL |
| name | Name of the website or organisation |
| category | Category of the site, e.g. gov/com/org, international/national |
| authority | Number of inward links the site receives from the network |
| knowledge | Number of links from this site to other sites in the network |
| in_network | 1 = in the network; 0 = an external site (not in the network, but part of the set of nodes which generates the network) |
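The im_site columns can be used to separate network members from external sites and to rank members by authority. The sketch below is only illustrative: it assumes a header row with the column names listed above, and the sample rows and the helper `load_sites` are made up for the example.

```python
import csv
import io

# Hypothetical im_site rows; the header row and all values are assumptions.
SAMPLE = """id,url,host,name,category,authority,knowledge,in_network
1,http://a.example/,a.example,Site A,org,5,2,1
2,http://b.example/,b.example,Site B,gov,9,0,1
3,http://c.example/,c.example,Site C,com,1,4,0
"""


def load_sites(text):
    """Parse im_site rows, converting numeric columns and the in_network flag."""
    sites = []
    for row in csv.DictReader(io.StringIO(text)):
        sites.append({
            "id": int(row["id"]),
            "host": row["host"],
            "name": row["name"],
            "authority": int(row["authority"]),   # inward links from the network
            "knowledge": int(row["knowledge"]),   # links to other sites in the network
            "in_network": row["in_network"] == "1",
        })
    return sites


sites = load_sites(SAMPLE)

# Sites in the network, highest authority first.
members = sorted(
    (s for s in sites if s["in_network"]),
    key=lambda s: s["authority"],
    reverse=True,
)
# External sites: part of the generating node set, but not in the network.
external = [s for s in sites if not s["in_network"]]

for site in members:
    print(site["host"], site["authority"], site["knowledge"])
```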
im_page
Provides a description of a deep link (also known as a page).
| Column | Description |
| id | The page id (page_id) |
| site_id | Id of the site/host this page belongs to; refers to #im_site |
| url | The full URL |
| date_stamp | Date on which this link was retrieved |
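Because site_id refers back to im_site, pages can be grouped under their host with a simple lookup. The following is a sketch under assumptions: both samples use a header row, the im_site sample is cut down to two columns, and every value is invented. The format of date_stamp is not documented here, so it is left as an unparsed string.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpts of im_site and im_page; layout and values are assumptions.
SITES = """id,host
1,a.example
2,b.example
"""

PAGES = """id,site_id,url,date_stamp
10,1,http://a.example/about,2009-03-01
11,1,http://a.example/news,2009-03-01
12,2,http://b.example/report,2009-03-02
"""

# Map each site id to its host so that im_page.site_id can be resolved.
host_by_site_id = {
    int(row["id"]): row["host"]
    for row in csv.DictReader(io.StringIO(SITES))
}

# Group page URLs by the host of the site they belong to.
pages_by_host = defaultdict(list)
for row in csv.DictReader(io.StringIO(PAGES)):
    host = host_by_site_id[int(row["site_id"])]
    pages_by_host[host].append(row["url"])

for host, urls in pages_by_host.items():
    print(host, len(urls))
```

A page whose site_id has no matching im_site row raises a `KeyError` in this sketch; a real import may need to decide how to handle such rows.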
im_link
Provides a description of the links between pages.