Enter URLs or text into the harvester and choose the search depth (example.com/depth1/depth2/depth3).
In the box you can enter URLs. After you click submit, each unique host among the URLs is checked for a robots.txt file (e.g. http://www.bla.com/bla/bla/index.html is checked against http://www.bla.com/robots.txt), and each unique URL is checked for a <meta name="robots" content="bla"> tag. The links on those pages are then fetched and the process repeats until the specified depth is reached.
Harvest the URLs
while i < depth
    for every URL
        get the host
        if host/robots.txt exists    // only checked when the URL's host differs from the previous one
            display robots.txt
        else
            say it didn't exist
        if a robots meta tag exists  // checked for every URL
            display the meta tag
        else
            say it didn't exist
        get all the links of the URL
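The loop above can be sketched in Python. This is a minimal, illustrative implementation, not the harvester's actual code: the function and variable names (harvest, robots_txt_url, _PageParser) are assumptions, and the fetcher is passed in as a callable so the sketch stays network-agnostic.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


def robots_txt_url(url):
    """Derive http://host/robots.txt from any URL on that host."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


class _PageParser(HTMLParser):
    """Collects the robots <meta> tag and all <a href> links from a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.meta_robots = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.meta_robots = attrs.get("content")
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page's own URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))


def harvest(start_urls, fetch, depth):
    """Walk the URL list breadth-first up to `depth` levels.

    `fetch(url)` should return the body as a string, or None on failure.
    Each host's robots.txt is checked only once; the robots meta tag is
    checked for every URL, matching the pseudocode above.
    """
    urls, seen_hosts, report = list(start_urls), set(), []
    for _ in range(depth):
        next_urls = []
        for url in urls:
            host = urlsplit(url).netloc
            if host not in seen_hosts:  # robots.txt checked once per host
                seen_hosts.add(host)
                robots = fetch(robots_txt_url(url))
                report.append(
                    (host, "robots.txt found" if robots else "no robots.txt")
                )
            page = fetch(url) or ""
            parser = _PageParser(url)
            parser.feed(page)
            report.append((url, parser.meta_robots or "no robots meta tag"))
            next_urls.extend(parser.links)  # links feed the next depth level
        urls = next_urls
    return report
```

Injecting the fetcher also makes the logic easy to exercise offline, e.g. with a dict of canned pages and `fetch = pages.get`.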
Note: Not all frames are supported. For more information about the robots exclusion protocol, see http://en.wikipedia.org/wiki/Robots.txt