Extract URLs from text, source code or search engine results. Produces a clean list of URLs.
Paste text into the Harvester to extract the URLs it contains.
Tip: On a website, view the page source, then copy and paste the source code into the Harvester to extract the URLs (or embedded links).
Tip: To harvest the results of a Google query, open it in Firefox, select the results you want to extract the links from, right-click the selection and click 'View Selection Source'. Then paste this source into the Harvester. To extract only the URLs from the results, choose the settings 'only return uniques' and 'Exclude URLs from Google and Youtube'. To extract only the hosts from the results, choose those two settings as well as 'only return hosts'. Note that Google's search results also include links to a site's categories, etc. If you only want the links to the specific search results, it is better to use the Google Scraper, leaving the top URL box empty.
This tool only recognizes hyperlinks that start with http://, https:// or www. You might also try the Link Ripper Tool, which extracts the hyperlinks (href) from a set of URLs.
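The recognition rule and the three settings described above can be approximated in a few lines of Python. This is only a minimal sketch, not the Harvester's actual implementation: the function names (`harvest`, `host_of`), the regular expression, the trailing-punctuation heuristic and the excluded-domain list are all assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumption: a URL starts with http://, https:// or www. and runs until
# whitespace, a quote or an angle bracket.
URL_PATTERN = re.compile(r"""(?:https?://|www\.)[^\s"'<>]+""", re.IGNORECASE)

# Assumption: these domains stand in for 'Exclude URLs from Google and Youtube'.
EXCLUDED_DOMAINS = ("google.com", "youtube.com")


def host_of(url):
    """Return the lower-cased host of a URL, or '' if none can be parsed."""
    # urlsplit only fills in the host when a scheme is present,
    # so bare www. links get a scheme prefixed first.
    if not re.match(r"https?://", url, re.IGNORECASE):
        url = "http://" + url
    return (urlsplit(url).hostname or "").lower()


def is_excluded(host):
    """True if the host is one of the excluded domains or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS)


def harvest(text, only_uniques=False, exclude_google_youtube=False, only_hosts=False):
    """Extract URLs from text, optionally filtered and reduced to hosts."""
    # Heuristic: drop sentence punctuation that directly follows a URL.
    urls = [match.rstrip(".,;:!?)") for match in URL_PATTERN.findall(text)]
    if exclude_google_youtube:
        urls = [u for u in urls if not is_excluded(host_of(u))]
    if only_hosts:
        urls = [host_of(u) for u in urls]
    if only_uniques:
        # dict.fromkeys removes duplicates while keeping first-seen order.
        urls = list(dict.fromkeys(urls))
    return urls


if __name__ == "__main__":
    pasted = 'See <a href="http://example.org/a">A</a> and www.example.net, twice: http://example.org/a'
    for item in harvest(pasted, only_uniques=True):
        print(item)
```

Links that are relative, or that start with anything other than these three prefixes, are not matched by this sketch, which mirrors the limitation stated above.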
Project: Extract URLs from the Daily Kos blogroll
- Go to dailykos.com
- View page source (in Firefox, choose View > Page Source or press Ctrl+U)
- In the page source, find the relevant text under 'blogroll'
- Copy and paste it into the Harvester, which outputs a list of URLs ready for further analysis, e.g. using the Issuecrawler
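For readers who prefer to script the steps above, the copied blogroll fragment can also be processed locally. The sketch below reads the `href` attributes with Python's standard `html.parser` module, which is closer to what the Link Ripper is described as doing than to the Harvester's text matching. The class name, the function name and the sample fragment are hypothetical; the fragment is not taken from dailykos.com.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects absolute http(s) links found in <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip relative links; only absolute http(s) URLs are kept.
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.hrefs.append(value)


def blogroll_urls(html_fragment):
    """Return the unique absolute link targets in the fragment, in page order."""
    collector = HrefCollector()
    collector.feed(html_fragment)
    collector.close()
    return list(dict.fromkeys(collector.hrefs))


# Hypothetical blogroll fragment, standing in for the text copied from the page source.
SAMPLE_FRAGMENT = """
<ul class="blogroll">
  <li><a href="http://blog-one.example/">Blog One</a></li>
  <li><a href="https://blog-two.example/feed">Blog Two</a></li>
  <li><a href="/local/page">Internal link</a></li>
  <li><a href="http://blog-one.example/">Blog One again</a></li>
</ul>
"""

if __name__ == "__main__":
    # One URL per line, a common input format for follow-up tools.
    print("\n".join(blogroll_urls(SAMPLE_FRAGMENT)))
```

Printing one URL per line keeps the output easy to paste into a follow-up tool such as the Issuecrawler.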