3 results tagged crawler
The archive-crawler project is building Heritrix: a flexible, extensible, robust, and scalable web crawler capable of fetching, archiving, and analyzing the full diversity and breadth of internet-accessible content.
Every now and then, I find myself driven nuts by websites that make me click through dozens of pages to do relatively simple things I need to do regularly, or that insist I access their content interactively on their web page instead of letting me select some content and download it for viewing later.
HTTrack is a free (GPL, libre/free software) and easy-to-use offline browser utility.