Downloading a Website from the Internet Archive

Ahmed Musaad

I lost a few websites recently. Something went wrong and, in the blink of an eye, we no longer had access to the websites or to any of the backups we had made of them. I dug through every offline backup I could find, in the hope that I had stashed an old copy somewhere on my dust-covered HDDs, but found nothing.


My last hope was the Internet Archive Wayback Machine. With shaky hands and a tight chest, I navigated to the website and looked up the addresses of the websites we had lost. To my relief, they were all cached. The copies were not the most up-to-date versions, but they were better than nothing.

Now that I knew some data was available, I could start rebuilding the websites from the ground up and populating the content from the cached pages. That said, navigating through websites in the Wayback Machine's web interface is nerve-racking. I needed a much faster (and offline) way to access these cached copies of our websites. Hartator's Wayback Machine Downloader to the rescue.

The tool is brilliant. It lets you download an entire website from the Internet Archive Wayback Machine, which removes the need to browse the cached copies through the web interface. That is exactly what I had been looking for.

Before proceeding, ensure you have Ruby (>= 1.9.2) installed on your machine.

# Fetch the source; the launcher script lives in bin/
git clone https://github.com/hartator/wayback-machine-downloader
# Install the released gem
gem install wayback_machine_downloader
cd wayback-machine-downloader/bin

To download a website, run the following command:

./wayback_machine_downloader https://<website address> --concurrency 20

That's it: you now have a local copy of the cached website. Use it as you like, and don't forget to support the Internet Archive, which made this possible.
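
If, like me, you have more than one site to recover, a small wrapper loop can save some typing. The following is only a rough sketch under a few assumptions: the addresses `https://example.com` and `https://example.org` are placeholders, the `backups/` folder and the helper names `site_dir` and `download_all` are made up for illustration, and it assumes that `gem install` put `wayback_machine_downloader` on your PATH and that your version supports the `--directory` option (check `wayback_machine_downloader --help`). By default it only prints the commands it would run.

```shell
#!/bin/sh
# Placeholder addresses (assumption): replace with the sites you lost.
SITES="https://example.com https://example.org"

# Dry run by default: set DRY_RUN=0 to actually start the downloads.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: strip the scheme and path, leaving only the host name.
site_dir() {
  printf '%s\n' "$1" | sed -e 's|^[a-z]*://||' -e 's|/.*$||'
}

# Hypothetical helper: run (or just print) one download command per site.
download_all() {
  for site in $SITES; do
    dir="backups/$(site_dir "$site")"
    if [ "$DRY_RUN" = "1" ]; then
      echo "wayback_machine_downloader $site --directory $dir --concurrency 20"
    else
      wayback_machine_downloader "$site" --directory "$dir" --concurrency 20
    fi
  done
}

download_all
```

Giving each site its own folder keeps the recovered copies from mixing together. Once the printed commands look right, rerun the script with `DRY_RUN=0`.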


