Is there any way to web-scrape a website that's down?

  • Python
  • Thread starter Eclair_de_XII
In summary, the conversation discusses the difficulty of finding archived copies of the effbot.org pages on tkinter widgets. One participant suggests trying the Internet Archive's Wayback Machine, and the original poster considers writing a web-scraping script to find a working link among the archived copies. The thread also notes that tkinter documentation is available on other sites and clarifies that Flask is not a GUI library. A simple search turns up a working mirror of the site. Asked whether it is a true mirror or a cached snapshot, a participant points out that, according to a message on the mirror itself, it is a scrape of the Wayback Machine.
  • #1
Eclair_de_XII
TL;DR Summary
I used to go to effbot.org for documentation on tkinter, but now it seems to be down. I have thought about writing a web-scraping script to record all the pages explaining tkinter's widgets and so on, but I'm wondering whether that is even possible, since I cannot even access the pages normally.
I tried searching for the site on Google and found several archive sites. Each one has archived the main site and the page directory, but every one I checked seems to have failed to capture the pages on the tkinter objects themselves. I confess that I took the site for granted. I'm aware of other tkinter documentation sites on the internet, and I am also aware that other GUI modules exist, like Flask; one user on here mentioned it to me once. All the same, I found effbot the most valuable source of tkinter documentation.
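Since the directory page is the part that did get archived, a reasonable first step is to pull the list of widget pages out of it. Below is a minimal sketch using only the standard library's `html.parser`. It assumes the archive rewrites links in the Wayback Machine's `/web/<timestamp>/<original URL>` style. The names `unwrap` and `original_links` are hypothetical, and the page URL you pass in should end in `/` if it is a directory.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urldefrag, urljoin

# Assumed Wayback-style rewriting: /web/<timestamp>[modifier]/<original URL>,
# optionally prefixed with the archive host. Group 1 is the original URL.
WAYBACK_LINK = re.compile(r"^(?:https?://web\.archive\.org)?/web/\d+[a-z_]*/(.+)$")


def unwrap(url: str) -> str:
    """Strip a Wayback-style prefix, returning the original URL (or url unchanged)."""
    match = WAYBACK_LINK.match(url)
    return match.group(1) if match else url


class HrefCollector(HTMLParser):
    """Collects the href of every <a> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def original_links(html: str, page_url: str, prefix: Optional[str] = None) -> List[str]:
    """Return de-duplicated original URLs linked from an archived page."""
    base = unwrap(page_url)  # resolve relative links against the ORIGINAL page URL
    collector = HrefCollector()
    collector.feed(html)
    links: List[str] = []
    for href in collector.hrefs:
        target, _fragment = urldefrag(urljoin(base, unwrap(href)))  # drop "#section"
        if prefix and not target.startswith(prefix):
            continue
        if target not in links:
            links.append(target)
    return links
```

Unwrapping before calling `urljoin` is deliberate. When CPython's `urljoin` resolves a relative link, it drops empty path segments, so joining against the replay URL itself would turn the embedded `http://` into `http:/`.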
 
  • #2
Short answer: no. If you can’t see it, how can you scrape it?

There is another way, though: try the Internet Archive's Wayback Machine. It may have taken a snapshot of the site.

https://web.archive.org
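To check quickly whether the Wayback Machine holds a particular page, you can use its availability API, which returns the snapshot closest to a requested date. Below is a minimal sketch. It assumes the `https://archive.org/wayback/available` endpoint and its JSON response shape (an `archived_snapshots` object containing a `closest` entry). The helper names are hypothetical, and the page path in the usage note is only illustrative.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build an availability-API request; timestamp is YYYYMMDDhhmmss or a prefix of it."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_API + "?" + urllib.parse.urlencode(params)


def closest_snapshot(payload: dict) -> Optional[str]:
    """Extract the replay URL of the closest snapshot, or None if there is none."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(url: str, timestamp: Optional[str] = None) -> Optional[str]:
    """Ask the Wayback Machine for the snapshot nearest to `timestamp` (network call)."""
    with urllib.request.urlopen(availability_query(url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

For example, `lookup("effbot.org/tkinterbook/", "2020")` returns a replay URL, or `None` if nothing was archived. "Closest" can mean a capture made after the requested date, so check the timestamp embedded in the returned URL.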
 
  • #3
https://web.archive.org/web/20200801000000*/effbot.org

I've found plenty of archived copies of the site, but the ones I have checked do not seem to include the instruction pages. Frankly, checking every single one would be a hassle, so I'm considering using a web-scraping script to search for a working link. As mentioned earlier, the web archive seems to have the page directories but not the pages themselves. For example:

https://web.archive.org/web/20200703091947/http://effbot.org/tkinterbook
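Rather than clicking through snapshots one by one, you can ask the Wayback Machine's CDX server to list every capture under a URL prefix. This is one way to build the "search for a working link" script described above. Below is a minimal sketch. It assumes the `https://web.archive.org/cdx/search/cdx` endpoint, its `matchType`, `output=json`, `filter` and `collapse` parameters, and a JSON response whose first row is a header naming the columns. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterator, List

CDX_API = "https://web.archive.org/cdx/search/cdx"


def cdx_query(prefix: str) -> str:
    """Build a CDX query listing successful captures of every URL under `prefix`."""
    params = {
        "url": prefix,               # e.g. "effbot.org/tkinterbook/"
        "matchType": "prefix",       # include every URL starting with the prefix
        "output": "json",            # rows as JSON arrays; the first row is the header
        "filter": "statuscode:200",  # skip redirects and error captures
        "collapse": "urlkey",        # keep one row per distinct URL
    }
    return CDX_API + "?" + urllib.parse.urlencode(params)


def replay_urls(rows: List[List[str]]) -> Iterator[str]:
    """Turn CDX JSON rows into Wayback replay URLs."""
    if not rows:
        return
    header, *records = rows
    ts_col, url_col = header.index("timestamp"), header.index("original")
    for record in records:
        yield f"https://web.archive.org/web/{record[ts_col]}/{record[url_col]}"


def list_captures(prefix: str) -> List[str]:
    """Fetch the capture list from the CDX server (network call)."""
    with urllib.request.urlopen(cdx_query(prefix), timeout=60) as resp:
        body = resp.read().strip()
    return list(replay_urls(json.loads(body))) if body else []
```

If `list_captures("effbot.org/tkinterbook/")` returns only the directory page, then this archive never captured the widget pages successfully, and no script can recover them from it.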
 
  • #7
Is that a mirror or a Google cache of the site (aka snapshot)?
 
  • #9
jedishrfu said:
Is that a mirror or a Google cache of the site (aka snapshot)?
According to the message on the site, it is a scrape from the Wayback Machine.
 

FAQ: Is there any way to web-scrape a website that's down?

Can a website that is down still be web-scraped?

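Not the live site itself: if the server cannot be reached, there is nothing to request. What you can scrape is a copy that someone saved while the site was up, such as a Wayback Machine snapshot, a search-engine cache, or a mirror. You only get the pages that were actually captured. As this thread shows, an archive may hold a site's directory pages but not the pages they link to.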
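Scraping an archived copy then amounts to downloading each captured page. Below is a minimal sketch that assumes Wayback-style replay URLs. In those URLs, the `id_` modifier after the timestamp asks the Wayback Machine to serve the page as captured, without its toolbar or rewritten links. The helper names and the output directory layout are hypothetical. The default delay between requests is an arbitrary choice meant to keep the load on the archive low.

```python
import time
import urllib.request
from pathlib import Path
from typing import Iterable, Tuple
from urllib.parse import urlsplit


def raw_snapshot_url(timestamp: str, original: str) -> str:
    """Replay URL for the page as captured ('id_' skips the archive's rewriting)."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"


def local_path(original: str, out_dir: Path) -> Path:
    """Map an original URL to out_dir/<host>/<path>, using index.html for directories."""
    parts = urlsplit(original)
    rel = parts.path.lstrip("/")
    if not rel or rel.endswith("/"):
        rel += "index.html"
    if ".." in Path(rel).parts:  # never write outside out_dir
        raise ValueError(f"refusing to write outside {out_dir}: {original}")
    return out_dir / (parts.hostname or "unknown-host") / rel


def save_snapshots(captures: Iterable[Tuple[str, str]], out_dir: Path, delay: float = 2.0) -> None:
    """Download each (timestamp, original URL) capture into out_dir (network calls)."""
    for timestamp, original in captures:
        target = local_path(original, out_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(raw_snapshot_url(timestamp, original), timeout=60) as resp:
            target.write_bytes(resp.read())
        time.sleep(delay)  # pause between requests to go easy on the archive
```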

Is it legal to web-scrape a website that is down?

The legality of web-scraping a website that is down depends on the jurisdiction, the website's terms of service, copyright in the content, and the purpose of the scraping. When you scrape an archived copy, the archive's own terms of use apply as well. If the terms prohibit scraping, or if the scraped content is reused for commercial gain, the scraping may be unlawful.

Can I still retrieve updated information from a website that is down?

No. If a website is down, the server hosting it is not accessible, so you cannot fetch anything new from it. Scraping an archived copy only retrieves what was captured before the site went down, and that capture may be older than the site's last version.
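In practice, "the information that was last available" means the newest capture that actually succeeded. Below is a minimal sketch of picking that capture. It assumes the captures are `(timestamp, status, url)` triples, such as the `timestamp`, `statuscode` and `original` columns of a Wayback CDX listing. Wayback timestamps are 14-digit `YYYYMMDDhhmmss` strings, so padding a shorter cutoff with zeros lets plain string comparison order them correctly. The function name is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def last_good_capture(
    captures: Iterable[Tuple[str, str, str]], before: Optional[str] = None
) -> Optional[Tuple[str, str]]:
    """Return (timestamp, url) of the newest HTTP-200 capture, optionally before a cutoff."""
    # Pad e.g. "2021" to "20210000000000" so it compares correctly with full timestamps.
    cutoff = before.ljust(14, "0") if before else None
    good = [
        (timestamp, url)
        for timestamp, status, url in captures
        if status == "200" and (cutoff is None or timestamp < cutoff)
    ]
    return max(good, default=None)
```

Setting `before` to roughly when the site went down guards against later captures that returned 200 but hold something else, such as a placeholder page.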

Are there any limitations to web-scraping a website that is down?

Yes, there are limitations to web-scraping a website that is down. You cannot get updated information, and the data you retrieve may be incomplete or outdated; for example, an archive may have captured a directory page but not the pages it links to. There may also be legal implications.
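The incompleteness described above is easy to measure: compare the pages a directory links to with the pages the archive actually captured. Below is a minimal sketch. It assumes you already have both lists as URLs, for example links extracted from an archived directory page and the `original` column of a capture listing. The name `missing_pages` is hypothetical. Treating `http`/`https`, port numbers and trailing slashes as equivalent is a simplifying assumption.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def page_key(url: str) -> str:
    """Normalise a URL so scheme, port and a trailing slash don't matter."""
    parts = urlsplit(url)
    return (parts.hostname or "") + parts.path.rstrip("/")


def missing_pages(linked: Iterable[str], captured: Iterable[str]) -> List[str]:
    """Pages that are linked to but have no capture, sorted for stable output."""
    have = {page_key(url) for url in captured}
    return sorted({url for url in linked if page_key(url) not in have})
```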

Is it possible to web-scrape a website that is down without any coding knowledge?

Partly. For a handful of pages, no code is needed: you can browse archived copies directly in the Wayback Machine's web interface. Some tools and software can also download pages without any coding knowledge. Like any scraper, though, they can only retrieve what was captured, so they may not recover every page of a down website.
