- #1
Leo_Chau_430
- TL;DR Summary
- I am trying to write a program that automatically scrapes the website https://www.goodschool.hk/ss and builds an Excel file containing the phone number, address, email address and fax number of every secondary school, primary school and kindergarten in Hong Kong. However, I have run into a problem: my code runs without errors, but the Excel file it generates is blank.
My code is as follows:
Python:
import pandas as pd
from bs4 import BeautifulSoup
import requests
import os
url = 'https://www.goodschool.hk/ss'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html, 'html.parser')
school_items = soup.find_all('div', {'class': 'school-item'})
school_names_en = []
school_names_zh = []
school_addresses_en = []
school_addresses_zh = []
school_phones = []
school_emails = []
school_faxes = []
for school_item in school_items:
    name_elements = school_item.select('a.school-name')
    school_names_en.append(name_elements[0].text.strip())
    school_names_zh.append(name_elements[1].text.strip())
    address_elements = school_item.select('div.school-address')
    school_addresses_en.append(address_elements[0].text.strip())
    school_addresses_zh.append(address_elements[1].text.strip())
    contact_elements = school_item.select('div.contact-info')
    school_phones.append(contact_elements[0].text.strip())
    school_emails.append(contact_elements[1].text.strip())
    school_faxes.append(contact_elements[2].text.strip())
df = pd.DataFrame({
    'School Name (English)': school_names_en,
    'School Name (Chinese)': school_names_zh,
    'Address (English)': school_addresses_en,
    'Address (Chinese)': school_addresses_zh,
    'Phone Number': school_phones,
    'Email Address': school_emails,
    'Fax Number': school_faxes
})
desktop_path = os.path.join(os.path.expanduser("~"), "Desktop")
excel_file_path = os.path.join(desktop_path, "school_data.xlsx")
df.to_excel(excel_file_path, index=False)
if os.path.exists(excel_file_path):
    print("Excel file generated successfully!")
else:
    print("Failed to generate Excel file.")
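
One plausible reason for a blank file is that `soup.find_all('div', {'class': 'school-item'})` returns an empty list, so the loop body never runs and every column stays empty. Whether that is what happens on goodschool.hk is an assumption I have not verified. The sketch below only shows how to count selector matches before building the DataFrame. The helper `count_matches` and both HTML samples are hypothetical and are not taken from the real page.

```python
from bs4 import BeautifulSoup


def count_matches(html, selector):
    """Return how many elements in `html` match the CSS `selector`."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select(selector))


# Hypothetical markup for illustration only, NOT copied from goodschool.hk.
# The first sample imitates a page whose listing is filled in later by JavaScript,
# so the HTML returned by requests contains no school entries at all.
sample_without_items = "<html><body><div id='app'></div></body></html>"
# The second sample imitates a page that does contain the expected class name.
sample_with_items = (
    "<div class='school-item'>"
    "<a class='school-name'>Example School</a>"
    "</div>"
)

empty_count = count_matches(sample_without_items, "div.school-item")
found_count = count_matches(sample_with_items, "div.school-item")

# A count of zero means the loop over school_items would never execute.
print(empty_count, found_count)
```

Running the same check on `response.content` from the script above (for example `count_matches(html, 'div.school-item')`) would show whether the selectors match anything in the HTML that `requests` actually receives.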