Guidance for a beginner to extract data from a website

  • Thread starter Mr.Husky
In summary, the individual is a high school graduate who is planning to major in computer science and engineering. They have set a goal for themselves to find the name of a person with a known ranking in a competitive exam. They are seeking guidance from someone with knowledge in computer science to determine if this goal is achievable and what skills they would need to possess. The individual has attempted to find the name by trying random names on the exam website, but this has not been successful. They are now considering using a public API or scraping the data from the website to find the name. However, they are also interested in understanding what they can and cannot do in computer science, as hacking is not typically considered a main component of the field.
  • #1
Mr.Husky
TL;DR Summary
Guidance to extract data from a website for a beginner.
Hello!
I have just finished high school and am about to major in computer science and engineering. I thought it would be better to set myself a goal to stay interested in the field. It is simple, concrete, and I think it is doable. I need someone to guide me because I know nothing about CS.

My goal is to find the name of a person whose "rank" in a competitive exam is known. That's it. Let me expand on it. Recently, the body that conducts the exam released results searchable by name. That means you don't have to enter any other details or verify yourself to see your own or anyone else's result; you only need the full name. They also provide some data related to your own rank. I got rank 9307 in this exam, and the data said, "No. of students with same rank: boys 0, girls 1". My goal is to find who got that rank. If you know the name, you just enter it and see the rank. If I know the rank, can I conversely find the name? Is it possible? I know nothing about web applications. Do you think it is doable? If so, how should I approach it? What skills do I need? If you know how to do this task, please don't spell out the process; just guide me so that I can do it myself. I recently opened a book that said to type print("hello world!") in Python, and boom, I got the same words on the next line. Then I stopped learning programming because I didn't find it exciting. Maybe this task will teach me something.

Thank you!
Ganesh kumara.
 
  • #2
The most effective ways of doing what you want are to either:

a) ask the marking body who belongs to that ranking, or

b) inquire amongst the students being ranked to see who matches.

However, bear in mind that the phrase "no. of students with same rank" may not mean "no. of other students with same rank", i.e., one person holds that rank: presumably you.
 
  • #3
So you have no idea what the names are of the others who took the exam, and want to keep trying random names until you find a score equal to your 9307?

If the website is designed well, it will lock you out after you have tried 3-4 random names with no match to the database.
 
  • Like
Likes Vanadium 50
  • #4
berkeman said:
So you have no idea what the names are of the others who took the exam, and want to keep trying random names until you find a score equal to your 9307?

If the website is designed well, it will lock you out after you have tried 3-4 random names with no match to the database.
Well, that's not the case, sir. I don't know whether it is ethical or not, but I checked the results of more than 30 people whose names I know (some from the exam hall, some from my college).

The problem is that trying random names doesn't work, because more than 137,000 students took the exam.
 
  • #5
hmmm27 said:
The most effective ways of doing what you want are to either :

a) ask the marking body who belongs to that ranking, or

b) inquire amongst the students being ranked to see who matches.

However, bear in mind that the phrase "no. of students with same rank" may not mean "no. of other students with same rank", ie 1(one) person is in that rank : presumably you.
For option b, more than 137,000 students took the exam.

Thanks. I rechecked the analytics they provided, and it said, "No. of Girls (Equal your Rank)=1". Since I am a boy, there must be a girl with the same rank. But my interest is not in figuring out who that is but in understanding what I can do in computer science and what I can't do. I just got this idea and want to know: is it possible to conversely find the data from a website?
 
  • #6
Mr.Husky said:
is it possible to conversely find the data from a website?
If you're lucky and the website has a public API, you can just use that.

Most websites don't, though, so your only option other than manually browsing is to scrape the data--write a program to automatically download web pages and extract data from the HTML. My usual go-to in Python for doing that is BeautifulSoup.
 
  • Informative
Likes Mr.Husky
  • #7
Mr.Husky said:
But my interest is not in figuring out who that is but in understanding what I can do in computer science and what I can't do.
Hacking is generally not considered a main component of computer science.
 
  • Like
Likes berkeman, Vanadium 50 and Mr.Husky
  • #8
PeterDonis said:
If you're lucky and the website has a public API, you can just use that.

Most websites don't, though, so your only option other than manually browsing is to scrape the data--write a program to automatically download web pages and extract data from the HTML. My usual go-to in Python for doing that is BeautifulSoup.
So I have to learn Python now. Thanks for mentioning BeautifulSoup; I just got to know about it. I will learn how to code in Python, and maybe after a few months I will get to know who got the same rank.
 
  • Skeptical
Likes berkeman
  • #9
This is really creepy.

Thread closed, at least temporarily, for moderator discussion.
 
  • Like
Likes Vanadium 50 and berkeman

FAQ: Guidance for a beginner to extract data from a website

How do I extract data from a website as a beginner?

As a beginner, you can start with Python web scraping tools such as BeautifulSoup or Scrapy. These tools let you specify which elements of a page you want and extract their contents in a structured format.
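As a rough illustration of "specifying the elements you want", here is a minimal sketch using BeautifulSoup. The HTML snippet, the CSS class names, and the function name extract_books are all made up for this example; a real page will have its own structure, which you would inspect in the browser first. The page content is inlined so the sketch runs without any network access.

```python
from bs4 import BeautifulSoup

# Hypothetical page content, inlined so the example runs offline.
HTML = """
<html><body>
  <h1>Book list</h1>
  <ul id="books">
    <li class="book"><span class="title">Dune</span> <span class="price">9.99</span></li>
    <li class="book"><span class="title">Emma</span> <span class="price">4.50</span></li>
  </ul>
</body></html>
"""


def extract_books(html):
    """Return one dict per <li class="book"> element found in the HTML."""
    # "html.parser" is the parser bundled with Python, so no extra install is needed.
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for item in soup.select("li.book"):
        title = item.select_one("span.title").get_text(strip=True)
        price = float(item.select_one("span.price").get_text(strip=True))
        books.append({"title": title, "price": price})
    return books


books = extract_books(HTML)
print(books)
```

The result is a plain list of dictionaries, which is what "structured format" amounts to in practice: data you can loop over, filter, or save.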

Is web scraping legal?

Web scraping is a legal gray area: whether it is permitted depends on the website's terms of service, the kind of data being collected, and the laws that apply where you are. Read and understand the terms of service before scraping any data. Some websites explicitly prohibit scraping, while others may allow it for personal use.

How can I avoid getting blocked while scraping data from a website?

To avoid getting blocked while scraping data from a website, you can set up delays between requests, use rotating proxies, and mimic human behavior by randomizing user-agent headers. It's also important to be respectful of the website's resources and not overwhelm their servers with too many requests.
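Of the points above, spacing out requests is the one most directly tied to not overwhelming a server. The following is a minimal sketch of that idea only. The function names fetch_politely and fake_fetch are hypothetical, and fake_fetch is a stand-in that makes no real HTTP call; in practice you would pass in whatever function actually downloads a page.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        results.append(fetch(url))
    return results


def fake_fetch(url):
    """Stand-in for a real download (hypothetical); it only echoes the URL."""
    return f"<html>content of {url}</html>"


pages = fetch_politely(
    ["https://example.com/a", "https://example.com/b"],
    fake_fetch,
    delay_seconds=0.01,  # kept tiny so the demo finishes quickly
)
print(len(pages))
```

Passing the sleep function as a parameter is a design choice that makes the pacing easy to test without actually waiting.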

What are some common challenges when extracting data from a website?

Some common challenges when extracting data from a website include dealing with dynamic content loaded via JavaScript, handling CAPTCHAs, and maintaining the scraper over time as the website's structure changes. It's important to continuously monitor and update your scraping script to adapt to these challenges.
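One way to soften the "structure changes" problem is to write extraction code that reports a missing element instead of crashing. The sketch below assumes a hypothetical page that holds a price in a span with class "price"; the function name extract_price and both HTML snippets are invented for illustration.

```python
from bs4 import BeautifulSoup


def extract_price(html):
    """Return the price as a float, or None if the expected element is missing or unparsable."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one("span.price")
    if node is None:
        # The layout changed (or the selector is wrong): signal it instead of raising.
        return None
    try:
        return float(node.get_text(strip=True))
    except ValueError:
        # The element exists but no longer contains a number.
        return None


old_layout = '<div><span class="price">4.50</span></div>'
new_layout = '<div><span class="cost">4.50</span></div>'  # class renamed by the site

print(extract_price(old_layout))
print(extract_price(new_layout))
```

A None result is a cue to re-inspect the page and update the selector, which is the "monitor and update" step in code form.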

How can I store and analyze the data I extract from a website?

Once you have extracted the data from a website, you can store it in a database like MySQL or MongoDB for further analysis. You can then use data analysis tools like Pandas in Python or Tableau to analyze and visualize the extracted data to gain insights and make informed decisions.
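As a small, self-contained sketch of the store-then-analyze step, the example below uses SQLite through Python's built-in sqlite3 module rather than MySQL or MongoDB, purely so that it runs without installing or configuring a database server. The table name, column names, and rows are made up for illustration.

```python
import sqlite3

# Hypothetical rows, as they might come out of a scraper.
rows = [("Dune", 9.99), ("Emma", 4.50), ("Ulysses", 12.00)]

# An in-memory database keeps the example free of files on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO books (title, price) VALUES (?, ?)", rows)
conn.commit()

# Simple analysis done in SQL: row count, average price, cheapest title.
count, average = conn.execute("SELECT COUNT(*), AVG(price) FROM books").fetchone()
cheapest = conn.execute("SELECT title FROM books ORDER BY price LIMIT 1").fetchone()[0]
print(count, round(average, 2), cheapest)

conn.close()
```

For larger analyses the same table could be loaded into a tool such as Pandas, but basic counts and averages can often be computed directly in SQL.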
