Dashboard Week - Day 4 - Web scrape NRL Data

by Cammy Phillips

Today's task for Dashboard Week was not to make a dashboard but to web scrape all of the NRL data and create three data sets, so that we have some data for our final dashboards tomorrow. Below is the full brief from Lorna...

The Data School - DS34: Dashboard Week Day 3

Having only done basic HTML web scraping once before, I knew today was going to be a challenge, but what I didn't expect was for the data to be in a different format. In the future, this is definitely something I will look out for sooner when I am having problems with web scraping.

Below are some tips for web scraping and how to do it:

  • Start with a Text Input tool in Alteryx containing the URL of the web page you want to scrape, then use a Download tool to fetch the page (if it has worked, 'HTTP/1.1 200 OK' will show in your download headers)
  • Another tip is to add a Browse tool after the Download tool and copy the download data into a text editor such as VS Code, as this makes it a lot easier to search and read through
  • You can compare your download data with the code you see when you inspect the web page in your browser
  • You can then use the bits of code that are uniform throughout the download data as markers to regex out different parts of tables
  • Do not overload the site: run your Text Input and Download tools once and work from that downloaded data rather than re-requesting the page on every run, because excess requests can result in you being blocked
  • The most important thing to consider when web scraping is whether you are allowed to scrape the site, so check this before you start
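
I did all of this in Alteryx, but the same tips can be sketched in code. The block below is a rough Python illustration, not the workflow I actually built: the function names (allowed_to_scrape, download_once, extract_table_rows), the example.com URLs, the robots.txt rules and the sample table are all made up for the example. It assumes the data sits in a plain HTML table, which, as I found out, is not always the case.

```python
import re
import time
import urllib.request
from pathlib import Path
from urllib.robotparser import RobotFileParser


def allowed_to_scrape(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Check the text of a site's robots.txt to see if a URL may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def download_once(url: str, cache_file: Path, delay_seconds: float = 1.0) -> str:
    """Request the page a single time, then reuse the saved copy on later runs."""
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    time.sleep(delay_seconds)  # small pause so repeated calls do not hammer the site
    with urllib.request.urlopen(url) as response:
        if response.status != 200:  # same idea as looking for 'HTTP/1.1 200 OK'
            raise RuntimeError(f"Unexpected status {response.status}")
        html = response.read().decode("utf-8")
    cache_file.write_text(html, encoding="utf-8")
    return html


# Tags that repeat uniformly through the download make good regex markers.
ROW = re.compile(r"<tr\b[^>]*>(.*?)</tr>", re.DOTALL)
CELL = re.compile(r"<t[dh]\b[^>]*>(.*?)</t[dh]>", re.DOTALL)
TAG = re.compile(r"<[^>]+>")


def extract_table_rows(html: str) -> list[list[str]]:
    """Split table HTML into rows of cell text, dropping any nested tags."""
    rows = []
    for row_html in ROW.findall(html):
        cells = [TAG.sub("", cell).strip() for cell in CELL.findall(row_html)]
        if cells:
            rows.append(cells)
    return rows


# Made-up inputs so the sketch runs without touching a real site.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\n"
SAMPLE_HTML = """
<table>
  <thead><tr><th>Team</th><th>Points</th></tr></thead>
  <tbody>
    <tr><td><a href="/a">Team A</a></td><td>10</td></tr>
    <tr><td>Team B</td><td>8</td></tr>
  </tbody>
</table>
"""

print(allowed_to_scrape(SAMPLE_ROBOTS, "https://example.com/ladder"))
print(extract_table_rows(SAMPLE_HTML))
```

On a real site you would call download_once with the page URL and a local file path, then pass the returned text to extract_table_rows. Saving the page to a file plays the same role as not re-running the Download tool: you can rework the regex as many times as you like without sending another request.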