Dashboard Week Day 4 - Web Scraping

Today's mission, should we choose to accept it (reluctantly, we kind of had to), was to web scrape data from the London Marathon website. In particular, we had to scrape data for runners whose surnames began with the first two letters of our own surnames - in my case 'Ho'.

I started by retrieving one year of data (2023) for all runners, with the aim of filtering surnames by 'Ho' afterwards and then replicating my workflow for 2022 and 2021. I did this using the text input tool, entering https://results.tcslondonmarathon.com/2021/?page=1&event=MAS&num_results=1000&pid=list&pidp=start&search[sex]=M&search[age_class]=%25 as a starting point.

Next I generated 24 rows, one for each page of results I needed for that year, and then downloaded the data.
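For anyone following along outside the workflow tool, the page-generation step might look something like this in Python. It's a rough sketch rather than what I actually built: build_page_urls is a made-up helper name, and it assumes the only parts of the URL that change are the year in the path and the page parameter.

```python
# Hypothetical helper: one results URL per page for a given year.
# Assumption: only the year in the path and the `page` parameter vary.
BASE = (
    "https://results.tcslondonmarathon.com/{year}/"
    "?page={page}&event=MAS&num_results=1000&pid=list&pidp=start"
    "&search[sex]=M&search[age_class]=%25"
)


def build_page_urls(year: int, n_pages: int) -> list[str]:
    """Return the URLs for pages 1..n_pages of the given year."""
    return [BASE.format(year=year, page=page) for page in range(1, n_pages + 1)]


urls = build_page_urls(2023, 24)  # 24 pages, matching the rows I generated
print(len(urls))
print(urls[0])
```

Each URL in the list could then be fetched in turn (for example with urllib.request.urlopen), which is roughly what the download step does.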

Next came a loaddddddd of regex - ChatGPT was my best friend here.
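To give a flavour of the kind of regex involved, here's a small Python sketch. Big caveat: the HTML below is invented for illustration (the tag names, class names and the "Surname, Forename (COUNTRY)" layout are all assumptions), so the pattern would need adapting to whatever the real page returns.

```python
import re

# Invented sample markup: the real page's HTML may look quite different.
SAMPLE_HTML = """
<li class="row"><h4 class="name">Hobbs, Sam (GBR)</h4><div class="time">03:45:10</div></li>
<li class="row"><h4 class="name">Smith, Alex (USA)</h4><div class="time">04:02:33</div></li>
"""

# Capture surname, forename, country code and finish time from each row.
ROW_PATTERN = re.compile(
    r'<h4 class="name">(?P<surname>[^,<]+),\s*(?P<forename>[^(<]+?)\s*'
    r'\((?P<country>[A-Z]{3})\)</h4>'
    r'<div class="time">(?P<time>\d{2}:\d{2}:\d{2})</div>'
)


def parse_runners(html: str) -> list[dict[str, str]]:
    """Return one dict of named groups per matching row."""
    return [match.groupdict() for match in ROW_PATTERN.finditer(html)]


runners = parse_runners(SAMPLE_HTML)
for runner in runners:
    print(runner)
```

Named groups keep the output readable, which helps a lot when the pattern gets long.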

And finally all I needed to do was filter the data down to surnames beginning with 'Ho'.
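The filter itself is simple. A minimal Python equivalent, assuming the parsed results are a list of records with a surname field (the sample records and the surname_starts_with helper are made up for this example):

```python
# Made-up sample records standing in for the parsed results.
runners = [
    {"surname": "Hobbs", "forename": "Sam"},
    {"surname": "Smith", "forename": "Alex"},
    {"surname": "HOLT", "forename": "Jo"},
    {"surname": "Rhodes", "forename": "Kim"},
]


def surname_starts_with(records, prefix):
    """Keep records whose surname begins with prefix, ignoring case."""
    prefix = prefix.casefold()
    return [
        record
        for record in records
        if record["surname"].strip().casefold().startswith(prefix)
    ]


ho_runners = surname_starts_with(runners, "Ho")
print([record["surname"] for record in ho_runners])
```

Matching on the start of the surname (rather than anywhere in it) is what stops a name like 'Rhodes' sneaking through, and ignoring case covers surnames stored in capitals.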

Once I was happy with my workflow, I duplicated it twice and, in each copy, replaced '2023' in the text input and formula tools with the other year.
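In code, the same year swap could be done with one small substitution instead of separate copies. This is only a sketch: swap_year is a hypothetical helper, and it assumes the year only ever appears as the four-digit segment straight after the domain.

```python
import re

# Starting URL exactly as pasted earlier in the post.
START_URL = (
    "https://results.tcslondonmarathon.com/2021/?page=1&event=MAS"
    "&num_results=1000&pid=list&pidp=start&search[sex]=M&search[age_class]=%25"
)


def swap_year(url: str, year: int) -> str:
    """Replace the four-digit year segment that follows the domain."""
    return re.sub(r"(?<=\.com/)\d{4}(?=/)", str(year), url, count=1)


# One starting URL per year, keyed by year.
year_urls = {year: swap_year(START_URL, year) for year in (2023, 2022, 2021)}
for year, url in year_urls.items():
    print(year, url)
```

The lookbehind and lookahead pin the match to the path segment, so the 1000 in num_results is left alone.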

And here's my final dashboard!

Author: Nadia Holloway