Have you ever heard about APIs (Application Programming Interfaces)? Have you ever used one before? Chances are that you have, even though you may not have known what it is or how it works. So let's dive into it.
What is an API?
Every time you log in to a website with your Google account, an underlying API is used. Every time you pay with your PayPal account, an underlying API is used. Every time you get a weather alert as a text message, an underlying API is used.
An API is a set of protocols that enables different services to speak or interact with each other. You can think of it as a two-way communication process or a bridge between different services. For example: You want to buy a pair of shoes with PayPal. The PayPal application sends a request for some of your bank information to your bank. The bank reads that request and sends the requested information to PayPal. With that information, PayPal is then able to process the shoe purchase.
In data analytics, APIs are often used to access databases or servers and request data from them. For example, the New York Times has an API through which you can request information about book or movie bestseller lists, or receive metadata about their published articles such as keywords, publication date, or word count. You can then use this data to create analyses or visualizations (remember to cite the source though!).
How to use an API?
Requests to a web API are sent over HTTP. You basically create a very specific URL that conveys your request for the information you are interested in. Within this URL you can filter for subsections of the data or include specific categories or fields. Some APIs also require an authorization key or token, which is included in this URL.
To go with the above-mentioned New York Times example, if I wanted to retrieve the most recent bestseller list for only Young Adult books, I could use the following URL (note that you would need to insert your own authorization key):
https://api.nytimes.com/svc/books/v3/lists/current/young-adult.json?api-key=yourkey
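If you would rather build this URL in code than type it by hand, here is a minimal Python sketch. The helper names `build_list_url` and `fetch_json` are my own, not part of any New York Times library, and `yourkey` is still a placeholder for your real key. The fetch helper is defined but not called here, since it needs a valid key and a network connection.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base path taken from the example URL above.
BASE_URL = "https://api.nytimes.com/svc/books/v3/lists/current"


def build_list_url(list_name: str, api_key: str) -> str:
    """Build the request URL for the current bestseller list `list_name`."""
    # urlencode escapes the key so it is safe to place in a query string.
    query = urlencode({"api-key": api_key})
    return f"{BASE_URL}/{list_name}.json?{query}"


def fetch_json(url: str) -> dict:
    """Send an HTTP GET request and decode the JSON body (hypothetical helper)."""
    with urlopen(url) as response:
        return json.load(response)


url = build_list_url("young-adult", "yourkey")
print(url)
```

Swapping `"young-adult"` for another list name changes which list you ask for, which is the "filtering within the URL" idea described earlier.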
You can simply paste this URL into your browser; however, the data will come back as a single long string, which is not very useful. In Chrome, it would look something like this:
Instead, we can use Alteryx's developer tools "Download" and "JSON Parse" to extract the data from that URL. The end result will look something like this. It is still not very useful for data visualizations.
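For readers without Alteryx, the idea behind parsing the JSON into name/value rows can be sketched in Python. This is only an approximation of that kind of output, not a reimplementation of the Alteryx tool. The `flatten` function is my own helper, and the sample response below is made-up placeholder data in a shape I am assuming for illustration, not real bestseller results.

```python
import json


def flatten(value, prefix=""):
    """Yield (name, value) pairs, joining nested keys and list indexes with dots."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}.{index}" if prefix else str(index))
    else:
        # A plain value (string, number, ...) ends the recursion.
        yield prefix, value


# Placeholder response; the structure is an assumption for illustration.
SAMPLE = """
{"status": "OK",
 "results": {"list_name": "Young Adult",
             "books": [{"title": "PLACEHOLDER TITLE A", "author": "Author A"},
                       {"title": "PLACEHOLDER TITLE B", "author": "Author B"}]}}
"""

rows = list(flatten(json.loads(SAMPLE)))
for name, value in rows:
    print(name, "=", value)
```

Each row now holds one dotted name such as `results.books.0.title` and one value, which is a long, narrow layout rather than one row per book.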
However, all our data is included here, and with a bit of data cleaning it can be shaped into a suitable format. I used a filter to only show the relevant results, RegEx to parse out the Book ID and Field Name, and Crosstab to retrieve a table with each row representing one Young Adult bestseller. Finally, I reduced the dataset to only the fields that I was interested in, such as title, author, and publisher.
And now you have a nice table with your relevant information that can then be used for further analysis or visualizations.
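The same three cleaning steps (filter, RegEx, crosstab) can be sketched in Python as well. This is not the Alteryx workflow itself, just one way to mirror it. The dotted field names, the `crosstab` helper, and the placeholder values are all assumptions for illustration.

```python
import re

# Name/value pairs as they might look after parsing the JSON (placeholder data).
PAIRS = [
    ("status", "OK"),
    ("results.list_name", "Young Adult"),
    ("results.books.0.rank", "1"),
    ("results.books.0.title", "PLACEHOLDER TITLE A"),
    ("results.books.0.author", "Author A"),
    ("results.books.0.publisher", "Publisher A"),
    ("results.books.1.rank", "2"),
    ("results.books.1.title", "PLACEHOLDER TITLE B"),
    ("results.books.1.author", "Author B"),
    ("results.books.1.publisher", "Publisher B"),
]

# Captures the Book ID (list index) and the Field Name from a dotted name.
# Deeper nested names do not match, because \w+ cannot contain a dot.
BOOK_FIELD = re.compile(r"^results\.books\.(\d+)\.(\w+)$")
KEEP = ("title", "author", "publisher")


def crosstab(pairs, keep=KEEP):
    """Pivot name/value pairs into one dict (row) per book."""
    books = {}
    for name, value in pairs:
        match = BOOK_FIELD.match(name)
        if match is None:
            continue  # filter: drop rows that do not describe a book
        book_id, field = int(match.group(1)), match.group(2)
        if field in keep:  # keep only the fields of interest
            books.setdefault(book_id, {})[field] = value
    return [books[book_id] for book_id in sorted(books)]


table = crosstab(PAIRS)
for row in table:
    print(row)
```

Each dict in `table` is one bestseller with only the title, author, and publisher fields.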
So why do Data Consultants need APIs?
The short answer: APIs are everywhere!
The long answer: APIs come in all kinds of formats and serve different goals. A client company may use an API to retrieve data from an open data source. They may use an API to send and retrieve data internally from one department to another. The client company may also have a close partner with whom they share an API. As a data consultant, you may also want to use an API to supplement the data for your analysis and visualizations. Also, it is extremely cool to have access to an incredible amount of data, right at your fingertips, when you know how to use APIs. We can move from using published datasets to using APIs and retrieving only the information we need. Data served through APIs can also update in real time, which may facilitate business insights.
Feature Image by Douglas Lopes / Unsplash
Sketch created by Lisa Hitch in Excalidraw.com
Screenshots from Alteryx 2023.1