Today we downloaded data from the New York City Taxi & Limousine Commission (TLC). The trip data is only available to download in Parquet format, a columnar file format that stores large, complex datasets compactly and is fast to read.
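One practical upside of Parquet being columnar is that you can read just the columns you need instead of the whole file. The sketch below assumes the `arrow` R package is already installed (the steps for that follow). The small data frame and its column names are invented for illustration and are not the real TLC columns.

```r
library(arrow)

# Made-up stand-in data; not the real TLC schema
df <- data.frame(
  trip_id  = 1:5,
  distance = c(1.2, 3.4, 0.8, 5.1, 2.0),
  fare     = c(7.5, 14.0, 6.0, 19.5, 9.0),
  notes    = c("a", "b", "c", "d", "e")
)

# Write it to a temporary Parquet file
path <- tempfile(fileext = ".parquet")
write_parquet(df, path)

# Because Parquet stores data column by column, read_parquet() can
# load only the requested columns rather than the whole file
two_cols <- read_parquet(path, col_select = c("distance", "fare"))
print(names(two_cols))  # [1] "distance" "fare"
```

With a monthly TLC file, the same `col_select` argument can keep memory use down when you only care about a few fields.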
The Taxi & Limousine Commission helpfully provides a guide to extracting data from the Parquet files. I chose R (rather than Python) because installing non-standard packages is super easy.
- Open RStudio (or plain R, but RStudio is prettier)
- Go to Tools > Install Packages...
- In the dialog that opens, make sure Repository (CRAN) is selected in the 'Install from' field
- Type 'arrow' in the Packages field
- Leave the other settings at their defaults (things work fine for me that way) and click Install
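- Save the .parquet files somewhere easy to reach, ideally your working directory (run getwd() to check which folder that is), so that the file name alone is enough for R to find them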
- Run the following code in the console to load the arrow package and read your Parquet file in as a data frame:
library(arrow)
trips <- read_parquet('filename.parquet')
- Run the following code in the console to save the data frame as a CSV file at the specified path:
write.csv(trips, "[file path]\\[new file name].csv", row.names=FALSE)
On Windows, any single backslashes in the path must be changed to either forward slashes or double backslashes, because R treats a single backslash inside a string as the start of an escape sequence.
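The backslash rule is easy to see in the console. In this sketch the path `C:\Users\me\trips.csv` is purely hypothetical. Nothing is written to disk; it only shows three equivalent ways of writing the same Windows path. The raw-string form needs R 4.0.0 or later.

```r
# A single backslash starts an escape sequence in R strings, so typing
# "C:\Users\me\trips.csv" fails to parse ('\U' used without hex digits).

# Option 1: double the backslashes; each "\\" is stored as one "\"
p1 <- "C:\\Users\\me\\trips.csv"

# Option 2: forward slashes, which R also accepts on Windows
p2 <- "C:/Users/me/trips.csv"

# Option 3 (R >= 4.0.0): a raw string, where backslashes are taken literally
p3 <- r"(C:\Users\me\trips.csv)"

writeLines(p1)     # prints C:\Users\me\trips.csv
identical(p1, p3)  # TRUE
```

Forward slashes are usually the least error-prone choice, since the same string also works on macOS and Linux.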
Your data will now be saved as a .csv file at the path you specified.
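Putting the steps together, a minimal script version might look like the sketch below. It assumes a CRAN mirror is reachable if arrow still needs installing. `parquet_to_csv()` is a hypothetical helper name, not part of arrow. The demo writes a tiny made-up Parquet file to a temporary folder so the sketch runs without the real TLC data.

```r
# Install arrow only if it is not already available (same as the menu steps)
if (!requireNamespace("arrow", quietly = TRUE)) {
  install.packages("arrow")
}
library(arrow)

# Hypothetical helper: convert one Parquet file to CSV.
# Point parquet_path and csv_path at your own files.
parquet_to_csv <- function(parquet_path, csv_path) {
  trips <- read_parquet(parquet_path)
  write.csv(trips, csv_path, row.names = FALSE)
  invisible(csv_path)
}

# Self-contained demo on a tiny generated file (made-up values)
demo_parquet <- tempfile(fileext = ".parquet")
write_parquet(data.frame(id = 1:3, fare = c(5, 7.5, 12)), demo_parquet)

# file.path() joins folder and file name with "/", so no backslashes needed
demo_csv <- file.path(tempdir(), "trips.csv")
parquet_to_csv(demo_parquet, demo_csv)

# Read the CSV back to check the conversion
head(read.csv(demo_csv))
```

Wrapping the two calls in a function makes it straightforward to convert several monthly files in one go, for example with `lapply()` over a vector of file names.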