Using .PARQUET files

Today we downloaded data from the New York Taxi & Limousine Commission, which is only available to download in the parquet format. Parquet is a compressed, column-based file format, which makes it a highly efficient way to store large, complex datasets.

The New York Taxi & Limousine Commission helpfully provides a guide on how to extract data from the parquet files. I chose to use R (as opposed to Python) because it is super easy to install non-standard packages.

1. Open RStudio (or plain R, but RStudio is prettier)
2. Go to Tools > Install Packages...
3. In the dialog that opens, make sure Repository (CRAN) is selected in the 'Install From' field
4. Type 'arrow' into the Packages field and click Install
5. I tend to leave the other settings at their default options and things work fine...
6. Save the .parquet files somewhere that is easy to reach from the Files pane (for example, your working directory, so you can refer to them by file name alone)
7. Run the following code in the console to read in your parquet file as a data frame:

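library(arrow)  # load the arrow package installed above; read_parquet() comes from it
trips <- read_parquet('filename.parquet')  # replace 'filename.parquet' with your file's name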

8. Run the following code in the console to save the data frame as a CSV at the specified file path:

write.csv(trips, "[file path]\\[new file name].csv", row.names=FALSE)

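Here, any backslashes in a Windows file path must be changed to either a forward slash or a double backslash. This is because R treats a single backslash inside a string as the start of an escape sequence (such as \n for a new line), so "\\" is how you write one literal backslash.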
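To see the backslash rule in action, here is a small base-R sketch. The Windows path and file name in it are hypothetical placeholders; it shows three equivalent ways of writing the same path, plus a quick way to convert a path pasted straight from File Explorer.

```r
# Hypothetical path, written three equivalent ways.
# A single backslash, e.g. "C:\Users\...", would not even parse:
# R reads "\U" as the start of an escape sequence.
p_forward <- "C:/Users/me/Documents/trips.csv"      # forward slashes
p_double  <- "C:\\Users\\me\\Documents\\trips.csv"  # each \\ is one literal backslash
p_built   <- file.path("C:", "Users", "me", "Documents", "trips.csv")  # R joins the parts

# Inside a string literal, "\\" is a single character
nchar("\\")            # 1

# file.path() joins with "/", which Windows also accepts
p_built == p_forward   # TRUE

# Convert a pasted Windows path: the regex "\\\\" matches one backslash
gsub("\\\\", "/", p_double) == p_forward   # TRUE
```

Using forward slashes (or file.path()) is usually the least fiddly option, since the same code then works on Windows, macOS and Linux.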

Now your data will be saved as a .csv file at the file path you specified in the code.
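Putting the steps together, here is a minimal end-to-end sketch, assuming the arrow package is already installed. Because the real TLC file name isn't given here, it first writes a tiny made-up table (the column names trip_id and fare are hypothetical) to a temporary parquet file so the script runs on its own; with a real download you would skip that part and point read_parquet() at your own file.

```r
# install.packages("arrow")  # run once if arrow isn't installed (code version of steps 2-5)
library(arrow)

# Hypothetical stand-in for a downloaded TLC file: a tiny made-up table
# written to a temporary parquet file so this script is self-contained.
parquet_path <- file.path(tempdir(), "filename.parquet")
write_parquet(
  data.frame(trip_id = 1:3, fare = c(12.5, 8.0, 23.1)),
  parquet_path
)

# Step 7: read the parquet file into a data frame
trips <- read_parquet(parquet_path)

# Step 8: save it as a CSV without the extra row-number column
csv_path <- file.path(tempdir(), "trips.csv")
write.csv(trips, csv_path, row.names = FALSE)

# Quick check that no rows were lost on the way
check <- read.csv(csv_path)
nrow(check) == nrow(trips)   # TRUE
```

Reading the CSV back in and comparing row counts is a cheap sanity check before you delete or move the original parquet files.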

Author:

Lydia Wren
