Using .PARQUET files

Today we downloaded data from the New York Taxi & Limousine Commission. The data can only be downloaded in parquet format, a columnar file format that stores large datasets very efficiently.

The New York Taxi & Limousine Commission helpfully provide a guide on how to extract data from the parquet files. I chose to use R (as opposed to Python) as it is super easy to install non-standard packages.

  1. Open RStudio (or R, but RStudio is prettier and the menu steps below assume it)
  2. Go to Tools > Install Packages...
  3. In the modal window, make sure Repository (CRAN) is selected in the 'Install from' field
  4. Type 'arrow' in the Packages field
  5. Leave the other settings as the default options (this has always worked fine for me) and click Install
  6. Save the .parquet files in a place which is easy to access in the files pane
  7. Run the following code in the console to read in your parquet file as a data frame:

library(arrow)  # load the package so that read_parquet() is available
trips <- read_parquet('filename.parquet')

  8. Run the following code in the console to save the data frame as a csv to the specified file path:

write.csv(trips, "[file path]\\[new file name].csv", row.names=FALSE)

Here, any backslashes in the file path must be changed to either a forward slash or a double backslash. R treats a single backslash inside a string as an escape character, so a lone backslash is not read as part of the path.
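To make the slash rule concrete, here is a small sketch of a few ways to write the same Windows path in R. The folder and file names are made up for illustration, and nothing here touches the disk; it only builds strings.

```r
# Hypothetical path, used only to show the different spellings

# Forward slashes work on Windows as well as Mac and Linux
p1 <- "C:/data/taxi/trips.csv"

# Double backslashes: each pair is read by R as one literal backslash
p2 <- "C:\\data\\taxi\\trips.csv"

# file.path() joins the pieces with forward slashes for you
p3 <- file.path("C:", "data", "taxi", "trips.csv")

# Raw strings (R 4.0 or later) let you paste a Windows path unchanged
p4 <- r"(C:\data\taxi\trips.csv)"

print(p1)
print(p3)
```

Any of these can be passed as the file argument of write.csv. I find forward slashes the least fiddly, but that is a matter of taste.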

Now your data will be saved as a .csv in the file path specified in the R script.
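If you prefer to skip the menus, the steps above can be sketched as a single script. This is only an illustration under a few assumptions: it installs arrow from the console rather than through Tools > Install Packages, and it creates a tiny stand-in parquet file in a temporary folder (with made-up column names, not the real taxi schema) so that it runs without the actual download.

```r
# Install arrow only if it is not already available, then load it
if (!requireNamespace("arrow", quietly = TRUE)) {
  install.packages("arrow")
}
library(arrow)

# Stand-in data so the sketch runs on its own
# (trip_id and fare are invented columns, not the TLC ones)
sample_trips <- data.frame(trip_id = 1:3, fare = c(12.5, 8.0, 23.75))
parquet_path <- file.path(tempdir(), "sample_trips.parquet")
write_parquet(sample_trips, parquet_path)

# Step 7: read the parquet file in as a data frame
trips <- read_parquet(parquet_path)

# Step 8: save the data frame as a csv
csv_path <- file.path(tempdir(), "sample_trips.csv")
write.csv(trips, csv_path, row.names = FALSE)

# Read the csv back in to confirm that it was written
check <- read.csv(csv_path)
print(check)
```

To use it on a real download, replace parquet_path and csv_path with your own file paths and delete the three lines that create the stand-in data.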

Author:
Lydia Wren