1 post tagged with "Spark"

NYC Taxi Data Analysis

May 3, 2020

This project analyzes NYC taxi data: approximately 1.4 billion taxi rides between 2009 and 2016 (approximately 400 GB as uncompressed CSV, or approximately 35 GB as Snappy-compressed Parquet). The analysis covers the most popular pickup and drop-off zones, peak hours for taxis, trip distribution, peak hours for trips, the top 3 pickup and drop-off locations, how people pay, how payment types evolved over time, and ride-sharing opportunities.

Spark
Scala
SBT