NYC TLC Yellow Taxi — Lakehouse Pipeline Report
Pipeline Overview
This report summarises the NYC TLC Yellow Taxi medallion pipeline loaded into Databricks Unity Catalog.
1 · Ingested Files
Note
Parquet files in the Unity Catalog Volume
| File | Size (MB) |
|---|---|
| yellow_tripdata_2019-01.parquet | 105.3 |
| yellow_tripdata_2019-02.parquet | 98.6 |
| yellow_tripdata_2019-03.parquet | 110.6 |
| yellow_tripdata_2019-04.parquet | 105.0 |
| yellow_tripdata_2019-05.parquet | 106.3 |
| yellow_tripdata_2019-06.parquet | 98.1 |
| yellow_tripdata_2019-07.parquet | 89.5 |
| yellow_tripdata_2019-08.parquet | 85.8 |
| yellow_tripdata_2019-09.parquet | 92.6 |
| yellow_tripdata_2019-10.parquet | 101.4 |
| yellow_tripdata_2019-11.parquet | 96.2 |
| yellow_tripdata_2019-12.parquet | 96.4 |
| yellow_tripdata_2020-01.parquet | 89.2 |
| yellow_tripdata_2020-02.parquet | 87.9 |
| yellow_tripdata_2020-03.parquet | 42.4 |
| yellow_tripdata_2020-04.parquet | 4.2 |
| yellow_tripdata_2020-05.parquet | 5.9 |
| yellow_tripdata_2020-06.parquet | 9.1 |
| yellow_tripdata_2020-07.parquet | 12.8 |
| yellow_tripdata_2020-08.parquet | 15.8 |
| yellow_tripdata_2020-09.parquet | 20.4 |
| yellow_tripdata_2020-10.parquet | 25.1 |
| yellow_tripdata_2020-11.parquet | 22.5 |
| yellow_tripdata_2020-12.parquet | 22.0 |
| yellow_tripdata_2021-01.parquet | 20.7 |
| yellow_tripdata_2021-02.parquet | 20.8 |
| yellow_tripdata_2021-03.parquet | 28.6 |
| yellow_tripdata_2021-04.parquet | 32.4 |
| yellow_tripdata_2021-05.parquet | 36.9 |
| yellow_tripdata_2021-06.parquet | 42.0 |
| yellow_tripdata_2021-07.parquet | 41.7 |
| yellow_tripdata_2021-08.parquet | 41.4 |
| yellow_tripdata_2021-09.parquet | 44.0 |
| yellow_tripdata_2021-10.parquet | 50.8 |
| yellow_tripdata_2021-11.parquet | 50.6 |
| yellow_tripdata_2021-12.parquet | 47.3 |
| yellow_tripdata_2022-01.parquet | 36.4 |
| yellow_tripdata_2022-02.parquet | 43.5 |
| yellow_tripdata_2022-03.parquet | 53.1 |
| yellow_tripdata_2022-04.parquet | 52.7 |
| yellow_tripdata_2022-05.parquet | 53.0 |
| yellow_tripdata_2022-06.parquet | 52.8 |
| yellow_tripdata_2022-07.parquet | 47.1 |
| yellow_tripdata_2022-08.parquet | 47.4 |
| yellow_tripdata_2022-09.parquet | 47.3 |
| yellow_tripdata_2022-10.parquet | 54.4 |
| yellow_tripdata_2022-11.parquet | 47.8 |
| yellow_tripdata_2022-12.parquet | 51.2 |
| yellow_tripdata_2023-01.parquet | 45.5 |
| yellow_tripdata_2023-02.parquet | 45.5 |
| yellow_tripdata_2023-03.parquet | 53.5 |
| yellow_tripdata_2023-04.parquet | 51.7 |
| yellow_tripdata_2023-05.parquet | 55.9 |
| yellow_tripdata_2023-06.parquet | 52.5 |
| yellow_tripdata_2023-07.parquet | 46.1 |
| yellow_tripdata_2023-08.parquet | 45.9 |
| yellow_tripdata_2023-09.parquet | 45.7 |
| yellow_tripdata_2023-10.parquet | 56.3 |
| yellow_tripdata_2023-11.parquet | 53.5 |
| yellow_tripdata_2023-12.parquet | 54.2 |
| yellow_tripdata_2024-01.parquet | 47.6 |
| yellow_tripdata_2024-02.parquet | 48.0 |
| yellow_tripdata_2024-03.parquet | 57.3 |
| yellow_tripdata_2024-04.parquet | 56.4 |
| yellow_tripdata_2024-05.parquet | 59.7 |
| yellow_tripdata_2024-06.parquet | 57.1 |
| yellow_tripdata_2024-07.parquet | 49.9 |
| yellow_tripdata_2024-08.parquet | 48.7 |
| yellow_tripdata_2024-09.parquet | 58.3 |
| yellow_tripdata_2024-10.parquet | 61.4 |
| yellow_tripdata_2024-11.parquet | 57.8 |
| yellow_tripdata_2024-12.parquet | 58.7 |
| yellow_tripdata_2025-01.parquet | 56.4 |
| yellow_tripdata_2025-02.parquet | 57.5 |
| yellow_tripdata_2025-03.parquet | 66.7 |
| yellow_tripdata_2025-04.parquet | 64.2 |
| yellow_tripdata_2025-05.parquet | 74.2 |
| yellow_tripdata_2025-06.parquet | 70.1 |
| yellow_tripdata_2025-07.parquet | 63.8 |
| yellow_tripdata_2025-08.parquet | 59.4 |
| yellow_tripdata_2025-09.parquet | 69.1 |
| yellow_tripdata_2025-10.parquet | 71.8 |
| yellow_tripdata_2025-11.parquet | 67.8 |
| yellow_tripdata_2025-12.parquet | 70.3 |
| yellow_tripdata_2026-01.parquet | 61.2 |
85 files · 4.6 GB stored in /Volumes/workspace/bronze/raw_files
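As a quick sanity check on the file count, the expected file names can be enumerated from the naming pattern visible in the table. This is a minimal sketch that assumes exactly one file per month from 2019-01 through 2026-01 with no gaps; the helper `expected_files` is hypothetical and not part of the pipeline.

```python
from datetime import date


def expected_files(start: date, end: date) -> list[str]:
    """Return one expected Parquet file name per month, start and end inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"yellow_tripdata_{year:04d}-{month:02d}.parquet")
        # Advance to the next month, rolling over the year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names


files = expected_files(date(2019, 1, 1), date(2026, 1, 1))
print(len(files))  # 85
```

Comparing this list against the actual Volume listing would flag any missing or unexpected month before the Bronze load runs.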
2 · Table Row Counts
| Layer | Table | Rows |
|---|---|---|
| Bronze | workspace.bronze.tlc_trips_raw | 311,735,379 |
| Silver | workspace.silver.tlc_trips | 304,237,240 |
| Gold | workspace.gold.trips_by_month | 85 |
| Gold | workspace.gold.trips_by_zone | 264 |
| Gold | workspace.gold.time_patterns | 168 |
Note
Silver retains 97.6% of Bronze rows (304,237,240 of 311,735,379) after cleaning, which removes rows with nulls, outliers, invalid dates, zero-distance trips, and dates outside the range 2019 to present.
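The retention figure follows directly from the two row counts in the table. The sketch below reproduces that arithmetic and illustrates the kind of row-level filter the note describes. The column names follow the public TLC yellow taxi schema, but the outlier thresholds (`MAX_DISTANCE_MI`, `MAX_FARE_USD`), the required-column list, and the function `keep_row` are assumptions for illustration, not the pipeline's actual rules.

```python
from datetime import datetime

BRONZE_ROWS = 311_735_379
SILVER_ROWS = 304_237_240

# Assumed thresholds; the report does not state the real outlier cut-offs.
MAX_DISTANCE_MI = 100.0
MAX_FARE_USD = 500.0
MIN_PICKUP = datetime(2019, 1, 1)


def keep_row(row: dict, now: datetime) -> bool:
    """Return True if a trip record passes every (assumed) Silver cleaning rule."""
    required = ("tpep_pickup_datetime", "tpep_dropoff_datetime",
                "trip_distance", "fare_amount")
    if any(row.get(col) is None for col in required):
        return False  # null in a required column
    pickup, dropoff = row["tpep_pickup_datetime"], row["tpep_dropoff_datetime"]
    if dropoff <= pickup:
        return False  # invalid date ordering
    if not (MIN_PICKUP <= pickup <= now):
        return False  # pickup outside 2019 to present
    if not (0 < row["trip_distance"] <= MAX_DISTANCE_MI):
        return False  # zero-distance trip or distance outlier
    return 0 < row["fare_amount"] <= MAX_FARE_USD  # fare outlier check


retention_pct = round(100 * SILVER_ROWS / BRONZE_ROWS, 1)
removed_rows = BRONZE_ROWS - SILVER_ROWS
print(retention_pct, removed_rows)  # 97.6 7498139
```

In a Spark job the same rules would normally be expressed as column filters rather than a per-row Python function; the predicate form is used here only to keep the example self-contained.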
3 · Monthly Trip Volume
4 · Monthly Revenue
5 · Top 15 Pickup Zones
6 · Hourly Demand Heatmap
7 · Average Fare Inflation Over Time
8 · Weekend vs Weekday Demand
9 · Zone Revenue Concentration (Pareto)
10 · Trip Distance & Duration Trends
11 · Key Insights
| Category | Metric | Value |
|---|---|---|
| Volume | Peak month | March 2019 — 7,687,621 trips |
| Volume | COVID trough | April 2020 — 229,464 trips (−97% vs pre-COVID peak) |
| Volume | Recovery to ≥80% of pre-COVID peak | Not yet reached |
| Revenue | Best revenue month | March 2019 — $147,454,210 |
| Revenue | Average fare (2019–present) | $23.50 |
| Demand patterns | Busiest hour (weekday) | 18:00 |
| Demand patterns | Busiest hour (weekend) | 18:00 |
| Demand patterns | Quietest hour | Wednesday 03:00 |
| Trip characteristics | Avg fare increase since 2019 | $19.23 → $29.88 (+55.4%) |
| Trip characteristics | Avg trip distance change (2019→2026) | +0.35 mi |
Important
COVID-19 Impact
April 2020 saw a ~97% collapse in NYC yellow taxi trips relative to the pre-COVID peak (229,464 trips vs 7,687,621 in March 2019). Monthly trip volume has not yet recovered to 80% of that peak.
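The percentage figures in the Key Insights table can be reproduced from the values it reports. The sketch below shows the arithmetic and one possible way to express the 80% recovery test. It assumes the recovery baseline is the March 2019 peak, as the table row suggests; `pct_change` and `first_recovered_month` are hypothetical helpers, not functions from the pipeline.

```python
from typing import Optional

PEAK_TRIPS = 7_687_621    # March 2019, pre-COVID peak from the table
TROUGH_TRIPS = 229_464    # April 2020
RECOVERY_SHARE = 0.80     # recovery threshold as a share of the peak


def pct_change(old: float, new: float) -> float:
    """Percentage change going from old to new."""
    return 100 * (new - old) / old


def first_recovered_month(monthly_trips: dict[str, int],
                          trough_month: str = "2020-04") -> Optional[str]:
    """Return the first month after the trough at or above the threshold.

    Keys are "YYYY-MM" strings, so lexicographic order is chronological.
    Returns None when no month qualifies, i.e. "not yet recovered".
    """
    threshold = RECOVERY_SHARE * PEAK_TRIPS
    for month in sorted(monthly_trips):
        if month > trough_month and monthly_trips[month] >= threshold:
            return month
    return None


print(round(pct_change(PEAK_TRIPS, TROUGH_TRIPS), 1))    # -97.0
print(round(pct_change(19.23, 29.88), 1))                # 55.4
print(first_recovered_month({"2020-04": TROUGH_TRIPS}))  # None
```

Returning `None` rather than a placeholder string keeps the "not yet recovered" case distinct from a real month, so report text can branch on it instead of interpolating the placeholder into a sentence.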
Generated with Quarto · Data source: NYC TLC Trip Record Data via Databricks Unity Catalog