HP24
© 2026

Data Engineer & Analytics Specialist

Hetang Patel

Built at the limit

B.Sc. Computer Science · Dec 2025 · Available now

Chapter 01

The Engineer 

Who he is and how he thinks — read as a statement of intent, not a résumé.

I build systems that turn raw, messy data into decisions people can act on.

My work lives where data engineering meets causal inference, spatial analytics, and machine learning — pipelines that hold up in production, models that answer the actual question, and interfaces that make the answer obvious.

Formula 1 isn't decoration here. The discipline it demands — extracting signal under pressure, choosing the right tool over the fashionable one, finding tenths where others see noise — is the same discipline I bring to a warehouse migration or a regression that has to survive scrutiny.

663,788

Records modeled

86K+

Laps trained on

10

Engineers led

Technical Data Sheet · HP24
01 · Designation
Data Engineer · Analytics Specialist
02 · Degree
B.Sc. Computer Science, Dalhousie University
03 · Class of
December 2025
04 · Certification
Google Advanced Data Analytics
05 · Base
Halifax, Nova Scotia
06 · Range
Open across Canada
07 · Eligibility
No sponsorship required
08 · Status
Available — actively racing

Chapter 02

The Work 

Four projects. Four engineering worlds. Each one sweeps in over the last.

01 · Causal Inference

Regulation Impact Analyzer

Formula 1 · Causal Inference + Cloud Analytics · May 2026

When the rulebook changes, did the field actually converge? I treated the 2022 Formula 1 regulation overhaul as a natural experiment and measured the causal effect.

Three layers, one question. A Difference-in-Differences model with team fixed effects and HC3 robust standard errors separates the regulation's effect from pre-existing trends. The DiD coefficient lands at +0.068 (p = 0.050): a modest convergence, right at the conventional significance threshold, and consistent with Red Bull re-establishing dominance by 2023.
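The core idea behind Difference-in-Differences can be sketched in a few lines. This is only the textbook 2x2 estimator on made-up numbers: it leaves out the team fixed effects and HC3 standard errors described above, and the function name `did_estimate` and the toy rows are assumptions for illustration, not the project's code or data.

```python
from statistics import mean


def did_estimate(rows):
    """Return the 2x2 Difference-in-Differences estimate.

    rows: iterable of (treated: bool, post: bool, outcome: float).
    """
    cells = {}
    for treated, post, outcome in rows:
        cells.setdefault((treated, post), []).append(outcome)
    # Mean outcome in each of the four group-by-period cells.
    m = {key: mean(values) for key, values in cells.items()}
    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]
    # The control group's change stands in for the pre-existing trend,
    # so subtracting it leaves the change attributed to the treatment.
    return treated_change - control_change


# Hypothetical toy rows, not the project's panel.
toy_rows = [
    (True, False, 1.00), (True, False, 1.10),
    (True, True, 1.30), (True, True, 1.40),
    (False, False, 0.90), (False, False, 1.00),
    (False, True, 1.00), (False, True, 1.10),
]
estimate = did_estimate(toy_rows)
print(round(estimate, 3))
```

In a regression setting the same quantity is the coefficient on the treated-by-post interaction term, which is where fixed effects and robust standard errors would enter.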

On top of that sits a walk-forward Ridge forecaster that learns regulation-response patterns as each post-2022 season closes, then projects the 2026 reset. The whole pipeline runs through a production cloud stack: raw Parquet into Snowflake, dbt staging-to-mart transformations, and a three-view Power BI dashboard connected live to the warehouse.
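Walk-forward evaluation means each season is predicted only from the seasons before it. The sketch below shows that loop with a one-feature ridge fit written out by hand. The single feature, the `alpha` value, and the toy seasons are all assumptions; the project's forecaster uses scikit-learn and its own feature set, neither of which is reproduced here.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Fit y = intercept + slope * x with an L2 penalty on the slope only."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)  # alpha > 0 shrinks the slope toward zero
    return y_bar - slope * x_bar, slope


def walk_forward(seasons, alpha=1.0, min_train=2):
    """seasons: list of (season, x, y) sorted by season.

    Returns a list of (season, prediction, actual).
    """
    results = []
    for i in range(min_train, len(seasons)):
        train = seasons[:i]  # only seasons strictly before the target
        intercept, slope = fit_ridge_1d(
            [row[1] for row in train], [row[2] for row in train], alpha
        )
        season, x, actual = seasons[i]
        results.append((season, intercept + slope * x, actual))
    return results


# Hypothetical toy seasons: (season, feature, outcome).
toy_seasons = [(2019, 1.0, 2.0), (2020, 2.0, 4.0), (2021, 3.0, 6.0), (2022, 4.0, 8.0)]
for season, predicted, actual in walk_forward(toy_seasons, alpha=1.0):
    print(season, round(predicted, 3), actual)
```

Because the training window only ever grows forward in time, no prediction sees data from its own season or later, which is the property that keeps the reported error honest.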

The F1 framing is the hook. The method — separating a true policy effect from noise, then tracking a rolling forecast against live actuals — is the same one used for pricing shocks, market entries, and demand planning.

Snowflake · dbt · Power BI · statsmodels · scikit-learn · FastF1 · Python
[Figure: treatment vs. control around the 2022 regulation reset, DiD +0.068]

Role · Solo — data engineering & causal modeling

−42% · Forecast error vs naive
0.0484 · Ridge MSE (2024)
+0.068 · DiD coefficient
2019–2026 · Panel

02 · Spatial Analytics

Halifax Transit Analytics

Spatial Analytics · PostgreSQL + PostGIS · Jan 2026

663,788 records of a city's bus network, turned into a map of where service works — and where it quietly fails the people who depend on it.

Halifax Transit's GTFS feed went into a normalized PostgreSQL + PostGIS schema with GIST spatial indexing, automatic geometry triggers, and a validation layer that caught and filtered broken foreign-key references before they corrupted any join.
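The validation step can be pictured as a filter that separates child rows with a valid parent from orphans before anything is joined. The sketch below is a plain-Python illustration of that check, assuming rows are dictionaries; the helper name `split_valid_references` and the sample rows are hypothetical, and the project's actual validation layer is not shown in this text.

```python
def split_valid_references(child_rows, parent_ids, key):
    """Split child rows into those whose key exists in parent_ids and orphans."""
    known = set(parent_ids)
    valid, orphans = [], []
    for row in child_rows:
        (valid if row[key] in known else orphans).append(row)
    return valid, orphans


# Hypothetical rows shaped like GTFS stops.txt and stop_times.txt records.
stops = [{"stop_id": "1001"}, {"stop_id": "1002"}]
stop_times = [
    {"trip_id": "T1", "stop_id": "1001"},
    {"trip_id": "T1", "stop_id": "9999"},  # references a stop that does not exist
    {"trip_id": "T2", "stop_id": "1002"},
]

valid, orphans = split_valid_references(
    stop_times, (s["stop_id"] for s in stops), "stop_id"
)
print(len(valid), "valid,", len(orphans), "orphaned")
```

Keeping the orphans in their own list, rather than silently dropping them, makes it possible to report how much of the feed was rejected and why.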

From there, SQL analytical views, CTEs, and window functions surface how the network actually performs: 41% of stops are served by a single route, 973 stops have no neighbouring stop within 500 metres, and the 5 PM peak pushes 33,635 departures. A stop-connectivity query that took four seconds was brought under one second with the right B-tree and spatial indexes.
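The isolated-stop figure comes down to one question per stop: is any other stop within 500 metres? The sketch below answers it by brute force with the haversine formula. It assumes stops arrive as a dictionary of latitude and longitude pairs, and the coordinates are invented; in PostGIS the same question is typically asked with `ST_DWithin`, which can use a GiST index instead of comparing every pair.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def isolated_stops(stops, radius_m=500.0):
    """stops: dict of stop_id -> (lat, lon).

    Returns the ids that have no other stop within radius_m.
    """
    isolated = []
    for stop_id, (lat, lon) in stops.items():
        has_neighbour = any(
            other_id != stop_id
            and haversine_m(lat, lon, other_lat, other_lon) <= radius_m
            for other_id, (other_lat, other_lon) in stops.items()
        )
        if not has_neighbour:
            isolated.append(stop_id)
    return isolated


# Hypothetical coordinates: A and B are roughly 130 m apart, C is several km away.
toy_stops = {
    "A": (44.6488, -63.5752),
    "B": (44.6500, -63.5752),
    "C": (44.7000, -63.5752),
}
print(isolated_stops(toy_stops))
```

The pairwise loop is quadratic in the number of stops, which is acceptable for a sketch but is exactly the cost a spatial index is meant to avoid.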

It ships as an interactive Streamlit dashboard with Folium maps over all 2,380 stops — built so an operations stakeholder can see coverage inequity, not just admire a chart.

PostgreSQL · PostGIS · Python · GeoPandas · Streamlit · Folium · SQLAlchemy
[Map: 2,380 stops · 80 routes · 152 hubs]

Role · Solo — data engineering & spatial analysis

663,788 · Records modeled
2,380 · Stops mapped
41% · Single-route stops
4s → <1s · Query time

03 · Machine Learning

Strategy A/B Testing Framework

Formula 1 · ML Simulation Engine · Dec 2025

Two pit strategies enter. One race-distance simulation, twenty drivers, and 86,000+ laps of training data decide which one wins.

An XGBoost lap-time predictor (MAE 2.6s, trained on 86,000+ laps across three seasons) drives a modular race-simulation engine that runs a full field over an entire Grand Prix. Feed it Strategy A versus Strategy B and it returns predicted finishing positions, times, and the winner — side by side.
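The comparison loop can be sketched for a single car: sum predicted lap times stint by stint, add a pit-lane loss between stints, and compare totals. Everything numeric here is an assumption. `predict_lap_time` is a hypothetical stand-in for the trained XGBoost model, the base times, wear rates, and 22-second pit loss are invented, and the real engine simulates a full field rather than one driver.

```python
def predict_lap_time(compound, tyre_age):
    """Hypothetical stand-in for a trained lap-time model (seconds)."""
    base = {"soft": 88.0, "medium": 88.6, "hard": 89.2}[compound]
    wear = {"soft": 0.12, "medium": 0.07, "hard": 0.04}[compound]
    return base + wear * tyre_age


def simulate_race(stints, pit_loss_s=22.0):
    """stints: list of (compound, laps). Returns total race time in seconds."""
    total = 0.0
    for index, (compound, laps) in enumerate(stints):
        if index > 0:
            total += pit_loss_s  # time lost in the pit lane between stints
        for tyre_age in range(laps):
            total += predict_lap_time(compound, tyre_age)
    return total


# Two hypothetical 58-lap plans.
strategy_a = [("medium", 25), ("hard", 33)]                # one stop
strategy_b = [("soft", 15), ("medium", 20), ("hard", 23)]  # two stops

time_a = simulate_race(strategy_a)
time_b = simulate_race(strategy_b)
winner = "A" if time_a < time_b else "B"
print(f"A: {time_a:.2f}s  B: {time_b:.2f}s  winner: {winner}")
```

Keeping the predictor behind one function is what makes the engine modular: swapping in a different model changes `predict_lap_time` and nothing else.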

The engineering discipline is in what was left out: I stripped outcome-leaking features like average speed and throttle, restricting the model to signals genuinely available before each lap. XGBoost beat an LSTM by more than 2x on this tabular data — the right tool over the fashionable one.
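A leakage guard of this kind can be as simple as a deny-list applied before training and before prediction. The sketch below assumes each lap is a dictionary of features; the column names and values are hypothetical, chosen only to mirror the average-speed and throttle examples mentioned above.

```python
# Features measured during the lap being predicted, so unavailable beforehand.
LEAKY_FEATURES = {"avg_speed", "throttle"}


def pre_lap_features(row):
    """Return a copy of row without the outcome-leaking features."""
    return {name: value for name, value in row.items() if name not in LEAKY_FEATURES}


# Hypothetical lap record.
lap = {
    "tyre_age": 12,
    "compound": "hard",
    "lap_number": 31,
    "avg_speed": 203.4,
    "throttle": 0.71,
}
clean = pre_lap_features(lap)
print(sorted(clean))
```

Applying the same function at training time and at inference time keeps the two feature sets from drifting apart.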

Validated against the 2024 Abu Dhabi Grand Prix: the model called Norris's conservative one-stop over a two-stop alternative, which is exactly how the real race played out. Deployed as a Dockerized Flask API on Railway, with a React front end on Vercel and CI/CD through GitHub Actions.

XGBoost · Flask · React · Docker · Railway · Vercel · GitHub Actions
[Figure: Strategy A (P2) vs. Strategy B (P4) · 58-lap simulation · 20 drivers · MAE 2.6s]

Role · Solo — full-stack ML engineering

2.6s · Lap-time MAE
86,000+ · Laps trained
26 · Tracks modeled
Correct · Real-race validation

04 · Genomics Engineering

BIOMEX Genomics Platform

Biomedical · -omics Visualization Webapp · Sept – Dec 2025

A web platform that lets researchers and clinicians run genomic differential-expression analysis on datasets of 60,000–90,000 samples — and see the biology, not the command line.

As Development Director I led a 10-person cross-functional team rebuilding a faculty client's legacy desktop genomics tool as a modern web application. I translated dense bioinformatics requirements into a clean architecture: a React + Firebase front end talking to an R Plumber API that exposes native DESeq2 analysis without rewriting the science in another language.

We delivered the proof of concept the client cared about most: one complete analysis method, executed end to end with verified-correct results. A POST hits the API, temporary count and metadata matrices are reconstructed in isolated workspaces, DESeq2 runs, and the resulting volcano plot, MA plot, and heatmap come back as rendered figures.
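The isolated-workspace step can be illustrated with a throwaway directory per request. This is a Python analogue for illustration only, not the R Plumber code: the payload shape, file names, and the `count_rows` placeholder that stands in for the DESeq2 run are all assumptions.

```python
import csv
import tempfile
from pathlib import Path


def write_matrix(path, header, rows):
    """Write a header row followed by data rows as CSV."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def run_in_workspace(payload, analysis):
    """Rebuild the input matrices in a temporary directory, then run analysis."""
    # The directory and both files are deleted when the with-block exits.
    with tempfile.TemporaryDirectory(prefix="de_job_") as workspace:
        counts_path = Path(workspace) / "counts.csv"
        meta_path = Path(workspace) / "metadata.csv"
        write_matrix(counts_path, ["gene"] + payload["samples"], payload["counts"])
        write_matrix(meta_path, ["sample", "condition"], payload["metadata"])
        return analysis(counts_path, meta_path)


def count_rows(counts_path, meta_path):
    """Placeholder analysis: report how many data rows each file holds."""
    with open(counts_path, newline="") as c, open(meta_path, newline="") as m:
        return len(list(csv.reader(c))) - 1, len(list(csv.reader(m))) - 1


# Hypothetical request payload.
payload = {
    "samples": ["s1", "s2"],
    "counts": [["geneA", 10, 12], ["geneB", 0, 7]],
    "metadata": [["s1", "control"], ["s2", "treated"]],
}
result = run_in_workspace(payload, count_rows)
print(result)
```

Giving every request its own directory means concurrent analyses cannot read or overwrite each other's matrices, and nothing is left on disk afterwards.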

My job was as much communication as code: weekly client engagement, eliciting analytical requirements, and steering a team across the full skill range. Frontend reached ~80% completion and every milestone landed on schedule — the client confirmed the project viable for continued development.

React · Firebase · R Plumber · DESeq2 · Bioconductor · Agile / Scrum
[Figure: volcano plot axes, −log10(p) vs. log2 fold change]

Role · Development Director · 10-person team

10 people · Team led
60–90K samples · Dataset scale
~80% · Frontend complete
100% · Milestones on time

Chapter 03

The Stack 

The toolkit behind the work, grouped by discipline.

The Toolkit

Grouped by discipline, not by hype. Every tool here has earned its place in shipped work — pipelines in production, models under scrutiny, dashboards in stakeholders’ hands.

01 · Data Platforms & Engineering

  • Snowflake
  • dbt
  • Azure Databricks
  • ADLS Gen2
  • Azure Data Factory
  • Cosmos DB
  • PostgreSQL
  • PostGIS
  • ETL / ELT
  • Data Modeling
  • Data Warehouse

02 · Languages

  • SQL
  • Python
  • JavaScript
  • Java
  • R
  • HTML
  • CSS

03 · Machine Learning & Statistics

  • XGBoost
  • scikit-learn
  • Ridge Regression
  • Causal Inference
  • A/B Testing
  • Hypothesis Testing

04 · Analytics & BI

  • Power BI
  • Tableau
  • Streamlit
  • GeoPandas
  • Plotly

05 · Frameworks & APIs

  • FastAPI
  • Flask
  • React
  • Firebase
  • Docker
  • Kubernetes

06 · DevOps & Workflow

  • Git
  • GitHub Actions
  • GitLab CI
  • CI/CD
  • Jira
  • Confluence
  • Agile / Scrum

Chapter 04 · Contact

Ready when you are.

Open to data engineering and analytics roles across Canada. If you’re building something that needs signal pulled out of noise — let’s talk.

(902) 989-8240 · Halifax, Nova Scotia · Open across Canada · No sponsorship required