Data Engineer & Analytics Specialist
Hetang Patel
Built at the limit
The Engineer
I build systems that turn raw, messy data into decisions people can act on.
My work lives where data engineering meets causal inference, spatial analytics, and machine learning — pipelines that hold up in production, models that answer the actual question, and interfaces that make the answer obvious.
Formula 1 isn't decoration here. The discipline it demands — extracting signal under pressure, choosing the right tool over the fashionable one, finding tenths where others see noise — is the same discipline I bring to a warehouse migration or a regression that has to survive scrutiny.
Records modeled
Laps trained on
Engineers led
- 01 · Designation: Data Engineer · Analytics Specialist
- 02 · Degree: B.Sc. Computer Science, Dalhousie University
- 03 · Class of: December 2025
- 04 · Certification: Google Advanced Data Analytics
- 05 · Base: Halifax, Nova Scotia
- 06 · Range: Open across Canada
- 07 · Eligibility: No sponsorship required
- 08 · Status: Available, actively racing
The Work
Regulation Impact Analyzer
Formula 1 · Causal Inference + Cloud Analytics · May 2026
When the rulebook changes, did the field actually converge? I treated the 2022 Formula 1 regulation overhaul as a natural experiment and measured the causal effect.
Three layers, one question. A Difference-in-Differences model with team fixed effects and HC3 heteroskedasticity-robust standard errors separates the regulation's impact from pre-existing trends. The DiD coefficient lands at +0.068 (p = 0.050): a modest convergence right at the edge of conventional significance, consistent with Red Bull re-establishing dominance by 2023.
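For readers who want the mechanics, here is a minimal numpy sketch of a DiD regression with team fixed effects and HC3 standard errors. It is not the project's code: the treated/control split, the `did_hc3` function and the synthetic panel are assumptions for illustration, and a library such as statsmodels would normally do the fitting. The coefficient on `treated * post` is the DiD estimate; HC3 inflates each squared residual by 1/(1 - h_ii)^2 so the errors stay robust when leverage is uneven.

```python
import numpy as np


def did_hc3(y, team, post, treated):
    """Fit y ~ team FE + post + treated:post by OLS (hypothetical helper).

    Returns the DiD coefficient (on treated:post) and its HC3 standard error.
    `treated` is constant within a team, so its main effect is absorbed by the FE.
    """
    y = np.asarray(y, dtype=float)
    team = np.asarray(team)
    post = np.asarray(post, dtype=float)
    treated = np.asarray(treated, dtype=float)

    # One dummy per team and no global intercept, which avoids the dummy trap.
    fe = (team[:, None] == np.unique(team)[None, :]).astype(float)
    X = np.column_stack([fe, post, treated * post])

    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # Leverage h_ii = x_i' (X'X)^-1 x_i for each observation.
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    # HC3 sandwich: (X'X)^-1 X' diag(e_i^2 / (1 - h_ii)^2) X (X'X)^-1
    omega = (resid / (1.0 - h)) ** 2
    cov = xtx_inv @ (X.T * omega) @ X @ xtx_inv
    return float(beta[-1]), float(np.sqrt(cov[-1, -1]))


if __name__ == "__main__":
    # Synthetic panel: 10 teams x 8 seasons, teams 0-4 "treated", rule change in 2022.
    rng = np.random.default_rng(0)
    rows = []
    for t in range(10):
        team_effect = rng.normal()
        for season in range(2018, 2026):
            p, tr = int(season >= 2022), int(t < 5)
            y_obs = team_effect + 0.1 * p + 0.5 * tr * p + rng.normal(scale=0.05)
            rows.append((y_obs, t, p, tr))
    y, team, post, treated = map(list, zip(*rows))
    print(did_hc3(y, team, post, treated))
```

Because every team has both pre- and post-2022 seasons, the design matrix stays full rank even though the treated main effect is dropped.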
On top of that sits a walk-forward Ridge forecaster that learns regulation-response patterns as each post-2022 season closes, then projects the 2026 reset. The whole thing is wired through a production cloud stack: raw Parquet into Snowflake, dbt staging-to-mart transformations, and a three-view Power BI dashboard connected live to the warehouse.
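The walk-forward idea can be sketched as a loop that refits on every season strictly before the one being predicted, so no future season leaks into training. This version uses a closed-form Ridge in numpy with an unpenalized intercept; `ridge_fit`, `walk_forward` and the feature layout are hypothetical, and scikit-learn's `Ridge` would be the usual drop-in.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form Ridge; centering keeps the intercept out of the penalty."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def walk_forward(X, y, seasons, first_test_season, alpha=1.0):
    """For each season s >= first_test_season, train on seasons < s and predict s.

    `seasons` is a numpy array aligned with the rows of X and y.
    Returns {season: predictions for that season's rows}.
    """
    preds = {}
    for s in sorted(set(seasons[seasons >= first_test_season].tolist())):
        train, test = seasons < s, seasons == s
        coef, b = ridge_fit(X[train], y[train], alpha)
        preds[s] = X[test] @ coef + b
    return preds
```

Each fold sees one more completed season than the last, which mirrors "learning as each season closes"; a final fit on all seasons would then produce the forward projection.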
The F1 framing is the hook. The method — separating a true policy effect from noise, then tracking a rolling forecast against live actuals — is the same one used for pricing shocks, market entries, and demand planning.
Role · Solo — data engineering & causal modeling
Halifax Transit Analytics
Spatial Analytics · PostgreSQL + PostGIS · Jan 2026
663,788 records of a city's bus network, turned into a map of where service works — and where it quietly fails the people who depend on it.
Halifax Transit's GTFS feed went into a normalized PostgreSQL + PostGIS schema with GIST spatial indexing, automatic geometry triggers, and a validation layer that caught and filtered broken foreign-key references before they corrupted any join.
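A validation step of this kind can be sketched in Python: rows of GTFS `stop_times` whose `trip_id` or `stop_id` has no parent record are set aside before loading. The function name and the in-memory row format are assumptions; the project itself may well have run this check in SQL.

```python
def filter_orphans(stop_times, trip_ids, stop_ids):
    """Split stop_times rows into (valid, rejected) by foreign-key checks.

    stop_times: iterable of dicts with at least 'trip_id' and 'stop_id'
    trip_ids, stop_ids: sets of keys present in trips.txt and stops.txt
    """
    valid, rejected = [], []
    for row in stop_times:
        ok = row["trip_id"] in trip_ids and row["stop_id"] in stop_ids
        # Rejected rows are kept rather than silently dropped, so they can be reported.
        (valid if ok else rejected).append(row)
    return valid, rejected
```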
From there, SQL analytical views built on CTEs and window functions expose how the network actually performs: 41% of stops are served by a single route, 973 stops sit more than 500 metres from their nearest neighbouring stop, and the 5 PM peak pushes 33,635 departures. A stop-connectivity query that took four seconds now runs in under one, thanks to the right B-tree and spatial indexes.
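The isolation metric comes down to a nearest-neighbour distance per stop. This brute-force sketch uses the haversine formula with an assumed 6,371 km mean Earth radius and is O(n^2), which is tolerable for a few thousand stops; inside PostGIS the same question would usually be asked with `ST_DWithin` on geography columns backed by a GIST index. `isolated_stops` and its input format are hypothetical.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    r = 6_371_000.0  # assumed mean Earth radius, metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def isolated_stops(stops, threshold_m=500.0):
    """stops: dict stop_id -> (lat, lon).

    Returns ids whose nearest other stop is farther than threshold_m.
    A stop with no other stops at all counts as isolated.
    """
    out = []
    for sid, (lat, lon) in stops.items():
        nearest = min(
            (haversine_m(lat, lon, la, lo) for oid, (la, lo) in stops.items() if oid != sid),
            default=math.inf,
        )
        if nearest > threshold_m:
            out.append(sid)
    return out
```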
It ships as an interactive Streamlit dashboard with Folium maps over all 2,380 stops — built so an operations stakeholder can see coverage inequity, not just admire a chart.
Role · Solo — data engineering & spatial analysis
Strategy A/B Testing Framework
Formula 1 · ML Simulation Engine · Dec 2025
Two pit strategies enter. One race-distance simulation, twenty drivers, 86,000 laps of training data decide which one wins.
An XGBoost lap-time predictor (MAE 2.6s, trained on 86,000+ laps across three seasons) drives a modular race-simulation engine that runs a full field over an entire Grand Prix. Feed it Strategy A versus Strategy B and it returns predicted finishing positions, times, and the winner — side by side.
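The strategy comparison can be illustrated with a single-car toy: a stand-in lap-time function replaces the trained XGBoost model, and each strategy is a list of (compound, laps) stints with a fixed pit-lane loss. Every number here (compound offsets, degradation rates, the 22 s pit loss, the 90 s base pace) is an assumption, and the real engine simulates a full twenty-car field rather than one car in clean air.

```python
# Hypothetical stand-in for the trained lap-time model: base pace plus a
# compound offset and linear tyre degradation with stint age. All values assumed.
COMPOUND_OFFSET = {"SOFT": 0.0, "MEDIUM": 0.4, "HARD": 0.8}  # seconds
DEG_PER_LAP = {"SOFT": 0.10, "MEDIUM": 0.06, "HARD": 0.03}   # seconds per lap of tyre age


def predict_lap(base, compound, tyre_age):
    return base + COMPOUND_OFFSET[compound] + DEG_PER_LAP[compound] * tyre_age


def race_time(strategy, base=90.0, pit_loss=22.0):
    """strategy: list of (compound, laps) stints. Returns total race time in seconds."""
    total = 0.0
    for i, (compound, laps) in enumerate(strategy):
        if i > 0:
            total += pit_loss  # one stop before every stint except the first
        total += sum(predict_lap(base, compound, age) for age in range(laps))
    return total


def compare(a, b, **kwargs):
    """Return (winner, time_a, time_b) for two strategies under the same model."""
    ta, tb = race_time(a, **kwargs), race_time(b, **kwargs)
    return ("A" if ta <= tb else "B"), ta, tb


one_stop = [("MEDIUM", 20), ("HARD", 38)]
two_stop = [("SOFT", 15), ("MEDIUM", 20), ("HARD", 23)]
print(compare(one_stop, two_stop))
```

With these made-up parameters the one-stop wins; pass `pit_loss=0.0` and the two-stop comes out ahead, which shows how directly the verdict depends on the pit-loss assumption.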
The engineering discipline is in what was left out: I stripped outcome-leaking features like average speed and throttle, restricting the model to signals genuinely available before each lap. XGBoost beat an LSTM by more than 2x on this tabular data — the right tool over the fashionable one.
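Leakage control is mostly about how features are built. This sketch derives each lap's inputs only from laps that have already finished (previous lap time, a short rolling mean) plus what is known at the start of the lap; measurements taken during the lap, such as average speed, never enter the feature dict. The field names and the window size are assumptions.

```python
def pre_lap_features(laps, window=3):
    """laps: one driver's laps as dicts, ordered by lap number.

    Each input dict may contain in-lap measurements (e.g. 'avg_speed');
    only a whitelist of pre-lap information is copied into the features.
    """
    feats, history = [], []
    for lap in laps:
        prev = history[-window:]  # lap times of already-completed laps only
        feats.append({
            "lap_number": lap["lap_number"],
            "tyre_age": lap["tyre_age"],  # known when the lap starts
            "prev_lap_time": history[-1] if history else None,
            "rolling_mean": sum(prev) / len(prev) if prev else None,
        })
        # The current lap's time becomes history only after its features are fixed.
        history.append(lap["lap_time"])
    return feats
```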
Validated against the 2024 Abu Dhabi Grand Prix: the model called Norris's conservative one-stop over a two-stop alternative, which is exactly how the real race played out. The Flask API is containerized with Docker and deployed on Railway, the React front end runs on Vercel, and CI/CD runs through GitHub Actions.
Role · Solo — full-stack ML engineering
BIOMEX Genomics Platform
Biomedical · -omics Visualization Webapp · Sept – Dec 2025
A web platform that lets researchers and clinicians run genomic differential-expression analysis on datasets of 60,000–90,000 samples — and see the biology, not the command line.
As Development Director I led a 10-person cross-functional team rebuilding a faculty client's legacy desktop genomics tool as a modern web application. I translated dense bioinformatics requirements into a clean architecture: a React + Firebase front end talking to an R Plumber API that exposes native DESeq2 analysis without rewriting the science in another language.
We delivered the proof of concept the client cared about most: one complete analysis method, executed end to end with verified-correct results. A POST hits the API, temporary count and metadata matrices are reconstructed in isolated workspaces, DESeq2 runs, and the resulting volcano plot, MA plot and heatmap come back as rendered figures.
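The hand-off into DESeq2 depends on clean inputs: `DESeqDataSetFromMatrix` expects non-negative integer counts and one metadata row per count-matrix column, in the same order. A hypothetical pre-flight check and per-request workspace might look like this in Python; the project's actual API was an R Plumber service, so the function names, CSV layout and file names here are all illustrative.

```python
import csv
import os
import tempfile


def validate_deseq_inputs(samples, counts, metadata):
    """samples: ordered sample ids (count-matrix columns)
    counts: dict gene_id -> list of counts aligned with samples
    metadata: dict sample_id -> condition label
    Raises ValueError on any problem DESeq2 would reject or misread.
    """
    if len(set(samples)) != len(samples):
        raise ValueError("duplicate sample ids")
    if set(metadata) != set(samples):
        raise ValueError("metadata samples do not match count-matrix columns")
    for gene, row in counts.items():
        if len(row) != len(samples):
            raise ValueError(f"{gene}: expected {len(samples)} counts, got {len(row)}")
        if any(type(c) is not int or c < 0 for c in row):
            raise ValueError(f"{gene}: counts must be non-negative integers")
    if len(set(metadata.values())) < 2:
        raise ValueError("need at least two conditions to compare")


def write_workspace(samples, counts, metadata, root):
    """Write counts.csv and coldata.csv into a fresh per-request directory under root.

    coldata rows are written in count-column order, so the two files line up.
    """
    ws = tempfile.mkdtemp(prefix="deseq_", dir=root)
    with open(os.path.join(ws, "counts.csv"), "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["gene_id", *samples])
        for gene, row in counts.items():
            w.writerow([gene, *row])
    with open(os.path.join(ws, "coldata.csv"), "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["sample_id", "condition"])
        for s in samples:
            w.writerow([s, metadata[s]])
    return ws
```

A separate directory per request keeps concurrent analyses from overwriting each other's matrices, which is the point of the isolated workspaces described above.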
My job was as much communication as code: weekly client engagement, eliciting analytical requirements, and steering a team that spanned the full range of skill levels. The front end reached roughly 80% completion, every milestone landed on schedule, and the client confirmed the project as viable for continued development.
Role · Development Director · 10-person team
The Stack
The Toolkit
Grouped by discipline, not by hype. Every tool here has earned its place in shipped work — pipelines in production, models under scrutiny, dashboards in stakeholders’ hands.
Data Platforms & Engineering
- Snowflake
- dbt
- Azure Databricks
- ADLS Gen2
- Azure Data Factory
- Cosmos DB
- PostgreSQL
- PostGIS
- ETL / ELT
- Data Modeling
- Data Warehouse
Languages
- SQL
- Python
- JavaScript
- Java
- R
- HTML
- CSS
Machine Learning & Statistics
- XGBoost
- scikit-learn
- Ridge Regression
- Causal Inference
- A/B Testing
- Hypothesis Testing
Analytics & BI
- Power BI
- Tableau
- Streamlit
- GeoPandas
- Plotly
Frameworks & APIs
- FastAPI
- Flask
- React
- Firebase
- Docker
- Kubernetes
DevOps & Workflow
- Git
- GitHub Actions
- GitLab CI
- CI/CD
- Jira
- Confluence
- Agile / Scrum
Chapter 04 · Contact
Ready when you are.
Open to data engineering and analytics roles across Canada. If you’re building something that needs signal pulled out of noise — let’s talk.