Data Analytics & Observability

The Challenge

The platform generates data across 40+ containers, and the business needed analytics capabilities without compromising patient privacy. Analysts cannot access production data directly, and PII has to be systematically removed before any data leaves the production boundary. On top of that, I needed an observability stack covering infrastructure health, application logs, and business metrics, all within compliance constraints.

Approach & Role

I built the data pipeline infrastructure, observability stack, and analytics tooling from scratch. The core design principle: PII never leaves the production boundary unprotected. Every record flowing to analytics environments passes through obfuscation transforms. The pipeline also detects schema drift: when upstream services add or change columns, the change is caught immediately instead of silently losing data.
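
To make the two ideas above concrete, here is a minimal sketch in plain Python of an obfuscation transform and a schema drift check. It is an illustration under stated assumptions, not the production pipeline: the column names, the `PII_FIELDS` and `EXPECTED_COLUMNS` sets, and the choice of keyed HMAC-SHA256 as the obfuscation method are all hypothetical, and the real transforms run inside AWS Glue rather than as standalone functions.

```python
import hashlib
import hmac

# Hypothetical column names; the real schema is not described in this write-up.
PII_FIELDS = {"patient_name", "email"}
EXPECTED_COLUMNS = {"patient_id", "patient_name", "email", "visit_date"}


def obfuscate(record: dict, key: bytes) -> dict:
    """Replace PII values with keyed HMAC-SHA256 digests; copy other fields unchanged."""
    out = {}
    for field, value in record.items():
        if field in PII_FIELDS and value is not None:
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            out[field] = digest.hexdigest()
        else:
            out[field] = value
    return out


def detect_drift(record: dict, expected: set) -> dict:
    """Report columns that appeared or disappeared relative to the expected set."""
    observed = set(record)
    return {
        "added": sorted(observed - expected),
        "missing": sorted(expected - observed),
    }
```

A keyed digest is deterministic, so the same input maps to the same token and analysts can still join or count on an obfuscated column without seeing the raw value. The drift check only compares column names; detecting type changes would need the expected types as well.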

Architecture & Patterns

ETL pipeline (AWS Glue):

Observability stack:

Privacy-focused web analytics:

Operational dashboards:

Impact & Scale