DevOps Automation & Tooling
- Built a declarative MedStack PaaS deployment system with GitOps workflow (plan-on-PR, apply-on-merge) managing 40+ containers across HIPAA-compliant clusters
- Created a vulnerability consolidation pipeline merging GitHub Dependabot, Vanta, and Jira findings into a unified dashboard with SLA tracking
- Implemented DNS-as-Code for 20+ domains using OctoDNS with full PR review workflow
- Built a deployment manager (FastAPI + HTMX) at 98% test coverage providing self-service container management for development teams
- Built event-driven operational alerting via Lambda and Google Chat that catches production issues within minutes
The Challenge
A small platform team managing 130+ repositories needed force-multiplying automation. I was responsible for HIPAA-compliant container orchestration across MedStack PaaS clusters, vulnerability tracking across multiple scanning tools, declarative DNS for 20+ domains, deployment lifecycle management, and event-driven alerting, all without adding operational overhead or external SaaS dependencies.
Approach & Role
I built every tool in this category from scratch. Each is a focused utility that does one thing well. The philosophy is GitOps and code-driven: DNS changes go through PR review, vulnerability status is consolidated programmatically, and releases are coordinated via automation rather than manual checklists.
Architecture & Patterns
Vulnerability consolidation system (Python):
- Pipeline architecture: a collector obtains upstream data, a merger combines results, an engine processes rules and metrics, and a writer submits the resulting data to Google Sheets
- Parallel data collection from GitHub Security, compliance tools, and issue trackers via ThreadPoolExecutor
- Consolidation engine merges findings by CVE, deduplicates across sources, tracks SLA deadlines
- Google Sheets dashboard with upsert-based persistence (no data loss on partial runs)
- Google Chat alerting with deduplication and retraction (resolved vulns retract their alerts)
- 253 tests, including 28 property-based tests (Hypothesis) that validate invariants
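A minimal sketch of the merge-by-CVE step described above, not the production code. The `Finding` and `Consolidated` types, the `consolidate` function, and the SLA windows in `SLA_DAYS` are all hypothetical names and values chosen for illustration; the real SLA policy is not stated here.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical SLA windows (days to remediate) per severity.
SLA_DAYS = {"critical": 7, "high": 30, "medium": 90, "low": 180}
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class Finding:
    """One raw finding as reported by a single upstream source."""
    cve: str
    source: str
    severity: str
    first_seen: date


@dataclass
class Consolidated:
    """One row per CVE after merging every source that reported it."""
    cve: str
    severity: str
    first_seen: date
    sources: set = field(default_factory=set)

    @property
    def sla_deadline(self) -> date:
        return self.first_seen + timedelta(days=SLA_DAYS[self.severity])

    def is_overdue(self, today: date) -> bool:
        return today > self.sla_deadline


def consolidate(findings):
    """Merge findings by CVE: union the sources, keep the highest
    severity and the earliest first-seen date."""
    merged = {}
    for f in findings:
        entry = merged.get(f.cve)
        if entry is None:
            merged[f.cve] = Consolidated(f.cve, f.severity, f.first_seen, {f.source})
            continue
        entry.sources.add(f.source)
        if SEVERITY_RANK[f.severity] > SEVERITY_RANK[entry.severity]:
            entry.severity = f.severity
        entry.first_seen = min(entry.first_seen, f.first_seen)
    return merged
```

Keying the result by CVE is also what makes an upsert-style write straightforward: each consolidated row has a stable identity, so a partial run can update the rows it saw without touching the rest.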
Deployment manager (FastAPI + HTMX):
- Server-rendered web UI for managing container registries, secrets, stacks, and deployment lifecycles
- HTMX partial rendering with OOB swaps for real-time UI updates without JavaScript frameworks
- CSRF double-submit cookie protection, sliding-window rate limiting, input sanitization middleware
- Docker socket orchestration for direct container management
- 389 pytest tests at 98% code coverage, 35 Playwright E2E scenarios including mobile viewports
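To illustrate two of the protections listed above, here is a rough, framework-free sketch of a sliding-window rate limiter and a double-submit CSRF check. The class and function names are hypothetical, and the actual middleware wiring into FastAPI is omitted; this shows only the core logic under those assumptions.

```python
import hmac
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per key within the last window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)

    def allow(self, key):
        now = self._clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


def csrf_tokens_match(cookie_token, submitted_token):
    """Double-submit check: the token in the cookie must equal the token
    sent in the form field or header. Compared in constant time."""
    if not cookie_token or not submitted_token:
        return False
    return hmac.compare_digest(
        cookie_token.encode("utf-8"), submitted_token.encode("utf-8")
    )
```

Keeping per-key timestamps in a deque makes the window exact rather than bucketed, at the cost of holding up to `max_requests` timestamps per client in memory.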
DNS-as-Code (OctoDNS):
- Declarative DNS management for 20+ domains via Route53 + Cloudflare providers
- Containerized OctoDNS tooling (Alpine Docker) for reproducible execution
- Plan-on-PR / apply-on-merge GitOps workflow
- Full cloud migration history visible through DNS record changes
MedStack PaaS deployment automation (Python):
- Custom Python CLI that manages 40+ containers across HIPAA-compliant Docker Swarm clusters via the MedStack API
- Declarative YAML stack definitions with templated service configs (Jinja2-style variables for secrets, endpoints, credentials) and deep-merge overrides per environment
- GitOps workflow: plan-on-PR (compare action posts cluster diff as PR comment) / apply-on-merge (deploy with concurrency control and YAML lint gate)
- Multi-stack architecture: blue/green service prefixes for zero-downtime releases, shared-services layer for infrastructure (Keycloak, Kong, Elasticsearch, monitoring)
- DeepDiff-based drift detection comparing live cluster state against declarative config; validates secrets, configs, and volumes exist before deploying
- Rolling update policies per service: configurable update/rollback order, parallelism, failure action, and monitor intervals
- Bulk service refresh (image pull + redeploy) with configurable inter-service wait to prevent node overload
- Evolved from basic templates to OIDC-integrated service definitions with per-environment secret mapping and multi-region support (CA + US production clusters)
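The per-environment deep-merge and drift-detection ideas above can be sketched roughly as follows. This is an illustration only: `deep_merge` and `find_drift` are hypothetical names, the service fields are made up, and the plain recursive comparison is a standard-library stand-in for the DeepDiff-based check the real tool uses.

```python
import copy

MISSING = "<missing>"  # marker for a key present on only one side


def deep_merge(base, override):
    """Return a new dict where override values win, merging nested dicts
    key by key instead of replacing them wholesale."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def find_drift(desired, live, path=""):
    """Map dotted key paths to (desired, live) pairs wherever they differ."""
    drift = {}
    for key in sorted(set(desired) | set(live)):
        here = f"{path}.{key}" if path else str(key)
        want = desired.get(key, MISSING)
        have = live.get(key, MISSING)
        if isinstance(want, dict) and isinstance(have, dict):
            drift.update(find_drift(want, have, here))
        elif want != have:
            drift[here] = (want, have)
    return drift
```

A plan-on-PR step could render `find_drift(desired, live)` as the PR comment, so reviewers see only the keys that would change; an empty result means the cluster already matches the declared config.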
Operational alerting:
- Event-driven pipeline: EventBridge rules execute Lambda (726-line handler) which notifies operations via Google Chat webhooks
- Configurable alert routing based on event source and severity
- Cost optimization: dev environment scheduler (Flask app) that stops/starts resources on schedule
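A simplified sketch of how source- and severity-based routing to Google Chat might look inside a Lambda handler. The routing table, the webhook URLs, and the `severity` field inside `detail` are assumptions made for illustration (EventBridge events carry `source`, `detail-type`, and `detail`, but severity is not a standard field); the real handler is considerably larger.

```python
import json
import urllib.request

# Hypothetical routing table: (event source, severity) -> webhook URL.
# "*" matches any severity for that source. URLs are placeholders.
ROUTES = {
    ("aws.ecs", "critical"): "https://chat.example.invalid/oncall",
    ("aws.ecs", "*"): "https://chat.example.invalid/platform",
}
DEFAULT_WEBHOOK = "https://chat.example.invalid/default"


def pick_webhook(source, severity):
    """Most specific route first, then the source wildcard, then the default."""
    return ROUTES.get((source, severity)) or ROUTES.get((source, "*")) or DEFAULT_WEBHOOK


def build_message(event):
    """Return (severity, payload); Google Chat webhooks accept {"text": ...}."""
    detail = event.get("detail", {})
    severity = str(detail.get("severity", "info")).lower()  # assumed field
    kind = event.get("detail-type", "Unknown event")
    source = event.get("source", "unknown")
    return severity, {"text": f"[{severity.upper()}] {kind} from {source}"}


def post_json(url, payload):
    """POST the payload as JSON to the webhook URL."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        response.read()


def handler(event, context, send=None):
    """Lambda entry point; `send` is injectable so routing can be tested offline."""
    send = send or post_json
    severity, message = build_message(event)
    url = pick_webhook(event.get("source", "unknown"), severity)
    send(url, message)
    return {"routed_to": url, "severity": severity}
```

Separating `pick_webhook` and `build_message` from the network call keeps the routing rules unit-testable without reaching a real webhook.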
Impact & Scale
- Vulnerability SLA compliance tracked automatically with zero manual spreadsheet maintenance
- MedStack PaaS deployments managed declaratively; PR review shows the exact diff of what will change in the cluster
- Coordinated releases across 6+ repos completed in minutes instead of hours
- Deployment manager provides self-service for development teams without infrastructure access
- 98% test coverage on deployment tooling gives high confidence in operational reliability
- Event-driven alerting catches production issues within minutes