API Gateway & Identity Architecture
- Built a 963-line differential config engine that treats Kong gateway configuration as code, diffing against live state and applying changes with dependency-aware ordering
- Made Google Workspace the single source of truth for all access decisions: one place to grant access and one place to revoke it, across 10+ systems
- Delivered 15+ branded login experiences from a single Keycloak deployment with FIPS 140-3 dual-target builds
- Wrote custom SCIM sync Lambdas and a Go opkssh plugin to bridge gaps where Google Workspace doesn't natively integrate with AWS Identity Center, GitHub Enterprise, or SSH
- Enforced 6-tier RBAC at the gateway edge via custom Lua plugins for JWT validation and OIDC session management
The Challenge
The platform I manage is multi-tenant. Each tenant organization needs a branded login experience, role-based access control across 6 tiers, and registration flows localized for 15+ locales. All of that has to be FIPS-compliant for FedRAMP. On top of customer-facing identity, I also needed to unify how operators and employees access everything: AWS accounts, GitHub, internal tools, dev servers. The goal was one place to grant access, one place to revoke it.
Approach & Role
I built both the API gateway management system and the identity provider infrastructure from scratch. The gateway is managed by a custom Python tool that treats configuration as code: it diffs YAML definitions against live Kong state and applies the changes in dependency-aware order. For identity, I built the Keycloak infrastructure with FIPS dual-target builds.
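To illustrate the diff step, here is a minimal sketch, not the production engine. It assumes both desired and live entities are plain dictionaries keyed by name; the function name `diff_entities` and the sample services are hypothetical. Only fields declared in the desired definition are compared, so server-assigned fields such as an `id` do not register as drift.

```python
from typing import Any

Entity = dict[str, Any]


def diff_entities(desired: dict[str, Entity], live: dict[str, Entity]) -> dict[str, list[str]]:
    """Return the names to create, update, and delete to make live match desired."""
    to_create = sorted(name for name in desired if name not in live)
    to_delete = sorted(name for name in live if name not in desired)
    # An entity needs an update when any declared field differs from live state.
    to_update = sorted(
        name
        for name in desired
        if name in live
        and any(live[name].get(key) != value for key, value in desired[name].items())
    )
    return {"create": to_create, "update": to_update, "delete": to_delete}


# Hypothetical sample data standing in for parsed YAML and an Admin API response.
desired = {
    "billing": {"host": "billing.internal", "port": 8080},
    "search": {"host": "search.internal", "port": 9200},
}
live = {
    "billing": {"host": "billing.internal", "port": 8000, "id": "abc"},
    "legacy": {"host": "legacy.internal", "port": 80, "id": "def"},
}

changes = diff_entities(desired, live)
print(changes)
```

Running the diff before every apply keeps the tool idempotent: a second run against unchanged YAML produces three empty lists.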
Architecture & Patterns
API Gateway (Kong as Code):
- 963-line Python differential config engine reads 44 YAML service definitions
- Diffs against live Kong state via the Admin API and applies changes concurrently with a ThreadPoolExecutor (10 workers)
- Dependency-aware CRUD ordering: upstreams, targets, services, routes, plugins (reversed for deletes)
- Template variable substitution for multi-environment support (dev, staging, prod)
- Blue/green stack support for zero-downtime gateway updates
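The ordering and concurrency ideas in this list can be sketched as follows. This is an illustration under assumptions, not the actual tool: the `${name}` placeholder syntax, the function names, and the choice to run creates and updates before deletes are all hypothetical. The entity order and the 10-worker pool come from the list above.

```python
from concurrent.futures import ThreadPoolExecutor
from string import Template
from typing import Callable

# Creation order from the list above; deletes walk it in reverse.
CREATE_ORDER = ["upstreams", "targets", "services", "routes", "plugins"]

Change = tuple[str, str, str]  # (action, entity kind, entity name)


def render(text: str, variables: dict[str, str]) -> str:
    """Fill ${name} placeholders; raises KeyError if a variable is missing."""
    return Template(text).substitute(variables)


def plan(changes: list[Change]) -> list[list[Change]]:
    """Group changes into batches that respect entity dependencies."""
    batches = []
    # Assumption: creates/updates run first, parents before children.
    for kind in CREATE_ORDER:
        batch = [c for c in changes if c[1] == kind and c[0] in ("create", "update")]
        if batch:
            batches.append(batch)
    # Deletes run children first, so nothing is removed while still referenced.
    for kind in reversed(CREATE_ORDER):
        batch = [c for c in changes if c[1] == kind and c[0] == "delete"]
        if batch:
            batches.append(batch)
    return batches


def apply_plan(batches: list[list[Change]], apply_one: Callable[[Change], str], workers: int = 10) -> list[str]:
    """Apply each batch concurrently, finishing one batch before starting the next."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in batches:
            # Consuming map() blocks until every change in the batch is done.
            results.extend(pool.map(apply_one, batch))
    return results


changes = [
    ("create", "routes", "billing-route"),
    ("delete", "services", "legacy"),
    ("create", "services", "billing"),
    ("delete", "routes", "legacy-route"),
]
# apply_one is a stand-in for the real Admin API call.
applied = apply_plan(plan(changes), apply_one=lambda c: f"{c[0]} {c[1]}/{c[2]}")
print(applied)
print(render("https://api.${env}.example.com", {"env": "staging"}))
```

Parallelism lives inside a batch, never across batches, which is what lets ten workers run without a route ever being created before its service.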
Custom Lua plugins (vendored and ported to Kong 3.x):
- JWT-Keycloak: validates JWT tokens, extracts realm and client roles, enforces authorization at the edge
- OIDC: session management with token introspection fallback
- Path-prefix: URL rewriting for multi-tenant routing
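The role-extraction logic of the JWT plugin can be illustrated in Python (the real plugins are Lua). This sketch relies on Keycloak's standard `realm_access.roles` and `resource_access.<client>.roles` claims. Two loud assumptions: signature verification is omitted entirely, and only SuperAdmin and Anonymous are named in this write-up, so the four middle tier names and the `portal` client are placeholders.

```python
import base64
import json

# Lowest to highest. Only the two endpoints are real; Tier2..Tier5 are placeholders.
TIERS = ["Anonymous", "Tier2", "Tier3", "Tier4", "Tier5", "SuperAdmin"]


def decode_payload(token: str) -> dict:
    """Decode the JWT payload WITHOUT verifying the signature (illustration only)."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


def extract_roles(claims: dict, client_id: str) -> set[str]:
    """Merge realm-level roles with the roles granted for one client."""
    realm = claims.get("realm_access", {}).get("roles", [])
    client = claims.get("resource_access", {}).get(client_id, {}).get("roles", [])
    return set(realm) | set(client)


def highest_tier(roles: set[str]) -> str:
    """Return the highest tier the caller holds, defaulting to Anonymous."""
    held = [tier for tier in TIERS if tier in roles]
    return held[-1] if held else "Anonymous"


def is_allowed(claims: dict, client_id: str, required_tier: str) -> bool:
    """Allow the request when the caller's tier is at or above the required one."""
    caller = highest_tier(extract_roles(claims, client_id))
    return TIERS.index(caller) >= TIERS.index(required_tier)


def _b64(obj: dict) -> str:
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Build an unsigned sample token purely for the demo.
token = ".".join([
    _b64({"alg": "none"}),
    _b64({
        "realm_access": {"roles": ["Tier3"]},
        "resource_access": {"portal": {"roles": ["Tier4"]}},
    }),
    "",
])

claims = decode_payload(token)
print(highest_tier(extract_roles(claims, "portal")))
print(is_allowed(claims, "portal", "SuperAdmin"))
```

A production plugin would verify the signature against the realm's published keys before trusting any claim; the tier comparison itself stays the same.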
Identity Provider (Keycloak):
- Dual-target FIPS builds from a single Dockerfile (FIPS-strict mode vs standard)
- Brand configuration dynamically generated from a shared git submodule (15+ branded themes)
- PII encryption provider for sensitive user attributes
- CI/CD consolidation: I merged 4 separate workflows into 1, eliminating N×M build matrix complexity
Impact & Scale
- 44 YAML service definitions managing the full API topology for 40+ containers
- 6-tier RBAC enforcement at the gateway edge (SuperAdmin to Anonymous)
- 15+ branded login/registration experiences from a single deployment
- FIPS 140-3 compliant identity provider with BouncyCastle cryptographic providers
- Concurrent config application (10 workers) for fast gateway updates across environments
- Custom authentication flows supporting organization-aware multi-tenant registration
Centralized Identity & Access Control
Beyond customer-facing authentication, I needed to solve operator and employee access across all infrastructure, SaaS tooling, and internal services. The company had grown to the point where managing access per-system wasn't sustainable.
Design principle: I made Google Workspace the single source of truth for all access decisions. One place to grant access, one place to revoke it. Group membership propagates into per-application role assignments across every system in the stack.
Group-based RBAC architecture:
- Groups are structured by team and by application role. A developer joins their team group, which nests into app-specific access groups
- Adding a user to one or two groups gives them the correct access level across all systems simultaneously
- Offboarding is a single action: disable the Google account and access disappears everywhere
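The nesting model above can be sketched as a small graph walk. Every group name, system name, and role in this example is hypothetical; the point is only that one team membership expands into per-application roles, and that removing all memberships yields no access.

```python
# Hypothetical nesting: each team group is a member of app-specific access groups.
NESTING = {
    "team-platform": {"app-aws-admin", "app-github-maintainer"},
    "team-support": {"app-zabbix-viewer"},
}

# Hypothetical mapping from access group to (system, role).
ROLE_MAP = {
    "app-aws-admin": ("aws", "admin"),
    "app-github-maintainer": ("github", "maintainer"),
    "app-zabbix-viewer": ("zabbix", "viewer"),
}


def effective_groups(direct: set[str], nesting: dict[str, set[str]]) -> set[str]:
    """Expand direct memberships through nested groups (cycle-safe)."""
    seen: set[str] = set()
    stack = list(direct)
    while stack:
        group = stack.pop()
        if group in seen:
            continue
        seen.add(group)
        stack.extend(nesting.get(group, ()))
    return seen


def access_for(direct: set[str]) -> dict[str, set[str]]:
    """Resolve a user's direct groups into roles per downstream system."""
    access: dict[str, set[str]] = {}
    for group in effective_groups(direct, NESTING):
        if group in ROLE_MAP:
            system, role = ROLE_MAP[group]
            access.setdefault(system, set()).add(role)
    return access


print(access_for({"team-platform"}))
print(access_for(set()))  # offboarded user: no groups, no access
```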
Downstream integrations:
- AWS (multi-account): I built a custom Lambda-based SCIM sync to push users and groups to IAM Identity Center, because Google Workspace's native SCIM support doesn't meet Identity Center's requirements. Terraform manages permission sets and account assignments from the synced groups
- GitHub Enterprise Cloud: Same approach: a custom SCIM sync via Lambda populates org membership and team assignments from Google groups
- Pipedrive, SendGrid: SAML federation with group-based role mapping
- Jira / Atlassian products: Native Google Workspace connector with group-to-role mapping
- Microsoft 365: Office license provisioning tied to group membership
- OVH (dedicated dev servers): Control panel access gated by group membership
- Zabbix: SAML with JIT user provisioning. Users are auto-created on first login with the correct role based on group membership
- Proxmox, production logs, Keycloak admin portals: I fronted these with oauth2-proxy, which validates Google authentication and group membership before granting access
- Dev server SSH: I extended opkssh with a custom Go plugin that resolves Google group membership at login time. The upstream project handles OIDC-based SSH but doesn't natively support Google groups, so my shim queries the Directory API to enforce group-based access policies
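The group check that gates SSH logins (and, in the same spirit, the oauth2-proxy services) reduces to a small fail-closed decision. The sketch below is in Python for consistency with the rest of this page, although the actual plugin is Go. The directory lookup is injected as a plain function rather than a real Directory API call, and the host names, group names, and `authorize_login` itself are hypothetical.

```python
from typing import Callable

# Hypothetical policy: which groups may log in to which host.
POLICY = {
    "dev-01": {"team-platform", "team-backend"},
    "dev-02": {"team-data"},
}

# Stand-in for the directory; a real lookup would query group membership remotely.
FAKE_DIRECTORY = {
    "dev@example.com": {"team-backend"},
    "analyst@example.com": {"team-data"},
}


def authorize_login(
    email: str,
    host: str,
    policy: dict[str, set[str]],
    lookup_groups: Callable[[str], set[str]],
) -> bool:
    """Allow the login only if the user belongs to a group permitted on the host."""
    allowed = policy.get(host)
    if not allowed:
        return False  # unknown host or empty policy: deny by default
    try:
        member_of = lookup_groups(email)
    except Exception:
        return False  # fail closed if the directory cannot be reached
    return not allowed.isdisjoint(member_of)


def lookup(email: str) -> set[str]:
    return FAKE_DIRECTORY.get(email, set())


print(authorize_login("dev@example.com", "dev-01", POLICY, lookup))
print(authorize_login("dev@example.com", "dev-02", POLICY, lookup))
```

Resolving membership at login time, rather than caching it on the server, is what makes a group removal take effect on the next connection attempt.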
What required custom engineering:
- AWS Identity Center SCIM sync (Lambda): Google Workspace's native SCIM support doesn't meet Identity Center's requirements, so I built a custom process for user/group synchronization
- GitHub Enterprise SCIM sync (Lambda): Same constraint: Google doesn't natively push to GitHub's SCIM endpoint
- opkssh Google Groups plugin (Go): I wrote a shim that queries Google's Directory API at login to enforce group-based access policies on dev servers
- oauth2-proxy deployment pattern: I standardized a reverse-proxy configuration fronting multiple internal services with consistent group-based access rules
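The core of a SCIM sync is a reconciliation pass between the source directory and the target. The sketch below shows that pass only, with no network calls and no Lambda handler. The payload follows the SCIM 2.0 core User schema; the input shapes, the function names, and the choice to deactivate rather than delete are assumptions made for illustration.

```python
SCIM_USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"


def to_scim_user(email: str, given: str, family: str) -> dict:
    """Build a SCIM 2.0 core User resource for a new account."""
    return {
        "schemas": [SCIM_USER_SCHEMA],
        "userName": email,
        "name": {"givenName": given, "familyName": family},
        "emails": [{"value": email, "primary": True}],
        "active": True,
    }


def reconcile(source: dict[str, dict], target_active: dict[str, bool]) -> list[tuple]:
    """Compute operations that bring the target in line with the source.

    source maps email -> {"given", "family", "suspended"} (assumed shape).
    target_active maps userName -> whether the target account is active.
    """
    ops: list[tuple] = []
    for email in sorted(source):
        user = source[email]
        if user["suspended"]:
            continue
        if email not in target_active:
            ops.append(("create", to_scim_user(email, user["given"], user["family"])))
        elif not target_active[email]:
            ops.append(("activate", email))
    for email in sorted(target_active):
        gone = email not in source or source[email]["suspended"]
        if gone and target_active[email]:
            # Assumption: deactivate instead of deleting, to keep an audit trail.
            ops.append(("deactivate", email))
    return ops


source = {
    "ana@example.com": {"given": "Ana", "family": "Lee", "suspended": False},
    "bo@example.com": {"given": "Bo", "family": "Kim", "suspended": True},
}
target_active = {"bo@example.com": True, "old@example.com": True}

ops = reconcile(source, target_active)
for op in ops:
    print(op[0], op[1] if isinstance(op[1], str) else op[1]["userName"])
```

In a real sync each operation would translate to a request against the target's SCIM endpoint, and group membership would be reconciled the same way after users.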