Vulnerability Consolidation System

The Challenge

The platform generates vulnerability findings from multiple sources: GitHub Dependabot alerts across 20+ repositories, Vanta compliance findings from continuous monitoring, and Jira tickets for manual remediation tracking. Each source has its own UI, its own status model, and its own view of the world. Nobody had a single answer to "how many open vulnerabilities do we have right now?" without manually checking three dashboards and reconciling overlapping records.

For FedRAMP continuous monitoring, I need to demonstrate that vulnerabilities are tracked from discovery through remediation with clear timelines. Doing that manually across three systems doesn't scale.

Approach & Role

I built a Python consolidation tool that pulls from all three APIs, merges records by unique identifier, tracks lifecycle status (open, dismissed, fixed, retracted), and writes everything to a unified Google Sheets dashboard with calculated metrics. It runs on a schedule and supports incremental sync so it's not hammering APIs for unchanged data.

The design priorities were reliability (collector failures are isolated: if Vanta is down, GitHub and Jira data still flows), correctness (28 property-based tests validate merge logic, field extraction, and aggregation invariants), and auditability (every record tracks first_seen_date, last_seen_date, and status transitions).
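To make the failure-isolation idea concrete, here is a minimal sketch of running collectors in parallel so that one failing source does not block the others. The run_collectors function, the zero-argument collector interface, and the sample records are assumptions for illustration, not the tool's actual code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_collectors(collectors):
    """Run every collector in its own thread and isolate failures.

    `collectors` maps a source name to a zero-argument callable that
    returns a list of finding dicts (a hypothetical interface).
    Returns (results, errors), both keyed by source name.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max(1, len(collectors))) as pool:
        futures = {pool.submit(fn): name for name, fn in collectors.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # Record the failure and keep going with the other sources.
                errors[name] = exc
    return results, errors


def failing_vanta():
    # Stand-in for a collector whose upstream API is down.
    raise ConnectionError("Vanta API unavailable")


results, errors = run_collectors({
    "github": lambda: [{"id": "example-1", "severity": "high"}],
    "vanta": failing_vanta,
    "jira": lambda: [{"id": "example-2", "severity": "low"}],
})
print("succeeded:", sorted(results))
print("failed:", sorted(errors))
```

Catching the exception at the point where each future's result is read confines a failure to that source's entry, so the remaining results can still go through the merge and write steps.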

Architecture & Patterns

Pipeline architecture:

  1. Timestamp Manager reads last sync times from a metadata sheet
  2. Collectors (GitHub, Vanta, Jira) run in parallel via ThreadPoolExecutor
  3. Data Validator checks CVE IDs, dates, and URLs before writing
  4. Data Merger upserts records by unique identifier, preserving historical data and manual entries
  5. Dashboard Calculator aggregates metrics by severity and priority
  6. Sheet Writer outputs to multiple tabs with formula hyperlinks and sheet protection
  7. Google Chat Notifier alerts on new critical/high findings and SLA deadline warnings
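
The validation step (step 3) could look roughly like the sketch below. The field names (cve_id, url), the ISO date format, and the validate_record helper are assumptions; the real validator may check more or use different rules.

```python
import re
from datetime import date
from urllib.parse import urlparse

# CVE IDs have the form CVE-<4-digit year>-<4 or more digits>.
CVE_PATTERN = re.compile(r"^CVE-\d{4}-\d{4,}$")


def validate_record(record):
    """Return a list of problems found in one finding dict (empty if clean)."""
    problems = []

    cve = record.get("cve_id")
    if cve and not CVE_PATTERN.match(cve):
        problems.append(f"bad CVE ID: {cve!r}")

    # Dates are assumed to be ISO strings such as "2024-01-31".
    for field in ("first_seen_date", "last_seen_date"):
        value = record.get(field)
        if value:
            try:
                date.fromisoformat(value)
            except ValueError:
                problems.append(f"bad date in {field}: {value!r}")

    url = record.get("url")
    if url:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"bad URL: {url!r}")

    return problems


good = {
    "cve_id": "CVE-2021-44228",
    "first_seen_date": "2024-01-31",
    "url": "https://example.com/advisory",
}
bad = {
    "cve_id": "CVE-21-1",
    "first_seen_date": "2024-13-40",
    "url": "not a url",
}
print(validate_record(good))
print(validate_record(bad))
```

Returning a list of problems rather than raising on the first one lets the pipeline log every issue with a record before deciding whether to write it.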

Collector design:

Data merge logic:
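
A minimal sketch of an upsert keyed by unique identifier, assuming records are plain dicts and that manually maintained columns are known in advance. MANUAL_FIELDS, merge_records, and the sample data are hypothetical names used only for illustration.

```python
# Columns assumed to be edited by hand in the sheet; never overwritten by a sync.
MANUAL_FIELDS = ("notes", "owner")


def merge_records(existing, incoming, today):
    """Upsert `incoming` into `existing`, both keyed by unique identifier.

    Records missing from `incoming` are kept as-is, first_seen_date is set
    once and never changed, and manual fields survive every sync.
    """
    merged = {key: dict(rec) for key, rec in existing.items()}
    for key, new in incoming.items():
        old = merged.get(key)
        if old is None:
            # First time this identifier is seen.
            merged[key] = {**new, "first_seen_date": today, "last_seen_date": today}
            continue
        updated = {**old, **new}
        updated["first_seen_date"] = old["first_seen_date"]
        updated["last_seen_date"] = today
        for field in MANUAL_FIELDS:
            if field in old:
                updated[field] = old[field]
        merged[key] = updated
    return merged


existing = {
    "a": {
        "status": "open",
        "first_seen_date": "2024-01-01",
        "last_seen_date": "2024-01-01",
        "notes": "waiting on vendor",
    },
    "c": {
        "status": "fixed",
        "first_seen_date": "2023-12-01",
        "last_seen_date": "2023-12-15",
    },
}
incoming = {
    "a": {"status": "fixed", "notes": ""},
    "b": {"status": "open"},
}
merged = merge_records(existing, incoming, today="2024-02-01")
print(merged["a"])
```

Copying each record before updating keeps the function free of side effects, which makes invariants such as "no identifier is ever dropped" straightforward to test.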

Alerting:

Testing Strategy

253 tests across 68 test files. The testing approach is split between deterministic unit tests and property-based correctness validation:

Property-based tests (28 properties via Hypothesis):

Deterministic tests cover:

Each property test runs against 100+ generated examples, which gives edge cases a good chance of surfacing.
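
As an illustration of what a merge property might look like in Hypothesis, here is a small sketch. The merge function is a simplified stand-in (a dict of identifier to status), and the property and the max_examples value are assumptions, not the project's actual tests.

```python
from hypothesis import given, settings, strategies as st


def merge(existing, incoming):
    """Simplified stand-in for the merger: incoming status wins per identifier."""
    return {**existing, **incoming}


# Generate small maps of identifier -> lifecycle status.
records = st.dictionaries(
    keys=st.text(min_size=1, max_size=8),
    values=st.sampled_from(["open", "dismissed", "fixed", "retracted"]),
    max_size=10,
)


@settings(max_examples=200)
@given(existing=records, incoming=records)
def test_merge_never_drops_identifiers(existing, incoming):
    merged = merge(existing, incoming)
    # Invariant 1: every identifier from either side is present after merging.
    assert set(merged) == set(existing) | set(incoming)
    # Invariant 2: the latest status from the source is the one that is kept.
    for key, status in incoming.items():
        assert merged[key] == status


if __name__ == "__main__":
    # Calling a @given-decorated function directly runs all generated examples.
    test_merge_never_drops_identifiers()
```

The settings decorator controls how many examples Hypothesis generates per property; raising max_examples trades run time for a wider search of the input space.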

Impact & Scale