Vulnerability Consolidation System

Consolidates vulnerability data from 3 APIs (GitHub Dependabot, Vanta, Jira) into a unified Google Sheets dashboard with SLA deadline tracking
Parallel collection via ThreadPoolExecutor with failure isolation, if one source is down the others still flow
28 property-based tests (Hypothesis) validate merge logic, deduplication stability, and aggregation invariants across arbitrary inputs
Incremental sync minimizes API calls after the initial full sync, reducing runtime from ~60s to ~20-30s
Feeds directly into FedRAMP continuous monitoring evidence with clear first_seen/last_seen/status lifecycle for every finding

The Challenge

The platform generates vulnerability findings from multiple sources: GitHub Dependabot alerts across 20+ repositories, Vanta compliance findings from continuous monitoring, and Jira tickets for manual remediation tracking. Each source has its own UI, its own status model, and its own view of the world. Nobody had a single answer to "how many open vulnerabilities do we have right now?" without manually checking three dashboards and reconciling overlapping records.

For FedRAMP continuous monitoring, I need to demonstrate that vulnerabilities are tracked from discovery through remediation with clear timelines. Doing that manually across three systems doesn't scale.

Approach & Role

I built a Python consolidation tool that pulls from all three APIs, merges records by unique identifier, tracks lifecycle status (open, dismissed, fixed, retracted), and writes everything to a unified Google Sheets dashboard with calculated metrics. It runs on a schedule and supports incremental sync so it's not hammering APIs for unchanged data.

The design priorities were reliability (collector failures are isolated. If Vanta is down, GitHub and Jira data still flows), correctness (28 property-based tests validate merge logic, field extraction, and aggregation invariants), and auditability (every record tracks first_seen_date, last_seen_date, and status transitions).

Architecture & Patterns

Pipeline architecture:

Timestamp Manager reads last sync times from a metadata sheet
Collectors (GitHub, Vanta, Jira) run in parallel via ThreadPoolExecutor
Data Validator checks CVE IDs, dates, and URLs before writing
Data Merger upserts records by unique identifier, preserving historical data and manual entries
Dashboard Calculator aggregates metrics by severity and priority
Sheet Writer outputs to multiple tabs with formula hyperlinks and sheet protection
Google Chat Notifier alerts on new critical/high findings and SLA deadline warnings

Collector design:

GitHub Collector - incremental sync: sorts by updated_at descending, terminates early when it hits records older than last sync. Filters archived repos, supports an inclusive repo allowlist.
Vanta Collector - OIDC authentication, pulls from both active findings and remediation endpoints. Always full sync (API limitation).
Jira Collector - incremental JQL queries with updated >= filter. Extracts multiple Dependabot links per issue via regex parsing.
All collectors are failure-isolated: if one throws, the others complete and the system writes partial data with a PARTIAL_SUCCESS status.

Data merge logic:

Upsert-based persistence: reads existing sheet data before writing, matches by GitHub Alert URL / Vanta Finding ID / Jira Issue Key
Field preservation: retains values when APIs no longer provide them (e.g., Vanta stops returning title/severity once a finding is remediated, but I keep the original values)
Manual entry protection: "Fixed in Release" column stays editable while all other columns are formula-protected
Status lifecycle tracking: open to dismissed/fixed/retracted, with retraction for repos removed from the scan list
Schema migration: automatically adds new tracking columns to existing sheets without data loss

Alerting:

Google Chat webhook integration for real-time notifications
Alerts on: new Critical/High vulnerabilities, SLA deadline warnings (10 days and 3 days before deadline)
Delivery status tracking to avoid duplicate alerts

Testing Strategy

253 tests across 68 test files. The testing approach is split between deterministic unit tests and property-based correctness validation:

Property-based tests (28 properties via Hypothesis):

Collector failure isolation: any subset of collectors can fail without crashing the system
Consolidated view properties: merged records never lose data, deduplication is stable
Dashboard aggregation: metric sums are consistent with source data
Field extraction: URL parsing and CVE ID extraction handle arbitrary inputs
Fixed vulnerability exclusion: remediated records never count as active
GitHub alert URL matching: the regex correctly identifies valid Dependabot URLs across any repository/alert combination
Data merger properties: upsert logic is idempotent, historical dates are immutable

Deterministic tests cover:

Each collector's API response parsing and error handling
Configuration loading and validation
Alert message formatting
Date formatting edge cases
Sheet writer protection logic
Backward compatibility with schema changes
Performance monitoring accuracy

100+ iterations per property test ensures edge cases surface.

Impact & Scale

Consolidates vulnerability data from 3 APIs into a single dashboard updated on schedule
Parallel collection reduces sync time from ~60s (sequential) to ~20-30s
Incremental sync minimizes API calls after the initial full sync
253 tests with 28 property-based validations ensure merge correctness across schema changes
Feeds directly into FedRAMP continuous monitoring evidence: clear first_seen/last_seen/status lifecycle for every finding
SLA alerting catches approaching deadlines before they're missed
Manual entry preservation means the security team can annotate records without their work being overwritten