COVID-19 Decision Intelligence & Risk Monitoring System¶
How I automated national-level situational reporting, reducing data-to-decision time by 95%¶
Project Type: End-to-end Data Engineering & AI Automation
Goal: Prove I can build robust data pipelines that feed directly into automated, LLM-driven executive reporting
Context: Public health organizations face a massive "action lag": data takes days to become actionable insight.
The Business Problem¶
Working with fragmented public health data from 36 states, I addressed a critical data-velocity bottleneck:
The Pain Points
- Data Fragmentation: Raw data from multiple sources had mismatched state names and messy headers, requiring 6+ hours of manual processing.
- Action Lag: Analysts took 1–2 days to convert raw SQL results into strategic briefings for leadership.
- Alert Fatigue: Tracking 20+ separate KPIs made it impossible to prioritize resource allocation effectively.
The Solution Impact
By building a Medallion architecture and integrating GenAI, I reduced reporting lag from 2 days to under 60 seconds and decreased administrative monitoring time by 80%.
My Process (How I Approached It)¶
This project highlights how I build pipelines that prioritize business actionability:
1. Data Standardization (The Foundation)
Engineered an end-to-end Medallion architecture (Bronze-Silver-Gold) in PostgreSQL. Used robust SQL cleaning scripts (regex, fuzzy matching) to standardize 40,000+ records with zero data loss.
2. Risk Signal Engineering
Instead of exposing 20+ confusing KPIs, I designed a weighted risk scoring algorithm (combining Case Fatality, Vaccination, and Positivity Rates) to provide a single, actionable "Risk Signal".
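A weighted score like the one described can be sketched as follows; the weights, caps, and 0-100 scale here are assumptions for illustration, not the calibrated production weighting:

```python
def risk_signal(cfr: float, vaccination_rate: float, positivity_rate: float) -> float:
    """
    Collapse three KPIs into a single 0-100 "Risk Signal".
    Weights are illustrative; production weights are calibrated separately.
    """
    score = (
        0.40 * min(cfr / 0.05, 1.0)                # CFR, capped at 5%
        + 0.35 * min(positivity_rate / 0.20, 1.0)  # positivity, capped at 20%
        + 0.25 * (1.0 - vaccination_rate)          # unvaccinated share
    )
    return round(100 * score, 1)

print(risk_signal(cfr=0.02, vaccination_rate=0.6, positivity_rate=0.10))  # → 43.5
```

Capping each component before weighting keeps one extreme KPI from saturating the signal, which is what makes the single score rankable across states.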
3. AI-Driven Automation
Integrated Google Gemini/Gemma LLMs using n8n orchestration to automatically synthesize complex SQL signals into executive briefings.
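In the pipeline, n8n assembles the prompt and calls the LLM; the grounding idea can be shown as a hypothetical standalone template builder (the wording and `build_briefing_prompt` name are my illustration, not the production prompt):

```python
import json

def build_briefing_prompt(sql_signals: dict) -> str:
    """
    Ground the LLM strictly in the SQL output: the prompt forbids any
    figure not present in the supplied context (anti-hallucination).
    """
    return (
        "You are drafting an executive public-health briefing.\n"
        "Use ONLY the figures in the JSON context below; if a figure is "
        "missing, say 'not available' rather than estimating.\n\n"
        f"CONTEXT:\n{json.dumps(sql_signals, indent=2)}\n\n"
        "TASK: Summarize the top risk states and one recommended action, "
        "in under 120 words."
    )

prompt = build_briefing_prompt({"top_state": "Lagos", "risk_score": 78.2})
print(prompt)
```

Injecting the SQL results verbatim as JSON, with an explicit "not available" fallback, is what keeps the generated briefing auditable against the Gold layer.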
4. Human-in-the-Loop Feedback
Architected a feedback system using n8n webhooks, allowing stakeholders to tune alert sensitivity directly from their dashboards, reducing false positives.
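The feedback loop can be sketched as the handler behind such a webhook; the payload shape, step size, and bounds below are assumptions for illustration (in production, n8n receives the webhook and updates the thresholds store):

```python
def apply_feedback(thresholds: dict, payload: dict) -> dict:
    """
    Apply one stakeholder verdict to per-state alert thresholds.
    Example payload: {"state": "Lagos", "verdict": "false_positive"}
    A false-positive report nudges that state's threshold up (fewer alerts);
    a confirmed alert nudges it down. Bounded to [40, 95].
    """
    state = payload["state"]
    current = thresholds.get(state, 70.0)  # assumed default sensitivity
    step = 5.0 if payload["verdict"] == "false_positive" else -5.0
    thresholds[state] = max(40.0, min(95.0, current + step))
    return thresholds

t = apply_feedback({}, {"state": "Lagos", "verdict": "false_positive"})
print(t)  # {'Lagos': 75.0}
```

Bounding the threshold keeps stakeholder tuning from silencing a state entirely, which preserves the human-in-the-loop safety property.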
Technical Highlights¶
Why This Stack?¶
- PostgreSQL (Medallion): Public health data demands strict auditing. Preserving raw records untouched in Bronze while exposing engineered features in Gold keeps the lineage verifiable.
- n8n over Airflow: Lower engineering complexity, plus a visual workflow canvas that stakeholders can follow at a glance.
- Docker Containerization: Reduced environment setup time by 90% (from 4 hours to <10 minutes) for rapid deployment.
Key Engineering Decisions
- Idempotent Pipelines: Used `TRUNCATE ... CASCADE` inside SQLAlchemy transactions for safe, repeatable "one-click" data refreshes.
- 7-Day Moving Averages: Implemented DAX smoothing logic in Power BI to filter out administrative reporting noise (like weekend lags).
- Tiered Prompt Engineering: Designed specific LLM prompts grounded in strict SQL contexts to prevent AI "hallucination."
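The idempotent-refresh decision above can be sketched without a live database; the real pipeline runs `TRUNCATE ... CASCADE` through SQLAlchemy transactions against PostgreSQL, while `FakeCursor` here is a stand-in so the pattern is visible:

```python
class FakeCursor:
    """Stand-in for a PostgreSQL cursor; records executed SQL."""
    def __init__(self):
        self.executed = []

    def execute(self, sql, params=None):
        self.executed.append(sql)

def refresh_table(cur, table, rows):
    """
    Idempotent "one-click" refresh: truncate then reload inside one
    transaction, so re-running always yields the same final state.
    """
    cur.execute("BEGIN")
    # CASCADE also clears dependent tables so they are rebuilt consistently.
    cur.execute(f"TRUNCATE TABLE {table} CASCADE")
    for row in rows:
        cur.execute(f"INSERT INTO {table} VALUES (%s, %s)", row)
    cur.execute("COMMIT")

cur = FakeCursor()
refresh_table(cur, "bronze.cases", [("Lagos", 120), ("Kano", 45)])
print(cur.executed[1])  # TRUNCATE TABLE bronze.cases CASCADE
```

Wrapping the truncate and reload in one transaction means a failed load rolls back to the previous state instead of leaving an empty table.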
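The smoothing itself lives in DAX inside Power BI; the equivalent logic, shown here in Python with illustrative numbers, is a trailing 7-day mean that absorbs weekend reporting dips:

```python
def moving_average_7d(daily):
    """Trailing 7-day mean; shorter windows at the start use what exists."""
    out = []
    for i in range(len(daily)):
        window = daily[max(0, i - 6): i + 1]
        out.append(round(sum(window) / len(window), 2))
    return out

# Weekend reporting gaps (the zeros) vanish from the smoothed trend.
series = [100, 110, 90, 0, 0, 120, 105, 95]
print(moving_average_7d(series))
```

This is why the dashboards show a stable trend line even when raw daily counts crater every weekend.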
Results & Impact¶
- <30s ETL runtime (down from 6 hrs)
- 95% faster decision speed
- 80% less monitoring time
Operational Wins:¶
- Automated Briefings: Strategy briefings generated within 60 seconds of data ingestion.
- Actionable Dashboards: High-performance Streamlit portal for real-time ETL orchestration and Power BI for national situational awareness.
- Zero-Friction Deployment: Fully containerized stack for immediate local spin-up.
What This Proves About My Process¶
- I Solve the Right Problem: The issue wasn't a lack of data; it was the speed of insight. I focused on automation and synthesis.
- I Build for the End User: Executives needed a single risk score, not raw data tables. I engineered the metrics to match their decision-making process.
- I Deploy Safely: "Human-in-the-Loop" design ensures that AI doesn't make unchecked decisions.