COVID-19 Decision Intelligence & Risk Monitoring System¶
How I automated national-level situational reporting, reducing data-to-decision time by 95%¶
Project Type: End-to-end Data Engineering & AI Automation
Goal: Prove I can build robust data pipelines that feed directly into automated, LLM-driven executive reporting
Context: Public health organizations face a massive "action lag": data takes days to become actionable insight.
The Business Problem¶
Working with fragmented public health data from 36 states, I addressed a critical data-velocity bottleneck:
The Pain Points
- Data Fragmentation: Raw data from multiple sources had mismatched state names and messy headers, requiring 6+ hours of manual processing.
- Action Lag: Analysts took 1–2 days to convert raw SQL results into strategic briefings for leadership.
- Alert Fatigue: Tracking 20+ separate KPIs made it impossible to prioritize resource allocation effectively.
The Solution Impact
By building a Medallion architecture and integrating GenAI, I reduced reporting lag from 2 days to under 60 seconds and decreased administrative monitoring time by 80%.
My Process (How I Approached It)¶
This project highlights how I build pipelines that prioritize business actionability:
1. Data Standardization (The Foundation)
Engineered an end-to-end Medallion architecture (Bronze-Silver-Gold) in PostgreSQL. Used robust SQL cleaning scripts (regex, fuzzy matching) to standardize 40,000+ records with zero data loss.
2. Risk Signal Engineering
Instead of exposing 20+ confusing KPIs, I designed a weighted risk scoring algorithm (combining Case Fatality, Vaccination, and Positivity Rates) to provide a single, actionable "Risk Signal".
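A weighted score like the one described can be sketched as follows; the weights, caps, and 0-100 scale here are assumptions for illustration, not the calibrated production weighting:

```python
def risk_signal(cfr: float, vaccination_rate: float, positivity_rate: float) -> float:
    """
    Collapse three KPIs into a single 0-100 "Risk Signal".
    Weights are illustrative; production weights are calibrated separately.
    """
    score = (
        0.40 * min(cfr / 0.05, 1.0)                # CFR, capped at 5%
        + 0.35 * min(positivity_rate / 0.20, 1.0)  # positivity, capped at 20%
        + 0.25 * (1.0 - vaccination_rate)          # unvaccinated share
    )
    return round(100 * score, 1)

print(risk_signal(cfr=0.02, vaccination_rate=0.6, positivity_rate=0.10))  # → 43.5
```

Capping each component before weighting keeps one extreme KPI from saturating the signal, which is what makes the single score rankable across states.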
3. AI-Driven Automation
Integrated Google Gemini/Gemma LLMs using n8n orchestration to automatically synthesize complex SQL signals into executive briefings.
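In the pipeline, n8n assembles the prompt and calls the LLM; the grounding idea can be shown as a hypothetical standalone template builder (the wording and `build_briefing_prompt` name are my illustration, not the production prompt):

```python
import json

def build_briefing_prompt(sql_signals: dict) -> str:
    """
    Ground the LLM strictly in the SQL output: the prompt forbids any
    figure not present in the supplied context (anti-hallucination).
    """
    return (
        "You are drafting an executive public-health briefing.\n"
        "Use ONLY the figures in the JSON context below; if a figure is "
        "missing, say 'not available' rather than estimating.\n\n"
        f"CONTEXT:\n{json.dumps(sql_signals, indent=2)}\n\n"
        "TASK: Summarize the top risk states and one recommended action, "
        "in under 120 words."
    )

prompt = build_briefing_prompt({"top_state": "Lagos", "risk_score": 78.2})
print(prompt)
```

Injecting the SQL results verbatim as JSON, with an explicit "not available" fallback, is what keeps the generated briefing auditable against the Gold layer.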
4. Human-in-the-Loop Feedback
Architected a feedback system using n8n webhooks, allowing stakeholders to tune alert sensitivity directly from their dashboards, reducing false positives.
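The feedback loop can be sketched as the handler behind such a webhook; the payload shape, step size, and bounds below are assumptions for illustration (in production, n8n receives the webhook and updates the thresholds store):

```python
def apply_feedback(thresholds: dict, payload: dict) -> dict:
    """
    Apply one stakeholder verdict to per-state alert thresholds.
    Example payload: {"state": "Lagos", "verdict": "false_positive"}
    A false-positive report nudges that state's threshold up (fewer alerts);
    a confirmed alert nudges it down. Bounded to [40, 95].
    """
    state = payload["state"]
    current = thresholds.get(state, 70.0)  # assumed default sensitivity
    step = 5.0 if payload["verdict"] == "false_positive" else -5.0
    thresholds[state] = max(40.0, min(95.0, current + step))
    return thresholds

t = apply_feedback({}, {"state": "Lagos", "verdict": "false_positive"})
print(t)  # {'Lagos': 75.0}
```

Bounding the threshold keeps stakeholder tuning from silencing a state entirely, which preserves the human-in-the-loop safety property.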
Technical Highlights¶
Why This Stack?¶
- PostgreSQL (Medallion): Public health data demands strict auditing. Preserving raw records untouched in Bronze while exposing engineered features in Gold keeps the lineage verifiable.
- n8n over Airflow: Lower engineering complexity, plus a visual workflow canvas that stakeholders can follow at a glance.
- Docker Containerization: Reduced environment setup time by 90% (from 4 hours to <10 minutes) for rapid deployment.
Key Engineering Decisions
- Idempotent Pipelines: Used `TRUNCATE ... CASCADE` inside SQLAlchemy transactions for safe, repeatable "one-click" data refreshes.
- 7-Day Moving Averages: Implemented DAX smoothing logic in Power BI to filter out administrative reporting noise (like weekend lags).
- Tiered Prompt Engineering: Designed specific LLM prompts grounded in strict SQL contexts to prevent AI "hallucination."
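The idempotent-refresh decision above can be sketched without a live database; the real pipeline runs `TRUNCATE ... CASCADE` through SQLAlchemy transactions against PostgreSQL, while `FakeCursor` here is a stand-in so the pattern is visible:

```python
class FakeCursor:
    """Stand-in for a PostgreSQL cursor; records executed SQL."""
    def __init__(self):
        self.executed = []

    def execute(self, sql, params=None):
        self.executed.append(sql)

def refresh_table(cur, table, rows):
    """
    Idempotent "one-click" refresh: truncate then reload inside one
    transaction, so re-running always yields the same final state.
    """
    cur.execute("BEGIN")
    # CASCADE also clears dependent tables so they are rebuilt consistently.
    cur.execute(f"TRUNCATE TABLE {table} CASCADE")
    for row in rows:
        cur.execute(f"INSERT INTO {table} VALUES (%s, %s)", row)
    cur.execute("COMMIT")

cur = FakeCursor()
refresh_table(cur, "bronze.cases", [("Lagos", 120), ("Kano", 45)])
print(cur.executed[1])  # TRUNCATE TABLE bronze.cases CASCADE
```

Wrapping the truncate and reload in one transaction means a failed load rolls back to the previous state instead of leaving an empty table.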
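The smoothing itself lives in DAX inside Power BI; the equivalent logic, shown here in Python with illustrative numbers, is a trailing 7-day mean that absorbs weekend reporting dips:

```python
def moving_average_7d(daily):
    """Trailing 7-day mean; shorter windows at the start use what exists."""
    out = []
    for i in range(len(daily)):
        window = daily[max(0, i - 6): i + 1]
        out.append(round(sum(window) / len(window), 2))
    return out

# Weekend reporting gaps (the zeros) vanish from the smoothed trend.
series = [100, 110, 90, 0, 0, 120, 105, 95]
print(moving_average_7d(series))
```

This is why the dashboards show a stable trend line even when raw daily counts crater every weekend.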
Results & Impact¶
- <30s ETL runtime (down from 6 hrs)
- 95% faster decision speed
- 80% less monitoring time
Operational Wins:¶
- Automated Briefings: Strategy briefings generated within 60 seconds of data ingestion.
- Actionable Dashboards: High-performance Streamlit portal for real-time ETL orchestration and Power BI for national situational awareness.
- Zero-Friction Deployment: Fully containerized stack for immediate local spin-up.
What This Proves About My Process¶
- I Solve the Right Problem: The issue wasn't a lack of data; it was the speed of insight. I focused on automation and synthesis.
- I Build for the End User: Executives needed a single risk score, not raw data tables. I engineered the metrics to match their decision-making process.
- I Deploy Safely: "Human-in-the-Loop" design ensures that AI doesn't make unchecked decisions.