Business Context: COVID-19 Surveillance Strategy
Industry Background
In the Public Health and Epidemiology sector, data is the most critical weapon against outbreaks. However, during a pandemic, decision-makers often face "Raw Data Overload" - where different sources of data live in different systems. This fragmentation leads to huge downtime in analysis and alerting, leading to longer decision time making it impossible to act before a surge occurs. This project aims to solve this problem by providing a proactive, automated reporting and early warning surveillance system.
The Goal: Moving from Reactive to Proactive
Most systems are Reactive: they show what happened last week. This system is Proactive: it uses real-time ETL and GenAI to flag emerging risks now.
Stakeholders & Users
| Stakeholder Persona | Strategic Need | Core Objective | Project Solution |
|---|---|---|---|
| Executive Leadership | Unified situational awareness and national-level risk briefings. | Make informed policy decisions and optimize large-scale resource allocation. | A unified Interface for quick insights and risk evaluation: Powerbi for national level key parameters , Excel for risk evaluation tools, LLM-powered briefing and alerts |
| Operational & Logistics Teams | Precise data on infection velocity and population protection gaps. | Prioritize the deployment of medical supplies, oxygen, and vaccination units. | Suggestive Alerting: Logic-driven triggers flagging states with high positivity, low vax coverage or high fatality rates. |
| Epidemiological Analysts | Access to cleaned, feature-engineered datasets for deep-dive trend research. | Discover hidden transmission patterns and evaluate the efficacy of interventions. | ETL Pipeline: Production-grade datasets with engineered metrics like positive_test_rate, case_fatality_rate, daily_new_cases, and more. |
| Data Governance & IT | Automated, reliable data pipelines and high-integrity, real-time reporting. | Minimize manual reporting lag and maintain a single source of truth. | Dockerized ETL Pipeline: Staging to gold layers, Postgres as single source of truth(data warehouse). |
Risk level formulated
A state is flagged risky, if it has positive_test_rate > x OR daily_new_cases > y OR case_fatality_rate > z.
For example: x = 5, y = 1000, z = 2
And a weighted risk score is calculated for each state:
weighted_risk_score = (positive_test_rate * 0.4) + ((100 - vaccination_rate) * 0.3) + ((case_fatality_rate * 10) * 0.3)
for states were:
positive_test_rate > x
OR daily_new_cases > y
OR case_fatality_rate > z;
Variables: These parameters (x, y, z) can be adjusted by stakeholders to fine-tune the risk assessment and trigger alerts.
Risk Evaluation Tool Dashboard Requirements (Excel)
To provide surgical auditing capabilities, the Excel dashboard must implement the following features: 1. Dynamic Threshold Control: A centralized panel where users can adjust variables for positivity rates, case counts, and fatality rates to instantly update the risk map. 2. Automated Hot Zone Identification: A calculation engine that filters and sorts states automatically, highlighting only those that breach any of the user-defined medical thresholds. 3. National Macro KPIs: Summary metrics providing a snapshot of the total number of active hot zones and weighted national averages for the selected date. 4. State Deep-Dive Profiler: A lookup system allowing stakeholders to isolate any single state to view its specific infection velocity and vaccination coverage without affecting the main surveillance table. 5. Visual Signal Priority: Use of conditional formatting and data bars to provide immediate visual context on the scale of risk across different regions.
BI Dashboard Requirements (Power BI)
While Excel is used for surgical logic audits and suggestive alerts, the Power BI dashboard acts as the National Situational Awareness Layer. It converts the complex Medallion data into a high-level visual narrative for leadership.
Output format required: Power BI National Situational Awareness Layer
National Situational Awareness Dashboard Requirements (Power BI)
To convert complex daily data into a strategic executive narrative, the Power BI dashboard must implement the following features: 1. Latest-Truth KPI Header: High-visibility summary cards utilizing absolute latest reporting date logic to provide ground-truth snapshots of Total Confirmed, Active Cases, and Recovery Rates. 2. Noise-Filtered Trend Engine: Advanced DAX-driven line charts employing 7-Day Moving Averages to smooth out administrative reporting lags and reveal the true trajectory of the pandemic. 3. Geospatial Risk Prioritization: A choropleth map utilizing strict, rules-based conditional formatting (Green/Yellow/Red) combined with Top N burden ranking to pinpoint critical hot zones for immediate intervention. 4. Intervention Efficacy Analytics: Interactive scatter plots correlating vaccination rates with case fatality rates (CFR) to visually demonstrate the statistical success of public health counter-measures. 5. Unified UX Interactivity: A Z-pattern layout featuring a global responsive date slicer and bidirectional cross-filtering to allow stakeholders to move seamlessly from national trends to granular state details.
Project Impact & Real-World Value
The transition from a manual, reactive data process to this automated surveillance system delivers measurable improvements in both technical efficiency and strategic response time.
| Metric | Traditional Manual Process | Automated System Outcome | Impact |
|---|---|---|---|
| System Deployment | 2-4 Hours (DB setup, env config, dependency management) | < 10 Minutes (One-click Docker launch) | 90% faster time-to-production for new environments. |
| Data Transformation | 4-6 Hours (Manual cleaning) | < 30 Seconds (Medallion ETL Pipeline) | Standardizes 40k+ records instantly with zero manual error. |
| Situation Synthesis | 1-2 Days (Analyst review, drafting briefings, stakeholder sign-off) | < 60 Seconds (GenAI Risk Audit) | Leadership receives strategic briefings while the data is still fresh. |
| Outbreak Detection | Manual scanning of 36 states to find outliers. | Instant Logic-Trigger (Weighted Risk Scoring) | 80% reduction in administrative search time by focusing only on Hot Zones. |
Requirements & Objectives
- Unified Source of Truth: Consolidate disparate CSVs into a production-grade Medallion architecture (Postgres).
- Zero-Lag Intelligence: Automate the transformation pipeline so the latest data is always available.
- Actionable Synthesis: Use LLM's to convert complex SQL metrics into qualitative situation reports.
- Human-in-the-Loop: Implement a feedback mechanism to reduce false positives and tune alert sensitivity.
Industry Best Practices Followed
- Idempotent Pipelines: The ETL can be re-run safely at any time.
- Infrastructure as Code (IaC): The entire stack is defined in
compose.yamlfor consistency across environments. - Portability: Scripts are environment-agnostic, using dynamic path resolution.