Strategies for Prospective Biosurveillance Using Multivariate Time Series

Howard Burkom (1), Yevgeniy Elbert (2), Sean Murphy (1)
(1) Johns Hopkins Applied Physics Laboratory, National Security Technology Department
(2) Walter Reed Army Institute for Research

Tenth Biennial CDC and ATSDR Symposium on Statistical Methods
Panelist: Statistical Issues in Public Health Surveillance for Bioterrorism Using Multiple Data Streams
Bethesda, MD, March 2, 2005

Defining the Multivariate Temporal Surveillance Problem

Varying Nature of the Data:
- Trend, day-of-week, and seasonal behavior, depending on data type and grouping

Multivariate Nature of Problem:
- Many locations
- Multiple syndromes
- Stratification by age, gender, other covariates

Surveillance Challenges:
- Defining anomalous behavior(s)
- Hypothesis tests, both appropriate and timely
- Avoiding excessive alerting due to multiple testing
- Correlation among data streams
- Varying noise backgrounds
- Communication with/among users at different levels
- Data reduction and visualization

Problem: to combine multiple evidence sources for increased sensitivity at manageable alert rates

Recent Respiratory Syndrome Data
[Figure: time series from 1/1/2003 through 8/1/2004; series: Office Visits, Military, ED-UI, ED ILI, OTC; annotations: early cases, height of outbreak]

Multivariate Hypothesis Testing

Parallel monitoring:
- Null hypothesis: no outbreak of unspecified infection in any of hospitals 1, ..., N (or counties, zip codes, ...)
- FDR-based methods (modified Bonferroni)

Consensus monitoring:
- Null hypothesis: no respiratory infection outbreak, based on hospital syndrome counts, clinic visits, OTC sales, absentees
- Multiple univariate methods: combining p-values
- Fully multivariate: MSPC charts

General solution: system-engineered blend of these
Scan statistics paradigm useful when data permit

Univariate Alerting Methods

Data modeling: regression controls for weekly, holiday, seasonal effects
- Outlier removal procedure avoids training on exceptional counts
- Baseline chosen to capture recent seasonal behavior
- Standardized residuals used as detection statistics

Process control method adapted for daily surveillance
- Combines EWMA, Shewhart methods for sensitivity to gradual or sudden signals
- Parameters modified adaptively for changing data behavior
- Adaptively scaled to compute 1-sided probabilities for detection statistics
- Small-count corrections for scale-independent alert rates

Outputs expressed as p-values for comparison, visualization

Parallel Hypotheses & Multiple Testing: Adapting Standard Methods

- P-values $p_1, \ldots, p_N$ with multiple null hypotheses; desired type I error rate $\alpha$
- Null hypotheses: no outbreak at any hospital j, j = 1, ..., N
- Bonferroni bound: an error rate of at most $\alpha$ is achieved with the test $p_j < \alpha/N$ for all j (conservative)
- Simes (1986) enhancement (after Seeger, Eklund):
  - Put p-values in ascending order: $P_{(1)}, \ldots, P_{(N)}$
  - Reject intersection of null hypotheses if any $P_{(j^*)} < j^* \alpha / N$
  - Reject null for $j \le j^*$ (or use more complex criteria)

Parallel Hypotheses: Criteria to Control False Alert Rate

Simes-Seeger-Eklund criterion:
- Gives expected alert rate near the desired $\alpha$ for independent signals
- Applied to control the false discovery rate (FDR) for many common multivariate distributions (Benjamini & Hochberg, 1995)
  - FDR = Exp( # false alerts / all alerts )
  - Increased power over methods controlling Pr( single false alert )
- Numerous FDR applications, incl. UK health surveillance in (Marshall et al., 2004)

Criterion: reject the combined null hypothesis if any ordered p-value falls below the line $j\alpha/N$

Stratification and Multiple Testing

[Diagram: counts unstratified by age pass through EWMA-Shewhart to give an aggregate p-value. Counts for each age stratum (ages 0-4, ages 5-11, ..., ages 71+) pass through EWMA-Shewhart to give one p-value per stratum; these are combined by the modified Bonferroni (FDR) criterion into a resultant p-value. The composite p-value is the MIN of the aggregate and resultant p-values.]
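The Bonferroni and Simes criteria, and the stratified combination in the diagram, can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation. The function names are hypothetical. The step-up rule takes the largest j* that satisfies the criterion (the Benjamini-Hochberg convention, which the slides do not spell out). The Simes combined p-value, min over j of N * P_(j) / j, is assumed here to play the role of the "resultant p-value".

```python
from typing import List, Sequence


def bonferroni_reject(pvalues: Sequence[float], alpha: float) -> bool:
    """Reject the intersection null if any p-value is below alpha / N."""
    n = len(pvalues)
    return any(p < alpha / n for p in pvalues)


def simes_pvalue(pvalues: Sequence[float]) -> float:
    """Simes combined p-value: min over j of N * P_(j) / j (ordered p-values)."""
    n = len(pvalues)
    ordered = sorted(pvalues)
    return min(1.0, min(n * p / j for j, p in enumerate(ordered, start=1)))


def step_up_rejections(pvalues: Sequence[float], alpha: float) -> List[int]:
    """Indices rejected by the step-up rule.

    Assumption: j* is the LARGEST rank with P_(j*) < j* * alpha / N,
    and the nulls for the j* smallest p-values are rejected.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    j_star = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] < rank * alpha / n:
            j_star = rank
    return sorted(order[:j_star])


def composite_pvalue(aggregate_p: float, stratum_pvalues: Sequence[float]) -> float:
    """MIN of the aggregate p-value and the Simes resultant p-value."""
    return min(aggregate_p, simes_pvalue(stratum_pvalues))


# Hypothetical p-values for five age strata (not data from the talk).
strata = [0.30, 0.012, 0.04, 0.015, 0.60]
print(bonferroni_reject(strata, alpha=0.05))
print(simes_pvalue(strata))
print(step_up_rejections(strata, alpha=0.05))
print(composite_pvalue(0.20, strata))
```

With these made-up inputs no single stratum passes the Bonferroni bound of 0.05/5, yet the two smallest p-values together satisfy the Simes criterion, which is the gain in power the slides describe.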

Consensus Monitoring: Multiple Univariate Methods

Fisher's combination rule (multiplicative)
- Given p-values $p_1, p_2, \ldots, p_n$: $F = -2 \sum_{j=1}^{n} \ln p_j$
- F is $\chi^2$ with 2n degrees of freedom, for $p_j$ independent
- Recommended as stand-alone method

Edgington's rule (additive)
- Let S = sum of p-values $p_1, p_2, \ldots, p_n$
- Resultant p-value: $\frac{S^n}{n!} - \binom{n}{1}\frac{(S-1)^n}{n!} + \binom{n}{2}\frac{(S-2)^n}{n!} - \binom{n}{3}\frac{(S-3)^n}{n!} + \cdots$ (stop when $(S-j) \le 0$)
- Normal curve approximation formula for large n
- Consensus method: sensitive to multiple near-critical values

Multiple Univariate Criteria: 2D Visualization
[Figure: 2D comparison of alerting criteria; labels: Nominal univariate criteria, Edgington, Fisher]

934 days of EMS Data
- 12 time series: separate syndrome groups of ambulance calls
- Poisson-like counts: negligible day-of-week, seasonal effects
- EWMA-Shewhart algorithm applied to derive p-values
- Each row is mean over ALL combinations
[Table; annotations: Stand-Alone Method, Multiple Testing Problem!, Add'l Consensus Alerts]
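The Fisher and Edgington combination rules are short enough to compute directly. The sketch below is a minimal illustration in plain Python, not the authors' code. It assumes independent p-values strictly between 0 and 1, and the function names are hypothetical. For Fisher's rule it uses the closed form of the chi-square upper tail, which holds for even degrees of freedom.

```python
import math
from typing import Sequence


def fisher_combined_pvalue(pvalues: Sequence[float]) -> float:
    """Fisher's rule: F = -2 * sum(ln p_j) is chi-square with 2n dof.

    For even dof 2n: Pr(chi2 > F) = exp(-F/2) * sum_{k<n} (F/2)^k / k!.
    """
    n = len(pvalues)
    half_f = -sum(math.log(p) for p in pvalues)  # this is F / 2
    tail = sum(half_f ** k / math.factorial(k) for k in range(n))
    return math.exp(-half_f) * tail


def edgington_combined_pvalue(pvalues: Sequence[float]) -> float:
    """Edgington's additive rule: alternating series in (S - j)^n / n!,
    stopping as soon as S - j <= 0."""
    n = len(pvalues)
    s = sum(pvalues)
    total = 0.0
    j = 0
    while j <= n and s - j > 0:
        total += (-1) ** j * math.comb(n, j) * (s - j) ** n / math.factorial(n)
        j += 1
    return min(1.0, max(0.0, total))


# Hypothetical inputs: three near-critical values versus one extreme value.
near_critical = [0.06, 0.07, 0.08]
one_extreme = [0.001, 0.50, 0.90]
for label, pv in (("near-critical", near_critical), ("one extreme", one_extreme)):
    print(label, fisher_combined_pvalue(pv), edgington_combined_pvalue(pv))
```

In this made-up comparison both rules flag the three near-critical values, while only Fisher's rule flags the case driven by a single extreme p-value. That matches the slide's description of Fisher as a stand-alone method and Edgington as a consensus method.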

Multivariate Control Charts

T2 statistic: $T^2 = (X - \mu)^{\top} S^{-1} (X - \mu)$, where $\mu$ is the mean vector of X
- X = multivariate time series: syndromic claims, OTC sales, etc.
- S = estimate of covariance matrix from baseline interval
- Alert threshold based on empirical distribution, set to the desired alert rate
- MCUSUM, MEWMA methods filter X seeking shorter average run length
- Hawkins (1993): T2 particularly bad at distinguishing location shifts from scale shifts
- T2 nondirectional
- Directional statistic: $(\delta - \mu)^{\top} S^{-1} (X - \mu)$, where $\delta$ is direction of change

MSPC Example: 2 Data Streams
[Figure]

Evaluation: Injection in Authentic and Simulated Backgrounds

Background:
- Authentic: 2-8 correlated streams of daily resp syndrome data (23 mo.)
- Simulated: negative binomial data with authentic mean $\mu$, modeled overdispersion with $\sigma^2 = k\mu$

Injections (additional attributable cases):
- Each case is a stochastic draw from a point-source epicurve distribution (Sartwell lognormal model)
- 100 Monte Carlo trials; single outbreak effect per trial
- With and without time delays between effects across streams
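The simulated-background evaluation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code. The negative binomial background is drawn as a gamma-Poisson mixture with variance equal to k times the mean, which is an assumed reading of the overdispersion model. Each injected case gets a lognormal incubation time in the spirit of the Sartwell model. All function names and parameter values (mean 20, k = 2, 40 cases, median 5 days) are hypothetical.

```python
import math
import random
from typing import List


def poisson_draw(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; adequate for modest means."""
    limit = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def negbin_background(rng: random.Random, mean: float, k: float, n_days: int) -> List[int]:
    """Gamma-Poisson mixture with E = mean and Var = k * mean (requires k > 1)."""
    shape, scale = mean / (k - 1.0), k - 1.0
    return [poisson_draw(rng, rng.gammavariate(shape, scale)) for _ in range(n_days)]


def lognormal_epicurve(rng: random.Random, n_cases: int, onset_day: int, n_days: int,
                       log_mean: float, log_sd: float, delay: int = 0) -> List[int]:
    """Daily attributable counts for a point-source outbreak.

    Each case presents on onset_day + delay + a lognormal incubation time,
    truncated to whole days; cases falling past the series end are dropped.
    """
    counts = [0] * n_days
    for _ in range(n_cases):
        day = onset_day + delay + int(rng.lognormvariate(log_mean, log_sd))
        if day < n_days:
            counts[day] += 1
    return counts


def inject(background: List[int], signal: List[int]) -> List[int]:
    """Add attributable cases to the background counts, day by day."""
    return [b + s for b, s in zip(background, signal)]


rng = random.Random(2005)  # fixed seed so one trial is reproducible
background = negbin_background(rng, mean=20.0, k=2.0, n_days=90)
signal = lognormal_epicurve(rng, 40, 30, 90, math.log(5.0), 0.5)
delayed = lognormal_epicurve(rng, 40, 30, 90, math.log(5.0), 0.5, delay=3)
stream_a = inject(background, signal)   # effect with no delay
stream_b = inject(background, delayed)  # same outbreak seen 3 days later
```

A Monte Carlo evaluation would repeat this with fresh seeds, run the detector on each injected stream, and count alerts on injected and uninjected days.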

$\Pr(\text{False Alarm}) \approx \dfrac{\#\ \text{alerts in noise (no attributable cases)}}{\#\ \text{days examined}}$  (1 - specificity)

$\Pr(\text{detection}) \approx \dfrac{\#\ \text{signals alerted}}{\#\ \text{signals injected}}$  (sensitivity)

ROC: both as a function of threshold

Multivariate Comparison

Example: faint, 1-$\sigma$ peak signal in 4 independent data streams, with differential effect delays
- Data correlation tends to degrade alert rate of multiple, univariate methods
- Cross correlation can greatly improve multivariate method performance (if consistent), or can degrade it!
[Figure: ROC curves; reference line PD = PFA (random)]

ROC Effects of Data Correlation

Example: faint, 2-$\sigma$ peak signal in 2 of 6 highly correlated data streams, with differential effect delays
[Figure: Detection Probability versus Daily False Alarm Probability; annotations: Degradation of multiple, univariate methods; Effect of strong, consistent correlation on multivariate methods]

Conclusions

- Comprehensive biosurveillance requires an interweaving of parallel and consensus monitoring
- Adapted hypothesis tests can help maintain sensitivity at practical false alarm rates
  - But background data and cross-correlation must be understood

- Parallel monitoring: FDR-like methods required according to scope, jurisdiction of surveillance
- Multiple univariate:
  - Fisher rule useful as stand-alone combination method
  - Edgington rule gives sensitivity to consensus of tests
- Multivariate: MSPC T2-based charts offer promise when correlation is consistent & significant, but their niche in routine, robust, prospective monitoring must be clarified

Backups

References 1

Testing Multiple Null Hypotheses
- Simes, R.J. (1986). "An improved Bonferroni procedure for multiple tests of significance", Biometrika 73, 751-754.
- Benjamini, Y., Hochberg, Y. (1995). "Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing", Journal of the Royal Statistical Society B 57, 289-300.
- Hommel, G. (1988). "A stagewise rejective multiple test procedure based on a modified Bonferroni test", Biometrika 75, 383-386.
- Miller, C.J., Genovese, C., Nichol, R.C., Wasserman, L., Connolly, A., Reichart, D., Hopkins, A., Schneider, J., Moore, A. (2001). "Controlling the False Discovery Rate in Astrophysical Data Analysis", Astronomical Journal 122, 3492.
- Marshall, C., Best, N., Bottle, A., Aylin, P. (2004). "Statistical Issues in Prospective Monitoring of Health Outcomes Across Multiple Units", J. Royal Statist. Soc. A 167, Pt. 3, 541-559.

Testing Single Null Hypotheses with Multiple Evidence
- Edgington, E.S. (1972). "An Additive Method for Combining Probability Values from Independent Experiments", Journal of Psychology 80, 351-363.
- Edgington, E.S. (1972). "A normal curve method for combining probability values from independent experiments", Journal of Psychology 82, 85-89.
- Bauer, P., Köhne, K. (1994). "Evaluation of Experiments with Adaptive Interim Analyses", Biometrics 50, 1029-1041.

References 2

Statistical Process Control
- Hawkins, D. (1991). "Multivariate Quality Control Based on Regression-Adjusted Variables", Technometrics 33(1), 61-75.
- Mandel, B.J. (1969). "The Regression Control Chart", J. Quality Technology 1(1), 1-9.
- Williamson, G.D., VanBrackle, G. (1999). "A study of the average run length characteristics of the National Notifiable Diseases Surveillance System", Stat Med. 18(23), 3309-3319.
- Lowry, C.A., Woodall, W.H. (1992). "A Multivariate Exponentially Weighted Moving Average Control Chart", Technometrics 34(1), 46-53.

Point-Source Epidemic Curves & Simulation
- Sartwell, P.E. (1950). "The Distribution of Incubation Periods of Infectious Disease", Am. J. Hyg. 51, 310-318; reprinted in Am. J. Epidemiol. 141(5), 1995.
- Philippe, P. "Sartwell's Incubation Period Model Revisited in the Light of Dynamic Modeling", J. Clin. Epidemiol. 47(4), 419-433.
- Burkom, H., Rodriguez, R. (2004). "Using Point-Source Epidemic Curves to Evaluate Alerting Algorithms for Biosurveillance", 2004 Proceedings of the American Statistical Association, Statistics in Government Section [CD-ROM], Toronto: American Statistical Association (to appear).

MSPC 2-Stream Example: Detail of Aug. Peak
[Figure: detail of the August peak in the two-stream MSPC example]

Effect of Combining Evidence

[Figure: algorithm p-values (axis 0.00 to 0.10) from 8/17/03 to 4/23/04; series: "Edgington: ED, OV, OTC" and "Office Visits Only"; annotations: early cases, height of outbreak, secondary event]

Bayes Belief Net (BBN) Umbrella

To include evidence from disparate evidence types:
- Continuous/discrete data
- Derived algorithm output or probabilities
- Expert/heuristic knowledge

Graphical representation of conditional dependencies
- Can weight statistical hypothesis test evidence using heuristics, not restricted to fixed p-value thresholds
- Can exploit advances in data modeling, multivariate anomaly detection
- Can model:
  - Heuristic weighting of evidence
  - Lags in data availability or reporting
  - Missing data
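The way a belief net fuses evidence and tolerates missing data can be illustrated with the simplest possible structure: one hidden "outbreak" node with conditionally independent evidence nodes. The sketch below is a toy under stated assumptions, not the authors' network. Every probability in it is invented for illustration, and the node and function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical conditional probability tables, invented for illustration:
# evidence name -> (P(anomaly | outbreak), P(anomaly | no outbreak)).
CPT: Dict[str, Tuple[float, float]] = {
    "resp_anomaly": (0.80, 0.10),
    "otc_anomaly": (0.60, 0.15),
    "sensor_alarm": (0.30, 0.01),
}
PRIOR_OUTBREAK = 0.01  # hypothetical prior probability of an outbreak


def posterior_outbreak(evidence: Dict[str, Optional[bool]],
                       prior: float = PRIOR_OUTBREAK,
                       cpt: Dict[str, Tuple[float, float]] = CPT) -> float:
    """P(outbreak | evidence) by Bayes' rule.

    Evidence values are True (anomaly), False (no anomaly), or None.
    None means the stream is missing or lagged; it is marginalised out,
    which in this structure amounts to skipping its likelihood term.
    """
    like_out, like_none = prior, 1.0 - prior
    for name, observed in evidence.items():
        if observed is None:
            continue
        p_out, p_none = cpt[name]
        like_out *= p_out if observed else 1.0 - p_out
        like_none *= p_none if observed else 1.0 - p_none
    return like_out / (like_out + like_none)


# Same day scored three ways: sensor feed missing, quiet, or alarming.
for sensor in (None, False, True):
    day = {"resp_anomaly": True, "otc_anomaly": True, "sensor_alarm": sensor}
    print(sensor, posterior_outbreak(day))
```

A graded likelihood table derived from algorithm p-values could replace the binary tables here, which is one way to avoid fixed p-value thresholds.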

Bayes Network Elements
[Figure: network with nodes Flu Season, Flu, Anthrax, GI Anomaly, Resp Anomaly, Sensor Alarm. Posterior probabilities P(Flu | Evidence) and P(Anthrax | Evidence) are shown for four evidence configurations over Flu Season, GI Anomaly, Resp Anomaly, and Sensor Alarm; values appearing in the figure: 0.70, 0.67, 0.08, 0.005, 0.07, 0.17, 0.0023, 0.09, with comparison marks (>, <, >>) between entries]

Structure of BBN Model for Asthma Flare-ups

[Diagram: BBN structure; node labels include Asthma, Syndromic Asthma, Asthma Military RX, Cold/Flu Season and Irritant Interaction, Resp Anomaly, Resp Military OV, Resp Military RX, Resp Civilian OV, Resp Civilian OTC, SubFreezing Temp, Cold/Flu Season, Cold/Flu Season Start, Pollution, Ozone, PM 2.5, AQI, Allergen, Mold Spores Level, Grass Pollen Level, Tree Pollen Level, Weed Pollen Level, Season]

BBN Application to Asthma Flare-ups

Availability of practical, verifiable data:
- For truth data: daily clinical diagnosis counts
- For evidence: daily environmental, syndromic data

Known asthma triggers with complex interaction:
- Air quality (EPA data)
  - Concentration of particulate matter, allergens
  - Ozone levels
- Temperature (NOAA data)
- Viral infections (Syndromic data)

Evidence from combination of expert knowledge, historical data