A Sparsification Approach for Temporal Graphical Model ...

A Sparsification Approach for Temporal Graphical Model ...

A Sparsification Approach for Temporal Graphical Model Decomposition Ning Ruan Kent State University Joint work with Ruoming Jin (KSU), Victor Lee (KSU) and Kun Huang (OSU) Motivation: Financial Markets Motivation: Biological Systems Fluorescence Counts Protein-Protein Interaction Microarray time series profile 3 Vector Autoregression Univariate Autoregression is self-regression for a timeT series X (t ) (u ) X (t u ) (t ) u 1 VAR is the multivariate extension of autoregression

t= 0 1 2 3 4 x1 (0) x1 (1) x1 (2) x1 (3) x1 (4) x1 (T )

x2 (0) x3 (0) x2 (1) x3 (1) x2 (2) x3 (2) x2 (3) x3 (3) x2 (4) x3 (4) x2 (T ) x3 (T )

xm (0) xm (1) xm (2) xm (3) xm (4) xm (T ) T T X(t ) (u ) X(t u ) (t ) u 1 4 Granger Causality Goal: reveal causal relationship between two univariate time series. Y is Granger causal for X at time t if Xt-1 and Yt-1 together are a better predictor for Xt than Xt-1 alone.

i.e., compare the magnitude of error (t) t) vs. (t) t) T X (t ) [ t u X (t u )] (t ) u 1 vs. T X (t ) [ t u X (t u ) t uY (t u )] (t ) u 1 5 Temporal Graphical Modeling Recover the causal structure among a group of relevant time series X1 X2 X3 X4 X2 X1

12 X4 X6 X8 X6 X3 X5 X7 X7 X5 X8 temporal graphical model The Problem

Given a temporal graphical model, can we decompose it to get a simpler global view of the interactions among relevant time series? How to interpret these causal relationships??? Extra Benefit X1 X2 X3 X4 X5 X6 X7 X8 X1 X2 X3 X4 X5 X1 X3 X2 X4 X7 X8 Clustering based on similarity X2 X1

X7 X6 X7 X5 X6 X3 X6 X4 X8 X5 X8 Consider time series clustering from a new perspective! Clustered Regression Coefficient Matrix Vector Autoregression Model T

X (t) (u )X (t u ) (t) u 1 (u) is a NxN coefficient matrix Clustered Regression Coefficient Matrix 0 submatrix 1 (u ) 2 (u ) 0 (u ) 0 0 0

0 K (u ) 1) if(u)ij0, then time series i and j are in the same cluster 2) if time series i and j are not in the same cluster, then (u)ij=0 Temporal Graphical Model Decomposition Cost Goal: preserve prediction accuracy while reducing representation cost Given a temporal graphical model, the cost for model decomposition is L || X (t ) t 1 T

2 2 ( u ) X ( t u ) || ( || ( u ) || )

u 1 prediction error L2 penalty Problem Tend to group all time series into one cluster Refined Cost for Decomposition C1 Balance size of clusters T tr (C C ) ( Cik ) k 2 X2 i

C is NxK membership matrix 1 0 0 1 0 0 0 1 0 0 0 1

Overall cost is the sum of three parts L T || X (t ) t 1 2 2 T ( u ) X ( t u ) ||

( || ( u ) || ) ( tr ( C C )) u 1 prediction error L2 penalty size constraint

Optimal Decomposition Problem Find a cluster membership matrix C and its regression coefficient matrix such that the cost for decomposition is minimal Hardness of Decomposition Problem Combined integer (membership matrix) and numerical (regression coefficient matrix) optimization problem Large number of unknown variables NxK variables in membership matrix NxN variables in regression coefficient matrix Basic Idea for Iterative Optimization Algorithm Relax binary membership matrix C to probabilistic membership matrix P Optimize membership matrix while fixing regression coefficient matrix Optimize regression coefficient matrix while fixing membership matrix Employ two optimization steps iteratively to get a local optimal solution

Overview of Iterative Optimization Algorithm Time Series Data Temporal Graphical Model Optimize cluster membership matrix Optimize regression coefficient matrix Quasi-Newton Method Generalized ridge regression Step 1 Step 2 Step 1: Optimize Membership Matrix

Apply Lagrange multiplier method: F cost( P) i ( p( k | i ) 1) i k Quasi-Newton method Approximate Hessian matrix by iteratively updating ( n 1) ( n) P P (n) (n) (n) H ( n 1) F ( P ,

) (n) Step 2: Optimize Regression Coefficient Matrix Decompose cost functions into N subfunctions constant N cost Fi i 1 Generalized Ridge Regression T T Fi ( yk X k i )T ( yk X k i ) Ti i M i k yk is a vector related with P and X (length L) Xk is a matrix related with P and X (size LxN) k=1, traditional ridge regression

NxK N NxK+N Complexity Analysis 1 0 0 7 0 5 0 5 0

6 8 0 2 0 3 0 3 0 1 2 4

0 6 0 0 N Compute coefficient matrix O( NxK+N Update Hessian Matrix takes O (k ( NK N )2 ) Step 1 is the computational bottleneck of entire algorithm 3 i R ) Basic Idea for Scalable Approach Utilize variable dependence relationship to optimize each variable (or a small number

of variables) independently, assuming other relationships are fixed Convert the problem to a Maximal Weight Independent Set (MWIS) problem Experiments: Synthetic Data Synthetic data generator Generate community-based graph as underlying temporal graphical model [Girvan and Newman 05] Assign random weights to graphical model and generate time series data using recursive matrix multiplication [Arnold et al. 07] Decomposition Accuracy Find a matching between clustering results and ground-truth clusters such that the number of intersected variables are maximal The number of intersected variables over total number of variables is decomposition accuracy Experiments: Synthetic Data (cont.) Applied algorithms Iterative optimization algorithm based on QuasiNewton method (newton) Iterative optimization algorithm based on MWIS method (mwis)

Benchmark 1: Pearson correlation test to generate temporal graphical model, and Ncut [Shi00] for clustering (Cor_Ncut) Benchmark 2: directed spectral clustering [Zhou05] on ground-truth temporal graphical model (Dcut) Experimental Results: Synthetic On average, newton is better than Cor_Ncut and Dcut by 27% and 32%, respectively On average, mwis is better than Cor_Ncut and Dcut by 24% and 29%, respectively Experimental Results: Synthetic mwis is better than Cor_Ncut by an average of 30% mwis is better than Dcut by an average of 52% Experiment: Real Data Data Annual GDP growth rate (downloaded from

http://www.ers.usda.gov/Data/Macroeconomics) 192 countries 4 Time periods 1969-1979 1980-1989 1990-1999 1998-2007 Hierarchically bipartition into 6 or 7 clusters Experimental Result: Real Data Summary We formulate a novel objective function for the decomposition problem in temporal graphical modeling. We introduce an iterative optimization approach utilizing Quasi-Newton method and generalized ridge regression.

We employ a maximum weight independent set based approach to speed up the Quasi-Newton method. The experimental results demonstrate the effective and efficiency of our approaches. Thank you

Recently Viewed Presentations

  • WWII: The Home Front

    WWII: The Home Front

    "Women, Back Them Up -To Bring Them Back!" ... Women served as nurses, stretcher bearers, drivers, machine operators, cooks and secretaries. They also . flew. Canadian built planes to bases in Britain and ferried . officers. and . politicians.
  • Neutralization of an acid or base.

    Neutralization of an acid or base.

    Strong and Weak acids and bases do NOT necessarily tell you how dangerous they are. Concentration is the most important factor for determining danger. Ammonia is a weak base, if it is highly concentrated it can burn you.
  • Science - Year 4 Living Things & their

    Science - Year 4 Living Things & their

    From big movements like a jump, to small movements like a wink. Plants turn, or grow, towards the light. All living things reproduce. Animals have babies. New plants grow from seeds. All living things are sensitive. ... PowerPoint Presentation Last...
  • SOCIOLOGY - What is it?

    SOCIOLOGY - What is it?

    An orderly world where all involved are fulfilling a role. The young people are preforming their role- that of a student & the school is performing its task of preparing students to be productive citizens in society
  • Man's Impact - ScienceGeek.net

    Man's Impact - ScienceGeek.net

    Invasive Species. Introduced species: any organism that was brought to an ecosystem as the result of human actions. Invasive species: A species that takes advantage of an unoccupied niche, or that successfully out-competes native species
  • Chapter 11: Indexing and Hashing - Manhattan College

    Chapter 11: Indexing and Hashing - Manhattan College

    Indexing Basic Concepts Indexing mechanisms used to speed up access to desired data. E.g., author catalog in library Search Key - attribute or set of attributes used to look up records in a file.
  • Oceanic Zones

    Oceanic Zones

    Oceanic Zones Several factors used to divide the ocean in to distinct life marine zones light availability distance from shore water depth Zones of Light Availability photic (light) zone: upper part of the ocean where light penetrates euphotic zone is...
  • Element Baby Book Project - PC\|MAC

    Element Baby Book Project - PC\|MAC

    Element Baby Book Project. Procedure and Example. Introduction. In this project you will adopt an element from the periodic table. As the proud new parent of your element, you will create a baby book to remember each stage of your...