Think before You Discard: Accurate Triangle Counting in Graph Streams with Deletions Kijung Shin, Jisu Kim, Bryan Hooi, Christos Faloutsos Introduction Problem Algorithm Experiments Conclusion Triangles in a Graph Graphs are everywhere! social networks, the web, citation networks Triangles are a fundamental primitive 3 nodes connected to each other Counting triangles has many applications community detection, anomaly detection, query optimization Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 2/46 Introduction

Problem Algorithm Experiments Conclusion Application: Anomaly Detection [LJK18] # Incident Triangles # Incident Triangles [KMF11] Degree Telemarketer Degree Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 3/46 Introduction Problem

Algorithm Experiments Conclusion Remaining Challenges Counting triangles in real-world graphs online social networks Web Citation networks Call networks Real-world graphs are Large: not fitting in main memory Fully dynamic: both growing and shrinking Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 4/46 Introduction Problem Algorithm

Experiments Conclusion Previous Work - Given: a large and fully-dynamic graph - Goal: accurately estimate the count of triangles Large Graph Fully dynamic Graph Insertion Deletion Accurate MASCOT [LJK18] Triest-IMPR [DERU17] WRS [Shi17] ESD [HS17] Triest-FD [DERU17] ThinkD (Proposed) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 5/46

Introduction Problem Algorithm Experiments Conclusion Our Contributions We propose ThinkD (Think before You Discard) Fast and Accurate: outperforming competitors Scalable: linear data scalability Theoretically Sound: unbiased estimates Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 6/46 Road Map Problem Definition Proposed Method: ThinkD Experiments Conclusions Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 7/46

Introduction Problem Algorithm Experiments Conclusion Fully Dynamic Graph Stream Model for a large and fully-dynamic graph Discrete time , starting from 1 and ever increasing At each time , a change in the input graph arrives change: either an insertion or deletion of an edge Time Change (given) Graph (unmate rialized) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 8/46

Introduction Problem Algorithm Experiments Conclusion Fully Dynamic Graph Stream Model for a large and fully-dynamic graph Discrete time , starting from 1 and ever increasing At each time , a change in the input graph arrives change: either an insertion or deletion of an edge Time Change (given) Graph (unmate rialized)

Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 9/46 Introduction Problem Algorithm Experiments Conclusion Fully Dynamic Graph Stream Model for a large and fully-dynamic graph Discrete time , starting from 1 and ever increasing At each time , a change in the input graph arrives change: either an insertion or deletion of an edge Time Change (given) Graph (unmate rialized)

Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 10/46 Introduction Problem Algorithm Experiments Conclusion Fully Dynamic Graph Stream Model for a large and fully-dynamic graph Discrete time , starting from 1 and ever increasing At each time , a change in the input graph arrives change: either an insertion or deletion of an edge

Time Change (given) Graph (unmate rialized) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 11/46

Introduction Problem Algorithm Experiments Conclusion Fully Dynamic Graph Stream Model for a large and fully-dynamic graph Discrete time , starting from 1 and ever increasing At each time , a change in the input graph arrives change: either an insertion or deletion of an edge Time Change (given) Graph (unmate rialized)

Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 12/46 Introduction

Problem Algorithm Experiments Conclusion Fully Dynamic Graph Stream Model for a large and fully-dynamic graph Discrete time , starting from 1 and ever increasing At each time , a change in the input graph arrives change: either an insertion or deletion of an edge Examples: Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 13/46 Introduction Problem Algorithm Experiments Conclusion

Problem Definition Given: a fully-dynamic graph stream (possibly infinite) memory space (finite) Estimate: the counts of global and local triangles To Minimize: estimation error at each time Time Changes # Triangles Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) Given Estimate 14/46 Introduction Problem Algorithm Experiments

Conclusion Problem Definition (cont.) Global Triangles: all triangles in the graph Local triangles: the triangles incident to each node 3 2 1 3 1 2 4 3 2 1 Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 15/46 Road Map Problem Definition Proposed Method: ThinkD <<
Experiments Conclusions Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 16/46 Introduction Problem Algorithm Experiments Conclusion Overview of ThinkD Maintains and updates Number of (non-deleted) triangles that it has observed How it processes an insertion: arrive count test Yes store

No - arrive: an insertion of an edge arrives - count: count new triangles and increase - test: toss a coin - store: store the edge in memory Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 17/46 Introduction Problem Algorithm Experiments Conclusion Overview of ThinkD (cont.) Maintains and updates Number of (non-deleted) triangles that it has observed How it processes a deletion: arrive count test

Yes delete No - arrive: a deletion of an edge arrives - count: count deleted triangles and decrease - test: test whether the edge is stored in memory - delete: delete the edge in memory Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 18/46 Introduction Problem Algorithm Experiments Conclusion Comparison with Triest-FD ThinkD (Think before You Discard): every arrived change is used to update arrive count:

update Triest-FD [DERU17]: test Yes No (discard) store / delete some changes are discarded without being used to update arrive store / test Yes update delete No (discard) information loss! count: Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 19/46 Introduction

Problem Algorithm Experiments Conclusion Two Versions of ThinkD Q1: How to test in the test step Q2: How to estimate the count of all triangles from ThinkD-FAST: simple and fast using independent Bernoulli trials ThinkD-ACC: accurate and parameter-free using random pairing [GLH08] Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 20/46 Introduction ls i a t De Problem

Algorithm Experiments Conclusion ThinkD-FAST: Details : number of (non-deleted) triangles observed so far Toy example: - time : - count : - memory : (, ) ( , ) (, ) ( , ) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 21/46 Introduction ls i a t De Problem

Algorithm Experiments Conclusion ThinkD-FAST: Arrive Step A new change in the input graph arrives change: either insertion or deletion of an edge arrive count test - time : - change: - count : - memory : (, ) ( , ) Yes store / delete No

(, ) ( , ) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 22/46 Introduction ls i a t De Problem Algorithm Experiments Conclusion ThinkD-FAST: Count Step Count added or deleted triangles and update arrive count

test Yes store / delete No - time : - change: - count : + 2 - # added triangles: - memory : (, ) ( , ) (, ) ( , ) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin)

23/46 Introduction ls i a t De Problem Algorithm Experiments Conclusion ThinkD-FAST: Test Step Simulate a Bernoulli trial with probability is an input parameter arrive count test - time

: - change: - count : - memory : (, ) ( , ) Yes store / delete No (, ) ( , ) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 24/46 Introduction ls i a t De Problem Algorithm

Experiments Conclusion ThinkD-FAST: Store Step Store the new edge in memory arrive count test - time : - change: - count : - memory : (, ) ( , ) Yes store / delete No (, ) ( , ) (, )

Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 25/46 Introduction ls i a t De Problem Algorithm Experiments Conclusion ThinkD-FAST: Arrive Step A new change in the input graph arrives change: either insertion or deletion of an edge arrive count - time

: - change: - count : - memory : (, ) ( , ) test Yes store / delete No (, ) ( , ) (, ) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 26/46 Introduction ls i a t De Problem

Algorithm Experiments Conclusion ThinkD-FAST: Count Step Count added or deleted triangles and update arrive count test Yes store / delete No - time : - change: - count : =1 - # deleted triangles: - memory : (, ) ( , )

(, ) ( , ) (, ) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 27/46 Introduction ls i a t De Problem Algorithm Analysis Experiments Conclusion

ThinkD-FAST: Test Step Test if the arrived edge is in memory arrive count - time : - change: - count : - memory : (, ) ( , ) test Yes store / delete No (, ) ( , ) (, ) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 28/46

Introduction ls i a t De Problem Algorithm Analysis Experiments Conclusion ThinkD-FAST: Store Step Remove the arrived edge from memory arrive count test No - time :

- change: - count : - memory : (, ) (, ) ( , ) (, ) Yes store / delete ( , ) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 29/46 Introduction Problem Algorithm Experiments Conclusion Unbiased Estimation : count of observed triangles

: estimated count of all triangles : true count of all triangles [ Theorem 1 ] At any time , Unbiased estimate of Proof and a variance of : see the paper Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 30/46 Introduction Problem Algorithm Experiments Conclusion Disadvantages of ThinkD-FAST Parameter: setting the parameter is non-trivial small underutilize memory inaccurate estimation large out-of-memory error Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin)

31/46 Introduction Problem Algorithm Experiments Conclusion Disadvantages of ThinkD-FAST (cont.) Information loss: it may discard inserted edges even when memory is not full arrive - time : - estimate : count test - change: Yes store / delete

No (, ) ( , ) (, ) ( , ) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 32/46 Introduction Problem Algorithm Experiments Conclusion ThinkD-ACC: Even Better! Random pairing [RLH08] instead of Bernoulli trials Advantages: no parameter less information loss: utilizes memory as fully as possible more accurate estimation still unbiased Disadvantages: complicated slower than ThinkD-FAST Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin)

33/46 Introduction Problem Algorithm Experiments Conclusion Scalability of ThinkD Let be the size of memory For processing changes in the input stream, [ Theorem 2 ] ThinkD-ACC takes ( ) linear in data size [ Theorem 3 ] ThinkD-FAST with takes ( ) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 34/46 Introduction

Problem Algorithm Experiments Conclusion Advantages of ThinkD Fast and Accurate: outperforming competitors Scalable: linear data scalability (Theorems 2 & 3) Theoretically Sound: unbiased estimates (Theorem 1) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 35/46 Road Map Problem Definition Proposed Method: ThinkD Experiments << Conclusions Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 36/46 Introduction
Problem Algorithm Experiments Conclusion Experimental Settings Competitors: Triest-FD [DERU17] & ESD [HS17] state-of-the-art algorithms for triangle counting in fully-dynamic graph streams Implementations: Datasets: insertions (edges in graphs) + deletions (random 20%) ER Synthetic (100B edges) Social Networks (1.8B+ edges, ) Citation Web Trust (16M+) (6M+) (0.7M+) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin)

37/46 Introduction Problem Algorithm Experiments Conclusion EXP1. Bias Analysis [THM1] Does ThinkD give unbiased estimates? True Count ThinkD-ACC ThinkD-FAST Triest-FD - memory budget of the edges - dataset: - #repeats: 10,000 Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 38/46 Introduction

Problem Algorithm Experiments Conclusion EXP2. Variance Analysis Does ThinkD maintain estimates with small variance? Triest-FD ThinkD-FAST ThinkD-ACC True Count Number of Processed Changes - memory budget of the edges - dataset: - #repeats: 10,000 Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 39/46 Introduction Problem

Algorithm Experiments Conclusion EXP3. Scalability [THM 2 & 3] Does ThinkD scale linearly with the size of the input stream? (THM 2 & 3) ThinkD-ACC ThinkD-FAST Linear scalability (slope=1) Number of Changes - dataset: ER - memory budget fixed to (i.e., = size / ) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 40/46 Introduction Problem Algorithm

Experiments Conclusion Advantages of ThinkD Fast and Accurate: outperforming competitors Scalable: linear data scalability (Theorems 2 & 3) Theoretically Sound: unbiased estimates (Theorem 1) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 41/46 Introduction Problem Algorithm Experiments Conclusion Estimation Error (ratio) EXP4. Space & Accuracy Is ThinkD more accurate than its best competitors? Triest-FD ThinkD-FAST

ESD ThinkD -ACC Memory budget (ratio) - dataset: - # repeats: 1000 Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 42/46 Introduction Problem Algorithm Experiments Conclusion Estimation Error (ratio) EXP5. Speed & Accuracy Is ThinkD faster than its best competitors? ThinkD-ACC ESD Triest-FD

ThinkD -FAST Running time (Sec) - dataset: - # repeats: 1000 Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 43/46 Introduction Problem Algorithm Experiments Conclusion Advantages of ThinkD Fast and Accurate: outperforming competitors Scalable: linear data scalability Theoretically Sound: unbiased estimates Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin)

44/46 Road Map Problem Definition Proposed Method: ThinkD Experiments Conclusions << Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 45/46 Introduction Problem Algorithm Experiments Conclusion Conclusions We propose ThinkD accurately estimates the count of triangles in large and fully-dynamic graphs Fast and Accurate: outperforming competitors Scalable: linear data scalability Theoretically Sound: unbiased estimates Download

ThinkD Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 46/46 Introduction Problem Algorithm Experiments Conclusion References [RLH08] Rainer Gemulla et al., Maintaining bounded-size sample synopses of evolving datasets. The VLDB Journal 2008 [KMF11] U Kang et al., Spectral analysis for billion-scale graphs: Discoveries and implementation. PAKDD 2011 [DERU17] Lorenzo De Stefani et al., TRIST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size. TKDD 2017 [Shi17] Kijung Shin, WRS: Waiting Room Sampling for Accurate Triangle Counting in Real Graph Streams, ICDM 2017 [HS17] Han, Guyue, and Harish Sethu, "Edge sample and discard: A new algorithm for counting triangles in large dynamic graphs." ASONAM 2017 [LJK18] Yongsub Lim et al., Memory-efficient and Accurate Sampling for Counting Local Triangles in Graph Streams: From Simple to Multigraphs, TKDD 2018

Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 47/46 Backup Slides Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 48/46 Introduction Problem Algorithm Experiments Conclusion Proof of Unbiasedness Proof Sketch: Prob. that each added triangle is observed Prob. that each removed triangle is observed Thus, Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 49/46