Triangles in a Graph - Carnegie Mellon School of Computer Science

Triangles in a Graph - Carnegie Mellon School of Computer Science

Think before You Discard: Accurate Triangle Counting in Graph Streams with Deletions Kijung Shin, Jisu Kim, Bryan Hooi, Christos Faloutsos Introduction Problem Algorithm Experiments Conclusion Triangles in a Graph Graphs are everywhere! social networks, the web, citation networks Triangles are a fundamental primitive 3 nodes connected to each other Counting triangles has many applications community detection, anomaly detection, query optimization Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 2/46 Introduction

Problem Algorithm Experiments Conclusion Application: Anomaly Detection [LJK18] # Incident Triangles # Incident Triangles [KMF11] Degree Telemarketer Degree Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 3/46 Introduction Problem

Algorithm Experiments Conclusion Remaining Challenges Counting triangles in real-world graphs online social networks Web Citation networks Call networks Real-world graphs are Large: not fitting in main memory Fully dynamic: both growing and shrinking Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 4/46 Introduction Problem Algorithm

Experiments Conclusion Previous Work - Given: a large and fully-dynamic graph - Goal: accurately estimate the count of triangles Large Graph Fully dynamic Graph Insertion Deletion Accurate MASCOT [LJK18] Triest-IMPR [DERU17] WRS [Shi17] ESD [HS17] Triest-FD [DERU17] ThinkD (Proposed) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 5/46

Introduction Problem Algorithm Experiments Conclusion Our Contributions We propose ThinkD (Think before You Discard) Fast and Accurate: outperforming competitors Scalable: linear data scalability Theoretically Sound: unbiased estimates Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 6/46 Road Map Problem Definition Proposed Method: ThinkD Experiments Conclusions Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 7/46

Introduction Problem Algorithm Experiments Conclusion Fully Dynamic Graph Stream Model for a large and fully-dynamic graph Discrete time , starting from 1 and ever increasing At each time , a change in the input graph arrives change: either an insertion or deletion of an edge Time Change (given) Graph (unmate rialized) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 8/46

Introduction Problem Algorithm Experiments Conclusion Fully Dynamic Graph Stream Model for a large and fully-dynamic graph Discrete time , starting from 1 and ever increasing At each time , a change in the input graph arrives change: either an insertion or deletion of an edge Time Change (given) Graph (unmate rialized)

Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 9/46 Introduction Problem Algorithm Experiments Conclusion Fully Dynamic Graph Stream Model for a large and fully-dynamic graph Discrete time , starting from 1 and ever increasing At each time , a change in the input graph arrives change: either an insertion or deletion of an edge Time Change (given) Graph (unmate rialized)

Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 10/46 Introduction Problem Algorithm Experiments Conclusion Fully Dynamic Graph Stream Model for a large and fully-dynamic graph Discrete time , starting from 1 and ever increasing At each time , a change in the input graph arrives change: either an insertion or deletion of an edge

Time Change (given) Graph (unmate rialized) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 11/46

Introduction Problem Algorithm Experiments Conclusion Fully Dynamic Graph Stream Model for a large and fully-dynamic graph Discrete time , starting from 1 and ever increasing At each time , a change in the input graph arrives change: either an insertion or deletion of an edge Time Change (given) Graph (unmate rialized)

Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 12/46 Introduction

Problem Algorithm Experiments Conclusion Fully Dynamic Graph Stream Model for a large and fully-dynamic graph Discrete time , starting from 1 and ever increasing At each time , a change in the input graph arrives change: either an insertion or deletion of an edge Examples: Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 13/46 Introduction Problem Algorithm Experiments Conclusion

Problem Definition Given: a fully-dynamic graph stream (possibly infinite) memory space (finite) Estimate: the counts of global and local triangles To Minimize: estimation error at each time Time Changes # Triangles Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) Given Estimate 14/46 Introduction Problem Algorithm Experiments

Conclusion Problem Definition (cont.) Global Triangles: all triangles in the graph Local triangles: the triangles incident to each node 3 2 1 3 1 2 4 3 2 1 Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 15/46 Road Map Problem Definition Proposed Method: ThinkD << Experiments Conclusions Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 16/46 Introduction Problem Algorithm Experiments Conclusion Overview of ThinkD Maintains and updates Number of (non-deleted) triangles that it has observed How it processes an insertion: arrive count test Yes store

No - arrive: an insertion of an edge arrives - count: count new triangles and increase - test: toss a coin - store: store the edge in memory Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 17/46 Introduction Problem Algorithm Experiments Conclusion Overview of ThinkD (cont.) Maintains and updates Number of (non-deleted) triangles that it has observed How it processes a deletion: arrive count test

Yes delete No - arrive: a deletion of an edge arrives - count: count deleted triangles and decrease - test: test whether the edge is stored in memory - delete: delete the edge in memory Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 18/46 Introduction Problem Algorithm Experiments Conclusion Comparison with Triest-FD ThinkD (Think before You Discard): every arrived change is used to update arrive count:

update Triest-FD [DERU17]: test Yes No (discard) store / delete some changes are discarded without being used to update arrive store / test Yes update delete No (discard) information loss! count: Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 19/46 Introduction

Problem Algorithm Experiments Conclusion Two Versions of ThinkD Q1: How to test in the test step Q2: How to estimate the count of all triangles from ThinkD-FAST: simple and fast using independent Bernoulli trials ThinkD-ACC: accurate and parameter-free using random pairing [GLH08] Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 20/46 Introduction ls i a t De Problem

Algorithm Experiments Conclusion ThinkD-FAST: Details : number of (non-deleted) triangles observed so far Toy example: - time : - count : - memory : (, ) ( , ) (, ) ( , ) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 21/46 Introduction ls i a t De Problem

Algorithm Experiments Conclusion ThinkD-FAST: Arrive Step A new change in the input graph arrives change: either insertion or deletion of an edge arrive count test - time : - change: - count : - memory : (, ) ( , ) Yes store / delete No

(, ) ( , ) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 22/46 Introduction ls i a t De Problem Algorithm Experiments Conclusion ThinkD-FAST: Count Step Count added or deleted triangles and update arrive count

test Yes store / delete No - time : - change: - count : + 2 - # added triangles: - memory : (, ) ( , ) (, ) ( , ) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin)

23/46 Introduction ls i a t De Problem Algorithm Experiments Conclusion ThinkD-FAST: Test Step Simulate a Bernoulli trial with probability is an input parameter arrive count test - time

: - change: - count : - memory : (, ) ( , ) Yes store / delete No (, ) ( , ) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 24/46 Introduction ls i a t De Problem Algorithm

Experiments Conclusion ThinkD-FAST: Store Step Store the new edge in memory arrive count test - time : - change: - count : - memory : (, ) ( , ) Yes store / delete No (, ) ( , ) (, )

Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 25/46 Introduction ls i a t De Problem Algorithm Experiments Conclusion ThinkD-FAST: Arrive Step A new change in the input graph arrives change: either insertion or deletion of an edge arrive count - time

: - change: - count : - memory : (, ) ( , ) test Yes store / delete No (, ) ( , ) (, ) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 26/46 Introduction ls i a t De Problem

Algorithm Experiments Conclusion ThinkD-FAST: Count Step Count added or deleted triangles and update arrive count test Yes store / delete No - time : - change: - count : =1 - # deleted triangles: - memory : (, ) ( , )

(, ) ( , ) (, ) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 27/46 Introduction ls i a t De Problem Algorithm Analysis Experiments Conclusion

ThinkD-FAST: Test Step Test if the arrived edge is in memory arrive count - time : - change: - count : - memory : (, ) ( , ) test Yes store / delete No (, ) ( , ) (, ) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 28/46

Introduction ls i a t De Problem Algorithm Analysis Experiments Conclusion ThinkD-FAST: Store Step Remove the arrived edge from memory arrive count test No - time :

- change: - count : - memory : (, ) (, ) ( , ) (, ) Yes store / delete ( , ) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 29/46 Introduction Problem Algorithm Experiments Conclusion Unbiased Estimation : count of observed triangles

: estimated count of all triangles : true count of all triangles [ Theorem 1 ] At any time , Unbiased estimate of Proof and a variance of : see the paper Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 30/46 Introduction Problem Algorithm Experiments Conclusion Disadvantages of ThinkD-FAST Parameter: setting the parameter is non-trivial small underutilize memory inaccurate estimation large out-of-memory error Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin)

31/46 Introduction Problem Algorithm Experiments Conclusion Disadvantages of ThinkD-FAST (cont.) Information loss: it may discard inserted edges even when memory is not full arrive - time : - estimate : count test - change: Yes store / delete

No (, ) ( , ) (, ) ( , ) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 32/46 Introduction Problem Algorithm Experiments Conclusion ThinkD-ACC: Even Better! Random pairing [RLH08] instead of Bernoulli trials Advantages: no parameter less information loss: utilizes memory as fully as possible more accurate estimation still unbiased Disadvantages: complicated slower than ThinkD-FAST Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin)

33/46 Introduction Problem Algorithm Experiments Conclusion Scalability of ThinkD Let be the size of memory For processing changes in the input stream, [ Theorem 2 ] ThinkD-ACC takes ( ) linear in data size [ Theorem 3 ] ThinkD-FAST with takes ( ) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 34/46 Introduction

Problem Algorithm Experiments Conclusion Advantages of ThinkD Fast and Accurate: outperforming competitors Scalable: linear data scalability (Theorems 2 & 3) Theoretically Sound: unbiased estimates (Theorem 1) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 35/46 Road Map Problem Definition Proposed Method: ThinkD Experiments << Conclusions Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 36/46 Introduction Problem Algorithm Experiments Conclusion Experimental Settings Competitors: Triest-FD [DERU17] & ESD [HS17] state-of-the-art algorithms for triangle counting in fully-dynamic graph streams Implementations: Datasets: insertions (edges in graphs) + deletions (random 20%) ER Synthetic (100B edges) Social Networks (1.8B+ edges, ) Citation Web Trust (16M+) (6M+) (0.7M+) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin)

37/46 Introduction Problem Algorithm Experiments Conclusion EXP1. Bias Analysis [THM1] Does ThinkD give unbiased estimates? True Count ThinkD-ACC ThinkD-FAST Triest-FD - memory budget of the edges - dataset: - #repeats: 10,000 Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 38/46 Introduction

Problem Algorithm Experiments Conclusion EXP2. Variance Analysis Does ThinkD maintain estimates with small variance? Triest-FD ThinkD-FAST ThinkD-ACC True Count Number of Processed Changes - memory budget of the edges - dataset: - #repeats: 10,000 Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 39/46 Introduction Problem

Algorithm Experiments Conclusion EXP3. Scalability [THM 2 & 3] Does ThinkD scale linearly with the size of the input stream? (THM 2 & 3) ThinkD-ACC ThinkD-FAST Linear scalability (slope=1) Number of Changes - dataset: ER - memory budget fixed to (i.e., = size / ) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 40/46 Introduction Problem Algorithm

Experiments Conclusion Advantages of ThinkD Fast and Accurate: outperforming competitors Scalable: linear data scalability (Theorems 2 & 3) Theoretically Sound: unbiased estimates (Theorem 1) Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 41/46 Introduction Problem Algorithm Experiments Conclusion Estimation Error (ratio) EXP4. Space & Accuracy Is ThinkD more accurate than its best competitors? Triest-FD ThinkD-FAST

ESD ThinkD -ACC Memory budget (ratio) - dataset: - # repeats: 1000 Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 42/46 Introduction Problem Algorithm Experiments Conclusion Estimation Error (ratio) EXP5. Speed & Accuracy Is ThinkD faster than its best competitors? ThinkD-ACC ESD Triest-FD

ThinkD -FAST Running time (Sec) - dataset: - # repeats: 1000 Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 43/46 Introduction Problem Algorithm Experiments Conclusion Advantages of ThinkD Fast and Accurate: outperforming competitors Scalable: linear data scalability Theoretically Sound: unbiased estimates Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin)

44/46 Road Map Problem Definition Proposed Method: ThinkD Experiments Conclusions << Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 45/46 Introduction Problem Algorithm Experiments Conclusion Conclusions We propose ThinkD accurately estimates the count of triangles in large and fully-dynamic graphs Fast and Accurate: outperforming competitors Scalable: linear data scalability Theoretically Sound: unbiased estimates Download

ThinkD Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 46/46 Introduction Problem Algorithm Experiments Conclusion References [RLH08] Rainer Gemulla et al., Maintaining bounded-size sample synopses of evolving datasets. The VLDB Journal 2008 [KMF11] U Kang et al., Spectral analysis for billion-scale graphs: Discoveries and implementation. PAKDD 2011 [DERU17] Lorenzo De Stefani et al., TRIST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size. TKDD 2017 [Shi17] Kijung Shin, WRS: Waiting Room Sampling for Accurate Triangle Counting in Real Graph Streams, ICDM 2017 [HS17] Han, Guyue, and Harish Sethu, "Edge sample and discard: A new algorithm for counting triangles in large dynamic graphs." ASONAM 2017 [LJK18] Yongsub Lim et al., Memory-efficient and Accurate Sampling for Counting Local Triangles in Graph Streams: From Simple to Multigraphs, TKDD 2018

Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 47/46 Backup Slides Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 48/46 Introduction Problem Algorithm Experiments Conclusion Proof of Unbiasedness Proof Sketch: Prob. that each added triangle is observed Prob. that each removed triangle is observed Thus, Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin) 49/46

Recently Viewed Presentations

  • Circles and Parabolas - cpb-us-e1.wpmucdn.com

    Circles and Parabolas - cpb-us-e1.wpmucdn.com

    Parabolas. DEFINITION. A parabola is the locus of points such that every point is equidistant from a fixed point (called the focus) and a fixed line (called the directrix).
  • OVERVIEW OF SOLAR PV SYSTEMS FIREFIGHTER SAFETY &

    OVERVIEW OF SOLAR PV SYSTEMS FIREFIGHTER SAFETY &

    Purpose. The Purpose of this presentation is to provide awareness to the Fire Service Community as to how to safely operate around SOLAR PV Systems encountered on the fireground and to provide info on post fire investigation considerations.
  • One-Way Analysis of Covariance - Kent State University

    One-Way Analysis of Covariance - Kent State University

    One-Way Analysis of Covariance One-Way ANCOVA ANCOVA Allows you to compare mean differences in 1 or more groups with 2+ levels (just like a regular ANOVA), while removing variance from a 3rd variable What does this mean?
  • PEAKS: De Novo Sequencing using MS/MS spectra Bin

    PEAKS: De Novo Sequencing using MS/MS spectra Bin

    Condition one says the y-ion bQ is in between the b-ions R and Ra. Condition two is the other way around, the b-ion Ra is in between the y-ions Q and bQ. If you look at this figure, LGE, LVR...
  • Cincinnati United Premier

    Cincinnati United Premier

    Cincinnati United Sycamore/Mason U8-U18 Boys and Girls "Bringing the Best Together!" #1 Club in Cincinnati for Player Development Mission of CU The goal of Cincinnati United Soccer Club is to provide higher level of training and competition in community based...
  • Introduction to Structured Query Language (SQL)

    Introduction to Structured Query Language (SQL)

    SWEN 250 - Personal Software Engineering . Data. Difference between information and data? Data = raw facts. Information = processed data that is presented in a useful form. The . success of a software system. is usually judged by the...
  • Sellinger Applied Portfolio Report

    Sellinger Applied Portfolio Report

    Gilead Sciences, Inc is a biopharmaceutical company that discovers, develops, and commercializes medicines. They specialize in areas concerning HIV/AIDS, liver diseases (hepatitis B &C) and respiratory diseases. Gilead Sciences, Inc. has collaborations with BMS, GSK, Janssen, and Japan Tobacco to...
  • Spark vs R Performance Tradeoff - Computer Science at ...

    Spark vs R Performance Tradeoff - Computer Science at ...

    Possible Roles in IT. Program Office. Six Sigma. IT Security. Business Solutions (ERP) SAP ERP. Application Dev. Data Sciences. Infrastructure Services. As we look at the 'backend' of IT you think of servers that run applications, the disk space needed...