Self-Splitting Tree Search in a Parallel Environment
Matteo Fischetti, Michele Monaci, Domenico Salvagnin
University of Padova
IFORS 2014, Barcelona

Parallel computation

Modern PCs and notebooks have several processing units (cores) available.

Running a sequential code on 8 cores uses only 12% of the available power, whereas one would of course aim at using 100% of it.

Distributed computation

Affordable servers offer 24+ quad-core units (blades).

Grids of 1000+ computers are available worldwide. No doubt, parallel computing is becoming a must for CPU-intensive applications, including optimization.

Parallelization of a sequential code

We are given a deterministic sequential source code based on a divide-and-conquer algorithm (e.g., tree search).

We want to slightly modify it to exploit a given set of K (say) processors, called workers.

IDEA: just run the same sequential code K times, once on each of the K workers, but modify the source code so as to skip some nodes (which will be processed instead by one of the other workers). The workload automatically splits itself among the workers.
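The skip-some-nodes idea can be sketched in a few lines. This is a minimal Python illustration, not the authors' code: the breadth-first queue, the sampling threshold, and the hash-based deterministic rule (chosen so that each open node at the split point survives in exactly one of the K runs) are our assumptions.

```python
import hashlib
from collections import deque

def selfsplit_search(root, branch, is_leaf, k, K, sample_limit=64):
    """Sequential tree search, SelfSplit-style: the identical code runs on
    all K workers; the run with input k processes only 'its' share of the
    tree.  `branch` maps a node to its children, `is_leaf` stops descent.
    (Illustrative sketch only; names and the rule are our assumptions.)"""

    def color(path):
        # Deterministic rule: every worker computes the same color for the
        # same node, so each frontier node is kept by exactly one worker.
        digest = hashlib.md5(repr(path).encode()).hexdigest()
        return int(digest, 16) % K + 1

    results = []
    queue = deque([(root, ())])          # (node, path-from-root)
    sampling = True
    while queue:
        if sampling and len(queue) >= sample_limit:
            # Enough open nodes generated: split the frontier.  Worker k
            # keeps only its own nodes; the rest are killed here and are
            # processed by the other workers' runs instead.
            queue = deque(item for item in queue if color(item[1]) == k)
            sampling = False
            continue
        node, path = queue.popleft()
        if is_leaf(node):
            results.append(node)
            continue
        for i, child in enumerate(branch(node)):
            queue.append((child, path + (i,)))
    return results

# Toy usage: K = 4 workers enumerating all binary strings of length 6.
# The four result lists are disjoint and together cover all 64 leaves.
parts = [selfsplit_search('', lambda s: [s + '0', s + '1'],
                          lambda s: len(s) == 6, k, 4, sample_limit=8)
         for k in (1, 2, 3, 4)]
assert sum(len(p) for p in parts) == 64
```

Because the coloring is deterministic, the K runs need no communication at all: they expand the tree identically during the sampling phase and then diverge, each on a disjoint part of the frontier.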

The basic idea

Assume you have K workers available and a sequential tree-search code. In the source code, locate the place where tree nodes are popped out of the node queue, and add the following statements: when a sufficient number of nodes has been generated, just kill some nodes according to a rule that depends on an additional integer input parameter k.

Run the resulting sequential code on the K workers, with input k = 1, 2, ..., K.

Naïve rule: kill nodes with a certain probability (using k as random seed); this is only a heuristic, as the same node can be killed by all K workers.

SelfSplit: use a rule that guarantees each node is killed in all but one of the K runs.

Vanilla implementation

Paused-node implementation

Related approaches

The idea of parallelizing without communication is not new:

Laursen, Per S. 1994. Can parallel branch and bound without communication be effective? SIAM Journal on Optimization 4(2), 288-296.

but it was apparently ignored by the Mathematical Programming community.

Recent work for Constraint Programming (CP):

Regin, Jean-Charles, Mohamed Rezgui, Arnaud Malapert. 2013. Embarrassingly parallel search. In Christian Schulte, ed., Principles and Practice of Constraint Programming, Lecture Notes in Computer Science, vol. 8124. Springer Berlin Heidelberg, 596-610.

Moisan, Thierry, Jonathan Gaudreault, Claude-Guy Quimper. 2013. Parallel discrepancy-based search. In Christian Schulte, ed., Principles and Practice of Constraint Programming, Lecture Notes in Computer Science, vol. 8124. Springer Berlin Heidelberg, 30-46.

Our hashtags

SelfSplit is #easy to implement.
SelfSplit can be the #firstoption to try.
SelfSplit can in fact be the #onlyoption when complicated (industrial) codes need to be parallelized.
#justforget about modifying the sources heavily.

SelfSplit can be rather effective indeed #itworks.

SelfSplit for CP #itworks

Pure B&B codes #stillworkswell

Sequential code to parallelize: an old FORTRAN code of 3000+ lines from M. Fischetti, P. Toth, An Additive Bounding Procedure for the Asymmetric Travelling Salesman Problem, Mathematical Programming A 53, 173-197, 1992.

Parametrized AP relaxation (no LP)
Branching on subtours
Best-bound first

Vanilla SelfSplit: just 8 new lines added to the original sequential code.

B&Cut codes #fair

Sequential code to parallelize: a B&C FORTRAN code (10K lines) from M. Fischetti, P. Toth, A Polyhedral Approach to the Asymmetric Traveling Salesman Problem, Management Science 43(11), 1520-1536, 1997.

M. Fischetti, A. Lodi, P. Toth, Exact Methods for the Asymmetric Traveling Salesman Problem, in The Traveling Salesman Problem and its Variations, G. Gutin and A. Punnen, eds., Kluwer, 169-206, 2002.

Main features:
LP solver: CPLEX 12.5.1
Cuts: SEC, SD, DK, RANK (and pool), separated along the tree
Dynamic (Lagrangian) pricing of variables
Variable fixing, primal heuristics, etc.

MIP application (CPLEX) #notbad

Why do speedups change so much?

Empirical rule: the more sophisticated the code, the smaller the speedup #curseofbeingtoosmart. This is typically explained by the fact that the solver learns important information during the run (cuts, conflicts, etc.) that cannot be shared by the workers in a no-communication framework.

However, SelfSplit learns a lot during its sampling phase: is the loss of communication the only issue? We believe that performance variability also plays a role here. Sophisticated tree-search codes behave like chaotic systems: marginal changes modify the search path and may heavily affect performance. Maybe simpler B&B codes will be preferable when #millioncores become available?
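A toy illustration of this sensitivity (our example, not from the talk): an exact branch-and-bound for 0/1 knapsack typically explores a different number of nodes when the very same items are merely presented in a different order, while the optimal value is of course unchanged.

```python
def knapsack_bb(values, weights, capacity):
    """0/1 knapsack by depth-first branch and bound; returns the optimal
    value and the number of explored nodes.  The simple suffix-sum bound
    is a valid upper bound on any completion of a partial solution.
    (Illustrative sketch; instance and bound are our assumptions.)"""
    n = len(values)
    suffix = [0] * (n + 1)                    # suffix[i] = sum(values[i:])
    for j in range(n - 1, -1, -1):
        suffix[j] = suffix[j + 1] + values[j]

    best = 0
    nodes = 0

    def dfs(i, value, room):
        nonlocal best, nodes
        nodes += 1
        best = max(best, value)               # 'value' is always feasible
        if i == n or value + suffix[i] <= best:
            return                            # leaf, or bound prunes node
        if weights[i] <= room:
            dfs(i + 1, value + values[i], room - weights[i])
        dfs(i + 1, value, room)

    dfs(0, 0, capacity)
    return best, nodes

# Same instance, two item orders: the optimum matches, the node count need not.
vals, wts = [30, 14, 16, 9, 25], [6, 3, 4, 2, 5]
v1, n1 = knapsack_bb(vals, wts, 10)
v2, n2 = knapsack_bb(vals[::-1], wts[::-1], 10)
assert v1 == v2
```

Even this tiny, communication-free code shows the effect: a marginal change (the branching order) reshapes the search tree, which is exactly what makes speedups of sophisticated solvers hard to predict.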

Role of variability in workload split

Synthetic experiments with 10, 100, and 1000 random subtrees per worker (subtree size modeled as a random variable): unif = uniform, prt = Pareto (heavy tail).
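A sketch of what such a synthetic experiment might look like (the worker count, number of runs, and Pareto shape parameter below are our assumptions, not the authors' exact setup): each worker's load is the sum of its subtree sizes, and the speedup is total work divided by the maximum worker load.

```python
import random

def split_speedup(K, subtrees_per_worker, draw, runs=200, seed=1):
    """Monte Carlo estimate of the parallel speedup when each of K workers
    receives `subtrees_per_worker` subtrees whose sizes are i.i.d. draws
    from `draw`.  (Sketch of the synthetic experiment; parameters assumed.)"""
    rng = random.Random(seed)
    eff = 0.0
    for _ in range(runs):
        loads = [sum(draw(rng) for _ in range(subtrees_per_worker))
                 for _ in range(K)]
        eff += sum(loads) / (K * max(loads))   # parallel efficiency in (0, 1]
    return K * eff / runs                      # average speedup over the runs

uniform = lambda rng: rng.random()             # "unif": uniform subtree sizes
pareto = lambda rng: rng.paretovariate(1.1)    # "prt": Pareto, heavy tail
```

With uniform sizes the speedup approaches K quickly as the number of subtrees per worker grows (the loads concentrate around their mean); with heavy-tailed Pareto sizes a single huge subtree can dominate one worker's load, so the speedup stays well below K.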

Thank you for your attention

SelfSplit paper available at www.dei.unipd.it/~fisch/papers
Slides (also of this talk) available at www.dei.unipd.it/~fisch/papers/slides