Sum of Squares Optimization: Scalability Improvements and Applications to Difference of Convex Programming
Georgina Hall, Princeton ORFE
Joint work with Amir Ali Ahmadi, Princeton ORFE

Nonnegative polynomials
A polynomial p(x) is nonnegative if p(x) >= 0 for all x in R^n.

Example: is the polynomial p(x) = x^4 - 5x^2 + 10 nonnegative?

Optimizing over nonnegative polynomials (1/3)
We are interested in more than checking nonnegativity of a given polynomial. We consider problems with a linear objective and affine constraints in the coefficients of p (e.g., the sum of the coefficients equals 1), where the decision variables

are the coefficients of the polynomial p, and p is constrained to be nonnegative. Why would we be interested in problems of this type?

Optimizing over nonnegative polynomials (2/3)
Optimization: polynomial optimization, e.g., the optimal power flow problem,

combinatorial optimization problems, economics and game theory, and sensor network localization.

Optimizing over nonnegative polynomials (3/3)
Controls: automated search for Lyapunov

functions for dynamical systems; software verification. Statistics: convex regression.

Imposing nonnegativity (1/3)
Is a given polynomial nonnegative? This is NP-hard to decide already for polynomials of degree 4. What if p can instead be written as a sum of squares (sos)?
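If the example polynomial on the earlier slide is p(x) = x^4 - 5x^2 + 10 (my reading of the garbled slide, an assumption), completing the square answers the question and already exhibits a sum of squares certificate:

```python
import numpy as np

# Assumed reading of the slide's polynomial: p(x) = x^4 - 5x^2 + 10.
# Completing the square gives p(x) = (x^2 - 5/2)^2 + 15/4, a sum of
# squares, which certifies nonnegativity everywhere.
p = lambda x: x**4 - 5 * x**2 + 10
cert = lambda x: (x**2 - 2.5) ** 2 + 3.75

xs = np.linspace(-10.0, 10.0, 1001)
assert np.allclose(p(xs), cert(xs))   # same polynomial
assert np.all(p(xs) > 0)              # in fact p(x) >= 15/4 everywhere
```

An sos certificate like this is exactly what the SDP-based machinery below searches for automatically.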

Imposing nonnegativity (2/3)
A polynomial p of degree 2d is sos if and only if there exists a positive semidefinite matrix Q such that p(x) = z(x)^T Q z(x), where z(x) is the vector of monomials of degree up to d.

Imposing nonnegativity (3/3)
Initial optimization problem: a linear objective with affine constraints and a nonnegativity constraint on p (intractable). Sum of squares relaxation: replace the nonnegativity constraint by the condition that p be sos.
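The Gram-matrix characterization above can be checked by hand on a small instance. A minimal sketch with the toy polynomial p(x) = x^4 + 2x^2 + 1 = (x^2 + 1)^2 and monomial vector z = (1, x, x^2) (my own example, not from the slides):

```python
import numpy as np

# Gram matrix for p(x) = x^4 + 2x^2 + 1 with z(x) = (1, x, x^2):
# p is sos iff p(x) = z(x)^T Q z(x) for some PSD Q.
Q = np.array([[1.0, 0.0, 1.0],
              [0.0, 0.0, 0.0],
              [1.0, 0.0, 1.0]])

# PSD check: all eigenvalues nonnegative.
assert np.all(np.linalg.eigvalsh(Q) >= -1e-9)

# Expand z^T Q z: the (i, j) entry contributes to the coefficient of x^(i+j).
coeffs = np.zeros(5)            # coefficients of 1, x, ..., x^4
for i in range(3):
    for j in range(3):
        coeffs[i + j] += Q[i, j]
print(coeffs)                   # [1. 0. 2. 0. 1.]  ->  p(x) = 1 + 2x^2 + x^4
```

In an actual sos program, Q is a decision variable and the coefficient-matching equations become the affine constraints of an SDP.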

This has an equivalent semidefinite programming (SDP) formulation. But the size of the Gram matrix Q grows quickly: z(x) has binomial(n + d, d) entries for a polynomial in n variables of degree 2d.

This talk
Recent efforts to make sos more scalable by avoiding SDP. Using sum of squares to optimize over convex functions.

Alternatives to sum of squares: dsos and sdsos
Sum of squares (sos)

is checked with an SDP: Q must lie in the PSD cone. Restricting Q to smaller, cheaper cones (DD cone ⊆ SDD cone ⊆ PSD cone) gives more tractable certificates.
Diagonally dominant sum of squares (dsos): Q diagonally dominant (dd); checkable with an LP.

Scaled diagonally dominant sum of squares (sdsos): Q scaled diagonally dominant (sdd), i.e., DQD is dd for some diagonal matrix D with positive entries; checkable with an SOCP. [Ahmadi, Majumdar]

Alternatives to sum of squares: dsos and sdsos
The initial optimization problem is intractable; dsos and sdsos trade off bound quality against scalability.
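The matrix conditions behind dsos and sdsos are easy to check directly. A small sketch (matrices of my own choosing) showing a matrix that is PSD and sdd but not dd, which is why the sdd cone sits strictly between the dd and PSD cones:

```python
import numpy as np

def is_dd(Q):
    """Diagonally dominant: Q_ii >= sum_{j != i} |Q_ij| (LP-checkable)."""
    Q = np.asarray(Q, dtype=float)
    off = np.abs(Q).sum(axis=1) - np.abs(np.diag(Q))
    return bool(np.all(np.diag(Q) >= off))

A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.array([[1.0, 2.0], [2.0, 5.0]])

assert is_dd(A)                                # dd, hence PSD (Gershgorin)
assert not is_dd(B)                            # not dd ...
assert np.all(np.linalg.eigvalsh(B) >= 0)      # ... but still PSD

# B is sdd: with the scaling D = diag(2, 1), the matrix D B D is dd.
D = np.diag([2.0, 1.0])
assert is_dd(D @ B @ D)
```

Diagonal dominance is a set of linear inequalities in the entries of Q, which is what makes the dsos relaxation an LP.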

Example: bounds for a parametric family of polynomials (figure omitted).

Improvements on dsos and sdsos
Replacing sos polynomials by dsos/sdsos polynomials gives fast bounds (+), but the bounds are not always of the same quality as sos (-).

Idea: iteratively construct a sequence of improving LPs/SOCPs. Initialization: start with the dsos/sdsos polynomials. Method: a Cholesky change of basis.

Cholesky change of basis (1/3)
A matrix can be psd but not dd, and yet dd in the right basis. Goal: iteratively improve the basis.
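A minimal sketch of the basis-change mechanics (my own small instance): given the previous round's Gram matrix Q, factor Q = U^T U and search next over matrices of the form U^T M U with M dd; choosing M = I recovers Q, so the previous optimum remains feasible in the new basis and the bound can only improve.

```python
import numpy as np

# Sketch of the Cholesky change of basis on a toy matrix.
Q = np.array([[1.0, 2.0], [2.0, 5.0]])   # PSD but not dd
U = np.linalg.cholesky(Q).T              # upper-triangular factor, Q = U^T U

M = np.eye(2)                            # the identity is dd
assert np.allclose(U.T @ M @ U, Q)       # old optimum stays feasible in new basis
```

In the actual algorithm, M (not Q) is the LP/SOCP decision variable at each iteration, so every round stays an LP or SOCP.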

Cholesky change of basis (2/3)
Start from the sos problem and initialize with the dd (or sdd) cone. Step 1: solve the LP/SOCP and obtain an optimal Gram matrix Q_k. Step 2: compute the Cholesky factorization Q_k = U_k^T U_k and replace the dd constraint on Q by the constraint Q = U_k^T M U_k with M dd (the new basis). Repeat.

One iteration of this method on a parametric family of polynomials (figure omitted).

Cholesky change of basis (3/3)
Example: minimizing a degree-4 polynomial in 4 variables; the iterations improve the lower bound on the optimal value (figure omitted).
Theorem: Under mild assumptions, this algorithm converges, i.e., the optimal value/solution of the sequence of LPs/SOCPs converges

to the optimal value/solution of the SDP.

This talk
Recent efforts to make sos more scalable by avoiding SDP. Using sum of squares to optimize over convex functions.

Link between nonnegativity and convexity
Optimizing over nonnegative polynomials: relax nonnegativity to sos (an SDP).

Link with convexity: a polynomial p is convex if and only if y^T ∇²p(x) y >= 0 for all x and y, i.e., if y^T ∇²p(x) y is a nonnegative polynomial in (x, y). Relaxing this nonnegativity to sos gives sos-convexity, which is checkable with an SDP.
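As a small worked instance (my own example): p(x1, x2) = x1^4 + x2^4 is sos-convex, because the quadratic form y^T ∇²p(x) y = 12 x1^2 y1^2 + 12 x2^2 y2^2 is itself a sum of squares in (x, y).

```python
import numpy as np

# y^T (Hessian p)(x) y for p(x1, x2) = x1^4 + x2^4, and its sos certificate
# (2*sqrt(3)*x1*y1)^2 + (2*sqrt(3)*x2*y2)^2.
def quad_form(x1, x2, y1, y2):
    H = np.array([[12 * x1**2, 0.0], [0.0, 12 * x2**2]])
    y = np.array([y1, y2])
    return y @ H @ y

def sos_cert(x1, x2, y1, y2):
    s = 2 * np.sqrt(3.0)
    return (s * x1 * y1) ** 2 + (s * x2 * y2) ** 2

rng = np.random.default_rng(0)
for x1, x2, y1, y2 in rng.normal(size=(100, 4)):
    assert np.isclose(quad_form(x1, x2, y1, y2), sos_cert(x1, x2, y1, y2))
    assert quad_form(x1, x2, y1, y2) >= 0   # Hessian PSD everywhere: p convex
```

Not every convex polynomial is sos-convex, which is why sos-convexity is a relaxation rather than an exact characterization.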

Application 1: 3D geometry problems

3D point cloud containment (1/3)
Goal: contain a set of 3D points with a convex set of minimum volume. Applications: virtual and augmented reality, robotics, computer graphics. Idea: parametrize the convex set as the sublevel set of a convex polynomial, which yields an sos-convex formulation.

[In collaboration with Vikas Sindhwani and Ameesh Makadia, Google, NYC]

3D point cloud containment (2/3)
The Euclidean distance between two such sets can be computed exactly using SDP: it is a polynomial optimization problem whose objective and constraints are sos-convex, and the solution can be

computed exactly via SDP using the first level of Lasserre's hierarchy.

3D point cloud containment (3/3)
Convexity can be controlled with a parameter while staying sos-convex: at one extreme we recover the previous problem with convex sets; as the parameter varies, the shape can become less and less convex.

Application 2: Difference of convex programming
[INFORMS Computing Society Best Student Paper Prize 2016]

Difference of Convex (DC) programming
Problems of the form min f0(x) s.t. fi(x) <= 0, where each fi is a difference of two convex functions. Applications: machine learning (sparse PCA, kernel selection, feature selection in SVM). [Hiriart-Urruty, 1985]

[Tuy, 1995]

Difference of convex (dc) decomposition
Given a polynomial f, find convex polynomials g and h such that f = g - h. Questions: Does such a decomposition always exist? Can it be obtained efficiently? Is it unique?
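For quadratics, a dc decomposition is explicit (my own illustration, not from the slides): split the eigen-decomposition of the defining matrix into its positive and negative parts, giving a difference of two convex quadratic forms.

```python
import numpy as np

# dc decomposition of an indefinite quadratic form x^T A x:
# A = A_plus - A_minus with both parts PSD.
A = np.array([[1.0, 3.0], [3.0, -2.0]])
w, V = np.linalg.eigh(A)
A_plus  = V @ np.diag(np.maximum(w, 0)) @ V.T    # PSD -> convex part g
A_minus = V @ np.diag(np.maximum(-w, 0)) @ V.T   # PSD -> convex part h

assert np.allclose(A, A_plus - A_minus)
assert np.all(np.linalg.eigvalsh(A_plus) >= -1e-9)
assert np.all(np.linalg.eigvalsh(A_minus) >= -1e-9)
```

For higher-degree polynomials no such closed-form split exists, which is what motivates the existence and algorithmic questions below.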

Existence of dc decomposition (1/3)
Theorem: Any polynomial can be written as the difference of two sos-convex polynomials.
Corollary: Any polynomial can be written as the difference of two convex polynomials.

Existence of dc decomposition (2/3)
Lemma: Let K be a full-dimensional cone in a vector space E. Then any p in E can be written as p = q1 - q2 with q1, q2 in K.
Proof sketch:

Take q in the interior of K; then for beta > 0 small enough, q + beta*p lies in K.

Since K is a cone, p = (1/beta)(q + beta*p) - (1/beta)q expresses p as a difference of two elements of K.

Existence of dc decomposition (3/3)
Here E = {polynomials of degree 2d in n variables} and K = {sos-convex polynomials of degree 2d in n variables}. It remains to show that K is full dimensional: an explicit sos-convex polynomial can be shown to lie in the interior of K.
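A matrix analogue of the lemma's argument (my own illustration, under the assumption that the proof shifts by an interior point): the identity lies in the interior of the PSD cone, so any symmetric A equals alpha*I - (alpha*I - A) with both terms PSD once alpha is at least the largest eigenvalue of A.

```python
import numpy as np

# Shift-by-an-interior-point construction in the PSD cone.
A = np.array([[0.0, 4.0], [4.0, -3.0]])          # indefinite symmetric matrix
alpha = max(np.linalg.eigvalsh(A).max(), 0.0)    # shift size

assert np.all(np.linalg.eigvalsh(alpha * np.eye(2)) >= 0)          # first term PSD
assert np.all(np.linalg.eigvalsh(alpha * np.eye(2) - A) >= -1e-9)  # second term PSD
```

The polynomial version replaces the PSD cone by the cone of sos-convex polynomials and the identity by an explicit interior point.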

This also shows that a decomposition can be obtained efficiently: optimizing over sos-convex polynomials is an SDP. In fact, we show that a decomposition can also be found via LP and SOCP (not covered here).

Uniqueness of dc decomposition
Dc decomposition: given a polynomial f, find convex polynomials g and h such that f = g - h.

Questions: Does such a decomposition always exist? Yes. Can it be obtained efficiently? Yes, through sos-convexity. Is it unique? No: from an initial decomposition, alternative decompositions can be obtained, e.g., by adding the same convex polynomial to both g and h.

Which decomposition is best?

Convex-Concave Procedure (CCP)
A heuristic for minimizing dc programming problems. Idea: starting from an initial point x0, convexify f = g - h by linearizing h,

solve the resulting convex subproblem, and take x_{k+1} to be its solution: x_{k+1} = argmin_x g(x) - h(x_k) - ∇h(x_k)^T (x - x_k). The objective is convex because g is convex and the linearization of h is affine.

Convex-Concave Procedure (CCP)
Toy example: minimize f(x) = g(x) - h(x) with g and h convex. Convexify f around the initial point, minimize the convexified model to obtain the next iterate, and reiterate (figure omitted).
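A runnable CCP iteration on a toy dc program of my own choosing (not the slides' example): minimize f(x) = g(x) - h(x) with g(x) = x^4 and h(x) = 2x^2, whose global minima are at x = ±1 with value -1.

```python
# Each CCP step linearizes h at x_k and solves the convex subproblem
#   minimize_x  x^4 - h'(x_k) * x      (constants dropped),
# whose stationarity condition 4x^3 = h'(x_k) = 4*x_k gives x = cbrt(x_k).
def cbrt(v):
    """Real cube root."""
    return v ** (1 / 3) if v >= 0 else -((-v) ** (1 / 3))

def ccp(x0, iters=80):
    x = x0
    for _ in range(iters):
        x = cbrt(x)          # closed-form solution of the convex subproblem
    return x

x_star = ccp(0.5)
f = lambda x: x**4 - 2 * x**2
assert abs(x_star - 1.0) < 1e-9       # converges to the minimizer x = 1
assert abs(f(x_star) + 1.0) < 1e-9    # optimal value -1
```

Note the run converges to whichever local minimizer shares the sign of the starting point; CCP is a descent heuristic, not a global method.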

Picking the best decomposition for CCP
Algorithm: linearize h around a point x_k to obtain a convexified version of f. Idea: pick h so that it is as close as possible to being affine around x_k. Mathematical translation: minimize the curvature of h at x_k, either the worst-case curvature

or the average curvature, in both cases subject to g and h being convex (with f = g - h).

Undominated decompositions (1/2)
Definition: f = g - h is an undominated decomposition of f if no other decomposition of f can be obtained by subtracting a (nonaffine) convex function from both g and h.

Example (figure omitted): convexifying the same function around the same point under two different decompositions can give very different convex models; one decomposition can be dominated by the other. If one decomposition dominates another, then the next iterate in CCP obtained using it always beats the one obtained using the dominated decomposition.

Undominated decompositions (2/2)
Theorem: Given a polynomial f, consider minimizing the curvature of h over decompositions f = g - h with g and h convex.

Any optimal solution is an undominated dc decomposition of f (and an optimal solution always exists).
Theorem: If f has degree 4, it is strongly NP-hard to solve this problem. Idea: replace the convexity constraints by sos-convexity.

Comparing different decompositions (1/2)
Solving a dc problem with CCP: decompose f, run CCP for 4 minutes, and compare objective values. Feasibility decomposition: any g, h with f = g - h and g, h sos-convex.

Undominated decomposition: minimize the curvature objective subject to g and h sos-convex.

Comparing different decompositions (2/2)
Objective values for the feasibility and undominated decompositions, averaged over 30 instances (table omitted).

Solver: Mosek. Computer: 8GB RAM, 2.40GHz processor.
Conclusion: the performance of CCP is strongly affected by the initial decomposition.

Main messages (1/2)
Optimizing over nonnegative polynomials has many applications. Sum of squares techniques are powerful relaxations but expensive (SDP). We presented more scalable versions of sum of squares (iterative LPs and

SOCPs), obtained by iteratively moving from the DD/SDD cones toward the PSD cone via a Cholesky change of basis.

Main messages (2/2)
Imposing convexity can be done using sum of squares techniques (leading to sos-convexity).

Sos-convexity can be used for 3D geometry problems, and in DC programming to decompose a polynomial into a difference of convex polynomials. The choice of decomposition impacts the performance of CCP.

Thank you for listening. Questions?
Want to learn more? http://scholar.princeton.edu/ghall/