Reducing number of operations: The joy of algebraic ...

Reducing number of operations: The joy of algebraic ...

Reducing number of operations: The joy of algebraic transformations CS498DHP Program Optimization Number of operations and execution time Fewer number of operations does not necessarily mean shorter execution times. Because of scheduling in a parallel environment. Because of locality.

Because of communication in a parallel program. Nevertheless, although it has to be applied carefully, reducing the number of operations is one of the important optimizations. In this presentation, we discuss transformation to reduce the number of operations or reduce the length of scheduling in an idealized parallel environment where communication costs are zero.

Scheduling + Consider the expression tree: It can be shortened by applying Associativity and commutativity: [a+h+b*(c+g+d*e*f) ] or Associativity, commutativity and distributivity: [a+h+b*c+b*g+b*d*e*f].

The second expression is the sortest of the three. This means that with enough resources the third expression is the fastest although is has the most operations. + h

a * + b g +

* c * d f e Locality

Consider: do i=1.n c(i) = a(i)+b(i)+a(i)/b(i) end do do i=1,n x(i) = (a(i)+b(i))*t(i)+a(i)/b(i) end do

do i=1,n d(i) = a(i)/b(i) c(i) = a(i)+b(i)+d(i) end do do i=1,n x(i) = (a(i)+b(i))*t(i)+d(i) end do The sequence on the right executes fewer operations, but, if n is large enough, it

also incurs in more cache misses. (We assume that t is computed between the two loops so that they cannot be fused.) Communication in parallel programs Consider: cobegin

do i=1,n a(i) = .. end do send a(1:n) // receive a(1:n)

coend cobegin do i=1,n a(i) = .. end do

// do i=1,n a(i) = .. end do coend The sequence on the right executes more operation s, but it would execute faster if the send operation is expensive.

Approaches to reducing cost of computation Eliminate (syntactically) redundant computations. Apply algebraic transformations to reduce the number of operations. Decompose sequential computations for parallel execution. Apply algebraic transformations to reduce the

height of expressions trees and thus reduce execution time in a parallel environment. Elimination of redundant computations Many of the transformations were discussed in the context of compiler transformations. Common subexpression elimination Loop invariant removal

Elimination of redundant counters Loop unrolling (not discussed, but should have). It eliminates bookkeeping operations. However, compilers will not eliminate all redundant computations. Here is an example where user intervention is needed: The following sequence do i=1,n

s = a(i)+s end do do i=1,n-1 t = a(i)+t end do t May be replaced by do i=1,n-1

t = a(i)+t end do s=t+a(n) t This transformation is not usually done by compilers. 2. Another example, from C, is the loop

for (i = 0; i < n; i++) { for (j = 0; j < n; j++) { a[i,j]=0; } } Which, if a is n n, can be transformed into the loop below that has fewer bookkeeping operations.

b=a; for (i = 0; i < n*n; i++) { *b=0; b++; } Applying algebraic transformations to reduce the number of operations For example, the expressions a*(b*c)+ (b*a)*d+a*e can be transformed into (a*b)*(c+d)

+a*e by distributivity and then by associativity and distributivity into a*(b*(c+d)+e). Notice that associativity has to be applied with care. For example, suppose we are operating on floating point values and that x is very much larger than y and z=-x. Then (y+x)+z may give 0 as a result, while y+(x+z) gives y as an answer. The application of algebraic rules can be very sophisticated. Consider the

computation of xn. A nave implementation would require n-1 multiplications. However, if we represent n in binary as n=b0+2(b1+2(b2 + )) and notice that xn=xb0 (xb1+2(b2 + ))2, the number of multiplications can be reduced to O(log n). function power(x,n) (assume n>0) if n==1 then return x

if n%2==1 then return x*power(x,n-1) else x=power(x,n/2); return x*x Horners rule A polynomial A(x) = a0 + a1x + a2x + a3x + ... may be written as A(x) = a0 + x(a1 + x(a2 + x(a3 + ...))). As a result, a polynomial may be evaluated at a

point x', that is A(x') computed, in (n) time using Horner's rule. That is, repeated multiplications and additions, rather than the naive methods of raising x to powers, multiplying by the coefficient, and accumulating. Conventional matrix multiplication Asymptotic complexity: 2n3 operations Each recursion step (blocked version): 8 multiplications, 4 additions

Strassens Algorithm Asymptotic complexity: O(nlog27) = O(n2.8) operations Each recursion step: 7 multiplications, 18 additions/subtractions Asymptotic complexity is solution of T(n)=7T(n/2)+18(n/2)2 Winograd Asymptotic complexity: O(n2.8..)operations Each recursion step: 7 multiplications, 15 additions/subtractions

Parallel matrix multiplication Parallel matrix multiplication can be accomplished without redundant operations. First observe that the time to compute a sum of n elements, given enough resources, is log 2 n . Time:

Time: With sufficient replication and computational resources matrix multiplication can take just one multiplication step and log 2 n additions Copying can also be done in logarithmic steps Parallelism and redundancy Algebra rules can be applied to reduce

tree height. In some cases, the height of the tree is reduced at the expense of an increase in the number of operations Parallel Prefix Redundancy in parallel sorting. Sorting networks.

Comparator (2-sorter) inputs outputs x min(x, y)

y max(x, y) Comparison Network n/2 comparisons per stage

1 0 0 0 0

1 0 0 1 0

1 1 0 1 1

1 d stages Sorting Networks inputs outputs 1

0 0 1 0 0 1 1 Sorting Network

0 0 0 0 1 1 1 1

sorted Insertion Sort Network inputs outputs depth 2n 3 comparator stages

comparators Odd-even transposition sort O(n) O(n2) Bubblesort

O(n) O(n2) Bitonic sort O(log(n)2) O(nlog(n)2)

Odd-even mergesort O(log(n)2) O(nlog(n)2) Shellsort O(log(n)2)

O(nlog(n)2)

Recently Viewed Presentations

  • Youth In and Out of the Labour Market: The Impact of the ...

    Youth In and Out of the Labour Market: The Impact of the ...

    Italy Malta Portugal Spain Denmark Finland Netherlands Sweden Bulgaria Czech Republic Hungary Poland Romania Slovakia Slovenia Estonia Latvia Lithuania Anglo Continent Mediterranean Scandinavia Central Europe Baltics-0.1-0.72918433101578262 0.72853578101293137-4.5 2.4878490814729242 0.22408558623680144 0.25989928902549381-1.335956731892268 0 ...
  • Professional Development Quality Mark Platinum Awarded Schools Alperton

    Professional Development Quality Mark Platinum Awarded Schools Alperton

    Professional Development Quality Mark Platinum Awarded Schools Alperton Community School, Brent Burntwood School, Wandsworth Glebe Primary School, Harrow
  • Chapter 1 Linear Equations and Graphs

    Chapter 1 Linear Equations and Graphs

    Chapter 6 Linear Programming: The Simplex Method Section 4 Maximization and Minimization with Problem Constraints * * * * * * * * * Introduction to the Big M Method In this section, we will present a generalized version of...
  • Research Poster 42 x 90 - F - NeuroVisual Medicine Institute

    Research Poster 42 x 90 - F - NeuroVisual Medicine Institute

    Motion sickness is common in these patients. It is likely caused by asymmetric vertical optokinetic stimulation/ nystagmus, which is asymmetric in both time and angle. Vertical optokinetic nystagmus has been shown to be one of the most potent stimuli for...
  • Structural Testing - Philadelphia University

    Structural Testing - Philadelphia University

    Structural (White-Box) Testing Software Testing Module Dr. Samer Hanna Outline Path Testing Control Flow Graph DD-Path Fraph Questions Path testing Structural testing method Based on the source code / pseudocode of the program or the system, and NOT on its...
  • Absolute Time - bath.k12.ky.us

    Absolute Time - bath.k12.ky.us

    Absolute Time Mrs. Wright 8th Grade Science Bath County Middle School Absolute age is the age, in years, of a rock or other object. Geologists determine absolute ages by using properties of the atoms that make up materials.
  • The First Line of Response: Student Disclosure of Sexual Assault

    The First Line of Response: Student Disclosure of Sexual Assault

    The First Line of Response: Student Disclosure of Sexual Misconduct NEW FACULTY ORIENTATION AUGUST 18, 2015 The role of faculty Awareness Knowledge Culture Title IX of the Education Amendments of 1972 37 Powerful Words….
  • Lecture 4a - chem.ucla.edu

    Lecture 4a - chem.ucla.edu

    The gas chromatography column consists of solid support that is covered with a high-boiling liquid in a thin capillary tube. ... Detectors II. Mass spectrometer. Spiking: the sample is run with and without the addition of a spike, which is...