CS 61C: Great Ideas in Computer Architecture Performance and Floating-Point Arithmetic Instructors: Krste Asanovi & Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/fa17 02/28/2020 Fall 2017-- Lecture #17 1 Outline Defining Performance Floating-Point Standard And in Conclusion 02/28/2020 2

What is Performance? Latency (or response time or execution time) Time to complete one task Bandwidth (or throughput) Tasks completed per unit time 02/28/2020 3 Cloud Performance: Why Application Latency Matters Key figure of merit: application responsiveness Longer the delay, the fewer the user clicks, the less the user happiness, and the lower the revenue per user 02/28/2020 4 CPU Performance Factors

A program executes instructions CPU Time/Program = Clock Cycles/Program x Clock Cycle Time = Instructions/Program x Average Clock Cycles/Instruction x Clock Cycle Time 1st term called Instruction Count 2nd term abbreviated CPI for average Clock Cycles Per Instruction 3rd term is 1 / Clock rate 02/28/2020 5 Restating Performance Equation Time = Seconds Program Instructions Clock cycles Seconds Clock

= Program Instruction Cycle CPU Iron Law! 02/28/2020 6 What Affects Each Component? Instruction Count, CPI, Clock Rate Hardware or software component? Affects What? Algorithm Instruction Count, CPI Programming Language Instruction Count, CPI

Compiler Instruction Count, CPI Instruction Set Architecture Instruction Count, Clock Rate, CPI 02/28/2020 7 Peer Instruction Computer Clock frequency Clock cycles per instruction

#instructions per program RED 1.0 GHz 2 1000 GREEN 2.0 GHz 5 800 ORANGE

0.5 GHz 1.25 400 YELLOW 5.0 GHz 10 2000 Which computer has the highest performance for a given program? 02/28/2020 8 Peer Instruction System

A B Rate (Task 1) 10 20 Rate (Task 2) 20 10 Which system is faster? RED: System A GREEN: System B ORANGE: Same performance YELLOW: Unanswerable question! 02/28/2020 9 Workload and Benchmark Workload: Set of programs run on a computer

Actual collection of applications run or made from real programs to approximate such a mix Specifies both programs and relative frequencies Benchmark: Program selected for use in comparing computer performance Benchmarks form a workload Usually standardized so that many use them 02/28/2020 10 SPEC (System Performance Evaluation Cooperative) Computer Vendor cooperative for benchmarks, started in 1989 SPECCPU2006 12 Integer Programs 17 Floating-Point Programs Often turn into number where bigger is faster SPECratio: reference execution time on old reference computer divided by execution time on new computer, reported as speed-up (SPEC 2017 just released)

02/28/2020 11 SPECINT2006 on AMD Barcelona Description Interpreted string processing Block-sorting compression GNU C compiler Combinatorial optimization Go game Search gene sequence Chess game Quantum computer simulation Video compression Discrete event simulation library Games/path finding XML parsing

Instruction Count (B) CPI Clock cycle Execution time (ps) Time (s) Reference Time (s) SPECratio 2,118 0.75 400 637

9,770 15.3 2,389 1,050 0.85 1.72 400 400 817 724 9,650 8,050 11.8 11.1

336 1,658 2,783 2,176 10.0 1.09 0.80 0.96 400 400 400 400 1,345 721 890 837 9,120

10,490 9,330 12,100 6.8 14.6 10.5 14.5 1,623 3,102 1.61 0.80 400 400 1,047 993 20,720

22,130 19.8 22.3 587 1,082 1,058 2.94 1.79 2.70 400 400 400 690 773 1,143 6,250

7,020 6,900 9.1 9.1 12 6.0 Summarizing SPEC Performance Varies from 6x to 22x faster than reference computer Geometric mean of ratios: N-th root of product of N ratios Geometric Mean gives same relative answer no matter what computer is used as reference Geometric Mean for Barcelona is 11.7 02/28/2020 13 Administrivia Midterm 2 in one week!

Guerrilla Session tonight 7-9pm in Cory 293 ONE double sided cheat sheet Stay tuned for room assignments Review session: Saturday 10-12pm in Hearst Annex A1 Homework 4 released! Caches and Floating Point Due after the midterm Still good cache practice! Proj2-Part2 released Due after the midterm, but good to do before!

02/28/2020 14 Outline Defining Performance Floating-Point Standard And in Conclusion 02/28/2020 Fall 2016 Lecture #17 15 Quick Number Review Computers deal with numbers What can we represent in N bits? 2N things, and no more! They could be Unsigned integers:

0 to 2N - 1 (for N=32, 2N 1 = 4,294,967,295) Signed Integers (Twos Complement) -2(N-1) to 2(N-1) - 1 (for N=32, 2(N-1) = 2,147,483,648) 02/28/2020 16 Other Numbers 1. Very large numbers? (seconds/millennium) 31,556,926,00010 (3.155692610 x 1010) 2. Very small numbers? (Bohr radius) 0.000000000052917710m (5.2917710 x 10-11) 3. Numbers with both integer & fractional parts? 1.5 First consider #3 our solution will also help with #1 and #2 02/28/2020 17

Goals for Floating Point Standard arithmetic for reals for all computers Like twos complement Keep as much precision as possible in formats Help programmer with errors in real arithmetic +, -, Not-A-Number (NaN), exponent overflow, exponent underflow Keep encoding that is somewhat compatible with twos complement E.g., 0 in Fl. Pt. is 0 in twos complement Make it possible to sort without needing to do floating-point comparison 02/28/2020 18 Representation of Fractions Binary Point like decimal point signifies boundary between integer and fractional parts: Example 6-bit representation:

10.1010two xx.yyyy 21 20 2-1 2-2 2-3 2-4 = 1x21 + 1x2-1 + 1x2-3 = 2.625ten If we assume fixed binary point, range of 6-bit representations with this format: 0 to 3.9375 (almost 4) 02/28/2020 19 Fractional Powers of 2 i 02/28/2020 0 1 2

3 4 5 6 7 8 9 10 11 12 13 14 15 2-i 1.0 1 0.5 1/2 0.25 1/4 0.125 1/8 0.0625 1/16 0.03125 1/32

0.015625 0.0078125 0.00390625 0.001953125 0.0009765625 0.00048828125 0.000244140625 0.0001220703125 0.00006103515625 0.000030517578125 20 Representation of Fractions with Fixed Pt. What about addition and multiplication? Addition is straightforward: 01.100 + 00.100 10.000

1.5ten 0.5ten 2.0ten 01.100 00.100 Multiplication a bit more complex: 00 000 000 00 Wheres the answer, 0.11? (e.g., 0.5 + 0.25; 0110 0 Need to remember where point is!) 00000 00000 02/28/2020 1.5ten 0.5ten 21

Representation of Fractions Our examples used a fixed binary point. What we really want is to float the binary point to make most effecitve use of limited bits example: put 0.1640625ten into binary. Represent with 5-bits choosing where to put the binary point. 000000.001010100000 Store these bits and keep track of the binary point 2 places to the left of the MSB Any other solution would lose accuracy! With floating-point representation, each numeral carries an exponent field recording the whereabouts of its binary point Binary point can be outside the stored bits, so very large and small numbers can be represented 02/28/2020

22 Scientific Notation (in Decimal) mantissa exponent 6.02ten x 1023 radix (base) decimal point Normalized form: no leadings 0s (exactly one digit to left of decimal point) Alternatives to representing 1/1,000,000,000 Normalized: Not normalized: 02/28/2020 1.0 x 10-9 0.1 x 10-8, 10.0 x 10-10 23 Scientific Notation (in Binary)

mantissa 1.01two x 2-1 binary point exponent radix (base) Computer arithmetic that supports it is called floating point, because it represents numbers where the binary point is not fixed, as it is for integers Declare such variable in C as float double for double precision 02/28/2020 24 Floating-Point Representation (1/4) 32-bit word has 232 patterns, so must be approximation of real numbers 1.0, < 2 IEEE 754 Floating-Point Standard:

1 bit for sign (s) of floating point number 8 bits for exponent (E) 23 bits for fraction (F) (get 1 extra bit of precision if leading 1 is implicit) (-1)s x (1 + F) x 2E Can represent from 2.0 x 10-38 to 2.0 x 1038 02/28/2020 25 Floating-Point Representation (2/4) Normal format: +1.xxxxtwo*2yyyytwo Multiple of Word Size (32 bits) 31 30 23 22 S Exponent 1 bit 8 bits

Significand 23 bits S represents Sign Exponent represents ys Significand represents xs Represent numbers as small as 2.0ten x 10-38 to as large as 2.0ten x 1038 02/28/2020 0 26 Floating-Point Representation (3/4) What if result too large? (> 2.0x1038 , < -2.0x1038 )

Overflow! Exponent larger than represented in 8-bit Exponent field What if result too small? (>0 & < 2.0x10-38 , <0 & > -2.0x10-38 ) overflow Underflow! Negative exponentunderflow larger than represented in 8-bit Exponent field overflow -2x1038 -1 -2x10-38 0 2x10-38 1 2x1038 What would help reduce chances of overflow and/or underflow? 02/28/2020

27 Floating-Point Representation (4/4) What about bigger or smaller numbers? IEEE 754 Floating-Point Standard: Double Precision (64 bits) 1 bit for sign (s) of floating-point number 11 bits for exponent (E) 52 bits for fraction (F) (get 1 extra bit of precision if leading 1 is implicit) (-1)s x (1 + F) x 2E Can represent from 2.0 x 10-308 to 2.0 x 10308 32-bit format called Single Precision 02/28/2020 28 Which is Less? (i.e., closer to -)

0 vs. 1 x 10-127 ? 1 x 10-126 vs. 1 x 10-127 ? -1 x 10-127 vs. 0 ? -1 x 10-126 vs. -1 x 10-127 ? 02/28/2020 29 Floating Point: Representing Very Small Numbers Zero: Bit pattern of all 0s is encoding for 0.000 But 0 in exponent should mean most negative exponent (want 0 to be next to smallest real) Cant use twos complement (1000 0000two) Bias notation: subtract bias from exponent Single precision uses bias of 127; DP uses 1023 0 uses 0000 0000two => 0-127 = -127; , NaN uses 1111 1111two => 255-127 = +128 Smallest SP real can represent: 1.0000 x 2-126 Largest SP real can represent: 1.1111 x 2+127

02/28/2020 30 IEEE 754 Floating-Point Standard (1/3) Single Precision (Double Precision similar): 31 30 23 22 S Exponent 1 bit 8 bits Significand 0 23 bits

Sign bit: 1 means negative 0 means positive Significand in sign-magnitude format (not 2s complement) To pack more bits, leading 1 implicit for normalized numbers 1 + 23 bits single, 1 + 52 bits double always true: 0 < Significand < 1 (for normalized numbers) Note: 0 has no leading 1, so reserve exponent value 0 just for number 0 02/28/2020 31 IEEE 754 Floating-Point Standard (2/3) IEEE 754 uses biased exponent representation Designers wanted FP numbers to be used even if no FP hardware; e.g., sort records with FP numbers using integer compares Wanted bigger (integer) exponent field to represent bigger numbers 2s complement poses a problem (because negative numbers look bigger) Use just magnitude and offset by half the range

02/28/2020 32 IEEE 754 Floating-Point Standard (3/3) Called Biased Notation, where bias is number subtracted to get final number IEEE 754 uses bias of 127 for single precision Subtract 127 from Exponent field to get actual exponent value Summary (single precision): 31 30 S Exponent 1 bit 8 bits 23 22 0 Significand

23 bits (-1)S x (1 + Significand) x 2(Exponent-127) Double precision identical, except with exponent bias of 1023 (half, quad similar) 02/28/2020 33 Bias Notation (+127) How it is interpreted How it is encoded , NaN Getting closer to zero Zero 02/28/2020

34 Peer Instruction Guess this Floating Point number: 1 1000 0000 1000 0000 0000 0000 0000 000 RED: -1 x 2128 GREEN: +1 x 2-128 ORANGE: -1 x 21 YELLOW: -1.5 x 21 02/28/2020 35 More Floating Point What about 0? Bit pattern all 0s means 0, so no implicit leading 1 What if divide 1 by 0? Can get infinity symbols +, - Sign bit 0 or 1, largest exponent (all 1s), 0 in fraction

What if do something stupid? ( , 0 0) Can get special symbols NaN for Not-a-Number Sign bit 0 or 1, largest exponent (all 1s), not zero in fraction What if result is too big? (2x10308 x 2x102) Get overflow in exponent, alert programmer! What if result is too small? (2x10-308 2x102) Get underflow in exponent, alert programmer! 02/28/2020 36 Representation for In FP, divide by 0 should produce , not overflow Why? OK to do further computations with E.g., X/0 > Y may be a valid comparison IEEE 754 represents Most positive exponent reserved for Significand all zeroes

02/28/2020 37 Representation for 0 Represent 0? Exponent all zeroes Significand all zeroes What about sign? Both cases valid +0: 0 00000000 00000000000000000000000 -0: 1 00000000 00000000000000000000000 02/28/2020 38 Special Numbers What have we defined so far? (Single Precision) Exponent Significand Object 00 0 0 nonzero ??? 1-254 anything +/- fl. pt. # 255 0 +/-

255 nonzero ??? 02/28/2020 39 Representation for Not-a-Number What do I get if I calculate sqrt(-4.0)or 0/0? If not an error, these shouldnt be either Called Not a Number (NaN) Exponent = 255, Significand nonzero Why is this useful? Hope NaNs help with debugging? They contaminate: op(NaN, X) = NaN Can use the significand to identify which! (e.g., quiet NaNs and signaling NaNs) 02/28/2020 40 Representation for Denorms (1/2) Problem: Theres a gap among representable FP numbers

around 0 Smallest representable positive number: a = 1.0 2 * 2-126 = 2-126 Second smallest representable positive number: b = 1.0001 2 * 2-126 Normalization = (1 + 0.0012) * 2-126 and implicit 1 = (1 + 2-23) * 2-126 is to blame! = 2-126 + 2-149 a - 0 = 2-126 b - a = 2-149 - 02/28/2020 Gaps!

b 0 a + 41 Representation for Denorms (2/2) Solution: We still havent used Exponent = 0, Significand nonzero DEnormalized number: no (implied) leading 1, implicit exponent = -126 Smallest representable positive number: a = 2-149 (i.e., 2-126 x 2-23) Second-smallest representable positive number: b = 2-148 (i.e., 2-126 x 2-22) 02/28/2020 0 +

42 Special Numbers Summary Reserve exponents, significands: Exponent Significand Object 0 0 0 0 nonzero Denorm 1-254 anything +/- fl. pt. number 255 0 +/- 255 Nonzero NaN

02/28/2020 43 Saving Bits Many applications in machine learning, graphics, signal processing can make do with lower precision IEEE half-precision or FP16 uses 16 bits of storage 1 sign bit 5 exponent bits (exponent bias of 15) 10 significand bits Microsoft BrainWave FPGA neural net computer uses proprietary 8-bit and 9-bit floating-point formats 44 Outline Defining Performance Floating Point Standard And in Conclusion

02/28/2020 45 And In Conclusion, Time (seconds/program) is measure of performance Instructions Clock cycles Seconds = Program Instruction Clock Cycle Floating-point representations hold approximations of real numbers in a finite number of bits - IEEE 754 Floating-Point Standard is most widely accepted attempt to standardize interpretation of such numbers (Every desktop or server computer sold since ~1997 follows these conventions)

31- 30 Single Precision: 23 22 0 S Exponent 1 bit 8 bits 02/28/2020 Significand 23 bits 46