Spring 2013: Advising starts this week!

CS2710 Computer Organization

Floating Point Numbers

Lecture Objectives:
1) Define floating point number.
2) Define the terms fraction and exponent when dealing with floating point numbers.
3) Define overflow and underflow in relation to floating point numbers.
4) Convert a floating point number from binary to decimal format.
5) Convert a number from decimal to binary floating point format.
6) Calculate the result of adding two floating point numbers together.

What is the difference between the following numbers?
3    0.3    3.14    22/7    π

3: integer.
0.3, 3.14: rational numbers whose fractional parts can be expressed exactly in powers of 10 (terminating decimals).
22/7 = 3.142857 142857 142857...: rational, but the fractional part cannot be expressed exactly in powers of 10, so it repeats forever as a decimal (base 10) fraction.
π = 3.141592653...: irrational; it cannot be expressed as a ratio of integers.

Representing a real number: Base 10 vs. Base 2

$3.625_{10} = 3 \times 10^{0} + 6 \times 10^{-1} + 2 \times 10^{-2} + 5 \times 10^{-3}$
The "." is called the decimal point.

$11.101_2 = 1 \times 2^{1} + 1 \times 2^{0} + 1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3}$
The "." is called the binary point.

The decimal value $3.625_{10}$ can be represented exactly as $11.101_2$.

Real values in base 10 cannot always be represented exactly in base 2: a fraction in lowest terms terminates in binary only if its denominator has 2 as its only prime factor (that is, the denominator is a power of 2).

Ex: $0.3_{10}$. As a rational value this is 3/10, but the denominator is not a power of 2, so the binary fraction repeats forever:
$0.3_{10} = 0.0\overline{1001}_2 = 0.0100110011001100110011\ldots_2$

Definitions
Scientific Notation: a notation that renders numbers with a single digit to the left of the decimal point.
$91.0 = 9.1 \times 10^{1}$
$91.0 = 0.91 \times 10^{2}$
$91.0 = 910.0 \times 10^{-1}$ is not in proper scientific notation.

Normalized: a number in proper scientific notation that has no leading zeros.
$91.0 = 9.1 \times 10^{1}$ is normalized.
$91.0 = 0.91 \times 10^{2}$ is not normalized.

Floating Point: computer arithmetic in which the decimal (or binary) point is not fixed, which allows us to move the point around in order to normalize the number.

Normalized form for base 2
$11.101_2 = 11.101_2 \times 2^{0} = 1.1101_2 \times 2^{1}$ (normalized form)
Fraction: the value, between 0 (inclusive) and 1 (exclusive), placed in the fraction field (the bits 1101 in this case).
Exponent: the value placed in the exponent field, representing the power to which the base is raised (in this case the exponent is 1 and the base is 2).
When we normalize a non-zero binary number, we'll always have a 1 to the left of the binary point!

Major Issue: Finite # of digits
A real value can be approximated as $1.xxxxxxx_2 \times 2^{yyyy}$
Types float and double in C/Java

To represent a real (floating point) number in a fixed number of digits, we need to decide how to split a fixed number of bits between the fraction xxxxxxx and the exponent yyyy. More fraction bits give more precision; more exponent bits give more range. This is a tradeoff between precision and range!

Floating Point Standard
Defined by IEEE Std 754-1985
Developed in response to divergence of representations
  Portability issues for scientific code
Now almost universally adopted
Two representations:
  Single precision (32-bit)
  Double precision (64-bit)

IEEE Floating Point Standard
Field layout (figure): S | BiasedExponent | Fraction
  S: 1 bit
  BiasedExponent: single 8 bits; double 11 bits
  Fraction: single 23 bits; double 52 bits

$x = (-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{BiasedExponent} - \text{Bias})}$
S: sign bit (0 for +, 1 for -)
Normalized significand: $1.0 \le |\text{significand}| < 2.0$
  It always has a leading 1 bit before the binary point, so there is no need to store it explicitly (the hidden bit)
  The fraction with the hidden "1." restored is called the significand
BiasedExponent uses excess representation: BiasedExponent = Actual Exponent + Bias
  This ensures the BiasedExponent is unsigned
  Single: Bias = 127; Double: Bias = 1023

  Single precision: BiasedExponents 1 to 254 represent Actual Exponents -126 to +127
  BiasedExponents 0 and 255 have special meanings (table to follow)

An example in 32 bits
$x = (-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{BiasedExponent} - \text{Bias})}$
$3.625_{10} = 1.1101_2 \times 2^{1}$ (normalized form)
Sign is positive, so S = 0 (sign bit 0)
Fraction bits are 1101 (the leading 1 is implicit)
BiasedExponent uses excess representation: Actual Exponent + Bias
  An Actual Exponent of 1 means the BiasedExponent is 128 (Bias is 127)
Thus, $3.625_{10}$ is represented in 32 bits as
0100 0000 0110 1000 0000 0000 0000 0000 = 0x40680000
and $-3.625_{10}$ is represented in 32 bits as
1100 0000 0110 1000 0000 0000 0000 0000 = 0xC0680000

Another example in 32 bits
$x = (-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{BiasedExponent} - \text{Bias})}$
$1.0_{10} = 1.0_2 \times 2^{0}$ (normalized form)
Sign is positive, so S = 0 (sign bit 0)
Fraction bits are all 0 (the leading 1 is implicit)
BiasedExponent uses excess representation: Actual Exponent + Bias
  An Actual Exponent of 0 means the BiasedExponent is 127 (Bias is 127)
Thus, $1.0_{10}$ is represented in 32 bits as
0011 1111 1000 0000 0000 0000 0000 0000 = 0x3F800000
and $-1.0_{10}$ is represented in 32 bits as
1011 1111 1000 0000 0000 0000 0000 0000 = 0xBF800000

IEEE Floating Point Encodings (table)
Note that the special value of 0 for BiasedExponent, together with 0 for Fraction, represents 0.0. A BiasedExponent of 0 with a non-zero Fraction represents a denormalized value (its magnitude is always between 0 and 1, smaller than any normalized value; explanation to follow).

Converting a number from Binary to Decimal Floating Point
$x = (-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{BiasedExponent} - \text{Bias})}$
S | BiasedExponent | Fraction
0 | 1000 0010 | 100 0000 0000 0000 0000 0000

Example: Addition of two binary values (using a 3-bit fraction)
$1.001_2 \times 2^{7} + 1.110_2 \times 2^{7}$
When the exponents match, the significands can simply be added together, giving
$= 10.111_2 \times 2^{7}$
Then, the result is normalized (with loss of precision, since only 3 fraction bits are kept):
$= 1.011_2 \times 2^{8}$

The resulting exponent has to be checked for overflow: had the exponent held only 4 bits (actual exponents of at most +7), this addition would have resulted in a floating-point overflow. F.P. overflows result in values representing infinity.

Chapter 3 Arithmetic for Computers

Addition of two binary values with differing exponents
$1.000_2 \times 2^{-4} - 1.111_2 \times 2^{-5}$
Since the exponents are different, the significands cannot simply be added. Instead, we first denormalize the value with the smaller exponent (which may result in dropped bits):
$1.000_2 \times 2^{-4} - 0.111_2 \times 2^{-4} = 0.001_2 \times 2^{-4}$
Then, we renormalize the result:
$= 1.000_2 \times 2^{-7}$ and check for underflow (BiasedExponent = 0)

Floating Point Hardware (figure)

FP Instructions in MIPS
FP hardware is coprocessor 1
  Adjunct processor that extends the ISA
Separate FP registers

  32 single-precision: $f0, $f1, ..., $f31
  Paired for double-precision: $f0/$f1, $f2/$f3, ...
    Release 2 of the MIPS ISA supports 32 64-bit FP registers
FP instructions operate only on FP registers
  Programs generally don't do integer ops on FP data, or vice versa
  More registers with minimal code-size impact
FP load and store instructions
  lwc1, ldc1, swc1, sdc1

FP Instructions in MIPS

Single-precision arithmetic
  add.s, sub.s, mul.s, div.s
    e.g., add.s $f0, $f1, $f6
Double-precision arithmetic
  add.d, sub.d, mul.d, div.d
    e.g., mul.d $f4, $f4, $f6
Single- and double-precision comparison
  c.xx.s, c.xx.d (xx is eq, lt, le, ...)
  Sets or clears FP condition-code bit
    e.g., c.lt.s $f3, $f4
Branch on FP condition code true or false
  bc1t, bc1f

    e.g., bc1t TargetLabel

Example: Multiplication of two binary values (using a 3-bit fraction)
$(1.100_2 \times 2^{3}) \times (1.110_2 \times 2^{4})$
The significands can simply be multiplied together, while the exponents are added together:
$= 10.101_2 \times 2^{7}$
Then, the result is normalized (with loss of precision):
$= 1.010_2 \times 2^{8}$
The resulting exponent has to be checked for overflow: had the exponent held only 4 bits (actual exponents of at most +7), this multiplication would have resulted in a floating-point overflow. F.P. overflows result in values representing infinity.

Scientific Notation Addition Algorithm (figure)

FP Adder Hardware (figure: Step 1, Step 2, Step 3, Step 4)
