Computer Architecture

Computer Architecture

ISA (Wrap up) 1 RISC vs. CISC 2 CISC Evolution Storage and Memory High cost of memory. Need for compact code. Support for high-level languages Support for new applications (e.g., multimedia) 3 CISC Effects

Moved complexity from s/w to h/w Ease of compiler design (HLLCA) Easier to debug Lengthened design times Increased design errors 4 RISC Evolution Increasingly cheap memory Improvement in compiler technology Patterson: Make the common case fast 5 RISC Effect Move complexity from h/w to s/w Provided a single-chip solution Better use of chip area Better Speed

Feasibility of pipelining Single cycle execution stages Uniform Instruction Format 6 Key arguments RISC argument for a given technology, RISC implementation will be faster current VLSI technology enables single-chip RISC when technology enables single-chip CISC, RISC will be pipelined when technology enables pipelined CISC, RISC will have caches When CISC have caches, RISC will have multiple cores CISC argument CISC flaws not fundamental (fixed with more transistors) Moores Law will narrow the RISC/CISC gap (true) software costs will dominate (very true) 7 Role of Compiler: RISC vs. CISC CISC instruction:

MUL , RISC instructions: LOAD A, LOAD B, MUL A, B STORE RISC is dependent on optimizing compilers 8 Example CISC ISA: Intel X86 Operand sizes: 12 addressing modes:

Register. Immediate. Direct. Base. Base + Displacement. Index + Displacement. Scaled Index + Displacement. Based Index. Based Scaled Index. Based Index + Displacement. Based Scaled Index + Displacement. Relative.

Can be 8, 16, 32, 48, 64, or 80 bits long. Also supports string operations. Instruction Encoding: The smallest instruction is one byte. The longest instruction is 12 bytes long. The first bytes generally contain the opcode, mode specifiers, and register fields.

The remainder bytes are for address displacement and immediate data. 9 Example RISC ISA: PowerPC 8 addressing modes: Register direct. Immediate. Register indirect. Register indirect with immediate index (loads and stores).

Register indirect with register index (loads and stores). Absolute (jumps). Link register indirect (calls). Count register indirect (branches). Operand sizes: Four operand sizes: 1, 2, 4 or 8 bytes. Instruction Encoding: Instruction set has 15 different formats with many minor variations. All are 32 bits in length. 10 Example RISC ISA: HP-PA Operand sizes:

7 addressing modes: Register Immediate Base with displacement Base with scaled index and displacement Predecrement Postincrement PC-relative Five operand sizes ranging in powers of two from 1 to 16 bytes.

Instruction Encoding: Instruction set has 12 different formats. All are 32 bits in length. 11 Example RISC ISA: SPARC 5 addressing modes: Operand sizes:

Register indirect with immediate displacement. Register inderect indexed by another register. Register direct. Immediate. PC relative. Four operand sizes: 1, 2, 4 or 8 bytes. Instruction Encoding: Instruction set has 3 basic instruction formats with 3 minor variations. All are 32 bits in length. 12 CISC vs. RISC Characteristics RISC vs. CISC controversy is now 20 years old. After the initial enthusiasm for RISC machines, there has been a growing realization that

RISC designs may benefit from the inclusion of some CISC features, and Vice-versa. The result is that more recent RISC design, PowerPC and SPARC, are no longer "pure" RISC and the more recent CISC designs, notably the Pentium, AMD, and Core Duo incorporate core RISC characteristics. Intel called this hack CRISC. This concept was so moronic that even Intel could not market it! 13 Current Trends - x86 instructions decoded into RISC-like instructions (ROps) Intel called this hack CRISC. This concept was so moronic that even Intel could not market it! IA-64 - dependence on compilers for scheduling Athlon both direct execution and microprogrammed instructions

14 Why did Intel (CISC) win? x86 won because it was the first 16-bit chip. IBM put it in PCs because there was no competing choice Rest is inertia and financial feedback x86 is most difficult ISA to implement for high performance, but Because Intel sells the most processors ... It has the most money ... Which it uses to hire more and better engineers ... Which is uses to maintain competitive performance ... And given equal performance, compatibility wins ...

So Intel sells the most processors. 15 Towards Instruction Set Consolidation IA vision Future innovation should come in micro-architecture enhancements and compatible extensions to dominant instruction sets, rather than the creation of new instruction sets. 1 Is the trend clear? With ever growing software complexity and installed base the value of remaining compatible with and extending existing, dominant instruction sets heavily outweighs any disadvantages. 2 Is the time now?

Technology has passed the point where instruction set costs are no longer relevant. 16 What The Euro Can Teach Us The economic benefits of moving away from multiple currencies is enormous 17 Possible Solutions Port thousands of applications, operating systems, drivers, codecs, tool chains and virtual machines x86 MIPS STOP

ARM main() { printf(Hello World\n); } PowerPC EPIC SPARC Cell Other Resource intensive Slow Costly to maintain 18 Architectural Evolution MacroLevel Networking

Common Instruction Set Architecture Robust security Investment protection Today Built in OS integration Happening Now No need for multiple validations The Future? No need to port Storage

Server Desktop Laptop Handheld Ubiquitous 19 Extending x86 20 EPIC (Explicitly Parallel Instruction Computing) 21 IA- 64: The Itanium Processor A radical departure from the traditional

paradigms. Intel and Hewlett-Packard Co. designed a new architecture, IA-64, that they expected to be much more effective at executing instructions in parallel IA-64 is brand new ISA 22 IA - 64 IA-64 is a 64-bit. In IA-64 designs, instructions are scheduled by the compiler, not by the hardware. Much of the logic that groups, schedules, and tracks instructions is not needed thus simplifying the circuitry and promising to improve performance. 23 The EPIC Philosophy The acronym EPIC stands for Explicitly

Parallel Instruction Computing. The entire EPIC design philosophy can be summed up by the following: make use of parallel power whenever and wherever possible; and if it's not possible, make it possible. 24 The EPIC Philosophy 25 Intel IA-64 Massive resources 128 GPRs (64-bit, plus poison bit) 128 FPRs (82-bit)

64 predicate registers Also has branch registers for indirect branches Contrast to: RISC: 32 int, 32 FP, handful of control regs x86: 8 int, 8 fp, handful of control regs x86-64 bumps this to 16, SSE adds 8/16 MM regs 26 IA-64 Registers 27 IA-64 Groups Compiler assembles groups of instructions No register data dependencies between insts in the same group Memory deps may exist Compiler explicitly inserts stops to mark the

end of a group Group can be arbitrarily long 28 IA-64 Bundles Bundle == The VLIW Instruction 5-bit template encoding also encodes stops Three 41-bit instructions 128 bits per bundle average of 5.33 bytes per instruction x86 only needs 3 bytes on average 29 Instruction Types Instructions are divided into different type

determines which functional units they can execute on templates are based on these types 30 Bundle Templates Not all combinations of A, I, M, F, B, L and X are permitted Group stops are explicitly encoded as part of the template cant stop just anywhere Some bundles identical except for group stop

31 Individual Instruction Formats Fairly RISC-like like easy to decode, fields tend to stay put 32 Instruction Format: Bundles & Templates Bundle Set of three instructions (41 bits each) Template Identifies types of instructions in bundle 33 Explicitly Parallel Instruction Computing EPIC S2

S1 S0 T 128-bit instruction bundles from I-cache Fetch one or more bundles for execution (Implementation, Itanium takes two.) Processor Try to execute all instructions in parallel, depending on available units. functional units MEM MEM

INT INT FP FP B B B Retired instruction bundles 34 The Role of Compilers 35

Compiler and ISA ISA decisions are no more just for programming assembly language (AL) easily Due to HLL, ISA is a compiler target today Performance of a computer will be significantly affected by compiler Understanding the compiler technology today is critical to designing and efficiently implementing an instruction set Architecture choice affects the code quality and the complexity of building a compiler for it 36 Goal of the Compiler Primary goal is correctness Second goal is speed of the object code Others:

Speed of the compilation Ease of providing debug support Inter-operability among languages Flexibility of the implementation - languages may not change much but they do evolve - e. g. Fortran 66 ===> HPF Make the frequent cases fast and the rare case correct 37 Typical Modern Compiler Structure Common Intermediate Representation Somewhat language dependent Largely machine independent Small language dependent Slight machine dependent Language independent

Highly machine dependent 38 Typical Modern Compiler Structure (Cont.) Multi-pass structure easy to write bug-free compilers Transform HL, more abstract representations, into progressively lowlevel representations, eventually reaching the instruction set Compilers must make assumptions about the ability of later steps to deal with certain problems Ex. 1 choose which procedure calls to expand inline before they know the exact size of the procedure being called Ex. 2 Global common sub-expression elimination Find two instances of an expression that compute the same value and saves the result of the first one in a temporary Temporary must be register, not memory (Performance) Assume register allocator will allocate temporary into register 39 Optimization Types

High level - done at source code level Procedure called only once - so put it in-line and save CALL Local - done on basic sequential block (straight-line code) Common sub-expressions produce same value Constant propagation - replace constant valued variable with the constant - saves multiple variable accesses with same value Global - same as local but done across branches Code motion - remove code from loops that compute same value on each pass and put it before the loop Simplify or eliminate array addressing calculations in loop 40 Optimization Types (Cont.) Register allocation Use graph coloring (graph theory) to allocate registers NP-complete Heuristic algorithm works best when there are at least 16 (and preferably more) registers

Processor-dependent optimization Strength reduction: replace multiply with shift and add sequence Pipeline scheduling: reorder instructions to minimize pipeline stalls Branch offset optimization: Reorder code to minimize branch offsets 41 Register Allocation One the most important optimizations Based on graph coloring techniques Construct graph of possible allocations to a register Use graph to allocate registers efficiently Goal is to achieve 100% register allocation for all active variables. Graph coloring works best when there are at least 16 general-purpose registers available for integers and more for floating-point variables. 42 Constant propagation

a:= 5; ... // no change to a so far. if (a > b) { . . . } The statement (a > b) can be replaced by (5 > b). This could free a register when the comparison is executed. When applied systematically, constant propagation can improve the code significantly. 43 Strength reduction Example: for (j = 0; j = n; ++j) A[j] = 2*j; for (i = 0; 4*i <= n; ++i) A[4*i] = 0;

An optimizing compiler can replace multiplication by 4 by addition by 4. This is an example of strength reduction. In general, scalar multiplications can be replaced by additions. 44 Major Types of Optimizations and Example in Each Class 45 Change in IC Due to Optimization Level 1: local optimizations, code scheduling, and local register allocation Level 2: global optimization, loop transformation (software pipelining), global register allocation Level 3: + procedure integration 46

How can Architects Help Compiler Writers Provide Regularity Address modes, operations, and data types should be orthogonal (independent) of each other Simplify code generation especially multi-pass Counterexample: restrict what registers can be used for a certain classes of instructions Provide primitives - not solutions Special features that match a HLL construct are often un-usable What works in one language may be detrimental to others 47 How can Architects Help Compiler Writers (Cont.) Simplify trade-offs among alternatives How to write good code? What is a good code? Metric: IC or code size (no longer true) caches and pipeline

Anything that makes code sequence performance obvious is a definite win! How many times a variable should be referenced before it is cheaper to load it into a register Provide instructions that bind the quantities known at compile time as constants Dont hide compile time constants Instructions which work off of something that the compiler thinks could be a run-time determined value hand-cuffs the optimizer 48 Short Summary -- Compilers ISA has at least 16 GPR (not counting FP registers) to simplify allocation of registers using graph coloring Orthogonality suggests all supported addressing modes apply to all instructions that transfer data Simplicity understand that less is more in ISA design Provide primitives instead of solutions Simplify trade-offs between alternatives Dont bind constants at runtime

Counterexample Lack of compiler support for multimedia instructions 49 MIPS ISA 50 Growth of Processors Language of the Machine Well be working with the MIPS instruction set architecture similar to other architectures developed since the 1980's Almost 100 million

MIPS processors manufactured in 2002 used by NEC, Nintendo, Cisco, Silicon Graphics, Sony, 1400 1300 1200 1100 Other SPARC Hitachi SH PowerPC 900 Motorola 68K MIPS IA-32

800 ARM 1000 700 600 500 400 300 200 100 0 1998 1999 2000

2001 2002 51 MIPS Instruction Set (RISC) Instructions execute simple functions. Maintain regularity of format each instruction is one word, contains opcode and arguments. Minimize memory accesses whenever possible use registers as arguments. Three types of instructions: Register (R)-type only registers as arguments. Immediate (I)-type arguments are registers and numbers (constants or memory addresses). Jump (J)-type argument is an address. 52 MIPS Arithmetic Instructions

All instructions have 3 operands Operand order is fixed (destination first) Example: C code: a = b + c MIPS code: add a, b, c (well talk about registers in a bit) The natural number of operands for an operation like addition is three requiring every instruction to have exactly three operands conforms to the philosophy of keeping the hardware simple 53 Arithmetic Instr. (Continued) Design Principle: simplicity favors regularity.

Of course this complicates some things... C code: a = b + c + d; MIPS code: add a, b, c add a, a, d Operands must be registers (why?) 32 registers provided Each register contains 32 bits 54 Registers vs. Memory Arithmetic instructions operands must be registers 32 registers provided Compiler associates variables with registers. What about programs with lots of variables? Must use memory. Control

Input Memory Datapath Processor Output I/O 55 Memory Instructions Load and store instructions Example: C code: A[12] = h + A[8];

MIPS code: lw $t0, 32($s3) #addr of A in reg s3 add $t0, $s2, $t0 #h in reg s2 sw $t0, 48($s3) Can refer to registers by name (e.g., $s2, $t2) instead of number Store word has destination last Remember arithmetic operands are registers, not memory! Cant write: add 48($s3), $s2, 32($s3) 56 Machine Language Instructions, like registers and words of data, are also 32 bits long Example: add $t1, $s1, $s2

registers have numbers, $t1=8, $s1=17, $s2=18 Instruction Format: 000000 10001 10010 01000 00000 100000 op rs rt rd shamt funct 57 MIPS64 Instruction Format I - type instruction 6 Opcode

0 5 5 16 rs rt Immediate 5 6 10 11 15 16 31

Encodes: Loads and stores of bytes, words, half words. All immediates (rd rs op immediate) Conditional branch instructions (rs1 is register, rd unused) Jump register, jump and link register (rd = 0, rs = destination, immediate = 0) R - type instruction 6 5 Opcode rs 5 5 rt

rd 5 shamt 6 func 0 5 6 10 11 15 16 20 21 25 26 Register-register ALU operations: rd rs func rt Function encodes the data path operation: Add, Sub .. Read/write special registers and moves. J - Type instruction

6 Opcode 31 26 Offset added to PC 0 5 6 Jump and jump and link. Trap and return from exception 31 58 Summary: MIPS Registers and Memory $s0-$s7, $t0-$t9, $zero, Fast locations for data. In MIPS, data m ust be in registers to perform

32 registers $a0-$a3, $v0-$v1, $gp, arithm etic. MIPS register $zero alw ays equals 0. Register $at is $fp, $sp, $ra, $at reserved for the assem bler to handle large constants. Memory[0], 230 memoryMemory[4], ..., words Memory[4294967292] Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential w ords differ by 4. Mem ory holds data structures, such as arrays, and spilled registers, such as those saved on procedure calls. 59 Summary: MIPS Instructions MIPS assembly language Category

Arithmetic add Instruction Example add $s1, $s2, $s3 Meaning $s1 = $s2 + $s3 Three operands; data in registers subtract sub $s1, $s2, $s3 $s1 = $s2 - $s3

Three operands; data in registers addi $s1, $s2, 100 lw $s1, 100($s2) sw $s1, 100($s2) lb $s1, 100($s2) sb $s1, 100($s2) lui $s1, 100 $s1 = $s2 + 100 Used to add constants $s1 = Memory[$s2 + 100]Word from memory to register Memory[$s2 + 100] = $s1 Word from register to memory $s1 = Memory[$s2 + 100]Byte from memory to register Memory[$s2 + 100] = $s1 Byte from register to memory $s1 = 100 * 2 16 Loads constant in upper 16 bits beq

$s1, $s2, 25 if ($s1 == $s2 ) go to PC + 4 + 100 Equal test; PC-relative branch branch on not equal bne $s1, $s2, 25 if ($s1 != $s2 ) go to PC + 4 + 100 Not equal test; PC-relative set on less than slt $s1, $s2, $s3

if ($s2 < $s3 ) $s1 = 1; else $s1 = 0 Compare less than; for beq, bne set less than immediate slti jump jump register jump and link j jr jal add immediate load w ord

store w ord Data transfer load byte store byte load upper immediate branch on equal Conditional branch Unconditional jump $s1, $s2, 100 if ($s2 < 100 ) $s1 = 1; Comments Compare less than constant else $s1 = 0 2500

$ra 2500 go to 10000 Jump to target address $ra go to For sw itch, procedure return $ra = PC + 4; go to 10000 For procedure call 60 Addressing Modes 1. Immediate addressing op rs rt Immediate

2. Register addressing op rs rt rd . .. funct Registers Register 3. Base addressing op rs

rt Memor y Address + Register Byte Halfword Word 4. PC-relative addressing op rs

rt Memor y Address PC + Word 5. Pseudodirect addressing op Address PC Memor y

Word 2004 Morgan Kaufman Publishers 61

Recently Viewed Presentations

  • Everyone agrees that prayer is vitally important to

    Everyone agrees that prayer is vitally important to

    Everyone agrees that prayer is vitally important to the spiritual life of every believer Prayer is invading the impossible. It is essentially a partnership of the redeemed child of God working hand in hand with God toward the realization of...
  • Formă Rară De Abces De Glandă Suprarenală

    Formă Rară De Abces De Glandă Suprarenală

    Hernia of the antero-lateral abdominal wall Definition Progressive protrusion through the abdominal wall of the peritoneum, with tendency to progress, together with an abdominal viscus SO An abdominal viscus will HAVE to leave the abdominal cavity There must be a...
  • Teacher Performance Evaluation and Professional Growth (T-PEPG) Model

    Teacher Performance Evaluation and Professional Growth (T-PEPG) Model

    2.3 Goal-Focused Planning. 3. Teachers are responsible for managing and monitoring student learning. 3.1 Managing Classroom Routines and Expectations. 3.2 Student Engagement. 3.3 Assessment of Student Progress. 4. Teachers think systematically about their practice and learn from experience . 4.1...
  • Avaliação Do Exame De Baciloscopia De Escarro Após ...

    Avaliação Do Exame De Baciloscopia De Escarro Após ...

    OMRAN Abdel R., 1971. - The epidemiologic transition : a theory of the epidemiology of population change, Milbank Memorial Fund Quarterly, vol. 49, n° 4, p. 509-538. Transição epidemiológica
  • Student Involved Conferences - Fraser CBL Journey

    Student Involved Conferences - Fraser CBL Journey

    Competency Work. Student Centric Work . Just the beginning... Benefits of Student-Led Conferences . Students take ownership of their learning. Parents and students have open communication about school, after-school activities and other important decisions in life.
  • Members of the Thematic Network LAMNET 49 MEMBERS

    Members of the Thematic Network LAMNET 49 MEMBERS

    Members of the Thematic Network LAMNET 49 MEMBERS Installed Electricity Capacity in the Sugarcane Sector in State of Sao Paulo Installed Electricity Capacity in the Sugarcane Sector in State of Sao Paulo * POLICIES for the PROMOTION of NEW and...
  • ENG4u: Literary Theory 101

    ENG4u: Literary Theory 101

    or the cultural context surrounding Dr. Seuss? What is Literary Theory? "A very basic way of thinking about literary theory is that these ideas act as different lenses critics use to view and talk about art, literature, and even culture.
  • Cisco Presentation Guide

    Cisco Presentation Guide

    I'm using a "show ISDN status" as my command. Next in step 2, you can designate the type and how much information you are looking. ... After one to two years of customer use, a code will go GD, meaning...