CACTI 7: New Tools for Interconnect Exploration in Innovative Off-Chip Memories
Rajeev Balasubramonian, Andrew B. Kahng, Naveen Muralimanohar, Ali Shafiee, Vaishnav Srinivas

Main Memory Matters
  • Software: in-memory DBs, key-value stores, graph algorithms, deep learning
  • Architecture: commodity CPUs, accelerators
  • Shift in bottlenecks
  • Example innovations: NDP, DDR to GDDR5, 3x TOPS in TPU
  • Technology: DDR4, HMC, HBM, NVM
  • The innovation hub is moving to memory

Two Silos
  • CACTI 7 can be used out-of-the-box when defining memory parameters for traditional memory systems
  • CACTI 7 primitives can be leveraged to model and evaluate new memory architectures

Talk Outline
  • CACTI for the main memory
  • Inputs/outputs
  • The nuts and bolts
  • Modeling I/O power
  • Design space exploration
  • Case studies: two novel architectures (Cascaded Channels, Narrow Channels)

CACTI for Memory (inputs and outputs)
  • Capacity, #channels, ECC vs. not
  • DRAM type: DDR3, DDR4
  • Exhaustive search
  • Cost table, bandwidth table, power parameters
  • Access pattern: bandwidth, row buffer hits, Rd/Wr ratio
  • Channel configs
  • Energy per access

DIMM Cost
  • Cost factors: technology, capacity, support for ECC, max bandwidth, vendor
  • Aggregated costs from online sources
  • Cost is volatile and should be updated periodically

Cost in dollars
                4GB    8GB    16GB   32GB   64GB
DDR3  UDIMM     40     76     -      -      -
      RDIMM     42     64     122    304    -
      LRDIMM    -      -      211    287    1079
DDR4  UDIMM     26     46     -      -      -
      RDIMM     33     60     126    310    -
      LRDIMM    -      -      279    331    1474

  • Cost and capacity relationship is not linear

Bandwidth
  • Bandwidth depends on load, voltage, and DIMM type

DDR3 channel frequency (MHz) by DIMMs per channel (DPC) and voltage
              1DPC            2DPC
              1.35V   1.5V    1.35V   1.5V
UDIMM-DR      533     667     533     667
RDIMM-DR      667     800     667     667
Remaining DDR3 entries (RDIMM-QR, LRDIMM-QR, and the 3DPC columns at 1.35V/1.5V), in slide order: 667 667 667 533 667 667 667 533 533

DDR4 channel frequency (MHz), all at 1.2V
              1DPC    2DPC    3DPC
RDIMM-DR      1066    933     800
RDIMM-QR      933     800     -
LRDIMM-QR     1066    1066    800

Power Modeling
  • Extending CACTI-I/O
  • DDR4 and SerDes support added
  • SerDes parameters from literature for different lengths/speeds
  • For parallel buses, support for more accurate termination power with HSPICE simulations
  • Different termination models for each bus type
  • Different frequency, DIMMs per channel
  • On-DIMM and on-board
  • Different range (short or long)

Interconnect Model API

Power Analysis (DDR3)

Power Analysis (DDR4)

Cost and Bandwidth Analysis
  • Highest possible BW for the demanded capacity
  • Lowest possible cost for the demanded capacity

Two Case Studies
Key observations:
  • High DPC: less BW
  • More channels: high BW and low cost
New Idea I: Cascaded Segments
  • Each segment has few DIMMs: higher BW
New Idea II: Narrow Channels
  • Partition the channel into many parallel channels
  • Fewer DIMMs per data wire, new ECC: higher BW
  • Lower power on DIMM

Cascaded Channels
  • Same DPC, higher BW
    [Diagram: baseline CPU channel at 533 MHz vs. a channel cascaded through a Relay on Board (RoB) chip, with both segments at 667 MHz]
  • Same BW, lower cost
    [Diagram: 32 GB DIMMs on a RoB-cascaded channel vs. 64 GB DIMMs on a baseline channel, all at 667 MHz]
  • One memory cycle increase in latency

Hybrid Memory
  • NVM is slow
  • Software optimized to access DRAM more
  • Unbalanced channel load: one channel DRAM, one channel NVM
  • Balanced channel load: frontend DRAM, backend NVM

Narrow Channels
  • Command/address bus is shared between channels
  • Higher bandwidth but higher latency
  • Lower frequency/power for DRAM chips
  • ECC on DIMM and CRC for link to reduce bw
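
As a rough illustration of the design space exploration behind the Cost and Bandwidth Analysis slide, the sketch below enumerates channel configurations exhaustively and picks the lowest-cost and highest-bandwidth options for a demanded capacity. It is not CACTI 7 code: the function names, the limit of four channels, and the use of RDIMM-DR frequencies for every RDIMM are assumptions made for illustration. It uses only the DDR4 RDIMM 4GB/8GB prices and the DDR4 RDIMM-DR frequencies quoted in the slides, and estimates peak bandwidth as two transfers per clock over an assumed 64-bit data bus.

```python
from itertools import product

# DDR4 RDIMM prices in dollars for 4GB and 8GB parts (from the cost slide).
RDIMM_COST = {4: 33, 8: 60}
# DDR4 RDIMM-DR channel frequency in MHz, keyed by DIMMs per channel (DPC).
# Assumption: every RDIMM here is treated as dual-rank for this lookup.
RDIMM_DR_MHZ = {1: 1066, 2: 933, 3: 800}


def peak_bw_gbs(mhz, channels):
    """Peak GB/s: 2 transfers per clock, 8 bytes per transfer (assumed bus width)."""
    return mhz * 2 * 8 * channels / 1000.0


def explore(capacity_gb, max_channels=4):
    """Exhaustively list every configuration that meets the demanded capacity."""
    configs = []
    for channels, dpc, size_gb in product(
        range(1, max_channels + 1), RDIMM_DR_MHZ, RDIMM_COST
    ):
        dimms = channels * dpc
        if dimms * size_gb < capacity_gb:
            continue  # not enough capacity, skip
        configs.append(
            {
                "channels": channels,
                "dpc": dpc,
                "dimm_gb": size_gb,
                "cost": dimms * RDIMM_COST[size_gb],
                "bw_gbs": peak_bw_gbs(RDIMM_DR_MHZ[dpc], channels),
            }
        )
    return configs


def cheapest(configs):
    """Lowest cost; ties are broken in favor of higher bandwidth."""
    return min(configs, key=lambda c: (c["cost"], -c["bw_gbs"]))


def fastest(configs):
    """Highest bandwidth; ties are broken in favor of lower cost."""
    return max(configs, key=lambda c: (c["bw_gbs"], -c["cost"]))


if __name__ == "__main__":
    options = explore(capacity_gb=32)
    print("lowest cost:", cheapest(options))
    print("highest BW :", fastest(options))
```

Under these assumptions, a 32 GB demand with up to four channels resolves to the same configuration for both searches: four channels with one 8GB DIMM each. That is in line with the key observation that more channels give high bandwidth at low cost, while raising DPC lowers the channel frequency.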

Methodology
  • Trace-based simulation
  • Trace fed to USIMM
  • Memory-intensive benchmarks (NPB and SPEC2006)
  • Trace generated by Simics
  • 8-core at 3.2 GHz
  • L1D = 32KB, L1I = 32KB, L2 = 8MB
  • Power: CACTI 7

Cascaded Channels
  • DDR3: 25% higher BW, 22% higher IPC
  • DDR4: 13% higher BW, 12% higher IPC

Cascaded Latency
[Figure: memory latency (CPU cycles) vs. memory capacity (GB), Baseline vs. RoB]

Cascaded Power: DRAM Cartridge
  • Baseline: 533 MHz channel at 70% utilization
  • Cascaded: 667 MHz segments at 70% and 35% utilization

            Baseline    Cascaded
DIMM        23.2W       22.6W
BoB         5.5W        6.4W
I/O         9.4W        12.2W
Total       38.1W       41.2W
Power/BW    7.9 nJ/B    6.7 nJ/B

Cascaded Cost

Cascaded Hybrid
[Figure: four hybrid configurations (Cases 1 to 4) mixing DRAM (D) and NVM (N) DIMMs, Baseline vs. RoB, with channel frequencies between 400 MHz and 800 MHz. Each panel plots normalized execution time against the percentage of load on DRAM (50% to 90%).]

Narrow Channel: Performance
  • Performance improvement: 18% for 2-channel-x36, 17% for 3-channel-x24

Narrow Channel: Power
  • 23% overall memory power reduction

Conclusion
  • CACTI 7: models off-chip memories and I/O
  • Detailed I/O power model
  • Design space exploration
  • Analyzes trade-offs: capacity, power, bandwidth, and cost
  • Two novel architectures: cascaded channels and narrow channels
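
The Cascaded Power numbers can be cross-checked with a few lines of arithmetic. The sketch below sums the component powers and divides total power by the reported power per unit bandwidth to obtain the bandwidth each column implies (W divided by nJ/B gives GB/s). The dictionary and function names are illustrative, and the implied bandwidths are derived here rather than reported in the slides.

```python
# Component power in watts, as listed in the Cascaded Power table.
POWER_W = {
    "baseline": {"dimm": 23.2, "bob": 5.5, "io": 9.4},
    "cascaded": {"dimm": 22.6, "bob": 6.4, "io": 12.2},
}
# Reported power per unit bandwidth, in nJ per byte.
ENERGY_NJ_PER_BYTE = {"baseline": 7.9, "cascaded": 6.7}


def total_power_w(design):
    """Sum of DIMM, BoB, and I/O power for one design."""
    return sum(POWER_W[design].values())


def implied_bw_gbs(design):
    """Bandwidth implied by total power and energy per byte.

    W / (nJ/B) = (J/s) / (1e-9 J/B) = 1e9 B/s = GB/s
    """
    return total_power_w(design) / ENERGY_NJ_PER_BYTE[design]


if __name__ == "__main__":
    for design in POWER_W:
        print(
            f"{design}: {total_power_w(design):.1f} W, "
            f"{implied_bw_gbs(design):.2f} GB/s implied"
        )
```

The sums reproduce the 38.1W and 41.2W totals. The implied bandwidth is roughly 4.8 GB/s for the baseline and 6.1 GB/s for the cascaded design, so about 8% more power accompanies more than 25% more bandwidth, which is why power per byte drops from 7.9 to 6.7 nJ/B.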
