CACTI 7: New Tools for Interconnect Exploration in Innovative Off-Chip Memories
Rajeev Balasubramonian, Andrew B. Kahng, Naveen Muralimanohar, Ali Shafiee, Vaishnav Srinivas

Main Memory Matters
  Software: In-Memory DBs, Key-Value Stores, Graph Algorithms, Deep Learning
  Architecture: Commodity CPUs, Accelerators
    Shift in bottlenecks
    Example innovations: NDP, DDR to GDDR5 (3x TOPS in TPU)
  Technology: DDR4, HMC, HBM, NVM
  The Innovation Hub is Moving to Memory

Two Silos
  CACTI 7 can be used out-of-the-box when defining memory parameters for traditional memory systems
  CACTI 7 primitives can be leveraged to model and evaluate new memory architectures

Talk Outline
  CACTI for the main memory
    Inputs/outputs
    The nuts and bolts
    Modeling I/O power
    Design space exploration
  Case studies: two novel architectures
    Cascaded Channels
    Narrow Channels

CACTI for Memory: Inputs and outputs
  Capacity, #channels, ECC vs. Not
  DRAM Type: DDR3, DDR4
  Access Pattern: bw, row buffer hits, Rd/Wr ratio
  Cost Table, Bandwidth Table, Power Parameters
  Exhaustive Search
  Channel Configs
  Energy per access

DIMM Cost
  Cost factors: technology, capacity, support for ECC, max bandwidth, vendor
  Aggregated costs from online sources
  Cost is volatile and should be updated periodically

  Cost in dollars
                  4GB    8GB   16GB   32GB   64GB
  DDR3  UDIMM      40     76      -      -      -
        RDIMM      42     64    122    304      -
        LRDIMM      -      -    211    287   1079
  DDR4  UDIMM      26     46      -      -      -
        RDIMM      33     60    126    310      -
        LRDIMM      -      -    279    331   1474
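
To make the cost table easier to compare across capacities, the following is a minimal sketch that converts each price into dollars per GB. The prices are the ones listed on the slide; the assignment of the 16GB, 32GB, and 64GB prices to RDIMM and LRDIMM rows is an assumption made while reading the flattened table, and the names `DIMM_COST` and `cost_per_gb` are hypothetical, not part of CACTI 7.

```python
# Prices in dollars per DIMM, keyed by (DRAM type, DIMM type) -> {capacity in GB: price}.
# ASSUMPTION: the row/column assignment of the 16GB+ prices is inferred from the slide.
DIMM_COST = {
    ("DDR3", "UDIMM"): {4: 40, 8: 76},
    ("DDR3", "RDIMM"): {4: 42, 8: 64, 16: 122, 32: 304},
    ("DDR3", "LRDIMM"): {16: 211, 32: 287, 64: 1079},
    ("DDR4", "UDIMM"): {4: 26, 8: 46},
    ("DDR4", "RDIMM"): {4: 33, 8: 60, 16: 126, 32: 310},
    ("DDR4", "LRDIMM"): {16: 279, 32: 331, 64: 1474},
}


def cost_per_gb(table):
    """Return the same table with each price divided by its capacity ($/GB)."""
    return {
        kind: {cap: price / cap for cap, price in prices.items()}
        for kind, prices in table.items()
    }


if __name__ == "__main__":
    for (dram, dimm), per_gb in cost_per_gb(DIMM_COST).items():
        cells = ", ".join(f"{cap}GB: ${v:.2f}/GB" for cap, v in sorted(per_gb.items()))
        print(f"{dram} {dimm}: {cells}")
```

Under these assumptions the dollars-per-GB figure is not constant within a row; for example, the DDR3 LRDIMM row costs less per GB at 32GB than at either 16GB or 64GB.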
  Cost and capacity relationship is not linear

Bandwidth
  Bandwidth depends on load, voltage, and DIMM type
  Table: frequency (MHz) by DIMM type, voltage (1.35V, 1.5V, 1.2V), and DPC
    Rows: UDIMM-DR, RDIMM-DR, RDIMM-QR, LRDIMM-QR
    Entries in source order: UDIMM-DR 533 667 533 667 533 667 667 667 533 533; RDIMM-DR 1066 933 800; RDIMM-QR 933 800; LRDIMM-QR 1066 1066 800

Power Modeling
  Extending CACTI-I/O
  DDR4 and SerDes support added
  SerDes parameters from literature for different lengths/speeds
  For parallel buses, support for more accurate termination power with HSPICE simulations
  Different termination models for each bus type
    Different frequency, DIMMs per channel
    On-DIMM and on-board
    Different range (short or long)

Interconnect Model API

Power Analysis (DDR3)

Power Analysis (DDR4)

Cost and Bandwidth Analysis
  Highest possible BW for the demanded capacity
  Lowest possible cost for the demanded capacity

Two Case Studies
  Key Observations
    High DPC -> less BW
    More channels -> high BW and low cost
  New Idea I: Cascaded Segments
    Each segment has few DIMMs -> higher BW
  New Idea II: Narrow Channels
    Partition the channel into many parallel channels
    Fewer DIMMs per data wire, new ECC -> higher BW
    Lower power on DIMM

Cascaded Channels
  Relay on Board (RoB) chip: one memory cycle increase in latency
  Same DPC, higher BW
    [Diagram: CPU with DIMMs at 533 MHz; CPU with RoB and DIMMs at 667MHz]
  Same BW, lower cost
    [Diagram: CPU configurations with 32 GB and 64 GB DIMMs at 667MHz, with and without RoB]

Hybrid Memory
  NVM is slow
  Software optimized to access DRAM more
  Unbalanced channel load: One Channel DRAM, One Channel NVM
  Balanced channel load: Frontend DRAM, Backend NVM
  [Diagram: CPU with DRAM (D) and NVM (N) DIMMs in both arrangements]

Narrow Channels
  Command/Address Bus is shared between channels
  Higher Bandwidth but Higher Latency
  Lower frequency/power for DRAM Chips!
  ECC on DIMM and CRC for link to reduce bw

Methodology
  Trace-based simulation
    Trace fed to USIMM
    Memory-intensive Benchmarks (NPB and SPEC2006)
    Trace generated by Simics
  8-core at 3.2 GHz
  L1D = 32KB, L1I = 32KB, L2 = 8MB
  Power: CACTI 7

Cascaded Channels
  DDR3: 25% higher BW, 22% higher IPC
  DDR4: 13% higher BW, 12% higher IPC

Cascaded Latency
  [Plot: Memory Latency (CPU cycles) vs. Memory Capacity (GB)]

Cascaded Power: DRAM Cartridge
  [Diagram: Baseline CPU channel at 533 MHz, 70% utilization; Cascaded CPU with 667MHz segments at 70% and 35% utilization]

             Baseline      Cascaded
  DIMM       23.2W         22.6W
  BoB        5.5W          6.4W
  I/O        9.4W          12.2W
  Total      38.1W         41.2W
  Power/BW   7.9 (nJ/B)    6.7 (nJ/B)

Cascaded Cost

Cascaded Hybrid
  [Figure: four DRAM (D) / NVM (N) configurations, Case 1 to Case 4, with channel frequencies from 400 MHz to 800 MHz; plots of Normalized Exe. Time vs. Percentage of Load on DRAM (50% to 90%) for Baseline and RoB]

Narrow Channel: Performance
  Performance Improvement: 2-channel-x36 18%, 3-channel-x24 17%

Narrow Channel: Power
  23% overall memory power reduction

Conclusion
  CACTI 7: models off-chip memories and I/O
    Detailed I/O power model
    Design space exploration
    Analyzes trade-offs: capacity, power, bandwidth, and cost
  Two novel architectures
    Cascaded channels
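
The design space exploration summarized above (highest possible BW or lowest possible cost for a demanded capacity) can be illustrated with a small exhaustive search. This is only a sketch under stated assumptions, not the CACTI 7 implementation or its API: the DDR4 RDIMM prices are taken from the DIMM Cost slide, while the frequency-per-DPC values, the four-channel limit, the 64-bit data bus, and all function names are hypothetical choices made for illustration.

```python
from itertools import product

# DDR4 RDIMM prices in dollars per DIMM (capacity in GB -> price), from the cost slide.
RDIMM_COST = {4: 33, 8: 60, 16: 126, 32: 310}

# ASSUMPTION: illustrative channel frequency (MHz) for 1, 2, and 3 DIMMs per channel.
# Replace with values from an up-to-date bandwidth table.
FREQ_MHZ_BY_DPC = {1: 1066, 2: 933, 3: 800}

BUS_BYTES = 8  # ASSUMPTION: 64-bit data bus per channel, ECC bits not counted


def enumerate_configs(max_channels=4):
    """Yield every (channels, DPC, DIMM size) combination with its derived metrics."""
    for channels, dpc, dimm_gb in product(
        range(1, max_channels + 1), FREQ_MHZ_BY_DPC, RDIMM_COST
    ):
        dimms = channels * dpc
        yield {
            "channels": channels,
            "dpc": dpc,
            "dimm_gb": dimm_gb,
            "capacity_gb": dimms * dimm_gb,
            "cost": dimms * RDIMM_COST[dimm_gb],
            # Peak bandwidth in GB/s: two transfers per clock (double data rate).
            "bw_gbps": channels * FREQ_MHZ_BY_DPC[dpc] * 2 * BUS_BYTES / 1000,
        }


def search(demand_gb, objective, max_channels=4):
    """Return the best configuration that meets the demanded capacity."""
    feasible = [
        c for c in enumerate_configs(max_channels) if c["capacity_gb"] >= demand_gb
    ]
    if not feasible:
        raise ValueError(f"no configuration provides {demand_gb} GB")
    if objective == "cost":
        # Cheapest first; break ties in favor of higher bandwidth.
        return min(feasible, key=lambda c: (c["cost"], -c["bw_gbps"]))
    if objective == "bandwidth":
        # Fastest first; break ties in favor of lower cost.
        return max(feasible, key=lambda c: (c["bw_gbps"], -c["cost"]))
    raise ValueError(f"unknown objective: {objective}")


if __name__ == "__main__":
    print(search(64, "cost"))
    print(search(64, "bandwidth"))
```

With these placeholder numbers, both objectives for a 64 GB demand use all four channels, but the cheapest configuration populates two DIMMs per channel while the fastest one keeps a single DIMM per channel, in line with the talk's observation that high DPC lowers bandwidth.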