Conjoining Soft-Core FPGA Processors

David Sheldon (a), Rakesh Kumar (b), Frank Vahid (a)*, Dean Tullsen (b), Roman Lysecky (c)
(a) Department of Computer Science and Engineering, University of California, Riverside
(b) Department of Computer Science and Engineering, University of California, San Diego
(c) Department of Electrical and Computer Engineering, University of Arizona
* Also with the Center for Embedded Computer Systems at UC Irvine

This work was supported in part by the National Science Foundation, the Semiconductor Research Corporation, and by hardware and software donations from Xilinx.

FPGA Soft Core Processors
  - HDL description
  - Flexible implementation: FPGA or ASIC
  - Technology independent: Spartan 3, Virtex 2, Virtex 4

FPGA Soft Core Processors
  - Soft-core processors can have configurable options
      - Datapath units
      - Cache
      - Bus architecture
  - Current commercial FPGA soft-core processors
      - Xilinx MicroBlaze
      - Altera Nios
  [Figure: FPGA containing a processor with FPU, MAC, and cache]

Conjoinment Overview
  - Application 1 and Application 2 each run on a base microprocessor
  - Add necessary units to both processors
  - Conjoin the FPU unit
  [Figure: two base microprocessors, each with its own FPU, before conjoining; one conjoined FPU unit after]

Conjoinment Background
  - Conjoinment proposed for multicore desktop processing (Kumar 2004)
  - Reduces size with reasonable performance overhead
      - e.g., cache conjoinment overhead: 1%-13%
  [Figure: ICache sharing and DCache sharing]

Outline
  - Conjoinment for soft-core FPGA processors
      - Area savings
      - Performance overhead
      - Tuning heuristic for two configurable soft-cores with conjoin option

Area Savings
  - Significant potential area savings
  - Limitations
      - Does not consider multiplexing costs, due to absence of FPGA synthesis tools supporting conjoinment
      - But good potential justifies further investigation

  Unit sizes (equivalent LUTs):
    Unit             Size
    Multiplier       1331
    Barrel shifter    228
    Divider           122
    FPU              2738

  [Figure: size in equivalent LUTs (0 to 10000) of each unit (bs, div, mul, fpu) instantiated with the base MicroBlaze, unconjoined versus conjoined; chart labels: 6%, 32%, 23%, 4%]
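The arithmetic behind a per-unit area saving can be sketched as follows. This is a minimal illustration, not the authors' calculation: it assumes two identical processors, assumes the saving is the one unit copy that conjoining removes, and ignores multiplexing costs as the slide does. The deck does not state the base MicroBlaze size, so `ASSUMED_BASE_LUTS` is a placeholder.

```python
# Unit sizes in equivalent LUTs, copied from the Area Savings slide.
UNIT_LUTS = {
    "multiplier": 1331,
    "barrel_shifter": 228,
    "divider": 122,
    "fpu": 2738,
}

# ASSUMPTION: the deck does not give the base MicroBlaze size; this is a placeholder.
ASSUMED_BASE_LUTS = 1500


def conjoin_savings(base_luts, unit_luts):
    """Fraction of total area saved when two processors share one unit.

    ASSUMPTION: unconjoined area is two processors each with its own unit;
    conjoined area is two processors plus a single shared unit.
    Multiplexing overhead is ignored.
    """
    unconjoined = 2 * (base_luts + unit_luts)
    conjoined = 2 * base_luts + unit_luts
    return (unconjoined - conjoined) / unconjoined


for unit, luts in UNIT_LUTS.items():
    print(f"{unit}: {conjoin_savings(ASSUMED_BASE_LUTS, luts):.1%}")
```

Under this model the saving approaches 50% as the shared unit grows relative to the base processor, which is why large units such as the FPU are the most attractive to conjoin.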

Performance Overhead
  - No simulator exists for conjoined processors
  - We developed our own trace-based conjoined processor simulator
      - Simulation uses pessimistic performance assumptions; Kumar's techniques can improve on them
      - Simulator outputs contention information
      - Final cycles can be compared to unconjoined to determine performance overhead
  [Figure: app1 and app2 pass through the Xilinx simulator to produce trace1 and trace2, which feed the conjoined simulator]
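A toy version of a trace-based conjoined simulation might look like the sketch below. It is not the authors' simulator. The trace format (a list of `("cpu", cycles)` and `("unit", cycles)` operations), the fixed `access_delay`, and the first-come-first-served arbitration are all assumptions made for illustration.

```python
def simulate_conjoined(trace1, trace2, access_delay=1):
    """Replay two traces that share one hardware unit.

    Each trace is a list of (kind, cycles) with kind "cpu" or "unit"
    (hypothetical format). A "unit" operation pays a fixed access delay
    and waits while the other processor holds the shared unit.
    Returns (final cycle count per processor, stall counters per processor).
    """
    traces = [trace1, trace2]
    clock = [0, 0]   # local cycle count of each processor
    pos = [0, 0]     # next operation index in each trace
    stalls = [{"access": 0, "contention": 0} for _ in range(2)]
    unit_free_at = 0  # cycle at which the shared unit becomes idle

    while any(pos[p] < len(traces[p]) for p in range(2)):
        # Advance the unfinished processor that is earliest in time
        # (ties go to processor 0), so unit requests are served in time order.
        p = min((q for q in range(2) if pos[q] < len(traces[q])),
                key=lambda q: clock[q])
        kind, cycles = traces[p][pos[p]]
        pos[p] += 1
        if kind == "unit":
            wait = max(0, unit_free_at - clock[p])
            stalls[p]["contention"] += wait
            stalls[p]["access"] += access_delay
            clock[p] += wait + access_delay + cycles
            unit_free_at = clock[p]
        else:
            clock[p] += cycles
    return clock, stalls


def unconjoined_cycles(trace):
    """Cycle count when the processor owns its unit (no stalls)."""
    return sum(cycles for _, cycles in trace)


trace_a = [("cpu", 2), ("unit", 3)]
trace_b = [("cpu", 2), ("unit", 3)]
cycles, stalls = simulate_conjoined(trace_a, trace_b)
overhead = [c / unconjoined_cycles(t) - 1 for c, t in zip(cycles, (trace_a, trace_b))]
print(cycles, stalls, overhead)
```

Comparing the final cycle counts against `unconjoined_cycles` gives the performance overhead, and the stall counters separate access delay from contention in the way the slide describes.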

  [Figure: simulator output for brev and bitmnp, showing access stalls and contention stalls]

Performance Overhead
  - Speedup: application time on optimally configured processor / avg. app. time on base processor
  - Compared configuration with conjoinment versus without
  - Performance overhead usually small, averaging just 4.2%
  - Overhead caused by access delays and contention for the hardware units
  [Figure: speedup (0 to 4.5), conjoined versus unconjoined, for the pairs (brev),brev; brev,(brev); (brev),canrdr; brev,(canrdr); (brev),bitmnp; brev,(bitmnp); (bitmnp),canrdr; bitmnp,(canrdr); (bitmnp),bitmnp; bitmnp,(bitmnp); (canrdr),canrdr; canrdr,(canrdr); chart callouts for brev and bitmnp: 17% and 2.4%]

Tuning Heuristic
  - 5 choices per unit
      - e.g., FPU: no unit, 1 only, 2 only, 1 & 2, and conjoined
  - 4 units: 5^4 = 625 possible configurations
  - Simulation: ~30 minutes per configuration
  - Need search heuristic to tune
  [Figure: Base MicroBlaze 1 and Base MicroBlaze 2 with optional multiplier, barrel shifter, divider, and FPU; FPU options shown as no FPU, FPU 1, FPU 2, or FPU conjoined]

Map to 0-1 Knapsack Problem
  - Creating the model
  [Figure: synthesis yields the size, and the application run yields the perf, of the base MicroBlaze and of the MicroBlaze with each unit (barrel shifter, divider, multiplier, FPU)]

                     BS     FPU    MUL    DIV
    Perf increment   1.1    0.9    1.2    1.0
    Size increment   1.4    2.7    1.8    1.1
    Perf/Size        0.96   0.34   0.63   0.93
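The search-space size quoted on the Tuning Heuristic slide can be checked with a short enumeration. The unit and choice names below are illustrative labels, not identifiers from the authors' tools; the 30-minute figure is the slide's approximate per-configuration simulation time.

```python
from itertools import product

UNITS = ["multiplier", "barrel_shifter", "divider", "fpu"]
# The five choices per unit listed on the slide (label names are illustrative).
CHOICES = ["none", "proc1_only", "proc2_only", "both", "conjoined"]
MINUTES_PER_SIMULATION = 30  # "~30 minutes per configuration"


def all_configurations():
    """Yield every assignment of one choice to each unit."""
    for combo in product(CHOICES, repeat=len(UNITS)):
        yield dict(zip(UNITS, combo))


configs = list(all_configurations())
total_hours = len(configs) * MINUTES_PER_SIMULATION / 60
print(len(configs), "configurations,", total_hours, "hours of simulation")
```

At roughly 312 hours for one application pair, exhaustive simulation is impractical, which is the motivation for a search heuristic.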

Map to 0-1 Knapsack Problem
  - First consider tuning without conjoinment
  - Problem of instantiating units to limited FPGA size can be mapped to the 0-1 knapsack problem
      - Add items, each with weight and benefit, to a weight-constrained knapsack such that profit is maximized
  - Note: mapping is inexact; weights/benefits are not strictly additive

  Items (suffix 1 = processor 1, suffix 2 = processor 2; the BS and DIV labels are inferred from the unit sizes on the Area Savings slide):
    Item      MUL 1   BS 1   DIV 1   FPU 1   MUL 2   BS 2   DIV 2   FPU 2
    Weight    1331    228    121     2738    1331    228    121     2738
    Benefit   0.08    0.62   0.00    0.00    0.22    0.76   0.00    0.00

  [Figure: the knapsack is the available FPGA, which already holds the two base MicroBlaze processors; FPU 1 and MUL 2 shown as example items]

Disjunctively Constrained Knapsack
  - Problem: if a conjoined unit is included, can't also include the standalone unit
  - Solution: map to disjunctively-constrained 0-1 knapsack
      - Yamada T., Heuristic and Exact Algorithms for the Disjunctively Constrained Knapsack Problem, 2002
      - Prohibits specific item pairs from being in the knapsack
      - ILP solution, running time is pseudo-polynomial
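A disjunctively constrained 0-1 knapsack can be sketched on the deck's own numbers. The weights and benefits come from the unconjoined item table above and the conjoined item table on the following slide; zero-benefit divider and FPU items are left out. Three things here are assumptions rather than the authors' method: the solver is a brute-force enumeration (the deck uses a heuristic it does not describe), a conjoined item's benefit is taken as the sum of its two per-processor benefits, and `ASSUMED_CAPACITY` is a made-up LUT budget for added units.

```python
from itertools import combinations

# (name, weight in equivalent LUTs, benefit)
ITEMS = [
    ("MUL_1", 1331, 0.08),
    ("BS_1", 228, 0.62),
    ("MUL_2", 1331, 0.22),
    ("BS_2", 228, 0.76),
    # ASSUMPTION: conjoined benefit = benefit to processor 1 + benefit to processor 2.
    ("MUL_C", 1331, 0.06 + 0.21),
    ("BS_C", 228, 0.54 + 0.71),
]

# Disjunctive constraints: a conjoined unit excludes its standalone versions.
CONFLICTS = {
    frozenset({"MUL_C", "MUL_1"}),
    frozenset({"MUL_C", "MUL_2"}),
    frozenset({"BS_C", "BS_1"}),
    frozenset({"BS_C", "BS_2"}),
}


def solve_dckp(items, conflicts, capacity):
    """Exhaustively find the best conflict-free subset within capacity.

    Exponential in the number of items; fine for a handful of units only.
    """
    best_benefit, best_choice = 0.0, ()
    for k in range(1, len(items) + 1):
        for subset in combinations(items, k):
            names = [name for name, _, _ in subset]
            if sum(weight for _, weight, _ in subset) > capacity:
                continue
            if any(frozenset(pair) in conflicts
                   for pair in combinations(names, 2)):
                continue
            benefit = sum(b for _, _, b in subset)
            if benefit > best_benefit:
                best_benefit, best_choice = benefit, tuple(names)
    return best_choice, best_benefit


ASSUMED_CAPACITY = 1600  # hypothetical LUT budget for added units
choice, benefit = solve_dckp(ITEMS, CONFLICTS, ASSUMED_CAPACITY)
print(choice, round(benefit, 2))
```

With this tight budget the search selects the two conjoined units, since one shared multiplier and one shared barrel shifter fit where two standalone multipliers would not. Given a much larger budget, the same search prefers the four standalone units, whose summed benefit is slightly higher under the additive assumption.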

  [Figure: standalone items (MUL 1, FPU 1, MUL 2, FPU 2, ...) and conjoined items (MUL C, FPU C, ...); the knapsack is the available FPGA, which already holds the two base MicroBlaze processors]

Disjunctively Constrained Knapsack
  - Conjoined units provide benefits to both processors
  - Conjoined benefits show a small decrease in benefit from the unconjoined unit

  Standalone items (BS and DIV labels inferred from unit sizes):
    Item      MUL 1   BS 1   DIV 1   FPU 1   MUL 2   BS 2   DIV 2   FPU 2
    Weight    1331    228    121     2738    1331    228    121     2738
    Benefit   0.08    0.62   0       0       0.22    0.76   0       0

  Conjoined items:
    Item        MUL C   BS C   DIV C   FPU C
    Weight      1331    228    121     2738
    Benefit 1   0.06    0.54   0       0
    Benefit 2   0.21    0.71   0       0

Disjunctively Constrained Knapsack: Running Time
  - Modeling
      - 5 synthesis runs for each processor
      - At most 4 runs of the conjoined simulator
  - Disjunctively constrained 0-1 knapsack
      - NP-complete problem
      - Solved with a heuristic
      - Heuristic takes < 1 min

Results
  - Data gathered for the Xilinx MicroBlaze soft-core processor
  - 10 EEMBC and Powerstone benchmarks
      - aifir, BaseFP01, bitmnp, brev, canrdr, g3fax, g721_ps, idct, matmul, tblook, ttsprk
  - Obtained results for all possible pairwise conjoinment
  - We only show conjoinment data when both applications use the unit
      - To avoid making conjoinment appear better than it is

Results
  - Knapsack approach finds near-optimal in most cases
  [Figure: speedup (0 to 8) versus size in equivalent LUTs (0 to 9000), knapsack versus optimal, for the pairs bitmnp/bitmnp, canrdr/canrdr, BaseFP01/BaseFP01, BaseFP01/bitmnp, BaseFP01/canrdr, tblook/tblook, tblook/bitmnp, and tblook/canrdr]

Results
  - Knapsack heuristic finds near-optimal in most cases (versus exhaustive with conjoinment)
      - Runs in seconds
      - One example had sub-optimal results (2.9 times slower)
  - Performance overhead due to conjoinment just a few percent on average
  [Figure: speedup (0 to 8) for knapsack, exhaustive w/ conj., and exhaustive w/o conj.]

Results
  - On average the knapsack approach yields the same size as the exhaustive with conjoinment
  - Average size savings of 16%
  [Figure: size (0 to 12000) for knapsack, exhaustive w/ conj., and exhaustive w/o conj.]

Conclusions
  - Conjoining two soft-core FPGA processors reduces average size by 16%
  - Performance overhead just a few percent in most cases
  - Disjunctively constrained 0-1 knapsack approach finds near-optimal in most cases
      - But could be improved for some examples
  - Future
      - Consider multiplexing size and delay overheads
      - Apply Kumar's advanced conjoining techniques to reduce overheads
