## Modern Physical Design: Algorithm Technology Methodology

Andrew B. Kahng (UCLA) and Majid Sarrafzadeh (Northwestern)

### Introduction

- This tutorial will cover "the latest word" in physical chip implementation methodology and physical design (PD) algorithm technology.
- The target audience consists of
  - system and circuit designers who would benefit from understanding tool capabilities in this arena,
  - CAD engineers (both R&D and support),
  - design project managers,
  - academic researchers.
- Familiarity with basic PD methodology is assumed.

### Trade-Off: Depth vs. Breadth

- Broad spectrum of possible material
- Only ~6-7 hours for presentation
- Not all possible topics covered in slides, not all slides covered in talks
  - ask questions if you'd like to hear about something in particular, esp. related to methodology or particular P&R techniques
- All tutorial materials will be available in softcopy at
  - http://vlsicad.cs.ucla.edu/ICCAD99TUTORIAL
  - http://www.ece.nwu.edu/nucad/ICCAD99TUTORIAL

### **Overview of the Tutorial**

- PART I: Technology and Methodology Context Setting (9:00 - 10:00)
- PART II: Fundamental Physical Design Formulation and Algorithms (10:00 - 12:00)
  - Coffee Break (10:30 - 10:45)
  - Lunch (12:00 - 1:00)
- PART III: Interaction with Upstream Floorplanning and Logic Synthesis (1:00 - 2:00)
- PART IV: Interaction with extraction, analysis, and performance validation (2:00 - 3:30)
  - Coffee Break (3:30 - 3:45)
- PART V: Linkage to Custom Layout (3:45 - 4:45)
- Conclusion (4:45 - 5:00)





### **Overall Roadmap Technology Characteristics**

| YEAR OF FIRST PRODUCT SHIPMENT                                   | 1997      | 1999      | 2002      | 2005      | 2008      | 2011      | 2014      |
|------------------------------------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| TECHNOLOGY NODE                                                  |           |           |           |           |           |           |           |
| DENSE LINES (DRAM HALF-PITCH) (nm)                               | 250       | 180       | 130       | 100       | 70        | 50        | 35        |
| ISOLATED LINES (MPU GATES) (nm)                                  | 200       | 140       | 100       | 70        | 50        | 35        | 25        |
| Logic (Low-Volume-ASIC)‡                                         |           |           |           |           |           |           |           |
| Usable transistors/cm² (auto layout)                             | 8M        | 14M       | 24M       | 40M       | 64M       | 100M      | 160M      |
| Nonrecurring engineering cost/usable transistor (microcents)     | 50        | 25        | 15        | 10        | 5         | 2.5       | 1.3       |
| Number of Chip I/Os – Maximum                                    |           |           |           |           |           |           |           |
| Chip-to-package (pads) (high-performance)                        | 1515      | 1867      | 2553      | 3492      | 4776      | 6532      | 8935      |
| Chip-to-package (pads) (cost-performance)                        | 758       | 934       | 1277      | 1747      | 2386      | 3268      | 4470      |
| Number of Package Pins/Balls – Maximum                           |           |           |           |           |           |           |           |
| Microprocessor/controller (cost-performance)                     | 568       | 700       | 957       | 1309      | 1791      | 2449      | 3350      |
| ASIC (high-performance)                                          | 1136      | 1400      | 1915      | 2619      | 3581      | 4898      | 6700      |
| Package cost (cents/pin) (cost-performance)                      | 0.78-2.71 | 0.70-2.52 | 0.60-2.16 | 0.51-1.85 | 0.44-1.59 | 0.38-1.36 | 0.33-1.17 |
| Power Supply Voltage (V)                                         |           |           |           |           |           |           |           |
| Minimum logic Vdd (V)                                            | 1.8-2.5   | 1.5-1.8   | 1.2-1.5   | 0.9-1.2   | 0.6-0.9   | 0.5-0.6   | 0.37-0.42 |
| Maximum Power                                                    |           |           |           |           |           |           |           |
| High-performance with heat sink (W)                              | 70        | 90        | 130       | 160       | 170       | 175       | 183       |
| Battery (W) (hand-held)                                          | 1.2       | 1.4       | 2         | 2.4       | 2.8       | 3.2       | 3.7       |

### **Overall Roadmap Technology Characteristics (Cont'd)**

| YEAR OF FIRST PRODUCT SHIPMENT                                                          | 1997    | 1999    | 2002    | 2005    | 2008    | 2011    | 2014    |
|-----------------------------------------------------------------------------------------|---------|---------|---------|---------|---------|---------|---------|
| TECHNOLOGY NODE<br>DENSE LINES (DRAM HALF-PITCH) (nm)                                   | 250     | 180     | 130     | 100     | 70      | 50      | 35      |
| Chip Frequency (MHz)                                                                    |         |         |         |         |         |         |         |
| On-chip local clock<br>(high-performance)                                               | 750     | 1250    | 2100    | 3500    | 6000    | 10000   | 16903   |
| On-chip, across-chip clock<br>(high-performance)                                        | 375     | 1200    | 1600    | 2000    | 2500    | 3000    | 3674    |
| On-chip, across-chip clock<br>(high-performance ASIC)                                   | 300     | 500     | 700     | 900     | 1200    | 1500    | 1936    |
| On-chip, across-chip clock<br>(cost-performance)                                        | 400     | 600     | 800     | 1100    | 1400    | 1800    | 2303    |
| Chip-to-board (off-chip) speed<br>(high-performance, reduced-width,<br>multiplexed bus) | 375     | 1200    | 1600    | 2000    | 2500    | 3000    | 3674    |
| Chip-to-board (off-chip) speed<br>(high-performance, peripheral buses)                  | 250     | 480     | 885     | 1035    | 1285    | 1540    | 1878    |
| Chip Size (mm²) (@sample/introduction)                                                  |         |         |         |         |         |         |         |
| DRAM                                                                                    | 280     | 400     | 560     | 790     | 1120    | 1580    | 2240    |
| Microprocessor                                                                          | 300     | 340     | 430     | 520     | 620     | 750     | 901     |
| ASIC [max litho field area]                                                             | 480     | 800     | 900     | 1000    | 1100    | 1300    | 1482    |
| Lithographic Field Size (mm x mm)                                                       | 22 x 22 | 25 x 32 | 25 x 36 | 25 x 40 | 25 x 44 | 25 x 52 | 25 x 59 |
| Lithographic Field Area (mm²)                                                           | 484     | 800     | 900     | 1000    | 1100    | 1300    | 1482    |
| Maximum Number Wiring Levels                                                            | 6       | 6–7     | 7       | 7–8     | 8–9     | 9       | 10      |

### **Technology Scaling Trends**

- Interconnect
  - Impact of scaling on parasitic capacitance
  - Impact of scaling on inductive coupling
  - Impact of new materials on parasitic capacitance & resistance
  - Trends in number of layers, routing pitch
- Device
  - V<sub>dd</sub>, V<sub>t</sub>, sizing
  - Circuit trends (multithreshold CMOS, multiple supply voltages, dynamic CMOS)
  - Impact of scaling on power and reliability















### **Scaling of Noise with Process**

- Cross coupling noise increases with
  - process shrink
  - frequency of operation
- Propagated noise increases as noise margins decrease
  - decrease in supply voltage
  - more extreme P/N ratios for high-speed operation
- IR drop noise increases with
  - chip complexity and size
  - chip operating frequency
  - shrinking of metal layers
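
To make the cross-coupling trend concrete, here is a minimal sketch of the first-order charge-sharing bound on coupled noise, V_noise <= Vdd * Cc / (Cc + Cg), which applies to a floating victim wire. The function name and all capacitance and voltage values are illustrative assumptions, not figures from this tutorial.

```python
def coupling_noise_bound(vdd, c_couple, c_ground):
    """Upper bound on noise injected into a floating victim when the
    aggressor swings by vdd (simple capacitive divider)."""
    return vdd * c_couple / (c_couple + c_ground)

# Hypothetical capacitances in fF; only the Cc/Cg ratio matters here.
older = coupling_noise_bound(vdd=2.5, c_couple=40.0, c_ground=120.0)
newer = coupling_noise_bound(vdd=1.8, c_couple=90.0, c_ground=60.0)

print(f"older process: {older:.3f} V ({older / 2.5:.0%} of Vdd)")
print(f"newer process: {newer:.3f} V ({newer / 1.8:.0%} of Vdd)")
```

A driven victim sees less noise than this bound, so the sketch is best read as a quick screening estimate.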

### **New Materials Implications**

- Lower dielectric constant (low-k)
  - reduces total capacitance
  - doesn't change cross-coupled / grounded capacitance proportions
- Copper metallization
  - reduces RC delay
  - improves electromigration resistance (factor of 4-5?)
  - thinner deposition reduces cross cap
- Multiple layers of routing
  - enabled by planarized processes; 10% extra cost per layer
  - reverse-scaled top-level interconnects
  - relative routing pitch may increase
  - room for shielding
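
As a rough illustration of the first two points, the sketch below compares the RC product of one wire in aluminum and in copper, then scales capacitance for a lower-k dielectric. The bulk resistivities are standard values; the wire geometry, the capacitance per unit length, the low-k value of 2.7, and the assumption that every capacitance component scales with k are all hypothetical simplifications. Real interconnect resistivity is higher than bulk.

```python
RHO_AL = 2.65e-8  # ohm*m, bulk aluminum resistivity
RHO_CU = 1.68e-8  # ohm*m, bulk copper resistivity
K_OXIDE = 3.9     # relative permittivity of SiO2
K_LOW = 2.7       # assumed low-k dielectric (hypothetical value)

def wire_rc(rho, length_m, width_m, thickness_m, cap_per_m):
    """Lumped RC product of a rectangular wire, in seconds."""
    resistance = rho * length_m / (width_m * thickness_m)
    capacitance = cap_per_m * length_m
    return resistance * capacitance

# Hypothetical 1 mm wire, 0.25 um wide, 0.5 um thick, 0.2 fF/um.
geometry = dict(length_m=1e-3, width_m=0.25e-6, thickness_m=0.5e-6)
cap_oxide = 2.0e-10                      # F/m
cap_lowk = cap_oxide * K_LOW / K_OXIDE   # all capacitance scaled by k

rc_al = wire_rc(RHO_AL, cap_per_m=cap_oxide, **geometry)
rc_cu = wire_rc(RHO_CU, cap_per_m=cap_oxide, **geometry)
rc_cu_lowk = wire_rc(RHO_CU, cap_per_m=cap_lowk, **geometry)

# Scaling both components by the same factor leaves their ratio unchanged.
c_couple, c_ground = 1.2e-10, 0.8e-10    # F/m, hypothetical split
scale = K_LOW / K_OXIDE
ratio_before = c_couple / c_ground
ratio_after = (c_couple * scale) / (c_ground * scale)

print(f"RC (Al, oxide): {rc_al * 1e12:.1f} ps")
print(f"RC (Cu, oxide): {rc_cu * 1e12:.1f} ps")
print(f"RC (Cu, low-k): {rc_cu_lowk * 1e12:.1f} ps")
print(f"Cc/Cg before and after low-k: {ratio_before:.2f}, {ratio_after:.2f}")
```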

### **Technical Issues in UDSM Design**

### New issues and problems arising in UDSM technology

- catastrophic yield: critical area, antennas
- parametric yield: density control (filling) for CMP
- parametric yield: subwavelength lithography implications
  - optical proximity correction (OPC)
  - phase-shifting mask design (PSM)
- signal integrity
  - crosstalk and delay uncertainty
  - DC electromigration
  - AC self-heat
  - hot electrons

### Current context: cell-based place-and-route methodology

- placement and routing formulations, basic technologies
- methodology contexts

### **Technical Issues in UDSM Design**

- Manufacturability (chip can't be built)
  - antenna rules
  - minimum area rules for stacked vias
  - CMP (chemical mechanical polishing) area fill rules
  - layout corrections for optical proximity effects in subwavelength lithography; associated verification issues
- Signal integrity (failure to meet timing targets)
  - crosstalk induced errors
  - timing dependence on crosstalk
  - IR drop on power supplies
- Reliability (design failures in the field)
  - electromigration on power supplies
  - hot electron effects on devices
  - wire self heat effects on clocks and signals
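
The CMP area fill rules listed above are typically stated as bounds on feature density within a window. The following is a minimal sketch of such a check, assuming the layout has already been summarized as a grid of equal-area tiles with known metal density. The function names, the tile grid, and the 20% and 70% bounds are hypothetical, not taken from any foundry rule deck.

```python
def window_densities(tile_density, w):
    """Average density of every w x w window of tiles, sliding by one tile."""
    rows, cols = len(tile_density), len(tile_density[0])
    result = {}
    for r in range(rows - w + 1):
        for c in range(cols - w + 1):
            total = sum(tile_density[r + i][c + j]
                        for i in range(w) for j in range(w))
            result[(r, c)] = total / (w * w)
    return result

def density_violations(tile_density, w, min_density, max_density):
    """Windows whose density falls outside [min_density, max_density]."""
    return {pos: d for pos, d in window_densities(tile_density, w).items()
            if d < min_density or d > max_density}

# Hypothetical per-tile metal densities for one layer.
tiles = [
    [0.10, 0.20, 0.60],
    [0.10, 0.30, 0.70],
    [0.40, 0.50, 0.80],
]
bad = density_violations(tiles, w=2, min_density=0.20, max_density=0.70)
for pos, d in sorted(bad.items()):
    print(f"window at tile {pos}: density {d:.3f} needs fill or slotting")
```

Windows below the minimum are candidates for dummy fill insertion; a real flow would also have to account for the capacitance that the fill adds.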

### Why Now?

- These effects have always existed, but become worse at UDSM sizes because of:
  - finer geometries
    - greater wire and via resistance
    - higher electric fields if supply voltage not scaled
  - more metal layers
    - higher ratio of cross coupling to grounded capacitance
  - lower supply voltages
    - more current for given power
  - lower device thresholds
  - smaller noise margins
- Focus on interconnect
  - susceptible to patterning difficulties
    - CMP, optical exposure, resist development/etch, CVD, ...
  - susceptible to defects
    - critical area, critical volume

[Figure: chart of design activities (design, analysis, verification, test) across implementation stages from SW design through gates/cells, transistors, masks, and chip. New Figure 4 (Draft Rev. B, 3-12-99). Red denotes most challenging activity.]





### Silicon Complexity and Design Complexity

- Silicon complexity: physical effects cannot be ignored
  - fast but weak gates; resistive and cross-coupled interconnects
  - subwavelength lithography from 350nm generation onward
  - delay, power, signal integrity, manufacturability, reliability all become first-class objectives along with area
- Design complexity: more functionality and customization, in less time
  - reuse-based design methodologies for SOC
- Interactions increase complexity
  - need robust, top-down, convergent design methodology

### **Guiding Philosophy in the Back-End**

- Many opportunities to leave \$\$\$ on table
  - physical effects of process, migratability
  - design rules more conservative, design waivers up
  - device-level layout optimizations in cell-based methodologies
- Verification cost increases
- Prevention becomes necessary complement to checking
- Successive approximation = design convergence
  - upstream activities pass intentions, assumptions downstream
  - downstream activities must be predictable
  - models of analysis/verification = objectives for synthesis
- More "custom" bias in automated methodologies

### **Implications of Complexity**

UDSM: Silicon complexity + Design complexity

convergent design: must abstract what's beneath

- prevention with respect to analysis/verification checks
- many issues to worry about (all are "first-class citizens")
- apply methodology (P/G/clock design, circuit tricks, ...) whenever possible
- must concede loss of clean abstractions: need unifications
  - synthesis and analysis in tight loop
  - logic and layout: chip implementation planning methodologies
  - layout and manufacturing: CMP/OPC/PSM, yield, reliability, SI, statistical design, ...
- must hit function/cost/TAT points that maximize \$/wafer
  - reuse-based methodology
  - need for differentiating IP  $\rightarrow$  <u>custom</u>-ization


### **Example: Defect-related Yield Loss**

- High susceptibility to spot defect-related yield loss, particularly in metallization stages of process
- Most common failure mechanisms: shorts or opens due to extra or missing material between metal tracks
- Design tools fail to realize that values in design manuals are minimum values, not target values
- Spot defect yield loss modeling
  - extremely well-studied field
  - first-order yield prediction: Poisson yield model
  - critical-area model much more successful
  - fatal defect types (two types of short circuits, one type of open)
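
The difference between the two models can be sketched as follows. The Poisson model uses the whole chip area, Y = exp(-A * D0), while a critical-area model replaces A with the area actually susceptible to each fatal defect type. The defect density and the critical areas below are hypothetical numbers chosen only to show the shape of the calculation; the 3.0 cm² chip area matches the 1997 microprocessor entry in the roadmap table.

```python
import math

def poisson_yield(area_cm2, d0_per_cm2):
    """First-order Poisson yield: every defect on the chip is assumed fatal."""
    return math.exp(-area_cm2 * d0_per_cm2)

def critical_area_yield(mechanisms):
    """Yield from (average critical area in cm^2, defect density per cm^2)
    pairs, one pair per fatal defect type."""
    return math.exp(-sum(area * density for area, density in mechanisms))

chip_area = 3.0   # cm^2
d0 = 0.5          # defects per cm^2 (assumed)

# Hypothetical critical areas for two short types and one open type.
mechanisms = [(0.6, d0), (0.3, d0), (0.4, d0)]

y_poisson = poisson_yield(chip_area, d0)
y_critical = critical_area_yield(mechanisms)
print(f"Poisson model:       {y_poisson:.3f}")
print(f"critical-area model: {y_critical:.3f}")
```

Because critical area depends on wire spacing and width, a router that spreads wires directly lowers the exponent in the second model.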







### Approaches to Spot Defect Yield Loss

- Modify wire placements to minimize critical area
- Router issue
  - router understands critical-area analyses, optimizations
  - spread, push/shove (gridless, compaction technology)
  - layer reassignment, via shifting (standard capabilities)
  - related: via doubling when available, etc.
- Post-processing approaches in PV are awkward
  - breaks performance verification in layout (if layout has been changed by physical verification)
  - no easy loop back to physical design: convergence problems


### Antennas

- Charging in semiconductor processing
- Standard solution: limit antenna ratio
  - antenna ratio =  $(A_{poly} + A_{M1} + ...) / A_{gate-ox}$
  - e.g., antenna ratio < 300
  - A<sub>Mx</sub> = metal (x) area electrically connected to node without using metal (x+1), and not connected to an active area
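
The ratio check above is simple enough to sketch directly. In the example below the function names, the per-layer areas, and the "jumper" fix are hypothetical; only the formula and the example limit of 300 come from the slide.

```python
def antenna_ratio(conductor_areas_um2, gate_oxide_area_um2):
    """(A_poly + A_M1 + ...) / A_gate-ox for one gate node."""
    return sum(conductor_areas_um2.values()) / gate_oxide_area_um2

def check_antenna(conductor_areas_um2, gate_oxide_area_um2, limit=300.0):
    """Return (ratio, passes) for the rule 'antenna ratio < limit'."""
    ratio = antenna_ratio(conductor_areas_um2, gate_oxide_area_um2)
    return ratio, ratio < limit

# Hypothetical node: areas connected to the gate, per layer, in um^2.
node = {"poly": 2.0, "M1": 50.0, "M2": 120.0}
ratio, ok = check_antenna(node, gate_oxide_area_um2=0.5)
print(f"ratio = {ratio:.0f}, passes = {ok}")

# Assumed fix: a jumper to a higher layer near the gate reduces the M2 area
# that counts toward this node.
fixed = {"poly": 2.0, "M1": 50.0, "M2": 20.0}
ratio_fixed, ok_fixed = check_antenna(fixed, gate_oxide_area_um2=0.5)
print(f"after jumper: ratio = {ratio_fixed:.0f}, passes = {ok_fixed}")
```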








- Pre-routing specification
  - convenient, handled by router
  - robust but conservative
  - may consume big area
- Post-routing specification
  - area-efficient: shield only where needed and where space is available
  - eases task of router
  - sufficient shielding is not guaranteed
- Either way: definite interactions w/ fill insertion, possible interactions w/ phase-shifting (M1,M2?)










































### Outline

- Technology trends
- Post-layout optimization methodologies
  - manufacturability and reliability
  - performance
- Custom or custom-on-the-fly methodologies
- Flavors of classic planning-based methodologies
- Implications for P&R

### Custom Methodology in ASIC(?) / COT

- How much is on the table w.r.t. performance?
  - 4x speed, 1/3x area, 1/10x power (Alpha vs. Strongarm vs. "ASIC")
  - layout methodology spans RTL syn, auto P&R, tiling/generation, manual
  - library methodology spans gate array, std cell, rich std cell, liquid lib,
- Traditional view of cell-based ASIC
  - Advantages: high productivity, TTM, portability (soft IP, gates)
  - Disadvantages: slower, more power, more area, slow production of std cell library
- Traditional view of Custom
  - Advantages: faster, less power, less area, more circuit styles
  - Disadvantages: low productivity, longer TTM, limited reuse

### Custom Methodology in ASIC(?) / COT

- With sub-wavelength lithography:
  - how much more guardbanding will standard cells need?
  - composability is difficult to guarantee at edges of PSM layouts, when PSM layouts are routed, when hard IPs are made with different density targets, etc.
  - context-independent composability is the foundation of cell-based methodology!
- With variant process flavors:
  - hard layouts (including cells) will be more difficult to reuse
- → Relative cost of custom decreases
- On the other hand, productivity is always an issue...

### Custom Methodology in ASIC(?) / COT

- Architecture
  - heavy pipelining
  - fewer logic levels between latches
- Dynamic logic
  - used on all critical paths
- Hand-crafted circuit topologies, sizing and layout
  - good attention to design reduces guardbands

The last seems to be the lowest-hanging fruit for ASIC

### Custom Methodology in ASIC(?) / COT

- ASIC market forces (IP differentiation) will define needs for xtor-level analyses and syntheses
- Flexible-hierarchical top-down methodology
  - basic strategy: iteratively re-optimize chunks of the design as defined by the layout, i.e., cut out a piece of physical hierarchy, reoptimize it ("peephole optimization")
    - for timing/power/area (e.g., for mismatched input arrival times, slews)
    - for auto-layout (e.g., pin access and cell porosity for router)
    - for manufacturability (density control, critical area, phase-assignability)
    - DOFs: diffusion sharing, sizing, new mapping / circuit topology solutions
    - chunk size: as large as possible (tradeoff between near-optimality, CPU time)
  - antecedents: IBM C5M, Motorola CELLERITY, DEC CLEO
  - "infinite library" recovers performance, density that a 300-cell library and classic cell-based flow leave on the table















### Planning / Implementation Methodologies

- Centered on logic design
  - wire-planning methodology with block/cell global placement
  - global routing directives passed forward to chip finishing
  - constant-delay methodology may be used to guide sizing
- Centered on physical design
  - placement-driven or placement-knowledgeable logic synthesis
- Buffer between logic and layout synthesis
  - placement, timing, sizing optimization tools
- Centered on SOC, chip-level planning
  - interface synthesis between blocks
  - communications protocol, protocol implementation decisions guide logic and physical implementation








### **KEY ISSUE: PREDICTABILITY**

- Everything we do is ultimately aimed at a predictable, estimatable back end (physical implementation after some handoff level of design)
- Predictability == regression models
- Predictability == an enforceable assumption
  - constant-delay paradigm (logical effort, DEC, IBM, ...)
- Predictability == fast constructive prediction
  - RT-level (Tera), gate-level flat full-chip (SPC)
- Predictability == remove the need for predictability
  - GALS, LIS
  - "protocol- / communication-based system-level design"
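
The constant-delay paradigm can be illustrated with the logical-effort delay model, d = g*h + p per stage (in units of the technology time constant tau). If every stage is held to the same effort g*h, stage delays are fixed up front and gate sizes follow from the load, working backward. The sketch below assumes a simple chain with no branching; the function name, the example path, and the choice of stage effort 4 are illustrative assumptions.

```python
def size_for_constant_effort(stages, c_load, stage_effort):
    """stages: list of (logical effort g, parasitic delay p), input to output.
    Returns (input capacitance of each stage, path delay in tau units)."""
    input_caps = []
    c_out = c_load
    # Work backward from the load: g * (c_out / c_in) = stage_effort.
    for g, _p in reversed(stages):
        c_in = g * c_out / stage_effort
        input_caps.append(c_in)
        c_out = c_in
    input_caps.reverse()
    delay = sum(stage_effort + p for _g, p in stages)
    return input_caps, delay

# Inverter -> NAND2 -> inverter, using the usual textbook g and p values.
path = [(1.0, 1.0), (4.0 / 3.0, 2.0), (1.0, 1.0)]
caps, delay = size_for_constant_effort(path, c_load=64.0, stage_effort=4.0)
print("input capacitances:", [round(c, 3) for c in caps])
print("path delay (tau):", delay)
```

The delay depends only on the number of stages and their parasitics, not on the eventual wire loads, which is what makes the assumption enforceable downstream by resizing.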

### **Problems With Physical Hierarchy**

- Physical hierarchy = hierarchical organization of the core layout region
- In general, <u>no relation</u> to high-quality (e.g., w.r.t. timing, routability) embedding of logic
  - artifactual physical hierarchy created by top-down placers
  - core region is relatively homogeneous, isotropic: imposing a hierarchy is generally harmful
- Of course, some obvious exceptions
  - regular structures (memories, PLAs, datapaths)
  - hard IP blocks
  - but these don't fit well in top-down placement anyway
- General trend: non-hierarchical embedding approaches

### **The Problem With Hierarchies**

- Two hierarchies: logical/functional, and physical
  - schematic hierarchy also typical in structured-custom
- RTL design = logical/functional hierarchy
  - provides valuable clues for physical embedding: datapath structure, timing structure, etc.
  - can be incredibly misleading (e.g., all clock buffers in a single hierarchy block)
- Main issues:
  - how to leverage logical/functional hierarchy during embedding
  - when to deviate from designer's hierarchy
  - methodology for hierarchy reconciliation (buffers, repartitioning / reclustering, etc.)







### **Soft-Block Assembly**

- Hard rectilinear blocks make prediction of global wires extremely difficult
- Top-down constraint-driven assembly of soft fabrics: ability to significantly restructure circuit level blocks during the assembly process helps reach performance goals
  - For example, timing-critical interconnect paths can be completely restructured during assembly without changing any of the system level specification
- Key issue: how to determine the soft blocks in the first place
  - non-classical partitioning objectives: area sensitivity, functional and clocking structure, critical timing-path awareness, matching capabilities of block placer
  - block placement: largely unsolved issue
    - unclear whether packing-centric or connectivity-centric approaches are best


### Cell-Based P&R: Classic Context

- Architecture design
  - golden microarchitecture design, behavioral model, RT-level structural HDL passed to chip planning
  - cycle time and cycle-accurate timing boundaries established
  - hierarchy correspondences (structural-functional, logical (schematic) and physical) well-established
- Chip planning
  - hierarchical floorplan, mixed hard-soft block placement
  - block context-sensitivity: no-fly, layer usage, other routing constraints
  - route planning of all global nets (control/data signals, clock, P/G)
  - induces pin assignments/orderings, hard (partial) pre-routes, etc.
- Individual block design -- various P&R methodologies
- Chip assembly -- possibly implicit in above steps
- What follows: qualitative review of key goals, purposes


### **Placement Directions**

- Global placement
  - engines (analytic, top-down partitioning-based, iterative annealing-based) remain the same; all support "anytime" convergent solution
  - becomes more hierarchical
    - block placement, latch placement before "cell placement"
  - support placement of partially/probabilistically specified design
- Detailed placement
  - LEQ/EEQ substitution
  - shifting, spacing and alignment for routability
  - ECOs for timing, signal integrity, reliability
  - closely tied to performance analysis backplane (STA/PV)
  - support incremental "construct by correction" use model

### **Function of a UDSM Router**

- Ultimately responsible for meeting specs/assumptions
  - slew, noise, delay, critical-area, antenna ratio, PSM-amenable ...
- Checks performability throughout top-down physical impl.
  - actively understands, invokes analysis engines and macromodels
- Many functions
  - circuit-level IP generation: clock, power, test, package substrate routing
  - pin assignment and track ordering engines
  - monolithic topology optimization engines
  - <u>owns</u> key DOFs: small re-mapping, incremental placement, device-level layout resynthesis
  - is hierarchical, scalable, incremental, controllable, well-characterized (well-modeled), detunable (e.g., coarse/quick routing), ...

### **Out-of-Box Uses of Routing Results**

- Modify floorplan
  - floorplan compaction, pin assignments derived from top-level route planning
- Determine synthesis constraints
  - budgets for intra-block delay, block input/output boundary conditions
- Modify netlist
  - driver sizing, repeater insertion, buffer clustering
- Placement directives for block layout
  - over-block route planning affects utilization factors within blocks
- Performance-driven routing directives
  - wire tapering/spacing/shielding choices, assumed layer assignments, etc.
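
As one example of how a routed length feeds back into the netlist, the sketch below picks a repeater count for a long wire by brute-force search over an Elmore delay model. It assumes identical, fixed-size repeaters and equal segments; the function names and all resistance and capacitance values are hypothetical.

```python
def repeated_wire_delay(n, r_wire, c_wire, r_drv, c_in):
    """Elmore delay of a wire cut into n equal segments, each driven by a
    repeater with output resistance r_drv and input capacitance c_in."""
    r_seg = r_wire / n
    c_seg = c_wire / n
    per_stage = r_drv * (c_seg + c_in) + r_seg * (0.5 * c_seg + c_in)
    return n * per_stage

def best_repeater_count(r_wire, c_wire, r_drv, c_in, max_n=100):
    """Number of stages (driver plus inserted repeaters) minimizing delay."""
    return min(range(1, max_n + 1),
               key=lambda n: repeated_wire_delay(n, r_wire, c_wire, r_drv, c_in))

# Hypothetical global wire and repeater parameters.
params = dict(r_wire=2000.0, c_wire=2e-12, r_drv=500.0, c_in=10e-15)
n_best = best_repeater_count(**params)
unbuffered = repeated_wire_delay(1, **params)
buffered = repeated_wire_delay(n_best, **params)
print(f"stages: {n_best}")
print(f"delay without repeaters: {unbuffered * 1e9:.3f} ns")
print(f"delay with repeaters:    {buffered * 1e9:.3f} ns")
```

The search is deliberately naive; a production flow would also size the repeaters and respect placement blockages.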

### **Routing Directions**

- Cost functions and constraints
  - rich vocabulary, powerful mechanisms to capture, translate, enforce
- Degrees of freedom
  - wire widths/spacings, shielding/interleaving, driver/repeater sizing
  - router empowered to perform small logic resyntheses
- "Methodology"
  - carefully delineated scopes of router application
  - instance complexities remain tractable due to hierarchy and restrictions (e.g., layer assignment rules) that are part of the methodology
- Change in search mechanisms
  - iterative ripup/reroute replaced by "atomic topology synthesis utilities": construct entire topologies to satisfy constraints in arbitrary contexts
- Closer alignment with full-/automated-custom view
  - "peephole optimizations" of layout are the natural extensions of Motorola CELLERITY, IBM C5M, etc. methodologies

### **Noise Sources**

- Analog design concerns are due to physical noise sources
  - because of discreteness of electronic charge and stochastic nature of electronic transport processes
  - examples: thermal noise, flicker noise, shot noise
- Digital circuits, due to large, abrupt voltage swings, create deterministic noise that is several orders of magnitude higher than stochastic physical noise
  - still, digital circuits are prevalent because they are inherently immune to noise
- Technology scaling and performance demands have made the noisiness of digital circuits a big problem
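
To give a sense of the "several orders of magnitude" gap, the sketch below evaluates the standard thermal noise expression v_rms = sqrt(4 k T R B) and compares it with a digital signal swing. The 1 kΩ resistance, 1 GHz bandwidth, 300 K temperature, and 1.8 V swing are assumed values for illustration only.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def thermal_noise_vrms(resistance_ohm, bandwidth_hz, temperature_k=300.0):
    """RMS thermal (Johnson) noise voltage of a resistor."""
    return math.sqrt(4.0 * K_B * temperature_k * resistance_ohm * bandwidth_hz)

v_noise = thermal_noise_vrms(resistance_ohm=1e3, bandwidth_hz=1e9)
swing = 1.8  # volts, assumed digital swing
orders = math.log10(swing / v_noise)

print(f"thermal noise: {v_noise * 1e6:.0f} uV rms")
print(f"digital swing is about {orders:.1f} orders of magnitude larger")
```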