19 July

OCV, AOCV and POCV in VLSI : A comparative analysis

In this article, a comparative study of OCV (On Chip Variation), AOCV (Advanced On Chip Variation) and POCV (Parametric On Chip Variation) is presented. We also discuss why and how each new variation model evolved from the previous one and how it reduces timing pessimism.

Introduction:

We have already discussed On-Chip Variation (OCV) in a previous article. It is recommended to go through that article to better understand OCV and its sources. Briefly, there are two types of process variations:
  1. Systematic Variation (or Global Variation) and 
  2. Random Variation (or Local Variation)

Systematic variations are predictable in nature and can be modelled and tuned as the technology node matures. Random variation, however, is highly unpredictable and difficult to model. Systematic variation is taken care of in PVT corners, and for random variation we apply a derate factor to the delay of cells. Such process variation can change transistor parameters and currents, and ultimately the delay of a cell. If the delay of a cell is affected, it could cause timing failure after fabrication, that is, a post-silicon failure of the chip.

To avoid such failures and make the design immune to process variation, we have to keep future process variation in mind and consider an expected delay variation while doing Static Timing Analysis (STA).

On chip Variation (OCV):

In OCV, a fixed timing derate factor is applied to the delay of all the cells in the design so that, if process variation affects the delay of any cell during fabrication, the timing requirements are still met and the chip does not fail after fabrication.

Fabrication process variations could either increase or decrease the delay of a cell, so we need to set both early and late values when setting the derate factor. The STA tool considers the early or late timing derate based on the path and the type of analysis. Here is an example of setting the OCV timing derate factor.

% set_timing_derate -cell_delay -rise -data -early 0.92
% set_timing_derate -cell_delay -rise -data -late 1.10
% set_timing_derate -cell_delay -rise -clock -early 0.95
% set_timing_derate -cell_delay -rise -clock -late 1.06
% set_timing_derate -cell_delay -fall -data -early 0.90
% set_timing_derate -cell_delay -fall -data -late 1.12
% set_timing_derate -cell_delay -fall -clock -early 0.94
% set_timing_derate -cell_delay -fall -clock -late 1.07


In the above example, line-1 sets an 8% early timing derate and line-2 sets a 10% late timing derate for the rising edge on the data path. Similarly, line-3 and line-4 set a 5% early and a 6% late timing derate for the rising edge on the clock path.


Figure-1: Derate factor on setup analysis

Figure-2: Derate factor on hold analysis
  
Figure-1 and Figure-2 show the derate factors considered by the STA tool during setup and hold analysis for different paths. In reg2reg path timing analysis there is a launch flop from which data is launched and a capture flop where data is captured. The path from the clock source to the clock pin of the launch flop is called the launch clock path, and the path from the clock source to the clock pin of the capture flop is called the capture clock path. In setup analysis, the worst case is a late data path, a late launch clock path and an early capture clock path, which could fail the setup timing. So the STA tool considers the late timing derate for the data path and launch clock path and the early derate for the capture clock path.

For hold analysis, a fast data path, an early launch clock and a late capture clock form the worst scenario. The STA tool always considers the worst scenario, so it takes the early derate factor for the data path and launch clock path and the late derate factor for the capture clock path.
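The derate selection described above can be sketched in a few lines of Python. This is only an illustration: the path delays, clock period and setup/hold times are hypothetical numbers, the derates are the rise-edge values from the set_timing_derate example, and the clock-to-Q delay is assumed to be lumped into the data path. The sketch also ignores any correction for the clock segment shared by the launch and capture paths.

```python
# Rise-edge derates taken from the set_timing_derate example in this article.
DERATE = {
    "data_early": 0.92, "data_late": 1.10,
    "clock_early": 0.95, "clock_late": 1.06,
}

def setup_slack(data, launch_clk, capture_clk, period, t_setup, d=DERATE):
    """Setup check: late data and launch clock against an early capture clock."""
    arrival = launch_clk * d["clock_late"] + data * d["data_late"]
    required = period + capture_clk * d["clock_early"] - t_setup
    return required - arrival

def hold_slack(data, launch_clk, capture_clk, t_hold, d=DERATE):
    """Hold check: early data and launch clock against a late capture clock."""
    arrival = launch_clk * d["clock_early"] + data * d["data_early"]
    required = capture_clk * d["clock_late"] + t_hold
    return arrival - required

# Hypothetical reg2reg path, all values in picoseconds.
path = dict(data=500.0, launch_clk=200.0, capture_clk=200.0)
print("setup slack:", round(setup_slack(period=1000.0, t_setup=50.0, **path), 3))
print("hold slack: ", round(hold_slack(t_hold=30.0, **path), 3))
```

With these assumed numbers the derated setup slack is 378 ps against 450 ps without any derate, and the derated hold slack is 408 ps against 470 ps, which shows how the derates eat into both margins.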


Issues in OCV:

The fixed timing derate used for all the cells in OCV is overly pessimistic. In reality, random variation effects partially cancel out. It is unlikely that all the cells in a particular path are all late or all early. There is usually a mix of both effects, which causes some cancellation in the total path delay.

For example, consider a data path of 6 buffers, each with a typical delay of 20ps, so the nominal path delay is 120ps. Consider a 20% late and early derate. Assuming all the cells are late, this path will have a maximum delay of 144ps. But in practice, it is very rare that all the cells are only late or only early. Most likely some will be late and some will be early, so there will be some cancellation and the real delay will almost always be less than 144ps. Figure-3 shows the delay variation with OCV derate factor.
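The arithmetic of this 6-buffer example can be checked with a short Python sketch. The alternating late/early pattern at the end is a made-up case, used only to show how opposite variations cancel.

```python
NOMINAL_PS = 20.0   # typical delay of one buffer
STAGES = 6          # buffers in the data path
DERATE = 0.20       # 20% late and early derate

nominal_path = STAGES * NOMINAL_PS          # every cell at its typical delay
ocv_late = nominal_path * (1 + DERATE)      # every cell assumed late
ocv_early = nominal_path * (1 - DERATE)     # every cell assumed early

# Hypothetical mixed case: cells alternate between late (+1) and early (-1).
signs = (+1, -1, +1, -1, +1, -1)
mixed = sum(NOMINAL_PS * (1 + s * DERATE) for s in signs)

print(f"nominal={nominal_path:.1f}ps late={ocv_late:.1f}ps "
      f"early={ocv_early:.1f}ps mixed={mixed:.1f}ps")
```

The fixed-derate bounds are 96ps and 144ps, while the mixed case lands back at the 120ps nominal delay. The gap between such a case and the all-late bound is the pessimism that fixed OCV carries.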


Figure-3: Delay variation with OCV derate

The concept of a fixed OCV derate was modelled for technology nodes above 90nm, and it worked well for such larger nodes. But at lower technology nodes, and especially in high-frequency designs, timing closure became very difficult due to the high pessimism of the fixed derate. So for lower technology nodes we need to resolve this issue. The concept of Advanced On Chip Variation (AOCV), which does not use a fixed derate, evolved for this reason.

Advanced On chip Variation (AOCV):
In AOCV, the derate applied to each cell depends on the path depth and the distance of the cell in the timing path, and it also varies with the cell type and drive strength. Distance is defined by a bounding box around the nets and cells, as shown in figure-4.

Figure-4: Bounding box for cell and net distance

Distance: As the distance increases, systematic variation increases, and to cover that variation we need to use a higher derate value. So the derate value increases with distance.

Path depth: If the distance is fixed and the path depth increases, systematic variation stays constant but the random variations tend to cancel each other. Therefore, as path depth increases, the derate factor decreases. Figure-5 illustrates the path depth in the timing path.

Figure-5: Path depth in the timing path


Cell type: The derate depends on the cell type, as an AND gate and an OR gate do not exhibit the same variation. The derate value also varies with the drive strength of the cell, so AND2X2 and AND2X6 will have different derate values.

AOCV Analysis Mode: 
AOCV supports two analysis modes:
  1. Clock only
  2. Clock and data
In clock only mode, the AOCV derate is applied only on the clock path, so it needs less effort and has a fast runtime. In clock and data mode, the AOCV derate is applied on the full design. AOCV analysis requires an AOCV derate table for all the cells. Due to the reduced timing pessimism in AOCV, it has been observed that a large number of timing violations were resolved when we moved to clock only AOCV derate mode, and the violations reduced further when we moved to clock and data AOCV mode.

AOCV analysis supports multiple AOCV derating tables. Generally two types of table are used, a 1D table or a 2D table. A 1D table contains the variation of derate values with either distance or depth, whereas an AOCV 2D derate table contains the derate variation with both distance and depth together. An AOCV 2D derate table is shown in figure-6.


Figure-6: AOCV 2D derate table example
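To make the idea of a 2D derate table concrete, here is a minimal Python sketch of a lookup by depth and distance. All index and derate values are invented for illustration and do not come from any real library. The lookup simply picks the nearest lower table entry and clamps at the table edges, which is a simplification; an actual tool may interpolate between entries.

```python
from bisect import bisect_right

# Hypothetical table axes: path depth (stages) and bounding-box distance.
DEPTHS = [1, 2, 3, 4, 5]
DISTANCES = [500, 1000, 1500]

# Hypothetical late derates: they fall with depth and rise with distance.
LATE_DERATE = [
    # depth: 1      2      3      4      5
    [1.123, 1.090, 1.075, 1.067, 1.062],  # distance 500
    [1.124, 1.091, 1.076, 1.068, 1.063],  # distance 1000
    [1.125, 1.092, 1.077, 1.069, 1.064],  # distance 1500
]

def floor_index(axis, value):
    """Index of the largest axis entry <= value, clamped to the table range."""
    return min(max(bisect_right(axis, value) - 1, 0), len(axis) - 1)

def late_derate(depth, distance):
    """Look up the late derate for a cell at the given depth and distance."""
    row = floor_index(DISTANCES, distance)
    col = floor_index(DEPTHS, depth)
    return LATE_DERATE[row][col]

print(late_derate(depth=1, distance=500))   # shallow, short path
print(late_derate(depth=5, distance=500))   # deeper path, smaller derate
print(late_derate(depth=3, distance=1200))  # falls back to the 1000 row
```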


PrimeTime AOCV flow: 
Some additional steps are added in AOCV derate analysis as compared to the OCV fixed derate flow. The PrimeTime flow for AOCV derate analysis is shown in figure-7.

Figure-7: PrimeTime AOCV derate flow

Issues in AOCV:
AOCV does not perform very well below the 40nm technology node, and timing pessimism needs to be reduced further there. The distance and depth based derate factor used in AOCV is good for technology nodes above 40nm, but below that it needs further improvement. To address this issue, Parametric On Chip Variation (POCV) was developed. POCV is very effective at technology nodes of 20nm and below.
POCV is a more realistic approach than OCV and AOCV. This method does not use a distance and depth based derate factor. It uses the delay sigma to model the delay variation of the cell. Another advantage of POCV over AOCV is that it reduces the slack pessimism between Graph Based Analysis (GBA) and Path Based Analysis (PBA).

Parametric On chip Variation (POCV):
POCV advanced variation technology provides statistical benefits without the overhead of expensive statistical library characterization. In POCV, instead of applying a specific derate factor to a cell, the cell delay is calculated based on the delay variation (σ) of the cell. POCV assumes that the delay of a cell follows a normal distribution. An example of a normal distribution curve and the standard deviation of data from the mean is shown in figure-8.
Figure-8: Standard deviation of data from mean


In a normal distribution, about 68% of the data falls within the 1σ range, 95% within 2σ and 99.7% within 3σ.
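These percentages follow from the cumulative distribution of a normal variable and can be reproduced with the error function from Python's standard library:

```python
import math

def coverage(k):
    """Fraction of a normal distribution lying within k sigma of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"{k} sigma: {coverage(k) * 100:.2f}%")
# prints 68.27%, 95.45% and 99.73%
```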

POCV Analysis:
  • POCV uses the nominal delay value (µ), instead of the min or max delay value, to model the random variations.
  • Timing analysis is done using the nominal delay value (µ) and delay variation (σ) in the following way.
    1. The tool takes the value of σ from the timing library, or derives it from an external file containing the POCV coefficient value C.
    2. Each arc delay is then calculated statistically as the total of nominal delay and the variation.
    3. The tool then calculates the delay of the path by statistically combining these arc delays and performs the setup and hold timing analysis.
  • By default, the tool performs the POCV analysis at 3σ from the mean, but another value can also be specified. The larger the number of standard deviations, the more pessimistic the timing.
POCV input data:
The input for delay variation σ can be provided to the tool in different ways, as discussed below.

1. Using single POCV coefficient (C):  An external file contains the delay coefficient value C for each library cell, hierarchical cell or design. There is only one value of C for each timing arc of the cell, irrespective of the input transition or output load. The delay variation σ is calculated from C as follows.
The Delay variation (σ) = C * Nominal delay (µ)
  
An example of POCV coefficient file:

version: 4.0
ocvm_type: pocvm
object_type: lib_cell
rf_type: rise fall
delay_type: cell
derate_type: early
object_spec: */INV*
coefficient: 0.0693

2. Using Library Variation Format (LVF):
The POCV variation information is provided directly in the library itself, in LVF format. LVF uses two indexes, one for input transition and the other for output load. An example of the POCV LVF format is shown below.

ocv_sigma_cell_rise ("pocv_template_4x4") {
    sigma_type : "late";
    index_1("0.01, 0.04, 0.12, 0.80");
    index_2("0.01, 0.02, 0.03, 0.10");
    values( "σ11, σ12, σ13, σ14", \
            "σ21, σ22, σ23, σ24", \
            "σ31, σ32, σ33, σ34", \
            "σ41, σ42, σ43, σ44" );
}


Typically index-1 denotes the input transition and index-2 denotes the output load. 
If both data types are present in the design then by default the file with single POCV coefficient has higher precedence than POCV slew-load table or LVF format file.

POCV delay calculation:
    
Delay of a cell = Nominal delay (µ)  ±  (C * Nominal delay) * N
        
Where, C = POCV coefficient and  
N = Number of standard deviation
 OR
 
Delay of cell = Nominal delay (µ)  ±  Variation 
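The formulas above can be tried out with a small Python sketch. The cell-level function follows the formula directly. The path-level function is an assumption: it combines the per-cell σ values as a root-sum-square, which is the usual way to add independent normal variations but is not necessarily what a given tool does internally. The coefficient 0.0693 is borrowed from the example coefficient file and used here for late analysis purely for illustration.

```python
import math

def sigma_from_coefficient(nominal, c):
    """Delay variation: sigma = C * nominal delay."""
    return c * nominal

def cell_delay(nominal, c, n_sigma=3.0, late=True):
    """Cell delay = nominal +/- (C * nominal) * N."""
    variation = n_sigma * sigma_from_coefficient(nominal, c)
    return nominal + variation if late else nominal - variation

def path_delay(nominals, c, n_sigma=3.0, late=True):
    """Path delay with per-cell sigmas combined as a root-sum-square.

    Assumes the cell variations are independent (illustrative model only).
    """
    mean = sum(nominals)
    sigma = math.sqrt(sum(sigma_from_coefficient(d, c) ** 2 for d in nominals))
    return mean + n_sigma * sigma if late else mean - n_sigma * sigma

C = 0.0693
buffers = [20.0] * 6   # six 20ps buffers, as in the OCV example

print(f"single cell, late 3 sigma: {cell_delay(20.0, C):.3f} ps")
print(f"path, late 3 sigma:        {path_delay(buffers, C):.3f} ps")
print(f"sum of per-cell 3 sigma:   {6 * cell_delay(20.0, C):.3f} ps")
```

Under these assumptions the statistically combined path delay is smaller than six times the worst-case cell delay, because the random variation grows with the square root of the number of stages rather than linearly.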

PrimeTime POCV analysis flow:
The PrimeTime flow for POCV analysis is shown in figure-9.

Figure-9: PrimeTime POCV analysis flow


Comparison between POCV and AOCV:
A basic comparison between POCV and AOCV is shown below.



Summary:

In this article, OCV, AOCV and POCV have been discussed in detail. The aim of this article is to provide the basic concepts of these three on chip variation methods and give a comparative insight.

Thank you!

15 July

On Chip Variation in VLSI | OCV in Physical Design

In this article, we will discuss the sources of On-Chip Variation (OCV) in VLSI, why on chip variation occurs and how to take care of it in physical design. We will also briefly discuss Advanced On Chip Variation (AOCV) and Parametric On Chip Variation (POCV).


Background: 

The final output which goes to the fabrication laboratory after physical design and signoff in the ASIC design cycle is the .gds (Graphic Design System) file. The IC (Integrated Circuit) is fabricated on the silicon wafer based on this final gds data. A big silicon wafer is divided into many small dies and each die contains an individual IC. After wafer-level testing, we cut and separate each die and package the IC.

We have the same gds data for all the ICs in all the dies, but the location of each die on the wafer is different. If the gds is the same for all the dies, then ideally the electrical characteristics of all the ICs should be the same. But in practice they are not. ICs manufactured on different dies vary in their electrical characteristics. Figure-1 shows a silicon wafer and the dies on the wafer.

Figure-1: Silicon wafer and die on the wafer


For example, let us consider three dies at different locations on the wafer as shown in the figure-2. Die-1 is situated at the centre of the wafer, die-3 at the edge of wafer and die-2 in between the centre and outer edge. 

Figure-2: Wafer, dies and transistors inside die


So inside a wafer there are hundreds of dies, and there is variation from die to die and also from one wafer lot to another. If we investigate more deeply, we find that there are millions of transistors inside an IC and not all the transistors inside a single IC are identical. So there are variations in the characteristics of transistors even inside a single IC, in addition to the die-to-die and lot-to-lot variations. Now an important question arises: where do all these variations come from? What is their root cause? The answer is that the fabrication process itself is the main cause of these variations. So let's investigate the sources of these variations.

Sources of Variations:

There are three major sources of variation: Process, Voltage and Temperature. These are collectively called PVT variations. We already do PVT analysis and take care of these variations while designing an ASIC, so why do we need to take care of OCV separately? The answer is that not all variations can be covered in PVT analysis. Some of them are predictable and can be modelled easily as the technology matures, but some of them are highly unpredictable and cannot be modelled easily. Figure-3 shows the various components of PVT and OCV variation together.


Figure-3: Variation components under PVT and OCV 


In process variation, there are two types of variation: systematic variation and non-systematic (random) variation. Systematic variations come from Optical Proximity Correction (OPC) or Chemical Mechanical Polishing (CMP); they are predictable in nature and can be modelled in PVT variations. Non-systematic variations come from Random Dopant Fluctuation (RDF), Line Edge Roughness (LER) or Oxide Thickness Variation (OTV); they are highly unpredictable and cannot be modelled easily. In other words, these variations are random in nature.
In voltage variation, one component is due to variation in the external supply voltage and the other is internal voltage variation inside the chip. No voltage supply is ideal, and there is always a 2-5% variation in supply voltage even after utmost care is taken in the supply design. This type of variation is taken care of in PVT. The other type of variation is due to internal IR drop; it cannot be modelled in PVT because it is random in nature and depends on the design. So we need to take care of such voltage variation in OCV.
For temperature, there is the ambient temperature at which the chip is operating, and there is the junction temperature of the transistors. Junction temperature is the sum of the ambient temperature and the temperature rise due to power dissipation in the chip. Junction temperature is therefore higher than the ambient temperature, and the characteristics of a transistor mainly depend on the junction temperature. Ambient temperature can be taken care of in PVT, but junction temperature variations need to be taken care of in OCV.

Let's discuss all these variations in more detail.

I. Process Variations:

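The drain current of an nMOS transistor in the linear region can be defined as 

Id = μn * (εox / tox) * (W / L) * [ (Vgs - Vth) * Vds - Vds^2 / 2 ]

Where Id is the drain current, μn is the mobility of electrons, εox is the permittivity of silicon oxide, tox is the oxide thickness, W is the width of the transistor and L is the gate length of the transistor as shown in figure-4. Vgs is the gate-to-source voltage, Vds is the drain-to-source voltage and Vth is the threshold voltage.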


Figure-4: Terminals and schematics of a MOS device


In the drain current equation, the factors which are dependent on the fabrication process are:
Gate Oxide Thickness (tox)
Width of the transistor (W)
Length of the transistor (L)
Threshold voltage of the transistor (Vth)
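The sensitivity of the drain current to these parameters can be sketched in Python using the standard long-channel linear-region expression Id = μn (εox / tox) (W / L) [(Vgs - Vth) Vds - Vds^2 / 2]. All device values below are hypothetical and only chosen so that the transistor stays in the linear region; they do not describe any real process.

```python
EPS_0 = 8.854e-12  # permittivity of free space, F/m

def drain_current_linear(mu_n, eps_ox, t_ox, w, l, v_gs, v_th, v_ds):
    """Long-channel nMOS drain current in the linear region (SI units)."""
    c_ox = eps_ox / t_ox  # gate oxide capacitance per unit area
    return mu_n * c_ox * (w / l) * ((v_gs - v_th) * v_ds - v_ds ** 2 / 2)

# Hypothetical nominal device.
NOMINAL = dict(mu_n=0.04, eps_ox=3.9 * EPS_0, t_ox=2e-9,
               w=200e-9, l=50e-9, v_gs=1.0, v_th=0.4, v_ds=0.1)

def relative_change(**overrides):
    """Relative change in drain current when some parameters deviate."""
    varied = {**NOMINAL, **overrides}
    return drain_current_linear(**varied) / drain_current_linear(**NOMINAL) - 1.0

# Apply a 5% deviation to one process-dependent parameter at a time.
for name in ("t_ox", "w", "l", "v_th"):
    change = relative_change(**{name: NOMINAL[name] * 1.05})
    print(f"{name} +5%: drain current changes by {change * 100:+.2f}%")
```

In this model a thicker oxide, a longer gate or a higher threshold voltage lowers the current, and a wider transistor raises it, which is why variation in any of them shows up as variation in cell delay.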

So if any of the factors mentioned above varies during the fabrication process, it will affect the drain current. The delay of a cell depends on the drain current, so due to process variation the delay of a standard cell is going to vary. Now let's see some examples of how these parameters can get affected during the fabrication process. Figure-5 and Figure-6 show the length and width variation associated with the photolithography process.

Figure-5: Optical Proximity Correction

Optical Proximity Correction (OPC) is a process which is applied to the layout before mask generation in order to get a better replication of the layout on the wafer. In this process, generally, the corners and edges of the layout are extended to get a better yield. A general photolithography flow is shown in figure-6.

Figure-6: Photolithography flow 

Photolithography is a non-ideal process and it is very hard to print the exact layout on the silicon wafer. So there are differences between the dimensions of the drawn layout and the geometry printed on the wafer.
Process variation generally includes:
  • Photolithography
  • Optical Proximity Correction (OPC)
  • Random Dopant Fluctuation (RDF)
  • Line Edge Roughness (LER)
  • Etching 
  • Chemical Mechanical Polishing (CMP)
  • Oxide Thickness Variation (OTV)
So, in conclusion, there are many factors and a high chance of variation during fabrication of a chip, and these can cause the delay of the standard cells to vary.

II. Voltage Variations:

The external voltage variation is taken care of in PVT, but internal voltage variation can occur in the chip depending on the design. IR drop in the power delivery network may lead to variation in the voltage available to operate a cell.
Power comes from the power pads/bumps and is distributed to all standard cells inside the chip through metal stripes and rails, which are collectively called the power delivery network (PDN) or power grid. The distance between the power pad and the standard cell is not the same for all the standard cells. So the VDD available to each standard cell varies depending on the design. The delay of a cell depends on the available VDD: if VDD is lower, the delay is higher.

III. Temperature Variations:

Transistor characteristics are strongly dependent on the junction temperature. Ambient temperature is taken care of in PVT as per the application of the ASIC. But junction temperature depends on the design of the chip. Power dissipation inside the chip could raise the temperature of nearby junctions and affect the performance of the entire chip.
Sometimes local hotspots also form, depending on the placement density and power requirements of cells, which raise the junction temperature and ultimately lead to variation in the current and delay of cells. Junction temperature is the sum of the ambient temperature and the temperature rise caused by the power dissipation of the cells. This behaviour is not predictable and cannot be taken care of in PVT, so we have to take care of these variations in OCV.
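As a rough illustration of the relation between ambient and junction temperature, the temperature rise is often approximated as the dissipated power multiplied by a junction-to-ambient thermal resistance. That thermal-resistance model and every number below are assumptions for illustration, not values from this article.

```python
def junction_temperature(t_ambient_c, power_w, theta_ja_c_per_w):
    """Junction temperature = ambient + rise caused by power dissipation.

    theta_ja_c_per_w is an assumed junction-to-ambient thermal resistance
    in degrees Celsius per watt.
    """
    return t_ambient_c + power_w * theta_ja_c_per_w

# Hypothetical chip: 25 C ambient, 2 W dissipation, 20 C/W thermal resistance.
print(junction_temperature(25.0, 2.0, 20.0), "C")

# A hypothetical region dissipating more power behaves like a local hotspot.
print(junction_temperature(25.0, 3.0, 20.0), "C")
```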

Effects of On Chip Variation:

On chip variation could lead to post-silicon failure if it is not taken care of while designing the ASIC. Consider a case where, due to OCV, there is an increase in delay in the data path, an increase in delay in the launch clock path, or a decrease in delay in the capture clock path. In all these cases there might be a setup time violation. A similar case could also occur for hold time. A chip with properly closed timing could still violate timing and fail if we don't take care of OCV.

How to take care of OCV:

To take care of OCV we need to add some pessimism to the timing of standard cells. We basically apply ±x% of additional delay to all the standard cells, which is called the OCV derate.

OCV derate factor: 
The derate factor is a very simple approach to take care of on chip variation. A fixed derate factor is applied throughout the design, so that any variation that occurs will not cause the failure of the chip. But it adds too much timing pessimism, which leads to difficulties in timing closure, especially at lower nodes.
So the industry has moved from the fixed derate to a distance and depth based derate, which is called Advanced On Chip Variation (AOCV). As technology nodes shrank further, AOCV also stopped being a good option and Parametric On Chip Variation (POCV) evolved. We will discuss OCV, AOCV and POCV in another article. In short, we can say that timing pessimism reduces as we move from OCV to POCV.

 Thank You!


10 July

IR Drop Analysis in Physical Design | IR Analysis in VLSI

In this article, we will discuss what IR drop is in ASIC design, why the IR drop issue occurs, what the effects of IR drop are and how to analyze and prevent it.


What is IR drop issue:

The power supply (VDD and VSS) in a chip is distributed uniformly through metal rails and stripes, which together are called the Power Delivery Network (PDN) or power grid. Each metal layer used in the PDN has finite resistivity. When current flows through the power delivery network, a part of the applied voltage is dropped in the PDN as per Ohm's law. The amount of voltage drop is V = I.R, which is called the IR drop. Figure-1 shows the IR drop in the power net. Any metal net can be modelled as a combination of small R and C elements.


Figure-1: IR drop in metal net


If the resistivity of the metal wire is high or the amount of current flowing through the power net is high, a significant amount of voltage may be dropped in the power delivery network, so that less voltage is available to the standard cells than is actually applied.

If voltage V1 is applied at the power port and current I is flowing in a particular net which has total resistance R, then the voltage available (V2) at the other end for the standard cell will be

V2 = V1 - I.R

Because of IR drop in the power delivery network, standard cells or macros sometimes do not get the minimum voltage required to operate them, even when sufficient voltage is applied at the power port. The voltage drop in the power delivery network before reaching the standard cells is called IR drop.

This drop may cause poor performance of the chip due to the increased delay of standard cells, and may cause functional failure of the chip due to setup/hold timing violations. To avoid this issue, IR analysis must be done and its effect must be considered in timing analysis during the design cycle.
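The relation V2 = V1 - I.R can be extended to a simple picture of several cells tapping one power rail. The Python sketch below assumes a single rail fed from one end, with purely resistive segments between taps; the segment resistances and cell currents are hypothetical values.

```python
def rail_voltages(v_supply, segment_resistances, cell_currents):
    """Voltage seen at each tap of a rail fed from one end.

    Segment k sits between tap k-1 (or the supply) and tap k, and carries
    the current of every cell at tap k or further downstream.
    """
    voltages = []
    v = v_supply
    for k, r in enumerate(segment_resistances):
        downstream_current = sum(cell_currents[k:])
        v -= downstream_current * r   # V2 = V1 - I.R for this segment
        voltages.append(v)
    return voltages

# Hypothetical rail: 1.0 V supply, three 0.5 ohm segments, 10 mA per cell.
taps = rail_voltages(1.0, [0.5, 0.5, 0.5], [0.01, 0.01, 0.01])
for i, v in enumerate(taps, start=1):
    print(f"cell {i}: {v:.3f} V")
```

In this model the cell farthest from the supply sees the lowest voltage, which matches the observation that the available VDD depends on where a cell sits in the power grid.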


Types of IR drop:

There are two types of IR drop in the ASIC design:

  1. Static IR drop

  2. Dynamic IR drop

Static IR drop is the voltage drop in the power delivery network (PDN) when no inputs are switching, meaning the circuit is in a static state. Dynamic IR drop is the voltage drop in the power delivery network when the inputs are continuously switching, meaning the circuit is in a functional state. Dynamic IR drop depends on the switching rate of the instances.

When the inputs are switching continuously, more current flows in the instances and also in the PDN, so there is more IR drop in the PDN. Therefore dynamic IR drop is generally larger than static IR drop.


Reasons for IR drop:

IR drop could occur due to various reasons; some of the main reasons are listed below.

  • Poor design of power delivery network (lesser metal width and more separation in the power stripes)
  • Inadequate vias in the power delivery network
  • Inadequate number of decap cells
  • High cell density and high switching in a particular region
  • High impedance of the power delivery network
  • Rush current 
  • Insufficient number of voltage sources 
  • High RC value of the metal layer used to create the power delivery network

Effects of IR drop:

The delay of a standard cell depends on the supply voltage available to the cell; if the supply voltage decreases, the delay of the cell increases. The increase in cell delay could affect the performance of the design. It is also possible that, if the voltage available to a standard cell falls below a particular level, the cell stops functioning completely, which could result in functional failure of the design. In other cases the IR drop is within the limit and only the delay of cells increases, which affects the setup and hold timing of the design and can sometimes cause setup and hold timing failures.

A sudden drop in the VDD line is also possible if the current demand increases suddenly due to a large amount of switching activity in a particular area of the design. Such a drop in the VDD level is called voltage droop. It may also cause a sudden rise in the ground voltage level, which is called ground bounce. These are collectively called power noise. Figure-2 shows the power noise due to IR drop.


Figure-2: power noise due to the IR drop 


In short, IR drop could result in:
  • Change in the delay of cells
  • Violation of setup and hold timing
  • Introduction of power noise in power supply nets

IR analysis and fixes:

Most major EDA companies have their own IR analysis tool, which performs the IR analysis, and based on that analysis the IR fix techniques are applied. Two of the most popular IR analysis tools used in industry are:

  • RedHawk from Ansys
  • Voltus from Cadence Design Systems

Based on the analysis, various techniques are applied to fix the IR drop. Some of the fixes which are generally performed are:

  • Insert a sufficient number of de-cap cells, which will boost the power delivery network.
  • Reconstruct the power delivery network if it is not built properly. We could increase the width of the metal stripes or decrease the separation between them.
  • Spread the logic cells in a region so that the load is distributed.


Thank you!