## An Efficient Statistical Chip-Level Total Power Estimation Method Considering Process Variations with Spatial Correlation \*

Zhigang Hao<sup>†§</sup>, Sheldon X.-D. Tan<sup>§</sup> and Guoyong Shi<sup>†</sup>

<sup>†</sup>School of Microelectronics, Shanghai Jiao Tong University, Shanghai, 200240, China <sup>§</sup>Department of Electrical Engineering, University of California, Riverside, CA 92521, USA

## ABSTRACT

In this paper, we propose an efficient statistical chip-level total power estimation method considering process variations with spatial correlation. Instead of computing dynamic power and leakage power separately, the new method computes the total power via circuit-level simulation under realistic input testing vectors. To consider the process variations with spatial correlation, we first apply the principal factor analysis (PFA) method to transform the correlated variables into uncorrelated ones and, at the same time, reduce the number of resulting random variables. Afterwards, Hermite polynomials and sparse grid techniques are used to estimate the total power distribution in a sampling-based way. The proposed method places no restrictions on the statistical distribution model of the total power, and it works well when strong spatial correlation exists among the random variables in the chip. Experimental results show that the proposed method achieves a 78X speedup over the Monte Carlo method under a fixed input vector and a 26X speedup over the Monte Carlo method when both random input vectors and process variations with spatial correlation are considered.

#### 1. INTRODUCTION

For digital CMOS circuits, the total power consumption is given by the following formula:

$$P_{total} = P_{dyn} + P_{short} + P_{leakage},\tag{1}$$

in which $P_{dyn}$, $P_{short}$ and $P_{leakage}$ represent dynamic power, short-circuit power and leakage power, respectively. Most previous works on power estimation focus either on dynamic power estimation [2, 5–7, 10, 13] or on leakage power estimation [3, 12, 18, 22].

As technology scales down to the nanometer range, process-induced variability has a large impact on circuit performance [14]. Furthermore, many variational parameters in practical nanometer-scale chips are spatially correlated, which makes the computation even more difficult [21]; simply assuming independence among the random variables involved can lead to significant errors.

Early research on power analysis mainly focused on dynamic power analysis [2, 4–6, 13]. The solutions range from the transition density based method [13] and the tagged probabilistic method [4] to the practical Monte Carlo based method [2, 5, 6]. Later on, designers realized that leakage power is becoming more and more significant and is very sensitive to process variations. As a result, full-chip leakage power estimation considering process variations under spatial correlation has been intensively studied [3, 12, 18, 22]. These methods can be grid based [3, 18], projection based [12] or based on a simplified gate leakage model [22].

Although total power can be computed by simply adding the dynamic power and leakage power (plus short-circuit power), practically, dynamic power and leakage power are correlated. For instance, leakage power of a gate depends on its input state, which depends on the primary inputs and timing of the circuits. Using dominant state or average values is less accurate than the precise circuit-level simulation under realistic testing input vectors. Under the process variations with spatial correlation, the dynamic power and leakage power are more correlated via process parameters. As a result, traditional separate approaches will not be accurate. Circuit level total power estimation based on real testing vectors is more desirable.

Fig. 1 compares two total power distributions of circuit c432 from the ISCAS'85 benchmark set. The first (top) is obtained under random input vectors. The second (bottom) is obtained using a fixed input vector under process variations with spatial correlation. As can be seen, the variance induced by process variations is comparable to the variance induced by random input vectors. As a result, considering the impact of process variations on the total chip power is important for early design solution exploration and post-layout design sign-off validation.

Figure 1: The comparison of circuit total power distribution of circuit c432 in ISCAS'85 benchmark sets (top) under random input vectors (with 0.5 input signal and transition probabilities) and (bottom) under a fixed input vector with effective channel length spatial correlations.

<sup>\*</sup>This research was supported in part by the China Ministry of Education Special Fund for Doctoral Education Program (2010) and Graduate Student Overseas Research Program of Shanghai Jiao Tong University. This work is also funded in part by NSF grant under No. CCF-0448534, in part by NSF grant under No. OISE-0929699 and in part by National Natural Science Foundation of China (NSFC) grant under No. 60828008.

Several works have been proposed to estimate dynamic power under process variations. Harish et al. [10] used a hybrid power model based on Monte Carlo analysis, but the method was only applied to a small two-stage 2-input NAND gate. The work in [1] used a variation delay model to obtain minimum and maximum delay bounds in order to estimate the number of glitches and the dynamic power. The work in [7] introduced a new method based on the *transition waveform* concept, in which the transition waveform is propagated through the circuit so that the effect of partial swing can be considered. However, none of these works considers process-induced variations with spatial correlation, which can be significant (as shown in Fig. 1).

In this paper, we propose an efficient statistical chip-level total power estimation (STEP) method considering process variations under spatial correlation, in which both the dynamic power and the leakage power are included. To the best knowledge of the authors, it is the first work toward statistical total power analysis. The new method uses a commercial Fast-SPICE tool (UltraSim) to obtain the total chip power. To consider the process variations with spatial correlation, we first apply the principal factor analysis (PFA) method to transform the correlated variables into uncorrelated ones and, at the same time, reduce the number of resulting random variables. Afterwards, Hermite polynomials and sparse grid techniques are used to estimate the total power distribution in a sampling-based way. Experimental results show that the proposed method is 78X faster than the Monte Carlo method under a fixed input vector and 26X faster than the Monte Carlo method when both random input vectors and process variations with spatial correlation are considered.

The rest of the paper is organized as follows: In Section 2 we review the Monte Carlo based power estimation method. Section 3 describes the proposed method of total power estimation under process variations with spatial correlation. The experimental results are presented in Section 4 to validate our method. Finally, Section 5 concludes this paper.

## 2. REVIEW ON THE MONTE CARLO-BASED POWER ESTIMATION METHOD

In general dynamic power  $P_{dyn}$  is expressed as follows,

$$P_{dyn} = \frac{1}{2} f_{clk} V_{dd}^2 \sum_{i=1}^n C_i S_i \tag{2}$$

where n is the number of gates on a chip, $f_{clk}$ is the clock frequency, $V_{dd}$ is the supply voltage, $C_i$ is the sum of the load capacitance and the equivalent short-circuit capacitance at node i, and $S_i$ is the switching activity of gate i. Many previous works on dynamic power estimation are based on (2); they are either Monte Carlo based [2,5,6] or probabilistic [4,13]. The Monte Carlo based method is considered more accurate than the probabilistic method without losing much efficiency [2]. In the Monte Carlo based method, the switching activity $S_i$ in (2) can be modeled as

$$S_i = \frac{n_i\left(T\right)}{T} \tag{3}$$

in which  $n_i(T)$  is the number of transitions of node *i* in the time interval (-T/2, T/2]. The mean power  $P_T$  is defined as:

$$P_T = E\left[P_{dyn}\right] \tag{4}$$

The key part in Monte Carlo simulation is the stopping criterion. Suppose we need to perform N different simulations of the circuit, each of length T and the average and standard deviation of the N different  $P_{dyn}$  values are  $m_{dyn}$  and  $s_{dyn}$ , respectively. Therefore, we have

$$\lim_{N \to \infty} P\left\{ \frac{P_T - m_{dyn}}{s_{dyn} / \sqrt{N}} \le P_{dyn} \right\} = \Phi\left(P_{dyn}\right) \tag{5}$$

in which P is the probability and  $\Phi(P_{dyn})$  is the cumulative distribution function (CDF) of the standard normal distribution. Therefore, given the confidence level  $(1 - \alpha)$ , it follows that

$$P\left\{-\Phi_{1-\alpha/2} < \frac{P_T - m_{dyn}}{s_{dyn}/\sqrt{N}} \le \Phi_{1-\alpha/2}\right\} = 1 - \alpha \qquad (6)$$

Given a specified error tolerance  $\epsilon$ , (6) can be recast to:

$$\frac{|P_T - m_{dyn}|}{m_{dyn}} \le \frac{\Phi_{1-\alpha/2} s_{dyn}}{m_{dyn}\sqrt{N}} \le \epsilon \tag{7}$$

Equation (7) serves as the stopping criterion: sampling stops once N, $m_{dyn}$ and $s_{dyn}$ satisfy it.
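The stopping rule in (7) can be sketched in a few lines of Python. This is only an illustration: the function name `monte_carlo_mean_power`, the `min_runs` and `max_runs` guards, and the Gaussian toy sampler standing in for one circuit simulation are assumptions, not part of the method in [2].

```python
import random
from statistics import NormalDist

def monte_carlo_mean_power(sample_power, alpha=0.05, eps=0.01,
                           min_runs=30, max_runs=100_000):
    """Draw power samples until the relative bound in (7) is at most eps."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # Phi_{1-alpha/2}
    n, total, total_sq = 0, 0.0, 0.0
    m = s = 0.0
    while n < max_runs:
        p = sample_power()                        # one simulation of length T
        n += 1
        total += p
        total_sq += p * p
        if n < min_runs:                          # hypothetical guard for the normal approximation
            continue
        m = total / n                                           # m_dyn
        s = max((total_sq - n * m * m) / (n - 1), 0.0) ** 0.5   # s_dyn
        if z * s / (m * n ** 0.5) <= eps:         # stopping criterion (7)
            break
    return m, s, n

# Toy sampler (hypothetical numbers): power ~ N(300, 45^2) in arbitrary units.
rng = random.Random(0)
m_dyn, s_dyn, runs = monte_carlo_mean_power(lambda: rng.gauss(300.0, 45.0))
print(m_dyn, s_dyn, runs)
```

A larger sample spread or a tighter $\epsilon$ increases the number of runs roughly quadratically, which is why reducing the sample count matters once each sample is a transistor-level simulation.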

Subsequently, the works in [5,6] further improved the efficiency of the Monte Carlo based method. In [6], the authors transformed the power estimation problem into a survey sampling problem and applied stratified random sampling to improve the efficiency of Monte Carlo sampling. In [5], the authors proposed two new sampling techniques, module-based and cluster-based, which can adapt stratification to further improve the efficiency of the Monte Carlo based techniques. However, all of these works are based on gate-level logic simulation, as they only consider dynamic power. Total power estimation and estimation of the impact of process variations require transistor-level simulation. As a result, improving the efficiency of the Monte Carlo method becomes crucial and is addressed in this paper.

## 3. THE STATISTICAL TOTAL POWER ESTIMATION METHOD

In this section, we present the statistical chip-level total estimation of power, called STEP. The method can consider both fixed input vectors and random input vectors for power estimation. The power distribution considering process variations under fixed input vectors is important because it can reveal the power distribution for the maximum power, the minimum power or the power due to user-specified input vectors. This technique can be further applied to estimate the distribution of the maximum power dissipation [20]. The power distribution under random input vectors is also important, as it shows the total power distribution caused by random input vectors and process variations with spatial correlation. We first present the overall flow of the proposed method under a fixed input vector in Fig. 2 and then highlight the major computing steps. The flow of the method considering random input vectors follows afterwards.

#### 3.1 Flow of the proposed analysis method under fixed input vector

The STEP method uses a commercial Fast-SPICE tool for accurate total power simulation. It transforms the correlated variables into uncorrelated ones and reduces the number of random variables using the principal factor analysis (PFA) method [9]. It then computes the statistical total power based on Hermite polynomials and sparse grid techniques [8].

#### **3.2** Variational models for process parameters

Following existing approaches, we assume that the process variations of $L_{eff}$ and $T_{ox}$ follow multivariate normal distributions [19]. Since $T_{ox}$ is a vertical layout feature dimension whose variation is caused by chemical mechanical polishing processes, it depends only on local layout density and has no spatial correlation [16]. For simplicity, we focus only on the spatial correlation of $L_{eff}$ in this paper and treat the $L_{eff}$ of every gate as a random variable. In general, more than one process parameter can exhibit spatial correlation, and it is understood that this is not a limitation of our approach.

Algorithm: STATISTICAL CHIP-LEVEL TOTAL ESTIMATION OF POWER (STEP) ALGORITHM UNDER A FIXED INPUT VECTOR.

**Input**: standard cell lib, netlist, input vector, placement information of design, standard deviation of $L_{eff}$.

**Output**: analytical expression of the statistical chip-level total power in terms of Hermite polynomials.

- 1. Generate the correlation matrix $\Omega_{n,n}$ of all gates' $L_{eff}$ from placement information.
- 2. Perform variable reduction on the correlation matrix $\Omega_{n,n}$ based on principal factor analysis (PFA).
- 3. Generate the n-dimensional Smolyak quadrature point sets of second order and the corresponding weight set for the reduced variables.
- 4. Run the Fast-SPICE tool to get the total power for each Smolyak quadrature sample under the fixed input vector.
- 5. Compute the coefficients of the Hermite polynomials of the full-chip total power.
- 6. Calculate the analytical expression of the full-chip total power and, if required, its mean value, standard deviation, PDF and CDF.

Figure 2: The flow of proposed algorithm under a fixed input vector.

The spatial correlation used in this paper is given by the following empirical exponential model [21].

$$\rho(d) = e^{-d^2/\eta^2}, \tag{8}$$

where d is the distance between two gates and $\eta$ is called the *correlation length*. A large $\eta$ means the spatial correlation is strong, and a small $\eta$ means it is weak. The spatial correlation can be captured by the spatial covariance matrix $\Omega_{n,n}$, where n is the number of gates on the chip. The elements of $\Omega_{n,n}$ are modeled using (8) and depend only on d. Dealing with spatial correlation leads to quadratic computational cost, since all the correlated variables are enumerated pairwise for accurate variance estimation.

As the number of correlated random variables can be very large for large designs, we use the principal factor analysis (PFA) method [9,11] to mitigate this problem.
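As an illustration of how $\Omega_{n,n}$ follows from placement data through (8), the following NumPy sketch builds the matrix from gate coordinates. The helper name `correlation_matrix` and the three gate positions are hypothetical; only the formula itself comes from the text.

```python
import numpy as np

def correlation_matrix(xy, eta):
    """Omega[i, j] = exp(-d_ij^2 / eta^2) for an (n, 2) array of gate coordinates."""
    xy = np.asarray(xy, dtype=float)
    diff = xy[:, None, :] - xy[None, :, :]   # pairwise coordinate differences
    d2 = np.sum(diff ** 2, axis=-1)          # squared distances d_ij^2
    return np.exp(-d2 / eta ** 2)            # model (8)

# Hypothetical placement: three gates on a line, coordinates in micrometers.
gates = [(0.0, 0.0), (10.0, 0.0), (50.0, 0.0)]
omega = correlation_matrix(gates, eta=50.0)
print(omega)
```

Building the matrix this way costs $O(n^2)$ time and memory, which matches the quadratic cost noted above.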

#### **3.3** Variable decoupling and reduction

As the number of random variables representing each gate's $L_{eff}$ can easily exceed several thousand for large circuits, it can greatly limit the size of the circuit that can be analyzed. In statistics, principal factor analysis (PFA) [9,11] based on the correlation matrix $\Omega_{n,n}$ can be performed to determine the dominant variation sources. Specifically, for a set of zero-mean Gaussian distributed variables $\vec{\delta}$ whose covariance matrix is $\Omega_{n,n}$, if there is a matrix L satisfying $\Omega_{n,n} = LL^T$, then $\vec{\delta}$ can be represented by a set of independent standard normal variables $\vec{\gamma}_l$ as $\vec{\delta} = L\vec{\gamma}_l$.

Note that the solution for decoupling is not unique. For example, Cholesky decomposition can be used to find L, since the covariance matrix $\Omega_{n,n}$ is always positive semidefinite. However, Cholesky decomposition cannot reduce the number of variables. Instead, we use eigendecomposition of the covariance matrix, which yields:

$$\Omega_{n,n} = LL^T, L = (\sqrt{\lambda_1}e_1, \dots, \sqrt{\lambda_n}e_n), \qquad (9)$$

where  $\lambda_i$  are eigenvalues in order of descending magnitude, and  $e_i$  are corresponding eigenvectors. PFA reduces the number of components in  $\vec{\delta}$  by truncating *L* using the first *k* items

$$L = (\sqrt{\lambda_1} e_1, \dots, \sqrt{\lambda_k} e_k),$$

then this leads to the approximation

$$\Omega_{n,n} \approx \lambda_1 e_1 e_1^T + \lambda_2 e_2 e_2^T + \dots + \lambda_k e_k e_k^T = LL^T. \tag{10}$$

In our paper, the elements in vector  $\vec{\delta} = [\delta_1, \delta_2, \dots, \delta_n]$  are all the gates' normalized  $L_{eff}$ , k is the reduced number of variables. PFA is efficient, especially when the correlation length is large. The error of PFA can be controlled by k: bigger k leads to a more accurate result. In our experiment, we set the error of PFA to be 1%. Typically, with strong correlation, the reduction effect can be remarkable.
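The truncation in (9) and (10) can be sketched with a symmetric eigendecomposition. The text does not say how the 1% PFA error is measured, so the sketch below assumes it is the fraction of the total variance (the trace of $\Omega_{n,n}$) that is discarded. The function name `pfa_reduce` and the one-dimensional row of gates are likewise illustrative.

```python
import numpy as np

def pfa_reduce(omega, tol=0.01):
    """Return the truncated factor L (n x k) with Omega ~ L L^T, as in (10)."""
    eigvals, eigvecs = np.linalg.eigh(omega)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]               # re-sort in descending order
    eigvals = np.clip(eigvals[order], 0.0, None)    # remove tiny negative round-off
    eigvecs = eigvecs[:, order]
    captured = np.cumsum(eigvals) / np.sum(eigvals)
    # Assumed error measure: keep the smallest k that captures >= 1 - tol of the trace.
    k = min(int(np.searchsorted(captured, 1.0 - tol)) + 1, len(eigvals))
    L = eigvecs[:, :k] * np.sqrt(eigvals[:k])       # columns sqrt(lambda_i) * e_i
    return L, k

# Hypothetical example: 200 gates in a row with 1 um pitch, correlation length 50 um.
x = np.arange(200.0)
omega = np.exp(-((x[:, None] - x[None, :]) ** 2) / 50.0 ** 2)
L, k = pfa_reduce(omega)
delta = L @ np.random.default_rng(0).standard_normal(k)   # one correlated sample
print(k, delta.shape)
```

With a correlation length that is a sizeable fraction of the row length, only a handful of the 200 eigenvalues are needed, which illustrates why the reduction is strongest under strong correlation.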

## 3.4 Computing total power by orthogonal polynomials

Instead of using the Monte Carlo method, a better approach is to use an orthogonal polynomial based method, which requires far fewer samples than the standard Monte Carlo method when the number of variables is small.

Specifically, a random variable $x(\vec{\xi})$ with finite variance can be approximated by a truncated Hermite polynomial chaos expansion as follows [8]:

$$x(\vec{\xi}) = \sum_{q=0}^{Q} a_q H_q(\vec{\xi}), \qquad (11)$$

where $\vec{\xi} = [\xi_1, \xi_2, ..., \xi_k]$, the $\xi_i \sim N(0, 1)$ are orthogonal to each other, $H_q(\vec{\xi})$ is a Hermite polynomial and $a_q$ is a deterministic coefficient. For example, the $2^{nd}$ order Hermite polynomial set includes

$$1, \quad \xi_i, \quad \xi_i^2 - 1, \quad \xi_i \xi_j \ (i \neq j). \tag{12}$$

 $a_q$  can be determined by

$$a_q = \langle x(\vec{\xi}), H_q(\vec{\xi}) \rangle / \langle H_q^2(\vec{\xi}) \rangle \approx \sum x(\vec{\gamma}_l) H_q(\vec{\gamma}_l) w_l / \langle H_q^2(\vec{\xi}) \rangle, \tag{13}$$

which is a multi-dimensional integration that can be evaluated efficiently by the Smolyak numerical quadrature method. Here $\vec{\gamma}_l$ and $w_l$ are the Smolyak quadrature abscissas (quadrature points) and weights, respectively [15].

In our problem, $x(\vec{\xi})$ is the total power of the full chip, and k is the number of variables remaining after the PFA reduction. The full-chip total power can be represented by the Hermite polynomial expansion

$$P_{tot}(\vec{\xi}) = \sum_{q=0}^{Q} P_{tot,q} H_q(\vec{\xi}), \qquad (14)$$

$P_{tot,q}$ is then computed by the numerical Smolyak quadrature method. In this paper, we use $2^{nd}$ order Hermite polynomials for statistical total power analysis, and the number of Smolyak quadrature samples for k random variables is $2k^2 + 3k + 1$. The coefficient for the $q^{th}$ Hermite polynomial, $P_{tot,q}$, can be computed as follows:

$$P_{tot,q} = \sum P_{tot}(\vec{\gamma}_l) H_q(\vec{\gamma}_l) w_l / \langle H_q^2(\vec{\xi}) \rangle, \qquad (15)$$

where $\vec{\gamma}_l$ is a Smolyak quadrature sample. As stated in Section 3.3, each quadrature sample can be converted to a sample in terms of the original gate effective channel length variables via $\vec{\delta} = L \vec{\gamma}_l$. Thus $P_{tot}(\vec{\gamma}_l)$ can be obtained by running a circuit simulation tool such as Fast-SPICE with the $L_{eff}$ of each gate set according to $\vec{\delta}$.
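The conversion from a quadrature sample to per-gate channel lengths can be sketched as below. The sketch assumes that $\vec{\delta}$ is the deviation in units of one standard deviation, and that the "$3\sigma$ range of 20%" quoted in Section 4 means $3\sigma$ equals 20% of the 50nm nominal value. The function `leff_from_sample` and the small factor matrix are hypothetical.

```python
import numpy as np

def leff_from_sample(L, gamma, nominal_nm=50.0, three_sigma_frac=0.20):
    """Map one reduced-space sample gamma to a per-gate L_eff vector in nm."""
    delta = np.asarray(L, dtype=float) @ np.asarray(gamma, dtype=float)  # delta = L * gamma
    sigma_nm = nominal_nm * three_sigma_frac / 3.0   # assumed reading of the 3-sigma range
    return nominal_nm + sigma_nm * delta

# Hypothetical PFA factor for 3 gates and k = 2 reduced variables.
L_demo = np.array([[0.9, 0.1],
                   [0.8, 0.3],
                   [0.2, 0.9]])
leff_nominal = leff_from_sample(L_demo, [0.0, 0.0])   # every gate at the nominal value
leff_shifted = leff_from_sample(L_demo, [1.0, 0.0])   # one-sigma move along the first factor
print(leff_nominal, leff_shifted)
```

Each such vector would then be written into the netlist for one Fast-SPICE run, so the number of simulations equals the number of quadrature samples.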

After the coefficients of the analytic expression of the total power (14) are obtained, the mean value, variance, PDF and CDF of the full-chip total power follow directly. For instance, the mean value and variance of the full-chip total power are

$$\mu_{tot} = P_{tot,0th}, \tag{16}$$

$$\sigma_{tot}^2 = \sum P_{tot,1st}^2 + 2 \sum P_{tot,2nd,type1}^2 + \sum P_{tot,2nd,type2}^2, \qquad (17)$$

where  $P_{tot,1st}$ ,  $P_{tot,2nd,type1}$  and  $P_{tot,2nd,type2}$  are the power coefficients for the second order Hermite polynomial set  $\xi_i$ ,  $\xi_i^2 - 1$  and  $\xi_i \xi_j$  defined in (12).
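The projection in (15) and the moments in (16) and (17) can be checked on a toy model. NumPy does not provide Smolyak sparse grids, so this sketch swaps in a full tensor-product Gauss-Hermite grid, which is only practical for very small k. The function `second_order_pce` and the polynomial `toy_power`, which stands in for a Fast-SPICE run, are assumptions for illustration.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def second_order_pce(power_fn, k, points_per_dim=3):
    """Fit the 2nd-order Hermite expansion (14) of power_fn(xi), xi ~ N(0, I_k)."""
    nodes, weights = hermegauss(points_per_dim)
    weights = weights / np.sqrt(2.0 * np.pi)       # normalise to the N(0, 1) measure
    c0, c1, c2, c12 = 0.0, np.zeros(k), np.zeros(k), np.zeros((k, k))
    for idx in itertools.product(range(points_per_dim), repeat=k):
        idx = list(idx)
        xi, w = nodes[idx], np.prod(weights[idx])
        p = power_fn(xi)                           # stand-in for one circuit simulation
        c0 += w * p                                # H = 1,          <H^2> = 1
        c1 += w * p * xi                           # H = xi_i,       <H^2> = 1
        c2 += w * p * (xi ** 2 - 1.0) / 2.0        # H = xi_i^2 - 1, <H^2> = 2
        c12 += w * p * np.outer(xi, xi)            # H = xi_i xi_j,  <H^2> = 1
    c12 = np.triu(c12, 1)                          # keep each pair i < j once
    mean = c0                                                          # (16)
    var = np.sum(c1 ** 2) + 2.0 * np.sum(c2 ** 2) + np.sum(c12 ** 2)   # (17)
    return mean, var, (c0, c1, c2, c12)

def toy_power(xi):
    # Hypothetical quadratic power model in two reduced variables.
    return 100.0 + 5.0 * xi[0] + 3.0 * xi[1] + 2.0 * (xi[0] ** 2 - 1.0) + xi[0] * xi[1]

mean, var, coeffs = second_order_pce(toy_power, k=2)
print(mean, var)   # close to 100 and 43 for this toy model
```

For the toy model, (17) gives $5^2 + 3^2 + 2 \cdot 2^2 + 1^2 = 43$, so the factor of 2 on the $\xi_i^2 - 1$ terms is visible directly. The tensor grid needs $3^k$ evaluations, whereas the second-order Smolyak grid used in the paper needs only $2k^2 + 3k + 1$.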

### 3.5 Flow of the proposed analysis method under random input vectors

To consider more input vectors or random input vectors used in the traditional dynamic power analysis, one simple way is to treat the input vector as one more *variational* parameter in our statistical analysis framework. This strategy can be easily fit into the simple Monte-Carlo based method [2] as we just add one dimension to the variable space. But for orthogonal polynomial based method, it is difficult to add this variable into existing space.

In probability theory, the PDF of a function of several random variables can be calculated from the conditional PDF for single random variable. Let  $P_{total} = g(U_{in}, L_{eff})$ , in which  $U_{in}$  is the variable of random input vectors,  $L_{eff}$  is the variable of gate effective channel length. The PDF of total power  $P_{total}$  can be calculated by:

$$f_{P_{total}}(p) = \int_{-\infty}^{\infty} f_{L_{eff}}(l|u) f_{U_{in}}(u) du \tag{18}$$

in which the PDF under random input vectors, $f_{U_{in}}(u)$, is obtained by the Monte Carlo based method [2], and the conditional PDF $f_{L_{eff}}(l|U_{in} = u)$ under a fixed input u can be obtained or interpolated from samples calculated by the fixed-input algorithm in Fig. 2. Note that u can be viewed as the chip power under input u.

We use the example in Fig. 3 to illustrate the proposed method. In this figure, we first compute the power distribution (solid line) with random input vectors only. Then we select three input power points, a, b and c (with three corresponding input vectors). At each input power point, we perform statistical power analysis with process variations under the fixed power input (using the corresponding input vector). After this, we interpolate the power distributions for the other power points for the final integration.



Figure 3: The selected power points a, b and c from the power distribution under random input vectors.

The flow of the proposed analysis method under random input vectors is shown in Fig. 4. The STEP algorithm computes the total power under random input vectors using the Monte Carlo based method [2].
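One possible numerical reading of (18) is sketched below: the integral over u is replaced by an average over Monte Carlo samples of the input-only power, and the conditional PDF at each u is taken as a normal density whose mean and standard deviation are linearly interpolated between the selected power points. The normal shape, the use of `np.interp`, the function name `total_power_pdf` and all numbers are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def total_power_pdf(p_grid, u_samples, anchor_u, anchor_mean, anchor_std):
    """Approximate (18) by averaging conditional normal PDFs over input-power samples."""
    p_grid = np.asarray(p_grid, dtype=float)
    u_samples = np.asarray(u_samples, dtype=float)
    # Interpolate (mean, std) from the anchor points; np.interp clamps outside the range.
    mean_u = np.interp(u_samples, anchor_u, anchor_mean)
    std_u = np.interp(u_samples, anchor_u, anchor_std)
    z = (p_grid[:, None] - mean_u[None, :]) / std_u[None, :]
    cond = np.exp(-0.5 * z ** 2) / (std_u[None, :] * np.sqrt(2.0 * np.pi))
    return cond.mean(axis=1)                       # sample average replaces the integral over u

# Hypothetical data: input-only power samples and three anchor points a, b, c.
rng = np.random.default_rng(0)
u_samples = rng.normal(600.0, 50.0, size=2000)
anchor_u = [500.0, 600.0, 700.0]                   # power at points a, b, c
anchor_mean = [505.0, 606.0, 707.0]                # fixed-input means at a, b, c
anchor_std = [15.0, 18.0, 21.0]                    # fixed-input stds at a, b, c
p_grid = np.linspace(300.0, 900.0, 1201)
pdf = total_power_pdf(p_grid, u_samples, anchor_u, anchor_mean, anchor_std)
area = float(np.sum(pdf) * (p_grid[1] - p_grid[0]))
print(area)
```

The printed area should be close to 1 when the grid covers the support of the distribution, which is a quick sanity check on the interpolation and integration steps.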

Algorithm: STATISTICAL CHIP-LEVEL TOTAL POWER ESTIMATION (STEP) ALGORITHM UNDER RANDOM INPUT VECTORS.

**Input**: standard cell lib, netlist, random input vectors, placement information of design, standard deviation of $L_{eff}$.

**Output**: total full-chip power distribution (PDF and CDF).

- 1. Compute the random input power distribution for the netlist using the Fast-SPICE tool under random input vectors with the Monte Carlo based method [2].
- 2. Select several power points (e.g., 3 or 5) from the random input power distribution such that they cover the power distribution evenly.
- 3. For each input vector selected in Step 2, perform statistical total power estimation (shown in Fig. 2) under that input vector.
- 4. Interpolate the mean and std for the other power points (with distinct values) in the power distribution (caused by random inputs only) from the samples obtained in Step 3.
- 5. Calculate the total power distribution under random input vectors (PDF and CDF) using the integration in (18).

Figure 4: The flow of proposed algorithm with random input vectors and process variations.

### 4. EXPERIMENTAL RESULTS

We implemented the proposed method in Matlab V7.8 and used Cadence Ultrasim 7.0 for Fast-SPICE simulations. All experiments were carried out on a Linux system with quad Intel Xeon CPUs running at 3GHz and 16GB of memory.

The STEP method was tested on circuits in the ISCAS'85 benchmark set. The circuits were synthesized with Nangate Open Cell Library under 45nm technology and the placement is obtained from UCLA/Umich Capo [17]. The test cases are given in Table 1 (all length units in  $\mu m$ ).

Table 1: Summary of benchmark circuits

| Circuit | Gate # | Input # | Output # | Area ($\mu m \times \mu m$) |
|---------|--------|---------|----------|------------------|
| c432    | 242    | 36      | 7        | $55 \times 48$   |
| c880    | 383    | 60      | 16       | $85 \times 84$   |
| c1355   | 562    | 41      | 32       | $84 \times 78$   |
| c1908   | 972    | 33      | 25       | $102 \times 102$ |
| c3540   | 1705   | 50      | 22       | $141 \times 144$ |

Effective channel length $L_{eff}$ is modeled as a sum of spatially correlated sources of variation based on (8). The nominal value of $L_{eff}$ is 50nm and the $3\sigma$ range is set to 20%. The same framework can easily be extended to include other variational parameters.

Firstly, we use the Monte Carlo based method [2] to obtain the mean and standard deviation (std) of each sample circuit under random input vectors. The input signal and transition probabilities are 0.5, with a clock cycle of 180 ps. The simulation time for each sample circuit is 10 clock cycles and the error tolerance $\epsilon$ is 0.01.

Secondly, we observe the total power distribution for each sample circuit under a fixed input vector. For each sample circuit, one input vector is selected; we then run Monte Carlo simulations (10,000 runs) under process variations with spatial correlation, as well as our proposed STEP method. The results are shown in Table 2, in which MC Co and STEP denote the Monte Carlo method considering process variations with spatial correlation and our proposed method, respectively. The average errors for the mean and standard deviation of the STEP method are 2.90% and 6.00%, respectively. Fig. 5 shows the total power distribution (PDF and CDF) of circuit c880 under a fixed input. Table 3 gives the values of the correlation length $\eta$, the reduced number of variables k and the Fast-SPICE sample counts of the two methods. Sampling time dominates the total simulation time for both MC Co and the STEP method, and the STEP method has a 78X speedup over the MC Co method on average. More speedup can be gained for larger cases.

Table 2: Total power distribution under fixed input vector

| Circuit | Mean, MC Co (μW) | Mean, STEP (μW) | Mean Err (%) | Std, MC Co (μW) | Std, STEP (μW) | Std Err (%) |
|---------|--------|--------|------|-------|-------|------|
| c432    | 267.6  | 261.7  | 2.23 | 10.22 | 9.54  | 6.78 |
| c880    | 606.9  | 610.5  | 0.59 | 19.88 | 18.09 | 9.02 |
| c1355   | 785.6  | 799.4  | 1.76 | 40.51 | 43.25 | 6.77 |
| c1908   | 1404.9 | 1294.4 | 7.86 | 76.15 | 79.73 | 4.71 |
| c3540   | 2824.6 | 2766.8 | 2.05 | 268.5 | 261.2 | 2.73 |



Figure 5: The comparison of total power distribution PDF and CDF between STEP method and Monte Carlo method for circuit c880 under a fixed input vector.

Table 3: Sampling number comparison under fixed input vector

| Circuit | $\eta$ | k | Sample count (MC Co) | Sample count (STEP) | Speedup over MC Co |
|---------|-----|---|--------------|------|---------|
| c432    | 50  | 6 | 10000        | 91   | 110     |
| c880    | 50  | 9 | 10000        | 190  | 53      |
| c1355   | 50  | 9 | 10000        | 190  | 53      |
| c1908   | 100 | 6 | 10000        | 91   | 110     |
| c3540   | 100 | 8 | 10000        | 153  | 65      |
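The STEP sample counts in Table 3 follow from the $2k^2 + 3k + 1$ formula in Section 3.4, and the speedups are the ratios of the sample counts. The short script below, a consistency check only, reproduces them from the k values in the table. The helper name `smolyak_sample_count` is illustrative.

```python
def smolyak_sample_count(k):
    """Number of second-order Smolyak quadrature samples for k variables."""
    return 2 * k * k + 3 * k + 1

table3_k = {"c432": 6, "c880": 9, "c1355": 9, "c1908": 6, "c3540": 8}
mc_runs = 10_000                                   # Monte Carlo runs used for MC Co

counts = {name: smolyak_sample_count(k) for name, k in table3_k.items()}
speedups = {name: mc_runs / n for name, n in counts.items()}
average_speedup = sum(speedups.values()) / len(speedups)

for name in table3_k:
    print(name, counts[name], round(speedups[name]))
print("average speedup:", round(average_speedup))
```

Because the sample count grows only quadratically in k, the speedup is governed by how far PFA can reduce k rather than by the number of gates.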

Thirdly, we compare the STEP method with the Monte Carlo method under both random input vectors and process variations with spatial correlation. We select 3 power points from the total power distribution obtained by the Monte Carlo based method [2] and get the corresponding input vectors. We then run the STEP method under these 3 input vectors and obtain the corresponding means and standard deviations. The (mean, std) samples for the other power points with distinct power values can be interpolated from these 3 samples.

Equation (18) is used to calculate the PDF of the total power distribution under both random input vectors and process variations with spatial correlation. The results are shown in Table 4, where $MC \ Co$, $MC \ nCo$ and STEP represent the Monte Carlo method considering process variations with spatial correlation, the Monte Carlo method without considering process variations with spatial correlation, and our proposed method, respectively. The average errors of the mean and the standard deviation of our method compared with $MC \ Co$ are 2.17% and 6.09%, respectively, while the average errors of the mean and the standard deviation of $MC \ nCo$ compared with $MC \ Co$ are 1.34% and 28.01%, respectively. The error in the standard deviation of $MC \ nCo$ tends to increase for larger test cases.

Clearly, the Monte Carlo method considering only random input vectors fails to capture the true distribution when both input vectors and process variations are considered. The correlation length and k are the same as in Table 3. The difference is that we need to run STEP 3 times, so the total number of samples increases correspondingly. However, the STEP method still has a 26X speedup over the Monte Carlo method on average and remains accurate. Fig. 6 shows the power distribution comparison (PDF and CDF) of the STEP method and the Monte Carlo method under both random input vectors and process variations with spatial correlation for circuit c880. We observe that the total power under a fixed input vector or under random input vectors has a distribution close to normal, as shown in Figs. 5 and 6. Such distributions justify the use of Hermite-based orthogonal polynomials to represent the total power distributions.



Figure 6: The comparison of total power distribution PDF and CDF between STEP method and Monte Carlo method for circuit c880 under random input vector.

#### 5. CONCLUSIONS

In this paper, we have proposed an efficient statistical total chip power estimation method considering process variations with spatial correlation. The new method is based on accurate circuit-level simulation under realistic testing input vectors to obtain accurate total chip power. To improve the estimation efficiency, an efficient sampling-based approach has been applied using an orthogonal polynomial based representation together with random variable transformation and reduction techniques. Experimental results show that the proposed method is 78X faster than the Monte Carlo method under a fixed input vector and 26X faster than the Monte Carlo method when both random input vectors and process variations with spatial correlation are considered.

Table 4: Total power distribution comparison under random input vector and spatial correlation

| Circuit | Mean, MC Co (μW) | Mean, MC nCo (μW) | Mean, STEP (μW) | Mean Err, MC nCo (%) | Mean Err, STEP (%) | Std, MC Co (μW) | Std, MC nCo (μW) | Std, STEP (μW) | Std Err, MC nCo (%) | Std Err, STEP (%) |
|----------|-----------|--------|--------|-----------|------|--------------------------|--------|-------|-----------|------|
| c432     | 299.9     | 299.9  | 312.7  | 0.01      | 4.26 | 45.3                     | 40.4   | 44.6  | 10.9      | 1.52 |
| c880     | 609.8     | 604.5  | 604.4  | 0.88      | 0.89 | 57.1                     | 51.5   | 56.5  | 9.76      | 0.95 |
| c1355    | 802.6     | 777.1  | 778.3  | 3.18      | 3.04 | 56.3                     | 30.2   | 60.5  | 46.4      | 7.45 |
| c1908    | 1375.1    | 1361.6 | 1361.3 | 0.98      | 0.99 | 115.5                    | 79.4   | 128.5 | 31.3      | 11.3 |
| c3540    | 2775.8    | 2821.7 | 2822.2 | 1.65      | 1.67 | 309.3                    | 180.4  | 280.8 | 41.7      | 9.21 |

## 6. **REFERENCES**

- [1] J. Alexander and V. Agrawal, "Algorithms for estimating number of glitches and dynamic power in CMOS circuits with delay variations," in *IEEE Symposium on VLSI (ISVLSI)*, May 2009, pp. 127–132.
- [2] R. Burch, F. Najm, P. Yang, and T. Trick, "A Monte Carlo approach for power estimation," *IEEE Trans. on Very Large Scale Integration (VLSI) Systems*, vol. 1, no. 1, pp. 63–71, Mar. 1993.
- [3] H. Chang and S. S. Sapatnekar, "Full-chip analysis of leakage power under process variations, including spatial correlations," in *Proc. Design Automation Conf. (DAC)*, 2005, pp. 523–528.
- [4] C. Ding, C. Tsui, and M. Pedram, "Gate-level power estimation using tagged probabilistic simulation," *IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems*, vol. 17, no. 11, pp. 1099–1107, Nov. 1998.
- [5] C.-S. Ding, C.-T. Hsieh, and M. Pedram, "Improving the efficiency of Monte Carlo power estimation," *IEEE Trans. on Very Large Scale Integration (VLSI) Systems*, vol. 8, no. 5, pp. 584–593, Oct. 2000.
- [6] C.-S. Ding, Q. Wu, C.-T. Hsieh, and M. Pedram, "Stratified random sampling for power estimation," *IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems*, vol. 17, no. 6, pp. 465–471, Jun. 1998.
- [7] Q. Dinh, D. Chen, and M. D. Wong, "Dynamic power estimation for deep submicron circuits with process variation," in *Proc. Asia South Pacific Design Automation Conf. (ASPDAC)*, Jan. 2010, pp. 587–592.
- [8] R. G. Ghanem and P. D. Spanos, Stochastic Finite Elements: A Spectral Approach. Dover Publications, 2003.
- [9] R. L. Gorsuch, Factor Analysis. Hillsdale, NJ, 1974.
- [10] B. P. Harish, N. Bhat, and M. B. Patil, "Process variability-aware statistical hybrid modeling of dynamic power dissipation in 65 nm cmos designs," in *Proc. Int. Conf. on Computing: Theory and Applications (ICCTA)*, 2007, pp. 94–98.
- [11] R. Jiang, W. Fu, J. M. Wang, V. Lin, and C. C.-P. Chen, "Efficient statistical capacitance variability modeling with orthogonal principle factor analysis," in *Proc. Int. Conf. on Computer Aided Design (ICCAD)*, 2005, pp. 683–690.
- [12] X. Li, J. Le, and L. T. Pileggi, "Projection-based statistical analysis of full-chip leakage power with non-log-normal distributions," in *Proc. Design Automation Conf. (DAC)*, July 2006, pp. 103–108.
- [13] F. Najm, "Transition density: a new measure of activity in digital circuits," *IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems*, vol. 12, no. 2, pp. 310–323, Feb. 1993.
- [14] S. Nassif, "Delay variability: sources, impact and trends," in *Proc. IEEE Int. Solid-State Circuits Conf.*, San Francisco, CA, Feb 2000, pp. 368–369.
- [15] E. Novak and K. Ritter, "Simple cubature formulas with high polynomial exactness," *Constructive Approximation*, vol. 15, no. 4, pp. 449–522, Dec 1999.

- [16] D. O. Ouma, D. S. Boning, J. E. Chung, W. G. Easter, V. Saxena, S. Misra, and A. Crevasse, "Characterization and modeling of oxide chemical-mechanical polishing using planarization length and pattern density concepts," vol. 15, no. 2, pp. 232–244, May 2002.
- [17] J. Roy, S. Adya, D. Papa, and I. Markov, "Min-cut floorplacement," *IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems*, vol. 25, no. 7, pp. 1313–1326, July 2006.
- [18] R. Shen, S. X.-D. Tan, and J. Xiong, "A linear algorithm for full-chip statistical leakage power analysis considering weak spatial correlation," in *Proc. Design Automation Conf. (DAC)*, Jun. 2010, pp. 481–486.
- [19] R. Teodorescu, B. Greskamp, J. Nakano, S. R. Sarangi, A. Tiwari, and J. Torrellas, "A model of parameter variation and resulting timing errors for microarchitects," in Workshop on Architectural Support for Gigascale Integration (ASGI), Jun 2007.
- [20] C.-Y. Wang and K. Roy, "Maximum power estimation for CMOS circuits using deterministic and statistical approaches," *IEEE Trans. on Very Large Scale Integration (VLSI) Systems*, vol. 6, no. 1, pp. 134–140, Mar. 1998.
- [21] J. Xiong, V. Zolotov, and L. He, "Robust extraction of spatial correlation," *IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems*, vol. 26, no. 4, 2007.
- [22] Z. Ye and Z. Yu, "An efficient algorithm for modeling spatially-correlated process variation in statistical full-chip leakage analysis," in *Proc. Int. Conf. on Computer Aided Design (ICCAD)*, Nov. 2009, pp. 295–301.