## Low Power Design and Synthesis using Voltage and Frequency Reduction

Saraju P. Mohanty Dept. of CSE, University of South Florida smohanty@csee.usf.edu For more details visit research page at: http://www.csee.usf.edu/~smohanty/



## **Outline of the Talk**

- Introduction
- Related Work
- Power fluctuation minimization through datapath scheduling
- Design of a low power a chip
- Conclusions



#### Why Low-Power ?

#### Major Motivation Extending battery life for portable applications











#### USF UNIVERSITY OF SOUTH FLORIDA

#### Why Low-Power ? .....

#### Battery lifetime



#### Environmental concerns

#### Cooling and energy costs



System reliability



Dept. of CSE

pad-

## **Dynamic Power Consumption**

Let,  $C_L$  = load capacitor,  $V_{dd}$  = supply voltage, N = average number of transitions/clock cycle =  $E(sw) = \alpha$  and f = clock frequency. The dynamic power consumption for CMOS:

$$P_{dynamic} = \frac{1}{2} C_L V_{dd}^2 N f$$

- Veendrick Observation: In a well designed circuit, shortcircuit power dissipation is less than 20% of the dynamic power dissipation.
- Sylvester and Kaul: At larger switching activity the static power is negligible compared to the dynamic power.

We focus on dynamic power reduction !!



## **Dynamic Power Reduction ?**

- Reduce Supply Voltage (V<sub>dd</sub>): delay increases; performance degradation
- Reduce Clock Frequency (f): only power saving no energy savings; results in performance degradation
- Reduce Switching Activity (N or E(sw)): no switching no power loss !!! Not fully under designers control. Switching activity depends on the logic function and correlations are difficult to handle.
- Reduce Physical Capacitance: done by reducing device size reduces the current drive of the transistor making the circuit slow



## **Our approach ?**

Adjust the frequency and supply voltage in a coordinated manner to reduce various forms of dynamic power while maintaining performance.







## Datapath Scheduling to Minimize Power Fluctuation



## **High-Level Synthesis ??**

#### McFarland (1990)

"HLS is conversion from an algorithmic level specification of the behavior of a digital system to a RT level structure that implements that behavior."

NOTE: also known as Behavioral Synthesis.



## **Phases of Behavioral Synthesis**



## **Datapath Scheduling ?**

- •Assumption: A datapath is represented as a data flow graph (DFG).
- Scheduling partitions the operations in a DFG into groups so that the operations in a same group can run concurrently.
- Considers the possible trade-offs between the total execution cost and hardware cost.
- Scheduling output:
  - -total number of control steps needed to execute all operations
  - -minimum number of FUs of each type to be used in the design
  - -the lifetimes of the variables generated during the computation



## **Fluctuation Minimization ??**

- Aim ? to minimize the fluctuation in the power consumption profile of the DFG over all the control steps during its execution.
- Two different design options :
  - Multiple voltage with dynamic frequency clocking (MVDFC)
  - multiple supply voltage with multicycling (MVMC)
- □Why power fluctuation minimization ?
  - to reduce power supply noise (L di/dt)
  - to reduce cross talk (M di/dt)
  - to increase battery efficiency (electrochemical efficiency)
  - to increase reliability (high current peak during short time)



### **Related work**

Energy efficient scheduling using voltage reduction

- Chang and Pedram 1997 Dynamic programming
- Johnson and Roy 1997 ILP based MOVER algorithm using multiple supply voltages

Lin, Hwang and Wu 1997 – ILP and heuristic for variable voltages (VV) and multicycling (MC)

Peak and transient power minimization

Raghunathan, Ravi and Raghunathan 2001 – data monitor operations in VHDL





## **Target Architecture**



•Each FU has one register and one MUX and operate at same voltage level as that of FU.

•Operational delay:

UNIVERSITY OF

SOUTH FLORIDA

 $(d_{FU} + d_{Mux} + d_{Reg} + d_{Conv})$ 

•Operating frequencies are calculated from the delays.

•Time for voltage conversion equals to time for frequency change.

•Controller has a storage unit to store cycle frequency indices.

16

## **Fluctuation Minimization ...**

Approach: Use ILP-based datapath scheduling to minimize power fluctuation.

Overall power fluctuation of the DFG is captured as mean power gradient (MPG).

➢MPG is a non-linear function due to presence of absolute function, but we use integer linear programming (ILP) for its minimization.



## **MPG Minimization: Modeling**

#### **Background Material**

 $\Box$ For a set of n observations,  $x_1$ ,  $x_2$ ,  $x_3$ ,  $\dots x_n$ , from a given distribution, the sample mean is  $m = 1/n \Sigma_i x_i$ . The observation-to-observation gradient can be defined as  $\Delta x_i = |x_i - x_{i-1}|$ . The mean gradient of the observations is given by MG =  $1/n \Sigma_i |x_i - x_{i-1}|$ .

## **MPG Minimization: Modeling ...**

- Power gradient for a cycle c,  $PG_c$ : defined as the absolute difference of a cycle power from previous cycle power.  $PG_c = |P_c P_{c-1}|_{(for any c = 2 to N)}$
- Peak of the power gradients PG<sub>p</sub>: Maximum of power gradients of all control steps.

 $PG_{p} = max (PG_{c}) = max (|P_{c} - P_{c-1}|)_{(\forall c=2 \rightarrow N)}$ 

• Mean power gradient MPG: Mean of the power gradients of all control steps.

MPG=1/N-1  $\Sigma_{(\forall c=2\rightarrow N)}$ PG<sub>c</sub>=1/N-1  $\Sigma_{(\forall c=2\rightarrow N)}$  |P<sub>c</sub> - P<sub>c-1</sub>| **NOTE**: The complete description is obtained after inserting the parameters, such as, capacitance, switching, voltage, frequency etc.



## **MPG Minimization: ILP**

**Linear Modeling of Nonlinearity** >General form involving absolute nonlinearity: Minimize:  $\Sigma_i |\mathbf{y}_i|$ (1) Subject to:  $y_i + \Sigma_j a_{ij} x_j \le b_i$ ,  $\forall i \text{ and } x_j \ge 0 \forall j$ >Let  $y_i$  be expressed as,  $y_i = y_i^1 - y_i^2$ , difference of two non-negative variables. >After algebraic manipulations, Minimize:  $\Sigma_i y_i^1 + y_i^2$ (2)Subject to:  $y_i^1 - y_i^2 + \Sigma_i a_{ii} x_i \le b_i$ ,  $\forall i$  $x_i \ge 0 \forall j \text{ and } y_i^1, y_i^2 \ge 0 \forall i$ Summary: change difference in objective function to sum and introduce the difference as constraints.

## **MPG Minimization: ILP Notations**

- •M<sub>k,v</sub> : max number of functional units of type F<sub>k,v</sub>
- S<sub>i</sub>: ASAP time stamp for the operation o<sub>i</sub>
- E<sub>i</sub>: ALAP time stamp for the operation o<sub>i</sub>
- P(C<sub>swi</sub>, v, f) : power consumption of F<sub>k,v</sub> used by o<sub>i</sub>
- ${}^{\bullet}\!x_{i,c,v,f}$  : decision variable, which takes the value of 1 if  $o_i$  is scheduled in control step c using  $F_{k,v}$  and c has frequency f

**NOTE:**  $C_{swi}$  is a measure of effective switching capacitance of FU<sub>i</sub>.



**MPG Minimization: ILP (MVDFC)** (1) Objective Function: Minimize the MPG for the whole DFG over all the control steps. Minimize: 1/N-1  $\Sigma_{(\forall c=2\rightarrow N)} |P_c - P_{c-1}|$ (1) The absolute is replaced with sum and the appropriate constraints. Minimize: 1/N-1  $\Sigma_{(\forall c=2\rightarrow N)} P_c + P_{c-1}$ (2)Subject to: Power gradient constraints After simplification, Minimize: 2/N-1  $\Sigma_{(\forall c=2 \rightarrow N-1)} P_c + P_1 + P_N$  (3) Subject to: Power gradient constraints Using decision variables,  $\begin{array}{l} \text{Minimize: } 2/\text{N-1} \Sigma_{c} \Sigma_{i} \Sigma_{v} \Sigma_{f} \ \textbf{x}_{i,c,v,f} \ \textbf{P}(\textbf{C}_{swi}, \textbf{v}, f) + \Sigma_{i} \Sigma_{v} \Sigma_{f} \\ \textbf{x}_{i,1,v,f} \ \textbf{P}(\textbf{C}_{swi}, \textbf{v}, f) + \Sigma_{i} \Sigma_{v} \Sigma_{f} \ \textbf{x}_{i,N,v,f} \ \textbf{P}(\textbf{C}_{swi}, \textbf{v}, f) \ \textbf{(4)} \end{array}$ Subject to: Power gradient constraints Dept. of CSE 22 SOUTH FLORIDA

## **MPG Minimization: ILP (MVDFC)...**

(2) Uniqueness Constraints: ensure that every operation  $o_i$  is scheduled to one unique control step and represented as,  $\forall i, 1 \le i \le O$ ,  $\sum_c \sum_v \sum_f x_{i,c,v,f} = 1$ 

(3) Precedence Constraints: guarantee that for an operation  $o_i$ , all its predecessors are scheduled in an earlier control step and successors are scheduled in an later control step;  $\forall i,j$ , any  $o_i \in \text{Pred}(o_j)$ ,  $\Sigma_v \Sigma_f \Sigma_{\{d=S_i \rightarrow E_i\}} d X_{i,c,v,f} - \Sigma_v \Sigma_f \Sigma_{\{d=S_j \rightarrow E_j\}} e X_{j,c,v,f} \leq -1$ 

(4) Resource Constraints: make sure that no control step contains more than  $F_{k,v}$  operations of type k operating at voltage v and are enforced as,  $\forall c, 1 \leq c \leq N$  and  $\forall v$ ,  $\sum_{\{i \in F_{k,v}\}} \sum_{f} x_{i,c,v,f} \leq M_{k,v}$ 



## **MPG Minimization: ILP (MVDFC)...**

- (5) Frequency Constraints: lower operating voltage functional unit can not be scheduled in a higher frequency control step; these constraints are expressed as,  $\forall i$ ,  $1 \le i \le O$ ,  $\forall c$ ,  $1 \le c \le N$ , if f < v, then  $x_{i,c,v,f} = 0$ .
- (6) Power Gradient Constraints : to eliminate the non-linearity introduced due to the absolute function introduced as,  $\forall c, 2 \le c \le N$ ,

 $\sum_{i}\sum_{v}\sum_{f} x_{i,c,v,f} P(C_{swi},v,f) - \sum_{i}\sum_{v}\sum_{f} x_{i,c-1,v,f} P(C_{swi},v,f) \leq PG_{p}$ 

**NOTE**: The unknown PG<sub>p</sub> is added to the objective function and minimized alongwith it.



## **MPG Minimization: ILP (MVMC)**

We followed similar steps as in the MVDFC case using the new decision variable y<sub>i,v,l,m</sub>. □No frequency constraints involved in MVMC. The following items are formulated: (1) Objective Function (2) Uniqueness Constraints (3) Precedence Constraints (4) Resource Constraints (5) Power Gradient Constraints Calculations of subscripts for decision variables and limits of summations are more involved compared to MVDFC case due to the additional parameter L<sub>i.v</sub>.



## **MPG Minimization: Scheduling**

- Step 1: Construct a look up table for (effective switching capacitance, average switching activity) pairs.
- Step 2: Find ASAP and ALAP schedule for UDFG.
- Step 3: Get the mobility graph.
- Step 4: Use AMPL for ILP formulations of DFG.
- Step 5: Solve the ILP formulations using LP-Solve.
- Step 6: Find the scheduled DFG.
- Step 7: Determine the cycle frequency indices and base frequency for MVDFC scheme.
- Step 8: Estimate power consumptions of the scheduled DFG.



## **MPG Minimization: Results**

- 1.Example circuit (EXP)
- 2.FIR filter
- 3.IIR filter
- 4.HAL differential eqn. solver

5. Auto-Regressive filter (ARF)

| Multipliers                     |      | ALUs |      | Serial |  |  |
|---------------------------------|------|------|------|--------|--|--|
| 2.4V                            | 3.3V | 2.4V | 3.3V | No     |  |  |
| 2                               | 1    | 1    | 1    | RC1    |  |  |
| 3                               | 0    | 1    | 1    | RC2    |  |  |
| 2                               | 0    | 0    | 2    | RC3    |  |  |
| 1                               | 1    | 0    | 1    | RC4    |  |  |
| USF UNIVERSITY OF SOUTH FLORIDA |      |      |      |        |  |  |

## **MPG Minimization: Results ...**

| Percentage reduction using MVDFC or<br>MVMC compared to SVSF |       |       |  |  |  |  |
|--------------------------------------------------------------|-------|-------|--|--|--|--|
| Power                                                        | MVDFC | MVMC  |  |  |  |  |
| MPG                                                          | 73.42 | 47.10 |  |  |  |  |
| Peak Power                                                   | 72.50 | 27.41 |  |  |  |  |
| Average Power                                                | 72.38 | 24.41 |  |  |  |  |
| Energy (PDP)                                                 | 54.13 | 0.0   |  |  |  |  |



### **MPG Minimization: Results**



### **MPG Minimization: Results ...**



30

# Low Power VLSI Design (Dual Voltage Dual Frequency Operating Image Watermarking Chip)



## **Digital Watermarking** ?

It is mine **Digital watermarking is** No, It is mine process for a embedding data (watermark) into a multimedia object for **Multimedia** its copyright protection **Object** and authentication. **Owner** Hacker Whose is this ? How to know? Types What's the solution of this Visible and Invisible ownership problem? Spatial/DCT/ Wavelet Solution Robust and Fragile "WATERMARKING" Researcher JNIVERSITY OF Dept. of CSE 32 SOUTH FLORIDA

## An Watermarked Image (from IBM)



## Watermarking: General Framework

>Encoder: Inserts the watermark into the host image Decoder: Decodes or extracts the watermark from image Comparator: Verifies if extracted watermark matches with the inserted one **NOTE:** We focus of design of encoders.



#### **Previous Work** (Hardware based Watermarking)

| Work                | Туре                 | Target<br>Object | Domain  | Techn<br>ology | Chip<br>Power |
|---------------------|----------------------|------------------|---------|----------------|---------------|
| Strycker,<br>2000   | Invisible<br>Robust  | Video            | Spatial | NA             | NA            |
| Tsai and<br>Lu 2001 | Invisible<br>Robust  | Video            | DCT     | 0.35µ          | 62.8<br>mW    |
| Mathai,<br>2003     | Invisible<br>Robust  | Image            | Wavelet | 0.18µ          | NA            |
| Garimella,<br>2003  | Invisible<br>Fragile | Image            | Spatial | 0.13µ          | 37.6<br>μW    |

## **Highlights of our Designed Chip**

- DCT domain Implementation
- First to insert both visible and / or invisible watermark
- First Low Power Design for watermarking using dual voltage and dual frequency
- Uses Pipelined / Parallelization for better performance



## Watermarking: JPEG Encoder (DCT Domain)









## **Invisible Algorithm Implemented**

- 1. Divide the original image into blocks.
- 2. Calculate the DCT coefficients of all the image blocks.
- 3. Generate random numbers to use as watermark.
- 4. Consider the three largest AC-DCT coefficients of an image block for watermark insertion.

Reference: I.J. Cox, et. al., "Secure Spread Spectrum Watermarking for Multimedia", IEEE Transactions on Image Processing, 1997.



## **Visible Algorithm Implemented**

- 1. Divide Original and watermark image into blocks.
- 2. Calculate DCT coefficients of all the blocks.
- 3. Find the edge blocks in the original image.
- 4. Find the local and global statistics of original image using DC-DCT and AC-DCT coefficients.
- 5. The mean of DC-DCT coefficients and mean and the variance of AC-DCT coefficients are useful.
- 6. Calculate the Scaling and embedding factors.
- 7. Add the original image DCT coefficients and the watermark DCT coefficients block by block.

Reference: S. P. Mohanty, and et. al., "A DCT Domain Visible Watermarking Technique for Images", *Proc. of the ICME* 2000.



#### **The Proposed Architecture**



## **Modules in Proposed Architecture**

- DCT Module: Calculates the DCT coefficients.
- Edge Detection Module: Determines edge blocks.
- Perceptual Analyzer Module: Determines perceptually significant regions using original image statistics.
- Scaling and Embedding Factor Module: Determines the scaling and embedding factors for visible watermark insertion.
- Watermark Insertion Module: Inserts the watermark
- Random Number Generator Module: Generates
  random numbers.





## **Modules in Proposed Architecture**



Dept. of CSE

USF UNIVERSITY OF SOUTH FLORIDA

### **Pipeline and Parallelism**





# **Prototype Chip Implementation** Tools used for the design

| Tools used                 | Purpose                       |
|----------------------------|-------------------------------|
| Cadence NClaunch           | VHDL simulator                |
| Synopsys Design Analyzer   | Verilog netlist generation    |
| Cadence Silicon Ensemble   | Layout, Placement and routing |
| Cadence Virtuose tool      | Layout editing                |
| Cadence Abstract Generator | Abstract generation           |
| Synopsys Nanosim           | Power and delay calculations  |

Standard Cells obtained from Virginia Tech. Technology: TSMC 0.25  $\,\mu\text{m}$ 





## **Prototype Chip: Floor plan**







## **Prototype Chip: Statistics**

Technology: TSMC 0.25 µ Total Area: 16.2 sq mm Dual Clocks: 280 MHz and 70 MHz Dual Voltages: 2.5V and 1.5V No. of Transistors: 1.4 million Power (dual voltage and frequency): 0.3 mW Power (single voltage and frequency): 1.9 mW



### Conclusions

- We capture power fluctuation in MVDFC and MVMC design scenario using the function MPG and minimize it using ILP.
- The MVDFC approach is better alternative. It is observed that for the circuits with equal number of addition and multiplication operations in the critical path the savings are maximum with no time penalty.
- Polynomial time complexity heuristic algorithms can be developed to obtain suboptimal, but faster solutions.
- The scheduling schemes are useful for data intensive applications.
- □ It is observed that the designed chip consumes only one fifth of the power compared conventional design.





