#### A Dual Voltage Dual Frequency Low Power VLSI Chip for Image Watermarking

#### Saraju P. Mohanty Dept. of CSE, University of North Texas smohanty@cs.unt.edu http://www.cs.unt.edu/~smohanty/

Dept. of Comp. Sc. & Engineering



# **Outline of the Talk**

- Introduction
- Why Low Power?
- Related Works
- Watermarking Algorithms
- Proposed Architecture
- Prototype Chip Implementation
- Conclusions



## Why Low-Power?

Major motivation: extending battery life for portable applications.






#### Why Low-Power? (continued)

- Battery lifetime
- Environmental concerns
- Cooling and energy costs
- System reliability



# Power Consumption in CMOS Circuits

Sources of power dissipation:

- **Leakage current**: reverse-bias current in the parasitic diodes, plus subthreshold current due to the inversion charge that exists at gate voltages below $V_{T}$.
- **Standby current**: continuous DC current from V<sub>dd</sub> to ground.
- **Short-circuit current**: DC current from V<sub>dd</sub> to ground during output transitions.
- **Capacitive current**: flows to charge and discharge capacitive loads.




#### **Dynamic Power Consumption**

Let $C_L$ = load capacitance, $V_{dd}$ = supply voltage, $N$ = average number of transitions per clock cycle = $E(sw) = \alpha$, and $f$ = clock frequency. The dynamic power consumption of a CMOS circuit is:

$$P_{dynamic} = \frac{1}{2} C_L V_{dd}^2 N f$$
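To make the quadratic dependence on supply voltage concrete, here is a minimal sketch that evaluates the formula at the two supply voltages reported for the prototype (2.5 V and 1.5 V). The load capacitance, switching activity, and the choice to hold the clock at 280 MHz are hypothetical values picked only for illustration; they are not measurements from the chip.

```python
def dynamic_power(c_load, vdd, n_transitions, freq):
    """Dynamic power in watts: P = 1/2 * C_L * Vdd^2 * N * f."""
    return 0.5 * c_load * vdd ** 2 * n_transitions * freq


# Hypothetical operating point (not taken from the chip).
C_LOAD = 1e-12   # load capacitance in farads
N_SW = 0.5       # average transitions per clock cycle
FREQ = 280e6     # clock frequency in hertz

p_high = dynamic_power(C_LOAD, 2.5, N_SW, FREQ)
p_low = dynamic_power(C_LOAD, 1.5, N_SW, FREQ)

# Power scales with Vdd squared, so the ratio is (1.5 / 2.5) ** 2 = 0.36.
print(f"P at 2.5 V: {p_high:.3e} W")
print(f"P at 1.5 V: {p_low:.3e} W")
print(f"ratio: {p_low / p_high:.2f}")
```

With everything else held fixed, lowering the supply from 2.5 V to 1.5 V cuts dynamic power to 36% of its original value.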

- Veendrick's observation: in a well-designed circuit, short-circuit power dissipation is less than 20% of the dynamic power dissipation.
- Sylvester and Kaul: at high switching activity, static power is negligible compared to dynamic power.

We therefore focus on dynamic power reduction.



#### **Dynamic Power Reduction?**

- Reduce supply voltage (V<sub>dd</sub>): delay increases, so performance degrades.
- Reduce clock frequency (f): saves power but not energy, and performance degrades.
- Reduce switching activity (N or E(sw)): no switching, no power loss. However, this is not fully under the designer's control: switching activity depends on the logic function, and correlations are difficult to handle.
- Reduce physical capacitance: done by reducing device size, which reduces the transistor's current drive and makes the circuit slower.
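The point that lowering the clock frequency saves power but not energy can be checked with a short sketch. It assumes a task that needs a fixed number of clock cycles, so the runtime is `cycles / freq`; the capacitance, switching activity, and cycle count are hypothetical illustration values.

```python
def dynamic_power(c_load, vdd, n_transitions, freq):
    """Dynamic power in watts: P = 1/2 * C_L * Vdd^2 * N * f."""
    return 0.5 * c_load * vdd ** 2 * n_transitions * freq


def task_energy(c_load, vdd, n_transitions, freq, cycles):
    """Energy in joules for a task that takes `cycles` clock cycles at `freq`."""
    runtime = cycles / freq  # a slower clock means a longer runtime
    return dynamic_power(c_load, vdd, n_transitions, freq) * runtime


# Hypothetical values, chosen only for illustration.
C_LOAD, VDD, N_SW, CYCLES = 1e-12, 2.5, 0.5, 1_000_000

p_fast = dynamic_power(C_LOAD, VDD, N_SW, 280e6)
p_slow = dynamic_power(C_LOAD, VDD, N_SW, 70e6)
e_fast = task_energy(C_LOAD, VDD, N_SW, 280e6, CYCLES)
e_slow = task_energy(C_LOAD, VDD, N_SW, 70e6, CYCLES)

print(f"power ratio (70 MHz / 280 MHz): {p_slow / p_fast:.2f}")
print(f"energy at 280 MHz: {e_fast:.4e} J, at 70 MHz: {e_slow:.4e} J")
```

The frequency term cancels in the energy expression, which is why frequency scaling alone only stretches the same energy over a longer time.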



#### Our Approach?

Adjust the frequency and supply voltage in a coordinated manner to reduce dynamic power while maintaining performance.




# Digital Watermarking?

Digital watermarking is a process for embedding data (watermark) into a multimedia object for its copyright protection and authentication.

#### Types

- Visible and invisible
- Spatial, DCT, or wavelet domain
- Robust and fragile



# A Watermarked Image (from IBM)






# Watermarking: General Framework

- Encoder: inserts the watermark into the host image.
- Decoder: decodes or extracts the watermark from the image.
- Comparator: verifies whether the extracted watermark matches the inserted one.



# Why Hardware Implementation?

Hardware implementations of watermarking algorithms are necessary for several reasons:

- Easy integration with multimedia hardware, such as digital camera, camcorder, etc.
- Low power
- High performance
- Reliable
- Real time applications



#### Previous Work (Hardware based Watermarking)

| Work              | Type               | Target Object | Domain  | Technology | Chip Power |
|-------------------|--------------------|---------------|---------|------------|------------|
| Strycker, 2000    | Invisible, robust  | Video         | Spatial | NA         | NA         |
| Tsai and Lu, 2001 | Invisible, robust  | Video         | DCT     | 0.35 µm    | 62.8 mW    |
| Mathai, 2003      | Invisible, robust  | Image         | Wavelet | 0.18 µm    | NA         |
| Garimella, 2003   | Invisible, fragile | Image         | Spatial | 0.13 µm    | 37.6 µW    |



# **Previous Work: Summary**

- Many software implementations of watermarking algorithms exist.
- Only a few hardware implementations exist.
- Just one hardware implementation works in the frequency domain, and it can insert only an invisible watermark.
- All other implementations are in the spatial domain.



# **Highlights of our Designed Chip**

- DCT-domain implementation
- First to insert visible and/or invisible watermarks
- First low-power design for watermarking using dual voltage and dual frequency
- Uses pipelining and parallelism for better performance



#### Watermarking through JPEG Encoder





# Watermarking Through Digital Still Camera





# **Invisible Algorithm Implemented**

1. Divide the original image into blocks.
2. Calculate the DCT coefficients of all the image blocks.
3. Generate random numbers to use as the watermark.
4. Consider the three largest AC-DCT coefficients of each image block for watermark insertion.

Reference: I. J. Cox et al., "Secure Spread Spectrum Watermarking for Multimedia," IEEE Transactions on Image Processing, 1997.
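The selection and insertion steps can be sketched for a single block as follows. This is a minimal illustration under stated assumptions, not the chip's datapath: "largest" is taken to mean largest magnitude, the additive rule is the one the slides give for the invisible insertion module (coefficient plus α times a random number), and the function name `insert_invisible`, the default α, and the sample values are hypothetical.

```python
def insert_invisible(block, watermark, alpha=0.1):
    """Add alpha * r to the three largest-magnitude AC coefficients.

    block: square 2-D list of DCT coefficients, with block[0][0] as the DC term.
    watermark: three random numbers, one per selected coefficient.
    Returns the watermarked block and the positions that were modified.
    """
    n = len(block)
    # Every position except the DC coefficient is an AC coefficient.
    ac_positions = [(i, j) for i in range(n) for j in range(n) if (i, j) != (0, 0)]
    # Assumption: "largest" means largest absolute value.
    top3 = sorted(ac_positions, key=lambda p: abs(block[p[0]][p[1]]), reverse=True)[:3]
    marked = [row[:] for row in block]  # copy so the input stays untouched
    for (i, j), r in zip(top3, watermark):
        marked[i][j] = block[i][j] + alpha * r
    return marked, top3


# Hypothetical 4x4 block of DCT coefficients and watermark values.
block = [
    [100.0, 9.0, -12.0, 1.0],
    [7.0, 0.5, 0.0, 0.0],
    [-3.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
marked, positions = insert_invisible(block, [1.0, -1.0, 0.5], alpha=2.0)
print(positions)
print(marked)
```

In hardware, the random numbers would come from the random number generator module rather than from a caller-supplied list.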




# Visible Algorithm Implemented

1. Divide the original image and the watermark image into blocks.
2. Calculate the DCT coefficients of all the blocks.
3. Find the edge blocks in the original image.
4. Find the local and global statistics of the original image using the DC-DCT and AC-DCT coefficients.
5. The statistics used are the mean of the DC-DCT coefficients and the mean and variance of the AC-DCT coefficients.
6. Calculate the scaling and embedding factors.
7. Add the original image DCT coefficients and the watermark DCT coefficients block by block.

Reference: S. P. Mohanty et al., "A DCT Domain Visible Watermarking Technique for Images," *Proc. of the IEEE ICME*, 2000.
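The final, block-by-block addition step can be sketched as below. The sketch assumes that the scaling factor multiplies the host block and the embedding factor multiplies the watermark block before the two are added; the slides do not spell out how the factors are derived from the image statistics, so they are passed in as plain arguments here. The name `blend_block` and all numeric values are hypothetical.

```python
def blend_block(host, mark, alpha, beta):
    """Combine one host block and one watermark block in the DCT domain.

    Assumed form: out = alpha * host + beta * mark, element by element,
    where alpha is the scaling factor and beta is the embedding factor.
    """
    return [
        [alpha * c + beta * w for c, w in zip(host_row, mark_row)]
        for host_row, mark_row in zip(host, mark)
    ]


# Hypothetical 2x2 blocks and factors, only to show the arithmetic.
host_block = [[10.0, 2.0], [4.0, 0.0]]
mark_block = [[20.0, -2.0], [0.0, 8.0]]
out = blend_block(host_block, mark_block, alpha=0.9, beta=0.1)
print(out)
```

A per-block pair of factors lets the watermark be stronger in some regions than in others, which is presumably why the algorithm computes local statistics and edge information first.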




#### **The Proposed Architecture**



# Highlights of the Proposed Architecture

- Hierarchical architecture.
- Decentralized controller scheme.
- Parallelism and pipelining exploited.
- Dual-voltage and dual-frequency operation.



- DCT Module: Calculates the DCT coefficients.
- Edge Detection Module: Determines edge blocks.
- Perceptual Analyzer Module: Determines perceptually significant regions using original image statistics.
- Scaling and Embedding Factor Module: Determines the scaling and embedding factors for visible watermark insertion.
- Watermark Insertion Module: Inserts the watermark.
- Random Number Generator Module: Generates random numbers.








Pseudorandom numbers are generated using a linear feedback shift register (LFSR).
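As a rough software model of this idea, the sketch below steps an 8-bit Fibonacci LFSR. The register width, the tap positions (feedback polynomial x^8 + x^6 + x^5 + x^4 + 1), and the seed are assumptions chosen for illustration; the slides do not state the configuration used on the chip.

```python
def lfsr8(seed=0x01):
    """Yield successive states of an 8-bit Fibonacci LFSR.

    Assumed taps: bits 0, 2, 3 and 4 of the right-shifting register,
    which corresponds to x^8 + x^6 + x^5 + x^4 + 1.
    """
    state = seed & 0xFF
    if state == 0:
        raise ValueError("an all-zero seed locks the LFSR at zero")
    while True:
        yield state
        # XOR the tapped bits to form the new most significant bit.
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
        state = (state >> 1) | (bit << 7)


def lfsr_period(seed=0x01):
    """Count steps until the register returns to its starting state."""
    gen = lfsr8(seed)
    first = next(gen)
    steps = 1
    for state in gen:
        if state == first:
            return steps
        steps += 1


gen = lfsr8(0x01)
print([next(gen) for _ in range(5)])
print(lfsr_period(0x01))
```

An LFSR needs only a shift register and a few XOR gates, which makes it a cheap source of pseudorandom values in hardware.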







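#### Modules in more Detail: DCT Module

The DCT module implements the following set of equations.

$$\begin{aligned}
x_{00} &= (\mathit{in}_{00} c_{00}) + (\mathit{in}_{01} c_{01}) + (\mathit{in}_{02} c_{02}) + (\mathit{in}_{03} c_{03}) \\
x_{10} &= (\mathit{in}_{10} c_{00}) + (\mathit{in}_{11} c_{01}) + (\mathit{in}_{12} c_{02}) + (\mathit{in}_{13} c_{03}) \\
x_{20} &= (\mathit{in}_{20} c_{00}) + (\mathit{in}_{21} c_{01}) + (\mathit{in}_{22} c_{02}) + (\mathit{in}_{23} c_{03}) \\
x_{30} &= (\mathit{in}_{30} c_{00}) + (\mathit{in}_{31} c_{01}) + (\mathit{in}_{32} c_{02}) + (\mathit{in}_{33} c_{03})
\end{aligned}$$

$$\begin{aligned}
y_{00} &= (x_{00} c_{00}) + (x_{10} c_{01}) + (x_{20} c_{02}) + (x_{30} c_{03}) \\
y_{01} &= (x_{00} c_{10}) + (x_{10} c_{11}) + (x_{20} c_{12}) + (x_{30} c_{13}) \\
y_{02} &= (x_{00} c_{20}) + (x_{10} c_{21}) + (x_{20} c_{22}) + (x_{30} c_{23}) \\
y_{03} &= (x_{00} c_{30}) + (x_{10} c_{31}) + (x_{20} c_{32}) + (x_{30} c_{33})
\end{aligned}$$

The DCT module is implemented as arrays of multipliers and adders.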
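The two-stage, multiply-and-accumulate structure of the DCT module can be modelled in software as a separable transform on a 4x4 block. This is a sketch under assumptions: the slides do not give the numeric values of the coefficients, so the orthonormal DCT-II matrix is used as a stand-in, the sums are extended to every row and column of the block, and the output index order may differ from the labelling used on the slide.

```python
import math


def dct_matrix(n=4):
    """Orthonormal DCT-II matrix: c[u][k] = a(u) * cos((2k + 1) * u * pi / (2n))."""
    rows = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * k + 1) * u * math.pi / (2 * n))
                     for k in range(n)])
    return rows


def dct_2d(block):
    """Two-stage DCT of a square block using only multiplies and adds."""
    n = len(block)
    c = dct_matrix(n)
    # Stage 1 (row transform): x[i][j] = sum over k of in[i][k] * c[j][k]
    x = [[sum(block[i][k] * c[j][k] for k in range(n)) for j in range(n)]
         for i in range(n)]
    # Stage 2 (column transform): y[u][j] = sum over i of c[u][i] * x[i][j]
    y = [[sum(c[u][i] * x[i][j] for i in range(n)) for j in range(n)]
         for u in range(n)]
    return y


# A constant block has all of its energy in the DC coefficient.
flat = [[10.0] * 4 for _ in range(4)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))
```

Each output is a four-term sum of products, so every stage maps naturally onto four multipliers feeding an adder tree.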





#### Modules in more Detail: Scaling and Embedding Factor, Visible Insertion



#### Modules in more Detail: Invisible Insertion

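- The insertion module is implemented with a multiplier and an adder.

$C_{\text{IWk}} = C_{\text{Ik}} + \alpha r_{\text{Ik}}$

Here $C_{\text{Ik}}$ is a selected DCT coefficient of the original image, $r_{\text{Ik}}$ is the corresponding random number, $\alpha$ is the scaling factor, and $C_{\text{IWk}}$ is the watermarked coefficient.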





#### **Pipeline and Parallelism**



#### **Dual Voltage and Dual Frequency**



# **Dual Voltage: Level Converters**

- Level converters are required to step up the low voltage to the high voltage.
- Traditional level converter: Differential Cascode Voltage Switch (DCVS).
- In this work: single-supply level converters, which are faster, consume less power, and need only a single supply voltage.

Reference: R. Puri et al., "Pushing ASIC Performance in a Power Envelope," in Proceedings of the Design Automation Conference, 2003, pp. 788-793.



#### Layout and Schematic of SSLV







# **Prototype Chip Implementation: Tools Used**

| Tools                      | Purpose                        |
|----------------------------|--------------------------------|
| Cadence NCLaunch           | VHDL simulator                 |
| Synopsys Design Analyzer   | Verilog netlist generation     |
| Cadence Silicon Ensemble   | Layout, placement, and routing |
| Cadence Virtuoso           | Layout editing                 |
| Cadence Abstract Generator | Abstract generation            |
| Synopsys NanoSim           | Power and delay calculations   |

- Standard-cell design style adopted.
- Standard cells obtained from Virginia Tech.
- Technology: TSMC 0.25 µm.



# **Prototype Chip Implementation: Design Flow**





# **Design Flow Example: VHDL Code**

```
library ieee;
use ieee.std_logic_1164.all;

entity edm3 is
port (clk, reset, vdd1, enable, vss1 : in std_logic;
      AnMax, An : in std_logic_vector(16 downto 0);
      edge_block, done, write_edm3 : out std_logic;
      countout : out std_logic_vector(7 downto 0)
      );
end entity edm3;

architecture behav of edm3 is
component counter8 is
port (clk : in std_logic;
      reset, vdd1, enable : in std_logic;
      q : inout std_logic_vector(7 downto 0)
      );
end component counter8;
signal An_max, AnMax_by_2 : std_logic_vector(16 downto 0);
signal count_out : std_logic_vector(7 downto 0);
signal At, A, B, Bt, count, edgeblock, en_count, write, proces,
tempedgeblock : std_logic;
begin
-- 8-bit counter instance; its value is exposed on countout.
COUNTER: counter8 port map (clk=>clk, reset=>reset, enable=>write, vdd1=>vdd1,
q=>count_out);
countout<=count_out;
-- Raise count when the 8-bit counter reaches its maximum value.
counting: process(count_out) is
           begin
          if (count_out="11111111") then
          count <= '1';
          else
          count<='0';
          end if;
          end process;
-- (the slide excerpt ends here; the rest of the architecture is not shown)
```



# Design Flow Example: Synthesized Verilog Netlist

```
module edm3 ( clk, reset, vdd1, enable, vss1, AnMax, An, edge_block, done,
    write_edm3, countout );
  input [16:0] An;
  input [16:0] AnMax;
  output [7:0] countout;
  input clk, reset, vdd1, enable, vss1;
  output edge_block, done, write_edm3;
  wire Bt, n_133, At88, n_134, At, \"<"-return148 , count, Bt100, n195, n196,
      n197, n198, n199, n200, n201, n202, n203, n204, n205, n206,
      \*cell*78/U5/Z_0 ;
  counter8 COUNTER ( .clk(clk), .reset(reset), .vdd1(vdd1), .enable(write_edm3), .q(countout) );
  and3_1 U39 ( .ip1(n195), .ip2(n_133), .ip3(n196), .op(\*cell*78/U5/Z_0 ) );
  or2_1 U40 ( .ip1(n197), .ip2(n198), .op(At88) );
  and2_1 U41 ( .ip1(\"<"-return148 ), .ip2(n_133), .op(n_134) );
  inv_1 U42 ( .ip(reset), .op(n_133) );
  nand2_1 U43 ( .ip1(n199), .ip2(n200), .op(Bt100) );
  nor2_1 U44 ( .ip1(n201), .ip2(n199), .op(n197) );
  nor2_1 U45 ( .ip1(n198), .ip2(n201), .op(n202) );
  nor3_1 U46 ( .ip1(n203), .ip2(n204), .ip3(n205), .op(count) );
  nand2_1 U47 ( .ip1(At), .ip2(n_133), .op(n199) );
  mux2_2 U48 ( .ip1(n202), .ip2(n198), .s(n199), .op(write_edm3) );
  mux2_2 U49 ( .ip1(n201), .ip2(enable), .s(n199), .op(n195) );
  and2_1 U50 ( .ip1(countout[1]), .ip2(countout[3]), .op(n206) );
  nand3_1 U51 ( .ip1(countout[0]), .ip2(countout[2]), .ip3(n206), .op(n205) );
  nand2_1 U52 ( .ip1(countout[5]), .ip2(countout[4]), .op(n203) );
  nand2_1 U53 ( .ip1(countout[7]), .ip2(countout[6]), .op(n204) );
  inv_1 U54 ( .ip(count), .op(n201) );
  nand2_1 U55 ( .ip1(Bt), .ip2(n_133), .op(n196) );
  inv_1 U56 ( .ip(n196), .op(n198) );
  nand2_1 U57 ( .ip1(enable), .ip2(n196), .op(n200) );
  drp_2 At_reg ( .ck(clk), .ip(At88), .rb(n_133), .q(At) );
  lp_2 edgeblock_reg ( .ck(\*cell*78/U5/Z_0 ), .ip(n_134), .q(edge_block) );
  dp_2 done_reg ( .ck(clk), .ip(count), .q(done) );
  drp_2 Bt_reg ( .ck(clk), .ip(Bt100), .rb(n_133), .q(Bt) );
  edm3_DW01_cmp2_17_0 \lt_100/lt/lt ( .A(An), .B({vss1, AnMax[16], AnMax[15],
      AnMax[14], AnMax[13], AnMax[12], AnMax[11], AnMax[10], AnMax[9],
      AnMax[8], AnMax[7], AnMax[6], AnMax[5], AnMax[4], AnMax[3], AnMax[2],
      AnMax[1]}), .LEQ(1'b0), .TC(1'b0), .LT_LE(\"<"-return148 ) );
endmodule
```




# Design Flow Example: Placement and Routing






# **Design Flow Example: Layout**





#### **Design Flow Example: Abstract Generation**




#### **Overall Prototype Chip: Layout**






# **Prototype Chip: Floor plan**







# **Prototype Chip: Statistics**

- Technology: TSMC 0.25 µm
- Total area: 16.2 sq mm
- Dual clocks: 280 MHz and 70 MHz
- Dual voltages: 2.5 V and 1.5 V
- No. of transistors: 1.4 million
- Power (dual voltage and frequency): 0.3 mW
- Power (single voltage and frequency): 1.9 mW
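As a quick check on the two reported figures, taking 0.3 mW as the dual voltage and frequency power and 1.9 mW as the single voltage and frequency power, the relative saving works out as follows. This is plain arithmetic on the numbers above, not an additional measurement.

$$\frac{1.9\ \text{mW} - 0.3\ \text{mW}}{1.9\ \text{mW}} = \frac{1.6}{1.9} \approx 0.84$$

That is a reduction of roughly 84%, or about a 6.3-fold drop in power.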



# **Conclusion and Future Work**

- A dual-voltage, dual-frequency watermarking chip was developed.
- It supports invisible and visible watermark insertion.
- A pipelined and parallelized architecture is used for performance.
- Future work: frequency-domain implementation for real-time audio and video watermarking.
- Future work: real-time watermark extraction.
- Future work: more robust watermarking algorithms are needed.

