## Simultaneous Peak and Average Power Minimization during Datapath Scheduling for DSP Procesors Saraju P. Mohanty, N. Ranganathan and Sunil K. Chappidi Dept of Computer Science and Engineering, NNRC University of South Florida, Tampa, FL 33620, USA <a href="mailto:smohanty">smohanty</a>, ranganat, chappidi@csee.usf.edu #### Outline of the talk - Introduction - Related work - Target architecture - Power model - ILP formulations - Scheduling algorithm - Experimental results #### Peak power? The peak power is the maximum power consumption of the circuit at any instance during its execution. ## **Average power?** Total energy consumed per unit time. #### Why peak power reduction? #### Reduction of peak power consumption is essential: - (i) to maintain supply voltage levels - (ii) to increase reliability - (iii) to use smaller heat sinks - (iv) to make packaging cheaper #### Why average power reduction? The average power reduction is essential for the following reasons: - (i) to increase battery life time, - (ii) to enhance noise margin, - (iii) to reduce cooling and energy costs, - (iv) to reduce use of natural resources and - (v) to increase system reliability #### Related work (Energy efficient scheduling using voltage reduction) - Chang and Pedram [3], 1997 Dynamic programming - Johnson and Roy [4], 1997 ILP based MOVER algorithm using multiple supply voltages - Lin, Hwang and Wu [6], 1997 ILP and heuristic for variable voltages (VV) and multicycling (MC) - Mohanty and Ranganathan [10], 2003 Heuristic based using multiple voltage and dynamic clocking #### Related work #### (Peak Power efficient scheduling) - Martin and Knight [7,8,9], 1996 Simultaneous assignment and scheduling - Raghunathan, Ravi and Raghunathan [10], 2001 data monitor operations in VHDL - Mohanty, Ranganathan & Chappidi [12], 2003 ILP based using multiple voltages, dynamic clocking & multicycling - Shiue [15], 2000 ILP based and modified force direct scheduling for peak power minimization - Shiue and Chakrabarti [16], 2000 ILP model to minimize peak power and area for single voltage #### Voltage, Frequency and Power Trade-offs - (i) voltage reduction increase in delay - (ii) frequency reduction reduction in power not energy (and increase in delay) "Beyond of (i) and (ii) reduction of switching capacitance can be considered." #### What is our approach? Adjust the frequency and reduce the supply voltage for peak and average power reduction during datapath scheduling. #### **Target architecture** - Each functional unit has one register and one multiplexor. - The register and the multiplexor operate at the same voltage level as that of the functional units. - Level converters are used when a low-voltage functional unit is driving a high-voltage functional unit. - $\Box$ Operational delay of a FU: $(d_{FU} + d_{Mux} + d_{Reg} + d_{Conv})$ . #### **Power model**: Notations For a DFG let us assume: c = any control step or clock cycle in DFG N = total number of control steps in the DFG $R_c$ = number of resources active in step c f<sub>c</sub> = cycle frequency for control step c $\alpha_{i,c}$ = switching at resource i active in step c C<sub>i,c</sub> = load capacitance of resource i active in step c $V_{i,c}$ = operating voltage of resource i active in step c ## Cycle power (P<sub>c</sub>) The power consumption for any control step c is given by, $$P_c = \sum_{i=\{1\rightarrow Rc\}} \alpha_{i,c} C_{i,c} V_{i,c}^2 f_c$$ ## Peak power (P<sub>p</sub>) The peak power consumption of the DFG is the maximum power consumption over all the control steps, $$P_p = maximum(P_c)_{c=\{1\rightarrow N\}}$$ Using the above two equations the peak power consumption of the DFG is described as, $$P_p = \text{maximum} \left( \sum_{i=\{1 \rightarrow Rc\}} \alpha_{i,c} C_{i,c} V^2_{i,c} f_c \right)_{c=\{1 \rightarrow N\}}$$ ## 20 ## Average power (P<sub>a</sub>) Characterized as mean of the cycle power ( $P_c$ ): $$P_{a} = 1/N \left( \sum_{c=\{1 \to N\}} P_{c} \right)$$ $$= 1/N \left( \sum_{c=\{1 \to N\}} \sum_{i=\{1 \to Rc\}} \alpha_{i,c} C_{i,c} V^{2}_{i,c} f_{c} \right)$$ ## **Objective** ? To minimize: Peak power (P<sub>p</sub>) + Average power (P<sub>a</sub>) #### **ILP formulations for MVDFC: notations** O: total number of operations in the DFG $o_i$ : any operation i, 1 <= i <= 0 F<sub>k,v</sub>: functional unit of type k operating at voltage level v M<sub>k,v</sub>:maximum number of functional units of type k operating at voltage level v S<sub>i</sub>: as soon as possible time stamp for the operation o<sub>i</sub> E<sub>i</sub> :as late as possible time stamp for the operation o<sub>i</sub> P(i,v,f): power consumption due to operation $o_i$ using resource $F_{k,v}$ , which is operating at frequency f $x_{i,c,v,f}$ : decision variable which takes the value of 1 if operation $o_i$ is scheduled in control step c using the functional unit $F_{k,v}$ and c has frequency f - (i) Objective Function - (ii) Uniqueness Constraints - (iii) Precedence Constraints - (iv) Resource Constraints - (v) Frequency Constraints - (vi) Peak Power Constraints #### Objective Function: Minimize ( $$P_p + 1/N \Sigma_c \Sigma_v \Sigma_i \Sigma_f x_{i,c,v,f} * P(i,v,f)$$ ) #### **Uniqueness Constraints:** ensure that every operation $o_i$ is scheduled to one unique control step and represented as, $\forall i, 1 \leq I \leq O, \Sigma_c \Sigma_v \Sigma_f x_{i,c,v,f} = 1$ #### **Precedence Constraints:** guarantee that for an operation $o_i$ , all its predecessors are scheduled in an earlier control step and successors are scheduled in an later control step; $\forall i,j, \, o_i$ belong to $\text{Pred}(o_j), \, \sum_{v} \sum_{f} \sum_{\{d=S_i \, \rightarrow \, E_i\}} d \, x_{i,c,v,f} - \sum_{v} \sum_{f} \sum_{\{d=S_j \, \rightarrow \, E_j\}} e \, x_{j,c,v,f} \leq -1$ #### **Resource Constraints:** make sure that no control step contains more than $F_{k,v}$ operations of type k operating at voltage v and are enforced as, $\forall c, 1 \le c \le N$ and $\forall v$ , $$\sum_{\{i \in F_{k,v}\}} \sum_{f} x_{i,c,v,f} \leq M_{k,v}$$ #### Frequency Constraints: lower operating voltage functional unit can not be scheduled in a higher frequency control step; these constraints are expressed as, $\forall i$ , $1 \le i \le O$ , $\forall c$ , $1 \le c \le N$ , if f < v, then $x_{i,c,v,f} = 0$ . #### **Peak Power Constraints:** ensure that the maximum power consumption of the DFG does not exceed $P_p$ for any control step and we enforce these constraints as follows, $\forall c$ , $1 \le c \le N$ and $\forall v$ , $$\Sigma_{\{i \in F_{k,v}\}} \Sigma_f x_{i,c,v,f} P(i,v,f) \leq P_p$$ ## M #### **ILP formulations for MVMC: notations** O: total number of operations in the DFG o<sub>i</sub>: any operation i, 1 <= i <= O $F_{k,v}$ : functional unit of type k operating at voltage level v $M_{k,v}$ :maximum number of functional unit $F_{k,v}$ S<sub>i</sub>: as soon as possible time stamp for the operation o<sub>i</sub> E<sub>i</sub> :as late as possible time stamp for the operation o<sub>i</sub> $P(i,v,f_{clk})$ : power consumption due to operation $o_i$ using resource $F_{k,v}$ $y_{i,v,l,m}$ : decision variable which takes value of 1 if operation $o_i$ is using $F_{k,v}$ and scheduled in control steps $I \rightarrow m$ $L_{i,v}$ : latency for operation $o_i$ using $F_{k,v}$ - (i) Objective Function - (ii) Uniqueness Constraints - (iii) Precedence Constraints - (iv) Resource Constraints - (v) Peak Power Constraints #### Objective Function: Minimize ( $$P_p + 1/N \Sigma_l \Sigma_i \Sigma_v y_{i,v,l,(l+L_{i,v}-1)}$$ ) #### **Uniqueness Constraints:** ensure that every operation $o_i$ is scheduled to appropriate control steps within the range $(S_i, E_i)$ and represented as, $\forall i, 1 \le I \le O$ , $\sum_{\{l=S_i \to (S_i+E_i+1-L_{i,v})\}} y_{i,v,l,(l+L_{i,v}-1)} = 1$ ## M #### **ILP formulations for MVMC ....** #### **Precedence Constraints:** guarantee that for an operation $o_i$ , all its predecessors are scheduled in an earlier control step and its successors are scheduled in an later control step and are; $\forall$ i,j, $o_i$ belong to $Pred(o_i)$ , $$\begin{split} \Sigma_v & \Sigma_{\{l=S_i \rightarrow E_i\}} (l+L_{i,v}-1) y_{i,v,l,(l+Li,v-1)} \\ & - \Sigma_v \Sigma_{\{l=S_j \rightarrow E_j\}} l \ y_{j,v,l,(l+L_{j,v}-1)} \leq -1 \end{split}$$ #### **Resource Constraints:** make sure that no control step contains more than $F_{k,v}$ operations of type k operating at voltage v and are enforced as, $\Sigma_{\{i \in F_{k,v}\}} \Sigma_{l} y_{i,v,l,(l+L_{i,v}-1)} \leq M_{k,v}$ #### **Peak Power Constraints:** ensure that the maximum power consumption of the DFG does not exceed $P_{peak}$ for any control step and we enforce these constraints as follows, for all c, $1 \le c \le N$ and for all v, $\sum_{\{i \in F_k, v\}} \sum_{v} y_{i,v,l,(l+L_i,v-1)} P(i,v,f_{clk}) \le P_p$ #### Scheduling algorithm #### Input: - (i) unscheduled DFG - (ii) resource constraints - (iii) number of voltage levels - (iv) number of frequencies - (v) delay of resources #### **Output:** scheduled DFG, frequency, power #### Scheduling algorithm .... - Step 1: Find ASAP schedule of the UDFG. - Step 2: Find ALAP schedule of the UDFG. - **Step 3**: Determine the mobility graphs for each node. - **Step 4**: Modify the mobility graph for multicycling. - **Step 5**: Calculate operating frequency of a FU using delay model. - **Step 6**: Construct the ILP formulations of the DFG. - **Step 7**: Solve the ILP formulations using LP-Solve. - **Step 8**: Obtain the scheduled DFG. - **Step 9**: Determine cycle frequency, power. ## **Example Data Flow Graph** **ASAP** schedule **ALAP** schedule ## **Scheduling for MVDFC** **Mobility Graph** Scheduled DFG (for RC3) ## **Scheduling for MVMC** **Mobility Graph** Scheduled DFG (for RC3) #### **Experimental results: benchmarks** Example circuit (EXP) (8 nodes, 3\*, 3+, 9 edges) FIR filter (11 nodes, 5\*, 4+, 19 edges) IIR filter (11 nodes, 5\*, 4+, 19 edges) HAL solver (13 nodes, 6\*, 2+, 2-, 1 <, 16 edges) Auto-Regressive filter (ARF) (15 nodes, 5\*, 8+, 19 edges) ## M ## **Experimental results: resource constraints** | Multi | pliers | AL | Serial No | | |-------|--------|------|-----------|-----| | 2.4V | 3.3V | 2.4V | 3.3V | | | 2 | 1 | 1 | 1 | RC1 | | 3 | 0 | 1 | 1 | RC2 | | 2 | 0 | 0 | 2 | RC3 | | 1 | 1 | 0 | 1 | RC4 | ## **Experimental results: notations** S: single supply voltage and single frequency operation D: multiple supply voltages and dynamic frequency clocking operation M: multiple supply voltages and multicycling operation $$P_{pS}$$ , $P_{pD}$ and $P_{pM}$ : peak power in mW $$P_{aS}$$ , $P_{aD}$ and $P_{aM}$ : average power in mW $$\Delta P_{pD} = (P_{pS} - P_{pD}) / P_{pS} * 100$$ : % peak power reduction $$\Delta P_{pM} = (P_{pS} - P_{pM}) / P_{pS} * 100 : \%$$ peak power reduction $$\Delta P_{aD} = (P_{aS} - P_{aD}) / P_{aS} * 100$$ : % average power reduction $$\Delta P_{aM} = (P_{aS} - P_{aM}) / P_{aS} * 100 : % average power reduction$$ $$\Delta PDP_D = (PDP_S - PDP_{DFC}) / PDP_S * 100 : % PDP reduction$$ $$\Delta PDP_{M} = (PDP_{S} - PDP_{MC}) / PDP_{S} * 100 : \% PDP reduction$$ ## Experimental results : (% reduction) | | RCs | $\Delta P_{pD}$ | $\Delta P_{pM}$ | $\Delta P_{aD}$ | $\Delta P_{aM}$ | $\Delta PDP_D$ | $\Delta PDP_{M}$ | |--------|-----|-----------------|-----------------|-----------------|-----------------|----------------|------------------| | E | 1 | 74 | 49 | 73 | 26 | 55 | 0 | | X | 2 | 74 | 21 | 73 | 21 | 55 | 0 | | Р | 3 | 74 | 47 | 71 | 37 | 56 | 0 | | F | 1 | 74 | 49 | 74 | 18 | 53 | 0 | | | 2 | 74 | 21 | 73 | 13 | 52 | 0 | | R | 3 | 74 | 21 | 72 | 25 | 53 | 0 | | | 1 | 66 | 32 | 68 | 19 | 53 | 0 | | | 2 | 74 | 47 | 73 | 30 | 60 | 0 | | R | 3 | 74 | 47 | 72 | 41 | 59 | 5 | | H | 1 | 75 | 24 | 73 | 33 | 53 | 0 | | A<br>L | 2 | 75 | 22 | 73 | 30 | 53 | 0 | | _ | 3 | 73 | 47 | 72 | 40 | 54 | 0 | ## Ŋ. ## Reductions using different schedulers | | DF | -C | Shi<br>[1 | | | ertin<br>9] | Raghuna-<br>than[13] | | | nanty<br>1] | |-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|----------------------|-----------------|-----------------|-----------------| | Bench-<br>marks | $\Delta P_{pD}$ | $\Delta P_{aD}$ | $\Delta P_{pD}$ | $\Delta P_{aD}$ | $\Delta P_{pD}$ | $\Delta P_{aD}$ | $\Delta P_{pD}$ | $\Delta P_{aD}$ | $\Delta P_{pD}$ | $\Delta P_{aD}$ | | EXP | 73 | 72 | _ | _ | - | _ | _ | - | _ | - | | FIR | 71 | 72 | 63 | NA | 40 | NO | 23 | 38 | 71 | 53 | | IIR | 69 | 69 | _ | _ | - | _ | _ | - | _ | 1 | | HAL | 71 | 71 | 28 | NA | - | _ | _ | - | 73 | 70 | | ARF | 73 | 71 | 50 | NA | - | - | - | - | 68 | 67 | #### **Conclusions** | ☐ This paper describes peak and average power reduction schemes at behavioral level through datapath scheduling. | |------------------------------------------------------------------------------------------------------------------| | ☐ The scheduling schemes use ILP based minimization for MVDFC and MVMC mode of circuit design. | | ☐ For both the modes the scheduler could achieve significant peak and average power reduction. | | ☐ The scheduling schemes are useful for data intensive DSP applications. | | ☐ The applicability of the scheduling schemes for pipelining is to be investigated. | | ☐ The effect of switching activity is to be taken into account. | | ☐The effect on clocking network is to be studied. | # Thank you