CPEN 311
# Power

Updated 2018-03-19

**Why is power important?**

- It is hard to get a large current into a chip because chips are small and have only a fixed number of pins. It is also difficult to distribute power across the chip, which leads to uneven heating.
- In cheaper devices such as phones, we want power to be around 2 W or less.
- Power is vital to mobile and portable systems because stored energy is heavy: more energy required means more battery mass.

Every transistor on the chip leaks power, regardless of what it is doing. When the voltage is small (which it usually is), some current still leaks to ground.

SRAM cells are very leaky.

Every time a node is switched (change in voltage), some power is dissipated due to charging the capacitance.

\[P_\text{dynamic}=\alpha fCV^2\]

where \(\alpha\) is a factor called the **switch activity**, \(f\) is the frequency, \(C\) is the equivalent capacitance, and \(V\) is the voltage. Note that we can’t really overclock (increase \(f\)) without also increasing \(V\), so power rises faster than linearly with frequency.
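As a rough illustration of the equation above, the sketch below evaluates the dynamic power at two supply voltages. The function name and every number in it (activity 0.1, 100 MHz, 1 nF, 1.2 V and 1.0 V) are assumptions chosen for illustration, not values from the course.

```python
def dynamic_power(alpha, f_hz, c_farads, v_volts):
    """Dynamic power in watts: P = alpha * f * C * V^2."""
    return alpha * f_hz * c_farads * v_volts ** 2

# Hypothetical operating point, chosen only for illustration.
base = dynamic_power(alpha=0.1, f_hz=100e6, c_farads=1e-9, v_volts=1.2)

# Same circuit and frequency, with the supply lowered from 1.2 V to 1.0 V.
scaled = dynamic_power(alpha=0.1, f_hz=100e6, c_farads=1e-9, v_volts=1.0)

print(f"base:   {base * 1e3:.2f} mW")
print(f"scaled: {scaled * 1e3:.2f} mW")
print(f"ratio:  {scaled / base:.3f}")  # equals (1.0 / 1.2) ** 2
```

Because the voltage enters squared, a modest drop in \(V\) saves proportionally more power than the same relative drop in \(f\) or \(C\).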

An FPGA is basically a giant SRAM, so FPGAs are very power hungry. The programmable switches imply more transistors and more capacitance, which lead to both more leakage and more dynamic power dissipation.

FPGA tools have techniques to estimate power.

Quartus has a tool called *PowerPlay* which estimates the power used by a circuit.

**How does it know that?**

Recall the dynamic power equation: we need to find all the parameters that go into it. For \(\alpha\), there are two ways: simulate the circuit (takes longer) or base it on a statistical estimate (less accurate).

`page 10`

`page 11`

Consider an AND gate with output `x` and an XOR gate with output `y`, and consider the activity at `x` and `y`. Referring to the truth tables for AND and XOR, we see that the output of XOR is more likely to change. Mathematically, consider all possible transitions of the output under random inputs; the activity is the fraction of those transitions in which the output changes.
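The counting argument can be sketched by brute force. The snippet below is an illustrative enumeration, not the tool's actual method: it assumes two independent, uniformly random 2-bit input vectors (one before and one after the transition), and the helper name `transition_probability` is made up for this example.

```python
from itertools import product

def transition_probability(gate):
    """Probability that the output differs between two independent,
    uniformly random 2-bit input vectors (before and after)."""
    inputs = list(product([0, 1], repeat=2))
    pairs = list(product(inputs, repeat=2))  # all (before, after) combinations
    changes = sum(gate(*before) != gate(*after) for before, after in pairs)
    return changes / len(pairs)

p_and = transition_probability(lambda a, b: a & b)
p_xor = transition_probability(lambda a, b: a ^ b)
print(p_and, p_xor)  # 0.375 0.5
```

Equivalently, if \(p\) is the probability that the output is 1, the output changes with probability \(2p(1-p)\): \(p = 1/4\) for AND gives \(3/8\), and \(p = 1/2\) for XOR gives \(1/2\).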

Note: this assumes that the input signals are randomly distributed. If we know that the inputs have some other distribution, that can be specified in the tool, which ultimately gives a better estimate.

Another way of finding the activity is to simulate the circuit, which shows how many transitions occur at every node.
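A minimal sketch of the simulation approach, assuming one sampled value per clock cycle: apply a stimulus, record each node's value, and count how often it toggles. The stimulus vectors and the helper `toggle_rate` are hypothetical and stand in for a real simulator's output.

```python
def toggle_rate(trace):
    """Fraction of cycle boundaries at which the node changes value."""
    if len(trace) < 2:
        return 0.0
    toggles = sum(prev != cur for prev, cur in zip(trace, trace[1:]))
    return toggles / (len(trace) - 1)

# Hypothetical input stimulus, one value per clock cycle.
a = [0, 1, 1, 0, 1, 0, 0, 1]
b = [1, 1, 0, 0, 1, 1, 0, 0]

x = [ai & bi for ai, bi in zip(a, b)]  # AND output per cycle
y = [ai ^ bi for ai, bi in zip(a, b)]  # XOR output per cycle

print(f"x toggles on {toggle_rate(x):.2f} of cycle boundaries")
print(f"y toggles on {toggle_rate(y):.2f} of cycle boundaries")
```

The result is only as representative as the stimulus: a different input sequence gives a different activity for the same circuit.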

One way to reduce power is to drive different areas of the chip from different supply voltages. Remember that lower voltage means lower power. Using this method, we can also make the more power-hungry parts of the chip inactive when they are not in use, and simply wake them up when they are actually needed.

Another way is to increase the threshold voltage. Note that this reduces leakage power but sacrifices performance.

Consider the clock. The clock switches all the time, and it goes *everywhere* in the circuit. A method called **clock gating** turns off the clock network to parts of the system that are not in use. This is simple because there is only one predictable signal to gate, so it requires only one gate. Note that this reduces only the dynamic power.

Some other things we can do to reduce power:

- Use lower power components (different threshold voltage)
- Minimize area for less leakage
- Lower supply voltage for the signals
- Optimize the algorithm
  - Instead of multiplying, a shifter and adder can be substituted to use less power
  - Data transmitted on a bus can be encoded first to minimize the transitions
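The two algorithm-level ideas can be sketched as follows. Gray code is used here only as one well-known example of a transition-reducing encoding; the notes do not say which encoding is meant, and the function names are made up for this illustration.

```python
def times_ten(x):
    """x * 10 using only shifts and one add: 8x + 2x."""
    return (x << 3) + (x << 1)

def bit_transitions(words):
    """Total number of bit flips on the bus across consecutive words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))

def to_gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)

sequence = list(range(8))  # a 3-bit counter value sent each cycle: 0, 1, ..., 7
binary_flips = bit_transitions(sequence)
gray_flips = bit_transitions([to_gray(n) for n in sequence])
print(binary_flips, gray_flips)  # 11 7
```

For a counting sequence, Gray code flips exactly one bus wire per step, while plain binary flips several wires at carries such as 3 to 4. The benefit depends on the data: for values that are not sequential, a different encoding may be needed.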

Recall that there might be glitches in a design. Glitches cause extra transitions, which means more energy dissipation.

`XOR example`

Adding a register in between eliminates the glitches because the FFs transition only once per clock cycle, so glitches from the upstream combinational logic do not propagate further.

However, the timing behavior changes, because certain tasks now take multiple cycles due to pipelining.

Drawbacks also include extra power consumption from the clock tree and the FFs themselves.

Only a small portion of the design is speed critical, so logic blocks on other paths can be switched into a lower-power configuration in which they might be slower. This is OK because those paths have enough slack.

The CAD tool performs synthesis and timing analysis. For all the blocks that do not violate the timing constraints, it switches them to the lower-power mode until no more slack remains.
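The idea can be sketched as a simple greedy pass. This is a toy model, not the algorithm any real CAD tool uses: the block names, delays, paths, and the function `assign_low_power` are all hypothetical, and each block is assumed to have just a fast mode and a slow, low-power mode.

```python
def assign_low_power(blocks, paths, period):
    """Greedy sketch: switch blocks to the slow, low-power mode one at a
    time, keeping a switch only if every path still meets the clock period.

    blocks: {name: (fast_delay, slow_delay)}
    paths:  list of lists of block names
    Returns the set of blocks left in low-power mode.
    """
    low_power = set()

    def path_delay(path):
        return sum(blocks[b][1] if b in low_power else blocks[b][0] for b in path)

    for name in blocks:
        low_power.add(name)
        if any(path_delay(p) > period for p in paths):
            low_power.discard(name)  # would violate timing: keep it fast
    return low_power

# Hypothetical design: delays in ns as (fast, slow), two timing paths.
blocks = {"A": (2, 3), "B": (4, 6), "C": (1, 2), "D": (3, 5)}
paths = [["A", "B", "D"], ["A", "C"]]
result = assign_low_power(blocks, paths, period=10)
print(sorted(result))  # ['A', 'C']
```

Here the path through `A`, `B`, `D` starts with only 1 ns of slack, so only `A` can be slowed on it, while `C` sits on a path with plenty of slack. A greedy pass like this depends on the order in which blocks are tried, so it is only a sketch of the principle.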

Benefit: it is automatic. Downside: it does not help optimization that much.