## Chapter II

## **Physics of Computation**

These lecture notes are exclusively for the use of students in Prof. MacLennan's *Unconventional Computation* course. ©2017, B. J. MacLennan, EECS, University of Tennessee, Knoxville. Version of August 22, 2018.

## A Energy dissipation

As an introduction to the physics of computation, and further motivation for unconventional computation, we will discuss Michael P. Frank's analysis of energy dissipation in conventional computing technologies (Frank, 2005b). The performance R of a computer system can be measured by the number of computational operations executed per unit time. This ratio is the product of the number of operations per unit of dissipated energy times the energy dissipation per unit time:

$$R = \frac{N_{\rm ops}}{t} = \frac{N_{\rm ops}}{E_{\rm diss}} \times \frac{E_{\rm diss}}{t} = F_{\rm E} \times P_{\rm diss}. \tag{II.1}$$

Here we have defined  $P_{\text{diss}}$  to be the power dissipated by the computation and the energy efficiency  $F_{\text{E}}$  to be the number of low-level bit operations performed per unit of energy. The key parameter is  $F_{\text{E}}$ , which is the reciprocal of the energy dissipated per bit operation.

This energy can be estimated as follows. Contemporary digital electronics uses CMOS technology, which represents a bit as the charge on a capacitor. The energy to set or reset the bit is (approximately) the energy to charge the capacitor or the energy dissipated when it discharges. Voltage is energy per unit charge, so the work to move an infinitesimal charge $dq$ from one plate to the other is $V\,dq$, where $V$ is the voltage between the plates. But $V$ is proportional to the charge already on the capacitor, $V=q/C$. So the change in energy is  $dE=V\,dq=\frac{q}{C}\,dq$ . Hence the energy to reach a charge $Q$ is

$$E = \int_0^Q \frac{q}{C} \mathrm{d}q = \frac{1}{2} \frac{Q^2}{C}.$$

Therefore,  $E = \frac{1}{2}(CV)^2/C = \frac{1}{2}CV^2$  and  $F_E \approx (1 \text{ op})/(\frac{1}{2}CV^2)$ .

Frank observes that Moore's law in the 1985–2005 period was a result of an exponential decrease in C resulting from decreasing feature sizes (since capacitance is proportional to area) and a decrease in logic voltage V from 5V to about 1V (further improving E by a factor of 25). The clock rate also went up with smaller feature sizes. (See Fig. II.1.)
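The switching-energy formula and the effect of voltage scaling can be checked numerically. The following is a minimal sketch; the capacitance value `C_EXAMPLE` is a hypothetical figure chosen only for illustration and does not come from Frank's analysis.

```python
def switching_energy(capacitance_f: float, voltage_v: float) -> float:
    """Energy in joules to charge a capacitor C to voltage V: E = (1/2) C V^2."""
    return 0.5 * capacitance_f * voltage_v ** 2


def energy_efficiency(capacitance_f: float, voltage_v: float) -> float:
    """Bit operations per joule, F_E ~ (1 op) / ((1/2) C V^2)."""
    return 1.0 / switching_energy(capacitance_f, voltage_v)


# Hypothetical node capacitance (assumption, for illustration only).
C_EXAMPLE = 1e-15  # 1 fF

e_5v = switching_energy(C_EXAMPLE, 5.0)
e_1v = switching_energy(C_EXAMPLE, 1.0)

# Improvement in energy per operation from voltage scaling alone.
voltage_gain = e_5v / e_1v

print(f"E at 5 V: {e_5v:.3e} J")
print(f"E at 1 V: {e_1v:.3e} J")
print(f"Improvement from 5 V to 1 V: {voltage_gain:.0f}x")
print(f"F_E at 1 V: {energy_efficiency(C_EXAMPLE, 1.0):.3e} op/J")
```

Because $E$ depends on $V^2$, the factor of 25 is independent of the capacitance chosen; any reduction in $C$ multiplies the improvement further.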

Unfortunately, neither the transistor lengths nor the voltage can be reduced much more, for if the signal is too small in comparison with thermal energy, then thermal noise will lead to unreliable operation, because the thermal fluctuations will be of the same order as the signals (Fig. II.2). The thermal energy is  $E_T = k_{\rm B}T$ , where  $k_{\rm B}$  is Boltzmann's constant and T is the absolute temperature. Since  $k_{\rm B} \approx 8.6 \times 10^{-5} \ {\rm eV/K} = 1.38 \times 10^{-23} {\rm J/K}$ , and room temperature  $T \approx 300 {\rm K}$ , room-temperature thermal energy is

$$E_T = k_{\rm B}T \approx 26 \text{ meV} \approx 4.14 \times 10^{-21} \text{J} \approx 4 \text{ zJ}.$$

(Fig. II.1 shows  $E_T$ .)
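The room-temperature value of $E_T$ can be reproduced with a few lines of arithmetic. This is a small sketch; the constant names are illustrative, and the values used are the exact SI definitions of Boltzmann's constant and the elementary charge rather than the rounded figures quoted in the text.

```python
K_B_J_PER_K = 1.380649e-23    # Boltzmann's constant in J/K (exact SI value)
EV_IN_JOULES = 1.602176634e-19  # 1 eV expressed in joules (exact SI value)


def thermal_energy_joules(temperature_k: float) -> float:
    """Thermal energy E_T = k_B * T in joules."""
    return K_B_J_PER_K * temperature_k


T_ROOM = 300.0  # room temperature in kelvin, as assumed in the text

e_t_j = thermal_energy_joules(T_ROOM)
e_t_mev = e_t_j / EV_IN_JOULES * 1e3  # millielectronvolts
e_t_zj = e_t_j * 1e21                 # zeptojoules

print(f"E_T = {e_t_j:.3e} J = {e_t_mev:.1f} meV = {e_t_zj:.2f} zJ")
```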

We have seen that  $E_{\rm sig} = \frac{1}{2}CV^2$ , but for reliable operation, how big should it be in comparison to  $E_T$ ? Frank estimates  $E_{\rm sig} \geq k_{\rm B}T \ln R$ , where the reliability  $R = 1/p_{\rm err}$  (not to be confused with the performance $R$ of Eq. II.1), for a desired probability of error  $p_{\rm err}$ .<sup>1</sup> For example, for a reasonable reliability  $R = 2 \times 10^{17}$ ,  $E_{\rm sig} \geq 40k_{\rm B}T \approx 1$  eV, which is the energy to move one electron with 1V logic levels. This implies a maximum energy efficiency of

$$F_{\rm E} = 1 \text{ op/eV} \approx \frac{1 \text{ op}}{1.6 \times 10^{-19} \text{ J}} = 6.25 \times 10^{18} \text{ op/J}. \tag{II.2}$$

A round  $100k_{\rm B}T$  corresponds to an error probability of  $p_{\rm err}=e^{-100}=3.72\times 10^{-44}$  (at room temperature). Therefore, a reasonable target for reliable operation is

$$E_{\rm sig} \gtrsim 100 k_{\rm B} T \approx 2.6 \text{ eV} = 414 \text{ zJ}.$$
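The reliability estimates above follow directly from $E_{\rm sig} \geq k_{\rm B}T \ln R$ with $R = 1/p_{\rm err}$. The sketch below evaluates both directions of that relation; the function names are illustrative, and the temperature of 300 K is the room-temperature value assumed in the text.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann's constant in eV/K
T_ROOM = 300.0                 # kelvin
KT_EV = K_B_EV_PER_K * T_ROOM  # thermal energy in eV, about 0.026 eV


def min_signal_in_kt(reliability: float) -> float:
    """Minimum signal energy in units of k_B*T for reliability R: ln R."""
    return math.log(reliability)


def error_probability(signal_in_kt: float) -> float:
    """Error probability for a signal of n*k_B*T: p_err = exp(-n)."""
    return math.exp(-signal_in_kt)


# R = 2e17 requires roughly 40 k_B T, which is about 1 eV at 300 K.
n_kt = min_signal_in_kt(2e17)
e_sig_ev = n_kt * KT_EV

# A round 100 k_B T gives an extremely small error probability.
p_100 = error_probability(100)
e_100_ev = 100 * KT_EV

print(f"ln(2e17) = {n_kt:.1f} k_B T, i.e. {e_sig_ev:.2f} eV")
print(f"p_err at 100 k_B T = {p_100:.2e}, signal = {e_100_ev:.2f} eV")
```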

<sup>1</sup>Frank (2005b, slide 7).



Figure II.1: Historical and extrapolated switching energy. Figure from Frank (2005b, slide 9).



Figure II.2: Depiction of 0-1-0-1-1 pulses in the presence of high thermal noise.

This, therefore, is an estimate of the minimum energy dissipation per operation for reliable operation using conventional technology. Nevertheless, these conclusions are independent of technology (electronic, optical, carbon nanotube, etc.), since they depend only on relative energy levels for reliable operation.<sup>2</sup>

One apparent solution is to operate at a lower temperature T, but it does not help much, since the effective T has to reflect the environment into which the energy is eventually dissipated (i.e., the energy dissipation has to include the refrigeration to operate below ambient temperature). Another possible solution, operating closer to  $k_{\rm B}T$  and compensating for low reliability with error-correcting codes, does not help, because we need to consider the total energy for encoding a bit. That is, we have to include the additional bits required for error detection and correction.

Frank observed in 2005 that the smallest logic signals were about  $10^4k_{\rm B}T$ , and therefore that only about two orders of magnitude of improvement remained before reaching the limit for reliable operation. "A factor of 100 means only around 10 years remain of further performance improvements, given the historical performance doubling period of about 1.5 years. Thus, by about 2015, the performance of conventional computing will stop improving, at least at the device level" (Frank, 2005b).

<sup>2</sup>Frank presentation, "Reversible Computing: A Cross-Disciplinary Introduction" (Beyond Moore), Mar. 10, 2014.
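The "around 10 years" figure in the quotation can be recovered from the doubling period: a factor of 100 is $\log_2 100 \approx 6.6$ doublings. The following is a rough sketch of that arithmetic, with an illustrative function name; it assumes the improvement continues at exactly the historical rate.

```python
import math


def years_of_improvement(factor: float, doubling_period_years: float) -> float:
    """Years needed to improve by `factor` at one doubling per period."""
    return math.log2(factor) * doubling_period_years


current_signal_kt = 1e4   # smallest logic signals in 2005, in units of k_B*T
reliable_limit_kt = 100   # target for reliable operation, in units of k_B*T

headroom = current_signal_kt / reliable_limit_kt  # remaining factor of 100
years = years_of_improvement(headroom, 1.5)

print(f"Headroom: {headroom:.0f}x, about {years:.1f} years from 2005")
```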

In fact, these limitations are becoming apparent. By 2011 computer engineers were worrying about "the 3 GHz wall," since computer clock speeds had been stalled at about that rate for five years.<sup>3</sup> Recent processors have gone a little beyond the barrier, but a "power wall" remains, for although individual processors can be operated at higher speeds, the millions or billions of transistors on a chip dissipate excessive amounts of energy. This presents an obstacle for future supercomputers.

As of August 2017 the fastest computer was the Sunway TaihuLight.<sup>4</sup> It is rated at 93 petaflops on the LINPACK benchmark and has 10,649,600 cores with 1.31 PB of memory. It consumes 16 MW, which is about the power consumed by 1400 homes, and is quite efficient, with  $F_{\rm E}=6$  Gflops/W, that is, 166 pJ/flop (the fourth most efficient supercomputer at the time). To convert floating-point operations to basic logic operations, including all the overhead, one conversion estimate is  $10^7$  to  $10^8$  ops/flop.<sup>5</sup> Therefore, we can compare the theoretical best energy efficiency (Eq. II.2),  $F_{\rm E}^{-1}=1.6\times10^{-7}$  pJ/op, or about 1.6 to 16 pJ/flop, with the 166 pJ/flop of the Sunway TaihuLight. The gap is only about one to two orders of magnitude. Indeed, it has been estimated that scaling up current technology to 1 exaflops would consume 1.5 GW, more than 0.1% of the US power grid.<sup>6</sup> This is impractical.
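The comparison between the TaihuLight's measured efficiency and the limit of Eq. II.2 can be laid out as a short calculation. This is a sketch using the rounded figures quoted in the text (6 Gflops/W, 16 MW, $1.6\times10^{-19}$ J per operation, and $10^7$ to $10^8$ ops/flop); the variable names are illustrative.

```python
EV_IN_JOULES = 1.6e-19          # 1 eV in joules, rounded as in Eq. II.2
PJ_PER_JOULE = 1e12

# Measured efficiency of the Sunway TaihuLight (rounded figure from the text).
flops_per_watt = 6e9            # 6 Gflops/W, equivalently 6e9 flop/J
measured_pj_per_flop = PJ_PER_JOULE / flops_per_watt

# Theoretical limit: 1 eV per low-level operation, scaled by ops per flop.
limit_pj_per_op = EV_IN_JOULES * PJ_PER_JOULE
limit_low_pj_per_flop = limit_pj_per_op * 1e7
limit_high_pj_per_flop = limit_pj_per_op * 1e8

# Remaining gap between measured and theoretical energy per flop.
gap_min = measured_pj_per_flop / limit_high_pj_per_flop
gap_max = measured_pj_per_flop / limit_low_pj_per_flop

# Cross-check with Eq. II.1: R = F_E * P_diss. The result is close to, but
# not exactly, the rated 93 petaflops because 6 Gflops/W is a rounded value.
power_watts = 16e6
performance_pflops = flops_per_watt * power_watts / 1e15

print(f"Measured: {measured_pj_per_flop:.1f} pJ/flop")
print(f"Limit: {limit_low_pj_per_flop:.1f} to {limit_high_pj_per_flop:.1f} pJ/flop")
print(f"Gap: {gap_min:.0f}x to {gap_max:.0f}x")
print(f"F_E * P_diss = {performance_pflops:.0f} petaflops")
```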

It might be possible to get energy consumption down to 5 to 10 pJ/flop, but "the energy to perform an arithmetic operation is trivial in comparison with the energy needed to shuffle the data around, from one chip to another, from one board to another, and even from rack to rack."<sup>7</sup> Indeed, due to the difficulty of programming parallel computers, and due to delays in internal data transmission, it is difficult to use more than 5% to 10% of a supercomputer's capacity for any extended period; most of the processors are idling.<sup>8</sup> So with those  $10.6 \times 10^6$  cores, most of the time about *ten million* of them are idle! There has to be a better way.

<sup>3</sup>*Spectrum* (Feb. 2011) spectrum.ieee.org/computing/hardware/nextgeneration-supercomputers/0 (accessed Aug. 20, 2012).

<sup>4</sup>https://en.wikipedia.org/wiki/Sunway\_TaihuLight (accessed Aug. 11, 2017). I will use "flop" for "floating point operations" and "flops" for "floating point operations per second" = flop/s. Note that "flop" is a count and "flops" is a rate. Also note that since W = J/s, flops/W = flop/J.

<sup>5</sup>And so this is one estimate of the difference in time scale between computational abstractions and the logic that implements them, which was discussed in Ch. I (p. 5).

<sup>6</sup>*Spectrum* (Feb. 2011) spectrum.ieee.org/computing/hardware/nextgeneration-supercomputers/0 (accessed Aug. 20, 2012).

<sup>7</sup>*Spectrum* (Feb. 2011) spectrum.ieee.org/computing/hardware/nextgeneration-supercomputers/0 (accessed Aug. 20, 2012).

<sup>8</sup>*Spectrum* (Feb. 2011) spectrum.ieee.org/computing/hardware/nextgeneration-supercomputers/0 (accessed Aug. 20, 2012).