Security-aware Register Placement to Hinder Malicious Hardware Updating and Improve Trojan Detectability

Mehrshad Vosoughi 1,*, and Ali Jahanian 2
1 Electrical and Computer Engineering Department, Qazvin Branch of Islamic Azad University, Qazvin, Iran
2 Electrical and Computer Engineering Department, Shahid Beheshti University, Tehran, Iran

A R T I C L E  I N F O.
Article history:
Received: 12 July 2014
First Revised: 9 December 2014
Last Revised: 20 May 2015
Accepted: 14 July 2015
Published Online: 17 July 2015

Keywords:
Clock Tree Construction, Hardware Security, Register Placement.

A B S T R A C T
Nowadays, bulk of the designers prefer to outsource some parts of their design and fabrication process to the third-part companies due to the reliability problems, manufacturing cost and time-to-market limitations. In this situation, there are a lot of opportunities for malicious alterations by the off-shore companies. In this paper, we proposed a new placement algorithm that hinders the hardware Trojan insertion or simplifies the detection process in existence of Trojans. Experimental results show that the proposed placement improves the Trojan detectability of the attempted benchmarks against Trojan insertion more than 20% in reasonable cost in delay and wire length.

© 2015 ISC. All rights reserved.

1 Introduction
Currently, many of design houses are fab-less companies that outsource their fabrication process to offshore facilities known as foundries. [1]. In this situation, ICs are becoming increasingly vulnerable to malicious activities and alterations. The main concern of these vulnerabilities is in military systems, financial infrastructures and transportation security.

An adversary can alter the chip to disable or destroy a system at an arbitrary time or leak confidential information and secret keys. This alteration, which is called Hardware Trojan Horse (HTH), can be implemented in ASIC, commercial-off-the-shelf (COST) parts and any kind of microprocessors and microcontrollers. It can also be implemented as firmware modifications for examples in FPGA bit streams. This issue has been reported in some recent documents such as in IEEE Spectrum [1].

The design phase in IC fabrication process which is commonly designed by commercial CAD tool developers is safe and trusted. However, the IP blocks, models, and standard cells used by the designer during the design process and by the foundry during the post-design processes are untrusted. The fabrication phase is also untrusted. That is because the adversary can substitute or insert Trojan cells in the design during this phase to reach his goals.

Two different strategies can be made against the hardware Trojans: HTH detection and HTH prevention. Considerable contributions in the last few years are reported on HTH detection. Hardware Trojan detection methods can be categorized as either destructive, or non-destructive. Destructive methods are mostly reverse engineering methods that are very expensive and also we can not do this for every chip. Non-destructive methods can be categorized by two methods: automatic test pattern generation (ATPG),
Security-aware Register Placement to Hinder Malicious Hardware Updating — M. Vosoughi, and A. Jahanian

Trojans are designed as a state machine to be activated in a special state in order to hide it.

The rest of this paper is organized as follows. Section 2 describes the basic concepts and classification of hardware Trojans. Section 3 illustrates the proposed security-aware register/cell placement algorithm. Experimental results are discussed in Section 4 and finally, Section 5 concludes the paper.

2 Hardware Trojans

Hardware Trojan Horse (HTH) is any deliberate modification to original circuitry inserted by adversaries to exploit a system or to use hardware mechanisms to gain access to data or software running on the chips [13]. This modification can be insertion, substitution and deletion some circuitry or just change in size of wires and logic. To facilitate the development of Trojan mitigation, detection and protection techniques, it is necessary to systematically categorize the hardware Trojans. By this categorization and taxonomy, the study of characteristics of hardware Trojans is facilitated and detection or protection methods for each class can be developed.

Authors of [14] introduced a detailed taxonomy for hardware Trojans. They decomposed the Trojans based on three main parameters: physical characteristics, activation type and action of the Trojan. This classification is shown in Figure 1.

In Figure 1, Trojans are classified based on three main characteristics as follows [13]:

- Physical characteristics contains the physical features such as Trojan size, structure and geometrical distributions.
- Activation characteristics refer to the criteria that cause a Trojan to become active and carry out its disruptive function.
- Action characteristics identifies the Trojan based on its action when the Trojan is activated.

3 Proposed Algorithm

As mentioned before, we proposed a security-aware cell/register placement in this paper in order to prevent or harden the synchronous hardware Trojan insertion. In other words, using the proposed placement, Trojan insertion requires more layout processing efforts and also increases the probability of Trojan detection in a manipulated hardware. This section describes the proposed placement algorithm that is a modified Simulated Annealing (SA) algorithm.

Simulated annealing is a powerful technique that has been widely used to solve combinatorial optimization problems using a probabilistic hill-climbing algorithm. This technique has significant ability to escape from local optimums in the search space [16]. In real
annealing process, a material is lightly cooled to reach the lowest energy state. Slow cooling decreases random cell moves and makes it more stable. Slowing the cooling of the material allows each molecule to move into a place it feels most comfortable. As the material is kept at a high temperature, the molecules are able to move freely around, then when the material is cooled, the molecules are not able to move around as freely but still move limited distances [17].

Simulated annealing placement starts with a random initial placement. This placement is changed by exchanging location of two elements (Perturb). Then cost of this perturb is calculated. If the cost is decreasing then the move is accepted, but if the cost is not decreasing then the move is accepted with a probability. The probability of accepting an improper move decreases with falling the temperature. This technique helps the simulated annealing algorithm to climb out of local optimums in search for a global minimum [15]. We modified this algorithm to reach our mean.

Clock skew is an important concern in clock routing of high performance circuits. Clock skew is known as the maximum differences of clock arrival time to flip-flops. If the clock does not arrive to flip-flops at proper time, the chip would not work synchronously. We use the clock skew of the design as a signature to protect the design against hardware Trojan insertion. We presented a placement algorithm that places the flip-flops in the design in such a way that the insertion of extra flip-flops (as a part of HTH) requires many complicated layout processing and also changes the clock skew of the design, considerably. In this situation, delay characteristics of the design will be changed due to updating the design clock skew and Trojan detection will be simplified.

A widely used method for clock distribution is H-Tree construction. In this approach, clock tree is constructed such that delay from the clock source to each of the clock leaves is the same. Figure 2 shows a simplified binary H-shaped clock tree with eight leaves. We defined some regions of the chip as flip-flop feasible regions (FF FRs) in this paper. FF FRs are the area that inserting the flip-flops inside them, make feasible clock skew. In other words, locating the flip-flops inside the FF FRs, guarantees that the clock skew will be lower than a specified value which is normally determined based on the timing closure specified by the designer. In Figure 2, clock tree is drawn by thin lines and FF FRs are shown as dashed boxes. We planned the FF FRs before placement and refine the canonical placement flow such that all the flip-flops are placed inside the FF FRs. Algorithm 1, represents the proposed algorithm in more details.

Figure 1. Detailed taxonomy showing physical, activation, and action characteristics of Trojans

Figure 2. Canonical H-shaped clock tree
Algorithm 1 Proposed Placement Algorithm

Step 1: In this step, the algorithm divides the die area to $n$ equal regions by using the zero-skew H-tree clock routing algorithm. Each region represents a FF_FR. It is worth noting that total area inside the FF_FRs should be equal or larger than the total area of the design.

Step 2: In this step, an initial random placement is done. All the flip-flops are placed inside the FF_FRs and other cells are located outside the FF_FRs.

Step 3: In this step, a modified SA (Simulate Annealing) placement algorithm is proposed to determine locations of the cells and flip-flops. The main metric for SA placement algorithm is total wire length (TWL). However, we added a new cost (FFD) in our algorithm (total FF distance) representing the distance of the flip-flops from the center of their corresponding FF_FR. FFD is calculated as:

$$FFD = \sum_{j=1}^{n} \sum_{j=1}^{F} d_{ij}$$  (1)

where $n$ is the number of FF_FRs, $F$ is the number of flip-flops in FF_FRj and $d_{ij}$ is the the distance between FF$i$ and its corresponding FF_FRs (FF_FRj) center. Using the multi-objective cost function (consisting of total wire length and total FF distance), total wire length of the design is minimized and also flip-flops are forced to place as close as to the center of their FF_FRs.

Step 4: After the SA-based placement, all the cells are placed as well as the flip-flops and the flip-flops are forced to be located closer to their FF_FRs as much as possible. This force causes to generate some congested FF_FRs in the design. In other words, many of the FF_FRs may contain more flip-flops than some other FF_FRs. In this step we balance the congestion of all FF_FRs. At first, the maximum of allowed flip-flops for each FF_FR is calculated as:

$$q = \lceil \frac{\# FF}{n} \rceil$$  (2)

where $q$ is the maximum capacity of flip-flops for each FF_FR, $\# FF$ is the number of flip-flops in design and $n$ is the number of FF_FRs. It is worth noting that by the use of H-Tree construction, $n$ can be calculated as:

$$n = 2^i \text{ for } (i \leq 2)$$  (3)

If the number of flip-flops in a FF_FR is more than $q$, the algorithm selects a flip-flop from this FF_FR and places it to a FF_FR in which the number of its containing flip-flops are less than $q$. This replacement method should be done with aware of least increase in TWL.

After the proposed placement and congestion refinement, all the cells are placed and flip-flops are accumulated inside the FF_FRs. It should be noted that there is no white space between the flip-flops in the FF_FRs. After placement of the cells and flip-flops according to the proposed algorithm, flip-flops are concentrated to their FF_FRs and clock routing will result a good clock wire length under a skew constraint.

In this situation, if an adversary wants to insert HTH into the circuit, he/she should find some white spaces to place the HTH. As there is no white space in the FF_FRs, HTH should be placed outside of these regions. Therefore, the clock will arrive to HTH flip-flops with a different skew. If adversary wants to reach to the proper skew (for example zero skew) with any bounded skew clock routing algorithm (like BST/DME), he will face a high clock wire length. These differences in clock behavior, cause to facilitate the HTH detection.

We will show in our experiments that insertion of extra flip-flops will result a significant increase in clock wire length with a bounded skew (for example zero skew).

When the clock wire length increases significantly, the clock routing would be hardened and may be impossible. Even if the adversary wants to decrease the clock wire length, the clock skew for his flip-flops would be high and the HTH will not act properly. So, with this method of placement, we hardened the possibility of HTH insertion in design.

4 Experimental Results

We implemented the proposed placement algorithm in the EduCAD tool [18], a Linux-based educational physical design platform with ten benchmarks from IWLS benchmark suite [19] to evaluate the benefits and overheads of the proposed algorithm. Table 1 shows the
We considered two different design flows. At the first flow, benchmarks are placed by EduCAD with canonical SA-based placer and the second flow utilizes the modified SA. Then the list of flip flops locations are extracted of the placed circuit in each flow and clock tree of each benchmark circuit is constructed. The goal of most of the current clock routing algorithms is on achieving zero-skew at the expense of longer wire length, resulting in high power dissipation. However in practice, circuits still operate correctly within a tolerable skew bound. Therefore, in order to reduce clock net power dissipation, authors in [20] believe that the clock routing algorithm should consider bounded-skew trees (BST) instead of zero-skew trees (ZST) and proposed a clock routing algorithm based on this claim which they named BST/DME. BST/DME minimizes total clock wire length under any given path-length skew bound.

We used BST/DME in order to construct the clock tree. After clock routing, clock skew and clock wire length is extracted. Then some flip flops are inserted to the design as Hardware Trojan in random locations in both design flows. The number of inserting HTH flip flop for each design is 5% of the number of original FF in design. These two flows are shown in Figure 3.

Table 2 represents the obtained clock tree wire length by using BST/DME tool with zero skew. Column #FFs shows the number flip flops and columns Without HTH and With HTH represent the clock wire length in two canonical and proposed design flows. As shown in Table 2, overhead of clock tree wire length in the proposed placement is about 20% more than the canonical placement. It means that after using the proposed placement, Trojan insertion requires considerable changes in clock tree wire length that highlights the Trojan-contained circuit. In this circuit, Trojan will be more detectable.

As we expect, increase of CLKWL by the existence of HTH FF in our modified SA is a significant percentage, but in canonical SA is negligible. So, if an adversary wants to insert HTH in a design placed with our proposed algorithm, increase in CLKWL is a big obstacle for him. Therefore, inserting HTH in a die placed by our proposed algorithm would be hardened.

Table 3 represents the wire length overhead of our modified SA placement algorithm. In this table, two first columns show the TWL in canonical SA and column three shows the TWL in our modified SA and the last column represents the TWL overhead of our modified SA placement algorithm in compare to canonical SA. We see that our modified SA cost is not a significant overhead.

Table 4 shows the delay overhead of our modified SA placement algorithm. In this table, we present the delay increase of our method by comparing with canonical SA. As can be seen in Table 4, the delay overhead of the proposed algorithm is not considerable.

It is noting that our placement is fixed-die. Therefore, chip area does not change during the placement.

5 Conclusion

In this paper, we proposed a new placement algorithm that hinders the hardware Trojan insertion or simplifies the detection process in existence of Trojans. In the presented algorithm, cells/registers of design are placed such that HTH insertion will be hard and also detection of the Trojans will be simplified when design is modified by attackers. Experimental results show that the proposed placement improves the immunity of attempted benchmarks against Trojan insertion. These improvements are gained in cost of 6.4% delay (on average) and 9.8% wire length (on average)
Table 2. Comparison of the proposed and canonical placement algorithms in terms of clock wire length

<table>
<thead>
<tr>
<th>BM</th>
<th>Clock wirelength Without HTH</th>
<th>Clock wirelength With HTH</th>
<th>Clock wirelength Without HTH</th>
<th>Clock wirelength With HTH</th>
<th>Overhead (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>s5378</td>
<td>1888</td>
<td>4238</td>
<td>2605</td>
<td>2852</td>
<td>1.19</td>
</tr>
<tr>
<td>spi</td>
<td>7236</td>
<td>7301</td>
<td>3883</td>
<td>4576</td>
<td>0.90</td>
</tr>
<tr>
<td>des_area</td>
<td>5677</td>
<td>5907</td>
<td>2343</td>
<td>2905</td>
<td>4.05</td>
</tr>
<tr>
<td>s38584</td>
<td>29266</td>
<td>30187</td>
<td>22141</td>
<td>20494</td>
<td>3.15</td>
</tr>
<tr>
<td>tv80</td>
<td>13164</td>
<td>13294</td>
<td>6459</td>
<td>8298</td>
<td>0.99</td>
</tr>
<tr>
<td>systemcase</td>
<td>20687</td>
<td>21420</td>
<td>13249</td>
<td>16735</td>
<td>3.54</td>
</tr>
<tr>
<td>s38417</td>
<td>37405</td>
<td>38431</td>
<td>29766</td>
<td>32608</td>
<td>2.74</td>
</tr>
<tr>
<td>mem_ctrl</td>
<td>31621</td>
<td>32877</td>
<td>20340</td>
<td>24509</td>
<td>3.97</td>
</tr>
<tr>
<td>des_area</td>
<td>45275</td>
<td>46637</td>
<td>35559</td>
<td>41286</td>
<td>3.01</td>
</tr>
<tr>
<td>dma</td>
<td>54995</td>
<td>56500</td>
<td>38207</td>
<td>54071</td>
<td>2.74</td>
</tr>
<tr>
<td>Average</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>2.63%</td>
</tr>
</tbody>
</table>

Table 3. Experimental results in terms of total wire length overhead

<table>
<thead>
<tr>
<th>BM</th>
<th>TWL in Canonical placement</th>
<th>TWL in Proposed placement</th>
<th>Overhead (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>s5378</td>
<td>116346</td>
<td>139858</td>
<td>20.2</td>
</tr>
<tr>
<td>spi</td>
<td>435216</td>
<td>477714</td>
<td>9.8</td>
</tr>
<tr>
<td>des_area</td>
<td>729872</td>
<td>758286</td>
<td>3.9</td>
</tr>
<tr>
<td>s38584</td>
<td>1462671</td>
<td>1647886</td>
<td>12.7</td>
</tr>
<tr>
<td>tv80</td>
<td>1391814</td>
<td>1454938</td>
<td>4.5</td>
</tr>
<tr>
<td>systemcase</td>
<td>1681310</td>
<td>1774713</td>
<td>5.6</td>
</tr>
<tr>
<td>s38417</td>
<td>2511486</td>
<td>2864596</td>
<td>14.1</td>
</tr>
<tr>
<td>mem_ctrl</td>
<td>2826223</td>
<td>3060943</td>
<td>8.3</td>
</tr>
<tr>
<td>usb_funct</td>
<td>3630536</td>
<td>4011246</td>
<td>10.5</td>
</tr>
<tr>
<td>dma</td>
<td>6909107</td>
<td>7468922</td>
<td>9.8</td>
</tr>
<tr>
<td>Average</td>
<td></td>
<td></td>
<td>9.8%</td>
</tr>
</tbody>
</table>

Table 4. Delay overhead of the proposed placement vs. canonical placement

<table>
<thead>
<tr>
<th>BM</th>
<th>Delay of Canonical placement</th>
<th>Delay of Proposed placement</th>
<th>Overhead (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>s5378</td>
<td>1669</td>
<td>1705</td>
<td>2.2</td>
</tr>
<tr>
<td>spi</td>
<td>3835</td>
<td>4270</td>
<td>11.3</td>
</tr>
<tr>
<td>des_area</td>
<td>5233</td>
<td>5200</td>
<td>-0.6</td>
</tr>
<tr>
<td>s38584</td>
<td>4654</td>
<td>4987</td>
<td>7.2</td>
</tr>
<tr>
<td>tv80</td>
<td>7153</td>
<td>7213</td>
<td>0.8</td>
</tr>
<tr>
<td>systemcase</td>
<td>8390</td>
<td>8181</td>
<td>-2.5</td>
</tr>
<tr>
<td>s38417</td>
<td>4833</td>
<td>5402</td>
<td>11.8</td>
</tr>
<tr>
<td>mem_ctrl</td>
<td>5320</td>
<td>5793</td>
<td>8.9</td>
</tr>
<tr>
<td>usb_funct</td>
<td>4309</td>
<td>4962</td>
<td>15.2</td>
</tr>
<tr>
<td>dma</td>
<td>7264</td>
<td>8010</td>
<td>10.3</td>
</tr>
<tr>
<td>Average</td>
<td></td>
<td></td>
<td>6.4%</td>
</tr>
</tbody>
</table>

overheads.

References


Mehrshad Vosoughi received his B.Sc. degree in computer engineering from Islamic Azad University, south Tehran Branch, Tehran, Iran in 2010, and the M.Sc. degree in computer system architecture from Azad University, Qazvin Branch, Qazvin, Iran in 2012. His current research interests are hardware security and automatic computer-aided design.

Ali Jahanian received B.Sc. degree in computer engineering from University of Tehran, Tehran, Iran in 1996 and the M.Sc. and Ph.D. degrees in computer engineering at Amirkabir University of Technology, Tehran, Iran in 1998 and 2008, respectively. He joined Shahid Beheshti University, Tehran, Iran in 2008. His current research interests are VLSI design automation and automotive embedded system design.