1 Introduction
Nowadays, huge amounts of information are transmitted in networks which is why information security is a subject that should be paid utmost attention to. To have confidentiality as one of the main aspects of security, encryption is utilized. Encryption algorithms can be divided into symmetric key and public key. Block cipher is one of the most important types of symmetric key algorithms. Modern block ciphers consists of several rounds. In each round input should be confused and diffused for which substitution and permutation (diffusion) layers are used, respectively. The substitution layer is often implemented using several small non-linear substitution boxes (S-boxes) and the diffusion layer combines the output of each S-box.
To represent diffusion layers, an n × n matrix is used where n is the number of S-boxes in each round. Generally, the matrix can be over finite fields, and in the simplest case, n × n matrix can be over GF (2) (binary matrix). The E2 [1] and the Camellia block ciphers [2] use 8 × 8 binary matrices with the branch number of 5 while ARIA [3] uses an involutory 16 ×16 binary matrix with the branch number of 8 as a diffusion layer. Also, Skinny [4] and Midori [5] are the block ciphers using binary matrices with good properties to implement in hardware. In [6] a 32 × 32 bi-nary matrix with the branch number 10 was proposed. Some 24 × 24 binary matrices are introduced in [7] with good implementation properties for lightweight block ciphers.
If the diffusion layer has good properties, then many S-boxes will be active after a few rounds which is why the diffusion layer design has a deep impact on the resistance of block ciphers against linear [8] and differential cryptanalysis [9].
In this paper, to investigate the diffusion properties of proposed binary matrices, we employ them as the diffusion layer of the SPN structure shown in Fig. 1. This structure uses n S-boxes of length m bits. So, the diffusion layer of the considered SPNs is n × n binary matrices.

Figure 1.One round of SPN structure with the binary diffusion layer
We use the following definitions in this paper.
Definition 1. For any given ∆x, ∆y, Γx, Γy ∈ , the differential and linear probabilities of S-box S, are defined as:
where a.b denotes the parity of the bit-wise product of a and b.
Definition 2. The maximum differential and linear probabilities of an S-box are defined as:
Definition 3. An S-box is a differential (resp. linear) active S-box if the input difference (resp. output mask) value of the S-box is nonzero.
For a linear diffusion layer which can be represented by a matrix multiplication y = A.x, it can be shown that the output difference (∆y) is obtained from the input difference (∆x) by ∆y = A. ∆ x:
Also, the input linear mask (Γx) is obtained from the the output linear mask (Γy) by (Γx)= AtΓy where At denotes the transpose of matrix A:
Definition 4. The minimum number of differential active S-boxes of two rounds of a linear diffusion layer D is called the differential branch number and is defined as [10]:
where w(x) is the number of non-zero elements in vector x.
From relations (5) and (6), we can conclude that for a block cipher with matrix A as its diffusion layer, the minimum number of linear active S-boxes (MNLAS) is equivalent to the minimum number of differential active S-boxes (MNDAS) of that block cipher with matrix At as diffusion layer.
Notations
In this paper, to avoid the representation of large matrices, some notations are introduced as follows:
Definition 5. zn×m shows an all-zeros n×m matrix. For example:
Definition 6.Pi0 ,i1 ,i2 ,i3 represents a 4 × 4 binary matrix with one 0 in each row where ij (ij ∈ 0, 1, 2, 3) shows the position of zero elements in the jth row. For example:
Definition 7. represents an n × n binary matrix with one 1 in each row where ij (ij ∈ 0, 1,... , n − 1) shows the position of non-zero element in the jth row. For example:
Our Contribution
Counting the number of (linear or differential) active S-boxes of consecutive rounds of an SPN is often done by considering the truncated values for the block state. It means each all-zero or non-zero S-box input and output is shown by one bit 0 or 1, respectively. Thus, the block state is shown by an n-bit vector instead of an nm-bit value. In [11] a counting method was proposed that can be used to compute active S-boxes for several rounds of Rijndael. The complexity of this method linearly increases with the rise in the number of rounds.
It is proved that the ratio of MNDAS and MN-LAS to number of rounds for binary matrices proposed in [3] never exceeds Br/2 where Br is the matrix branch number. Then some 16 × 16 binary matrices will be proposed for which the mentioned ratio ex-ceeds Br/2 with an increase in number of rounds. For the new matrices MNDAS for R rounds (e.g. R = 40) will be determined using the method mentioned in [11]. To find MNLAS, the method is applied to At instead of A. The proposed matrices have the largest value of MNDAS and MNLAS among existing binary matrices for R > 5 rounds.
The mentioned method cannot be used for 32 × 32 binary matrices because of the method running time and the limitation of the used memory. Thus, some limitations have been applied to the matrices of the mentioned size and the counting method has been modified. In this paper, a new 32 × 32 binary matrix is also proposed. 8 rounds of an SPN structure using the new matrix as the diffusion layer has at least 48 (linear or differential) active S-boxes. The counting method can be extended for a new 32 × 32 non-binary matrix which results in at least 60 (linear or differential) active S-boxes after 8 rounds. All of the proposed 16 × 16 and 32 × 32 matrices can be efficiently implemented.
The rest of this paper is organized as follows. In Section 2, the counting method mentioned in [11] with some slight changes are explained. Using this method, in Section 2, some 16 × 16 binary matrices are offered for which the MNDAS of several rounds are determined. In Section 3, a 32 × 32 binary matrix with at least 48 active S-boxes for 8 rounds and also a 32 × 32 non-binary matrix with at least 60 active S- boxes for 8 rounds will be proposed and the conclusion is in the last section.
2 New Binary Matrices
In this section, some new 16 × 16 binary matrices are proposed which provide the following properties when used as the diffusion layer in the SPN structure:
− The active S-boxes for R rounds of the proposed SPNs is large.
− The SPN can be efficiently implemented. According to the mentioned method, to compute the MNDAS for R rounds of the SPN structure using the new matrices, array T must be of size R × 216.
The main idea to propose a new 16 × 16 matrix is the usage of the AES structure whose MDS matrix is replaced by binary matrices. By using the mentioned method, we observed that the MNDAS and MNLAS for 4r rounds are greater than the expected value which is 16r. The first proposed matrix is as follows:
This matrix is the result of multiplying by Q1, where Q1 is given below.
The differential branch number of A1 is 4. The MNDAS of r rounds of the SPN structure using A1 as the diffusion layer has been shown in Table 1. It can be observed from Table 1 that when the number of rounds increases, the ratio of the MNDAS to the number of rounds, also increases. Although there are other options for matrices Q and Π, the results for the mentioned one are the best.
| r | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 12 14 ... 20 ... | 39 | 40 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MNDAS | 1 | 4 | 7 | 16 | 22 | 28 | 33 | 38 | 43 | 48 | 60 70 ... 106 ... | 217 | 224 |
| MNLAS | 1 | 4 | 7 | 16 | 22 | 28 | 33 | 38 | 43 | 48 | 60 70 ... 106 ... | 217 | 224 |
| MNDAS/r | 1.00 | 2.00 | 2.33 | 4.00 | 4.40 | 4.66 | 4.71 | 4.75 | 4.77 | 4.80 | 5 5 ... 5.3 ... | 5.56 | 5.60 |
To investigate the security of the SPN cipher with A1 as the diffusion layer against the impossible differential attack, we checked the length of impossible differentials. We looked for truncated differentials of probability 1 in both encryption and decryption directions and found out that after four rounds there is no byte whose difference is zero and there is no set of bytes that the XOR of their differences is zero with probability 1. This ensures us that this SPN does not have any impossible differential in more than 6 rounds. In other words, we found a 6-round impossible differential for the SPN that uses A1.
It should be noted that the A1 matrix is better than the Skinny matrix in terms of the active S-boxes and the resistance to impossible differential attack, but in terms of hardware implementation in Skinny, 12 XOR operations are needed forthe diffusion layer but for A1 this value requires 32 XOR operations (28 operations with an auxiliary variable). Due to the fact that in order to compare the implementation of two block ciphers, it is necessary to compare the number of rounds of the algorithm as well as the designed S-box, it is not possible to have a detailed review between these diffusion layers of block ciphers in this paper.
To find a binary matrix with at a most 4-round impossible differential, we searched all permutation Π, but for all of them, there are impossible differentials on 5 or more rounds. To overcome this problem, another matrix, A2, is proposed which is the result of multiplying Q2 by Π = the form: , where Q2 is of
The form of the upper left 8 × 8 sub-block of Q2 is like the matrix used in the Camellia block cipher.
The matrix A2 = Q2 × Π has at most 4-round impossible differential distinguisher.
The MNDAS of r rounds of the SPN structure using A2 as the diffusion layer has been shown in Table 2. The inverse of matrix Q2 can be efficiently implemented similar to the implementation of matrix Q2. Software implementation of this matrix has been described in Appendix 1.
| r | 1234 | 567 | 8 | 910 1214 ... 20 ... 3940 |
|---|---|---|---|---|
| MNDAS | 15816 | 20 2531 | 36 | 41 47 5868 ... 101 ... 205 211 |
| MNDAS/r | 1 2.50 2.66 4 | 4 4.16 4.42 | 4.5 | 4.55 4.7 4.83 4.85 ... 5.05 ... 5.25 5.27 |
| MNLAS | 15816 | 21 2529 | 36 | 40 47 5769 ... 102 ... 207 212 |
| MNLAS/r | 1 2.5 2.66 4 | 4.2 4.16 4.14 | 4.5 | 4.44 4.7 4.66 4.92 ... 5.1 ... 5.25 5.3 |
3 The Matrix
The method proposed in Section 2 cannot be directly used for 32 × 32 binary matrices to determine the MNDAS of r rounds because of the large running time of the algorithm for each round.
Thus, a new method is used to determine a lower bound for the active S-boxes of r rounds of an SPN structure which uses a 32 ×32 binary matrix as its diffusion layer. This method is different from the method described in Section 2 in the following aspects:
(1) Here, two m-bit sub-blocks (instead of one m-bit sub-block) are truncated to one bit. Thus, the number of columns of the array T will be
(2) In each iteration of the search algorithm two rounds will be analyzed (instead of one round in each iteration). Thus the array T will be of size (instead of r × 2n).
In the following, we introduce a new 32 × 32 binary matrix with suitable linear and differential properties. Consider the following 32 × 32 binary matrix A3 = P.Π, where:
and
where
The matrix Π is similar to the matrix representation of permutation P4 introduced in [12 ]. B4×4 is a 4*4 binary matrix as follows:
The sequence of transformations applied to the input of an SPN structure (regardless of the key addition) can be written as
Concerning commutativity of Π and S, this sequence can be rewritten as:
Fig 2 shows the SPN structure corresponding to the relation (18) which is composed of two different types of rounds S ∘ P ∘ S and Π ∘ P ∘ Π applied alternately. The input to the rth sub-cipher S ∘ P ∘ S is shown by , while the input to the rth sub-cipher Π ∘ P ∘ Π is represented by
In Section 2, the truncated value of was represented by Iiwhich could be 0 or 1. Now the double-truncated value for every pair , i= 0, 1,... , 15 is shown by which can be or . . If both of and are all-zero, will be shown by . In other cases, it will be shown by . can be shown with the doubletruncated vector which corresponds to the double-truncated value . This double-truncated value is a number between 0 and 65535. Similar notation can be used for the input to the ith application of Π∘ P ∘Π.
Applying the S-box layer (S) to a double-truncated vector does not change its value, i.e., the Stransformation is considered an identity transformation on a double-truncated vector. As can be seen from Fig. 2 the Π is equivalent to a permutation on 16 2m-bite values. Therefore Π acts as a permutation on 16 elements of a double truncated vector. This transformation is one-to-one. The application of P is realized by a parallel application of 8 B functions. Double- truncated input to a B can be shown by . Applying B to and results in . Applying B to results in . A function B is called active if its input (or output) is not all zero. Each active B in the application of S ∘ P ∘ S makes at least 4 S-boxes active in its input and its output.

Figure 2.Equivalent form of four rounds of the SPN with A3 as the diffusion layer
Based on Fig. 2, the analysis of the mentioned structure is as follows. To find a lower bound for MNDAS of R rounds of the mentioned SPN, all of the double-truncated values for the input of the SPN shown in Fig. 2 and all of the possible transitions by application of the P layers will be considered. In the considered paths, each active B in S ∘ P ∘ S sequence of functions, adds four active S-boxes to the lower bound of MNDAS.
Considering a double-truncated vector for the first application of S ∘ P ∘ S, the number of paths is exponentially increased by adding each P layer in the sequence of applied functions. To overcome this problem, an array T of the size is built. The numbering to the rows and the columns of the array starts from 1 and 0, respectively. T[i][j] shows the MNDAS of the characteristics which finished with the doubletruncated value j (a number from 0 to 65535) at the end of the ith application of Π∘P∘Π. Computation of i + 1th row of T is performed by using the ith row of T. T[r + 1][j] is computed using the exhaustive search over T[r][i], i = 0,... , 65535 and all of the possible transitions from to , k = 0, 1,... , 65535 by i + 1th application of the S ∘ P ∘ S and all of the possible transitions from to to to by i + 1th application of the Π∘ P ∘Π. For each active B in the i + 1th application of the S∘P∘S, 4 is added to the lower bound of the number of active S-boxes of the characteristic. Thus we have
For the mentioned structure the minimum non-zero values of the 4th row of the T array is 48 which means that 8 rounds of the structure have at least 48 active S-boxes.
Table 3 and Table 4 show the possible transitions from to and to , respectively.
| The active Bs | ||
|---|---|---|
| 0 | 0 | 0 |
| 1 | 3 | 1 |
| 2 | 3 | 1 |
| 3 | 1,2,3 | 1 |
| 4 | 12 | 1 |
| 5 | 15 | 2 |
| 6 | 15 | 2 |
| 7 | 13,14,15 | 2 |
| ... | ... | ... |
| 65535 | 21845,21846,21847,21849,...,65535 | 8 |
| 0 | 0 |
| 1 | 9 |
| 2 | 144 |
| 3 | 153 |
| 4 | 2304 |
| 5 | 2313 |
| 6 | 2448 |
| 7 | 2457 |
| ... | ... |
| 15 | 39321 |
| ... | ... |
| 65535 | 21845,21852,21853,21957,...,65535 |
Example 1. For computing T[r+1][153] it can be seen from Table 4 that one of the possible values for which results in is 3. From Table 3 it is concluded that 1,2 and 3 are some values for which can be transformed to . So minimum of T[r][1] + 1 × 4, T[r][2] + 1 × 4 and T[r][3] + 1 × 4 can be a candidate for T[r+1][153]. Considering all of the possible transformations results in the final value of T[r+1][153].
The most important properties of B which affect the resultant bound are as follows:
(1) The minimum active sub-blocks in the input and output of B takes its maximum possible value.
(2) Double-truncated pairs and results in
To see the effect of the second property on the overall diffusion of the SPN, consider the following matrix B1:
The minimum positive active sub-blocks in the input and output of B1 and B are the same, but if B1 is used instead of B in the SPN mentioned in this section, the obtained lower bound of MNDAS for 8 rounds of the SPN would be 16, which is much smaller than the lower bound for SPN using B. This egregious difference is the result of the possible transformations of to and to for the matrix B1.
For the introduced method in this section, if the matrix P uses 8 different matrices B with the two mentioned properties, the obtained lower bound will not change.
The bound can be increased if we use MDS matrices (in GF (28)) instead of the binary matrix B in relation (14). In this case, using the mentioned method, the lower bound of the MNDAS for 8 rounds is increased to 60. For a 4 × 4 MDS matrix the possible double-truncated transitions would be the same as the ones for matrix B. The only difference is that for each active B, 5 is added to the lower bound of MNDAS. The SPN structures using the new diffusion layer, have at most 6-round impossible differentials. It can be implemented efficiently similar to Rindael-256 because Π is similar to ShiftRows and B is similar to MixColumns. The lower bound of the MNDAS for 8 rounds of Rindael-256 is 50 which is much less than the bound 60 calculated for the proposed SPN. The new SPN can be used to design a block cipher with a block size of 256 bits.
4 Conclusion
In this paper, we concentrated on the design of binary matrices with suitable cryptographic properties. Firstly, using the AES structure, a new 16 × 16 binary matrix was proposed. Although, this matrix has appropriate results as far as the active S-boxes for R rounds of the Camellia block cipher is concerned, its resistance against impossible differential attack is not preferable. To solve this problem, The Camellia binary matrix idea and byte permutation (similar to ShiftRows in AES ) were utilized resulting in a new structure. Using this new structure, a 16 × 16 matrix was proposed where 10 rounds of an SPN structure using this matrix as its diffusion layer has 47 linear and differential active S-boxes and there is no impossible differential distinguisher for more than 4 rounds.
Due to the limitation of the memory and time for counting the active S-boxes for a 32 × 32 binary matrix, the counting method mentioned in was modified to provide a lower bound for the active S-boxes. Then based on 4 × 4 binary matrices, a new structure for 32 × 32 binary matrices was presented. Using the proposed structure, it was found that the lower bound for the active S-boxes in 8 rounds of the SPN structure employing this matrix as its diffusion layer is 48. Since the counting method used for this structure could be extended to non-binary matrices, the 4 × 4 binary matrix was replaced by a 4 × 4 MDS matrix. The lower bound for the number of active S-boxes is 60 for the new non-binary structure. The resultant bound is better than the corresponding lower bound for Rijndael-256 which is 50 [13]. All the proposed diffusion layers proposed in this paper can be efficiently implemented in software.
A Software Implementations
Now some points are described about the method of implementing of round function on processors with word length 32 or above. In a round function of the SPN structure after the key addition layer which simply is implemented by XOR operations, the Sbox layer and diffusion layer are applied to the input block as follows:
Q ∘ Π ∘ S
where Q and Π are n × n binary matrices. The above functions are applied to the input block X = (x1, x2, ..., xn), where each xi, i = 1, 2, ... , n is a byte. Since we can interchange S and Π, Q ∘ Π ∘ S(X) = Q ∘ S ∘ Π(X) . Thus, in the following the method of implementation is described. If all of the 8-bit Sboxes used in S-box layer are the same, to implement Q∘ S for all Qs introduced in this paper it suffices to pre-compute and store the following 4 lookup tables each with 256 4-byte word entries:
On the other hand, to implement the inverse of the round function the following 4 lookup tables are stored:
The matrix Q1 introduced in Section 2 can be divided into 4 independent 4 × 4 sub-blocks. Thus, applying Q1 ∘ S to input vector can be implemented by independent calculation of 4 sub-blocks of the output. Each output sub-block is calculated by 4 table lookups and 3 XOR operations. Calculation of the Q1−1 ∘ S−1 is the same.
Matrix Q2 introduced in Section 2 can be divided into two 8 × 8 sub-blocks which can be implemented independently. Here, the implementation of the first 8 × 8 sub-block is described. Implementation of the second 8 × 8 sub-block is performed similarly. To compute (y1, y2,... ..., y8), the following vectors are calculated first:
Then (y1, y2, y3, y4) is computed as follows:
and (y5, y6, y7, y8) is computed as follows:
where z0 ⋘ 8 shows the circular shifting of z0 by 8 bits. Thus, the calculation of each output sub-block needs 8 table lookups, 8 XOR operations and 1 circular shift operation. To compute (y9, y10,... ..., y16) the following vectors are calculated first:
Then (y9, y10, y11, y12) is computed as follows:
and (y13, y14, y15, y16) is computed as follows:
Thus, the calculation of each output sub-block needs 8 table lookups, 8 XOR operations, and 1 circular shift operation.
Inverse of Q2 is as follows.
It is clear that Q2−1 ∘ S−1 can be implemented similar to Q2 ∘ S.
B Hardware Implementations of Binary Matrix
To hardware implementation of binary matrix
We can implement this matrix directly or recursive. To implement recursive, these relations as below (xi’s are inputs and yi’s are output):
If we want to decrease 8 XOR to 7 XOR, we need to define a temporary variable:
References
- M. Kanda, S. Moriai, K. Aoki, H. Ueda, Y. Takashima, K. Ohta, and T. Matsumoto. E2- A New 128-Bit Block Cipher. IEICE Transac- tions on Fundamentals of Electronics, Communi- cations and Computer Sciences, E83-A(1):48–59, 2000.
- K. Aoki, T. Ichikawa, M. Kanda, M. Matsui, S. Moriai, J. Nakajima, and T. Tokita. Camellia: A 128-bit Block Cipher Suitable for Multiple Platforms-Design and Analysis. In SAC 2000, volume 2012, pages 39–56. Springer-Verlag Berlin Heidelberg, 2001.
- D. Kwon, J. Kim, S. Park, S.H. Sung, Y. Sohn, J.H. Song, Y. Yeom, E-J. Yoon, S. Lee, J. Lee, S. Chee, D. Han, and J. Hong. New Block Cipher ARIA. In ICISC2003, volume 2971, pages 432–445. Springer-Verlag, 2003.
- Christof Beierle, J´er´emy Jean, Stefan K¨olbl, Gregor Leander, Amir Moradi, Thomas Peyrin, Yu Sasaki, Pascal Sasdrich, and Siang Meng Sim. The skinny family of block ciphers and its low-latency variant mantis. In Advances in Cryptology–CRYPTO 2016: 36th Annual Inter- national Cryptology Conference, Santa Barbara, CA, USA, August 14-18, 2016, Proceedings, Part II 36, pages 123–153. Springer, 2016.
- Subhadeep Banik, Andrey Bogdanov, Takanori Isobe, Kyoji Shibutani, Harunaga Hiwatari, Toru Akishita, and Francesco Regazzoni. Midori: A block cipher for low energy. In Advances in Cryptology–ASIACRYPT 2015: 21st Interna- tional Conference on the Theory and Application of Cryptology and Information Security, Auck- land, New Zealand, November 29–December 3, 2015, Proceedings, Part II 21, pages 411–436. Springer, 2015..
- B. Koo, H. Jang, and J. Song. On constructing of a 32 * 32 binary matrix as a diffusion layer for a 256-bit block cipher. In ICISC2006, volume 4296, pages 51–64. Springer-Verlag, 2006.
- Muharrem Tolga Sakallı, Sedat Akleylek, Bora Aslan, Ercan Bulu¸s, and Fatma Bu¨yu¨ksarac¸og˘lu Sakallı. On the construction of and binary ma- trices with good implementation properties for lightweight block ciphers and hash functions. Mathematical Problems in Engineering, 2014 , 2014.
- M. Matsui. Linear Cryptanalysis Method for DES Cipher. In EUROCRYPT’93, volume 765, pages 386–397. Springer-Verlag, 1993.
- E. Biham and A. Shamir. Differential Cryptanalysis of DES-like Cryptosystems. In CRYPTO’90, volume 537, pages 2–21. Springer-Verlag, 1990.
- J. Daemen.Cipher and Hash Function Design Strategies Based on Linear and Differen- tial Cryptanalysis. PhD thesis, Elektrotechniek Katholieke Universiteit Leuven, Belgium, 1995.
- Mahdi Sajadieh, Arash Mirzaei, Hamid Mala, and Vincent Rijmen. A new counting method to bound the number of active s-boxes in rijn- dael and 3d. Designs, Codes and Cryptography, 83:327–343, 2017.
- Hongjun Wu. The hash function jh. Submission to NIST (round 3), 6, 2011.
- J. Daemen and V. Rijmen. The Design of Rijn- dael: AES - The Advanced Encryption Standard. Springer-Verlag, 2002.
