Differential Power Analysis on AES - Hands On Multi Bit Attack

In a previous article, I went through single bit DPA attacks on AES. Here, I want to go over a multi-bit attack and compare efficacy.

Instead of splitting the traces into two groups based on the status of a single bit, we:

  • Assign each trace a value based on the full 8 bit SBOX lookup
  • Analyze the traces as a group

How can this be done? For the first task, we start the same way as for the single bit attack. Somewhere - we don't know where - in each data trace there is a byte whose value is dependent ONLY on the changing data and key bytes. The SBOX is a constant lookup table, which is known. Whereas previously we looked at the power draw of a single bit, we now realize that the power draw is related to the number of the bits in the byte that are one. Either setting each bit increases or decreases the power draw of the function, but whatever it does, it will do so every time. We know the data bytes that are being fed to the device, but we don't know the key. However, if we guess a key byte then we can predict the outcome of this operation. For example, take trace #1 in the example data set and examine plain text byte 1: 0x25, both of which we are given. We don't know the key, but there are only 256 possible values. Lets guess them one by one. Start with 0x00, then 0x01 and so on. The XOR of 0x25 and 0x00 is 0x25. The SBOX lookup for that value is 0x3F. If we guessed instead that the key is 0x01, then the XOR result would be 0x24. The SBOX lookup for that value is 0x36. Now - and this is where we start to differ from the single bit attack - the number of bits set in the SBOX result of 0x3F (binary 00111111) is 6. And the number of bits set in the SBOX result of 0x36 (binary 00110110) is 4. And so on ... for all other values of the key guess up to 0xFF. This number of bits set is called the Hamming Weight.

Lets write a piece of MATLAB code for this case. Assume that the data is in the variable traces as in the single bit example. And we use the same data as provided by [2].

We need an outer loop that loops over the all the key bytes. There are 16 in the AES-128 example. Inside that, we need a loop that goes over the 256 possible key byte values. For each of the inner and outer loop variables, we need an SBOX lookup and corresponding Hamming Weight prediction.


for currentKeyByte=1:16 % for every byte in the key for currentKeyByteGuess = 0:255 % iterate through all candidate key bytes, 0x00 to 0xff xorResult = bitxor(plaintext(:,currentKeyByte),currentKeyByteGuess); % first operation is an XOR between the cleartext and the guessed key sboxLookup = SBOX(xorResult+1); HammingWeightHypothesis(:,currentKeyByteGuess+1) = sum( dec2bin(sboxLookup).' == '1' ); % calculate Hamming Weight end;end;

Now we have to look at what to do with the Hamming Weight hypotheses.

Instead of sorting each trace into one of two groups, we gave each trace a number corresponding to the number of bits set (Hamming Weight) after the SBOX lookup. Then we need a way to decide how well all those weights fits to the actual power measurements. This can be done with a correlation computation called the Pearson Correlation Coefficient. There is a very good example in [2], which I will use here:

Measured value   Hamming value
from trace       for correct key
4.15             3
4.15             3
4.2              4
4.21             4
4.24             5
4.25             5
4.26             5
4.3              6
4.31             6
4.29             6
 
Correlation Coefficient 
0.99188

Note that the units of the trace are completely arbitrary. Lets multiply the values by 100:

Measured value   Hamming value
from trace       for correct key
415              3
415              3
420              4
421              4
424              5
425              5
426              5
430              6
431              6
429              6
 
Correlation Coefficient 
0.99188 

Or by any other value. Lets say we multiply by 123.456:

Measured value   Hamming value
from trace       for correct key
512.3424         3
512.3424         3
518.5152         4
519.74976        4
523.45344        5
524.688          5
525.92256        5
530.8608         6
532.09536        6
529.62624        6
 
Correlation Coefficient 
0.99188

The Pearson Correlation Coefficient determines how linearly the trace values change with the Hamming values. With linear, we mean that if the trace goes up by some value for a Hamming increase of 1, then it will go up twice that value for a Hamming increase of two. Example: if we measure 100 on the trace for a Hamming value of 1 and it goes to 108 for a Hamming value of 2, then a linear correlation implies a change to 116 for a Hamming value of 3. It is this relationship we are trying to determine. The whole point behind using the Hamming Weight is the hypothesis that such a linear relationship exists.

Then, for all data traces and for each 'time index' we calculate how well the predicted power draw correlates with the observations:

for currentTimeIndex=1:traceLength % check for correlation between the 200 power traces and each of the key guesses for the current data point dataPoints = traces(:,currentTimeIndex); % the data points for all the traces at the current time index prediction = HammingWeightHypothesis(:,currentKeyByteGuess+1); % lets see how well the data points match with the predicted hamming weights % by calculating the Pearson Correlation Coefficient for the current data point % we want the maximum correlation, positive or negative, so we take the absolute value pearsonCorrelation(currentKeyByte,currentKeyByteGuess+1,currentTimeIndex)=abs(cov(dataPoints,prediction)/(sqrt(var(dataPoints))*sqrt(var(prediction))));end;

Just like in the single bit example, we do not know at which point in time we may get a good correlation. That is the reason why we calculate the correlation for each 'time index'. Together, the code becomes:

for currentKeyByte=1:16 % for every byte in the key for currentKeyByteGuess = 0:255 % iterate through all candidate key bytes, 0x00 to 0xff currentKeyByteGuess xorResult = bitxor(plaintext(:,currentKeyByte),currentKeyByteGuess); % first operation is an XOR between the cleartext and the guessed key sboxLookup = SBOX(xorResult+1); HammingWeightHypothesis(:,currentKeyByteGuess+1) = sum( dec2bin(sboxLookup).' == '1' ); % calculate Hamming Weight for currentTimeIndex=1:traceLength % check for correlation between the 200 power traces and each of the key guesses for the current data point dataPoints = traces(:,currentTimeIndex); % the data points for all the traces at the current time index prediction = HammingWeightHypothesis(:,currentKeyByteGuess+1); % lets see how well the data points match with the predicted hamming weights % by calculating the Pearson Correlation Coefficient for the current data point % we want the maximum correlation, positive or negative, so we take the absolute value pearsonCorrelation(currentKeyByte,currentKeyByteGuess+1,currentTimeIndex) = abs(cov(dataPoints,prediction)/(sqrt(var(dataPoints)) * sqrt(var(prediction)))); end; end; F = squeeze(pearsonCorrelation(currentKeyByte,:,:)); % fix dimensionality of array [bestMatch, bestMatchTimeIndex] = find(F==max(F(:))); % find highest correlation solvedKey(currentKeyByte) = bestMatch(1) - 1;end;

This is a plot of the correlation with time with the wrong key for key byte 1:

And this is a plot of the correlations with time of all 255 wrong keys for key byte 1:

And this is a plot of the correlations with time of all keys, including the correct one. It is obvious which key is the correct one and also where the computation is performed:

When we repeat this for all 16 key bytes, we can recover the key. Previously, with 200 traces, running attacks on several bits separately and combining results we could extract 13 of 16 bytes of the correct key. Let's have a look at how many traces are necessary to extract the key when using a full 8 bit Hamming-Weight attack.

Correct Key:                00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF
Predicted(all 200 traces):  00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF
Predicted(    150 traces):  00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF
Predicted(    100 traces):  00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF
Predicted(     50 traces):  00 11 22 33 7F 55 66 77 88 99 AA BB 5A DD EE FF
Predicted(     25 traces):  C3 11 BF 33 7F 55 66 39 A6 99 FB BB 93 DD EE FF

It makes intuitive sense that a method that uses more information would be more effective at extracting the key. We do, in fact, see that here.

References:

[1] Owen Lo, William J. Buchanan & Douglas Carson (2017) Power analysis attacks on the AES-128 S-box using differential power analysis (DPA) and correlation power analysis (CPA), Journal of Cyber Security Technology, 1:2, 88-107, DOI: 10.1080/23742917.2016.1231523

[2] Hardware Security and Trust: Design and Deployment of Integrated Circuits in a Threatened Environment. Nicolas Sklavos, Ricardo Chaves, Giorgio Di Natale, Francesco Regazzoni. Springer, Jan 11, 2017