hoschi@gfuzz:~$

AES (3/3): Encryption and Decryption

Overview

With all the calculations and lookup tables from the previous articles, we are now able to specify the actual enciphering and deciphering functions, including some tests taken from the original paper to make sure the implementation works in a sane manner. Configuration By using the macro definition AES_CALCULATE_LOOKUP_TABLES in config.h, we decide if the library has to create the lookup tables by itself our use the hardcoded ones. Default: 1, 0 means harcoded

#define AES_CALCULATE_LOOKUP_TABLES 1

The AES_KEY_SIZE macro simply specifies the length of our private key we want to use. Default: 128

#define AES_KEY_SIZE 128

Initialization

First at all, the library needs to be initialized. In both encryption and decryption cases we have to prepare the Key Schedule as well, the order of the subkeys will make the difference later.

uint8_t aes_init(const uint8_t *key);

*key has to be a pointer to a 128/192/256bit private key array. If neccessary, the initialization function will also trigger the AES lookup table creation, using the following functions which have been described in the previous article.

static void gen_bytesub(void);
static void gen_inv_bytesub(void);
static void gen_rc_table(void);
static void gen_multiplication(void);

Key Schedule

In AES, each enciphering and deciphering round gets it’s own sub key, initially derived from the original private key. Each sub key depend on the previous sub key, so it does not make sense to create them on the fly. Instead, we prepare all subkeys during initialization and store them in the following array to be accessed later in the desired order:

uint32_t AES_KEY_SCHEDULE[AES_SUBKEYS][AES_SUBKEY_PARTS] = { { 0 } };

The Key Schedule array is made of unsigned 32bit integers elements, this is due to the fact that each subkey is made of a number of words (AES_SUBKEY_PARTS). In AES a word is made of 32bits. The number of elements and words depend on the size of the key. Also, the number of subkeys depend on the number of rounds plus 1 (for the unaltered, initial private key)

key size in bits  |  AES_SUBKEYS    | AES_SUBKEY_PARTS
        128       |      11         |          44
        192       |      13         |          52
        256       |      15         |          60

The first subkey in the array is the initial private key, from there

  • Each 32 subkey part gets XORed using it’s pendant part from the previous subkey Additionally, the first word of each subkey receives a bit more alteration:
  • Left shift rotate by 1
  • Substitute each 4 bytes of the word using the AES_SUB_BYTE macro
  • XOR the word (24th bit only) by it’s corresponding round coefficient using the AES_RC macro The function has to be called by providing a pointer to a 128/192/256bit private key array.
static void key_schedule(const uint8_t *key);

Encryption

Depending on the key size, the encryption can be described using the following pseudo-code (Please note: The last round is missing the mix_column layer):

key_addition(key[0], state)
for round 1 step 1 to AES_ROUNDS-1
    byte_substitution(state)
    shift_rows(state)
    mix_column(state)
    key_addition(key[round], state)
end for
byte_substitution(state)
shift_rows(state)
key_addition(key[AES_ROUNDS], state)

Each enciphering round starts with the key addition layer, consuming the first subkey in the key schedule. This procedure is also known as “Key Whitening”.

Key Addition Layer

We learned that addition in GF(2^m) can be implemented as XOR, so all we have to do is XOR the input block with the subkey.

static void key_addition(uint32_t *key, uint8_t *state);

The function demands an unsigned 32bit pointer to the appropriate subkey and an unsigned 8bit pointer to the current input state array.

Byte Substitution Layer

The Byte Substition is the algorithms centrepiece of diffusion. It offers an efficient way to spread the input state bits all over the ciphertext while it is very easy to implement: Substitute each byte from the input state with the corresponding value from the AES_SUB_BYTE table using the SUB_BYTE macro

static void byte_substitution(uint8_t *state);

Besides the input state pointer there is no further parameter needed, the rest comes from the AES_SUB_BYTE array

Shift Rows Layer

While we only had operations that worked on a byte to byte or word to word level, the Shift Rows Layer is the first layer that starts mixing parts of the words between each other. This way, AES provides another level of diffusion of the plaintext. Round after round, the bits are more and more spread all over the ciphertext. Like the name says, the function shifts rows and not columns. We apply the following pattern:

  • 1st row remains the same
  • 2nd row gets shifted by one to the left
  • 3rd row gets shifted by two to the left
  • 4th row gets shifted by three to the left Each b stands for a 8bit block of the input state:
    b00 b04 b08 b12 => b00 b04 b08 b12
    b01 b05 b09 b13 => b05 b09 b13 b01
    b02 b06 b10 b14 => b10 b14 b02 b06
    b03 b07 b11 b15 => b15 b03 b07 b11
    
    static void shift_rows(uint8_t *state);
    

    The function needs a pointer to the input state only

Mix Column Layer

The Mix Column Layer is a bit more tricky to implement and requires GF(2^8) arithmetic. However, this can also be solved by a table lookup since our input states are limited to 8bit and the multiplication factors are 0x01 (noop), 0x02 and 0x03.

    8 bit          0x00
b0 b4  b8 b12 ^ 02 03 01 01
b1 b5  b9 b13 ^ 01 02 03 01
b2 b6 b10 b14 ^ 01 01 02 03
b3 b7 b11 b15 ^ 03 01 01 02

The table above describes the necessary operations for each byte of the columns within GF(2^8) using modulo p reduction:

b0 = b0 * 0x02 + b4 * 0x03 +  b8 * 0x01 + b12 * 0x01
b1 = b1 * 0x01 + b5 * 0x02 +  b9 * 0x03 + b13 * 0x01
b2 = b2 * 0x01 + b6 * 0x01 + b10 * 0x02 + b14 * 0x03
b3 = b3 * 0x03 + b7 * 0x01 + b11 * 0x01 + b15 * 0x02

A rudimentary C implementation could look like the following example, AES_MULTIPLY_0x2 and AES_MULTIPLY_0x3 are macros that do a static lookup. Again, the addition is implemented as XOR:

b0 = AES_MULTIPLY_0x2(b0) ^ AES_MULTIPLY_0x3(b4) ^ b8 ^ b12);

Unfortunately, we cannot directly override b0 here since the other calculations for b4, b8 and b12 depend on the initial state of the mix column layer. My implementation uses a temporary register for the states which will be copied over the input state after each column.

Decryption

Deciphering the plaintext is inversing the encryption, the library needs to be initialized with the correct key, the key schedule remains the same but will be used in the opposite order. The following pseudo code demonstrates decryption in AES, appropriate to encryption, which skips the mix column layer in the last round, we have to skip it in the first:

key_addition(key[AES_ROUNDS], state)
inv_shift_rows(state)
inv_byte_substitution(state)
for round AES_ROUNDS-1 step -1 to 1
    key_addition(key[round], state)
    inv_byte_substitution(state)
    inv_shift_rows(state)
    inv_mix_column(state)
end for
key_addition(key[0], state)

Key Addition Layer

The Key Addition Layer in decryption is the same function that is being used during encryption.

static void key_addition(uint32_t *key, uint8_t *state);

Inverse Byte Substitution Layer

Implementation is straight forward: Substitute each byte from the input state with the corresponding value from the INV_AES_SUB_BYTE table using the INV_SUB_BYTE macro

static void inv_byte_substitution(uint8_t *state);

Inverse Shift Rows Layer

To invert the Shift Rows Layer we just have to undo the shifts from the enciphering round:

  • 1st row remains the same
  • 2nd row gets shifted by one to the right
  • 3rd row gets shifted by two to the right
  • 4th row gets shifted by three to the right Each b stands for a 8bit block of the input state:
    b00 b04 b08 b12 => b00 b04 b08 b12
    b05 b09 b13 b01 => b01 b05 b09 b13
    b10 b14 b02 b06 => b02 b06 b10 b14
    b15 b03 b07 b11 => b03 b07 b11 b15
    
    static void inv_shift_rows(uint8_t *state);
    

    The function needs a pointer to the input state only

Inverse Mix Column Layer

The Mix Column Layer needs a bit more work to be inversed and again requires some GF(2^8) arithmetic. The inverse calculations can be taken from a pre calculated lookup table, too. This time we need to multiply by 0x0e, 0x0b, 0x0d, 0x09 (incl. modulo p reduction):

b0 b4  b8 b12 ^ 0e 0b 0d 09
b1 b5  b9 b13 ^ 09 0e 0b 0d
b2 b6 b10 b14 ^ 0d 09 0e 0b
b3 b7 b11 b15 ^ 0b 0d 09 0e

The actual calculation matches the one from the encryption round, only the factors differ. We can use the macros AES_MULTIPLY_0x9, AES_MULTIPLY_0xb, AES_MULTIPLY_0xd and AES_MULTIPLY_0xe for that purpose.

b0 = AES_MULTIPLY_0xe(b0) ^ AES_MULTIPLY_0xb(b4) ^ AES_MULTIPLY_0xd(b8) ^ AES_MULTIPLY_0x9(b12)
static void inv_mix_column(uint8_t *state);

Again, we do not overwrite the bytes immediately but store it in a temporary register to copy it over once each column has been finished.

Inverse Key Whitening

The final step in decryption is to apply the actual private key to the cipher text. The result will be the original plain text.

Cipher Block Chaining Mode

From my point of view, CBC does not belong into the algorithm itself. Together with integrity checks and indication of success, it is more part of the actual implementation. The algorithm does not care about the bits running through. Therefore, my CBC implementation can not be found in the library itself but in the dummy program which can also be found in the source tree (/main.c)

Tests

The good thing about tests is that if you implement them first (what an Utopia!!), you can fiddle with the library functions until they do the right thing. The authors of the AES paper were so kind to include a step-by-step state description of different key sizes and input states. They can be found at the end of the original paper as Appendix A.

TEST INPUT FOR THE AES ALGORITHM, TAKEN FROM FIPS PUB 197, APPENDIX A
    https://gogs.gfuzz.de/oliver.peter/AES/src/master/doc/nist.fips.197.pdf

128bit secret key
    0x2b7e151628aed2a6abf7158809cf4f3c

128bit input blocks
    0xae2d8a571e03ac9c9eb76fac45af8e51
    0x30c81c46a35ce411e5fbc1191a0a52ef
    0xf69f2445df4f9b17ad2b417be66c3710

128bit output blocks
    0xf5d3d58503b9699de785895a96fdbaaf
    0x43b1cd7f598ece23881b00e3ed030688
    0x7b0c785e27e8ad3f8223207104725dd4

Example output of the test program:

~/src/git/AES % make test
rm -f .obj/libaes.o *~ run
clang -Wall -Werror -O3 -g -fsanitize=address -fno-omit-frame-pointer -c libaes.c -o .obj/libaes.o
clang -Wall -Werror -O3 -g -fsanitize=address -fno-omit-frame-pointer -c test.c -o .obj/test.o
clang -Wall -Werror -O3 -g -fsanitize=address -fno-omit-frame-pointer  .obj/libaes.o .obj/test.o -o run_test
./run_test
*** BEGIN AES TESTS
    * AES Encryption Round  0
    `--> Round 0 successfully encrypted 
    * AES Encryption Round  1
    `--> Round 1 successfully encrypted 
    * AES Encryption Round  2
    `--> Round 2 successfully encrypted 
-------------------------------------------------------------
    * AES Decryption Round  0
    `--> Round 0 successfully decrypted 
    * AES Decryption Round  1
    `--> Round 1 successfully decrypted 
    * AES Decryption Round  2
    `--> Round 2 successfully decrypted 
-------------------------------------------------------------
*** END OF PROGRAM