AES (2/3): A Description of AES Lookup Tables

20 Nov 2018

↪ Proof of concept

The actual motivation behind the previous article is that I did not want to just copy & paste the AES lookup tables from the book or other implementations, I wanted them to calculate them by myself. It might make sense for a tiny implementation, with limited ram and rom, to calculate the result of the byte substitution from scratch every time. However, this was not my intention. In my implementation, hardcoding the tables makes almost no difference in runtime, and absolutely no difference in memory consumption (the tables will be kept in memory anyway). The whole task is more like a proof of concept, an excercise, that shows my Finite Field arithmetic has been implemented correctly.

For that purpose, there is a macro definition in config.h which allows you to control the behaviour during compile time (0 = hardcoded / 1 = runtime):

#define AES_CALCULATE_LOOKUP_TABLES 1

The sub-functions being used in this article (i.e. multiplicative_inverse()) have already been described in the previous article.

↪ Byte Substitution Table

The Byte Substitution Table is AES’ core of diffusion, it is used within the Key Schedule and the Byte Substition Layer. Compared to DES this table can also be seen as a single S-Box.

↪ Generation

static void
gen_bytesub(void)
{
    for(int i = 0; i < 256; i++)
    {
        uint8_t result = multiplicative_inverse(i);
        uint8_t x = i & 0xF;
        uint8_t y = i >> 4;
        AES_BYTE_SUB[y][x] = result;
    }
    return;
}

For each element in GF(2^8), we take the multiplicative inverse and store it in a static lookup table called AES_BYTE_SUB:

static const uint8_t AES_BYTE_SUB[16][16] = {
    // 0x0,  0x1,  0x2,  0x3,  0x4,  0x5,  0x6,  0x7,  0x8,  0x9,  0xA,  0xB,  0xC,  0xD,  0xE,  0xF,
    { 0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76 }, // 0x0
    { 0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0, 0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0 }, // 0x1
    { 0xB7, 0xFD, 0x93, 0x26, 0x36, 0x3F, 0xF7, 0xCC, 0x34, 0xA5, 0xE5, 0xF1, 0x71, 0xD8, 0x31, 0x15 }, // 0x2
    { 0x04, 0xC7, 0x23, 0xC3, 0x18, 0x96, 0x05, 0x9A, 0x07, 0x12, 0x80, 0xE2, 0xEB, 0x27, 0xB2, 0x75 }, // 0x3
    { 0x09, 0x83, 0x2C, 0x1A, 0x1B, 0x6E, 0x5A, 0xA0, 0x52, 0x3B, 0xD6, 0xB3, 0x29, 0xE3, 0x2F, 0x84 }, // 0x4
    { 0x53, 0xD1, 0x00, 0xED, 0x20, 0xFC, 0xB1, 0x5B, 0x6A, 0xCB, 0xBE, 0x39, 0x4A, 0x4C, 0x58, 0xCF }, // 0x5
    { 0xD0, 0xEF, 0xAA, 0xFB, 0x43, 0x4D, 0x33, 0x85, 0x45, 0xF9, 0x02, 0x7F, 0x50, 0x3C, 0x9F, 0xA8 }, // 0x6
    { 0x51, 0xA3, 0x40, 0x8F, 0x92, 0x9D, 0x38, 0xF5, 0xBC, 0xB6, 0xDA, 0x21, 0x10, 0xFF, 0xF3, 0xD2 }, // 0x7
    { 0xCD, 0x0C, 0x13, 0xEC, 0x5F, 0x97, 0x44, 0x17, 0xC4, 0xA7, 0x7E, 0x3D, 0x64, 0x5D, 0x19, 0x73 }, // 0x8
    { 0x60, 0x81, 0x4F, 0xDC, 0x22, 0x2A, 0x90, 0x88, 0x46, 0xEE, 0xB8, 0x14, 0xDE, 0x5E, 0x0B, 0xDB }, // 0x9
    { 0xE0, 0x32, 0x3A, 0x0A, 0x49, 0x06, 0x24, 0x5C, 0xC2, 0xD3, 0xAC, 0x62, 0x91, 0x95, 0xE4, 0x79 }, // 0xA
    { 0xE7, 0xC8, 0x37, 0x6D, 0x8D, 0xD5, 0x4E, 0xA9, 0x6C, 0x56, 0xF4, 0xEA, 0x65, 0x7A, 0xAE, 0x08 }, // 0xB
    { 0xBA, 0x78, 0x25, 0x2E, 0x1C, 0xA6, 0xB4, 0xC6, 0xE8, 0xDD, 0x74, 0x1F, 0x4B, 0xBD, 0x8B, 0x8A }, // 0xC
    { 0x70, 0x3E, 0xB5, 0x66, 0x48, 0x03, 0xF6, 0x0E, 0x61, 0x35, 0x57, 0xB9, 0x86, 0xC1, 0x1D, 0x9E }, // 0xD
    { 0xE1, 0xF8, 0x98, 0x11, 0x69, 0xD9, 0x8E, 0x94, 0x9B, 0x1E, 0x87, 0xE9, 0xCE, 0x55, 0x28, 0xDF }, // 0xE
    { 0x8C, 0xA1, 0x89, 0x0D, 0xBF, 0xE6, 0x42, 0x68, 0x41, 0x99, 0x2D, 0x0F, 0xB0, 0x54, 0xBB, 0x16 }  // 0xF
};

The 4 left-most bits represent the y axis, the 4 right-most the x axis, therefore we can use a macro definition for the actual lookup:

#define SUB_BYTE(x) (AES_BYTE_SUB[((x) >> 4)][((x) & 0xF)])

↪ Inverse Byte Substitution Table

Like the name says, the Inverse Byte Substitution Table is being used during decryption to map back the enciphered text to the original plain text. It is part of the Inverse Byte Substition Layer.

↪ Generation

static void
gen_inv_bytesub(void)
{
    for(int i = 0; i < 256; i++)
    {
        uint8_t x = i & 0xF;
        uint8_t y = i >> 4;
        uint8_t inverse = AES_BYTE_SUB[y][x];
        x = inverse & 0xF;
        y = inverse >> 4;
        AES_INV_BYTE_SUB[y][x] = i;
    }
    return;
}

Since we already know the Byte Substitution Table, it’s inverse is calculated by taking the value of each element and store it into a new table. Here again, the 4 left-most bits represent the y axis, the 4 right-most the x axis of the table AES_INV_BYTE_SUB:

static const uint8_t AES_INV_BYTE_SUB[16][16] = {
    // 0x0,  0x1,  0x2,  0x3,  0x4,  0x5,  0x6,  0x7,  0x8,  0x9,  0xA,  0xB,  0xC,  0xD,  0xE,  0xF,
    { 0x52, 0x09, 0x6A, 0xD5, 0x30, 0x36, 0xA5, 0x38, 0xBF, 0x40, 0xA3, 0x9E, 0x81, 0xF3, 0xD7, 0xFB }, // 0x0
    { 0x7C, 0xE3, 0x39, 0x82, 0x9B, 0x2F, 0xFF, 0x87, 0x34, 0x8E, 0x43, 0x44, 0xC4, 0xDE, 0xE9, 0xCB }, // 0x1
    { 0x54, 0x7B, 0x94, 0x32, 0xA6, 0xC2, 0x23, 0x3D, 0xEE, 0x4C, 0x95, 0x0B, 0x42, 0xFA, 0xC3, 0x4E }, // 0x2
    { 0x08, 0x2E, 0xA1, 0x66, 0x28, 0xD9, 0x24, 0xB2, 0x76, 0x5B, 0xA2, 0x49, 0x6D, 0x8B, 0xD1, 0x25 }, // 0x3
    { 0x72, 0xF8, 0xF6, 0x64, 0x86, 0x68, 0x98, 0x16, 0xD4, 0xA4, 0x5C, 0xCC, 0x5D, 0x65, 0xB6, 0x92 }, // 0x4
    { 0x6C, 0x70, 0x48, 0x50, 0xFD, 0xED, 0xB9, 0xDA, 0x5E, 0x15, 0x46, 0x57, 0xA7, 0x8D, 0x9D, 0x84 }, // 0x5
    { 0x90, 0xD8, 0xAB, 0x00, 0x8C, 0xBC, 0xD3, 0x0A, 0xF7, 0xE4, 0x58, 0x05, 0xB8, 0xB3, 0x45, 0x06 }, // 0x6
    { 0xD0, 0x2C, 0x1E, 0x8F, 0xCA, 0x3F, 0x0F, 0x02, 0xC1, 0xAF, 0xBD, 0x03, 0x01, 0x13, 0x8A, 0x6B }, // 0x7
    { 0x3A, 0x91, 0x11, 0x41, 0x4F, 0x67, 0xDC, 0xEA, 0x97, 0xF2, 0xCF, 0xCE, 0xF0, 0xB4, 0xE6, 0x73 }, // 0x8
    { 0x96, 0xAC, 0x74, 0x22, 0xE7, 0xAD, 0x35, 0x85, 0xE2, 0xF9, 0x37, 0xE8, 0x1C, 0x75, 0xDF, 0x6E }, // 0x9
    { 0x47, 0xF1, 0x1A, 0x71, 0x1D, 0x29, 0xC5, 0x89, 0x6F, 0xB7, 0x62, 0x0E, 0xAA, 0x18, 0xBE, 0x1B }, // 0xA
    { 0xFC, 0x56, 0x3E, 0x4B, 0xC6, 0xD2, 0x79, 0x20, 0x9A, 0xDB, 0xC0, 0xFE, 0x78, 0xCD, 0x5A, 0xF4 }, // 0xB
    { 0x1F, 0xDD, 0xA8, 0x33, 0x88, 0x07, 0xC7, 0x31, 0xB1, 0x12, 0x10, 0x59, 0x27, 0x80, 0xEC, 0x5F }, // 0xC
    { 0x60, 0x51, 0x7F, 0xA9, 0x19, 0xB5, 0x4A, 0x0D, 0x2D, 0xE5, 0x7A, 0x9F, 0x93, 0xC9, 0x9C, 0xEF }, // 0xD
    { 0xA0, 0xE0, 0x3B, 0x4D, 0xAE, 0x2A, 0xF5, 0xB0, 0xC8, 0xEB, 0xBB, 0x3C, 0x83, 0x53, 0x99, 0x61 }, // 0xE
    { 0x17, 0x2B, 0x04, 0x7E, 0xBA, 0x77, 0xD6, 0x26, 0xE1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0C, 0x7D }  // 0xF
};

The lookup can easily be implemented as a macro definition, too:

#define INV_SUB_BYTE(x) (AES_INV_BYTE_SUB[((x) >> 4)][((x) & 0xF)])

↪ Round Coeffecient Table

In each iteraton of the AES Key Schedule, a round coefficient (RC) is being used to add a level of confusion to the key. Depending on the size of the key, a different number of RCs is being used:

bits    iterations      RCs
-------------------------------
128         10           10
192         12            8
256         14            7

Longer keys require less RCs, shorter keys require more, this is due to the fact that the number depends on the relation between iterations, sub keys and rounds. More on that topic can be found in the next article in the Key Schedule Section.

↪ Generation

static void
gen_rc_table(void)
{
    for(int j = 0; j < AES_WI_RUNS; j++)
    {
        if(j % AES_SUBKEY_PARTS == 0)
        {
            uint32_t rcon = 1 << (j / AES_SUBKEY_PARTS );
            rcon = aes_polynomial_division((uint16_t *) &rcon);
            rcon <<= 24;
            RC[j / AES_SUBKEY_PARTS] = rcon;
        }
    }
    return;
}

Each new subkey part will have it’s own RC, while each RC will be x (NB: x is defined as 0x02 in GF(2^8) power to the value of the current round. This can be realized using shifts and mod p reduction. Furthermore the RC will be applied (XOR later) only to the 24th bit of the round key so we left shift it by 24, and store it in a static table RC:

static const uint32_t RC[10] = {
    0x01000000, 0x02000000, 0x04000000, 0x08000000, 0x10000000,
    0x20000000, 0x40000000, 0x80000000, 0x1b000000, 0x36000000
};

The RC lookup is a simple 1:1 macro definition:

#define AES_RC(x) (RC[(x)])

↪ Multiplication Tables

↪ Disclaimer

One of my goals is to avoid any Finite Field calculation during the runtime of the algorithm and instead, have everything in static lookup tables. I have seen some other implementations where the following calculations are done within a short function. However, here comes mine.

The following tables and macro definitions are used during the Mixed Column Layer. Since the concept is quite easy to understand but the output is relatively large I do not show them here explicitly (they can be found in the library source code anyway). The tables are plain multiplication results of all elements 0x00 to 0xFF with 0x02, 0x03, 0x0E, 0x0B, 0x0D, 0x0D, 0x09 (these factors are defined in the Mixed Column and respectively the Inversed Mixed Column Layer). Because all operations, depending on the multiplication result, may or may not require modulo p reduction, I decided to propagate the values in lookup tables and be able to exclude all Finite Field arithmetic functions from the library during compile time (AES_CALCULATE_LOOKUP_TABLES == 0).

static const uint8_t MULTIPLY_0x2[256] = {..};
static const uint8_t MULTIPLY_0x3[256] = {..};
static const uint8_t MULTIPLY_0xE[256] = {..};
static const uint8_t MULTIPLY_0xB[256] = {..};
static const uint8_t MULTIPLY_0xD[256] = {..};
static const uint8_t MULTIPLY_0x9[256] = {..};
Generation
static void
gen_multiplication(void)
{
    for(int i = 0; i < 256; i++)
    {
        uint16_t result = 0;
        /* 0x2 */
        result = multiply_polynomial(0x2, i);
        MULTIPLY_0x2[i] = aes_polynomial_division(&result);

        /* 0x3 */
        result = multiply_polynomial(0x3, i);
        MULTIPLY_0x3[i] = aes_polynomial_division(&result);

        /* 0x9 */
        result = multiply_polynomial(0x9, i);
        MULTIPLY_0x9[i] = aes_polynomial_division(&result);

        /* 0xE */
        result = multiply_polynomial(0xE, i);
        MULTIPLY_0xE[i] = aes_polynomial_division(&result);

        /* 0xB */
        result = multiply_polynomial(0xB, i);
        MULTIPLY_0xB[i] = aes_polynomial_division(&result);

        /* 0xD */
        result = multiply_polynomial(0xD, i);
        MULTIPLY_0xD[i] = aes_polynomial_division(&result);
    }
    return;
}

The implementation is straight forward, all elements will be calculated within a loop over all possible elements in GF(2^8), the results will be stored in the appropriate multiplication table. The results are accessible using the following macro definitions:

#define AES_MULTIPLY_0x2(x) (MULTIPLY_0x2[(x)])
#define AES_MULTIPLY_0x3(x) (MULTIPLY_0x3[(x)])
#define AES_MULTIPLY_0xE(x) (MULTIPLY_0xE[(x)])
#define AES_MULTIPLY_0xB(x) (MULTIPLY_0xB[(x)])
#define AES_MULTIPLY_0xD(x) (MULTIPLY_0xD[(x)])
#define AES_MULTIPLY_0x9(x) (MULTIPLY_0x9[(x)])

Go to next article: Encryption and Decryption

hoschi@gfuzz:~$

Archive

About

RSS

AES (2/3): A Description of AES Lookup Tables

20 Nov 2018

↪ Proof of concept

↪ Byte Substitution Table

↪ Generation

↪ Inverse Byte Substitution Table

↪ Generation

↪ Round Coeffecient Table

↪ Generation

↪ Multiplication Tables

↪ Disclaimer