[HFC] Hidden Features of union + struct in C－Edison.X. Blog

This post continues [HFC] Hidden Features of User Defined Type in C.

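Honestly, union does not have much of a "hidden feature". In everyday higher-level code you will rarely run into it, while in low-level code it is often combined with struct. In the early days, one reason to use union was memory: when machines had only a few hundred KB, it was one of the tricks for saving space.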

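If you are not familiar with how union works, please read up on it first. Also, I have not gone back to check which of the constructs below the standard does not accept. For example, the anonymous (unnamed) struct and union members used throughout this post only became standard in C11, although many compilers accepted them earlier as an extension.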

Since books rarely show union examples (most mention it briefly and move on), this post is mostly made of examples.

Since syntax analysis (parsing) is not my strength, union examples from that area are left out.

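Incidentally, in C++ union is one of the class keys alongside class and struct, so a union can also have member functions, constructors and a destructor.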

Multiplication table in 8 bits of memory

Think about how the multiplication table is usually written.

Code Snippet
for(int i=2; i<=9; ++i){
    for(int j=1; j<=9; ++j)
        printf("%d * %d = %2d\n", i, j, i*j);
    puts("");
}

Here i and j are ints; if sizeof(int) is 4 bytes, together they take 8 bytes (64 bits). Yet i and j only count up to 9 (10 when the loop exits), which fits in 4 bits each.

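There are roughly two approaches: bitwise hacking, or the bit-field feature of struct. With bitwise operations, you declare one unsigned char and store i in the upper 4 bits and j in the lower 4 bits; every comparison and addition then has to extract the upper or lower 4 bits first. That clearly takes more effort to code, so it is not included here.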
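For comparison only, a rough sketch of that bitwise version, assuming i lives in the high nibble and j in the low nibble of a single unsigned char. The helpers hi, lo, set_hi and set_lo are hypothetical names introduced just to keep the masking readable.

```c
#include <stdio.h>

/* Hypothetical helpers: the high nibble of v holds i, the low nibble holds j. */
static unsigned hi(unsigned char v) { return v >> 4; }
static unsigned lo(unsigned char v) { return v & 0x0Fu; }

/* Return v with its high nibble replaced by the low 4 bits of x. */
static unsigned char set_hi(unsigned char v, unsigned x)
{
    return (unsigned char)((v & 0x0Fu) | ((x & 0x0Fu) << 4));
}

/* Return v with its low nibble replaced by the low 4 bits of x. */
static unsigned char set_lo(unsigned char v, unsigned x)
{
    return (unsigned char)((v & 0xF0u) | (x & 0x0Fu));
}

/* 2..9 multiplication table; the only loop state is the single byte v. */
void print_table_packed(void)
{
    unsigned char v = 0;
    for (v = set_hi(v, 2); hi(v) <= 9; v = set_hi(v, hi(v) + 1)) {
        for (v = set_lo(v, 1); lo(v) <= 9; v = set_lo(v, lo(v) + 1))
            printf("%u * %u = %2u\n", hi(v), lo(v), hi(v) * lo(v));
        puts("");
    }
}
```

Every test and increment goes through a shift and a mask, which is exactly the extra work that the bit-field version avoids.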

The approach using struct bit-fields looks like this.

Code Snippet
struct {
    unsigned char i:4;
    unsigned char j:4;
}var;
 
for(var.i=2; var.i<=9; ++var.i){
    for(var.j=1; var.j<=9; ++var.j)
        printf("%d * %d = %2d\n", var.i, var.j, var.i*var.j);
    puts("");
}

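Just create a struct whose two members are each given 4 bits, and operate on them directly. Strictly speaking, unsigned char as a bit-field type is implementation-defined (the standard only guarantees _Bool, signed int and unsigned int), but GCC, Clang and MSVC accept it and make this struct 1 byte; with unsigned int bit-fields it would typically occupy 4 bytes.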
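One thing the loops rely on implicitly: a 4-bit unsigned field holds only 0 to 15, and anything stored into it is reduced modulo 16. The loops above exit at 10, so they are fine, but a bound of 15 (var.j <= 15) would never terminate. A small sketch of the wrap-around, using a hypothetical struct Nibbles with the same layout as var:

```c
/* Same layout as var above. unsigned char bit-fields are an
   implementation-defined extension that GCC, Clang and MSVC accept. */
struct Nibbles {
    unsigned char i:4;
    unsigned char j:4;
};

/* Stores 15 in a 4-bit field and increments it: 16 does not fit,
   so the stored value wraps around modulo 2^4 to 0. */
unsigned wrap_demo(void)
{
    struct Nibbles n = { 0, 15 };
    ++n.j;
    return n.j;
}
```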

bit pattern

If you often need to check whether bit n of a number is 1 or 0, you can do this:

Code Snippet
int GetBit(unsigned char x, size_t idx)
{
    return ( x & (1<<idx) )!=0;
}
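The same mask idea also changes bits. A minimal sketch in the style of GetBit; SetBit, ClearBit and ToggleBit are hypothetical names, and idx is assumed to be in 0..7.

```c
#include <stddef.h>

/* Each function returns a copy of x with bit idx (0 = least significant) changed. */
unsigned char SetBit(unsigned char x, size_t idx)
{
    return (unsigned char)(x | (1u << idx));    /* force the bit to 1 */
}

unsigned char ClearBit(unsigned char x, size_t idx)
{
    return (unsigned char)(x & ~(1u << idx));   /* force the bit to 0 */
}

unsigned char ToggleBit(unsigned char x, size_t idx)
{
    return (unsigned char)(x ^ (1u << idx));    /* flip the bit */
}
```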

Another way is to put a struct inside a union:

Code Snippet
#include <stdio.h>

union U8{
    unsigned char val;
    struct{ unsigned char b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1; }; // anonymous struct (C11)
};
void print_binary(union U8 var)
{
    printf("%d%d%d%d%d%d%d%d\n",
        var.b7,var.b6,var.b5,var.b4,
        var.b3,var.b2,var.b1,var.b0);
}
int main()
{    
    union U8 var;
    var.val = 10;
    print_binary(var); // display 00001010
    return 0;
}
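print_binary relies on b0 being the least significant bit of val. The order in which bit-fields are allocated inside their storage unit is implementation-defined; mainstream compilers on little-endian targets such as x86 start at the least significant bit, which is why the output above matches. A small run-time sanity check (u8_is_lsb_first is a hypothetical name) can confirm the assumption on a given compiler:

```c
/* Same union as above (anonymous struct member: C11). */
union U8 {
    unsigned char val;
    struct { unsigned char b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1; };
};

/* Returns 1 if b0 is the least significant bit of val and b7 the most
   significant, which is the order print_binary assumes. */
int u8_is_lsb_first(void)
{
    union U8 u;
    u.val = 0x01;
    if (u.b0 != 1 || u.b7 != 0)
        return 0;
    u.val = 0x80;
    return u.b0 == 0 && u.b7 == 1;
}
```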

< Writing this makes one wish printf had a %b ... (C23 finally adds one) >

This example is a poor one, because with GetBit(x, idx) the index can be a variable, so you can simply loop over it:

Code Snippet
for(int idx=7; idx>=0; --idx) // most significant bit first, same order as print_binary
    printf("%d", GetBit(x, idx));

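Moreover, in some situations idx has to be computed first and then passed in, which struct bit-fields cannot do: an important restriction of bit-fields is that they cannot be declared as arrays. That is,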

Code Snippet
union U8{
    unsigned char val;
    struct {unsigned char b[8]:1;};
};

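The declaration above is invalid. Without arrays, 8 bits can still be spelled out by brute force as shown earlier, but what about 32 bits? With this design you would most likely write every field out up to unsigned int b31 : 1 rather than use four union U8. Four union U8 are less convenient to access than a single U32, unless you really need to know which bit within BYTE0, BYTE1, BYTE2 or BYTE3 is involved.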
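If per-byte bit names really are needed, the two can be combined. A sketch assuming a little-endian machine (so byte[0] is BYTE0, the least significant byte) and the same LSB-first bit-field order as union U8. union U32 and GetBit32 are hypothetical names; GetBit32 shows the run-time indexing that named bit-fields cannot provide.

```c
#include <stdint.h>

/* Same union U8 as above (anonymous struct member: C11). */
union U8 {
    unsigned char val;
    struct { unsigned char b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1; };
};

/* One 32-bit value viewed as four union U8.
   Assumes little endian: byte[0] is BYTE0 (bits 0..7), byte[3] is BYTE3 (bits 24..31). */
union U32 {
    uint32_t val;
    union U8 byte[4];
};

/* Run-time bit index over all 32 bits (idx must be < 32). */
int GetBit32(uint32_t x, unsigned idx)
{
    return (int)((x >> idx) & 1u);
}
```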

Print Type

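A fairly well-known piece of old teaching code uses enum, struct and union all at once. It behaves somewhat like C++ cout: you can print a value without worrying about its data type. Since I consider it a rare and impressive example, I include a fairly complete version of the code.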

First, define an enum naming the supported data types.

Code Snippet
typedef enum { INTEGER, POINTER } Type;

Then a struct that wraps a union:

Code Snippet
typedef struct
{
    Type type;
    union {
        int integer;
        void *pointer;
    } ;
} Value;

Next comes a corresponding function for each data type.

Code Snippet
Value make_val_int(int x) {
    Value v={INTEGER};
    v.integer = x;
    return v;
}
Value make_val_ptr(void * ptr)
{
    Value v={POINTER};
    v.pointer = ptr;
    return v;
}

With the make ("new") functions done, write a print function.

Code Snippet
void PrintVal(Value v)
{
    switch(v.type){
    case INTEGER: printf("%d\n", v.integer); break;
    case POINTER: printf("%p\n", v.pointer); break;
    }
}

To use it:

Code Snippet
Value var;
var = make_val_int(10);
PrintVal(var);
var = make_val_ptr(&var);
PrintVal(var);

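The advantage is that one block of memory, the size of the struct (8 bytes on a typical 32-bit platform, usually 16 on a 64-bit one because the pointer is 8 bytes and the tag gets padded), can serve several different data types. The price is that every make_val_xxx and PrintVal has to be written by hand. Higher-level languages with an automatic data type (the Automation VARIANT), such as VB and AutoIt, use the same technique, and so do OLE and COM.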
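To make the "written by hand" cost concrete, a sketch that adds a third type. DOUBLE, make_val_dbl and format_val are hypothetical additions; format_val writes into a caller-supplied buffer with snprintf instead of printing, which also makes it easy to check. Each new type touches four places: the enum, the union, a make function and the switch.

```c
#include <stdio.h>

typedef enum { INTEGER, DOUBLE, POINTER } Type;

typedef struct {
    Type type;            /* tag: says which union member is valid */
    union {               /* anonymous union (C11) */
        int    integer;
        double dbl;
        void  *pointer;
    };
} Value;

Value make_val_int(int x)    { Value v = { INTEGER }; v.integer = x; return v; }
Value make_val_dbl(double x) { Value v = { DOUBLE };  v.dbl = x;     return v; }
Value make_val_ptr(void *p)  { Value v = { POINTER }; v.pointer = p; return v; }

/* Formats v according to its tag; returns snprintf's result (-1 for an unknown tag). */
int format_val(char *buf, size_t size, Value v)
{
    switch (v.type) {
    case INTEGER: return snprintf(buf, size, "%d", v.integer);
    case DOUBLE:  return snprintf(buf, size, "%g", v.dbl);
    case POINTER: return snprintf(buf, size, "%p", v.pointer);
    }
    return -1;
}
```

For example, format_val(buf, sizeof buf, make_val_dbl(1.5)) leaves "1.5" in buf and returns 3.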

INPUT struct

Another, better example is probably Microsoft's definition of the INPUT struct (declared in <winuser.h> and passed to SendInput).

Code Snippet
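typedef struct tagINPUT {
    DWORD type;   // INPUT_MOUSE, INPUT_KEYBOARD or INPUT_HARDWARE: selects the union member
    union {
        MOUSEINPUT    mi;
        KEYBDINPUT    ki;
        HARDWAREINPUT hi;
    };            // named DUMMYUNIONNAME in the header, anonymous where the compiler supports it
} INPUT, *PINPUT;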

MOUSEINPUT, KEYBDINPUT and HARDWAREINPUT are each separate structs. Along the same lines, to store a sequence of your own input events, you could consider something like this:

Code Snippet
typedef struct {
    enum {EventKeyPress, EventKeyRelease, 
          EventMousePress, EventMouseRelease} EventType;
    union{
        unsigned int KeyCode; // use for EventKeyxxx
        struct { // use for EventMousexxx
            int x, y;
            unsigned ButtonCode;
        };
    };
}InputEvent;
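A sketch of how such a record might be filled and consumed. make_key_press, make_mouse_press and event_summary are hypothetical helpers; the tag (here spelled EventType) decides which union member is meaningful. The struct is repeated so the block compiles on its own.

```c
#include <stdio.h>

typedef struct {
    enum { EventKeyPress, EventKeyRelease,
           EventMousePress, EventMouseRelease } EventType;
    union {
        unsigned int KeyCode;      /* valid for EventKeyxxx   */
        struct {                   /* valid for EventMousexxx */
            int x, y;
            unsigned ButtonCode;
        };
    };
} InputEvent;

InputEvent make_key_press(unsigned int key)
{
    InputEvent e = { EventKeyPress };
    e.KeyCode = key;
    return e;
}

InputEvent make_mouse_press(int x, int y, unsigned button)
{
    InputEvent e = { EventMousePress };
    e.x = x;
    e.y = y;
    e.ButtonCode = button;
    return e;
}

/* Writes a one-line description; reads only the union member the tag selects. */
int event_summary(char *buf, size_t size, const InputEvent *e)
{
    switch (e->EventType) {
    case EventKeyPress:
    case EventKeyRelease:
        return snprintf(buf, size, "key %u %s", e->KeyCode,
                        e->EventType == EventKeyPress ? "down" : "up");
    case EventMousePress:
    case EventMouseRelease:
        return snprintf(buf, size, "button %u %s at (%d,%d)", e->ButtonCode,
                        e->EventType == EventMousePress ? "down" : "up", e->x, e->y);
    }
    return -1;
}
```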

IEEE 754 field analysis

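Note that much of this can also be done portably with frexp and ldexp from <math.h>, which split a floating-point value into a fraction and a power of two and put them back together.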
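For reference, a sketch of that portable route. frexp returns a fraction with magnitude in [0.5, 1) plus a power of two, while IEEE 754 normalizes to [1, 2), so for a normal nonzero number the unbiased IEEE exponent is one less than the frexp exponent. ieee_exponent and round_trip are hypothetical helper names.

```c
#include <math.h>

/* Unbiased IEEE 754 exponent of a normal, nonzero x.
   frexp: x == frac * 2^e with 0.5 <= |frac| < 1; IEEE uses 1 <= |significand| < 2. */
int ieee_exponent(double x)
{
    int e;
    frexp(x, &e);
    return e - 1;
}

/* ldexp(frac, e) computes frac * 2^e exactly, undoing frexp. */
double round_trip(double x)
{
    int e;
    double frac = frexp(x, &e);
    return ldexp(frac, e);
}
```

For -1.5, frexp yields -0.75 and e = 1, i.e. an unbiased exponent of 0, which is the biased value 127 stored in a float's exponent field.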

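union and struct are a very good fit for analyzing the fields of a floating-point number: C permits reading a union member other than the one last written (the stored bytes are simply reinterpreted), whereas in C++ this is undefined behavior. First, write a helper function that converts an unsigned integer into a binary string of a specified width.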

Code Snippet
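#include <stddef.h>
#include <stdint.h>

// Writes the low `width` bits of x into dst, most significant bit first.
// Requires 1 <= width <= 32; dst needs room for width + 1 chars.
char * to_binary(char * dst, uint32_t x, size_t width)
{
    uint32_t mask = (uint32_t)1 << (width-1);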
    size_t i=0;
    while(mask){
        dst[i] = '0' + ( (x&mask)!=0 );
        ++i;
        mask>>=1;
    }
    dst[i] = 0;
    return dst;
}

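Next, define the union and its struct fields; for the field layout, see Wikipedia's article on the single-precision floating-point format.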

Code Snippet
typedef union tagFloat{
    float    val;
    uint32_t hex;
    struct {
        uint32_t mantissa  :23; // bit[0:22]
        uint32_t exponent  : 8; // bit[23:30]
        uint32_t sign      : 1; // bit[31]
    };
}Float;

The rest involves no tricks; here is the test code.

Code Snippet
#include <stdio.h>

int main()
{
    char str_exp[30], str_man[30];
    Float f;
    f.val = -1.5;

    to_binary(str_exp, f.exponent, 8);
    to_binary(str_man, f.mantissa, 23);

    printf("Dec         : %f\n", f.val);
    printf("Hex         : %08x\n", (unsigned)f.hex);
    printf("Sign        : %u\n", (unsigned)f.sign);
    printf("Exponent    : %08x < %s > \n", (unsigned)f.exponent, str_exp);
    printf("Mantissa    : %08x < %s > \n", (unsigned)f.mantissa, str_man);
 
    return 0;
}

Setting bits works too: add another anonymous struct inside the union whose fields bit0 through bit31 are each : 1 < if you are willing to write all that out...>.
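Writing through the fields works just like reading them. A minimal sketch that builds a float from its parts, under the same little-endian, LSB-first bit-field assumption as the Float union above (repeated so the block compiles alone); make_float is a hypothetical helper.

```c
#include <stdint.h>

typedef union tagFloat {
    float    val;
    uint32_t hex;
    struct {                     /* little-endian, LSB-first bit-field order assumed */
        uint32_t mantissa : 23;
        uint32_t exponent :  8;
        uint32_t sign     :  1;
    };
} Float;

/* Builds (-1)^sign * 1.mantissa * 2^(exponent - 127) by writing the bit-fields. */
float make_float(unsigned sign, unsigned exponent, uint32_t mantissa)
{
    Float f;
    f.hex = 0;                   /* start from all-zero bits */
    f.sign = sign;
    f.exponent = exponent;
    f.mantissa = mantissa;
    return f.val;
}
```

For instance, 10.0 is 1.25 * 2^3, so make_float(0, 127 + 3, 0x200000) gives 10.0f (0x200000 / 2^23 = 0.25).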

Once 32 bits is done, the 64-bit double is no problem either: it has 1 sign bit, 11 exponent bits (bias 1023) and 52 mantissa bits.
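A sketch of that 64-bit counterpart, assuming the same little-endian, LSB-first layout as Float; bit-fields of type uint64_t are another implementation-defined extension that GCC and Clang accept. The name Double is hypothetical.

```c
#include <stdint.h>

typedef union tagDouble {
    double   val;
    uint64_t hex;
    struct {                      /* little-endian, LSB-first bit-field order assumed */
        uint64_t mantissa : 52;   /* bit[0:51]  */
        uint64_t exponent : 11;   /* bit[52:62], bias 1023 */
        uint64_t sign     :  1;   /* bit[63]    */
    };
} Double;
```

It is used exactly like Float: for -1.5 the sign is 1, the exponent field is 1023 and the mantissa is 0x8000000000000, the ".1" of binary 1.1.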

Endian

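For the IEEE 754 layout above, Wikipedia describes the sign as bit 31 and the exponent as bits 30 to 23, yet the example lists mantissa first, then exponent, then sign. The reason is the assumption that the client machine is little endian. Strictly, the order in which bit-fields are allocated is implementation-defined, but in practice compilers for little-endian targets place the first field at the least significant bit, while those for big-endian targets start at the most significant bit.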

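In general, whenever a union relies on the order of bits or bytes, you have to account for the difference between little endian and big endian. There are many ways to detect the byte order, but note that here the check must be a macro (evaluated by the preprocessor), not a function: the struct definition has to be settled at compile time, so a run-time test comes too late.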

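The first way to solve this is to have the user define the choice explicitly, roughly like this. The code below, covering big and little endian, is for reference only < I have no big-endian machine at hand to verify it >.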

Code Snippet
#ifdef FLOAT_BIG_ENDIAN // user-defined name; avoid BIG_ENDIAN, which glibc's <endian.h> defines on every platform
typedef union tagFloat{
    float    val;
    uint32_t hex;
    struct {
        uint32_t sign      : 1;
        uint32_t exponent  : 8;
        uint32_t mantissa  :23;
    };
}Float;
#else // for little endian define struct
typedef union tagFloat{
    float    val;
    uint32_t hex;
    struct {
        uint32_t mantissa  :23;
        uint32_t exponent  : 8;
        uint32_t sign      : 1;
    };
}Float;
#endif

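Another approach is to let the program detect the byte order automatically. A trick often seen for this compares the multi-character constant 'ABCD' with 0x41424344 and similar values, but the value of a multi-character constant is implementation-defined, and GCC and Clang compute it the same way on every target, so that comparison cannot reveal the byte order. The code below, for reference, relies instead on the __BYTE_ORDER__ macro that GCC and Clang predefine.

Code Snippet
#include <stdint.h>

/* __BYTE_ORDER__ and __ORDER_{LITTLE,BIG,PDP}_ENDIAN__ are predefined by GCC and Clang */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
typedef union tagFloat{
    float    val;
    uint32_t hex;
    struct {
        uint32_t mantissa  :23;
        uint32_t exponent  : 8;
        uint32_t sign      : 1;
    };
}Float;
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
typedef union tagFloat{
    float    val;
    uint32_t hex;
    struct {
        uint32_t sign      : 1;
        uint32_t exponent  : 8;
        uint32_t mantissa  :23;
    };
}Float;
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_PDP_ENDIAN__
/* other define */
#else
/* other define (compiler does not predefine __BYTE_ORDER__) */
#endif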
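A third option sidesteps bit-field ordering altogether: copy the float into a uint32_t with memcpy and extract the fields with shifts and masks. Shifts operate on the integer value, not on its byte layout, so this works whenever float and uint32_t share the same byte order (true on virtually all current platforms, but still an assumption). FloatFields and decode_float are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "assumes a 32-bit float");

typedef struct {
    unsigned sign;       /* 1 bit */
    unsigned exponent;   /* 8 bits, biased by 127 */
    uint32_t mantissa;   /* 23 bits, without the implicit leading 1 */
} FloatFields;

FloatFields decode_float(float x)
{
    uint32_t bits;
    FloatFields r;
    memcpy(&bits, &x, sizeof bits);            /* copy the raw bits, no union needed */
    r.sign     = (unsigned)(bits >> 31);
    r.exponent = (unsigned)((bits >> 23) & 0xFFu);
    r.mantissa = bits & 0x7FFFFFu;
    return r;
}
```

No #if is needed here, because nothing depends on how the compiler lays out bit-fields.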