[HFC] Hidden Features of User Defined Type in C－Edison.X. Blog

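User-defined types in C comprise struct, union, and enum; C++ adds class. This article focuses on struct in C. The emphasis is on C rather than C++ because a C++ struct does not behave exactly like a C struct. Some of the topics are implementation-dependent; where that is the case, the heading says so.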

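Notes

1. Because time and space are limited, this article is not well suited to beginners. If anything in it is wrong, please point it out.

2. Where I know something to be undefined behavior, I say so.

3. Whether a given piece of syntax is accepted depends partly on compiler support; this article was tested with VC only.

4. There is a lot to say about user-defined types. C relies on four features (macros, union, struct, and function pointers) to build surprisingly elegant techniques, and the examples here are only the tip of the iceberg.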

section 1: enum type

1. Overview

2. Definition and use

3. typedef <1>

4. typedef <2>

5. Avoiding #define macros

6. Avoid arithmetic on enum values

section 2: union type

1. Overview

2. Definition and declaration

3. typedef

4. Anonymous union <pseudo-anonymous>

5. union exercise : detecting big / little endian

6. union exercise : HIWORD / LOWORD

section 3: struct type

1. Overview

2. Definition and use

3. typedef <1>

4. typedef <2>

5. typedef <3>

6. anonymous struct (pseudo-anonymous)

7. Automatic copy on assignment

8. Using a struct as an interface

9. padding / member ordering

10. Getting a member's start offset and the padding after it

11. Forcing a particular padding

12. bit-field

13. struct in union exercise <IEEE754>

14. Other struct topics

Reference

section 1 : enum type

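1. Overview

(1) An enum defines a data type together with the values it is meant to hold. For example, C89 has no bool type (C99 later added _Bool and <stdbool.h>), and one way to get one is to build it from an enum.

(2) Each enumerator may be given an explicit value. The value must be an integer (so it may be negative) and values may repeat, but it cannot be a floating-point number.

(3) Enumeration constants have type int in C, and the enum type itself is compatible with an implementation-defined integer type, in practice usually int. For that reason the bool in point (1) is seldom built from an enum, since it uses more bytes than necessary.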
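
The three points above can be sketched as follows. This is only an illustration: the enum Level and both helper functions are made-up names, not part of the original article.

```c
#include <stdio.h>

/* Hypothetical enum: enumerator values must be integers, may be negative,
   and two enumerators may share one value. */
enum Level { LOW = -1, NORMAL = 0, DEFAULT = 0, HIGH = 1 };

/* Returns 1 when the two enumerators carry the same integer value. */
int same_value(enum Level a, enum Level b)
{
    return a == b;
}

/* Prints how many bytes the compiler uses for the enum type; the result
   is implementation-defined (often the same as sizeof(int)). */
void print_enum_size(void)
{
    printf("sizeof(enum Level) = %u\n", (unsigned)sizeof(enum Level));
}
```

Here same_value(NORMAL, DEFAULT) yields 1 because both enumerators are 0, which is legal but makes the two names indistinguishable at run time.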

2. Definition and use

Code Snippet
#include <stdio.h>
 
enum tagErrorType{
    Success=0, FunError=1, UnknowError=-1
};
int main()
{
         tagErrorType e1 = Success; // fail
    enum tagErrorType e2 = Success; // pass
    enum tagErrorType e3 = 0;       // pass in C , fail in C++    
    return 0;
}

3. typedef <1>

Code Snippet
#include <stdio.h>

enum tagErrorType{
    Success=0, FunError=1, UnknowError=-1
};
   typedef enum tagErrorType ErrorType;  // pass
// typedef      tagErrorType ErrorType2; // fail
 
int main()
{
//    enum ErrorType e1 = Success; // reported to pass on VC in C mode; not portable, since ErrorType is a typedef name, not an enum tag
           ErrorType e2 = Success; // pass    
    return 0;
}

4. typedef <2>

Code Snippet
#include <stdio.h>

typedef enum tagErrorType{
    Success=0, FunError=1, UnknowError=-1
}ErrorType;
 
int main()
{
    enum tagErrorType e1 = Success; // pass (the tag is tagErrorType; ErrorType is the typedef name)
         ErrorType e2 = Success; // pass
    return 0;
}

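5. Avoiding #define macros

Several kinds of macro can reasonably be replaced with an enum, although the Bool shown below is seldom implemented as an enum in practice. The enum {WIDTH, HEIGHT} in the example is an anonymous enum, that is, the enumeration is not given a formal name.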

Code Snippet
#include <stdio.h>
#define W 2
#define H 3
#define TRUE  (1==1)
#define FALSE (0==1)
 
enum {WIDTH=2, HEIGHT=3};
typedef enum tagBool{True=1, False=0}Bool;
 
int main()
{
    int ar1[W][H]={0};          // pass
    int ar2[WIDTH][HEIGHT]={0}; // pass
    Bool repeat = False;   // pass
 
    while(TRUE)   {/* some statement */ break;}
    while(True)   {/* some statement */ break;}
    while(repeat) {/* some statement */ break;}
    return 0;
}

Replacing #define constants with an enum has one limitation: enumerators must be integers, so floating-point values cannot be expressed this way.
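
A small sketch of that limitation, with hypothetical names (ROWS, COLS, SCALE, scaled_sum): integer constants go into an anonymous enum and can be used as array bounds, while the floating-point constant has to live somewhere else, here in a static const double.

```c
/* Hypothetical names for illustration only. */
enum { ROWS = 2, COLS = 3 };        /* integer constants, usable as array bounds */
static const double SCALE = 0.5;    /* a floating value cannot be an enumerator */

/* Sums a ROWS x COLS table and scales the result. */
double scaled_sum(int table[ROWS][COLS])
{
    int r, c, sum = 0;
    for (r = 0; r < ROWS; ++r)
        for (c = 0; c < COLS; ++c)
            sum += table[r][c];
    return sum * SCALE;
}
```

Unlike a #define, the enumerators are visible to a debugger and obey normal scoping rules, which is one common reason for preferring them.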

6. Avoid arithmetic on enum values

Taking the Bool above as an example, consider two cases.

Code Snippet
Bool in;
in = False; // pass
in = !True; // don't do that

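The !True above runs correctly and is semantically sound, but I recommend against it; in fact I think arithmetic and logical operators should be avoided on enum types altogether. Here is an extreme example.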

Code Snippet
#include <stdio.h>
typedef enum tagBool{True=1, False=0}Bool;
int main()
{
    Bool is_true, flag=False;
    is_true = 1000 - flag;
    if(is_true==True) printf("True\n");
    if(is_true==False) printf("False\n");
    printf("is_true = %d\n", is_true); // only output this line
    return 0;
}

The point of this example is that arithmetic on an enum can leave the variable holding a value that matches none of its enumerators.
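
One possible way to guard against this, sketched under the assumption that the Bool type from the example is reused; is_valid_bool and to_bool are hypothetical helpers, not something the original article defines.

```c
typedef enum tagBool { True = 1, False = 0 } Bool;

/* Returns 1 only when b holds one of the two declared enumerators. */
int is_valid_bool(Bool b)
{
    return b == True || b == False;
}

/* Maps any integer to a proper Bool instead of storing it directly. */
Bool to_bool(int value)
{
    return value ? True : False;
}
```

With these, is_true = to_bool(1000 - flag) stores True rather than the stray value 1000.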

section 2 : union type

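1. Overview

(1) A union places all of its members in the same block of memory; its size is at least that of its largest member.

(2) What one member reads back after a different member has been written depends on little / big endian.

(3) A union is seldom used on its own; in low-level code it is usually combined with struct.

2. Definition and declaration

Code Snippet
// definition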
union tagu{
    char x[4];
    int  y;
};
int main()
{         
          tagu u1; // fail
    union tagu u2; // pass
    return 0;
}

3. typedef

Code Snippet
// definition
typedef union tagu{
    char x[4];
    int  y;
}u;
int main()
{         
    union tagu u1; // pass
          tagu u2; // fail
    union    u u3; // fail
             u u4; // pass
    return 0;
}

4. Anonymous union <pseudo-anonymous>

Code Snippet
union {
    char c;
    int  i;
}u;
int main()
{         
    u.i=10;
    u.c='a';
    return 0;
}

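Strictly speaking, the u here is not truly anonymous: it is an untagged union with a named variable. A truly anonymous union is an unnamed member nested inside another struct or union.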
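
For comparison, a truly anonymous union might look like the sketch below. It assumes a C11 compiler (or one that offers anonymous unions as an extension), and struct Token, TokenKind, and make_int_token are hypothetical names.

```c
/* Hypothetical tagged value. The inner union has neither a tag nor a member
   name, so its members are accessed as if they belonged to struct Token. */
enum TokenKind { TOKEN_INT, TOKEN_CHAR };

struct Token {
    enum TokenKind kind;
    union {
        int  i;
        char c;
    };
};

/* Builds a token that currently holds an int. */
struct Token make_int_token(int value)
{
    struct Token t;
    t.kind = TOKEN_INT;
    t.i = value;      /* no intermediate member name is needed */
    return t;
}
```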

5. union exercise : detecting big / little endian

Code Snippet
#include <stdio.h>
union {
    unsigned char c;
    int  i;
}u;
int main()
{         
    u.i = 0x12345678; // with a 4-byte int the first byte is 0x12 on big endian, 0x78 on little endian
    if(u.c==0x12) puts("big endian");
    else puts("little endian");
    return 0;
}

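6. union exercise : HIWORD / LOWORD

The Win32 API has a pair of macros, HIWORD and LOWORD, which extract the upper two bytes and the lower two bytes of a 4-byte value. A union is one way to do the same thing < it may be slightly slower than the bitwise approach >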

Code Snippet
union{
    unsigned x;
    unsigned short y[2];
}u;
unsigned short hi_word(unsigned x){
    u.x = x;
    return u.y[1]; // little endian
}
unsigned short lo_word(unsigned x){
    u.x = x;
    return u.y[0]; // little endian
}
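
For reference, the bitwise approach mentioned above could be sketched like this. The function names are made up, and uint32_t / uint16_t from <stdint.h> are used so the widths do not depend on the platform's unsigned and unsigned short.

```c
#include <stdint.h>

/* Upper 16 bits of a 32-bit value, independent of byte order. */
uint16_t hi_word_bits(uint32_t x)
{
    return (uint16_t)(x >> 16);
}

/* Lower 16 bits of a 32-bit value, independent of byte order. */
uint16_t lo_word_bits(uint32_t x)
{
    return (uint16_t)(x & 0xFFFFu);
}
```

Unlike the union version, the shifts work on the value rather than on its memory layout, so the same code gives the same answer on big endian and little endian machines.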

section 3 : struct type

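1. Overview

(1) A struct groups a set of related data into a new data type.

(2) In C++ struct carries a further meaning: it can serve as the keyword that declares a class.

(3) The lower-level properties of struct are not exactly the same in C and C++.

(4) Do not dig too deep into padding / alignment: it depends on the machine, and even on the compiler.

2. Definition and use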

Code Snippet
struct Point{
    // int x, y;
    int x;
    int y;    
};
 
int main()
{
//    Point pt1; // fail in C, pass in C++
    struct Point pt2 = {1, 2}; // pass
    struct Point pt3;     // pass
    pt3.x = 1, pt3.y = 2; // pass
    return 0;
}

Since C99, initialization may also be written with designated initializers, like this

Code Snippet
struct Point pt = {
    .x = 1, .y = 2
};
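
Two further properties of this C99 syntax, shown in a minimal sketch with a hypothetical struct Rect: members can be named in any order, and members that are not named are zero-initialized.

```c
struct Rect {
    int x, y;
    int w, h;
};

/* Members may be named in any order; members left out (x and y) are zeroed. */
struct Rect make_square(int side)
{
    struct Rect r = { .h = side, .w = side };
    return r;
}
```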

3. typedef <1>

Code Snippet
struct tagLink{
    int    data;
//         tagLink *nxt; // fail
    struct tagLink *nxt; // pass
};
 
int main()
{
    struct tagLink *head;  // pass
           tagLink *trail; // fail
    return 0;
}

4. typedef <2>

Code Snippet
typedef struct taglink1{
    int data;
    struct taglink1 * nxt;
}link1;
int main()
{
    struct taglink1 head1; // pass in C, pass in C++
           taglink1 head2; // fail in C, pass in C++
    struct    link1 head3; // fail in C, fail in C++
              link1 head4; // pass in C, pass in C++
    return 0;
}

5. typedef <3>

Code Snippet
typedef struct taglink link;
struct taglink{
    int data;
    link *next;
};
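
A short sketch of how the forward typedef is used once the struct is complete. The type is repeated so the example stands alone; sum_list is a hypothetical helper.

```c
#include <stddef.h>

typedef struct taglink link;   /* forward typedef: link is usable inside the struct */
struct taglink {
    int   data;
    link *next;
};

/* Adds up the data fields by following next until NULL. */
int sum_list(const link *head)
{
    int sum = 0;
    for (; head != NULL; head = head->next)
        sum += head->data;
    return sum;
}
```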

6. anonymous struct (pseudo-anonymous)

Code Snippet
#include <stdio.h>
struct {
    const char * name;
    int cost;
}Animal[] = {
    { "Cougar", 80 },    { "Tiger",  85 },
    { "Lion",   95 },    { "Monkey", 60 }
};
 
int main()
{
    int i;
    for(i=0; i<sizeof(Animal) / sizeof(Animal[0]); ++i)
        printf("%10s %d\n", Animal[i].name,Animal[i].cost);
    return 0;
}

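As before, the Animal struct here is not a truly anonymous struct; a truly anonymous struct is one contained inside another union or struct. The example only shows that this struct type can be used by Animal and nothing else.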

7. Automatic copy on assignment

Code Snippet
#include <stdio.h>
typedef struct tagArr{int array[5];} Arr;
Arr get_var()  // a local struct variable can be returned by value
{
    Arr arr = { {10,11,12,13,14} };
    return arr;
}
 
int main()
{
    int i;
    // initialize the array member
    Arr arr1 = { {1,2,3,4,5} };
    // receive the returned struct; its members are copied automatically
    Arr arr2 = get_var();        
    for(i=0; i<5; ++i) printf("%d ", arr1.array[i]);
    for(i=0; i<5; ++i) printf("%d ", arr2.array[i]);
    return 0;
}

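If the struct contains a fixed-size array, the array elements are copied one by one; if it contains a pointer (for example to heap memory), only the address value is copied.

In C, however, structs cannot be compared with ==.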
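
Both remarks can be checked with a small sketch. The type Buf and the two functions are hypothetical; the comparison is written member by member because memcmp over a whole struct may also compare padding bytes.

```c
#include <string.h>

/* Hypothetical struct with an embedded array and a pointer member. */
typedef struct tagBuf {
    int  fixed[3];   /* copied element by element on assignment */
    int *shared;     /* only the address is copied */
} Buf;

/* Member-wise comparison, since == is not defined for structs. */
int buf_equal(const Buf *a, const Buf *b)
{
    return memcmp(a->fixed, b->fixed, sizeof a->fixed) == 0
        && a->shared == b->shared;
}

/* Returns 1 if, after a copy, the array is independent but the pointer is shared. */
int copy_is_shallow(void)
{
    int n = 7;
    Buf a = { {1, 2, 3}, &n };
    Buf b = a;          /* struct assignment */
    b.fixed[0] = 9;     /* changes b's own array only */
    *b.shared = 8;      /* changes n, which a.shared also points to */
    return a.fixed[0] == 1 && *a.shared == 8;
}
```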

8. Using a struct as an interface

Putting function pointers inside a struct can be regarded as a kind of interface. The following way of building a library interface is offered for reference.

Code Snippet
#include <math.h>
typedef struct tagMyMath{
    double (*fabs_)(double);
    double (*sin_)(double);
    double (*cos_)(double);
}MyMath;
 
int main()
{
    MyMath lib = {fabs, sin, cos};
    double x = lib.sin_(10.0);
    double y = lib.cos_(10.0);
    double z = lib.fabs_(-10.0);
    return 0;
}

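When simulating a C++ class in C, struct is very commonly used < or rather, always used >. Because a function pointer member cannot be bound to its function inside the struct definition, many implementations write an assignment function that must run before the constructor. In fact, by making good use of struct features, the constructor can be written very simply.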

Code Snippet
#include <stdio.h>
 
// implement those functions
void func1(void) {}
int  func2(void) {return 1;}
int  func3(int x) {return x+1;}
 
typedef struct tagClass Class;
struct tagClass{
    int x, y;
    void (*func1_)();
    int  (*func2_)();
    int  (*func3_)(int);
}ObjInit = {0,0,func1, func2,func3};
 
int main()
{
    Class obj = ObjInit ;
    return 0;
}
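
The same idea can be pushed one step further so that a "method" can reach its own object. The following is only a sketch with hypothetical names (Counter, counter_next, CounterInit, counter_new): the object is passed explicitly as the first argument, and copying a prepared template object plays the role of the constructor.

```c
/* Hypothetical "class" with one method; all names are illustrative. */
typedef struct tagCounter Counter;
struct tagCounter {
    int value;
    int (*next)(Counter *self);   /* the method receives the object explicitly */
};

/* Increments the counter and returns the new value. */
static int counter_next(Counter *self)
{
    return ++self->value;
}

/* Template object: copying it acts as the constructor. */
static const Counter CounterInit = { 0, counter_next };

Counter counter_new(void)
{
    return CounterInit;
}
```

A caller would write Counter c = counter_new(); and then c.next(&c); each object gets its own value while sharing the same function.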

9. padding / member ordering

Start with the following struct

Code Snippet
struct s{
    char s1; // 1 byte
    int  s2; // 4 bytes
    char s3; // 1 byte
};

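Although the members add up to 6 bytes, sizeof(struct s) is usually more than 6: to make access faster, each member is aligned (here the int to a multiple of 4) and padding is inserted accordingly. With the compiler I have at hand, sizeof(struct s) is 12 bytes, laid out roughly like this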

Code Snippet
struct s{
    char s1; // 1 byte , then 3 bytes padding, [0:3]
    int  s2; // 4 bytes, no padding          , [4:7]
    char s3; // 1 byte , then 3 bytes padding, [8:11]
};

There is a simple trick for reducing the effect of padding: put members of the same type next to each other.

Code Snippet
struct s{
    char s1;
    char s2;
    int  s3;
};

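This version ends up using only 8 bytes, fewer than the original 12. Some teams go further and require that members occupying more memory come first and smaller ones come last. In this example, that would mean putting int s3 at the very front.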
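
A quick way to see the effect on a given compiler is to print both sizes. The sketch below uses hypothetical struct names; the actual numbers depend on the compiler and target, so none are promised here.

```c
#include <stdio.h>

struct Loose { char a; int b; char c; };   /* small, large, small */
struct Tight { int b; char a; char c; };   /* largest member first */

/* Prints both sizes; the numbers depend on the compiler and the target. */
void print_sizes(void)
{
    printf("Loose: %u bytes, Tight: %u bytes\n",
           (unsigned)sizeof(struct Loose), (unsigned)sizeof(struct Tight));
}
```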

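10. Getting a member's start offset and the padding after it

To keep the code easy to read, I did not convert to ptrdiff_t for these calculations; strictly speaking, that is the type the intermediate results should have.

In some situations you want to know where a struct member starts once padding has been applied. The idea is simply to subtract addresses.

start of member 1 = address of member 1 in memory - address of the struct in memory
start of member 2 = address of member 2 in memory - address of the struct in memory
start of member 3 = address of member 3 in memory - address of the struct in memory
start of member n = address of member n in memory - address of the struct in memory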

Code Snippet
void find_start_addr()
{
    int start_s1, start_s2, start_s3;
    struct s data;
    start_s1 = (int)((char*)&(data.s1) - (char*)&data);
    start_s2 = (int)((char*)&(data.s2) - (char*)&data);
    start_s3 = (int)((char*)&(data.s3) - (char*)&data);
    printf("%d %d %d\n", start_s1, start_s2, start_s3); // 0 4 8
}

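stddef.h already provides a facility for this. offsetof is in fact a macro; it does not occupy any memory while doing its work, and it is implemented on the same idea (the excerpt below appears to be the C++ branch of VC's header, hence the reinterpret_cast)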

Code Snippet
#ifdef _WIN64
#define offsetof(s,m)   (size_t)( (ptrdiff_t)&reinterpret_cast<const volatile char&>((((s *)0)->m)) )
#else
#define offsetof(s,m)   (size_t)&reinterpret_cast<const volatile char&>((((s *)0)->m))
#endif

A form that may be easier to follow

Code Snippet
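#define offsetof(s, m) ( \
     (size_t)( (char*)&((s*)0)->m - /* address of member m */ \
               (char*)(s*)0))       /* address of struct s */

Remember to remove the comments. It is also easy to use. Note that forming a member address through a null pointer like this is, strictly speaking, undefined behavior in standard C even though common compilers accept it; the offsetof supplied by <stddef.h> is the portable choice.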

Code Snippet
struct s{
    char s1;
    int  s2;
    char s3;
};
 
void find_offset()
{
    int offset_s1, offset_s2, offset_s3;
    offset_s1 = offsetof(struct s, s1);
    offset_s2 = offsetof(struct s, s2);
    offset_s3 = offsetof(struct s, s3);
    printf("%d %d %d\n", offset_s1, offset_s2, offset_s3); // 0 4 8
}
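
For comparison, here is a sketch that relies only on the standard offsetof from <stddef.h> instead of a hand-written macro. print_offsets is a hypothetical name, and the values are cast to unsigned long because offsetof yields a size_t, which %d does not match.

```c
#include <stddef.h>
#include <stdio.h>

struct s {
    char s1;
    int  s2;
    char s3;
};

/* Prints the three offsets using the standard macro; the values are printed
   through unsigned long because offsetof yields a size_t. */
void print_offsets(void)
{
    printf("%lu %lu %lu\n",
           (unsigned long)offsetof(struct s, s1),
           (unsigned long)offsetof(struct s, s2),
           (unsigned long)offsetof(struct s, s3));
}
```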

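To find out how many bytes of padding follow a given member, the same idea applies. (The code below calls this value "offset"; it is the padding after the member.)

padding after member 1 = address of member 2 in memory - address of member 1 in memory - sizeof(member 1)
padding after member 2 = address of member 3 in memory - address of member 2 in memory - sizeof(member 2)
padding after member n = address of member n+1 in memory - address of member n in memory - sizeof(member n) ?? (the last member has no member n+1)

The awkward case is the padding after the last member. The way to compute it is to create a contiguous struct [2] and then..

padding after member n = address of struct [1] in memory - address of member n of struct [0] in memory - sizeof(member n)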

Code Snippet
struct s{
    char s1;
    int  s2;
    char s3;
};
 
void find_offset()
{
    int offset_s1, offset_s2, offset_s3=0;
    struct s data[2];
    offset_s1 = (char*)&(data[0].s2) - (char*)&(data[0].s1) - sizeof(data[0].s1);
    offset_s2 = (char*)&(data[0].s3) - (char*)&(data[0].s2) - sizeof(data[0].s2);
    offset_s3 = (char*)&(data[1])    - (char*)&(data[0].s3) - sizeof(data[0].s3);
    printf("%d %d %d\n", offset_s1, offset_s2, offset_s3); // 3 0 3
}

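When this is turned into macros in the style of offsetof, there is no way to tell whether a member is the last one, so two macros are written. The test code and the macros look roughly like this.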

Code Snippet
#include <stdio.h>
typedef struct tagss{
    char s1;
    int  s2;
    char s3;
}ss;
 
#define offsetof(s, m) (size_t) (                      \
     ((char*)&((s*)0)->m - /* address of member m */   \
      (char*)(s*)0)        /* address of struct s */   \
     )
 
#define paddingof(s,m1,m2) (size_t) (                     \
     ( (char*)&((s*)0)->m2  ) - /* address of member 2 */ \
     ( (char*)&((s*)0)->m1  ) - /* address of member 1 */ \
        sizeof( ((s*)0)->m1 )   /* size of member 1    */ \
 )
 
#define paddingof_last(s,m) (size_t) ( \
         (char*) ( ((s*)0)+1 ) -  /* address of s[1]   */ \
         (char*) (&((s*)0)->m) -  /* address of s[0].m */ \
         sizeof( ((s*)0)->m )     /* size of s[0].m    */ \
     )
 
int main()
{
    printf("offset of s1: %d\n", (int)offsetof(ss, s1)); // 0
    printf("offset of s2: %d\n", (int)offsetof(ss, s2)); // 4
    printf("offset of s3: %d\n", (int)offsetof(ss, s3)); // 8
    printf("padding between s1 and s2: %d\n", (int)paddingof(ss, s1, s2));  // 3
    printf("padding between s2 and s3: %d\n", (int)paddingof(ss, s2, s3));  // 0
    printf("padding of s3            : %d\n", (int)paddingof_last(ss, s3)); // 3
    return 0;
}

As before, remember to remove the comments from the macros.
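
The same quantities can also be derived from the standard offsetof and sizeof, without address arithmetic on a null pointer. This is a sketch under that assumption; PADDING_BETWEEN and PADDING_AFTER_LAST are hypothetical macro names, and sizeof does not evaluate its operand, so ((type *)0)->m is never actually dereferenced.

```c
#include <stddef.h>

typedef struct tagss {
    char s1;
    int  s2;
    char s3;
} ss;

/* Padding between member m1 and the member m2 that follows it. */
#define PADDING_BETWEEN(type, m1, m2) \
    (offsetof(type, m2) - offsetof(type, m1) - sizeof(((type *)0)->m1))

/* Padding after the last member m: sizeof(type) already includes it,
   so no two-element array is needed. */
#define PADDING_AFTER_LAST(type, m) \
    (sizeof(type) - offsetof(type, m) - sizeof(((type *)0)->m))
```

On a layout like the one described above (offsets 0, 4, 8 and a total size of 12) these would give 3, 0, and 3, matching the hand-written macros.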

11. Forcing a particular padding

In some situations we want a struct to have no padding at all. This requires compiler-specific directives.

Code Snippet
/* pragma pack sample in VC */
struct s1{
    char s11; // 1 byte , 3 bytes padding, [0:3]
    int  s12; // 4 bytes, no padding     , [4:7]
    char s13; // 1 byte , 3 bytes padding, [8:11]
}; // sizeof(struct s1) = 12
 
#pragma pack(push,1)
struct s2{
    char s11; // 1 byte , no padding, [0:0]
    int  s12; // 4 bytes, no padding, [1:4]
    char s13; // 1 byte , no padding, [5:5]
}; // sizeof(struct s2) = 6
#pragma pack(pop)
 
#pragma pack(push,2)
struct s3{
    char s11; // 1 byte , 1 byte padding, [0:1]
    int  s12; // 4 bytes, no padding    , [2:5]
    char s13; // 1 byte , 1 byte padding, [6:7]
}; // sizeof(struct s3) = 8
#pragma pack(pop)

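For more #pragma pack examples, see MSDN. It is advisable not to use a bare #pragma pack(n), because it stays in effect and changes the alignment of the structs that follow.

The comparable facility in gcc is __attribute__ ((packed, aligned(2))); look up its usage if you are interested.

Think carefully before using compiler directives of this kind: they are a trade-off between speed and space.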
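
A minimal sketch of how a packed struct might be used, assuming a compiler that accepts #pragma pack(push, n) / #pragma pack(pop) (VC does, and gcc and clang accept the same pragma). struct Packet and packet_length are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#pragma pack(push, 1)
struct Packet {          /* hypothetical wire-format header */
    char tag;
    int  length;         /* not naturally aligned once packing is 1 */
    char flags;
};
#pragma pack(pop)

/* Copies the member out with memcpy. Taking &p->length and dereferencing it
   as an ordinary int * is the risky pattern on targets that fault on
   unaligned access; copying the bytes avoids it. */
int packet_length(const struct Packet *p)
{
    int n;
    memcpy(&n, (const char *)p + offsetof(struct Packet, length), sizeof n);
    return n;
}
```

The push / pop pair keeps the packing change local to this one struct, which is the reason given above for avoiding a bare #pragma pack(n).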

12. bit-field

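(1) Bit-field layout depends on the machine and the compiler.

(2) Bit-fields are most often seen in a struct, but a union can have them too.

In some situations certain struct members only ever use a few bits, and the struct can then be defined like this.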

Code Snippet
typedef struct tags1{
    unsigned char m1 : 4;  /* m1 takes 4  bits */
    unsigned char m2 : 4;  /* m2 takes 4  bits */
             int  m3 : 10; /* m3 takes 10 bits */
} s1;

This example does not show much of an effect, so look at another one.

Code Snippet
typedef struct tagBitField{
    char m0:1;
    char m1:1;
    char m2:1;
    char m3:1;
    char m4:1;
    char m5:1;
    char m6:1;
    char m7:1;
}BitField;

The BitField above occupies exactly 1 byte. If you only want to reserve a certain number of bits without a member variable to hold them, simply leave out the name.

Code Snippet
typedef struct tags{
    int m1:2; // m1   bit [0:1]
    int   :14;
    int m3:1;
}s;

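The second field above reserves 14 bits without any variable to store them.

Note that once a struct uses bit-fields, macros such as the offsetof above can no longer be applied to those members, because the address of a bit-field cannot be taken with &.

In addition, C has no bit-field arrays, so the following is an error.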

Code Snippet
typedef struct tagBitField{
    char arr[8]:1; // syntax error
}BitField;

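There is currently no way to meet this need, so using union + struct bit-field to display a value in binary is not possible either. Separately, a bit-field width may be 0, which means: pad up to the next allocation unit, as shown below.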

Code Snippet
typedef struct tags{
    int m1:2; // m1   bit [0:1]
    int   :0; // pad to the next int boundary
    int m3:1;
}s;

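s works out to 8 bytes. Also note that a bit-field width cannot exceed the number of bits in its declared type, as in char m:10; this normally produces an error at compile time (a few C++ compilers accept it, but it is still better avoided).

A zero-width bit-field can never have a member name; it can only be written in the form type : 0.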
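
A small sketch that pulls these rules together, using a hypothetical struct Flags. The fields are declared unsigned int so that they never hold negative values; storing an out-of-range value into an unsigned bit-field keeps only the low bits.

```c
/* Hypothetical flag word; unsigned int keeps the fields non-negative. */
struct Flags {
    unsigned int ready : 1;
    unsigned int mode  : 3;   /* holds 0..7 */
    unsigned int       : 4;   /* unnamed: reserves 4 bits */
    unsigned int count : 8;   /* holds 0..255 */
};

/* Stores mode and returns what the field actually holds; values outside
   0..7 are reduced modulo 8 for an unsigned 3-bit field. */
unsigned store_mode(struct Flags *f, unsigned mode)
{
    f->mode = mode;
    return f->mode;
}
```

For example, store_mode(&f, 9) returns 1, since 9 does not fit in 3 bits.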

13. struct in union exercise <IEEE754>

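This is another trade-off. Most of what struct in union does can also be done with pointers and bitwise operations; struct in union is simply clearer during development. The catch is that struct padding is often hard to control, especially for bit-fields.

Take single-precision IEEE754 as an example. Two assumptions are needed for the discussion to make sense: (1) little endian, and (2) the struct is 4 bytes after padding. The example is offered for reference.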

Code Snippet
#include <stdio.h>
typedef union tagSingle{
    struct { /* anonymous struct */        
        unsigned int  mantissa  : 23;
        unsigned int  exponment : 8;
        unsigned int  sign : 1;
    };
    float        dec;
    unsigned int hex;
}Single;
 
int main()
{
    Single s;
    s.dec = 2.75;
    printf("dec       : %f\n", s.dec);
    printf("hex       : %x\n", s.hex);
    printf("sign      : %x\n", s.sign);
    printf("exponment : %x\n", s.exponment);
    printf("mantissa  : %x\n", s.mantissa);
    return 0;
}

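Notice that the members inside the struct all use the same type. If, say, sign were declared as char : 1 and exponment as char : 8, the result after padding would be sizeof(struct ) = 8, because the layout is then treated as two chars plus one unsigned, and the second char is padded out to 4 bytes. Double precision works the same way, so it is not shown.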
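
The alternative mentioned at the start of this section, copying the bits out and using bitwise operations, could be sketched as follows. It assumes float is a 32-bit IEEE754 binary32; SingleParts and split_single are hypothetical names. Because it works on the value of the 32-bit word, it does not depend on bit-field layout or padding.

```c
#include <stdint.h>
#include <string.h>

/* Decoded fields of a single-precision value; assumes float is binary32. */
typedef struct tagSingleParts {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, biased by 127 */
    uint32_t mantissa;  /* 23 bits */
} SingleParts;

SingleParts split_single(float value)
{
    uint32_t bits;
    SingleParts p;
    memcpy(&bits, &value, sizeof bits);   /* copy the object representation */
    p.sign     = bits >> 31;
    p.exponent = (bits >> 23) & 0xFFu;
    p.mantissa = bits & 0x7FFFFFu;
    return p;
}
```

For 2.75f the word is 0x40300000, so the fields come out as sign 0, exponent 0x80, and mantissa 0x300000.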

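14. Other struct topics

As far as I know, struct has two more large topics: OOC (Object-Oriented C) and FSM implementation. Because this article is already long, they are not discussed here; I will cover them in separate articles when time permits.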

Reference

1. Expert C language

2. The Standards and Implementations of the C Programming Language

3. IEEE754 Converter <online, for 32-bit only>

4. Double (IEEE754 Double precision 64-bit) <online, for 64-bit only>

5. IEEE754 Analysis <online, Rounding Mode, Binary32 64 128>