
Building a Unicode Code Table for a GB2312 Chinese Font Library


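Embedded systems can hardly avoid handling Chinese characters. The usual approach (taking SMS reception on a mobile phone as an example) goes like this: say you receive a text message which, once decoded, is represented in UTF-16. We then need to use each character's Unicode code to find its position in the GB2312 font library, and draw it on the screen with the corresponding dot-matrix data.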

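So there has to be some way to associate Unicode codes with the glyph data. The most common way is to build a Unicode code table: once a matching Unicode code is found in that array, the index of the match is used to fetch the data from a second array, whose glyph records are stored in the same index order, and that data is what gets displayed.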

+--------------+  table lookup  +--------------------+  same index  +------------------+
| Unicode code | -------------> | Unicode code table | -----------> | glyph data array |
+--------------+                +--------------------+              +------------------+
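The lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `demo_uni_table`, `demo_glyphs`, `find_glyph_index` and `find_glyph` are hypothetical names, the four codes are a tiny excerpt, the glyph bytes are placeholders, and a 16x16 monochrome glyph (32 bytes) is assumed. It relies on the table being sorted in ascending order so that the standard `bsearch` can be used.

```c
#include <stddef.h>
#include <stdlib.h>

#define GLYPH_BYTES 32  /* assumption: 16x16 monochrome bitmap, 2 bytes per row */

/* Hypothetical 4-entry excerpt; a real table holds every code, sorted ascending. */
static const unsigned short demo_uni_table[] = { 0x4E00, 0x4E01, 0x4E03, 0x4E07 };

/* Placeholder glyph bitmaps, stored in the same order as demo_uni_table. */
static const unsigned char demo_glyphs[4][GLYPH_BYTES] = {
    { 0x01 }, { 0x02 }, { 0x03 }, { 0x04 }
};

/* Comparator for bsearch: orders two 16-bit codes ascending. */
static int cmp_code(const void *key, const void *elem)
{
    unsigned short k = *(const unsigned short *)key;
    unsigned short e = *(const unsigned short *)elem;
    return (k > e) - (k < e);
}

/* Returns the index of `code` in the table, or -1 if it is not present. */
static int find_glyph_index(unsigned short code)
{
    const unsigned short *hit = bsearch(&code, demo_uni_table,
                                        sizeof demo_uni_table / sizeof demo_uni_table[0],
                                        sizeof demo_uni_table[0], cmp_code);
    return hit ? (int)(hit - demo_uni_table) : -1;
}

/* Returns the glyph bitmap stored at the same index, or NULL if the code is unknown. */
static const unsigned char *find_glyph(unsigned short code)
{
    int idx = find_glyph_index(code);
    return idx < 0 ? NULL : demo_glyphs[idx];
}
```

Keeping the two arrays in exactly the same order is what makes the scheme work: the code table only answers "which slot", and the glyph array supplies the bitmap for that slot.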

This article briefly describes how to generate the Unicode code table; other Chinese-character processing techniques are outside its scope. :)

The following two functions can be used to build the Unicode code table (*Note 1):

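/* Convert one UTF-16 code unit to its two-byte GB2312 code.
   CP_ACP is the system ANSI code page, so this only yields GB2312 bytes
   on a Simplified Chinese (code page 936) Windows. */
void UnicodeToGB2312(unsigned char* pOut, unsigned short uData)
{
    WideCharToMultiByte(CP_ACP, 0, (LPCWSTR)&uData, 1, (LPSTR)pOut, sizeof(unsigned short), NULL, NULL);
}

/* Convert one two-byte GB2312 code to a UTF-16 code unit. */
void Gb2312ToUnicode(unsigned short* pOut, unsigned char *gbBuffer)
{
    MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, (LPCSTR)gbBuffer, 2, (LPWSTR)pOut, 1);
}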

A simple example follows (code written off the cuff, only meant to show how the array is built, so please don't nitpick! ^_^):

/*-----------------------------------------------*\
|  GB2312 unicode table constructor               |
|  author: Spark Song                             |
|  file  : build_uni_table.c                      |
|  date  : 2005-11-18                             |
\*-----------------------------------------------*/
#include <stdio.h>
#include <windows.h>
void UnicodeToGB2312(unsigned char* pOut,unsigned short uData);
void Gb2312ToUnicode(unsigned short* pOut,unsigned char *gbBuffer);
void construct_unicode_table();
int main(int argc, char *argv[])
{
	construct_unicode_table();
	return 0;
}
void construct_unicode_table()
{
    #define GB2312_MATRIX   (94)
    #define DELTA           (0xA0)
    #define FONT_ROW_BEGIN (16  + DELTA)
    #define FONT_ROW_END   (87 + DELTA)
    #define FONT_COL_BEGIN (1  + DELTA)
    #define FONT_COL_END   (GB2312_MATRIX + DELTA)
    #define FONT_TOTAL     (72 * GB2312_MATRIX)
    int i, j;
    unsigned char   chr[2];
    unsigned short  uni;
    unsigned short  data[FONT_TOTAL] = {0};
    int index = 0;
    unsigned short buf;
    // Build the Unicode code table
    for (i=FONT_ROW_BEGIN; i<=FONT_ROW_END; i++)
        for(j=FONT_COL_BEGIN; j<=FONT_COL_END; j++)
        {
            chr[0] = i;
            chr[1] = j;
            Gb2312ToUnicode(&uni, chr);
            data[index] = uni; index++;
        }
    // Sort it, so that later lookups can use binary search
    for (i=0;i<index-1; i++)
        for(j=i+1; j<index; j++)
            if (data[i]>data[j])
            {
                buf = data[i];
                data[i] = data[j];
                data[j] = buf;
            }
    // Write the result to stdout
    printf("const unsigned short uni_table[]={\n");
    for (i=0; i<index; i++)
    {
        uni = data[i];
        UnicodeToGB2312(chr, uni);
        printf("    0x%.4X%s /* GB2312 Code: 0x%.2X%.2X ==> Row:%.2d Col:%.2d */\n",
                uni,
                i==index-1?" ":",",
                chr[0],
                chr[1],
                chr[0] - DELTA,
                chr[1] - DELTA
                );
    }
    printf("};\n");
    return ;
}
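/* CP_ACP is the system ANSI code page: run this on a Simplified Chinese
   (code page 936) Windows to get GB2312 bytes. */
void UnicodeToGB2312(unsigned char* pOut,unsigned short uData)
{
    /* dwFlags is a DWORD, so pass 0 rather than NULL */
    WideCharToMultiByte(CP_ACP,0,(LPCWSTR)&uData,1,(LPSTR)pOut,sizeof(unsigned short),NULL,NULL);
    return;
}
void Gb2312ToUnicode(unsigned short* pOut,unsigned char *gbBuffer)
{
    MultiByteToWideChar(CP_ACP,MB_PRECOMPOSED,(LPCSTR)gbBuffer,2,(LPWSTR)pOut,1);
    return;
}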
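
As a side note on the sorting step: the nested-loop exchange sort in the listing is perfectly adequate for a one-off tool, but the same ordering can also be had from the standard library. A minimal sketch, assuming hypothetical helper names `cmp_u16` and `sort_codes` that are not part of the original program:

```c
#include <stdlib.h>

/* Comparator for qsort: orders two 16-bit codes ascending. */
static int cmp_u16(const void *a, const void *b)
{
    unsigned short x = *(const unsigned short *)a;
    unsigned short y = *(const unsigned short *)b;
    return (x > y) - (x < y);
}

/* Sort `count` codes ascending, so the table can later be binary-searched. */
static void sort_codes(unsigned short *codes, size_t count)
{
    qsort(codes, count, sizeof codes[0], cmp_u16);
}
```

In the generator this would amount to calling `sort_codes(data, index)` in place of the two nested loops; the printed table should come out the same.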

After compiling it with VC, run it from a DOS prompt:

build_uni_table.exe > report.txt

This produces a txt file like the following:

const unsigned short uni_table[]={
    0x4E00, /* GB2312 Code: 0xD2BB ==> Row:50 Col:27 */
    0x4E01, /* GB2312 Code: 0xB6A1 ==> Row:22 Col:01 */
    0x4E03, /* GB2312 Code: 0xC6DF ==> Row:38 Col:63 */
    0x4E07, /* GB2312 Code: 0xCDF2 ==> Row:45 Col:82 */
    ... ...
    0x9F9F, /* GB2312 Code: 0xB9EA ==> Row:25 Col:74 */
    0x9FA0, /* GB2312 Code: 0xD9DF ==> Row:57 Col:63 */
    0xE810, /* GB2312 Code: 0xD7FA ==> Row:55 Col:90 */
    0xE811, /* GB2312 Code: 0xD7FB ==> Row:55 Col:91 */
    0xE812, /* GB2312 Code: 0xD7FC ==> Row:55 Col:92 */
    0xE813, /* GB2312 Code: 0xD7FD ==> Row:55 Col:93 */
    0xE814  /* GB2312 Code: 0xD7FE ==> Row:55 Col:94 */
};
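
The Row and Col values in the comments come straight from subtracting 0xA0 from each GB2312 byte, as the generator does. If your glyph data happens to be stored in GB2312 row/column order starting at row 16 (an assumption here, though it is the order in which the generator fills its array before sorting), the position of a glyph can be worked out from the two bytes. A minimal sketch, with hypothetical names `GB_DELTA`, `GB_FIRST_ROW`, `GB_LAST_ROW`, `GB_COLS` and `gb2312_to_linear_index`:

```c
#define GB_DELTA      0xA0  /* offset between a GB2312 byte and its row/column number */
#define GB_FIRST_ROW  16    /* first hanzi row, as in the generator */
#define GB_LAST_ROW   87    /* last hanzi row, as in the generator */
#define GB_COLS       94    /* columns per row */

/* Convert a two-byte GB2312 code to its zero-based position among rows 16..87.
   Returns -1 if the code lies outside that range. */
static int gb2312_to_linear_index(unsigned char hi, unsigned char lo)
{
    int row = hi - GB_DELTA;
    int col = lo - GB_DELTA;
    if (row < GB_FIRST_ROW || row > GB_LAST_ROW || col < 1 || col > GB_COLS)
        return -1;
    return (row - GB_FIRST_ROW) * GB_COLS + (col - 1);
}
```

For the first entry above, 0xD2BB gives row 50 and column 27, so the position is (50 - 16) * 94 + (27 - 1) = 3222.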

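Then copy the generated array into your project code and you are done. In practice there are plenty of chances during development to write code that generates code; it would be a waste for a coder not to use coding to help with their own work. :)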
