
A Brief Look at Chinese Output in PDFlib (Part 4): Text Input Formats Accepted by PDFlib

Editor: About VC++

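PDFlib's textformat parameter sets the format of text input. Its valid values are:

bytes: each byte in the string corresponds to one character. Mainly used with 8-bit encodings.

utf8: the string is UTF-8 encoded.

ebcdicutf8: the string is EBCDIC-coded UTF-8. Applies only to IBM iSeries and zSeries.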

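utf16: the string is UTF-16 encoded. If the string begins with a Unicode byte order mark (BOM), PDFlib takes the byte order from the BOM and then removes it from the start of the string. If the string has no BOM, its byte order is taken to be the host's byte order. Intel x86 systems are little-endian (a BOM is stored as the bytes FF FE), while Sparc and PowerPC systems are big-endian (a BOM is stored as the bytes FE FF).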

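utf16be: the string is UTF-16 in big-endian byte order. A BOM receives no special treatment.

utf16le: the string is UTF-16 in little-endian byte order. A BOM receives no special treatment.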

auto: equivalent to "bytes" for 8-bit encodings, and to "utf16" for wide-character strings (unicode, glyphid, or a UCS2 or UTF16 CMap).
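The BOM rule described for "utf16" can be illustrated with a small C sketch. This is not PDFlib's code. The names ByteOrder, host_order, utf16_order, and read_unit are made up for illustration, and the logic only follows the description above: a leading BOM decides the byte order and is skipped, otherwise the host's byte order is assumed.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { ORDER_BIG, ORDER_LITTLE } ByteOrder;

/* Byte order of the machine running this code. */
static ByteOrder host_order(void)
{
    const uint16_t probe = 0x0102;
    /* On a big-endian host the high byte (0x01) is stored first. */
    return (*(const unsigned char *)&probe == 0x01) ? ORDER_BIG : ORDER_LITTLE;
}

/*
 * Decide the byte order of a UTF-16 buffer (hypothetical helper):
 * a leading BOM wins and is skipped, otherwise host order is assumed.
 * *skip receives the number of BOM bytes to drop (0 or 2).
 */
static ByteOrder utf16_order(const unsigned char *s, size_t len, size_t *skip)
{
    *skip = 0;
    if (len >= 2) {
        if (s[0] == 0xFE && s[1] == 0xFF) { *skip = 2; return ORDER_BIG; }
        if (s[0] == 0xFF && s[1] == 0xFE) { *skip = 2; return ORDER_LITTLE; }
    }
    return host_order();
}

/* Read the 16-bit code unit stored at s[0], s[1] in the given byte order. */
static unsigned read_unit(const unsigned char *s, ByteOrder order)
{
    return order == ORDER_BIG ? (unsigned)(s[0] << 8 | s[1])
                              : (unsigned)(s[1] << 8 | s[0]);
}
```

For the byte sequence FE FF 7B 80, utf16_order reports big-endian with two bytes to skip, and read_unit on the remaining bytes yields 0x7B80. By contrast, "utf16be" and "utf16le" fix the byte order up front, so no such detection step applies to them.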

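Among programming languages, those that handle Unicode strings automatically are referred to as Unicode-capable: COM, .NET, Java, REALbasic, Tcl, and others. Languages in which Unicode strings need special handling are referred to as non-Unicode-capable: C, C++, Cobol, Perl, PHP, Python, RPG, and others.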

In non-Unicode-capable languages, the "auto" setting handles most text strings correctly.

For Unicode-capable languages, the default value of the textformat parameter is "utf16"; for non-Unicode-capable languages, the default is "auto".

In addition, PDFlib supports the character references commonly used in SGML and HTML. This requires setting the charref parameter to true and textformat to "bytes":

PDF_set_parameter(p, "charref", "true");
PDF_set_parameter(p, "textformat", "bytes");

Some valid character references:

&shy; soft hyphen

&#x20AC; Euro glyph (hexadecimal)

&#8364; Euro glyph (decimal)

&euro; Euro glyph (entity name)

&lt; less than sign

&gt; greater than sign

&amp; ampersand sign

&Alpha; Greek Alpha
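To make the reference syntax concrete, here is a minimal C sketch that resolves a single reference in its hexadecimal (&#x20AC;), decimal (&#8364;), or named (&euro;) form to a Unicode code point. It is not how PDFlib implements charref. The function parse_charref and the table kEntities are hypothetical, and the table holds only the six names listed here, whereas a real entity table is far larger.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* A handful of named entities; illustrative only, not a complete table. */
static const struct { const char *name; unsigned long cp; } kEntities[] = {
    { "shy", 0x00AD }, { "euro", 0x20AC }, { "lt", 0x003C },
    { "gt", 0x003E }, { "amp", 0x0026 }, { "Alpha", 0x0391 },
};

/*
 * Parse one character reference at the start of s.
 * On success, store the code point in *cp and return the number of bytes
 * consumed; return 0 if s does not start with a reference this sketch knows.
 */
static size_t parse_charref(const char *s, unsigned long *cp)
{
    const char *end;
    size_t i, n;

    if (s[0] != '&')
        return 0;
    end = strchr(s, ';');
    if (end == NULL || end == s + 1)
        return 0;

    if (s[1] == '#') {                       /* numeric reference */
        int hex = (s[2] == 'x' || s[2] == 'X');
        const char *digits = s + (hex ? 3 : 2);
        char *stop = NULL;
        unsigned long v;

        /* Require a digit first so strtoul cannot accept signs or spaces. */
        if (hex ? !isxdigit((unsigned char)*digits)
                : !isdigit((unsigned char)*digits))
            return 0;
        v = strtoul(digits, &stop, hex ? 16 : 10);
        if (stop != end)                     /* junk before the ';' */
            return 0;
        *cp = v;
        return (size_t)(end - s) + 1;
    }

    n = (size_t)(end - s) - 1;               /* length of the entity name */
    for (i = 0; i < sizeof kEntities / sizeof kEntities[0]; ++i) {
        if (strlen(kEntities[i].name) == n &&
            strncmp(kEntities[i].name, s + 1, n) == 0) {
            *cp = kEntities[i].cp;
            return n + 2;                    /* '&' + name + ';' */
        }
    }
    return 0;
}
```

All three Euro forms resolve to the same code point, U+20AC, which is the point of the notation: the text stays plain 8-bit bytes while still naming characters outside the 8-bit range.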

Below is a related example, a C source program (the PDF file it generates, PDFlib_cs4.pdf, is attached).

/*******************************************************************/
/* This example demonstrates output of Simplified Chinese text with
 * different "textformat" options under Simplified Chinese Windows.
 */
/*******************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "pdflib.h"
int main(void)
{
    PDF        *p = NULL;
    int         Font_E = 0, Font_H = 0, Font_CS = 0, Left = 50, y = 800, i = 0;
    const int   INCRY = 25;
    char        text[128], buf[128];
    /* 1 byte text (English: "Simplified Chinese") */
    static const char byte_text[] =
        "\123\151\155\160\154\151\146\151\145\144\040\103\150\151\156\145\163\145";
    static const int byte_len = 18;
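    /* Same text as byte values. The trailing 0x00 terminates the string,
       because PDF_show_xy() expects a null-terminated string. */
    static const char byte2_text[] = {0x53,0x69,0x6D,0x70,0x6C,0x69,0x66,0x69,0x65,
                   0x64,0x20,0x43,0x68,0x69,0x6E,0x65,0x73,0x65,0x00};
    static const int byte2_len = 18;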
    /* 2 byte text (Simplified Chinese) */
    static const unsigned short utf16_text[] = {0x7B80,0x4F53,0x4E2D,0x6587};
    static const int utf16_len = 8;
    static const unsigned char utf16be_text[] ="\173\200\117\123\116\055\145\207";
    static const int utf16be_len = 8;
    static const unsigned char utf16be_bom_text[] = "\376\377\173\200\117\123\116\055\145\207";
    static const int utf16be_bom_len = 10;
    static const unsigned char utf16le_text[] ="\200\173\123\117\055\116\207\145";
    static const int utf16le_len = 8;
    static const unsigned char utf16le_bom_text[] = "\377\376\200\173\123\117\055\116\207\145";
    static const int utf16le_bom_len = 10;
    static const unsigned char utf8_text[] = "\347\256\200\344\275\223\344\270\255\346\226\207";
    static const int utf8_len = 12;
    static const unsigned char utf8_bom_text[] = "\xEF\xBB\xBF\xE7\xAE\x80\xE4\xBD\x93\xE4\xB8\xAD\xE6\x96\x87";
    static const int utf8_bom_len = 15;
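    /* The same four characters written as character references (needs charref=true) */
    static const char htmlutf16_text[] = "&#x7B80;&#x4F53;&#x4E2D;&#x6587;";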
    static const int htmlutf16_len = sizeof(htmlutf16_text) - 1;
    typedef struct
    {
        char *textformat;
        char *encname;
        const char *textstring;
        const int  *textlength;
        const char *bomkind;
    } TestCase;
    static const TestCase table_8[] = {
        { "bytes", "winansi", (const char *)byte_text,  &byte_len,  "" },
        { "auto",  "winansi", (const char *)byte_text,  &byte_len,  "" },
        { "bytes", "winansi", (const char *)byte2_text, &byte2_len, "" },
    };
    static const TestCase table_16[] = {
        { "auto",    "unicode", (const char *)utf16_text,       &utf16_len,       "" },
        { "utf16",   "unicode", (const char *)utf16_text,       &utf16_len,       "" },
        { "auto",    "unicode", (const char *)utf16be_bom_text, &utf16be_bom_len, ", UTF-16+BE-BOM" },
        { "auto",    "unicode", (const char *)utf16le_bom_text, &utf16le_bom_len, ", UTF-16+LE-BOM" },
        { "utf16be", "unicode", (const char *)utf16be_text,     &utf16be_len,     "" },
        { "utf16le", "unicode", (const char *)utf16le_text,     &utf16le_len,     "" },
        { "utf8",    "unicode", (const char *)utf8_text,        &utf8_len,        "" },
        { "auto",    "unicode", (const char *)utf8_bom_text,    &utf8_bom_len,    ", UTF-8+BOM" },
        { "bytes",   "unicode", (const char *)htmlutf16_text,   &htmlutf16_len,   ", HTML unicode character" },
    };
    const int   tsize_8 = sizeof table_8 / sizeof (TestCase);
    const int   tsize_16 = sizeof table_16 / sizeof (TestCase);
    /* create a new PDFlib object */
    if ((p = PDF_new()) == (PDF *) 0)
    {
        printf("Couldn't create PDFlib object (out of memory)!\n");
        return(2);
    }
    PDF_TRY(p) {
        if (PDF_begin_document(p, "pdflib_cs4.pdf", 0, "") == -1)
        {
            printf("Error: %s\n", PDF_get_errmsg(p));
            return(2);
        }
        PDF_set_info(p, "Creator", "pdflib_cs4.c");
        PDF_set_info(p, "Author", "myi@pdflib.com");
        PDF_set_info(p, "Title", "Output Chinese Simplify with Different textformat");
        /* Start a new page. */
        PDF_begin_page_ext(p, a4_width, a4_height, "");
        Font_H = PDF_load_font(p, "Helvetica-Bold", 0, "winansi", "");
        /* 8-bit encoding */
        Font_E = PDF_load_font(p, "Times", 0, "winansi", "");
        PDF_setfont(p, Font_H, 24);
        PDF_show_xy(p, "8-bit encoding", Left+40,  y);
        y -= 2*INCRY;
        for (i = 0; i < tsize_8; ++i)
        {
            PDF_setfont(p, Font_H, 14);
            sprintf(text, "%s encoding, %s textformat %s: ", table_8[i].encname, 
                table_8[i].textformat, table_8[i].bomkind);
            PDF_show_xy(p, text, Left,  y);
            y -= INCRY;
            PDF_set_parameter(p, "textformat", table_8[i].textformat);
            PDF_setfont(p, Font_E, 14);
            PDF_show_xy(p, table_8[i].textstring, Left,  y);
            y -= INCRY;
        } /* for */
        /* 16-bit encoding */
        PDF_setfont(p, Font_H, 24);
        y -= 2*INCRY;
        PDF_show_xy(p, "16-bit encoding", Left+40,  y);
        y -= 2*INCRY;
        PDF_set_parameter(p, "charref", "true");
        Font_CS = PDF_load_font(p, "STSong-Light", 0, "UniGB-UCS2-H", "");
        for (i = 0; i < tsize_16; i++)
        {
            PDF_setfont(p, Font_H, 14);
            sprintf(text, "%s encoding, %s textformat %s: ", table_16[i].encname, 
                table_16[i].textformat, table_16[i].bomkind);
            PDF_show_xy(p, text, Left,  y);
            y -= INCRY;
            PDF_setfont(p, Font_CS, 14);
            sprintf(buf, "textformat %s",table_16[i].textformat);
            PDF_fit_textline(p, table_16[i].textstring, *table_16[i].textlength,
                             Left, y, buf);
            y -= INCRY;
        } /* for */
        /* End of page. */
        PDF_end_page_ext(p, "");
        PDF_end_document(p, "");
    }
    PDF_CATCH(p) {
        printf("PDFlib exception occurred in pdflib_cs4 sample:\n");
        printf("[%d] %s: %s\n",
               PDF_get_errnum(p), PDF_get_apiname(p), PDF_get_errmsg(p));
        PDF_delete(p);
        return(2);
    }
    PDF_delete(p);
    return 0;
}
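
The byte strings used in the sample can be cross-checked by encoding the four code points 0x7B80, 0x4F53, 0x4E2D, and 0x6587 by hand. The following sketch is independent of PDFlib. The helper names encode_utf16 and encode_utf8 are hypothetical, and only code points in the Basic Multilingual Plane (no surrogate pairs) are handled.

```c
#include <stddef.h>

/* Encode BMP code points as UTF-16 bytes; big_endian selects the byte order. */
static size_t encode_utf16(const unsigned short *cps, size_t n, int big_endian,
                           unsigned char *out)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        unsigned char hi = (unsigned char)(cps[i] >> 8);
        unsigned char lo = (unsigned char)(cps[i] & 0xFF);
        out[2 * i]     = big_endian ? hi : lo;
        out[2 * i + 1] = big_endian ? lo : hi;
    }
    return 2 * n;   /* bytes written */
}

/* Encode BMP code points as UTF-8 (1 to 3 bytes each); returns bytes written. */
static size_t encode_utf8(const unsigned short *cps, size_t n, unsigned char *out)
{
    size_t i, k = 0;
    for (i = 0; i < n; ++i) {
        unsigned cp = cps[i];
        if (cp < 0x80) {
            out[k++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            out[k++] = (unsigned char)(0xC0 | (cp >> 6));
            out[k++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[k++] = (unsigned char)(0xE0 | (cp >> 12));
            out[k++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[k++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return k;
}
```

Encoding the four code points big-endian gives the bytes 7B 80 4F 53 4E 2D 65 87, which is utf16be_text written in octal escapes; swapping each pair gives utf16le_text; and the UTF-8 form is the 12 bytes E7 AE 80 E4 BD 93 E4 B8 AD E6 96 87 stored in utf8_text.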

