Wide character

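A wide character is an abstract computing term (it does not prescribe any specific implementation details) for a character data type wider than one byte. It is not the same thing as Unicode.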

Unicode

The ISO/IEC 10646:2003 Unicode Standard 4.0 states:

"The width of wchar_t is compiler-specific and can be as small as 8 bits. Consequently, programs that need to be portable across any C or C++ compiler should not use wchar_t for storing Unicode text. The wchar_t type is intended for storing compiler-defined wide characters, which may be Unicode characters in some compilers."

"ANSI/ISO C leaves the semantics of the wide character set to the specific implementation but requires that the characters from the portable C execution set correspond to their wide character equivalents by zero extension."

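As a small illustration of the zero-extension requirement, the sketch below checks that the wide character produced by the standard function btowc for a basic character equals the narrow code zero-extended to wchar_t. The helper name wide_is_zero_extension is hypothetical, and the check assumes the default "C" locale, in which these characters convert one-to-one.

```cpp
#include <cassert>
#include <cwchar>   // std::btowc, std::wint_t, WEOF

// Returns true if the wide equivalent of c (as reported by btowc in the
// current locale) equals c zero-extended to wchar_t.
bool wide_is_zero_extension(char c)
{
    // Go through unsigned char so that no sign bit is propagated.
    unsigned char byte = static_cast<unsigned char>(c);
    wchar_t zero_extended = static_cast<wchar_t>(byte);
    std::wint_t converted = std::btowc(byte);
    return converted != WEOF && static_cast<wchar_t>(converted) == zero_extended;
}
```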
Operating systems

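With the Windows API and the Visual Studio compiler, wchar_t is 16 bits wide. Windows represents text in UTF-16 (little-endian), where characters outside the Basic Multilingual Plane take a surrogate pair of two 16-bit units, so a single wchar_t cannot represent every character the system supports. This violates the ANSI/ISO C standard, which requires a single wchar_t to represent any member of the largest supported extended character set.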
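To see why one 16-bit unit is not always enough, the sketch below encodes a single code point in UTF-16 by hand. It uses the portable C++11 type char16_t as a stand-in for the 16-bit wchar_t on Windows (there they are distinct types of the same width), and the function name to_utf16 is hypothetical. Any code point above U+FFFF, such as U+1F600, comes out as a surrogate pair of two units.

```cpp
#include <cassert>
#include <string>

// Encodes one Unicode scalar value as UTF-16 code units.
std::u16string to_utf16(char32_t cp)
{
    // Precondition: cp is a valid scalar value (not a surrogate, <= U+10FFFF).
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x10000)                       // BMP: fits in one 16-bit unit
        return std::u16string(1, static_cast<char16_t>(cp));
    cp -= 0x10000;                          // 20 significant bits remain
    char16_t high = static_cast<char16_t>(0xD800 + (cp >> 10));   // top 10 bits
    char16_t low  = static_cast<char16_t>(0xDC00 + (cp & 0x3FF)); // low 10 bits
    return std::u16string{high, low};
}
```

For example, to_utf16(0x1F600) yields the two units 0xD83D and 0xDE00.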

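On most Unix-like systems, such as Linux with glibc and macOS, wchar_t is 32 bits wide, so a single wchar_t can hold any Unicode code point (UTF-32, stored in the platform's native byte order).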
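Code that must run on both kinds of platform can query the width at compile time. This is a minimal sketch with hypothetical helper names; it relies only on sizeof, CHAR_BIT, and WCHAR_MAX.

```cpp
#include <cassert>
#include <climits>  // CHAR_BIT
#include <cwchar>   // WCHAR_MAX

// Width of wchar_t in bits for the current compiler and platform.
constexpr int wchar_bits()
{
    return static_cast<int>(sizeof(wchar_t) * CHAR_BIT);
}

// True if every Unicode code point (up to U+10FFFF) fits in one wchar_t.
constexpr bool wchar_holds_any_code_point()
{
    return static_cast<unsigned long>(WCHAR_MAX) >= 0x10FFFFUL;
}
```

On a typical Linux/glibc build, wchar_bits() is 32 and wchar_holds_any_code_point() is true; with Visual C++ they are 16 and false.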

Programming languages

C/C++

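wchar_t is a data type in ANSI/ISO C (a typedef declared in <stddef.h>, among other headers; in C++ it is a built-in type), and some other programming languages also use it to represent wide characters. In the ANSI C standard library, the headers <wchar.h> and <wctype.h> provide the facilities for handling wide characters.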
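A brief sketch of the wide-character counterparts of familiar narrow-string functions: wcslen from <wchar.h>, and iswdigit and towupper from <wctype.h>, used here through the C++ headers <cwchar> and <cwctype>. The helper names count_wide_digits and to_upper_wide are hypothetical; case mapping follows the current C locale, so only ASCII input is assumed to behave identically everywhere.

```cpp
#include <cassert>
#include <cwchar>   // std::wcslen, std::wint_t
#include <cwctype>  // std::iswdigit, std::towupper
#include <string>

// Counts the decimal digits in a NUL-terminated wide string.
std::size_t count_wide_digits(const wchar_t* s)
{
    std::size_t count = 0;
    for (std::size_t i = 0, n = std::wcslen(s); i < n; ++i)
        if (std::iswdigit(static_cast<std::wint_t>(s[i])))
            ++count;
    return count;
}

// Returns an upper-cased copy, using the case mapping of the current locale.
std::wstring to_upper_wide(const std::wstring& s)
{
    std::wstring out = s;
    for (wchar_t& c : out)
        c = static_cast<wchar_t>(std::towupper(static_cast<std::wint_t>(c)));
    return out;
}
```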

The C90 standard originally defined the type wchar_t as:

"an integral type whose range of values can represent distinct codes for all members of the largest extended character set specified among the supported locales" (ISO 9899:1990 §4.1.5)

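The C and C++ standards published in 2011 (C11 and C++11) each introduced the fixed-size character types char16_t and char32_t. The details of wchar_t are still left to the implementation.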
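The sketch below contrasts the fixed-size types: C++11 specifies that a character outside the Basic Multilingual Plane occupies two char16_t units in a u"..." literal (a surrogate pair) but a single char32_t unit in a U"..." literal, whereas the equivalent L"..." literal takes one or two wchar_t units depending on the implementation. The variable names are illustrative only; C11 provides the same two types through <uchar.h>.

```cpp
#include <cassert>
#include <climits>  // CHAR_BIT
#include <string>

// Guaranteed minimum widths: char16_t has the size of uint_least16_t and
// char32_t the size of uint_least32_t.
static_assert(sizeof(char16_t) * CHAR_BIT >= 16, "char16_t narrower than 16 bits");
static_assert(sizeof(char32_t) * CHAR_BIT >= 32, "char32_t narrower than 32 bits");

// U+1F600 lies outside the Basic Multilingual Plane.
const std::u16string smile16 = u"\U0001F600"; // surrogate pair: 2 units
const std::u32string smile32 = U"\U0001F600"; // 1 unit
const std::wstring   smile_w = L"\U0001F600"; // 1 unit (32-bit wchar_t) or 2 (16-bit)
```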

Python

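Python uses wchar_t as the basis of its character type Py_UNICODE: on platforms where wchar_t is available and "compatible with the chosen Python Unicode build variant", Py_UNICODE is a typedef alias for wchar_t.^[1]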

Wide/narrow conversion

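Any character set that is not a wide character set, whether a single-byte character set (SBCS) or a (variable-length) multibyte character set (MBCS), is called a narrow character set.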

One use of a wide character set is as an intermediate representation when converting between any two narrow character sets.
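The pivot idea can be sketched without any locale machinery for one concrete pair: ISO-8859-1 (Latin-1) to UTF-8. Decoding Latin-1 into wide characters is a plain zero extension, because every Latin-1 byte equals its Unicode code point; encoding the wide string as UTF-8 is a separate, independent step. This is a minimal sketch with hypothetical function names, assuming wchar_t holds Unicode code points (as with 32-bit wchar_t on glibc); with a 16-bit wchar_t, surrogate pairs would need extra handling that is omitted here.

```cpp
#include <cassert>
#include <string>

// Step 1 (narrow -> wide): each Latin-1 byte is its own Unicode code point.
std::wstring latin1_to_wide(const std::string& in)
{
    std::wstring out;
    for (unsigned char b : in)
        out.push_back(static_cast<wchar_t>(b));
    return out;
}

// Step 2 (wide -> narrow): encode each code point as 1 to 4 UTF-8 bytes.
// Assumes each wchar_t is a whole code point (no surrogate handling).
std::string wide_to_utf8(const std::wstring& in)
{
    std::string out;
    for (wchar_t wc : in) {
        auto cp = static_cast<char32_t>(wc);
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Converting between any other pair of narrow sets only requires swapping in a different decoder or encoder; the wide middle stage stays the same.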

There are several ways to convert between wide and narrow character sets.

Using the Windows API

For example:

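#include <windows.h>

// we want to convert an MBCS string in lpszA (ANSI code page, NUL-terminated).
// First call: with a NULL output buffer and a size of 0, the function returns
// the required buffer size in WCHARs. Because the input length is -1, the
// count includes the terminating NUL. A return value of 0 means failure
// (call GetLastError() for details).
int nLen = MultiByteToWideChar(CP_ACP,
    0,
    lpszA, -1,
    NULL,
    0);

LPWSTR lpszW = new WCHAR[nLen];
// Second call: perform the conversion into the buffer just allocated.
MultiByteToWideChar(CP_ACP,
    0,
    lpszA, -1,
    lpszW,
    nLen);

// use it to call OLE here
pI->SomeFunctionThatNeedsUnicode(lpszW);

// free the string
delete[] lpszW;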

Using the ATL 3.0 string conversion macros

Microsoft's atlconv.h defines four macros:

A2CW: (LPCSTR) -> (LPCWSTR)
A2W: (LPCSTR) -> (LPWSTR)
W2CA: (LPCWSTR) -> (LPCSTR)
W2A: (LPCWSTR) -> (LPSTR)

Before using them, declare the intermediate helper variables they rely on with this macro:

USES_CONVERSION;

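For each input string, these four macros compute the size of the converted string assuming a 2:1 size ratio between wide and narrow strings, allocate that space on the stack (with alloca), and then call the helper function ATLW2AHELPER or ATLA2WHELPER to perform the conversion. The advantage is that the stack space is reclaimed automatically when the enclosing function returns. The disadvantage is that the fixed 2:1 size assumption does not hold for every multibyte character set.^[2]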

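Also avoid using the conversion macros inside loops that run many times: each use allocates more stack space that is released only when the function returns, so the stack can be exhausted quickly. One remedy is to move the conversion into a small separate function.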

Using the ATL 7.0 string conversion classes and macros

The string conversion classes (templates) are named according to the pattern:

C<SourceType>2[C]<DestinationType>[EX]

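Here SourceType and DestinationType can each be A, W, T (TCHAR), or OLE (equivalent to W). The optional C in the middle means the result is const, and the optional EX suffix means the number of characters the internal buffer holds is given as a template parameter. For example, CA2W converts an ANSI string to a wide string, and CW2AEX<256> converts a wide string to ANSI using a 256-character buffer.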

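The default fixed buffer holds 128 characters; the template argument of an EX variant sets a different size, for example a smaller one to save space. A second constructor argument can specify the code page of the narrow string. If the string does not fit in the fixed buffer, the class allocates the space on the heap and frees it automatically when the object goes out of scope. The USES_CONVERSION macro is not needed. Note that these are local objects: an unnamed temporary instance is destroyed at the end of the full expression that creates it, so any pointer to its result string is invalid afterwards. For example, after LPCSTR psz = CW2A(pszW); the pointer psz is dangling.^[3]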

Using the _bstr_t class

Available on Microsoft development platforms. Example:

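#include <string>
#include <comutil.h>
#pragma comment(lib, "comsuppw.lib")

// Converts a wide string to a narrow string. _bstr_t converts using the
// system ANSI code page, so characters that code page cannot represent
// are not preserved.
std::string ws2s(const std::wstring& ws)
{
    _bstr_t t(ws.c_str());                           // copy into a BSTR
    return std::string(static_cast<const char*>(t)); // copy the narrow form owned by t
}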

The C standard library functions mbstowcs() and wcstombs()

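Declared in <stdlib.h>. They convert according to the LC_CTYPE category of the current locale, and the caller must allocate the destination buffer in advance.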
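A minimal sketch of the usual buffer-sizing approach, using the C++ spellings std::mbstowcs and std::wcstombs from <cstdlib>. Because every multibyte character occupies at least one byte, a narrow string of n bytes never yields more than n wide characters; in the other direction, each wide character needs at most MB_CUR_MAX bytes. The wrapper names narrow_to_wide and wide_to_narrow are hypothetical, and conversion follows the current C locale (the default "C" locale unless the program calls setlocale).

```cpp
#include <cassert>
#include <cstdlib>    // std::mbstowcs, std::wcstombs, MB_CUR_MAX
#include <stdexcept>
#include <string>
#include <vector>

// Narrow (multibyte, current locale) -> wide.
std::wstring narrow_to_wide(const std::string& s)
{
    // At most one wide character per input byte, plus the terminator.
    std::vector<wchar_t> buf(s.size() + 1);
    std::size_t n = std::mbstowcs(buf.data(), s.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence");
    return std::wstring(buf.data(), n);
}

// Wide -> narrow (multibyte, current locale).
std::string wide_to_narrow(const std::wstring& ws)
{
    // At most MB_CUR_MAX bytes per wide character, plus the terminator.
    std::vector<char> buf(ws.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), ws.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("character not representable in this locale");
    return std::string(buf.data(), n);
}
```

A program that wants the user's own encoding (for example UTF-8) would call std::setlocale(LC_ALL, "") once at startup, before using these wrappers.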

References

[1] https://docs.python.org/c-api/unicode.html. Accessed 2009-12-19. Archived at the Internet Archive.
[2] MSDN. TN059: Using MFC MBCS/Unicode Conversion Macros. Accessed 2018-01-12. Archived from the original on 2018-01-12.
[3] MSDN. ATL and MFC String Conversion Macros. Accessed 2018-01-12. Archived from the original on 2018-01-12.
