大家在使用數據庫的時候,總會出現各種各樣的編碼問題,看了MySQL官方文檔後,記錄下一些MySQL的編碼體系知識,如MySQL有那幾層使用編碼的地方,MySQL客戶端和服務端交互時哪些環節涉及到的編碼,和如何指定編碼。
基本概念:
mysql數據庫編碼層次:系統層,server層,database層,table層,column層,還有client,connection和result三種和客戶端通訊相關的場景;A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set;To connect to MySQL from Java you have to use the JDBC driver from MySQL. The MySQL JDBC driver is called MySQL Connector/J.MySQL 4.1以下對unicode支持不好jdbc3.0.16及以上才支持使用數據庫本身編碼,否則使用ISO8859-1在mysql控制台下輸入show variables like 'character_set_%'; 查看當前編碼相關系統變量,後面會解析其中幾項
mysql> SHOW VARIABLES LIKE 'character%';
+--------------------------+---------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | D:"mysql-5.0.37"share"charsets" |
mysql的多層字符編碼支持
1.server層
作用
整個數據庫服務器默認編碼。
配置點
通過系統變量character_set_server參數指定server層編碼
可以在mysqld執行時加入該參數
或者在編輯mysql時候,設置該參數
2.database層
作用
數據庫級別默認編碼。
配置點
通過系統變量character_set_database參數指定database層編碼
建表時候指定編碼
3.table層
同理,table層的編碼設置僅影響當前表的所有未指定編碼的列的編碼。但這個指定是mysql獨有的,而且只能通過sql在建表或修改表時指定, 在標准sql中,沒有可指定表編碼的sql語法
4.column層
作用
設置列的編碼。
配置點
建表或修改列時設置。這是標准sql語法。
mysql server與client交互時編碼如何轉換
1.客戶端發送語句
character set and collation system variables are involved in handling traffic for the connection between a client and the server. Every client has connection-related character set and collation system variables.
The server takes the character_set_client system variable to be the character set in which statements are sent by the client.
在客戶端和服務端通訊時,會涉及到另外幾個編碼設置相關的系統變量的,每個客戶端都有屬於自己的編碼鏈接相關編碼。
服務端使用系統變量character_set_client來處理客戶端發來的語句。
2.服務端處理語句
The server uses the character_set_connection and collation_connection system variables. It converts statements sent by the client from character_set_client to character_set_connection (except for string literals that have an introducer such as _latin1 or _utf8)
服務端會把客戶端發來的語句(以character_set_client 編碼)轉換為character_set_connection編碼。
A character string literal may have an optional character set introducer and COLLATE clause [_charset_name]'string' [COLLATE collation_name]
如:SELECT _latin1'string' COLLATE latin1_danish_ci;
在缺少編碼指定是,默認會使用character_set_connection指定的編碼。
The character set used for literals that do not have a character set introducer and for number-to-string conversion.
沒有前導編碼修飾(introducer)的文本和數字到字符的轉換會應用character_set_connection編碼。
For comparisons of strings with column values, collation_connection does not matter because columns have their own collation, which has a higher collation precedence.
表的列字段與客戶端傳來的語句進行比較時,會把客戶端語句轉成列對應編碼再進行比較,這是因為列字段擁有更高優先級。
3.服務端返回內容
The character_set_results system variable indicates the character set in which the server returns query results to the client
系統變量character_set_results用來把數據以該編碼方式返回給客戶端。
下面用一張圖來大致描述下上面的內容(個人理解所畫)
服務端如何自動判斷並設置編碼?
The character encoding between client and server is automatically detected upon connection. You specify the encoding on the server using the character_set_server for server versions 4.1.0 and newer, and character_set system variable for server versions older
than 4.1.0. The driver automatically uses the encoding specified by the server.
如果客戶端連接時沒有提供編碼(連接串無characterEncoding),則服務端會使用character_set_server變量來作為客戶端編碼(4.1.0後)。
For example, to use 4-byte UTF-8 character sets with Connector/J, configure the MySQL server with character_set_server=utf8mb4, and leave characterEncoding out of the Connector/J connection string. Connector/J will then autodetect the UTF-8 setting.
客戶端如何制定編碼?
To override the automatically detected encoding on the client side, use the characterEncoding property in the URL used to connect to the server.
When a client connects to the server, it sends the name of the character set that it wants to use. The server uses the name to set the character_set_client, character_set_results, and character_set_connection system variables. In effect, the server performs
a SET NAMES operation using the character set name.
客戶端與服務端建立鏈接時,會發送客戶端所希望使用的編碼集。服務端會用這個編碼集去初始化三個系統變量character_set_client, character_set_results, and character_set_connection。如執行了語句 SET NAMES XXX一般:
SET NAMES xx可以指定connection編碼為xx:
character_set_connection,character_set_results,character_set_client 系統變量可修改;
SET NAMES 'charset_name' 這句SQL等同與執行下面3個語句:
SET character_set_client = charset_name;
SET character_set_results = charset_name;
SET character_set_connection = charset_name;
A SET CHARACTER SET charset_name 等同於執行下面3個語句:
SET character_set_client = charset_name;
SET character_set_results = charset_name;
SET collation_connection = @@collation_database;
參考
http://dev.mysql.com/doc/refman/5.5/en/charset.html