涉及無線相關的 MySQL 數據庫建議都提前采用 utf8mb4 字符集,避免 emoji 表情符號帶來的問題
MySQL Server > 5.5.3
mysql> \s -------------- mysql Ver 14.14 Distrib 5.1.73, for redhat-linux-gnu (i386) using readline 5.1 Connection id: 3 Current database: Current user: root@localhost SSL: Not in use Current pager: stdout Using outfile: '' Using delimiter: ; Server version: 5.6.25-debug-log Source distribution Protocol version: 10 Connection: Localhost via UNIX socket Server characterset: utf8mb4 Db characterset: utf8mb4 Client characterset: utf8mb4 Conn. characterset: utf8mb4 UNIX socket: /data/mysql/mysql.sock Uptime: 46 min 53 sec
ALTER DATABASE dbname CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci; ALTER TABLE tbname CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE tbname CHANGE old_column_name new_column_name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
[client] default-character-set = utf8mb4 [mysql] default-character-set = utf8mb4 [mysqld] character-set-client-handshake = FALSE character-set-server = utf8mb4 collation-server = utf8mb4_general_ci init_connect='SET NAMES utf8mb4'
<charset name="utf8mb4"> <family>Unicode</family> <description>UTF-8 Unicode</description> <alias>utf-8</alias> <collation name="utf8_general_ci" id="33"> <flag>primary</flag> <flag>compiled</flag> </collation> <collation name="utf8_bin" id="83"> <flag>binary</flag> <flag>compiled</flag> </collation> </charset>
mysql> show variables where variable_name like 'character_set%' or variable_name like 'coll%'; +--------------------------+----------------------------------+ | Variable_name | Value | +--------------------------+----------------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | /u01/mysql/share/mysql/charsets/ | | collation_connection | utf8_general_ci | | collation_database | utf8_general_ci | | collation_server | utf8_general_ci | +--------------------------+----------------------------------+ 11 rows in set (0.00 sec) mysql> show char set; +----------+-----------------------------+---------------------+--------+ | Charset | Description | Default collation | Maxlen | +----------+-----------------------------+---------------------+--------+ | big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 | | dec8 | DEC West European | dec8_swedish_ci | 1 | | cp850 | DOS West European | cp850_general_ci | 1 | | hp8 | HP West European | hp8_english_ci | 1 | | koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 | | latin1 | cp1252 West European | latin1_swedish_ci | 1 | | latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 | | swe7 | 7bit Swedish | swe7_swedish_ci | 1 | | ascii | US ASCII | ascii_general_ci | 1 | | ujis | EUC-JP Japanese | ujis_japanese_ci | 3 | | sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 | | hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 | | tis620 | TIS620 Thai | tis620_thai_ci | 1 | | euckr | EUC-KR Korean | euckr_korean_ci | 2 | | koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 | | gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 | | greek | ISO 8859-7 Greek | greek_general_ci | 1 | | eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 | +----------+-----------------------------+---------------------+--------+ 40 rows in set (0.00 sec) mysql> show COLLATION; +--------------------------+----------+-----+---------+----------+---------+ | Collation | Charset | Id | Default | Compiled | Sortlen | +--------------------------+----------+-----+---------+----------+---------+ | big5_chinese_ci | big5 | 1 | Yes | Yes | 1 | | big5_bin | big5 | 84 | | Yes | 1 | | dec8_swedish_ci | dec8 | 3 | Yes | Yes | 1 | | dec8_bin | dec8 | 69 | | Yes | 1 | | cp850_general_ci | cp850 | 4 | Yes | Yes | 1 | | eucjpms_bin | eucjpms | 98 | | Yes | 1 | +--------------------------+----------+-----+---------+----------+---------+ 219 rows in set (0.01 sec)
– character_set_server:默認的內部操作字符集
– character_set_client:客戶端來源數據使用的字符集
– character_set_connection:連接層字符集
– character_set_results:查詢結果字符集
– character_set_database:當前選中數據庫的默認字符集
– character_set_system:系統元數據(字段名等)字符集
以字符序對應的字符集名稱開頭;以_ci(表示大小寫不敏感 case ignore )、_cs(表示大小寫敏感 case sensitive )或_bin(表示按編碼值比較)結尾。
例如:在字符序“utf8_general_ci”下,字符“a”和“A”是等價的
1. MySQL Server收到請求時將請求數據從character_set_client轉換為character_set_connection;
2. 進行內部操作前將請求數據從character_set_connection轉換為內部操作字符集,其確定方法如下:
• 使用每個數據字段的CHARACTER SET設定值;
• 若上述值不存在,則使用對應數據表的DEFAULT CHARACTER SET設定值(MySQL擴展,非SQL標准);
• 若上述值不存在,則使用對應數據庫的DEFAULT CHARACTER SET設定值;
• 若上述值不存在,則使用character_set_server設定值。
3. 將操作結果從內部操作字符集轉換為character_set_results。
emoji表情mysql的utf8是不支持,需要修改設置為utf8mb4
在MYSQL 5.5 之前, UTF8 編碼只支持1-3個字節,只支持BMP這部分的unicode編碼區, 關於BMP,http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters 這裡看,基本就是0000~FFFF這一區。
從MYSQL5.5開始,可支持4個字節UTF編碼utf8mb4,一個字符最多能有4字節,所以能支持更多的字符集,utf8mb4兼容utf8,且比utf8能表示更多的字符
CREATE TABLE `ios_emoji` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT '自增ID', `unicode` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Unicode編碼', `utf8` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'UTF8編碼', `utf16` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'UTF16編碼', `sbunicode` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'SBUnicode編碼', `filename` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '文件名', `filebyte` longblob COMMENT '文件內容字節', PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='ios表情編碼表';
參考文章
http://drupal.stackexchange.com/questions/166405/why-are-we-using-utf8mb4-general-ci-and-not-utf8mb4-unicode-ci
http://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci
http://blog.csdn.net/leshami/article/details/42024217
http://www.laruence.com/2008/01/05/12.html