程序師世界是廣大編程愛好者互助、分享、學習的平台,程序師世界有你更精彩!
首頁
編程語言
C語言|JAVA編程
Python編程
網頁編程
ASP編程|PHP編程
JSP編程
數據庫知識
MYSQL數據庫|SqlServer數據庫
Oracle數據庫|DB2數據庫
 程式師世界 >> 數據庫知識 >> MYSQL數據庫 >> MySQL綜合教程 >> 【mysql】添加對emoji的支持,mysql添加emoji

【mysql】添加對emoji的支持,mysql添加emoji

編輯:MySQL綜合教程

【mysql】添加對emoji的支持,mysql添加emoji


1、簡介

涉及無線相關的 MySQL 數據庫建議都提前采用 utf8mb4 字符集,避免 emoji 表情符號帶來的問題

MySQL Server >  5.5.3

2、配置+升級

當前配置

mysql> \s
--------------
mysql  Ver 14.14 Distrib 5.1.73, for redhat-linux-gnu (i386) using readline 5.1

Connection id:          3
Current database:
Current user:           root@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.6.25-debug-log Source distribution
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    utf8mb4
Db     characterset:    utf8mb4
Client characterset:    utf8mb4
Conn.  characterset:    utf8mb4
UNIX socket:            /data/mysql/mysql.sock
Uptime:                 46 min 53 sec

更改數據庫/表

ALTER DATABASE dbname CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
ALTER TABLE tbname CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE tbname CHANGE old_column_name new_column_name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

my.cnf配置

[client]  
default-character-set = utf8mb4  
 
[mysql]  
default-character-set = utf8mb4  
 
[mysqld]  
character-set-client-handshake = FALSE  
character-set-server = utf8mb4  
collation-server = utf8mb4_general_ci  
init_connect='SET NAMES utf8mb4'

 /usr/share/mysql/charsets/Index.xml配置

<charset name="utf8mb4">
  <family>Unicode</family>
  <description>UTF-8 Unicode</description>
  <alias>utf-8</alias>
  <collation name="utf8_general_ci"     id="33">
   <flag>primary</flag>
   <flag>compiled</flag>
  </collation>
  <collation name="utf8_bin"            id="83">
    <flag>binary</flag>
    <flag>compiled</flag>
  </collation>
</charset>

查看字符集

mysql> show variables where variable_name like 'character_set%' or variable_name like 'coll%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8                             |
| character_set_connection | utf8                             |
| character_set_database   | utf8                             |
| character_set_filesystem | binary                           |
| character_set_results    | utf8                             |
| character_set_server     | utf8                             |
| character_set_system     | utf8                             |
| character_sets_dir       | /u01/mysql/share/mysql/charsets/ |
| collation_connection     | utf8_general_ci                  |
| collation_database       | utf8_general_ci                  |
| collation_server         | utf8_general_ci                  |
+--------------------------+----------------------------------+
11 rows in set (0.00 sec)
	
mysql> show char set;
+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 |
| dec8     | DEC West European           | dec8_swedish_ci     |      1 |
| cp850    | DOS West European           | cp850_general_ci    |      1 |
| hp8      | HP West European            | hp8_english_ci      |      1 |
| koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |      1 |
| latin1   | cp1252 West European        | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European | latin2_general_ci   |      1 |
| swe7     | 7bit Swedish                | swe7_swedish_ci     |      1 |
| ascii    | US ASCII                    | ascii_general_ci    |      1 |
| ujis     | EUC-JP Japanese             | ujis_japanese_ci    |      3 |
| sjis     | Shift-JIS Japanese          | sjis_japanese_ci    |      2 |
| hebrew   | ISO 8859-8 Hebrew           | hebrew_general_ci   |      1 |
| tis620   | TIS620 Thai                 | tis620_thai_ci      |      1 |
| euckr    | EUC-KR Korean               | euckr_korean_ci     |      2 |
| koi8u    | KOI8-U Ukrainian            | koi8u_general_ci    |      1 |
| gb2312   | GB2312 Simplified Chinese   | gb2312_chinese_ci   |      2 |
| greek    | ISO 8859-7 Greek            | greek_general_ci    |      1 |
| eucjpms  | UJIS for Windows Japanese   | eucjpms_japanese_ci |      3 |
+----------+-----------------------------+---------------------+--------+
40 rows in set (0.00 sec)

mysql> show COLLATION;
+--------------------------+----------+-----+---------+----------+---------+
| Collation                | Charset  | Id  | Default | Compiled | Sortlen |
+--------------------------+----------+-----+---------+----------+---------+
| big5_chinese_ci          | big5     |   1 | Yes     | Yes      |       1 |
| big5_bin                 | big5     |  84 |         | Yes      |       1 |
| dec8_swedish_ci          | dec8     |   3 | Yes     | Yes      |       1 |
| dec8_bin                 | dec8     |  69 |         | Yes      |       1 |
| cp850_general_ci         | cp850    |   4 | Yes     | Yes      |       1 |
| eucjpms_bin              | eucjpms  |  98 |         | Yes      |       1 |
+--------------------------+----------+-----+---------+----------+---------+
219 rows in set (0.01 sec)

– character_set_server:默認的內部操作字符集

– character_set_client:客戶端來源數據使用的字符集

– character_set_connection:連接層字符集

– character_set_results:查詢結果字符集

– character_set_database:當前選中數據庫的默認字符集

– character_set_system:系統元數據(字段名等)字符集

MySQL中的字符序名稱遵從命名慣例

以字符序對應的字符集名稱開頭;以_ci(表示大小寫不敏感 case ignore )、_cs(表示大小寫敏感 case sensitive )或_bin(表示按編碼值比較)結尾。

例如:在字符序“utf8_general_ci”下,字符“a”和“A”是等價的

MySQL中的字符集轉換過程

1. MySQL Server收到請求時將請求數據從character_set_client轉換為character_set_connection;

2. 進行內部操作前將請求數據從character_set_connection轉換為內部操作字符集,其確定方法如下:

• 使用每個數據字段的CHARACTER SET設定值;

• 若上述值不存在,則使用對應數據表的DEFAULT CHARACTER SET設定值(MySQL擴展,非SQL標准);

• 若上述值不存在,則使用對應數據庫的DEFAULT CHARACTER SET設定值;

• 若上述值不存在,則使用character_set_server設定值。

3. 將操作結果從內部操作字符集轉換為character_set_results。

三、emoji表情與utf8mb4

emoji表情mysql的utf8是不支持,需要修改設置為utf8mb4

在MYSQL 5.5 之前, UTF8 編碼只支持1-3個字節,只支持BMP這部分的unicode編碼區, 關於BMP,http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters 這裡看,基本就是0000~FFFF這一區。

從MYSQL5.5開始,可支持4個字節UTF編碼utf8mb4,一個字符最多能有4字節,所以能支持更多的字符集,utf8mb4兼容utf8,且比utf8能表示更多的字符

CREATE TABLE `ios_emoji` (  
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT '自增ID',  
  `unicode` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Unicode編碼',  
  `utf8` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'UTF8編碼',  
  `utf16` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'UTF16編碼',  
  `sbunicode` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'SBUnicode編碼',  
  `filename` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '文件名',  
  `filebyte` longblob COMMENT '文件內容字節',  
  PRIMARY KEY (`id`)  
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='ios表情編碼表'; 

  

參考文章

http://drupal.stackexchange.com/questions/166405/why-are-we-using-utf8mb4-general-ci-and-not-utf8mb4-unicode-ci

http://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci

http://blog.csdn.net/leshami/article/details/42024217

http://www.laruence.com/2008/01/05/12.html

 

  1. 上一頁:
  2. 下一頁:
Copyright © 程式師世界 All Rights Reserved