程式師世界 >> 數據庫知識 >> MYSQL數據庫 >> 關於MYSQL數據庫 >> MySQL在關聯復雜情況下所能做出的一些優化

MySQL在關聯復雜情況下所能做出的一些優化

編輯：關於MYSQL數據庫

昨天處理了一則復雜關聯SQL的優化，這類SQL的優化往往考慮以下四點：

第一.查詢所返回的結果集，通常查詢返回的結果集很少，是有信心進行優化的；

第二.驅動表的選擇至關重要，通過查看執行計劃，可以看到優化器選擇的驅動表,從執行計劃中的rows可以大致反映出問題的所在；

第三.理清各表之間的關聯關系，注意關聯字段上是否有合適的索引；

第四.使用straight_join關鍵詞來強制表之間的關聯順序，可以方便我們驗證某些猜想；

SQL:
執行時間：

mysql> select c.yh_id,
-> c.yh_dm,
-> c.yh_mc,
-> c.mm,
-> c.yh_lx,
-> a.jg_id,
-> a.jg_dm,
-> a.jg_mc,
-> a.jgxz_dm,
-> d.js_dm yh_js
-> from a, b, c
-> left join d on d.yh_id = c.yh_id
-> where a.jg_id = b.jg_id
-> and b.yh_id = c.yh_id
-> and a.yx_bj = ‘Y'
-> and c.sc_bj = ‘N'
-> and c.yx_bj = ‘Y'
-> and c.sc_bj = ‘N'
-> and c.yh_dm = '006939748XX' ;

1 row in set (0.75 sec)

這條SQL查詢實際只返回了一行數據，但卻執行耗費了750ms，查看執行計劃：

mysql> explain
-> select c.yh_id,
-> c.yh_dm,
-> c.yh_mc,
-> c.mm,
-> c.yh_lx,
-> a.jg_id,
-> a.jg_dm,
-> a.jg_mc,
-> a.jgxz_dm,
-> d.js_dm yh_js
-> from a, b, c
-> left join d on d.yh_id = c.yh_id
-> where a.jg_id = b.jg_id
-> and b.yh_id = c.yh_id
-> and a.yx_bj = ‘Y'
-> and c.sc_bj = ‘N'
-> and c.yx_bj = ‘Y'
-> and c.sc_bj = ‘N'
-> and c.yh_dm = '006939748XX' ;

+—-+————-+——-+——–+——————+———+———+————–+——-+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+——-+——–+——————+———+———+————–+——-+————-+
| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where |
| 1 | SIMPLE | b | ref | PRIMARY | PRIMARY | 98 | test.a.JG_ID | 1 | Using index |
| 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 98 | test.b.YH_ID | 1 | Using where |
| 1 | SIMPLE | d | index | NULL | PRIMARY | 196 | NULL | 54584 | Using index |
+—-+————-+——-+——–+——————+———+———+————–+——-+————-+

可以看到執行計劃中有兩處比較顯眼的性能瓶頸：

| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where |

| 1 | SIMPLE | d | index | NULL | PRIMARY | 196 | NULL | 54584 | Using index |

由於d是left join的表，所以驅動表不會選擇d表，我們在來看看a,b,c三表的大小：

mysql> select count(*) from c;
+———-+
| count(*) |
+———-+
| 53731 |
+———-+

mysql> select count(*) from a;
+———-+
| count(*) |
+———-+
| 53335 |
+———-+

mysql> select count(*) from b;
+———-+
| count(*) |
+———-+
| 105809 |
+———-+

由於b表的數據量大於其他的兩表，同時b表上基本沒有查詢過濾條件，所以驅動表選擇B的可能排除；

優化器實際選擇了a表作為驅動表，而為什麼不是c表作為驅動表？我們來分析一下：

第一階段：a表作為驅動表
a–>b–>c–>d:
(1):a.jg_id=b.jg_id—>(b索引:PRIMARY KEY (`JG_ID`,`YH_ID`) )

(2):b.yh_id=c.yh_id—>(c索引:PRIMARY KEY (`YH_ID`))

(3):c.yh_id=d.yh_id—>(d索引:PRIMARY KEY (`JS_DM`,`YH_ID`))
由於d表上沒有yh_id的索引，索引在d表上添加索引：

alter table d add index ind_yh_id(yh_id);

執行計劃：

+—-+————-+——-+——–+——————+———–+———+————–+——-+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+——-+——–+——————+———–+———+————–+——-+————-+
| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where |
| 1 | SIMPLE | b | ref | PRIMARY | PRIMARY | 98 | test.a.JG_ID | 1 | Using index |
| 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 98 | test.b.YH_ID | 1 | Using where |
| 1 | SIMPLE | d | ref | ind_yh_id | ind_yh_id | 98 | test.b.YH_ID | 272 | Using index |
+—-+————-+——-+——–+——————+———–+———+————–+——-+————-+

執行時間：

1 row in set (0.77 sec)

在d表上添加索引後，d表的掃描行數下降到272行（最開始為：54584 ）

| 1 | SIMPLE | d | ref | ind_yh_id | ind_yh_id | 98 | test.b.YH_ID | 272 | Using index |

第二階段：c表作為驅動表

d
^
|
c–>b–>a
由於在c表上有yh_dm過濾性很高的篩選條件，所以我們在yh_dm上創建一個索引：

mysql> select count(*) from c where yh_dm = '006939748XX';
+———-+
| count(*) |
+———-+
| 2 |
+———-+

添加索引：

alter table c add index ind_yh_dm(yh_dm)

查看執行計劃：

+—-+————-+——-+——–+——————-+———–+———+————–+——-+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+——-+——–+——————-+———–+———+————–+——-+————-+
| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where |
| 1 | SIMPLE | b | ref | PRIMARY | PRIMARY | 98 | test.a.JG_ID | 1 | Using index |
| 1 | SIMPLE | c | eq_ref | PRIMARY,ind_yh_dm | PRIMARY | 98 | test.b.YH_ID | 1 | Using where |
| 1 | SIMPLE | d | ref | ind_yh_id | ind_yh_id | 98 | test.b.YH_ID | 272 | Using index |
+—-+————-+——-+——–+——————-+———–+———+————–+——-+————-+

執行時間：

1 row in set (0.74 sec)

在c表上添加索引後，索引還是沒有走上，執行計劃還是以a表作為驅動表，所以我們這裡來分析一下為什麼還是以a表作為驅動表？

1):c.yh_id=b.yh_id—>( PRIMARY KEY (`JG_ID`,`YH_ID`) )

a.如果以c表為驅動表，則c表與b表在關聯的時候，由於在b表沒有yh_id字段的索引，由於b表的數據量很大，所以優化器認為這裡如果以c表作為驅動表，則會與b表產生較大的關聯（這裡可以使用straight_join強制使用c表作為驅動表）；
b.如果以a表為驅動表，則a表與b表在關聯的時候，由於在b表上有jg_id字段的索引，所以優化器認為以a作為驅動表的代價是小於以c作為驅動板的代價；
所以我們如果要以C表為驅動表，只需要在b上添加yh_id的索引：

alter table b add index ind_yh_id(yh_id);

2):b.jg_id=a.jg_id—>( PRIMARY KEY (`JG_ID`) )

3):c.yh_id=d.yh_id—>( KEY `ind_yh_id` (`YH_ID`) )
執行計劃：

+—-+————-+——-+——–+——————-+———–+———+————–+——+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+——-+——–+——————-+———–+———+————–+——+————-+
| 1 | SIMPLE | c | ref | PRIMARY,ind_yh_dm | ind_yh_dm | 57 | const | 2 | Using where |
| 1 | SIMPLE | d | ref | ind_yh_id | ind_yh_id | 98 | test.c.YH_ID | 272 | Using index |
| 1 | SIMPLE | b | ref | PRIMARY,ind_yh_id | ind_yh_id | 98 | test.c.YH_ID | 531 | Using index |
| 1 | SIMPLE | a | eq_ref | PRIMARY,INDEX_JG | PRIMARY | 98 | test.b.JG_ID | 1 | Using where |
+—-+————-+——-+——–+——————-+———–+———+————–+——+————-+

執行時間：