mysql not in、left join、IS NULL、NOT EXISTS 效力成績記載。本站提示廣大學習愛好者:(mysql not in、left join、IS NULL、NOT EXISTS 效力成績記載)文章只能為提供參考,不一定能成為您想要的結果。以下是mysql not in、left join、IS NULL、NOT EXISTS 效力成績記載正文
NOT IN、JOIN、IS NULL、NOT EXISTS效力比較
語句一:select count(*) from A where A.a not in (select a from B)
語句二:select count(*) from A left join B on A.a = B.a where B.a is null
語句三:select count(*) from A where not exists (select a from B where A.a = B.a)
曉得以上三條語句的現實後果是雷同的曾經良久了,然則一向沒有深究其間的效力比較。一向感到上語句二是最快的。
明天任務上由於要對一個數萬萬行數據的庫停止數據消除,須要刪失落兩千多萬行數據。年夜量的用到了以上三條語句所要完成的功效。原來用的是語句一,然則成果是履行速度1個小時32分,日記文件占用21GB。時光上固然可以接收,然則對硬盤空間的占用確是個成績。是以將一切的語句一都換成語句二。本認為會更快。沒想到履行40多分鐘後,第一批50000行都沒有刪失落,反而讓SQL SERVER瓦解失落了,成果使人驚訝。試了試零丁履行這條語句,查詢近一萬萬行的表,語句一用了4秒,語句二卻用了18秒,差距很年夜。語句三的效力與語句一接近。
第二種寫法是年夜忌,應當盡可能防止。第一種和第三種寫法實質上簡直一樣。
假定buffer pool足夠年夜,寫法二絕對於寫法一來講存在以下幾點缺乏:
(1)left join自己更耗資本(須要更多資本來處置發生的中央成果集)
(2)left join的中央成果集的范圍不會比表A小
(3)寫法二還須要對left join發生的中央成果做is null的前提挑選,而寫法一則在兩個聚集join的同時完成了挑選,這部門開支是額定的
這三點綜合起來,在處置海量數據時就會發生比擬顯著的差別(重要是內存和CPU上的開支)。我疑惑樓主在測試時buffer pool能夠曾經處於飽和狀況,如許的話,寫法二的那些額定開支不能不借助磁盤上的虛擬內存,在SQL Server做換頁時,因為觸及到較慢的I/O操作是以這類差距會加倍顯著。
關於日記文件過年夜,這也是正常的,由於刪除的記載多嘛。可以依據數據庫的用處斟酌將恢復模子設為simple,或許在刪除停止後將日記truncate失落並把文件shrink上去。
由於之前已經作過一個對這個庫停止無前提刪除的劇本,就是要刪除數據量較年夜的表中的一切數據,然則由於客戶請求,不克不及應用truncate table,怕損壞已有的庫構造。所以只能用delete刪,其時也碰到了日記文件過年夜的成績,其時采取的辦法是分批刪除,在SQL2K頂用set rowcount @chunk,在SQL2K5頂用delete top @chunk。如許的操作不只使刪除時光年夜年夜削減,並且讓日記量年夜年夜削減,只增加了1G閣下。
然則此次消除數據的任務須要加上前提,就是delete A from A where ....前面有前提的。再次應用分批刪除的辦法,卻曾經沒後果了。
不知您知不曉得這是為何。
mysql not in 和 left join 效力成績記載
起首解釋該條sql的功效是查詢聚集a不在聚集b的數據。
not in的寫法
select add_tb.RUID
from (select distinct RUID
from UserMsg
where SubjectID =12
and CreateTime>'2009-8-14 15:30:00'
and CreateTime<='2009-8-17 16:00:00'
) add_tb
where add_tb.RUID
not in (select distinct RUID
from UserMsg
where SubjectID =12
and CreateTime<'2009-8-14 15:30:00'
)
前往444行記載用時 0.07sec
explain 成果
+----+--------------------+------------+----------------+---------------------------+------------+---------+------+------+--
----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows |
Extra |
+----+--------------------+------------+----------------+---------------------------+------------+---------+------+------+--
----------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 452 |
Using where |
| 3 | DEPENDENT SUBQUERY | UserMsg | index_subquery | RUID,SubjectID,CreateTime | RUID | 96 | func | 2 |
Using index; Using where |
| 2 | DERIVED | UserMsg | range | SubjectID,CreateTime | CreateTime | 9 | NULL | 1857 |
Using where; Using temporary |
+----+--------------------+------------+----------------+---------------------------+------------+---------+------+------+--
----------------------------+
剖析:該條查詢速度快緣由為id=2的sql查詢出來的成果比擬少,所以id=1sql所以運轉速度比擬快,id=2的應用了暫時表,不曉得這個時刻能否應用索引?
個中一種left join
select a.ruid,b.ruid
from(select distinct RUID
from UserMsg
where SubjectID =12
and CreateTime >= '2009-8-14 15:30:00'
and CreateTime<='2009-8-17 16:00:00'
) a left join (
select distinct RUID
from UserMsg
where SubjectID =12 and CreateTime< '2009-8-14 15:30:00'
) b on a.ruid = b.ruid
where b.ruid is null
前往444行記載用時 0.39sec
explain 成果
+----+-------------+------------+-------+----------------------+------------+---------+------+------+-----------------------
-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
|
+----+-------------+------------+-------+----------------------+------------+---------+------+------+-----------------------
-------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 452 |
|
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 1112 | Using where; Not exists
|
| 3 | DERIVED | UserMsg | ref | SubjectID,CreateTime | SubjectID | 5 | | 6667 | Using where; Using
temporary |
| 2 | DERIVED | UserMsg | range | SubjectID,CreateTime | CreateTime | 9 | NULL | 1838 | Using where; Using
temporary |
+----+-------------+------------+-------+----------------------+------------+---------+------+------+-----------------------
-------+
剖析:應用了兩個暫時表,而且兩個暫時表做了笛卡爾積,招致不克不及應用索引而且數據量很年夜
別的一種left join
select distinct a.RUID
from UserMsg a
left join UserMsg b
on a.ruid = b.ruid
and b.subjectID =12 and b.createTime < '2009-8-14 15:30:00'
where a.subjectID =12
and a.createTime >= '2009-8-14 15:30:00'
and a.createtime <='2009-8-17 16:00:00'
and b.ruid is null;
前往444行記載用時 0.07sec
explain 成果
+----+-------------+-------+-------+---------------------------+------------+---------+--------------+------+---------------
--------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
|
+----+-------------+-------+-------+---------------------------+------------+---------+--------------+------+---------------
--------------------+
| 1 | SIMPLE | a | range | SubjectID,CreateTime | CreateTime | 9 | NULL | 1839 | Using where;
Using temporary |
| 1 | SIMPLE | b | ref | RUID,SubjectID,CreateTime | RUID | 96 | dream.a.RUID | 2 | Using where;
Not exists; Distinct |
+----+-------------+-------+-------+---------------------------+------------+---------+--------------+------+---------------
--------------------+
剖析:兩次查詢都是用上了索引,而且查詢時同時停止的,所以查詢效力應當很高
應用not exists的sql
select distinct a.ruid
from UserMsg a
where a.subjectID =12
and a.createTime >= '2009-8-14 15:30:00'
and a.createTime <='2009-8-17 16:00:00'
and not exists (
select distinct RUID
from UserMsg
where subjectID =12 and createTime < '2009-8-14 15:30:00'
and ruid=a.ruid
)
前往444行記載用時 0.08sec
explain 成果
+----+--------------------+---------+-------+---------------------------+------------+---------+--------------+------+------
------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
|
+----+--------------------+---------+-------+---------------------------+------------+---------+--------------+------+------
------------------------+
| 1 | PRIMARY | a | range | SubjectID,CreateTime | CreateTime | 9 | NULL | 1839 | Using
where; Using temporary |
| 2 | DEPENDENT SUBQUERY | UserMsg | ref | RUID,SubjectID,CreateTime | RUID | 96 | dream.a.RUID | 2 | Using
where |
+----+--------------------+---------+-------+---------------------------+------------+---------+--------------+------+------
------------------------+
剖析:同上根本上是一樣的,只是分化了2個查詢次序履行,查詢效力低於第3個
為了驗證數據查詢效力,將上述查詢中的subjectID =12的限制前提去失落,成果統計查詢時光以下
0.20s
21.31s
0.25s
0.43s
laserhe協助剖析成績總結
select a.ruid,b.ruid
from( select distinct RUID
from UserMsg
where CreateTime >= '2009-8-14 15:30:00'
and CreateTime<='2009-8-17 16:00:00'
) a left join UserMsg b
on a.ruid = b.ruid
and b.createTime < '2009-8-14 15:30:00'
where b.ruid is null;
履行時光0.13s
+----+-------------+------------+-------+-----------------+------------+---------+--------+------+--------------------------
----+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
|
+----+-------------+------------+-------+-----------------+------------+---------+--------+------+--------------------------
----+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 1248 |
|
| 1 | PRIMARY | b | ref | RUID,CreateTime | RUID | 96 | a.RUID | 2 | Using where; Not exists
|
| 2 | DERIVED | UserMsg | range | CreateTime | CreateTime | 9 | NULL | 3553 | Using where; Using
temporary |
+----+-------------+------------+-------+-----------------+------------+---------+--------+------+--------------------------
----+
履行效力相似與not in的效力
數據庫優化的根本准繩:讓笛卡爾積產生在盡量小的聚集之間,mysql在join的時刻可以直接經由過程索引來掃描,而嵌入到子查詢外頭,查詢規
劃器就不知道用適合的索引了。
一個SQL在數據庫裡是這麼優化的:起首SQL會剖析成一堆剖析樹,一個樹狀數據構造,然後在這個數據構造裡,查詢計劃器會查找有無適合
的索引,然後依據詳細情形做一個分列組合,然後盤算這個分列組合中的每種的開支(相似explain的輸入的盤算機可讀版本),然後比擬裡
面開支最小的,拔取並履行之。那末:
explain select a.ruid,b.ruid from(select distinct RUID from UserMsg where CreateTime >= '2009-8-14 15:30:00'
and CreateTime<='2009-8-17 16:00:00' ) a left join UserMsg b on a.ruid = b.ruid and b.createTime < '2009-8-14 15:30:00'
where b.ruid is null;
和
explain select add_tb.RUID
-> from (select distinct RUID
-> from UserMsg
-> where CreateTime>'2009-8-14 15:30:00'
-> and CreateTime<='2009-8-17 16:00:00'
-> ) add_tb
-> where add_tb.RUID
-> not in (select distinct RUID
-> from UserMsg
-> where CreateTime<'2009-8-14 15:30:00'
-> );
explain
+----+--------------------+------------+----------------+-----------------+------------+---------+------+------+------------
------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
|
+----+--------------------+------------+----------------+-----------------+------------+---------+------+------+------------
------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 1248 | Using where
|
| 3 | DEPENDENT SUBQUERY | UserMsg | index_subquery | RUID,CreateTime | RUID | 96 | func | 2 | Using index;
Using where |
| 2 | DERIVED | UserMsg | range | CreateTime | CreateTime | 9 | NULL | 3509 | Using where;
Using temporary |
+----+--------------------+------------+----------------+-----------------+------------+---------+------+------+------------
------------------+
開支是完整一樣的,開支可以從 rows 誰人字段得出(根本上是rows誰人字段各個行的數值的乘積,也就是笛卡爾積)
然則呢:上面這個:
explain select a.ruid,b.ruid from(select distinct RUID from UserMsg where CreateTime >= '2009-8-14 15:30:00'
and CreateTime<='2009-8-17 16:00:00' ) a left join ( select distinct RUID from UserMsg where createTime < '2009-8-14
15:30:00' ) b on a.ruid = b.ruid where b.ruid is null;
履行時光21.31s
+----+-------------+------------+-------+---------------+------------+---------+------+-------+-----------------------------
-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
|
+----+-------------+------------+-------+---------------+------------+---------+------+-------+-----------------------------
-+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 1248 |
|
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 30308 | Using where; Not exists
|
| 3 | DERIVED | UserMsg | ALL | CreateTime | NULL | NULL | NULL | 69366 | Using where; Using temporary
|
| 2 | DERIVED | UserMsg | range | CreateTime | CreateTime | 9 | NULL | 3510 | Using where; Using temporary
|
+----+-------------+------------+-------+---------------+------------+---------+------+-------+-----------------------------
-+
我就有些不明確
為什麼是四行
而且中央兩行偉大非常
按理說
查詢計劃器應當能把這個查詢優化得跟後面的兩個一樣的
(至多在我熟習的pgsql數據庫裡我有信念是一樣的)
但mysql外頭不是
所以我感到查詢計劃器外頭能夠照樣糙了點
我後面說過優化的根本准繩就是,讓笛卡爾積產生在盡量小的聚集之間
那末下面最初一種寫法至多沒有違背這個准繩
固然b 表由於相符前提的異常多,根本上不會用索引
然則其實不應當妨害查詢優化器看到裡面的join on前提,從而和後面兩個SQL一樣,拔取主鍵停止join
不外我後面說過查詢計劃器的感化
實際下去講
遍歷一遍一切能夠,盤算一下開支
是公道的
我感到這裡最初一種寫法沒有遍歷完全一切能夠
能夠的緣由是子查詢的完成照樣比擬簡略?
子查詢對數據庫切實其實是個挑釁
由於根本都是遞歸的器械
所以在這個環節有點缺點其實不奇異
其實你細心想一想,最初一種寫法不過是我們第一種寫法的一個變種,症結在表b的where 前提放在哪裡
放在外面,就不會用索引去join
放在裡面就會
這個自己就是分列組合的一個能夠