當兩張表進行聯結的時候,如果表1中的數據行是否出現在結果集中需要根據表2中出現或不出現至少一個相匹配的數據行來判斷,這種情況就會發生半聯結;而反聯結便是半聯結的補集,它們會作為數據庫中常見的聯結方法如NESTED LOOPS,MERGE SORT JOIN,HASH JOIN的選項出現。
實際上半聯結和反聯結本身也可以被認同是兩種聯結方法;在CBO優化模式下,優化器能夠根據實際情況靈活的轉換執行語句從而實現半聯結和反聯結方法,畢竟沒有什麼SQL語法可以顯式的調用半聯結和反聯結,它們只是SQL語句滿足某些條件時優化器可以選擇的選項而已,不過仍然有必要深入這兩種選項在特定情況下帶來的性能優勢。
半聯結
半聯結通常都發生在使用含有IN和EXISTS的相關子查詢的時候,=ANY的用法與IN相同,所以也會出現發生半聯結的情況;不過也是有例外的,在11gR2版本中,優化器不會為任何包含在OR分支中的子查詢選擇半聯結,這也是現在官檔中唯一明確標識的限制條件,來看幾種發生場景:
復制代碼 代碼如下:
-- 使用IN關鍵字的相關子查詢 => 發生NESTED LOOPS半聯結
SQL> select department_name
2 from hr.departments dept
3 where department_id in (select department_id from hr.employees emp);
11 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2605691773
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 190 | 3 (0)| 00:00:01 |
| 1 | NESTED LOOPS SEMI | | 10 | 190 | 3 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL| DEPARTMENTS | 27 | 432 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | EMP_DEPARTMENT_IX | 41 | 123 | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("DEPARTMENT_ID"="DEPARTMENT_ID")
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
11 consistent gets
0 physical reads
0 redo size
742 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
11 rows processed
-- 使用EXISTS關鍵字的相關子查詢 => 發生NESTED LOOPS半聯結
SQL> select department_name
2 from hr.departments dept where exists
3 (select null from hr.employees emp where emp.department_id = dept.department_id);
11 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2605691773
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 190 | 3 (0)| 00:00:01 |
| 1 | NESTED LOOPS SEMI | | 10 | 190 | 3 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL| DEPARTMENTS | 27 | 432 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | EMP_DEPARTMENT_IX | 41 | 123 | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("EMP"."DEPARTMENT_ID"="DEPT"."DEPARTMENT_ID")
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
11 consistent gets
0 physical reads
0 redo size
742 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
11 rows processed
-- 謂語中使用了OR分支中的EXISTS子查詢 => 禁用半聯結
SQL> select department_name
2 from hr.departments dept
3 where 1=2 OR exists
4 (select null from hr.employees emp where emp.department_id = dept.department_id);
11 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 440241596
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 27 | 432 | 4 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL| DEPARTMENTS | 27 | 432 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | EMP_DEPARTMENT_IX | 2 | 6 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT 0 FROM "HR"."EMPLOYEES" "EMP" WHERE
"EMP"."DEPARTMENT_ID"=:B1))
3 - access("EMP"."DEPARTMENT_ID"=:B1)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
35 consistent gets
0 physical reads
0 redo size
742 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
11 rows processed
從結果集來看,我們很容易聯想到內聯結,那為什麼半聯結通常來說會獲得更高的性能呢?這實際也是半聯結優化的關鍵,拿NESTED LOOPS來舉例,在NESTED LOOPS聯結中,驅動表被讀取後需要逐個的進入內層循環來進行匹配工作,並且只有當外層循環的數據行與內層循環中的每一行數據匹配運算完成後才會結束一個結果集的獲取;而相對而言,半聯結的區別在於數據集1中的每一條記錄只返回一次,而不管數據集2中有幾條匹配的記錄,因此,半聯結會在找到子查詢中匹配到的第一條數據後立即結束處理從而提高性能。
對於某些需要利用半聯結來提高性能的場景,可以通過手動的方式控制半聯結的執行計劃,使用SEMIJOIN和NO_SEMIJOIN提示分別可以指定優化器使用和禁用半聯結。
復制代碼 代碼如下:
-- 使用NO_SEMIJOIN提示禁用半聯結
SQL> select department_name
2 from hr.departments dept
3 where department_id in (select /*+ no_semijoin */department_id from hr.employees emp);
11 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3372191744
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 106 | 1802 | 4 (25)| 00:00:01 |
| 1 | VIEW | VM_NWVW_2 | 106 | 1802 | 4 (25)| 00:00:01 |
| 2 | HASH UNIQUE | | 106 | 2544 | 4 (25)| 00:00:01 |
| 3 | NESTED LOOPS | | 106 | 2544 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| DEPARTMENTS | 27 | 567 | 3 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | EMP_DEPARTMENT_IX | 4 | 12 | 0 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("DEPARTMENT_ID"="DEPARTMENT_ID")
Statistics
----------------------------------------------------------
506 recursive calls
0 db block gets
188 consistent gets
7 physical reads
0 redo size
742 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
10 sorts (memory)
0 sorts (disk)
11 rows processed
除此之外,我們還可以使用_always_semi_join隱藏參數選擇半聯結的聯結類型,有關_always_semi_join參數的可選值:
復制代碼 代碼如下:
SQL> SELECT
2 PARNO_KSPVLD_VALUES pvalid_par#,
3 NAME_KSPVLD_VALUES pvalid_name,
4 VALUE_KSPVLD_VALUES pvalid_value,
5 DECODE(ISDEFAULT_KSPVLD_VALUES, 'FALSE', '', 'DEFAULT' ) pvalid_default
6 FROM
7 X$KSPVLD_VALUES
8 WHERE
9 LOWER(NAME_KSPVLD_VALUES) LIKE '%'||LOWER(nvl('&pname',name_kspvld_values))||'%'
10 ORDER BY
11 pvalid_par#,
12 pvalid_default,
13 pvalid_Value
14 /
PAR# PARAMETER VALUE DEFAULT
------ -------------------------------------------------- ------------------------------ -------
1705 _always_semi_join CHOOSE
_always_semi_join HASH
_always_semi_join MERGE
_always_semi_join NESTED_LOOPS
_always_semi_join OFF
該參數的默認值為choose,表示選用半聯結的類型由優化器來決定,下面來使用_always_semi_join參數將上面的NESTED LOOPS半聯結改變為HASH JOIN半聯結:
復制代碼 代碼如下:
-- 默認發生NESTED LOOPS SEMI
SQL> select department_name
2 from hr.departments dept
3 where department_id in (select department_id from hr.employees emp);
11 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2605691773
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 190 | 3 (0)| 00:00:01 |
| 1 | NESTED LOOPS SEMI | | 10 | 190 | 3 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL| DEPARTMENTS | 27 | 432 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | EMP_DEPARTMENT_IX | 41 | 123 | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("DEPARTMENT_ID"="DEPARTMENT_ID")
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
11 consistent gets
0 physical reads
0 redo size
742 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
11 rows processed
-- session級別修改參數
SQL> alter session set "_always_semi_join"=merge;
Session altered.
-- 發生MERGE JOIN SEMI
SQL> select department_name
2 from hr.departments dept
3 where department_id in (select department_id from hr.employees emp);
11 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 954076352
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 190 | 4 (25)| 00:00:01 |
| 1 | MERGE JOIN SEMI | | 10 | 190 | 4 (25)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| DEPARTMENTS | 27 | 432 | 2 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | DEPT_ID_PK | 27 | | 1 (0)| 00:00:01 |
|* 4 | SORT UNIQUE | | 107 | 321 | 2 (50)| 00:00:01 |
| 5 | INDEX FULL SCAN | EMP_DEPARTMENT_IX | 107 | 321 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("DEPARTMENT_ID"="DEPARTMENT_ID")
filter("DEPARTMENT_ID"="DEPARTMENT_ID")
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
5 consistent gets
1 physical reads
0 redo size
742 bytes sent via SQL*Net to client
523 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
11 rows processed
-- 從trace來看優化器的選擇還是非常可靠的。。
反聯結
從本質上來說,反聯結和半聯結很多相似的因素,反聯結的發生通常是在使用含有NOT IN,NOT EXISTS的相關子查詢的時候,同樣,如果子查詢謂語OR分支中,反聯結也會被禁用,它和半聯結主要的不同點還是在返回數據的匹配方式上,它是會返回在子查詢中沒有匹配到的數據行,不過其優化的原理是一致的,通過在子查詢中找到第一條匹配記錄而立即停止處理來提高效率,一下是發生的集中場景:
復制代碼 代碼如下:
SQL> set autotrace traceonly
-- NOT IN 觸發反聯結
SQL> select department_name
2 from hr.departments
3 where department_id not in
4 (select department_id from hr.employees where department_id is not null);
16 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3082375452
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 17 | 323 | 3 (0)| 00:00:01 |
| 1 | NESTED LOOPS ANTI | | 17 | 323 | 3 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL| DEPARTMENTS | 27 | 432 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | EMP_DEPARTMENT_IX | 41 | 123 | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("DEPARTMENT_ID"="DEPARTMENT_ID")
filter("DEPARTMENT_ID" IS NOT NULL)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
12 consistent gets
6 physical reads
0 redo size
985 bytes sent via SQL*Net to client
535 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
16 rows processed
-- EXISTS觸發反聯結
SQL> select department_name
2 from hr.departments dept
3 where not exists
4 (select null from hr.employees emp where emp.department_id = dept.department_id);
16 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3082375452
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 17 | 323 | 3 (0)| 00:00:01 |
| 1 | NESTED LOOPS ANTI | | 17 | 323 | 3 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL| DEPARTMENTS | 27 | 432 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | EMP_DEPARTMENT_IX | 41 | 123 | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("EMP"."DEPARTMENT_ID"="DEPT"."DEPARTMENT_ID")
Statistics
----------------------------------------------------------
3 recursive calls
0 db block gets
13 consistent gets
0 physical reads
0 redo size
985 bytes sent via SQL*Net to client
535 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
16 rows processed
從上面的例子可以看出minus和outer join操作都可以巧妙的實現同樣的結果,不過從執行計劃來看,minus操作顯然沒有反聯結操作優化,而使用outer join雖然發生了反聯結優化,但是由於使用了帶空值的虛擬記錄來匹配數據行,不便於理解,因此實際還是不建議使用的。
如果想要手動控制反聯結的執行計劃,這裡也有一些hint和參數可以使用,常用的hint有:
1.ANTIJOIN-進行反聯結,優化器決定聯結類型
2.USE_ANTI-老版本的提示,和ANTIJOIN功能一致
3.[NL_AJ] | [HASH_AJ] | [MERGE_AJ]-指定發生反聯結的類型(10g開始被棄用,不過仍然可以生效)
在參數控制方面,也有個和_always_semi_join非常相同的_always_anti_join參數,用法完全一致;還有參數_optimizer_null_aware_antijoin,_optimizer_outer_to_anti_enable用於控制對含空值和外聯結的反聯結轉換。
復制代碼 代碼如下:
-- 使用hint顯式指定反聯結類型
SQL> select department_name
2 from hr.departments dept
3 where not exists (select /*+ hash_aj */ null from hr.employees emp
4 where emp.department_id = dept.department_id);
16 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3587451639
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 17 | 323 | 5 (20)| 00:00:01 |
|* 1 | HASH JOIN ANTI | | 17 | 323 | 5 (20)| 00:00:01 |
| 2 | TABLE ACCESS FULL| DEPARTMENTS | 27 | 432 | 3 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | EMP_DEPARTMENT_IX | 107 | 321 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("EMP"."DEPARTMENT_ID"="DEPT"."DEPARTMENT_ID")
Statistics
----------------------------------------------------------
566 recursive calls
0 db block gets
193 consistent gets
0 physical reads
0 redo size
985 bytes sent via SQL*Net to client
535 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
12 sorts (memory)
0 sorts (disk)
16 rows processed
-- 使用_optimizer_null_antijoin參數關閉反聯結中的空值考慮選項-即返回空值的情況不使用反聯結
SQL> alter session set "_optimizer_null_aware_antijoin"=false;
Session altered.
SQL> select department_name
2 from hr.departments
3 where department_id not in (select department_id from hr.employees);
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 3416340233
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 26 | 416 | 30 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL| DEPARTMENTS | 27 | 432 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| EMPLOYEES | 2 | 6 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( NOT EXISTS (SELECT 0 FROM "HR"."EMPLOYEES" "EMPLOYEES"
WHERE LNNVL("DEPARTMENT_ID"<>:B1)))
3 - filter(LNNVL("DEPARTMENT_ID"<>:B1))
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
172 consistent gets
0 physical reads
0 redo size
343 bytes sent via SQL*Net to client
513 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed