EMP表是Oracle測試賬戶SCOTT中的一張雇員表,首先,我們來看看emp表的數據
SQL> select * from emp; EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO ----- ---------- --------- ---------- --------- ---------- ---------- ---------- 7369 SMITH CLERK 7902 17-DEC-80 800 20 7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 7566 JONES MANAGER 7839 02-APR-81 2975 20 7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 7782 CLARK MANAGER 7839 09-JUN-81 2450 10 7788 SCOTT ANALYST 7566 19-APR-87 3000 20 7839 KING PRESIDENT 17-NOV-81 5000 10 7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30 7876 ADAMS CLERK 7788 23-MAY-87 1100 20 7900 JAMES CLERK 7698 03-DEC-81 950 30 7902 FORD ANALYST 7566 03-DEC-81 3000 20 7934 MILLER CLERK 7782 23-JAN-82 1300 10 14 rows selected.
其中,empno是員工編號,同時也是該表的主鍵,ename是員工姓名,sal是員工工資,deptno是員工部門。
如何找出每個部門的最高工資的員工信息呢?
常用的方法是關聯查詢,SQL語句如下:
select emp.deptno,ename,sal from emp, (select deptno,max(sal)maxsal from emp group by deptno) t where emp.deptno=t.deptno and emp.sal=t.maxsal;
結果如下:
DEPTNO ENAME SAL ---------- ---------- ---------- 30 BLAKE 2850 20 SCOTT 3000 10 KING 5000 20 FORD 3000
下面我們來看看執行計劃:
Execution Plan ---------------------------------------------------------- Plan hash value: 269884559 ----------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ----------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 117 | 7 (15)| 00:00:01 | |* 1 | HASH JOIN | | 3 | 117 | 7 (15)| 00:00:01 | | 2 | VIEW | | 3 | 78 | 4 (25)| 00:00:01 | | 3 | HASH GROUP BY | | 3 | 21 | 4 (25)| 00:00:01 | | 4 | TABLE ACCESS FULL| EMP | 14 | 98 | 3 (0)| 00:00:01 | | 5 | TABLE ACCESS FULL | EMP | 14 | 182 | 3 (0)| 00:00:01 | ----------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("EMP"."DEPTNO"="T"."DEPTNO" AND "EMP"."SAL"="T"."MAXSAL") Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 13 consistent gets 0 physical reads 0 redo size 625 bytes sent via SQL*Net to client 419 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 4 rows processed
不難看出,該查詢針對同一個表走了兩次全盤掃描,成本為7,邏輯讀為13。
如何對上述查詢進行優化呢?在這裡,用到分析函數LAST_VALUE,LAST_VALUE返回排序集中的最後一個值。
SELECT deptno,ename,sal, LAST_VALUE(sal) OVER(PARTITION BY deptno ORDER BY sal ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)maxsal FROM emp;
輸出結果如下:
DEPTNO ENAME SAL MAXSAL ---------- ---------- ---------- ---------- 10 MILLER 1300 5000 10 CLARK 2450 5000 10 KING 5000 5000 20 SMITH 800 3000 20 ADAMS 1100 3000 20 JONES 2975 3000 20 SCOTT 3000 3000 20 FORD 3000 3000 30 JAMES 950 2850 30 MARTIN 1250 2850 30 WARD 1250 2850 30 TURNER 1500 2850 30 ALLEN 1600 2850 30 BLAKE 2850 2850 14 rows selected.
不難看出,sal等於maxsal的行即為每個部門最高工資的員工,下面用嵌套子查詢得到目標結果。
SELECT deptno,ename,sal FROM ( SELECT deptno,ename,sal, LAST_VALUE(sal) OVER(PARTITION BY deptno ORDER BY sal ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)maxsal FROM emp) WHERE sal=maxsal;
輸出結果如下:
DEPTNO ENAME SAL ---------- ---------- ---------- 10 KING 5000 20 SCOTT 3000 20 FORD 3000 30 BLAKE 2850
下面我們來看看該語句的執行計劃:
Execution Plan ---------------------------------------------------------- Plan hash value: 4130734685 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 14 | 644 | 4 (25)| 00:00:01 | |* 1 | VIEW | | 14 | 644 | 4 (25)| 00:00:01 | | 2 | WINDOW SORT | | 14 | 182 | 4 (25)| 00:00:01 | | 3 | TABLE ACCESS FULL | EMP | 14 | 182 | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("SAL"="MAXSAL") Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 6 consistent gets 0 physical reads 0 redo size 619 bytes sent via SQL*Net to client 419 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 4 rows processed
可見,引入了分析函數以後,成本和邏輯讀都減少了一半。
通過查詢的結果,我們可以看出,20號部門有兩個人的工資最高,有時候,我們只想得到一個人的信息,如何實現呢?
在這裡我們會用到分析函數LAG,具體SQL如下:
SELECT deptno,ename,sal,LAG(sal)OVER(ORDER BY deptno) presal FROM ( SELECT deptno,ename,sal, LAST_VALUE(sal) OVER(PARTITION BY deptno ORDER BY sal ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)maxsal FROM emp) WHERE sal=maxsal;
輸出結果如下:
DEPTNO ENAME SAL PRESAL ---------- ---------- ---------- ---------- 10 KING 5000 20 SCOTT 3000 5000 20 FORD 3000 3000 30 BLAKE 2850 3000
剔除sal等於presal的行
SELECT deptno,ename,sal FROM ( SELECT deptno,ename,sal,LAG(sal)OVER(ORDER BY deptno) presal FROM ( SELECT deptno,ename,sal, LAST_VALUE(sal) OVER(PARTITION BY deptno ORDER BY sal ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)maxsal FROM emp) WHERE sal=maxsal) WHERE sal <> presal or presal is null;
輸出結果如下:
DEPTNO ENAME SAL ---------- ---------- ---------- 10 KING 5000 20 SCOTT 3000 30 BLAKE 2850
總結:
在實際生產環境中,此類應用還是蠻多的,譬如如何查詢每個時段耗時最大的工單。當然,通過上述演示,我們也看出了group by函數的局限性。
關於LAST_VALUE和LAG函數的具體應用及說明,可參考Oracle官方文檔:
1. LAST_VALUE
2. LAG