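In MySQL we usually paginate a result set with LIMIT. Oracle has no equally convenient LIMIT-style clause (at least before Oracle 12c, which added the OFFSET ... FETCH row-limiting clause), so pagination is usually done directly in the SQL statement with the help of the rownum pseudocolumn or the row_number() analytic function. This article shows how to paginate Oracle data with each of the two, then analyzes and compares their performance.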
1. Initializing the test data
For the test data I took 70,000 rows from the data dictionary view all_objects. The table is created as follows:
-- To make the result set easy to verify and to avoid unnecessary sorting, rownum is used directly to generate an ordered OBJECT_ID column
SQL> create table my_objects as
select rownum as OBJECT_ID,OBJECT_NAME,OBJECT_TYPE
from all_objects where rownum < 70001;
Table created.
-- Add a primary key on the OBJECT_ID column
SQL> alter table my_objects add primary key (object_id);
Table altered.
SQL> select count(*) from my_objects;
COUNT(*)
----------
70000
-- Gather optimizer statistics on the table
SQL> exec dbms_stats.gather_table_stats(user,'my_objects',cascade => TRUE);
PL/SQL procedure successfully completed.
2. Fetching paged data
To demonstrate pagination, we fetch the 10 records numbered 59991-60000 from the table, once with rownum and once with row_number():
-- Method 1: the rownum pseudocolumn
SQL> select t.* from (select d.*,rownum num from my_objects d where rownum<=60000) t where t.num>=59991;
OBJECT_ID OBJECT_NAME OBJECT_TYPE NUM
---------- ------------------------------ ------------------- ----------
59991 /585bb929_DicomRepos24 JAVA CLASS 59991
59992 /13a1874f_DicomRepos25 JAVA CLASS 59992
59993 /2322ccf0_DicomRepos26 JAVA CLASS 59993
59994 /6c82abc6_DicomRepos27 JAVA CLASS 59994
59995 /34be1a57_DicomRepos28 JAVA CLASS 59995
59996 /b7ee0c7f_DicomRepos29 JAVA CLASS 59996
59997 /bb1d935c_DicomRepos30 JAVA CLASS 59997
59998 /deb95b4f_DicomRepos31 JAVA CLASS 59998
59999 /9b5f55c0_DicomRepos32 JAVA CLASS 59999
60000 /572f1657_DicomRepos33 JAVA CLASS 60000
10 rows selected.
-- Method 2: the row_number() analytic function
SQL> select * from
(select t.*,row_number() over (order by t.OBJECT_ID) as num
from my_objects t)
where num between 59991 and 60000;
OBJECT_ID OBJECT_NAME OBJECT_TYPE NUM
---------- ------------------------------ ------------------- ----------
59991 /585bb929_DicomRepos24 JAVA CLASS 59991
59992 /13a1874f_DicomRepos25 JAVA CLASS 59992
59993 /2322ccf0_DicomRepos26 JAVA CLASS 59993
59994 /6c82abc6_DicomRepos27 JAVA CLASS 59994
59995 /34be1a57_DicomRepos28 JAVA CLASS 59995
59996 /b7ee0c7f_DicomRepos29 JAVA CLASS 59996
59997 /bb1d935c_DicomRepos30 JAVA CLASS 59997
59998 /deb95b4f_DicomRepos31 JAVA CLASS 59998
59999 /9b5f55c0_DicomRepos32 JAVA CLASS 59999
60000 /572f1657_DicomRepos33 JAVA CLASS 60000
10 rows selected.
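Both approaches return the correct result set. In the rownum method, a "greater than" comparison cannot be applied directly to the rownum pseudocolumn (a predicate such as rownum > 1 returns no rows, because rownum is assigned only to rows that have already passed the filter), so the subquery first uses rownum to take the first 60,000 rows and the outer query then uses >= to discard the rows that are not needed. In the row_number() method, the analytic function orders the rows by OBJECT_ID and gives each one a unique number, and the range is then taken with an easy-to-read BETWEEN. But is that really how the two are executed? Let's take a brief look at the execution details of each.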
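To make the "greater than" limitation concrete, here is a toy Python model of rownum assignment. It is a simplified sketch, not Oracle's implementation: it assumes only the rule stated above, that a row receives the next rownum value after it passes the query block's filter. The helper name rownum_filter and the sample rows are made up for illustration.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[int, str]


def rownum_filter(
    rows: Iterable[Row], pred: Callable[[int, Row], bool]
) -> List[Tuple[Row, int]]:
    """Toy model of ROWNUM inside one query block.

    pred receives the ROWNUM the row *would* get; the counter advances only
    when the row is accepted, mirroring WHERE ROWNUM ... predicates.
    """
    result = []
    next_rownum = 1
    for row in rows:
        if pred(next_rownum, row):
            result.append((row, next_rownum))
            next_rownum += 1  # only accepted rows consume a ROWNUM value
    return result


table = [(i, f"obj{i}") for i in range(1, 11)]

# WHERE ROWNUM > 1: every candidate is tested as ROWNUM = 1, so nothing passes.
direct = rownum_filter(table, lambda rn, row: rn > 1)

# Nested form: the inner block keeps ROWNUM <= 5 and exposes it as NUM,
# then the outer block filters on NUM >= 3.
inner = rownum_filter(table, lambda rn, row: rn <= 5)
page = [(row, num) for row, num in inner if num >= 3]

print(direct)                    # []
print([num for _, num in page])  # [3, 4, 5]
```

The nested form works because the outer query filters on NUM, an ordinary column produced by the inner query, rather than on rownum itself.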
3. Pagination performance analysis
First, let's look at their execution plans:
SQL> set autotrace traceonly
SQL> set linesize 200
-- Execution plan for rownum pagination
SQL> select t.* from (select d.*,rownum num from my_objects d where rownum<=60000) t where t.num>=59991;
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 341064162
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 60000 | 3164K| 103 (0)| 00:00:02 |
|* 1 | VIEW | | 60000 | 3164K| 103 (0)| 00:00:02 |
|* 2 | COUNT STOPKEY | | | | | |
| 3 | TABLE ACCESS FULL| MY_OBJECTS | 60000 | 2226K| 103 (0)| 00:00:02 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("T"."NUM">=59991)
2 - filter(ROWNUM<=60000)
Statistics
----------------------------------------------------------
163 recursive calls
0 db block gets
399 consistent gets
0 physical reads
0 redo size
1030 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
5 sorts (memory)
0 sorts (disk)
10 rows processed
-- Execution plan for row_number() pagination
SQL> select * from
(select t.*,row_number() over (order by t.OBJECT_ID) as num
from my_objects t)
where num between 59991 and 60000;
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2942654422
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 70000 | 3691K| 565 (1)| 00:00:07 |
|* 1 | VIEW | | 70000 | 3691K| 565 (1)| 00:00:07 |
|* 2 | WINDOW NOSORT STOPKEY | | 70000 | 2597K| 565 (1)| 00:00:07 |
| 3 | TABLE ACCESS BY INDEX ROWID| MY_OBJECTS | 70000 | 2597K| 565 (1)| 00:00:07 |
| 4 | INDEX FULL SCAN | SYS_C0011057 | 70000 | | 146 (0)| 00:00:02 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("NUM">=59991 AND "NUM"<=60000)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "T"."OBJECT_ID")<=60000)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
490 consistent gets
0 physical reads
0 redo size
1030 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10 rows processed
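The execution plans show that the rownum method uses a full table scan to get the first 60,000 rows of the table (the COUNT STOPKEY step ends the scan there) and then filters out the unneeded rows with the predicate "T"."NUM">=59991. The row_number() method does use the primary key index to avoid the sort that the analytic function's window would otherwise need (WINDOW NOSORT), but according to the plan it still fetches all 70,000 rows of the table, through an index full scan followed by table access by ROWID, before BETWEEN filters them. Much of its cost goes into reading data (490 consistent gets versus 399 for rownum), so in this example the rownum pseudocolumn method performs better, and in practice the rownum method performs better in most cases.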
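The WINDOW NOSORT step can be pictured as numbering rows on the fly, which only works because the index full scan already delivers rows in OBJECT_ID order. The following Python sketch is a simplified model with hypothetical names (row_number_nosort, between), not how Oracle implements the operation; like the outer BETWEEN filter described above, the between helper here reads the whole stream.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Row = Tuple[int, str]


def row_number_nosort(rows: Iterable[Row]) -> Iterator[Tuple[Row, int]]:
    """Assign ROW_NUMBER() OVER (ORDER BY key) while streaming.

    Only valid when rows already arrive in key order (key = row[0]), as they
    do from an index full scan on that key; nothing is buffered or sorted.
    """
    prev_key: Optional[int] = None
    for n, row in enumerate(rows, start=1):
        if prev_key is not None and row[0] < prev_key:
            raise ValueError("input not in key order; a sort would be required")
        prev_key = row[0]
        yield row, n


def between(numbered: Iterable[Tuple[Row, int]], lo: int, hi: int) -> List[Tuple[Row, int]]:
    # Outer filter: consumes the entire numbered stream, keeping lo..hi.
    return [(row, n) for row, n in numbered if lo <= n <= hi]


index_order = [(i, f"obj{i}") for i in range(1, 101)]  # as if read via the PK index
page = between(row_number_nosort(index_order), 91, 100)
print(page[0][1], page[-1][1])  # 91 100
```

If the input were not already ordered, the streaming numbering would be wrong, which is why the sketch raises an error instead of silently continuing; Oracle would need a real sort in that case.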
Some may ask: if the row_number() method spends so much on reading data, why not just let it do a full table scan? Let's look at that case:
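-- Disable the primary key (this also drops the unique index the constraint created, so the optimizer can no longer use it)
SQL> alter table my_objects disable primary key;
Table altered.
SQL> select * from
(select t.*,row_number() over (order by t.OBJECT_ID) as num
from my_objects t)
where num between 59991 and 60000;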
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2855691782
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 70000 | 3691K| | 812 (1)| 00:00:10 |
|* 1 | VIEW | | 70000 | 3691K| | 812 (1)| 00:00:10 |
|* 2 | WINDOW SORT PUSHED RANK| | 70000 | 2597K| 3304K| 812 (1)| 00:00:10 |
| 3 | TABLE ACCESS FULL | MY_OBJECTS | 70000 | 2597K| | 120 (1)| 00:00:02 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("NUM">=59991 AND "NUM"<=60000)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "T"."OBJECT_ID")<=60000)
Statistics
----------------------------------------------------------
190 recursive calls
0 db block gets
450 consistent gets
0 physical reads
0 redo size
1030 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
10 rows processed
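With a full table scan the plan switches to WINDOW SORT PUSHED RANK, which means CPU time now goes into sorting by object_id. Even though object_id happens to be stored in order in this example, the optimizer cannot rely on that without the index, and performance again falls short of the rownum approach.
So when writing application code, I still prefer the rownum approach for Oracle pagination. The usual form is shown below. Like the queries above, it has no ORDER BY, so it returns a predictable page here only because a full scan of this freshly created table happens to return rows in insertion order. Oracle guarantees no row order without ORDER BY, so when a specific order is needed, put the ORDER BY in an innermost subquery and apply rownum to that subquery's result: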
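The "PUSHED RANK" part indicates that the outer predicate on the row number (<= 60000 here) is pushed into the sort, so the sort only needs to retain the smallest 60,000 keys rather than fully order all 70,000 rows. A rough Python analogue of such a top-N sort uses heapq.nsmallest. This illustrates the idea under that assumption and is not Oracle's sort algorithm; pushed_rank_page is a made-up name.

```python
import heapq
import random
from typing import List, Tuple

Row = Tuple[int, str]


def pushed_rank_page(rows: List[Row], lo: int, hi: int) -> List[Tuple[Row, int]]:
    """Rough analogue of WINDOW SORT PUSHED RANK followed by NUM BETWEEN lo AND hi."""
    # The predicate NUM <= hi is "pushed" into the sort: keep only the hi
    # smallest keys (a bounded heap) instead of fully sorting every row.
    top = heapq.nsmallest(hi, rows, key=lambda r: r[0])
    # nsmallest returns them in ascending key order, so position = ROW_NUMBER().
    return [(row, n) for n, row in enumerate(top, start=1) if n >= lo]


rows = [(i, f"obj{i}") for i in range(1, 701)]
random.Random(42).shuffle(rows)  # unordered input, as from a full table scan
page = pushed_rank_page(rows, 591, 600)
print([row[0] for row, _ in page])  # [591, 592, 593, 594, 595, 596, 597, 598, 599, 600]
```

Even with the bound, every input row still has to be examined and compared, which is the extra CPU work the plan reflects.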
-- Return page 20, 10 rows per page
SQL> define pagenum=20
SQL> define pagerecord=10
SQL> select t.* from (select d.*,rownum num from my_objects d
2 where rownum<=&pagerecord*&pagenum) t
3 where t.num>=(&pagenum-1)*&pagerecord +1;
old 2: where rownum<=&pagerecord*&pagenum) t
new 2: where rownum<=10*20) t
old 3: where t.num>=(&pagenum-1)*&pagerecord +1
new 3: where t.num>=(20-1)*10 +1
OBJECT_ID OBJECT_NAME OBJECT_TYPE NUM
---------- ------------------------------ ------------------- ----------
191 SQLOBJ$DATA_PKEY INDEX 191
192 SQLOBJ$AUXDATA TABLE 192
193 I_SQLOBJ$AUXDATA_PKEY INDEX 193
194 I_SQLOBJ$AUXDATA_TASK INDEX 194
195 OBJECT_USAGE TABLE 195
196 I_STATS_OBJ# INDEX 196
197 PROCEDURE$ TABLE 197
198 PROCEDUREINFO$ TABLE 198
199 ARGUMENT$ TABLE 199
200 SOURCE$ TABLE 200
10 rows selected.
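The two bounds in the template follow from the page number and page size: the inner rownum limit is pagenum * pagerecord and the outer lower bound is (pagenum - 1) * pagerecord + 1, which gives rows 191-200 for page 20. The Python sketch below computes them and pairs them with a bind-variable version of the same query. The bind names :first_row and :last_row are my own choice, and using binds instead of &substitution variables is an assumption about how application code would issue the query (literal values would make every page a distinct SQL text to parse).

```python
from typing import Dict, Tuple

# Same shape as the SQL*Plus template, but with bind variables instead of
# &substitution variables (the bind names are chosen for this sketch).
PAGE_SQL = (
    "select t.* from (select d.*, rownum num from my_objects d "
    "where rownum <= :last_row) t "
    "where t.num >= :first_row"
)


def rownum_bounds(pagenum: int, pagerecord: int) -> Tuple[int, int]:
    """Return (first_row, last_row) for a 1-based page number."""
    if pagenum < 1 or pagerecord < 1:
        raise ValueError("pagenum and pagerecord must both be >= 1")
    last_row = pagenum * pagerecord             # inner: rownum <= last_row
    first_row = (pagenum - 1) * pagerecord + 1  # outer: t.num >= first_row
    return first_row, last_row


def page_binds(pagenum: int, pagerecord: int) -> Dict[str, int]:
    first_row, last_row = rownum_bounds(pagenum, pagerecord)
    return {"first_row": first_row, "last_row": last_row}


print(rownum_bounds(20, 10))  # (191, 200)
print(page_binds(6000, 10))   # {'first_row': 59991, 'last_row': 60000}
```

With a DB-API driver such as python-oracledb, the query could then be run as cursor.execute(PAGE_SQL, page_binds(20, 10)).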
Note:
To make the code easier to read, some people restrict the rows in the rownum method with BETWEEN instead, written like this:
select t.* from (select rownum num, d.* from my_objects d) t where t.num between 59991 and 60000;
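In their view this returns the same rows as the first rownum method, and since Oracle will push the BETWEEN predicate down into the subquery, performance is not affected either. That assumption is completely wrong. Let's look at the actual execution plan: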
SQL> select t.* from (select rownum num, d.* from my_objects d) t where t.num between 59991 and 60000;
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1665864874
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 70000 | 3691K| 120 (1)| 00:00:02 |
|* 1 | VIEW | | 70000 | 3691K| 120 (1)| 00:00:02 |
| 2 | COUNT | | | | | |
| 3 | TABLE ACCESS FULL| MY_OBJECTS | 70000 | 2597K| 120 (1)| 00:00:02 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("T"."NUM"<=60000 AND "T"."NUM">=59991)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
423 consistent gets
0 physical reads
0 redo size
1030 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10 rows processed
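It is plain to see that this query first performs a full table scan of all 70,000 rows instead of the expected 60,000 (note the plain COUNT step, without STOPKEY). The cause is again rownum: using rownum in a subquery disables predicate pushdown during the query transformation phase, so the query can only fetch all the data first and then apply BETWEEN to filter it. See also my article 【CBO-查詢轉換探究】 (CBO: exploring query transformation).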
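The difference between the two rownum forms can be modeled by counting how many rows a simulated full scan hands out. In this Python sketch, CountingScan, with_stopkey and between_outside are hypothetical names, and the scan is a plain counter rather than real table access. The first function stops pulling rows once the row number reaches the upper bound, like COUNT STOPKEY; the second numbers every row and filters afterwards, like the plain COUNT step in the plan above.

```python
from typing import Iterator, List, Tuple


class CountingScan:
    """Fake full table scan over OBJECT_ID 1..nrows that counts rows handed out."""

    def __init__(self, nrows: int) -> None:
        self.nrows = nrows
        self.rows_read = 0

    def __iter__(self) -> Iterator[int]:
        for object_id in range(1, self.nrows + 1):
            self.rows_read += 1
            yield object_id


def with_stopkey(scan: CountingScan, lo: int, hi: int) -> List[Tuple[int, int]]:
    # ROWNUM <= hi inside the subquery: stop pulling rows once NUM reaches hi.
    page = []
    for num, object_id in enumerate(scan, start=1):
        if num >= lo:
            page.append((object_id, num))
        if num == hi:
            break
    return page


def between_outside(scan: CountingScan, lo: int, hi: int) -> List[Tuple[int, int]]:
    # No bound inside the subquery: number every row, then filter with BETWEEN.
    return [(oid, num) for num, oid in enumerate(scan, start=1) if lo <= num <= hi]


scan_a, scan_b = CountingScan(70000), CountingScan(70000)
page_a = with_stopkey(scan_a, 59991, 60000)
page_b = between_outside(scan_b, 59991, 60000)
print(page_a == page_b, scan_a.rows_read, scan_b.rows_read)  # True 60000 70000
```

Both forms return the same ten rows; only the amount of data read differs.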
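All of this really boils down to three SQL statements for Oracle pagination. For paging through very large data sets, these alone will not be efficient, and other techniques are needed as well, such as denormalized design, precomputation, or a suitable caching mechanism in the application layer.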
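As one illustration of the application-layer caching mentioned above, the Python sketch below memoizes pages with functools.lru_cache. fetch_page_from_db is a hypothetical stand-in for running the rownum paging query, and the cache assumes the underlying data changes rarely: it has no invalidation beyond calling get_page.cache_clear().

```python
from functools import lru_cache
from typing import List, Tuple

Row = Tuple[int, str]
db_calls = 0  # number of simulated round trips to the database


def fetch_page_from_db(pagenum: int, pagerecord: int) -> List[Row]:
    """Hypothetical stand-in for executing the rownum paging query."""
    global db_calls
    db_calls += 1
    first_row = (pagenum - 1) * pagerecord + 1
    last_row = pagenum * pagerecord
    return [(oid, f"obj{oid}") for oid in range(first_row, last_row + 1)]


@lru_cache(maxsize=128)
def get_page(pagenum: int, pagerecord: int) -> Tuple[Row, ...]:
    # Cache an immutable tuple so callers cannot modify a shared cached page.
    return tuple(fetch_page_from_db(pagenum, pagerecord))


get_page(20, 10)
get_page(20, 10)  # served from the cache, no second "query"
get_page(21, 10)
print(db_calls)   # 2
```

A bounded LRU keeps memory predictable for deep page numbers; a real deployment would also need a policy for expiring pages when the table changes.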