0. Oracle測試數據准備:
[oracle@db1 ~]$ sqlplus system/000000
SQL*Plus: Release 11.2.0.3.0 Production on Tue Mar 25 10:26:06 2014
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> drop table test ;
Table dropped.
SQL> create table test(id int,name varchar2(20),age int,msg varchar2(20));
Table created.
SQL> insert into test values(1,'aaaaa',1,'aaaaa');
1 row created.
SQL> insert into test values(2,'bbbbb',2,'bbbbb');
1 row created.
SQL> insert into test values(3,'ccccc',3,'ccccc');
1 row created.
SQL> insert into test values(4,'ddddd',4,'ddddd');
1 row created.
SQL> insert into test values(5,'eeeee',5,'eeeee');
1 row created.
SQL>
SQL> commit;
Commit complete.
SQL> select count(*) from test;
COUNT(*)
----------
5
SQL> INSERT INTO TEST SELECT * FROM TEST;
5 rows created.
SQL> commit;
Commit complete.
SQL> select count(*) from test;
COUNT(*)
----------
100000
SQL> select segment_name,bytes/1024/1024 from dba_segments where segment_name='TEST';
SEGMENT_NAME
--------------------------------------------------------------------------------
BYTES/1024/1024
---------------
TEST
3
模擬插入10W的數據。
1. 使用落地文件外部表加載測試
首先在oracle服務器端建數據導出,並開啟gpfdist服務進程:
[oracle@db1 ~]$ gtlions.ora2text.bin user=system/000000 query='select * from test' text=csv file=test.sql fast=true
0 rows exported at 2014-03-25 10:30:19, size 0 MB.
100000 rows exported at 2014-03-25 10:30:20, size 2 MB.
output file test.sql closed at 100000 rows, size 2 MB.
[oracle@db1 ~]$ nohup gpfdist -d . -p 9999 &
[1] 15147
[oracle@db1 ~]$ nohup: 忽略輸入並把輸出追加到"nohup.out"
[1]+ Exit 1 nohup gpfdist -d . -p 9999
[oracle@db1 ~]$ ps -ef | grep gpfdist
oracle 15149 15068 0 10:30 pts/8 00:00:00 grep gpfdist
oracle 62994 62778 0 Mar24 ? 00:00:04 gpfdist -d . -p 9999
導出階段耗時1S;
接下來在gp創建相關外部表並加載數據入庫:
[gpadmin@bdb ~]$ psql postgres
Timing is on.
psql (8.2.15)
Type "help" for help.
postgres=# \timing on
Timing is on.
postgres=# drop table if exists gt_test;
create table gt_test(id int,name character varying(20),age int,msg character varying(20)) distributed randomly;
DROP TABLE
Time: 17.802 ms
postgres=# create table gt_test(id int,name character varying(20),age int,msg character varying(20)) distributed randomly;
drop external table if exists gt_test_ext;
CREATE TABLE
Time: 17.386 ms
postgres=# drop external table if exists gt_test_ext;
create external table gt_test_ext(like gt_test) location ('gpfdist://192.168.1.2:9999/test.sql') format 'csv' (header);
DROP EXTERNAL TABLE
Time: 6.734 ms
postgres=# create external table gt_test_ext(like gt_test) location ('gpfdist://192.168.1.2:9999/test.sql') format 'csv' (header);
NOTICE: HEADER means that each one of the data files has a header row.
CREATE EXTERNAL TABLE
Time: 13.562 ms
postgres=# insert into gt_test select * from gt_test_ext;
select count(*) from gt_test;
INSERT 0 100000
Time: 469.955 ms
postgres=# select count(*) from gt_test;
count
--------
100000
(1 row)
Time: 9.455 ms
insert階段耗時0.174S;
一共耗時1S+0.469S=1.469S;
2. 不落地的外部表加載
在gp創建相關外部表並加載數據入庫:
postgres=# \timing on
Timing is on.
postgres=# drop table if exists gt_test;
create table gt_test(id int,name character varying(20),age int,msg character varying(20)) distributed randomly;
DROP TABLE
Time: 17.969 ms
postgres=# create table gt_test(id int,name character varying(20),age int,msg character varying(20)) distributed randomly;
drop external table if exists gt_test_webext;
CREATE TABLE
Time: 16.960 ms
postgres=# drop external table if exists gt_test_webext;
DROP EXTERNAL TABLE
Time: 6.944 ms
postgres=# create external web table gt_test_webext(like gt_test) execute 'sh /home/gtlions/oracle.sh' on master format 'text' (delimiter ',');
CREATE EXTERNAL TABLE
Time: 9.508 ms
postgres=# insert into gt_test select * from gt_test_webext;
select count(*) from gt_test;
INSERT 0 100000
Time: 4253.003 ms
postgres=# select count(*) from gt_test;
count
--------
100000
(1 row)
Time: 9.867 ms
insert階段耗時4.523S;
3. 加大數據量進行測試
50W 100W 200W 400W 800W 1600W 2000W
經過逐步測試,在50W記錄以內,兩者的差別不大,在2~3倍左右;
但是一旦記錄增多,兩者的差距逐步放大:
50W記錄差距7倍;
100W記錄差距12.9倍;
200W記錄差距12.9倍;
500W記錄差距13.5倍;
1000W記錄差距15.2倍;
。。。。。。
不過在從100W到1億過程中,時間差距並不是線性增長的。
3. 總結
對於小表來說,使用不落地的方式尚可接受;而對於超過25MB的大小的表速度上差距太大了。
-EOF-