在RAC環境下配置OGG,要想實現RAC節點故障時,OGG能自動的failover到正常節點,要保證兩點:
1. OGG的checkpoint,trail,BR文件放置在共享的集群文件系統上,RAC各節點都能訪問到
2. 需要有集群軟件的來監測OGG進程,以及發生故障時,自動在正常節點重啟OGG(failover)
Oracle Grid Infrastructure Standalone Agents (XAG)搭配Oracle支持的集群文件系統,可以實現OGG的自動failover,本文介紹相關的配置步驟。
要想使用XAG實現自動failover,相關軟件的版本必須滿足要求:
至於集群文件系統,Oracle官方文檔給出的建議是ACFS,DBFS和OCFS,我覺得其他集群文件系統,比如Veritas 的集群文件系統應該也可以。
本文示例使用的是ACFS。
源端數據庫:11.2.0.4 RAC (ASM)
目標端數據庫:12.1.0.2 RAC(ASM)
GoldenGate : 12.2.0.1.1
操作系統:源端和目標端都是Oracle Enterprise Linux 6.5 (64bit)
XAG需要單獨去Oracle官網下載安裝 ,下載位置是:http://www.oracle.com/technetwork/database/database-technologies/clusterware/downloads/index.html
目前的版本是7,文件是xagpack_7b.zip
解壓縮文件,然後用GI的安裝用戶(一般是“grid”),執行xagsetup.sh進行安裝:
[grid@rac1 xag]$ ./xagsetup.sh --install --directory /u01/app/grid/xaghome --all_nodes
Installing Oracle Grid Infrastructure Agents on: rac1
Installing Oracle Grid Infrastructure Agents on: rac2
Done.
在目標端也安裝XAG,方法和源端相同。
11.2.0.4 在OEL上如果想用ACFS,必須安裝PSU補丁到11.2.0.4.4以上。補丁過程略過。
使用ACFS的磁盤組的屬性值COMPATIBLE.ASM和COMPATIBLE.ADVM必須設置為11.2 :
使用ASMCMD或ASMCA創建ACFS卷:
創建通用ACFS
此時ACFS還不是CRS管理的,可以使用ASMCMD的volinfo命令或/sbin/acfsutil registry查看ACFS信息
ASMCMD> volinfo -a
Diskgroup Name: DATA
Volume Name: VOLOGG1
Volume Device: /dev/asm/vologg1-426
State: ENABLED
Size (MB): 3072
Resize Unit (MB): 32
Redundancy: UNPROT
Stripe Columns: 4
Stripe Width (K): 128
Usage: ACFS
Mountpath: /u01/app/grid/acfsmounts/data_vol1
[root@rac1 ~]# /sbin/acfsutil registry
Mount Object:
Device: /dev/asm/vologg1-426
Mount Point: /u01/app/grid/acfsmounts/data_vol1
Disk Group: DATA
Volume: VOLOGG1
Options: none
Nodes: all
首先從通用ACFS的注冊信息中刪除我們剛才創建的ACFS的條目
[root@rac1 ~]# /sbin/acfsutil registry -d /u01/app/grid/acfsmounts/data_vol1
acfsutil registry: successfully removed ACFS mount point /u01/app/grid/acfsmounts/data_vol1 from Oracle Registry
然後,用SRVCTL工具進行CRS資源注冊:
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/srvctl add filesystem -d /dev/asm/vologg1-426 -v VOLOGG1 -g DATA -m /u01/app/grid/acfsmounts/data_vol1 -u grid
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.LISTENER.lsnr
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.asm
ONLINE ONLINE rac1 Started
ONLINE ONLINE rac2 Started
ora.data.vologg1.acfs
OFFLINE OFFLINE rac1
OFFLINE OFFLINE rac2
ora.gsd
OFFLINE OFFLINE rac1
OFFLINE OFFLINE rac2
ora.net1.network
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.ons
ONLINE ONLINE rac1
ONLINE ONLINE rac2
--------------------------------------------------------------------------------
手工啟動資源,(mount ACFS)
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/srvctl start filesystem -d /dev/asm/vologg1-426
[root@rac1 ~]#
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.LISTENER.lsnr
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.asm
ONLINE ONLINE rac1 Started
ONLINE ONLINE rac2 Started
ora.data.vologg1.acfs
ONLINE ONLINE rac1 mounted on /u01/app /grid/acfsmounts/dat a_vol1
ONLINE ONLINE rac2 mounted on /u01/app/grid/acfsmounts/dat a_vol1
[root@rac1 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_rac1-lv_root 45G 32G 12G 74% /
tmpfs 2.0G 437M 1.6G 23% /dev/shm
/dev/sda1 477M 55M 397M 13% /boot
/dev/asm/vologg1-426 3.0G 83M 3.0G 3% /u01/app/grid/acfsmounts/data_vol1
[root@rac2 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_rac1-lv_root 45G 25G 19G 58% /
tmpfs 2.0G 440M 1.6G 23% /dev/shm
/dev/sda1 477M 55M 397M 13% /boot
/dev/asm/vologg1-426 3.0G 83M 3.0G 3% /u01/app/grid/acfsmounts/data_vol1
12c創建ACFS和11g的主要區別是,沒有了通用和數據庫home用文件系統的選項,創建後會生成注冊文件系統到CRS的腳本。
運行系統生成的腳本,完成注冊及掛載:
[root@oel65vm11 scripts]# ./acfs_script.sh
ACFS file system /u01/app/grid/acfsmounts/ogg_vol1 is mounted on nodes oel65vm11,oel65vm12
查看資源信息:
[root@oel65vm11 bin]# ./crsctl status resource -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.VOLOGG2.advm
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
ora.DATA.dg
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
ora.LISTENER.lsnr
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
ora.asm
ONLINE ONLINE oel65vm11 Started,STABLE
ONLINE ONLINE oel65vm12 Started,STABLE
ora.data.vologg2.acfs
ONLINE ONLINE oel65vm11 mounted on /u01/app/grid/acfsmounts/ogg_vol1,STABLE
ONLINE ONLINE oel65vm12 mounted on /u01/app/grid/acfsmounts/ogg_vol1,STABLE
ora.net1.network
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
ora.ons
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
注意,所有節點必須關掉SELINUX,否則會出現ACFS無權寫入的錯誤。
這個版本的ogg同時支持11g和12c的數據庫,在圖形界面安裝時,用戶可以選擇對應不同數據庫版本的ogg
將OGG安裝在前面創建的ACFS上:
源端的安裝位置:/u01/app/grid/acfsmounts/data_vol1/ogg12
目標端的安裝位置:/u01/app/grid/acfsmounts/ogg_vol1/ogg12
選擇自動啟動Manager進程。
l 變更源端數據庫為歸檔模式,過程省略。
l 源端數據庫增加相關日志及修改參數:
SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
Database altered.
SQL> ALTER DATABASE FORCE LOGGING;
Database altered.
SQL> SELECT supplemental_log_data_min, force_logging FROM v$database;
SUPPLEME FORCE_LOGGING
-------- ---------------------------------------
YES YES
SQL> ALTER SYSTEM SWITCH LOGFILE;
System altered.
SQL> alter system set ENABLE_GOLDENGATE_REPLICATION=true;
System altered.
l 在源端和目標端創建OGG數據庫用戶及授權,我的例子裡創建的用戶是GGADM。
OGG用戶需要的權限可以參閱聯機文檔《Installing and Configuring Oracle GoldenGate for Oracle Database 12c (12.2.0.1)》中的章節 4.1.4.1 Oracle 11.2.0.4 or Later Database Privileges,我們這個測試為了方便,授予用戶DBA角色,以及使用特定系統包的授權:
SQL> BEGIN
dbms_goldengate_auth.grant_admin_privilege
2 3 (
grantee => 'GGADM',
privilege_type => 'CAPTURE',
grant_select_privileges => TRUE
);
END;
/ 4 5 6 7 8 9
PL/SQL procedure successfully completed.
l 登錄數據庫:
GGSCI (rac1.hthorizontest.com) 1> dblogin userid ggadm password ggadm
Successfully logged into database.
l 注冊集成式抽取
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 2> register extract ext1 database;
2016-04-07 23:44:38 INFO OGG-02003 Extract EXT1 successfully registered with database at SCN 1291634.
l 增加抽取進程
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 3> ADD EXTRACT ext1 INTEGRATED TRANLOG, BEGIN NOW
EXTRACT (Integrated) added.
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 4> ADD EXTTRAIL /u01/app/grid/acfsmounts/data_vol1/ogg12/dirdat/et, EXTRACT ext1
EXTTRAIL added.
l 增加傳送進程
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 5> ADD EXTRACT pump1 EXTTRAILSOURCE /u01/app/grid/acfsmounts/data_vol1/ogg12/dirdat/et
EXTRACT added.
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 6>EDIT PARAMS EXT1
加入下面內容:
EXTRACT ext1
USERID ggadm, PASSWORD ggadm
TRANLOGOPTIONS INTEGRATED PARAMS (MAX_SGA_SIZE 100)
EXTTRAIL /u01/app/grid/acfsmounts/data_vol1/ogg12/dirdat/et
TABLE test.*;
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 7>EDIT PARAMS PUMP1
加入下面內容:
EXTRACT pump1
USERID ggadm, PASSWORD ggadm
RMTHOST 192.168.0.11, MGRPORT 7809
RMTTRAIL /u01/app/grid/acfsmounts/ogg_vol1/ogg12/dirdat/rt
TABLE TEST.*;
然後啟動所有進程。
在11.2.0.4版本,如果實現集成的capture模式,在啟動抽取進程時,會提示需要安裝補丁17030189,主要是因為使用集成的capture,需要修改數據字典表。
但是在安裝了PSU後,有時會導致這個補丁和其他補丁沖突,也可以手工執行prvtlmpg.plb來解決問題。
(EXTRACT Abending With OGG-02912 (Doc ID 2091679.1))
GGSCI (oel65vm11.hthorizon.com) 8> dblogin userid ggadm password ggadm
Successfully logged into database.
GGSCI (oel65vm11.hthorizon.com as ggadm@racdb1) 9>ADD CHECKPOINTTABLE ggadm.checkpointtab
Successfully created checkpoint table ggadm.checkpointtab
GGSCI (oel65vm11.hthorizon.com as ggadm@racdb1) 10> ADD REPLICAT rep1, EXTTRAIL /u01/app/grid/acfsmounts/ogg_vol1/ogg12/dirdat/rt checkpointtable ggadm.checkpointtab
REPLICAT added.
GGSCI (oel65vm11.hthorizon.com as ggadm@racdb1) 11>EDIT PARAMS REP1
加入下面內容:
REPLICAT rep1
USERID ggadm, PASSWORD ggadm
ASSUMETARGETDEFS
DISCARDFILE /u01/app/grid/acfsmounts/ogg_vol1/ogg12/dirdat/rt, PURGE
MAP TEST.* TARGET TEST.*;
然後啟動進程,測試OGG數據復制是否正常
為了讓OGG的Manager進程能夠自動啟動復制進程,需要將下列配置加進Manager的配置文件:
AUTORESTART ER *, RETRIES 5, WAITMINUTES 1, RESETMINUTES 60
AUTOSTART ER *
重啟Manager進程使之生效。
源端和目標端都要修改。
l 添加APP VIP(以root身份)
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/appvipcfg create -network=1 -ip=192.168.0.36 -vipname=xag.gg_1-vip.vip -user=oracle
l 允許grid用戶啟動資源(以root身份)
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl setperm resource xag.gg_1-vip.vip -u user:grid:r-x
l 啟動VIP(以grid身份)
[root@rac1 ~]# su - grid
[grid@rac1 ~]$ /u01/app/11.2.0/grid/bin/crsctl start resource xag.gg_1-vip.vip
CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'rac1'
CRS-2676: Start of 'xag.gg_1-vip.vip' on 'rac1' succeeded
l 查看狀態
[grid@rac1 ~]$ crsctl status resource xag.gg_1-vip.vip
NAME=xag.gg_1-vip.vip
TYPE=app.appvip_net1.type
TARGET=ONLINE
STATE=ONLINE on rac1
l 創建OGG對應的CRS資源(以root身份)
[root@rac1 bin]# /u01/app/grid/xaghome/bin/agctl add goldengate gg_1 --gg_home /u01/app/grid/acfsmounts/data_vol1/ogg12 --instance_type source --nodes rac1,rac2 --vip_name xag.gg_1-vip.vip --filesystems ora.data.vologg1.acfs --databases ora.tdb.db --oracle_home /u01/app/oracle/product/11.2.0/dbhome_1 --monitor_extracts ext1,pump1
[root@rac1 ~]# cd /u01/app/grid/xaghome/bin
[root@rac1 bin]# ./agctl status goldengate gg_1
Goldengate instance 'gg_1' is not running
l 授權grid啟動資源
上面的命令執行完畢,會自動創建一個對應ogg的CRS資源,需要授權grid有權管理它:
[root@oel65vm11 bin]# /u01/app/11.2.0/grid/bin/crsctl setperm resource xag.gg_1.goldengate -u user:grid:r-x
過程和源端類似,
l 創建VIP資源:
[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/appvipcfg create -network=1 -ip=192.168.0.26 -vipname=xag.gg_1-vip.vip -user=oracle
[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/crsctl setperm resource xag.gg_1-vip.vip -u user:grid:r-x
[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/crsctl start resource xag.gg_1-vip.vip
CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'oel65vm12'
CRS-2676: Start of 'xag.gg_1-vip.vip' on 'oel65vm12' succeeded
[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/crsctl relocate resource xag.gg_1-vip.vip -n oel65vm11
CRS-2673: Attempting to stop 'xag.gg_1-vip.vip' on 'oel65vm12'
CRS-2677: Stop of 'xag.gg_1-vip.vip' on 'oel65vm12' succeeded
CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'oel65vm11'
CRS-2676: Start of 'xag.gg_1-vip.vip' on 'oel65vm11' succeeded
l 創建ogg 對應的CRS資源
[root@oel65vm11 bin]# /u01/app/grid/xaghome/bin/agctl add goldengate gg_2 --gg_home /u01/app/grid/acfsmounts/ogg_vol1/ogg12 --instance_type target --nodes oel65vm11,oel65vm12 --vip_name xag.gg_1-vip.vip --filesystems ora.data.vologg2.acfs --databases ora.racdb.db --oracle_home /u01/app/oracle/product/12.1.0/dbhome_1 --monitor_replicats rep1
l 授權
[root@oel65vm11 bin]# /u01/app/12.1.0/grid/bin/crsctl setperm resource xag.gg_2.goldengate -u user:grid:r-x
將PUMP進程對應的源端地址修改為我們剛才創建的VIP
RMTHOST 192.168.0.26, MGRPORT 7809
重啟PUMP進程
進入ggsci命令行,將源端和目標段進程都停掉
l 啟動目標端資源
[grid@oel65vm11 ~]$ cd $ORACLE_BASE
[grid@oel65vm11 grid]$ cd xaghome/bin
[grid@oel65vm11 bin]$ ./agctl start goldengate gg_2 --node oel65vm11
[grid@oel65vm11 bin]$ crsctl status resource xag.gg_2.goldengate
NAME=xag.gg_2.goldengate
TYPE=xag.goldengate.type
TARGET=ONLINE
STATE=ONLINE on oel65vm11
l 啟動源端資源
[grid@rac1 bin]$ cd $ORACLE_BASE
[grid@rac1 grid]$ cd xaghome/bin
[grid@rac1 bin]$ ./agctl start goldengate gg_1 --node rac1
[grid@rac1 bin]$ crsctl status resource xag.gg_1.goldengate
NAME=xag.gg_1.goldengate
TYPE=xag.goldengate.type
TARGET=ONLINE
STATE=ONLINE on rac1
啟動後,進入GGSCI命令行,查看進程狀態,如果進程都自動啟動了,說明配置沒有問題。
使用命令測試源端切換:
[grid@rac1 bin]$ ./agctl relocate goldengate gg_1 --node rac2
[grid@rac1 bin]$ crsctl status resource –t
。。。。。。
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
。。。。。。
xag.gg_1-vip.vip
1 ONLINE ONLINE rac2
xag.gg_1.goldengate
1 ONLINE ONLINE rac2
再做一個切斷電源的測試,我們以“關掉電源”的方式關閉目標端的主機oel65vm11
在主機oel65vm12上,可以看到RAC的vip failover到了本節點,ogg的vip和gg_2對應的資源也自動failover到了本節點:
[grid@oel65vm12 ~]$ crsctl status resource -t
。。。。。。
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
。。。。。。
ora.oel65vm11.vip
1 ONLINE INTERMEDIATE oel65vm12 FAILED OVER,STABLE
ora.oel65vm12.vip
1 ONLINE ONLINE oel65vm12 STABLE
ora.racdb.db
1 ONLINE OFFLINE STABLE
2 ONLINE ONLINE oel65vm12 Open,STABLE
ora.scan1.vip
1 ONLINE ONLINE oel65vm12 STABLE
xag.gg_1-vip.vip
1 ONLINE ONLINE oel65vm12 STABLE
xag.gg_2.goldengate
1 ONLINE ONLINE oel65vm12 STABLE
上面只是一個最簡單的例子,沒有考慮各種復雜的情況,例如,同時部署有監控jagent,或者downstream復制等等,所以現實的生產環境往往比這個例子復雜得多。