使用外部表可以很容易的實現網站的訪問日志分析。
雖然使用Awstats等工具也可以實現,可是使用Oracle來分析我們更應該得心應手。
而且這一切還是有那麼一點點Cool的。
好了,閒言少敘,讓我們來看一下我分析的過程。
首先創建路徑指向日志存放目錄:
[Oracle@jumper elog]$ pwd
/opt/Oracle/elog
[Oracle@jumper elog]$ ls
eygle_Access_log.20061016
[Oracle@jumper elog]$ sqlplus "/ as sysdba"
SQL*Plus: Release 9.2.0.4.0 - Production on Wed Oct 18 08:59:35 2006
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning option
JServer Release 9.2.0.4.0 - Production
SQL> create or replace directory elog
2 as '/opt/Oracle/elog';
Directory created.
然後我將這個路徑的訪問權限授予eygle用戶來進行具體操作:
SQL> grant read,write on directory elog to eygle;
Grant succeeded.
選擇合適的分隔符創建外部表:
SQL> connect eygle/eygle
Connected.
SQL> create table eygle_Access_log_20061016
2 ( ip_address_date varchar2(100),
3 acc_file varchar2(400),
4 acc_cdsz varchar2(20),
5 acc_url varchar2(400),
6 left_blank varchar2(10),
7 acc_agent varchar2(400))
8 organization external (
9 type Oracle_loader
10 default directory ELOG
11 Access parameters (
12 records delimited by newline
13 nobadfile
14 nodiscardfile
15 nologfile
16 fIElds terminated by '"'
17 missing fIEld values are null
18 )
19 location('eygle_Access_log.20061016')
20 ) reject limit unlimited
21 /
Table created.
此時我們就可以對eygle.com的2006年10月16日的訪問日志進行分析了。
我們可以先看一下各個字段的分界結果,示例如下:
SQL> select ip_address_date from eygle_Access_log_20061016
2 where rownum <11;
IP_ADDRESS_DATE
-------------------------------------------------------------
38.102.128.140 - - [16/Oct/2006:00:00:17 +0800]
66.249.65.113 - - [16/Oct/2006:00:00:19 +0800]
202.160.178.221 - - [16/Oct/2006:00:00:35 +0800]
59.36.78.100 - - [16/Oct/2006:00:00:37 +0800]
59.36.78.100 - - [16/Oct/2006:00:00:38 +0800]
72.30.61.8 - - [16/Oct/2006:00:00:38 +0800]
221.217.84.230 - - [16/Oct/2006:00:00:42 +0800]
221.217.84.230 - - [16/Oct/2006:00:00:42 +0800]
74.6.65.236 - - [16/Oct/2006:00:01:07 +0800]
74.6.73.36 - - [16/Oct/2006:00:01:09 +0800]
10 rows selected.
通過SQL析取出訪問的ip地址:
SQL> select substr(ip_address_date,1,instr(ip_address_date,' ')) ip_address
2 from eygle_Access_log_20061016 where rownum <11;
IP_ADDRESS
---------------------------------------------------------------------------
38.102.128.140
66.249.65.113
202.160.178.221
59.36.78.100
59.36.78.100
72.30.61.8
221.217.84.230
221.217.84.230
74.6.65.236
74.6.73.36
10 rows selected.
接下來我們就可以很容易的獲得當日訪問我站點的獨立IP數量了:
SQL> set timing on
SQL> select count(distinct(substr(ip_address_date,1,instr(ip_address_date,' ')))) uip
2 from eygle_Access_log_20061016;
UIP
----------
7534
Elapsed: 00:00:06.86
因為外部表的處理性能上要差一些,我們記錄了一下時間,以上查詢大約用了7秒的時間。
我們可以對比一下數據庫表的性能。
首先將日志加載到數據庫表中:
SQL> create table ealog as
2 select * from eygle_Access_log_20061016;
Table created.
SQL> desc ealog;
Name Null? Type
----------------------------------------------------------------- -------- --------------------------------------------
IP_ADDRESS_DATE VARCHAR2(100)
ACC_FILE VARCHAR2(400)
ACC_CDSZ VARCHAR2(20)
ACC_URL VARCHAR2(400)
LEFT_BLANK VARCHAR2(10)
ACC_AGENT VARCHAR2(400)
SQL> select count(*) from ealog;
COUNT(*)
----------
165443
然後我們強制刷新Buffer Cache,消除Cache的影響,再次執行查詢:
SQL>alter session set events = 'immediate trace name flush_cache';
Session altered.
Elapsed: 00:00:00.03
SQL> select count(distinct(substr(ip_address_date,1,instr(ip_address_date,' ')))) uip
2 from ealog;
UIP
----------
7528
Elapsed: 00:00:02.15
此時用了大約2秒的時間,也就是說,外部表的性能較數據庫表大約慢了3倍左右。
繼續,我們可以查詢當日網站中,哪些網頁是被最頻繁訪問的:
SQL> select replace((replace(acc_file,'GET ','www.eygle.com')),'HTTP/1.1') accfile,ct from (
2 select ACC_FILE,count(*) ct from eygle_Access_log_20061016
3 where acc_file like '%htm%'
4 group by acc_file order by ct desc)
5 where rownum <21;
ACCFILE CT
-------------------------------------------------------------------------------- ----------
www.eygle.com/index-tech.htm 110
www.eygle.com/archives/2006/10/wish_home.Html 103
www.eygle.com/index-ha.htm 79
www.eygle.com/me/fairy_tale_leaf.htm 77
www.eygle.com/archives/2006/11/use_Oracle_external_table.Html 73
www.eygle.com/index-sql.htm 69
www.eygle.com/archives/2006/10/tom_Oracle_9i10g.Html 68
www.eygle.com/archives/2008/08/my_book_services.Html 63
www.eygle.com/archives/2006/11/welcome_frIEnd.Html 62
www.eygle.com/archives/2006/10/veritas_vcs_simulator.Html 61
www.eygle.com/index-case.htm 60
www.eygle.com/archives/2004/08/aoaouiiciona.Html 59
www.eygle.com/archives/2006/08/Oracle_fundbook_recommand.Html 52
www.eygle.com/archives/2006/08/5460_8174.Html 49
www.eygle.com/archives/2004/12/gmailaeaeoa.Html 48
www.eygle.com/archives/2005/06/howlsmovingcast.Html 48
www.eygle.com/gbook/index.Html 48
www.eygle.com/index-hist.htm 44
www.eygle.com/index-special.htm 41
www.eygle.com/index-f&l.htm 37
20 rows selected.
Elapsed: 00:00:06.31
SQL>
通過外部表及SQL查詢,只要日志文件中存在的信息,都可以很容易的被獲取和分析.
-The End-