程式師世界 >> 數據庫知識 >> SyBase數據庫 >> SyBase教程 >> Hive索引

Hive索引

編輯：SyBase教程

Hive索引

1、 Hive索引概述

Hive的索引目的是提高Hive表指定列的查詢速度。

沒有索引時，類似'WHERE tab1.col1 = 10' 的查詢，Hive會加載整張表或分區，然後處理所有的rows，但是如果在字段col1上面存在索引時，那麼只會加載和處理文件的一部分。

與其他傳統數據庫一樣，增加索引在提升查詢速度時，會消耗額外資源去創建索引和需要更多的磁盤空間存儲索引。

Hive 0.7.0版本中，加入了索引。Hive 0.8.0版本中增加了bitmap索引。

2、索引相關的配置參數

hive.index.compact.file.ignore.hdfs

Default Value: false

Added In: Hive 0.7.0 withHIVE-1889

在索引文件中存儲的hdfs地址將在運行時被忽略，如果開啟的話；如果數據被遷移，那麼索引文件依然可用，默認是false

hive.optimize.index.filter

Default Value: false

Added In: Hive 0.8.0 withHIVE-1644

是否自動使用索引, 默認是false

hive.optimize.index.filter.compact.minsize

Default Value: 5368709120

Added In: Hive 0.8.0 withHIVE-1644

壓縮索引自動應用的最小輸入大小

hive.optimize.index.filter.compact.maxsize

Default Value: -1

Added In: Hive 0.8.0 withHIVE-1644

壓縮索引自動應用的最大輸入大小,負值代表正無窮

hive.index.compact.query.max.size

Default Value: 10737418240

Added In: Hive 0.8.0 withHIVE-2096

一個使用壓縮索引做的查詢能取到的最大數據量，默認是10737418240 個byte；負值代表無窮大；

hive.index.compact.query.max.entries

Default Value: 10000000

Added In: Hive 0.8.0 withHIVE-2096

使用壓縮索引查詢時能讀到的最大索引項數，默認是10000000；負值代表無窮大；

hive.exec.concatenate.check.index

Default Value: true

Added In: Hive 0.8.0 withHIVE-2125

如果設置為true，那麼在做ALTER TABLE tbl_name CONCATENATE on a table/partition（有索引）操作時，拋出錯誤；可以幫助用戶避免index的刪除和重建；

hive.optimize.index.groupby

Default Value: false

Added In: Hive 0.8.1 withHIVE-1694

hive.index.compact.binary.search

Default Value: true

Added In: Hive 0.8.1with HIVE-2535

在索引表中是否開啟二分搜索進行索引項查詢，默認是true；

3、索引示例

注意：在Hive 0.12.0以及之前版本中，索引名稱在create index和drop index語句中是大小寫敏感的。然而，alter index 需要一個小寫的索引名字。

此bug在Hive 0.13.0解決，此版本開始使索引名字大小寫不敏感。

對於Hive 0.13.0之前的版本，最好使用小寫的索引名字。

下面介紹索引的常見用法：

A、 Create/build，show和drop index

create index table01_index ontable table01(column2) as 'COMPACT' with deferred rebuild;

show index on table01;

drop index table01_index ontable01;

B、 Create then build，show formatted和drop index

create index table02_index ontable table02(column3) as 'compact' with deferred rebuild;

alter index table02_index ontable02 rebuild;

show formatted index ontable02;

drop index table02_index ontable02;

C、創建bitmap索引，build,show 和drop

createindex table03_index on table table03 (column4) as 'bitmap' with deferred rebuild;

alter index table03_index ontable03 rebuild;

show formatted index ontable03;

drop index table03_index on table03;

D、在一張新表上創建索引

createindex table04_index on table table04 (column5) as 'compact'with deferred rebuild in tabletable04_index_table;

E、創建索引，存儲格式為RCFile

create index table05_index ontable table05 (column6) as 'compact' with deferred rebuildstored as rcfile;

F、創建索引，存儲格式為TextFile

create index table06_index ontable table06 (column7) as 'compact' with deferredrebuild row format delimited fields terminated by '\t' stored as textfile;

G、創建帶有索引屬性的索引

create index table07_index ontable table07 (column8) as 'compact' with deferred rebuild idxproperties("prop1"="value1", "prop2"="value2");

H、創建帶有表屬性的索引

create index table08_index ontable table08 (column9) as 'compact' withdeferred rebuild tblproperties("prop3"="value3", "prop4"="value4");

I、如果索引存在，則刪除

drop index if exists table09_indexon table09;

J、在分區上重建索引

alter index table10_index on table10partition (columnx='valueq', columny='valuer') rebuild;

4、索引測試

(1) 查詢表中行數

hive (hive)> select count(1)from userbook;

4409365

(2) 表中未創建索引前查詢

hive (hive)> select * fromuserbook where book_id = '15999998838';

Query ID =hadoop_20150627165551_595da79a-0e27-453b-9142-7734912934c4

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is setto 0 since there's no reduce operator

Starting Job =job_1435392961740_0012, Tracking URL =http://gpmaster:8088/proxy/application_1435392961740_0012/

Kill Command =/home/hadoop/hadoop-2.6.0/bin/hadoop job -kill job_1435392961740_0012

Hadoop job information forStage-1: number of mappers: 2; number of reducers: 0

2015-06-27 16:56:04,666 Stage-1map = 0%, reduce = 0%

2015-06-27 16:56:28,974 Stage-1map = 50%, reduce = 0%, Cumulative CPU4.36 sec

2015-06-27 16:56:31,123 Stage-1map = 78%, reduce = 0%, Cumulative CPU6.21 sec

2015-06-27 16:56:34,698 Stage-1map = 100%, reduce = 0%, Cumulative CPU7.37 sec

MapReduce Total cumulative CPUtime: 7 seconds 370 msec

Ended Job =job_1435392961740_0012

MapReduce Jobs Launched:

Stage-Stage-1: Map: 2 Cumulative CPU: 7.37 sec HDFS Read: 348355875 HDFS Write: 76 SUCCESS

Total MapReduce CPU Time Spent:7 seconds 370 msec

userbook.book_id userbook.book_name userbook.author userbook.public_date userbook.address

15999998838 uviWfFJ KwCrDOA 2009-12-27 3b74416d-eb69-48e2-9d0d-09275064691b

Time taken: 45.678 seconds, Fetched: 1 row(s)

(3) 創建索引

hive (hive)> create indexuserbook_bookid_idx on table userbook(book_id) as 'COMPACT' WITH DEFERREDREBUILD;

(4) 創建索引後再執行查詢

hive (hive)> select * fromuserbook where book_id = '15999998838';

Query ID =hadoop_20150627170019_5bb5514a-4c8e-4c47-9347-ed0657e1f2ff

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is setto 0 since there's no reduce operator

Starting Job =job_1435392961740_0013, Tracking URL = http://gpmaster:8088/proxy/application_1435392961740_0013/

Kill Command =/home/hadoop/hadoop-2.6.0/bin/hadoop job -kill job_1435392961740_0013

Hadoop job information forStage-1: number of mappers: 2; number of reducers: 0

2015-06-27 17:00:30,429 Stage-1map = 0%, reduce = 0%

2015-06-27 17:00:54,003 Stage-1map = 50%, reduce = 0%, Cumulative CPU7.43 sec

2015-06-27 17:00:56,181 Stage-1map = 78%, reduce = 0%, Cumulative CPU9.66 sec

2015-06-27 17:00:58,417 Stage-1map = 100%, reduce = 0%, Cumulative CPU10.83 sec

MapReduce Total cumulative CPUtime: 10 seconds 830 msec

Ended Job =job_1435392961740_0013

MapReduce Jobs Launched:

Stage-Stage-1: Map: 2 Cumulative CPU: 10.83 sec HDFS Read: 348356271 HDFS Write: 76 SUCCESS

Total MapReduce CPU Time Spent:10 seconds 830 msec

userbook.book_id userbook.book_name userbook.author userbook.public_date userbook.address

15999998838 uviWfFJ KwCrDOA 2009-12-27 3b74416d-eb69-48e2-9d0d-09275064691b

Time taken: 40.549 seconds, Fetched: 1 row(s)

可以看到創建索引後，速度還是稍快一點的。

其實對於這種簡單的查詢，通過我們的設置，可以不用啟動Map/Reduce的，而是啟動Fetch task，直接從HDFS文件中filter過濾出需要的數據，需要設置如下參數：

set hive.fetch.task.conversion=more;

hive (hive)> select * fromuserbook where book_id = '15999998838';

userbook.book_id userbook.book_name userbook.author userbook.public_date userbook.address

15999998838 uviWfFJ KwCrDOA 2009-12-27 3b74416d-eb69-48e2-9d0d-09275064691b

Time taken: 0.093 seconds,Fetched: 1 row(s)

可以看到速度更快了，畢竟省略掉了開啟MR任務，執行效率提高不少。

上一頁:Hive中將查詢結果導出到指定分隔符的文件中
下一頁:Couch的MapReduce查詢

SyBase教程

Sybase數據庫因SAP收購而重獲新生？

BKJIA外電頭條】在SAP完成對Sybase收購前，Sy

Sybase的安裝步驟

前兩天由於要給銀行做項目,銀行的數據庫是Sybase.

Java從入門到精通——數據庫篇MongoDB導出,導入,備份

Java從入門到精通——數據庫篇MongoDB導出,導入,備

Datazen配置

Datazen配置 D

數據清洗小記：“時間段”數據獲取小心錯誤的習慣

數據清洗小記：“時間段”數據獲取小心錯誤的習慣【背景】

Sybase為德國戰車SAP裝上新引擎？

BKJIA數據庫頻道近日報道《Sybase數據庫因SAP收

熱門圖文

精細講解最簡便的備份MySQL數據庫的方法 MySQL 處理重復數據 SQLSREVER如何創建和使用動態游標數據庫---實驗三嵌套查詢和視圖操作，---嵌套 Oracle數據庫及應用程序優化 MySQL緩存的查詢和清除命令使用詳解 oracle替代變量 mysql 表索引的一些要點

欄目導航

SyBase綜合文章 SyBase教程