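Recently a project required reading and writing HBase data from PHP over Thrift, so I put together the relevant classes and ran some tests.
These are the ways I currently work with HBase:
1. HBase Shell: after configuring it, run the shell and inspect data with commands such as count 'xxxx' and scan 'xxxx'.
2. The native Java API: I wrapped it in a RESTful API of my own and operate on HBase over HTTP.
3. Thrift: a serialization and RPC framework that supports C++, PHP, Python and other languages, which suits heterogeneous systems that need to talk to HBase. I have only just started trying this.
4. HBasExplorer: a graphical client I wrote earlier for working with HBase, http://www.cnblogs.com/scotoma/archive/2012/12/18/2824311.html
5. Hive/Pig: I have not really used these yet.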
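This post covers the third option, Thrift. It was open-sourced by Facebook, and the official site is http://thrift.apache.org/ .
For downloading, installing and starting it, see the reference article.
Then check that the Thrift server is actually running...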
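One rough way to do that check from PHP is a plain TCP connect to the Thrift port. The sketch below is my own helper, not part of Thrift; the name isPortOpen is made up for illustration. It only shows that something is listening on the port, not that the listener is an HBase Thrift server.

```php
<?php
/**
 * Returns true if a TCP connection to $host:$port can be opened within
 * $timeoutSec seconds. This proves only that something is listening,
 * not that it speaks the HBase Thrift protocol.
 * (isPortOpen is a hypothetical helper name, not a Thrift function.)
 */
function isPortOpen($host, $port, $timeoutSec = 2.0)
{
    $errno = 0;
    $errstr = '';
    // @ suppresses the PHP warning that fsockopen emits on failure.
    $conn = @fsockopen($host, $port, $errno, $errstr, $timeoutSec);
    if ($conn === false) {
        return false;
    }
    fclose($conn);
    return true;
}
```

With the host and port used by the test script in this post, the call would be isPortOpen('192.168.56.56', 9090); adjust both to your own setup.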
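The PHP side talks to HBase through generated class files. For how to generate them, see the method in the reference article. Note, however, that the generation I tested myself had a bug: the namespace in the generated class files came out empty, whereas the files generated from the official source tree use namespace Hbase. Watch out for this.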
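A quick way to catch the namespace problem before wiring anything up is to look at what the generated file declares. This is a minimal sketch under my own assumptions: declaredNamespace is a made-up helper, and ./lib/Hbase.php is simply where the test script in this post loads the generated file from. The function returns null unless a non-empty namespace name is declared, whatever the broken output happens to look like.

```php
<?php
/**
 * Returns the namespace declared in a PHP source string, or null when no
 * non-empty namespace declaration is found.
 * (declaredNamespace is a hypothetical helper name.)
 */
function declaredNamespace($phpSource)
{
    // Matches e.g. "namespace Hbase;" or "namespace Thrift\Transport;".
    $pattern = '/^\s*namespace\s+([A-Za-z_][A-Za-z0-9_\\\\]*)\s*;/m';
    if (preg_match($pattern, $phpSource, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Assumed location of the generated file; change it to match your layout.
$generated = './lib/Hbase.php';
if (is_file($generated)) {
    $ns = declaredNamespace(file_get_contents($generated));
    echo $ns === null ? "no namespace declared\n" : "namespace: $ns\n";
}
```

If this does not report Hbase, the use Hbase\HbaseClient; line in the test script will not resolve.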
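I debugged a working set of driver class files and put them on GitHub; download them if you need them.
https://github.com/xinqiyang/buddy/tree/master/Vender/thrift
Next comes the test. Based on the test class at http://blog.csdn.net/hguisu/article/details/7298456 , I wrote a test of my own and debugged it: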
<?php
/**
 * Thrift Test Class by xinqiyang
 */
ini_set('display_errors', '1');
error_reporting(E_ALL);

$GLOBALS['THRIFT_ROOT'] = './lib';

/* Dependencies. In the proper order. */
require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/Transport/TTransport.php';
require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/Transport/TSocket.php';
require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/Protocol/TProtocol.php';
require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/Protocol/TBinaryProtocol.php';
require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/Transport/TBufferedTransport.php';
require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/Type/TMessageType.php';
require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/Factory/TStringFuncFactory.php';
require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/StringFunc/TStringFunc.php';
require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/StringFunc/Core.php';
require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/Type/TType.php';
require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/Exception/TException.php';
require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/Exception/TTransportException.php';
require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/Exception/TProtocolException.php';

/* The two generated files. */
require_once $GLOBALS['THRIFT_ROOT'].'/Types.php';
require_once $GLOBALS['THRIFT_ROOT'].'/Hbase.php';

use Thrift\Protocol\TBinaryProtocol;
use Thrift\Transport\TSocket;
use Thrift\Transport\TBufferedTransport;
use Hbase\HbaseClient;

// Define host and port of the HBase Thrift server
$host = '192.168.56.56';
$port = 9090;

$socket = new TSocket($host, $port);
$transport = new TBufferedTransport($socket);
$protocol = new TBinaryProtocol($transport);

// Create the HBase client and open the connection
$client = new HbaseClient($protocol);
$transport->open();

// List all tables
$tables = $client->getTableNames();
sort($tables);
foreach ($tables as $name) {
    echo $name."\r\n";
}

// Define the column families, then create a table
$columns = array(
    new \Hbase\ColumnDescriptor(array('name' => 'id:', 'maxVersions' => 10)),
    new \Hbase\ColumnDescriptor(array('name' => 'name:')),
    new \Hbase\ColumnDescriptor(array('name' => 'score:')),
);
$tableName = "student";
/*
// Left commented out because the table already exists on my server.
try {
    $client->createTable($tableName, $columns);
} catch (\Hbase\AlreadyExists $ae) {
    echo "WARN: {$ae->message}\n";
}
*/

// Get the column descriptors of the table (keyed by column family name)
$descriptors = $client->getColumnDescriptors($tableName);
ksort($descriptors);
foreach ($descriptors as $col) {
    echo " column: {$col->name}, maxVer: {$col->maxVersions}\n";
}

// Prepare the data to add or update
$time = time();
var_dump($time);
$row = '2';
$valid = "foobar-".$time;
$mutations = array(
    // 'score:' is the score family with an empty qualifier
    new \Hbase\Mutation(array('column' => 'score:', 'value' => $valid)),
);
$mutations1 = array(
    // Thrift strings must be PHP strings, so cast the integer timestamp
    new \Hbase\Mutation(array('column' => 'score:a', 'value' => (string) $time)),
);
$attributes = array();

// Add row: write a new row keyed by the current timestamp
$row1 = (string) $time;
$client->mutateRow($tableName, $row1, $mutations1, $attributes);
echo "-------write row $row1 ---\r\n";

// Update row
$client->mutateRow($tableName, $row, $mutations, $attributes);

// Get the data of one column; get() returns an array of TCell
$row_name = (string) $time;
$fam_col_name = 'score:a';
$arr = $client->get($tableName, $row_name, $fam_col_name, $attributes);
foreach ($arr as $k => $v) {
    // $v is a TCell
    echo " ------ get one : value = {$v->value}\r\n";
    echo " ------ get one : timestamp = {$v->timestamp}\r\n";
}
echo "----------\r\n";

// Get a whole row; getRow() returns an array of TRowResult
$arr = $client->getRow($tableName, $row_name, $attributes);
foreach ($arr as $k => $TRowResult) {
    // $k is just the index and is not used
    var_dump($TRowResult);
}
echo "----------\r\n";

/******
// Scanner methods on the client; not all of them tested
public function scannerOpenWithScan($tableName, \Hbase\TScan $scan, $attributes);
public function scannerOpen($tableName, $startRow, $columns, $attributes);
public function scannerOpenWithStop($tableName, $startRow, $stopRow, $columns, $attributes);
public function scannerOpenWithPrefix($tableName, $startAndPrefix, $columns, $attributes);
public function scannerOpenTs($tableName, $startRow, $columns, $timestamp, $attributes);
public function scannerOpenWithStopTs($tableName, $startRow, $stopRow, $columns, $timestamp, $attributes);
public function scannerGet($id);
public function scannerGetList($id, $nbRows);
public function scannerClose($id);
*/
echo "----scanner get ------\r\n";
$startRow = '1';
// The columns argument is a plain list of column names
$columns = array('score:');

// Open a scanner; the commented lines show the prefix and start/stop variants
$scan = $client->scannerOpen($tableName, $startRow, $columns, $attributes);
//$startAndPrefix = '13686667';
//$scan = $client->scannerOpenWithPrefix($tableName, $startAndPrefix, $columns, $attributes);
//$startRow = '1';
//$stopRow = '2';
//$scan = $client->scannerOpenWithStop($tableName, $startRow, $stopRow, $columns, $attributes);

//$arr = $client->scannerGet($scan);
$nbRows = 1000;
$arr = $client->scannerGetList($scan, $nbRows);
echo 'count of result :'.count($arr)."\r\n";
foreach ($arr as $k => $TRowResult) {
    // code...
    //var_dump($TRowResult);
}
$client->scannerClose($scan);

// Close transport
$transport->close();
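The scanner part of the test fetches a single batch of up to 1000 rows. For larger tables, the same three calls can be wrapped in a loop that keeps reading batches and always closes the scanner. The sketch below is only an illustration: scanAllRows is a made-up helper, and it assumes that scannerGetList returns an empty list once the scanner is exhausted. It uses finally, so it needs PHP 5.5 or later.

```php
<?php
/**
 * Reads every row from a scanner in batches and always closes the scanner.
 * $client is expected to provide scannerOpen(), scannerGetList() and
 * scannerClose() with the signatures used in the test script.
 * (scanAllRows is a hypothetical helper, not part of the generated client.)
 */
function scanAllRows($client, $tableName, $startRow, array $columns, $batchSize = 100)
{
    $rows = array();
    $scannerId = $client->scannerOpen($tableName, $startRow, $columns, array());
    try {
        while (true) {
            $batch = $client->scannerGetList($scannerId, $batchSize);
            if (count($batch) === 0) {
                break; // assumed end-of-scan signal: an empty batch
            }
            foreach ($batch as $rowResult) {
                $rows[] = $rowResult;
            }
        }
    } finally {
        // Runs on success and on exceptions, so the scanner is not leaked.
        $client->scannerClose($scannerId);
    }
    return $rows;
}
```

With the client from the test script, a call would look like scanAllRows($client, 'student', '1', array('score:')). Collecting everything into one array is only sensible for small results; for big scans, process each batch inside the loop instead.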
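This exercises the common operations: createTable, inserting a row, reading table metadata, updating a row, and scanning a table. It is a first pass to get familiar with the API.
Things to watch for in practice:
1. PHP version: namespaces are required, so PHP 5.3 or later is needed.
2. The Thrift PHP extension: I installed it, but it does not seem to be actually used; the PHP library files are still required. It would be nice if someone wrote a proper extension, though I do not know whether performance would improve.
3. Scans: I tested scans with start/stop rows and with a prefix, and they seem to work fine.
4. PHP namespaces feel clumsy to me; the \ separator just does not look right.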
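The first two points can be checked up front with a few lines of PHP. This is a rough sketch: checkThriftEnvironment is a made-up name, and the extension name thrift_protocol is my assumption about how the Thrift C extension registers itself. As far as I know, that extension only comes into play when the accelerated binary protocol class is used, which may be why it appeared unused with plain TBinaryProtocol.

```php
<?php
/**
 * Collects notes about the PHP environment as human-readable strings.
 * (checkThriftEnvironment is a hypothetical helper name.)
 */
function checkThriftEnvironment()
{
    $notes = array();
    // Namespaces were introduced in PHP 5.3.0.
    if (version_compare(PHP_VERSION, '5.3.0', '<')) {
        $notes[] = 'PHP ' . PHP_VERSION . ' lacks namespaces; 5.3.0 or later is needed.';
    }
    // Assumption: the Thrift C extension is registered as "thrift_protocol".
    if (!extension_loaded('thrift_protocol')) {
        $notes[] = 'thrift_protocol extension not loaded; the pure-PHP classes will be used.';
    }
    return $notes;
}

foreach (checkThriftEnvironment() as $note) {
    echo $note, "\n";
}
```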
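Next, when I have time, I will try the remaining operations, run a load test, and deploy this to the cluster.
If you use Thrift, I would be glad to compare notes. Thanks to hguisu for the article (see the reference article), which helps everyone get started quickly.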
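Update:
2013-05-17: After starting Thrift on the cluster, I found that writes are still unstable, with fairly serious timeouts. The PHP client classes need optimizing for this; frankly, they feel overcomplicated as written.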
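One possible stopgap for the timeouts, until the client classes are reworked, is to retry a failed write a few times with a growing delay. The sketch below is not what I actually deployed and does nothing about the underlying cause; retryWithBackoff is a made-up helper. Keep in mind that retrying re-sends the mutation, so a write that did reach the server before the timeout is applied again.

```php
<?php
/**
 * Calls $operation up to $maxAttempts times. After each failure it sleeps,
 * doubling the delay every time, and rethrows the last exception when all
 * attempts have failed.
 * (retryWithBackoff is a hypothetical helper name.)
 */
function retryWithBackoff($operation, $maxAttempts = 3, $baseDelayMs = 100)
{
    $attempt = 0;
    while (true) {
        $attempt++;
        try {
            return call_user_func($operation);
        } catch (Exception $e) {
            if ($attempt >= $maxAttempts) {
                throw $e;
            }
            // Delay grows as base, 2*base, 4*base, ... (in milliseconds).
            usleep($baseDelayMs * 1000 * (1 << ($attempt - 1)));
        }
    }
}
```

With the variables from the test script, a write would be wrapped like retryWithBackoff(function () use ($client, $tableName, $row1, $mutations1, $attributes) { $client->mutateRow($tableName, $row1, $mutations1, $attributes); });. The TSocket class in the Thrift PHP library also offers setSendTimeout() and setRecvTimeout(), both in milliseconds, which may be worth raising before resorting to retries.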
Reference article:
http://blog.csdn.net/hguisu/article/details/7298456