Elasticsearch is a distributed, RESTful search and analytics server. Like Apache Solr, it is an index server built on Lucene, but in my view Elasticsearch holds several advantages over Solr.
Environment Setup
Start Elasticsearch; it listens on port 9200, and you can inspect the returned JSON data in a browser. Elasticsearch both accepts and returns data in JSON format.
>> bin/elasticsearch -f
Install the official Python API. After installing it on OS X, a few Python runtime errors appeared; they were caused by an outdated setuptools, and removing and reinstalling it fixed the problem.
>> pip install elasticsearch
Indexing Operations
To index a single document, call the create or index method.
from datetime import datetime
from elasticsearch import Elasticsearch

es = Elasticsearch()  # create a localhost server connection, or Elasticsearch("ip")
es.create(index="test-index", doc_type="test-type", id=1,
          body={"any": "data", "timestamp": datetime.now()})
Elasticsearch's bulk-indexing command is bulk. The Python API documentation currently offers few examples, and it took a fair amount of time reading the source code to work out the submission format for bulk indexing.
from datetime import datetime
from elasticsearch import Elasticsearch
from elasticsearch import helpers

es = Elasticsearch("10.18.13.3")
j = 0
count = int(df[0].count())
actions = []
while (j < count):
    action = {
        "_index": "tickets-index",
        "_type": "tickets",
        "_id": j + 1,
        "_source": {
            "crawaldate": df[0][j],
            "flight": df[1][j],
            "price": float(df[2][j]),
            "discount": float(df[3][j]),
            "date": df[4][j],
            "takeoff": df[5][j],
            "land": df[6][j],
            "source": df[7][j],
            "timestamp": datetime.now()}
    }
    actions.append(action)
    j += 1
    if (len(actions) == 500000):
        helpers.bulk(es, actions)
        del actions[0:len(actions)]

if (len(actions) > 0):
    helpers.bulk(es, actions)
    del actions[0:len(actions)]
Here I found that the Python API supports a limited set of data types when serializing to JSON: raw values stored as NumPy.Int32 must be converted to int before they can be indexed. Also, the bulk operation submits 500 records per request by default; when I raised this to 5000 or even 50000 for testing, some documents failed to index.
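The type issue above can be worked around by coercing NumPy scalars to native Python types before building each action. A minimal sketch (the field names here are illustrative, not from the original dataset):

```python
import numpy as np

# values read out of a DataFrame come back as NumPy scalar types
doc = {"price": np.int32(1299), "discount": np.float64(0.75), "flight": "CA1234"}

# .item() converts any NumPy scalar to the equivalent native Python type,
# which the JSON serializer in the Python client accepts
clean = {k: (v.item() if isinstance(v, np.generic) else v) for k, v in doc.items()}
```

After this conversion the dict serializes cleanly with the standard json module.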
# helpers.py source code
def streaming_bulk(client, actions, chunk_size=500, raise_on_error=False,
                   expand_action_callback=expand_action, **kwargs):
    actions = map(expand_action_callback, actions)
    # if raise_on_error is set, we need to collect errors per chunk before raising them
    errors = []
    while True:
        chunk = islice(actions, chunk_size)
        bulk_actions = []
        for action, data in chunk:
            bulk_actions.append(action)
            if data is not None:
                bulk_actions.append(data)
        if not bulk_actions:
            return

def bulk(client, actions, stats_only=False, **kwargs):
    success, failed = 0, 0
    # list of errors to be collected if not stats_only
    errors = []
    for ok, item in streaming_bulk(client, actions, **kwargs):
        # go through request-response pairs and detect failures
        if not ok:
            if not stats_only:
                errors.append(item)
            failed += 1
        else:
            success += 1
    return success, failed if stats_only else errors
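The chunking in streaming_bulk above boils down to repeatedly pulling at most chunk_size items from one shared iterator with islice; a standalone sketch of that pattern:

```python
from itertools import islice

def chunked(actions, chunk_size=500):
    # pull at most chunk_size actions per round from a single shared iterator,
    # mirroring how streaming_bulk slices its action stream
    it = iter(actions)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

sizes = [len(c) for c in chunked(range(1050), 500)]  # -> [500, 500, 50]
```

Because islice draws from the same iterator each round, no action is ever read twice and the final partial chunk comes out naturally.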
For bulk delete and update operations, the corresponding document formats are shown below; the doc node is required in an update document.
{
    '_op_type': 'delete',
    '_index': 'index-name',
    '_type': 'document',
    '_id': 42,
}
{
    '_op_type': 'update',
    '_index': 'index-name',
    '_type': 'document',
    '_id': 42,
    'doc': {'question': 'The life, universe and everything.'}
}
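Small helpers can make building these actions less error-prone. A sketch with hypothetical helper names (delete_action and update_action are not part of the client API):

```python
def delete_action(index, doc_type, doc_id):
    # builds a bulk delete action in the format shown above
    return {'_op_type': 'delete', '_index': index, '_type': doc_type, '_id': doc_id}

def update_action(index, doc_type, doc_id, fields):
    # the 'doc' node carrying the partial document is required for updates
    return {'_op_type': 'update', '_index': index, '_type': doc_type,
            '_id': doc_id, 'doc': fields}

actions = [
    delete_action('index-name', 'document', 42),
    update_action('index-name', 'document', 42,
                  {'question': 'The life, universe and everything.'}),
]
# the list would then be submitted with helpers.bulk(es, actions)
```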
Common Errors
Performance
Below is a comparison of storing the same data in MongoDB and Elasticsearch. Although the servers and the access patterns were not exactly the same, it still shows that a database has the edge over an index server for bulk writes.
Elasticsearch's index files are sharded automatically, and write speed was unaffected even at tens of millions of records. However, when disk space ran out, Elasticsearch hit segment-merge errors and lost a large amount of data (over a million records in total); even after client writes stopped, the server could not recover on its own and had to be stopped manually. This is fairly fatal in a production environment, especially for non-Java clients: there seems to be no way to surface the server-side Java exceptions on the client, so the programmer must handle the server's responses very carefully.
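Given that server-side exceptions are invisible to non-Java clients, one defensive habit is to always inspect what helpers.bulk returns rather than assuming success. A sketch (report_bulk_result is a hypothetical helper; with the defaults shown in the quoted source, bulk returns a success count plus a list of error items):

```python
def report_bulk_result(success, errors):
    # hypothetical helper: summarize the (success, errors) pair returned by
    # helpers.bulk(es, actions) when per-item errors are collected
    if errors:
        return "indexed %d docs, %d failures" % (success, len(errors))
    return "indexed %d docs, no failures" % success

# success, errors = helpers.bulk(es, actions)
# print(report_bulk_result(success, errors))
```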