Grouping and paginating search results with Lucene in Java
Grouping search results with GroupingSearch
Package org.apache.lucene.search.grouping Description
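This module groups Lucene search results: documents that share the same value in a specified single-valued field are collected together. For example, if you group on the "author" field, documents with the same "author" value end up in one group.
Grouping requires a few inputs:
1. groupField: the field to group on. For example, if you group on the "author" field, every book in a group has the same author. Documents that lack this field are placed in a single separate group.
2. groupSort: how the groups are sorted.
3. topNGroups: how many top groups to keep. For example, 10 means only the first 10 groups are kept.
4. groupOffset: which slice of the top-ranked groups to retrieve. For example, 3 means 7 groups are returned (assuming topNGroups is 10). This is useful for paging, for example when each page shows only 5 groups.
5. withinGroupSort: how documents are sorted inside each group. Note that this is distinct from groupSort.
6. withinGroupOffset: which slice of the top-ranked documents to retrieve from each group.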
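To make the interplay of these parameters concrete, here is a minimal sketch in plain Java that models them on an in-memory list. It does not call Lucene: the class GroupingModel, the Hit type and the docsPerGroup parameter are hypothetical names chosen for illustration, and both sorts are assumed to be by descending score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the grouping parameters; it does not call Lucene.
class GroupingModel {

    // Hypothetical stand-in for an indexed document: a group field value plus a score.
    static final class Hit {
        final String author;
        final double score;
        Hit(String author, double score) { this.author = author; this.score = score; }
    }

    static List<List<Hit>> group(List<Hit> hits, int topNGroups, int groupOffset,
                                 int docsPerGroup, int withinGroupOffset) {
        // groupField: bucket the hits by author.
        Map<String, List<Hit>> byAuthor = new LinkedHashMap<>();
        for (Hit h : hits) {
            byAuthor.computeIfAbsent(h.author, k -> new ArrayList<>()).add(h);
        }
        Comparator<Hit> byScoreDesc = (a, b) -> Double.compare(b.score, a.score);
        List<List<Hit>> groups = new ArrayList<>(byAuthor.values());
        // withinGroupSort: order the documents inside each group.
        for (List<Hit> g : groups) {
            g.sort(byScoreDesc);
        }
        // groupSort: order the groups, each represented by its best document.
        groups.sort((a, b) -> byScoreDesc.compare(a.get(0), b.get(0)));
        // topNGroups keeps the first N groups; groupOffset skips the leading ones.
        List<List<Hit>> page = slice(groups, groupOffset, topNGroups);
        List<List<Hit>> result = new ArrayList<>();
        for (List<Hit> g : page) {
            // withinGroupOffset: skip leading documents inside the group.
            result.add(slice(g, withinGroupOffset, withinGroupOffset + docsPerGroup));
        }
        return result;
    }

    // Returns items[from, to), clamped to the list bounds.
    static <T> List<T> slice(List<T> items, int from, int to) {
        int end = Math.min(to, items.size());
        int start = Math.min(from, end);
        return new ArrayList<>(items.subList(start, end));
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            hits.add(new Hit("a" + i, 12 - i)); // 12 authors, scores 12 down to 1
        }
        List<List<Hit>> page = group(hits, 10, 3, 1, 0);
        // prints "7 groups, first is a3"
        System.out.println(page.size() + " groups, first is " + page.get(0).get(0).author);
    }
}
```

With topNGroups = 10 and groupOffset = 3, the model returns the groups ranked 4 through 10, which is the "3 means 7 groups" case described in the list.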
Grouping search results with GroupingSearch is fairly simple.
From the GroupingSearch API documentation:
Convenience class to perform grouping in a non distributed environment.
WARNING: This API is experimental and might change in incompatible ways in the next release.
This article uses Lucene 4.3.1.
Some important methods:
Example code:
1. First, the code that builds the index
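import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class IndexHelper {
    private Directory directory;

    // Lazily create one in-memory directory shared by the writer and the searcher
    public Directory getDirectory() {
        if (directory == null) {
            directory = new RAMDirectory();
        }
        return directory;
    }

    private IndexWriterConfig getConfig() {
        return new IndexWriterConfig(Version.LUCENE_43, new IKAnalyzer(true));
    }

    // Propagate the IOException instead of returning null, which would cause an NPE later
    private IndexWriter getIndexWriter() throws IOException {
        return new IndexWriter(getDirectory(), getConfig());
    }

    public IndexSearcher getIndexSearcher() throws IOException {
        return new IndexSearcher(DirectoryReader.open(getDirectory()));
    }

    /**
     * Create index for group test
     * @param id
     * @param author
     * @param content
     */
    public void createIndexForGroup(int id, String author, String content) {
        Document document = new Document();
        document.add(new IntField("id", id, Field.Store.YES));
        // StringField is indexed as a single token, so it can serve as the group field
        document.add(new StringField("author", author, Field.Store.YES));
        document.add(new TextField("content", content, Field.Store.YES));
        try {
            IndexWriter indexWriter = getIndexWriter();
            try {
                indexWriter.addDocument(document);
                indexWriter.commit();
            } finally {
                indexWriter.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}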
2. Grouping:
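import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.grouping.GroupDocs;
import org.apache.lucene.search.grouping.GroupingSearch;
import org.apache.lucene.search.grouping.TopGroups;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class GroupTest {

    public void group(IndexSearcher indexSearcher, String groupField, String content)
            throws IOException, ParseException {
        GroupingSearch groupingSearch = new GroupingSearch(groupField);
        // Sort the groups by relevance score
        groupingSearch.setGroupSort(new Sort(SortField.FIELD_SCORE));
        groupingSearch.setFillSortFields(true);
        groupingSearch.setCachingInMB(4.0, true);
        groupingSearch.setAllGroups(true);
        //groupingSearch.setAllGroupHeads(true);
        // Return at most 10 documents per group
        groupingSearch.setGroupDocsLimit(10);

        QueryParser parser = new QueryParser(Version.LUCENE_43, "content", new IKAnalyzer(true));
        Query query = parser.parse(content);

        // groupOffset = 0, groupLimit = 1000
        TopGroups<BytesRef> result = groupingSearch.search(indexSearcher, query, 0, 1000);

        System.out.println("Total hits: " + result.totalHitCount);
        System.out.println("Number of groups: " + result.groups.length);

        for (GroupDocs<BytesRef> groupDocs : result.groups) {
            // groupValue is null for documents that lack the group field
            String groupValue = groupDocs.groupValue == null
                    ? "(no value)" : groupDocs.groupValue.utf8ToString();
            System.out.println("Group: " + groupValue);
            System.out.println("Hits in group: " + groupDocs.totalHits);
            //System.out.println("groupDocs.scoreDocs.length:" + groupDocs.scoreDocs.length);
            for (ScoreDoc scoreDoc : groupDocs.scoreDocs) {
                Document document = indexSearcher.doc(scoreDoc.doc);
                System.out.println(document);
            }
        }
    }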
3. A simple test:
    public static void main(String[] args) throws IOException, ParseException {
        IndexHelper indexHelper = new IndexHelper();
        indexHelper.createIndexForGroup(1, "紅薯", "開源中國");
        indexHelper.createIndexForGroup(2, "紅薯", "開源社區");
        indexHelper.createIndexForGroup(3, "紅薯", "代碼設計");
        indexHelper.createIndexForGroup(4, "紅薯", "設計");
        indexHelper.createIndexForGroup(5, "覺先", "Lucene開辟");
        indexHelper.createIndexForGroup(6, "覺先", "Lucene實戰");
        indexHelper.createIndexForGroup(7, "覺先", "開源Lucene");
        indexHelper.createIndexForGroup(8, "覺先", "開源solr");
        indexHelper.createIndexForGroup(9, "散仙", "散仙開源Lucene");
        indexHelper.createIndexForGroup(10, "散仙", "散仙開源solr");
        indexHelper.createIndexForGroup(11, "散仙", "開源");

        GroupTest groupTest = new GroupTest();
        groupTest.group(indexHelper.getIndexSearcher(), "author", "開源");
    }
}
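4. Test results:
Two ways to paginate
Lucene offers two ways to paginate:
1. Paginate the search results directly. This approach is suitable when the amount of data is fairly small. The core of the paging code looks like this: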
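ScoreDoc[] sd = XXX;
// Index of the first record on the requested page
int begin = pageSize * (currentPage - 1);
// Index just past the last record on the page, capped at the number of results
int end = Math.min(begin + pageSize, sd.length);
// end never exceeds sd.length, so no separate check against the total hit count is needed
for (int i = begin; i < end; i++) {
    // Code that processes the search result sd[i]
}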
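The begin/end arithmetic above can be wrapped in a small helper. The following is a sketch in plain Java that works on any array rather than on Lucene's ScoreDoc specifically; the class OffsetPaging and its method names are hypothetical. With Lucene, the array would presumably be the scoreDocs of a search that requested at least pageSize * currentPage hits.

```java
import java.util.Arrays;

// Plain-Java sketch of offset paging over an already-fetched result array.
class OffsetPaging {

    // Returns the slice shown on a 1-based page; empty if the page is past the end.
    static <T> T[] page(T[] results, int currentPage, int pageSize) {
        if (currentPage < 1 || pageSize < 1) {
            throw new IllegalArgumentException("currentPage and pageSize must be >= 1");
        }
        // Compute in long so a large page number cannot overflow int.
        long beginLong = (long) pageSize * (currentPage - 1);
        int begin = (int) Math.min(beginLong, results.length);
        int end = (int) Math.min(beginLong + pageSize, results.length);
        return Arrays.copyOfRange(results, begin, end);
    }

    // Number of pages needed to show totalResults items (ceiling division).
    static int totalPages(int totalResults, int pageSize) {
        return (totalResults + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        Integer[] ids = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
        // prints "[6, 7, 8, 9, 10]"
        System.out.println(Arrays.toString(page(ids, 2, 5)));
        // prints "3"
        System.out.println(totalPages(ids.length, 5));
    }
}
```

Because every page re-reads the leading results, the cost grows with the page number, which is why this approach fits small result sets.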
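2. Use searchAfter(...)
Lucene provides five overloads of this method; pick the one that fits your needs. The main parameters are:
ScoreDoc after: the last ScoreDoc from the previous search, that is, the element at index length - 1 of its scoreDocs array. Pass null for the first page.
Query query: the query to run.
int n: the number of results returned by each call, i.e. the page size.
A simple usage example: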
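// A Map can hold whatever the caller needs from the search
Map<String, Object> resultMap = new HashMap<String, Object>();
ScoreDoc after = null; // null means "start from the first page"
Query query = XX;
TopDocs td = search.searchAfter(after, query, size);
// Total number of hits
resultMap.put("num", td.totalHits);
ScoreDoc[] sd = td.scoreDocs;
for (ScoreDoc scoreDoc : sd) {
    // Usual handling of each search result
}
// The last ScoreDoc on this page (index length - 1) is the cursor for the next page;
// guard against an empty page to avoid an ArrayIndexOutOfBoundsException
if (sd.length > 0) {
    after = sd[sd.length - 1];
}
// Keep after for the next search, i.e. the start of the next page
resultMap.put("after", after);
return resultMap;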
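To illustrate why passing the last hit of one page yields the next page, here is a rough model of cursor-style paging in plain Java. It imitates the idea behind searchAfter on an in-memory list and is not Lucene's implementation; the class CursorPaging and the Hit type (a stand-in for ScoreDoc) are hypothetical, and the list is assumed to be sorted by descending score with ties broken by ascending document id.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of cursor-style paging; it does not call Lucene.
class CursorPaging {

    // Hypothetical stand-in for ScoreDoc: a document id and its score.
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Returns up to n hits that rank strictly after the cursor.
    // A null cursor means "first page".
    static List<Hit> searchAfter(Hit after, List<Hit> ranked, int n) {
        List<Hit> page = new ArrayList<>();
        for (Hit h : ranked) {
            boolean pastCursor = after == null
                    || h.score < after.score
                    || (h.score == after.score && h.doc > after.doc);
            if (pastCursor) {
                page.add(h);
                if (page.size() == n) {
                    break;
                }
            }
        }
        return page;
    }

    public static void main(String[] args) {
        List<Hit> ranked = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            ranked.add(new Hit(i, 7 - i)); // doc 0 has the highest score
        }
        Hit after = null;
        while (true) {
            List<Hit> page = searchAfter(after, ranked, 3);
            if (page.isEmpty()) {
                break; // no more results
            }
            StringBuilder docs = new StringBuilder();
            for (Hit h : page) {
                docs.append(h.doc).append(' ');
            }
            System.out.println("page: " + docs.toString().trim());
            // The last hit of this page becomes the cursor for the next one.
            after = page.get(page.size() - 1);
        }
    }
}
```

The caller only has to remember one cursor object between requests, rather than re-fetching and discarding all earlier pages.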