- Joining
- Result grouping
- Conclusion
Background Example
- Product
  - Name
  - Description
- Product-item
  - Color
  - Size
  - Price
- Non-Lucene-based approach:
  - If free text search isn't very important, use a relational database.
- Grouping and joining aren't naturally supported.
- All of the solutions increase the search time.
Joining
Modelling relations
Joining Introduction
- Support for parent-child-like search since Lucene 3.4.
- Not a SQL join.
- IndexWriter#addDocuments(docs);
- The app is responsible for identifying block documents.
  - Marking the last document in a block.
- Adding a document to a block requires reindexing the whole block.
- Removing a document from a block doesn't require reindexing the block.
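The block rules above can be pictured with a small plain-Java simulation. This is only a sketch, not Lucene code: the class `BlockIndexSketch` and its methods are hypothetical, and the list stands in for the documents that would be handed to IndexWriter#addDocuments(docs). It assumes the children are added first and the parent is the last document of the block.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for an index; this is not Lucene code.
final class BlockIndexSketch {
    // Each entry is one "document"; its position in the list is its doc id.
    private final List<String> docs = new ArrayList<String>();
    // parentFlags.get(i) is true when doc i is the last document of a block.
    private final List<Boolean> parentFlags = new ArrayList<Boolean>();

    // Adds the children followed by their parent as one contiguous block.
    // Returns the doc id of the parent, which is the last id in the block.
    int addBlock(String parent, List<String> children) {
        for (String child : children) {
            docs.add(child);
            parentFlags.add(Boolean.FALSE);
        }
        docs.add(parent);
        parentFlags.add(Boolean.TRUE); // the app marks the block's last document
        return docs.size() - 1;
    }

    boolean isParent(int docId) {
        return parentFlags.get(docId);
    }

    int size() {
        return docs.size();
    }
}
```

Because a block is one contiguous run of doc ids, inserting a child later would shift the run, which is one way to see why adding a document means reindexing the whole block.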
Add block
Joining ToParentBlockJoinQuery
- The parent filter marks the parent documents.
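How a parent filter can turn child matches into parent hits is sketched below in plain Java. This illustrates the idea only and is not the ToParentBlockJoinQuery implementation: the class `ParentJoinSketch` is hypothetical, and it assumes the block layout in which the parent is the last document of its block, so the parent of a child is the first marked doc id at or after the child's id.

```java
import java.util.BitSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of mapping child matches to their parents.
final class ParentJoinSketch {
    // parentBits has a bit set for every parent doc id (the "parent filter").
    // Several matching children of one block collapse into a single parent.
    static Set<Integer> toParents(int[] matchingChildIds, BitSet parentBits) {
        Set<Integer> parents = new LinkedHashSet<Integer>();
        for (int childId : matchingChildIds) {
            // First parent at or after the child: the end of the child's block.
            int parentId = parentBits.nextSetBit(childId);
            if (parentId >= 0) {
                parents.add(parentId);
            }
        }
        return parents;
    }
}
```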
- The second phase returns the documents whose toField matches one of the terms collected in the previous phase.
- Two different implementations:
  - JoinUtil - Lucene (3.6 and later)
  - Join query parser - Solr (trunk)
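The two phases of a query time join can be sketched in plain Java as follows. This is an illustration under assumptions, not the JoinUtil or Solr code: documents are modelled as simple maps, the class `QueryTimeJoinSketch` and the field names used with it are hypothetical, and the first phase is assumed to collect the fromField terms of the documents that match the query.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical two-phase join over in-memory "documents".
final class QueryTimeJoinSketch {
    static List<Map<String, String>> join(List<Map<String, String>> docs,
                                          Predicate<Map<String, String>> fromQuery,
                                          String fromField,
                                          String toField) {
        // Phase 1: collect the fromField terms of every document matching the query.
        Set<String> terms = new HashSet<String>();
        for (Map<String, String> doc : docs) {
            String term = doc.get(fromField);
            if (term != null && fromQuery.test(doc)) {
                terms.add(term);
            }
        }
        // Phase 2: return the documents whose toField holds a collected term.
        List<Map<String, String>> result = new ArrayList<Map<String, String>>();
        for (Map<String, String> doc : docs) {
            String value = doc.get(toField);
            if (value != null && terms.contains(value)) {
                result.add(doc);
            }
        }
        return result;
    }
}
```

Since the terms are kept in a set, two product-items pointing at the same product still yield that product only once.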
Searchworkings.org - The online search community
Thursday, May 17, 2012
- The result will contain one product.
- It is possible to join over two indices.
- Use block join if you care about scoring.
- Frequent updates can be problematic.
- Use query time join for parent-child filtering.
- Query time join is slower than index time join.
Result grouping
Previously known as Field Collapsing.
- A search hit represents a group.
- Facet counts and total hit count represent groups.
- Per group, collect information:
  - Most relevant document.
  - Top three documents.
  - Aggregated counts.
- Collapse similar-looking documents.
  - E.g. all results from the Wikipedia domains.
- Remove duplicates from the search result.
  - Based on a field that contains a hash.
- Two-pass result grouping.
- Grouping by indexed field, function or doc values.
- The second pass collects data for each top group.
  - The top N documents per group.
  - Possibly other aggregated information.
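The two passes can be sketched in plain Java as below. This is a simplified illustration, not the Lucene grouping module: the class `TwoPassGroupingSketch`, its `Hit` type and the method names are hypothetical, and it assumes that groups are ranked by the score of their best hit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java illustration of two-pass result grouping.
final class TwoPassGroupingSketch {
    static final class Hit {
        final String id;
        final String group;
        final double score;

        Hit(String id, String group, double score) {
            this.id = id;
            this.group = group;
            this.score = score;
        }
    }

    // First pass: find the top groups, ranked by the score of their best hit.
    static List<String> firstPass(List<Hit> hits, int topGroups) {
        final Map<String, Double> bestScore = new LinkedHashMap<String, Double>();
        for (Hit hit : hits) {
            Double current = bestScore.get(hit.group);
            if (current == null || hit.score > current) {
                bestScore.put(hit.group, hit.score);
            }
        }
        List<String> groups = new ArrayList<String>(bestScore.keySet());
        groups.sort((a, b) -> Double.compare(bestScore.get(b), bestScore.get(a)));
        return new ArrayList<String>(groups.subList(0, Math.min(topGroups, groups.size())));
    }

    // Second pass: collect the ids of the best docsPerGroup hits of each top group.
    static Map<String, List<String>> secondPass(List<Hit> hits, List<String> groups, int docsPerGroup) {
        Map<String, List<Hit>> buckets = new LinkedHashMap<String, List<Hit>>();
        for (String group : groups) {
            buckets.put(group, new ArrayList<Hit>());
        }
        for (Hit hit : hits) {
            List<Hit> bucket = buckets.get(hit.group);
            if (bucket != null) { // hits outside the top groups are skipped
                bucket.add(hit);
            }
        }
        Map<String, List<String>> result = new LinkedHashMap<String, List<String>>();
        for (Map.Entry<String, List<Hit>> entry : buckets.entrySet()) {
            List<Hit> bucket = entry.getValue();
            bucket.sort((a, b) -> Double.compare(b.score, a.score));
            List<String> ids = new ArrayList<String>();
            for (Hit hit : bucket.subList(0, Math.min(docsPerGroup, bucket.size()))) {
                ids.add(hit.id);
            }
            result.put(entry.getKey(), ids);
        }
        return result;
    }
}
```

Both passes walk over all matching hits, which is one way to see where the extra search time of grouping comes from.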
- Facet and total counts can represent groups instead of documents.
- But this requires more query time.
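The difference between counting documents and counting groups can be sketched in plain Java. This is an illustration only, not Lucene or Solr code: the class `GroupCountSketch` is hypothetical, hits are given as parallel arrays, and a facet value is assumed to count once per group in the group-based variant.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: counts per document versus counts per group.
final class GroupCountSketch {
    // Total hit count in groups: the number of distinct group values.
    static int totalGroups(String[] groups) {
        return new HashSet<String>(Arrays.asList(groups)).size();
    }

    // Facet counts where every matching document counts once.
    static Map<String, Integer> facetByDocument(String[] facetValues) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String value : facetValues) {
            Integer old = counts.get(value);
            counts.put(value, old == null ? 1 : old + 1);
        }
        return counts;
    }

    // Facet counts where every group counts once per facet value.
    // The per-value sets are the extra bookkeeping that costs query time.
    static Map<String, Integer> facetByGroup(String[] facetValues, String[] groups) {
        Map<String, Set<String>> groupsPerValue = new LinkedHashMap<String, Set<String>>();
        for (int i = 0; i < facetValues.length; i++) {
            Set<String> seen = groupsPerValue.get(facetValues[i]);
            if (seen == null) {
                seen = new HashSet<String>();
                groupsPerValue.put(facetValues[i], seen);
            }
            seen.add(groups[i]);
        }
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (Map.Entry<String, Set<String>> entry : groupsPerValue.entrySet()) {
            counts.put(entry.getKey(), entry.getValue().size());
        }
        return counts;
    }
}
```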
Conclusion
Compare...
- Joining
  - Pro: fast and no data duplication.
  - Con: index time join is not optimal for updates.
  - Con: query time join is limited.
Any questions?
Extra slides
We have time left!
- Joining
  - Distributed support.
  - Represent a hit as a parent-child relation in the search result.
- Result grouping
  - Aggregated grouped information such as sum, avg, min, max, etc.
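What aggregated grouped information would compute can be sketched in plain Java. This is only an illustration of the idea and not an existing Lucene or Solr feature: the class `GroupAggregatesSketch` is hypothetical, and the numeric field is assumed to be a price.

```java
import java.util.DoubleSummaryStatistics;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: sum, avg, min and max of a numeric field per group.
final class GroupAggregatesSketch {
    // groups[i] is the group value of hit i and prices[i] its numeric field.
    static Map<String, DoubleSummaryStatistics> aggregate(String[] groups, double[] prices) {
        Map<String, DoubleSummaryStatistics> stats =
                new LinkedHashMap<String, DoubleSummaryStatistics>();
        for (int i = 0; i < groups.length; i++) {
            DoubleSummaryStatistics groupStats = stats.get(groups[i]);
            if (groupStats == null) {
                groupStats = new DoubleSummaryStatistics();
                stats.put(groups[i], groupStats);
            }
            groupStats.accept(prices[i]); // updates count, sum, min and max
        }
        return stats;
    }
}
```

Each entry then exposes `getSum()`, `getAverage()`, `getMin()` and `getMax()` for its group.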
Joining ToParentBlockJoinCollector
- TopGroups contains a group per top N parent document.
- Each group contains a parent and its child documents.
- Facet counts can be based on:
  - Found documents.
  - Found groups.
  - Combination of facet value and group.