
Using Solr in Online Travel to Improve User Experience

Sudhakar Karegowdra, Esteban Donato
Travelocity, May 25, 2011
{sudhakar.karegowdra, esteban.donato}@travelocity.com

What We Will Cover


Travelocity
Speakers Background
Merchandising & Solr
    Challenges
    Solution
    Sizing and performance data
    Take Away
Location Resolution & Solr
    Challenges
    Solution
    Sizing and performance data
    Take Away
Q&A

Travelocity
First Online Travel Agency (OTA), launched in 1996
Grown to 3,000 employees and is one of the largest travel agencies worldwide
Headquartered in Dallas/Fort Worth with satellite offices in San Francisco, New York, London, Singapore, Bangalore and Buenos Aires, to name a few
In 2004, the Roaming Gnome became the centerpiece of marketing efforts and has become an international pop icon
Owned by Sabre Holdings; sister companies include Travelocity Business, IgoUgo.com, lastminute.com and Zuji, among others

Speakers Background

Sudhakar Karegowdra
Principal Architect, Travelocity.com
Topic: Merchandising
My experience
    13+ years
    Solr/Lucene: 3 years
    Implementing Hadoop, Pig and Hive for the data warehouse

Esteban Donato
Lead Architect, Travelocity.com
Topic: Location Resolution
My experience
    10+ years
    Solr: 2 years
    Analyzing Mahout and Carrot2 for a document clustering engine

Merchandising
By Sudhakar Karegowdra

The Challenge
Market Drivers
Build landing pages with faceted navigation
Enable content segmentation and delivery
Support rollout of promotions
Roll up data to a higher level
    E.g., "All 5-star hotels in California" brings together the 5-star hotels from SFO, LAX, SAN, etc.
Faster time to market for new ideas
Rapidly scale to accommodate global brands with disparate data sources

The Challenge
Traditional Database approach
Longer time to market
Specialized skill set needed to design and optimize database structures and queries
Aggregating data and changing structures is quite complex
Building faceted navigation needs complex logic, leading to high maintenance cost

Solution - Overview
Data from various sources aggregated and ingested into Solr
    Core per locale and product type
Wrapper service to combine some data across product cores and manage configuration rules
Solr's built-in search and faceting to power the navigation
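
A minimal sketch of how a wrapper service might address one core per locale and product type and ask Solr for facet counts. The host, the core naming scheme (`hotels_en_US`) and the field names (`state`, `star_rating`, `amenities`, `city`) are assumptions for illustration and do not come from the slides; `q`, `fq`, `rows`, `wt`, `facet`, `facet.field` and `facet.mincount` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def core_name(product: str, locale: str) -> str:
    """Hypothetical naming scheme: one core per product type and locale."""
    return f"{product}_{locale}"


def facet_query_url(base_url, product, locale, filters, facet_fields, rows=10):
    """Build a Solr select URL with filter queries and field facets."""
    params = [
        ("q", "*:*"),              # match everything, narrow with fq
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.mincount", "1"),   # hide facet values with no documents
    ]
    # One fq per filter, so each filter can be cached independently by Solr.
    params += [("fq", f'{field}:"{value}"') for field, value in filters.items()]
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/{core_name(product, locale)}/select?{urlencode(params)}"


# Roll-up example: all 5-star hotels in California, faceted by amenity and city.
url = facet_query_url(
    "http://localhost:8983/solr",   # assumed host
    "hotels",
    "en_US",
    {"state": "CA", "star_rating": "5"},
    ["amenities", "city"],
)
```

The facet counts returned for `amenities` and `city` would then drive the navigation links on the landing page.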

Solution Architecture View
[Architecture diagram. Components: UI, Widgets and Mobile clients; Services/Business Logic; Solr Slaves (Multi Core); Solr Master (Multi Core); Offer Management Tool; ETL; Oracle; Deals; Products.]


Solution - Achievements
Millions of unique long-tail landing pages
    E.g., http://www.travelocity.com/hotel-d4980-nevada-las-vegashotels_5-star_business-center_green
Faster search across products
    E.g., Beach Deals under $500
Segmented content delivery through tagging
Scaled well to distribute the content to different brands, partners and advertisers
Opened up other innovative applications
    Deals on Map, Deals on Mobile, Wizards, etc.


Solution - Road Ahead
Migration to Solr 3.1
    Geospatial search
    CSV output format
Query boosting by search pattern
Near-real-time updates
Deal and user behavior mining in Hadoop MapReduce, with Solr to serve the content
Move slaves to the cloud


Sizing & Performance


Index Stats
    Number of cores: 25
    Number of documents: ~1 million records
Response
    Requests: 70 TPS
    Average response time: 0.005 seconds (5 ms)
Software Versions
    Solr 1.4.0
        filterCache size: 30000
    Tomcat 5.5.9
    JDK 1.6


Take Away
Semi-structured storage in Solr helps aggregate disparate sources easily
    Remember dynamic fields
Multiple cores to manage data for multiple locales
Solr is a great enabler of innovation
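
As an illustration of the dynamic fields point, the sketch below maps attributes from disparate sources onto suffixed field names, so that new attributes need no schema change. It assumes the schema declares dynamic fields such as `*_s`, `*_i`, `*_f` and `*_b` (the convention used in Solr's example schema); the helper name and the sample attributes are hypothetical.

```python
# Assumed dynamicField suffixes, following the Solr example schema convention.
SUFFIX_BY_TYPE = {bool: "_b", int: "_i", float: "_f", str: "_s"}


def to_solr_doc(doc_id, attributes):
    """Turn source attributes into a Solr document using dynamic field names."""
    doc = {"id": doc_id}
    for name, value in attributes.items():
        # Look up the exact type so that True/False is not treated as an int.
        suffix = SUFFIX_BY_TYPE.get(type(value))
        if suffix is None:
            raise TypeError(f"no dynamic field suffix for {name!r}: {type(value).__name__}")
        doc[name + suffix] = value
    return doc


deal = to_solr_doc(
    "deal-1",
    {"price": 499.0, "stars": 5, "theme": "beach", "green": True},
)
```

A feed that later adds, say, a `pool` attribute would simply produce a `pool_b` field, which the same dynamic field declaration already covers.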


Location Resolution
By Esteban Donato


The Challenge
How to develop a global location resolution service?
Flexible to change
General enough to cover everyone's needs
Multi-language
Performance and scalability
Configurable by site

Architecture of the solution


Master/Slave architecture
Multi-core: each core represents a language
SolrJ client: binary format
Solr response cache
Remote Streaming indexing: CSV format

[Architecture diagram. Components: Auto-complete; Resolution; Solr Slave; Solr Master; Management Tool; Location DB; Batch Job.]

Auto-complete
The system has to suggest options as users type their desired location
    Examples: san => San Francisco, veg => Las Vegas
Relevancy: not all locations are equally important
    par => Paris, France; Parana, Argentina
Users can search by various fields: location code, location name, city code, city name, state/province code, state/province name, country code, country name
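
The behavior described above can be sketched in memory, without Solr: match the typed prefix against any token of any searchable field, then order the hits by a relevancy rank. The sample locations, codes and rank values are illustrative assumptions, and the real service does this with a Solr query rather than a Python loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    name: str     # display name returned to the user
    terms: tuple  # searchable values: codes and names
    rank: int     # higher means more important (assumed scale)


LOCATIONS = [
    Location("San Francisco, CA, United States",
             ("SFO", "San Francisco", "CA", "California", "US", "United States"), 90),
    Location("Las Vegas, NV, United States",
             ("LAS", "Las Vegas", "NV", "Nevada", "US", "United States"), 85),
    Location("Paris, France", ("PAR", "Paris", "FR", "France"), 95),
    Location("Parana, Argentina", ("PRA", "Parana", "AR", "Argentina"), 30),
]


def tokens(location):
    """Lower-cased words from every searchable field of a location."""
    return {tok.lower() for term in location.terms for tok in term.split()}


def suggest(prefix, locations=LOCATIONS, limit=5):
    """Return display names whose tokens start with the prefix, best rank first."""
    p = prefix.strip().lower()
    if not p:
        return []
    hits = [loc for loc in locations if any(t.startswith(p) for t in tokens(loc))]
    hits.sort(key=lambda loc: loc.rank, reverse=True)
    return [loc.name for loc in hits[:limit]]
```

Matching on tokens rather than on the whole name is what lets `veg` find Las Vegas, and sorting by rank is what puts Paris ahead of Parana for `par`.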

Solr schema
<dynamicField name="RANK*" type="int" required="false" indexed="true" stored="true" />
<field name="GLS_FULL_SEARCH" type="glsSearchField" required="false" indexed="true" stored="false" multiValued="true" />
<fieldType name="glsSearchField" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[/\-\t ]+" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.ISOLatin1AccentFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="[,.]" replacement="" replace="all"/>
  </analyzer>
</fieldType>
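
To make the effect of the `glsSearchField` analyzer chain concrete, here is a rough Python emulation of what it does to a field value. It is an approximation, not Solr's implementation: accent folding uses Unicode decomposition, which does not cover every mapping of `ISOLatin1AccentFilterFactory`, and `RemoveDuplicatesTokenFilterFactory` is omitted because it only drops identical tokens at the same position, which this tokenizer never produces.

```python
import re
import unicodedata

TOKEN_SEPARATORS = re.compile(r"[/\-\t ]+")  # same pattern as the tokenizer
PUNCTUATION = re.compile(r"[,.]")            # same pattern as the replace filter


def fold_accents(text):
    """Approximate accent removal: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(value):
    """Emulate the chain: split, lowercase, trim, fold accents, strip [,.]."""
    result = []
    for raw in TOKEN_SEPARATORS.split(value):
        token = fold_accents(raw.lower().strip())
        token = PUNCTUATION.sub("", token)
        if token:  # skip empty strings left by leading or trailing separators
            result.append(token)
    return result
```

For example, `analyze("São Paulo, Brazil")` yields the tokens `sao`, `paulo` and `brazil`, so a user typing unaccented lowercase text still matches.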


Resolution
The system has to resolve the location requested by the user
Handles aliases: Big Apple => New York
Handles ambiguities
Handles misspellings: Lomdon => London
    NGramDistance algorithm
    How to combine distance with relevancy?
    Errors suggesting the correct location when the input is a prefix: Lond => London
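
One possible way to combine string similarity with relevancy is a weighted sum, sketched below. This is an assumption for illustration, not the approach the slides settle on. The sketch swaps in Dice similarity over character bigrams as a simple stand-in; it is not Lucene's `NGramDistance`. The alias table, the candidate ranks (normalized to 0..1) and the `weight` value are all hypothetical.

```python
ALIASES = {"big apple": "new york"}  # hypothetical alias table


def bigrams(text):
    """Set of adjacent character pairs in the lower-cased text."""
    t = text.lower()
    return {t[i:i + 2] for i in range(len(t) - 1)}


def similarity(a, b):
    """Dice coefficient over character bigrams, in the range 0..1."""
    ga, gb = bigrams(a), bigrams(b)
    if not ga or not gb:
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def resolve(query, ranked_locations, weight=0.8):
    """Pick the candidate with the best blend of similarity and rank.

    ranked_locations maps a location name to a rank in 0..1;
    weight is the share of the score given to string similarity.
    """
    q = query.strip().lower()
    q = ALIASES.get(q, q)  # aliases are resolved before fuzzy matching

    def score(item):
        name, rank = item
        return weight * similarity(q, name) + (1 - weight) * rank

    return max(ranked_locations.items(), key=score)[0]


CANDIDATES = {"london": 0.9, "lisbon": 0.6, "new york": 1.0, "madrid": 0.7}
```

With a high `weight`, a close spelling beats a more popular but dissimilar location; lowering it lets relevancy dominate, which is the trade-off the slide raises.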


Spellchecker configuration
<fieldType name="spellcheckType" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.ISOLatin1AccentFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="[,.]" replacement="" replace="all"/>
  </analyzer>
</fieldType>


Sizing & Performance


4 cores with ~500,000 documents indexed each
Response times
    Auto-complete: 15 ms, 20 TPS
    Resolution: 10 ms, 2 TPS
Cache configuration
    queryResultCache: maxSize=1024
    documentCache: maxSize=1024
    fieldValueCache & filterCache disabled


Wrap Up
Performance is always the top priority
Develop simple but robust services
Provide a simple API


Q&A


Contact
Esteban Donato
Esteban.donato@travelocity.com
Twitter: @eddonato

Sudhakar Karegowdra
Sudhakar.karegowdra@travelocity.com
Twitter: @skaregowdra

https://www.facebook.com/travelocity
Twitter: @travelocity and @RoamingGnome