Sudhakar Karegowdra, Esteban Donato
Travelocity, May 25th, 2011
{sudhakar.karegowdra, esteban.donato}@travelocity.com
Q&A
First Online Travel Agency (OTA), launched in 1996
Grown to 3,000 employees and is one of the largest travel agencies worldwide
Headquartered in Dallas/Fort Worth with satellite offices in San Francisco, New York, London, Singapore, Bangalore, and Buenos Aires, to name a few
In 2004, the Roaming Gnome became the centerpiece of marketing efforts and has become an international pop icon
Owned by Sabre Holdings; sister companies include Travelocity Business, IgoUgo.com, lastminute.com, and Zuji, among others
Speakers Background
Sudhakar Karegowdra
Principal Architect Travelocity.com
My experience
13+ years
Solr/Lucene: 3 years
Implementing Hadoop, Pig and Hive for the Data warehouse.
Esteban Donato
Lead Architect Travelocity.com
My experience
10+ years
Solr: 2 years
Analyzing Mahout and Carrot2 for a document clustering engine.
Merchandising
By Sudhakar Karegowdra
The Challenge
Market Drivers
Build Landing Pages with Faceted Navigation
Enable Content Segmentation and delivery
Support Roll out of Promotions
Roll up Data to a higher level
E.g., "All 5 star hotels in California" brings together the 5 star hotels from SFO, LAX, SAN, etc.
Faster time to market for new Ideas
Rapidly scale to accommodate global brands with disparate data sources
The Challenge
Traditional Database approach
Higher time to market
Specialized skill set to design and optimize database structures and queries
Aggregation of data and changing of structures quite complex
Building Faceted navigation capabilities needs complex logic, leading to high maintenance cost
Solution - Overview
Data from various sources aggregated and ingested into Solr
Core per Locale and Product Type
Wrapper service to combine some data across product cores and manage configuration rules
Solr's built-in Search and Faceting to power the navigation
[Architecture diagram. Components: Services/Business Logic; Solr Slaves (Multi Core); Solr Master (Multi Core); Offer Management Tool; Oracle; ETL; Deals; Products]
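The "core per locale and product type" layout could be declared in a multi-core solr.xml along the following lines. This is a minimal sketch for Solr 1.4; the core names and instance directories are hypothetical and do not come from the slides.

```xml
<!-- solr.xml: one core per (product type, locale) pair.
     Core names and directories are illustrative assumptions. -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="hotel_en_US" instanceDir="hotel_en_US" />
    <core name="hotel_en_GB" instanceDir="hotel_en_GB" />
    <core name="flight_en_US" instanceDir="flight_en_US" />
  </cores>
</solr>
```

With such a layout each core is served under its own path (for example /solr/hotel_en_US/select), so a wrapper service can pick the core from the requested locale and product type.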
Solution - Achievements
Millions of unique Long Tail Landing Pages
E.g., http://www.travelocity.com/hotel-d4980-nevada-las-vegashotels_5-star_business-center_green
Segmented Content delivery through tagging
Scaled well to distribute the content to different brands, partners and advertisers
Opened up for other innovative applications
Deals on Map, Deals on Mobile, Wizards, etc.
Query boosting by Search pattern
Near Real Time Updates
Deal and user behavior mining in Hadoop MapReduce and Solr to Serve the Content
Move Slaves to Cloud
Response
Requests: 70 tps
Average response time: 0.005 seconds (5 ms)
Software Versions
Solr Version: 1.4.0
filterCache size: 30000
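The filterCache figure above corresponds to a single entry in the query section of solrconfig.xml. A sketch for Solr 1.4.0: only size="30000" comes from the slide; the cache class, initialSize, and autowarmCount are assumptions.

```xml
<!-- solrconfig.xml fragment. Only size="30000" is from the slide;
     class, initialSize and autowarmCount are assumed values. -->
<query>
  <filterCache class="solr.FastLRUCache"
               size="30000"
               initialSize="4096"
               autowarmCount="0" />
</query>
```

Filter queries (fq) and enum-method faceting both store their document sets in this cache, which is one reason navigation-heavy workloads tend to size it generously.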
Take Away
Semi Structured Storage in Solr helps aggregate disparate sources easily
Remember Dynamic fields
Multiple Cores to manage multiple locale data
Solr is a great enabler of Innovations
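As a reminder of why dynamic fields suit semi-structured data: a few wildcard declarations in schema.xml let each source contribute its own attributes without a schema change. The field names and suffix conventions below are assumptions for illustration, and the sketch assumes fieldTypes named string and int are defined, as in the example schema shipped with Solr.

```xml
<!-- schema.xml fragment. Field names and patterns are hypothetical. -->
<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <!-- any field ending in _s is treated as a string, _i as an integer -->
  <dynamicField name="*_s" type="string" indexed="true" stored="true" />
  <dynamicField name="*_i" type="int" indexed="true" stored="true" />
  <!-- any field starting with tag_ holds multiple string values -->
  <dynamicField name="tag_*" type="string" indexed="true" stored="true" multiValued="true" />
</fields>
```

A hotel deal could then be indexed with fields such as star_rating_i or tag_amenity, while a deal from another source carries a different set of fields, and both remain facetable.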
Location Resolution
By Esteban Donato
The Challenge
How to develop a global location resolution service?
Flexibility to changes
General enough to cover everyone's needs
Multi language
Performance and scalability
Configurable by site
Master/Slave architecture
Multi-core: each core represents a language
SolrJ client: binary format
Solr response cache
Remote Streaming indexing: CSV format
[Architecture diagram. Components: Solr Slave; Solr Master; Management Tool; Location DB; Batch Job]
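Remote streaming and CSV indexing are both switched on in solrconfig.xml. The following is a minimal sketch for Solr 1.4 that assumes the stock CSV handler path; the upload limit is an arbitrary value, not one from the slides.

```xml
<!-- solrconfig.xml fragment (assumed values, not from the slides) -->
<config>
  <requestDispatcher handleSelect="true">
    <!-- enableRemoteStreaming lets an update request point at a file or URL
         through the stream.file / stream.url parameters -->
    <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048" />
  </requestDispatcher>
  <!-- handler that parses CSV rows into documents -->
  <requestHandler name="/update/csv" class="solr.CSVRequestHandler" startup="lazy" />
</config>
```

With this in place, a batch job could ask the master to load an export by requesting /update/csv with stream.file (or stream.url) pointing at the CSV and commit=true. Remote streaming lets the server read whatever the request names, so it is usually enabled only on hosts reachable from trusted networks.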
Auto-complete
System has to suggest options as users type their desired location
Examples: san => San Francisco, veg => Las Vegas
Relevancy: not all locations are equally important. par => Paris, France; Parana, Argentina
Users can search by various fields: location code, location name, city code, city name, state/province code, state/province name, country code, country name.
Solr schema
<dynamicField name="RANK*" type="int" required="false" indexed="true" stored="true" />
<field name="GLS_FULL_SEARCH" type="glsSearchField" required="false" indexed="true" stored="false" multiValued="true" />
<fieldType name="glsSearchField" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split on runs of slash, hyphen, tab, or space -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[/\-\t ]+" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.ISOLatin1AccentFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <!-- strip commas and periods left inside tokens -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="[,.]" replacement="" replace="all" />
  </analyzer>
</fieldType>
Resolution
System has to resolve the location requested by the users.
Handles aliases. Big Apple => New York
Handles ambiguities.
Handles misspellings. Lomdon => London
NGramDistance algorithm.
How to combine distance with relevancy?
Error suggesting the correct location when the input is a prefix. Lond => London
Spellchecker configuration
<fieldType name="spellcheckType" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- keep the whole value as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.ISOLatin1AccentFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <!-- strip commas and periods -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="[,.]" replacement="" replace="all" />
  </analyzer>
</fieldType>
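The slides show the field type but not the component that uses it. One plausible way to wire the field type above to the NGramDistance measure in solrconfig.xml is sketched here for Solr 1.4. The source field name LOCATION_NAME_SPELL, the spellchecker name, and the index directory are hypothetical; the slides do not say how the actual service was configured.

```xml
<!-- solrconfig.xml fragment: assumed wiring, not taken from the slides -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- analyze the incoming query with the field type defined in schema.xml -->
  <str name="queryAnalyzerFieldType">spellcheckType</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- hypothetical field holding the location names to suggest from -->
    <str name="field">LOCATION_NAME_SPELL</str>
    <!-- n-gram based string distance from Lucene's spellchecker module -->
    <str name="distanceMeasure">org.apache.lucene.search.spell.NGramDistance</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

The component still has to be listed in a request handler (for example under last-components) before a request with spellcheck=true returns suggestions. Because the tokenizer keeps each location name as one token, suggestions are whole names rather than single words.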
Cache configuration
queryResultCache: maxSize=1024
documentCache: maxSize=1024
fieldValueCache & filterCache disabled
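In solrconfig.xml the capacity attribute of a cache is named size, so the settings above would plausibly look like the sketch below. The cache class, initialSize, and autowarmCount are assumptions. The slides do not state how fieldValueCache was disabled, so it is left out here.

```xml
<!-- solrconfig.xml fragment. Only the two sizes of 1024 are from the slide. -->
<query>
  <queryResultCache class="solr.LRUCache" size="1024" initialSize="512" autowarmCount="0" />
  <documentCache class="solr.LRUCache" size="1024" initialSize="512" autowarmCount="0" />
  <!-- no filterCache element is declared, which leaves that cache off -->
</query>
```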
Wrap Up
Performance always as top priority
Develop simple but robust services
Provide a simple API
Q&A
Contact
Esteban Donato
Esteban.donato@travelocity.com
Twitter: @eddonato
Sudhakar Karegowdra
Sudhakar.karegowdra@travelocity.com
Twitter: @skaregowdra