Sourcing Success
Beyond Boolean Search
Q3, 2009
Authors:
Shally Steckerl, EVP, Arbita
Bryan Starbuck, Founder & CEO, TalentSpring, Inc.
Semantic Search – Sourcing Success Beyond Boolean Search ARBITA
Table of Contents
The Future of Candidate Sourcing
Ankle Deep in the Deep Web, but Inching Closer to Semantic Search
Semantic Search: Why Recruiters Should Care
Semantic Search for Recruiting
Understanding Semantic Search Fundamentals
Literal versus Equivalent Match Searches
Soft Keywords: The Hidden Power of Semantic Search
Search Term Expansion Sets
Three Different Semantic Approaches
Lexicon- and Ontological- Based Search
Statistical Analysis and Pattern Matching
Contextual Search
Broad vs. Narrow Match Semantic Search
Targeting Semantic Search
Example Semantic Search Technologies
Full High End Semantic Search Solutions for Recruiters
Free Semantic Search Tools for Recruiters
Semantic Search RFP Check List
About the Authors:
APPENDIX I – Alternate Search Engines
TABLE I – Semantic Search Engine Types
TABLE II – Semantic Search Engine Types
Copyright 2009, TalentSpring, Inc. (www.talentspring.com) and Arbita, Inc. (www.arbita.net).
The Future of Candidate Sourcing
Today, most people think of online search in terms of the capabilities of major search engines like
Google™, Yahoo!™, or Microsoft’s Bing™. These big search engines utilize Boolean-based keyword
search technology and often require the use of complex syntax and field search commands to find
specific occurrences of information (keywords) within documents. Results are based solely on whether
those keywords are present. The major search engines are constantly experimenting with new ways to
simplify search queries for users. However, these simplifying efforts do not attempt to understand
the true ‘meaning’ of what is being searched for.
In contrast, Semantic Search technologies seek to simplify search by understanding the actual concept
being sought. Semantic Search engines discover the true relationship between the question being asked
and the content being delivered. Consequently, the user’s ‘experience’ of the search is shifted from
sifting through documents that contain a specific keyword to reading documents that express the concept
originally being sought.
One of the biggest challenges for search engines is understanding the ‘context’ of the search. It is
context that determines whether the word ‘well’ refers to a source of water, as in, “Draw water from the
well,” or to a person’s health, as in, “Is she not feeling well?” As a human, if you read “stair well” you
automatically know what it means. Computers, on the other hand, have to calculate hundreds of
variations and probabilities to arrive at a best guess.
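The ‘well’ example can be made concrete with a toy version of the simplified Lesk approach to word sense disambiguation. In this approach, each candidate sense carries a small set of typical neighboring words, and the sense that shares the most words with the sentence wins. This is a minimal sketch that assumes hand-written sense lists. `SENSES`, its word sets, and `disambiguate` are hypothetical. Real engines weigh far larger vocabularies and probabilities.

```python
# Toy word sense disambiguation for "well" (simplified Lesk-style overlap).
# SENSES and its word sets are hypothetical, hand-picked for illustration.
SENSES = {
    "water_source": {"water", "draw", "dig", "deep", "bucket", "pump"},
    "health": {"feeling", "feel", "sick", "healthy", "better"},
    "stair_shaft": {"stair", "stairs", "elevator", "building", "floor"},
}


def disambiguate(sentence: str) -> str:
    """Return the sense of 'well' whose word set overlaps most with the sentence."""
    context = {word.strip('.,?!"').lower() for word in sentence.split()}
    context.discard("well")
    # Score each sense by how many of its typical words appear in the sentence.
    scores = {sense: len(words & context) for sense, words in SENSES.items()}
    # max() keeps the first sense listed when scores tie (including all zeros).
    return max(scores, key=scores.get)


print(disambiguate("Draw water from the well"))   # water_source
print(disambiguate("Is she not feeling well?"))   # health
print(disambiguate("Meet me by the stair well"))  # stair_shaft
```

The tie-breaking rule shows why this is only a best guess: a sentence with no telling words still gets assigned the first sense listed.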
Semantic Search engines make sense of sentence context by being pre-configured (trained) to
understand who the user is and what the likely context of the search term is. To illustrate, imagine two
people searching for a Marketing Manager position on the Web. One person is a recruiter; the other is a
job candidate. With a regular search engine, both people would get the same results. However, with a
Semantic Search engine that knew the user was a recruiter, only candidate resumes would be returned,
while job listings would be ignored. Likewise, the job candidate would only see job listings.
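The recruiter-versus-candidate example amounts to filtering results by an inferred document type. The sketch below assumes the engine already knows the user's role. It guesses each page's type from a few cue phrases. The cue lists, the `WANTED_TYPE` mapping, and the function names are hypothetical. A production engine would more likely use a trained classifier than fixed phrases.

```python
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    text: str


# Hypothetical cue phrases for guessing whether a page is a resume or a job listing.
RESUME_CUES = ("objective", "work history", "references available")
JOB_CUES = ("we are seeking", "apply now", "responsibilities include")

# The document type each (hypothetical) user role wants to see.
WANTED_TYPE = {"recruiter": "resume", "candidate": "job_listing"}


def doc_type(doc: Document) -> str:
    """Guess the page type by counting cue phrases of each kind."""
    text = f"{doc.title} {doc.text}".lower()
    resume_hits = sum(cue in text for cue in RESUME_CUES)
    job_hits = sum(cue in text for cue in JOB_CUES)
    if resume_hits > job_hits:
        return "resume"
    if job_hits > resume_hits:
        return "job_listing"
    return "unknown"


def search_for_role(results: list[Document], role: str) -> list[Document]:
    """Keep only the results whose inferred type matches what this role is looking for."""
    wanted = WANTED_TYPE[role]
    return [doc for doc in results if doc_type(doc) == wanted]


results = [
    Document("Marketing Manager - J. Doe", "Objective: lead brand strategy. Work history: ..."),
    Document("Marketing Manager", "We are seeking a marketing manager. Apply now."),
]
print([d.title for d in search_for_role(results, "recruiter")])  # ['Marketing Manager - J. Doe']
print([d.title for d in search_for_role(results, "candidate")])  # ['Marketing Manager']
```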
“Semantic Search technology has not yet reached the level of full comprehension.
However, a number of technology vendors have taken Semantic Search far beyond
the capabilities of Boolean search to make online recruiting simpler and faster.”
Shally Steckerl – Arbita
Today, intelligent Semantic Search tools are becoming available to recruiters. The best of these search
tools go beyond what complex syntax and Boolean search can do: they automatically match the content of
an individual’s resume to a job description and rank the results according to what is most important to
the recruiter.
• Semantic Search is far easier to learn than complex syntax and field search commands because
it doesn’t require significant technical skills to get good results (i.e. there’s no need to use
commands like intitle, inurl, site, and filetype).
• Semantic Search can save recruiters significant time by automatically identifying which terms to
search on in the job description.
• Semantic Search provides recruiters with more accurate resume matches by pre-filtering results
for such things as candidate qualifications (skills, experience, education, etc.) and work history
characteristics (job hopping, job similarity, etc.)
• Semantic Search increases search match quality by taking into account all the needs of the job
requisition and candidate resume (e.g., by detecting job seekers who no longer hold the job title
matching the requisition).
• Semantic Search can identify high-quality candidates whose resumes don’t conform to the rigid
terms used in complex search strings and would otherwise remain hidden.
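Two of the pre-filters mentioned above can be expressed as simple checks on a structured work history: flagging job hopping, and detecting a candidate who no longer holds the requisition's title. This is a minimal sketch that assumes the resume has already been parsed into `Job` records. The `Job` type, the two-year tenure threshold, and the function names are illustrative assumptions, not any vendor's rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    title: str
    start_year: int
    end_year: Optional[int]  # None means the candidate still holds this job


def average_tenure(history: list[Job], current_year: int) -> float:
    """Average years spent per job; open-ended jobs run until current_year."""
    spans = [(current_year if j.end_year is None else j.end_year) - j.start_year
             for j in history]
    return sum(spans) / len(spans)


def passes_prefilters(history: list[Job], target_title: str, current_year: int,
                      min_avg_tenure: float = 2.0) -> bool:
    """Hypothetical pre-filters: drop apparent job hoppers and candidates
    whose current title no longer matches the requisition."""
    if not history or average_tenure(history, current_year) < min_avg_tenure:
        return False
    current_titles = [j.title.lower() for j in history if j.end_year is None]
    return any(target_title.lower() in title for title in current_titles)


steady = [Job("Marketing Coordinator", 2001, 2005), Job("Marketing Manager", 2005, None)]
hopper = [Job("Marketing Manager", 2006, 2007), Job("Marketing Manager", 2007, 2008),
          Job("Marketing Manager", 2008, None)]
moved_on = [Job("Marketing Manager", 2000, 2005), Job("Sales Director", 2005, None)]
print(passes_prefilters(steady, "Marketing Manager", 2009))    # True
print(passes_prefilters(hopper, "Marketing Manager", 2009))    # False (1-year average tenure)
print(passes_prefilters(moved_on, "Marketing Manager", 2009))  # False (now a Sales Director)
```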
From a practical perspective, Semantic Search means that recruiters don’t have to acquire special
skills in building search strings to find candidates. In fact, applications that do Semantic Search well
don’t even require the recruiter to interact with keywords at all. The Semantic Search engines ‘read’ the
job description, understand the key attributes being sought, and then automatically build an expanded
content list to search for resumes. The result is much better match quality than regular search
methods deliver.
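The text does not specify how an engine decides which job-description terms matter. One common statistical way to surface distinctive terms is TF-IDF, which favors words that are frequent in this posting but rare across other postings. The sketch below uses that technique as a stand-in. The stopword list, the tiny corpus, and `key_terms` are hypothetical.

```python
import math
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "with", "for", "on", "is", "we"}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens, minus a small hypothetical stopword list."""
    return [w for w in re.findall(r"[a-z0-9+#]+", text.lower()) if w not in STOPWORDS]


def key_terms(job_description: str, corpus: list[str], top_n: int = 3) -> list[str]:
    """Rank the job description's terms by TF-IDF against a corpus of other postings."""
    doc_sets = [set(tokenize(doc)) for doc in corpus]
    term_counts = Counter(tokenize(job_description))

    def idf(term: str) -> float:
        df = sum(term in doc for doc in doc_sets)
        return math.log((1 + len(doc_sets)) / (1 + df)) + 1  # smoothed IDF

    scores = {term: count * idf(term) for term, count in term_counts.items()}
    # Highest score first; ties broken alphabetically so the output is stable.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


jd = ("Software engineer with image processing experience. "
      "Device drivers and image processing algorithms.")
corpus = [
    "Software engineer for web applications.",
    "Software engineer with database experience.",
    "Sales manager with software experience.",
]
print(key_terms(jd, corpus))  # ['image', 'processing', 'algorithms']
```

Common words like "software" score low because every posting in the corpus contains them. The distinctive requirements rise to the top as candidate search terms.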
Semantic Search for recruiting refers to finding the best resumes or profiles that match
the needs of a job description. It requires going beyond search that simply understands
sentence structure to factoring in the needs of the employer when finding resumes.
Understanding Semantic Search Fundamentals
Literal versus Equivalent Match Searches
For the layperson, the most noticeable difference between Boolean and Semantic Search engines is the
flexibility around matching the search keywords used. With Boolean search, an exact match to the
search terms (keyword(s)) is required. With Semantic Search, matches can include equivalent words as
well. While Literal versus Equivalent might sound simple, it is worth looking at an example. Imagine a
recruiter has the option of using a Boolean Search engine or a Semantic Search engine to fill the
following position:
Job Title: Software Engineer
Level: Team Supervisor
Company: Hewlett-Packard
Product Line: Scanners
Requirements:
1. background in image processing algorithms
2. experience writing hardware device drivers
3. 5 years of experience
4. Master’s in Computer Science
With the Boolean search engine, s/he might search on: “Software Engineer” AND “image processing” AND
“device drivers”. Only resumes that literally matched all of the search terms would be included in the
results.
With a Semantic Search engine trained for recruiting, the user would see all candidates that had
equivalent term matches (e.g., computer programmer, image algorithms, image bicubic, device DDK, etc.).
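The difference between literal and equivalent matching for the Hewlett-Packard example can be sketched in a few lines. The `EQUIVALENTS` table below is hand-built from the equivalents listed in the text and is an assumption for illustration. A real engine would derive such sets automatically and at far larger scale.

```python
# Hypothetical equivalence table, hand-built from the example equivalents in the text.
EQUIVALENTS = {
    "software engineer": {"software engineer", "computer programmer"},
    "image processing": {"image processing", "image algorithms", "image bicubic"},
    "device drivers": {"device drivers", "device ddk"},
}

QUERY = ["software engineer", "image processing", "device drivers"]


def literal_match(resume: str, terms: list[str]) -> bool:
    """Boolean AND: every search term must appear verbatim (case-insensitive)."""
    text = resume.lower()
    return all(term in text for term in terms)


def equivalent_match(resume: str, terms: list[str]) -> bool:
    """Every search term must appear verbatim or as one of its listed equivalents."""
    text = resume.lower()
    return all(any(alt in text for alt in EQUIVALENTS.get(term, {term})) for term in terms)


resume_a = "Software Engineer: image processing and device drivers for scanners."
resume_b = "Computer programmer; wrote image bicubic filters and device DDK code."
print(literal_match(resume_a, QUERY), literal_match(resume_b, QUERY))        # True False
print(equivalent_match(resume_a, QUERY), equivalent_match(resume_b, QUERY))  # True True
```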
Soft Keywords: The Hidden Power of Semantic Search
Another big advantage of Semantic Search engines is that they tolerate greater linguistic variation
between job descriptions and candidate resumes by ‘weighting’ the value of individual keywords. These
weighted or ‘soft’ search terms enable the search engine to find the most relevant content.
The following example shows how ‘soft’ keywords can increase the flexibility of a job search. The job
description is for a Systems/Mechanical Engineer responsible for the design, analysis and development
of optimal surfaces specifically used in gears, joints, and actuators.
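Weighted ‘soft’ keywords can be illustrated by scoring each resume on the share of total keyword weight it contains, instead of requiring every keyword. The weights below for the Systems/Mechanical Engineer example are invented for illustration. `soft_score` and `rank` are hypothetical names.

```python
# Hypothetical weights for the Systems/Mechanical Engineer example:
# core concepts count more than peripheral ones.
SOFT_KEYWORDS = {
    "surface design": 3.0,
    "actuators": 2.0,
    "gears": 1.5,
    "joints": 1.5,
    "mechanical engineer": 1.0,
}


def soft_score(resume: str, weights: dict[str, float]) -> float:
    """Share of the total keyword weight found in the resume, from 0.0 to 1.0."""
    text = resume.lower()
    matched = sum(weight for keyword, weight in weights.items() if keyword in text)
    return matched / sum(weights.values())


def rank(resumes: list[str], weights: dict[str, float]) -> list[str]:
    """Order resumes by soft score; partial matches are ranked, not discarded."""
    return sorted(resumes, key=lambda r: soft_score(r, weights), reverse=True)


r1 = "Mechanical engineer focused on gears and joints."
r2 = "Robotics researcher: surface design for actuators."
print(round(soft_score(r1, SOFT_KEYWORDS), 2), round(soft_score(r2, SOFT_KEYWORDS), 2))  # 0.44 0.56
print(rank([r1, r2], SOFT_KEYWORDS)[0] == r2)  # True
```

A strict Boolean AND of all five terms would match neither resume. With soft weights, both stay in the results, and the one covering the core concepts ranks first.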
Semantic Search finds unexpected matches between the job description and candidate
resumes by searching for ‘soft’ keywords in the job description and candidate resumes.
Boolean Search's exact match logic doesn’t allow for this kind of matching flexibility.
Search Term Expansion Sets
The main advantage of Semantic Search engines is the ability to find keywords and phrases that expand
from the original keyword(s) being searched for. Semantic Search engines do this by building expansion
sets, or lists of linguistically-equivalent meanings. This capability enables the Semantic Search engine to
find ‘hidden’ matches to the user’s intended search, which regular search engines would normally filter
out.
The advantage of using expansion sets can be seen in the following illustration. With Boolean search
engines, the number of potential matches is limited to only those resumes that match the specific
keywords being searched on.
With Semantic Search engines, each original keyword is expanded to include many semantically
equivalent keywords that increase the match opportunity significantly. The result is that far more
matches can be found with Semantic Search than with regular Boolean search.
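An expansion set effectively turns each keyword into an OR-group of equivalents, and the groups are then combined with AND. The sketch below assumes small hand-written expansion sets for a mobile games programmer search. The `EXPANSIONS` entries and the function names are hypothetical. The sketch compares how many sample resumes the literal query and the expanded query match.

```python
# Hypothetical expansion sets: each original keyword maps to equivalent terms.
EXPANSIONS = {
    "programmer": ["programmer", "developer", "software engineer"],
    "games": ["games", "gaming", "game development"],
}


def expand(keywords: list[str]) -> dict[str, list[str]]:
    """Replace each keyword with its expansion set (or itself if none is known)."""
    return {k: EXPANSIONS.get(k, [k]) for k in keywords}


def matches(resume: str, groups: dict[str, list[str]]) -> bool:
    """AND across keywords, OR within each keyword's expansion set."""
    text = resume.lower()
    return all(any(term in text for term in terms) for terms in groups.values())


def boolean_query(groups: dict[str, list[str]]) -> str:
    """Render the expansion as the equivalent Boolean search string."""
    return " AND ".join(
        "(" + " OR ".join(f'"{t}"' for t in terms) + ")" for terms in groups.values()
    )


keywords = ["programmer", "games"]
literal = {k: [k] for k in keywords}
expanded = expand(keywords)
resumes = [
    "Game programmer for mobile games",
    "Mobile developer, gaming titles on Symbian",
    "Software engineer in game development",
    "Sales manager",
]
print(sum(matches(r, literal) for r in resumes))   # 1
print(sum(matches(r, expanded) for r in resumes))  # 3
print(boolean_query(expanded))
```

The last line prints the hand-written Boolean string a recruiter would otherwise have to build. The semantic engine builds it automatically.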
The following table shows an expansion set for a Nokia programmer developing games with OpenGL.
You can easily see how the Semantic Search engine’s expansion set offers far more match possibilities
than a regular search engine would.
Semantic Search engines use Term Expansion to find larger keyword match sets, and
deliver more accurate results, by including linguistically-equivalent search terms in their
search sets and utilizing advanced filters and ranking algorithms to calibrate the results.
For recruiters, Semantic Search term expansion enables them to find excellent
candidates without requiring them to be subject matter or Boolean Search experts.
Three Different Semantic Approaches
Lexicon- and Ontological- Based Search
In the field of information technology, a lexicon is a specific vocabulary, or list of words, related to a
particular domain, discipline, or topic, and an ontology is a description of the concepts and relationships
that can exist within a data structure. Search engines that use this approach attempt to map the
lexicon of the search query to the ontology domain.
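Mapping a query's lexicon onto an ontology can be sketched with two structures: a dictionary from surface words to concepts, and an ‘is-a’ hierarchy. With these, a search for ‘programming language’ also matches a resume that only says ‘Java’. Both `LEXICON` and `IS_A` are hypothetical and tiny. Matching uses plain substring tests, so words like ‘javascript’ would need proper tokenization in practice.

```python
# Hypothetical lexicon: surface words found in text -> concept IDs in the ontology.
LEXICON = {
    "python": "Python",
    "java": "Java",
    "programming language": "ProgrammingLanguage",
    "oracle": "Oracle",
    "mysql": "MySQL",
    "database": "Database",
}

# Hypothetical ontology: each concept's broader concept in an "is-a" hierarchy.
IS_A = {
    "Python": "ProgrammingLanguage",
    "Java": "ProgrammingLanguage",
    "Oracle": "Database",
    "MySQL": "Database",
}


def broader_concepts(concept: str) -> set[str]:
    """The concept itself plus every concept above it in the hierarchy."""
    chain = {concept}
    while concept in IS_A:
        concept = IS_A[concept]
        chain.add(concept)
    return chain


def concepts_in(text: str) -> set[str]:
    """Map lexicon words found in the text (plain substring test) to concept IDs."""
    lowered = text.lower()
    return {concept for word, concept in LEXICON.items() if word in lowered}


def matches_concept(resume: str, query_term: str) -> bool:
    """True if any concept in the resume is the query concept or falls under it."""
    target = LEXICON[query_term.lower()]
    return any(target in broader_concepts(c) for c in concepts_in(resume))


print(matches_concept("Built services in Java and MySQL", "programming language"))  # True
print(matches_concept("Administered Oracle clusters", "programming language"))      # False
print(matches_concept("Administered Oracle clusters", "database"))                  # True
```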
Statistical Analysis and Pattern Matching
A true semantic search system must encapsulate knowledge of language to emulate an understanding
of meaning. Because of this requirement, search engines that rely on statistical analysis and ranking of
links, symbols, words, and click behavior are not considered truly Semantic Search engines.
However, these engines can approximate an understanding of meaning by providing close matches,
particularly when the data is fairly homogeneous.
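A typical statistical approach represents text as word-count vectors and ranks documents by cosine similarity. The sketch below shows that generic technique, not any particular engine's algorithm. It also shows why such engines only approximate meaning: ‘developer’ and ‘programmer’ contribute nothing to each other's score, because only identical words count.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words vector: word -> count (whitespace tokenization only)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 when either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


query = vectorize("java developer")
print(round(cosine(query, vectorize("senior java developer")), 2))  # 0.82
print(round(cosine(query, vectorize("java programmer")), 2))        # 0.5
print(round(cosine(query, vectorize("sales manager")), 2))          # 0.0
```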
Contextual Search
Contextual Search tries to understand the meaning of a search by inferring it from the context around the
location of the data. This is usually done by analyzing and ranking links pointing to a particular document;
specializing in only one category of information (Vertical Search); extracting summaries from the results;
and/or giving the user an interface with which to filter or disambiguate the search results
(Faceted Search).
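Faceted Search, as described above, lets the user narrow results by structured attributes. The sketch below assumes candidate records with hypothetical `city` and `degree` fields. It first counts the results under each facet value, then keeps only the results that match the selected values. The records and function names are illustrative.

```python
from collections import Counter

# Hypothetical candidate records; "city" and "degree" act as facets.
CANDIDATES = [
    {"name": "A", "city": "Seattle", "degree": "MS"},
    {"name": "B", "city": "Seattle", "degree": "BS"},
    {"name": "C", "city": "Atlanta", "degree": "MS"},
]


def facet_counts(results: list[dict], field: str) -> dict[str, int]:
    """How many results fall under each value of one facet (shown beside each filter)."""
    return dict(Counter(r[field] for r in results))


def apply_facets(results: list[dict], **selected: str) -> list[dict]:
    """Keep only results matching every selected facet value."""
    return [r for r in results if all(r.get(f) == v for f, v in selected.items())]


print(facet_counts(CANDIDATES, "city"))  # {'Seattle': 2, 'Atlanta': 1}
print([r["name"] for r in apply_facets(CANDIDATES, city="Seattle", degree="MS")])  # ['A']
```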
Broad vs. Narrow Match Semantic Search
There are two kinds of Semantic Search engines: broad search and narrow search. Narrow search
engines from companies like TalentSpring are designed for specific search problems like candidate
sourcing. Broad search engines from companies like Powerset or Autonomy are designed to find any
kind of document across an organization’s electronic document platform. A Semantic Search
engine that has been specifically designed to focus on recruiting is going to give you the most precise
candidate search results. A Semantic Search engine that has been designed for broad matching will
return the largest volume of results.
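The broad-versus-narrow trade-off is usually described in terms of two measures. Precision is the share of returned results that are relevant. Recall is the share of relevant results that are returned. The numbers below are hypothetical. They were chosen only to show a narrow engine returning fewer but more precise results, and a broad engine returning more results with lower precision.

```python
def precision_recall(returned: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = hits / returned; recall = hits / relevant."""
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four resumes are truly relevant to the requisition.
relevant = {"r1", "r2", "r3", "r4"}
narrow = {"r1", "r2", "r3"}                     # few results, all relevant
broad = relevant | {f"x{i}" for i in range(6)}  # more results, many irrelevant

print(precision_recall(narrow, relevant))  # (1.0, 0.75)
print(precision_recall(broad, relevant))   # (0.4, 1.0)
```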
• Trovix.com (now owned by Monster) primarily uses a sophisticated lexicon to match skills in
resumes to requirements in job descriptions. It also learns from users’ behavior to extract and
rank search criteria not included in the original search parameters.
• Deepdyve.com applies pattern matching to identify complex data found in the Deep Web (i.e.,
Web content within databases and other dynamic data sources not typically indexed by search
engines). The outcome is highly relevant results displayed in ways that can be easily organized and
visualized.
• Factbites.com lies somewhere between link analysis and document summarization, taking
excerpts of results and making them into meaningful sentences.
• Hakia.com attempts to anticipate questions that “could be” asked about a document found in its
database, then ranks search results using an index that scores sentences by how closely they
match the concept related to the search query. Hakia employs a lexicon of relationships between
concepts and measures relevancy based on the credibility and age of content.
• Lexxe.com utilizes linguistics (natural language processing) and categorization, and works by
eliminating irrelevant content, then providing visual keyword drilldowns to help derive meaning
from a query.
• Sensebot.net is a summarization engine that extracts key phrases and sentences from top
results, making it less necessary for a user to drill down and click on individual links.
• Twingly.com is a faceted social search engine that focuses only on blog and microblog content.
• Vertical people search engines such as wink.com, spock.com, and zoominfo.com index a multitude
of websites and Deep Web content but focus on only one domain or topic. For example, Wink.com
focuses on people from social networks, while Spock.com and Zoominfo.com collate biographical
information about people.
• Yedda.com answers questions by combining natural language processing with user
behavior learning.
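The summarization engines listed above are not described here in enough detail to reproduce their methods. As a generic illustration of extractive summarization, the sketch below scores each sentence by the average frequency of its non-stopword words. It keeps the top-scoring sentences in their original order. The stopword list, `content_words`, and `summarize` are assumptions, not Sensebot's or Factbites' actual method.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def content_words(sentence: str) -> list[str]:
    """Lowercase alphabetic words, minus a small hypothetical stopword list."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def summarize(text: str, max_sentences: int = 1) -> list[str]:
    """Frequency-based extractive summary: keep the highest-scoring sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence: str) -> float:
        # Average document-wide frequency of the sentence's content words.
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = set(sorted(sentences, key=score, reverse=True)[:max_sentences])
    return [s for s in sentences if s in top]  # keep the original order


text = ("Semantic search matches resumes to jobs. "
        "Semantic search expands keywords. The weather is nice.")
print(summarize(text))     # ['Semantic search expands keywords.']
print(summarize(text, 2))  # ['Semantic search matches resumes to jobs.', 'Semantic search expands keywords.']
```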
Semantic Search RFP Check List
Recruiter-Specific Semantic Search Engine: focused-match search engines will deliver far better
results than general document search engines. You will not have to “train” the engine on how to
identify good candidates.
Resume List Depth: search engines that come pre-populated with job titles, skills, certifications,
education and experience levels will perform far better than engines that must first be trained by
your company or the vendor. Look for training sets (the number of job profiles used to train the
system) of more than 10 million profiles.
User Selectable Source: This is the ability for the user to define which resume sources they want
to pull from (e.g., specific job boards, social networks, or the organization’s ATS). This is an
important feature with regard to controlling where your candidates come from.
ATS Interoperability: The ability for the search engine to search your existing ATS database in
addition to external resume sources.
OFCCP Compliance: The ability for the semantic search engine to support your existing OFCCP
process (if used by your organization).
Geographic Sourcing: the ability to specify recruiting geography (local, regional, national, etc.)
Industry Sourcing: the ability to specify which industry you are recruiting from
Marketing Module: does the vendor provide tools that enable you to either selectively or mass-
send recruiting ads/emails to potential candidates?
About the Authors:
Shally Steckerl
Because of his passion for the Internet as a recruitment tool and his continually innovative
methods, Shally Steckerl has developed a reputation as an authority in Internet search and a
pioneer in recruitment research. Shally is also an author, an internationally requested speaker,
founder of JobMachine.net, EVP of Arbita, a frequent contributor to industry forums, and a global
recruiting consultant for companies like Microsoft Corporation, Google, Coca-Cola Enterprises,
Cisco Systems and Motorola. Since 1996, Shally has developed techniques that dramatically
increase recruitment productivity and allow companies to exploit the Internet. At Microsoft, he
managed the research arm of their global centralized sourcing and research team. At Google,
Shally built a central sourcing organization. At Coca-Cola, he was responsible for supporting all
corporate hiring managers and functional channels throughout North America, while at Cisco
Systems, he was a senior member defining Cisco’s online Recruiting Strategy. Shally provides
priceless insights into how forward-thinking companies are using innovative Internet recruiting
techniques and intelligent technologies to gain competitive recruiting advantages.
Bryan Starbuck
Bryan Starbuck is the CEO of TalentSpring, Inc., a provider of Semantic Search technology
products for the recruiting industry. As an engineering manager, Mr. Starbuck has a track record of
working closely with Microsoft’s Recruiting department on talent acquisition focused on
exceptional talent. Mr. Starbuck created TalentSpring after seeing the potential of using
semantic matching algorithms to find candidates who comprehensively match the needs of a
job description. Prior to starting TalentSpring, Bryan was an engineering manager at Microsoft
Corp and has a track record of shipping semantic matching related products, including working
with Microsoft Research. Mr. Starbuck has over 38 patents and a computer science degree
from UCSD.
APPENDIX I – Alternate Search Engines
For reference, the following are examples of additional semantic search engines that are not directly
applicable to recruiting:
• eBay.com’s search engine utilizes categorization, keyword search, and user behavior to catalog
a vast amount of goods sold on its website.
• ExpertSystem.net gets the closest to really understanding meaning and sentiment from both
structured and unstructured data, but is available only as an enterprise search application.
• Kosmix.com employs categorization and content aggregation to create a directory. Kosmix tries
to derive meaning by looking at the extent to which the contents of a link point to similar content.
• MyRoar.com uses natural language processing to answer questions with a focus on financial
information.
• Swoogle (swoogle.umbc.edu) searches only the semantic web which contains highly structured
data, and focuses on documents with purposely written semantic content.