Semantic Search

Sourcing Success
Beyond Boolean Search

Q3, 2009

Authors:
Shally Steckerl, EVP, Arbita
Bryan Starbuck, Founder & CEO, TalentSpring, Inc.
Semantic Search – Sourcing Success Beyond Boolean Search ARBITA

Table of Contents
The Future of Candidate Sourcing
Ankle Deep in the Deep Web, but Inching Closer to Semantic Search
Semantic Search: Why Recruiters Should Care
Semantic Search for Recruiting
Understanding Semantic Search Fundamentals
    Literal versus Equivalent Match Searches
    Soft Keywords: The Hidden Power of Semantic Search
    Search Term Expansion Sets
Three Different Semantic Approaches
    Lexicon- and Ontological- Based Search
    Statistical Analysis and Pattern Matching
    Contextual Search
Broad vs. Narrow Match Semantic Search
Targeting Semantic Search
Example Semantic Search Technologies
    Full High End Semantic Search Solutions for Recruiters
    Free Semantic Search Tools for Recruiters
Semantic Search RFP Check List
About the Authors:
APPENDIX I – Alternate Search Engines
TABLE I – Semantic Search Engine Types
TABLE II – Semantic Search Engine Types

Copyright 2009, TalentSpring, Inc. (www.talentspring.com) and Arbita, Inc. (www.arbita.net).

The Future of Candidate Sourcing


Today, the recruiting industry stands at the forefront of a technological revolution as profound as the
initial adoption of candidate sourcing via the Internet. It is a revolution that will change the way recruiters
spend their days, the way organizations allocate their resources, and the way candidates find jobs. This
new revolution comes in the form of newly advanced tools that leverage Semantic Search technology.
Semantic Search will change the ‘keyword’ focus of electronic sourcing to the ‘actual’ meaning found
within resumes and job descriptions. These new tools will cut in half the time recruiters spend on
sourcing, vastly improve candidate match quality, and simplify candidate information found on social
networks, job boards and corporate Applicant Tracking Systems (ATS).

Ankle Deep in the Deep Web, but Inching Closer to Semantic Search

Most people agree that the biggest problem with online recruiting today is too much available information.
Without a good search engine, you simply get lost in all the information. Unfortunately, today’s search
engines are still inefficient, delivering mismatched information and requiring complex search string
knowledge to use effectively. In an ideal world a search engine would function like a human,
understanding the underlying meaning of the user’s search and then matching the search results
accordingly. Many expert communities talk about Semantic Search applications being the most likely
technology to deliver this kind of result, but few take the time to explain it in plain language. This
whitepaper explains Semantic Search for candidate sourcing.

Semantic Search: Why Recruiters Should Care


Semantics is the field of study that focuses on ‘meaning’. Linguistic Semantics seeks to understand the
meaning behind language, symbols, words, phrases, sentences, and larger blocks of text. Semantic
Search engines apply grammatical analysis, logical interpretation, and linguistic morphology to identify
the unstructured meaning of the user’s search and find relevant results. In simple terms, a perfect
Semantic Search engine would instantly take into consideration the meaning behind your question and
deliver the result you were actually looking for.

Today, most people think of online search in terms of the capabilities of major search engines like
Google™, Yahoo!™, or Microsoft’s Bing™. These big search engines utilize Boolean-based keyword
search technology and often require the use of complex syntax and field search commands to find
specific occurrences of information (keywords) within documents. Results are based solely on whether
those keywords are present. The major search engines are constantly experimenting with new ways to
simplify their search queries for users. However, these simplifying efforts don’t really work to understand
the true ‘meaning’ of what is being searched for.

In contrast, Semantic Search technologies seek to simplify search by understanding the actual concept
being sought. Semantic Search engines discover the true relationship between the question being asked
and the content being delivered. Consequently, the user’s ‘experience’ of the search is shifted from
sifting through documents that contain a specific keyword to reading documents that express the concept
originally being sought.

One of the biggest challenges for search engines is understanding the ‘context’ of the
search. It is context that determines whether the word ‘well’ refers to a source of water, as in “Draw water
from the well,” or to a person’s health, as in “Is she not feeling well?” As a human, if you read “stair well”
you automatically know what it means. Computers, on the other hand, have to calculate hundreds of
variations and probabilities to arrive at a best guess.

Semantic Search engines make sense of sentence context by being pre-configured (trained) to
understand who the user is and what the likely context of the search term is. To illustrate, imagine two
people searching for a Marketing Manager position on the Web. One person is a recruiter, the other is a
job candidate. With a regular search engine both people would get the same results. However, a
Semantic Search engine that knew the user was a recruiter would return only candidate resumes
and ignore job listings. Likewise, the job candidate would see only job listings.

“Semantic Search technology has not yet reached the level of full comprehension.
However, a number of technology vendors have taken Semantic Search far beyond
the capabilities of Boolean search to make online recruiting simpler and faster.”
Shally Steckerl – Arbita

Semantic Search for Recruiting


When we talk about effective Semantic Search for recruiting today, we are really talking about two
capabilities. First, we are talking about the unique capabilities of Semantic Search engines to understand
the concept being searched for. Second, we are talking about the ability of Semantic Search engines to
rank and filter search results using ‘smart’ ranking systems. Combined, these two capabilities create
intelligent search tools that are capable of duplicating the most tedious and time-consuming aspects of
candidate sourcing. Recruiters have no time to waste when a computer should be smart enough to
derive context, subtext, and meaning for us.

Today, intelligent Semantic Search tools are becoming available to recruiters. The best of these search
tools outpace complex syntax and Boolean search in their ability to automatically match the content of an
individual’s resume to a job description, and to rank the results according to what is most important to the
recruiter.
 

Benefits of Semantic Search over Boolean Search for Recruiters:

• Semantic Search is far easier to learn than complex syntax and field search commands because
it doesn’t require significant technical skills to get good results (i.e. there’s no need to use
commands like intitle:, inurl:, site:, and filetype:).
• Semantic Search can save recruiters significant time by automatically identifying which terms to
search on in the job description.
• Semantic Search provides recruiters with more accurate resume matches by pre-filtering results
for such things as candidate qualifications (skills, experience, education, etc.) and work history
characteristics (job hopping, job similarity, etc.).
• Semantic Search increases search match quality by taking into account all needs of the job
requisition and candidate resume (e.g. it detects job seekers who no longer work in the job title
matching the requisition).
• Semantic Search can identify high-quality candidates whose resumes would otherwise stay hidden
because they don’t conform to the rigid terms used in complex search strings.

From a practical perspective, Semantic Search means that recruiters don’t have to acquire
special skills in building search strings to find candidates. In fact, applications that do Semantic Search well
don’t even require the recruiter to interact with keywords at all. The Semantic Search engine ‘reads’ the
job description, understands the key attributes being sought, and then automatically builds an expanded
content list to search for resumes. The result is that match quality is much better than with regular search
methods.

Definition of Semantic Search for Recruiting:

Semantic Search for recruiting refers to finding the best resumes or profiles that match
the needs of a job description. It requires going beyond search that simply understands
sentence structure to factoring in the needs of the employer to find resumes.

Understanding Semantic Search Fundamentals


There are three interrelated aspects of Semantic Search Engines that warrant additional discussion:

1. Literal vs. Equivalent Match Searches
2. Search Term Expansion Sets
3. Sentence Structure Understanding

Literal versus Equivalent Match Searches

For the layperson, the most noticeable difference between Boolean and Semantic Search engines is the
flexibility around matching the search keywords used. With Boolean search, an exact match to the
search terms (keyword(s)) is required. With Semantic Search, matches can include equivalent words as
well. While Literal versus Equivalent might sound simple, it is worth looking at an example: Imagine a
recruiter has the option of using a Boolean Search engine or a Semantic Search engine to fill the
following position:

Job Title: Software Engineer
Level: Team Supervisor
Company: Hewlett-Packard
Product Line: Scanners
Requirements:
1. Background in image processing algorithms
2. Experience writing hardware device drivers
3. Five years of experience
4. Master’s in Computer Science

With the Boolean search engine s/he might search on: “Software Engineer” AND “image processing” AND
“device drivers”. Only resumes that literally conformed to the search terms would be included in the
results.

With a Semantic Search engine trained for recruiting, the user would see all candidates that had
equivalent term matches (e.g. computer programmer, image algorithms, image bicubic, device DDK, etc.).
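
The difference between the two matching styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the EQUIVALENTS table, the function names, and the sample resume text are hypothetical and are not taken from any product discussed in this paper.

```python
# Hypothetical equivalence table; a real engine would curate or learn far more entries.
EQUIVALENTS = {
    "software engineer": ["software engineer", "computer programmer"],
    "image processing": ["image processing", "image algorithms", "bicubic"],
    "device drivers": ["device drivers", "device driver", "ddk"],
}

def literal_match(resume, terms):
    """Boolean AND: every search term must appear verbatim in the resume."""
    text = resume.lower()
    return all(term in text for term in terms)

def equivalent_match(resume, terms):
    """Every term must appear either literally or as one of its listed equivalents."""
    text = resume.lower()
    return all(
        any(variant in text for variant in EQUIVALENTS.get(term, [term]))
        for term in terms
    )

terms = ["software engineer", "image processing", "device drivers"]
resume = "Computer programmer; wrote bicubic scaling code and scanner firmware with the Windows DDK."

print(literal_match(resume, terms))     # False
print(equivalent_match(resume, terms))  # True
```

The same resume is rejected by the literal search and accepted by the equivalent search, which is the behavior the example above describes.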

Soft Keywords: The Hidden Power of Semantic Search

Another big advantage of Semantic Search engines is that they tolerate greater linguistic variation
between job descriptions and resumes by ‘weighting’ the value of individual keywords. These
weighted or ‘soft’ search terms enable the search engine to find the most relevant content.

The following example shows how ‘soft’ keywords can increase the flexibility of a job search.
The job description is for a Systems/Mechanical Engineer responsible for the design,
analysis and development of optimal surfaces specifically used in gears, joints, and actuators.

Semantic Search finds unexpected matches between the job description and candidate
resumes by searching for ‘soft’ keywords in the job description and candidate resumes.
Boolean Search's exact match logic doesn’t allow for this kind of matching flexibility.
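
One way to picture ‘soft’ keywords is as weighted terms that contribute to a score instead of acting as pass/fail filters. The sketch below assumes a simple weighted-sum scheme; the weights, the SOFT_KEYWORDS table, and the sample resumes are hypothetical, and real products may weight terms quite differently.

```python
# Hypothetical weights: 1.0 marks a must-have term, lower values mark 'soft' terms.
SOFT_KEYWORDS = {
    "mechanical engineer": 1.0,
    "gears": 0.6,
    "actuators": 0.6,
    "joints": 0.4,
    "surface design": 0.5,
}

def soft_score(resume, weights):
    """Return the share of total keyword weight found in the resume (0.0 to 1.0)."""
    text = resume.lower()
    found = sum(weight for term, weight in weights.items() if term in text)
    return found / sum(weights.values())

resumes = {
    "A": "Mechanical engineer designing gears and actuators for robotics.",
    "B": "Mechanical engineer focused on HVAC installation.",
}

# Rank resumes from best to worst score; neither one is excluded outright.
ranked = sorted(resumes, key=lambda name: soft_score(resumes[name], SOFT_KEYWORDS), reverse=True)
print(ranked)  # ['A', 'B']
```

Both resumes stay in the result set, but the one that covers more of the weighted terms rises to the top.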

Search Term Expansion Sets

The main advantage of Semantic Search engines is the ability to find keywords and phrases that expand
from the original keyword(s) being searched for. Semantic Search engines do this by building expansion
sets, or lists of linguistically-equivalent meanings. This capability enables the Semantic Search engine to
find ‘hidden’ matches to the user’s intended search, which regular search engines would normally filter
out.

The advantage of using expansion sets can be seen in the following illustration. With Boolean Search
Engines, the number of potential matches is limited to only those resumes that match the specific
keywords being searched on.

With Semantic Search engines, each original keyword is expanded to include many semantically equivalent
keywords that increase the match opportunity significantly. The result is that far more matches can be
found with Semantic Search than with regular Boolean Search.

The following table shows an expansion set for a Nokia programmer who develops games using
OpenGL. You can easily see how the Semantic Search Engine’s expansion set offers far more
match possibilities than a regular search engine would.

Semantic Search engines use Term Expansion to find larger keyword match sets, and
deliver more accurate results, by including linguistically-equivalent search terms in their
search sets and utilizing advanced filters and ranking algorithms to calibrate the results.

For recruiters, Semantic Search term expansion enables them to find excellent
candidates without requiring them to be subject matter or Boolean Search experts.
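
Term expansion can be approximated by hand as a Boolean query in which every keyword becomes an OR group. The following Python sketch shows the idea; the EXPANSION_SETS entries and the expand_query function are hypothetical illustrations, not the expansion sets of any actual engine.

```python
# Hypothetical expansion sets; the entries are illustrative only.
EXPANSION_SETS = {
    "programmer": ["programmer", "developer", "software engineer"],
    "OpenGL": ["OpenGL", "OpenGL ES", "3D graphics"],
}

def expand_query(keywords, expansion_sets):
    """Turn each keyword into an OR group of its variants and join the groups with AND."""
    groups = []
    for keyword in keywords:
        # Keywords without an expansion set are searched as-is.
        variants = expansion_sets.get(keyword, [keyword])
        groups.append("(" + " OR ".join(f'"{v}"' for v in variants) + ")")
    return " AND ".join(groups)

print(expand_query(["programmer", "OpenGL", "Nokia"], EXPANSION_SETS))
# ("programmer" OR "developer" OR "software engineer") AND ("OpenGL" OR "OpenGL ES" OR "3D graphics") AND ("Nokia")
```

A Semantic Search engine builds and applies sets like these automatically, which is why the recruiter does not need to write the long query by hand.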

Three Different Semantic Approaches


There are a number of approaches to Semantic Search technology worth highlighting, each with its own
particular way of extracting meaning and subtext. While not perfect,
these tools are certainly good additions to your research toolkit. The three primary approaches are
Lexicon, Statistical Analysis and Contextual Search.

Lexicon- and Ontological- Based Search

In the field of information technology, Lexicon refers to a specific vocabulary or list of words related to a
particular domain, discipline or topic, and Ontology refers to the description of concepts and relationships
that can exist within a data structure. Search engines that use this kind of approach attempt to map the
vocabulary used in the search onto the concepts and relationships of the domain’s ontology.
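
As a rough illustration of this mapping, the Python sketch below resolves a search term to a concept through a lexicon and then follows the ontology to broader concepts. The LEXICON and ONTOLOGY tables and the concepts_for function are hypothetical and far smaller than anything a real engine would use.

```python
# Hypothetical lexicon: surface terms mapped to a canonical concept.
LEXICON = {
    "js": "JavaScript",
    "javascript": "JavaScript",
    "ecmascript": "JavaScript",
    "py": "Python",
    "python": "Python",
}

# Hypothetical ontology: each concept points to its broader parent concept.
ONTOLOGY = {
    "JavaScript": "Programming Language",
    "Python": "Programming Language",
    "Programming Language": "Software Skill",
}

def concepts_for(term):
    """Map a term to its concept, then walk up the ontology to the broadest concept."""
    concept = LEXICON.get(term.lower())
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = ONTOLOGY.get(concept)
    return chain

print(concepts_for("ECMAScript"))
# ['JavaScript', 'Programming Language', 'Software Skill']
```

A term that is missing from the lexicon returns an empty list, which mirrors the main limitation of this approach: it only understands the vocabulary it has been given.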

Statistical Analysis and Pattern Matching

A true semantic search system must encapsulate the knowledge of languages to emulate understanding
of meaning. Because of this requirement, search engines that rely on statistical analysis or ranking of links,
symbols, words, and clicking behaviors are not considered to be truly Semantic Search engines.
However, these engines can approximate the understanding of meaning by providing close matches,
particularly when the data is fairly homogeneous.
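
A common statistical technique of this kind is to compare word counts with cosine similarity. The sketch below uses it purely as an example of statistical matching; it is not claimed to be the method of any engine named in this paper, and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter

def term_counts(text):
    """Count lowercase word occurrences in a piece of text."""
    return Counter(re.findall(r"[a-z0-9+#]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

job = term_counts("device driver engineer with image processing experience")
close = term_counts("engineer writing device driver code for image processing hardware")
far = term_counts("marketing manager for consumer brands")

print(cosine_similarity(job, close) > cosine_similarity(job, far))  # True
```

The score rewards shared words without knowing what any of them mean, which is why this approach approximates meaning instead of understanding it.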

Contextual Search

Contextual Search tries to understand the meaning of a search by inferring it from the context around the
location of the data. This is usually done by: analyzing and ranking links pointing to a particular document;
specializing in only one category of information (Vertical Search); extracting summaries from the results;
and/or allowing the user an interface with which they can filter or disambiguate the search results
(Faceted Search).
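
Faceted Search, the last technique in the list above, can be sketched as counting results per facet value and then filtering on the value the user picks. The RESULTS records, field names, and helper functions below are hypothetical and only illustrate the interaction.

```python
from collections import Counter

# Hypothetical result records; the field names are illustrative only.
RESULTS = [
    {"title": "Project Manager", "industry": "Construction", "source": "job board"},
    {"title": "Project Manager", "industry": "IT", "source": "social network"},
    {"title": "Program Manager", "industry": "IT", "source": "job board"},
]

def facet_counts(results, field):
    """Count how many results fall under each value of a facet."""
    return Counter(r[field] for r in results)

def apply_facet(results, field, value):
    """Keep only the results whose facet value matches the user's choice."""
    return [r for r in results if r[field] == value]

print(facet_counts(RESULTS, "industry"))
it_only = apply_facet(RESULTS, "industry", "IT")
print([r["title"] for r in it_only])  # ['Project Manager', 'Program Manager']
```

The counts tell the user which facet values exist before they commit to one, and the filter then removes the ambiguous results.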

Broad vs. Narrow Match Semantic Search


It is easy to get confused about which search applications use Semantic Search. Hype abounds around
Semantic Search engines powering everything from smart applications to broad-match search engines like
Google and Yahoo!. The reality, however, is that Semantic Search technology is best applied to specific
search applications that deal with volumes of contextual data. A Semantic Search engine needs to be
‘trained’ to recognize expansion lists, fuzzy match rules, and soft vs. hard keyword sets.

There are really two kinds of Semantic Search engines: broad search and narrow search. Narrow search
engines from companies like TalentSpring are designed for specific search problems like candidate
sourcing. Broad search engines from companies like Powerset or Autonomy are designed to find any
kind of documentation across an organization’s electronic documentation platform. A Semantic Search
engine that has been specifically designed to focus on recruiting is going to give you the most precise
candidate search results. A Semantic Search engine that has been designed for broad matching will
return the largest volume of results.

The effectiveness of a Semantic Search engine for candidate sourcing depends on
the depth of its linguistic expansion set for candidate sourcing keywords: job titles,
skill-sets, experience types, and education levels.

Targeting Semantic Search


While semantic search engines are highly effective at finding deep relational matches between job
descriptions and resumes, they still need to be focused, or targeted, on the intended subject. For
example, there is a big difference between the jobs, pay, and responsibilities of a Construction Project
Manager and a Project Manager who works at Microsoft. However, to a search engine, the two job
descriptions look very similar, and if the recruiter didn’t highlight that one was for the Construction
industry and one was for the IT industry, the Semantic Search engine would likely find matches for both
jobs. The result is that the more targeting the recruiter does before the search, the more accurate the
results are going to be. Good Semantic Search engines will provide a simple way to calibrate or tune
search parameters.
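
The effect of targeting can be illustrated with a small ranking sketch in which candidates outside the chosen industry are discounted. The CANDIDATES data, the scores, and the mismatch_penalty parameter are all hypothetical; a real engine would expose its own calibration controls.

```python
# Hypothetical candidates: 'score' stands in for whatever match score the engine produced.
CANDIDATES = [
    {"name": "Construction PM", "industry": "Construction", "score": 0.90},
    {"name": "Software PM", "industry": "IT", "score": 0.85},
]

def rank(candidates, target_industry=None, mismatch_penalty=0.5):
    """Rank by score, discounting candidates outside the targeted industry."""
    def adjusted(candidate):
        if target_industry is None or candidate["industry"] == target_industry:
            return candidate["score"]
        return candidate["score"] * mismatch_penalty
    return [c["name"] for c in sorted(candidates, key=adjusted, reverse=True)]

print(rank(CANDIDATES))                        # ['Construction PM', 'Software PM']
print(rank(CANDIDATES, target_industry="IT"))  # ['Software PM', 'Construction PM']
```

Without a target, the two Project Managers are ranked on text similarity alone; once the recruiter specifies the IT industry, the ordering changes to reflect the intended job.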

Example Semantic Search Technologies


While there is a dizzying array of semantic search technology providers and approaches, if you wish to
implement a semantic search solution for your recruiting effort, there are only a few options available at
this time. The following sections outline both high-end semantic search solutions and free tools for
recruiters:

Full High End Semantic Search Solutions for Recruiters


• TalentSpring.com takes structured information like resumes and people profiles, and matches
them to employment requirements using both ontological categorization and semantic analysis.

• Trovix.com (now owned by Monster) primarily uses a sophisticated lexicon to match skills in
resumes to requirements in job descriptions. It also learns from users’ behavior to extract and
rank search criteria not included in the original search parameters.

Free Semantic Search Tools for Recruiters


• Semantic Technology Adopted by Major Search Engines:
o Ask.com applies linguistic processing to answer questions posed in natural language
such as “What is the capital of Russia?” It uses link popularity to measure relevance and
awards higher ranks to results from pages considered to be from experts or authoritative
sources on the topic of a search.
o Bing.com uses a little bit of everything, from a lexicon for automatically suggested
keywords to statistical analysis of links and words from authoritative sources, page ranking
methods and categorization. Bing includes Zoomix.com, a self-learning matching
technology based on learning user behavior, and aspects of Powerset.com, whose
primary discovery engine asks the user to disambiguate results by clicking on results of
relevant articles from Wikipedia.
o Exalead.com has a traditional keyword search engine but also a best-in-class image
search created by the categorization of image size, color and content, focusing on
defining content where link analysis won’t work.
o Google.com applies a number of techniques like page rank and link analysis, statistical
analysis of relationships between keywords, and now with Google Squared it even
suggests other topics using a lexicon. The related: command identifies other websites
that have statistically similar content. Google also learns from user behavior and ranks
results based on a user's previous click-through history.
o Yahoo.com infers meaning from tags in the HTML and XML code. Together with
Zemanta, AdaptiveBlue, DERI (NUI Galway), Faviki, Freebase and Zigtag, Yahoo
created a semantic tagging format called “Common Tag” to assist users in having a
common ontology for adding meaning to content via HTML tags. Tagged content
becomes easier for machines to understand.

• Deepdyve.com applies pattern matching to identify complex data found in the Deep Web (i.e.,
Web content within databases and other dynamic data sources not typically indexed by search
engines). The outcome is highly relevant results displayed in ways that can be easily organized and
visualized.

• Factbites.com lies somewhere between link analysis and document summarization, taking
excerpts of results and making them into meaningful sentences.

• Hakia.com attempts to anticipate questions that “could be” asked about a document found in its
database, then ranks search results along an index that measures sentences depending on how
closely they match the concept related to the search query. Hakia employs a lexicon of
relationships between concepts and measurements of relevancy based on credibility and age of
content.

• Lexxe.com utilizes linguistics (natural language processing) and categorization, and works by
eliminating irrelevant content, then providing visual keyword drilldowns to help derive meaning
from a query.

• Sensebot.net is a summarization engine that extracts key phrases and sentences from top
results, making it less necessary for a user to drill down and click on individual links.

• Twitter.com offers real-time search focusing on shallow but very recent content.

• Twingly.com is a faceted social search engine focusing only on blog and micro blog content.

• Vertical People Search engines like Wink.com, Spock.com, and Zoominfo.com index a multitude
of websites and deep web content but focus on only one domain or topic. For example, Wink.com
focuses on people from social networks while Spock.com and Zoominfo.com collate biographical
information about people.

• Yedda.com answers questions by combining natural language processing with user
behavior learning.

Semantic Search RFP Check List


With a number of technologies becoming available, here is a list of features you should be looking for
when selecting a Semantic Search technology:

Recruiter-Specific Semantic Search Engine: Focused-match search engines will deliver far better
results than general document search engines. You will not have to “train” the engine on how to
identify good candidates.

Resume List Depth: Search engines that come pre-populated with job titles, skills, certifications,
education and experience levels will perform far better than engines that first have to
be trained by your company or the vendor. Look for training sets (the number of job profiles used
to train the system) greater than 10 million profiles.

User Selectable Source: The ability for the user to define which resume sources they want
to pull from (e.g. specific job boards, social networks, or the organization’s ATS). This is an
important feature for controlling where your candidates come from.

ATS Interoperability: The ability for the search engine to search your existing ATS database in
addition to external resume sources.

OFCCP Compliance: The ability for the semantic search engine to support your existing OFCCP
process (if used by your organization).

Geographic Sourcing: The ability to specify recruiting geography (local, regional, national, etc.).

Industry Sourcing: The ability to specify which industry you are recruiting from.

Marketing Module: Does the vendor provide tools that enable you to either selectively send or
mass-send recruiting ads/emails to potential candidates?

About the Authors:


Shally Steckerl

Because of his passion for the Internet as a recruitment tool and his continually innovative
methods, Shally Steckerl has developed a reputation as an authority in Internet search and a
pioneer in recruitment research. Shally is also an author, an internationally requested speaker,
the founder of JobMachine.net, EVP of Arbita, a frequent contributor to industry forums, and a global
recruiting consultant for companies like Microsoft Corporation, Google, Coca-Cola Enterprises,
Cisco Systems and Motorola. Since 1996, Shally has developed techniques that dramatically
increase recruitment productivity and allow companies to exploit the Internet. At Microsoft, he
managed the research arm of their global centralized sourcing and research team. At Google,
Shally built a central sourcing organization. At Coca-Cola, he was responsible for supporting all
corporate hiring managers and functional channels throughout North America, while at Cisco
Systems, he was a senior member defining Cisco’s online Recruiting Strategy. Shally provides
priceless insights into how forward-thinking companies are using innovative Internet recruiting
techniques and intelligent technologies to gain competitive recruiting advantages.

Bryan Starbuck
 
 
Bryan Starbuck is the CEO of TalentSpring, Inc., a provider of Semantic Search technology
products for the recruiting industry. As an engineering manager, Mr. Starbuck has a track record of
working closely with Microsoft’s Recruiting department on talent acquisition focused on
exceptional talent. Mr. Starbuck created TalentSpring after seeing the potential of using
semantic matching algorithms to find candidates comprehensively matched to the needs in a
job description. Prior to starting TalentSpring, Bryan was an engineering manager at Microsoft
Corp and has a track record of shipping semantic-matching-related products, including working
with Microsoft Research. Mr. Starbuck has over 38 patents and a computer science degree
from UCSD.

APPENDIX I – Alternate Search Engines


The application of Semantic Search technology is far-reaching and will be increasingly common in the
years to come. Already, specialized semantic search engines for specific applications are employed on
major websites such as Amazon.com and eBay to find, organize and deliver fantastic user results.

For reference, the following are examples of additional semantic search engines that are not directly
applicable to recruiting:

• Amazon.com compares user behavior to provide “similar items.”

• eBay.com’s search engine utilizes categorization, keyword search, and user behavior to catalog
a vast amount of goods sold on their website.

• ExpertSystem.net gets the closest to really understanding meaning and sentiment from both
structured and unstructured data, but is available only as an enterprise search application.

• Evri.com connects contextually relevant documents to each other.

• Freebase.com is a user-generated information database collected by and for the community.

• Kosmix.com employs categorization and content aggregation to create a directory. Kosmix tries
to derive meaning by looking at the extent to which the contents of a link point to similar content.

• MyRoar.com uses natural language processing to answer questions with a focus on financial
information.

• Swoogle (swoogle.umbc.edu) searches only the semantic web, which contains highly structured
data, and focuses on documents with purposely written semantic content.

TABLE I – Semantic Search Engine Types


The following table summarizes the types of Semantic Search that common search engines fall under:

TABLE II – Semantic Search Engine Types


Search engines make a tradeoff between the user effort required to operate them and how structured the
data being searched is. This table illustrates the positioning of different Semantic Search engines.
