Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall
Nitin Mangtani
May 27, 2009
Search is the starting point to the world’s information
Our Search Products
Google Enterprise Search
Employee Directory
Content Management
File share
Wikis
Intranet
SharePoint
Google’s Search Philosophy
All information
Reach ‘Real-time’ data
Customizable and extendable
Marketing
Engineering
Advanced Biasing Controls
Biasing based on metadata attribute and value
“Boost all documents that have author as Larry Page”
Administrators control influence (positive or negative) on metadata attribute/value pairs
Determine influence of metadata
On specific metadata name, content parameter
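The slide does not describe how the appliance scores documents internally. As a toy illustration of the idea only, the sketch below assumes each result carries a numeric score and applies a positive or negative adjustment for every matching metadata attribute/value pair. The function name `apply_bias`, the additive scoring, and the sample documents are all hypothetical.

```python
def apply_bias(results, bias_rules):
    """Re-rank results with per-attribute/value influences.

    results: list of dicts with a numeric 'score' and a 'metadata' dict.
    bias_rules: {(attribute, value): influence}; influence may be negative.
    This additive model is an assumption, not the appliance's algorithm.
    """
    rescored = []
    for doc in results:
        # Sum the influence of every rule whose attribute/value matches.
        adjustment = sum(
            influence
            for (attr, value), influence in bias_rules.items()
            if doc["metadata"].get(attr) == value
        )
        rescored.append({**doc, "score": doc["score"] + adjustment})
    # Highest adjusted score first.
    return sorted(rescored, key=lambda d: d["score"], reverse=True)


docs = [
    {"url": "a", "score": 1.0, "metadata": {"author": "Someone Else"}},
    {"url": "b", "score": 0.8, "metadata": {"author": "Larry Page"}},
]
# "Boost all documents that have author as Larry Page"
rules = {("author", "Larry Page"): 0.5}
ranked = apply_bias(docs, rules)
```

With these sample values, document `b` moves ahead of document `a` because its author matches the boosted pair.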
Embedding Search Box in your application
<form method="GET" action="http://search.mycompany.com/search">
<input type="text" name="q" size="32" maxlength="256" value="query string">
<input type="submit" name="btnG" value="Google Search">
<input type="hidden" name="site" value="default_collection">
<input type="hidden" name="client" value="default_frontend">
<input type="hidden" name="output" value="xml_no_dtd">
<input type="hidden" name="proxystylesheet" value="default_frontend">
</form>
Such forms are the most recognizable methods for generating GET requests, but there are numerous other ways.
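One of those other ways is to build the same GET request in code. The sketch below assembles the URL from the parameters used in the form above; the helper name `build_search_url` and the sample query are illustrative, and the `btnG` submit-button field is left out on the assumption that it is not needed for a programmatic request.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query,
                     site="default_collection",
                     client="default_frontend"):
    """Build a search URL carrying the same fields as the HTML form."""
    params = {
        "q": query,                    # the user's query string
        "site": site,                  # collection to search
        "client": client,              # front end to use
        "output": "xml_no_dtd",        # same output value as the form
        "proxystylesheet": client,     # same stylesheet value as the form
    }
    # urlencode percent-escapes values and joins them with '&'.
    return base_url + "?" + urlencode(params)


url = build_search_url("http://search.mycompany.com/search", "expense report")
print(url)
```

The resulting URL can be fetched with any HTTP client, or used as a plain link inside another application.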
Secure, real-time access to business information
[Chart: quarterly axis, Q1 2007 to Q4 2008]
Providers
Internal: Specialized search content in a separate appliance collection
External: Modules from OneBox module gallery
External: API enables you to create your own modules
OneBox Results Schema
<OneBoxResults>
  <resultCode>result_code</resultCode>
  <Diagnostics>failure_reason</Diagnostics>
  <provider>provider_name</provider>
  <searchTerm>query_escape</searchTerm>
  <totalResults>total_results_escape</totalResults>
  <title>
    <urlText>results_title</urlText>
    <urlLink>results_uri</urlLink>
  </title>
  <IMAGE_SOURCE>image_uri</IMAGE_SOURCE>
  <MODULE_RESULT>
    <U>uri</U>
    <Title>title</Title>
    <Field name="name1">value1</Field>
    <Field name="name2">value2</Field>
    <Field name="nameN">valueN</Field>
  </MODULE_RESULT>
</OneBoxResults>
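A custom provider has to return XML in this shape. The following is a minimal sketch of how such a response might be assembled, using only the element names shown in the schema above. The function name, the sample data, the example URLs, and the `success` result code are assumptions for illustration; the slide shows only a `result_code` placeholder.

```python
import xml.etree.ElementTree as ET


def build_onebox_results(provider, title_text, title_link, results):
    """Serialize results into the OneBoxResults layout shown above."""
    root = ET.Element("OneBoxResults")
    # "success" is an assumed value; the schema only shows a placeholder.
    ET.SubElement(root, "resultCode").text = "success"
    ET.SubElement(root, "provider").text = provider
    title = ET.SubElement(root, "title")
    ET.SubElement(title, "urlText").text = title_text
    ET.SubElement(title, "urlLink").text = title_link
    for item in results:
        module = ET.SubElement(root, "MODULE_RESULT")
        ET.SubElement(module, "U").text = item["url"]
        ET.SubElement(module, "Title").text = item["title"]
        # Each name/value pair becomes one <Field name="...">value</Field>.
        for name, value in item["fields"].items():
            ET.SubElement(module, "Field", name=name).text = value
    return ET.tostring(root, encoding="unicode")


xml_text = build_onebox_results(
    provider="directory",
    title_text="Employee directory",
    title_link="http://intranet.example.com/directory",
    results=[{
        "url": "http://intranet.example.com/directory/jdoe",
        "title": "Jane Doe",
        "fields": {"phone": "555-0100"},
    }],
)
print(xml_text)
```

Building the tree with `ElementTree` rather than string concatenation means that special characters in titles or field values are escaped automatically.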
Security
Document Level Security: Provide the right users with access to the right documents
Common Security Protocols
HTTP-Basic
“Zero” Sign-on
NTLM (v1, v2)
LDAP
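Of the protocols listed, HTTP Basic is the simplest to show in code: the client sends the user name and password, joined by a colon and Base64-encoded, in an `Authorization` header. The sketch below shows only that generic header construction; the helper name and the credentials are hypothetical, and nothing here is specific to how the appliance itself is configured.

```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header for HTTP Basic authentication."""
    # Credentials are joined as "user:password", then Base64-encoded.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


headers = basic_auth_header("crawler", "s3cret")
```

Base64 is an encoding, not encryption, so Basic credentials should only travel over HTTPS.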
Disaster Recovery Server
+
Patch
+
Deployment
+
Management Server
Volume License Management Server
Google Architecture: 10M documents in a box
Health Vine Simplicity
Patients
Immediate Family
Community
Where’s your GSA??
16 Executive Agencies
No common web search
No unified way for citizens or businesses to find information about State Government.
Where is Missouri??
Centrally Managed Google GSA
Front Ends and Collections provided to all State Government entities
Common search across all State Government web content
Reliable information now easily found by citizens and businesses