
Google Search Appliance

Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall

Nitin Mangtani
May 27, 2009
Search is the starting point to the world’s information
Our Search Products
Google Enterprise Search

More than 20,000 enterprise search customers


Dedicated team of enterprise engineers focused on solving enterprise search problems.
Backed by Google’s core research and development

Bringing the Google.com search experience to businesses
Universal Search

Employee Directory
Content Management
File share
Wikis
Intranet
SharePoint
Google’s Search Philosophy

Intuitive, unified results


User Highly relevant
User-friendly innovation

All information
Reach ‘Real-time’ data
Customizable and extendable

Highly secure architecture


Security Standards-based
Leverage existing security

Large corpus search


Scale Cross-enterprise management
Flexible infrastructure
Personalized Search Experience

Marketing | Engineering
Advanced Biasing Controls

Administrators can create multiple biasing policies:
Source biasing
Date biasing
Metadata biasing New!
Front-end biasing New!
Simple setup: no complex coding or scripts.
Metadata Biasing New!

Biasing based on metadata attribute and value, e.g. “Boost all documents that have author as Larry Page”
Administrators control the influence (positive or negative) of metadata attribute/value pairs
For each rule, the administrator specifies the metadata name, the specific metadata content, and a parameter that determines its influence
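To make the idea concrete, here is a toy model of metadata biasing in Python. It is an illustration only, not the appliance's ranking algorithm: it assumes each rule is a (metadata name, specific metadata content, influence) triple and that influence acts as a multiplier on a base relevance score (above 1.0 boosts, below 1.0 demotes). `MetadataBias` and `apply_metadata_bias` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MetadataBias:
    """One hypothetical biasing rule for a metadata attribute/value pair."""
    name: str         # metadata attribute, e.g. "author"
    value: str        # specific metadata content, e.g. "Larry Page"
    influence: float  # assumed multiplier: > 1.0 boosts, < 1.0 demotes


def apply_metadata_bias(base_score: float, metadata: Dict[str, str],
                        policy: List[MetadataBias]) -> float:
    """Multiply the base score by the influence of every rule that matches."""
    score = base_score
    for rule in policy:
        if metadata.get(rule.name) == rule.value:
            score *= rule.influence
    return score


# Example policy: boost Larry Page's documents, demote obsolete ones.
policy = [
    MetadataBias("author", "Larry Page", 2.0),
    MetadataBias("status", "obsolete", 0.5),
]
boosted = apply_metadata_bias(1.0, {"author": "Larry Page"}, policy)  # 2.0
```

A multiplicative model keeps rules independent: a document matching both a boost and a demotion simply combines the two factors.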
Embedding a Search Box in Your Application
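<form method="GET" action="http://search.mycompany.com/search">
<!-- q: the user's search terms -->
<input type="text" name="q" size="32" maxlength="256" value="query string">
<input type="submit" name="btnG" value="Google Search">
<!-- site: collection to search; client: front end to use -->
<input type="hidden" name="site" value="default_collection">
<input type="hidden" name="client" value="default_frontend">
<!-- output=xml_no_dtd requests XML; proxystylesheet asks the appliance to render it as HTML with that front end's stylesheet -->
<input type="hidden" name="output" value="xml_no_dtd">
<input type="hidden" name="proxystylesheet" value="default_frontend">
</form>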
Such forms are the most recognizable methods for generating GET requests, but there are numerous other ways.

A web application may make an HTTP GET request directly:


GET /search?q=query+string&site=default_collection&client=default_frontend&output=xml_no_dtd&proxystylesheet=default_frontend HTTP/1.0
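A web application can also build this request programmatically. The sketch below uses Python's standard `urllib.parse.urlencode` to assemble the query string from the same parameters as the form and request on this slide. The host `search.mycompany.com` comes from the example; the function name `build_search_url` and its `html` switch are our own. Our understanding is that `proxystylesheet` asks the appliance to apply the front end's stylesheet and return HTML, so an application that wants to parse raw XML would leave it out. Treat that as an assumption to check against your appliance's documentation.

```python
from urllib.parse import urlencode


def build_search_url(host: str, query: str,
                     collection: str = "default_collection",
                     frontend: str = "default_frontend",
                     html: bool = True) -> str:
    """Assemble a search URL using the parameters from the example form."""
    params = {
        "q": query,            # user's search terms (spaces become '+')
        "site": collection,    # collection to search
        "client": frontend,    # front end to use
        "output": "xml_no_dtd",
    }
    if html:
        # Assumed: the appliance applies this front end's stylesheet and returns HTML.
        params["proxystylesheet"] = frontend
    return f"http://{host}/search?{urlencode(params)}"
```

To fetch results, the URL can be passed to `urllib.request.urlopen`. With `html=False`, the response body should be the XML result set rather than a rendered page.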
Leverage users’ input
Do-It-Yourself KeyMatch
Search-as-you-Type
Universal Search: Powered by Google Search Appliance

Google Search Appliance

File shares: over 200 file formats (MS Office, PDF, HTML, etc.)
Intranets: web servers, SharePoint, portals
Databases: Oracle, SQL Server, MySQL, DB2, Sybase
Applications: ERP systems, business intelligence systems
Enterprise Content Management: Documentum, FileNet, Livelink, any other system
Architecture
Real-Time Access to Business Applications
Access to real-time business data with OneBox

[Timeline figure, Q1 2007 to Q4 2008 (milestones Q1 2007, Q3 2007, Q1 2008, Q3 2008, Q4 2008): secure, real-time access to business information]

“The Google Search Appliance with OneBox is our command line interface to our world … adding more content and additional OneBox interfaces will only increase the value to our organization”
– Danny Perri, BOC Gases
Google OneBox for Enterprise

1. User enters a query.
2. The OneBox “trigger” determines whether the query is relevant to a OneBox module.
3. The appliance makes a secure REST call (HTTPS GET request) to the predefined OneBox provider, passing security credentials and other parameters.
4. The provider uses the information to determine appropriate, user-specific, secure results for the query, and passes those results back to the appliance in XML.
5. The XML is transformed into HTML based on the XSL template provided in the OneBox module and presented to the user inline with their search results.

[Diagram: steps ① to ⑤ between the user, the appliance, and the Provider Server at https://provider…, which returns XML]
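Steps 3 and 4 can be sketched from the provider's side with Python's standard `http.server`. This is a minimal, hypothetical provider. It assumes the user's search terms arrive in a `query` parameter (check the OneBox API documentation for the real parameter names). It ignores the security credentials that a real provider would use to tailor results, and it serves plain HTTP on localhost for testing, whereas the appliance calls providers over HTTPS. `build_onebox_xml`, `OneBoxHandler`, `serve`, and `DemoProvider` are illustrative names, and the `success` result code is an assumed value.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
from xml.sax.saxutils import escape


def build_onebox_xml(query: str, provider: str = "DemoProvider") -> str:
    """Return a minimal OneBoxResults document for the query (placeholder content)."""
    q = escape(query)  # escape &, < and > so the query cannot break the XML
    return (
        "<OneBoxResults>"
        "<resultCode>success</resultCode>"  # assumed result-code value
        f"<provider>{escape(provider)}</provider>"
        f"<searchTerm>{q}</searchTerm>"
        "<MODULE_RESULT>"
        f"<Title>Placeholder result for {q}</Title>"
        "</MODULE_RESULT>"
        "</OneBoxResults>"
    )


class OneBoxHandler(BaseHTTPRequestHandler):
    """Answer each GET from the appliance with OneBox XML."""

    def do_GET(self) -> None:
        params = parse_qs(urlparse(self.path).query)
        # "query" is an assumed parameter name for the user's search terms.
        query = params.get("query", [""])[0]
        body = build_onebox_xml(query).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/xml; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the provider on localhost (plain HTTP, for local testing only)."""
    HTTPServer(("127.0.0.1", port), OneBoxHandler).serve_forever()
```

After calling `serve()`, a request to `http://127.0.0.1:8080/?query=smith` returns the XML document. A production provider would sit behind TLS and validate the credentials passed in step 3 before choosing results.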
Google OneBox for Enterprise

Real-time, secure access to information from the search box


Triggers: configurable to show OneBox results
Always On: the module is invoked for every query
Keyword(s): the module is invoked in response to specific keywords
Regular Expression: the module is invoked when the query matches a regular expression
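The three trigger types can be modelled as a small predicate. This Python sketch approximates the behavior and is not the appliance's matching logic. It assumes that a keyword trigger fires when any query term equals one of the configured keywords (case-insensitively), and that a regular-expression trigger fires when `re.search` finds a match anywhere in the query. `Trigger` and `should_invoke` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Trigger:
    """Hypothetical trigger configuration for one OneBox module."""
    kind: Literal["always", "keyword", "regex"]
    pattern: str = ""  # space-separated keywords, or a regular expression


def should_invoke(trigger: Trigger, query: str) -> bool:
    """Decide whether the module should be called for this query."""
    if trigger.kind == "always":
        return True
    if trigger.kind == "keyword":
        # Assumed rule: any query term equal to a configured keyword fires.
        keywords = {k.lower() for k in trigger.pattern.split()}
        return any(term.lower() in keywords for term in query.split())
    if trigger.kind == "regex":
        # Assumed rule: a match anywhere in the query fires.
        return re.search(trigger.pattern, query) is not None
    raise ValueError(f"unknown trigger kind: {trigger.kind}")
```

A regular-expression trigger such as `^PO-\d{6}$` suits structured identifiers (here, a hypothetical purchase-order format), while keyword triggers suit words like "phone" that signal an employee-directory lookup.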

Providers
Internal: Specialized search content in a separate appliance collection
External: Modules from OneBox module gallery
External: API enables you to create your own modules
OneBox Results Schema
<OneBoxResults>
  <resultCode>result_code</resultCode>
  <Diagnostics>failure_reason</Diagnostics>
  <provider>provider_name</provider>
  <searchTerm>query_escape</searchTerm>
  <totalResults>total_results_escape</totalResults>
  <title>
    <urlText>results_title</urlText>
    <urlLink>results_uri</urlLink>
  </title>
  <IMAGE_SOURCE>image_uri</IMAGE_SOURCE>
  <MODULE_RESULT>
    <U>uri</U>
    <Title>title</Title>
    <Field name="name1">value1</Field>
    <Field name="name2">value2</Field>
    <Field name="nameN">valueN</Field>
  </MODULE_RESULT>
</OneBoxResults>
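On the consuming side, a document that follows this schema can be read with Python's standard `xml.etree.ElementTree`. The sketch is illustrative: `parse_onebox_results` and the dictionary keys it returns are our own naming, and it reads only the elements listed in the schema.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def parse_onebox_results(xml_text: str) -> Dict[str, Any]:
    """Extract the main OneBoxResults elements into plain Python values."""
    root = ET.fromstring(xml_text)
    results = []
    for mr in root.findall("MODULE_RESULT"):
        results.append({
            "uri": mr.findtext("U", default=""),
            "title": mr.findtext("Title", default=""),
            # Collect each <Field name="...">value</Field> by its name attribute.
            "fields": {f.get("name"): (f.text or "") for f in mr.findall("Field")},
        })
    return {
        "result_code": root.findtext("resultCode", default=""),
        "provider": root.findtext("provider", default=""),
        "search_term": root.findtext("searchTerm", default=""),
        "results": results,
    }
```

Passing a provider's response body to `parse_onebox_results` yields the result code, provider name, and search term, plus one dictionary per `MODULE_RESULT`.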
Security

Document Level Security: provide the right users with access to the right documents

Common Security Protocols: HTTP Basic; NTLM (v1, v2); LDAP; “Zero” sign-on
Advanced Security: Kerberos New!; SSO: Oracle (Oblix), CA/SiteMinder; X509 certificates
Custom Authentication & Authorization: support for SAML SPI
Access Control (NTLM, HTTP Basic, SSO, etc.)

1. User executes a search for public and secure content (access=a).
2. User is prompted for credentials (with NTLM/Basic Auth and SSO, the user is prompted for both sets of credentials).
3. User’s credentials are sent securely to the search appliance.
4. Google Search Appliance queries the index for all possible results.
5. Search appliance makes ‘authorization’ requests of the host content servers with the user’s credential set.
6. Host servers respond with success or failure.
7. Secure results the user is not authorized to see are filtered out of the search results.
8. Final (filtered) search results are presented to the user.

[Diagram: index results and authorization responses]
# | URL | Secure
1 | http://corp…/preso.ppt | ntlm
2 | http://corp…/policyhtml | http basic
… | http://corp…/welcome/ | none
n | http://int…/customer.jsp | sso
Host content servers (Content Mgmt., File shares, Database) respond 200, 401, 200.
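Steps 5 to 7 amount to late-binding authorization: a candidate result is kept only if its host server accepts the user's credentials. The Python sketch below shows that filtering with the standard library. It assumes HTTP Basic credentials and `HEAD` requests, and it is not the appliance's implementation; `filter_authorized` and `basic_auth_check` are hypothetical names. Any network error is treated as a denial so that the filter fails closed.

```python
import base64
from typing import Callable, Iterable, List
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def filter_authorized(urls: Iterable[str], check: Callable[[str], int]) -> List[str]:
    """Keep only the results whose host server answers 200 for this user."""
    return [url for url in urls if check(url) == 200]


def basic_auth_check(user: str, password: str, timeout: float = 5.0) -> Callable[[str], int]:
    """Return a checker that sends a HEAD request with HTTP Basic credentials."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")

    def check(url: str) -> int:
        req = Request(url, method="HEAD", headers={"Authorization": f"Basic {token}"})
        try:
            with urlopen(req, timeout=timeout) as resp:
                return resp.status  # 200: the user may see this document
        except HTTPError as err:
            return err.code  # e.g. 401 or 403: filter this result out
        except URLError:
            return 0  # unreachable host: fail closed

    return check
```

Calling `filter_authorized(candidate_urls, basic_auth_check(user, password))` returns the filtered list for step 8. Keeping the checker as a parameter lets the same filter work with other schemes from the Security slide (NTLM, Kerberos, SSO). A real checker would also need to treat a redirect to a login page, which can end in a 200, as a denial.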
Traditional search technology for millions of docs

[Figure: Disaster Recovery Server + Patch Deployment + Management Server + Volume License Management Server]
Google Architecture: 10M documents in a box
Health Vine Simplicity
Patients

Immediate Family

Community
Where’s your GSA?

The State of Missouri’s use of Google GSA


Where was Missouri?

16 Executive Agencies
No common web search
No unified way for citizens or businesses to find information about State Government
Where is Missouri?

Centrally managed Google GSA
Front ends and collections provided to all State Government entities
Common search across all State Government web content
Reliable information now easily found by citizens and businesses
