Endeca® Navigation Platform

Advanced Features Guide

Copyright and Disclaimer

Product specifications are subject to change without notice and do not represent a commitment on the part of Endeca Technologies, Inc. The software described in this document is furnished under a license agreement. The software may not be reverse assembled and may be used or copied only in accordance with the terms of the license agreement. It is against the law to copy the software on any medium except as specifically allowed in the license agreement.

No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, for any purpose without the express written permission of Endeca Technologies, Inc.

Copyright © 2003-2005 Endeca Technologies, Inc. All rights reserved. Printed in USA.

Corda PopChart® and Corda Builder™ Copyright 1996-2005 Corda Technologies, Inc.

Outside In® SearchML © 1992-2005 Stellent Chicago, Inc. All rights reserved.

Rosette® Globalization Platform Portions Copyright © Basis Technology Corp. 2003-2005. All rights reserved.

Teragram Language Identification Software Portions Copyright © 1997-2005 Teragram Corporation. All rights reserved.

Trademarks

Don't Stop At Search, Endeca, Endeca InFront, Endeca Navigation Engine, Guided Navigation, and ProFind are registered trademarks, and Endeca Data Foundry and Endeca Latitude are trademarks of Endeca Technologies, Inc.

Basis Technology and Rosette are trademarks of Basis Technology Corp.

All other trademarks or registered trademarks contained herein are the property of their respective owners.

Endeca Advanced Features Guide • August 2005

Contents

Preface

Who Should Use This Guide

Symbols and Conventions

Endeca Documentation

About This Guide

Contacting Endeca Standard Customer Support

SECTION I

DATA IMPORT FEATURES

Chapter 1

Content Acquisition System

Sections of This Chapter

CAS and Security

Components that Support

CAS Reference Implementation

Full Crawls versus Differential Crawls

URL and Record Processing

Redundant URLs

Source Documents and Endeca

Property Name

Viewing all Properties Generated by CAS

Creating a Full Crawling Pipeline

Creating a Record Adapter to Read Documents

Creating a Record

Adding a RETRIEVE_URL

Converting Documents to Text

Identifying the Language of the Documents

Removing Document Body Properties

Modifying Records with a Perl Manipulator

Specifying Root URLs to Crawl

Creating a

Configuring URL Extraction

Example Syntax of URL Filters

Specifying a Record Source for the Spider

Specifying Proxy Servers

Specifying Timeouts

Removing any Unnecessary Records after a Crawl

Handling Crawling

Properties Generated by CAS

Formats Supported by ProFind

Chapter 2

Web Crawling with Authentication

Configuring Basic Authentication

KEY_RING Element

SITE Element

HOST Attribute

PORT Attribute

HTTP Element

REALM Element

KEY Element

Configuring HTTPS Authentication

Boot-Strapping Server Authentication

CA_DB

Disabling Server Authentication for a

HTTPS Element

AUTHENTICATE_HOST Attribute

Configuring Client Authentication

CERT Element

PATH Attribute

PRIV_KEY_PATH Attribute

Authenticating with a Microsoft Exchange Server

EXCHANGE_SERVER Element

Authenticating with a Proxy Server

PROXY Element

Using Forge to Encrypt Keys and Pass Phrases

Encrypting a Username/Password Pair

Encrypting a Pass Phrase

SECTION II

RECORD FEATURES

Chapter 3

Creating Aggregated Records

Aggregated Record Behavior

Enabling Record

Generating and Displaying Aggregated Records

Determining the Available Rollup Keys

Creating Aggregated Record Navigation Queries

Specifying the Rollup Key for the Navigation Query

Setting the Maximum Number of Returned Records

Creating Aggregated Record

Displaying Aggregated Records

Retrieving an Aggregated Record from an ENEQueryResults Object

Retrieving an Aggregated Record List from a Navigation Object

Displaying the Records in the Aggregated Record

Displaying Aggregated Record Attributes

Chapter 4

Using Derived Properties

Specifying Derived Properties

Displaying Derived Properties

Troubleshooting Derived Properties

Derived Property Performance

Chapter 5

Selecting a Record Set Based on a Key

 

About the Select Feature

Configuring the Select Feature

Using URL Query Parameters for Select

Selecting Keys in the Application

Using the Java Selection

Using the .NET Selection Property

Using the COM/Perl Selection Methods

Chapter 6

Bulk Export of Records

Configuring the Bulk Export Feature

Using URL Query Parameters for Bulk Export

Retrieving Bulk Records in the Application

Setting the Number of Bulk

Retrieving the Bulk-format Records

Using Java Bulk Export Methods

Using COM/Perl Bulk Export

Using .NET Bulk Export

Performance Impact for Bulk Export

Chapter 7

Record Filters

Record Filter Syntax

ENE Query Syntax

XML Syntax for File-based Record Filter Expressions

Enabling Properties for Use in Record

Data Configuration for File-based Filter

Record Filter Result

ENE URL Query Parameters for Record Filters

Sample Queries

Record Filter Performance Implications

Memory Cost

Expression Evaluation

Record Filters

Interaction with Spelling Auto-correction and Spelling Did You

Expression Evaluation

Memory Cost

SECTION III

DIMENSION FEATURES

Chapter 8

Using Inert Dimension Values

 

Configuring Inert Dimension Values

Using Inert Dimension Values in the Application

Sample Java Code for Inert Dimension Values

Sample .NET Code for Inert Dimension Values

Sample COM Code

Chapter 9

Working with Externally Created Dimensions

 

XML Requirements

XML Syntax to Specify Dimension Hierarchy

Example of Using Nested node Elements

Example of Using Parent Attributes

Example of Using Child Elements

Importing an Externally Created Dimension

Node ID Requirements

Chapter 10

Working with an Externally Managed Taxonomy

 

XSLT and XML Requirements

XSLT Mapping

XML Syntax to Specify Dimension Hierarchy

Example of Using Nested node Elements

Example of Using parent Attributes

Example of Using child Elements

Node ID Requirements and Identifier Management in Forge

Pipeline

Integrating an Externally Managed Taxonomy

Transforming an Externally Managed

Loading an Externally Managed Dimension

Running a Second Baseline

Updating an Externally Managed Taxonomy in Your

Chapter 11

Classifying Documents with Stratify

 

Sections of This

 
