Endeca Information Access Platform

Administrator's Guide

Copyright and Disclaimer


Product specifications are subject to change without notice and do not represent a commitment on the part of Endeca Technologies, Inc. The software described in this document is furnished under a license agreement. The software may not be reverse engineered, decompiled, or otherwise manipulated for purposes of obtaining the source code. The software may be used or copied only in accordance with the terms of the license agreement. It is against the law to copy the software on any medium except as specifically allowed in the license agreement.

No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, for any purpose without the express written permission of Endeca Technologies, Inc.

Copyright 2003-2007 Endeca Technologies, Inc. All rights reserved. Printed in USA.

Portions of this document and the software are subject to third-party rights, including:

Corda PopChart and Corda Builder Copyright 1996-2005 Corda Technologies, Inc.
Intelliseek Spider Copyright 2002-2005 Intelliseek, Inc. All rights reserved.
Outside In Search Export Copyright 1991, 2007 Oracle. All rights reserved.
Rosette Globalization Platform Copyright 2003-2005 Basis Technology Corp. All rights reserved.
Teragram Language Identification Software Copyright 1997-2005 Teragram Corporation. All rights reserved.

Trademarks
Endeca, the Endeca logo, Guided Navigation, Endeca The Next Generation of Search and Information Access, Find/Analyze/Understand, MDEX Engine, Endeca Latitude, Endeca Profind, Endeca Navigation Engine, and other Endeca product names referenced herein are registered trademarks or trademarks of Endeca Technologies, Inc. in the United States and other jurisdictions. All other product names, company names, marks, logos, and symbols are trademarks of their respective owners.

Endeca Administrator's Guide, November 2007

Contents
Preface
About this guide ... xxi
Who should use this guide ... xxi
Contacting Endeca Standard Customer Support ... xxi

SECTION I

ENDECA TOOLS

Chapter 1

Endeca Tools Overview
Endeca tools and tool components ... 26
Endeca Web Studio ... 26
Endeca Developer Studio ... 27
Endeca Application Controller ... 27

Chapter 2

Working with Developer Studio


Changing the Developer Studio configuration ... 30
Changing to another Web Studio ... 30
Specifying command options for Endeca components ... 31
Operational tasks in Developer Studio ... 31
Web Studio toolbar in Developer Studio ... 32
Sending a new instance configuration ... 32
Retrieving Web Studio's project configuration ... 33
Other Developer Studio tasks ... 33
Input paths and state in pipeline files ... 34
Pipeline paths in an Endeca Application Controller environment ... 34
Maintaining a user-defined state ... 34

Chapter 3

Managing System Operations with Web Studio


Accessing the EAC Administration Console of Web Studio ... 36
Hiding the list of applications on the Web Studio login page ... 37
Application-specific login pages ... 37
Provisioning an application using Web Studio ... 38
Breaking resource locks in Web Studio ... 38
Performing system operations ... 40
Running a baseline update from Web Studio ... 40
Starting and stopping the MDEX Engine ... 40
Starting and stopping the Log Server ... 40
Rolling Log Server logs ... 41
Backing up and restoring an Endeca project ... 41
Downloading an instance configuration ... 42
Monitoring the system status ... 43
Viewing component logs ... 43
Viewing system logs ... 43
Controlling system logs ... 44
Log file naming and rolling ... 44
Refreshing the status information ... 45
Changing Endeca HTTP service ports ... 45
Encoding of workflow e-mails in Web Studio ... 46

Chapter 4

Managing Users in Web Studio


Users, roles, and permissions in Web Studio ... 50
Web Studio predefined admin user ... 50
Web Studio user roles ... 51
Custom user roles in Web Studio ... 52
Role names and descriptions for multiple locales ... 53
Enabling custom roles in Web Studio ... 54
Disabling the admin role from modifying provisioning information ... 55
Assigning rule group permissions to Web Studio users ... 56
LDAP integration with Web Studio ... 56
Authentication of users in Web Studio with LDAP enabled ... 57
Troubleshooting user authentication in Web Studio with LDAP enabled ... 58
User profiles for LDAP users and groups ... 59
Roles and permissions for LDAP users and groups ... 59
Administrators in Web Studio with LDAP ... 60
Workflow notifications for LDAP users and groups ... 61
Enabling LDAP authentication in Web Studio ... 61
Disabling LDAP authentication for Web Studio ... 62
Configuration of the Webstudio login profile for LDAP ... 62
Specifying the location of the configuration file ... 63
Templates used in the Webstudio profile ... 64
Configuration parameters for the Webstudio profile ... 65
LDAP path parameters ... 68
Specifying multiple values for parameters in the Webstudio profile ... 69

Chapter 5

Customizing Web Studio


The navigation menu and launch page ... 72
Navigation menu nodes ... 72
Node titles for multiple locales ... 73
Predefined menu nodes in Web Studio ... 74
Navigation menu leaf items ... 74
Updating the Web Studio menu and launch page ... 76
Web Studio extensions ... 77
Configuration of extensions in Web Studio ... 77
Extension names and descriptions for multiple locales ... 79
Enabling extensions in Web Studio ... 80
URL tokens and Web Studio extensions ... 81
Token-based authentication for Web Studio extensions ... 82
Theming extensions to match Web Studio ... 85
Troubleshooting Web Studio extensions ... 86

Chapter 6

Setting Up the Preview Application for Web Studio


Preview application overview ... 90
Preview application requirements ... 90
Instrumenting your application ... 92
Instrumenting the navigation results page ... 93
Instrumenting the record page ... 94
Configuring the preview application ... 94
Using pre-existing applications ... 95
Enabling and disabling the display of the preview application ... 95

Chapter 7

Configuring Logging and Reporting


About logging and reporting ... 98
Before you begin ... 98
About the Log Server ... 99
About the Report Generator ... 99
Report details ... 100
Implementing logging and reporting in Web Studio ... 100
Configuring and running each component individually ... 101
Provisioning and starting the Log Server ... 101
Provisioning and starting the Report Generator ... 103
Provisioning the Report Generator to run in French ... 107
Specifying report frequency ... 107
Automatically scheduling report generation ... 108
After the Report Generator completes its run ... 109
Running the report generation script ... 109
About the report generation script ... 109
High-level workflow for using the report generation script ... 111
Viewing reports in Web Studio ... 113
Additional report generation tasks ... 114
Configuring report contents and format ... 115
Customizing the report generation file ... 115
Generating HTML reports ... 115
Generating HTML reports ... 115

Viewing HTML reports ... 116
Viewing reports produced by other Report Generators ... 116
Archiving and deleting log files and reports ... 116

Chapter 8

Configuring the Endeca Standard Application


About the Endeca Standard Application ... 120
Display features ... 121
Standard Application installation scenarios ... 121
Accessing the Standard Application ... 122
Configuring the Standard Application ... 123
Installing the Standard Application on Tomcat ... 126
Enabling SSL for the MDEX Engine ... 127
Adding the SSL environment entry to the endeca_standard.xml file ... 128
Creating a JKS-format certificate ... 128
Starting Tomcat with the JKS certificate ... 129
Enabling user authentication for the Standard Application ... 129
Configuring JAAS on the application server ... 130
Adding the user authentication entry to the server.xml file ... 130
Setting up the login configuration file ... 131
Configuring record permissions ... 132
Logging in with the Standard Application ... 132
Installing the Standard Application on WebLogic ... 133

SECTION II

ADMINISTERING APPLICATION CONTROLLER ENVIRONMENTS

Chapter 9

About the Endeca Application Controller
About the Endeca Application Controller ... 138
Architecture of the Application Controller ... 138

Chapter 10

Using the Application Controller


Installing the Application Controller ... 144
Enabling SSL security on the Application Controller ... 144
Specifying the EAC Central Server in Web Studio ... 145
Starting and stopping the Application Controller directly ... 145
Starting and stopping the Application Controller on UNIX ... 145
Starting the Application Controller from inittab ... 145
Starting and stopping the Application Controller on Windows ... 146
Using the eac.properties file ... 146
Setting the Copy utility's temporary directory ... 146
Ensuring clean component shutdown ... 147
Managing server restarts ... 147
Modifying Application Controller logging levels ... 147

Chapter 11

Provisioning an Implementation with the Endeca Application Controller


Provisioning overview ... 150
About the provisioning file and schema ... 150
Invalid characters in provisioning ... 151
Defining the root Application element ... 151
Defining hosts ... 152
Aliasing hosts with host-id ... 152
Provisioning directories on a host ... 153
Defining components in your provisioning file ... 153
Using XML entities in your provisioning file ... 153
Adding properties to hosts and components ... 154
Defining scripts in your provisioning file ... 154
Developing and maintaining scripts ... 155
Script reference implementations ... 155
Script environment variables ... 156
Provisioning scripts ... 156
Using canonical paths in an application ... 157

Component reference ... 158
Forge ... 158
Dgidx ... 161
Dgraph ... 165
Agidx ... 168
Agraph ... 170
Crawler ... 174
LogServer ... 178
ReportGenerator ... 179
Provisioning your implementation with eaccmd ... 183
Provisioning the Application Controller to work on multiple machines ... 183
Multiple machine example ... 183
Forcing the removal of an application ... 185
Incremental provisioning ... 185
Incremental provisioning guidelines ... 186
About the def_file setting ... 186
About the --force flag ... 187
Adding, removing, or updating a component ... 187
Adding, removing, or updating a host ... 188
Adding, removing, or updating a script ... 189
Provisioning your deployment with Endeca Deployment Template ... 189
Downloading the Endeca Deployment Template ... 190
Using the Endeca Deployment Template ... 190

Chapter 12

Using the Eaccmd Tool


About eaccmd ... 192
Running eaccmd ... 192
Eaccmd feedback ... 192
Component and utility status verbosity ... 193
Server component status verbosity ... 193
Batch component status verbosity ... 193
Using a default host and port ... 193
Eaccmd usage ... 194
Eaccmd command reference ... 195
Provisioning commands ... 195
Provisioning example ... 196
Incremental provisioning commands ... 196
Incremental provisioning example ... 199
Synchronization commands ... 200
About the Synchronization service ... 200
Synchronization examples ... 201
Component and script control commands ... 201
Component control example ... 201
Utility commands ... 202
General notes on Application Controller utilities ... 202
The List Directory Contents (ls) command ... 203
The Shell utility ... 204
The Copy utility ... 205
The Archive utility ... 212

Chapter 11

Endeca Application Controller API Interface Reference


Using the Application Controller WSDL ... 218
Simple types in the Application Controller WSDL ... 218
ComponentControl interface ... 219
ComponentControl methods ... 219
startComponent(FullyQualifiedComponentIDType startComponentInput) ... 219
stopComponent(FullyQualifiedComponentIDType stopComponentInput) ... 219
getComponentStatus(FullyQualifiedComponentIDType getComponentStatusInput) ... 220
Synchronization interface ... 220
Synchronization methods ... 221
setFlag(FullyQualifiedFlagIDType setFlagInput) ... 221
removeFlag(FullyQualifiedFlagIDType removeFlagInput) ... 221

removeAllFlags(IDType removeAllFlagsInput) ... 222
listFlags(IDType listFlagsInput) ... 222
Utility interface ... 223
Utility methods ... 223
startBackup(RunBackupType startBackupInput) ... 223
startFileCopy(RunFileCopyType startFileCopyInput) ... 224
startRollback(RunRollbackType startRollbackInput) ... 225
startShell(RunShellType startShellInput) ... 226
stop(FullyQualifiedUtilityTokenType) ... 227
getStatus(String applicationID, String token) ... 227
listDirectoryContents(ListDirectoryContentsInputType listDirectoryContentsInput) ... 228
Provisioning interface ... 229
Provisioning methods ... 229
defineApplication(ApplicationType application) ... 229
getApplication(IDType getApplicationInput) ... 230
getCanonicalApplication(IDType getCanonicalApplicationInput) ... 230
listApplicationIDs(listApplicationIDsInput) ... 231
removeApplication(RemoveApplicationType removeApplicationInput) ... 231
addComponent(AddComponentType addComponentInput) ... 231
removeComponent(RemoveComponentType removeComponentInput) ... 232
updateComponent(UpdateComponentType updateComponentInput) ... 233
addHost(AddHostType addHostInput) ... 234
updateScript(UpdateScriptType updateScriptInput) ... 235
removeHost(RemoveHostType removeHostInput) ... 235
updateHost(UpdateHostType updateHostInput) ... 236
addScript(AddScriptType addScriptInput) ... 237
removeScript(RemoveScriptType removeScriptInput) ... 237
ScriptControl interface ... 238

ScriptControl methods ... 238
startScript(FullyQualifiedScriptIDType startScriptInput) ... 238
stopScript(FullyQualifiedScriptIDType stopScriptInput) ... 239
getScriptStatus(FullyQualifiedScriptIDType getScriptStatusInput) ... 239

Chapter 12

Endeca Application Controller API Class Reference


AddComponentType class ... 242
AddComponentType properties ... 242
AddHostType class ... 242
AddHostType properties ... 242
AddScriptType class ... 243
AddScriptType properties ... 243
AgidxComponentType class ... 243
AgidxComponentType properties ... 243
AgraphChildListType class ... 244
AgraphChildListType properties ... 244
AgraphComponentType class ... 244
AgraphComponentType properties ... 245
ApplicationIDListType class ... 245
ApplicationIDListType properties ... 245
ApplicationType class ... 246
ApplicationType properties ... 246
BackupMethodType class ... 246
BackupMethodType fields ... 246
BatchStatusType class ... 247
BatchStatusType properties ... 247
ComponentListType class ... 247
ComponentListType properties ... 248
ComponentType class ... 248
ComponentType properties ... 248
CrawlerComponentType class ... 248
CrawlerComponentType properties ... 249

DgidxComponentType class ... 250
DgidxComponentType properties ... 250
DgraphComponentType class ... 251
DgraphComponentType properties ... 251
DgraphHostPortType class ... 252
DgraphHostPortType properties ... 252
DgraphReferenceType class ... 252
DgraphReferenceType properties ... 253
DirectoryListType class ... 253
DirectoryListType property ... 253
DirectoryType class ... 253
DirectoryType properties ... 253
EACFault class ... 253
EACFault property ... 254
FilePathListType ... 254
FilePathListType property ... 254
FilePathType ... 254
FilePathType properties ... 254
FlagIDListType class ... 254
FlagIDListType property ... 255
ForgeComponentType class ... 255
ForgeComponentType properties ... 255
FullyQualifiedComponentIDType class ... 256
FullyQualifiedComponentIDType properties ... 256
FullyQualifiedFlagIDType class ... 256
FullyQualifiedFlagIDType properties ... 256
FullyQualifiedHostIDType class ... 256
FullyQualifiedHostIDType properties ... 257
FullyQualifiedScriptIDType class ... 257
FullyQualifiedScriptIDType properties ... 257
FullyQualifiedUtilityTokenType class ... 257
FullyQualifiedUtilityTokenType properties ... 257
HostListType class ... 257

HostListType property ... 258
HostType class ... 258
HostType properties ... 258
ListApplicationIDsInput class ... 258
ListDirectoryContentsInputType class ... 258
ListDirectoryContentsInputType properties ... 259
LogServerComponentType class ... 259
LogServerComponentType properties ... 259
PropertyListType class ... 260
PropertyListType property ... 260
PropertyType class ... 260
PropertyType properties ... 260
ProvisioningFault class ... 260
ProvisioningFault properties ... 260
RemoveApplicationType class ... 261
RemoveApplicationType properties ... 261
RemoveComponentType class ... 261
RemoveComponentType properties ... 261
RemoveHostType class ... 261
RemoveHostType properties ... 262
RemoveScriptType class ... 262
RemoveScriptType properties ... 262
ReportGeneratorComponentType class ... 262
ReportGeneratorComponentType properties ... 262
RunBackupType class ... 264
RunBackupType properties ... 264
RunFileCopyType class ... 265
RunFileCopyType properties ... 265
RunRollbackType class ... 265
RunRollbackType properties ... 265
RunShellType class ... 266
RunShellType properties ... 266
RunUtilityType class ... 266

RunUtilityType properties ... 267
ScriptListType class ... 267
ScriptListType properties ... 267
ScriptType class ... 267
ScriptType properties ... 267
SSLConfigurationType class ... 268
SSLConfigurationType properties ... 268
StateType class ... 268
StateType fields ... 268
StatusType class ... 269
StatusType properties ... 269
TimeRangeType class ... 269
TimeRangeType fields ... 270
TimeSeriesType class ... 270
TimeSeriesType fields ... 270
UpdateComponentType class ... 270
UpdateComponentType properties ... 270
UpdateHostType class ... 271
UpdateHostType properties ... 271
UpdateScriptType class ... 271
UpdateScriptType properties ... 271

SECTION III
TRANSFERRING IMPLEMENTATIONS BETWEEN ENVIRONMENTS

Chapter 13
Transferring Endeca Implementations Between Environments
About transferring your front-end Web application ..... 276
Transferring implementations using the tools ..... 276
Retrieving the Web Studio instance configuration with Developer Studio ..... 276
Transferring implementations using the emgr_update utility ..... 277
emgr_update syntax ..... 277
Using emgr_update to transfer from a Web Studio staging environment to a Web Studio production environment ..... 282
Transferring all instance configuration files ..... 282
Transferring only instance configuration files modified by Web Studio ..... 283
Using emgr_update to transfer from one Web Studio environment to another ..... 284
Using emgr_update to remove instance configuration files from Web Studio ..... 285
Using emgr_update to send the dimensions file produced by Forge to the Web Studio ..... 286
Removing an application from Endeca IAP ..... 287

SECTION IV
TUNING ENDECA IMPLEMENTATIONS

Chapter 14
The MDEX Engine Request Log
About the MDEX Engine request log ..... 292
Request log file format ..... 292
Extracting information from request logs ..... 294
Request log rolling ..... 295
URL parameter mapping ..... 296

Chapter 15
The Eneperf Tool
About Eneperf ..... 302
Using Eneperf ..... 302
Required settings ..... 303
Host and port settings ..... 303
Log file settings ..... 304
Setting the number of connections and iterations ..... 304
Optional settings ..... 305
Generating incremental statistics ..... 308
Setting the number of queries sent to the Dgraph ..... 308
Obtaining logs for use with Eneperf ..... 309
Converting a MDEX Engine request log file ..... 309
Creating a log file by hand using substitute search terms ..... 309
Debugging Eneperf ..... 310

Chapter 16
MDEX Engine Statistics and Auditing
About the MDEX Engine Statistics page ..... 312
Viewing the MDEX Engine Statistics page ..... 312
Sections of the MDEX Engine Statistics page ..... 313
The Performance Summary tab ..... 313
The General Information tab ..... 314
The Index Preparation tab ..... 314
The Cache tab ..... 314
The Details tab ..... 315
Checking the aliveness of a Dgraph or Agraph ..... 315
About the Endeca MDEX Engine Auditing page ..... 316
Viewing the MDEX Engine Auditing page ..... 316
Audit persistence file details ..... 317
Sections of the MDEX Engine Auditing page ..... 317
The Audit Stats tab ..... 318
The General Information tab ..... 319

Chapter 17
The Forge Logging System
Overview of the Forge logging system ..... 322
About log levels ..... 322
Logging topics ..... 322
Forge logging topics ..... 323
The command line interface ..... 323
Aliasing existing -v levels ..... 325
Logging output to a file ..... 326
Changes to the EDF_LOG_LEVEL environment variable ..... 326


Chapter 18
The Forge Metrics Web Service
About the Forge metrics Web service ..... 328
Enabling Forge metrics ..... 329
Enabling SSL security ..... 330
Using Forge metrics ..... 330
The MetricsService interface ..... 330
getMetric method ..... 331
getMetric(MetricInputType getMetricInput) ..... 331
MetricsService classes ..... 331
MetricType ..... 331
MetricListType ..... 332
MetricInputType ..... 332
MetricResultType ..... 332
AttributeType ..... 332
MeasurementType ..... 333

Chapter 19
Useful Third-Party Tools
Cross-platform tools ..... 336
Solaris and Linux tools ..... 337
Solaris-specific tools ..... 338
Linux-specific tools ..... 339
Windows tools ..... 339

SECTION V
APPENDICES

Appendix A
Endeca Flag Reference
Agidx options ..... 344
Agraph options ..... 345
Dgidx options ..... 348
Dgraph options ..... 352
Forge options ..... 366


Appendix B
Endeca IAP Ports
Default Ports ..... 376

Appendix C
About the Baseline Update Script
Editing the baseline update script ..... 378
Prerequisites for using the baseline update script ..... 378
Running the baseline update script ..... 379

Index


Preface
The Endeca Information Access Platform is the foundation for building applications based on Endeca MDEX Engine technology. With the Endeca Information Access Platform, you can build Guided Navigation functionality into your Web applications. The Endeca Guided Navigation solution puts the results of all navigation, search, and analytic queries in an organized context that shows users how to refine and explore further. This helps solve the problems associated with information overload by guiding users as they quickly and precisely navigate through large data sets.

About this guide


This guide describes the tasks involved in the configuration and administration of an Endeca implementation running in an Endeca Control System environment.

Who should use this guide


This guide is intended for system administrators and others who are managing the day-to-day operations of the Endeca Information Access Platform. It may also be of interest to developers while they are deploying an Endeca implementation.

Contacting Endeca Standard Customer Support


You can contact Endeca Standard Customer Support through the online Endeca Support Center (https://support.endeca.com).


The Endeca Support Center provides registered users with important information regarding Endeca software, implementation questions, product and solution help, training and professional services consultation as well as overall news and updates from Endeca.

Administrators Guide Preface

Endeca Confidential

SECTION I
Endeca Tools


Chapter 1

Endeca Tools Overview


This chapter provides an overview of the administrative aspects of the Endeca tools. For a general introduction to the broad capabilities, usage, and workflow of Endeca tools, see the Endeca Getting Started Guide. This chapter includes the following sections:

- Endeca tools and tool components
- Endeca Web Studio
- Endeca Developer Studio
- Endeca Application Controller


Endeca tools and tool components


The Endeca software includes the following tools and tool components that help you configure and administer your Endeca implementation:

- Endeca Web Studio contains configuration and administrative functionality for system administrators as well as business logic functionality for business users. Web Studio provides the primary means to administer your Endeca implementation in a Tools environment.
- Endeca Developer Studio facilitates the entire process of data pipeline development and some aspects of configuration of an Endeca application.
- Endeca Application Controller (EAC) controls access to and use of an Endeca implementation.

Endeca Web Studio


Endeca Web Studio is a Web-based tool intended for business users and system administrators. For business user information, see the Endeca Business Users Guide. With Web Studio, system administrators can perform any of the following tasks:

- Provision the hosts available to an Endeca implementation.
- Provision the applications available to an Endeca implementation.
- Provision scripts, such as the report generator script or a baseline update script, for an Endeca implementation.
- Configure SSL settings and report generation, and set up a preview application for dynamic business rule testing.
- Perform system operations such as running baseline updates or starting and stopping the MDEX Engine or Log Server.
- Monitor the status of system components such as Forge, Dgidx, MDEX Engine, Log Server, and Report Generator.


Web Studio and Developer Studio require the Endeca Application Controller (EAC) to control and communicate with other components and hosts in an Endeca implementation.

Endeca Developer Studio


Endeca Developer Studio is a Windows application that developers can use to create and edit an instance configuration. Logically, an instance configuration consists of the following:

- A pipeline diagram, which serves as a visual script for the entire data transformation process.
- A dimension hierarchy, which provides the dimension names and IDs that are needed to map your source data properties to Endeca dimensions.
- An index configuration, which defines how your Endeca records, Endeca properties, dimensions, and dimension values are indexed by Dgidx.

From a file perspective, an instance configuration is represented by a number of XML files that Developer Studio generates. For more information about the instance configuration see the Endeca Getting Started Guide.

Endeca Application Controller


The Endeca Application Controller (EAC) is the interface you use to control, manage, and monitor your Endeca implementations. It provides the infrastructure to support Endeca projects from design through deployment and runtime. For a detailed overview, see the Architecture of the Application Controller on page 138.


Chapter 2

Working with Developer Studio


This chapter describes how to work with Developer Studio. Among the tasks you can perform in Developer Studio are starting and stopping an Endeca MDEX Engine and checking application status. This chapter includes the following sections:

- Changing the Developer Studio configuration
- Operational tasks in Developer Studio
- Other Developer Studio tasks
- Input paths and state in pipeline files


Changing the Developer Studio configuration


The two reconfiguration tasks in Developer Studio are:

- Changing the Developer Studio connection to use a different Endeca Web Studio.
- Specifying command options for Endeca components.

Both tasks are described in the next two sections.

Changing to another Web Studio


If you have more than one Web Studio running in your implementation, you can change the connection from one instance of Web Studio to another one. Note that both Web Studios must be the same software version.

To change the connection between Developer Studio and Web Studio:


1 Make sure that the target Web Studio is running and has been provisioned to use the components for this Endeca application.
2 Start Developer Studio and open your project.
3 From the Tools menu, choose Web Studio Settings. The Endeca Web Studio Settings dialog box appears, showing the current settings for the host and port of the machine that is running Web Studio.
4 In the Host field, enter the name or IP address of the host on which the target Web Studio is running.
5 In the Port field, enter the port on which the target Web Studio is listening.
6 In the Application field, select an application to associate the Developer Studio project with.
7 Click OK.
8 Save the project.
9 Click Set Instance Configuration.

Developer Studio sends the instance configuration to the target Web Studio.

Specifying command options for Endeca components


You can specify command line options for Endeca programs such as Forge, Dgidx, and the Dgraph in any of the following ways:

- On the EAC Administration Console page of Web Studio.
- In the command line interface to EAC (with eaccmd).
- In the WSDL API to EAC.

The Forge, Dgidx, and Dgraph options that you can specify are listed in Endeca Flag Reference on page 343.

To specify command options on the EAC Administration Console page of Web Studio:
1 Start Web Studio and log in to your application.
2 Select the EAC Administration Console page.
3 For the component you want to modify, enter your options exactly as you would on the command line.
4 Click Save Changes.
5 If you are only sending command options to the Endeca MDEX Engine, stop and restart the MDEX Engine so that the new options take effect. Options for Forge and Dgidx do not take effect until you run a baseline update.

Operational tasks in Developer Studio


In addition to developing data pipelines, you use Developer Studio to perform the following operational tasks:


- Sending a new instance configuration.
- Retrieving your Endeca application's most recent instance configuration.

Web Studio toolbar in Developer Studio


Developer Studio interacts with Web Studio primarily via the Web Studio toolbar. The following table describes the buttons in the Web Studio toolbar.

Button name                   Description
Open Web Studio               Opens Web Studio, allowing access to the EAC Administration Console.
Get Instance Configuration    Retrieves Web Studio's currently loaded instance configuration.
Set Instance Configuration    Sends the instance configuration for the current project to Web Studio.

Sending a new instance configuration


Whenever you change any part of the instance configuration, you should send the new information to Web Studio.

To send your instance configuration to Web Studio:

1 Start Developer Studio and open your project.
2 Make any changes to the instance configuration using the facilities in the Developer Studio.
3 From the File menu, choose Save to save any changes.
4 On the Web Studio toolbar, click Set Instance Configuration.


The new instance configuration is sent to Web Studio. The new configuration does not take effect until you run a baseline update.

Retrieving Web Studio's project configuration


You can retrieve the current project instance configuration from Web Studio. Retrieving the current configuration is generally part of the workflow for migrating from a staging environment to a production environment that uses the Control System. This workflow is described in detail in Chapter 13, Transferring Endeca Implementations Between Environments.

To retrieve Web Studio's instance configuration:

1 Start Developer Studio and open a new project.
2 From the Tools menu, choose Web Studio. The Web Studio Settings dialog box appears.
3 Make sure the hostname and port that are specified correspond with the application whose information you want to retrieve and click OK.
4 In the Web Studio toolbar, click Get Instance Configuration.
5 From the File menu, choose Save to save the new project with the instance configuration retrieved from Web Studio.

Other Developer Studio tasks


Developer Studio can do many more tasks than the operational tasks described in this chapter. See the Endeca Developer Studio Help for complete information on the pipeline development and instance configuration tasks that you can perform in Developer Studio.


Input paths and state in pipeline files


The following sections provide brief notes on paths and user-defined state in the Pipeline.epx file.

Pipeline paths in an Endeca Application Controller environment


When using a Pipeline.epx file in an environment controlled by the Endeca Application Controller, pipeline paths for incoming data are treated as follows:

- Relative paths in the pipeline are resolved in relation to the directory path you entered in Web Studio, in the Incoming Directory field for hosts, components, and scripts on the EAC Administration page.
- Absolute paths are not changed and are used exactly as specified.
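The resolution rule above can be illustrated with a minimal Python sketch. It is not Endeca code: the function name resolve_pipeline_path and the example directories are hypothetical, and UNIX-style paths are assumed.

```python
from pathlib import PurePosixPath


def resolve_pipeline_path(pipeline_path: str, incoming_dir: str) -> str:
    """Apply the rule described in the text to one incoming-data path.

    incoming_dir stands for the value of the Incoming Directory field.
    Both names are hypothetical; this only models the documented behavior.
    """
    if PurePosixPath(pipeline_path).is_absolute():
        # Absolute paths are used exactly as specified.
        return pipeline_path
    # Relative paths are resolved against the Incoming Directory.
    return str(PurePosixPath(incoming_dir) / pipeline_path)


print(resolve_pipeline_path("data/products.txt", "/apps/wine/incoming"))
print(resolve_pipeline_path("/mnt/feeds/products.txt", "/apps/wine/incoming"))
```

On Windows hosts the same idea would apply with PureWindowsPath in place of PurePosixPath.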

Maintaining a user-defined state


You can use record adapter components in your pipeline to maintain user-defined state. In this scenario, your pipeline incorporates an output record adapter that writes out data that is read in by an input record adapter the next time you run the pipeline. If you use record adapters in this way, you must check Maintain state on the General tab of the Record Adapter editor for both the input and output record adapters.


Chapter 3

Managing System Operations with Web Studio


This chapter describes the system administration and maintenance tasks available in the Administration section of Web Studio. The chapter includes the following sections:

- Accessing the EAC Administration Console of Web Studio
- Provisioning an application using Web Studio
- Breaking resource locks in Web Studio
- Performing system operations
- Monitoring the system status
- Changing Endeca HTTP service ports
- Encoding of workflow e-mails in Web Studio


Accessing the EAC Administration Console of Web Studio


The EAC Administration Console page provides a way for administrators to establish and modify system provisioning, start and stop system components, and run EAC scripts. The Administration Console of Web Studio is divided into three sections:

- Hosts shows a view of your application organized by the hosts you provision. This view indicates the host name, host alias, port, and configuration options. You can modify the host's configuration options, start or stop a component on a host, and see the status of a component on a host.
- Components shows a view of your application organized by the Endeca components provisioned for an application. You can create components on this tab but not hosts.
- Scripts shows the EAC scripts available to an application and allows you to add, remove, run, and monitor EAC scripts. You can stop and start system operations run by EAC scripts, such as baseline updates.

To access the Administration Console of Web Studio:


1 In the Address box of your Web browser, enter the following URL:
  http://WebStudioHost:8888
  Note: If you used a different HTTP Connector port when you configured Web Studio, substitute that port's number for 8888.
2 At the Web Studio login page, do the following:
  a Type the name and password for the user with the admin role. The default username and password are both admin.
  b If you have an application provisioned, select the application to access. An admin user can also log in to Web Studio without any applications provisioned in the system.
  c Click log in.
3 In the navigation menu, click EAC Administration.


Hiding the list of applications on the Web Studio login page


By default, Web Studio shows all available Endeca applications in a drop-down list on the login page. Business users can log in to any application that an administrator has added them to. In cases where you do not want Web Studio users to see all available Web Studio applications on the login page, you can hide the drop-down list of applications displayed in Web Studio.

To hide the drop-down list of applications on the login page of Web Studio:
1 Stop the Endeca HTTP service.
2 Open the webstudio.properties file located in %ENDECA_CONF%\conf (on Windows) or $ENDECA_CONF/conf (on UNIX).
3 Locate the com.endeca.webstudio.hide.login.application.dropdown property, for example:
  # Hides the dropdown for selecting an application
  # on the login page.
  com.endeca.webstudio.hide.login.application.dropdown=false
4 Set the value of the property to true, for example:
  com.endeca.webstudio.hide.login.application.dropdown=true
5 Save and close the webstudio.properties file.
6 Start the Endeca HTTP service.
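If you prefer to script the edit in steps 3 and 4, the following Python sketch shows one possible approach. It is an illustration only: the helper name set_hide_dropdown is hypothetical, and the sketch assumes the file uses simple key=value lines with # comments, as in the example above. Stopping and starting the Endeca HTTP service is still done separately.

```python
PROPERTY = "com.endeca.webstudio.hide.login.application.dropdown"


def set_hide_dropdown(properties_text: str, hide: bool) -> str:
    """Return the properties text with the dropdown property set.

    Hypothetical helper; it rewrites only the one property line and
    leaves comments and all other settings untouched.
    """
    new_line = f"{PROPERTY}={'true' if hide else 'false'}"
    lines = properties_text.splitlines()
    replaced = False
    for index, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # leave comment lines as they are
        key, separator, _ = stripped.partition("=")
        if separator and key.strip() == PROPERTY:
            lines[index] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)  # property was absent, so add it
    return "\n".join(lines) + "\n"
```

A typical use would read webstudio.properties with pathlib.Path.read_text(), pass the text through set_hide_dropdown(text, True), and write the result back with Path.write_text() while the Endeca HTTP service is stopped.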

Application-specific login pages


The URL for an application-specific login page is http://WebStudioHost:8888/login/AppName. The value of AppName is the name you provided when creating the application using Web Studio (or eaccmd or the custom Web services interface). For example, if you created an application named wine on localhost, the URL is http://localhost:8888/login/wine.
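As a small illustration of the URL pattern, the Python sketch below builds an application-specific login URL. The function name login_url is hypothetical, and percent-encoding the application name is a precaution assumed here, not documented Web Studio behavior.

```python
from urllib.parse import quote


def login_url(host: str, app_name: str, port: int = 8888) -> str:
    """Build http://<host>:<port>/login/<AppName> (hypothetical helper).

    8888 is the default port mentioned in the text; pass another value
    if Web Studio was configured with a different HTTP Connector port.
    """
    return f"http://{host}:{port}/login/{quote(app_name, safe='')}"


print(login_url("localhost", "wine"))
```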


Provisioning an application using Web Studio


Provisioning is the task of defining the location and configuration of the Endeca resources (such as Forge and the MDEX Engine) that control your Endeca application. You can use the EAC Administration Console page to provision an Endeca application.

Note: Endeca recommends using the Endeca Deployment Template to provision your application. For more information about the Endeca Deployment Template, see Using the Endeca Deployment Template on page 190.

To provision an application:
1 Using a Web browser, log in to Endeca Web Studio, as described in Accessing the EAC Administration Console of Web Studio on page 36.
2 Add one or more applications. The procedure to add or remove an application is described in the Endeca Web Studio Help.
3 Add one or more hosts. The procedure to add or remove a host is described in the Endeca Web Studio Help.
4 Add Endeca components to the hosts. You should set one Forge, and at least one Indexer (Dgidx) and one MDEX Engine (Dgraph). The procedure to add components is described in the Endeca Web Studio Help.
5 Add EAC scripts. The procedure to add EAC scripts is described in the Endeca Web Studio Help.

Breaking resource locks in Web Studio


On the Resource Locks page of Web Studio, an administrator can view or break resource locks that users have acquired during their Web Studio session. A resource corresponds to a page in Web Studio, such as the Thesaurus page, Rule Manager page, or a rule group on the Rule Manager page. A user acquires a resource lock by selecting a page, rule group, or redirect group that the user has permission to access. There is no limit to the number of resources a user may lock during a session.


While one user has a resource locked, no other user can select the resource without getting an error such as "This component is currently in use by another application or user." Resource locking protects a project from multiple users making conflicting changes at the same time.

Not all pages (resources) in the navigation pane of Web Studio can be locked. Web Studio locks the following pages when a user selects them: Thesaurus page, Rule Manager page, Phrases page, Stop Words page, and Dimension Order page. In addition, if an application uses rule groups on the Rule Manager page or redirect groups on the Redirect List page, then Web Studio treats each group as a separate resource and locks the group when a user selects it. The Preview App Settings page and View Reports page are not locked if a user selects them.

Web Studio releases a resource lock in the following ways:

- When a user logs out by clicking the Logout link.
- When a user closes his or her Web browser. Web Studio logs the user out approximately one minute after the browser closes. Note: If multiple browser windows are open with the same user login, the lock is released only after the last window is closed.
- When Web Studio ends a user's session by timing out. Web Studio ends a session after 20 minutes of inactivity.
- When an administrator breaks a resource lock on the Resource Locks page.
- When a user clicks a rule group on the Rule Manager page or clicks a keyword redirect group on the Redirect List page. Each rule group or redirect group is locked individually and the lock is broken individually when a user selects a different group.
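The locking behavior described above can be pictured with a toy model. The Python sketch below is not how Web Studio is implemented; the class LockTable and its methods are hypothetical, and the model covers only exclusive ownership, release of all of a user's locks (on logout or when an administrator breaks them), and the 20-minute inactivity timeout. Per-group lock switching and the browser-close delay are left out.

```python
from dataclasses import dataclass, field
from typing import Dict

SESSION_TIMEOUT_SECONDS = 20 * 60  # 20 minutes of inactivity, per the text


@dataclass
class LockTable:
    """Toy model of resource locks (hypothetical, for illustration only)."""
    owners: Dict[str, str] = field(default_factory=dict)           # resource -> user
    last_activity: Dict[str, float] = field(default_factory=dict)  # user -> seconds

    def acquire(self, resource: str, user: str, now: float) -> bool:
        """Lock a resource for a user; fail if another user holds it."""
        self.expire(now)
        owner = self.owners.get(resource)
        if owner is not None and owner != user:
            return False  # "currently in use by another application or user"
        self.owners[resource] = user  # no limit on locks per user
        self.last_activity[user] = now
        return True

    def release_user(self, user: str) -> None:
        """Release every lock a user holds (logout or administrator break)."""
        self.owners = {r: u for r, u in self.owners.items() if u != user}
        self.last_activity.pop(user, None)

    def expire(self, now: float) -> None:
        """End sessions that have been idle for the timeout period."""
        idle = [u for u, t in self.last_activity.items()
                if now - t >= SESSION_TIMEOUT_SECONDS]
        for user in idle:
            self.release_user(user)
```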

To break a resource lock:


1 On the Resource Locks page, click the delete icon associated with the user whose resource locks you want to release.
2 Click OK.

Note: Breaking the resource locks causes that user to lose any unsaved changes.


Performing system operations


System operations include running updates, starting and stopping Endeca components, backing up projects, and so on. The Scripts tab of the EAC Administration Console is where you run baseline updates that control Endeca components such as Forge, Dgidx, the MDEX Engine (both Dgraph and Agraph), the Report Generator, and the Log Server. The Hosts and Components tabs are where you run individual Endeca components. For information on other system operations, such as transferring your instance configuration from a staging environment to a production environment and using the emgr_update utility, see Transferring Endeca Implementations Between Environments on page 275.

Running a baseline update from Web Studio


A baseline update completely rebuilds the Endeca application, including running Forge on the source data, running Dgidx to produce the Endeca records and indices, and starting one or more MDEX Engines with the new indices. See the Endeca Web Studio Help for the process to run a baseline update using your EAC script.

Starting and stopping the MDEX Engine


If the status of the MDEX Engine is Running, the Start link (next to the MDEX Engine label) is disabled and only the Stop link is available. If the status is Stopped, only the Start link can be used. When you start an MDEX Engine, it starts with any options that you specified in the Arguments field of the component configuration. See the Endeca Web Studio Help for the process to start and stop the MDEX Engine.

Starting and stopping the Log Server


If the status of the Log Server is Running, the Start link (next to the Log Server label) is disabled and only the Stop link is available. If the status is Pending or Stopped, only the Start link can be used. See the Endeca Web Studio Help for the process to start and stop the Log Server.


Rolling Log Server logs


From the Administration page, you cannot roll the logs created by the Log Server. However, you can roll the logs with this URL command:
http://logserverhost:logserverport/roll

For example, this command:


http://web002:8002/roll

rolls the Log Server that is running on port 8002 on the host named web002.
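If you want to issue the roll command from a script instead of a browser, a minimal Python sketch follows. The function names roll_url and roll_log_server are hypothetical, and the source does not document what the Log Server returns, so the sketch reports only the HTTP status code.

```python
from urllib.request import urlopen


def roll_url(host: str, port: int) -> str:
    """Build the roll URL in the form shown in the text."""
    return f"http://{host}:{port}/roll"


def roll_log_server(host: str, port: int, timeout: float = 10.0) -> int:
    """Send the roll request and return the HTTP status code.

    Hypothetical helper; raises urllib.error.URLError if the Log Server
    cannot be reached.
    """
    with urlopen(roll_url(host, port), timeout=timeout) as response:
        return response.status


print(roll_url("web002", 8002))
```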

Backing up and restoring an Endeca project


The backup process allows you to take a snapshot of your projects, including users, rule groups, and permissions data. This process does not include the provisioning information for an application. For backup purposes, an Endeca project is composed of two pieces:

- Instance configuration: the Endeca project files for all the applications managed by the same instance of Web Studio.
- Web Studio store: a directory that contains a database of users, rule groups, and associated permission information.

Together, the instance configuration and the Web Studio store are the backup. The two are a snapshot of your projects and all their associated user and permission information.

To back up a project:
1 Stop the Endeca HTTP service.
2 Copy the webstudiostore directory, including all its subdirectories, from the %ENDECA_CONF%\state\ directory (on Windows) or $ENDECA_CONF/state/ (on UNIX) to another location.
  Note: Recall that the default location of %ENDECA_CONF% on Windows is C:\Endeca\MDEXEngine\workspace.
3 Copy the emanager directory, including all its subdirectories, from the %ENDECA_CONF%\state\ directory (on Windows) or $ENDECA_CONF/state/ (on UNIX) to another location.
4 In 5.1.3 or later, back up your Web Studio customization files:
  a Navigate to the %ENDECA_CONF%\conf directory (on Windows) or $ENDECA_CONF/conf (on UNIX).
  b Copy ws-extensions.xml, ws-mainMenu.xml, and ws-roles.xml to another location.
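The copy steps of the backup procedure can be scripted. The Python sketch below is one possible approach, not an Endeca utility: the function name back_up_project and the state/ and conf/ layout inside the backup directory are assumptions made for illustration. It does not stop the Endeca HTTP service; do that first, as in step 1.

```python
import shutil
from pathlib import Path
from typing import List

STATE_DIRS = ("webstudiostore", "emanager")
CUSTOMIZATION_FILES = ("ws-extensions.xml", "ws-mainMenu.xml", "ws-roles.xml")


def back_up_project(endeca_conf: Path, backup_dir: Path) -> List[Path]:
    """Copy the state directories and customization files to backup_dir.

    endeca_conf is the directory that ENDECA_CONF points to. The layout
    under backup_dir (state/ and conf/) is a hypothetical choice.
    """
    copied = []
    for name in STATE_DIRS:
        destination = backup_dir / "state" / name
        # copytree copies all subdirectories and creates missing parents.
        shutil.copytree(endeca_conf / "state" / name, destination)
        copied.append(destination)
    for name in CUSTOMIZATION_FILES:
        source = endeca_conf / "conf" / name
        if source.is_file():  # these files apply to 5.1.3 or later
            destination = backup_dir / "conf" / name
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, destination)
            copied.append(destination)
    return copied
```

Using a fresh, dated backup directory for each run avoids the error that copytree raises when the destination already exists.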

To restore a backup:
Note: You can only restore a backup to an Endeca installation that is exactly the same version as the one on which you made the backup (for example, from 5.1.3 to 5.1.3, but not from a different 5.1.x version to 5.1.3). For information about transferring project and user information when upgrading Endeca, see the Endeca Migration Guide.

1 Stop the Endeca HTTP service.
2 Delete the webstudiostore and emanager directories from %ENDECA_CONF%\state\ (on Windows) or $ENDECA_CONF/state/ (on UNIX).
3 Copy the webstudiostore directory that you backed up earlier to %ENDECA_CONF%\state\ (on Windows) or $ENDECA_CONF/state/ (on UNIX).
4 Copy the emanager directory that you backed up earlier to %ENDECA_CONF%\state\ (on Windows) or $ENDECA_CONF/state/ (on UNIX).
5 In 5.1.3 or later, copy the ws-extensions.xml, ws-mainMenu.xml, and ws-roles.xml files that you backed up earlier to %ENDECA_CONF%\conf (on Windows) or $ENDECA_CONF/conf (on UNIX).
6 Start the Endeca HTTP service.

If you are restoring the backup to a new Endeca installation, users will be unable to log in to Web Studio until the corresponding applications have been provisioned in the EAC.

Downloading an instance configuration


The Instance Configuration page under Application Settings in Web Studio allows you to view and download the instance configuration that is currently being used by Web Studio. The project XML files that make up the instance configuration are zipped into one file. This feature is intended primarily for debugging and support purposes. See the Endeca Web Studio Help for how to download an instance configuration.

For information on transferring your instance configuration from a staging to a production environment, and on using the emgr_update utility, see Transferring Endeca Implementations Between Environments on page 275.

Monitoring the system status


Each host and component that you provision in an Endeca application displays its system status on the EAC Administration Console page of Web Studio. Web Studio displays a summary of the components' status in the collapsed view of the Hosts tab and Components tab. You can access details about each component (except the Log Server, which does not log its own activities) via the status link next to each component. Clicking the status link displays start time, duration (how long the component has been running), and the last time Web Studio checked the component's status. With Auto-Refresh selected, Web Studio automatically refreshes status at frequent intervals.

Viewing component logs


To view the latest log for an Endeca component (except for the Log Server, which does not log its own actions), check the value of Log File for the component on the Components tab of the Administration Console page. Then browse to the Log File directory and open the component log.

Viewing system logs


In addition to viewing component logs, you can also check the Endeca Application Controller and Web Studio logs that are located in the workspace/logs directory.


The Web Studio log (webstudio.log) logs activity such as user logins, dynamic business rule changes, automatic phrase creation and modification, and so on. Business rule logging records when a rule was modified, who modified the rule (according to Web Studio user name), and the name of the rule. Business rule logging does not record specific changes to the rule's configuration such as changes to its trigger values, target values, rule properties, and so on.

Controlling system logs


Web Studio and the Endeca Application Controller generate their own system logs. These files are configured by the logging.properties file, which is located in the workspace/conf directory.

Log file naming and rolling


Both the component and system logs have a default size limit of 1 GB. Each of these logs, named in the form <log name>.<rotation number>.log, is part of a two-log rotation that rolls automatically when the maximum size is reached. When the second log file reaches the maximum size, the first is overwritten. For example, the Endeca Application Controller's log file, main.log, appears as the file main.0.log. When main.0.log reaches the 1 GB size limit, the system starts to write to main.1.log. Once main.1.log reaches the 1 GB size limit, main.0.log is overwritten.

This default configuration can be changed by modifying the appropriate lines in the logging.properties file. For example, to change the logging properties of main.log, modify the following lines:
1endeca.java.util.logging.FileHandler.limit = 1000000000
1endeca.java.util.logging.FileHandler.count = 2
1endeca.java.util.logging.FileHandler.pattern = ${catalina.base}/logs/main%g.log
1endeca.java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
com.endeca.eac.main.level = INFO
com.endeca.eac.main.handlers = 1endeca.java.util.logging.FileHandler

Note: For backward compatibility, Web Studio log levels can still be configured using system properties.
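
To estimate how much disk space a log rotation can consume, the limit and count values can be multiplied. The following sketch is illustrative only: the helper names are hypothetical, the parser handles just simple key = value lines, and rotation_names follows the <log name>.<rotation number>.log form described in the text rather than interpreting the pattern property.

```python
HANDLER = "1endeca.java.util.logging.FileHandler"

SAMPLE = """\
1endeca.java.util.logging.FileHandler.limit = 1000000000
1endeca.java.util.logging.FileHandler.count = 2
"""


def parse_properties(text):
    """Parse simple 'key = value' lines, ignoring blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def max_log_footprint(props, handler=HANDLER):
    """Upper bound in bytes for one rotation: size limit times number of files."""
    return int(props[handler + ".limit"]) * int(props[handler + ".count"])


def rotation_names(log_name, count):
    """File names in the <log name>.<rotation number>.log form."""
    return [f"{log_name}.{n}.log" for n in range(count)]


props = parse_properties(SAMPLE)
print(max_log_footprint(props))   # 2000000000
print(rotation_names("main", 2))  # ['main.0.log', 'main.1.log']
```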


Refreshing the status information


You can manually refresh the status display by clicking the Refresh Status button. You can also set the page to be refreshed automatically (at a pre-set interval) by checking the Auto Refresh checkbox. This option is useful when a baseline update is in progress and the system state changes frequently. By default, this option is turned off because the overall system state changes infrequently.

Changing Endeca HTTP service ports


The ports on which the Endeca HTTP service and Web Studio listen are specified in the server.xml file, which is located in the %ENDECA_CONF%\conf directory ($ENDECA_CONF/conf on UNIX). The file also specifies the default server port. The default values are:

Port 8090 for the Endeca HTTP service shutdown port.
Port 8888 for the Endeca HTTP service port.

You can change either or both of these ports, as long as you choose a new port that is not being used.
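
One way to check whether a candidate port is already in use on the local host is to try binding to it. This is a minimal sketch, not an Endeca utility: the function name port_is_free is hypothetical, and the check only reflects the state of the machine at the moment it runs.

```python
import socket


def port_is_free(port, host=""):
    """Return True if a TCP socket can be bound to the port on this machine.

    An empty host binds on all local interfaces.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    for candidate in (8090, 8888):
        print(candidate, "free" if port_is_free(candidate) else "in use")
```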

To change the Endeca HTTP service shutdown port:


1 Open the server.xml file in a text editor.
2 Find the Server element in the file:

<!--
NOTE: ENDECA HAS MODIFIED TOMCAT'S DEFAULT SERVER PORT OF 8005. ENDECA'S USES A DEFAULT SERVER PORT OF 8090
-->
<Server port="8090" shutdown="SHUTDOWN" debug="0">

3 Change the number in the port attribute to the new port you want to use.
4 Save and close the server.xml file.
5 Restart the Endeca HTTP service.
  On UNIX:
  a Stop the Endeca HTTP service using:
    $ENDECA_ROOT/tools/server/bin/shutdown.sh
  b Restart the Endeca HTTP service using:
    $ENDECA_ROOT/tools/server/bin/startup.sh
  On Windows:
  a From the Windows Control Panel, select Administrative Tools, and then select Services.
  b In the right pane of the Services window, right-click Endeca HTTP service and choose Restart.
  c Close the Services window.

To change the Endeca HTTP service port:


1 Open the server.xml file in a text editor.
2 Find the non-SSL HTTP/1.1 Connector element:

<!-- NOTE: ENDECA HAS MODIFIED THE DEFAULT TOMCAT NON-SSL HTTP PORT OF 8080. ENDECA' USES A DEFAULT NON-SSL HTTP PORT OF 8888 -->
<!-- Define a non-SSL HTTP/1.1 Connector on port 8888 -->
<Connector className="org.apache.catalina.connector.http.HttpConnector"
    port="8888" minProcessors="5" maxProcessors="75"
    enableLookups="true" redirectPort="8443"
    acceptCount="10" debug="0" connectionTimeout="60000"/>

3 Change the number in the port attribute to the new port you want the Endeca Application Controller and Web Studio to use.
4 Save and close the server.xml file.
5 Restart the Endeca HTTP service, as documented in step 5 of the previous procedure.
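
Both port changes amount to editing the port attribute of one element in server.xml. The sketch below shows how such an edit could be automated while leaving comments and other attributes untouched. The function name set_port is hypothetical, and the code assumes the first matching element in the file is the one to change, which may not hold if your server.xml defines several Connector elements.

```python
import re


def set_port(xml_text, element, new_port):
    """Return xml_text with the port attribute of the first <element> updated."""
    if not 1 <= new_port <= 65535:
        raise ValueError(f"invalid port: {new_port}")
    # Match the lowercase port attribute only, so redirectPort is not touched.
    pattern = re.compile(r'(<' + re.escape(element) + r'\b[^>]*?\bport=")\d+(")')
    updated, count = pattern.subn(
        lambda m: f"{m.group(1)}{new_port}{m.group(2)}", xml_text, count=1
    )
    if count == 0:
        raise ValueError(f"no <{element}> element with a port attribute found")
    return updated


server = '<Server port="8090" shutdown="SHUTDOWN" debug="0">'
print(set_port(server, "Server", 8095))
# <Server port="8095" shutdown="SHUTDOWN" debug="0">
```

As in the manual procedure, the Endeca HTTP service still has to be restarted before a changed port takes effect.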

Encoding of workflow e-mails in Web Studio


Any time a user makes a change to the workflow state of a dynamic business rule and clicks Save Changes, the Add a note page displays. On the Add a note page, the user can choose to click Add to store a note or Add and Email to launch an e-mail client to send a change notification as well as the text of the note. For more information about workflow for business rules in Web Studio, see the Endeca Business User's Guide.

To support non-ASCII characters in workflow e-mails, you can configure Web Studio to use UTF-8 encoding. Note that some e-mail clients, including Microsoft Outlook 2003, do not support UTF-8 encoding in mailto URLs, which causes extended characters not to display properly. You should only enable UTF-8 encoding if you are certain that it is supported on all e-mail clients in your organization.

The default setting in Web Studio encodes workflow e-mail notifications using the escape function in JavaScript. On most systems this results in ISO-8859-1 encoding (which is supported by Outlook), but the actual encoding may depend on system settings on the client machine.

To enable UTF-8 URL encoding in workflow e-mails:


1 Stop the Endeca HTTP service.
2 Navigate to %ENDECA_CONF%\conf (on Windows) or $ENDECA_CONF/conf (on UNIX).
3 Open the webstudio.properties file, and locate the com.endeca.webstudio.useUTF8InMailToUrls property, for example:

# URL encoding for workflow emails
com.endeca.webstudio.useUTF8InMailToUrls=false

4 Change the value of the property to true, for example:

com.endeca.webstudio.useUTF8InMailToUrls=true

5 Save and close the file.
6 Start the Endeca HTTP service.

Note that although UTF-8 support varies depending on the default e-mail client on each user's machine, this setting applies to all workflow e-mail messages created by Web Studio.
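
The difference between the two encodings can be seen by percent-encoding the same subject line both ways. This sketch does not reproduce Web Studio's own code: the mailto_url helper and the sample address are hypothetical, and Latin-1 percent-encoding is used here as an approximation of what the JavaScript escape function produces for characters in the ISO-8859-1 range.

```python
from urllib.parse import quote


def mailto_url(address, subject, use_utf8):
    """Build a mailto URL with the subject percent-encoded as UTF-8 or ISO-8859-1."""
    encoding = "utf-8" if use_utf8 else "latin-1"
    return f"mailto:{address}?subject={quote(subject, safe='', encoding=encoding)}"


print(mailto_url("approver@example.com", "Règle activée", use_utf8=True))
# mailto:approver@example.com?subject=R%C3%A8gle%20activ%C3%A9e
print(mailto_url("approver@example.com", "Règle activée", use_utf8=False))
# mailto:approver@example.com?subject=R%E8gle%20activ%E9e
```

An e-mail client that decodes the URL as ISO-8859-1 reads each UTF-8 byte pair as two separate characters, which is why extended characters can display incorrectly when the setting does not match the client.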


Chapter 4

Managing Users in Web Studio


This chapter describes the Web Studio user and permissions model, and how to manage users within Web Studio. The chapter includes the following sections:

Users, roles, and permissions in Web Studio
Web Studio user roles
Assigning rule group permissions to Web Studio users
LDAP integration with Web Studio

Users, roles, and permissions in Web Studio


Web Studio users log in to an application in Web Studio with basic user name and password authentication. Before a business user can log in to an application in Web Studio, a Web Studio administrator with admin or settings permissions must create a profile for the user that includes the following:

user name
password
roles and permissions
user identity information such as first name, last name, and e-mail address

Roles dictate which Web Studio features are available to users. User identity information provides a way to associate name and contact information with user names in Web Studio.

If you have Web Studio configured to use LDAP for user authentication, an administrator can create a user profile where the password and identity information is stored and managed in an LDAP directory. LDAP integration also allows you to assign roles and permissions across an entire LDAP group rather than configuring each user individually. For more information about configuring Web Studio with LDAP, see LDAP integration with Web Studio.

Each business user profile is associated with a specific application, and a business user profile cannot span applications. In cases where you might want the same user in multiple applications, an administrator can create a number of identical business user profiles for any number of applications. Administrators, on the other hand, span applications across Web Studio.

For the process to add users and modify user names, passwords, and roles, see the Endeca Web Studio Help.

Web Studio predefined admin user


Web Studio has a predefined administrator with full administration privileges. An administrator is granted all roles in the system. The user name for the predefined Web Studio administrator is admin and the default password is admin. After signing in as the admin user, you can modify the password but not the user name.

The admin user can create additional administrators in Web Studio. Only an administrator can create other administrators. An administrator can also delete other administrators, including the predefined admin user, as long as there is always at least one administrator in the system. If you have LDAP authentication enabled, see also Administrators in Web Studio with LDAP on page 60.

An administrator is not associated with an application in the same way that business users are. Each business user is associated with a particular application. Administrators span applications, so an administrator can add or remove applications without being affected by that addition or removal.

Web Studio user roles


A Web Studio administrator can assign users any of the following roles in the table below. A user who does not have any roles assigned is unable to log in to Web Studio. For information about how to add and configure users, see the Endeca Web Studio Help.

Role name
Role description

admin
This is a cumulative role that provides access to pages enabled by all the predefined user roles in Web Studio. This role cannot be assigned to users, but is automatically assigned to any administrators that you create. It is possible to disable admin users from modifying provisioning information. For more information, see Disabling the admin role from modifying provisioning information on page 55.

eacconsole
Provides access to the EAC Admin Console page. Users with this role cannot modify provisioning information on the EAC Admin Console. However, users can start and stop Endeca components and EAC scripts.

dimorder
Provides access to the Dimension Order page.

phrases
Provides access to the Phrases page.

redirects
Provides access to the Keyword Redirects page.

reporting
Provides access to the Reporting page.

rules
Provides access to the Rule Manager page.

settings
Provides access to all pages under the Application Settings section. This includes the following pages: Instance Configuration, Resource Locks, User Management, Rule Group Permissions, Preview App Settings.

stopwords
Provides access to the Stop Words page.

thesaurus
Provides access to the Thesaurus page.

Custom user roles in Web Studio


In addition to the predefined user roles in Web Studio, you can also define custom roles, for instance to control access to Web Studio extensions. For more information about extensions, see Web Studio extensions on page 77. Like the predefined user roles, custom roles span applications. Administrators are automatically granted all roles including custom roles. Custom roles are defined in the ws-roles.xml file in %ENDECA_CONF%\conf (on Windows) or $ENDECA_CONF/conf (on UNIX). The default ws-roles.xml file is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<roles xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="roles.xsd">
</roles>


Each role is defined in a role element within roles. You can specify as many additional roles as you need by adding more role elements. The following attributes must be defined for each role:

Attribute name
Attribute value

id
A unique string identifying this role. Do not define a custom role with the same id as one of the predefined user roles: admin, crawler, dimorder, eacconsole, phrases, redirects, reporting, rules, settings, stopwords, thesaurus. Roles are listed in alphabetical order by id in the User Management page in Web Studio. Note: Modifying this value after the role is created deletes the original role and creates a new role.

defaultName
The display name for this role that appears on the User Management page in Web Studio.

defaultDescription
A brief description of this role that appears on the User Management page in Web Studio.

Example
This example of a ws-roles.xml file defines two custom roles.
<?xml version="1.0" encoding="UTF-8"?>
<roles xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="roles.xsd">
  <role id="roleA" defaultName="roleA"
      defaultDescription="Provides access to an extension page" />
  <role id="roleB" defaultName="roleB"
      defaultDescription="Provides access to another extension page" />
</roles>
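
Before restarting Web Studio with an edited ws-roles.xml, it can be useful to check the file against the rules above. The following is a sketch of such a check, not a tool shipped with Endeca: the function name check_roles is hypothetical, and it verifies only the required attributes, clashes with predefined role ids, and duplicate ids.

```python
import xml.etree.ElementTree as ET

# Predefined role ids listed in the description of the id attribute.
PREDEFINED = {
    "admin", "crawler", "dimorder", "eacconsole", "phrases", "redirects",
    "reporting", "rules", "settings", "stopwords", "thesaurus",
}
REQUIRED = ("id", "defaultName", "defaultDescription")


def check_roles(xml_bytes):
    """Return a list of problems found in the contents of a ws-roles.xml file."""
    problems = []
    seen = set()
    for role in ET.fromstring(xml_bytes).findall("role"):
        role_id = role.get("id")
        for attr in REQUIRED:
            if not role.get(attr):
                problems.append(f"role {role_id!r} is missing {attr}")
        if role_id in PREDEFINED:
            problems.append(f"role {role_id!r} clashes with a predefined role")
        if role_id in seen:
            problems.append(f"role {role_id!r} is defined more than once")
        seen.add(role_id)
    return problems


sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<roles>
  <role id="roleA" defaultName="roleA" defaultDescription="An extension page" />
  <role id="admin" defaultName="Admin copy" defaultDescription="Not allowed" />
</roles>"""
print(check_roles(sample))
# ["role 'admin' clashes with a predefined role"]
```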

Role names and descriptions for multiple locales


If you support multiple locales in Web Studio, you can optionally specify localized names and descriptions for custom roles.


Localized names are defined in a names element within role that contains one or more name elements. Localized descriptions are defined in a descriptions element within role that contains one or more description elements. The name and description elements require a locale attribute whose value is a valid ISO language code.

Example
This example of a ws-roles.xml file defines a custom role with separate names and descriptions for English and French.
<?xml version="1.0" encoding="UTF-8"?>
<roles xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="roles.xsd">
  <role id="localized" defaultName="localized"
      defaultDescription="A role with localized names" >
    <names>
      <name locale="en">localized</name>
      <name locale="fr">localisé</name>
    </names>
    <descriptions>
      <description locale="en">A localized role</description>
      <description locale="fr">Un rôle localisé</description>
    </descriptions>
  </role>
</roles>

Web Studio checks for a name and description that matches the locale defined in the current installation of Web Studio. If no matching localized name or description is found, the defaultName and defaultDescription values are used.
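
The fallback behavior just described can be sketched as a lookup that prefers an exact locale match and otherwise returns the default attribute. This illustrates the documented behavior only and is not Web Studio's implementation; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

ROLE_XML = b"""<role id="localized" defaultName="localized"
    defaultDescription="A role with localized names">
  <names>
    <name locale="en">localized</name>
    <name locale="fr">localis\xc3\xa9</name>
  </names>
  <descriptions>
    <description locale="en">A localized role</description>
    <description locale="fr">Un r\xc3\xb4le localis\xc3\xa9</description>
  </descriptions>
</role>"""


def _lookup(role, path, locale, default_attr):
    """Return the text of the child matching locale, else the default attribute."""
    for child in role.findall(path):
        if child.get("locale") == locale:
            return child.text
    return role.get(default_attr)


def role_name(role, locale):
    return _lookup(role, "./names/name", locale, "defaultName")


def role_description(role, locale):
    return _lookup(role, "./descriptions/description", locale, "defaultDescription")


role = ET.fromstring(ROLE_XML)
print(role_name(role, "fr"))         # localisé
print(role_description(role, "de"))  # A role with localized names
```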

Enabling custom roles in Web Studio


To update Web Studio to use custom user roles:
1 Make a backup of your Endeca project, including the Web Studio customization files (especially ws-roles.xml). See Backing up and restoring an Endeca project on page 41.
2 Stop the Endeca HTTP service.
3 Navigate to %ENDECA_CONF%\conf (on Windows) or $ENDECA_CONF/conf (on UNIX).
4 Open ws-roles.xml in a text editor and add or modify roles as necessary. See Custom user roles in Web Studio on page 52.
5 Save and close the file.
6 Start the Endeca HTTP service.

IMPORTANT: Deleting a role causes all the user assignments to that role to be deleted across all applications. Modifying the id attribute of a role deletes the original role (and its corresponding user assignments) and creates a new role with the new id. Modifications to any other attributes are saved when you update Web Studio and user assignments are preserved. To recover a deleted role along with its user assignments, restore the backups made in Step 1. See Backing up and restoring an Endeca project on page 41.

Disabling the admin role from modifying provisioning information


By default, the admin role allows an administrator to modify provisioning information on the EAC Admin Console page. If necessary, you can disable the admin role from modifying provisioning information.

To disable the admin role from modifying provisioning:


1 Stop the Endeca HTTP service.
2 Navigate to %ENDECA_CONF%\conf (on Windows) or $ENDECA_CONF/conf (on UNIX).
3 Open webstudio.properties in a text editor.
4 Change the com.endeca.webstudio.allow.eac.provisioning property from true to false as shown:

com.endeca.webstudio.allow.eac.provisioning=false

5 Save and close the file.
6 Start the Endeca HTTP service.
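
Property changes such as this one can also be applied programmatically. The sketch below rewrites a single key in the text of a properties file and leaves comments and other lines as they are. The set_property helper is hypothetical, and it assumes simple key=value lines without line continuations.

```python
def set_property(text, key, value):
    """Return properties text with key set to value, preserving other lines."""
    out = []
    replaced = False
    for line in text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith("#")
        if not is_comment and stripped.split("=", 1)[0].strip() == key:
            out.append(f"{key}={value}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        # The key was not present, so append it at the end.
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


before = "# EAC provisioning\ncom.endeca.webstudio.allow.eac.provisioning=true\n"
print(set_property(before, "com.endeca.webstudio.allow.eac.provisioning", "false"))
```

As with a manual edit, the Endeca HTTP service should be stopped before the file is changed and started again afterwards.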


Assigning rule group permissions to Web Studio users


Rule group permissions control how Web Studio users access rule groups and the rules contained in the groups. An administrator uses Web Studio to assign rule group permissions in either of two ways:

Assign by group on the Rule Group Permissions page
Assign by user name on the User Management page

There are four permission levels available for rule group access. A user may have one of the following permissions for each rule group:

Approve - The user has permission to view, edit, and approve rules in the group.
Edit - The user has permission to view and edit rules in the group but no permission to approve rules.
View Only - The user has permission to view rules in the group but no permission to edit or approve rules.
None - The user has no permission to view, edit, or approve rules in the group. Users with this permission will not see the rule group displayed in Web Studio.

Administrators are automatically assigned Approve permissions in all rule groups. See the Endeca Web Studio Help for the procedures to assign rule group permissions to Web Studio users.

LDAP integration with Web Studio


If you have Web Studio configured to use LDAP for user authentication, an administrator can create a user profile in Web Studio that is associated with a user in an LDAP directory. LDAP integration also allows you to assign roles and permissions across an entire LDAP group rather than configuring each user individually.


For users who are configured in Web Studio to authenticate via LDAP, the password and identity information such as name and e-mail address are maintained in the LDAP directory. Web Studio does not write any data to the LDAP directory. Any roles and permissions assigned to an LDAP user profile in Web Studio are stored in the Web Studio database. LDAP user and group profiles can be used in combination with the traditional Web Studio user profiles that an administrator configures manually. Users can authenticate via either method on the same instance of Web Studio and in the same application. Optionally, you can enable SSL for communication between Web Studio and your LDAP server. For more information on using LDAP with SSL, see the Endeca Security Guide. Web Studio supports integration with LDAP servers that comply with LDAP version 3.

Authentication of users in Web Studio with LDAP enabled


User authentication via LDAP can be used in combination with the traditional method of authentication for users that are configured manually in Web Studio. Web Studio follows this order of events when a user attempts to log in:
1 Web Studio checks whether the user name matches the name of any manually configured Web Studio user profile in the current application. If such a user exists, Web Studio attempts to authenticate the user against the password stored in the Web Studio user profile.
2 If no manually configured user of that name exists, Web Studio attempts to authenticate the user against the LDAP directory.
3 If the user also has a profile configured as an LDAP user in Web Studio, then any associated roles and permissions are applied. If the user is an administrator or if the user profile has the Override LDAP Group Permissions option selected, then the user enters Web Studio with the roles and permissions specified in the user profile. Otherwise, Web Studio checks the LDAP directory for any groups of which the user is a member. If any of these groups have a profile configured in Web Studio, then any roles and permissions associated with all the groups are applied to the user. For more details about inheritance of LDAP group roles and permissions, see Roles and permissions for LDAP users and groups on page 59.
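
The precedence in the first two steps can be sketched as a small decision function. This is a simplified model for illustration, not Web Studio code: the authenticate function is hypothetical, and the plain dictionaries of user names and passwords stand in for the Web Studio store and the LDAP directory.

```python
def authenticate(username, password, web_studio_users, ldap_users):
    """Return 'web studio', 'ldap', or None, following the documented order.

    A manually configured Web Studio user always takes precedence: if one
    exists with this name, the LDAP directory is never consulted.
    """
    if username in web_studio_users:
        return "web studio" if web_studio_users[username] == password else None
    if username in ldap_users and ldap_users[username] == password:
        return "ldap"
    return None


web_studio_users = {"lsmith": "local-secret"}
ldap_users = {"lsmith": "ldap-secret", "batkins": "ldap-pass"}

print(authenticate("lsmith", "ldap-secret", web_studio_users, ldap_users))  # None
print(authenticate("batkins", "ldap-pass", web_studio_users, ldap_users))   # ldap
```

The first call fails even though the LDAP credentials are valid, because the manually configured lsmith profile is checked first and its password does not match.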

Troubleshooting user authentication in Web Studio with LDAP enabled


If a user cannot log in to Web Studio, one of the following may be the case:

A user is attempting to log in with a user name and password defined in the LDAP directory, but there is a manually configured Web Studio user in the same application with the same user name or a Web Studio administrator with the same user name. A user with a manually configured profile always takes precedence over a user authenticating via LDAP. For more details about the behavior of users with the same name, see User profiles for LDAP users and groups on page 59.

A manually configured profile exists for the user in Web Studio and the password provided does not match the password stored in the user profile.

No manually configured profile exists for the user in Web Studio but the user exists in the LDAP directory, and the password provided does not match the password stored in the LDAP directory.

No profile exists for the user in Web Studio either as a Web Studio user or an LDAP user, and none of the LDAP groups of which the user is a member have a user profile configured in Web Studio.

One or more user profiles exist for the user or for groups of which the user is a member, but none of the profiles specify any roles. A user who does not have any associated roles cannot log in to Web Studio.

No manually configured profile exists for the user in Web Studio, and no such user exists in the LDAP directory or the query to the LDAP server returns more than one result. Web Studio does not handle the case of more than one user object with the same user name specified in the LDAP directory.

If no users are able to authenticate via LDAP, there may be a problem with the configuration in %ENDECA_CONF%\conf\Login.conf (on Windows) or $ENDECA_CONF/conf/Login.conf (on UNIX).

Check the error messages in the Web Studio log for more information about the causes of authentication failures.


User profiles for LDAP users and groups


If LDAP authentication is enabled for Web Studio, you have the option of creating user profiles in Web Studio for individual users or groups managed in an LDAP directory. For more information on creating user profiles in Web Studio, see the Endeca Web Studio Help.

A user profile is uniquely identified in Web Studio by the combination of the user name and user type (Web Studio user, LDAP user, or LDAP group) in each application. Administrators are uniquely identified by the combination of user name and user type across all of Web Studio. In the case that an LDAP directory defines a user and a group with the same name, this allows profiles to exist in Web Studio for both the user and the group. Once a user profile is created and saved in Web Studio, the user type cannot be changed.

Note that because of the order in which Web Studio handles logins, a Web Studio user always takes precedence over an LDAP user. For example, if there is a manually configured Web Studio user named lsmith, a user with the name lsmith in the LDAP directory will not be able to log in with the credentials stored in LDAP, even if there is a user profile in Web Studio for lsmith as an LDAP user or as a member of an LDAP group.

However, there is no conflict between manually configured users and LDAP groups. For example, if Web Studio has a manually configured user named Marketing, and also has a profile for an LDAP group named Marketing, members of the Marketing group in LDAP are able to log in to Web Studio as long as there are no conflicts between the LDAP user name and the name of a manually configured Web Studio user. (For example, if one of the users in the group has the name Marketing in the LDAP directory, the Web Studio user named Marketing will still take precedence.)

Roles and permissions for LDAP users and groups


The user profiles you create in Web Studio allow you to assign roles and rule group permissions to an LDAP user or group. Users that exist in the LDAP directory but do not have a profile and associated roles specified in Web Studio, either as an individual or as a member of an LDAP group, cannot log in to Web Studio.

A user who authenticates via LDAP is assigned the union of all roles associated with all groups of which that user is a member. For each rule group in the application, a user who is a member of multiple LDAP groups defined in Web Studio is assigned the broadest permission associated with any of the LDAP groups of which that user is a member.

If you create an LDAP user profile in Web Studio for an individual who is also a member of one or more LDAP groups defined in Web Studio, that user is assigned any roles you specify on the User Management page in addition to any roles that the user inherits from membership in LDAP groups. If you specify rule group permissions for an LDAP user who is also a member of an LDAP group, then for each rule group, the user is assigned either the permission specified on the User Management page or the broadest permission associated with any of the user's LDAP groups, whichever is broader.

You can override this behavior by specifying Override LDAP Group Permissions when creating the profile in Web Studio. If you select this option, the user is assigned only the roles and permissions you specify in the user profile, and does not inherit any roles or permissions from LDAP groups.
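
The inheritance rules above can be modeled as a union of roles plus a per-rule-group maximum over an ordered list of permissions. This is a sketch for illustration only: the data structures and function names are hypothetical, and the ordering None, View Only, Edit, Approve is an assumption about what "broadest" means, inferred from the permission descriptions.

```python
# Assumed ordering from narrowest to broadest permission.
PERMISSION_ORDER = ["None", "View Only", "Edit", "Approve"]


def broadest(permissions):
    """Return the broadest permission in the iterable, or 'None' if it is empty."""
    return max(permissions, key=PERMISSION_ORDER.index, default="None")


def effective_access(user_roles, user_permissions, groups, override=False):
    """Combine an LDAP user's own profile with the profiles of the user's groups.

    groups is a list of dicts with 'roles' and 'permissions' keys.
    With override=True, group roles and permissions are not inherited.
    """
    roles = set(user_roles)
    permissions = dict(user_permissions)
    if not override:
        for group in groups:
            roles |= set(group["roles"])
            for rule_group, permission in group["permissions"].items():
                current = permissions.get(rule_group, "None")
                permissions[rule_group] = broadest([current, permission])
    return roles, permissions


groups = [
    {"roles": {"thesaurus"}, "permissions": {"Promotions": "Edit"}},
    {"roles": {"rules"}, "permissions": {"Promotions": "View Only", "Seasonal": "Approve"}},
]
roles, permissions = effective_access({"rules"}, {"Promotions": "View Only"}, groups)
print(sorted(roles))  # ['rules', 'thesaurus']
print(permissions)    # {'Promotions': 'Edit', 'Seasonal': 'Approve'}
```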

Administrators in Web Studio with LDAP


If you have LDAP enabled, you can create profiles for both LDAP users and LDAP groups as administrators in Web Studio. Note that the same precedence rules apply when logging in to Web Studio as for non-administrators, so that if a manually configured user profile exists for either an administrator or non-administrator in Web Studio, a user will not be able to log in via LDAP with the same user name. Starting with 5.1.3, the predefined admin user can be renamed or deleted. If you want to create an LDAP user named admin, you must do one of the following to enable that user to log in to Web Studio:

Rename the predefined admin user.
If you have created another administrator as a manually configured Web Studio user, you can delete the predefined admin user.

Note that administrators can still delete other administrators, but there must be at least one manually configured Web Studio administrator. This is to ensure that changes to the LDAP directory or disabling of LDAP authentication for Web Studio cannot disable all administrator logins.


Workflow notifications for LDAP users and groups


For users who authenticate via LDAP, Web Studio uses the e-mail address that is stored in the LDAP directory for workflow notification messages. Whenever a rule is modified, Web Studio saves the user name of the editor who modified the rule for notification purposes, whether the user is defined as a Web Studio user, an LDAP user, or a member of an LDAP group. When workflow notifications are sent out, Web Studio looks up the user's e-mail address in the user profile or in the LDAP directory as appropriate. If an approver for a rule group is an LDAP group, then Web Studio attempts to find an e-mail address associated with the group in LDAP.

When a user changes a rule's workflow state (by activating, deactivating, requesting activation, requesting deactivation, cancelling a request for a rule, or rejecting a request) and clicks Save Changes, Web Studio writes a message to the log similar to the following:

INFO: User mmartin made a workflow state change.
INFO: Email addresses were retrieved for the following users or groups: Web Studio User batkins, LDAP User lsmith
INFO: Email addresses could not be found for the following users or groups: LDAP Group rule_approvers, Web Studio User admin

This information is only captured in the log; the user in Web Studio will not see any message about whether e-mail addresses could be found. Because Web Studio launches another application to send the e-mail and the user can edit the list of recipients before sending the message, the Web Studio log cannot record whether an e-mail was sent, or the actual recipients of the message.
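
Because the missing addresses appear only in the log, an administrator might scan the Web Studio log for them. The sketch below assumes the log lines have exactly the form shown in the sample above, which the text describes only as similar to the actual output; the function name is hypothetical.

```python
import re

# Matches the "could not be found" line in the form shown in the sample log.
MISSING = re.compile(
    r"^INFO: Email addresses could not be found for the following "
    r"users or groups: (.+)$"
)


def principals_without_email(log_text):
    """Return the users and groups reported as having no e-mail address."""
    found = []
    for line in log_text.splitlines():
        match = MISSING.match(line.strip())
        if match:
            found.extend(item.strip() for item in match.group(1).split(","))
    return found


sample_log = """\
INFO: User mmartin made a workflow state change.
INFO: Email addresses were retrieved for the following users or groups: Web Studio User batkins, LDAP User lsmith
INFO: Email addresses could not be found for the following users or groups: LDAP Group rule_approvers, Web Studio User admin
"""
print(principals_without_email(sample_log))
# ['LDAP Group rule_approvers', 'Web Studio User admin']
```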

Enabling LDAP authentication in Web Studio


Because LDAP configuration is unique to each LDAP server and directory, enabling LDAP authentication for Web Studio is a manual process.

To enable LDAP authentication in Web Studio:


1 Stop the Endeca HTTP service.
2 Navigate to %ENDECA_CONF%\conf (on Windows) or $ENDECA_CONF/conf (on UNIX).
3 Open the webstudio.properties file, and locate the com.endeca.webstudio.useLdap property, for example:
# LDAP Authentication
com.endeca.webstudio.useLdap=false
4 Change the value of the property to true, for example:
com.endeca.webstudio.useLdap=true
5 Save and close the file.
6 Open the Login.conf file. This file contains a sample configuration for LDAP authentication.
Note: By default, Web Studio uses the authentication profile in this location. You can specify an alternate configuration file. For more information, see "Specifying the location of the configuration file" on page 63.
7 Uncomment and modify the Webstudio profile according to your LDAP configuration. For details about profile parameters, see "Configuration of the Webstudio login profile for LDAP" on page 62.
8 Save and close the file.
9 Start the Endeca HTTP service.
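Before restarting the service, the property edit in steps 3 and 4 can be double-checked programmatically. The following is a minimal sketch, not part of Web Studio: the class and method names are hypothetical, and it assumes the value is compared case-insensitively, which the guide does not state.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class UseLdapCheck {
    // Returns true only when com.endeca.webstudio.useLdap is present and set to "true".
    // A missing property is treated as "false" (an assumption of this sketch).
    static boolean isLdapEnabled(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty("com.endeca.webstudio.useLdap", "false");
        return Boolean.parseBoolean(value.trim());
    }

    public static void main(String[] args) {
        String sample = "# LDAP Authentication\ncom.endeca.webstudio.useLdap=true\n";
        System.out.println("LDAP enabled: " + isLdapEnabled(sample));
    }
}
```

To check a real installation, read the contents of webstudio.properties into a string and pass it to isLdapEnabled.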

Disabling LDAP authentication for Web Studio


If you disable LDAP authentication for Web Studio by setting the property com.endeca.webstudio.useLdap=false in the webstudio.properties file, the options to create a user profile for an LDAP user or an LDAP group do not display in Web Studio. All new user profiles you create must be manually configured in Web Studio. Any users who were configured as LDAP users or as members of an LDAP group are no longer able to log in to Web Studio. Although they are inactive, any existing user profiles for LDAP users or LDAP groups remain in Web Studio and can be edited by an administrator.

Configuration of the Webstudio login profile for LDAP


Web Studio uses the Java Authentication and Authorization Service (JAAS) to authenticate users against an LDAP directory. The configuration information that Web Studio uses for LDAP authentication is stored in a profile named Webstudio in %ENDECA_CONF%\conf\Login.conf (on Windows) or $ENDECA_CONF/conf/Login.conf (on UNIX). A sample profile is included in this location by default, but you should modify its parameters as needed for your LDAP configuration. You can also specify an alternate location for the configuration file.

If you want to configure JAAS authentication for other applications running in the Endeca HTTP service, for example, for the Standard Application or your own Endeca implementation, create additional profiles with unique names in this same Login.conf file. For more information on configuring JAAS authentication for your Endeca application using LDAP or a local password file, see the Endeca Security Guide.

Note: A Login.conf file exists in %ENDECA_CONF%\etc (on Windows) and $ENDECA_CONF/etc (on UNIX). This file contains a sample profile for file-based authentication and is not used by Web Studio.

Specifying the location of the configuration file


By default, Web Studio uses %ENDECA_CONF%\conf\Login.conf (on Windows) or $ENDECA_CONF/conf/Login.conf (on UNIX) as its configuration file. You can substitute any configuration file that includes a JAAS profile named Webstudio. The file does not have to be named Login.conf, but it should be saved in UTF-8 format. If you want to store the configuration file in a different location, you can pass this location to the JVM. How you specify the location depends on how you run the Endeca HTTP service.

If you are running the Endeca HTTP service as a Windows service:


1 Open the Registry Editor and look for the following key:
HKEY_LOCAL_MACHINE\SOFTWARE\Apache Software Foundation\Procrun 2.0\EndecaHTTPservice\Parameters\Java\Options
Note: In the Registry Editor Explorer pane, expand the folders until you reach Java. Then click on the Java folder and look for the Options setting in the right pane.
2 Edit the Options setting and look for the following parameter:
-Djava.security.auth.login.config=%ENDECA_CONF%/conf/Login.conf
3 Change the path to point to the location of your configuration file.


If you are running the Endeca HTTP service on Windows from the command line:
1 Navigate to the %ENDECA_ROOT%\tools\server\bin directory.
2 Open the setenv.bat file.
3 Locate the line that begins with set JAVA_OPTS, for example:
set JAVA_OPTS=-Xmx1024m -XX:MaxPermSize=128m -Djava.security.auth.login.config=%ENDECA_CONF%/conf/Login.conf
4 Change the path of the -Djava.security.auth.login.config parameter to point to the location of your configuration file.

If you are running the Endeca HTTP service on UNIX:


1 Navigate to the $ENDECA_ROOT/tools/server/bin directory.
2 Open the setenv.sh file.
3 Locate the line that begins with JAVA_OPTS=, for example:
JAVA_OPTS="-Xmx1024m -XX:MaxPermSize=128m -Djava.security.auth.login.config=$ENDECA_CONF/conf/Login.conf"
4 Change the path of the -Djava.security.auth.login.config parameter to point to the location of your configuration file.
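Because any substitute file must contain a JAAS profile named Webstudio, it may help to confirm that before changing the -Djava.security.auth.login.config path. The sketch below uses the standard JAAS Configuration API to parse a file and look for the profile. The class name WebstudioProfileCheck and the login module class name in the sample are hypothetical; the real module class is the one shown in the sample Login.conf shipped with the product.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.URIParameter;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

class WebstudioProfileCheck {
    // Returns true if the JAAS configuration file at the given URI
    // parses cleanly and defines a profile named "Webstudio".
    static boolean hasWebstudioProfile(URI configFile) {
        try {
            Configuration config =
                Configuration.getInstance("JavaLoginConfig", new URIParameter(configFile));
            AppConfigurationEntry[] entries = config.getAppConfigurationEntry("Webstudio");
            return entries != null && entries.length > 0;
        } catch (Exception e) {
            // Unreadable or syntactically invalid file: no usable profile.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical module class name, used only so the sample parses.
        String sample = "Webstudio {\n"
            + "  com.example.HypotheticalLdapLoginModule required\n"
            + "  serverInfo=\"ldap://localhost:389\";\n"
            + "};\n";
        Path tmp = Files.createTempFile("Login", ".conf");
        Files.write(tmp, sample.getBytes(StandardCharsets.UTF_8));
        System.out.println("Webstudio profile found: " + hasWebstudioProfile(tmp.toUri()));
    }
}
```

The check only parses the file; it does not load the login module or contact the LDAP server.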

Templates used in the Webstudio profile


Web Studio allows templates to be supplied for certain configuration parameters in the Webstudio JAAS profile. These templates, indicated by %{} escapes, allow values from the authentication operation, such as the user or group name, and the values from the user or group objects, to be substituted into the parameter value. The %{} escapes are expanded as follows:

Escape            Description
%{#username}      The name of the LDAP user as defined in the user profile in Web Studio, or the user name entered by a user at the Web Studio login page.
%{#groupname}     The name of the LDAP group as defined in the user profile in Web Studio.
%{#dn}            The distinguished name of the user or group object in the LDAP directory.
%{#dn:n}          The value of the path field at index n in the distinguished name of the user or group object in LDAP. For example, if the value in the %{#dn} field is cn=joe,ou=People,dc=foo,dc=com, then the value People will be substituted for %{#dn:1}, while joe will be substituted for %{#dn:0}. Note that unlike the value of %{#dn}, which is the raw value returned from the LDAP server, the values returned by this template are not LDAP escaped.
%{#fieldname}     The value in the fieldname field of the user object (or group object when used in the groupTemplate parameter) under consideration.
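The indexing used by %{#dn:n} can be illustrated with the JDK's own LDAP name parser. This is a sketch of the behavior described above, not Web Studio's implementation; the class and method names are hypothetical. Note that javax.naming.ldap.LdapName numbers its components from the right, so the index is flipped to count from the left as the template does.

```java
import java.util.List;
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

class DnFieldTemplate {
    // Returns the unescaped value of the path field at index n,
    // counting from the left of the distinguished name.
    static String dnField(String dn, int n) {
        try {
            List<Rdn> rdns = new LdapName(dn).getRdns();
            // LdapName puts the rightmost RDN at index 0, so flip the index.
            Rdn rdn = rdns.get(rdns.size() - 1 - n);
            return String.valueOf(rdn.getValue());
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("Not a valid DN: " + dn, e);
        }
    }

    public static void main(String[] args) {
        String dn = "cn=joe,ou=People,dc=foo,dc=com";
        System.out.println(dnField(dn, 0));
        System.out.println(dnField(dn, 1));
    }
}
```

Rdn.getValue() returns the value with LDAP escapes removed, which mirrors the note that %{#dn:n} values are not LDAP escaped.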

Configuration parameters for the Webstudio profile


You specify the values of configuration parameters for LDAP authentication as quoted strings. If there are any quotation marks (") or backslashes (\) in the string, they must be escaped. For example, if you have the following string:
"A string with an "embedded quote" and a \backslash"
In the profile, it should be specified as follows:
"A string with an \"embedded quote\" and a \\backslash"

For most parameter values, single quotation marks (') do not need to be escaped and the values you specify for the parameters can include UTF-8 characters. For additional restrictions on the userPath, groupPath, and findGroupPath parameters, see "LDAP path parameters" on page 68. The following parameters can be specified in the profile:

Parameter                 Description
serverInfo                A URL specifying the name and port of the LDAP server to be used for authentication. You can specify multiple LDAP servers.
userPath                  The query that is passed to the LDAP server to find an individual user. When appended to the URL in the serverInfo parameter, this should form a valid LDAP URL as described in RFC 2255.
firstNameAttribute        Optional. The name of the attribute on the user object that contains the user's first name.
lastNameAttribute         Optional. The name of the attribute on the user object that contains the user's last name. Note: Web Studio requires at least one of the name fields to be specified. If the LDAP directory does not have separate fields for first and last name, you can map the full name of the user to either the firstNameAttribute or the lastNameAttribute parameter.
emailAttribute            Optional. The name of the attribute on the user object that contains the user's e-mail address. This information is used for workflow notifications.
groupPath                 The query that is passed to the LDAP server to find all the groups of which a user is a member. The query uses the information about the user that is returned by the userPath query. When appended to the URL in the serverInfo parameter, this should form a valid LDAP URL as described in RFC 2255. You can specify multiple values for groupPath.
groupTemplate             A template that specifies how to produce individual group names from the set of groups returned by the groupPath query. The value of this template should match the name of the LDAP group as defined in the Web Studio user profile. You can specify multiple values for groupTemplate.
findGroupPath             The query that is passed to the LDAP server to find a specific group. When appended to the URL in the serverInfo parameter, this should form a valid LDAP URL as described in RFC 2255.
groupEmailAttribute       The name of the attribute on the group object that contains an e-mail address associated with the group in LDAP. This information is used for workflow notifications in the case where an LDAP group is specified as an approver for a rule group.
serviceUsername           The user name of an administrator login to the LDAP server specified in the serverInfo parameter. For example: "Manager@example.com" or "cn=Manager,dc=example,dc=com". If no value is specified for this option, Web Studio will attempt to authenticate anonymously.
servicePassword           The password to use in conjunction with the serviceUsername value.
serviceAuthentication     Specifies the method of authentication that should be used in connecting to the LDAP server as the administrator account. The permitted values are none, simple, or EXTERNAL.
authentication            Specifies the method of authentication that should be used in rebinding to the LDAP server as a user account. The permitted values are none, simple, or EXTERNAL.
ldapBindAuthentication    Optional. By default this is set to true, and Web Studio authenticates users by rebinding as the user to the LDAP system, thereby employing the LDAP system's own authentication mechanism.
loginName                 Optional. A template login name that will be used to rebind to the LDAP server if ldapBindAuthentication is true. Default value is %{#dn}.
passwordAttribute         Optional. The name of the attribute on the user object that contains the user's password. Used only if ldapBindAuthentication is set to false. The field specified must contain the user's password in clear text. By default this is set to userPassword.
checkPasswords            Optional. Determines whether Web Studio checks passwords during logins. Default value is true. If set to false, Web Studio uses only the user name to authenticate from the LDAP directory.
useSSL                    Optional. Default value is false. If set to true, Web Studio will make mutually authenticated SSL connections to the LDAP server. If you set the parameter, ensure that you have configured the LDAP server to use SSL and that the value of serverInfo has the protocol specified as ldaps:// with an SSL port.
keyStoreLocation          Used only if useSSL=true. The location of the Java keystore, which stores keys and certificates. The keystore is where Java gets the certificates to be presented for authentication. The location of the keystore is OS-dependent, but it is often stored in a file named .keystore in the user's home directory. Note: Even if this location is on a Windows system, the path uses forward slashes (/), not backslashes (\).
keyStorePassphrase        Used only if useSSL=true. The passphrase used to open the keystore file.
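The quoting rule stated at the start of this section (escape quotation marks and backslashes inside a quoted parameter value) can be sketched as a small helper. The class and method names are hypothetical and the helper is only an illustration of the rule, not a tool supplied with Web Studio.

```java
class ProfileStringEscaper {
    // Wraps a raw value in double quotes, prefixing every embedded
    // quotation mark or backslash with a backslash.
    static String quote(String raw) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : raw.toCharArray()) {
            if (c == '"' || c == '\\') {
                out.append('\\');
            }
            out.append(c);
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        System.out.println(quote("A string with an \"embedded quote\" and a \\backslash"));
    }
}
```

Single quotation marks pass through unchanged, consistent with the statement that they do not need to be escaped for most parameter values.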

LDAP path parameters


The userPath, groupPath, and findGroupPath parameters must conform to RFC 2255. This means that certain characters must be encoded in order for the path parameters to form a valid LDAP URL when appended to the value of the serverInfo parameter. Both LDAP and URL encoding may apply to these strings depending on your data. If possible, verify the URL by passing it to your LDAP server before specifying it in the configuration for Web Studio.

LDAP encoding affects reserved characters such as the comma (,), equals sign (=), and question mark (?). These characters must be escaped by prepending a backslash (\) when they are not used for their reserved purpose, for example if they appear within a common name or organizational unit.

URL encoding affects characters that are invalid for URLs, such as non-ASCII characters and any unsafe characters as defined in RFC 1738. This includes reserved LDAP characters when they are not used for their reserved purpose. These characters must be replaced with the % sign followed by the appropriate hex code.

For example, if you have the following string as part of your userPath:
ou=Endeca Technologies, Inc.

Applying LDAP encoding produces the following result:


ou=Endeca Technologies\, Inc.

And applying URL encoding to the LDAP-encoded string produces:


ou=Endeca%20Technologies%5C%2C%20Inc.

Any non-ASCII characters or any other characters that are not valid in an LDAP URL must also be properly encoded in the string that you specify in the Webstudio profile.
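The two encoding passes can be sketched in code as follows. This is an illustration under stated assumptions rather than a complete RFC 2255 encoder: the class name is hypothetical, only the three reserved characters named above are LDAP-escaped, and the helper is meant for a single attribute value (such as the text after ou=), not for a whole path, because the structural = and , characters of the path must stay unescaped.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class LdapPathValueEncoder {
    // Pass 1: backslash-escape reserved LDAP characters used as data.
    static String ldapEscape(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (c == ',' || c == '=' || c == '?') {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    // Pass 2: percent-encode the LDAP-escaped value as UTF-8.
    static String urlEncode(String value) {
        try {
            // URLEncoder writes spaces as '+'; a literal '+' is already
            // encoded as %2B, so replacing '+' with %20 is safe.
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    static String encodeValue(String value) {
        return urlEncode(ldapEscape(value));
    }

    public static void main(String[] args) {
        System.out.println("ou=" + encodeValue("Endeca Technologies, Inc."));
    }
}
```

URLEncoder encodes more characters than an LDAP URL strictly requires, which is harmless here but is a design shortcut of this sketch.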

Specifying multiple values for parameters in the Webstudio profile


You can specify multiple LDAP servers with multiple instances of the serverInfo parameter, by using the format:
serverInfo.n = ldap://server_url:port_number

For example:
serverInfo.0="ldap://web01.endeca.com:1234"
serverInfo.1="ldap://web02.endeca.com:1230"
serverInfo.2="ldap://web03.endeca.com:1334"

If you specify multiple LDAP servers, the servers are assumed to be equivalent. The choice of which LDAP server to contact is made randomly. If an LDAP server cannot be reached, the LoginModule plug-in proceeds through the remaining servers in order of configuration, wrapping if necessary. For example, if five servers are configured and Server 3 is the first to be contacted, the remaining order of contact is Server 4, Server 5, Server 1, and finally Server 2.

You can also specify multiple values for the groupPath parameter by using the same format, for example:
groupPath.0="/ou=groups,dc=endeca,dc=com??sub?(member=%{#dn})"
groupPath.1="/dc=endeca,dc=com?memberOf?sub?(AccountName=%{#username})"

If you specify more than one groupPath, Web Studio sends all the queries to the LDAP server to discover the groups of which a user is a member. You can specify corresponding values for groupTemplate for each groupPath. In this case, the value for groupTemplate.0 is applied to the results of the groupPath.0 query, groupTemplate.1 is applied to the results of groupPath.1, and so on.
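The server contact order described earlier in this section (random first choice, then configuration order with wraparound) can be sketched as follows. The class and method names are hypothetical, and the indexes are zero-based to match the serverInfo.n numbering; this models the described order only, not the LoginModule plug-in itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class ServerContactOrder {
    // Returns the order in which server indexes are tried, starting at
    // 'first' and wrapping around to cover every configured server once.
    static List<Integer> contactOrder(int serverCount, int first) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < serverCount; i++) {
            order.add((first + i) % serverCount);
        }
        return order;
    }

    // Picks the first server at random, as the text describes.
    static List<Integer> randomContactOrder(int serverCount, Random random) {
        return contactOrder(serverCount, random.nextInt(serverCount));
    }

    public static void main(String[] args) {
        // Five servers, third one (index 2) contacted first.
        System.out.println(contactOrder(5, 2));
    }
}
```

With five servers and index 2 chosen first, the order is indexes 2, 3, 4, 0, 1, which corresponds to Server 3, 4, 5, 1, 2 in the example above.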


Chapter 5

Customizing Web Studio


This chapter describes how to customize the Web Studio interface and how to add extensions to Web Studio. The chapter includes the following sections:

The navigation menu and launch page
Web Studio extensions

The navigation menu and launch page


You can configure the items in the navigation menu on the left and on the launch page of Web Studio by modifying the ws-mainMenu.xml file in %ENDECA_CONF%\conf (on Windows) or $ENDECA_CONF/conf (on UNIX). By editing ws-mainMenu.xml, you can do any of the following:

Add a new menu item.
Remove an item from the menu.
Specify the order in which the menu items display.
Specify whether an item is in the top-level menu or in a submenu.
Specify whether a menu item displays on the launch page.

Navigation menu nodes


A menu item is either a leaf or a node. A node is a top-level menu item that does not link directly to any pages. Instead it has children that are leaf items and are displayed in a submenu. Nodes cannot be displayed on the launch page. Each node is defined in a menunode element in ws-mainMenu.xml that takes the following attributes:

Attribute name    Attribute value
id                The id of a predefined node in Web Studio or a unique string identifying a custom node. For more information on predefined nodes, see "Predefined menu nodes in Web Studio" on page 74.
defaultTitle      The display name for this node that appears in the navigation menu. This attribute is required for all custom nodes.

A menunode element requires one or more child menuitem elements (see Navigation menu leaf items on page 74).


Example
This example of a ws-mainMenu.xml file defines a custom menu node with extensions as its child items.
<?xml version="1.0" encoding="UTF-8"?>
<mainmenu xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="mainMenu.xsd">
  <menunode id="myextensions" defaultTitle="My Extensions">
    <menuitem id="extensionA"/>
    <menuitem id="extensionB"/>
  </menunode>
</mainmenu>

Node titles for multiple locales


If you customize a menu for multiple locales in Web Studio, you can optionally specify localized titles for custom menu nodes in a titles element within menunode that contains one or more title elements. The title element requires a locale attribute whose value is a valid ISO language code.

Example
This example of a ws-mainMenu.xml file defines a custom menu node with titles in both English and French.
<?xml version="1.0" encoding="UTF-8"?>
<mainmenu xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="mainMenu.xsd">
  <menunode id="myextensions" defaultTitle="My Extensions">
    <titles>
      <title locale="en">Access Extensions</title>
      <title locale="fr">Accéder aux extensions</title>
    </titles>
    <menuitem id="extensionA"/>
    <menuitem id="extensionB"/>
  </menunode>
</mainmenu>

Web Studio checks for a title that matches the locale defined in the current installation of Web Studio. If no matching localized title is found, the defaultTitle value is used.


Predefined menu nodes in Web Studio


There are several predefined menu nodes in Web Studio. You can specify the placement of the predefined nodes in the menu and what items display under them, but you cannot modify the titles or specify localized titles. The predefined nodes in Web Studio are as follows:

Node id         Node description
searchConfig    Search Configuration
reporting       View Reports
settings        Application Settings
eacconsole      EAC Administration

Navigation menu leaf items


A leaf is a menu item that links to a page, and that can also have an entry on the launch page. A leaf can be either in the top-level menu or in a submenu as the child of a node. Leaf items cannot have child items. Menu items display in the order in which they are listed in ws-mainMenu.xml. Each leaf in the menu is defined in a menuitem element in ws-mainMenu.xml that takes the following attributes:

Attribute name    Required?    Attribute value
id                yes          The id of a predefined page in Web Studio or the id of an extension as defined in ws-extensions.xml. For more information about extensions, see "Web Studio extensions" on page 77.
onLaunchPage      no           If set to true, the menu item displays on the launch page in the order in which it is listed in ws-mainMenu.xml. Default value is false.


The predefined pages and their corresponding ids are as follows:

Web Studio page                          Menu item id
Rule Manager                             rules
Keyword Redirects                        redirects
Thesaurus                                thesaurus
Phrases                                  phrases
Stop Words                               stopwords
Dimension Order                          dimorder
Current Report (Daily)                   reporting.currentDaily
Current Report (Weekly)                  reporting.currentWeekly
Daily Reports                            reporting.daily
Weekly Reports                           reporting.weekly
EAC Monitor                              eacMonitor
User Management                          settings.users
Rule Group Permissions                   settings.permissions
Resource Locks                           settings.locks
Report Generation                        settings.reporting
Preview App Settings                     settings.previewApp
Instance Configuration                   settings.instanceConfig
User Settings (for non-admin users)      userSettings
EAC Admin Console                        eacconsole.console
EAC Settings                             eacconsole.settings


Example
This example of a ws-mainMenu.xml file defines a menu that shows top-level leaf items, items nested within a predefined node, and items nested within a custom node. Items that have onLaunchPage="true" display in the launch page regardless of whether they are in the top-level menu or in a submenu.
<?xml version="1.0" encoding="UTF-8"?>
<mainmenu xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="mainMenu.xsd">
  <menuitem id="rules" onLaunchPage="true"/>
  <menuitem id="redirects" onLaunchPage="true"/>
  <menunode id="searchConfig">
    <menuitem id="thesaurus" onLaunchPage="true"/>
    <menuitem id="phrases"/>
    <menuitem id="stopwords"/>
  </menunode>
  <menunode id="myextensions" defaultTitle="My Extensions">
    <menuitem id="extensionA" onLaunchPage="true"/>
    <menuitem id="extensionB"/>
  </menunode>
</mainmenu>

Updating the Web Studio menu and launch page


To update the navigation menu and launch page:
1 Stop the Endeca HTTP service.
2 Navigate to %ENDECA_CONF%\conf (on Windows) or $ENDECA_CONF/conf (on UNIX).
3 Open ws-mainMenu.xml in a text editor and add or modify menu items as necessary. See "The navigation menu and launch page" on page 72.
4 Save and close the file.
5 Start the Endeca HTTP service.
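After editing ws-mainMenu.xml, it can be useful to preview which items will appear on the launch page and in what order. The sketch below parses the menu XML with the JDK's DOM parser and collects the ids of menuitem elements whose onLaunchPage attribute is true, at any nesting level. The class and method names are hypothetical, and the sketch does not validate against mainMenu.xsd.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class LaunchPageItems {
    // Returns the ids of all menuitem elements with onLaunchPage="true",
    // in the order they are listed in the document.
    static List<String> launchPageIds(String menuXml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(menuXml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            NodeList items = root.getElementsByTagName("menuitem");
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                if ("true".equals(item.getAttribute("onLaunchPage"))) {
                    ids.add(item.getAttribute("id"));
                }
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse menu XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<mainmenu><menuitem id=\"rules\" onLaunchPage=\"true\"/>"
            + "<menunode id=\"searchConfig\"><menuitem id=\"thesaurus\" onLaunchPage=\"true\"/>"
            + "<menuitem id=\"phrases\"/></menunode></mainmenu>";
        System.out.println(launchPageIds(xml));
    }
}
```

A parse failure here is also an early signal that the edited file is not well-formed XML.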


Web Studio extensions


Extensions enable you to incorporate web applications related to your Endeca implementation as plug-ins to Web Studio. An extension can be as simple as a static web page or it can provide sophisticated functionality to control, monitor, and configure your Endeca applications. Extensions can be hosted on the same server as Web Studio or on another server. Web Studio provides the ability to customize the navigation menu and launch page to include links to extensions. The extension itself is presented within an iFrame in Web Studio and can be themed to inherit the look and feel of the Web Studio interface. The extensibility framework allows extensions to leverage Web Studio user authentication and role-based permissions. Web Studio can also pass information to extensions, such as the EAC Host and applications that it is connected to.

Configuration of extensions in Web Studio


Extensions are defined in the ws-extensions.xml file in %ENDECA_CONF%\conf (on Windows) or $ENDECA_CONF/conf (on UNIX). The default ws-extensions.xml file is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<extensions xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:noNamespaceSchemaLocation="extensions.xsd">
</extensions>

Each extension is defined in an extension element within extensions. You can specify as many additional extensions as you need by adding more extension elements.


The extension element takes the following attributes:

Attribute name        Required?    Attribute value
id                    yes          A unique string identifying this extension. Do not define an extension with the same id as one of the predefined Web Studio pages. For a list of predefined Web Studio pages and their ids, see the table in "Navigation menu leaf items" on page 74.
defaultName           yes          The display name for this extension that appears in the navigation menu and launch page in Web Studio.
defaultDescription    yes          A brief description of this extension that appears on the launch page in Web Studio.
url                   yes          The fully specified URL to this extension. The extension must be a Web application reachable through HTTP or HTTPS, but it does not have to run on the same server as Web Studio.
launchImageUrl        no           The fully specified URL to a custom image for this extension's entry on the launch page.
role                  no           The id of the role that is allowed to access this extension. This can be one of the predefined Web Studio user roles, or any custom role. For more information on user roles, see "Web Studio user roles" on page 51. Each extension can have a maximum of one role, although a single role can allow access to many extensions. If no role is specified, the extension is available to all Web Studio users.
height                no           The height in pixels of the frame in which the extension is displayed. The default value is 500 pixels.
sharedSecret          no           A shared key that Web Studio uses to calculate the authentication token. For more information on the authentication token, see "Token-based authentication for Web Studio extensions" on page 82.


Example
This example of a ws-extensions.xml file defines a simple extension that enables a link to the Endeca Web site for all admin users.
<?xml version="1.0" encoding="UTF-8"?>
<extensions xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:noNamespaceSchemaLocation="extensions.xsd">
  <extension id="endecaHome"
             defaultName="Endeca home page"
             defaultDescription="Visit the Endeca home page"
             url="http://www.endeca.com"
             role="admin" />
</extensions>

Extension names and descriptions for multiple locales


If you deploy extensions to multiple locales in Web Studio, you can optionally specify localized names and descriptions for extensions. You can define localized names in a names element within extension that contains one or more name elements. You can define localized descriptions in a descriptions element within extension that contains one or more description elements. The name and description elements require a locale attribute whose value is a valid ISO language code.

Example
This example of a ws-extensions.xml file defines an extension with separate names and descriptions for English and French.
<?xml version="1.0" encoding="UTF-8"?>
<extensions xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:noNamespaceSchemaLocation="extensions.xsd">
  <extension id="endecaHome"
             defaultName="Endeca home page"
             defaultDescription="Visit the Endeca home page"
             url="http://www.endeca.com"
             role="admin">
    <names>
      <name locale="en">The Endeca Web site</name>
      <name locale="fr">La page d'accueil d'Endeca</name>
    </names>
    <descriptions>
      <description locale="en">Link to the Endeca Web site</description>
      <description locale="fr">Lien vers la page Web d'Endeca</description>
    </descriptions>
  </extension>
</extensions>

Web Studio checks for a name and description that match the locale defined in the current installation of Web Studio. If no matching localized name or description is found, the defaultName and defaultDescription values are used.

Enabling extensions in Web Studio


To enable extensions in Web Studio:
1 Stop the Endeca HTTP service.
2 Navigate to %ENDECA_CONF%\conf (on Windows) or $ENDECA_CONF/conf (on UNIX).
3 Open ws-extensions.xml in a text editor and add or modify extensions as necessary (see "Configuration of extensions in Web Studio" on page 77).
Note: To enable or disable links to your extensions in the navigation menu and the launch page, see "Updating the Web Studio menu and launch page" on page 76.
4 Save and close the file.
5 Start the Endeca HTTP service.
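Before restarting the service, it can be worth checking that a new extension id does not collide with one of the predefined Web Studio page ids, since the guide warns against reusing them. The following sketch hard-codes the ids from the predefined pages table in this chapter; the class and method names are hypothetical, and the list would need updating if your Web Studio version defines other pages.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ExtensionIdCheck {
    // Predefined page ids, copied from the table of predefined pages.
    static final Set<String> PREDEFINED_PAGE_IDS = new HashSet<>(Arrays.asList(
        "rules", "redirects", "thesaurus", "phrases", "stopwords", "dimorder",
        "reporting.currentDaily", "reporting.currentWeekly",
        "reporting.daily", "reporting.weekly", "eacMonitor",
        "settings.users", "settings.permissions", "settings.locks",
        "settings.reporting", "settings.previewApp", "settings.instanceConfig",
        "userSettings", "eacconsole.console", "eacconsole.settings"));

    // An id is usable if it is non-empty and not already taken by a predefined page.
    static boolean isAllowedExtensionId(String id) {
        return id != null && !id.isEmpty() && !PREDEFINED_PAGE_IDS.contains(id);
    }

    public static void main(String[] args) {
        System.out.println("endecaHome allowed: " + isAllowedExtensionId("endecaHome"));
        System.out.println("rules allowed: " + isAllowedExtensionId("rules"));
    }
}
```

The check does not detect duplicate ids among your own extensions; that would require reading all extension elements from ws-extensions.xml.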


URL tokens and Web Studio extensions


Web Studio can pass information to an extension through URL tokens in order to enable the extension to authenticate users, connect to the EAC Central Server, and maintain its state if a user navigates away from the extension and back again during the same session. The following tokens are available to pass to extensions:

Token ID                   Token description
${AUTH}                    An MD5 hash value used to authenticate users coming from Web Studio. For more information on the authentication token, see "Token-based authentication for Web Studio extensions" on page 82.
${EAC_APP}                 The name of the application that the Web Studio user is logged in to.
${EAC_HOST}                The host running the EAC Central Server to which Web Studio is currently connected.
${EAC_PORT}                The port on the EAC host through which Web Studio and the EAC Central Server communicate.
${EXTENSION_ID}            The id of the extension as defined in ws-extensions.xml.
${LOCALE}                  The locale of Web Studio; this is the value of the com.endeca.webstudio.locale property in %ENDECA_CONF%\conf\webstudio.properties.
${TS}                      The time, in milliseconds since 00:00:00 UTC January 1, 1970, when the user navigates to the extension.
${USERNAME}                The username of the Web Studio user accessing the extension.
${WEBSTUDIO_SESSIONID}     The id of the user's current Web Studio session. The extension can use this in combination with the ${USERNAME} token to maintain the state of the extension throughout a single Web Studio session, for instance by storing the information in a cookie.


You use these tokens by specifying them in the url attribute of the extension definition in %ENDECA_CONF%\conf\ws-extensions.xml. The name of the URL parameter does not have to match the id of the token as listed in the preceding table. For example, the following extension definition creates a URL that passes the EAC host, port, and application to the extension:
<?xml version="1.0" encoding="UTF-8"?>
<extensions xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:noNamespaceSchemaLocation="extensions.xsd">
  <extension id="testExtension"
             defaultName="Test Extension"
             defaultDescription="Demonstrates extensions with tokens."
             url="http://www.example.com:8989/TestExtension/index.jsp?eac-host=${EAC_HOST}&amp;eac-port=${EAC_PORT}&amp;eac-app=${EAC_APP}">
  </extension>
</extensions>

Note the use of the &amp; entity in the url attribute in place of the ampersand in the URL. In general, you should ensure that the ws-extensions.xml file validates against the provided schema before updating Web Studio with the new configuration.
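To see what URL an extension definition would produce, the token substitution can be approximated as below. This is a sketch of the idea only: the class name and the sample values (host1, 8888, wine) are hypothetical, and the guide does not say whether Web Studio URL-encodes substituted values, so this sketch inserts them as-is and leaves unknown tokens untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlTokenExpander {
    // Matches tokens of the form ${NAME}, where NAME is upper case with underscores.
    private static final Pattern TOKEN = Pattern.compile("\\$\\{([A-Z_]+)\\}");

    // Replaces each known token in the template with its value.
    static String expand(String urlTemplate, Map<String, String> values) {
        Matcher m = TOKEN.matcher(urlTemplate);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Tokens without a value are left in place.
            String replacement = value != null ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("EAC_HOST", "host1");
        values.put("EAC_PORT", "8888");
        values.put("EAC_APP", "wine");
        // The template is the url attribute after XML parsing, so &amp; is already a plain ampersand.
        System.out.println(expand(
            "http://www.example.com:8989/TestExtension/index.jsp"
                + "?eac-host=${EAC_HOST}&eac-port=${EAC_PORT}&eac-app=${EAC_APP}",
            values));
    }
}
```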

Token-based authentication for Web Studio extensions


You can enable extensions to authenticate users coming from Web Studio by including an authentication token in the URL. Web Studio calculates the value of the token by generating an MD5 hash from a portion of the URL and a shared secret. The portion of the URL that is used for the hash consists of everything after the host name and port, including the leading slash, but excluding the value of the AUTH token itself. The shared secret is a string that is specified in ws-extensions.xml and is also stored in the extension itself. For example, the following ws-extensions.xml file defines an extension with a URL that uses the AUTH and TS tokens:
<?xml version="1.0" encoding="UTF-8"?>
<extensions xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:noNamespaceSchemaLocation="extensions.xsd">
  <extension id="authExtension"
             defaultName="Authenticated Extension"
             defaultDescription="Demonstrates token-based authentication."
             url="http://localhost:8080/AuthExtension/index.jsp?timestamp=${TS}&amp;auth=${AUTH}"
             role="admin"
             sharedSecret="secret!@#$%^*(987654321" />
</extensions>

In this case, the value of the authentication token is the hash of a string that looks similar to this:
/AuthExtension/index.jsp?timestamp=1189702462936&auth=secret!@#$%^*(987654321

The extension can verify that a user is coming from Web Studio by calculating the hash of the same string and comparing the result to the value of the AUTH token. This ensures that the user visiting the extension has logged in to Web Studio and has the role (if any) that is required to access the extension. Because the AUTH token is based in part on the URL, it is recommended that you include the time stamp of the request to introduce some variation in the value of the token. The time stamp can also be used to filter out stale requests and limit the possibility of an eavesdropper reusing the same URL to gain access to the extension. The following Java code shows how the extension defined in the preceding example can authenticate users from Web Studio:


import java.security.MessageDigest;
import java.util.Date;

// In a JSP or servlet, "request" is the incoming HttpServletRequest.

// These values depend on what you defined in ws-extensions.xml
String extensionSecret = "secret!@#$%^*(987654321";
final String authTokenParameterName = "auth";
final String timeStampParameterName = "timestamp";

// Set the tolerance, in milliseconds, before a request is considered too old
int allowedTimeStampSlackInMS = 5 * 60 * 1000;

// Calculate the hash of the substring of the URL and the shared secret
String url = request.getRequestURI() + "?" + request.getQueryString();
String findAuthToken = "&" + authTokenParameterName + "=";
// Keep everything up to and including "&auth=", dropping the token value
url = url.substring(0, url.indexOf(findAuthToken) + findAuthToken.length());
String authCode = request.getParameter(authTokenParameterName);
MessageDigest md = MessageDigest.getInstance("MD5");
byte[] md5Hash = md.digest((url + extensionSecret).getBytes("UTF-8"));
StringBuffer hashCode = new StringBuffer();
// Each byte is offset by 128 before hex conversion, as in the original sample;
// this differs from the conventional (b & 0xff) hex encoding of an MD5 digest.
for (int i : md5Hash) {
    String str = Integer.toHexString(i + 128);
    if (str.length() < 2) {
        str = "0" + str;
    }
    hashCode.append(str);
}

// Compare the hash to the value of the AUTH token
if (!hashCode.toString().equals(authCode)) {
    // Authentication fails because AUTH token did not match
}

// Compare the time stamp of the request to the current time stamp
long currentTime = new Date().getTime();
long ts = Long.parseLong(request.getParameter(timeStampParameterName));
if (Math.abs(ts - currentTime) > allowedTimeStampSlackInMS) {
    // Authentication fails because request is too old
}

The example extension places the AUTH token at the end of the URL, making it more convenient to build the substring of the URL for the hash.

However, the AUTH token can be in any position in the URL. For instance, the URL can be defined in ws-extensions.xml as follows:
url="http://localhost:8080/AuthExtension/index.jsp?auth=${AUTH}&amp;timestamp=${TS}"

This would result in a URL similar to this:
http://localhost:8080/AuthExtension/index.jsp?auth=dc40570f2e7111fbe1af820a854ca817&timestamp=1189702462936

The value of the authentication token would be the hash of a string similar to this:
/AuthExtension/index.jsp?auth=&timestamp=1189702462936secret!@#$%^*(987654321

In this case the code in the extension to remove the value of the authentication token from the URL would be more complex.
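One way to handle an AUTH token that is not at the end of the URL is sketched below. This is a minimal illustration, not Endeca code: the class and method names are hypothetical, and the hex encoding simply mirrors the example code earlier in this section. The sketch blanks the token value wherever it appears in the query string and then hashes the result together with the shared secret.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class AuthTokenSketch {

    /** Blanks the value of the named parameter wherever it sits in the query string. */
    static String stripParameterValue(String url, String parameterName) {
        // Match "?name=value" or "&name=value" and keep everything up to and including "=".
        String pattern = "([?&]" + Pattern.quote(parameterName) + "=)[^&]*";
        return url.replaceFirst(pattern, "$1");
    }

    /** Hashes (stripped URL + secret) using the same byte-to-hex scheme as the example above. */
    static String expectedToken(String strippedUrl, String secret) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] hash = md.digest((strippedUrl + secret).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                String str = Integer.toHexString(b + 128);
                if (str.length() < 2) {
                    str = "0" + str;
                }
                hex.append(str);
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }
}
```

With the URL shown above, stripParameterValue(url, "auth") leaves /AuthExtension/index.jsp?auth=&timestamp=1189702462936, which can then be passed to expectedToken together with the shared secret and compared to the auth request parameter.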

Theming extensions to match Web Studio


Web Studio provides a public cascading style sheet that includes the most common style elements in Web Studio. You can use the style sheet in your extension to give it a look and feel similar to that of the Web Studio interface.

To use the Web Studio public cascading style sheet:


Add the following line within the head element of your HTML document:
<link rel="stylesheet" type="text/css" href="http://hostname:8888/stylesheets/public.css"/>

The host name is the name or IP address of the Web Studio server. Replace 8888 with the Web Studio port if it is not running on the default port. For more information about the styles defined in the public style sheet, see the comments within the public.css file. The file can be viewed at the following URL on the Web Studio server:
http://hostname:port/stylesheets/public.css

The public.css file cannot be edited. If you want to specify additional styles or modify the default styles, create a separate style sheet and apply it to your application.

Troubleshooting Web Studio extensions


The following sections provide troubleshooting information about Web Studio extensions.

If the extension does not have a link in the navigation menu or launch page:
• Stop and restart the Endeca HTTP service. Changes to the XML configuration files for extensions, roles, and the navigation menu do not go into effect until the service is restarted.
• Ensure that you have the required Web Studio user role to access the extension.
• Ensure that a menu item for the extension is specified in ws-mainMenu.xml and that the id attribute matches the id of the extension as defined in ws-extensions.xml. Defining an extension in ws-extensions.xml does not automatically add a link to the navigation menu in Web Studio. For more information about customizing the Web Studio menu, see Updating the Web Studio menu and launch page on page 76.
• If you want an extension to have an entry on the launch page, specify onLaunchPage="true" in the menuitem element for the extension in ws-mainMenu.xml.

If you have no applications defined in Web Studio, the only links that display in the navigation menu are for the EAC Admin Console and EAC Settings. To enable display of the full Web Studio menu, you must first provision an application.

If the link displays in the menu but the extension does not display when you click the link:
Ensure that the URL for the extension specified in ws-extensions.xml is a valid HTTP or HTTPS URL. A Web Studio extension must be a Web application running in a Web server.

If the Web Studio window does not display at all after updating ws-extensions.xml:
There may be a problem with your XML configuration files that prevents Web Studio from starting up. The error messages in the Endeca HTTP service logs (located at %ENDECA_CONF%\logs\catalina.date.log on Windows, or $ENDECA_CONF/logs/catalina.date.log on UNIX) can help you identify whether one of the following is the case:

• One or more of the XML configuration files is missing. The following files must be present in %ENDECA_CONF%\conf (on Windows) or $ENDECA_CONF/conf (on UNIX):

    ws-extensions.xml and its associated schema, extensions.xsd
    ws-mainMenu.xml and its associated schema, mainMenu.xsd
    ws-roles.xml and its associated schema, roles.xsd

  The files are created in this location when you install Endeca. By default, the ws-extensions.xml and ws-roles.xml files define no extensions or additional roles. The ws-mainMenu.xml file controls the display of the navigation menu and launch page. If you have deleted one of these files, you can restore the default file by copying it from %ENDECA_ROOT%\workspace_template\conf (on Windows) or $ENDECA_ROOT/workspace_template/conf (on UNIX).

• One or more of the configuration files contains badly formed or invalid XML. Ensure that the configuration files contain well-formed XML. In particular, check that any ampersand that is used within an attribute value is specified as the &amp; entity. Use an XML tool to validate any configuration files that you have edited against the associated schema in %ENDECA_CONF%\conf (on Windows) or $ENDECA_CONF/conf (on UNIX).
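If no dedicated XML tool is at hand, the check described above can be approximated with the validation API that ships with the JDK. The following is a minimal sketch, not an Endeca utility; the class and method names are hypothetical. It reports the first well-formedness or schema error it encounters.

```java
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class; the names are illustrative only.
final class ConfigValidationSketch {

    /** Returns null if the document is well-formed and valid, otherwise a description of the first error. */
    static String firstError(Source xml, Source xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(xsd);
            // validate() throws on malformed XML (such as a bare "&") and on schema violations.
            schema.newValidator().validate(xml);
            return null;
        } catch (SAXException | IOException e) {
            return e.toString();
        }
    }
}
```

A typical call would wrap the edited file and its schema in javax.xml.transform.stream.StreamSource objects, for example new StreamSource(new File(confDir, "ws-extensions.xml")) and new StreamSource(new File(confDir, "extensions.xsd")), where confDir is assumed to point at the conf directory named above.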

Chapter 6

Setting Up the Preview Application for Web Studio


This chapter describes how to set up a custom Endeca application so that it functions as Web Studio's preview application. It includes the following sections:

• Preview application overview
• Preview application requirements
• Instrumenting your application
• Configuring the preview application

Preview application overview


The preview application is the end-user application that displays in the bottom frame of the Rule Manager page in Web Studio. Business users search and navigate to specific locations in the preview application that then become the basis for configured dynamic business rules.

It is important to remember that the only purpose of the preview application is to present the data that you are changing via Web Studio. It is not necessary for the preview application to be an exact representation of your final front-end application, as long as it is using the correct data. The business logic that is built into Web Studio is not tied to the physical representation of the front-end application. It is good practice, however, to make sure that your preview application represents your final application closely enough so that business users know if their changes are correct.

By default, Web Studio is configured to use a copy of the JSP reference implementation as the preview application. This chapter describes how to set up your own custom application to be the preview application. Web Studio communicates with the preview application via settings you specify on the Preview App Settings page. The URL Mapping subsection lets you change the default preview application to your own custom preview application.

Note: The JSP reference implementation that is used as the preview application for Web Studio is stored in $ENDECA_ROOT/tools/server/webapps/endeca_jspref (%ENDECA_ROOT%\tools\server\webapps\endeca_jspref on Windows). Do not confuse this with the regular JSP reference implementation in $ENDECA_REFERENCE_DIR/endeca_jspref (%ENDECA_REFERENCE_DIR%\endeca_jspref on Windows).

Preview application requirements


In order to use a custom Endeca application as your Web Studio preview application, the custom application must meet the following requirements.

Domain
The preview application and Web Studio must reside in the same domain (for example, endeca.com).

Javascript domain
If Web Studio and your custom application do not reside on the same host, you must declare the Javascript domain in two locations inside the custom application's code:

• Navigation results page (the page that shows the set of records that correspond to a user's query).
• Record page (the page that displays information about a single record).

Web Studio communicates with and controls the preview application via Javascript. As a result, both Web Studio and the preview application must have the same Javascript domain property. The domain property provides security for scripts that run in different browser windows but need to communicate with one another.

When you enter the Javascript domain, you can also include the port number of the application server. This will ensure that you are referring to the exact host machine and port number. For example, if the custom application is on an application server running on port 8080, you can enter the Javascript domain as:
10.0.0.61:8080

or
web004:8080

The first format uses the host machine's IP address, while the second uses the machine name.

IMPORTANT: In addition, Web Studio's Configuration page provides a field where you must enter this information. This is analogous to declaring the domain in your Javascript headers.

Embedded hidden form


You must embed small hidden HTML forms on the preview application's navigation results and record pages. The Application Instrumentation Library offers convenient methods to do this. See Instrumenting your application on page 92 for more information.

No frames
The preview application must not use frames, because they are likely to collide with the frames of Web Studio itself.

URL-based state
The preview application must use URLs to handle navigation and search requests, as opposed to a hidden cookie or session state. The URLs should allow the substitution of search terms and navigation components. See Using pre-existing applications on page 95 for more information.
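As an illustration of URL-based state, the sketch below builds a request URL in which the whole navigation and search state travels in the query string. It is an assumption-laden example rather than Endeca code: the class name, the /controller path, and the sample values are hypothetical, and only the N, Ntk, and Ntt parameter names come from this guide.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper class; the names are illustrative only.
final class UrlStateSketch {

    /** Builds basePath?name=value&... so that the URL alone describes the page state. */
    static String buildUrl(String basePath, Map<String, String> state) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> entry : state.entrySet()) {
            query.add(encode(entry.getKey()) + "=" + encode(entry.getValue()));
        }
        return basePath + "?" + query;
    }

    private static String encode(String value) {
        // Form-style encoding: spaces become "+", reserved characters are percent-encoded.
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

Given an insertion-ordered map with N=0, Ntk=Wine Types, and Ntt=merlot, buildUrl("/controller", state) produces /controller?N=0&Ntk=Wine+Types&Ntt=merlot. Because nothing is kept in a cookie or session, search terms and navigation components can be substituted directly in the URL.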

Cookie name
Web Studio uses cookies to maintain a user's session. The name of the session cookie used by Web Studio is ESESSIONID. In rare cases it is possible for the cookie name to collide with a cookie of the same name on the same application server. This conflict can occur if you are running your application on an application server on the same host as Web Studio and using ESESSIONID for two purposes. In this situation, a user's session may be terminated unexpectedly. To resolve this issue, you can either run the application on another host (that is, a host other than the one Web Studio is on), or customize your application server to use a different cookie name (other than ESESSIONID) through custom directives on the specific application server.

Note: If your application does not meet the above requirements, Endeca recommends that you use the default JSP reference implementation available in Web Studio.

Instrumenting your application


To use a custom application as the preview application in Web Studio, you must embed small, hidden HTML forms in two places within the preview application pages:

• Navigation results page (the page that shows the set of records that correspond to a user's query).
• Record page (the page that displays information about a single record).

Endeca provides an Application Instrumentation Library with convenient methods to do this. The Application Instrumentation Library is a simple library, consisting of two functions, one for the navigation results page and one for the record page. A version is provided for each supported language: Java, .NET, and COM.

Note: The COM API is deprecated in version 5.1, and will be removed in a future version of the Endeca Information Access Platform. Therefore, if you are beginning a new project, it is recommended that you use the Java API or the .NET API.

Instrumenting the navigation results page


You use the htmlInstrumentNavigation() function to instrument the navigation results page. The following is a Java example of this function:
<% ETInstrumentor eti = new ETInstrumentor(); %>
<%= eti.htmlInstrumentNavigation(nav) %>

where nav is the Navigation object for the page. The code above produces an HTML form that looks similar to this example:
<form name="eti-navigation">
  <input type="hidden" name="nav" value="0">
  <input type="hidden" name="srchTerms" value="">
  <input type="hidden" name="srchKey" value="Wine Types">
</form>

The following snippets show equivalents to the Java code above.

COM/ASP
dim eti
set eti = Server.CreateObject("Endeca.ETInstrumentor")
eti.htmlInstrumentNavigation(nav)

ASP.NET
ETInstrumentor eti = new ETInstrumentor();
eti.htmlInstrumentNavigation(nav);

Instrumenting the record page


You use the htmlInstrumentRecord() function to instrument the record page. The following is a Java example of this function:
<% ETInstrumentor eti = new ETInstrumentor(); %>
<%= eti.htmlInstrumentRecord(rec, "NameProp", "UniqueProp") %>

where rec is the ID of the Endeca record displayed on the page, NameProp is the name of the property that represents the record's name, and UniqueProp is the name of the property that uniquely identifies the record. The code above produces an HTML form that looks similar to this example:
<form name="eti-record">
  <input type="hidden" name="displayName" value="Mustilli, Non-Vintage">
  <input type="hidden" name="recordSpecKey" value="WineID">
  <input type="hidden" name="recordSpecValue" value="1">
</form>

The following snippets show equivalents to the Java code above.

COM/ASP
dim eti
set eti = Server.CreateObject("Endeca.ETInstrumentor")
eti.htmlInstrumentRecord(rec, "NameProp", "UniqueProp")

ASP.NET
ETInstrumentor eti = new ETInstrumentor();
eti.htmlInstrumentRecord(rec, "NameProp", "UniqueProp");

Configuring the preview application


After instrumenting your custom application, you must provide URL mappings on the Preview App Settings page of Endeca Web Studio. For the procedure on adding URL mappings on the Preview App Settings page, see the Endeca Web Studio Help.

Using pre-existing applications


If you are using a pre-existing application that uses parameters other than the standard Endeca parameters (N, Ntk, Ntt, Nmpt, Nmrf, and R) as your preview application, you can still map the URLs. There are two requirements:

• The URLs must contain parameters that map to navigation, search key, and search term parameters.
• The navigation, search key, search term, record ID, preview time, and rule filter parameters must use the same encoding as the standard Endeca N, Ntk, Ntt, R, Nmpt, and Nmrf parameters, respectively.

Enabling and disabling the display of the preview application


By default, the URL mappings are filled in with URLs for the preview application of the JSP reference implementation. This enables Web Studio to display the preview application for the JSP reference implementation. If you clear out the default URL settings, the preview application does not display, and the preview-related options, such as Show in Preview, do not appear in the Rule List page of the Rule Manager in Web Studio.

If the display of the preview application is disabled because you previously removed the settings for the URL mappings, you can enable it again. To enable the display of the preview application, use either of these two options:

• Enter the URLs for the preview application of the reference implementation (these URLs originally were filled in as default settings), or
• Enter the URL settings for your own application.

For information on the default URL settings used for the JSP reference implementation, see the Endeca Web Studio Help.

Chapter 7

Configuring Logging and Reporting


This chapter describes how to configure and run the Log Server and the Report Generator. It includes the following sections:

• About logging and reporting
• Implementing logging and reporting in Web Studio
• Viewing reports in Web Studio
• Additional report generation tasks

About logging and reporting


The Endeca logging and reporting system has two primary components: the Log Server and the Report Generator.

The Endeca Log Server is a stand-alone server that translates application logging requests into flat log files. These files are later streamed through the Report Generator and transformed into configurable reports.

The Report Generator reads the log files created by the Log Server and creates XML-based or HTML-based reports that can be displayed on the View Reports page of Web Studio. These reports allow you to look at what has happened on your Web site on a daily or weekly basis. The reports answer questions like these:

• How much search and navigation traffic is my site getting?
• How are visitors searching and browsing the site?
• What conversion rates are occurring as a result of searching, navigating, and reacting to merchandising or content spotlighting?
• What are the most popular search terms and navigation requests?
• How effective are their searching and browsing techniques?

For information on the logging and reporting system architecture, API, and customization, see the Endeca Developer's Guide.

Before you begin


Before you implement logging and reporting, note the following requirements:

• In addition to running on the Web Studio server, an EAC Agent must be installed on any host where you will be running the Log Server and Report Generator components.
• Report generation depends on the information collected in the log files. To enable logging, your development team should add logging API calls to your application modules. For information on the logging API, see the Endeca Developer's Guide.

About the Log Server


The Log Server logs information about your Endeca application to a log file. This file is later consumed by the Report Generator.

As soon as the Log Server starts, it attempts to open a log file and write a header and timestamp. If this fails, it exits immediately, without accepting any requests. The Log Server begins a new file when you issue the roll command, or automatically if the current file becomes larger than 1 GB.

The log file name is a combination of the current date and time and the log file prefix that you specify, in the format prefix.timestamp. The timestamp indicates when the particular file was started and makes it possible to distinguish among multiple log files. When you provision a Log Server, you specify a path and prefix for its output.

For details about running the Log Server in Web Studio, see page 101.

Monitoring and rolling the Log Server from a URL


Although you generally communicate with the Log Server through Web Studio, it is also possible to contact the Log Server directly to perform the following administrative tasks:

• Monitoring the Log Server: To check that the Log Server is running, issue the following URL:
http://LogServerNameorIP:LogServerPortNumber/stats

If the Log Server is running, this URL returns a confirmation message containing the file name, number of log entries, and number of errors. If it is not running, you will see your browser's default error message.

• Rolling the log file: To roll the Log Server's log file, issue the following URL:
http://LogServerNameorIP:LogServerPortNumber/roll
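The same two URLs can also be requested from a script instead of a browser. The sketch below is one possible approach using only JDK classes; the class and method names are hypothetical, and the host and port passed in are whatever you provisioned for your Log Server.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
final class LogServerAdminSketch {

    /** Builds http://host:port/command, where command is "stats" or "roll". */
    static URI adminUri(String host, int port, String command) {
        return URI.create("http://" + host + ":" + port + "/" + command);
    }

    /** Issues a GET request and returns the response body as text. */
    static String get(URI uri) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) uri.toURL().openConnection();
        connection.setConnectTimeout(5000);
        connection.setReadTimeout(5000);
        try (InputStream in = connection.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            connection.disconnect();
        }
    }
}
```

For example, get(adminUri("localhost", 8002, "stats")) returns the confirmation message when the Log Server is running and throws an IOException when it cannot be reached. The host name and port in this call are placeholders.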

About the Report Generator


The Report Generator transforms the log files created by the Log Server into easy-to-read reports. These can be viewed in Web Studio or elsewhere. For details about running the Report Generator, see page 103.

Report details
Reports help you make informed decisions about how your application is being used. For example, you can obtain information about searches that do not return desired results. By analyzing these searches, you can determine what aspects of your Endeca IAP implementation may require changes. Report generation has the following characteristics:

• You can enable both daily and weekly reports.
• If you are using Web Studio, daily reports start at 12 am and finish at 11:59:59 pm. Weekly reports start at 12 am on the day that you specify and finish at 11:59:59 pm on the day that ends a week. For example, if you select Monday as the day to start your weekly report, your report runs from 12 am on Monday until 11:59:59 pm on the following Sunday.
• The Endeca software saves generated reports to the EAC directory /workspace/reports/<application_name> on UNIX and \workspace\reports\<application_name> on Windows. You cannot specify an alternate reports directory.

It is also possible to customize the contents of your reports. For details, see the Endeca Developer's Guide.
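The weekly reporting window described above can be worked out with a few lines of date arithmetic. The following sketch only illustrates the rule (start at 12 am on the chosen day, end at 11:59:59 pm six days later); it is not how Web Studio itself is implemented, and the class and method names are hypothetical.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.temporal.TemporalAdjusters;

// Hypothetical helper class; the names are illustrative only.
final class ReportWindowSketch {

    /** Returns {start, end} of the weekly window that contains the given day. */
    static LocalDateTime[] weeklyWindow(LocalDate anyDayInWeek, DayOfWeek weekStart) {
        // Step back to the most recent occurrence of the configured start day.
        LocalDate firstDay = anyDayInWeek.with(TemporalAdjusters.previousOrSame(weekStart));
        LocalDateTime start = firstDay.atStartOfDay();
        LocalDateTime end = firstDay.plusDays(6).atTime(LocalTime.of(23, 59, 59));
        return new LocalDateTime[] { start, end };
    }
}
```

For a week starting on Monday, weeklyWindow(LocalDate.of(2007, 11, 7), DayOfWeek.MONDAY) yields a window from 2007-11-05 at 00:00:00 to 2007-11-11 at 23:59:59.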

Implementing logging and reporting in Web Studio


The following sections discuss how you configure and run logging and reporting in Web Studio. Web Studio is a client of the Endeca Application Controller, which provides process coordination and control. You can implement and control the Log Server and Report Generator in Web Studio in two ways:

• You can configure and run each component and step individually. This method is discussed on page 101.
• You can automate the end-to-end logging and reporting process by using the report generation script. This method is discussed on page 109.

In either case, the location provisioned for the Log Server output must not be the same as the location provisioned for the Report Generator input. In other words, the output and input must reside in separate directories.

Note: You can also run logging and reporting with other Application Controller clients, such as the eaccmd tool or a custom Web services interface. For information about provisioning the LogServer and ReportGenerator components in an Application Controller provisioning file, see page 178 and page 179, respectively. For eaccmd usage, see Component and script control commands on page 201. For API details, see Endeca Application Controller API Class Reference on page 241.

Configuring and running each component individually


To configure and run logging and reporting step by step in Web Studio, do the following:
1. Provision and start a Log Server, and provision and start a Report Generator. For details, see page 101.
2. Specify the kind of reports you want to generate. For details, see page 107.
3. View the generated reports. For details, see page 113.

Provisioning and starting the Log Server


The first step in setting up logging and reporting is provisioning and starting the Log Server.

Provisioning the Log Server


Before you can run the Log Server, you must provision it. You can do this in Web Studio, in a provisioning file you run with eaccmd, or programmatically.

To provision a new Log Server in Web Studio:


1. Open Web Studio, log in as admin, and go to the EAC Administration > Admin Console page.
2. In the Components tab, expand the Log Server component and click New Log Server.
3. Using the setting descriptions below, type in the settings and click Create.

Log Server settings and default or recommended values


Web Studio Setting: Host
Description: The name of the host this component is located on.
Default/recommended value: n/a

Web Studio Setting: Working Directory
Description: Working directory for the process that is launched. If it is specified, it must be an absolute path. If any of the other properties of this component contain relative paths, they are interpreted as relative to the working directory.
Default/recommended value: If working-dir is not specified, it defaults to $ENDECA_CONF/work/<appName>/<componentName> on UNIX, or %ENDECA_CONF%\work\<appName>/<componentName> on Windows.

Web Studio Setting: Log File
Description: The path to the Log Server log file.
Default/recommended value: If the Log File is not specified, the default is component working directory plus component name plus .log.

Web Studio Setting: Port
Description: Required. Port on which to run the LogServer.
Default/recommended value: Typically the Dgraph component port plus two (such as 8002).

Web Studio Setting: Output Prefix
Description: Required. Path and prefix name for the LogServer output. For example, output_prefix = c:\temp\wine generates files that start with wine in c:\temp.
Default/recommended value: Note: The Log Server output directory cannot be the same as the Report Generator input directory, due to file usage contention issues.

Web Studio Setting: Gzip
Description: Required. Controls the archiving of log files. Possible values are true and false.
Default/recommended value: n/a

Web Studio Setting: Startup Timeout
Description: Specifies the amount of time in seconds that the eaccmd waits while starting the Log Server. If it cannot determine that the Log Server is running in this timeframe, it times out.
Default/recommended value: The default is 60.

Web Studio Setting: Custom Properties
Description: An optional list of properties, consisting of a required name and an optional value. For more information, see Adding properties to hosts and components on page 154.
Default/recommended value: n/a

Starting the Log Server directly


You can start, run, and monitor the Log Server using Web Studio. You have to start the Log Server before it can begin logging usage of your Web application. Note: If your baseline update script does not start the Log Server automatically, you have to start it separately.

To start, run, and monitor the Log Server in Web Studio:


1. Open Web Studio, log in as admin, and go to the EAC Administration > Admin Console page.
2. In the Components tab, select the Log Server component and click Start. When the Log Server is started, its icon changes to green and its status changes to Running.
3. To monitor the status of the Log Server, click the Running link located next to its name. The screen displays the component's status, its start time, and the length of time it has been running.

Provisioning and starting the Report Generator


Once the Log Server is running and logging application information, you can use the Report Generator to transform the information into reports.

Provisioning the Report Generator


Before you can run the Report Generator, you must provision it. You can do this in Web Studio, in a provisioning file you run with eaccmd, or programmatically.

To provision a new Report Generator in Web Studio:


1. Open Web Studio, log in as admin, and go to the EAC Administration > Admin Console page.
2. In the Components tab, expand the Report Generator component and click New Report Generator.
3. Using the setting descriptions below, type in the settings and click Create.

Report Generator settings and default or recommended values


Web Studio Setting: Host
Description: The name of the host this component is located on.
Default/recommended value: n/a

Web Studio Setting: Working Directory
Description: Working directory for the process that is launched. If it is specified, it must be an absolute path. If any of the other properties of this component contain relative paths, they are interpreted as relative to the working directory.
Default/recommended value: If working-dir is not specified, it defaults to $ENDECA_CONF/work/<appName>/<componentName> on UNIX, or %ENDECA_CONF%\work\<appName>/<componentName> on Windows.

Web Studio Setting: Log File
Description: The path to the Report Generator log file.
Default/recommended value: If the log-file is not specified, the default is component working directory plus component name plus .log.

Web Studio Setting: Input Directory or File
Description: Required. Path to the file or directory containing the logs to report on. If it is a directory, then all log files in that directory are read. If it is a file, then just that file is read.
Default/recommended value: Note: The Log Server output directory cannot be the same as the Report Generator input directory, due to file usage contention issues.

Web Studio Setting: Output File
Description: Required. Name of the generated report file and the path to where it is stored. For example: C:\Endeca\reports\weekly\myreport.xml on Windows, or /endeca/reports/weekly/myreport.xml on UNIX.
Default/recommended value: Note: If you are running the report generation script provided with the installation, the value of Output File will change with every run of the script, based on the specific output file for the day or week of the run.

Web Studio Setting: Stylesheet File
Description: Required. Filename and path of the XSL stylesheet used to format the generated report.
Default/recommended value: %ENDECA_CONF%\etc\tools_report_stylesheet.xsl on Windows, or $ENDECA_CONF/etc/tools_report_stylesheet.xsl on UNIX.

Web Studio Setting: Settings File
Description: Path to the report_settings.xml file.
Default/recommended value: %ENDECA_CONF%\etc\report_settings.xml on Windows, or $ENDECA_CONF/etc/report_settings.xml on UNIX.

Web Studio Setting: Start Date, Stop Date
Description: These set the report window to the given date and time. The date format should be either yyyy_mm_dd or yyyy_mm_dd.hh_mm_ss. For example, 2007_01_25.19_30_57 expresses Jan 25, 2007 at 7:30:57 in the evening.
Default/recommended value: n/a

Web Studio Setting: Charts
Description: Turns on the generation of report charts.
Default/recommended value: Disabled by default.

Web Studio Setting: Java Binary
Description: Should indicate a JDK 1.5.x or later.
Default/recommended value: Defaults to the JDK that Endeca installs.

Web Studio Setting: Java Options
Description: Command-line options for the java_binary setting. This setting is primarily used to adjust the Report Generator memory, which defaults to 1GB, and to adjust the language code for reports, which defaults to English. To set the memory, use the following:
java_options = -Xmx[MemoryInMb]m -Xms[MemoryInMb]m
Default/recommended value: n/a

Web Studio Setting: Arguments
Description: Command-line flags to pass to the Report Generator, expressed as a set of arg sub-elements.
Default/recommended value: n/a
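If you generate Start Date and Stop Date values from a program, the yyyy_mm_dd.hh_mm_ss format described above corresponds to the java.time pattern used in the sketch below. The class and method names are hypothetical; only the date format and the sample value come from this guide.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper class; the names are illustrative only.
final class ReportDateSketch {

    // In java.time patterns, MM is month, HH is hour of day (0-23), and mm is minute.
    static final DateTimeFormatter REPORT_DATE_TIME =
            DateTimeFormatter.ofPattern("yyyy_MM_dd.HH_mm_ss");

    /** Formats a date and time in the yyyy_mm_dd.hh_mm_ss form used by Start Date and Stop Date. */
    static String format(LocalDateTime dateTime) {
        return REPORT_DATE_TIME.format(dateTime);
    }
}
```

Formatting LocalDateTime.of(2007, 1, 25, 19, 30, 57) produces 2007_01_25.19_30_57, which matches the example in the settings above.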

Starting the Report Generator directly


Because the Report Generator relies on Log Server output, you must start the Log Server before starting the Report Generator. Note: Starting a Report Generator is typically done separately, and not as part of a baseline update script.

Before you run the Report Generator


Before you run the Report Generator, you have to copy the Log Server output to the Report Generator input directory. Because of file usage contention issues, these two cannot share the same directory.
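The copy step can be scripted. The sketch below is one possible approach, not part of the Endeca installation: the class and method names are hypothetical, and it assumes that the Log Server files are named prefix.timestamp as described earlier and that the log has been rolled so the copied files are complete. It refuses to run if both directories are the same.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the names are illustrative only.
final class LogCopySketch {

    /** Copies every file named prefix.* from the Log Server output directory to the input directory. */
    static List<Path> copyLogs(Path outputDir, String prefix, Path inputDir) {
        if (outputDir.toAbsolutePath().normalize().equals(inputDir.toAbsolutePath().normalize())) {
            throw new IllegalArgumentException("Output and input directories must differ");
        }
        List<Path> copied = new ArrayList<>();
        try {
            Files.createDirectories(inputDir);
            // The glob assumes the prefix contains no glob metacharacters such as * or [.
            try (DirectoryStream<Path> logs = Files.newDirectoryStream(outputDir, prefix + ".*")) {
                for (Path log : logs) {
                    if (Files.isRegularFile(log)) {
                        Path target = inputDir.resolve(log.getFileName());
                        Files.copy(log, target, StandardCopyOption.REPLACE_EXISTING);
                        copied.add(target);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return copied;
    }
}
```

The returned list names the files that were copied, which can be useful for logging or for cleaning up the input directory after the report has been generated.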

To start, run, and monitor the Report Generator in Web Studio:


1. Open Web Studio, log in as admin, and go to the EAC Administration > Admin Console page.
2. In the Components tab, select the Report Generator component and click Start. When the Report Generator is started, its icon changes to green and its status changes to Running.
   IMPORTANT: Keep in mind that even though the Report Generator component is in a Running state, it is not necessarily generating reports at that moment. Web Studio runs the Report Generator automatically, just after midnight, once a day or once a week, as specified.
3. To monitor the status of the Report Generator, click the Running link located next to its name. The screen displays the component's status, its start time, and the length of time it has been running.

Provisioning the Report Generator to run in French


You can provision the Report Generator with settings that produce reports in French rather than in English (the default).

To provision the Report Generator to run in French:


1. Provision the Report Generator in Web Studio as described in Provisioning and starting the Report Generator on page 103.
2. In the Java Options setting, add the following option to specify the language code for French:
   -Duser.language=fr
3. When you are finished provisioning, click Create.

The next time you run the Report Generator, it creates reports in French.

Specifying report frequency


You configure report frequency in Web Studio on the Report Generation section of the Application Settings page.

To specify report frequency in Web Studio:


1. In Web Studio, go to the Application Settings > Report Generation page.
2. Select a frequency for your reports:
   a. For reports generated once a day, check Daily Reports.
   b. For reports generated once a week, first check Weekly Reports. Then specify the day that the weekly report begins on in the drop-down list.

   Note: If you want to generate daily and weekly reports, check both.

3. Click OK to save your configuration. If you select the report frequency in Web Studio, then Web Studio will automatically provision a host with the alias of webstudio for you, if one does not exist already. This host contains a directory provisioned with an alias of webstudio-report-dir. This is set to the following directory for report storage:

   On Windows:
   %ENDECA_CONF%\reports\<app name>

   On UNIX:
   $ENDECA_CONF/reports/<app name>

4. After the directory mentioned in step 3 has been created, manually add daily and/or weekly sub-directories. These sub-directories are where Web Studio will look for reports to display.

   Note: The report generation script (described on page 109) creates these sub-directories for you.

Automatically scheduling report generation


If you specified the report frequency in Web Studio, Web Studio automatically runs the provisioned scripts named DailyReports (for daily reports) or WeeklyReports (for weekly reports) just after midnight, once a day or once a week, as specified. (For more information, see About the report generation script on page 109.) Keep in mind the following:

• If you did not already provision either or both of these scripts before you set the report frequency in Web Studio, Web Studio provisions them for you automatically, using the correct aliases.
• If you change the alias name of these scripts or remove them, Web Studio does not automatically run anything.

If you are not using an EAC script to control the report generation process, you may want to automate the process using the Scheduled Tasks control panel on Windows or crontab task scheduler on UNIX. See your operating system documentation for details about automated scheduling.


After the Report Generator completes its run


After the Report Generator completes its run, you typically have to move the report files from the Report Generator host to the Web Studio server.

Daily report files must be copied to the following subdirectory on the Web Studio server:

On Windows:
%ENDECA_CONF%\workspace\reports\<app name>\daily

On UNIX:
$ENDECA_CONF/workspace/reports/<app name>/daily

Weekly report files must be copied to the following subdirectory on the Web Studio server:

On Windows:
%ENDECA_CONF%\workspace\reports\<app name>\weekly

On UNIX:
$ENDECA_CONF/workspace/reports/<app name>/weekly

For information on viewing reports in Web Studio, see page 113.
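If you script the copy, the destination can be built from the pattern listed above. This is an illustrative Python sketch only: report_destination is a hypothetical helper, and the ENDECA_CONF value and application name passed to it are placeholders that you must supply for your own installation.

```python
from pathlib import PurePosixPath, PureWindowsPath


def report_destination(endeca_conf, app_name, frequency, windows=False):
    """Build the Web Studio directory for daily or weekly report files,
    following the <ENDECA_CONF>/workspace/reports/<app name>/<frequency>
    pattern listed in this section."""
    if frequency not in ("daily", "weekly"):
        raise ValueError("frequency must be 'daily' or 'weekly'")
    # Pick the path flavor so the separators match the target platform.
    path_type = PureWindowsPath if windows else PurePosixPath
    return str(path_type(endeca_conf, "workspace", "reports", app_name, frequency))
```

Rejecting any frequency other than daily or weekly keeps report files out of directories that Web Studio does not read.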

Running the report generation script


The report generation script automates much of the logging and reporting process for you. This section describes the script in detail and then explains how to use it.

About the report generation script


The report generation script is an EAC script you can use to drive and automate the entire process of report generation. The report generation script works as follows:

• It looks at the list of files in the Log Server and determines which of these files are relevant to the report that you requested, such as daily or weekly. If a frequency of daily is specified on the command line, logs from the previous day are requested. If a frequency of weekly is specified, logs from the seven days previous to the time the script is run are requested. Optionally, it can also tell the Log Server to roll its current log and start a new one. This is useful if you want to control the size of a log file or keep it within the requested date range.
• It moves the relevant files from the Log Server host to the Report Generator host and instructs the Report Generator to generate a report. For weekly reports, it passes the Report Generator the exact dates of the seven days ending yesterday.
• It moves the report files from the Report Generator host to the Web Studio host.
• It instructs the Log Server to delete all files over 30 days old.
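The date ranges described above can be illustrated with a short Python sketch. It is not the code of the shipped script (which is written in Java); report_window is a hypothetical function that only shows which days a daily or weekly run would cover, assuming the run date is passed in as today.

```python
from datetime import date, timedelta


def report_window(frequency, today):
    """Return (first_day, last_day) of the logs a report covers."""
    yesterday = today - timedelta(days=1)
    if frequency == "daily":
        # A daily report covers only the previous day.
        return yesterday, yesterday
    if frequency == "weekly":
        # A weekly report covers the seven days ending yesterday.
        return yesterday - timedelta(days=6), yesterday
    raise ValueError("frequency must be 'daily' or 'weekly'")


print(report_window("weekly", date(2007, 11, 15)))
```

For a run on 15 November 2007, the weekly window is 8 November through 14 November, and the daily window is 14 November only.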

If you want to run both daily and weekly reports, add a separate script for each time range. A version of the report generation script is included with the Endeca software, and is stored in %ENDECA_ROOT%\bin\generate-report.bat for Windows ($ENDECA_ROOT/bin/generate-report.sh for UNIX). You can copy and modify this script as needed. For information, see Editing the report generation script on page 111.

Notes:

• The report generation script, generate-report.bat or generate-report.sh, overrides the options specified for the Log Server and Report Generator components in Web Studio. For instance, it ignores previously-set options for Start Date and Stop Date.
• The supplied script for report generation does not support multi-platform scenarios (although multi-machine scenarios are supported). If you want to perform multi-platform report generation, you need to update the script or provide your own.
• The Log Server's Output Prefix should not be set to write to the same folder as the Report Generator's Input File or Directory. If these two components are set to write to the same directory, you will receive an error.


Editing the report generation script


You can modify the existing report generation script to suit your own needs, by copying and editing its source files. The script is written in Java. Note the following details:

• The script source tree is installed as part of the Endeca reference implementation, and can be found in %ENDECA_REFERENCE%\eac_scripts on Windows, or $ENDECA_REFERENCE/eac_scripts on UNIX.
• The executable files for the script are stored in %ENDECA_ROOT%\bin (Windows) or $ENDECA_ROOT/bin (UNIX); they depend on the eacscript.jar file in %ENDECA_ROOT%\lib\java (Windows) or $ENDECA_ROOT/lib/java (UNIX).

You can generate your own version of the eacscript.jar file by modifying the source files in the reference implementation.

High-level workflow for using the report generation script


To run the report generation script for a given application, follow these steps:

1. Do the necessary setup. This includes provisioning the host, components, and scripts, and starting the Log Server. For details, see page 111.
2. Start the script. For details, see page 113.
3. View the generated reports. For details, see page 113.

Doing the necessary setup


The report generation script does not include the provisioning step. Therefore, you must provision the following items in the EAC Administration Console in the order given:

1. Add a host that points to a machine on which Web Studio is running, by selecting the Hosts tab and completing the fields as follows:
   a. For New Host Alias, specify webstudio. (The script depends on this alias, and fails if the host name is specified incorrectly.)
   b. For Host Name, specify the DNS name. Because you are typically working across machines, it's best to avoid using localhost.
   c. For Agent Port, specify the port the agent is using, such as 8888.
   d. For Custom Directories, specify webstudio-report-dir as C:\Endeca\MDEXEngine\workspace\reports\<app name> on Windows, or /usr/local/endeca/workspace/reports/<app name> on UNIX (assuming you installed to /usr/local). This is the directory where Web Studio will look for report files, so it must match.
   e. Click Create Host.
2. Configure and start the Log Server. For instructions, see page 101.
3. Configure the Report Generator but do not start it. For instructions, see page 103.
4. Add a report generation script by selecting the Scripts tab and completing the information as follows. (If you want to generate daily and weekly reports, add two separate scripts, one for each.)
   a. For New Script Alias, specify the name of the script. For a daily report, specify DailyReports; for a weekly report, specify WeeklyReports.
   b. For Command, for a daily report script, specify:

      On Windows:
      %ENDECA_ROOT%\bin\generate-report.bat daily

      On UNIX:
      $ENDECA_ROOT/bin/generate-report.sh daily

      For a weekly report script, specify:

      On Windows:
      %ENDECA_ROOT%\bin\generate-report.bat weekly

      On UNIX:
      $ENDECA_ROOT/bin/generate-report.sh weekly

   c. Click Create Script.

Note: If you select the check boxes in the Application Settings > Report Generation section, this step is done for you.


Starting the report generation script


Once the prerequisites are in place, you can start the report generation script.

To start the report generation script in Web Studio:


1. Go to the EAC Administration > Admin Console page and select the Scripts tab.
2. Select the application and the previously provisioned report generation script that you want to start, such as DailyReports or WeeklyReports.
3. Click Start. Generated reports are output to the webstudio-report-dir directory.

Viewing reports in Web Studio


Typically, you use Web Studio to view the XML reports you have generated. Note: As long as you followed the filename and location requirements in the previous steps, you will be able to see your reports in Web Studio.

To view reports in Web Studio:


1. On a Windows machine, open Internet Explorer.
2. In the Address box, enter the following URL:
   http://WebStudioHost:8888

   Note: If you specified a different Endeca HTTP service port, use that instead.

3. At the Web Studio login page, click Log In.
4. In the Enter Network Password dialog, enter the user name and password and click OK.
5. In the navigation menu, click View Reports.

The default report is the current daily report, which is the same as the Current (daily) link on the navigation menu (in the left pane of the View Reports page). The navigation menu also lets you view:

• Current weekly report, via the Current (weekly) link
• List of archived daily reports, via the Daily link
• List of archived weekly reports, via the Weekly link

For example, click the Weekly link to display a list of weekly reports that looks like this:

From the list of weekly reports, click a specific report to display its contents. Note: For more information on using the View Reports page, see the Endeca Web Studio Help.

Additional report generation tasks


The rest of this chapter contains the following tasks:

• Configuring report contents and format
• Customizing the report generation file
• Generating HTML reports
• Viewing reports produced by other Report Generators
• Archiving and deleting log files and reports

Configuring report contents and format


You can customize the content of a report either in Web Studio or by using the report settings XML file. These customization tasks are a part of implementing logging and reporting, and are described in the Endeca Developer's Guide.

Customizing the report generation file


When generating reports for viewing on the View Reports page, Web Studio uses the report_settings.xml file as the display configuration file. This file is located on the Report Generator server in the \workspace\etc directory on Windows (/workspace/etc on UNIX).

IMPORTANT: Because the Endeca Application Controller specifically looks for this name and path, you cannot change the name or location of the report configuration file.

You can customize the content and appearance of your reports by modifying the report elements in the report_settings.xml file. However, both daily and weekly reports must use the same configuration file. These tasks are typically part of implementing logging and reporting and are described in the Endeca Developer's Guide.

Generating HTML reports


Web Studio requires XML reports, but it is also possible to generate HTML reports.

Generating HTML reports


To generate an HTML report that can be viewed in a browser, specify report_stylesheet.xsl as your stylesheet. This file is stored on the Report Generator server in %ENDECA_CONF%\etc on Windows ($ENDECA_CONF/etc on UNIX).


Viewing HTML reports


To display an HTML report, open it in any Web browser.

Viewing reports produced by other Report Generators


Typically, you use the View Reports page to read reports that were produced by the Report Generator that is running for the specific application you are accessing in Web Studio. However, you can also view reports that were generated by another client to the Endeca Application Controller, as long as its ReportGenerator component uses the tools_report_stylesheet.xsl stylesheet. To view these reports, copy the report XML files to the appropriate daily or weekly subdirectory in the reports directory on the Web Studio server.

Archiving and deleting log files and reports


Endeca recommends that you do the following for the existing log files and reports:

• Archive your log files on a weekly basis. When the Report Generator processes reporting information, it processes all log files contained in the logs directory you specified and any of its subdirectories. This processing has performance implications as the size of your log data grows. To minimize log processing time, Endeca recommends that you archive your log files on a weekly basis to a directory that is not under the logs directory.

• Delete outdated log files. The EAC script for report generation retains 30 days of log files, in case a report does not generate properly. More specifically, after a report has been successfully generated, any log files that are more than 30 days older than the start of the report's time period are deleted. By extension, if a report is not successfully generated, no log files are deleted, ensuring that no data is lost. If you are not using the report generation script, you need to purge log files manually.

• Delete outdated reports. Reports are never deleted by the Endeca Application Controller or Web Studio. Therefore, it is the administrator's responsibility to check the contents of the reports directory on the Web Studio server periodically and manually delete any obsolete or unwanted reports.
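If you purge log files manually, the 30-day retention rule described above can be sketched as follows. This Python example is an illustration under stated assumptions, not the logic of the shipped script: logs_to_purge is a hypothetical function, and it assumes you already know which date each log file covers.

```python
from datetime import date, timedelta


def logs_to_purge(log_dates, report_start, report_succeeded, retention_days=30):
    """Return the names of log files that are old enough to delete.

    log_dates maps a log file name to the date that the file covers.
    """
    # If the report failed, keep everything so that no data is lost.
    if not report_succeeded:
        return []
    # Files dated before this cutoff are more than retention_days older
    # than the start of the report's time period.
    cutoff = report_start - timedelta(days=retention_days)
    return sorted(name for name, day in log_dates.items() if day < cutoff)
```

A file dated exactly 30 days before the start of the report period is kept; only files older than that are returned.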


Chapter 8

Configuring the Endeca Standard Application


This chapter describes how to set up and run the Endeca Standard Application in an Endeca Application Controller environment. It also describes how to install the Standard Application on an Apache Jakarta Tomcat or a BEA WebLogic application server. This chapter includes the following sections:

• About the Endeca Standard Application
• Accessing the Standard Application
• Configuring the Standard Application
• Installing the Standard Application on Tomcat
• Installing the Standard Application on WebLogic

About the Endeca Standard Application


The Endeca Standard Application is an out-of-the-box front-end application for querying an MDEX Engine. The Standard Application lets users search and navigate a data set very soon after installing the Endeca IAP, without a developer having to build a custom application. It can also be used as a production-ready application suited for certain Endeca deployments where a heavily customized application is not necessary. Some of the technical details of the Standard Application are the following:

• It runs under the Endeca Application Controller (when installed as part of the Endeca Application Controller and Web Studio feature).
• It can be installed to run on WebLogic 5.1 (or later) and Tomcat 3.0 (or later) application servers.
• It can be used against only one instance of an MDEX Engine. That is, you cannot use your Web browser to point the Standard Application at a different MDEX Engine. If you want to use a different MDEX Engine, you must install a second instance of the Standard Application.
• You cannot modify the source code, unlike for the JSP, ASP, and ASP.NET reference applications. However, you can change some configuration parameters, as described below on page 123.
• The Endeca Access Control System (that is, user authentication) and SSL can be configured for the Standard Application when it is running under a Tomcat or WebLogic application server.

All configuration happens at deployment time, using standard J2EE environment entries in the appropriate configuration file (warname.runtime.xml file for WebLogic or server.xml and endeca_standard.xml for Tomcat).


Display features
The user interface of the Standard Application is similar to those of the reference implementations, but with some features removed for the sake of simplicity. For example:

• The hostname and port number of the MDEX Engine for the Standard Application are pre-configured by the deployer, so that the user does not have to supply them.
• The UI has a Download as CSV button to allow the user to download all matching records in a CSV (comma-separated values) format. (This feature is useful if the user wants to export them to Microsoft Excel, for example.)
• The display key for each record is Name. That is, your data must have a property called Name that is used as the title of each record. If this property does not exist, the record name will be Record 32874 or a similar display.
• If the record has a property named URL.External, it is used to create a link URL for the record name. For example, if the property value is the URL of a document, that document will be retrieved when you click on the record name.

Standard Application installation scenarios


The Standard Application is installed as a .WAR file in one or both of these locations, depending on which features you select when you install the Endeca software:

• If you select the Endeca Application Controller & Web Studio feature, the WAR is installed in the following directory:

  On Windows:
  %ENDECA_ROOT%\tools\server

  On UNIX:
  $ENDECA_ROOT/tools/server

  This scenario assumes that you will be running the Standard Application under the Endeca Application Controller. If you accept the default configuration settings, the Standard Application is already installed and running on your local machine as soon as the Endeca Application Controller is running. To change the Standard Application configuration, see page 123.

• If you select the Endeca Standard Application feature, the WAR is installed in the following directory:

  On Windows:
  %ENDECA_ROOT%\applications

  On UNIX:
  $ENDECA_ROOT/applications

  This scenario assumes that you intend to install the Standard Application to run under a Tomcat or BEA WebLogic application server. For details, see Installing the Standard Application on Tomcat on page 126 or Installing the Standard Application on WebLogic on page 133.

If you select both features, you get two copies of the Standard Application WAR, in the directories mentioned above.

Accessing the Standard Application


You can access the Standard Application if both the Endeca Application Controller and the MDEX Engine are running. The following is an example of part of the Standard Application main page.


To access the Standard Application main page:


Enter the following URL in a web browser:
http://Endeca_Application_Controller_Host:8888/endeca_standard

For example:
http://localhost:8888/endeca_standard/

If you used a different HTTP Connector port when you configured the Endeca Application Controller, substitute that port number for 8888.

Configuring the Standard Application


The configuration for the Standard Application is specified in the following two files:

• server.xml, which is located in the $ENDECA_CONF/conf directory on UNIX (%ENDECA_CONF%\conf on Windows)
• endeca_standard.xml, which is located in the $ENDECA_CONF/conf/Standalone/localhost directory on UNIX (%ENDECA_CONF%\conf\Standalone\localhost on Windows)


Note: You cannot configure SSL and user authentication support for the Standard Application when it is running under the Endeca Application Controller. This configuration is available only when the Standard Application is running on a Tomcat or WebLogic application server.

The endeca_standard.xml file contains several Context elements. A Context element represents an individual Web application that is running within its parent Host element. The Host element is specified in the server.xml file.

The default Context definition for the Standard Application looks like this example:

<!-- Context configuration file for the Endeca Standard Web Application -->
<Context path="/endeca_standard"
    docBase="C:\Endeca\MDEXEngine\5.1.0\tools\server/standard-webapp-5.1.0.75.war"
    debug="0" privileged="false">
  <Environment type="java.lang.String" name="ene-host" value="DOC-004"/>
  <Environment type="java.lang.Integer" name="ene-port" value="8000"/>
</Context>
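To check which MDEX Engine a Context definition points to, you can read its Environment entries programmatically. The following Python sketch uses only the standard library and parses a single Context element supplied as a string; the read_environment helper is hypothetical, and the sample values are copied from the example above.

```python
import xml.etree.ElementTree as ET

# Sample Context element, copied from the default definition shown above.
CONTEXT_XML = r"""
<Context path="/endeca_standard"
    docBase="C:\Endeca\MDEXEngine\5.1.0\tools\server/standard-webapp-5.1.0.75.war"
    debug="0" privileged="false">
  <Environment type="java.lang.String" name="ene-host" value="DOC-004"/>
  <Environment type="java.lang.Integer" name="ene-port" value="8000"/>
</Context>
"""


def read_environment(context_xml):
    """Return the Environment entries of one Context element as a dict."""
    root = ET.fromstring(context_xml.strip())
    return {env.get("name"): env.get("value") for env in root.findall("Environment")}


settings = read_environment(CONTEXT_XML)
print(settings["ene-host"], settings["ene-port"])
```

With the sample above, the script prints DOC-004 8000. Values are returned as strings, so convert ene-port to an integer if you need to compare it numerically.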

The meanings and defaults of the Context attributes, specified in endeca_standard.xml, are listed below.

Context attribute: path
Default setting: /endeca_standard
Description: The context path of the application. This is the name that the user enters in the browser's URL address field (as documented in the previous section) to access the main page of the application. Note that this value must be unique among all the Context elements for this Host definition.

Context attribute: docBase
Default setting: UNIX: $ENDECA_ROOT/tools/server/standard-webapp-version.war; Windows: %ENDECA_ROOT%\tools\server\standard-webapp-version.war
Description: The path of the WAR that contains the application. Do not change this attribute unless you are moving the WAR to another location on the host.

Context attribute: debug
Default setting: 0
Description: Sets the verbosity debug level for logging messages. Higher numbers generate more detailed output.

Context attribute: privileged
Default setting: false
Description: Specifies whether this context is allowed to use container servlets. Do not change this attribute.

Context attribute: Environment name="ene-host"
Default setting: The name of the machine on which the Endeca software is installed.
Description: The name of the machine hosting the MDEX Engine.

Context attribute: Environment name="ene-port"
Default setting: 8000
Description: The port on which the MDEX Engine is listening. Change this value only if you are running an MDEX Engine on a port other than the default 8000 port.

Context attribute: Environment name="title"
Default setting: This attribute is not set by default, which means that the default value Endeca is used as the page title.
Description: The title of the application, which is displayed at the top of each page. The type is a java.lang.String value. If the value is empty or the attribute is missing, Endeca will be used as the title. Note that the default configuration does not use this attribute, but you can add it to the Context if you want to use your own title.

To change the Standard Application configuration:


1. Open the endeca_standard.xml file in a text editor.
2. Find the Context element for the Standard Application, which looks like the example above.
3. Make changes to the appropriate Context attributes.
4. Save and close the endeca_standard.xml file.
5. Restart the Endeca HTTP service.

   On UNIX:
   a. Stop the Endeca HTTP service using:
      $ENDECA_ROOT/tools/server/bin/shutdown.sh
   b. Restart the Endeca HTTP service using:
      $ENDECA_ROOT/tools/server/bin/startup.sh

   On Windows:
   a. From the Windows Control Panel, select Administrative Tools, and then select Services.
   b. In the right pane of the Services window, right-click Endeca HTTP service and choose Restart.
   c. Close the Services window.

6. Access the Standard Application to check that your changes were successfully implemented.

Installing the Standard Application on Tomcat


You can install the Endeca Standard Application .WAR file on an Apache Jakarta Tomcat 3.0 (or greater) application server. You can then use the Standard Application by pointing it at a running MDEX Engine.

To install the Standard Application on a Tomcat server:


1. Copy the Standard Application WAR from the $ENDECA_ROOT/applications directory (%ENDECA_ROOT%\applications on Windows) to the Tomcat webapps directory.
2. Open the Tomcat server.xml in a text editor.
3. Add the Context element for the Standard Application. The Context element must be nested as a sub-element of the Host element. For the meanings of the Context attributes, see the table in the section Configuring the Standard Application on page 123. An example of a Context element is as follows:

   <!-- Context configuration for the Endeca Standard Web Application -->
   <Context path="/endeca_standard"
       docBase="C:\Tomcat 5.5\webapps\standard-webapp-5.1.0.75.war">
     <Environment name="ene-host" type="java.lang.String" value="web007"/>
     <Environment name="ene-port" type="java.lang.Integer" value="8000"/>
     <Environment name="title" type="java.lang.String" value="Endeca App"/>
   </Context>

4. Save and close the server.xml file.
5. Restart the Tomcat server.

Assuming the above example configuration (both the Tomcat server and the MDEX Engine are running on host web007 and an HTTP Connector on port 8080 is being used), you would access the Standard Application with this URL in your browser:
http://web007:8080/endeca_standard

If you have a running MDEX Engine, you should see the Standard Application main page. Refer to the Tomcat documentation for full details on Tomcat configuration and deployment of WARs.

Enabling SSL for the MDEX Engine


The Standard Application can be configured for SSL when running on a Tomcat application server. If you have enabled SSL for the MDEX Engine, you must also configure the Standard Application so that it uses SSL communications with the MDEX Engine.

The general procedure for enabling SSL for the Standard Application is as follows:

1. Enable SSL for the MDEX Engine, either from the Web Studio Provisioning System page or in a control script. See the Endeca Security Guide for Java for details.
2. Modify the endeca_standard.xml file to configure SSL for the Standard Application.
3. Create a JKS-format certificate.
4. Import the certificate into the JKS keystore when you start the Tomcat server.

Steps 2 to 4 are described in the following sections.

Note: The following sections assume that you have configured the Java JSSE framework on the server, including setting up an SSL HTTP/1.1 connector. Consult the Tomcat documentation for details on the SSL setup procedure.

Adding the SSL environment entry to the endeca_standard.xml file


You must modify the endeca_standard.xml file (on the Tomcat server) to add the ene-ssl-is-enabled Environment entry to the Context element for the Standard Application. The syntax for this entry is:
<Environment type="java.lang.Boolean" name="ene-ssl-is-enabled" value="true"/>

The following example shows the Standard Application Context element with the SSL environment entry:
<!-- Context configuration for the Endeca Standard Web Application -->
<Context path="/endeca_standard"
    docBase="C:\Endeca\MDEXEngine\5.1.0\tools\server/standard-webapp-5.1.0.75.war">
  <Environment name="ene-host" type="java.lang.String" value="web007"/>
  <Environment name="ene-port" type="java.lang.Integer" value="8000"/>
  <Environment name="title" type="java.lang.String" value="Endeca App"/>
  <Environment type="java.lang.Boolean" name="ene-ssl-is-enabled" value="true"/>
</Context>

Creating a JKS-format certificate


The Tomcat server uses a client certificate in the standard Java KeyStore (JKS) format. You can produce a JKS-format client certificate by converting the eneCert.p12 certificate key that is shipped in the $ENDECA_CONF/etc directory (%ENDECA_CONF%\etc on Windows). To convert the key, use the Endeca-provided endeca_PKCS12ToJKS.jar file, as described in the Endeca Security Guide for Java. The next section assumes that eneCert.jks is the name of the resulting JKS-format client certificate and that endeca is its password.


Starting Tomcat with the JKS certificate


When you start the Tomcat application server, you must specify the location and password of the keystore and truststore files with the following JVM -D system property arguments:

• -Djavax.net.ssl.keyStore specifies the keystore file.
• -Djavax.net.ssl.keyStorePassword specifies the password of the keystore.
• -Djavax.net.ssl.trustStore specifies the truststore file to use to validate client certificates.
• -Djavax.net.ssl.trustStorePassword specifies the password to access the truststore file.

One way to provide these values to Tomcat is to use the Tomcat CATALINA_OPTS environment variable, which provides Java runtime options when the server is started. You can set the CATALINA_OPTS environment variable in an existing Tomcat startup file (.bat on Windows or .sh on UNIX) or create a wrapper file that sets the variable and then calls the Tomcat startup file. For example, this Windows batch file can be placed in the Tomcat bin directory and used to start the server:

@echo off
setlocal
rem Location of the JKS-format client certificate.
set CLIENT_CERT=C:\Endeca\NavigationEngine\workspace\etc\eneCert.jks
rem The same file is used as both keystore and truststore.
set CATALINA_OPTS=-Djavax.net.ssl.keyStore=%CLIENT_CERT% -Djavax.net.ssl.keyStorePassword=endeca -Djavax.net.ssl.trustStore=%CLIENT_CERT% -Djavax.net.ssl.trustStorePassword=endeca
rem Start Tomcat with the options set above.
cd c:\tomcat\bin
call c:\tomcat\bin\startup.bat
endlocal

The values for the set CATALINA_OPTS command must be on one line, even if they appear wrapped when displayed.
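Because the CATALINA_OPTS value is a single line made up of four options, it can help to assemble it programmatically before pasting it into a startup file. The following Python sketch is illustrative only: catalina_opts is a hypothetical helper, and it assumes, as in the example above, that the truststore defaults to the same file and password as the keystore.

```python
def catalina_opts(keystore, keystore_password,
                  truststore=None, truststore_password=None):
    """Join the four javax.net.ssl system properties into one line."""
    # Assumption: reuse the keystore as the truststore unless told otherwise.
    truststore = truststore or keystore
    truststore_password = truststore_password or keystore_password
    options = [
        ("javax.net.ssl.keyStore", keystore),
        ("javax.net.ssl.keyStorePassword", keystore_password),
        ("javax.net.ssl.trustStore", truststore),
        ("javax.net.ssl.trustStorePassword", truststore_password),
    ]
    return " ".join(f"-D{name}={value}" for name, value in options)


print(catalina_opts("/usr/local/endeca/workspace/etc/eneCert.jks", "endeca"))
```

The path in the usage line is a placeholder; substitute the location of your own JKS file.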

Enabling user authentication for the Standard Application


When running on a Tomcat application server, the Standard Application can be configured to use the Endeca Access Control System. This system can authenticate a user's identity against, and obtain authorization information from, an LDAP directory or a local username/password file. The authorization information is used to build a user entitlement filter (that is, a security filter), which controls the records that are retrieved from an MDEX Engine query.

The general procedure for configuring user authentication for the Standard Application is as follows:

1. Configure the Java JAAS framework on the Tomcat application server.
2. Modify the Tomcat server.xml file to configure the Standard Application to use user authentication.
3. Set up the Access Control System login configuration file.
4. Configure the Endeca instance configuration to set access permissions on the Endeca records.
5. Access the Standard Application and log in.

These steps are described in the following sections.

Configuring JAAS on the application server


To use the Endeca Access Control System for user authentication, you need to install the Java Authentication and Authorization Service (JAAS) to function as its framework. Chapter 4 of the Endeca Security Guide for Java contains some helpful instructions for installing the JAAS software. However, for full details, consult the documentation for the JAAS module and for your application server.

Adding the user authentication entry to the server.xml file


You must modify the server.xml file (on the Tomcat server) to add the enable-security-filters Environment entry to the Context element for the Standard Application. The syntax for this entry is:
<Environment type="java.lang.Boolean" name="enable-security-filters" value="true"/>

The following example shows the Standard Application Context element with user authentication enabled:

<!-- Context configuration for the Endeca Standard Web Application -->
<Context path="/endeca_standard" docBase="standard-webapp-5.0.0.war">
  <Environment name="ene-host" type="java.lang.String" value="web007"/>
  <Environment name="ene-port" type="java.lang.Integer" value="8000"/>
  <Environment name="title" type="java.lang.String" value="Endeca App"/>
  <Environment type="java.lang.Boolean" name="enable-security-filters" value="true"/>
</Context>

Setting up the login configuration file


The Endeca Security Guide for Java contains detailed instructions on how to set up the configuration for the Access Control System (ACS). This section provides a brief overview of the process. The Standard Application supports both Endeca ACS plug-ins:

• LDAPLoginModule for authentication against an LDAP server.
• FileLoginModule for authentication against a local password file.

When you create the login configuration file, it must have a configuration entry for this login module:
com.endeca.webapp.profind.auth.PassthroughLoginModule required;

The PassthroughLoginModule refers to an internal class in the Standard Application WAR. The following is an example login configuration file that configures the Access Control System to use a local file for authentication.
Endeca {
  com.endeca.webapp.profind.auth.PassthroughLoginModule required;
};
StandardWebApp {
  com.endeca.navigation.FileLoginModule required
    passwordFile="C:/Endeca/NavigationEngine/workspace/etc/passwd"
    checkPasswords="true";
};

In the example, the Standard Application will use the FileLoginModule for authentication against the local password file specified by the passwordFile parameter. The format of the password file is described in Chapter 4 of the Endeca Security Guide for Java.

If you want the Standard Application to use an LDAP server for user authentication, use the LDAPLoginModule (instead of the FileLoginModule) in the StandardWebApp configuration entry. See the Endeca Security Guide for Java for an example of an LDAPLoginModule configuration.

After you set up the login configuration file, you must specify its location to the Tomcat application server via the java.security.auth.login.config property. One method of setting this property is to edit the JAVA_HOME/jre/lib/security/java.security file and add the name of the login configuration file, as in this Windows example:
# Default login configuration file
login.config.url.1=file:C:/EndecaProjects/SSL/Login.conf

Please consult your application server documentation for full details on how to set this property.

Configuring record permissions


Chapter 4 of the Endeca Security Guide for Java explains how to set up your instance configuration so that it uses the Endeca.ACL.Allow.Read property to tag the Endeca records with the proper access permissions. You can use either a Content Acquisition System (CAS) pipeline or an Access Rules component.

Logging in with the Standard Application


After you have run a baseline update and started the Tomcat application server, users can access the Standard Application. A login page will prompt for a user name and password, as in this example.

The user name and password are then authenticated against the LDAP server or the password file, depending on the login configuration. After authentication, the MDEX Engine constructs a user entitlement filter (based on the user's group information) and returns only the records that the user is authorized to see.

Installing the Standard Application on WebLogic


You can deploy the Standard Application as a J2EE client application on a BEA WebLogic 5.1 (or greater) server. Refer to the WebLogic documentation for details on adding the configuration values to the WebLogic-specific runtime deployment descriptor (warname.runtime.xml) and deploying the application. To configure the Standard Application to use SSL and/or user authentication, use the relevant information in the previous section, as well as the WebLogic documentation.

SECTION II
Administering Application Controller Environments

Chapter 9

About the Endeca Application Controller


This chapter introduces the Endeca Application Controller. This chapter contains the following sections:

• About the Endeca Application Controller
• Architecture of the Application Controller

About the Endeca Application Controller


The Endeca Application Controller (EAC) is the interface you use to control, manage, and monitor your Endeca implementations:

• It provides the infrastructure to support Endeca projects from design through deployment and runtime.
• It replaces the Control Interpreter (deprecated in the 5.0 release), while leaving the Endeca tools (Developer Studio and Web Studio) largely intact.
• It uses open standards, such as the Web Services Description Language (WSDL), which makes the Application Controller platform- and language-independent. As a result, the Application Controller supports a wide variety of applications in production.
• It allows you to handle complex operating environments that support features such as partial updates, delta updates, phased MDEX Engine updates, and more.

Architecture of the Application Controller


The Application Controller is typically run in a distributed environment. The Application Controller is installed on each machine that runs the Endeca software. Depending on the role that the Application Controller plays in the Endeca implementation, each instance of the Application Controller can take one of two roles:

• One instance serves as the EAC Central Server. This instance includes a WSDL interface, through which you communicate with the Application Controller. Communication is implemented with the standard Web Services protocol, SOAP. You can communicate with the Application Controller using any of the following methods:

  - Using Web Studio. Endeca Web Studio communicates through the WSDL interface to the EAC Central Server. Using Web Studio you can provision, run, and monitor your application. For details, see the Endeca Web Studio Help.
  - Using the command line utility, eaccmd. eaccmd lets you script the Application Controller within a language such as Perl, shell, or batch. (For details, see Using the Eaccmd Tool on page 191.)
  - Using direct programmatic control through the Endeca WSDL and languages, such as Java, that support Web Services. (For details, see Endeca Application Controller API Interface Reference on page 217.)

  Using any of these methods, you can instruct the Application Controller to perform different operations in your Endeca implementations, such as starting or stopping a component (for example, Forge or Dgraph) or a utility (for example, Copy or Shell). The EAC Central Server also contains a repository that stores provisioning information; that is, data about the hosts, components, applications, and scripts that the Application Controller is managing.

• All other instances of the EAC serve as Agents. The Agents instruct their host machines to do the actual work of an Endeca implementation, such as processing data with a Forge component, or coordinating the workings of multiple MDEX Engines with an Aggregated MDEX Engine component. Each Agent also contains a small repository for its own use. The EAC Central Server communicates with its Agents through an internal Web Service interface. You do not communicate directly with the Agents; all command, control, and monitoring functions are sent through the EAC Central Server.

The diagram below shows the architecture of the Application Controller.

In this diagram, the following happens:

1. The developer, business user, and system administrator provide instance configuration and resource configuration information to the EAC Central Server, using any of the three methods:
   • Developer Studio and Web Studio
   • The eaccmd command line utility
   • Direct programmatic control through the Endeca Web services interface, using any language that supports Web services, such as Java
2. The EAC Central Server uses that information to communicate with the EAC Agents that run on each machine hosting an implementation.
3. The Agents in turn run the necessary processes on each machine.

Chapter 10

Using the Application Controller


This chapter discusses how to use the Application Controller. It contains the following sections:

• Installing the Application Controller
• Specifying the EAC Central Server in Web Studio
• Starting and stopping the Application Controller directly
• Using the eac.properties file
• Modifying Application Controller logging levels

Installing the Application Controller


You have the following choices when you install the Endeca Application Controller:

• Install the Agent. The Agent controls the workings of a single machine in an Application Controller deployment. There are typically several Agents in a deployment.
• Install the EAC Central Server. The Central Server acts as a hub in an Application Controller deployment, relaying commands to each of the Agents in the deployment. As such, there is only a single Central Server per deployment. Alternatively, you can use an SSL-enabled Central Server. Upon configuration, this version encrypts the HTTP channel between the Central Server and the client Web services.
• Install both.

During installation, when you select whether you want to run the Agent and/or the Central Server on a machine, an XML pointer to the appropriate WAR file is copied to its workspace directory. The presence or absence of these files in the workspace directory determines what that machine is running. If you want to run the SSL-enabled version of the Central Server, you must copy the XML pointer to it to your workspace directory manually, as described in the following section.

Enabling SSL security on the Application Controller


SSL in the Application Controller is disabled by default. To enable SSL security (between the client and the EAC Central Server, between the Central Server and an Agent, or between Agents), you need to do the following:

• Enable the SSL version of the appropriate Application Controller WAR file (eac-ssl.war replaces eac.war for the Central Server, and eac-agent-ssl.war replaces eac-agent.war for the Agent).
• Modify the server.xml file for the Tomcat that is hosting the Application Controller.

For details on enabling SSL security in the Application Controller, see the Endeca Security Guide for Java.

Specifying the EAC Central Server in Web Studio


On the EAC Settings page of Web Studio, you specify the host and port for the EAC Central Server. These settings control which machine Web Studio communicates with when making requests to EAC. See the Endeca Web Studio help for more information.

Starting and stopping the Application Controller directly


Although you typically control the Application Controller through Web Studio, you can also start and stop it independently.

Starting and stopping the Application Controller on UNIX


In a UNIX shell, you start the Application Controller (along with any other components using the same port) with the following command:
$ENDECA_ROOT/tools/server/bin/startup.sh

Note: If you followed the instructions to set the environment variables in the Endeca Installation Guide, you can use the shortened version, startup.sh, instead.

You stop the Application Controller (along with any other components using the same port) with the following command:
$ENDECA_ROOT/tools/server/bin/shutdown.sh

Starting the Application Controller from inittab


In a UNIX production environment, the Endeca Application Controller can be started by init from inittab. If the EAC crashes or is terminated, init automatically restarts it. For details on creating a startup script and adding it to the inittab file, see the chapter titled Installing the Endeca Information Access Platform on UNIX in the Endeca Installation Guide.

Starting and stopping the Application Controller on Windows


The Endeca HTTP service, which controls the Endeca Application Controller, is created, registered, and configured by the installation, and started when you reboot your computer after installation. To stop and restart the Application Controller after installation, do the following:

1. Go to Start > Control Panel > Administrative Tools > Services.
2. In the Windows Services editor, select the Endeca HTTP service.
3. Click Stop or Restart.

Using the eac.properties file


The eac.properties file, which is located in the $ENDECA_CONF/conf directory on UNIX, or %ENDECA_CONF%\conf on Windows, is the general configuration file for the Endeca Application Controller. The following section describes the process control-related settings you can specify in eac.properties.

Note: SSL-related properties in this file are discussed in the Endeca Security Guide for Java.

Setting the Copy utility's temporary directory


Directories are copied first to a specified temporary directory on the destination machine before being copied one file at a time to the target location. You can configure the location of this temporary directory in the eac.properties file, using the optional setting com.endeca.eac.filetransfer.fileTransferTempDir as follows:

• If this setting is defined as an absolute path, the Copy utility uses it.
• If it is defined as a relative path, the Copy utility considers it to be relative to %ENDECA_CONF%/state/.
• If it is not defined, the Copy utility uses the directory %ENDECA_CONF%/state/file_transfer/.
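The three cases for the temporary directory can be summarized in a short sketch. This is not the Copy utility's actual code; it is a minimal Python illustration of the lookup rules stated above, and the function name resolve_transfer_temp_dir and its parameters are hypothetical. UNIX-style paths are assumed.

```python
import posixpath


def resolve_transfer_temp_dir(endeca_conf, setting=None):
    """Return the temporary directory implied by the documented rules (sketch).

    endeca_conf -- value of ENDECA_CONF (assumed to be an absolute path)
    setting     -- value of com.endeca.eac.filetransfer.fileTransferTempDir,
                   or None if the setting is not defined
    """
    state_dir = posixpath.join(endeca_conf, "state")
    if not setting:
        # Setting absent: fall back to the documented default directory.
        return posixpath.join(state_dir, "file_transfer")
    if posixpath.isabs(setting):
        # Absolute paths are used exactly as given.
        return setting
    # Relative paths are interpreted relative to <ENDECA_CONF>/state/.
    return posixpath.join(state_dir, setting)
```

For example, with ENDECA_CONF set to /opt/endeca/conf and the setting defined as tmp/copy, the sketch yields /opt/endeca/conf/state/tmp/copy.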

Ensuring clean component shutdown


Server components such as the Dgraph can be cleanly shut down via their HTTP interface. When stopping a server, the Application Controller first attempts to shut down the server through its HTTP interface. If this does not complete within 30 seconds, it kills the server process. You can modify this default with the com.endeca.eac.process.shutdownTimeOutSecs setting in eac.properties.
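The shutdown behavior described above follows a common pattern: request a clean stop, wait up to a timeout, then kill the process. The following Python sketch illustrates that pattern only; it is not the Application Controller's implementation. The callables request_shutdown, is_running, and kill are hypothetical stand-ins for the HTTP shutdown request, a process check, and a forced kill.

```python
import time


def stop_server(request_shutdown, is_running, kill, timeout_secs=30,
                poll_secs=1.0, clock=time.monotonic, sleep=time.sleep):
    """Ask a server to stop cleanly; kill it if it outlives the timeout.

    timeout_secs mirrors the documented 30-second default of
    com.endeca.eac.process.shutdownTimeOutSecs. Returns "clean" or "killed".
    """
    request_shutdown()                     # clean shutdown attempt
    deadline = clock() + timeout_secs
    while clock() < deadline:
        if not is_running():
            return "clean"                 # server exited on its own
        sleep(poll_secs)
    if not is_running():
        return "clean"
    kill()                                 # timeout expired: force the stop
    return "killed"
```

The clock and sleep parameters exist only so the sketch can be exercised without real waiting; the polling interval is an assumption and is not documented.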

Managing server restarts


In an effort to make Endeca deployments more fault tolerant, the Application Controller automatically restarts servers that crash. You can configure the number of times the Application Controller attempts to restart a server within a specified time window. If the server crashes more than the specified number of times in the specified time window, then it is marked as failed.

Both of these variables are set in eac.properties. The com.endeca.eac.process.maxServerRestartsPerWindow setting defaults to five, while com.endeca.eac.process.serverRestartTimeWindowMins defaults to one.
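The restart rule can be pictured as a sliding time window over recent crashes. The Python sketch below is an illustration under stated assumptions, not the Application Controller's code: the class name RestartPolicy is hypothetical, and the exact handling of a crash that falls precisely on the window boundary is assumed, since the guide does not specify it.

```python
from collections import deque


class RestartPolicy:
    """Sliding-window restart limit (illustrative sketch)."""

    def __init__(self, max_restarts_per_window=5, window_mins=1):
        # Defaults mirror maxServerRestartsPerWindow and
        # serverRestartTimeWindowMins as documented.
        self.max_restarts = max_restarts_per_window
        self.window_secs = window_mins * 60
        self._crashes = deque()

    def on_crash(self, now_secs):
        """Record a crash at now_secs; return "restart" or "failed"."""
        self._crashes.append(now_secs)
        # Forget crashes that have fallen out of the time window.
        while self._crashes and now_secs - self._crashes[0] >= self.window_secs:
            self._crashes.popleft()
        if len(self._crashes) > self.max_restarts:
            return "failed"
        return "restart"
```

With the defaults, five crashes inside one minute each lead to a restart, and a sixth crash inside the same minute marks the server as failed. Crashes spaced 30 seconds apart never exceed the limit.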

Modifying Application Controller logging levels


By default, Application Controller log files log WARNING and SEVERE messages. If you want to capture INFO level messages as well, you need to modify the logging.properties file, which is located in the $ENDECA_CONF/conf directory on UNIX, or %ENDECA_CONF%\conf on Windows.

1. Open logging.properties.
2. Find the section EAC Log Level.
3. In the line com.endeca.eac.level, change WARNING to INFO.

Chapter 11

Provisioning an Implementation with the Endeca Application Controller


You specify Application Controller hosts, components, and scripts, and later reference them in Web Studio, eaccmd, or your custom Web services interface. This process is known as provisioning. This chapter describes how you write the provisioning file and use it to create an implementation. This chapter contains the following sections:

• Provisioning overview
• About the provisioning file and schema
• Provisioning your implementation with eaccmd
• Forcing the removal of an application
• Incremental provisioning
• Provisioning your deployment with Endeca Deployment Template

Provisioning overview
Provisioning an Endeca implementation with the Application Controller consists of the following steps:

• Creating a provisioning file, in which you define the hosts and components that comprise your implementation, as well as the scripts that it uses.
• Referencing that file when creating an implementation with the eaccmd tool or your custom Web service interface.

Note: This chapter provides examples using the sample wine reference implementation and the eaccmd tool. For information about provisioning programmatically using the WSDL, see Endeca Application Controller API Interface Reference on page 217. For information about provisioning EAC within Web Studio, see the Web Studio online help.

About the provisioning file and schema


The provisioning file is a file in XML format in which you define the following aspects of your implementation:

• Application (the root element)
• Hosts (and, optionally, directories on hosts)
• Components
• Scripts

Note: You can name this file anything you like. In the remainder of this chapter, we frequently refer to the provisioning file as app.xml.

The provisioning schema, eaccmdProvisioning.xsd, is located in the MDEXEngine\<version>\conf\schema directory.

Invalid characters in provisioning


The following characters cannot be used when provisioning applications, components, hosts, scripts, or utility tokens:

• Invalid Windows file name characters, such as:
  - Forward slash (/)
  - Backslash (\)
  - Colon (:)
  - Asterisk (*)
  - Question mark (?)
  - Right and left angle brackets (< >)
  - Double quotation mark (")
  - Vertical pipe (|)

• These additional characters:
  - Single quotation mark (')
  - Space
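If you generate provisioning files from a script, it can be convenient to check identifiers before submitting them. The following Python sketch checks a name against the characters listed above. The helper name invalid_chars is hypothetical, and the check covers only the characters this section names; the Application Controller may reject others.

```python
# Characters listed in this section as invalid in provisioning names.
INVALID_PROVISIONING_CHARS = set('/\\:*?<>"|') | {"'", " "}


def invalid_chars(name):
    """Return a sorted list of the disallowed characters found in name."""
    return sorted(set(name) & INVALID_PROVISIONING_CHARS)
```

An empty list means the name passes this check. For example, invalid_chars("agraph-wine") returns an empty list, while a name containing a space or a colon returns those characters.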

Defining the root Application element


The root element in a provisioning file is the <application> element. As you can see in the example below, the application identifier is an attribute to <application>:
<application application-id="agraph-wine">

You can also specify an applicationID in the eaccmd tool, which is described in Chapter 12. If eaccmd specifies a different applicationID for the same application, it overrides the one provided in the provisioning file.

Defining hosts
In the <hosts> element you list each <host> by a host ID, a host name, a port number, and (optionally) properties and directories. The <host> syntax is as follows:
<host host-id="host1" host-name="localhost" port="8888">
  <properties>
    <property name="department" value="engineering" />
    <property name="department" value="prof services" />
    <property name="enforceDiskQuota" />
  </properties>
</host>

In this example the port is the HTTP port through which the EAC Central Server communicates with its Agents. The optional use of host-id to alias host definitions is explained in the following section. The optional addition of properties is described on page 154. The optional addition of directories is described on page 153.

Aliasing hosts with host-id


In each <host> definition, you can create a unique alias called host-id that may be used to refer to the specified host and port. (The host-name and port do not need to be unique.) For example, say you defined host1 as follows:
<host host-id="host1" host-name="localhost" port="8888" />

Later, when defining components, you could simply refer to that host-id when specifying the host for a given component.
<dgidx name="dgidx-0" host-id="host1">

Aliasing hosts in this way has two benefits:

• It allows you to switch staging and production machines easily, by changing the name and port associated with a host-id alias.
• It makes it possible to reference a single physical host through different host-id aliases.
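To see how aliasing works in practice, the following Python sketch reads a small provisioning document and resolves each component's host-id to a host name and port. It is an illustration only: the sample document is hypothetical, it assumes the elements carry no XML namespace, and it identifies components by the component-id attribute described in the component reference.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal provisioning document for illustration.
PROVISIONING = """
<application application-id="agraph-wine">
  <hosts>
    <host host-id="host1" host-name="localhost" port="8888" />
  </hosts>
  <components>
    <dgidx component-id="dgidx-0" host-id="host1" />
  </components>
</application>
"""


def resolve_hosts(xml_text):
    """Map each component-id to the (host-name, port) behind its host-id."""
    root = ET.fromstring(xml_text)
    hosts = {h.get("host-id"): (h.get("host-name"), int(h.get("port")))
             for h in root.iter("host")}
    resolved = {}
    for component in root.find("components"):
        resolved[component.get("component-id")] = hosts[component.get("host-id")]
    return resolved
```

Moving dgidx-0 from a staging machine to a production machine then requires changing only the host-name and port on the host1 definition; the component definition is untouched.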

Provisioning directories on a host


As part of host provisioning, you can also provision directories using a full path and a name. For example, assuming a host has already been provisioned as defined above, you could add the following element:
<host >
  ...
  <directories>
    <directory dir-id="input">
      <path>C:\staging_app\working\input</path>
    </directory>
  </directories>
</host>

Defining components in your provisioning file


The <components> element contains all of the components in your implementation. Depending on the component type, the settings vary. The following section provides details about all supported component types.

Notes:
• The order of elements in a component does not matter.
• Unless otherwise noted, relative paths are supported.
• Required elements are labelled as such. If you attempt to provision a component without a required element, you will receive an error.

Using XML entities in your provisioning file


The Application Controller supports the use of XML entities in provisioning files. For example, assume you established the following entities in your XML provisioning file:
<!DOCTYPE application [
  <!ENTITY W_base "C:\Endeca\MDEXEngine\reference\sample_wine_data\data">
  ...
  <!ENTITY H1 "host1">
  ...
]>

Subsequently, when defining a Forge component, rather than having to enter the host machine and working directory like this:
<forge component-id="forge1" host-id="host1">
  <working-dir>C:\Endeca\MDEXEngine\reference\sample_wine_data\data\</working-dir>
  ...
</forge>

you can instead refer to them by their entities, like this:


<forge component-id="forge1" host-id="&H1;">
  <working-dir>&W_base;\</working-dir>
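Entity expansion is performed by the XML parser, so any standard parser shows the effect. As a rough check, the Python sketch below parses a small document that declares the two entities and confirms that the Forge element sees the expanded values. The document is a hypothetical, cut-down provisioning file, and the standard-library parser is used purely for illustration; it is not the parser the Application Controller uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical provisioning fragment using the entities from this section.
DOC = r"""<!DOCTYPE application [
  <!ENTITY W_base "C:\Endeca\MDEXEngine\reference\sample_wine_data\data">
  <!ENTITY H1 "host1">
]>
<application application-id="agraph-wine">
  <components>
    <forge component-id="forge1" host-id="&H1;">
      <working-dir>&W_base;\</working-dir>
    </forge>
  </components>
</application>
"""

# Internal entities are expanded while the document is parsed.
forge = ET.fromstring(DOC).find("components/forge")
host_id = forge.get("host-id")              # expanded from &H1;
working_dir = forge.findtext("working-dir")  # expanded from &W_base;
```

After parsing, host_id holds the host alias and working_dir holds the full sample wine data path followed by a trailing backslash, exactly as if they had been typed in place.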

Adding properties to hosts and components


You can add properties, consisting of a required name and an optional value, to any host or component element. Such properties can be used for value mapping as well as for flagging the element in question.

You add properties as part of provisioning your application. After your application is provisioned, any properties that you defined are included in the application definition, which you can retrieve using eaccmd's describe-app command. This feature is only useful in user-provided scripts; it is not an additional place to pass arguments or options to Endeca components.

Defining scripts in your provisioning file


IMPORTANT: EAC scripts are not the same as control scripts, which were deprecated in version 5.0 of the Endeca IAP.

A script is a named command that you provision and run within the Application Controller. In most cases, a script invokes a batch file that runs a process, such as a baseline update or report generation, or otherwise exercises component control.

Scripts provide the automation that makes it possible for you to wrap and reuse a sequence of commands, without removing your ability to configure your application. Although only one instance of each script can run at a time, most scripts are designed to be run repeatedly. For example, rather than start each component separately using Web Studio or eaccmd, you can launch a baseline update script that will execute the start component commands in the proper sequence. You can reuse this script as often as you like.

Scripts live on the EAC Central Server; the EAC runs them from there. You can use scripts with the eaccmd tool, when accessing the Endeca WSDL programmatically, or within Web Studio. Details on starting, stopping, and obtaining status for scripts for each of these environments can be found in the following places:

• Component and script control commands, located on page 201 of Using the Eaccmd Tool
• The ScriptControl interface, located on page 238 in the Endeca Application Controller API Interface Reference
• The Web Studio Help

Note: Scripts are not supported on clusters that are not uniformly one platform.

Developing and maintaining scripts


You can write your own script in Java or .NET to contact the Central Server directly. Because the EAC does not offer any mechanism for passing arguments to scripts at runtime, you need to provision a separate EAC script for every combination of arguments you plan to use. For example, if you want the Report Generator to generate daily and weekly reports, you must provision the associated script twice, once for each time period argument.

Script reference implementations


The Endeca IAP ships with a number of simple scripts that can be run out of the box. In addition, the source code for these scripts can be used as an example for further configuration and script development. Scripts include the following:

• A baseline update script that runs a very simple (Forge/Index/Dgraph) baseline update.
• An MDEX Engine update script that pushes configuration changes to the MDEX Engine.
• A report generation script that can run daily or weekly reports. This script is discussed in detail on page 109.

For reference, the script source tree will be installed as part of the Endeca reference implementation. Compiled scripts reside in $ENDECA_ROOT/bin, with any dependent jar files in $ENDECA_ROOT/lib/java.

Script environment variables


You can write your own script in Java or .NET to contact the EAC Central Server directly. Script environment variables allow you to look up the host, port, and application name if you want to use them in your script. These environment variables are set in the script's runtime environment. The EAC Central Server provides values for the following three variables:

• EAC_HOST is the hostname for the EAC Central Server host.
• EAC_PORT is the port number for the EAC Central Server host.
• EAC_APP is the application in which this script is provisioned.
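Because these values arrive as ordinary environment variables, any process launched by the script command can read them. The following sketch uses Python purely for illustration (the guide itself discusses Java and .NET scripts); the function name eac_connection_info is hypothetical.

```python
import os


def eac_connection_info(environ=os.environ):
    """Read the three variables that the EAC Central Server sets for a script."""
    return {
        "host": environ["EAC_HOST"],       # EAC Central Server hostname
        "port": int(environ["EAC_PORT"]),  # EAC Central Server port
        "app": environ["EAC_APP"],         # application the script belongs to
    }
```

A script would typically call this once at startup and use the result to address the Central Server. A KeyError here indicates that the script was started outside the Application Controller, where the variables are not set.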

Provisioning scripts
Scripts, like hosts and components, need to be provisioned before they can be used in the Application Controller. Scripts can be provisioned with the following elements:

Sub-element    Description
script-id      Required. The name of this script.
cmd            Required. The command to launch the script.
log-file       Name of the script log file. If log-file is not specified, the default value is used.
working-dir    Working directory for the process that is launched. If it is specified, it must be an absolute path. If working-dir is not specified, the default value of $ENDECA_CONF/working/(app_id)/ is used.

Example
This example provisions two scripts.
<scripts>
  <script script-id="script1">
    <cmd>runthis.sh</cmd>
  </script>
  <script script-id="script2">
    <cmd>run.sh --this</cmd>
  </script>
</scripts>
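Because arguments cannot be passed to a script at runtime, each argument combination needs its own script entry. If you maintain many variants, you might generate the scripts element rather than write it by hand. The Python sketch below shows one way to do so; the helper build_scripts, the command run_reports.sh, and the flags --daily and --weekly are all hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET


def build_scripts(base_cmd, variants):
    """Build a <scripts> element with one <script> per argument combination.

    variants maps a script-id to the list of arguments for that variant.
    """
    scripts = ET.Element("scripts")
    for script_id, extra_args in variants.items():
        script = ET.SubElement(scripts, "script", {"script-id": script_id})
        # The whole command line, arguments included, goes into <cmd>.
        ET.SubElement(script, "cmd").text = " ".join([base_cmd] + list(extra_args))
    return ET.tostring(scripts, encoding="unicode")


xml_text = build_scripts("run_reports.sh", {
    "daily_report": ["--daily"],    # hypothetical flag
    "weekly_report": ["--weekly"],  # hypothetical flag
})
```

The resulting text contains two script entries that share a command but differ in their arguments, which matches the daily and weekly report case described earlier in this chapter.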

Using canonical paths in an application


The Application Controller provides a great deal of flexibility in computing directories. However, if you want to write a generic script that can work with any kind of provisioning, the getApplication() method can make it difficult to predict unspecified directory destinations. In such cases, the getCanonicalApplication() method returns the provisioning just as getApplication() does, but with all paths canonicalized. This process ensures that all paths are absolute, and that the working directory and log path settings are provided. It also prevents .. from being used in a path name. In eaccmd, you use the optional --canonical flag to the describe-app command to enable canonicalization. Because it has to resolve paths on each Agent, getCanonicalApplication() can be slightly slower than getApplication(). Therefore, if you know that your script uses full paths, you may prefer to use getApplication().

Component reference
This section provides details and examples for the following components:

• Forge on page 158
• Dgidx on page 161
• Dgraph on page 165
• Agidx on page 168
• Agraph on page 170
• Crawler on page 174
• LogServer on page 178
• ReportGenerator on page 179

Note: In the components that follow, if input-dir, output-dir, or state-dir are not specified, they default to directories named input, output, and state respectively, underneath the component's working-dir.
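The defaulting rule in the note above can be expressed in a few lines. This Python sketch is an illustration of the rule only; the function name component_dirs is hypothetical, and UNIX-style paths are assumed.

```python
import posixpath


def component_dirs(working_dir, input_dir=None, output_dir=None, state_dir=None):
    """Apply the documented defaults for input-dir, output-dir, and state-dir."""
    return {
        # Each unspecified directory defaults to a fixed name under working-dir.
        "input-dir": input_dir or posixpath.join(working_dir, "input"),
        "output-dir": output_dir or posixpath.join(working_dir, "output"),
        "state-dir": state_dir or posixpath.join(working_dir, "state"),
    }
```

For a component whose working-dir is /apps/wine/forge and that specifies none of the three directories, the sketch yields /apps/wine/forge/input, /apps/wine/forge/output, and /apps/wine/forge/state.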

Forge
A Forge element launches the Forge (Data Foundry) software, which transforms source data into tagged Endeca records.

Attributes
Every Application Controller component contains the following attributes:

Attribute       Description
component-id    Required. The name of this instance of the component.
host-id         Required. The alias of the host upon which the component is running.
properties      An optional list of properties, consisting of a required name and an optional value. For more information, see Adding properties to hosts and components on page 154.

Sub-elements
The Forge element contains the following sub-elements:

args
Command-line flags to pass to Forge, expressed as a set of arg sub-elements. If an argument takes a value, the argument and value must be on separate lines in the provisioning file. For example:
<args>
  <arg>--threads</arg>
  <arg>3</arg>
</args>

input-dir
The path to the Forge input.

log-file
Name of the Forge log file. If log-file is not specified, the default is the component working directory plus the component name plus .log.

output-prefix-name
The implementation-specific prefix name, without any associated path information.

output-dir
Directory where the output from the Forge process will be stored.

pipeline-file
Required. Name of the Pipeline.epx file to pass to Forge.

num-partitions
The number of partitions.

working-dir
Working directory for the process that is launched. If it is specified, it must be an absolute path. If any of the other properties of this component contain relative paths, they are interpreted as relative to the working directory. If working-dir is not specified, it defaults to $ENDECA_CONF/work/<appName>/<componentName> on UNIX, or %ENDECA_CONF%\work\<appName>/<componentName> on Windows.

state-dir
The directory where the state file is located.

temp-dir
The temporary directory that Forge uses.

web-service-port
The port on which the Forge metrics Web service listens.

ssl-configuration
Both parallel Forge and the Forge metrics Web service can secure their communications with SSL. The ssl-configuration element contains three sub-elements of its own:

cert-file
The cert-file specifies the path of the eneCert.pem certificate file that Forge processes present to any client. This is also the certificate that the Application Controller Agent should present to Forge when trying to talk to it. The file name can be a path relative to the component's working directory.

ca-file
The ca-file specifies the path of the eneCA.pem Certificate Authority file that Forge processes use to authenticate communications with other Endeca components. The file name can be a path relative to the component's working directory.

cipher
The cipher is an optional cipher string (such as RC4-SHA) that specifies the minimum cryptographic algorithm that parallel Forge processes use during the SSL negotiation. If you omit this setting, the SSL software tries an internal list of ciphers, beginning with AES256-SHA.
Note: The Forge metrics Web service does not use the cipher sub-element.
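The args rule described above (each flag and each value in its own arg sub-element) can be sketched in Python. This is a minimal illustration and not part of the Endeca toolset: the helper name build_args is hypothetical, and the sketch relies only on the standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET


def build_args(tokens):
    """Wrap each command-line token in its own <arg> element.

    build_args is a hypothetical helper. A flag and its value are passed
    as two separate tokens, so they end up in two separate <arg> elements.
    """
    args = ET.Element("args")
    for token in tokens:
        ET.SubElement(args, "arg").text = str(token)
    return args


if __name__ == "__main__":
    # "--threads" and "3" are separate tokens, matching the example in the text.
    element = build_args(["--threads", "3"])
    print(ET.tostring(element, encoding="unicode"))
```

The resulting fragment can be pasted into any component element that accepts args.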

Administrators Guide Chapter 11: Provisioning an Implementation with the Endeca Application Controller


Example
The following example provisions a Forge component for use with the sample wine data:
<forge component-id="wine_forge" host-id="wine_indexer">
  <args>
    <arg>-vw</arg>
  </args>
  <num-partitions>1</num-partitions>
  <working-dir>C:\Endeca\MDEXEngine\reference\sample_wine_data</working-dir>
  <pipeline-file>.\data\forge_input\pipeline.epx</pipeline-file>
  <input-dir>.\data\forge_input</input-dir>
  <output-dir>.\data\partition0\forge_output</output-dir>
  <state-dir>.\data\partition0\state</state-dir>
  <log-file>.\logs\wine_forge.log</log-file>
  <output-prefix-name>wine</output-prefix-name>
</forge>

Dgidx
A Dgidx component sends the finished data prepared by Forge to the Dgidx program, which generates the proprietary indices for each Dgraph.

Attributes
Every Application Controller component contains the following attributes:

component-id
Required. The name of this instance of the component.

host-id
Required. The alias of the host upon which the component is running.

properties
An optional list of properties, consisting of a required name and an optional value. For more information, see Adding properties to hosts and components on page 154.


Sub-elements
The Dgidx element contains the following sub-elements:

args
Command-line flags to pass to Dgidx, expressed as a set of arg sub-elements. If an argument takes a value, the argument and value must be on separate lines in the provisioning file. For example:
<args>
  <arg>--threads</arg>
  <arg>3</arg>
</args>

app-config-prefix
Path and file prefix that define the input for Dgidx. For example, in /endeca/project/files/myProject, files beginning with myProject in the directory /endeca/project/files are the ones to be considered.

output-prefix
Required. Path and prefix name for the Dgidx output. For example, output_prefix = c:\temp\wine generates files that start with wine in c:\temp.

log-file
The path to and name of the Dgidx log files. If log-file is not specified, the default is the component working directory plus the component name plus .log. Dgidx can generate three distinct log files: the basic component log file, and two files that log the subtasks described in run-aspell, below.

The file dgwordlist logs stdout/stderr for the dgwordlist subtask described below. The name of this file is derived from the Dgidx component's log-file location, plus the term dgwordlist. If an extension exists, dgwordlist is added before the extension. For example, if the original log-file is C:\dir\dgidx-1.log, then the dgwordlist log would be C:\dir\dgidx-1.dgwordlist.log.

The file aspellcopy logs the stdout/stderr for the subtask of uploading the Aspell files to Dgidx's output directory, where the Dgraph can access them. The name of this file is derived from the Dgidx component's log-file location, plus the term aspellcopy. If an extension exists, aspellcopy is added before the extension. For example, if the original log-file is C:\dir\dgidx-1.txt, then the aspellcopy log would be C:\dir\dgidx-1.aspellcopy.txt.

input-prefix
Required. Path and prefix name for the Forge output that Dgidx indexes.

working-dir
Working directory for the process that is launched. If it is specified, it must be an absolute path. If any of the other properties of this component contain relative paths, they are interpreted as relative to the working directory. If working-dir is not specified, it defaults to $ENDECA_CONF/work/<appName>/<componentName> on UNIX, or %ENDECA_CONF%\work\<appName>/<componentName> on Windows.

run-aspell
Specifies Aspell as the spelling correction mode for the implementation. This causes the Dgidx component to run dgwordlist and to copy the Aspell files to its output directory, where the Dgraph component can access them. The default is true. See log-file above for details on the logging of these subtasks. For Aspell details, see the Using Spelling Correction and Did You Mean section in the Endeca Developer's Guide.

temp-dir
A temporary directory used by this component.
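The naming rule for the dgwordlist and aspellcopy logs described under log-file can be sketched in Python. This is an illustration only, not Endeca code: the function name derived_log_name is hypothetical, and the behavior when the log-file has no extension (the term is appended after a dot) is an assumption, since the text covers only the case where an extension exists.

```python
import ntpath


def derived_log_name(log_file, subtask):
    """Insert the subtask term before the extension of the component log-file.

    derived_log_name is a hypothetical helper. ntpath is used so that
    Windows-style paths split correctly on any platform.
    """
    root, ext = ntpath.splitext(log_file)
    # Assumption: with no extension, ext is "" and the term is simply appended.
    return root + "." + subtask + ext


if __name__ == "__main__":
    print(derived_log_name(r"C:\dir\dgidx-1.log", "dgwordlist"))
    print(derived_log_name(r"C:\dir\dgidx-1.txt", "aspellcopy"))
```

Both calls reproduce the file names given in the log-file description.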

Example
The following example provisions a Dgidx component to work with the sample wine data:
<dgidx component-id="wine_dgidx" host-id="wine_indexer">
  <args>
    <arg>-v</arg>
  </args>
  <working-dir>C:\Endeca\MDEXEngine\reference\sample_wine_data</working-dir>
  <input-prefix>.\data\partition0\forge_output\wine</input-prefix>
  <app-config-prefix>.\data\partition0\forge_output\wine</app-config-prefix>
  <output-prefix>.\data\partition0\dgidx_output\wine</output-prefix>
  <log-file>.\logs\wine_dgidx.log</log-file>
  <run-aspell>true</run-aspell>
</dgidx>


Dgraph
A Dgraph element launches the Dgraph (MDEX Engine) software, which processes queries against the indexed Endeca records.

Attributes
Every Application Controller component contains the following attributes:

component-id
Required. The name of this instance of the component.

host-id
Required. The alias of the host upon which the component is running.

properties
An optional list of properties, consisting of a required name and an optional value. For more information, see Adding properties to hosts and components on page 154.

Sub-elements
The Dgraph element contains the following sub-elements:

args
Command-line flags to pass to Dgraph, expressed as a set of arg sub-elements. If an argument takes a value, the argument and value must be on separate lines in the provisioning file. For example:
<args>
  <arg>--threads</arg>
  <arg>3</arg>
</args>

port
Required. The port at which the Dgraph should listen. The default is 8000.

log-file
The path to and name of the Dgraph log file. If log-file is not specified, the default is the component working directory plus the component name plus .log.

input-prefix
Required. Path and prefix name for the Dgidx output that the Dgraph uses as an input.

app-config-prefix
Path and file prefix that define the input for the Dgraph. For example, in /endeca/project/files/myProject, files beginning with myProject in the directory /endeca/project/files are the ones to be considered.

working-dir
Working directory for the process that is launched. If it is specified, it must be an absolute path. If any of the other properties of this component contain relative paths, they are interpreted as relative to the working directory. If working-dir is not specified, it defaults to $ENDECA_CONF/work/<appName>/<componentName> on UNIX, or %ENDECA_CONF%\work\<appName>/<componentName> on Windows.

startup-timeout
Specifies the amount of time in seconds that the Application Controller waits while starting the Dgraph. If it cannot determine that the Dgraph is running in this timeframe, it times out. The default is 60.

req-log-file
Path to and name of the request log.

spell-dir
If specified, the directory in which the Dgraph looks for Aspell files. If it is not specified, the Dgraph looks for Aspell files in the Dgraph's input directory (that is, input-prefix without the prefix). For example, if input-prefix is /dir/prefix and all the Dgraph input files are /dir/prefix.*, the Dgraph looks for the Aspell files in /dir/.

update-dir
Specifies the directory from which the Dgraph reads partial update files. For more information, see the "Implementing Partial Updates" section in the Endeca Information Transformation Layer Guide.

update-log-file
Specifies the file for update-related log messages.

temp-dir
A temporary directory used by this component.

ssl-configuration
Contains three sub-elements of its own:

cert-file
The cert-file specifies the path of the eneCert.pem certificate file that the Dgraph processes present to any client. This is also the certificate that the Application Controller Agent should present to the Dgraph when trying to talk to the Dgraph. The file name can be a path relative to the component's working directory.

ca-file
The ca-file specifies the path of the eneCA.pem Certificate Authority file that the Dgraph processes use to authenticate communications with other Endeca components. The file name can be a path relative to the component's working directory.

cipher
The cipher is an optional cipher string (such as RC4-SHA) that specifies the minimum cryptographic algorithm that the Dgraph processes use during the SSL negotiation. If you omit this setting, the SSL software tries an internal list of ciphers, beginning with AES256-SHA. See the Endeca Security Guide for more information.
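The spell-dir default described above (input-prefix without the prefix) can be sketched in Python. This is a hedged illustration rather than the Dgraph's actual logic: the function name default_spell_dir is hypothetical, only UNIX-style paths are handled, and treating a bare prefix as the current directory is an assumption not stated in the text.

```python
import posixpath


def default_spell_dir(input_prefix):
    """Return the directory part of input-prefix, with a trailing slash.

    default_spell_dir is a hypothetical helper for UNIX-style paths.
    """
    directory = posixpath.dirname(input_prefix)
    if not directory:
        # Assumption: a prefix with no directory part refers to the current directory.
        return "./"
    return directory if directory.endswith("/") else directory + "/"


if __name__ == "__main__":
    # Matches the example in the text: /dir/prefix resolves to /dir/
    print(default_spell_dir("/dir/prefix"))
```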


Example
The following example provisions an SSL-enabled Dgraph component for use with the sample wine data:
<dgraph component-id="wine_dgraph" host-id="wine_indexer">
  <args>
    <arg>--spl</arg>
    <arg>--dym</arg>
  </args>
  <port>8000</port>
  <working-dir>C:\Endeca\MDEXEngine\reference\sample_wine_data</working-dir>
  <input-prefix>.\data\partition0\dgraph_input\wine</input-prefix>
  <app-config-prefix>.\data\partition0\dgraph_input\wine</app-config-prefix>
  <log-file>.\logs\wine_dgraph.log</log-file>
  <req-log-file>.\logs\wine_dgraph_req_log.out</req-log-file>
  <startup-timeout>120</startup-timeout>
  <ssl-configuration>
    <cert-file>C:\Endeca\MDEXEngine\workspace\etc\eneCert.pem</cert-file>
    <ca-file>C:\Endeca\MDEXEngine\workspace\etc\eneCA.pem</ca-file>
    <cipher>AES128-SHA</cipher>
  </ssl-configuration>
</dgraph>

Agidx
An Agidx component runs Agidx on a machine, creating a set of Agidx indices that support the Agraph program in a distributed environment. The Agidx component is used only in distributed environments and is run sequentially on multiple machines. On the first machine, the Agidx component takes the Dgidx output from that machine as its input. On the next machine, the output from the first Agidx run is copied over, using the Copy service. It, along with the Dgidx output from that machine, is used as Agidx input.

Attributes
Every Application Controller component contains the following attributes:

component-id
Required. The name of this instance of the component.

host-id
Required. The alias of the host upon which the component is running.

properties
An optional list of properties, consisting of a required name and an optional value. For more information, see Adding properties to hosts and components on page 154.

Sub-elements
The Agidx element contains the following sub-elements:

args
Command-line flags to pass to Agidx, expressed as a set of arg sub-elements. If an argument takes a value, the argument and value must be on separate lines in the provisioning file. For example:
<args>
  <arg>--threads</arg>
  <arg>3</arg>
</args>

output-prefix
Required. Path and prefix name for the Agidx output. For example, output_prefix = c:\temp\wine generates files that start with wine in c:\temp.

log-file
The path to and name of the Agidx log file. If log-file is not specified, the default is the component working directory plus the component name plus .log.

input-prefixes
Required. The path to the output of various Dgidxes, which Agidx uses as input. These are listed as a set of input-prefix sub-elements.

working-dir
Working directory for the process that is launched. If it is specified, it must be an absolute path. If any of the other properties of this component contain relative paths, they are interpreted as relative to the working directory. If working-dir is not specified, it defaults to $ENDECA_CONF/work/<appName>/<componentName> on UNIX, or %ENDECA_CONF%\work\<appName>/<componentName> on Windows.

previous-agidx-outputprefix
The file prefix of the Agidx data from the previous run, which has been copied to this machine by a Copy operation. This parameter should not be used when running the Agidx component on the first data subset.
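The working-dir and log-file defaults described above apply to every component type, and can be sketched in Python. This is a minimal illustration under stated assumptions: both function names are hypothetical, the value of ENDECA_CONF is passed in rather than read from the environment, the Windows form is built with backslashes throughout, and "working directory plus component name plus .log" is assumed to mean a file named <componentName>.log inside the working directory.

```python
import ntpath
import posixpath


def default_working_dir(endeca_conf, app_name, component_name, windows=False):
    """Build <ENDECA_CONF>/work/<appName>/<componentName> (hypothetical helper)."""
    pathmod = ntpath if windows else posixpath
    return pathmod.join(endeca_conf, "work", app_name, component_name)


def default_log_file(working_dir, component_name, windows=False):
    """Build the default log-file path (hypothetical helper).

    Assumption: the log is <componentName>.log inside the working directory.
    """
    pathmod = ntpath if windows else posixpath
    return pathmod.join(working_dir, component_name + ".log")


if __name__ == "__main__":
    # "/opt/endeca/conf" is an illustrative value for ENDECA_CONF.
    work = default_working_dir("/opt/endeca/conf", "myApp", "mkt_agidx")
    print(work)
    print(default_log_file(work, "mkt_agidx"))
```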

Example
The following example provisions an Agidx component to work with the sample wine data:
<agidx component-id="mkt_agidx" host-id="host2">
  <working-dir>C:\Endeca\MDEXEngine\reference\sample_wine_data</working-dir>
  <args>
    <arg>-v</arg>
  </args>
  <input-prefixes>
    <input-prefix>C:\Endeca\MDEXEngine\reference\sample_wine_data\data\partition0\dgidx_output1\wine</input-prefix>
    <input-prefix>C:\Endeca\MDEXEngine\reference\sample_wine_data\data\partition0\dgidx_output2\wine</input-prefix>
  </input-prefixes>
  <output-prefix>C:\Endeca\MDEXEngine\reference\sample_wine_data\data\partition0\agidx\wine</output-prefix>
  <log-path>C:\Endeca\MDEXEngine\workspace\logs\agidx.out</log-path>
</agidx>

Agraph
An Agraph component runs the Agraph program, which defines and coordinates the activities of multiple, distributed Dgraphs.


Attributes
Every Application Controller component contains the following attributes:

component-id
Required. The name of this instance of the component.

host-id
Required. The alias of the host upon which the component is running.

properties
An optional list of properties, consisting of a required name and an optional value. For more information, see Adding properties to hosts and components on page 154.

Sub-elements
The Agraph component contains the following sub-elements:

args
Command-line flags to pass to Agraph, expressed as a set of arg sub-elements. If an argument takes a value, the argument and value must be on separate lines in the provisioning file. For example:
<args>
  <arg>--threads</arg>
  <arg>3</arg>
</args>

port
Required. The port at which the Agraph should listen.

log-file
The path to and name of the Agraph log file. If log-file is not specified, the default is the component working directory plus the component name plus .log.

children
Required. A list of the child Dgraphs and related devices for this Agraph. children is a single element that can contain a mixture of dgraph-ref and host-port elements.

The dgraph-ref element is a simple string name of a Dgraph that exists within the same Application Controller implementation. For example:
<dgraph-ref name="dgraph-0"/>

The host-port element has host and port attributes and is typically used to refer to an unprovisioned device such as a load balancer. For example:
<host-port host-name="localhost" port="9900"/>

If you know you are referring only to actual Dgraphs, and not to load balancers or other unprovisioned devices, you may use dgraph-ref elements exclusively.

input-prefix
Required. Path and prefix name for the Agidx output that the Agraph uses as an input.

working-dir
Working directory for the process that is launched. If it is specified, it must be an absolute path. If any of the other properties of this component contain relative paths, they are interpreted as relative to the working directory. If working-dir is not specified, it defaults to $ENDECA_CONF/work/<appName>/<componentName> on UNIX, or %ENDECA_CONF%\work\<appName>/<componentName> on Windows.

app-config-prefix
Path and file prefix that define the input for the Agraph. For example, in /endeca/project/files/myProject, files beginning with myProject in the directory /endeca/project/files are the ones to be considered.

startup-timeout
Specifies the amount of time in seconds that the Application Controller waits while starting the Agraph. If it cannot determine that the Agraph is running in this timeframe, it times out. The default is 60.

req-log-file
Path to and name of the request log.

ssl-configuration
Contains three sub-elements of its own:

cert-file
The cert-file specifies the path of the eneCert.pem certificate file that the Agraph processes present to any client. This is also the certificate that the Application Controller Agent should present to the Agraph when trying to talk to the Agraph. The file name can be a path relative to the component's working directory.

ca-file
The ca-file specifies the path of the eneCA.pem Certificate Authority file that the Agraph processes use to authenticate communications with other Endeca components. The file name can be a path relative to the component's working directory.

cipher
The cipher is an optional cipher string (such as RC4-SHA) that specifies the minimum cryptographic algorithm that the Agraph processes use during the SSL negotiation. If you omit this setting, the SSL software tries an internal list of ciphers, beginning with AES256-SHA. See the Endeca Security Guide for more information.


Example
The following example provisions a non-SSL Agraph component to work with the sample wine data:
<agraph component-id="mkt_agraph-3" host-id="host2">
  <working-dir>C:\Endeca\MDEXEngine\reference\sample_wine_data</working-dir>
  <args/>
  <port>10020</port>
  <app-config-prefix>C:\Endeca\MDEXEngine\reference\sample_wine_data\data\forge_input\wine</app-config-prefix>
  <log-file>C:\Endeca\MDEXEngine\workspace\logs\agraph3.out</log-file>
  <req-log-file>C:\Endeca\MDEXEngine\workspace\logs\agraph_requests3.out</req-log-file>
  <children>
    <dgraph-ref component-id="dgraph-0"/>
    <!-- <dgraph-ref component-id="dgraph-1"/> -->
    <host-port host-name="localhost" port="9900"/>
    <!-- <host-port host-name="localhost" port="9901"/> -->
  </children>
  <input-prefix>C:\Endeca\MDEXEngine\reference\sample_wine_data\data\partition0\agraph-3\wine</input-prefix>
  <startup-timeout>120</startup-timeout>
</agraph>

Crawler
A Crawler component runs the Endeca Advanced Crawler, which creates Endeca records based on crawled source documents. For more information about the Advanced Crawler, see the Endeca Information Transformation Layer Guide.

Attributes
Every Application Controller component contains the following attributes:

component-id
Required. The name of this instance of the component.

host-id
Required. The alias of the host upon which the component is running.

properties
An optional list of properties, consisting of a required name and an optional value. For more information, see Adding properties to hosts and components on page 154.

Sub-elements
The Crawler component contains the following sub-elements:

working-dir
Working directory for the process that is launched. If it is specified, it must be an absolute path. If any of the other properties of this component contain relative paths, they are interpreted as relative to the working directory. If working-dir is not specified, it defaults to $ENDECA_CONF/work/<appName>/<componentName> on UNIX, or %ENDECA_CONF%\work\<appName>/<componentName> on Windows.

log-file
The path to and name of the Crawler log file. If log-file is not specified, the default is the component working directory plus the component name plus .log.

args
Command-line flags to pass to the Crawler, expressed as a set of arg sub-elements. If an argument takes a value, the argument and value must be on separate lines in the provisioning file. For example:
<args>
  <arg>--threads</arg>
  <arg>3</arg>
</args>

default-settings-file
Required. Path to the default settings file for this Crawler component. The file is typically named something like <prefix>.crawler_defaults.properties.

global-config-file
Required. Path to the global configuration file for this Crawler component. The file is typically named something like <prefix>.crawler_global_config.xml.

profile-config-file
Required. Path to the profile configuration file to use for this crawler run. The file is typically named something like crawler_profile_1_config.xml.

url-list-file
Required. Path to the file that contains the list of URLs to crawl. The file is typically named something like crawl_profile_1_url_list.xml.

output-prefix
Required. Path and prefix name for the data the Crawler component stores. For example, output_prefix = c:\temp\wine generates files that start with wine in c:\temp. Also, any downloaded files the crawler stores are in a subdirectory of output_prefix called \crawler_downloaded_files.

port
Port on which to run the Crawler component. The default is 8099.

java-options
Java Virtual Machine settings. If you are modifying Java source files, you may need to modify these settings, which are passed to the Java process.

classpath-elements
Class path add-ons. If you are modifying Java source files, the modifications may require additions to the class path.


Example
The following example provisions a Crawler component based on the sample wine data.
<crawler component-id="mkt_crawler" host-id="host2">
  <working-dir>C:\Endeca\MDEXEngine\reference\sample_wine_data</working-dir>
  <port>9099</port>
  <default-settings-file>C:\Endeca\MDEXEngine\reference\sample_wine_data\wine.crawler_defaults.properties</default-settings-file>
  <global-config-file>C:\Endeca\MDEXEngine\reference\sample_wine_data\wine.crawler_global_config.xml</global-config-file>
  <profile-config-file>C:\Endeca\MDEXEngine\reference\sample_wine_data\crawl_profile_1_config.xml</profile-config-file>
  <url-list-file>C:\Endeca\MDEXEngine\reference\sample_wine_data\crawl_profile_1_url_list.xml</url-list-file>
  <output-prefix>wine</output-prefix>
</crawler>


LogServer
The LogServer component controls the use of the Endeca Log Server.

Attributes
Every Application Controller component contains the following attributes:

component-id
Required. The name of this instance of the component.

host-id
Required. The alias of the host upon which the component is running.

properties
An optional list of properties, consisting of a required name and an optional value. For more information, see Adding properties to hosts and components on page 154.

Sub-elements
The LogServer component contains the following sub-elements:

port
Required. Port on which to run the LogServer.

output-prefix
Required. Path and prefix name for the LogServer output. For example, output_prefix = c:\temp\wine generates files that start with wine in c:\temp.

gzip
Required. Controls the archiving of log files. Possible values are true and false.

working-dir
Working directory for the process that is launched. If it is specified, it must be an absolute path. If any of the other properties of this component contain relative paths, they are interpreted as relative to the working directory. If working-dir is not specified, it defaults to $ENDECA_CONF/work/<appName>/<componentName> on UNIX, or %ENDECA_CONF%\work\<appName>/<componentName> on Windows.

startup-timeout
Specifies the amount of time in seconds that eaccmd waits while starting the LogServer. If it cannot determine that the LogServer is running in this timeframe, it times out. The default is 60.

log-file
The path to the LogServer log file. If log-file is not specified, the default is the component working directory plus the component name plus .log.

Example
The following example provisions a LogServer component based on the sample wine data.
<logserver component-id="wine_logserver" host-id="wine_indexer">
  <port>8002</port>
  <working-dir>C:\Endeca\MDEXEngine\reference\sample_wine_data</working-dir>
  <output-prefix>.\logs\logserver_output\wine</output-prefix>
  <gzip>false</gzip>
  <startup-timeout>120</startup-timeout>
  <log-file>.\logs\wine_logserver.log</log-file>
</logserver>

ReportGenerator
The ReportGenerator component runs the Report Generator, which processes Log Server files into HTML-based reports that you can view in your Web browser and XML reports that you can view in Web Studio.


Attributes
Every Application Controller component contains the following attributes:

component-id
Required. The name of this instance of the component.

host-id
Required. The alias of the host upon which the component is running.

properties
An optional list of properties, consisting of a required name and an optional value. For more information, see Adding properties to hosts and components on page 154.

Sub-elements
The ReportGenerator component contains the following sub-elements:

working-dir
Working directory for the process that is launched. If it is specified, it must be an absolute path. If any of the other properties of this component contain relative paths, they are interpreted as relative to the working directory. If working-dir is not specified, it defaults to $ENDECA_CONF/work/<appName>/<componentName> on UNIX, or %ENDECA_CONF%\work\<appName>/<componentName> on Windows.

input-dir-or-file
Required. Path to the file or directory containing the logs to report on. If it is a directory, then all log files in that directory are read. If it is a file, then just that file is read.

output-file
Required. Name of the generated report file and the path to where it is stored. For example:
C:\Endeca\reports\myreport.html on Windows
/endeca/reports/myreport.html on UNIX

stylesheet-file
Required. Filename and path of the XSL stylesheet used to format the generated report. For example:
%ENDECA_CONF%\etc\report_stylesheet.xsl on Windows
$ENDECA_CONF/etc/report_stylesheet.xsl on UNIX

settings-file
Path to the report_settings.xml file. For example:
%ENDECA_CONF%\etc\report_settings.xml on Windows
$ENDECA_CONF/etc/report_settings.xml on UNIX

timerange
Sets the time span of interest (or report window). Allowed keywords:
Yesterday
LastWeek
LastMonth
DaySoFar
WeekSoFar
MonthSoFar
These keywords assume that days end at midnight, and weeks end on the midnight between Saturday and Sunday.
start-date <date>
stop-date <date>
These set the report window to the given date and time. The date format should be either yyyy_mm_dd or yyyy_mm_dd.hh_mm_ss. For example, 2007_01_25.19_30_57 expresses Jan 25, 2007 at 7:30:57 in the evening.

time-series
Turns on the generation of time-series data and specifies the frequency, Hourly or Daily.

charts
Turns on the generation of report charts. Disabled by default.

log-file
The path to the ReportGenerator log file. If log-file is not specified, the default is the component working directory plus the component name plus .log.

java_binary
Should indicate a JDK 1.5.x or later. Defaults to the JDK that Endeca installs.

java_options
Command-line options for the java_binary setting. This command is primarily used to adjust the ReportGenerator memory, which defaults to 1GB. To set the memory, use the following:
java_options = -Xmx[MemoryInMb]m -Xms[MemoryInMb]m

args
Command-line flags to pass to the ReportGenerator, expressed as a set of arg sub-elements.
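The two date formats accepted for start-date and stop-date under timerange (yyyy_mm_dd and yyyy_mm_dd.hh_mm_ss) can be checked with a short Python sketch before they are placed in a provisioning file. This is an illustration only: the function name parse_report_date is hypothetical, and the mapping to strptime format codes is an interpretation of the format described in the text, not the ReportGenerator's own parser.

```python
from datetime import datetime

# strptime equivalents of yyyy_mm_dd.hh_mm_ss and yyyy_mm_dd
REPORT_DATE_FORMATS = ("%Y_%m_%d.%H_%M_%S", "%Y_%m_%d")


def parse_report_date(text):
    """Parse a start-date or stop-date value (hypothetical helper).

    Tries the date-and-time form first, then the date-only form.
    """
    for fmt in REPORT_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("expected yyyy_mm_dd or yyyy_mm_dd.hh_mm_ss: " + text)


if __name__ == "__main__":
    # The example from the text: Jan 25, 2007 at 7:30:57 in the evening.
    print(parse_report_date("2007_01_25.19_30_57"))
```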

Example
The following example provisions a ReportGenerator component based on the sample wine data.
<reportgenerator component-id="wine_gen_html_report" host-id="wine_indexer">
  <working-dir>C:\Endeca\MDEXEngine\reference\sample_wine_data</working-dir>
  <input-dir-or-file>.\logs\logserver_output</input-dir-or-file>
  <output-file>.\reports\daily\daily_report.html</output-file>
  <stylesheet-file>.\etc\report_stylesheet.xsl</stylesheet-file>
  <settings-file>.\etc\report_settings.xml</settings-file>
  <timerange>day-so-far</timerange>
  <charts>true</charts>
  <log-file>.\logs\wine_gen_html_report.log</log-file>
</reportgenerator>

Provisioning your implementation with eaccmd


You can use the eaccmd command-line interface to create an implementation based on the provisioning file you created in the steps above. The full set of eaccmd commands is described in Using the Eaccmd Tool on page 191.

To provision your implementation:


1 Create a provisioning document as described above.
2 Run eaccmd with the define-app command, specifying the provisioning document you created in step 1. For example:
eaccmd localhost:8888 define-app --app myApp --def app.xml

Provisioning the Application Controller to work on multiple machines


Typically, you provision the Application Controller to work in a distributed environment. You do this by defining the implementation appropriately and then starting the components on the provisioned delegate machines.

Multiple machine example


The example below illustrates how provisioning and running the Application Controller work in multi-machine environments.


In this scenario, there are three machines: devhost, which serves as the EAC Central Server, and dev555 and dev777, which serve as Agent machines running Forge and Dgraph respectively. The Application Controller is installed identically on each machine. Eaccmd is run on devhost (aliased host_1), using HTTP port 8888. Eaccmd issues commands to the EAC Central Server, which in turn passes them on to Agent machines dev555 (aliased data_proc) and dev777 (aliased dgraph_1) via HTTP. The EAC Central Server machine, devhost, handles all direct communication with the user, while the Agent machines execute application tasks. Note: You can run eaccmd on any machine, as long as it is pointed at the EAC Central Server.

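[Figure: eaccmd connects over HTTP to the EAC Central Server (devhost, aliased host_1; labeled "Communication with Web Server"). The EAC Central Server connects over HTTP to Agent 1 (dev555, aliased data_proc, running Forge and Dgidx) and Agent 2 (dev777, aliased dgraph_1, running the Dgraph); the Agents are labeled "Application Execution".]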

The following steps walk you through multi-machine provisioning and execution using the Application Controller.
1 First, write a provisioning document for the EAC Central Server in which you define all of the components and their corresponding host machines. Save this document as app.xml. (For complete syntax, see page 152.)
2 Run eaccmd on the host_1 machine, using the app.xml provisioning document as follows:
eaccmd devhost:8888 define-app --app myApp --def app.xml

To start the component Forge on machine data_proc, issue this eaccmd command on host_1:
eaccmd devhost:8888 start --app myApp --comp forge

To start the component Dgidx on machine data_proc, issue this eaccmd command on host_1:
eaccmd devhost:8888 start --app myApp --comp dgidx

To start the component Dgraph on machine dgraph_1, issue this eaccmd command on host_1:
eaccmd devhost:8888 start --app myApp --comp dgraph
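
The command sequence in this walkthrough can be generated from a script. The following is a minimal Python sketch that only builds the argument lists shown above; the helper names (eaccmd_args, multi_machine_sequence) are hypothetical and are not part of eaccmd, and the sketch does not run anything or wait for a component to finish.

```python
def eaccmd_args(server, command, options):
    """Build an argument list of the form: eaccmd host:port <cmd> --flag value ...

    `options` is a list of (flag, value) pairs. Only flags that appear in the
    walkthrough above (--app, --def, --comp) are used by this sketch.
    """
    args = ["eaccmd", server, command]
    for flag, value in options:
        args += ["--" + flag, value]
    return args


def multi_machine_sequence(server="devhost:8888", app="myApp", def_file="app.xml"):
    """Return the define-app command followed by one start command per component."""
    steps = [eaccmd_args(server, "define-app", [("app", app), ("def", def_file)])]
    # Component names are the ones used in the example: forge, dgidx, dgraph.
    for comp in ("forge", "dgidx", "dgraph"):
        steps.append(eaccmd_args(server, "start", [("app", app), ("comp", comp)]))
    return steps


if __name__ == "__main__":
    for step in multi_machine_sequence():
        print(" ".join(step))
```

Each list could be handed to a process launcher on the machine where eaccmd is installed. All four commands are addressed to the EAC Central Server (devhost:8888), which passes them on to the Agent machines.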

Forcing the removal of an application


You remove an application in eaccmd with the remove-app command. If you want to remove an application that is throwing an error (for example, because it contains a host or component that has become unreachable), or one with running utilities or components, you must add the --force flag. The syntax is as follows:
remove-app --force --app app_id

In a WSDL tool, this behavior is controlled by the forceRemove property on the RemoveApplicationType class. For details, see page 261.

Incremental provisioning
With incremental provisioning, it is possible to add, remove, or modify one or more hosts, components, or scripts without having to bring down the entire implementation.


You can perform incremental provisioning in eaccmd or your custom Web service tool. We use eaccmd in the examples below. Note: For the WSDL API, see Endeca Application Controller API Interface Reference on page 217.

Incremental provisioning guidelines


The following guidelines apply to incremental provisioning:

Scripts can be changed at any time, as long as they are not running.
Properties on either hosts or components can be changed at any time. Properties are described on page 154.
Anything other than a property on a component cannot be changed, nor can a component be removed, if the component is either running or unreachable.
Anything other than a property or a directory on a host cannot be changed, nor can a host be removed, if any components or utilities on it are running, or if the host is unreachable.

You can attempt to override the constraints mentioned above by using the --force flag, which is described on page 187.
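
As a reading aid, the guidelines above can be expressed as a small decision function. This is a sketch of the rules as stated in this section, not of the Application Controller's implementation; the function name and the labels "property", "directory", and "other" are hypothetical.

```python
def incremental_change_status(target, change, busy=False, reachable=True):
    """Classify a requested change under the incremental provisioning guidelines.

    target:    "component", "host", or "script"
    change:    "property", "directory", or "other" (any other change, or a removal)
    busy:      the component or script is running, or components or utilities
               are running on the host
    reachable: the component or host can be reached (not used for scripts)

    Returns "allowed" or "needs --force". "needs --force" means only that the
    guidelines block the change unless you attempt to override them.
    """
    always_allowed = {
        "component": {"property"},
        "host": {"property", "directory"},
        "script": set(),
    }
    if target not in always_allowed:
        raise ValueError("unknown target: " + target)
    if change in always_allowed[target]:
        return "allowed"
    if target == "script":
        # Scripts can be changed at any time, as long as they are not running.
        return "needs --force" if busy else "allowed"
    if busy or not reachable:
        return "needs --force"
    return "allowed"
```

For example, a property change on a running component is classified as allowed, while removing a host that has running components is classified as needing --force.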

About the def_file setting


The def_file is the provisioning document used to add a component or host to the implementation. You can use a larger provisioning file for this purpose, or you can use one that specifies exactly one component or host. If you choose to use a larger provisioning file, then you must specify which component or host listed within it you are adding. For example, say you want to add a host called new_host to your application. You could add provisioning information for new_host to your existing provisioning file, myApp.xml. When you run the add-host command, you would give it the host name as well as the provisioning file name.

In the case of scripts, you have two options: you can use a def_file, as you do with components and hosts, or you can provide the necessary information individually, through the --cmd (command), --wd (working directory), and --log-file settings.

About the --force flag


The --force flag indicates whether or not the Application Controller should attempt to force any running components, utilities, or scripts to stop before attempting an update or a remove operation. In the case of updates, the update persists in the application provisioning, regardless of whether or not the forced stop was successful, even if this leaves a dangling process somewhere. For example:

In the case of a component, the following command:


update-component --force --app myApp --comp forge --def myApp.xml

would first stop the component forge, if it is running, before updating it.

In the case of a host, the following command:


remove-host --force --app myApp --host dev777

would first stop any running components or services on host dev777 before removing that host.

In the case of a script, the following command:


update-script --force --app myApp --script newbaseline.pl --cmd perl

would first stop the script newbaseline.pl before updating it.

Adding, removing, or updating a component


You can add a component, remove a component, or modify an existing component using the eaccmd operations described below.

To add a component in eaccmd, use the following syntax:


add-component --app app_id [--comp comp_id] --def def_file

For example:


add-component --app myApp --comp new_forge --def myApp.xml

To remove a component in eaccmd, use the following syntax:


remove-component [--force] --app app_id --comp comp_id

For example:
remove-component --force --app myApp --comp forge

To change the attributes of a previously-defined component in eaccmd, use the following syntax:
update-component [--force] --app app_id [--comp comp_id] --def def_file

For example:
update-component --force --app myApp --def newDgraphProps.xml

Adding, removing, or updating a host


You can add a host, remove a host, or modify an existing host using the eaccmd operations described below.

To add a host in eaccmd, use the following syntax:


add-host --app app_id [--host host_id] --def def_file

For example:
add-host --app myApp --host mktg022 --def myApp.xml

To remove a host in eaccmd, use the following syntax:


remove-host [--force] --app app_id --host host_id

For example:
remove-host --force --app myApp --host dev777

To change the attributes of a previously-defined host in eaccmd, use the following syntax:
update-host [--force] --app app_id [--host host_id] --def def_file

For example:


update-host --force --app myApp --host mktg022 --def newMktgHostProps.xml

Adding, removing, or updating a script


You can add a script, remove a script, or modify an existing script using the eaccmd operations described below.

To add a script in eaccmd, use the following syntax:


add-script --app app_id --script script_id (--def def_file | [--wd working_dir] [--log-file log_file] --cmd command [args...])

For example:
add-script --app myApp --script newbaseline.pl --cmd perl

To remove a script in eaccmd, use the following syntax:


remove-script [--force] --app app_id --script script_id

For example:
remove-script --app myApp --script testbaseline.pl

To modify an existing script in eaccmd, use the following syntax:


update-script [--force] --app app_id --script script_id (--def def_file | [--wd working_dir] [--log-file log_file] --cmd command [args...])

For example:
update-script --app myApp --script newbaseline.pl --def myApp.xml
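
The two ways of describing a script (a def_file, or individual settings) are alternatives. The following Python sketch builds an add-script or update-script argument list and enforces that choice. It also places --wd and --log-file before --cmd, as the eaccmd command reference advises. The function name script_command is hypothetical; only flags documented in this guide are emitted.

```python
def script_command(action, app, script, def_file=None, cmd=None,
                   wd=None, log_file=None, force=False):
    """Build an add-script or update-script argument list.

    Exactly one of def_file or cmd must be given. cmd is a list such as
    ["perl"], holding the command followed by any arguments.
    """
    if action not in ("add-script", "update-script"):
        raise ValueError("action must be add-script or update-script")
    if (def_file is None) == (cmd is None):
        raise ValueError("give exactly one of def_file or cmd")
    if force and action == "add-script":
        # The documented add-script syntax has no --force flag.
        raise ValueError("--force is not part of the add-script syntax")

    args = [action]
    if force:
        args.append("--force")
    args += ["--app", app, "--script", script]

    if def_file is not None:
        if wd is not None or log_file is not None:
            raise ValueError("--wd and --log-file belong with --cmd, not --def")
        return args + ["--def", def_file]

    # --wd and --log-file, if used, come before --cmd.
    if wd is not None:
        args += ["--wd", wd]
    if log_file is not None:
        args += ["--log-file", log_file]
    return args + ["--cmd"] + list(cmd)
```

For instance, script_command("add-script", "myApp", "newbaseline.pl", cmd=["perl"]) reproduces the add-script example above.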

Provisioning your deployment with Endeca Deployment Template


The Endeca Deployment Template is a collection of operational components that provides a starting point for development and application deployment. Representing the best practices of Endeca's Customer Solutions organization, the template includes the complete directory structure required for deployment, including EAC scripts, configuration files, and batch files or shell scripts that wrap common script functionality.


This template includes functionality required for a Dgraph deployment powered by the EAC and the Java EAC Development Toolkit, including support for baseline and partial index updates and Web Studio integration.

Downloading the Endeca Deployment Template


The Endeca Deployment Template is available as a free download from the Endeca support site, https://support.endeca.com/, in the Tools & Utilities section. Included in the downloadable .zip file are several documents detailing the installation and use of the Endeca Deployment Templates and Toolkit.

Using the Endeca Deployment Template


The Endeca Deployment Template should be installed immediately following the installation of the Endeca IAP on all servers that will be hosting IAP components, and before any provisioning has been done through Web Studio. If Web Studio has been used to make any changes to the IAP configuration prior to installing the Endeca Deployment Template, those changes will be overwritten and lost.


Chapter 12

Using the Eaccmd Tool


This chapter describes the eaccmd command-line tool, which can be used to provision and run the Endeca Application Controller. This chapter contains the following sections:

About eaccmd
Eaccmd usage
Eaccmd command reference


About eaccmd
When you manage your Endeca implementation with the Endeca Application Controller, you control and monitor its operation through the EAC Central Server. You can communicate with the EAC Central Server in two ways:

With the eaccmd command-line tool, as described in this chapter.
Through direct programmatic control with a language that understands Web services. The Application Controller's WSDL API is described in Endeca Application Controller API Interface Reference on page 217.

Running eaccmd
The eaccmd tool is installed by default in C:\Endeca\MDEXEngine\<version>\bin on Windows. On UNIX, it is installed in $ENDECA_ROOT/bin. You run eaccmd within a scripting environment such as Bash or Perl. You can run eaccmd on any machine as long as it is pointing at the EAC Central Server. The eaccmd syntax, which is platform-independent, is described starting on page 194.

Eaccmd feedback
Eaccmd gives no feedback in cases of success (that is, if a component is running or completed or a service is completed). If an operation fails, a FAILED message is printed to the screen. To run eaccmd asynchronously instead, add the --async flag on the command line after the command, as follows:
eaccmd host:port <cmd> [--async]


Component and utility status verbosity


By default, eaccmd provides single-word component and utility status messages, such as Running. To receive more detailed feedback, you can run eaccmd with the --verbose flag. This flag provides useful information beyond simply the state.

Server component status verbosity


The following is an example of a verbose status message for a server component. Server components include the Dgraph, Agraph, and LogServer.
State: NotRunning
Start time: 5/25/07 3:58 PM
Failure Message:

Batch component status verbosity


The following is an example of a verbose status message for a batch component. Batch components include Forge, Dgidx, Agidx, ReportGenerator, and Crawler.
State: NotRunning
Start time: 5/25/07 3:58 PM
Duration: 0 days 0 hours 0 minutes 6.96 seconds
Failure Message:
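
If you capture verbose status output in a script, the fields can be split into name/value pairs. The following Python sketch assumes that each field is printed on its own line in the form "Name: value". That layout is an assumption based on the samples above, and the function name parse_verbose_status is hypothetical.

```python
def parse_verbose_status(text):
    """Split verbose status output into a dict of field name -> value.

    Splits each line at the first colon only, so that values which themselves
    contain colons (such as "3:58 PM") are kept intact.
    """
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    sample = (
        "State: NotRunning\n"
        "Start time: 5/25/07 3:58 PM\n"
        "Duration: 0 days 0 hours 0 minutes 6.96 seconds\n"
        "Failure Message:\n"
    )
    print(parse_verbose_status(sample))
```

An empty Failure Message field is returned as an empty string.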

Using a default host and port


In the eaccmd.properties file, which is located in the $ENDECA_CONF/conf directory on UNIX and %ENDECA_CONF%\conf on Windows, you can specify a host and port for eaccmd to use. (The default values are host=localhost and port=8888.) With this file in place, you do not have to specify the host and port on the command line. If your EAC Central Server is not on localhost:8888, you must either edit the file to point to the correct host and port or continue to specify host:port on the command line. Any host:port specified on the command line overrides the settings in the eaccmd.properties file.
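
The precedence described above (command line first, then eaccmd.properties, then the defaults) can be sketched as follows. This Python sketch is illustrative only: it assumes the file holds simple host=... and port=... lines, as the default values quoted above suggest, and the function name resolve_server is hypothetical.

```python
def resolve_server(cli_host_port=None, properties_text=""):
    """Return the host:port string that would be used.

    cli_host_port:   the host:port given on the command line, or None
    properties_text: the contents of eaccmd.properties (assumed key=value lines)
    """
    settings = {"host": "localhost", "port": "8888"}  # documented defaults
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() in settings:
            settings[key.strip()] = value.strip()
    # Any host:port specified on the command line overrides the file.
    if cli_host_port:
        return cli_host_port
    return settings["host"] + ":" + settings["port"]
```

With no arguments the function returns localhost:8888. A file containing host=devhost changes the result to devhost:8888, unless a host:port is also passed on the command line.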


Eaccmd usage
The eaccmd usage is as follows:
eaccmd host:eac_port <cmd> [--async] [--verbose]

where settings in square brackets ([ ]) are optional and <cmd> is one of:
[Provisioning commands:]
define-app [--app app_id] [--def def_file]
describe-app --app app_id [--canonical]
remove-app [--force] --app app_id
list-apps

[Incremental Provisioning commands:]
add-component --app app_id [--comp comp_id] --def def_file
add-host --app app_id [--host host_id] --def def_file
add-script --app app_id --script script_id (--def def_file | [--wd working_dir] [--log-file log_file] --cmd command [args...])
remove-component [--force] --app app_id --comp comp_id
remove-host [--force] --app app_id --host host_id
remove-script [--force] --app app_id --script script_id
update-component [--force] --app app_id [--comp comp_id] --def def_file
update-host [--force] --app app_id [--host host_id] --def def_file
update-script [--force] --app app_id --script script_id (--def def_file | [--wd working_dir] [--log-file log_file] --cmd command [args...])

[Synchronization commands:]
set-flag --app app_id --flag flag
remove-flag --app app_id --flag flag
remove-all-flags --app app_id
list-flags --app app_id

[Component and Script Control commands:]
start --app app_id [--comp comp_id | --script script_id]
stop --app app_id [--comp comp_id | --script script_id]
status --app app_id [--comp comp_id | --script script_id]

[Utility commands:]
ls --app app_id --host host_id --pattern file_pattern
start-util --type shell --app app_id [--token token] --host host_id [--wd working_dir] --cmd command [args...]
start-util --type copy --app app_id [--token token] [--recursive] --from host_id --to host_id --src src_path --dest dest_path
start-util --type backup --app app_id [--token token] --host host_id --dir ls [--method <copy|move>] [--backups num_backups]
start-util --type rollback --app app_id [--token token] --host host_id --dir ls
stop-util --app app_id --token token
status-util --app app_id --token token


Eaccmd command reference


The eaccmd tool contains commands for provisioning, resource configuration, and component use.

Provisioning commands
The provisioning commands make it possible for you to define and manage your applications from the command line.

Command: define-app [--app app_id] [--def def_file]
Description: Defines an application. Def_file takes an XML provisioning file, a sample of which, sample_wine_definition.xml, is located in the %ENDECA_REFERENCE_DIR%\sample_wine_data\etc directory on Windows, or the $ENDECA_REFERENCE_DIR/sample_wine_data/etc directory on UNIX. The provisioning file typically contains an application ID. If eaccmd specifies a different app_id for the same application, the eaccmd version overrides the one provided in the provisioning file.

Command: describe-app --app app_id [--canonical]
Description: Describes an application. Returns an XML file in the format used by the def_file setting of define-app. If --canonical is specified, all paths are canonicalized, as described on page 157.

Command: remove-app [--force] --app app_id
Description: Removes the named application. The optional --force flag indicates whether or not this remove operation should force any running components or services to stop before attempting the remove. Remove fails if any components or services are still running (that is, not forced to stop).

Command: list-apps
Description: Lists all defined applications.


Provisioning example
The following example defines an application called my_wine. (In this and all examples that follow we assume that the host and port are set in the eaccmd.properties file and so do not need to be included on the command line.)
eaccmd define-app --app my_wine --def sample_wine_definition.xml

Incremental provisioning commands


The incremental provisioning commands make it possible for you to add, remove, or update a host, component, or script without having to bring down the entire application.

Command: add-component --app app_id [--comp comp_id] --def def_file
Description: Adds a single component to an application. Def_file is a provisioning document. You can use a larger provisioning file for this purpose, or you can use one that specifies exactly one component or host. If you choose to use a larger provisioning file, then you must specify which component listed within it you are adding, using the --comp flag.

Command: add-host --app app_id [--host host_id] --def def_file
Description: Adds a single host to an application. Def_file is a provisioning document. You can use a larger provisioning file for this purpose, or you can use one that specifies exactly one component or host. If you choose to use a larger provisioning file, then you must specify which host listed within it you are adding, using the --host flag.


Command: add-script --app app_id --script script_id (--def def_file | [--wd working_dir] [--log-file log_file] --cmd command [args...])
Description: Adds a script to an application. Scripts can be added at any time. You can use --def to specify a definition file to start the script, or use the following settings:
--log-file is the file for appended stdout/stderr output. If it is not specified, it defaults to $ENDECA_CONF/logs/script/(app_id).(script_id).log
--wd is the working directory. If it is not specified, it defaults to $ENDECA_CONF/working/(app_id)/
--cmd is the command that is used to start the script. If --cmd is omitted, the first unrecognized argument is taken as the start of your command.
Note: The --log-file and --wd, if used, should come before --cmd.

Command: remove-component [--force] --app app_id --comp comp_id
Description: Removes a single component from an application. The optional --force flag indicates whether or not this remove operation should force any running components or services to stop before attempting the remove. Remove fails if any components or services are still running (that is, not forced to stop).

Command: remove-host [--force] --app app_id --host host_id
Description: Removes a single host from an application. The optional --force flag indicates whether or not this remove operation should force any running components or services to stop before attempting the remove. Remove fails if any components or services are still running (that is, not forced to stop).

Command: remove-script [--force] --app app_id --script script_id
Description: Removes a script from an application. The optional --force flag indicates whether or not this remove operation should force a running script to stop before attempting the remove.


Command: update-component [--force] --app app_id [--comp comp_id] --def def_file
Description: Updates a component. Component properties can be updated at any time. Other changes cannot be made if the component is running or unreachable. The optional --force flag indicates that the Application Controller will attempt to force the conditions under which the specified updates can be made (by stopping a running component or utility invocation, for example). Regardless of whether or not the forced stop is successful, however, the update persists in the application provisioning, even if this leaves a dangling process somewhere.

Command: update-host [--force] --app app_id [--host host_id] --def def_file
Description: Updates a host. Host properties can be updated at any time. Other changes cannot be made if any components or services are running on the host, or if the host is unreachable. The optional --force flag indicates that the Application Controller will attempt to force the conditions under which the specified updates can be made (by stopping a running component or utility invocation, for example). Regardless of whether or not the forced stop is successful, however, the update persists in the application provisioning, even if this leaves a dangling process somewhere.


Command: update-script [--force] --app app_id --script script_id (--def def_file | [--wd working_dir] [--log-file log_file] --cmd command [args...])
Description: Updates a script. The optional --force flag indicates whether or not this update operation should force a running script to stop before attempting the update. You can use --def to specify a definition file to update the script, or use the following settings:
--wd is the working directory. If it is not specified, it defaults to $ENDECA_CONF/working/(app_id)/
--log-file is the file for appended stdout/stderr output. If it is not specified, it defaults to $ENDECA_CONF/logs/script/(app_id).(script_id).log
--cmd is the command that is used to start the script. If --cmd is omitted, the first unrecognized argument is taken as the start of your command.
Note: The --log-file and --wd, if used, should come before --cmd.

Incremental provisioning example


The following example adds a Forge component to the my_wine application. Because this provisioning file contains only a single component, it is not necessary to use the --comp flag.
eaccmd add-component --app my_wine --def update_forge.xml

Note: For more information about incremental provisioning, see page 185.


Synchronization commands
Synchronization commands are used by the Synchronization service (described below) to manage application-level flags that let users know when processes are in use.

Command: set-flag --app app_id --flag flag
Description: Sets a flag that indicates that a group of processes is in use. You specify the flag with the application name and a flag name, which may be arbitrary but should be well-known.

Command: remove-flag --app app_id --flag flag
Description: Removes the named flag and releases the reserved processes.

Command: remove-all-flags --app app_id
Description: Removes all flags in an application and releases all reserved processes.

Command: list-flags --app app_id
Description: Lists all flags in an application.

About the Synchronization service


The Synchronization service lets you create, query, and delete application-level flags on a series of processes. These flags indicate that the flagged processes are in use. The service creates flags on the fly at the user's request and deletes them when they are released. Using this service, multiple users can synchronize their activities by obtaining and querying the flags. If two users attempt to flag the same processes at the same time, an error occurs.

Flag naming and management


Synchronization service flags are identified by an application name/flag name pair. Because flag names are user-created and arbitrary, all users must be aware of flag names and consistent in their use. If a set of processes needs to be reserved, then everyone concerned needs to know the name of the flag.


Synchronization examples
The following example adds a flag called mkt1010 to the my_wine application:
eaccmd set-flag --app my_wine --flag mkt1010

The following example removes all flags in the my_wine application:


eaccmd remove-all-flags --app my_wine
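
To make the flag semantics concrete, the following Python sketch models the four synchronization commands as an in-memory registry keyed by application name and flag name. It is a toy model, not the Synchronization service itself. In particular, raising an error when a flag that is already set is set again is an assumption made here to mirror the statement that an error occurs when two users flag the same processes; the class and method names are hypothetical.

```python
class FlagRegistry:
    """Toy model of application-level flags (application name / flag name pairs)."""

    def __init__(self):
        self._flags = {}  # application name -> set of flag names

    def set_flag(self, app, flag):
        flags = self._flags.setdefault(app, set())
        if flag in flags:
            # Assumption: a second attempt to take the same flag is an error.
            raise RuntimeError("flag already set: %s/%s" % (app, flag))
        flags.add(flag)

    def remove_flag(self, app, flag):
        self._flags.get(app, set()).discard(flag)

    def remove_all_flags(self, app):
        self._flags.pop(app, None)

    def list_flags(self, app):
        return sorted(self._flags.get(app, set()))


if __name__ == "__main__":
    registry = FlagRegistry()
    registry.set_flag("my_wine", "mkt1010")
    print(registry.list_flags("my_wine"))
    registry.remove_all_flags("my_wine")
    print(registry.list_flags("my_wine"))
```

Because a flag is identified by the application name together with the flag name, the same flag name can be used independently in two applications.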

Component and script control commands


The component and script control commands are used to start and stop components or scripts and retrieve their status.

Command: start --app app_id [--comp comp_id | --script script_id]
Description: Starts a component or a script.

Command: stop --app app_id [--comp comp_id | --script script_id]
Description: Stops a component or a script.

Command: status --app app_id [--comp comp_id | --script script_id]
Description: Gets the status of a component (one of Starting, Running, NotRunning, or Failed) or a script (one of Running, NotRunning, or Failed).
Note: For information on changing the verbosity of the status message with the --verbose flag, see page 192.

Component control example


The following example starts a Dgraph named wine_dgraph in the my_wine application.
eaccmd start --app my_wine --comp wine_dgraph


Utility commands
The utility commands allow you to run and monitor Application Controller utilities through the eaccmd tool.

General notes on Application Controller utilities


Keep in mind the following general points about Application Controller utilities.

Utility naming
Be sure to name your utilities carefully. If you create a new utility that has the same name as a running utility, an error is issued. However, if there is an existing utility with the same name that is not running, the new utility overwrites it.

System cleanup of utility output


Each instance of the Shell and Copy utilities stores status information and output logs. The Application Controller clears this information for non-running utility instances every seven days (that is, 10,080 minutes) to save system resources. This setting can be modified in the eac.properties file.


The List Directory Contents (ls) command


The List Directory Contents command lets you see the contents of directories on remote machines. Its behavior is similar to that of ls on UNIX, although some non-ls restrictions, noted below, apply.

Command: ls --app app_id --host host_id --pattern file_pattern
Description: Returns a list of files matching the pattern input in file_pattern. Note the following:
A file_pattern must start with an absolute path, such as C:\ or /.
A file_pattern can contain . or .. as directory names, and expands * and ? wildcards.
A file_pattern cannot contain the wildcard expressions .*, .?, or ..* as directory or file names.
Bracketed wildcards, such as file[123].txt, are not supported.
Wildcards cannot be applied to drive names.
Note: You cannot use .. to create paths that do not exist. For example, the path /temp/../../a.txt refers to a path that is above the root directory. This is an invalid path that causes the operation to fail.
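
The file_pattern restrictions above lend themselves to a pre-flight check before you call ls. The following Python sketch tests a pattern against those rules only. It is an illustration, not the Application Controller's own validation; the function name check_file_pattern is hypothetical, and treating both / and \ as separators is a simplification.

```python
import re


def check_file_pattern(pattern):
    """Return a list of rule violations; an empty list means no rule was broken."""
    # Rule: the pattern must start with an absolute path such as C:\ or /.
    # A wildcard in the drive position also fails this check.
    drive = re.match(r"[A-Za-z]:[\\/]", pattern)
    if drive:
        rest = pattern[drive.end():]
    elif pattern.startswith("/"):
        rest = pattern[1:]
    else:
        return ["must start with an absolute path"]

    problems = []
    # Rule: bracketed wildcards such as file[123].txt are not supported.
    if "[" in rest or "]" in rest:
        problems.append("bracketed wildcards are not supported")

    depth = 0  # number of directory levels below the root
    for name in re.split(r"[\\/]", rest):
        if name in ("", "."):
            continue
        if name == "..":
            depth -= 1
            if depth < 0:
                # Rule: .. cannot be used to climb above the root directory.
                problems.append(".. refers to a path above the root directory")
                break
        else:
            # Rule: .*, .?, and ..* are not allowed as directory or file names.
            if name in (".*", ".?", "..*"):
                problems.append(name + " is not allowed as a directory or file name")
            depth += 1
    return problems
```

For example, check_file_pattern("/temp/../../a.txt") reports the path above the root directory, while "/home/endeca/*.txt" passes all of the checks.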

Wildcard behavior
The List Directory Contents command expands the wildcards in a pattern. If the expansion results in a file, it returns a file. If the expansion results in a directory, it returns the directory non-recursively. Wildcard expansion can result in any combination of files and directories. For example, assume that the following directories and files exist:
/home/endeca/reference/... /home/endeca/install.log /home/e.txt

The following command:


eaccmd ls --app my_wine --host my_host --pattern /home/e\\*

would list all of these files and directories, because they match the file_pattern.

Delimiting wildcard arguments


To prevent inappropriate expansion, any wildcard arguments you use with the List Directory Contents utility in eaccmd need to be delimited with double quotation marks. For example:

On Windows, "C:\*.txt".
On UNIX, "/home/endeca/test/*.txt".
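
When a script assembles the command line as a single string, the quoting can be applied mechanically. The following Python sketch wraps any argument containing * or ? in double quotation marks. The function names are hypothetical, and the sketch assumes that the argument itself contains no double quotation marks.

```python
import shlex


def quote_wildcards(arg):
    """Wrap an argument in double quotation marks if it contains * or ?."""
    if "*" in arg or "?" in arg:
        return '"' + arg + '"'
    return arg


def ls_command_line(app, host, pattern):
    """Build an eaccmd ls command line with the wildcard pattern delimited."""
    parts = ["eaccmd", "ls", "--app", app, "--host", host, "--pattern", pattern]
    return " ".join(quote_wildcards(part) for part in parts)


if __name__ == "__main__":
    line = ls_command_line("my_wine", "my_host", "/home/endeca/test/*.txt")
    print(line)
    # shlex.split applies POSIX shell quoting rules without expanding
    # wildcards; the last word is the pattern with the quotation marks removed.
    print(shlex.split(line)[-1])
```

The quotation marks matter only when a shell interprets the line. They keep the shell from expanding the wildcard against local files, so that eaccmd receives the pattern unchanged.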

The Shell utility


The Shell utility allows you to run arbitrary commands in a host system shell.

Shell utility commands


Command: start-util --type shell --app app_id [--token token] --host host_id [--wd working_dir] --cmd command [args...]
Description: Starts a Shell utility with the specified command string.
The token is a string. If you do not specify a token, one is generated and returned when you start the utility. The token is used to stop the utility or to get its status.
--wd, which is optional, sets the working directory for the process that gets launched. If specified, it must be an absolute path. If --wd is not specified, the setting defaults to %ENDECA_CONF%\working\<appName>\shell on Windows or $ENDECA_CONF/working/<appName>/shell on UNIX.
The --cmd arguments are passed in a single string. If --cmd is omitted, the first unrecognized argument is taken as the start of your command.


Command: stop-util --app app_id --token token
Description: Stops a Shell utility. The token is a string, either user-created or generated and returned when you start the utility, that eaccmd prints to screen. The token can be used to stop the utility or to get its status.

Command: status-util --app app_id --token token
Description: Gets the status of a Shell utility. The token is a string, either user-created or generated and returned when you start the utility, that eaccmd prints to screen. The token can be used to stop the utility or to get its status.

Shell utility examples


The first example deletes the Dgidx output after it has been copied in a separate action over to the Dgraph:
eaccmd start-util --type shell --app my_wine --host mkt1010 --cmd rm <dgidx-output-dir>/*.*

The second example performs a recursive directory copy:


eaccmd start-util --type shell --app myapp --host hosttorunon --cmd cp -r /mysourcedir /mydestdir

Troubleshooting the Shell utility


In many cases, particularly cross-platform scenarios, the Shell command must be wrapped in double quotation marks. When the quotation marks are missing, the error message returned, which occurs at the console level, is usually similar to the following:
The system cannot find the path specified.

The Copy utility


The Copy utility uses an internal Web services interface to copy files or directories, either locally or between machines. It supports wildcards (* and ?) and recursive copying. In some cases, the destination directory must already exist; in others, the utility automatically creates both the destination directory and any empty directories in the transfer.


Directories are copied first to a temporary directory on the destination machine before being copied one file at a time to the target location. You can configure the location of this temporary directory in the eac.properties file, using the optional setting com.endeca.eac.filetransfer.fileTransferTempDir as follows:

If this setting is defined as an absolute path, the Copy utility uses it.
If it is defined as a relative path, the Copy utility considers it to be relative to %ENDECA_CONF%/state/
If it is not defined, the Copy utility uses the directory %ENDECA_CONF%/state/file_transfer/
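
The three cases above amount to a short resolution rule. The following Python sketch applies it to POSIX-style paths only; Windows drive paths are not handled. The function name file_transfer_temp_dir is hypothetical, and endeca_conf stands for the value of ENDECA_CONF.

```python
import posixpath


def file_transfer_temp_dir(endeca_conf, setting=None):
    """Resolve the temporary directory used for directory copies.

    setting is the value of com.endeca.eac.filetransfer.fileTransferTempDir,
    or None if the setting is not defined.
    """
    state_dir = posixpath.join(endeca_conf, "state")
    if not setting:
        # Not defined: use the file_transfer directory under state.
        return posixpath.join(state_dir, "file_transfer")
    if posixpath.isabs(setting):
        # Absolute path: used as given.
        return setting
    # Relative path: taken relative to the state directory.
    return posixpath.join(state_dir, setting)
```

For example, with ENDECA_CONF set to /opt/endeca/conf (a made-up location), a setting of tmp/eac resolves to /opt/endeca/conf/state/tmp/eac.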

If the Copy utility tries to copy a file to a location where another file already exists, the utility overwrites the preexisting file.

Enabling SSL
The Copy utility supports both SSL and non-SSL communication, with SSL being off by default. For details on enabling SSL, see the Endeca Security Guide for Java.

Destination directories
In most cases, the destination directory where the copied files are placed has to exist already. However, there are a few exceptions where the destination directory does not have to exist prior to the copy:

Copying just one file to the location of an existing file.
Copying just one file to a new file name in an existing directory.
Copying just one directory to a new directory name in an existing parent directory.

Failure and recovery


The following situations result in a failure of the Copy utility:

The Copy utility tries to write to a directory it doesn't have permissions to.
There is not enough disk space.
There is no file at the source location.
The wildcard expression matches no files.
There are mismatches between directories and files. For example:
    The Copy utility tries to copy a file to a path where a directory with that name already exists.
    The Copy utility tries to create a directory in the destination and a file with that name already exists.
A path uses .. to refer to a location that does not exist. For example, the path /temp/../../a.txt refers to a path that is above the root directory. This is an invalid path that causes the utility to fail.
A copy is requested that results in multiple files being written to the same location. For example, given the following directory structure on the source:
    /trunk/src/a.txt
    /testbranch/src/a.txt
a copy from /t*/src/* to /temp would result in the Copy utility trying to write both a.txt files to the same location in the temp directory.

There is no recovery for copies. Therefore, if the transfer of a large file fails, the entire file must be transferred again. Likewise, if a multi-file transfer fails before completion, you must either re-run the entire transfer or request only those parts that did not transfer.
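
The last failure case, several source files landing on one destination path, can be detected before you start a copy. The following Python sketch assumes that each matched file is written into the destination directory under its own base name, as in the a.txt example above. The function name destination_collisions is hypothetical, and the list of matched source files is supplied by the caller.

```python
import posixpath
from collections import defaultdict


def destination_collisions(source_files, dest_dir):
    """Map each destination path that would be written more than once
    to the list of source files that would be written there."""
    by_target = defaultdict(list)
    for src in source_files:
        target = posixpath.join(dest_dir, posixpath.basename(src))
        by_target[target].append(src)
    return {t: srcs for t, srcs in by_target.items() if len(srcs) > 1}


if __name__ == "__main__":
    matched = ["/trunk/src/a.txt", "/testbranch/src/a.txt"]
    print(destination_collisions(matched, "/temp"))
```

An empty result means that no two source files share a destination path under this assumption.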

Explicit machine naming


Keep in mind that when you are using the Copy utility, you are potentially working with three machines: the EAC Central Server, from which you issue eaccmd commands; the Agent machine you are copying data from; and the Agent machine you are copying data to. In such cases, the name localhost can be confusing. Unless you are using the Copy utility to move files on a single machine, you should use explicit machine names rather than simply localhost.

Delimiting wildcard arguments


To prevent inappropriate expansion, any wildcard arguments you use with the Copy utility in eaccmd need to be delimited with double quotation marks. For example:

On Windows, "C:\*.txt".

On UNIX, "/home/endeca/test/*.txt".

Copying across platforms


If you are copying files or directories between machines on different platforms, you must wrap any Windows paths that you enter in a Linux or Solaris shell in double quotation marks (for example, "C:\*.txt").

Copy utility commands


Command
start-util --type copy --app app_id [--token token] [--recursive] --from host_id --to host_id --src file_pattern --dest dest_path

Description
As part of the Copy utility, starts a copy. You identify the hostname, port, and path for both the source and destination directories. If the copy is local, you do not need to specify the host_id.

Note: Keep in mind that you are not necessarily copying to the machine you are running eaccmd on. The hosts you are copying to and from are those you specified in your provisioning file.

--token is a string used to stop the utility or get its status. If you do not specify a token, one is generated and returned when you start the utility.

If --recursive is specified, the Copy utility recursively copies any directories that match the wildcard. If --recursive is not specified, the Copy utility does not copy directories, even if they match the wildcard. Instead, it creates intermediate directories required to place the copied files at the destination path.

--src is a string representing the file, wildcard, or directory to be copied. A --src must start with an absolute path, such as C:\ or /. A --src can contain . or .. as directory names, and expands * and ? wildcards. Note the following:
    You cannot use the wildcard expressions .*, .?, or ..* as directory or file names.
    Bracket wildcards, such as file[123].txt, are not supported.
    Wildcards cannot be applied to drive names.

--dest is the full path to the destination file or directory. --dest must be an absolute path, and no wildcards are allowed. If --dest is a directory, that directory must exist, unless the following conditions are met:
    The parent of the destination already exists.
    You are copying only one thing.

Command
stop-util --app app_id --token token

Description
Stops a Copy utility. The token is a string, either user-created or generated and returned when you start the utility, that eaccmd prints to screen. The token can be used to stop the utility or to get its status.

Command
status-util --app app_id --token token

Description
Gets the status of a Copy utility. The token is a string, either user-created or generated and returned when you start the utility, that eaccmd prints to screen. The token can be used to stop the utility or to get its status.
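If you launch the Copy utility from a script, it can help to assemble the arguments in one place. The following is a minimal sketch that uses only the flags listed in the table above. The helper name build_copy_args is hypothetical, and the sketch assumes that eaccmd is on the PATH of the machine that runs the script.

```python
def build_copy_args(app_id, src, dest, from_host=None, to_host=None,
                    token=None, recursive=False):
    """Assemble an argument list for 'eaccmd start-util --type copy'.

    Returning a list (rather than one command string) means a caller can hand
    it to subprocess.run() without a shell, so no shell is involved that could
    expand a wildcard in src.
    """
    args = ["eaccmd", "start-util", "--type", "copy", "--app", app_id]
    if token is not None:
        args += ["--token", token]
    if recursive:
        args.append("--recursive")
    # The host IDs can be omitted for a local copy, as described above.
    if from_host is not None:
        args += ["--from", from_host]
    if to_host is not None:
        args += ["--to", to_host]
    args += ["--src", src, "--dest", dest]
    return args


print(build_copy_args("myApp", "/endeca2/work/forgelogs/*", "/destination"))
```

A caller would pass the returned list to subprocess.run(). Whether eaccmd can be invoked this way on a given platform (for example, if it is installed as a batch file on Windows) should be checked locally.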

Copy utility examples


This section illustrates several different Copy actions. For simplicity, the copying is done on a single machine.

First, assume the following directory structure exists on the source:


/
    endeca1/
        work/
            dgraphlogs/
                a.log
            forgelogs/
                b.log
    endeca2/
        work/
            dgraphlogs/
                c.log
            forgelogs/
                d.log
                e.log
    destination/

To copy one file to a new name:


The following command copies one file to a new name.
eaccmd start-util --type copy --app myApp --src "/endeca1/work/dgraphlogs/a.log" --dest "/destination/out.log"

The resulting directory change would look like this:


destination/
    out.log

To copy one file into an existing directory:


The following command copies one file into an existing directory.
eaccmd start-util --type copy --app myApp --src "/endeca1/work/dgraphlogs/a.log" --dest "/destination"

The resulting directory change would look like this:


destination/
    a.log

To copy one directory to a new name recursively:


The following command recursively copies a directory to a new name:
eaccmd start-util --type copy --app myApp --src "/endeca1/work/dgraphlogs" --dest "/destination/outlogs" --recursive

The resulting directory change would look like this:


destination/
    outlogs/
        a.log

To copy one directory into an existing directory recursively:


The following command recursively copies a directory into an existing directory.
eaccmd start-util --type copy --app myApp --src "/endeca1/work/dgraphlogs" --dest "/destination" --recursive

The resulting directory change would look like this:


destination/
    dgraphlogs/
        a.log

To copy all files in a directory:


The following command copies all files in a directory.
eaccmd start-util --type copy --app myApp --src "/endeca2/work/forgelogs/*" --dest "/destination"

The resulting directory change would look like this:


destination/
    d.log
    e.log

To perform a copy with multiple wildcards:


The following copy command demonstrates the use of multiple wildcards.
eaccmd start-util --type copy --app myApp --src "/e*/work/*logs/*.log" --dest "/destination"

The resulting directory change would look like this:


destination/
    a.log
    b.log
    c.log
    d.log
    e.log

To perform a recursive copy with wildcards:


The following copy demonstrates a recursive copy with wildcards.
eaccmd start-util --type copy --app myApp --src "/e*/work" --dest "/destination" --recursive

The resulting directory change would look like this:


destination/
    work/
        dgraphlogs/
            a.log
            c.log
        forgelogs/
            b.log
            d.log
            e.log

The Archive utility


The Archive utility allows you to archive and roll back directories. Using the Archive utility, you can back up a set of component outputs, which can later be rolled back on demand.

With the backup operation, you create backup copies of directories distinguished by time stamps. Backup is discussed on page 213.

With the rollback operation, you replace the current version of a directory with the most recently backed-up version. The current version is then renamed with an .unwanted suffix. Rollback is discussed on page 214.

IMPORTANT: Do not start a backup or rollback operation while another such operation is in progress on the same directory. Unexpected behavior may occur if you do so.

Backup operations
Backup operations create an archive directory from an existing directory. The archive directory has the same name as the original directory, but with a timestamp appended to the end. The timestamp reflects the time when the backup operation was performed. For example, if the original directory is called logs and was backed up on October 11, 2006 at 8:00 AM, the backup operation creates a directory called logs.2006_10_11.08_00_00.
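The naming scheme can be reproduced in a few lines, which is useful when a script needs to predict or parse archive names. This sketch infers the format from the single example above. The zero padding of each field is an assumption, and the function name is hypothetical.

```python
from datetime import datetime


def archive_name(dir_name: str, when: datetime) -> str:
    """Build an archive directory name in the style of the example above:
    <dir>.<year>_<month>_<day>.<hour>_<minute>_<second>."""
    return f"{dir_name}.{when.strftime('%Y_%m_%d.%H_%M_%S')}"


print(archive_name("logs", datetime(2006, 10, 11, 8, 0, 0)))  # logs.2006_10_11.08_00_00
```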

Backup operation commands


Command
start-util --type backup --app app_id [--token token] --host host_id --dir dir [--method <copy|move>] [--backups num_backups]

Description
Starts the backup operation. The token is a string. If you do not specify a token, one is generated and returned when you start the utility. The token is used to stop the utility or to get its status.

The host and dir settings specify the path to the directory that will be archived.

The method is either copy or move (the default).

The optional backups setting specifies the maximum number of archives to store. This number does not include the original directory itself, so if backups is set to 3, you would have the original directory plus up to three archive directories, for a total of as many as four directories. The default num_backups is 5.

Command
stop-util --app app_id --token token

Description
Stops a backup operation. The token is a string, either user-created or system-generated when you start the utility. The token can be used to stop the utility or to get its status.

Command
status-util --app app_id --token token

Description
Gets the status of a backup operation. The token is a string, either user-created or system-generated when you start the utility. The token can be used to stop the utility or to get its status.
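The effect of the backups limit can be illustrated with a short sketch. This guide does not state which archives are discarded when the limit is exceeded, so the sketch assumes that the oldest are dropped first. It also assumes zero-padded timestamps, so that sorting the names alphabetically sorts them by age. The function name is hypothetical.

```python
def archives_to_prune(names, dir_name, max_backups=5):
    """Return the archive directory names that exceed max_backups, oldest first.

    names is a listing of the parent directory. The original directory and any
    .unwanted directory are not counted as archives.
    """
    prefix = dir_name + "."
    unwanted = dir_name + ".unwanted"
    archives = sorted(n for n in names if n.startswith(prefix) and n != unwanted)
    excess = len(archives) - max_backups
    return archives[:excess] if excess > 0 else []


listing = ["logs", "logs.2006_10_09.08_00_00",
           "logs.2006_10_10.08_00_00", "logs.2006_10_11.08_00_00"]
# With a limit of 2, only the oldest of the three archives is surplus.
print(archives_to_prune(listing, "logs", max_backups=2))
```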

Backup operation example


In the following example, an archive version of the logs directory is created.
eaccmd start-util --type backup --app my_wine --host mkt1010 --dir c:\my_wine\data\logs --backups 2

Rollback operations
Rollback operations roll back the directory to the most recently backed-up version. For example, say you have a directory called logs, one called logs.2006_10_11.08_00_00, and other, older versions. When you roll back, the following things happen:

logs is renamed logs.unwanted.
logs.2006_10_11.08_00_00 is renamed logs.
The older versions are left alone.

Note: There can only be a single .unwanted directory at a time. If you roll back twice, the .unwanted directory from the first rollback is deleted.
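The renaming sequence can be modeled on a plain list of directory names, which makes the effect of repeated rollbacks easier to see. This is a sketch of the behavior described above and does not touch the file system. The function name is hypothetical, the behavior when no archive exists is not described in this guide (raising an error here is an assumption), and zero-padded timestamps are assumed so that the newest archive sorts last.

```python
def simulate_rollback(names, dir_name):
    """Return the sorted directory names expected after one rollback of dir_name."""
    unwanted = dir_name + ".unwanted"
    prefix = dir_name + "."
    archives = sorted(n for n in names if n.startswith(prefix) and n != unwanted)
    if dir_name not in names or not archives:
        raise ValueError("nothing to roll back")  # assumption, see lead-in
    latest = archives[-1]
    # Any earlier .unwanted directory is deleted, the current directory becomes
    # .unwanted, and the most recent archive takes the original name.
    remaining = [n for n in names if n not in (unwanted, dir_name, latest)]
    return sorted(remaining + [unwanted, dir_name])


before = ["logs", "logs.2006_10_10.08_00_00", "logs.2006_10_11.08_00_00"]
after_first = simulate_rollback(before, "logs")
print(after_first)
print(simulate_rollback(after_first, "logs"))
```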

Rollback operation commands


Command
start-util --type rollback --app app_id [--token token] --host host_id --dir dir

Description
Starts the rollback operation. The token is a string. If you do not specify a token, one is generated and returned when you start the utility. The token is used to stop the utility or to get its status. The host and dir settings specify the path to the directory that will be rolled back.

Command
stop-util --app app_id --token token

Description
Stops a rollback operation. The token is a string, either user-created or generated and returned when you start the utility, that eaccmd prints to screen. The token can be used to stop the utility or to get its status.

Command
status-util --app app_id --token token

Description
Gets the status of a rollback operation. The token is a string, either user-created or generated and returned when you start the utility, that eaccmd prints to screen. The token can be used to stop the utility or to get its status.

Rollback operation example


In the following example, the archived logs directory is rolled back.
eaccmd start-util --type rollback --app my_wine --host mkt1010 --dir c:\my_wine\data\logs

Chapter 11

Endeca Application Controller API Interface Reference


This chapter describes the Endeca Application Controller API interfaces. Details on classes can be found in the following chapter, beginning on page 241. The interfaces are documented here. However, the exact syntax of a class member depends on the output of the WSDL tool that you are using. Be sure to check the client stub classes that are generated by your WSDL tool for the exact syntax of the Application Controller API class members.

Using the Application Controller WSDL


You can use the Endeca Application Controller WSDL API to write your application in the language of your choice (such as Java, C#, or Perl). Using the Web services tool of your choice (such as Axis for Java), do the following:

1. Run the WSDL through your tool to generate the stubs (that is, an API that your code can call).
2. Write your application, using that code to control the Application Controller.

Notes:
The Application Controller schema is defined in eac.wsdl, which is located in the $ENDECA_ROOT/lib/services directory on UNIX (C:\Endeca\MDEXEngine\<version>\lib\services on Windows).
You generate client stubs (or proxies) using the eac.wsdl file located in the file system provided by the Endeca installation. You cannot generate client stubs using the SOAP Web services addresses associated with each service within the WSDL file.

Simple types in the Application Controller WSDL


The Application Controller WSDL defines several data types that can be treated as simple data types:

IDType, TokenType, BackupMethodType, TimeRangeType, and TimeSeriesType can be treated as Strings.
PortNumber can be treated as an Integer.
TimeOut can be treated as a Long.

ComponentControl interface
The ComponentControl interface provides component management capabilities.

ComponentControl methods

startComponent(FullyQualifiedComponentIDType startComponentInput)
Starts the named component.

FullyQualifiedComponentIDType parameters:
applicationID identifies the application to use.
componentID identifies the component to use.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.

stopComponent(FullyQualifiedComponentIDType stopComponentInput)
Stops the named component.

FullyQualifiedComponentIDType parameters:
applicationID identifies the application to use.
componentID identifies the component to use.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.

getComponentStatus(FullyQualifiedComponentIDType getComponentStatusInput)
Returns the status of a component.

FullyQualifiedComponentIDType parameters:
applicationID identifies the application to use.
componentID identifies the component to use.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.

Returns:
A BatchStatusType object (for batch components; see page 247) or a StatusType object (for server components; see page 269).

Synchronization interface
The Synchronization interface manages application-level flags that let users know when processes are in use. For example, your code could create a flag named update-running to ensure that a new baseline update does not start while another update is already in progress.

Typical usage is as follows:


if (setFlag(MY_FLAG_ID) == true)
    [perform action, such as a baseline update]
    removeFlag(MY_FLAG_ID)
else
    [signal error such as "an update is already in progress"]
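The usage pattern above can be sketched as runnable code. Because the real client stubs depend on your WSDL tool, this example uses an in-memory stand-in: the InMemoryFlags class and the run_exclusively helper are hypothetical and only mimic the documented return value of setFlag (true when the flag was newly set, false when it already existed). The try/finally block is a design choice of the sketch, so that the flag is removed even if the action fails.

```python
class InMemoryFlags:
    """Hypothetical stand-in for the Synchronization service."""

    def __init__(self):
        self._flags = set()

    def set_flag(self, app_id, flag_id):
        # Mirrors setFlag: False if the flag was already set, True otherwise.
        key = (app_id, flag_id)
        if key in self._flags:
            return False
        self._flags.add(key)
        return True

    def remove_flag(self, app_id, flag_id):
        self._flags.discard((app_id, flag_id))


def run_exclusively(flags, app_id, flag_id, action):
    """Run action only if the flag could be set; always remove the flag afterwards."""
    if not flags.set_flag(app_id, flag_id):
        raise RuntimeError("an update is already in progress")
    try:
        return action()
    finally:
        flags.remove_flag(app_id, flag_id)


flags = InMemoryFlags()
print(run_exclusively(flags, "my_wine", "update-running", lambda: "baseline update done"))
```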

Synchronization methods

setFlag(FullyQualifiedFlagIDType setFlagInput)
Creates a new flag, identified by flagID, that is associated with the named application.

FullyQualifiedFlagIDType parameters:
applicationID identifies the application to use.
flagID is a unique string identifier for this flag.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.

Returns:
A Boolean: false if the flag was already set, or true if it was not set (meaning the method succeeded).

removeFlag(FullyQualifiedFlagIDType removeFlagInput)
Removes the named flag.

FullyQualifiedFlagIDType parameters:
applicationID identifies the application to use.
flagID is a unique string identifier for this flag.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.

removeAllFlags(IDType removeAllFlagsInput)
Removes all flags in an application.

IDType parameter:
applicationID identifies the application to use.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.

listFlags(IDType listFlagsInput)
Lists the collection of flags in an application.

IDType parameter:
applicationID identifies the application to use.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.

Returns:
flagIDList, a string collection of flagIDs.

Utility interface
The Utility interface allows you to manage the Application Controller utilities (Shell, Copy, and Archive) programmatically. Note: Be sure to name your utilities carefully. If you create a new utility that has the same name as a running utility, an error is issued. However, if there is an existing utility with the same name that is not running, the new utility overwrites it.

Utility methods

startBackup(RunBackupType startBackupInput)
Starts the backup operation of the Archive utility. Backup operations create an archive directory from an existing directory. The archive directory has the same name as the original directory, but with a timestamp appended to the end. The timestamp reflects the time when the backup operation was performed. For example, if the original directory is called logs and was backed up on October 11, 2006 at 8:00 AM, the backup operation creates a directory called logs.2006_10_11.08_00_00.

IMPORTANT: Do not start a backup or rollback operation while another such operation is in progress on the same directory. Unexpected behavior may occur if you do so.

RunBackupType parameters:
applicationID identifies the application to use.
token identifies the token used to stop the utility or to get its status. If you do not specify a token, one is generated and returned when you start the utility.
hostID is a unique identifier for the host. The hostID and dirName parameters specify the path to the directory that will be archived.
dirName is the full path of the directory. The hostID and dirName parameters specify the path to the directory that will be archived.
backupMethod is either copy or move.
numBackups specifies the maximum number of archives to store. This number does not include the original directory itself, so if numBackups is set to 3, you would have the original directory plus up to three archive directories, for a total of as many as four directories. The default numBackups is 5.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.

Returns:
The string token assigned to this invocation.

startFileCopy(RunFileCopyType startFileCopyInput)
Launches the Copy utility, which copies files either on a single machine or between machines.

RunFileCopyType parameters:
applicationID identifies the application to use.
token identifies the token used to stop the utility or to get its status. If you do not specify a token, one is generated and returned when you start the utility.
fromHostID is a unique identifier for the host from which you are copying.
toHostID is a unique identifier for the host to which you are copying.
sourcePath is a string representing the file, wildcard, or directory to be copied. A sourcePath must start with an absolute path, such as C:\ or /. A sourcePath can contain . or .. as directory names, and expands * and ? wildcards. Note the following:
    You cannot use the wildcard expressions .*, .?, or ..* as directory or file names.
    Bracket wildcards, such as file[123].txt, are not supported.
    Wildcards cannot be applied to drive names.
destinationPath is the full path to the destination file or directory. destinationPath must be an absolute path, and no wildcards are allowed. Note: The destination directory must exist, unless the parent of the destination already exists and you are copying only one thing.
recursive, when true, indicates that the Copy utility recursively copies any directories that match the wildcard. If recursive is false, the Copy utility does not copy directories, even if they match the wildcard. Instead, it creates intermediate directories required to place the copied files at the destination path.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.

Returns:
The string token assigned to this invocation.

startRollback(RunRollbackType startRollbackInput)
Starts the rollback operation of the Archive utility. Rollback operations roll back the directory to the most recently backed-up version. For example, say you have a directory called logs, one called logs.2007_1_11.08_00_00, and other, older versions. When you roll back, the following things happen:

logs is renamed logs.unwanted.
logs.2007_1_11.08_00_00 is renamed logs.
The older versions are left alone.

Note: There can only be a single .unwanted directory at a time. If you roll back twice, the .unwanted directory from the first rollback is deleted.

IMPORTANT: Do not start a backup or rollback operation while another such operation is in progress on the same directory. Unexpected behavior may occur if you do so.

RunRollbackType parameters:
applicationID identifies the application to use.
token identifies the token used to stop the utility or to get its status. If you do not specify a token, one is generated and returned when you start the utility.
hostID is a unique identifier for the host. The hostID and dirName parameters specify the path to the directory that will be rolled back.
dirName is the full path of the directory. The hostID and dirName parameters specify the path to the directory that will be rolled back.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.

Returns:
The string token assigned to this invocation.

startShell(RunShellType startShellInput)
The startShell() method launches the Shell utility, which allows you to run arbitrary commands in a host system shell.

RunShellType parameters:
applicationID identifies the application to use.
token identifies the token used to stop the utility or to get its status. If you do not specify a token, one is generated and returned when you start the utility.
hostID is a unique identifier for the host.
cmd is the command line to execute.
workingDir is the full path to the working directory.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.

Returns:
The string token assigned to this invocation.

stop(FullyQualifiedUtilityTokenType)
Takes a token returned by any of the start methods, and stops that invocation by terminating the process that is running it.

FullyQualifiedUtilityTokenType parameters:
applicationID identifies the application to use.
token identifies the token used to stop the utility.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.

getStatus(String applicationID, String token)


Takes a token returned by any of the Utility start methods (startBackup(), startFileCopy(), startRollback(), or startShell()), and returns the current status of that utility.

Parameters:
applicationID identifies the application to use.
token identifies the token used to get the utility's status.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.

Returns:
A BatchStatusType object (see page 247).

listDirectoryContents(ListDirectoryContentsInputType listDirectoryContentsInput)
Performs a list operation similar to UNIX ls on a remote host, with the following restrictions on the input file pattern:

A filePattern must start with an absolute path, such as C:\ or /.
A filePattern can contain . or .. as directory names, and expands * and ? wildcards.
A filePattern cannot contain the wildcard expressions .*, .?, or ..* as directory or file names.
Bracketed wildcards, such as file[123].txt, are not supported.
Wildcards cannot be applied to drive names.
You cannot use .. to create paths that do not exist. For example, the path /temp/../../a.txt refers to a path that is above the root directory. This is an invalid path that causes the operation to fail.
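These restrictions can be checked on the client side before the call is made. The following sketch is a rough approximation and not the validation that the Application Controller performs. The function name is hypothetical, both / and \ are treated as separators for simplicity, and any bracket character is reported as an unsupported wildcard, which is an assumption about how "not supported" should be read.

```python
import re


def check_file_pattern(pattern: str) -> list:
    """Return a list of problems found in pattern; an empty list means none found."""
    drive = re.match(r"^[A-Za-z]:\\", pattern)
    # Requiring a literal drive letter also rejects wildcards in drive names.
    if not (drive or pattern.startswith("/")):
        return ["must start with an absolute path"]
    body = pattern[drive.end():] if drive else pattern[1:]
    problems = []
    depth = 0  # number of directory levels below the root
    for part in (p for p in re.split(r"[\\/]", body) if p):
        if part in (".*", ".?", "..*"):
            problems.append(f"wildcard expression {part} is not allowed")
        if "[" in part or "]" in part:
            problems.append("bracket wildcards are not supported")
        if part == "..":
            depth -= 1
            if depth < 0:
                problems.append("'..' climbs above the root directory")
                depth = 0
        elif part != ".":
            depth += 1
    return problems


print(check_file_pattern("/temp/../../a.txt"))
print(check_file_pattern("/home/endeca/test/*.txt"))
```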

ListDirectoryContentsInputType parameters:
applicationID (required) identifies the application to use.
hostID (required) is a unique identifier for the host.
filePattern (required) is the name of the directory, file, or wildcard combination of directory and file whose contents are to be listed.

Throws:
EACFault is the error message returned by the Application Controller when the method fails. Failure conditions correspond to bad input cases.

Returns:
A FilePathListType object representing the contents of the requested directory.

Provisioning interface
The Provisioning interface allows you to define and manage your Endeca applications programmatically.

Provisioning methods

defineApplication(ApplicationType application)
Defines an application.

ApplicationType parameters:
applicationID identifies the application to use.
hosts is a collection of HostType objects (see page 258), representing the hosts to define.
components is a collection of ComponentType objects (such as ForgeComponentType, DgraphComponentType, and so on) representing the components to define. The ComponentType class is described on page 248.
scripts is a collection of ScriptType objects. The ScriptType class is described on page 267.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.
ProvisioningFault is a list of provisioning errors and a list of provisioning warnings thrown when there are fatal errors during provisioning.

Returns:
A ProvisioningWarningListType object, containing minor warnings about non-fatal provisioning problems.

getApplication(IDType getApplicationInput)
Gets an application, which is composed of hosts, components, and scripts and identified by an application ID.

IDType parameter:
applicationID identifies the application to use.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.

Returns:
An ApplicationType object, as described on page 246.

getCanonicalApplication(IDType getCanonicalApplicationInput)
The getCanonicalApplication() method returns the provisioning just as getApplication() does, but with all paths canonicalized. This process ensures that all paths are absolute, and that the working directory and log path settings are provided. It also prevents .. from being used in a path name.

IDType parameter:
applicationID identifies the application to use.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.

Returns:
An ApplicationType object, as described on page 246.

listApplicationIDs(listApplicationIDsInput)
Lists the applications that are defined.

Returns:
An ApplicationIDListType object, as described on page 245.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.

removeApplication(RemoveApplicationType removeApplicationInput)
Removes the named application.

RemoveApplicationType parameter:
applicationID identifies the application to use.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.
ProvisioningFault is a list of provisioning errors and a list of provisioning warnings thrown when there are fatal errors during provisioning.

Returns:
A ProvisioningWarningListType object, containing minor warnings about non-fatal provisioning problems.

addComponent(AddComponentType addComponentInput)
Adds a single component to an application.

AddComponentType parameters:
applicationID identifies the application to use.
component is one of the following:
    Forge (see ForgeComponentType class on page 255)
    Dgidx (see DgidxComponentType class on page 250)
    Dgraph (see DgraphComponentType class on page 251)
    Agidx (see AgidxComponentType class on page 243)
    Agraph (see AgraphComponentType class on page 244)
    Crawler (see CrawlerComponentType class on page 248)
    LogServer (see LogServerComponentType class on page 259)
    ReportGenerator (see ReportGeneratorComponentType class on page 262)

Throws:
EACFault is the error message returned by the Application Controller when the method fails.
ProvisioningFault is a list of provisioning errors and a list of provisioning warnings thrown when there are fatal errors during provisioning.

Returns:
A ProvisioningWarningListType object, containing minor warnings about non-fatal provisioning problems.

removeComponent(RemoveComponentType removeComponentInput)
Removes a single component from an application.

RemoveComponentType parameters:
applicationID identifies the application to use.
componentID identifies the component to use.
forceRemove indicates whether or not a remove operation should force the component to stop before attempting the remove. If the component is running, and forceRemove is not set to true, then the remove call will fail.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.
ProvisioningFault is a list of provisioning errors and a list of provisioning warnings thrown when there are fatal errors during provisioning.

Returns:
A ProvisioningWarningListType object, containing minor warnings about non-fatal provisioning problems.
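The effect of forceRemove can be illustrated with a small in-memory model. This is a sketch of the rule described for removeComponent and not the Application Controller's implementation. The dictionary of components, the exception class, and the function name are all hypothetical.

```python
class ComponentStillRunning(Exception):
    """Hypothetical error raised when a running component is removed without force."""


def remove_component(components, component_id, force_remove=False):
    """Remove component_id from components, a dict mapping component IDs to a
    Boolean that is True while the component is running."""
    if component_id not in components:
        raise KeyError(component_id)
    if components[component_id]:
        if not force_remove:
            raise ComponentStillRunning(component_id)
        components[component_id] = False  # stand-in for stopping the component
    del components[component_id]


components = {"dgraph1": True, "forge1": False}
remove_component(components, "forge1")                     # stopped: removed directly
remove_component(components, "dgraph1", force_remove=True)  # running: stopped, then removed
print(components)
```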

updateComponent(UpdateComponentType updateComponentInput)
Updates a running component.

UpdateComponentType parameters:
applicationID identifies the application to use.
component is one of the following:
    Forge (see ForgeComponentType class on page 255)
    Dgidx (see DgidxComponentType class on page 250)
    Dgraph (see DgraphComponentType class on page 251)
    Agidx (see AgidxComponentType class on page 243)
    Agraph (see AgraphComponentType class on page 244)
    Crawler (see CrawlerComponentType class on page 248)
    LogServer (see LogServerComponentType class on page 259)
    ReportGenerator (see ReportGeneratorComponentType class on page 262)
forceUpdate indicates that the Application Controller will attempt to force the conditions under which the update can take place, by stopping running components.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.
ProvisioningFault is a list of provisioning errors and a list of provisioning warnings thrown when there are fatal errors during provisioning.

Returns:
A ProvisioningWarningListType object, containing minor warnings about non-fatal provisioning problems.

addHost(AddHostType addHostInput)
Adds a host to an application.

AddHostType parameters:
applicationID identifies the application to use.
host is a HostType object (see page 258) specifying the host to add.
directories allows you to specify directories using a full path and a name. These directories are associated with hosts and created when the host is provisioned.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.
ProvisioningFault is a list of provisioning errors and a list of provisioning warnings thrown when there are fatal errors during provisioning.

Returns:
A ProvisioningWarningListType object, containing minor warnings about non-fatal provisioning problems.

updateScript(UpdateScriptType updateScriptInput)
Updates a running script.

UpdateScriptType parameters:
applicationID identifies the application to use.
script is a ScriptType object specifying the script to be updated.
forceUpdate is a Boolean that indicates whether the Application Controller should force a running script to stop before attempting the update.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.
ProvisioningFault is thrown when there are fatal errors during provisioning. It contains a list of provisioning errors and a list of provisioning warnings.

Returns:
A ProvisioningWarningListType object, containing minor warnings about non-fatal provisioning problems.

removeHost(RemoveHostType removeHostInput)
Removes a single host from an application.

RemoveHostType parameters:
applicationID identifies the application to use.
hostID is a unique string identifier for this host.
forceRemove indicates whether or not the Application Controller should force any running components or services to stop before attempting the remove. If a component or service is running, and forceRemove is not set to true, then the remove call will fail.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.
ProvisioningFault is thrown when there are fatal errors during provisioning. It contains a list of provisioning errors and a list of provisioning warnings.

Returns:
A ProvisioningWarningListType object, containing minor warnings about non-fatal provisioning problems.
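
The forceRemove rule for removeHost can be pictured with a small in-memory model. This is only a sketch of the documented behavior, not the Application Controller's implementation: the Host class, the RemoveFailed exception, and the remove_host function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


class RemoveFailed(Exception):
    """Hypothetical stand-in for the fault returned when a remove call fails."""


@dataclass
class Host:
    host_id: str
    # Names of components or services currently running on this host.
    running: set = field(default_factory=set)


def remove_host(hosts: dict, host_id: str, force_remove: bool = False) -> None:
    """Remove a host from the hosts mapping, applying the forceRemove rule."""
    host = hosts[host_id]
    if host.running:
        if not force_remove:
            # Something is running and forceRemove is not set: the call fails.
            raise RemoveFailed(
                f"{host_id} still has running components: {sorted(host.running)}"
            )
        # forceRemove is set: stop everything before attempting the remove.
        host.running.clear()
    del hosts[host_id]
```

In this model, a host with nothing running is removed either way, while a host with a running component is removed only when force_remove is True.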

updateHost(UpdateHostType updateHostInput)
Updates a running host.

UpdateHostType parameters:
applicationID identifies the application to use.
host is a HostType object (see page 258) specifying the host to update.
directories allows you to specify directories using a full path and a name. These directories are associated with hosts and created when the host is provisioned.
forceUpdate indicates that the Application Controller will attempt to force the conditions under which the update can take place, by stopping running components or services.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.
ProvisioningFault is thrown when there are fatal errors during provisioning. It contains a list of provisioning errors and a list of provisioning warnings.

Returns:
A ProvisioningWarningListType object, containing minor warnings about non-fatal provisioning problems.

addScript(AddScriptType addScriptInput)
Adds a script to an application.

AddScriptType parameters:
applicationID identifies the application to use.
script is a ScriptType object (see page 267) specifying the script to add.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.
ProvisioningFault is thrown when there are fatal errors during provisioning. It contains a list of provisioning errors and a list of provisioning warnings.

Returns:
A ProvisioningWarningListType object, containing minor warnings about non-fatal provisioning problems.

removeScript(RemoveScriptType removeScriptInput)
Removes a script from an application.

RemoveScriptType parameters:
applicationID identifies the application to use.
scriptID is a unique string identifier for this script.
forceRemove indicates that the Application Controller will attempt to force the conditions under which the remove can take place.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.
ProvisioningFault is thrown when there are fatal errors during provisioning. It contains a list of provisioning errors and a list of provisioning warnings.

Returns:
A ProvisioningWarningListType object, containing minor warnings about non-fatal provisioning problems.

ScriptControl interface
The ScriptControl interface provides programmatic script management capabilities.

ScriptControl methods

startScript(FullyQualifiedScriptIDType startScriptInput)
Starts the named script.

FullyQualifiedScriptIDType parameters:
applicationID identifies the application to use.
scriptID identifies the script to use.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.

stopScript(FullyQualifiedScriptIDType stopScriptInput)
Stops the named script.

FullyQualifiedScriptIDType parameters:
applicationID identifies the application to use.
scriptID identifies the script to use.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.

getScriptStatus(FullyQualifiedScriptIDType getScriptStatusInput)
Returns the status of a script.

FullyQualifiedScriptIDType parameters:
applicationID identifies the application to use.
scriptID identifies the script to use.

Throws:
EACFault is the error message returned by the Application Controller when the method fails.

Returns:
A ScriptStatus object (a subclass of the StatusType class described on page 269). This status may be Running, NotRunning, or Failed. (Failure results from a failure error code or internal EAC errors.)
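
A client that starts a script will often poll getScriptStatus until the script is no longer Running. The sketch below shows one way to structure such a loop. It assumes a caller-supplied get_status callable that wraps the generated Web service stub and returns the state as a string; wait_for_script, poll_seconds, and max_polls are hypothetical names, not part of the Application Controller API.

```python
import time
from typing import Callable

# States in which the script is no longer executing, per the status values above.
FINISHED_STATES = {"NotRunning", "Failed"}


def wait_for_script(
    get_status: Callable[[], str],
    poll_seconds: float = 1.0,
    max_polls: int = 60,
) -> str:
    """Poll get_status until it reports NotRunning or Failed, then return that state."""
    for _ in range(max_polls):
        state = get_status()
        if state in FINISHED_STATES:
            return state
        # Still Running: wait before asking again.
        time.sleep(poll_seconds)
    raise TimeoutError(f"script still running after {max_polls} polls")
```

A caller would typically treat a returned Failed state as an error and inspect the script's log, while NotRunning indicates that the script has finished or was stopped.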

Chapter 12

Endeca Application Controller API Class Reference


This chapter describes the Endeca Application Controller API classes. Details on the related interfaces can be found in the preceding chapter, beginning on page 217. The classes and their properties are documented here. However, the exact syntax of a class member depends on the output of the WSDL tool that you are using.

Typically, a Java WSDL tool translates these classes into get and set methods. For example, the ApplicationIDType class would generate getApplicationID() and setApplicationID(String[] applicationID) methods. The Microsoft .NET WSDL tool translates these classes into .NET properties.

Be sure to check the client stub classes that are generated by your WSDL tool for the exact syntax of the Application Controller API class members.

AddComponentType class
A class that describes a component to be added to a named application during incremental provisioning.

AddComponentType properties

applicationID (required) identifies the application to use.
component (required) is one of the following:

Forge (see ForgeComponentType class on page 255)
Dgidx (see DgidxComponentType class on page 250)
Dgraph (see DgraphComponentType class on page 251)
Agidx (see AgidxComponentType class on page 243)
Agraph (see AgraphComponentType class on page 244)
Crawler (see CrawlerComponentType class on page 248)
LogServer (see LogServerComponentType class on page 259)
ReportGenerator (see ReportGeneratorComponentType class on page 262)

AddHostType class
A class that describes a host to be added to a named application during incremental provisioning.

AddHostType properties

applicationID (required) identifies the application to use.
host (required) is a description of the host to add.
directories allows you to specify directories using a full path and a name.

AddScriptType class
A class that describes a script to be added to a named application during incremental provisioning.

AddScriptType properties

applicationID (required) identifies the application to use.
script (required) is a description of the script to add.

AgidxComponentType class
A class that describes an Agidx component within an application. An Agidx component runs Agidx on a machine, creating a set of Agidx indices that support the Agraph program in a distributed environment. The Agidx component is used only in distributed environments and is run sequentially on multiple machines. On the first machine, the Agidx component takes the Dgidx output from that machine as its input. On the next machine, the output from the first Agidx run is copied over, using the Copy service. It, along with the Dgidx output from that machine, is used as Agidx input.

AgidxComponentType properties

componentID (required) identifies the component to use.
hostID (required) is a unique string identifier for this host.
workingDir is a string identifying the working directory for this component. Any relative paths in component properties are interpreted as relative to the component's workingDir. The workingDir property, if specified, must be an absolute path.
logFile is a string identifying the log file for this component.
args is a list of command-line flags to pass to Agidx.
previousAgidxOutputPrefix is the file prefix of the Agidx data from the previous run, which has been copied to this machine by a Copy operation. This parameter should not be used when running the Agidx component on the first data subset.
inputPrefixList (required) is the list of paths to the output of the various Dgidx components, which Agidx uses as input.
outputPrefix (required) is the path and prefix name for the Agidx output.
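
The sequential Agidx chain described above can be sketched as a plan of per-machine property sets. This is an illustration only: plan_agidx_runs is a hypothetical helper, the property sets are plain dictionaries rather than generated AgidxComponentType stubs, and the sketch assumes that the Copy operation places the previous Agidx output at the same path on the next machine.

```python
from typing import Dict, List


def plan_agidx_runs(
    dgidx_outputs: List[str], agidx_outputs: List[str]
) -> List[Dict[str, object]]:
    """Build one property set per machine, chaining each run to the previous one."""
    runs: List[Dict[str, object]] = []
    previous = None
    for dgidx_out, agidx_out in zip(dgidx_outputs, agidx_outputs):
        run: Dict[str, object] = {
            "inputPrefixList": [dgidx_out],
            "outputPrefix": agidx_out,
        }
        # The first data subset has no earlier Agidx output, so the
        # previousAgidxOutputPrefix property is left out for it.
        if previous is not None:
            run["previousAgidxOutputPrefix"] = previous
        runs.append(run)
        previous = agidx_out
    return runs
```

Each entry after the first carries the output prefix of the run before it, which mirrors the rule that previousAgidxOutputPrefix is omitted only on the first machine.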

AgraphChildListType class
A class used by the AgraphComponentType class to establish the list of child Dgraphs and related devices used by a resulting Agraph. Each Agraph component can contain a mixture of DgraphReferenceType and DgraphHostPortType objects. A DgraphReferenceType object refers to a child Dgraph, while a DgraphHostPortType object is typically used to refer to an unprovisioned device such as a load balancer. If you know you are referring only to actual Dgraphs, and not to load balancers or other unprovisioned devices, you do not need to use DgraphHostPortType objects.

AgraphChildListType properties

child (required) is a collection of child Dgraphs and related devices comprising this AgraphChildListType object.
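
As a rough illustration of a mixed child list, the sketch below models the two child kinds as small Python classes. The class and function names (DgraphReference, DgraphHostPort, missing_references, unprovisioned_children) are hypothetical simplifications of the generated stub classes, and the check against provisioned Dgraph IDs is an assumption about how a client might validate its input before provisioning.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Union


@dataclass(frozen=True)
class DgraphReference:
    """Refers to a child Dgraph provisioned in the same application."""
    component_id: str


@dataclass(frozen=True)
class DgraphHostPort:
    """Refers to an unprovisioned device, such as a load balancer."""
    hostname: str
    port: int


Child = Union[DgraphReference, DgraphHostPort]


def missing_references(children: Iterable[Child], provisioned_ids: Set[str]) -> List[str]:
    """Return the referenced componentIDs that are not provisioned Dgraphs."""
    return [
        c.component_id
        for c in children
        if isinstance(c, DgraphReference) and c.component_id not in provisioned_ids
    ]


def unprovisioned_children(children: Iterable[Child]) -> List[DgraphHostPort]:
    """Return only the host/port entries, which bypass provisioning."""
    return [c for c in children if isinstance(c, DgraphHostPort)]
```

A list that contains only DgraphReference entries corresponds to the case in which DgraphHostPortType objects are not needed.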

AgraphComponentType class
A class that describes an Agraph component within an application. An Agraph component runs the Agraph program, which defines and coordinates the activities of multiple, distributed Dgraphs.

AgraphComponentType properties

componentID (required) identifies the component to use.
hostID (required) is a unique string identifier for this host.
workingDir is a string identifying the working directory for this component. Any relative paths in component properties are interpreted as relative to the component's workingDir. The workingDir property, if specified, must be an absolute path.
logFile is a string identifying the log file for this component.
args is a list of command-line flags to pass to the Agraph.
port (required) is the port at which the Agraph should listen.
appConfigPrefix is the path and file prefix that define the input for the Agraph.
reqLogFile is the path to and name of the request log.
children is a list of the child Dgraphs and related devices for this Agraph.
inputPrefix (required) is the path and prefix name for the Agidx output that the Agraph uses as an input.
startupTimeout specifies the amount of time in seconds that the Application Controller will wait while starting the Agraph.
sslConfiguration sets SSL usage for this Agraph.

ApplicationIDListType class
A class that describes a returned value of a list application call to the Provisioning service. ApplicationIDListType encapsulates the list of applications running on this EAC Central Server.

ApplicationIDListType properties

applicationID identifies the application to use.

ApplicationType class
A class that describes an application to be deployed by the Application Controller. An application is composed of a set of components residing on a set of hosts. You can construct an ApplicationType object as a full specification of the application, including all hosts and components. Alternatively, you can start with an empty ApplicationType object and incrementally fill in the hosts, components, and scripts. In the latter case, order matters, because a host must be added before you add a component that lives on that host.

ApplicationType properties

applicationID identifies the application to use.
hosts is a list of hosts.
components is a list of components.
scripts is a list of scripts.
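
The ordering constraint for incremental provisioning (a host must exist before a component that lives on it is added) can be shown with a toy model. The Application class below is hypothetical and keeps everything in memory; a real client would call the Provisioning interface methods through its generated stubs instead.

```python
class Application:
    """Hypothetical in-memory model of an application filled in incrementally."""

    def __init__(self, application_id: str) -> None:
        self.application_id = application_id
        self.hosts = {}       # hostID -> (hostname, port)
        self.components = {}  # componentID -> hostID

    def add_host(self, host_id: str, hostname: str, port: int) -> None:
        self.hosts[host_id] = (hostname, port)

    def add_component(self, component_id: str, host_id: str) -> None:
        # Order matters: the component's host must already be part of the application.
        if host_id not in self.hosts:
            raise ValueError(
                f"host {host_id!r} must be added before component {component_id!r}"
            )
        self.components[component_id] = host_id
```

Adding the host first and the component second succeeds; reversing the two calls raises an error in this model.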

BackupMethodType class
In relation to the Archive utility, this class serves as an identifier for the type of backup you want the utility to perform, Copy or Move.

BackupMethodType fields
The enumeration of possible values is as follows:

Copy
Move

BatchStatusType class
Based on the StatusType class (see page 269), a BatchStatusType object describes the status of a batch component. Batch components include Forge, Dgidx, Agidx, ReportGenerator, and Crawler.

BatchStatusType properties

StateType (required) An enumeration of the following fields:

Starting
Running
NotRunning
Failed

Note: Starting only applies to server components (Dgraph, Agraph, or LogServer).

startTime (required) The time the batch component started; for example, 9/25/06 3:58 PM.
failureMessage The failure message, which tells you that a failure has occurred in the execution of the component. failureMessage is empty unless state is FAILED. (This is different from EACFault, which tells you that a problem has occurred while processing the Web Service request to get the status.)
duration (required) The length of time the batch component has been running; for example, 0 days 0 hours 0 minutes 6.96 seconds.
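
A client that needs the duration as a number rather than as text could parse the string. The sketch below assumes that the value always follows the pattern of the example above (plural unit names, fractional seconds allowed); that is an assumption drawn from the single example shown, and parse_duration is a hypothetical helper.

```python
import re
from datetime import timedelta

# Pattern modeled on the documented example "0 days 0 hours 0 minutes 6.96 seconds".
_DURATION = re.compile(
    r"(\d+) days (\d+) hours (\d+) minutes (\d+(?:\.\d+)?) seconds"
)


def parse_duration(text: str) -> timedelta:
    """Convert a duration string in the example's format into a timedelta."""
    match = _DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    days, hours, minutes = (int(match.group(i)) for i in (1, 2, 3))
    seconds = float(match.group(4))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
```

Strings in any other shape are rejected with a ValueError rather than guessed at.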

ComponentListType class
A class that describes a list of components, such as ForgeComponentType and DgraphComponentType.

ComponentListType properties

component (required) A collection of components comprising this ComponentListType object.

ComponentType class
A class that describes the base type for all components within an application.

ComponentType properties
Note: Each component contains these properties, as well as some others.

componentID (required) identifies the component to use.
hostID (required) is a unique string identifier for this host.
workingDir is a string identifying the working directory for this component.
logFile is a string identifying the log file for this component.
properties is a string identifying any properties associated with this component.

CrawlerComponentType class
A class that describes a Crawler component within an application. A Crawler component runs the Endeca Advanced Crawler, which creates Endeca records based on crawled source documents. For more information about the Advanced Crawler, see the Endeca Information Transformation Layer Guide.

CrawlerComponentType properties

componentID (required) identifies the component to use.
hostID (required) is a unique string identifier for this host.
workingDir is a string identifying the working directory for this component. Any relative paths in component properties are interpreted as relative to the component's workingDir. The workingDir property, if specified, must be an absolute path.
logFile is a string identifying the log file for this component.
args are command-line arguments to pass to this component.
javaOptions are the Java Virtual Machine settings. If you have extended the Crawler using Java code, you may need to modify these settings, which are passed to the Java process.
classpath lists class path add-ons. If you have extended the Crawler using Java code, the modifications may require additions to the class path.
defaultSettingsFile (required) is the path to the default settings file for this Crawler component.
globalConfigFile (required) is the path to the global configuration file for this Crawler component.
profileConfigFile (required) is the path to the profile configuration file to use for this Crawler run.
urlListFile (required) is the path to the file that contains the list of URLs to crawl.
outputPrefix (required) is the path and prefix name for the data the Crawler component stores.
port is the port at which the Crawler should listen for status request messages.

DgidxComponentType class
A class that describes a Dgidx component within an application. A Dgidx component sends the finished data prepared by Forge to the Dgidx program, which generates the proprietary indices for each Dgraph.

DgidxComponentType properties

componentID (required) identifies the component to use.
hostID (required) is a unique string identifier for this host.
workingDir is a string identifying the working directory for this component. Any relative paths in component properties are interpreted as relative to the component's workingDir. The workingDir property, if specified, must be an absolute path.
logFile is a string identifying the log file for this component.
args is a list of command-line flags to pass to Dgidx.
appConfigPrefix is the path and file prefix that define the input for Dgidx.
inputPrefix (required) is the path and prefix name for the Forge output that Dgidx indexes.
outputPrefix (required) is the path and prefix name for the Dgidx output.
runAspell prepares the Aspell files for the Dgraph. The default is true. It causes the Dgidx component to run dgwordlist and to copy the Aspell files to its output directory, where the Dgraph component can access them.
tempDir is the path to the temporary directory that Dgidx uses.

DgraphComponentType class
A class that describes a Dgraph component within an application. A Dgraph element launches the Dgraph (MDEX Engine) software, which processes queries against the indexed Endeca records.

DgraphComponentType properties

componentID (required) identifies the component to use.
hostID (required) is a unique string identifier for this host.
workingDir is a string identifying the working directory for this component. Any relative paths in component properties are interpreted as relative to the component's workingDir. The workingDir property, if specified, must be an absolute path.
logFile is a string identifying the log file for this component.
args is a list of command-line flags to pass to the Dgraph.
port (required) is the port the Dgraph listens at. The default is 8000.
appConfigPrefix is the path and file prefix that define the input for the Dgraph.
inputPrefix (required) is the path and prefix name for the Dgidx output that the Dgraph uses as an input.
reqLogFile is the path to and name of the request log.
spellDir, if specified, is the directory in which the Dgraph will look for Aspell files. If it is not specified, the Dgraph will look for Aspell files in the Dgraph's input directory (that is, inputPrefix without the prefix). For example, if inputPrefix is /dir/prefix and all the Dgraph input files are /dir/prefix.*, the Dgraph will look for the Aspell files in /dir/.
startupTimeout specifies the amount of time in seconds that the Application Controller will wait while starting the Dgraph.
sslConfiguration sets SSL usage for this Dgraph.
updateDir is the directory from which the Dgraph reads partial update files. For more information, see the Implementing Partial Updates section in the Endeca Information Transformation Layer Guide.
updateLogFile specifies the file for update-related log messages.
tempDir is the path to the temporary directory that the Dgraph uses.
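
The spellDir fallback described above amounts to taking the directory part of inputPrefix. The following sketch shows that derivation for POSIX-style paths; effective_spell_dir is a hypothetical helper written for illustration, not a function of the Application Controller.

```python
import posixpath
from typing import Optional


def effective_spell_dir(input_prefix: str, spell_dir: Optional[str] = None) -> str:
    """Return the directory in which Aspell files would be looked up."""
    if spell_dir:
        # An explicitly specified spellDir takes precedence.
        return spell_dir
    # Otherwise use inputPrefix without the trailing file prefix,
    # that is, the Dgraph's input directory.
    return posixpath.dirname(input_prefix)
```

For an inputPrefix of /dir/prefix, the helper yields /dir, which names the same directory as the /dir/ given in the example above.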

DgraphHostPortType class
A class used by the AgraphChildListType class to represent a (non-Dgraph) related device used by a parent Agraph. Each Agraph component can contain a mixture of DgraphReferenceType and DgraphHostPortType objects. A DgraphReferenceType object refers to a child Dgraph that is provisioned with the Application Controller, while a DgraphHostPortType object is typically used to refer to an unprovisioned device such as a load balancer. If you know you are referring only to actual Dgraphs, and not to load balancers or other unprovisioned devices, you do not need to use DgraphHostPortType objects.

DgraphHostPortType properties

hostname (required) is the name of the host.
port (required) is the communications port.

DgraphReferenceType class
A class used by the AgraphComponentType class to represent a child Dgraph. Each Agraph component can refer to a mixture of DgraphReferenceType and DgraphHostPortType objects.

DgraphReferenceType properties

componentID (required) is the unique identifier of a Dgraph that exists within the same Application Controller application.

DirectoryListType class
A class that represents a collection of DirectoryType objects.

DirectoryListType property

directory (required) is a collection of DirectoryType objects.

DirectoryType class
A class used by the HostType class to define a directory while provisioning a host.

DirectoryType properties

dirID (required) is a unique identifier for this directory.
dir (required) is a full path for this directory.

EACFault class
The class that creates the EACFault. EACFault is the error message returned by the Application Controller when the method fails.

EACFault property

error is the error message.

FilePathListType
An array of FilePathTypes that describes a returned value of a listDirectoryContents call. FilePathListType operates on the application level.

FilePathListType property

filePaths (required) is a collection of FilePathType objects, each of which describes a file on a remote host.

FilePathType
A class that describes a file on a remote host.

FilePathType properties

path (required) is the full path to the file.
directory (required) indicates whether the path is a directory.

FlagIDListType class
A class that describes a returned value of a list flags call. FlagIDListType operates on the application level.

FlagIDListType property

flagID is a unique string identifier for this flag.

ForgeComponentType class
A class that describes a Forge component within an application. A Forge element launches the Forge (Data Foundry) software, which transforms source data into tagged Endeca records.

ForgeComponentType properties

componentID (required) identifies the component to use.
hostID (required) is a unique string identifier for this host.
workingDir is a string identifying the working directory for this component. Any relative paths in component properties are interpreted as relative to the component's workingDir. The workingDir property, if specified, must be an absolute path.
logFile is a string identifying the log file for this component.
args is a list of command-line flags to pass to Forge.
stateDir is the directory where the state file is located.
inputDir is the path to the Forge input.
outputDir is the directory where the output from the Forge process will be stored.
outputPrefixName is the prefix, without any associated path information, that Forge uses to save its output files. These files are located in the directory specified by outputDir.
numPartitions is the number of partitions.
pipelineFile (required) is the name of the Pipeline.epx file to pass to Forge.
tempDir is the temporary directory that Forge uses.
webServicePort is the port used by the Forge metrics Web service, which provides progress and performance metrics for Forge. For details, see page 327.

FullyQualifiedComponentIDType class
A class that serves as an input to the start, stop, get status, and remove component commands.

FullyQualifiedComponentIDType properties

applicationID (required) identifies the application to use.
componentID (required) identifies the component to use.

FullyQualifiedFlagIDType class
In relation to the Synchronization service, this class serves as an input to an acquire or release flag method.

FullyQualifiedFlagIDType properties

applicationID (required) identifies the application to use.
flagID (required) is a unique string identifier for this flag.

FullyQualifiedHostIDType class
A class that identifies a host so that it can be used as an input to another command, such as remove host.

FullyQualifiedHostIDType properties

applicationID (required) identifies the application to use.
hostID (required) is a unique string identifier for this host.

FullyQualifiedScriptIDType class
A class that identifies a script so that it can be used as an input to another command, such as startScript().

FullyQualifiedScriptIDType properties

applicationID (required) identifies the application to use.
scriptID (required) is a unique string identifier for this script.

FullyQualifiedUtilityTokenType class
In relation to the Utility service, this object represents the token.

FullyQualifiedUtilityTokenType properties

applicationID (required) identifies the application to use.
token (required) identifies the token used to stop the utility or to get its status. If you do not specify a token, one is generated and returned when you start the utility.

HostListType class
A class that represents a collection of HostType objects.

HostListType property

host (required) is a unique identifier comprising a hostname, port, and hostID.

HostType class
A class that describes a host within an application. Along with components, a collection of HostType objects defines an application.

HostType properties

hostname (required) is the name of the host.
port (required) is the connection port.
hostID is a unique string identifier for this host.
directories allows you to specify directories using a full path and a name.

ListApplicationIDsInput class
An empty object you pass into the Web services interface to get back a list of applications.

ListDirectoryContentsInputType class
An object that serves as an input to the listDirectoryContents() method.

ListDirectoryContentsInputType properties

applicationID (required) identifies the application to use to look up the host.
hostID (required) is a unique identifier for the host within that application.
filePattern (required) is the pattern whose wildcards listDirectoryContents() expands. If the expansion results in a file, it returns a file. If the expansion results in a directory, it returns the directory non-recursively. Wildcard expansion can result in any combination of files and directories.
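
To make the wildcard behavior more concrete, the sketch below performs a comparable expansion on the local file system using Python's glob module. It is not the Application Controller's implementation: list_directory_contents is a hypothetical local stand-in, and the sketch assumes one reading of "non-recursively", namely that a matched directory is reported as a single entry and its contents are not descended into. Each result pairs a path with a directory flag, loosely mirroring the path and directory properties of FilePathType.

```python
import glob
import os
from typing import List, Tuple


def list_directory_contents(file_pattern: str) -> List[Tuple[str, bool]]:
    """Expand wildcards in file_pattern and return (path, is_directory) pairs."""
    # glob.glob does not descend into matched directories by default,
    # so a directory that matches appears as one entry.
    return sorted((path, os.path.isdir(path)) for path in glob.glob(file_pattern))
```

A pattern such as /logs/* can therefore return a mix of files and directories, while /logs/*.log returns only the entries whose names end in .log.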

LogServerComponentType class
A class that describes a LogServer component within an application. The LogServer component controls the use of the Endeca Log Server.

LogServerComponentType properties

componentID (required) identifies the component to use.
hostID (required) is a unique string identifier for this host.
workingDir is a string identifying the working directory for this component. Any relative paths in component properties are interpreted as relative to the component's workingDir. The workingDir property, if specified, must be an absolute path.
logFile is a string identifying the log file for this component.
port (required) is the port on which to run the LogServer.
outputPrefix (required) is the path and prefix name for the LogServer output.
gzip (required) controls the archiving of log files. Possible values are true and false.
startupTimeout (required) specifies the amount of time in seconds that the Application Controller will wait while starting the LogServer.

PropertyListType class
A class that represents a collection of PropertyType objects.

PropertyListType property

properties is a collection of name/value pairs.

PropertyType class
The PropertyType class allows you to add arbitrary properties (that is, name/value pairs) to host and all component elements.

PropertyType properties

name (required) is a non-null string.
value is a string.

ProvisioningFault class
An extension of EACFault, the ProvisioningFault class is thrown when there are fatal errors during provisioning.

ProvisioningFault properties

errors is a list of provisioning errors.
warnings is a list of provisioning warnings.

RemoveApplicationType class
Related to the Provisioning service, this class serves as input to the incremental remove command.

RemoveApplicationType properties

applicationID (required) identifies the application to use.
forceRemove indicates whether or not a remove operation should force any running components or services to stop before attempting the remove.

RemoveComponentType class
Related to the Provisioning service, this class serves as input to the incremental remove command.

RemoveComponentType properties

FullyQualifiedComponentIDType (required) identifies the component to use.
forceRemove indicates whether or not a remove operation should force the component to stop before attempting the remove.

RemoveHostType class
Related to the Provisioning service, this class serves as input to the incremental remove command.

RemoveHostType properties

FullyQualifiedHostIDType (required) is a unique string identifier for this host.
forceRemove is a Boolean that indicates whether or not a remove operation should force any running components or services to stop before attempting the remove.

RemoveScriptType class
Related to the Provisioning service, this class serves as input to the incremental remove command.

RemoveScriptType properties

applicationID (required) identifies the application.
scriptID (required) identifies the script to remove.

ReportGeneratorComponentType class
A class that describes a ReportGenerator component within an application. The ReportGenerator component runs the Report Generator, which processes Log Server files into HTML-based reports that you can view in your Web browser and XML reports that you can view in Web Studio.

ReportGeneratorComponentType properties

componentID (required) identifies the component to use.
hostID (required) is a unique string identifier for this host.
workingDir is a string identifying the working directory for this component. Any relative paths in component properties are interpreted as relative to the component's workingDir. The workingDir property, if specified, must be an absolute path.
logFile is a string identifying the log file for this component.
args is a list of command-line flags to pass to the ReportGenerator.
javaBinary, if used, should indicate a JDK 1.5.x or later. It defaults to the JDK that Endeca installs.
javaOptions are the command-line options for the javaBinary parameter. This parameter is primarily used to adjust the ReportGenerator memory, which defaults to 1GB. To set the memory, use the following:
java_options = -Xmx[MemoryInMb]m -Xms[MemoryInMb]m
inputDirOrFile (required) is the path to the file or directory containing the logs to report on. If it is a directory, then all log files in that directory are read. If it is a file, then just that file is read.
outputFile (required) is the name of the generated report file and the path where it is stored.
stylesheetFile (required) is the filename and path of the XSL stylesheet used to format the generated report.
settingsFile is the path to the report_settings.xml file.
timerange sets the time span of interest (or report window). Allowed keywords:
Yesterday
LastWeek
LastMonth
DaySoFar
WeekSoFar
MonthSoFar
These keywords assume that days end at midnight, and weeks end on the midnight between Saturday and Sunday.
startDate sets the report window to the given date and time. The date format should be either yyyy_mm_dd or yyyy_mm_dd.hh_mm_ss.
stopDate sets the report window to the given date and time. The date format should be either yyyy_mm_dd or yyyy_mm_dd.hh_mm_ss.
timeSeries turns on the generation of time-series data and specifies the frequency, Hourly or Daily.
charts turns on the generation of report charts.
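The startDate/stopDate formats and the javaOptions memory string described above are easy to get wrong by hand. The following is a minimal Python sketch of how a client might build those strings before provisioning a ReportGenerator component. The helper names report_date and java_memory_options are hypothetical and are not part of the Endeca API; only the string formats come from the property descriptions above.

```python
from datetime import datetime


def report_date(moment: datetime, with_time: bool = True) -> str:
    """Format a datetime as yyyy_mm_dd.hh_mm_ss, or yyyy_mm_dd when with_time is False."""
    pattern = "%Y_%m_%d.%H_%M_%S" if with_time else "%Y_%m_%d"
    return moment.strftime(pattern)


def java_memory_options(memory_in_mb: int) -> str:
    """Build the -Xmx/-Xms pair shown in the javaOptions description."""
    if memory_in_mb <= 0:
        raise ValueError("memory_in_mb must be a positive number of megabytes")
    return f"-Xmx{memory_in_mb}m -Xms{memory_in_mb}m"


# Example values only: a report window covering November 2007 and a 2 GB heap.
start_date = report_date(datetime(2007, 11, 1), with_time=False)
stop_date = report_date(datetime(2007, 11, 30, 23, 59, 59))
print(start_date)                 # 2007_11_01
print(stop_date)                  # 2007_11_30.23_59_59
print(java_memory_options(2048))  # -Xmx2048m -Xms2048m
```

The resulting strings would then be assigned to the startDate, stopDate, and javaOptions properties by whatever EAC client you use.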

RunBackupType class
A child of the RunUtilityType class, this class provides all the information you need to perform a backup operation to the Archive utility.

RunBackupType properties

applicationID (required) is the unique identifier for this application.
token identifies the token used to stop the utility or to get its status. If you do not specify a token, one is generated and returned when you start the utility.
hostID (required) is a unique identifier for the host. The hostID and dirName parameters specify the path to the directory that will be archived.
dirName (required) is the full path of the directory. The hostID and dirName parameters specify the path to the directory that will be archived.
backupMethod is either Copy or Move.
numBackups specifies the maximum number of archives to store. This number does not include the original directory itself, so if numBackups is set to 3, you would have the original directory plus up to three archive directories, for a total of as many as four directories. The default numBackups is 5.


RunFileCopyType class
A child of the RunUtilityType class, this class provides all the information you need to run the Copy utility.

RunFileCopyType properties

applicationID (required) identifies the application to use.
token identifies the token used to stop the utility or to get its status. If you do not specify a token, one is generated and returned when you start the utility.
fromHostID (required) is the unique identifier for the host you are copying the data from.
toHostID (required) is the unique identifier for the host you are copying the data to.
sourcePath (required) is the full path to the source file or directory. If sourcePath contains no wildcards, then destinationPath must be the destination file or directory itself, rather than the parent directory.
destinationPath (required) is the full path to the destination file or directory.
recursive, when specified, copies the directories recursively.

RunRollbackType class
A child of the RunUtilityType class, this class provides all the information you need to perform a rollback operation to the Archive utility.

RunRollbackType properties

applicationID (required) identifies the application to use.
token identifies the token used to stop the utility or to get its status. If you do not specify a token, one is generated and returned when you start the utility.
hostID (required) is a unique identifier for the host. The hostID and dirName parameters specify the path to the directory that will be archived.
dirName (required) is the full path for the directory. The hostID and dirName parameters specify the path to the directory that will be archived.

RunShellType class
A child of the RunUtilityType class, this class provides all the information you need to run the Shell utility.

RunShellType properties

applicationID (required) identifies the application to use.
token identifies the token used to stop the utility or to get its status. If you do not specify a token, one is generated and returned when you start the utility.
hostID (required) is a unique identifier for the host.
cmd (required) is the command(s).
workingDir is the full path for the working directory.

RunUtilityType class
Parent class of the other Utility classes.


RunUtilityType properties

applicationID (required) identifies the application to use.
token identifies the token used to stop the utility or to get its status. If you do not specify a token, one is generated and returned when you start the utility.

ScriptListType class
A class that describes a list of scripts.

ScriptListType properties

script (required) is a collection of scripts comprising this ScriptListType object.

ScriptType class
A class that describes the base type for all scripts within an application.

ScriptType properties

scriptID (required) is a unique string identifier for the script.
cmd (required) is the command that is used to start the script.
logFile is the file for appended stdout/stderr output. It defaults to $ENDECA_CONF/logs/script/(app_id).(script_id).log.
workingDir is the working directory. It defaults to $ENDECA_CONF/working/(app_id)/.


SSLConfigurationType class
A class used by the DgraphComponentType class and AgraphComponentType class to enable SSL on the resulting components.

SSLConfigurationType properties

certFile (required) specifies the path of the eneCert.pem certificate file that the Dgraph or Agraph processes present to any client. The file name can be a path relative to the component's working directory.
caFile (required) specifies the path of the eneCA.pem Certificate Authority file that the Dgraph or Agraph processes use to authenticate communications with other Endeca components. The file name can be a path relative to the component's working directory.
cipher is an optional cipher string (such as RC4-SHA) that specifies the minimum cryptographic algorithm that the Dgraph or Agraph processes use during the SSL negotiation. If you omit this setting, the SSL software tries an internal list of ciphers, beginning with AES256-SHA. See the Endeca Security Guide for more information.

StateType class
A class used by the StatusType class to describe the state of a component.

StateType fields
An enumeration of the following fields:

Starting
Note: Starting only applies to server components (Dgraph, Agraph, or LogServer).
Running
NotRunning
Failed

StatusType class
Describes the status of a server component in the Application Controller. Server components include the Dgraph, Agraph, and LogServer. All other components (Forge, Dgidx, Agidx, ReportGenerator, and Crawler) are batch components. Their status is described by the BatchStatusType class on page 247.

StatusType properties

StateType (required) is an enumeration of the following fields:
Starting
Note: Starting only applies to server components (Dgraph, Agraph, or LogServer).
Running
NotRunning
Failed
startTime (required) is the time the component started; for example, 5/25/07 3:58 PM.
failureMessage is the failure message, which tells you that a failure has occurred in the execution of the component. failureMessage is empty unless the state is Failed. (This is different from EACFault, which tells you that a problem has occurred while processing the Web Service request to get the status.)

TimeRangeType class
A class used by the ReportGeneratorComponentType class to set the time span of interest (or report window).


TimeRangeType fields
The enumeration of possible values is as follows:

Yesterday LastWeek LastMonth DaySoFar WeekSoFar MonthSoFar

TimeSeriesType class
A class used by the ReportGeneratorComponentType class to turn on the generation of time-series data and specify the frequency, hourly or daily.

TimeSeriesType fields
The enumeration of possible values is as follows:

Hourly Daily

UpdateComponentType class
A class that describes a component to be updated during incremental provisioning.

UpdateComponentType properties

applicationID (required) identifies the application.
component (required) identifies the component to update.
forceUpdate indicates whether or not the Application Controller should force the component to stop before attempting the update.

UpdateHostType class
A class that describes a host to be updated during incremental provisioning.

UpdateHostType properties

applicationID (required) identifies the application.
host (required) identifies the host to update.
forceUpdate indicates whether the Application Controller should force any components or services running on the host to stop before attempting the update.

UpdateScriptType class
A class that describes a script to be updated during incremental provisioning.

UpdateScriptType properties

applicationID (required) identifies the application.
scriptID (required) identifies the script to update.


SECTION III
Transferring Implementations Between Environments


Chapter 13

Transferring Endeca Implementations Between Environments


This chapter describes how to transfer your Endeca implementation from a staging environment that uses Web Studio to a production environment that uses Web Studio. Two methods are described: one uses the Endeca tools to manually transfer the implementation, and the other uses an emgr_update utility that allows you to script and automate transfers. To improve the readability of the chapter, we assume you are transferring your Endeca implementation from a staging environment to a production environment. This need not be the case, however. You can use these procedures to transfer between any environments you choose. This chapter includes the following sections:

About transferring your front-end Web application
Transferring implementations using the tools
Transferring implementations using the emgr_update utility
Removing an application from Endeca IAP

About transferring your front-end Web application


This chapter focuses on transferring your instance configuration and MDEX Engine between environments. Depending on your environment and requirements, you may also have to transfer your front-end Web application to complete the move from one environment to another. From an Endeca perspective, all you have to do to transfer the front-end Web application is make sure that the MDEX Engine hostname and port you use in your ENEConnection object are correct for the new environment.

Note: See the Endeca Developer's Guide for details on using the ENEConnection interface.

Transferring implementations using the tools


You can use the Endeca tools to manually transfer from a staging environment that uses Web Studio to a production environment that uses Web Studio.

Retrieving the Web Studio instance configuration with Developer Studio


1 In your staging environment, start Developer Studio and create a new project.
2 From the Tools menu, choose Web Studio Settings. The Web Studio Settings dialog box appears.
3 Specify the hostname and port for Web Studio for this application. Make sure the hostname and port that are specified correspond with the Web Studio whose information you want to retrieve.
4 In the same dialog box, select the application from the drop-down list, or make sure that the application name that is specified corresponds with the name of the application whose configuration you want to retrieve from Web Studio. Note that you can have instance configurations for more than one application created in Web Studio.
5 In the Web Studio toolbar, click Get Instance Configuration.
6 From the File menu, choose Save to save the project with the latest instance configuration.
7 Optionally, remove inactive dynamic business rules from the instance configuration.
8 Copy the instance configuration files from the saved project to the location in your production environment where the Endeca Application Controller expects them to be.
9 Use the Application Controller to run a baseline update on the production system.

Transferring implementations using the emgr_update utility


Similar to the Endeca tools, the emgr_update utility lets you transfer your Endeca implementation from a staging environment that uses Web Studio to the production environment that uses Web Studio. The primary benefit of the emgr_update utility is that it allows you to script and automate transfers between environments. Transferring an implementation from staging to production is a two-step process where you get the instance configuration from the staging environment and then set (update) the configuration in the production environment.

emgr_update syntax
emgr_update is a utility that assists you in updating the instance configuration of a production system based on the changes made with the Endeca tools in a staging environment. You run emgr_update from a command line. Open a command prompt or UNIX shell to run the program. The syntax for running emgr_update is:

emgr_update <parameters>

The following table describes the command line parameters you can use with emgr_update. You can specify only one --action operation for each invocation of the utility.

emgr_update parameter
Description

--host name:port
Specifies the host name and the port of a machine running Web Studio. If you are retrieving settings (using the get operation), this is the host name of the environment you are transferring from; if you are updating settings (using the set operation), this is the host name of the environment you are transferring to.

--action <op>
Specifies one of the actions, where <op> is one of the operations listed below.

--action get_all_settings
Retrieves all the instance configuration settings for a project that you created in Web Studio in the staging environment, for use in the production environment. Required parameters: --dir, --prefix. Optional parameters: --filter.

--action get_ws_settings
Retrieves only those instance configuration settings that can be modified in Web Studio (not all settings). These configuration settings include the following Web Studio features: dynamic business rules, keyword redirects, thesaurus entries, automatic phrases, stop words, and dimension ordering.

--action get_mdex_settings
Retrieves the instance configuration settings that were modified in Web Studio, and that do not require a baseline update to update the MDEX Engine. These configuration settings include the following Web Studio features: dynamic business rules, keyword redirects, thesaurus entries, automatic phrases. Required parameters: --dir, --prefix. Optional parameters: --filter.

--action set_post_forge_dims
Updates the Web Studio configuration with the post-Forge dimensions.

--action get_post_forge_dims
Retrieves the copy of Web Studio settings for the post-Forge dimensions. Typically, this operation can be used for debugging purposes.

--action update_mgr_settings
Updates a Web Studio production environment with instance configuration settings that were extracted from the Web Studio configuration in the staging environment.

--action remove_all_settings
Removes all the instance configuration files from Web Studio for the application that you specify with the --app_name parameter. Removing the instance configuration does not remove the associated provisioning information for an application.

--app_name <string>
Specifies the name of the application provisioned to the EAC Central Server.

Optional action parameters

--dir <string>
Specifies the pathname of the directory where the instance configuration files are written to or read from. Required for all --action operations except for set_post_forge_dims and remove_all_settings.

--prefix <string>
Specifies the prefix used for the instance configuration files. This option is required for all --action operations except for set_post_forge_dims and get_post_forge_dims.

--filter filter
Filters out dynamic business rules that have a state of inactive. (A rule has the property endeca.internal.workflow.state set to INACTIVE.) This option can be used in conjunction with get_all_settings or get_ws_settings when retrieving an instance configuration. Removing inactive rules is not required but it is recommended. With the default rule filter in place, the MDEX Engine does not fire any rule whose state is inactive. In other words, you can transfer an instance configuration, including both active and inactive rules, and the MDEX Engine fires only active rules in reply to user queries.

--post_forge_file <string>
Specifies the pathname to the file that contains post-Forge dimensions. This option is required for the set_post_forge_dims operation.

Optional global parameters

--stop_on_warnings
Stops the utility without asking you if the target directory is not empty before a get operation, or if it finds extra or missing files before an update operation.

--ignore_warnings
Continues running the utility if the target directory is not empty before a get operation, or continues if there are extra or missing files before an update operation.

--help
Displays the usage parameters for the utility.

--version
Displays the version number for the utility.

Here is an example of usage:


emgr_update --host localhost:8888 --action get_ws_settings --prefix wine --dir /apps/endeca/data/forge_input --app_name wine
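When emgr_update is driven from a script, it can help to check the required parameters before launching the utility. The following Python sketch encodes only the rules stated in the parameter table (which actions need --dir, --prefix, and --post_forge_file, and which accept --filter). The function name build_emgr_update_args is hypothetical, and treating --app_name as mandatory is an assumption made for this sketch; the utility itself may apply different or additional checks.

```python
# Rules transcribed from the emgr_update parameter table; not taken from the utility's source.
ACTIONS = {
    "get_all_settings", "get_ws_settings", "get_mdex_settings",
    "set_post_forge_dims", "get_post_forge_dims",
    "update_mgr_settings", "remove_all_settings",
}
NO_DIR = {"set_post_forge_dims", "remove_all_settings"}      # --dir not required
NO_PREFIX = {"set_post_forge_dims", "get_post_forge_dims"}   # --prefix not required
FILTERABLE = {"get_all_settings", "get_ws_settings", "get_mdex_settings"}


def build_emgr_update_args(action, host, app_name, directory=None,
                           prefix=None, post_forge_file=None,
                           filter_inactive=False):
    """Return an argument list for emgr_update, or raise ValueError if a rule is violated."""
    if action not in ACTIONS:
        raise ValueError(f"unknown --action operation: {action}")
    args = ["emgr_update", "--host", host, "--app_name", app_name,
            "--action", action]
    if action not in NO_DIR:
        if not directory:
            raise ValueError(f"{action} requires --dir")
        args += ["--dir", directory]
    if action not in NO_PREFIX:
        if not prefix:
            raise ValueError(f"{action} requires --prefix")
        args += ["--prefix", prefix]
    if action == "set_post_forge_dims":
        if not post_forge_file:
            raise ValueError("set_post_forge_dims requires --post_forge_file")
        args += ["--post_forge_file", post_forge_file]
    if filter_inactive:
        if action not in FILTERABLE:
            raise ValueError(f"--filter is not listed for {action}")
        args.append("--filter")
    return args


# Rebuilds the usage example shown above as an argument list.
print(build_emgr_update_args("get_ws_settings", "localhost:8888", "wine",
                             directory="/apps/endeca/data/forge_input",
                             prefix="wine"))
```

The returned list can be passed to a process launcher of your choice; on Windows, replace the first element with emgr_update.bat.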

By using the appropriate --action operations, you can use the emgr_update program to do the following tasks:

Transfer the instance configuration files for a particular application of your choice from the staging environment to the production environment. After the transfer, you run a baseline update using your own EAC scripts. You have the option of transferring all instance configuration files, or transferring just the instance configuration files that Web Studio modified. If your implementation uses the Advanced Crawler, you run it before running a baseline update.
Transfer the instance configuration for a particular application from one Web Studio environment to another.
Remove instance configuration information for a specified application from the Web Studio configuration.
Send the Forge dimensions to Web Studio.

These operations are described in the following sections.


Using emgr_update to transfer from a Web Studio staging environment to a Web Studio production environment
This section describes how to transfer instance configuration files from a staging environment that uses Web Studio to a production environment that also uses Web Studio. Two scenarios are described:

Transferring all instance configuration files for an Endeca project.
Transferring only the instance configuration files that can be modified by Web Studio.

Transferring all instance configuration files


For this task, you move all the instance configuration files from the staging environment to the Forge input directory in the production environment. You then use the Endeca Application Controller to run a baseline update.

To transfer all configuration files to the production system:


1 In the staging environment, use Developer Studio and/or Web Studio to make changes to the project.
2 Run emgr_update with an --action of get_all_settings.
a For the --host parameter, specify the machine name and port for the staging Web Studio environment.
b For the --dir parameter, specify the Forge input directory in the production environment.
c For the --app_name parameter, specify the application name whose instance configuration you want to transfer.
d Use the --filter parameter to remove inactive business rules.

The following is a UNIX example.


emgr_update --host localhost:8888 --app_name My_application --action get_all_settings --prefix wine --filter --dir /apps/endeca/data/forge_input

3 If the destination directory is not empty, you will be prompted to continue. Answer y. When the utility finishes, all project configuration files (including the project and pipeline files) are copied to the production directory specified by the --dir parameter.
4 Use the Endeca Application Controller to run a baseline update on the production system.

The utility uses prefix.esp as the name of the output Developer Studio project file (where prefix is whatever you specified with the --prefix parameter). If there is an existing project file in the production directory with another name, it is recommended that you change it to prefix.esp.

Transferring only instance configuration files modified by Web Studio


In this task, you transfer files from the staging environment to the Forge input directory in the production environment. However, these files are the instance configuration files that can be modified by Web Studio. They are not the full set of instance configuration files. Web Studio can modify instance configuration files for any of the following features:

The Endeca Advanced Crawler
Dynamic business rules
Thesaurus entries
Automatic phrases
Stop words
Dimension ordering

A subsequent baseline update uses the updated files for these features.

To transfer instance configuration files modified by Web Studio to a production system:


1 In the staging environment, use Web Studio to make any necessary instance configuration changes.
2 Run emgr_update with an --action of get_ws_settings.
a For the --host parameter, specify the machine name and port for the Web Studio in the staging environment.
b For the --dir parameter, specify the Forge input directory in the production environment.
c For the --app_name parameter, specify the application name whose instance configuration you want to transfer.
d Use the --filter parameter to remove inactive dynamic business rules.

The following is a Windows example:


emgr_update.bat --host localhost:8888 --app_name My_application --action get_ws_settings --prefix wine --filter --dir c:\endecaproduction\data\forge_input

3 If the destination directory is not empty, you will be prompted to continue. Answer y. When the utility finishes, the project files that Web Studio modified are copied to the production directory specified by the --dir parameter.
4 Use the Application Controller to run a baseline update on the production system.

Using emgr_update to transfer from one Web Studio environment to another


You transfer and deploy instance configuration files from one Web Studio environment to another by using the emgr_update utility's --action operations, and the entire process can be scripted.

To transfer and deploy all instance configuration files to the production system:
1 In the staging environment, use Developer Studio and/or Web Studio to make changes to the project.
2 Run emgr_update with an --action of get_all_settings.
a For the --host parameter, specify the machine name and port for the staging environment in Web Studio.
b For the --dir parameter, specify the Forge input directory in the production environment.
c For the --app_name parameter, specify the application name whose instance configuration you want to transfer.
d Use the --filter parameter to remove inactive dynamic business rules.

The following is a Windows example:


emgr_update.bat --host localhost:8888 --app_name My_app --action get_all_settings --prefix wine --filter --dir c:\endecaproduction\data\forge_input

3 If the destination directory is not empty, you will be prompted to continue. Answer y. When the utility finishes, all project configuration files are copied to the production directory specified by the --dir parameter.
4 Run emgr_update with an --action of update_mgr_settings.
a For the --host parameter, specify the machine name and port for the production environment in Web Studio.
b For the --dir parameter, specify the directory that contains the project configuration files that will be used to update the production environment in Web Studio (typically, this will be the same directory that was used in step 2).
c For the --app_name parameter, specify the application name whose instance configuration you want to transfer.

The following is a Windows example:


emgr_update.bat --host localhost:8888 --app_name My_app --action update_mgr_settings --prefix wine --dir c:\endecaproduction\data\forge_input

5 If your implementation uses the Advanced Crawler, use the Application Controller to run the baseline update.
6 Use the Application Controller to run a baseline update on the production system.
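Because the get and update steps above differ only in the host and the --action operation, the pair lends itself to scripting. The following Python sketch builds both command lines and, by default, only prints them. The host names staging-host:8888 and prod-host:8888, and the function names transfer_commands and run_transfer, are hypothetical. The sketch does not run the baseline update, and it does not answer the interactive prompt that appears when the target directory is not empty.

```python
import subprocess


def transfer_commands(staging_host, production_host, app_name, prefix,
                      config_dir, executable="emgr_update"):
    """Build the get_all_settings and update_mgr_settings command lines."""
    common = ["--app_name", app_name, "--prefix", prefix, "--dir", config_dir]
    # Step 2: pull the configuration from the staging Web Studio, dropping inactive rules.
    get_cmd = [executable, "--host", staging_host,
               "--action", "get_all_settings", "--filter", *common]
    # Step 4: push the same directory to the production Web Studio.
    set_cmd = [executable, "--host", production_host,
               "--action", "update_mgr_settings", *common]
    return [get_cmd, set_cmd]


def run_transfer(commands, dry_run=True):
    """Print each command, or execute it and stop at the first failure."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


commands = transfer_commands("staging-host:8888", "prod-host:8888",
                             "My_app", "wine",
                             "/apps/endeca/data/forge_input")
run_transfer(commands)  # dry run: prints the two command lines
```

For an unattended run, the --ignore_warnings parameter from the table above may be worth adding so that a non-empty target directory does not block the script; whether that is appropriate depends on your environment.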

Using emgr_update to remove instance configuration files from Web Studio


Deleting an application from EAC in Web Studio does not remove its instance configuration files. If you want to delete the instance configuration files for the application, you can use emgr_update.


To remove instance configuration files:


Run emgr_update with an --action of remove_all_settings.
a For the --host parameter, specify the machine name and port for the staging environment in Web Studio.
b For the --app_name parameter, specify the application name whose instance configuration you want to remove.

The following is a Windows example:


emgr_update.bat --host localhost:8888 --app_name My_app --action remove_all_settings --prefix My_prefix

The application's instance configuration files are removed from Web Studio.

Using emgr_update to send the dimensions file produced by Forge to the Web Studio
Read this section only if you are not using an Application Controller default script for running the baseline update, and are using your own scripts for this purpose. If you are using your own scripts for running the baseline update, then after you run Forge, you need to send the dimensions file produced by Forge to the Web Studio instance configuration for your application.

To send the dimensions file produced by Forge to the Web Studio, run emgr_update as follows:
On the machine that has access to the output files of Forge (this is typically the machine on which you ran Forge), run emgr_update with an --action of set_post_forge_dims:
a For the --host parameter, specify the machine name and port for the environment in Web Studio.
b For the --app_name parameter, specify the application name whose instance configuration you want to update with this information.
c For the --post_forge_file parameter, specify the full pathname to the output file where Forge stores its dimensions.

emgr_update.bat --host localhost:8888 --action set_post_forge_dims --app_name wine --post_forge_file C:\sample_wine_data\data\partition0\forge_output\wine.dimensions.xml

Removing an application from Endeca IAP


Removing an application in the EAC Admin Console removes provisioning information for an application but not instance configuration files. Running remove_all_settings in emgr_update removes instance configuration files but not provisioning information. To completely remove all information about an application, you perform both steps. If you do not perform both steps, you may store unnecessary or duplicate sets of files for an application.

To completely remove an application from the Endeca IAP:


1 In Web Studio, log in to the application you want to remove.
2 On the EAC Admin Console page, click Delete. This removes the provisioning information. Alternatively, you can perform steps 1 and 2 using an EAC web services client.
3 Run remove_all_settings of emgr_update to delete the instance configuration files. For details, see Using emgr_update to remove instance configuration files from Web Studio on page 285.


SECTION IV
Tuning Endeca Implementations


Chapter 14

The MDEX Engine Request Log


This chapter describes the MDEX Engine (Dgraph) request log, which you can use to analyze Endeca application performance. The chapter contains the following sections:

About the MDEX Engine request log
Request log file format
Extracting information from request logs
Request log rolling
URL parameter mapping

About the MDEX Engine request log


The MDEX Engine request log (also called the Dgraph request log) is the file that captures Web application query information. The MDEX Engine always generates a request log with a default name dgraph.reqlog. You use the --log option when running the MDEX Engine to specify a different path to store the request log (for more information, see Dgraph options on page 352). You can extract queries from this log file and use them with the Endeca Eneperf tool to analyze Web application performance. Eneperf is described in Chapter 15. You can also use Perl to extract useful information from Dgraph request logs; for more information, see Extracting information from request logs on page 294. In addition, depending upon the size of your log files, you can import them into a tool that allows you to manipulate column-based data, such as Microsoft Excel.

Request log file format


The following is an example of a single line from a request log file:
1180560941 127.0.0.1 1180560943_127.0.0.1_8888_9 21303 73.78 2.37 200 19290 0 - /graph?node=0&group=0&offset=0&nbins=10&irversion=510&ignore=1&merchpreviewtime=2007%2D05%2D30T17%3A35&agreq=1180560943_127.0.0.1_8888_9

Each new line in a request log file starts with a time stamp such as 1180560941. Each entry has the following eleven columns:
[Time of Request] [Client IP Address] [Query ID Tag] [Response Size] [Response Duration] [Processing Time] [HTTP Return Code] [Number of Results Returned] [Queue Status] [Thread ID] [Request URL]

The following table describes the log entries in more detail.

Column | Description
Time of Request | Time stamp indicating the time the request was completed, in seconds since the epoch (January 1, 1970, 00:00:00 UTC).
Client IP Address | IP address of the requesting client.
Query ID Tag | Query identifier tag, composed of underscore-separated timestamp, IP address, port, and sequential values. This value can be used to correlate an entry in a child Dgraph's log with a query in a parent Agraph's log. This field always contains a - character in request logs from non-child Dgraphs.
Response Size | Number of bytes written to the client. May be less than the intended result size, for example, due to a premature session end.
Response Duration | The request lifetime, in milliseconds. Equal to the total amount of time between when the Dgraph reads the request from the network and when it finishes sending the result. May include queuing time, such as time spent waiting for earlier requests to be completed.
Processing Time | Processing time, in milliseconds. Equal to the total computation time required for the Dgraph to handle the request, excluding network and wait time. This value gives an accurate measure of how expensive the request was to compute, given current system state. (That is, if the machine in question was busy with other threads or processes, the time may be longer than on an otherwise unused machine.) For any given query, Processing Time will always be smaller than Response Duration.
HTTP Return Code | HTTP return code, such as 200 (OK) or 404 (Not Found).
Number of Results Returned | Number of results returned (or - if the HTTP request was not a query).
Queue Status | Number of threads busy when the request was received. If the queue status, Q, is positive, it means that there were Q requests in the queue. If Q < 0, it means that there were 0 requests in the queue, and -Q threads idle.
Thread ID | The thread ID of the thread that was assigned the request (or - in single-threaded mode).
Request URL | The URL passed to the MDEX Engine, unquoted, exactly as it was received.
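The column layout described above can be turned into named fields with a few lines of code. The following Python sketch is illustrative only: the class name, field names, and function name are assumptions chosen for readability, not identifiers defined by Endeca, and it assumes the eleven whitespace-separated columns listed above.

```python
from typing import NamedTuple, Optional


class RequestLogEntry(NamedTuple):
    """One request log entry. Field names are illustrative, not Endeca identifiers."""
    time_of_request: int          # seconds since the epoch
    client_ip: str
    query_id_tag: str             # "-" in logs from non-child Dgraphs
    response_size: int            # bytes written to the client
    response_duration_ms: float
    processing_time_ms: float
    http_return_code: int
    num_results: Optional[int]    # None when the log shows "-"
    queue_status: int
    thread_id: str                # "-" in single-threaded mode
    request_url: str


def parse_request_log_line(line: str) -> Optional[RequestLogEntry]:
    """Return a RequestLogEntry, or None for lines that are not request entries."""
    parts = line.split(None, 10)  # the URL is the eleventh and last column
    if len(parts) != 11:
        return None
    try:
        return RequestLogEntry(
            time_of_request=int(parts[0]),
            client_ip=parts[1],
            query_id_tag=parts[2],
            response_size=int(parts[3]),
            response_duration_ms=float(parts[4]),
            processing_time_ms=float(parts[5]),
            http_return_code=int(parts[6]),
            num_results=None if parts[7] == "-" else int(parts[7]),
            queue_status=int(parts[8]),
            thread_id=parts[9],
            request_url=parts[10].strip(),
        )
    except ValueError:
        # A numeric column did not parse, so treat the line as a non-entry.
        return None
```

Lines that do not match the layout, such as startup messages, come back as None, so a caller can skip them without special handling.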

Extracting information from request logs


MDEX Engine request logs can be very large and difficult to read. You might find it useful to sort them on fields you are interested in, such as Processing Time or Response Duration. You can then look for a pattern or feature in the most time-consuming queries that might be the origin of the performance issue. Here are two approaches to extract information from request logs:

Run the Cheetah script available from Endeca Support.
Write your own Perl code.

The Cheetah script reads one or more MDEX Engine logs and reports on the nature and performance of the queries recorded in those logs. This report provides information on what actually happened in the past, instead of reporting on potential performance or capacity planning for the future. This script can be run manually in order to debug performance problems, and should also be run on a regular basis to continually monitor performance and call out trends in Dgraph traffic load, latency, throughput, and application behavior. To download the Cheetah script, log in to the Endeca Support Center at https://support.endeca.com/ and see the Tools and Utilities page.

If you write Perl to extract, manipulate, and analyze the information in a request log, you may find the following command-line switches useful:

perl -nae

where:

n indicates that it is a loop processing each line of the input file(s) in turn.
a turns on autosplit.
e indicates that it should execute the next argument, which should be Perl code.

This script prints every query that took more than five seconds. It splits each line on whitespace into an array called F. The fifth element in the array ($F[4]) corresponds to the Response Duration column and represents the amount of time the query took, in milliseconds.
perl -nae 'print if $F[4] > 5000' logfile

If you are tracking system trends by time, you may find it useful to correlate the epochal time that the log displays with human-readable time. This script is used to convert the time stamps into a more readable form.
perl -nae 'print scalar localtime $F[0]," $_"'

Note: In this script, localtime uses the time zone of the machine where you are doing the analysis, so if you are looking at a log from a different time zone, you may want to change the time zone. On UNIX systems, you can set the TZ environment variable to effect this change. For example, TZ=US/Pacific.
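The same two ideas, sorting on Response Duration and converting the epoch time stamp, can be combined in one pass. The Python sketch below is a minimal illustration that assumes the eleven-column layout described earlier in this chapter, with Response Duration in the fifth column; the function name and the choice of UTC for display are assumptions made for the example.

```python
from datetime import datetime, timezone


def slowest_requests(lines, limit=10):
    """Return (duration_ms, utc_time, url) tuples for the slowest entries, slowest first."""
    rows = []
    for line in lines:
        fields = line.split(None, 10)
        if len(fields) != 11:
            continue  # skip lines that are not request entries
        try:
            finished = datetime.fromtimestamp(int(fields[0]), tz=timezone.utc)
            duration_ms = float(fields[4])  # fifth column: Response Duration
        except ValueError:
            continue  # a numeric column did not parse
        stamp = finished.strftime("%Y-%m-%d %H:%M:%S UTC")
        rows.append((duration_ms, stamp, fields[10].strip()))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]
```

Reporting in UTC avoids the time zone ambiguity mentioned in the note above; to sort on Processing Time instead, read the sixth column (fields[5]).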

Request log rolling


The MDEX Engine request log is subject to log rotation when it goes over one gigabyte. When this occurs, the existing logfile is renamed from, say, dgraph.reqlog to dgraph.reqlog.PID.N, where:

PID is the Dgraph process ID.
N is the number of logs that this Dgraph has already rotated. N=0 the first time the Dgraph does log rotation, and then goes up by 1 each time.
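When you analyze a rotated log, the files usually need to be read in rotation order, and a plain alphabetical sort puts N=10 before N=2. The Python sketch below shows one way to order them numerically. The function name and the default base name are assumptions for the example; files from different process IDs are grouped by PID, which does not necessarily reflect the order in which those processes ran.

```python
import re


def order_rotated_logs(file_names, base_name="dgraph.reqlog"):
    """Sort rotated logs by (PID, N) and put the live log, if present, last."""
    pattern = re.compile(re.escape(base_name) + r"\.(\d+)\.(\d+)$")
    rotated = []
    for name in file_names:
        match = pattern.match(name)
        if match:
            pid, n = int(match.group(1)), int(match.group(2))
            rotated.append((pid, n, name))
    rotated.sort()  # numeric sort, so N=2 comes before N=10
    ordered = [name for _pid, _n, name in rotated]
    if base_name in file_names:
        ordered.append(base_name)  # the log currently being written
    return ordered
```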

URL parameter mapping


This section provides a mapping between the URL that is sent from the application to the Endeca Presentation API, and the URL that is sent from the API to the MDEX Engine. As described in the Endeca Developers Guide, there is not a one-to-one correlation between these two URLs. The Presentation API transforms the URL it receives from the application into a MDEX Engine-specific URL before sending it to the engine.
[Figure: Your Application sends a request URL to the Endeca Presentation API, which sends an MDEX Engine-specific URL to the Endeca MDEX Engine.]

You can use the information in the remainder of this section to translate the MDEX Engine request log file, which tells you exactly which URLs the MDEX Engine has processed. By extension, these are the URLs that the Presentation API has sent to the MDEX Engine. If the API has sent an incorrect URL to the MDEX Engine, it is a good indication that the API received an incorrect URL from the Web application in the first place.

Use the table below to map the parameters as follows:

The left column is the mapping between the Presentation API and the ENE parameter.
The far right column is the mapping between the Web application and the Presentation API parameter.

Note: For a complete description of the ENE URL query parameters, see the Endeca Developers Guide.

Example mappings
Here are some sample mappings.

Web Application to API | API to MDEX Engine
/controller.jsp?N=0 | /graph?node=0
/controller.jsp?N=0&Ntk=DESC&Ntt=merlot | /graph?node=0+attrs=DESC+merlot

Mapping parameters
The primary parameters (graph?, search?, abin?, and bin?) each identify a query type; the parameters listed after each one are its secondary parameters.

MDEX Engine parameter | Description | Maps to...
graph? | Navigation query | N
node | Navigation query parameter, navigation descriptors | N
offset | Navigation query parameter, record offset | No
offset | Navigation query parameter, aggregated record offset | Nao
group | Navigation query parameter, exposed refinements | Ne
allbins | Navigation query parameter, records per aggregated record | Np
sort | Navigation query parameter, sort | Ns
sort | Navigation query parameter, sort order | Nso
groupby | Navigation query parameter, rollup | Nu
attrs | Navigation query parameter, record search key, terms, and options | Ntk, Ntt, Ntx
relrank | Navigation query parameter, search interface, relevance ranking terms, relevance ranking strategy and match mode | Nrk, Nrt, Nrr, Nrm
dym | Navigation query parameter, Did You Mean | Nty
autophrase | Navigation query parameter, compute phrasings | Ntpc
autophrasedwim | Navigation query parameter, rewrite query | Ntpr
merchpreviewtime | Navigation query parameter, merchandising preview time | Nmpt
merchrulefilter | Navigation query parameter, merchandising rule filter | Nmrf
pred | Navigation query parameter, range filters | Nf
filter | Navigation query parameter, record filters | Nr
structured | Navigation query parameter, Endeca Query Language | Nrs
stat | Navigation query parameter, analytics | Nl
refinement | Navigation query parameter, dynamic refinement ranking | Nrc
search? | Dimension search query | D
terms | Dimension search query parameter, search terms | D
options | Dimension search query parameter, options | Dx
node | Dimension search query parameter, dimension search scope | Dn
model | Dimension search query parameter, search dimension | Di
num | Dimension search query parameter, number of results | Dp
offset | Dimension search query parameter, offset | Do
rank | Dimension search query parameter, rank | Dk
pred | Dimension search query parameter, range filters | Df
filter | Dimension search query parameter, record filters | Dr
structured | Dimension search query parameter, Endeca Query Language | Drs
abin? | Aggregated record query | A
id | Aggregated record query parameter, record ID | A
node | Aggregated record query parameter, descriptors | An
groupby | Aggregated record query parameter, rollup | Au
pred | Aggregated record query parameter, range filters | Af
filter | Aggregated record query parameter, record filters | Ar
structured | Aggregated record query parameter, Endeca Query Language | Ars
bin? | Record query |
id | Record query parameter, record ID | R
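When reading a request log, it can help to look up which Presentation API parameters could have produced each MDEX Engine parameter. The Python sketch below encodes a subset of the mapping table above as a dictionary and performs that reverse lookup. The dictionary name and function name are assumptions for the example, and MDEX Engine parameters that are not in the subset (or not in the table at all, such as irversion) come back with an empty list.

```python
from urllib.parse import parse_qsl, urlsplit

# Subset of the mapping table: query type -> {MDEX Engine parameter: API parameters}.
MDEX_TO_API = {
    "graph": {
        "node": ["N"], "offset": ["No", "Nao"], "group": ["Ne"],
        "allbins": ["Np"], "sort": ["Ns", "Nso"], "groupby": ["Nu"],
        "attrs": ["Ntk", "Ntt", "Ntx"], "dym": ["Nty"],
        "pred": ["Nf"], "filter": ["Nr"],
    },
    "search": {
        "terms": ["D"], "options": ["Dx"], "node": ["Dn"], "model": ["Di"],
        "num": ["Dp"], "offset": ["Do"], "rank": ["Dk"],
    },
    "abin": {"id": ["A"], "node": ["An"], "groupby": ["Au"]},
    "bin": {"id": ["R"]},
}


def api_parameters_for(request_url):
    """Map each MDEX Engine parameter in a logged URL to its candidate API parameters."""
    parts = urlsplit(request_url)
    query_type = parts.path.lstrip("/")  # "graph", "search", "abin", or "bin"
    table = MDEX_TO_API.get(query_type, {})
    result = {}
    for name, _value in parse_qsl(parts.query, keep_blank_values=True):
        result[name] = table.get(name, [])  # empty list: not in this subset
    return result
```

Because offset and sort each correspond to two API parameters, the lookup returns every candidate rather than guessing which one the application used.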

Chapter 15

The Eneperf Tool


Eneperf is a performance testing tool that is included in your Endeca installation. This chapter describes how to use Eneperf. It contains the following sections:

About Eneperf
Using Eneperf
Obtaining logs for use with Eneperf
Debugging Eneperf

About Eneperf
Eneperf is a performance debugging tool that can measure throughput to help you identify system bottlenecks. Eneperf makes HTTP queries against the MDEX Engine (Dgraph) based on your MDEX Engine request logs and gathers the resulting statistics, without processing the results in any way.

Because Eneperf is lightweight, it has a very slight impact on performance. In most cases, it can be run on the same machine as the Dgraph or Agraph being tested. It can also be run on a remote machine.

Eneperf drives a substantial load at the MDEX Engine and reveals how many operations per second the MDEX Engine responds with. You specify the log file and tell Eneperf how many times to run through it, as well as the number of client connections to simulate.

Eneperf understands Endeca MDEX Engine URLs, which use the pipe symbol (|). Because the pipe symbol is not a legal character in the URL/URI standards, other programs, such as wget, may transform it inappropriately.

Using Eneperf
Eneperf is installed in the Endeca Navigation Platform bin directory. It has the following usage:

usage: eneperf [-v] [--header <header file path>] [--gzip] [--nreq <n>]
    [--nodnscache] [--progress] [--pidcheck <pid>] [--quitonerror]
    [--rcvbuf <size bytes>] [--record <recording file prefix>]
    [--record_hdr] [--record_ord]
    [--record_roll <max KB per recording file>] [--reqstats]
    [--runtime <max runtime (minutes)>] [--sleeponerror <secs>]
    [--stats <num reqs>] [--throttle <max req/sec>]
    [--warn <max req time warning threshold (msecs)>]
    <host> <port> <log> <num connections> <num iterations>

Eneperf has both required and optional settings.

Required settings
The required settings (shown in order) are as follows:
<host> <port> <log> <num connections> <num iterations>

Their usage is as follows:

Setting | Description
<host> | Target host for requests.
<port> | Port the target host is listening to for requests.
<log> | Log file of the query portion of the MDEX Engine URLs (that is, the portion that resides in the last column of the MDEX Engine request log), which is used for HTTP request generation. URLs from the <log> file are replayed in order.
<num connections> | Maximum number of outstanding requests to allow before waiting for replies. In other words, the number of simultaneous HTTP connection streams to keep open at all times. This number emulates multiple clients for the target server. For example, using <num connections> of 16 emulates 16 concurrent clients querying the target server at all times.
<num iterations> | Number of times to replay the query log. All outstanding requests are drained before a new iteration is started.

The following sections contain additional information about the required settings.

Host and port settings


You can run Eneperf locally or from a remote machine.

Running Eneperf locally


Eneperf is lightweight and has a very slight impact on performance. It can usually be run on the same machine as the Dgraph or Agraph being tested with no impact on results. To run Eneperf on the same machine as the Dgraph or Agraph, you point it to localhost and <port>. This configuration is useful for isolating MDEX Engine performance from any potential networking issues.

Running Eneperf on a remote host


Eneperf can also be run from a remote host. Using Eneperf to test the same MDEX Engine from the local machine and from across the network can expose networking problems if the throughputs are significantly different.

Note: Eneperf can be run on a machine with a different architecture than the one you are testing.

Log file settings


MDEX Engine request logs (described in Chapter 14) can be used as Eneperf input with some modification. URLs in the log should not include any machine connection parameters such as protocol, host, or port. These are added automatically. For example, a log entry of the following form is valid:
/graph?node=0

But a log entry of the following form is not valid:


http://myhost:5555/graph?node=0

Higher concurrent load can be achieved by using a single large request log file (which might simply be repeated concatenations of a smaller log file) than by using multiple iterations of a small log file. The log file should preferably be at least 100 lines, even if it consists of the same query repeated over and over. Because Eneperf drains all connections between each iteration, running a one-line log file through Eneperf 100 times will give you skewed throughput statistics.
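The recommendation above, to repeat a small log until it has at least 100 lines, can be sketched in a few lines of Python. The function name and the default of 100 are assumptions for the example; the queries are repeated in their original order so that replay order is preserved.

```python
def expand_query_log(queries, minimum_lines=100):
    """Repeat the queries, in order, until there are at least minimum_lines entries."""
    cleaned = [query.strip() for query in queries if query.strip()]
    if not cleaned:
        return []
    repeats = -(-minimum_lines // len(cleaned))  # ceiling division
    return cleaned * max(repeats, 1)
```

Writing the returned list to a file, one URL per line, gives a single larger log that can be run with one iteration instead of many iterations of a short log.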

Setting the number of connections and iterations


Eneperf load is driven by the num connections setting, which indicates the number of simultaneous connections Eneperf will try to maintain at a time.

For example, if num connections is set to 4, it sends four requests to the MDEX Engine. When one returns, another is sent out to replace it.

The number of connections needed to saturate the MDEX Engine varies depending on your MDEX Engine configuration and the server machine characteristics, and generally correlates to the number of threads in use. For example, if you have four threads, you might start with six or eight client connections. A good rule of thumb is to use two times the number of threads. However, an MDEX Engine with four threads might be saturated by just three connections if the queries are complex and all CPUs are being used 100%. There is no hard and fast rule, so feel free to experiment.

Although num connections does not have to be large, you want to make sure there are always enough simultaneous clients so that requests are waiting to be served. This ensures that the MDEX Engine stays busy during the communication lag between the MDEX Engine and Eneperf.

If you are using a small log with a large num connections, keep in mind that each time the log is restarted, all connections are drained. In effect, using a log file with just one entry limits num connections to one.

To generate an MDEX Engine request log showing the canonical time for each query, run Eneperf with a single client (that is, num connections equal to one), so that it sends only one request at a time. Each query will be executed alone; no other query computations will be contending for the machine's resources. The request log can then be examined for slow queries without the concern that they happened to be slow because other queries were executing simultaneously.

Optional settings
Eneperf contains the following optional settings:

Setting | Description
-v | Verbose mode. Print query URLs as they are requested.
--header <header file path> | Specify path of file containing HTTP header text, one header field per line.
--gzip | Send 'Accept-encoding: gzip' in the HTTP request.
--nreq <n> | Stop after n requests.
--nodnscache | Disable caching of DNS hostname lookups. By default, Eneperf caches these lookups to improve performance.
--pidcheck <pid> | On a connection error, tests the target Dgraph or Agraph. If the process is not alive, Eneperf terminates.
--progress | Display the percentage of the log file processed.
--quitonerror | Causes Eneperf to terminate if it encounters a fatal HTTP error. By default, errors are ignored.
--rcvbuf <size bytes> | Override the default TCP receive buffer size, set via the SO_RCVBUF socket option.
--record <recording file prefix> | Record a log of all HTTP responses. Recorded data is placed in output files with the prefix <recording file prefix>. Data files are given the suffixes .dat1, .dat2, and so on. An index file with the suffix .idx is also produced.
--record_hdr | In --record mode, record HTTP header information along with page content.
--record_ord | In --record mode, ensure that log entries are recorded in the same order that they are listed in the <log> file, even if they are processed out of order.
--record_roll <max KB per recording file> | Set the maximum number of KB per recording file. Default is 1024 KB.
--reqstats | Maintain and report per-request timing statistics. Note: This option only produces accurate results when <num connections> is set to 1.
--runtime <max runtime (minutes)> | Places a limit on the run time for Eneperf. Eneperf exits after <max runtime> minutes.
--seek <n> | Tells Eneperf to skip a specified number of requests in the specified log file and start with log entry n. For example, in a log containing 100 requests, if Eneperf is invoked with --seek 50, it will issue 50 requests from 50 to 100.
--seekrepeat | Used in conjunction with --seek to indicate that Eneperf should start each iteration with the log entry specified by --seek. --seekrepeat only comes into play when the number of iterations specified is greater than one. If so, when Eneperf reaches the end of the log file, --seekrepeat indicates that it should start the next iteration from the log entry specified as a value to --seek (50 in the example above). The default behavior (without --seekrepeat) is to seek only on the first iteration and restart from the beginning of the file on subsequent iterations.
--sleeponerror <secs> | Causes Eneperf to sleep for the specified number of seconds before sending any new requests after it encounters a connection error.
--stats <num reqs> | Print statistics after every <num reqs> requests are processed (sent and received).
--throttle <max req/sec> | Places an approximate limit on the number of requests per second that Eneperf will generate. For more information, see "Setting the number of queries sent to the Dgraph".
--warn <max req time warning threshold (msecs)> | Causes Eneperf to print a warning message for any requests that take longer than the specified threshold time limit to return (useful for finding the slow requests in a log file).
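If you launch Eneperf from a script, assembling the argument list programmatically helps keep the required settings in the documented order. The Python sketch below is one possible way to do that. The function name and keyword names are assumptions for the example; only the flags shown (--stats, --throttle, and --warn) and the five required settings come from the usage described in this chapter.

```python
def build_eneperf_command(host, port, log, num_connections, num_iterations,
                          stats=None, throttle=None, warn_ms=None):
    """Build an eneperf argument list: optional settings first, then required ones."""
    command = ["eneperf"]
    if stats is not None:
        command += ["--stats", str(stats)]        # print statistics every N requests
    if throttle is not None:
        command += ["--throttle", str(throttle)]  # approximate max requests per second
    if warn_ms is not None:
        command += ["--warn", str(warn_ms)]       # warn on requests slower than this
    # Required settings, in order: host, port, log, connections, iterations.
    command += [host, str(port), log, str(num_connections), str(num_iterations)]
    return command
```

The resulting list can be passed to subprocess.run from the Python standard library. Building a list rather than a single string avoids shell quoting problems with log file paths.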

The sections that follow highlight some useful optional settings.

Generating incremental statistics


Use the --stats setting to specify how often statistics are reported: Eneperf prints statistics each time the specified number of requests has been processed. Typical values are 500 or 100. The --reqstats setting provides a finer level of detail.

Generating statistics on the fly


Eneperf can run for hours. If you neglected to set --stats yet want to obtain a statistics printout without stopping the process, you can send Eneperf a USR1 signal. For example, on UNIX, you could use the kill command to send a signal like this:
kill -USR1 pid

Setting the number of queries sent to the Dgraph


By default, Eneperf drives load as fast as the MDEX Engine can handle it. However, there is a setting, --throttle, that allows you to place an approximate limit on the number of queries per second sent to the MDEX Engine. That means you can drive load at a rate you select.

The --throttle setting is useful when you want to approximate a special case. For example, imagine you expect high-traffic load during the holiday season. You want to calculate maximum load, while maintaining a comfortable margin of error for the MDEX Engine by running it at 80% utilization. You might prepare an estimate by multiplying the maximum load by 0.8. Alternatively, you could use --throttle to try different numbers of queries per second and to capture the CPU performance on the MDEX Engine machine, using a tool such as vmstat on Solaris. You could then calculate the average CPU utilization from these numbers, or plot a chart of utilization over time in Microsoft Excel.

The mapping of the --throttle setting to queries per second is not exact. Eneperf uses a simple method to calculate the waiting times to insert between queries. You get a real number of operations per second but it might be significantly lower than you want or expect. The --throttle setting to Eneperf can exceed the maximum throughput of the MDEX Engine and still result in throughput results for the MDEX Engine that are less than its maximum. Once again, you should feel free to experiment.

Obtaining logs for use with Eneperf


In order to use Eneperf, you need a log of URLs in the correct format. The lines in the log file you use with Eneperf should be formatted like these:
/search?terms=blackberry&rank=0&opts=mode+matchall&offset=0&compound=1&irversion=410
/graph?node=0&group=10&offset=0&nbins=10&attrs=All+berry|mode+matchall&dym=1&irversion=410

There are numerous ways that you can obtain such logs; this section provides you with a few examples.

Converting a MDEX Engine request log file


You can convert a MDEX Engine request log file (described in Chapter 14) for Eneperf use with a command like this one:
sed -e '/DGRAPH STARTUP/d' <logfile> | sed -e '/\/admin.*$/d' | sed -e 's|^[^/]*/|/|' > <new logfile>

This does the following:

It deletes DGRAPH STARTUP lines, because these lines contain no commands.
It removes admin requests, such as admin?op=stats or admin?op=exit, that can cause problems in an Eneperf run.
It strips out everything before the first slash (/) character in each remaining line.
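The three steps listed above can also be written as a short Python function, which may be easier to extend than a sed pipeline. This is a sketch, not an Endeca utility: the function name is an assumption, and unlike the sed command it drops lines that contain no slash at all rather than passing them through unchanged.

```python
def to_eneperf_log(lines):
    """Convert MDEX Engine request log lines into Eneperf input lines."""
    converted = []
    for line in lines:
        if "DGRAPH STARTUP" in line:
            continue  # startup lines contain no commands
        if "/admin" in line:
            continue  # admin requests can cause problems in an Eneperf run
        slash = line.find("/")
        if slash == -1:
            continue  # no URL on this line (assumption: safe to drop)
        converted.append(line[slash:].rstrip("\n"))  # keep from the first slash on
    return converted
```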

Creating a log file by hand using substitute search terms


You can also approximate a log file to be used with Eneperf. This method is useful when you don't have a running MDEX Engine and archives of logs to work with. For example, you may want to test the performance of search terms culled from some other system.

To create a log file by hand:


1. Create a list of search terms that you want to test.
2. Copy or create a URL in the appropriate format.
3. Compose a new log file by substituting your search terms into URLs of the correct format.
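The substitution in step 3 can be scripted. The Python sketch below assumes a URL template modeled on the sample /search line shown earlier in this chapter; the template, the function name, and the irversion value are assumptions for the example and should be replaced with a URL copied from your own MDEX Engine request log.

```python
from urllib.parse import quote_plus

# Template modeled on the sample /search line; adjust it to match your own logs.
URL_TEMPLATE = ("/search?terms={terms}&rank=0&opts=mode+matchall"
                "&offset=0&compound=1&irversion=410")


def build_query_log(search_terms, template=URL_TEMPLATE):
    """Substitute each non-empty search term into the template, URL-encoding it."""
    urls = []
    for term in search_terms:
        term = term.strip()
        if term:
            urls.append(template.format(terms=quote_plus(term)))
    return urls
```

quote_plus encodes spaces as plus signs, which matches the way multi-word values appear in the sample URLs.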

Debugging Eneperf
Because it is very lightweight, Eneperf itself is not prone to errors. In general, if you make an error while typing the command line argument, Eneperf returns its help message. However, if you accidentally mistype the MDEX Engine port, Eneperf generates numerous failed connection error messages.

It is also possible for error messages to be displayed during normal operation. For example, if the log file contains a request to retrieve a record that is not present in the MDEX Engine's data set, Eneperf (as expected) presents a 404 (file not found) message.

Note: Queries that cause HTTP errors are not counted towards ops/sec performance results displayed by Eneperf.

Chapter 16

MDEX Engine Statistics and Auditing


The MDEX Engine Statistics page displays MDEX Engine (Dgraph) performance statistics. The MDEX Engine Auditing page tracks usage for licensing and performance purposes. This chapter describes both of these pages. It contains the following sections:

About the MDEX Engine Statistics page
Viewing the MDEX Engine Statistics page
Sections of the MDEX Engine Statistics page
Checking the aliveness of a Dgraph or Agraph

About the MDEX Engine Statistics page


The MDEX Engine Statistics page (also called the Dgraph Stats page or Admin Stats page) is a useful source of information about your Endeca implementation's configuration and performance. It provides a detailed breakdown of what the Dgraph is doing, which allows you to focus your tuning and load-balancing efforts. If you look at this page carefully, you can see where the Dgraph is spending its time. You may want to begin your tuning efforts by identifying the features in the Hot Spot Analysis section with the highest totals.

Viewing the MDEX Engine Statistics page


You can request the MDEX Engine Statistics page with the following URL:
http://DgraphServerNameOrIP:DgraphPort/admin?op=stats

For example, if your Dgraph is running on your local machine and listening on port 8000, specify this:
http://localhost:8000/admin?op=stats

To reset the statistics, make the following request:


http://DgraphServerNameOrIP:DgraphPort/admin?op=statsreset

The source data for the statistics is stored in XML. By default, the MDEX Engine Statistics page is rendered into HTML through an Endeca XSLT stylesheet, stats.xslt, that is installed in the ENDECA_ROOT/conf/dtd/xform directory. If your browser supports XSLT transformations (for example, Internet Explorer 6 and later), you can view the statistics as transformed by stats.xslt or you can modify the shipped stats.xslt stylesheet to provide a different transformation of the data. If your browser does not support XSLT transformations, or if you want to see the raw XML, rename or remove ENDECA_ROOT/conf/dtd/xform/stats.xslt.

Sections of the MDEX Engine Statistics page


The MDEX Engine Statistics page is divided into tabs. Information on all of the tabs is presented through the statistics page's URL. The tabs, which are described in the following sections, are:

Performance Summary
General Information
Index Preparation
Cache
Details

The Performance Summary tab


The Performance Summary tab contains the following sections:

Section | Description
Performance | Various statistics (average, standard deviation, minimum, maximum, and total) on queue length (in multithreaded mode only), number of threads busy (in multithreaded mode only), and time spent waiting in the queue, processing in the Dgraph, writing to the network, and total.
Throughput | Five-minute, one-minute, and ten-second average throughput statistics.
Performance Statistics | Total processing time, number of records (results), and response size (in bytes).
Properties | The usage of memory per property.
Dimensions | The usage of memory per dimension.

The General Information tab


The General Information tab contains the following sections:

Section | Description
Information | Connection details as well as
Arguments | A list of all arguments the Dgraph was started with.

The Index Preparation tab


The Index Preparation tab contains the following section:

Section | Description
Index Preparation | Displays how much time the Dgraph has spent computing sorts and range filters. (This is typically done as background work.)

The Cache tab


The Cache tab contains the following sections:

Section | Description
Main Cache | Total and per-key statistics for the main cache.
Page Cache | Total and per-key statistics for the page cache.

The Details tab


The Details tab contains the following sections:

Section | Description
Most Expensive Queries | The URL and total time in milliseconds for the ten queries with the largest total computation time (that is, queue time plus Dgraph processing time plus write time) made in the session.
Hotspots | Details on the performance of specific features, including navigation, record filter, range filter, merchandising, record search, and snippeting.
Results | The number of result pages served, as well as format performance and byte size by average, standard deviation, minimum, maximum, and total.
Navigation | Information about the number of navigation pages, as well as performance, query size, and result size by average, standard deviation, minimum, maximum, and total.
Record Sorting | The total number of sorts performed, and the percentage of those sorts for each sort type.
Analytics | Information pertaining to the analytics features in Endeca Analytics.
Search | A finer-grained analysis of the performance of individual features.

Note: If you modified the shipped stats.xslt stylesheet, the information might display differently.

Checking the aliveness of a Dgraph or Agraph


You can view the MDEX Engine Statistics page to check whether the MDEX Engine is running and accepting queries, but that comes with some overhead. A quicker way to check the aliveness of a Dgraph or an Agraph is by accessing the following URL:
http://DgraphServerNameOrIP:DgraphPort/admin?op=ping

or
http://AgraphServerNameOrIP:AgraphPort/admin?op=ping

The Dgraph or Agraph quickly returns a lightweight HTML response page with the following content:
dgraph host:port responding at date/time

or
agraph host:port responding at date/time
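A monitoring script can issue the ping request and look for the response text described above. The Python sketch below builds admin URLs of the form shown in this chapter and reports whether the ping response contains "responding at". The function names, the timeout value, and the fetch parameter (which exists only so the check can be exercised without a running Dgraph) are assumptions for the example.

```python
from urllib.error import URLError
from urllib.request import urlopen


def admin_url(host, port, op):
    """Build an admin URL such as http://localhost:8000/admin?op=ping."""
    return "http://{}:{}/admin?op={}".format(host, port, op)


def is_alive(host, port, fetch=urlopen, timeout=5):
    """Return True if the Dgraph or Agraph answers the ping request as expected."""
    try:
        with fetch(admin_url(host, port, "ping"), timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (URLError, OSError):
        return False  # connection refused, timed out, or HTTP error
    return "responding at" in body
```

The same admin_url helper produces the stats, statsreset, and audit URLs described elsewhere in this chapter by changing the op value.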

About the Endeca MDEX Engine Auditing page


The MDEX Engine Auditing page lets you view the output of XML reports that track ongoing usage statistics. These statistics persist through process restarts. This data can be used to verify compliance with licensing terms, and is also useful for tracking product usage.

Viewing the MDEX Engine Auditing page


You can request the MDEX Engine Auditing page with the following URL:
http://DgraphServerNameOrIP:DgraphPort/admin?op=audit

For example, if your Dgraph is running on your local machine and listening on port 8000, specify this:
http://localhost:8000/admin?op=audit

The source data for the auditing reports is stored in XML. By default, the MDEX Engine Auditing page is rendered into HTML through an Endeca XSLT stylesheet, audit.xslt, that is installed in the ENDECA_ROOT/conf/dtd/xform directory.

Audit persistence file details


The naming convention for the audit persistence file is:
audit-<data_prefix>-<agidx_persistence_number>.xml

For example, an audit persistence file on the sample wine implementation might look like this:
audit-wine-0.xml

This convention ensures that each Dgraph creates a unique file. It makes it possible to maintain the audit persistence files for numerous Dgraphs in an application in the same directory without contention.

By default, the audit persistence file is written to a directory called persist that is located in the application's working directory. To direct it elsewhere, use the Dgraph flag --persistdir when you first create the Dgraph. Do not move or rename this directory after it has been created.

You should not delete the audit persistence file or attempt to edit it manually. Upon startup, the Dgraph checks for the presence of this file, and if it cannot find it or read it, it issues a warning message and creates a new one.

Note: If you see such a warning message when you first create a Dgraph, you can safely disregard it.
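For scripts that back up or monitor the audit persistence file, the naming convention above can be captured in a small helper. The Python sketch below is illustrative: the function name and parameter names are assumptions, and the default directory name persist reflects the default location described above, relative to the application's working directory.

```python
import os


def audit_persistence_path(data_prefix, agidx_persistence_number, persist_dir="persist"):
    """Return the expected path of a Dgraph's audit persistence file."""
    file_name = "audit-{}-{}.xml".format(data_prefix, agidx_persistence_number)
    return os.path.join(persist_dir, file_name)
```

Such a helper should only be used to locate the file for reading or copying, since the file itself should not be edited or deleted.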

Sections of the MDEX Engine Auditing page


The MDEX Engine Auditing page consists of two tabs: Audit Stats and General Information. Auditing statistics are gathered in one of two ways:

The Query Load statistic tracks the hour with the most queries in each calendar week, starting when you first run the Dgraph and persisting through process restarts.
All other auditing statistics constantly monitor the peak value over the course of a calendar week, and report the exact time when a value greater than the current peak value appears, starting when you first run the Dgraph and persisting through process restarts. Because these metrics are calculated over the course of a week, a change such as a deleted record is not reflected until the following week, when the peak value count is reset.
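To make the idea of a peak hour concrete, the Python sketch below counts queries per clock hour from a list of epoch time stamps (for example, the first column of a request log) and returns the busiest hour. It illustrates the aggregation only; it is not the Dgraph's own implementation, the function name is an assumption, and it does not model the weekly reset.

```python
from collections import Counter


def peak_hour(query_timestamps):
    """Return (hour_start_epoch, query_count) for the busiest clock hour, or None."""
    # Bucket each time stamp by the start of its hour (3600 seconds).
    counts = Counter(ts - ts % 3600 for ts in query_timestamps)
    if not counts:
        return None
    # Highest count wins; on a tie, the earlier hour is reported.
    hour_start, count = max(counts.items(), key=lambda item: (item[1], -item[0]))
    return hour_start, count
```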

The Audit Stats tab


The Audit Stats tab contains the following information:

Section | Description
Query Load | The peak number of queries per hour that the Dgraph has seen in the past week. In addition to the peak value, this metric also returns the peak interval, expressed as the peak hour (aggregated by hour).
Number of Records | The exact time with the peak number of records in the past week.
Number of Columns | The peak value for the total number of identified properties and dimensions across all records in the Dgraph. This value is calculated over the past week.
Number of Assignments | The total number of populated dimensions or properties for all records. This value is calculated over the past week.
Data Size | The total size, in bytes, of all user data. Note: This may vary, depending on platform and on whether the machine is 32 or 64 bit.

Administrators Guide Chapter 16: MDEX Engine Statistics and Auditing


The General Information tab


The General Information tab contains the following sections:

Information
    Basic connection and machine details.

Arguments
    A list of all arguments the Dgraph was started with.

Note: This tab is identical to the one of the same name on the MDEX Engine Server Statistics page.


Chapter 17

The Forge Logging System


This chapter provides a brief introduction to the Forge logging system. Its command-line interface allows you to focus on the messages that interest you globally and by topic. The chapter consists of the following sections:

Overview of the Forge logging system
About log levels
Logging topics
The command line interface

Overview of the Forge logging system


The Forge logging system provides a logging interface to Forge components. With this system, you can specify the logging level for a component globally or by topic. This allows you to filter logging messages so you can monitor elements of interest at the appropriate granularity without being overwhelmed by messages that are not relevant. A simple command-line interface makes it easy to adjust your logging strategy to respond to your needs. During development, you might be interested in feedback on only the feature you are working on, while in production, you would typically focus on warnings and errors.

About log levels


The log levels used by Forge logging are as follows:

FATAL
    Indicates a problem so severe that you have to shut down.

ERROR
    Non-fatal error messages.

WARN
    Alerts you to any peculiarities the system notes. You may want to address these.

INFO
    Provides status messages, even if everything is working correctly.

DEBUG
    Provides all information of interest to a user.

Logging topics
All log messages are flagged with one or more topics. There are different types for different components, all logically related to some aspect of the component.


Forge logging topics


In Forge, you can specify individual logging levels for each of the following topics:

baseline
update
config
webservice
metrics

The command line interface


You access logging on Forge with the --logLevel option. Its usage is as follows:
--logLevel (<topicName>=)<logLevel>

By selecting a level, you request all feedback at that level of severity and greater. For example, by specifying the WARN level, you receive WARN, ERROR, and FATAL messages.

The --logLevel option sets either the default log level, the topic log level, or both:

The default log level provides global logging for the component. This example
forge --logLevel WARN

logs all WARN level or higher messages. Note: Forge defaults to log all INFO or higher level messages if a default level is not specified.

The topic log level provides logging at the specified level for just the specified topic. This example
forge --logLevel baseline=DEBUG

overrides the default log level and logs all DEBUG messages and higher in the baseline topic.


If two different log levels are specified, either globally or for the same topic, the finer-grained level is used. In the case of this example
forge --logLevel INFO --logLevel WARN

all INFO level messages and higher are printed out.

It is possible to specify both default and topic level logging in the same command to filter the feedback that you receive. For example, the command
forge --logLevel WARN --logLevel config=INFO --logLevel update=DEBUG

works as follows:

It logs all WARN or higher messages, regardless of topic.
It logs any message flagged with the config topic if it is INFO level or higher.
It logs any message flagged with the update topic if it is DEBUG level or higher.
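The filtering rules in this section can be modeled in a few lines. The Python sketch below is not Forge code: the function names parse_log_levels and is_logged are hypothetical, and it assumes that a message is logged when it satisfies either the default level or the level of any topic it is flagged with. Repeated settings resolve to the finer-grained level, and INFO is used when no default is given.

```python
LEVELS = ["DEBUG", "INFO", "WARN", "ERROR", "FATAL"]  # least to most severe


def parse_log_levels(args):
    """Collect --logLevel values into a default rank and per-topic ranks."""
    default, topics = None, {}
    it = iter(args)
    for arg in it:
        if arg != "--logLevel":
            continue
        topic, sep, level = next(it).rpartition("=")
        rank = LEVELS.index(level)
        if sep:
            # topic=LEVEL form: keep the finer-grained (lower) rank per topic.
            topics[topic] = min(rank, topics.get(topic, rank))
        else:
            # Bare LEVEL form sets the default; finer-grained level wins.
            default = rank if default is None else min(rank, default)
    if default is None:
        default = LEVELS.index("INFO")  # default when none is specified
    return default, topics


def is_logged(level, message_topics, default, topics):
    """Return True if a message at this level and with these topics is logged."""
    rank = LEVELS.index(level)
    if rank >= default:
        return True
    return any(t in topics and rank >= topics[t] for t in message_topics)


args = "--logLevel WARN --logLevel config=INFO --logLevel update=DEBUG".split()
default, topics = parse_log_levels(args)
print(is_logged("INFO", ["config"], default, topics))    # config topic allows INFO
print(is_logged("INFO", ["baseline"], default, topics))  # below the WARN default
```

Modeling levels as positions in an ordered list keeps the "this level and greater" rule down to a single comparison.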


Aliasing existing -v levels


The pre-5.1 Forge -v logging option is still supported, but has been changed to alias the new --logLevel option as follows: -v[f|e|w|i|d]. The following list maps the relationships and indicates the status of the arguments in this release (new, supported, or deprecated).

-v
    Status in 5.1: Supported.
    Old meaning: Defaults to v (verbose) or the EDF_LOG_LEVEL environment variable.
    New log level: DEBUG

-vv
    Status in 5.1: Deprecated.
    Old meaning: Verbose (all messages).
    New log level: DEBUG

-vi
    Status in 5.1: Supported.
    Old meaning: Info (info, stat, warnings, and errors).
    New log level: INFO

-va
    Status in 5.1: Deprecated.
    Old meaning: Stat (stat, warnings, and errors).
    New log level: INFO

-vw
    Status in 5.1: Supported.
    Old meaning: Warnings and errors.
    New log level: WARN

-ve
    Status in 5.1: Supported.
    Old meaning: Errors.
    New log level: ERROR

-vq
    Status in 5.1: Deprecated.
    Old meaning: Quiet mode (errors).
    New log level: ERROR

-vs
    Status in 5.1: Deprecated.
    Old meaning: Silent mode (fatal errors).
    New log level: FATAL

-vd
    Status in 5.1: New.
    Old meaning: n/a
    New log level: DEBUG

-vf
    Status in 5.1: New.
    Old meaning: n/a
    New log level: FATAL

-vt
    Status in 5.1: Deprecated.
    Old meaning: Printed out the timestamp when using --legacyLogFormat.
    New log level: Has no effect. The timestamp is always printed now.
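If you maintain scripts that still pass the older -v arguments, the alias relationships above can be captured as a lookup. The Python sketch below is a hypothetical migration helper, not part of Forge: the names V_ALIASES and translate are made up for illustration, and it simply rewrites each alias into the equivalent --logLevel argument.

```python
# Alias table taken from the mapping above; -vt is omitted because it has no effect.
V_ALIASES = {
    "-v": "DEBUG", "-vv": "DEBUG", "-vi": "INFO", "-va": "INFO",
    "-vw": "WARN", "-ve": "ERROR", "-vq": "ERROR", "-vs": "FATAL",
    "-vd": "DEBUG", "-vf": "FATAL",
}


def translate(args):
    """Rewrite legacy -v arguments as --logLevel arguments, leaving others alone."""
    out = []
    for arg in args:
        if arg == "-vt":
            continue  # the timestamp is always printed now, so drop the flag
        if arg in V_ALIASES:
            out += ["--logLevel", V_ALIASES[arg]]
        else:
            out.append(arg)
    return out


print(translate(["-vq", "-vt", "-o", "forge.log"]))
```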

For information about the deprecation state of logging systems used in previous versions of the Endeca software, see the Endeca Migration Guide version 5.1.


Logging output to a file


In Forge, the -o flag defines a location for the logging output file. If you do not specify a location, it logs to standard error. The following snippet shows the start of an output file:
INFO 01/25/07 15:15:50.791 UTC FORGE {config}: forge 5.0.1.52 ("i86pc-win32")
INFO 01/25/07 15:15:50.791 UTC FORGE {config}: Copyright 2001-2007 Endeca Technologies, Inc.
INFO 01/25/07 15:15:50.791 UTC FORGE {config}: Command Line: i86pc-win32\bin\forge.exe
INFO 01/25/07 15:15:50.791 UTC FORGE {config}: Initialized cURL, version: libcurl/7.15.5 OpenSSL/0.9.8
ERROR 01/25/07 15:15:50.791 UTC FORGE {config}: A file name is required!
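If you feed this output into a monitoring tool, each entry can be split into its fields. The Python sketch below is based only on the sample entries shown here, so it rests on several assumptions: that every entry starts with a log level, that the date is in month/day/two-digit-year form, and that multiple topics inside the braces (if they occur) are separated by commas. The names LINE and parse_line are hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from the sample: LEVEL DATE TIME UTC COMPONENT {topics}: message
LINE = re.compile(
    r"^(?P<level>[A-Z]+)\s+"
    r"(?P<stamp>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) UTC\s+"
    r"(?P<component>\S+)\s+"
    r"\{(?P<topics>[^}]*)\}:\s*(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log entry, or None if it does not match."""
    m = LINE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["stamp"] = datetime.strptime(rec["stamp"], "%m/%d/%y %H:%M:%S.%f")
    # Assumption: several topics would be comma-separated inside the braces.
    rec["topics"] = [t.strip() for t in rec["topics"].split(",") if t.strip()]
    return rec


entry = parse_line("ERROR 01/25/07 15:15:50.791 UTC FORGE {config}: A file name is required!")
print(entry["level"], entry["topics"], entry["message"])
```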

Changes to the EDF_LOG_LEVEL environment variable


The EDF_LOG_LEVEL environment variable continues to be supported. This environment variable sets the default Forge log level. If you choose to use EDF_LOG_LEVEL, the variable should be set to one of the new log level names, such as WARN or ERROR. Just as in previous versions of logging, the value set in EDF_LOG_LEVEL may be overridden by any command line argument that changes the global log level.
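The precedence between the command line, EDF_LOG_LEVEL, and the built-in default can be summarized as follows. This Python sketch is a simplified model rather than Forge's actual logic; the function name default_log_level is hypothetical, and it assumes that INFO applies when neither source sets a level.

```python
import os


def default_log_level(cli_level=None, environ=os.environ):
    """Pick the global log level: command line first, then EDF_LOG_LEVEL, then INFO."""
    if cli_level is not None:
        return cli_level  # a command-line setting overrides the environment
    return environ.get("EDF_LOG_LEVEL", "INFO")


print(default_log_level("ERROR", {"EDF_LOG_LEVEL": "WARN"}))
```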


Chapter 18

The Forge Metrics Web Service


You can query a running Forge component for performance metrics using the Forge metrics Web service. This makes Forge easier to integrate into your management and monitoring framework. This chapter describes the Forge metrics Web service. It contains the following sections:

About the Forge metrics Web service
Enabling Forge metrics
Using Forge metrics
The MetricsService interface

About the Forge metrics Web service


The Forge metrics Web service provides progress and performance metrics for Forge. You can use the output of this Web service in the monitoring tool or interface of your choice.

A running instance of Forge hosts a WSDL interface, metrics.wsdl. Using this WSDL interface, you can query Forge for specific information about its performance.

Metrics are hierarchical, with parent-child relationships indicated by their location in the tree. You can either give the service a full path to precisely the information you are seeking, or get the full tree and traverse it to find what you want.

The following is an example of the kind of information tree returned by the Forge metrics Web service.
(Root)
    Start time: Wed Jan 24 14:34:14 2007
    Percent complete: 41.4%
    Throughput: 871 records/second
    Records processed: 24000
    Components
        IndexerAdapter
            Records processed: 24902
            Total processing time: 2.441 seconds
        PropDimMapper
            Records processed: 24902
            Total processing time: 6.983 seconds
        LoadMainData
            Records processed: 24903
            Total processing time: 8.19 seconds

Each metric can be one of three types:

Metric, which serves as a parent category for child metrics, without containing any data of its own.

Attribute metric, such as the start time of the Forge being queried. For each attribute metric you request, you receive ID, Name, and Attribute Value (a string).

Measurement metric, including:
    Estimated percent complete.
    Overall throughput.
    Number of records processed.
    Per-component throughput.
For each measurement metric you request, you receive ID, Name, Measurement Units (a string), and Measurement Value (a number).

Notes:
The Forge metrics Web service does not tell you what step Forge is on or its estimated time to completion.
The service is not long-lived; it exits when Forge does. For this reason, you cannot use this service to find out how long the Forge run took.
The Forge metrics Web service does not work in conjunction with parallel Forge.

Enabling Forge metrics


Before you can generate Forge metrics, you have to tell Forge the port on which to set up the Forge metrics Web service. By doing so, you also turn Forge metrics on.

In the Endeca Application Controller, you set the web-service-port when you provision the Forge component. You can do this three ways:

In Web Studio, on the EAC Administration page.
In a provisioning file used with the eaccmd tool (for details on provisioning a Forge component, see page 158).
Programmatically, via the webServicePort on the ForgeComponentType object. For details, see page 255.

Outside of the Application Controller environment, you can also set or change the Web service port (and thus turn on Forge metrics) at the Forge command line. The command-line argument for setting the metrics port is --wsport <port-number>.


Enabling SSL security


You can enable SSL on the Forge component to make the metrics Web service secure. For information on enabling SSL on the Forge component while provisioning with eaccmd, see page 158. For information on enabling Forge SSL programmatically, see page 255. Note: The Web services port disregards the cipher sub-element of the ssl-configuration element.

Using Forge metrics


Assuming Forge's web-service-port is set, when you start Forge, it serves up the metrics Web service. You can then use any Web services interface to talk to it and request metrics. You can request global information on the parent node, or request on a component-by-component basis. (Each pipeline component has corresponding metrics.) If you request /, the metrics Web service returns the root and all of its children. To refine your request, you give the Web service the path to the node you are interested in.

The MetricsService interface


This section describes the method and classes used by the MetricsService. Note: The metrics schema is defined in metrics.wsdl, which is located in the $ENDECA_ROOT/lib/services directory on UNIX (C:\Endeca\MDEXEngine\5.1.0\lib\services on Windows).


getMetric method
The MetricsService interface consists of a single method, getMetric.

getMetric(MetricInputType getMetricInput)
Lists the collection of metrics in an application.

Parameters:
getMetricInput is a MetricInputType object consisting of a path to the node you want to query and a Boolean setting that allows you to exclude that node's children from the query.

Throws:
MetricFault is the error message returned when the method fails.

Returns:
getMetricOutput, a string collection of metrics.

MetricsService classes
The MetricsService interface contains the following classes:

MetricType
MetricListType
MetricInputType
MetricResultType
AttributeType
MeasurementType

MetricType
A class that describes a metric.


Properties
id is a unique string identifier for the metric.
displayName is the name for the metric, as it appears in the output file.
children is a collection of metric objects.

MetricListType
A class that describes a list of metrics.

Properties
metric is a collection of metrics comprising this MetricListType object.

MetricInputType
A class that describes the input to the getMetric method.

Properties
path is the path to the node you want to query. Null indicates top level, returning the whole tree.
excludeChildren lets you indicate if you want just the metrics of the node specified in path or those of its children too.

MetricResultType
A class that describes the output returned by the getMetric method.

Properties
metric is an object of type MetricType.

AttributeType
An extension of MetricType, the AttributeType class describes an attribute metric.


Properties
value is a string describing the attribute.

MeasurementType
An extension of MetricType, the MeasurementType class describes a measurement metric.

Properties
value is a double representing the value of the measurement metric.
units is a string describing the unit of measure used by the metric.
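To see how these classes fit together, it can help to model the metric tree locally. The Python sketch below is not a Web service client and does not use metrics.wsdl: the classes mirror the properties listed above, while the get_metric function, the slash-separated path format, the node identifiers, and the use of KeyError in place of MetricFault are all assumptions made for illustration.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass
class Metric:
    id: str
    display_name: str
    children: List["Metric"] = field(default_factory=list)


@dataclass
class Attribute(Metric):
    value: str = ""


@dataclass
class Measurement(Metric):
    value: float = 0.0
    units: str = ""


def get_metric(root, path=None, exclude_children=False):
    """Walk the tree along a slash-separated path of ids (assumed format)."""
    node = root
    for part in (path or "").strip("/").split("/"):
        if not part:
            continue  # an empty path or "/" means the root itself
        node = next((c for c in node.children if c.id == part), None)
        if node is None:
            raise KeyError(path)  # stands in for MetricFault in this sketch
    # Optionally return a copy of the node without its children.
    return replace(node, children=[]) if exclude_children else node


# Hypothetical ids; values borrowed from the example tree in this chapter.
root = Metric("root", "(Root)", [
    Attribute("start_time", "Start time", value="Wed Jan 24 14:34:14 2007"),
    Measurement("throughput", "Throughput", value=871.0, units="records/second"),
    Metric("components", "Components", [
        Metric("IndexerAdapter", "IndexerAdapter", [
            Measurement("records", "Records processed", value=24902.0, units="records"),
        ]),
    ]),
])

print(get_metric(root, "components/IndexerAdapter/records").value)
```

Requesting a parent with exclude_children set returns only that node, which corresponds to the excludeChildren setting of MetricInputType.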


Chapter 19

Useful Third-Party Tools


This chapter lists some third-party tools that you may find useful during the Endeca performance monitoring process. It contains the following sections:

Cross-platform tools
Solaris and Linux tools
Solaris-specific tools
Linux-specific tools
Windows tools

Note: The tools listed here are not supported by Endeca and are subject to change. In addition, these suggestions are not meant to overrule your choice of other tools.


Cross-platform tools
The following tools are available in both UNIX and Windows versions.

Ethereal
    Ethereal is an open source network protocol analyzer for both UNIX and Windows. It allows you to examine data from a live network or a capture file on disk. For information and downloads, see http://www.ethereal.com.

Tcpdump/Windump
    Tcpdump (and its Windows version, Windump) are network traffic analysis tools. These tools can be used to watch and diagnose network traffic according to various complex rules. You can download Tcpdump from http://www.tcpdump.org. You can download Windump from http://www.winpcap.org/windump.
    Note: Tcpdump comes with most Linux distributions by default.


Solaris and Linux tools


The following tools are available for both Solaris and Linux.

Netperf
    Netperf is a network benchmarking tool that can be used to measure the throughput of many different types of TCP and UDP connections. Netperf provides tests for both unidirectional throughput and end-to-end latency.
    Note: Be sure to compile netperf with histogram support.
    To simulate the network traffic to an MDEX Engine with average result pages of 50,000 bytes, run netperf like this:
    netperf -l 600 -v 2 -H remotehost -p 8899 -t TCP_CRR -- -r 200,50000
    where:
    -l is the length of the test in seconds
    -v specifies verbose output level
    -H indicates the host where netserver is running
    -p indicates the port that was given to the netserver process
    -t indicates the test to run. TCP_CRR is the TCP test that opens a new TCP connection for each request/response
    -r specifies the request/response characteristics, in this case a 200 byte request (approximately the size of a URL) and a 50K result
    For information and downloads, see http://www.netperf.org.

Top
    Top is a Unix utility that provides a rolling display of top CPU-using processes. It is a popular and common tool for monitoring system-wide process activity. For information and downloads, see http://www.groupsys.com/top.

Sar
    Sar reports system activity on single processor systems. It reports the status of counters in the operating system that are incremented as the system performs various activities. These include counters for CPU utilization, buffer usage, disk I/O activity, TTY device activity, switching and system-call activity, file access, queue activity, inter-process communications, swapping and paging. On Solaris, sar is part of the system activity reporter package. On Linux, it is part of the downloadable sysstat package.

iostat
    The iostat utility iteratively reports terminal, disk, and tape I/O activity, as well as CPU utilization. On Solaris, iostat is built in to the operating system. On Linux, it is part of the downloadable sysstat package (see Linux-specific tools on page 339).

Solaris-specific tools
The following utilities are built into Solaris.

prstat
    On Solaris the prstat command displays information about active processes on the system. By default, prstat displays information about all processes sorted by CPU usage.

cpusar and mpsar
    On multiprocessor machines, cpusar reports per-CPU statistics, and mpsar reports systemwide statistics.

Kstat
    Kstat reports many kernel parameters and statistics.

lockstat
    The lockstat utility gathers and displays kernel locking and profiling statistics. Lockstat allows you to specify which events to watch, how much data to gather for each event, and how to display the data.


Linux-specific tools
The following tools are available for Linux.

sysstat
    The sysstat utilities package is a download for Linux that contains performance monitoring tools such as iostat, sar, and mpstat. Iostat and sar are described in Solaris and Linux tools on page 337; mpstat is described below. For information and downloads, see http://perso.wanadoo.fr/sebastien.godard.

Mpstat
    Mpstat is the Linux multiprocessor load display utility. It displays system processor activity information on your screen for each of the processors serialized on your system.

Windows tools
The following tools are available for Windows.

Task Manager
    The Windows Task Manager provides information about programs and processes running on your computer. It also displays the most commonly used performance measures for processes. You can access the Task Manager by right-clicking an empty area on the task bar on your Windows machine.

Performance Monitor
    The Performance Monitor provides details about the resources used by specific components of the operating system and by programs that have been designed to collect performance data. You can access the Performance Monitor from the Control Panel by selecting Administrative Tools > Performance.

Other performance tools
    Sysinternals (http://www.sysinternals.com) offers useful freeware tools, including the following:
    Process Explorer, which shows you information about which handles and DLLs processes have opened or loaded.
    TCPView, which shows you detailed listings of all TCP and UDP endpoints on your system, including the local and remote addresses and state of TCP connections. On Windows NT, 2000, and XP, TCPView also reports the name of the process that owns the endpoint.


SECTION V
Appendices


Appendix A

Endeca Flag Reference


This appendix provides a description of the options (flags) used by the following Endeca programs:

Agidx options
Agraph options
Dgidx options
Dgraph options
Forge options

IMPORTANT: All options are case-sensitive.

Note: Keep in mind the following terminology equivalences, which may clarify the meaning of some options:

Bin equals record
Model equals dimension
Category equals dimension value
Attribute equals property
Feature link equals precedence rule
Edge equals refinement dimension value

Agidx options
Agidx is a program that runs in a distributed environment. It creates a set of Agidx indices and aggregates the Agraph index with the current data subset.

Usage
Agidx has the following usage:
agidx [-v] [--options] <input db_prefix list> <output db_prefix>

Options
Agidx contains the following options:

--agidx_out <input db_prefix>
    Prefix for output generated previously by Agidx that should now be used as input. This option helps you incrementally build the Agidx index, allowing you to run Agidx against individual data subsets that have been generated by Dgidx.

-v
    Verbose mode.

--out <stdout/stderr file>
    Specify file path to which stdout/stderr should be remapped (default is to use default stdout/stderr for the process).

--version
    Print version information and exit.

--help
    Print this help message and exit.


Agraph options
A distributed configuration requires an additional program called Agraph. The Agraph program is responsible for receiving requests from clients, forwarding the requests to the distributed MDEX Engines, and coordinating the results. From the perspective of the Endeca API, the Agraph program behaves identically to a Dgraph program.

Usage
The Agraph has the following usage:
agraph [-v] [--options] <db_prefix>

Options
The Agraph uses the following options:

-v
    Verbose mode.

--help
    Print this help message and exit.

--back_compat <api-version>
    Enable backwards compatibility, so that the Agraph can communicate with previous versions of the Presentation API. Only the previous two full versions are supported (i.e., 5.0.x and 4.8.x). Therefore, the value for <api-version> must be one of the following:
    500 = for all 5.0.x versions of the API.
    480 = for all 4.8.x versions of the API.

--child <host>:<port>
    Specify the location of a child Dgraph or Agraph process.

--config <filename>
    Specify a configuration file to read on startup. The configuration file should contain arguments of the same format used on the command line (that is, it ignores whitespace, including newlines).

--fork
    (UNIX only) Causes the Agraph to fork off a new process to handle each request.

--fork-max <max-fork-children>
    (UNIX only) Set the maximum number of live child processes in --fork mode. Default value is 4.

--log <path>
    Change the path for the request log file (./agraph.reqlog is the default value).

--net-closetimeout
    Set default maximum wait time (in seconds) for client connection shutdown. The default value is 1 second.

--net-timeout
    Specify the maximum number of seconds the Agraph waits for the client to download data across the network. The default network timeout value is 30 seconds.

--nodnscache
    Disable caching of hostname to IP number lookups for child Dgraphs. By default, the Agraph caches these name lookups to improve performance.

--noimplicit
    Disable inclusion of implicit refinement dimension values in computed refinement sets. Implicit refinements are dimension values that are assigned to all records in the current result set, and whose selection therefore does not narrow the results.

--nomerch
    Do not process merchandising (that is, dynamic business rule) results from children. These are processed by default.

--no-partial
    Do not return results if any child fails to respond.

--out <stdout/stderr file>
    Specify file path to which stdout/stderr should be remapped. (The default is to use default stdout/stderr for the process.)

--pidfile <pidfilename>
    Specify the file to write the process ID (pid) to. If unspecified, the default name of the pid file depends on how the Agraph starts. Running the Agraph in a Control System environment (deprecated) or from the command line creates a default named agraph.pid. Running the Agraph in an Endeca Application Controller environment creates a default named agraph-S0-R0.pid.

--port <num>
    Specify the port that the Agraph listens to for user queries on the associated host. Default is 8888.

--radius <num>
    Specify initial record list radius (tuning parameter; the default is 100).

--stat-brel
    Create dynamic record properties indicating the relevance rank assigned to record search results.

--version
    Print version information and exit.


Dgidx options
The Dgidx program indexes the tagged Endeca records that were prepared by Forge, and creates the proprietary indices for the Endeca MDEX Engine.

Usage
The usage of Dgidx is as follows:
dgidx [-cCqvS] [--options] <data export file> <output db_prefix>

Options
Dgidx contains the following options:

-q
    Quiet mode.

-v
    Verbose mode.

--compoundDimSearch
    Enable compound dimension search for the application. Use of this option increases indexing time. However, if this option is not enabled at index time, compound dimension results (multiple-dimension-value results) are not returned by the MDEX Engine.

--cov
    Compute and report coverage statistics for dimensions and properties.

--equivopt
    Compute dimension value equivalence classes as a space-saving optimization. This adds time to the indexing phase, but reduces the size of the index. The default is to search leaf assignments only.

--help
    Print the help message and exit.

--keepcats
    Deprecated. Do not delete unused dimension values from the system.
    Note: From version 4.8, this behavior is now the default.

--lang <lang-id>
    Assume all documents are in the specified language. The default for <lang-id> is en. For details, see the Internationalized Data section in the Endeca Developers Guide.

--latin1
    Ignore character accents when indexing text. Use ISO Latin 1 character mappings for international characters when performing search indexing. Note that the accents are folded down before indexing, so only a single form is indexed.

--ngram_min <value>
    If --wildcard indexing is enabled, specifies the minimum text substring length to index. Generally, this value should not be modified (default is 1).

--nostrictattrs
    Disable strict attribute checking. Allows records to retain property values for properties with no property (or <PROP_REF> element) defined in the navigation configuration file, and in the Properties view of Developer Studio.

--noxmlvalidate
    Do not do XML validation while reading the XML export file. This option only makes a difference if the export file is in XML format.

--numbins <num>
    Limit the number of records that Dgidx reads.

--offline
    Compute the inverted index using the slower but more memory-efficient offline method.

--offline_tmpn <num>
    Specify the number of tmp files that should be used during offline indexing of the inverted index (the default is 8).

--out <stdout/stderr file>
    Specify file path to which stdout/stderr should be remapped (the default is to use default stdout/stderr for the process).

--pos_backcompat
    Deprecated. Instructs Dgidx to create positional indexes using the 4.8.x configuration model. This flag is provided as a migration convenience to replicate 4.8.x indexing behavior. To replicate 4.8.x behavior for creating positional indexes, run the flag and also select Enable positional indexing for a dimension or property. If you run the flag and do not select Enable positional indexing for a dimension or property, then Dgidx does not create a positional index. (In versions 5.0.0 and higher, Dgidx creates a positional index by default for each dimension and property.)

--sort <spec>
    Specify a default sort specification for the data set. The format of <spec> is:
    key|dir
    where key is the name of a property or dimension on which to sort and dir is either asc for ascending or desc for descending (if not specified, the order will be ascending). key can also be a geocode property, as in this example:
    Location(43,73)|desc
    You can specify multiple sort keys in the format:
    key_1[|dir_1]||key_2[|dir_2]||...||key_n[|dir_n]
    If you specify multiple sort keys, the records are sorted by the first sort key, with ties being resolved by the second sort key, whose ties are resolved by the third sort key, and so on.

--spellmode <mode>
    Specify the spelling correction mode for the application. Supported modes are:
    default
    aspell
    espell
    aspell_OR_espell
    aspell_AND_espell
    For details, see the Using Spelling Correction and Did You Mean section in the Endeca Developers Guide.

--spellnum
    In spelling modes that enable the espell module, include non-word terms (numbers, symbols, and so on) in the espell dictionary. By default, such terms are not included. For details, see the Using Spelling Correction and Did You Mean section in the Endeca Developers Guide.

--tmpdir <dir>
    Specify the path to a temporary directory to be used to perform offline sorts (for --offline mode only). The default directory is <output db_prefix>.

--verbose-language-mapping
    Report which record properties are mapped to which languages.

--version
    Print version information and exit.

--wildcard
    Deprecated. Instead, build needed indexes to support wildcard text searches.
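The --sort specification format described in this table can be checked before it is passed to Dgidx. The following Python sketch is a hypothetical helper, not part of any Endeca tool: the function name parse_sort_spec and the property names Price and Name are made up for illustration, and it assumes that key names never contain the | character.

```python
def parse_sort_spec(spec):
    """Split a Dgidx-style sort spec into (key, direction) pairs."""
    keys = []
    for part in spec.split("||"):            # sort keys are separated by ||
        key, _, direction = part.partition("|")
        direction = direction or "asc"       # ascending when dir is omitted
        if not key or direction not in ("asc", "desc"):
            raise ValueError("invalid sort key: %r" % part)
        keys.append((key, direction))
    return keys


print(parse_sort_spec("Price|desc||Name"))
print(parse_sort_spec("Location(43,73)|desc"))
```

The resulting list is in priority order: the first pair is the primary sort key, and each later pair resolves ties left by the ones before it.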


Dgraph options
You start the MDEX Engine by running a program called Dgraph, and pointing it at a set of indices prepared by the Data Foundry. The Dgraph has a number of options that allow you to adjust the MDEX Engine (for example, you can tweak spelling, caching, and so forth). Usage The usage of Dgraph is as follows:
dgraph [-?Adv] [--options] <db_prefix>

Options The Dgraph contains the following options: Option


-? -A

Description
Print the help message and exit. Disallow server shutdown/restart through admin URLs. Start in debug mode. Verbose mode. Print information about each request to stdout. Compute counts for root dimension values and any intermediate dimension value selections. This matches the default behavior in earlier versions of the MDEX Engine. By default, the Dgraph only computes refinement counts for proper refinements (in other words, for actual refinement dimension values). It does not compute counts for root dimension values or for any intermediate dimension value selections.

-d -v

--ancestor_counts

Administrators Guide Appendix A: Endeca Flag Reference

Endeca Confidential

353

--back_compat <api-version>
Enable backwards compatibility, so that the Dgraph can communicate with previous versions of the Presentation API. Only the previous two full versions are supported (that is, 5.0.x and 4.8.x). Therefore, the value for <api-version> must be one of the following:
500 = for all 5.0.x versions of the API.
480 = for all 4.8.x versions of the API.

--backlog-timeout <time in seconds>
Specify the wait limit for a query that has been read and queued for processing. After n seconds spent waiting in the process queue, the Dgraph responds with a timeout message. The default is 60 seconds.

--cmem <MB>
Specify the maximum memory usage in MB for the MDEX Engine main cache. When --cmem is not specified, a default size of 256 MB is used.

--config <path>
Specify a configuration file to read on startup. The configuration file should contain arguments of the same format used on the command line (that is, it ignores whitespace, including newlines).

--csrch_nnav
Deprecated. Instead, use the SEARCH_INERT_DVALS attribute of DIMSEARCH_CONFIG in your project's Dimsearch_config.xml file. Allow non-navigable dimension values (such as dimension roots) in dimension value search results. Normally, these dimension values are dynamically filtered out of dimension value search results.

--deadends
Allow dead-end refinement options.
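The whitespace rule for --config files can be illustrated with a short Python sketch. This is not Endeca code: the function name and the sample file contents are hypothetical, and the sketch assumes that arguments contain no quoting or embedded spaces, which the description above does not address.

```python
def read_dgraph_config(text):
    """Split config-file text into an argument list.

    Whitespace, including newlines, only separates arguments.
    Assumption: no quoting and no embedded spaces in values.
    """
    return text.split()


# Hypothetical file contents; the flags themselves come from the table above.
sample = """
--cmem 512
--backlog-timeout 30   --deadends
"""

args = read_dgraph_config(sample)
print(args)
```

Under this reading, a file with one flag per line and a file with every flag on a single line yield the same argument list.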

--disable_fast_aspell
Disable fast mode for the aspell spelling module. Disabling fast mode decreases the performance of spelling correction, but may allow additional queries to be corrected. When fast mode is enabled, it can significantly speed up applications that use spelling correction features with the aspell module. Fast mode is used by default.

--dtag <data-tag>
Specify the data tag to send with all result XML objects. The default is to use <db_prefix> as the data tag.

--dym
Enable "did you mean" explicit query spelling suggestions for fulltext search queries.

--dym_hthresh <thresh>
Specify the threshold number of hits at or above which "did you mean" suggestions will not be generated. The default is 20.

--dym_nsug <count>
Specify the maximum number of "did you mean" query suggestions to return for any query. The default is 1.

--dym_sthresh <thresh>
Specify the threshold spelling correction score for words used by the "did you mean" engine. The default is 175.

--edebug
Deprecated. Enable refinement verbose/debugging messages.

--esamp <num>
Deprecated. Specify the maximum number of records to sample during refinement computation. The default is 64 records. Larger values can improve edge-ranking quality, but may reduce performance.

--esampmin <num>
Specify the minimum number of records to sample during refinement computation. The default is 0. Larger values can improve dynamic refinement ranking quality but may reduce performance.

--esampr <num>
Deprecated. For dynamically-ranked dimensions, specify the maximum number of dimension values per top N refinements above which the refinement algorithm will reduce to exclusively bottom-up mode. The default is 4.

--ethresh <num>
Deprecated. Specify the threshold number of records below which strict bottom-up refinement LCA computation is used. The default is 128 records. This is a performance tuning parameter. The best value depends on the structure and scale of the input data set.

--ftrnk_thrsh <thresh>
Deprecated. Set a threshold number of hits above which exact/substring based rankings will not be computed. The default is no threshold.

--help
Print the help message and exit.

--implicit_exact
Disable the default approximate computation of implicit refinements. This option is not a recommended setting. If this option is not enabled, dimension values without full coverage of the current result record set may sometimes be returned as implicit refinements, although the probability of such false implicit refinements is minuscule.

--implicit_sample
Specify the maximum number of records to sample per query. The default is 1024. In approximate computation mode (default), this parameter allows a trade-off between performance and the likelihood of incorrect implicit refinements being returned. In implicit_exact mode, this option is simply a performance tuning parameter that can be used to trade off record sampling work for index access work.

--lang <lang-id>
Assume all queries are in the specified language. The default is en. For details, see the "Internationalized Data" section in the Endeca Developer's Guide.

--latin1
Ignore character accents when handling search requests, and use ISO Latin 1 character mappings when processing search requests.

--log <path>
Specify the path for the Dgraph request log file. If unspecified, the default name of the request log file depends on how the Dgraph starts. Running the Dgraph in a Control System environment (deprecated) or from the command line creates a default log file named dgraph.reqlog. Running the Dgraph in an Endeca Application Controller environment creates a default log file named dgraph-S0-R0.log.

--log_stats <path>
Specify the path and filename for the Endeca Query Language statistics log. By default, this log is turned off; specifying this flag will activate logging of statistics for Endeca Query Language requests.

--log_stats_thresh <value>
Set the threshold above which statistics information for an Endeca Query Language request will be logged. The value is specified in milliseconds (1000 milliseconds = 1 second). The value can also be specified in seconds by adding a trailing s to the number, such as 1s for 1 second. The default is 60000 milliseconds (1 minute). Note that this flag is dependent on the --log_stats flag being used.

--memusage
Show memory usage of the data structures of the Dgraph.

--merch_debug
Deprecated. Display verbose debugging messages during merchandising rule processing.
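The two accepted forms of the --log_stats_thresh value can be made concrete with a small Python sketch. The function is hypothetical and only models the rule stated above; it assumes whole numbers, since the description does not say whether fractional values are accepted.

```python
def parse_log_stats_thresh(value="60000"):
    """Return the threshold in milliseconds.

    A bare number is read as milliseconds; a trailing "s" means seconds.
    The default of 60000 ms (1 minute) comes from the flag description.
    Assumption: whole numbers only.
    """
    value = value.strip()
    if value.endswith("s"):
        return int(value[:-1]) * 1000
    return int(value)


print(parse_log_stats_thresh("1s"))
print(parse_log_stats_thresh("250"))
print(parse_log_stats_thresh())
```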

--net-close-timeout
Specify the maximum wait time (in seconds) for client connection shutdown. The default value is 1 second.

--net-timeout
Specify the maximum number of seconds the Dgraph waits for the client to download data across the network. The default network timeout value is 30 seconds.

--noctrct
Do not return information about implicit dimensions with node results when displaying refinements in navigation results. This flag lets you optimize performance for applications where it is not necessary to present the implicit dimensions to the users in navigation results. If you specify this flag, the MDEX Engine still computes the implicit dimensions with node results, but they are not included in the navigation results that are displayed to the users.

--nomrf
Disable filtering for dynamic business rules.

--out <stdout/stderr file>
Specify the file path to which stdout/stderr should be remapped (the default is to use default stdout/stderr for the process). Running the Dgraph in an Endeca Application Controller environment creates a default file named dgraph-S0-R0.out.

--pcmem
Specify the maximum memory usage in MB for the page cache. The default is 32 MB.

--perf_msg
Deprecated. Display verbose performance debugging messages during core Dgraph navigation computations.

--persistdir
Direct the Dgraph audit persistence file (written by default to a directory called persist that is located in the application's working directory) to a directory of your choice. For details about the audit persistence file, see page 317.
IMPORTANT: Use the --persistdir flag only when you first create the Dgraph. Do not move or rename this directory after it has been created.

--pidfile <pidfile-path>
Specify the file to write the process ID (pid) to. The default is ./dgraph.pid.

--port <num>
Specify the port to use in server (non-interactive) mode. The default is 5555.

--relrnk_help
Deprecated. Causes the Dgraph to display a help message describing the set of available dynamic search relevance ranking modules, and the syntax for specifying relevance ranking strategies. For details on the relevance ranking modules, see the "Using Relevance Ranking" section in the Endeca Developer's Guide.

--relrnk_msg
Deprecated. Display verbose information about relevance ranking during search query processing.

--relrnk_cats <relrank-str>
Deprecated. Set the relevance ranking strategy for dimension value search.

--rflt_msg
Deprecated. Display verbose information about record filter performance.

--search_max <num>
Specify the maximum number of terms for text search. The default is 10.

--snip_cutoff <num words>
Limit the number of words in a property that the MDEX Engine evaluates to identify the snippet. If a match is not found within <num words> words, the MDEX Engine does not return a snippet, even if a match occurs later in the property value. If <num words> is unspecified, the default is 500. (The 500-word default applies even if the flag is not specified at all in the Dgraph options.)

--snip_disable
Globally disable snippeting.

--spellpath <path>
Specify the location of spelling data files. The parameter should be a full path to a directory containing the needed aspell support files for spelling correction features (see the --dym, --spl, and --spld options). Note that this path must be an absolute path (relative paths are not supported). In addition, this is a path to a directory containing at least the generic pspell/aspell support files. This does not need to be the same as the location of the .spelldat file for the indexed data set. The Dgraph typically requires write permissions in this directory, unless a correct or writable .pwli file is already available in this directory.

--spell_bdgt <num>
Set the maximum number of variants considered for spelling and "did you mean" correction (the default is 32).

--spell_glom
Allow cross-property suggestions, and count cross-property matches when evaluating the frequencies of suggestions. Normally, suggestions must match results in a single property value.

--spell_msg
Deprecated. Enables verbose output for spelling correction features.

--spell_nobrk
Disable word-break analysis in the suggestion engine. Normally, in addition to considering spelling corrections, the suggestion engine considers alternate word separation points for the query to generate suggestions for "did you mean" and auto-correct.

--spl
Enable auto-suggest spelling corrections for fulltext search.

--spl_hthresh <thresh>
Specify the minimum number of hits at or above which auto-correct suggestions will not be generated for full text searches. The default is 1, meaning that if there are one or more hits for a user's full text search, then auto-correct does not provide spelling suggestions. Stated differently, if you use the default of 1 and there are zero (0) hits for a user's search, then spelling auto-correct does engage and provides suggestions for alternate keyword spellings.

--spl_nsug <count>
Specify the maximum number of auto-correct suggestions to return for any full text search query. The default is 1.

--spl_sthresh <thresh>
Specify the threshold spelling correction score for words used as auto-correct suggestions. The default is 125.

--srchfltr
Deprecated. Filter dimension values out of term search results when an ancestor dimension value is already available in the search results.

--sslcertfile <certfile-path>
Specify the path of the eneCert.pem certificate file that will be used by the Dgraph to present to any client for SSL communications. If not given, SSL is not enabled for Dgraph communications.
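The "at or above" wording of the hit thresholds is easy to misread, so here is a minimal Python sketch of the comparison. The function name is hypothetical; the default of 1 for --spl_hthresh is stated above, and the default of 20 for --dym_hthresh is taken from that flag's entry earlier in this table.

```python
def should_suggest(hits, hit_threshold):
    """Suggestions are generated only when hits fall below the threshold."""
    return hits < hit_threshold


SPL_HTHRESH_DEFAULT = 1    # --spl_hthresh default (auto-correct)
DYM_HTHRESH_DEFAULT = 20   # --dym_hthresh default ("did you mean")

for hits in (0, 1, 19, 20):
    print(hits,
          should_suggest(hits, SPL_HTHRESH_DEFAULT),
          should_suggest(hits, DYM_HTHRESH_DEFAULT))
```

With the defaults, auto-correct engages only for zero-hit searches, while "did you mean" suggestions remain possible for any search returning fewer than 20 hits.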

--sslcafile <CA certfile-path>
Specify the path of the eneCA.pem Certificate Authority file that the Dgraph will use to authenticate SSL communications with other Endeca components. If not given, SSL mutual authentication is not performed.

--sslcipher <cipher-list>
Set one or more cipher names (such as RC4-SHA) that specify the minimum cryptographic algorithm that the Dgraph will use during the SSL negotiation. If multiple ciphers are specified, the names must be separated by colons.

--stat-all
Enable all available dynamic dimension value attributes. Note that this option has performance implications and is not intended for production use.

--stat-bins-cutoff <nbins>
Set the cutoff for record counts. Once there are this many records associated with a refinement dimension value, the record count algorithm will stop and return this number or a number higher than it.

--stat-bins-thresh <thresh>
Set the threshold for stat-bins (that is, the maximum number of records above which record counts will not be computed). By default, stat-bins runs with no threshold.

--stat-brel
Create dynamic record attributes indicating the relevance rank assigned to fulltext search result records.

--stat-bwgt
Create dynamic record attributes indicating the weight for all records returned.

--stat-rel
Create dynamic dimension value attributes indicating the relevance ranking score (for dimension value search results).

--stat-srnk
Deprecated. Create dynamic dimension value attributes indicating the static rank for all dimension values.

--syslog
Direct all output to syslog.

--thesaurus_cutoff <limit>
Set a limit on the number of words in a user's search query that are subject to thesaurus replacement. The default value of <limit> is 3. This means that up to 3 words in a user's search query can be replaced with thesaurus entries. If there happen to be more terms in the query that match thesaurus entries, say terms 4 and 5, then terms 4 and 5 are not replaced by thesaurus expansion. This option is intended as a performance guard against very expensive thesaurus queries. Lower values improve thesaurus engine performance. For more information, see the "Using Stemming and Thesaurus" section in the Endeca Developer's Guide.

--thesaurus_msg
Deprecated. Enables verbose output for the thesaurus engine.

--thesaurus_multiword_nostem
Specify that words in a multiple-word thesaurus form should be treated like phrases and should not be stemmed, which will increase performance for some query loads. Single-word terms will be subject to stemming regardless of whether this flag is specified. This flag prevents the Dgraph from expanding multi-word thesaurus forms by stemming. Thesaurus entries continue to match any stemmed form in the query, but multi-word expansions only include explicitly listed forms. To get the multi-word stemmed thesaurus expansions, the various forms must be listed explicitly in the thesaurus.
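The effect of --thesaurus_cutoff can be pictured with a toy Python model. It is not the MDEX Engine algorithm: the thesaurus entries are invented, and the sketch assumes the limit counts matching terms in query order, which is one reading of the description above.

```python
def expand_query(terms, thesaurus, cutoff=3):
    """Return (term, alternatives) pairs for a query.

    Only the first `cutoff` terms that match a thesaurus entry are
    expanded; later matching terms pass through with no alternatives.
    """
    expanded = []
    used = 0
    for term in terms:
        if term in thesaurus and used < cutoff:
            expanded.append((term, thesaurus[term]))
            used += 1
        else:
            expanded.append((term, []))
    return expanded


# Hypothetical thesaurus and query, for illustration only.
thesaurus = {"car": ["auto"], "tv": ["television"], "couch": ["sofa"],
             "pc": ["computer"], "cell": ["mobile"]}
result = expand_query(["car", "tv", "couch", "pc", "cell"], thesaurus)
for term, alternatives in result:
    print(term, alternatives)
```

With the default limit of 3, the fourth and fifth matching terms are left unexpanded, which is the behavior the flag description gives for "terms 4 and 5".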

--threads <num>
Specify the number of query threads. If the specified value is 0, the Dgraph runs in non-threaded mode. If the specified value is greater than 0, the Dgraph runs in threaded mode, executing the specified number of query threads. The default is 0 (non-threaded). In threaded mode, additional threads are also started to execute internal maintenance tasks.

--tmpdir <dir>
Specify the path to a temporary directory to be used to hold temporary files (the default is the base directory of db_prefix).

--unctrct
Instruct the Dgraph not to compute implicit dimensions, and to compute and present only explicitly specified dimensions, when displaying refinements in navigation results. Specifying this flag does not reduce the size of the resulting record set that is being displayed.
Be aware that if you use this flag, you need to make top-level precedence rules work for ALL outbound queries in order to receive meaningful navigation refinements. (Because the Dgraph does not compute implicit dimensions, it also no longer uses precedence rules for all queries, which it otherwise does by default.) You can make top-level precedence rules work for all your queries by appending the ID of the root of the primary dimension to the navigation state on each outbound query (for example, use N=xxx instead of N=0 in your query). If you do not do this, some of your queries may return meaningless refinement options.
Specifying this flag lets you improve run-time performance of the MDEX Engine. For more information on ways of improving the run-time performance of the MDEX Engine, see the "Displaying refinement dimension values" section in the Endeca Performance Tuning Guide.

--updatedir <dir>
Specify the directory into which completed partial update files will be placed. Partial update files are also read from this directory. For more information, see the "Implementing Partial Updates" section in the Endeca Information Transformation Layer Guide.

--updatelog
Specify the file for update-related log messages. If unspecified, the default name of the update log file depends on how the Dgraph starts. Running the Dgraph in a Control System environment (deprecated) or from the command line creates a default file named dgraph.updatelog. Running the Dgraph in an Endeca Application Manager environment creates a default file named dgraph-S0-R0-update.log.

--updateverbose
Show verbose messages while processing updates.

--validate_data
Validate that all indexed data loads and then exit.

--version
Print version information and exit.

--wb_maxbrks
In word-break analysis, specify the maximum number of breaks to insert or remove per query. The default is 1.

--wb_minbrklen
In word-break analysis, specify the minimum length of a new word-break term. The default is 2.

--wb_noibrk
In word-break analysis, disable word-break insertion analysis.

--wb_norbrk
In word-break analysis, disable word-break removal analysis.

--wildcard_approx <mode>
Enable approximate wildcard search query matching, which is faster than default exact wildcard matching, but may return some false positive matches. (Use larger values of --ngram_max at indexing time to decrease the likelihood of false positives with this option.) Supported values for the <mode> parameter are:
all = enables approximate matching for all wildcard queries.
offline = enables approximate matching only for offline document content (which is typically the most expensive to query with wildcards).

--wildcard_max <count>
Specify the maximum number of terms that can match a wildcard term in a dictionary-based wildcard query. The default is 100.

--wildcard_msg
Deprecated. Enable verbose output for wildcard search matching.

--whymatch
Enable computation of "why did it match" dynamic record attributes returned as results of fulltext search queries. These dynamic attributes contain a copy of the property/dimension key and value that caused the match, along with query interpretation notes (spelling, thesaurus, and so on).

--whymatchConcise
Similar to --whymatch, but produces more concise dynamic attribute values containing only the property/dimension key and query interpretation notes. This is useful when the property value might include large amounts of text, such as document contents. For more information on both options, see the "Using Why Did It Match" section in the Endeca Developer's Guide.

--wordinterp
Enable computation of word interpretation dynamic supplement (or "see-also") objects, which report on alternate forms of user query terms considered by the text search engine while processing fulltext (record) search requests. For more information, see the "Using Word Interpretation" section in the Endeca Developer's Guide.
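As a rough illustration of how several Dgraph flags combine, the Python sketch below assembles an argument list without executing anything. The helper name and the paths are hypothetical, and the sketch assumes the executable is named dgraph and that the <db_prefix> argument follows the flags; neither is confirmed by this part of the reference.

```python
def build_dgraph_argv(db_prefix, options):
    """Build an argument list from a mapping of flag -> value.

    A value of True emits the bare flag; any other value is emitted as
    a separate argument after the flag. Nothing is executed here.
    """
    argv = ["dgraph"]  # assumption: executable name
    for flag, value in options.items():
        argv.append(flag)
        if value is not True:
            argv.append(str(value))
    argv.append(db_prefix)  # assumption: data prefix comes last
    return argv


argv = build_dgraph_argv(
    "/data/myapp/dgidx_output/myapp",  # hypothetical path
    {"--port": 8000, "--threads": 2, "--spl": True, "--dym": True,
     "--log": "/logs/dgraph.reqlog"},
)
print(" ".join(argv))
```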

Forge options
The Forge program transforms your raw data into tagged Endeca records. Forge references the information in the pipeline you create with Developer Studio to perform its transformations.

Usage
The usage of Forge is as follows:
forge [-bcdinov] [--options] <Pipeline-XML-File>

<Pipeline-XML-File> can be a relative path or use the file://[hostname]/ protocol.

Options
Forge takes the following options:

-b <cache-num>
Specify the maximum number of records that the record caches should buffer. This may be set individually in the Maximum Records field of the Record Cache editor in Developer Studio.

-c <name=value>
Forge has a set of XML entity definitions whose values can be overridden at the command line, such as current_date, current_time, and end_of_line. You can specify a replacement string for the default entity values using the -c option, or in an .ini file specified with -i (described below). The format is:
<configValName=configVal>
For example:
end_of_line=\n
which would be specified on the command line with:
-c end_of_line=\n
or included as a line in an .ini file specified with -i.
This allows you to assign pipeline values to Forge at the command line. In the above example, you would specify &end_of_line; in your pipeline file instead of hard-coding \n, then invoke Forge with the -c option shown above. Forge would substitute \n whenever it encountered &end_of_line;. For a complete list of entities and their default values, see the ENTITY definitions in Endeca_Root/conf/dtd/common.dtd.

-d <dtd-path>
Specify the directory containing DTDs (overrides the DOCTYPE directive in XML).

-i <ini-filename>
Specify an .ini file that contains XML entity string replacements. Each line must be in this form:
<configValName=configVal>
See the description of the -c option for details.

-n <parse-num>
Specify the number of records to pull through the pipeline. This option is ignored by the record cache component.
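The entity override mechanism behind -c and -i can be modeled in a few lines of Python. This is a plain text-replacement sketch, not Forge's XML entity handling: the function names and the sample pipeline fragment are hypothetical, and letting -c pairs win over .ini lines is an assumption, since the descriptions above do not state a precedence.

```python
def parse_overrides(cli_pairs, ini_text=""):
    """Collect name=value overrides from .ini lines and -c arguments."""
    overrides = {}
    # Assumption: -c pairs are applied last, so they win over .ini lines.
    for line in ini_text.splitlines() + list(cli_pairs):
        line = line.strip()
        if not line:
            continue
        name, _, value = line.partition("=")
        overrides[name] = value
    return overrides


def substitute(pipeline_text, overrides):
    """Replace each &name; reference with its override value."""
    for name, value in overrides.items():
        pipeline_text = pipeline_text.replace("&" + name + ";", value)
    return pipeline_text


overrides = parse_overrides([r"end_of_line=\n"])
# Hypothetical pipeline fragment that references the entity.
print(substitute('<SEP VALUE="&end_of_line;"/>', overrides))
```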

-o <filename>
Specify an output file for messages.

-v[f|e|w|i|d]
Set the global log level. See --logLevel for corresponding information. If the -v option is omitted, the global log level defaults to d (DEBUG) or the value set in the EDF_LOG_LEVEL environment variable. If the -v option is used without a level, it defaults to d (DEBUG).
f = FATAL messages only.
e = ERROR and FATAL messages.
w = WARNING, ERROR, and FATAL messages.
i = INFO, WARNING, ERROR, and FATAL messages.
d = DEBUG, INFO, WARNING, ERROR, and FATAL messages.
Note: Options -v[a|q|s|t|v] have been deprecated. See the Migration Guide for more details.

--client <server:port>
Run as a client and connect to a server in a parallel environment.

--clientNum <num>
Instruct the server to use <num> instead of assigning a client number. Useful when the client number must remain consistent (that is, it must start from zero and be sequential for all clients). Requires the --client option.

--combineWarnCount <num>
Specify the number of records that can be combined (via a Combine join or a record cache with the Combine Records setting enabled) before issuing a warning that performance may be slow. The default is 100; a value of 0 disables the warnings.

--compression <num> | off
Instruct Forge to compress the output to a level of <num>, which is 0 to 9 (where 0 = minimum, 9 = maximum). Specify off to turn off compression.

--connectRetries <num>
Specify the number of retries (-1 to 100) when connecting to the server. The default is 12; a value of -1 means retry forever. Requires the --client option.

--dbRecCache <value>
Deprecated. Specify the global disk-backed record cache setting (<value> is either NONE or IN_MEMORY_INDEX).

--encryptKey [user:]<password>
Encrypt a key pair so that only Forge can read it. For details on this option, see the "Implementing the Endeca Crawler" section in the Endeca Data Foundry Guide.

--help [option]
Print full help if used with no options. Print specific help with these options (option names and arguments are case sensitive):
expression = Prints help on expression syntax.
expression:TYPE = Prints help on the syntax for a specific expression type, which can be DVAL, FLOAT, INTEGER, PROPERTY, STREAM, STRING, or VOID.
config = Prints help on configuration options.

--idxCompression [<num> | off]
Set the compression of the IndexerAdapter output to a level of <num>, which is 0 to 9 (where 0 = minimum, 9 = maximum). Specify off to turn off compression.

--ignoreState
Instruct Forge to ignore any state files on startup. The state files are ignored only during the startup process. After startup, Forge creates state files during an update and overwrites the existing state files.

--indexConfigDir <path>
Instruct Forge to copy index configuration files from the specified directory to its output directory.

--inputDir <path>
Instruct Forge to load input data from this directory. <path> must be an absolute path and will be used as a base path for the pipeline. Any relative paths in the pipeline will be relative to this base path.
Note: If the pipeline uses absolute paths, Forge ignores this flag.

--input-encoding <encoding>
Deprecated. Specify the encoding of non-XML input files.

--javaArgument <java_arg>
Prepend the given Java option to the Java command line used to start a Java virtual machine (JVM).

--javaClasspath <classpath>
Override the value of the Class path field on the General tab of the Record adapter, if one is specified. If the Record adapter has a Format setting with JDBC selected, then Class path indicates the JDBC driver. If the Record adapter has a Format setting with Java Adapter selected, then Class path indicates the absolute path to the custom record adapter's .jar file.

--javaHome <java_home>
Specify the location of the Java runtime engine (JRE). This option overrides the value of the Java home field on the General tab of a Record adapter, if one is specified. The --javaHome setting requires Java 2 Platform Standard Edition 5.0 (also known as JDK 1.5.0) or later.

--logDir <path>
Instruct Forge to write logs to this directory, overriding any directories specified in the pipeline.

--logLevel (<topicName>=)<logLevel>
Set the global log level and/or topic-specific log level. If this option is omitted, the value defaults to INFO or to that set in the EDF_LOG_LEVEL environment variable. For corresponding information, see the -v option. For more information about Forge logging, see page 321.
Possible log levels are:
FATAL = FATAL messages only.
ERROR = ERROR and FATAL messages.
WARNING = WARNING, ERROR, and FATAL messages.
INFO = INFO, WARNING, ERROR, and FATAL messages.
DEBUG = DEBUG, INFO, WARNING, ERROR, and FATAL messages.
Possible topics for Forge are:
baseline
update
config
webservice
metrics

--noAutoGen
Do not generate new dimension value IDs (for incremental updates when batch processing is running).

--numClients <num>
The number of clients connecting. Required with the --server option.
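The two forms of the --logLevel argument, and the cumulative meaning of the levels, can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes level and topic names must match the spellings listed above exactly.

```python
# Ordered from least to most verbose, as in the level list above.
LEVELS = ["FATAL", "ERROR", "WARNING", "INFO", "DEBUG"]
TOPICS = {"baseline", "update", "config", "webservice", "metrics"}


def parse_log_level(arg):
    """Parse "<logLevel>" or "<topicName>=<logLevel>".

    Returns (topic, level); topic is None for the global level.
    """
    topic, sep, level = arg.rpartition("=")
    topic = topic if sep else None
    if level not in LEVELS:
        raise ValueError("unknown log level: " + level)
    if topic is not None and topic not in TOPICS:
        raise ValueError("unknown topic: " + topic)
    return topic, level


def emitted(level):
    """Message classes written when the given level is in effect."""
    return LEVELS[: LEVELS.index(level) + 1]


print(parse_log_level("INFO"))
print(parse_log_level("baseline=DEBUG"))
print(emitted("WARNING"))
```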

--numPartitions <num>
Specify the number of Dgidx instances available to Forge. This number corresponds to the number of Dgraphs, which in turn corresponds to the number of file sets Forge creates. This option overrides the value of the NUM_IDX attribute in the ROLLOVER element of your project's Pipeline.epx file, if one is specified.

--outputDir <path>
Instruct Forge to save output data to this directory, overriding any directories specified in the pipeline.

--outputPrefix <prefix>
Override the value specified in the Output prefix field of the Indexer Adapter or Update Adapter editors in your Developer Studio pipeline.

--perllib <dir>
Add <dir> to Perl's library path. May be repeated.

--pidfile <pidfile-path>
Specify the file in which to store the process ID (PID).

--printRecords [number]
Print records as they are produced by each pipeline component. If number is specified, start printing after that number of records have been processed.

--retryInterval <num>
Specify the number of seconds (0 to 60) to sleep between connection attempts. The default is 5. Requires the --client option.

--server <portNum>
Run as a server and listen on the specified port. Requires the --numClients option.

--spiderThrottle <wait>:<expression_type>:<expression>
During a crawl, throttle the rate at which URLs are fetched by the spider, where:
<wait> is the fetch interval in seconds.
<expression_type> specifies the type of regular or host expression to use:
url-regex
url-wildcard
host-regex
host-wildcard
<expression> is the corresponding expression.
Example: --spiderThrottle 10:url-wildcard:*.html
This would make all URLs that match the wildcard *.html wait 10 seconds between fetches.
Note: This flag controls the Endeca Crawler, not the Endeca Advanced Crawler. See the Endeca Data Foundry Guide for details about the Endeca Advanced Crawler.

--sslcafile <CAcertfile-path>
Specify the path of the eneCA.pem Certificate Authority file that the Forge server and Forge clients will use to authenticate each other.

--sslcertfile <certfile-path>
Specify the path of the eneCert.pem certificate file that will be used by the Forge server and Forge client for SSL communications.

--sslcipher <cipher>
Set a cipher string (such as RC4-SHA) that specifies the minimum cryptographic algorithm the Forge server/client will use during the SSL negotiation.
Note: This setting is ignored by the --wsport flag, even when it uses SSL to secure its communications.

--stateDir <path>
Instruct Forge to persist data in this directory, overriding any directories specified in the pipeline.
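The three-part --spiderThrottle argument can be taken apart as in the Python sketch below. The function name is hypothetical, and treating only the first two colons as separators (so that the expression itself may contain colons, as URLs usually do) is an assumption of this sketch rather than documented Forge behavior.

```python
EXPRESSION_TYPES = {"url-regex", "url-wildcard", "host-regex", "host-wildcard"}


def parse_spider_throttle(arg):
    """Split "<wait>:<expression_type>:<expression>" into its parts.

    Only the first two colons act as separators, so the expression
    may itself contain colons.
    """
    wait, expression_type, expression = arg.split(":", 2)
    if expression_type not in EXPRESSION_TYPES:
        raise ValueError("unknown expression type: " + expression_type)
    return int(wait), expression_type, expression


print(parse_spider_throttle("10:url-wildcard:*.html"))
```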

--tmpDir <path>
Instruct Forge to write temporary files in the specified directory, overriding any directories specified by environment variables. The <path> value is interpreted as being based in Forge's working directory, not in the directory containing Pipeline.epx.

--time <comp>
Timing statistics (comp = time each component).

--timeout <num>
Specify the number of seconds (from -1 to 300) that the server waits for clients to connect. The default is 60; a value of -1 means wait forever. Requires the --server option.

--version
Print out the current version information.

--wsport <portNum>
Start the Forge metrics Web service, which is off by default. It listens on the port specified.
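Several of the parallel Forge flags depend on one another or accept only a limited range. The Python sketch below collects those constraints from the option descriptions in this table into a simple checker. The function and the flag mapping are hypothetical; Forge performs its own validation, which may differ.

```python
# flag -> flag it requires, as stated in the option descriptions
REQUIRES = {
    "--clientNum": "--client",
    "--connectRetries": "--client",
    "--retryInterval": "--client",
    "--timeout": "--server",
    "--server": "--numClients",
}

# flag -> inclusive (min, max) range from the option descriptions
RANGES = {
    "--connectRetries": (-1, 100),
    "--retryInterval": (0, 60),
    "--timeout": (-1, 300),
}


def check_forge_flags(flags):
    """Return a list of problems found in a flag -> value mapping."""
    problems = []
    for flag, needed in REQUIRES.items():
        if flag in flags and needed not in flags:
            problems.append(flag + " requires " + needed)
    for flag, (low, high) in RANGES.items():
        if flag in flags and not low <= int(flags[flag]) <= high:
            problems.append(flag + " must be between %d and %d" % (low, high))
    return problems


print(check_forge_flags({"--server": 9000, "--numClients": 2, "--timeout": 120}))
print(check_forge_flags({"--connectRetries": 200}))
```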

Appendix B

Endeca IAP Ports


This appendix lists the ports used by the Endeca Information Access Platform and their default port numbers. Here, "default" means the port numbers used in the configuration files that ship with the reference implementation. You can replace any of the default port numbers with numbers of your own, as long as they do not conflict with an existing port on your machine.

Default Ports

Port                                        Default
Endeca MDEX Engine, user query port         8000
Endeca Logging and Reporting Server port    8002
Endeca Control System JCD port              8088
Endeca HTTP service port                    8888
Endeca HTTP service shutdown port           8090

Note: The Log Server port number can be no larger than 32767.
Note: The JCD is deprecated in this release.
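When replacing the defaults, two of the constraints above lend themselves to a quick check: the Log Server limit of 32767 and the need to avoid port conflicts. The Python sketch below uses hypothetical key names for the five ports and only detects clashes among the Endeca ports themselves, not with other services on the machine.

```python
# Default values from the table above; the key names are hypothetical.
DEFAULT_PORTS = {
    "mdex_query": 8000,
    "log_server": 8002,
    "jcd": 8088,
    "http_service": 8888,
    "http_shutdown": 8090,
}
LOG_SERVER_MAX = 32767


def check_ports(ports):
    """Return a list of problems with a name -> port mapping."""
    problems = []
    if ports["log_server"] > LOG_SERVER_MAX:
        problems.append("log_server port exceeds %d" % LOG_SERVER_MAX)
    seen = {}
    for name, port in ports.items():
        if port in seen:
            problems.append("%s and %s both use port %d" % (seen[port], name, port))
        seen[port] = name
    return problems


print(check_ports(DEFAULT_PORTS))
print(check_ports(dict(DEFAULT_PORTS, log_server=40000, jcd=8000)))
```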

Appendix C

About the Baseline Update Script


The baseline update script is an EAC script you can use to run a simple baseline update. The baseline update script works as follows:

1. The script copies the project configuration files from Web Studio to the host running Forge, using emgr_update.pl.
2. It runs Forge, and then copies the post-Forge dimensions back to Web Studio, again using emgr_update.pl.
3. The script copies the Forge output files onto the host running the Indexer, and then runs the Indexer.
4. It brings down the Dgraph if it is running, copies the Indexer output files, and uses emgr_update.pl to download associated application rules, redirects, and thesaurus entries.
5. The script starts the Dgraph.
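The order of operations described above can be sketched as a small Python outline. This is not the shipped Java script: the function, the step labels, and the run callback are hypothetical stand-ins for the real EAC calls, and the sketch only records the sequence.

```python
def baseline_update(dgraph_running, run):
    """Call run(step) for each step, in the order described above."""
    run("copy project configuration from Web Studio to the Forge host")
    run("run Forge")
    run("copy post-Forge dimensions back to Web Studio")
    run("copy Forge output files to the Indexer host")
    run("run the Indexer")
    if dgraph_running:
        run("stop the Dgraph")
    run("copy Indexer output files")
    run("download application rules, redirects, and thesaurus entries")
    run("start the Dgraph")


steps = []
baseline_update(dgraph_running=True, run=steps.append)
for number, step in enumerate(steps, start=1):
    print(number, step)
```

Passing a callback keeps the ordering logic separate from whatever actually performs each step, which makes the sequence easy to inspect or dry-run.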

A version of the baseline update script is included with the Endeca software, and is stored in %ENDECA_ROOT%\bin\baseline-update.bat for Windows ($ENDECA_ROOT/bin/baseline-update.sh for UNIX). You can copy and modify the underlying Java code as needed, as described in the following section.

Note: Before editing the baseline update script, Endeca recommends using the Endeca Deployment Template. The Endeca Deployment Template is a collection of operational components that provides a starting point for development and application deployment, and is a free download from the support site. For more information about the Endeca Deployment Template, see "Using the Endeca Deployment Template" on page 190.

Editing the baseline update script


The baseline update script runs a very simple baseline update that starts Forge, the Indexer, and the Dgraph in succession. You can modify the existing baseline update script to suit your own needs, by copying and editing its source files. The script is written in Java. Note the following details:

- The script source tree is installed as part of the Endeca reference implementation, and can be found in %ENDECA_REFERENCE%\eac_scripts on Windows, or $ENDECA_REFERENCE/eac_scripts on UNIX (where ENDECA_REFERENCE stands for the location of the reference implementations).
- The executable files for the script are stored in %ENDECA_ROOT%\bin (Windows) or $ENDECA_ROOT/bin (UNIX); they depend on the eacscript.jar file in %ENDECA_ROOT%\lib\java (Windows) or $ENDECA_ROOT/lib/java (UNIX).

You can generate your own version of the eacscript.jar file by modifying the source files in the reference implementation.

Prerequisites for using the baseline update script


In order to use the baseline update script as-is, an application must meet the following criteria:

- The application has an EAC Agent running on the Web Studio host.
- The application contains a provisioned host named webstudio. That host must be specified with a fully qualified host name.
- The application has exactly one Forge component provisioned. Any additional ones are ignored.
- The application has exactly one Indexer component provisioned. Any additional ones are ignored.
- The application has exactly one Dgraph component provisioned. Any additional ones are ignored.

The baseline update script itself must be provisioned. For details on provisioning EAC scripts, see Defining scripts in your provisioning file on page 154.
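The prerequisites above can be expressed as a simple checklist. The following Python sketch is hypothetical: the dictionary layout, its key names, and the function `check_prerequisites` are invented for this example and do not correspond to the EAC provisioning format or API. Treating any host name that contains a dot as fully qualified is also a simplifying assumption.

```python
# Component types named in the prerequisites (labels as used in the prose).
REQUIRED_COMPONENT_TYPES = ("Forge", "Indexer", "Dgraph")


def check_prerequisites(app):
    """Return a list of unmet prerequisites for a dict describing an application.

    The dict layout is hypothetical and exists only for this illustration.
    """
    problems = []
    hostname = app.get("hosts", {}).get("webstudio")
    if hostname is None:
        problems.append("no provisioned host named webstudio")
    elif "." not in hostname:
        # Crude stand-in for "fully qualified": require at least one dot.
        problems.append("webstudio host name is not fully qualified")
    if not app.get("agent_on_webstudio", False):
        problems.append("no EAC Agent running on the Web Studio host")
    types = [component["type"] for component in app.get("components", [])]
    for required in REQUIRED_COMPONENT_TYPES:
        # Extra components of the same type are ignored by the script,
        # so only a missing type counts as a problem here.
        if required not in types:
            problems.append(f"no {required} component provisioned")
    if not app.get("script_provisioned", False):
        problems.append("baseline update script is not provisioned")
    return problems
```

An empty result means the described application satisfies every item on the list. Each string in a non-empty result names one criterion that still has to be met.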

Running the baseline update script


Details on starting, stopping, and obtaining status for scripts can be found in the following places:

- Component and script control commands are located on page 201 of Using the Eaccmd Tool
- The ScriptControl interface is located on page 238 in the Endeca Application Controller API Interface Reference
- Information about using scripts in Web Studio is located in the Web Studio Help

Index

AddComponentType class 242, 270, 271 AddHostType class 242, 243 adding cookies to the preview application 93 Agidx component, Endeca Application Controller 168, 243 Agidx flags 344 AgidxComponentType class 243 Agraph checking aliveness of 315 showing status in Web Studio 43 Agraph component, Endeca Application Controller 170, 244 Agraph flags 345 AgraphChildListType class 244 AgraphComponentType class 244 Application Controller. See Endeca Application Controller Application element, defining 151 application provisioning, in Web Studio 38 ApplicationIDListType class 245 ApplicationType class 246 archiving log files 116 audience for this guide xxi

auditing, viewing for MDEX Engine 316 authentication configuration parameter for LDAPLoginModule 67 AuthHttpENEConnection enableSSL method 129 automatically scheduling report generation 108

B
BackupMethodType class 246 baseline update script 377 BatchStatusType class 247

C
canonical paths 157 certificates location of Java keystore 68 changing the preview application in Web Studio 90 checkPasswords configuration parameter for LDAPLoginModule 68 classes AddComponentType 242 AddHostType 242 AgidxComponentType 243 AgraphChildListType 244

AgraphComponentType 244 ApplicationIDListType class 245 ApplicationType 246 BackupMethodType 246 BatchStatusType 247 ComponentListType 247 ComponentType 248 CrawlerComponentType 248 DgidxComponentType 250 DgraphComponentType 251 DgraphHostPortType 252 DgraphReferenceType 252 EACFaultMessage 253 FlagIDListType 254 ForgeComponentType 255 FullyQualifiedComponentIDType 256 FullyQualifiedFlagIDType 256 FullyQualifiedUtilityTokenType 257 HostListType 257 HostType 258 ListApplicationIDsInput 258 LogServerComponentType 259 RemoveComponentType 261 RemoveHostType 261 ReportGeneratorComponentType 262 RunBackupType 264 RunFileCopyType 265 RunRollbackType 265 RunShellType 266 RunUtilityType 266 SSLConfigurationType 268 StatusType 269 TimeRangeType 269 TimeSeriesType 270 command options for Endeca programs, specifying 31 ComponentControl interface 219 ComponentListType class 247 ComponentType class 248 connection setting for Eneperf 304

cookie name for Web Studio 92 cookies adding to the preview application 93 Cpusar performance analysis tool 338 Crawler component, Endeca Application Controller 174, 248 CrawlerComponentType class 248 custom application for Web Studio See preview application

D
defining scripts 154 deleting log files 116 deleting outdated reports 116 Developer Studio See Endeca Developer Studio Dgidx showing status in Web Studio 43 specifying command options from Web Studio 31 Dgidx component, Endeca Application Controller 161, 250 Dgidx, flags 348 DgidxComponentType class 250 Dgraph checking aliveness of 315 See MDEX Engine Dgraph component, Endeca Application Controller 165, 251 Dgraph request log See MDEX Engine request log Dgraph Stats page See MDEX Engine Statistics page Dgraph, flags 352 DgraphComponentType class 251 DgraphHostPortType class 252 DgraphReferenceType class 252 dgraph.reqlog MDEX Engine request log file 292 directories, provisioning on a host 153 dynamic business rules deploying with emgr_update 283 use of preview application in defining 90

E
EAC Agent, introduced 139 EAC Central Server, introduced 138 EAC scripts baseline updates 377 defined 154 editing 378 report generation 109 EAC. See Endeca Application Controller eaccmd about 192 Archive utility 212 component commands 201 Copy utility 205 incremental provisioning commands 196–199 provisioning 183 provisioning commands 195–196 Shell utility 204 synchronization commands 200–201 usage 194 utility commands 202–214 EACFaultMessage class 253 emgr_update utility deploying Web Studio changes 283 overview 277 Endeca Access Control System, configuring 131 Endeca Application Controller adding a component or host 187 Agents 139 architecture diagram 139 ComponentControl interface 219 EAC Central Server 138 HTTPS security in 144 introduced 27, 138 Java WSDL tool interpretation 241
.NET WSDL tool interpretation 241 Provisioning interface 229 provisioning overview 150 removing a component or host 188 starting and stopping 145 starting and stopping on Windows 145 starting from inittab 145 Synchronization interface 220 Utility interface 223 WSDL overview 218 Endeca Application Controller provisioning file Agidx component 168, 243 Agraph component 170, 244 aliasing hosts 152 Crawler component 174, 248 defining components 153 defining hosts 152 Dgidx component 161, 250 Dgraph component 165, 251 Forge component 158, 255 LogServer component 178, 259 ReportGenerator component 179, 262 Endeca Deployment Template 189 Endeca Developer Studio 27 about additional tasks 33 changing to another Web Studio 30 retrieving the project configuration from Web Studio 33 specifying command options for Endeca programs 31 Endeca HTTP service changing port 45 Endeca Presentation API HttpENEConnection 296 Endeca Standard Application accessing main page 122 attribute for title 125 attribute for URL address 124

enabling SSL 128 enabling user authentication 130 file-based authentication 131 LDAP authentication 131 location of WAR 121 login page 132 overview 120 Tomcat installation 126 URL.External property, use of 121 WebLogic server installation 133 Endeca tools Developer Studio 27 overview 26 Web Studio 26 Endeca Web Studio 26–27 audience for 26 changing port 45 configuring the preview application 94–95 cookie name 92 customizing the navigation menu 72 downloading instance configuration 42 navigation results page 93 record page 94 user permissions 50 endeca_standard.xml file used for Standard Application 124 ENE URL parameter mapping 296 Eneperf debugging 310 generating statistics 308 introduced 302 logs for use with 309 optional settings 305 required settings 303 running locally 304 running remotely 304 setting the number of queries 308 usage 302 Ethereal performance analysis tool 336
F
FileLoginModule for Standard Application 131 FlagIDListType class 254 Forge showing status in Web Studio 43 specifying command options from Web Studio 31 Forge component, Endeca Application Controller 158, 255 Forge hierarchical logging introduced 321 Forge, flags 366 ForgeComponentType class 255 FullyQualifiedComponentIDType class 256 FullyQualifiedFlagIDType class 256 FullyQualifiedUtilityTokenType class 257

G
generate-report.bat 109 generate-reports.bat running the script 111 source tree, editing 111, 378 generating reports with the report generation script 109 groupPath configuration parameter for LDAPLoginModule 66 groupTemplate configuration parameter for LDAPLoginModule 66

H
host setting for Eneperf 303 HostListType class 257 HostType class 258

I
implementing logging and reporting in Web Studio 100 inittab, starting the Endeca Application Controller from 145 instance configuration downloading from Web Studio 42 retrieving from Web Studio with Developer Studio 33 instrumenting the preview application 92 invalid characters in provisioning 151 Iostat performance analysis tool 338 iteration setting for Eneperf 304

J
JAAS framework for Access Control System 130 Java keystore configuring location 68 Java keystore file, creating 128 Javascript domain for preview application 91

K
keyStoreLocation configuration parameter for LDAPLoginModule 68 keyStorePassphrase configuration parameter for LDAPLoginModule 68

L
LDAP authentication rebinding 67 LDAP server configuration for multiple servers 69 configuring SSL 68 ldapBindAuthentication configuration parameter for LDAPLoginModule 67 LDAPLoginModule authentication configuration parameter 67 checkPasswords configuration parameter 68 groupPath configuration parameter 66 groupTemplate configuration parameter 66 keyStoreLocation configuration parameter 68 keyStorePassphrase configuration parameter 68 ldapBindAuthentication configuration parameter 67 passwordAttribute configuration parameter 68 serverInfo configuration parameter 69 serviceAuthentication configuration parameter 67 servicePassword configuration parameter 67 serviceUsername configuration parameter 67 userPath configuration parameter 66 useSSL configuration parameter 68 LDAPLoginModule for Standard Application 131 Linux performance tools 339 Linux sysstat package 339 ListApplicationIDsInput class 258 Lockstat performance analysis tool 338 log file (eneperf) creating 309 settings 304 Log Server about 99 archiving log files 116 monitoring 99 monitoring in Web Studio 43 provisioning 101 rolling logs 99 settings 102 starting 103 stopping and starting from Web Studio 40

using the Log Server command line 99 logging and reporting introduced 98 login page for Standard Application 132 logs to be used with Eneperf 309 LogServer component, Endeca Application Controller 178, 259 LogServerComponentType class 259

M
MDEX Engine auditing statistics, viewing 316 checking aliveness of 315 making SSL connection 129 showing status in Web Studio 43 specifying command options from Developer Studio 31 specifying command options from Web Studio 31 statistics, resetting 312 statistics, viewing 312 stopping and starting from Web Studio 40 MDEX Engine Auditing page viewing 316 MDEX Engine parameter mapping 296 MDEX Engine request log converting for use with Eneperf 309 extracting information 294 file format 292 introduced 292 MDEX Engine Statistics page about 312 presentation transformed with XSLT 312, 317 sections of 313 viewing 312 viewing raw XML 312 Mpsar performance analysis tool 338 Mpstat performance analysis tool 339

N
navigation results page, instrumenting 93 Netperf performance analysis tool 337

P
passwordAttribute configuration parameter for LDAPLoginModule 68 performance, tuning 289 Perl guidelines, for MDEX Engine request log 294 ping, to check Dgraph or Agraph 315 port for Endeca HTTP service and Web Studio, changing 45 port setting for Eneperf 303 prerequisites for the baseline update script 378 prerequisites to logging and reporting 98 preview application described 90 enabling and disabling its display in Web Studio 95 instrumenting 92 Javascript domain 91 requirements 90–92 Process Explorer performance analysis tool 340 properties, adding to hosts and components 154 provisioning adding properties to hosts and components 154 an Endeca Application Controller implementation 150 directories on a host 153 incremental 185 invalid characters in 151 multi-machine 183 report generation script, in Web Studio 112 scripts 154 the provisioning file 150 the Provisioning interface 229 using Endeca Deployment Template 189 using XML entities 153 Prstat performance analysis tool 338

R
rebinding for LDAP authentication 67 record page, instrumenting 94 reference implementations, scripts 155 RemoveComponentType class 261 RemoveHostType class 261, 262 report generation script 109 adding in Web Studio 112 defined 109 running 111 Report Generator about 99 about reports 100 automatic scheduling of 108 creating HTML reports 115 enabling the display of reports 109 monitoring status in Web Studio 43 provisioning 103 settings 104 specifying report frequency 107 starting 106 ReportGenerator component, Endeca Application Controller 179, 262 ReportGeneratorComponentType class 262 reports customizing content and appearance 115 deleting 117 generated from control scripts 116 reports, viewing in Web Studio 113 request log See MDEX Engine request log
retrieving the instance configuration from Web Studio, with Developer Studio 276 retrieving the Web Studio project configuration from Developer Studio 33 root Application element, defining 151 RunBackupType class 264 RunFileCopyType class 265 RunRollbackType class 265 RunShellType class 266 RunUtilityType class 266

S
Sar performance analysis tool 338 scripts baseline update 377 developing 155 editing the report generation script 111 environment variables 156 high-level workflow for report generation 111 preparing to use in Web Studio 111 provisioning 156 report generation script 109 using canonical paths in 157 serverInfo configuration parameter for LDAPLoginModule 69 server.xml file used for Endeca HTTP service 45 serviceAuthentication configuration parameter for LDAPLoginModule 67 servicePassword configuration parameter for LDAPLoginModule 67 serviceUsername configuration parameter for LDAPLoginModule 67 Solaris performance tools 338 specifying command options in Developer Studio 31 specifying report frequency 107

SSL configuring LDAP server 68 enabling connection to MDEX Engine 129 enabling for Standard Application 128 SSLConfigurationType class 268 starting the Endeca Application Controller 145 statistics setting to Eneperf 308 statistics, viewing for MDEX Engine 312 stats.xslt file 312, 317 StatusType class 269 stopping the Endeca Application Controller 145 Synchronization interface 220

T
Task Manager performance analysis tool 339 Tcpdump performance analysis tool 336 TCPView performance analysis tool 340 terminology equivalences 343 thesaurus entries, deploying 283 third-party performance tools cross-platform 336 Solaris and Linux 337 Windows 339 throttle setting to Eneperf 308 TimeRangeType class 268, 269 TimeSeriesType class 270 title of Standard Application, changing 125 Tomcat application server importing keystore file 129 installation of Standard Application 126 JAAS framework 130 JAAS login configuration file 131 Top performance analysis tool 337 transferring the instance configuration from the staging to production environment 277 tuning performance 289

U
updates running baseline 40 URL address for Standard Application, changing 124 URL mappings for the preview application 95 URL.External property for Standard Application 121 user authentication for Standard Application 130 user entitlement filter LDAPLoginModule configuration parameter 66 user permissions in Web Studio 50 user roles in Web Studio predefined 51 user defined 52 userPath configuration parameter for LDAPLoginModule 66 useSSL configuration parameter for LDAPLoginModule 68 Utility interface 223

W
WAR for Standard Application, location of 121 Web Studio retrieving the instance configuration 276 See Endeca Web Studio viewing reports in 113 Web Studio extensions and URL tokens 81 configuring 77 defined 77 enabling 80 theming 85 token-based authentication 82 troubleshooting 86 WebLogic application server, installing Standard Application on 133 Windows Task Manager 339 third-party performance tools 339 Windump performance analysis tool 336 WSDL Endeca Application Controller 218 special ID types 218

X
XML entities, using in your provisioning file 153 XSLT, transforming MDEX Engine statistics 312, 317
