
IDOL server

version 5.x - revision 4

Administrator's Guide

Information in this document is subject to change without notice. No part of this document may be reproduced or
transmitted in any form or by any means, electronic or mechanical, for any purpose, without the express permission
of Autonomy Systems Ltd.
Windows is a trademark of Microsoft Corp., UNIX is a trademark of X/OPEN Ltd.

Copyright 2005 Autonomy. All rights reserved.

Autonomy Dashboard, Autonomy Desktop Suite, DAH, DIH, DiSH, IAS, IDOL server, Portal-in-a-Box and Retina
are trademarks of Autonomy Systems Ltd.

Table of Contents
Preface................................................................................................................................................ i
Autonomy ................................................................................................................................... i
Contact ...................................................................................................................................... ii
Downloading manual updates from Automater .........................................................................iii
Typographical conventions ........................................................................................................iii
Related documentation ............................................................................................................ iv

Before you begin


1. Autonomy infrastructure ........................................................................................................ 1
IDOL server ............................................................................................................................... 3
Connectors ................................................................................................................................ 3
Interfaces ................................................................................................................................... 3
Distributed systems ................................................................................................................... 3
Administration ............................................................................................................................ 4
PODS ........................................................................................................................................ 4
Data flow and security ............................................................................................................... 5

2. An introduction to IDOL server .............................................................................................. 7
IDOL server operations ............................................................................................................. 7
System architecture ................................................................................................................. 14
General .............................................................................................................................. 14
Indexing and querying ....................................................................................................... 15
Security .............................................................................................................................. 16
IDOL server functionality matrix .............................................................................................. 18

3. Installing IDOL server ........................................................................................................... 23
System requirements .............................................................................................................. 23
Installing IDOL server on Windows ......................................................................................... 25
Directory structure ............................................................................................................. 28
Installing IDOL server on UNIX ............................................................................................... 32
Directory structure ............................................................................................................. 35
Deploying Retina to your application server ............................................................................ 39
Licensing ................................................................................................................................. 40
Important ........................................................................................................................... 40
Displaying license information ........................................................................................... 41
Revoking a client license ................................................................................................... 42
Forcibly revoking licenses from inaccessible clients ......................................................... 43
Troubleshooting licensing errors ....................................................................................... 44
Distributing IDOL server .......................................................................................................... 46
Distributing IDOL server: An example ............................................................................... 47
Upgrading to IDOL server ....................................................................................................... 50
Step 1: Export data ............................................................................................................ 50
Step 2: Copy categories, taxonomies and clusters to IDOL server 5 ................................ 54
Step 3: Copy users to IDOL server 5 ................................................................................. 54
Step 4: Import data into IDOL server 5 .............................................................................. 55
Step 5: Synchronize IDOL server 5 ................................................................................... 56

4. Running IDOL server ............................................................................................................ 59
Starting and stopping IDOL server .......................................................................................... 59
Starting IDOL server .......................................................................................................... 59
Stopping IDOL server ........................................................................................................ 60
Sending action commands to IDOL server ............................................................................. 61
Displaying online help ....................................................................................................... 61
Action command syntax .................................................................................................... 62

5. Before storing content in IDOL server ................................................................................ 63
Storing content ........................................................................................................................ 64
Disabling content storage .................................................................................................. 64
Storing IDOL server's data files on multiple disks ............................................................. 64
Allocating files to IDOL server databases ......................................................................... 65
Setting up field indexing .......................................................................................................... 67
Indexing XML attributes ..................................................................................................... 69
Configuring IDOL server to process required languages ........................................................ 71
Optimizing indexing ................................................................................................................. 72
The indexing process ........................................................................................................ 72
Delayed synchronization ................................................................................................... 72
Processing data before indexing it .......................................................................................... 73
Setting up tasks to process data before indexing .............................................................. 74
Example: Automatic generation of titles for documents before indexing ........................... 75
Example: Automatic generation of titles for documents before indexing into a different IDOL server ....................................................................................................................... 76
Example: Categorizing data and adding it to a legacy SQL database ............................. 78
Example: Simple routing of documents according to type ................................................ 79
Example: Advanced routing of documents according to type ........................................... 81

6. Storing content in IDOL server ............................................................................................ 83
Index commands ..................................................................................................................... 84
DREADD: directly indexing IDX and XML files .................................................................. 84
DREADDDATA: indexing data over a socket .................................................................... 94
Adding metadata to documents after indexing ...................................................................... 103
Indexing hyphenated terms ................................................................................................... 104
Using Reference fields to eliminate duplicate copies of documents during indexing ............ 105
Checking if the indexing process was successful ................................................................. 106
Tracking documents through the import and indexing process ............................................ 108

7. Storing users in IDOL server ............................................................................................. 109
Creating users ....................................................................................................................... 110
Integrating with a third party user structure ........................................................................... 111

8. Setting up security .............................................................................................................. 113

9. Checking that IDOL server is running correctly .............................................................. 117
Executing GetRequestLog actions ........................................................................................ 117
Executing a GetLicenseInfo action ..................................................................................... 118
Executing a GetStatus command ......................................................................................... 118
Using the Autonomy Service Dashboard .............................................................................. 118

IDOL server operations


10. Agents .................................................................................................................................. 121
Creating an agent .................................................................................................................. 121
Editing an agent .................................................................................................................... 122
Retraining an agent ............................................................................................................... 122
Querying with an agent ......................................................................................................... 122
Copying an agent .................................................................................................................. 123
Viewing an agent's details ..................................................................................................... 123
Deleting an agent .................................................................................................................. 123
11. Alerting ................................................................................................................................. 125
Alerting users to new content ................................................................................................ 125
Writing templates for alert emails .................................................................................... 127
12. Categorization ..................................................................................................................... 129
Creating a hierarchical category structure ............................................................................. 130
Creating categories from scratch ..................................................................................... 130
Creating categories from clusters .................................................................................... 131
Creating categories from legacy topic sets ...................................................................... 131
Creating categories by copying categories ...................................................................... 132
Creating categories by generating a taxonomy ............................................................... 132
Creating categories from XML ......................................................................................... 132
Training categories .......................................................................................................... 140
Retraining categories ....................................................................................................... 140
Moving categories ........................................................................................................... 140
Viewing and administering categories ................................................................................... 141
Viewing category details .................................................................................................. 141
Viewing category hierarchy details .................................................................................. 141
Viewing category terms and weights ............................................................................... 142
Viewing category training ................................................................................................ 142
Changing category fields ................................................................................................. 142
Changing category term weights ..................................................................................... 143
Replacing categories ....................................................................................................... 143
Activating or deactivating categories ............................................................................... 143
Building categories .......................................................................................................... 144
Deleting categories .......................................................................................................... 144
Deleting category training ................................................................................................ 144
Exporting categories to XML ........................................................................................... 145
Synchronizing IDOL server's Category index with the categories stored on disk ........... 145
Categorizing data .................................................................................................................. 146
Suggesting categories ........................................................................................................... 147
Suggesting conceptually similar categories for documents ............................................. 147
Suggesting conceptually similar categories for text ......................................................... 147
Suggesting conceptually similar categories for categories .............................................. 147
Matching categories .............................................................................................................. 148
13. Channels .............................................................................................................................. 149
Setting up and using channels .............................................................................................. 149
14. Clustering ............................................................................................................................ 151
Generating snapshots ........................................................................................................... 152
Generating spectrograph data .............................................................................................. 153
Generating WhatsNew and WhatsHot information ............................................................... 154
Configuring clustering ........................................................................................................... 156
Changing the number and size of clusters ...................................................................... 156
Setting up schedules ............................................................................................................. 160
15. Collaboration ....................................................................................................................... 163
16. Dynamic Thesaurus ............................................................................................................ 165
17. Eduction ............................................................................................................................... 167
Extracting built-in data types ................................................................................................. 167
Extracting user-defined data types ....................................................................................... 170
18. Expertise .............................................................................................................................. 171
19. Hyperlinking ........................................................................................................................ 173
Implementing hyperlinking .................................................................................................... 174
20. Mailing .................................................................................................................................. 175
Automatically emailing agent and channel results ................................................................ 176
Sending custom emails ......................................................................................................... 179
Mailer templates .................................................................................................................... 180
Editing templates ............................................................................................................. 181
21. Profiling ............................................................................................................................... 185
Profiling a user ...................................................................................................................... 185
Editing a profile ..................................................................................................................... 187
Querying with a profile .......................................................................................................... 187
Viewing a profile's details ...................................................................................................... 187
Deleting a profile ................................................................................................................... 187
22. Retrieval ............................................................................................................................... 189
Action commands ................................................................................................................. 189
Conceptual matching ............................................................................................................ 191
Advanced keyword searches ................................................................................................ 193
Boolean and bracketed Boolean searches ........................................................................... 194
Precedence of Boolean and Proximity operators ............................................................ 195
Exact Phrase searches ......................................................................................................... 196
Quotation marks .............................................................................................................. 196
TERMEXACTPHRASE{ } field specifier .......................................................................... 197
TERMPHRASE{ } field specifier ...................................................................................... 197
Field restrictions .................................................................................................................... 198
Field text queries ................................................................................................................... 199
Field specifiers for common restrictions .......................................................................... 199
Field specifiers for advanced restrictions ........................................................................ 210
Field specifiers for biasing result scores .......................................................................... 226
Fuzzy queries ........................................................................................................................ 227
Parametric searches ............................................................................................................. 228
Proper Names queries .......................................................................................................... 231
Proximity searches ................................................................................................................ 235
Precedence of Boolean and Proximity operators ............................................................ 236
Soundex keyword searches .................................................................................................. 237
Synonym queries ................................................................................................................... 238
Combining different query types ............................................................................................ 241
Synonym and Boolean searches ..................................................................................... 241
Synonym and Field restrictions ....................................................................................... 241
Soundex and Proper Names ........................................................................................... 241
Soundex and Boolean searches ...................................................................................... 242
Soundex and Proximity searches .................................................................................... 242
Soundex and Field restrictions ........................................................................................ 242
Exact Phrase searches and Boolean searches ............................................................... 243
Exact Phrase searches and Proximity searches ............................................................. 243
Exact Phrase searches and Field restrictions ................................................................. 244
Boolean searches and Proximity searches ..................................................................... 244
Boolean searches and Field restrictions .......................................................................... 244
Proximity searches and Field restrictions ........................................................................ 245
Using wildcards in queries ..................................................................................................... 246
Using wildcards in query text ........................................................................................... 246
Using wildcards in field text queries ................................................................................ 247
Wildcard searches in Japanese, Chinese, Korean and Thai ........................................... 248
Querying for non-alphanumeric characters ........................................................................... 249
Optimizing the retrieval of tagged documents ....................................................................... 251
Query syntaxes ................................................................................................................ 251
23. Spelling correction .............................................................................................................. 255
24. Summarization ..................................................................................................................... 257
Returning summaries with query action results ..................................................................... 258
Summarizing text or documents ............................................................................................ 259
25. Taxonomy generation ......................................................................................................... 261
Generating taxonomies ......................................................................................................... 261
Scheduling taxonomy generation .......................................................................................... 262

About results
26. Results ................................................................................................................................. 265
Relevance ranking ................................................................................................................ 265
Manipulating the relevance of query results ......................................................................... 266
Setting up a field process to boost result relevance ........................................................ 266
Using the BIAS field specifier to boost result relevance .................................................. 269
Using multipliers to boost result relevance ...................................................................... 271
Using Reference fields to filter results at query time ............................................................. 272
Displaying additional fields with results ................................................................................. 275
Configure IDOL server to automatically display additional fields .................................... 275
Display additional fields for individual queries ................................................................. 276

About fields
27. Fields .................................................................................................................................... 279
Processing fields and documents that contain specific fields ............................................... 281
Index fields ............................................................................................................................ 285
Setting up Index fields ..................................................................................................... 285
NumericDateType fields ........................................................................................................ 287
Setting up memory mapping for numerical date fields .................................................... 287
Numerical fields .................................................................................................................... 289
FieldCheckType fields ........................................................................................................... 291
Reference fields .................................................................................................................... 293
Setting up Reference fields ............................................................................................. 293
Simultaneously using KillDuplicates and Combine on Reference fields ......................... 295
Highlight fields ....................................................................................................................... 297
Setting up Highlight fields ................................................................................................ 297
Agentboolean fields .............................................................................................................. 299
Storing Boolean agents in agentboolean fields ............................................................... 299
Matching documents against agentboolean categories .................................................. 300
Meta fields ............................................................................................................................. 301
Changing field values ............................................................................................................ 303

About languages
28. Languages ........................................................................................................................... 307
Running IDOL server in multiple languages ......................................................................... 309
Checking which languages are set up in IDOL server .................................................... 311
Defining language types in IDOL server's configuration file ............................................ 312
Configuring IDOL server to associate language types with documents .......................... 314
Adding language type fields to documents ...................................................................... 318
Defining a default language type in IDOL server's configuration file ............................... 319
Enabling Automatic Language Detection ........................................................................ 320
Specifying the language type of your query .................................................................... 321
Converting results to a specific encoding ........................................................................ 322
Returning documents in multiple languages for your query ............................................ 323
Returning documents in a specific language for your query ........................................... 324
29. Language settings and files ............................................................................................... 325
Encoding settings for supported languages .......................................................................... 325
TermSize setting for supported languages ............................................................................ 349
Transliteration settings for supported languages .................................................................. 350
SentenceBreaking files for supported languages .................................................................. 351
Stoplists for supported languages ......................................................................................... 353

Administration
30. Administering IDOL server ................................................................................................. 357
Executing configuration changes .......................................................................................... 358
Deleting documents from IDOL server by reference ............................................................. 359
Deleting individual documents and ranges of documents from IDOL server ........................ 360
Restoring deleted documents to IDOL server ....................................................................... 361
Creating a new database in IDOL server .............................................................................. 362
To send a DRECREATEDBASE command to IDOL server ............................................ 362
To add a database to IDOL server's configuration file ..................................................... 363
Deleting a database and all the documents it contains ......................................................... 364
Deleting all documents from a database ............................................................................... 365
Expiring documents ............................................................................................................... 366
Exporting IDX documents from IDOL server ......................................................................... 369
Exporting XML documents from IDOL server ........................................................................ 371
Changing the index date, expire date or database of IDOL server documents ..................... 373
Changing field values in IDOL server documents ................................................................. 375
Compacting IDOL server's Data index ................................................................................. 376
Backing up IDOL server's Data index .................................................................................. 378
Initializing IDOL server's Data index .................................................................................... 381
Exporting users, roles, agents and profiles ........................................................................... 382
Importing users, roles, agents and profiles ........................................................................... 383
Setting up log streams ........................................................................................................... 384

Appendices
Appendix A: The IDOL server configuration file........................................................................ 389
Displaying help on configuration settings .............................................................................. 389
Modifying configuration parameter values ............................................................................. 390
Configuration file sections ..................................................................................................... 391
[License] section .............................................................................................................. 392
[Service] section .............................................................................................................. 392
[Server] section ................................................................................................................ 393
[TermCache] section ....................................................................................................... 393
[IndexCache] section ....................................................................................................... 393
[SectionBreaking] section ................................................................................................ 394
[Paths] section ................................................................................................................. 394
[Databases] section ......................................................................................................... 394
[Schedule] section ........................................................................................................... 395
[Summary] section ........................................................................................................... 395

[FieldProcessing] section ................................................................................................ 395


[Properties] section .......................................................................................................... 397
[Security] section ............................................................................................................. 399
[User] section .................................................................................................................. 400
[UserSecurityFields] section ............................................................................................ 400
[UserSecurity] section ..................................................................................................... 401
[Role] section ................................................................................................................... 403
[Agent] section ................................................................................................................. 403
[Profile] section ................................................................................................................ 404
[ProfileNamedAreas] section ........................................................................................... 404
[Community] section ........................................................................................................ 404
[UserCustom] section ...................................................................................................... 405
[UserStructure] section .................................................................................................... 406
[DRE] section .................................................................................................................. 406
[DataDRE] section ........................................................................................................... 406
[Cluster] section ............................................................................................................... 407
[Taxonomy] section ......................................................................................................... 407
[AnalysisSchedules] section ............................................................................................ 407
[IndexTasks] section ........................................................................................................ 409
[DocumentTracking] section ............................................................................................ 410
[Synonym] section ........................................................................................................... 410
[Templates] section ......................................................................................................... 410
[Logging] section ............................................................................................................. 411
[LanguageTypes] section ................................................................................................ 413
Appendix B: Error codes and messages ................................................................................... 415
Error codes ........................................................................................................................... 415
Error messages ..................................................................................................................... 416
VQL conversion error messages ..................................................................................... 416
Appendix C: Service port commands......................................................................................... 423
GetConfig .............................................................................................................................. 424
GetLogStream ....................................................................................................................... 424
GetLogStreamNames ........................................................................................................... 425
GetStatistics .......................................................................................................................... 425
GetStatus .............................................................................................................................. 426
GetStatusInfo ........................................................................................................................ 426
MergeConfig ......................................................................................................................... 427
SetConfig .............................................................................................................................. 429
Stop ....................................................................................................................................... 429
Appendix D: Manually creating IDX files.................................................................................... 431
Sectioning a document ......................................................................................................... 433
Glossary ........................................................................................................................................ 435
Index .............................................................................................................................................. 441

Preface
Autonomy
Autonomy employs a fundamentally different and unique combination of technologies to enable
computers to form an understanding of a page of text, web pages, emails, voice, documents
and people.
Autonomy's solution is therefore able to power any application dependent upon unstructured
information within every market sector, including: e-commerce, customer relationship
management, knowledge management, enterprise information portals and online publishing
applications.
This is evidenced by the significant penetration of the technology in a diversity of vertical
markets and has been achieved principally because every market sector needs to manage and
leverage the benefits of unstructured information.

Autonomy was founded in 1996 and has offices in Boston, Chicago, Dallas, San Francisco,
New York, and Washington, D.C. in the United States, as well as offices throughout EMEA,
including Amsterdam, Brussels, Cambridge, Frankfurt, Milan, Paris, Oslo, and Sydney. In July
1998, the company went public on the EASDAQ exchange (EASDAQ:AUTN). Autonomy
floated on The NASDAQ National Market (NASDAQ: AUTNY) in May 2000, and on the London
Stock Exchange (LSE: AU.) in November 2000.

Contact
To contact Autonomy, please get in touch with your nearest location listed below.

Europe and South Pacific


Autonomy Systems Ltd.
Cambridge Business Park
Cowley Road
Cambridge
CB4 0WZ
Help Desk: +44 (0) 800 0 282 858
Switchboard: +44 (0) 1223 448 000
Fax: +44 (0) 1223 448 001
Email for information: autonomy@autonomy.com
Email for support: uksupport@autonomy.com

The Help Desk operates from 9.30 am to 6.00 pm (GMT) Monday to Friday.
Website: www.autonomy.com

USA
Autonomy Inc.
One Market
Spear Street Tower
San Francisco
CA 94105
Help Desk: +1 877 333 7744
Switchboard: +1 415 243 9955
Fax: +1 415 243 9984
Email for information: info@us.autonomy.com
Email for support: support@us.autonomy.com

The Help Desk operates from 9.30 am to 6.00 pm (CST) Monday to Friday, toll-free.
Website: www.autonomy.com

Downloading manual updates from Automater


To assist you in utilizing the benefits that Autonomy's solutions offer you, Autonomy provides
free downloads of the latest available documentation.
To download documentation updates:
1. Enter the following URL in your web browser's Address field:

   http://automater.autonomy.com

2. Enter your Username and Password, and click on the Login button.

3. Click on the Download menu option.

4. Under the Documentation and Release Notes heading, click on the Click here link,
   then click on the Manuals folder to display the latest available manual versions. You can
   display any of the manuals in your browser and download them.

Note: the manual's version number (for example, version 4.1.x) corresponds to the product
version. The last number of the product version has been replaced with an x for all manuals
as this number relates to minor product releases that have no effect on the documentation. If
a manual has a revision number (for example revision 5), it indicates that this manual has
been revised since it was first released. Automater always contains the latest available
revision of all manuals.

Typographical conventions
Autonomy documentation uses the following typographical conventions.
Formatting convention    Type of information

Bold type                References to any of the following:
                         Interface options (for example, menus or buttons)
                         Actions
                         Parameters

Courier font             Configuration examples

<text>                   A string that needs to be replaced with a personal setting. For
                         example, <port> indicates that you have to specify a port
                         number, [<MySection>] indicates that you have to specify a
                         section name and so on.
                         Note that this only applies where this does not explicitly refer to
                         XML. Another exception is the instructions for writing ACI
                         templates (an appendix to product manuals where this is
                         applicable), where personal settings are indicated by Italic type.

Related documentation
You should use the IDOL server manual in connection with the following:

Best Practices Guide


The Best Practices Guide provides useful hints and tips on setting up and configuring
Autonomy solutions as well as examples of how to combine multiple products effectively.

IAS manual
The IAS manual contains details on how you can use Autonomy's Intellectual Asset
Protection System (IAS) to ensure secure access through authentication and role
permissions.

Appropriate connector manuals


Autonomy connector manuals (for example, the HTTPFetch manual, the Oracle Fetch
manual and so on) detail how you can configure individual connectors in order to tailor their
performance to your environment.

Import Module manual


Autonomy's Import Module is an integral part of any Autonomy connector. The Import
Module manual provides information on how you can configure the settings that determine
how content is treated during the importing process (before it is indexed into IDOL servers).

DiSH manual
The DiSH (Distributed Service Handler) manual contains details on how you can use a
DiSH server to administer and control multiple Autonomy services.

Retina manual
The Retina manual contains details on the setup and usage of the Retina user interface.

Online help
Online help is provided for IDOL server's action commands and configuration parameters.
Please see Displaying online help on page 61 for details on how to display help.

Before you begin

1. Autonomy infrastructure
"Today, 80% of business is conducted on unstructured information." Gartner Group
"85 per cent of all data stored is held in an unstructured format." Butler Group
"Unstructured data doubles every three months." Gartner Group

Information that you need in order to conduct business successfully comprises the following types:

In the past, companies could only make use of 20% of the information that was relevant to them. To
deal with this information they used keyword search engines, tagging schemes, collaborative
filtering or linguistic methods. These methods were not only costly and time-consuming but also
non-scalable and inaccurate, and they diverted attention from core business.
80% of relevant information could not be utilized.

Autonomy's software infrastructure allows you to utilize 100% of the information that is relevant to you.
It automates all the business processes that formerly had to be dealt with manually.
By developing a patented combination of Bayesian Inference, Shannon's information theory and
pattern matching, Autonomy has enabled computers to understand unstructured, structured and
semi-structured information. This means that Autonomy's software infrastructure solves a fundamental
problem that affects every industry, and can be used in virtually any application that handles
unstructured information:

E-Commerce

CRM

Knowledge Management

Business Intelligence

Enterprise Information Portals

Online Publishing

Autonomy's software infrastructure is fully scalable and allows you to process information:

automatically

in real time

in any language

IDOL server
Using Autonomy connectors, Autonomy's Intelligent Data Operating Layer (IDOL) server integrates
unstructured, semi-structured and structured information from multiple repositories through an
understanding of the content, delivering a real time environment in which operations across
applications and content are automated, removing all the manual processes involved in getting the
right information to the right people at the right time.

Connectors
Connectors enable automatic content aggregation from any type of local or remote repository (for
example, a database, a web site, a real-time telephone conversation etc.), forming a unified solution
across all information assets within the organization.

Interfaces

Portlets are windows that can be set up in Autonomy's Portal-in-a-Box or third party portals. Each
portlet contains an application that allows the portals' end users to benefit from a variety of IDOL
server functionality.

Retina is an easy-to-use web interface application that provides a full range of retrieval methods
that adjust to the individual user's proficiency.

Autonomy Desktop Suite brings the power of Autonomy to every desktop. Conducting a real-time
analysis of the ideas involved in the content of any opened desktop application, Desktop
Suite's ActiveKnowledge or Active Windows Extensions module provides real-time links to
relevant internal and external information without the user being needlessly diverted from work
in progress to perform an exasperating search or retrieval operation.

Distributed systems
Autonomy's distribution solutions facilitate linear scaling of systems through faster command
execution and reduction of processing time.

DAH (Distributed Load Handler) enables the distribution of ACI (Autonomy Content
Infrastructure) action commands to multiple Autonomy IDOL servers, providing failover and load
balancing.

DIH (Distributed Index Handler) enables distributed indexing of documents into multiple
Autonomy IDOL servers, providing failover and load balancing.
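
The combination of load balancing and failover described above can be illustrated with a minimal sketch. It is not how DAH or DIH are implemented: the `Distributor` class, the server names and the `deliver` callback are all hypothetical, and round-robin is only one of several possible balancing strategies.

```python
import itertools


class Distributor:
    """Round-robin over child servers, skipping any that fail (hypothetical)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._cycle = itertools.cycle(range(len(self.servers)))

    def send(self, command, deliver):
        # Try each child server at most once per command.
        for _ in range(len(self.servers)):
            server = self.servers[next(self._cycle)]
            try:
                return server, deliver(server, command)
            except ConnectionError:
                continue  # failover: move on to the next child server
        raise ConnectionError("no child server is available")


# Hypothetical child servers; "idol-b" is treated as unreachable.
servers = ["idol-a", "idol-b", "idol-c"]
down = {"idol-b"}


def deliver(server, command):
    if server in down:
        raise ConnectionError(server)
    return f"{server} handled {command}"


distributor = Distributor(servers)
handled_by = [distributor.send("query", deliver)[0] for _ in range(4)]
print(handled_by)
```

Commands alternate between the reachable servers, and the unreachable one is skipped without the caller noticing.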

Administration

DiSH (Distributed Service Handler) provides crucial maintenance, administration, control and
monitoring functionality for the Autonomy infrastructure. DiSH delivers a unified way to
communicate with all Autonomy services such as connectors, DIH, DAH and so on from a
centralized location.

Autonomy Service Dashboard is a stand-alone web application that allows administrators to
manage all Autonomy modules and services running locally or remotely.
The Dashboard communicates with the Distributed Service Handler (DiSH) module that is the
back end process for monitoring and controlling all the Autonomy child services. Autonomy
Service Dashboard provides the administrator with a list of all child services that DiSH is
monitoring, together with control buttons and status information.

PODS
Autonomy's Product Orientated Drop-in Solutions allow Autonomy solutions to be easily integrated
with third party applications and solution providers. PODS enable organizations to make their existing
applications compatible with IDOL with minimal configuration and administration requirements. Making
IDOL server a part of any solution delivers the direct benefits of content automation and the ability to
perform a vast range of IDOL server operations, irrespective of file format or location.

Data flow and security

Aggregation & Distribution
Connectors aggregate content from various repositories and index it into IDOL server or, if the content
needs to be distributed across multiple IDOL servers, a DIH (Distributed Index Handler).

Querying & Distribution


User queries are sent from a front end directly to IDOL server or distributed to multiple IDOL servers
using the DAH (Distributed Load Handler).

Distributed Administration
The DiSH (Distributed Service Handler) enables administrators to maintain, configure and control
multiple Autonomy services via the Autonomy Service Dashboard, a front-end web interface.

Security
The Autonomy IAS (Intellectual Asset Protection System) ensures secure access through
authentication and role permissions. When a user logs on to a front end (for example, Retina or a 3rd
party portal) his authentication details are sent to IDOL server which returns the user's security details
to the front end, where they are stored until the user logs off or his session times out. Every time the
user issues a query, his security details are attached to the query string that is sent to IDOL server.
The group servers store the user group information of repositories that store users in groups. This
allows the front end to quickly retrieve user security information from the group servers, and send the
query and the user's security information to IDOL server in order to check if the user is permitted to
view result documents before they are displayed to the user.
IDOL server passes the user's security details to the security libraries for the data repositories that
contain result documents for the user's query. The security libraries then check the user's security
details against the ACLs for the documents that match the query. If the user is entitled to view a
document, it is returned as a result to the front end.
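
The final check described above, comparing a user's security details with each document's ACL, can be sketched as follows. This is a simplified illustration that assumes an ACL is just a set of group names; the `Document` class, the `filter_by_acl` function and the sample data are hypothetical and do not reflect the actual security libraries.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    reference: str
    # Hypothetical ACL: names of the groups that are allowed to view the document.
    allowed_groups: set = field(default_factory=set)


def filter_by_acl(matches, user_groups):
    """Keep only the matching documents that the user is entitled to view."""
    user_groups = set(user_groups)
    return [doc for doc in matches if doc.allowed_groups & user_groups]


matches = [
    Document("hr/salaries.doc", {"hr"}),
    Document("eng/design.doc", {"engineering", "hr"}),
    Document("board/minutes.doc", {"directors"}),
]
visible = filter_by_acl(matches, ["engineering"])
print([doc.reference for doc in visible])
```

Only documents whose ACL shares at least one group with the user are returned to the front end; a user with no groups sees nothing.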

2. An introduction to IDOL server


Using Autonomy Connectors, Autonomy's Intelligent Data Operating Layer (IDOL) server integrates
unstructured, semi-structured and structured information from multiple repositories through an
understanding of the content, delivering a real time environment in which operations across
applications and content are automated, removing all the manual processes involved in getting the
right information to the right people, at the right time.

IDOL server operations


Autonomy's IDOL server can perform the following intelligent operations across structured,
semi-structured and unstructured data:

Agents

Alerting

Categorization

Channels

Clustering

Collaboration

Dynamic Thesaurus

Eduction

Expertise

Hyperlinking

Mailing

Profiling

Retrieval

Spelling Correction

Summarization

Taxonomy Generation

Note: your license determines which of these operations your IDOL server installation can perform.

Agents
Agents provide the facilities to find and monitor information from a configurable list of Internet
and Intranet sites, News Feeds, Chat Streams and internal repositories that is highly relevant to
the explicit interests of a user. Agents are created in a very user-friendly way using the following
options:

Natural language descriptions

Example content (point and click)

Legacy Keyword or Boolean Expressions

IDOL server provides the conceptual information that is needed to create agents. The server
accepts a piece of content (training text, a document or a set of documents) or reference
(identifier) and returns an encoded representation of the concepts, including each concept's
specific underlying patterns of terms and associated probabilistic ratings.
Users can retrain their agents by submitting a piece of content (training text, a document or a
set of documents) whose concepts the server uses to adapt the agent.

Alerting
IDOL server analyzes data in new documents (when it receives the documents) and compares
the concepts in documents with users' agents. If new data matches a user's agent, it
immediately notifies the user by email or a third party system (for example by SMS or a pager).

Categorization
IDOL server can automatically categorize data with no requirement for manual input
whatsoever. The flexibility of Autonomy's Categorization feature allows you to precisely derive
categories using concepts found within unstructured text. This ensures that all data is classified
in the correct context with the utmost accuracy. Autonomy's Categorization feature is a
completely scalable solution capable of handling high volumes of information with extreme
accuracy and total consistency.
Rather than relying on rigid rule-based category definitions such as Legacy Keyword and
Boolean Operators, Autonomy's infrastructure relies on an elegant pattern matching process
based on concepts to categorize documents and automatically insert tag data sets, route
content or alert users to highly relevant information pertinent to the user's profile.
This highly efficient process means that Autonomy is able to categorize upwards of four million
documents in 24 hours per CPU instance, which is approximately one document every 22
milliseconds. Autonomy hooks into virtually all repositories and data formats, respecting all
security and access entitlements, delivering complete reliability.
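
The throughput figure quoted above can be checked with simple arithmetic. The short sketch below converts a daily document count into an average time budget per document; the function name is illustrative only.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in 24 hours


def ms_per_document(documents_per_day):
    """Average time available per document, in milliseconds."""
    return MS_PER_DAY / documents_per_day


budget = ms_per_document(4_000_000)
print(f"{budget:.1f} ms per document")  # 21.6 ms per document
```

Four million documents in 24 hours therefore corresponds to an average of 21.6 milliseconds per document on each CPU instance.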
Category matching
IDOL server accepts a category or piece of content and returns categories ranked by
conceptual similarity. This determines for which categories the piece of content is most
appropriate, so that the piece of content can subsequently be tagged, routed or filed
accordingly.
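
As a rough illustration of ranking categories by similarity, the sketch below scores a piece of content against each category's training text. It uses plain term counts and cosine similarity as a stand-in; this is an assumption made for the example and is not Autonomy's conceptual matching. The category names and training texts are invented.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Count lower-cased alphabetic terms in a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_categories(content, categories):
    """Return (category name, score) pairs, most similar first."""
    doc = term_vector(content)
    scored = [(name, cosine(doc, term_vector(training))) for name, training in categories.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical categories, each described by a short training text.
categories = {
    "Finance": "stock market shares exchange investors trading",
    "Sport": "football match goal league players season",
}
ranking = rank_categories("Investors sold shares as the stock exchange fell", categories)
print(ranking[0][0])  # Finance
```

The top-ranked category is the one under which the content would be tagged, routed or filed.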

Channels
IDOL server can automatically provide users with a set of hierarchical channels with highly
relevant information pertinent to the respective channel. Eliminating the requirement for manual
intervention or pre-tagging, real-time information is dynamically updated into the channels
automatically, minimizing the maintenance effort required. Moreover, the administrator can add
and remove channels on the fly, without having to re-categorize all of the data.

Clustering
IDOL server can automatically cluster information. Clustering is the process of taking a large
repository of unstructured data, agents or profiles and automatically partitioning the data so that
similar information is clustered together. Each cluster represents a concept area within the
knowledge base and contains a set of items with common properties.

Collaboration
IDOL server automatically matches users with common explicit interest agents or similar
implicit profiles. This information can be used to create virtual expert knowledge groups.

Dynamic Thesaurus
When it executes queries, IDOL server can automatically suggest alternative queries, allowing
users to quickly produce a variety of relevant result sets.

Eduction
Eduction identifies concepts in the document in order to add tags to the kind of content you
specify:

Tag training

Plain Tagging

ConceptValue Tagging

Negative Name training

Default User definable phrase tags

Case sensitive user defined phrase tags

Expertise
IDOL server accepts a natural language or Boolean search string and returns users who own
matching agents or profiles. This allows instant identification of experts in any subjects at hand,
eliminating time consuming searches for specialists, and unnecessary researching of subjects
for which expert knowledge is already available.

Hyperlinking
Hyperlinks can be automatically generated in real time. These link to contextually similar
content and can be used to recommend related articles, documents, affinity products or
services, or media content that relates to textual content. Because links are automatically
inserted at the time a document is retrieved, they can include references to documents and
articles written long before, or hyperlinks from archived material can link to the latest news or
material on that subject.

Mailing
IDOL server matches the agents and profiles against its document content at regular intervals,
and automatically notifies users of documents that match their agents and/or profiles by
sending them email.

Profiling
IDOL server automatically creates interest and expertise profiles for users, in real time.
Interest profiles are created by tracking the content that a user views and extracting a
conceptual understanding of it. IDOL server then uses this understanding to keep users'
interest profiles up-to-date. Interest profiles can be used to target information at users,
to recommend content to users, to alert users to the existence of content and to put users in touch
with other users who have similar interests.
Expertise profiles are created by tracking the content that a user creates and extracting a
conceptual understanding of it. IDOL server then uses this understanding to keep users'
expertise profiles up-to-date. Expertise profiles can be used to trace users who are experts in
particular subject areas.

Retrieval
IDOL server offers a range of retrieval methods, from simple legacy keyword search to
sophisticated conceptual querying:
Conceptual matching
IDOL server accepts a piece of content (a sentence, paragraph or page of text, the body of an
e-mail, a record containing human readable information, or the derived contextual information
of an audio or speech snippet) or reference (identifier) as input and returns references to
conceptually related documents ranked by relevance, or contextual distance. This is used to
generate automatic hyperlinks between pieces of content.
Advanced Keyword search
IDOL server matches any term or phrase that appears in quotation marks in its exact
pre-stemmed form.

Boolean/bracketed Boolean
IDOL server accepts simple or complex Boolean and bracketed Boolean expressions and
returns a list of matching documents. Boolean expressions can be formed using a range of
Boolean and proximity operators:

AND

XOR/EOR

WNEAR

NOT

NEAR

BEFORE

OR

DNEAR

AFTER
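
To illustrate how a bracketed Boolean expression selects documents, the sketch below evaluates a nested expression against each document's set of terms. It covers only AND, OR, NOT and XOR, treats NOT as a unary operator, and represents the expression as nested tuples rather than parsing query text. These are simplifying assumptions for the example, not a description of IDOL server's query syntax or of the proximity operators.

```python
def matches(expr, terms):
    """Evaluate a nested Boolean expression against a set of document terms.

    expr is either a term (str) or a tuple of the form (operator, operand, ...).
    """
    if isinstance(expr, str):
        return expr.lower() in terms
    op, *operands = expr
    results = [matches(operand, terms) for operand in operands]
    if op == "AND":
        return all(results)
    if op == "OR":
        return any(results)
    if op == "NOT":
        return not results[0]
    if op == "XOR":
        return sum(results) == 1  # exactly one operand matches
    raise ValueError(f"unsupported operator: {op}")


# Hypothetical documents, each reduced to a set of lower-case terms.
documents = {
    "doc1": {"world", "market", "shares"},
    "doc2": {"market", "garden"},
    "doc3": {"world", "cup"},
}
# (world AND market) OR (cup AND NOT garden)
query = ("OR", ("AND", "world", "market"), ("AND", "cup", ("NOT", "garden")))
hits = [name for name, terms in documents.items() if matches(query, terms)]
print(hits)  # ['doc1', 'doc3']
```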

Exact Phrase
Provides the ability to search for exact phrases by putting quotation marks around a string of
words. For example: "world market"
Field restrictions
Simple field restrictions within a query's text restrict results to documents that contain specific
values in specific fields.
Field text queries
Field text queries provide a wide range of field specifiers that you can use in order to query
fields, restrict query results or bias query result scores.
Fuzzy queries
If a search string is not quite accurate (for example, if it contains spelling mistakes) a fuzzy
query returns results that contain words that are similar to the entered string. (Note that you
need to enable fuzzy queries before you can use them).
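
The idea behind a fuzzy query can be sketched with Python's standard `difflib` module, which finds indexed terms that closely resemble a misspelled query term. The vocabulary, the `fuzzy_expand` function and the similarity cutoff are assumptions made for this example; IDOL server's own fuzzy matching may work quite differently.

```python
import difflib


def fuzzy_expand(term, vocabulary, cutoff=0.8):
    """Return up to three indexed terms that closely resemble the given term."""
    return difflib.get_close_matches(term.lower(), vocabulary, n=3, cutoff=cutoff)


# Hypothetical set of terms that occur in the index.
vocabulary = ["market", "marker", "marketing", "garden", "world"]
candidates = fuzzy_expand("markte", vocabulary)
print(candidates)
```

The misspelled term "markte" is expanded to similar indexed terms such as "market", which can then be searched for in place of the original string.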
Parametric search
Advanced Parametric Refinement is used to provide an improved user experience coupled with
increased productivity via an advanced real time information discovery process. Real time
navigation across multiple taxonomies is supported with no additional manual configuration
necessary, including full access to intersections of diverse taxonomy definitions.
From the complete set of field names present within the corpus, a subset of fields can
be defined in the server's configuration as being of type "Parametric". These fields are known as
parametric fields.
Once documents have been indexed, IDOL server creates and stores a structure containing information about all
tag-value pairs that occur within the defined parametric fields (a tag-value pair is defined where a
field contains a textual or numerical value, and the field name is considered paired to that
value). The user can then query IDOL server with the name of one or more parametric fields. IDOL
server returns a list of all values that appear within the given field or fields in the
documents stored in the server.

This underlying operation can be used to power a user interface that enables a user to
gradually refine the scope of query from a complete corpus to the subset of documents that
contain information pertinent to the user's current enquiry.
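The tag-value structure and the refinement step described above can be sketched as follows. This is a simplified model in Python, not IDOL server's internal format: the field names (MAKE, COLOUR, TITLE), the documents and the function names are all hypothetical.

```python
from collections import defaultdict


def build_tag_index(documents, parametric_fields):
    """Map each parametric field to its values, and each value to document references."""
    index = defaultdict(lambda: defaultdict(set))
    for ref, fields in documents.items():
        for name, value in fields.items():
            if name in parametric_fields:
                index[name][value].add(ref)
    return index


def values_for(index, field, within=None):
    """List the values of a field with document counts, optionally restricted
    to the documents selected by an earlier refinement step."""
    result = {}
    for value, refs in index.get(field, {}).items():
        hits = refs if within is None else refs & within
        if hits:
            result[value] = len(hits)
    return result


# Made-up documents; TITLE is not configured as a parametric field.
DOCS = {
    "doc1": {"MAKE": "Ford", "COLOUR": "red", "TITLE": "First car"},
    "doc2": {"MAKE": "Ford", "COLOUR": "blue"},
    "doc3": {"MAKE": "Audi", "COLOUR": "red"},
}
index = build_tag_index(DOCS, {"MAKE", "COLOUR"})

all_makes = values_for(index, "MAKE")
red_documents = index["COLOUR"]["red"]
makes_of_red = values_for(index, "MAKE", within=red_documents)
```

A user interface would call values_for repeatedly, each time narrowing the within set to the documents that match the values the user has already chosen.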
Proper Names
IDOL server recognizes names and treats them as a unit.
Proximity search
IDOL server gives a higher weighting to documents in which specific terms occur within a given
proximity of each other.
Soundex Keyword search
If the spelling of a keyword is not quite accurate but phonetically correct, a Soundex keyword
search returns results that contain the keyword and phonetically similar keywords (using a
configurable Soundex algorithm).
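To show what "phonetically similar" means in practice, the sketch below implements the classic American Soundex code in Python. This is the textbook algorithm, used here as an assumption; the configurable algorithm that IDOL server applies may differ in its details.

```python
# Letter groups of classic Soundex; vowels, H, W and Y have no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex code of a word (empty string if no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0]                  # the first letter is kept as it is
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != previous:     # skip repeats of the same code
            encoded += code
        if ch not in "HW":                # H and W do not separate equal codes
            previous = code
    return (encoded + "000")[:4]          # pad or truncate to four characters
```

Two keywords are treated as phonetically similar when their codes are equal, for example soundex("Robert") and soundex("Rupert").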
Synonym queries
A synonym query returns results which are conceptually similar to the query's terms and / or
conceptually similar to the synonyms that are available for the query's terms.

Spelling Correction
IDOL server can automatically spell check the query text that it receives and suggest correct
spellings for terms that it does not contain.

Summarization
IDOL server accepts a piece of content and returns a summary of the information. IDOL server
can generate different types of summary:
Conceptual summaries
Summaries that contain the most salient concepts of the content.
Contextual summaries
Summaries that relate to the context of the original inquiry - allowing the most applicable
dynamic summary to be provided in the results of a given inquiry.
Quick summaries
Summaries that comprise a few sentences of the result documents.


Taxonomy Generation
IDOL server's automatic taxonomy generation feature can automatically understand and create
deep hierarchical contextual taxonomies of information. Clustering or any other conceptual
operation can be used as a seed for the process. The resulting taxonomy can be used to
provide insight into specific areas of the information, provide an overall information landscape,
or as training material for automatic categorization, which then allows information to be placed
into a formally dictated and controlled category hierarchy.
Automatic taxonomy based on cluster result
IDOL server can use cluster results to build taxonomies
automatically and in real time.
Automatic taxonomy to category generation
Once the automatic taxonomy generation process has taken place, IDOL server contextually understands
the type of data it is dealing with. From this, it generates a deep hierarchical contextual taxonomy,
also known as an information landscape. Much like automatic cluster to category
generation, this feature takes the taxonomy results and uses them to create categories (in
order to categorize information using the Categorization operation).


System architecture
General
IDOL server uses the ACI (Autonomy Content Infrastructure) Client API to communicate with custom-built
applications that retrieve data using HTTP commands. This communication is implemented over
HTTP using XML and can adhere to SOAP.
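As a minimal sketch of what such an HTTP command might look like from a custom-built client, the Python function below assembles a request URL. The URL layout (host, port, then action=... followed by parameters), the Query action name and the text parameter are assumptions made for illustration; check the action reference for the exact syntax. Port 9000 is the default ACI port offered by the installer.

```python
from urllib.parse import urlencode


def build_aci_url(host, port, action, **params):
    """Assemble an action command URL (assumed layout: http://host:port/action=...)."""
    query = urlencode({"action": action, **params})
    return f"http://{host}:{port}/{query}"


# Hypothetical query command sent to the default ACI port.
url = build_aci_url("localhost", 9000, "Query", text="world market")
```

The client would then send this URL as an HTTP request and parse the XML that the server returns.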


Indexing and querying


Documents in IDX or XML format are indexed into IDOL server (directly or using a Connector). IDOL
server stores the concepts of each document and, in response to queries, agents, profiles or content,
returns a link to each result document as well as a percentage weighting, which indicates how relevant
the result document is to the original query. It can return results not only as plain text but also as XML
(even if the document was not in XML format when it was indexed).


Security
Text queries
IDOL server contains data that has been aggregated from one or more repositories. In this example
each of the repositories has its own group server which stores the repositories' user names and the
groups that these users belong to. IDOL server aggregates this security information from the group
servers.
When a user logs on to a client, the user's authentication details are sent to IDOL server, which returns the
user's security details to the client, where they are stored until the user logs off or the session times
out. Every time the user issues a text query from a client, these security details are attached to the query
string that is sent to IDOL server.
Using the security information in the query string, IDOL server checks if the user who has sent the
query is permitted to access the documents that match the query (matching the security string against
the documents' ACLs), and returns all matching documents that the user is permitted to see to the
client.
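The permission check in the last step can be pictured as a filter over the matching documents. The Python sketch below is a simplified model only: the shape of the ACLs (sets of permitted users and groups), the user names and the function name are hypothetical and do not reflect IDOL server's actual security string format.

```python
def permitted_results(matches, acls, user, user_groups):
    """Keep only the matching documents whose ACL admits the user or one of the user's groups."""
    visible = []
    for ref in matches:
        acl = acls.get(ref, {"users": set(), "groups": set()})
        if user in acl["users"] or user_groups & acl["groups"]:
            visible.append(ref)
    return visible


# Made-up ACLs, as aggregated from the repositories.
ACLS = {
    "doc1": {"users": {"jsmith"}, "groups": set()},
    "doc2": {"users": set(), "groups": {"sales"}},
    "doc3": {"users": set(), "groups": {"finance"}},
}

# Documents that match the query text, before security is applied.
matches = ["doc1", "doc2", "doc3", "doc4"]
visible = permitted_results(matches, ACLS, "jsmith", {"sales"})
```

In this model a document without an ACL entry (doc4) is hidden; whether unsecured documents are visible is a policy decision that the sketch does not take from this guide.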

Agent, profile and category queries


IDOL server contains data that has been aggregated from one or more repositories. In this example
each of the repositories has its own group server which stores the repositories' user names and the
groups that these users belong to. IDOL server aggregates this security information from the group
servers.
The client sends an agent, profile or category query to IDOL server. IDOL server (which stores all agents and
profiles) matches this agent, profile or category against the documents it contains. Using the
information it has received from the group servers, IDOL server checks if the user is permitted to
access the documents that match the agent, profile or category, and returns all matching documents
that the user is permitted to see to the client.


Community queries
IDOL server stores the agents and profiles of users, so that they can be matched against community
queries (that is, any type of query that requires agents or profiles to be returned as results).
When a community query is issued from a client, it is sent to IDOL server which matches it against the
agents and profiles it stores and returns matching agents and/or profiles to the client.


IDOL server functionality matrix


You can communicate with IDOL server using ACI (Autonomy Content Infrastructure) action
commands.
To ensure backwards compatibility (if you are upgrading from DRE 3), you can also use qmethods to
operate IDOL server. This allows you to operate IDOL server in the same way as you would a DRE 3.
However, not all of IDOL server's functionality is available through qmethods. Also, in many cases IDOL
server does not respond to qmethods as quickly as to ACI action commands.
Depending on how you communicate with IDOL server the following functionality is available.

DRE 3        IDOL server qmethods        IDOL server ACI actions        Available functionality

Concept matching

Agent creation

Agent matching

Agent retraining

Agent alerting

Profile creation

Profile matching

Profile retraining

Profile alerting

Categorization

Summarization

Clustering

Active matching

Retrieval

Natural Language queries

Boolean and bracketed Boolean queries

Fuzzy queries

Proximity search

Soundex keyword search (expanded in IDOL server)

Proper names queries

Thesaurus / Synonym queries

XML indexing

Handling of multiple languages

Fields printing

Storage of all fields by default

Automatic language detection at index time


IDOL server can automatically detect the language and
encoding of a buffer that you pass to it.

Automatic language detection at query time


IDOL server can automatically detect the language and
encoding of a piece of text.

Direct indexing of well formed XML with no import step

XML output of information and results

Scheduled expiration of individual documents

Scheduled backups of IDOL server

Scheduled compression of IDOL server

Defined log streams

Automatic detection of documents' built-in attributes and properties


Compound sorting
Result sorting can be based on more than one field at a time.

Updated weighting algorithm


Relevance is calculated using the results of the latest research.

Agentboolean matching
A Boolean query can be specified within a document; the query text
must match this Boolean query before the document can be returned.

Inverted agent matching


Queries can be weighted as if the document were the query
text and the query text were in IDOL server.

Case sensitive matching


Field text can be specified so that it only matches documents
with exactly the same case.

Combine on any Reference field


Results can have duplicates removed by any Reference
field.

Highlighting
Terms or sentences, within a document or buffer of your
choice, can be highlighted if they satisfy certain criteria.

Restriction by encoding / language


Results can be restricted to a specific language, encoding or
language type.

Restriction by number of links


Results can be restricted to documents that contain a certain
number of link terms.

Restriction by document ID
Results can be restricted to a certain range of
document IDs.


Minimum term length


Query text terms that are shorter than a specified length can be
ignored.

Field printing
IDOL server can return specific fields, all fields or combine
content to a single field.

Sorting
Results can be sorted by any field numerically or
alphabetically, or by document ID, as well as the usual
methods.

Term information
IDOL server can retrieve the total number of occurrences,
the APCM weighting, and document occurrences of any or
all terms, sorted by any of the values.

Term information in documents


IDOL server can retrieve the number of occurrences, the
APCM weighting, and document occurrences of terms within
only a subset of IDOL server documents.

Result biasing
Results can be biased on a sliding scale of your choice,
according to the value in a certain field.

Bitwise field matching


Decimal as well as hexadecimal bitwise field matching.

Empty field detection


Field specifiers exist to detect whether a document does or
does not contain certain fields, and also to detect whether
the field is empty or not.

Exact term matching


Documents can be matched according to the exact form of a
term, prior to stemming.


Relevance of field text


Conceptual matching of the field text terms can be made to
affect the overall relevance, or indeed not to affect it.

Wildcard handling
Wildcard terms can now contain characters in all encodings.

Improved performance
IDOL server is significantly faster than DRE 3
in many cases, including:

Field text with many terms

Field text with MATCH or TERM specifiers

Requesting the total result count

Matching of numerical terms within fields

Parametric refinement
A parametric search allows you to search for items by their
characteristics (values in certain fields).
Parametric field dependence
IDOL server allows you to find parametric fields that occur
together.
Parametric counts
You can enable parametric count to find out how many
documents contain a specific parametric value.

Spellchecker
IDOL server suggests spelling corrections for misspelled
words in queries.

Query summary
IDOL server returns result documents with a summary that
comprises their best terms and phrases.

Taxonomy generation
IDOL server creates taxonomies and stores them in
categories and/or an XML file.


3. Installing IDOL server


System requirements
Note:

IDOL server should be installed by the system administrator.

You cannot run IDOL server with restricted file system permissions (for example disk
quotas, file handle limits or memory limits).

Your file system must permit file locking (this means that you cannot run IDOL server
on an NFS mount, for example).

Your network must support TCP/IP.

A single IDOL server can hold an approximate maximum of 8 million document
sections (depending on the functionality and performance required).

If you are running anti-virus software on the machine that hosts IDOL server, you
should ensure that it does not monitor the IDOL server directories, as this can have a
serious impact on IDOL server's performance.

Supported platforms
Microsoft Windows NT4, 2000, XP and 2003
Linux (all versions) kernel 2.2, 2.4 and 2.6
Sun Solaris for SPARC versions 5 - 9
Sun Solaris for Intel version 9
AIX version 4.3, 5 and 5.1
HP-UX for PA-RISC version 10, 11 and 11i
HP-UX for Itanium version 11i
Tru64 version 5.1

Note:
If you are installing IDOL server on Solaris, you require the libiconv library, which you
can download from http://www.gnu.org/software/libiconv/.
IDOL server also supports other POSIX UNIX versions on request.


Minimum hardware specifications


1 GB RAM
30 GB disk
1.5 GHz CPU

Recommended hardware specifications


A dedicated SCSI disk
4 GB RAM
100 GB disk
A minimum of 2 dedicated CPUs (Xeon 3 GHz or above)
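Before you install, you may want to compare the target disk against the figures above. The Python sketch below shows one way to do so; it is only an illustration, and it assumes that the disk figures refer to free space on the volume that will hold the installation, which this guide does not state explicitly.

```python
import shutil

GIB = 1024 ** 3
MIN_DISK_GB = 30    # minimum specification
REC_DISK_GB = 100   # recommended specification


def classify_disk(free_bytes):
    """Compare an amount of free disk space with the two specifications."""
    free_gb = free_bytes / GIB
    if free_gb < MIN_DISK_GB:
        return "below minimum"
    if free_gb < REC_DISK_GB:
        return "meets minimum"
    return "meets recommendation"


def check_install_path(path):
    """Classify the free space on the volume that contains the given path."""
    return classify_disk(shutil.disk_usage(path).free)
```

For example, check_install_path("C:\\") on Windows or check_install_path("/") on UNIX reports how the system volume compares with the two specifications.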


Installing IDOL server on Windows


To install under Windows, double-click on WINDOWS_AutonomyIDOLserver-5.0.exe.
Read and follow all installation instructions on the screen carefully.

1. The installation dialog is displayed.
Select the language in which you want to run the installer and click on OK.

2. The installation opens with the Welcome dialog. Read the text and click on Next.

3. The License Agreement dialog is displayed.
Read the license agreement. Select I accept the terms of the License Agreement and click on
Next.

4. The Select Install Set dialog is displayed.
Check each component that you want to install, and click on Next:
IDOL Server
Installs Autonomy IDOL server.
DiSH License Server
Installs an Autonomy DiSH which is required for licensing.
Retina Web Application
Installs Autonomy Retina, a web application that provides a user interface for the
functionality IDOL server supplies.
Autonomy Service Dashboard Web Application
Installs an Autonomy Service Dashboard which allows you to administer, configure and
control Autonomy services.

5. The Select Install Folder dialog is displayed.
Specify the directory in which you want to install IDOL server, and click on Next. By default IDOL
server is installed in C:\Autonomy\IDOLserver, but you can use the Choose button to navigate
to another location.

6. The Select Shortcut Folder dialog is displayed.
Select a location for shortcut icons, and click on Next.

7. If you chose to install IDOL server, the IDOL server Port Settings dialog is displayed.
Enter the following, and click on Next:
ACI Port
The port that client machines use to send action commands to IDOL server. By default this
is 9000.
This entry sets the Port parameter in IDOL server's configuration file.
Index Port
The port that administrative client machines use to index documents into IDOL server (and
to administer IDOL server). By default this is 9001.
This entry sets the IndexPort parameter in IDOL server's configuration file.
Service Port
The port that IDOL server uses for DiSH communication. By default this is
9002. Note that this port must not be used by any other service.
This entry sets the ServicePort parameter in IDOL server's configuration file.

8. If you chose to install the DiSH server, the DiSH Server Port Settings dialog is displayed.
Enter the following, and click on Next:
ACI Port
The port that client machines use to send action commands to the DiSH server. By default
this is 20000.
This entry sets the Port parameter in the DiSH configuration file.
Service Port
The port by which service commands can be sent to DiSH. By default this is
20003. Note that this port must not be used by any other service.
This entry sets the ServicePort parameter in the DiSH configuration file.

9. The Executable and Service Prefix dialog is displayed.
Enter a prefix for your installation's services and executables in order to uniquely identify them. By
default, this is Autonomy.

10. The SMTP Server Details dialog is displayed.
Enter the following, and click on Next:
IP Address
The IP address of your SMTP server's host.
Port
The port on which your SMTP server listens for SMTP commands.

11. The Pre-Installation Summary dialog is displayed.
Check the settings you have made, and click on Install.

12. The Installing IDOL server dialog is displayed.
The progress of the installation process is indicated. If you want to abort the installation process,
click Cancel.
13. The Install Complete dialog is displayed. IDOL server has been installed successfully. Click on
Done to exit the installation.
14. Copy the licensekey.dat file to the DiSH subdirectory of your IDOL server installation. You can
obtain this file from Autonomy Support (see Licensing on page 40).
If you have installed Retina, you can now deploy the Retina WAR file to your application server (the
WAR file is located in the webapps directory within your IDOL server installation). Please refer to your
Retina manual for details.
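Because the service ports must not be used by any other service, it can be useful to check the installer's default ports on the target machine before you start. The following Python sketch is one way to do this and is not part of the product; the labels in DEFAULT_PORTS are descriptive names chosen for the example, while the port numbers are the installer defaults listed in the steps above.

```python
import socket

# Installer defaults for IDOL server and DiSH.
DEFAULT_PORTS = {
    "IDOL ACI": 9000,
    "IDOL index": 9001,
    "IDOL service": 9002,
    "DiSH ACI": 20000,
    "DiSH service": 20003,
}


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is bound to the port on the given address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


def report(ports=DEFAULT_PORTS):
    """Map each label to 'free' or 'in use'."""
    return {name: ("free" if port_is_free(port) else "in use")
            for name, port in ports.items()}
```

If report() shows a port as in use, enter a different port number in the corresponding installer dialog.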


Directory structure
Once the installation of IDOL server is completed and you have started your IDOL server, your
installation directory contains the following files and subdirectories (note that bold font indicates
folders):

AutonomyServiceDashboard

Folder that contains Autonomy Service Dashboard files.

webapps

Folder that contains the autonomyservicedashboard.war file.

configuration

Folder that contains Retina and IDOL server configuration files.

idol.cfg

File that contains configuration settings that Retina uses to
communicate with your DiSH and IDOL server installation.

retina.cfg

Configuration file that contains Retina's settings.

DiSH

Folder that contains license server files.

audit

Folder that contains audit data.

documentTracking

Folder that contains document tracking data.

errors

Folder that contains internal data files.

graphs

Folder that contains generated graphs.

license

Folder that contains license data.


Note: you must not remove the content of this directory.

logs

Folder that contains log files for each configured log stream.

queue

Folder that contains internal data files.

uid

Folder that contains document tracking and license files.

DISH.exe

DiSH executable.

DISH.cfg

Configuration file that contains the DiSH's settings.

license.log

License log file.

licensekey.dat

License key that you transferred after installing IDOL server.

service.log

Service log file.

various internal files


IDOL

Folder that contains IDOL server files.

agentstore

Folder in which IDOL server stores agent and category data.

category

Folder in which category data is stored.

category

Folder that contains categories.

cluster

Folder that contains cluster data

2DMAPS

Folder that contains the 2D maps in gif format that have been
generated from clusters using the ClusterServe2DMap action.

CLUSTEREXPORT

Folder that contains the directory structure in which the cluster
documents are stored (in txt format) that have been written to
disk using the ClusterWriteToDisk action (this action has been
deprecated). Each folder in this directory structure contains one
cluster.

CLUSTERS

Folder that contains configuration files that store the WhatsHot
and WhatsNew information that has been generated from
snapshot data using the ClusterCluster action.

SGCLUSTDOCS

Folder that contains spectrograph documents.

SGDATA

Folder that contains the spectrograph data in sgd format
(internal proprietary format) that has been generated from a set
of snapshots using the ClusterSGDataGen action.

SGPICCACHE

Folder that caches the images (in gif format) that have been
generated from data sets using the ClusterSGPicServe action.

SNAPSHOTS

Folder that contains snapshots in binary cls format that have
been taken using the ClusterSnapshot action.

imex

Folder that contains category files to be imported and exported.

license

Folder that contains license data.


Note: you must not remove the content of this directory.

queue

Folder that stores queued actions.

taxonomy

Folder that contains taxonomy files

uid

Folder that contains document tracking and license files.


Note: you must not remove the content of this directory.


community

Folder that contains files for IDOL server's community features.

license

Folder that contains license data.
Note: you must not remove the content of this directory.

queue

Folder that contains internal data files.

temp

Folder that contains temporary files.

uid

Folder that contains document tracking and license files.

users

Folder that contains users.

content

Folder in which content data is stored.

dynterm

Folder that contains a list of terms and their occurrences.

license

Folder that contains license data.


Note: you must not remove the content of this directory.

main

Folder that contains internal data files.

nodetable

Folder in which document content and metadata is stored.

numeric

Folder that contains memory mapped files for fast field text
operation on numeric fields.

queue

Folder that contains internal data files.

refindex

Folder that contains an index of document references.

status

Folder that contains items waiting to be indexed.

tagindex

Folder that contains an index of parametric values.

uid

Folder that contains document tracking and license files.

indextasks

Folder that contains files for pre-index processing.

incoming

Folder in which incoming data is stored, if pre-index
processing is enabled.

langfiles

Folder that contains language resource files (including stoplists,
sentence breaking files and Unicode conversion tables).

dic

Folder that contains data for Japanese sentence breaking.

logs

Folder that contains log files for each configured log stream.

modules

Folder that contains internally used library files.

queue

Folder that contains internal data files.


templates

Folder that contains templates for generating alerts.

uid

Folder that contains document tracking and license files.

<InstallationName>.cfg

Configuration file that contains the IDOL server settings.

<InstallationName>.exe

IDOL server executable.

<InstallationName>.log

IDOL server log file.

<InstallationName>cfg.log

Text file that contains important information.

license.log

License log file.

service.log

Service log file.

various internal files

IDOLUninstallerData

Folder that contains IDOL server uninstaller files.

resource

Folder that contains resource files.

Uninstall.exe

Executable to uninstall IDOL server.

various internal files

jre

Folder that contains uninstaller files, generated by the installer.

UninstallerData

Folder that contains Retina uninstaller files.

resource

Folder that contains resource files.

Uninstall Retina 4.6.exe

Executable to uninstall Retina.

various internal files

webapps

Folder that contains the Retina.war file

IDOLserver_InstallLog.log

IDOL server installation log file.


Installing IDOL server on UNIX


To install under UNIX, run LINUX_AutonomyIDOLserver-5.0.bin -i console from the command line.
Note that if you are installing IDOL server on Solaris, you require the libiconv library file which you can
download from http://www.gnu.org/software/libiconv/.
Read and follow all installation instructions on the screen carefully.

1. The console mode installation starts.
Enter the number of the language in which you want to run the installer and press Enter.

2. The Welcome text is displayed. Read the text and press Enter.

3. The first section of the License Agreement is displayed.
Read the section and press Enter to display the next section. Repeat this until you have read all
sections of the License Agreement, and then enter Y to indicate that you accept the License
Agreement terms.

4. The Select Install Set text is displayed.
By default all components are installed (an X indicates that they are selected). Press Enter to
install all of the following components:
1 IDOL Server
Installs Autonomy IDOL server.
2 DiSH License Server
Installs an Autonomy DiSH which is required for licensing.
3 Retina Web Application
Installs Autonomy Retina, a web application that provides a user interface for the
functionality IDOL server supplies.
4 Autonomy Service Dashboard Web Application
Installs an Autonomy Service Dashboard which allows you to administer, configure
and control Autonomy services.
Alternatively, if you do not want to install some of the components, enter a comma-separated list of
the components that you do not want to install, and press Enter.

5. The Select Install Folder text is displayed.
By default IDOL server is installed in /Autonomy/IDOLserver. Press Enter to accept the default
location, or enter an absolute path to an alternative location.

6. If you chose to install IDOL server, the IDOL server Port Settings are displayed.
Enter a number and press Enter for each of the following:
ACI Port
The port that client machines use to send action commands to IDOL server. By default this
is 9000.
This entry sets the Port parameter in IDOL server's configuration file.
Index Port
The port that administrative client machines use to index documents into IDOL server (and
to administer IDOL server). By default this is 9001.
This entry sets the IndexPort parameter in IDOL server's configuration file.
Service Port
The port that IDOL server uses for DiSH communication. By default this is
9002. Note that this port must not be used by any other service.
This entry sets the ServicePort parameter in IDOL server's configuration file.

7. If you chose to install the DiSH server, the DiSH Server Port Settings are displayed.
Enter a number and press Enter for each of the following:
ACI Port
The port that client machines use to send action commands to the DiSH server. By default
this is 20000.
This entry sets the Port parameter in the DiSH configuration file.
Service Port
The port by which service commands can be sent to DiSH. By default this is
20003. Note that this port must not be used by any other service.
This entry sets the ServicePort parameter in the DiSH configuration file.

8. The Executable Prefix text is displayed.
Enter a prefix for your installation in order to uniquely identify it, and press Enter. By default, this is
Autonomy.

9. The SMTP Server Details are displayed.
Enter the details and press Enter for each of the following:
IP Address
The IP address of your SMTP server's host.
Port
The port on which your SMTP server listens for SMTP commands.

10. The Pre-Installation Summary is displayed.
Check the settings you have made, and press Enter.

11. The Ready To Install text is displayed.
Check that the installation directory is correct, and press Enter to start installing.
12. The Installation Complete dialog is displayed. IDOL server has been installed successfully.
Press Enter to exit the installation.
13. Copy the licensekey.dat file to the DiSH subdirectory of your IDOL server installation. You can
obtain this file from Autonomy Support (see Licensing on page 40).
If you have installed Retina, you can now deploy the Retina WAR file to your application server (the
WAR file is located in the webapps directory within your IDOL server installation). Please refer to your
Retina manual for details.
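The final step of the installation, copying the license key into place, can also be scripted. The Python sketch below is an illustration under stated assumptions: the function name and the example source path are hypothetical, and the only facts taken from this guide are the file name licensekey.dat and the DiSH subdirectory of the installation directory.

```python
import shutil
from pathlib import Path


def install_license_key(key_file, install_dir):
    """Copy a license key file into the DiSH subdirectory of an installation."""
    dish_dir = Path(install_dir) / "DiSH"
    if not dish_dir.is_dir():
        raise FileNotFoundError(f"no DiSH subdirectory in {install_dir}")
    target = dish_dir / "licensekey.dat"
    shutil.copyfile(key_file, target)
    return target
```

For example, install_license_key("/tmp/licensekey.dat", "/Autonomy/IDOLserver") copies the key into the default UNIX installation location and returns the path of the copy.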


Directory structure
Once the installation of IDOL server is completed and you have started your IDOL server, your
installation directory contains the following files and subdirectories (note that bold font indicates
folders):

AutonomyServiceDashboard

Folder that contains Autonomy Service Dashboard files.

webapps

Folder that contains the autonomyservicedashboard.war file.

configuration

Folder that contains Retina and IDOL server configuration files.

idol.cfg

File that contains configuration settings that Retina uses to
communicate with your DiSH and IDOL server installation.

retina.cfg

Configuration file that contains Retina's settings.

DiSH

Folder that contains license server files.

audit

Folder that contains audit data.

documentTracking

Folder that contains document tracking data.

errors

Folder that contains internal data files.

graphs

Folder that contains generated graphs.

license

Folder that contains license data.


Note: you must not remove the content of this directory.

logs

Folder that contains log files for each configured log stream.

queue

Folder that contains internal data files.

uid

Folder that contains document tracking and license files.

DISH.exe

DiSH executable.

DISH.cfg

Configuration file that contains the DiSH's settings.

license.log

License log file.

licensekey.dat

License key that you transferred after installing IDOL server.

service.log

Service log file.

various internal files


IDOL

Folder that contains IDOL server files.

agentstore

Folder in which IDOL server stores agent and category data.

category

Folder in which category data is stored.

category

Folder that contains categories.

cluster

Folder that contains cluster data

2DMAPS

Folder that contains the 2D maps in gif format that have been
generated from clusters using the ClusterServe2DMap action.

CLUSTEREXPORT

Folder that contains the directory structure in which the cluster
documents are stored (in txt format) that have been written to
disk using the ClusterWriteToDisk action (this action has been
deprecated). Each folder in this directory structure contains one
cluster.

CLUSTERS

Folder that contains configuration files that store the WhatsHot
and WhatsNew information that has been generated from
snapshot data using the ClusterCluster action.

SGCLUSTDOCS

Folder that contains spectrograph documents.

SGDATA

Folder that contains the spectrograph data in sgd format
(internal proprietary format) that has been generated from a set
of snapshots using the ClusterSGDataGen action.

SGPICCACHE

Folder that caches the images (in gif format) that have been
generated from data sets using the ClusterSGPicServe action.

SNAPSHOTS

Folder that contains snapshots in binary cls format that have
been taken using the ClusterSnapshot action.

imex

Folder that contains category files to be imported and exported.

license

Folder that contains license data.


Note: you must not remove the content of this directory.

queue

Folder that stores queued actions.

taxonomy

Folder that contains taxonomy files

uid

Folder that contains document tracking and license files.


Note: you must not remove the content of this directory.


community

Folder that contains files for IDOL server's community features.

license

Folder that contains license data.
Note: you must not remove the content of this directory.

queue

Folder that contains internal data files.

temp

Folder that contains temporary files.

uid

Folder that contains document tracking and license files.

users

Folder that contains users.

content

Folder in which content data is stored.

dynterm

Folder that contains a list of terms and their occurrences.

license

Folder that contains license data.


Note: you must not remove the content of this directory.

main

Folder that contains internal data files.

nodetable

Folder in which document content and metadata is stored.

numeric

Folder that contains memory mapped files for fast field text
operation on numeric fields.

queue

Folder that contains internal data files.

refindex

Folder that contains an index of document references.

status

Folder that contains items waiting to be indexed.

tagindex

Folder that contains an index of parametric values.

uid

Folder that contains document tracking and license files.

indextasks

Folder that contains files for pre-index processing.

incoming

Folder in which incoming data is stored, if pre-index
processing is enabled.

langfiles

Folder that contains language resource files (including stoplists,
sentence breaking files and Unicode conversion tables).

dic

Folder that contains data for Japanese sentence breaking (not
supported for Linux).

logs

Folder that contains log files for each configured log stream.

modules

Folder that contains internally used library files.

queue

Folder that contains internal data files.


templates

Folder that contains templates for generating alerts.

uid

Folder that contains document tracking and license files.

<InstallationName>.cfg

Configuration file that contains the IDOL server settings.

<InstallationName>.exe

IDOL server executable.

<InstallationName>.log

IDOL server log file.

<InstallationName>cfg.log

Text file that contains important information.

license.log

License log file.

service.log

Service log file.

various internal files

IDOLUninstallerData

Folder that contains IDOL server uninstaller files.

jre

Folder that contains uninstaller files, generated by the installer.

UninstallerData

Folder that contains Retina uninstaller files.

webapps

Folder that contains the Retina.war file

IDOLserver_InstallLog.log

IDOL server installation log file.


Deploying Retina to your application server


After you have installed IDOL server you need to deploy the Retina.war file to your application server
before you can use IDOL server via the Retina user interface. This file is located in the webapps
directory within your IDOL server installation.

Please refer to your Retina manual for details on how to deploy Retina.


Licensing
The licensing that enables you to run Autonomy solutions is facilitated by an Autonomy DiSH server.
You must have a running Autonomy DiSH server that resides on a machine with a static known IP
address, MAC address or Volume Name.
To obtain a license, you need to contact Autonomy Support and request a license file for your specific
installation. This license file is tied to the IP address and ACI port of your DiSH server, and cannot be
transferred between machines. When you receive this file from Autonomy Support save it as
licensekey.dat to the DiSH subdirectory of your IDOL server installation.
Note that you can revoke licenses at any time, for example, if you want to re-allocate them to different
clients or if you want to change a client's IP address.

Important
You MUST NOT:

change the IP address of the machine on which a licensed module is running (if you are using
an IP address to lock your license).

change the service port of a module without first revoking the license.

replace the network card of a client without first revoking the license.

remove the contents of the license and uid directories.

Any of the above will cause the module to become inoperable.


All modules produce a license.log and service.log file. If a product fails to start, you should examine
the contents of these files before submitting a support ticket.


Displaying license information


You can check which modules you are licensed for by sending the following command from a web
browser to the running DiSH server.
http://<DiSH_host>:<DiSH_ACI_port>/action=LicenseInfo
<DiSH_host>
The IP address of the machine on which DiSH resides.
<DiSH_ACI_port>
The ACI port of DiSH (this must be the Port specified in the DiSH configuration file's [Server]
section).
In response to this command DiSH returns the requested license information. In the following example,
you are licensed to run 2 instances of IDOL server and 1 instance of DiSH:
<?xml version="1.0" encoding="UTF-8" ?>
<autnresponse xmlns:autn="http://schemas.autonomy.com/aci/">
<action>LICENSEINFO</action>
<response>SUCCESS</response>
<responsedata>
<LicenseDiSH>
<LICENSEINFO>
<autn:Product>
<autn:ProductType>IDOLSERVER</autn:ProductType>
<autn:TotalSeats>2</autn:TotalSeats>
<autn:SeatsInUse>0</autn:SeatsInUse>
</autn:Product>
<autn:Product>
<autn:ProductType>DISH</autn:ProductType>
<autn:TotalSeats>1</autn:TotalSeats>
<autn:SeatsInUse>0</autn:SeatsInUse>
</autn:Product>
</LICENSEINFO>
</LicenseDiSH>
</responsedata>
</autnresponse>
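
If you check licenses regularly, it can help to read the LicenseInfo response programmatically. The following Python sketch is not part of IDOL server: it assumes the response has the shape shown in the example above and that the autn prefix is bound to http://schemas.autonomy.com/aci/. The function name free_seats is hypothetical, and the XML declaration is omitted from the sample for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example response; treat it as an assumption
# for any other DiSH version.
AUTN_URI = "http://schemas.autonomy.com/aci/"
AUTN_NS = {"autn": AUTN_URI}


def free_seats(license_info_xml):
    """Return {product type: seats not in use} from a LicenseInfo response."""
    root = ET.fromstring(license_info_xml)
    result = {}
    # Every licensed product is reported in its own autn:Product element.
    for product in root.iter("{%s}Product" % AUTN_URI):
        name = product.findtext("autn:ProductType", namespaces=AUTN_NS)
        total = int(product.findtext("autn:TotalSeats", namespaces=AUTN_NS))
        in_use = int(product.findtext("autn:SeatsInUse", namespaces=AUTN_NS))
        result[name] = total - in_use
    return result


# The example response from this section.
SAMPLE = """<autnresponse xmlns:autn="http://schemas.autonomy.com/aci/">
<action>LICENSEINFO</action>
<response>SUCCESS</response>
<responsedata>
<LicenseDiSH>
<LICENSEINFO>
<autn:Product>
<autn:ProductType>IDOLSERVER</autn:ProductType>
<autn:TotalSeats>2</autn:TotalSeats>
<autn:SeatsInUse>0</autn:SeatsInUse>
</autn:Product>
<autn:Product>
<autn:ProductType>DISH</autn:ProductType>
<autn:TotalSeats>1</autn:TotalSeats>
<autn:SeatsInUse>0</autn:SeatsInUse>
</autn:Product>
</LICENSEINFO>
</LicenseDiSH>
</responsedata>
</autnresponse>"""

print(free_seats(SAMPLE))
```

For the sample response, the function reports two free IDOLSERVER seats and one free DISH seat, which matches the TotalSeats and SeatsInUse values shown.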


Revoking a client license


Once you have set up your licensing, you can revoke licenses at any time, for example, if you want to
re-allocate them to different clients or if you want to change a client's IP address.
To revoke a license you must stop the Autonomy solution which is using the license and then run the
following command from a command prompt:
<ModuleInstallationDirectory><ModuleExecutableName>.exe revokelicense
configfile <file name>
This returns the license to the license server.

You can send the following command from a web browser to the running DiSH server in order to check
for free licenses.
http://<DiSH_host>:<DiSH_ACI_port>/action=LicenseInfo
In response to this command DiSH returns the requested license information. In the following example,
one IDOL server license is available for allocation to a client:
<autn:Product>
<autn:ProductType>IDOLSERVER</autn:ProductType>
<autn:Client>
<autn:IP>192.123.51.23</autn:IP>
<autn:ServicePort>1823</autn:ServicePort>
<autn:IssueDate>1063192283</autn:IssueDate>
<autn:IssueDateText>10/09/2003 12:11:23</autn:IssueDateText>
</autn:Client>
<autn:TotalSeats>2</autn:TotalSeats>
<autn:SeatsInUse>1</autn:SeatsInUse>
</autn:Product>


Forcibly revoking licenses from inaccessible clients


If a client machine has become inaccessible, it is possible to revoke a license by sending the following
ACI command to the DiSH server. This allows you to free up the license from the inaccessible
machine.
Note: you must only call this function on inaccessible client machines, otherwise the module will shut
down and become inaccessible.
http://<DiSH_host>:<DiSH_ACI_port>/action=AdminRevokeLicense&ClientProductType=<product_type>&ClientIP=<client_host>&ClientServicePort=<client_service_port>

<DiSH_host>
The IP address of the machine on which DiSH resides.
<DiSH_ACI_port>
The ACI port of DiSH (this must be the Port specified in the DiSH configuration file's [Server]
section).
<product_type>
The product type of the Autonomy solution whose license you want to revoke from the
inaccessible client.
<client_host>
The IP address of the inaccessible client.
<client_service_port>
The port by which service commands are sent to the Autonomy solution on the inaccessible
client (this is set by the ServicePort parameter in the Autonomy module configuration file's
[Service] section).
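
Because this command is long and easy to mistype, you may prefer to assemble it in a script. The following Python sketch only builds the URL from the parameters described above; it does not send it. The function name admin_revoke_url and the DiSH host and port in the example call are hypothetical; the client IP address and service port are borrowed from the LicenseInfo example earlier in this chapter.

```python
from urllib.parse import urlencode


def admin_revoke_url(dish_host, dish_aci_port, product_type,
                     client_host, client_service_port):
    """Build the AdminRevokeLicense command for an inaccessible client."""
    # Parameter names are the ones documented in this section.
    params = urlencode({
        "ClientProductType": product_type,
        "ClientIP": client_host,
        "ClientServicePort": client_service_port,
    })
    return "http://%s:%s/action=AdminRevokeLicense&%s" % (
        dish_host, dish_aci_port, params)


# Hypothetical DiSH location (10.0.0.5, port 20000).
print(admin_revoke_url("10.0.0.5", 20000, "IDOLSERVER", "192.123.51.23", 1823))
```

Building the URL separately from sending it lets you review the command before you issue it, which is worthwhile given that revoking the license of a reachable module shuts that module down.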


Troubleshooting licensing errors


Error : Failed to update license from the license server. Your license cache details do not
match the current service configuration. Shutting the service down.
The configuration of the service has been altered. Check that the service port and IP address have not
changed since the service started.

Error : License for <PRODUCT_NAME> is invalid. Exiting.


The license returned from the DiSH server is invalid. Ensure that the license has not expired.

Error : failed to connect to license server, using cached licensed details


Cannot communicate with the DiSH server. The product will still run for a limited period but you should
check whether your DiSH server is still available.

Error : failed to connect to license server. Error code is SERVICE:<ERROR_CODE>


Failed to retrieve a license from the DiSH server or from the backup cache. Ensure that your DiSH
server can be contacted.

Error : failed to decrypt license keys. Please contact Autonomy support. Error code is
SERVICE:<ERROR_CODE>
Contact Autonomy Support and provide them with the exact error message and your license file.

Error : failed to update the license from the license server. Shutting down
Failed to retrieve a license from the DiSH server or from the backup cache. Ensure that your DiSH
server can be contacted.

Error : your license keys are invalid. Please contact Autonomy support. Error code is
SERVICE:<ERROR_CODE>
Your license keys appear to be corrupt. Contact Autonomy Support and provide them with the exact
error message and your license file.

Failed to revoke license : No license to revoke from server


The DiSH server cannot find a license to revoke.



Failed to revoke license from server
<LICENSE_SERVER_HOST>:<LICENSE_SERVER_PORT>. Error code is <ERROR_CODE>
Failed to revoke a license from the DiSH server. Contact Autonomy Support and provide them with the
exact error message.

Failed to revoke license from server. An instance of this application is already running. Please
stop the other instance first
You cannot revoke a license from a running service. Stop the service and try again.

Failed to revoke license. Error code is SERVICE:<ERROR_CODE>


Failed to revoke a license from the DiSH server. Contact Autonomy Support and provide them with the
exact error message.

Your license keys are invalid. Please contact Autonomy support. Error code is ACISERVER:<ERROR_CODE>
Failed to retrieve a license from the DiSH server. Contact Autonomy Support and provide them with
the exact error message and your license file.

Your product ID does not match the generated ID.


Your installation appears to be corrupt. Forcibly revoke the license from the DiSH server and rename
your license and uid directories.

Your product ID does not match this configuration


The service port for the module or the IP address for the machine appears to have changed. Check
your configuration file.


Distributing IDOL server


You can distribute IDOL server in order to spread the work load it handles and optimize its
performance.
Whether you need to distribute IDOL server depends on a number of considerations, for example:

Hardware
IDOL server's performance depends on your system hardware (operating system, disk,
system memory and so on).

Document section size


IDOL server indexes documents in sections (the number of sections that a document is split
up into increases proportionally with the size of the document). This ensures that when you,
for example, query for text that is relevant to a specific part of a book, IDOL server can find
the appropriate section and return it to you (if the book was not indexed in sections, IDOL
server might not be able to find the text you are looking for, as it may not be conceptually
relevant to the entire book).
The more document sections IDOL server stores, the more space and memory it requires.

Document types
IDOL server requires less space for storing documents with high image content than for
documents that contain a lot of textual information.

Query types
Field text or parametric queries, for example, require more processing power than a simple
text query.

Query performance
Your system architecture is dependent on your requirement for response time and the
number of concurrent users.

For specific sizing requirements, please consult the Autonomy Sizing Service (you can contact the
Autonomy Sizing Service via Autonomy Support).


Distributing IDOL server: An example


The following example guides you through the process of setting up a simple distributed IDOL server
setup comprising two IDOL server installations:

Main operations IDOL server
This IDOL server stores users, agents, profiles and categories, and carries out
classification operations.

Data IDOL server
This IDOL server is used to store the data content that queries, agents, profiles and
categories retrieve. In addition, this IDOL server can execute tasks on data before
indexing (see Processing data before indexing it on page 73).

Note: to set up a distributed IDOL server installation, you need a license for each IDOL server instance
that you want to install.

1.

Install the IDOL server that you want to be the main operations server. During the installation,
select to install all the Autonomy solutions that the installer comprises.

2.

Install the IDOL server that you want to be the data server. Only install the IDOL server (not DiSH
or web applications) and point it to the DiSH that you have installed with the main operations IDOL
server.

3.

Provide DiSH with a license key for the two servers.

4.

Open the configuration file of the Data IDOL server in a text editor.

5.

Check if the configuration file contains any of the following sections, and remove them if they are
present (these are sections that are not relevant to data storage operations):
[User]
[UserSecurityFields]
[UserSecurity]
Including all its subsections:
[Autonomy]
[NT]
[Notes]
[LDAP]
[Documentum]
[Exchange]
[Netware]
[Role]

[Agent]
[Profile]
[ProfileNamedAreas]
[Community]
[UserCustom]
Including all its subsections:
[Email]
[UserStructure]
[Category]
[Cluster]
[Taxonomy]
[AnalysisSchedules]
[AnalysisSchedule<N>]
6.

Save and close the configuration file.

7.

Open the configuration file of the Main operations IDOL server in a text editor.

8.

Check if the configuration file contains any of the following sections, and remove them if they are
present (these sections relate to data storage, which the Data IDOL server handles, and are not
relevant to the Main operations IDOL server):
[TermCache]
[IndexCache]
[SectionBreaking]
[Databases]
[Database<N>]
[FieldProcessing]
Including all its subsections:
[SetIndexFields]
[SetIndexAndWeightHigher]
[SetSectionBreakFields]
[SetDateFields]
[SetDatabaseFields]
[SetReferenceFields]
[SetTitleFields]
[SetHighlightFields]
[SetSourceFields]
[DetectNT_V4Security]
[DetectNotes_V4Security]
[DetectNetware_V4Security]
[DetectExchange_V4Security]

[DetectDocumentum_V4Security]
[HideAutonomyMetaDataField]
[Properties]
Including all its subsections:
[IndexFields]
[IndexWeightFields]
[SectionFields]
[DateFields]
[DatabaseFields]
[ReferenceFields]
[TitleFields]
[HighlightFields]
[SourceFields]
[SecurityNT_V4]
[SecurityNotes_V4]
[SecurityNetware_V4]
[SecurityExchange_V4]
[SecurityDocumentum_V4]
[HideMetaDataFields]
[Security]
Including all its subsections:
[NT_V4]
[Netware_V4]
[Notes_V4]
[Exchange_V4]
[Documentum_V4]
9.

Add a [DataDRE] section to the configuration file, and use the Host and ACIPort settings to
specify the location and port of the Data IDOL server. This allows the Main operations IDOL
server to communicate with the Data IDOL server.
For example:
[DataDRE]
Host=1.23.45.6
ACIPort=9000

10. Save and close the configuration file.


11. Start the DiSH license server.
12. Start the Data IDOL server.
13. Start the Main operations IDOL server. You are now ready to index data into the Data IDOL
server.
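
The configuration edits in steps 5, 8 and 9 can also be scripted. The following Python sketch is an illustration only: it assumes that the configuration file is INI-like, with each section header on its own line, and it treats a subsection as just another named section that you list explicitly. The function name strip_sections and the sample configuration (including the ExampleSetting keys) are hypothetical. The [DataDRE] values are the ones from the example in step 9.

```python
def strip_sections(cfg_text, unwanted):
    """Drop every [Section] whose name is in `unwanted`; keep all other lines."""
    unwanted = {name.lower() for name in unwanted}
    kept = []
    skipping = False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        # A section header switches skipping on or off for the lines below it.
        if stripped.startswith("[") and stripped.endswith("]"):
            skipping = stripped[1:-1].lower() in unwanted
        if not skipping:
            kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical fragment of a Main operations configuration file.
MAIN_CFG = """[Server]
Port=9000

[TermCache]
ExampleSetting=1

[Agent]
ExampleSetting=2
"""

# Remove a data storage section, then point the server at the Data IDOL server.
DATA_DRE = "\n[DataDRE]\nHost=1.23.45.6\nACIPort=9000\n"
print(strip_sections(MAIN_CFG, {"TermCache"}) + DATA_DRE)
```

Work on a copy of the configuration file and compare the result with the original before you start either server.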


Upgrading to IDOL server


You can upgrade to IDOL server 5 by exporting data and users from:

DRE4 or higher

UAServer 4 or higher

Classification Server 4 or higher

previous versions of IDOL server

In order to upgrade you need to execute the following steps:


1.

Export data

2.

Copy categories, taxonomies and clusters to IDOL server 5

3.

Copy users to IDOL server 5

4.

Import data into IDOL server 5

5.

Synchronize IDOL server 5

Step 1: Export data


From a DRE version 4.0 or higher
If you are upgrading from a DRE version 4.0 or higher, you need to export the data that is stored in
your DRE. The method you have to use to do this depends on which file types you indexed into your
DRE.
If you indexed IDX files or a mix of IDX and XML files into your DRE
Note: the method outlined in the following steps ensures that the sections into which your data has
been indexed are preserved. If you do not use sectioning or it is not important to you, it is
recommended that you use the List action (see If you indexed XML files into your DRE on
page 51) to export your data instead. The List action will change the number of sections that IDOL
server contains once the data has been transferred.
1.

Open the DRE's contentbody_limitedfields.txt template (located in the DRE
installation's templates directory) in a text editor.

2.

Check if the template lists all the fields that you want to export. If it does not, you need to
set up each field that you want to export in addition.



Set up each additional field that you want to export before the #DRECONTENT field,
using the following format:
#DREFIELD MyField="<!-- ATNMY_FIELD PREFIX="" SUFFIX="" MATCH = "*/MyField" TRIM="0" ESCAPE="0" -->"
For example, to set up an AUTHOR field:
#DREFIELD AUTHOR="<!-- ATNMY_FIELD PREFIX="" SUFFIX="" MATCH = "*/AUTHOR" TRIM="0" ESCAPE="0" -->"
3.

Save the template file as contentbody.txt (if your templates directory already contains a
contentbody.txt file, you need to delete this first).

4.

Issue the following command from your web browser:


http://<DRE_host>:<query_port>/qmethod=G&xoptions=outfile=<file.idx>+nocombine
<DRE_host>
Enter the IP address (or name) of the machine on which the DRE is installed.
<query_port>
Enter the port number that client machines use to communicate with the DRE (this is
specified by the QueryPort setting in the DRE configuration file's [Server] section).
<file.idx>
Enter the name of the file to which you want to export the DRE's content (the file you
specify must have the extension idx). The file will be created in the DRE's installation
directory.

5.

Stop the DRE after the command has finished (you can check this using
http://<DRE_host>:<ACI_Port>/action=GetRequestLog).
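
If you need to add many fields to the template in step 2, generating the #DREFIELD lines can reduce typing errors. The following Python sketch assumes that the format shown in step 2 is written on a single line, with the field name directly after "*/", and that the #DRECONTENT field starts its own line in the template. The function names and the one-line sample template are hypothetical.

```python
def drefield_line(field):
    """Return the #DREFIELD template line for one field name."""
    return ('#DREFIELD %s="<!-- ATNMY_FIELD PREFIX="" SUFFIX="" '
            'MATCH = "*/%s" TRIM="0" ESCAPE="0" -->"' % (field, field))


def add_fields(template, fields):
    """Insert one #DREFIELD line per field before the #DRECONTENT field."""
    out = []
    for line in template.splitlines():
        if line.startswith("#DRECONTENT"):
            out.extend(drefield_line(name) for name in fields)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical minimal template; a real template contains more lines.
print(add_fields("#DRECONTENT\n", ["AUTHOR"]))
```

Save the generated text as contentbody.txt, as described in step 3, and leave contentbody_limitedfields.txt unchanged.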

If you indexed XML files into your DRE


1.

Issue the following command from your web browser:


http://<DRE_host>:<ACI_port>/action=List&output=File&fileName=<file.xml>
Note: you can also use this command to export IDX files or a mix of IDX and XML files
from your DRE. However, this will change the number of sections that IDOL server
contains once the data has been transferred.
<DRE_host>
Enter the IP address (or name) of the machine on which the DRE is installed.
<ACI_port>
Enter the port number that client machines use to communicate with the DRE (this is
specified by the Port setting in the DRE configuration file's [Server] section).



<file.xml>
Enter the name of the file to which you want to export the DRE's content (the file you
specify must have the extension xml). The file will be created in the DRE's
installation directory.
2.

Stop the DRE after the command has finished (you can check this using
http://<DRE_host>:<ACI_Port>/action=GetRequestLog).

From an IDOL server 4.5


If you are upgrading from an IDOL server 4.5, you need to export the data that is stored in IDOL
server's Suir component. The method you have to use to do this depends on which file types you
indexed into your IDOL server's Suir.

If you indexed IDX files or a mix of IDX and XML files into your IDOL server Suir
Note: the method outlined in the following steps ensures that the sections into which your data has
been indexed are preserved. If you do not use sectioning or it is not important to you, it is
recommended that you use the List action (see If you indexed XML files into IDOL server
Suir on page 53) to export your data instead. The List action will change the number of sections
that IDOL server contains once the data has been transferred.
1.

Open IDOL server's contentbody_limitedfields.txt template (located in the IDOL
server installation's templates directory) in a text editor.

2.

Check if the template lists all the fields that you want to export. If it does not, you need to
set up each field that you want to export in addition.
Set up each additional field that you want to export before the #DRECONTENT field,
using the following format:
#DREFIELD MyField="<!-- ATNMY_FIELD PREFIX="" SUFFIX="" MATCH = "*/MyField" TRIM="0" ESCAPE="0" -->"
For example, to set up an AUTHOR field:
#DREFIELD AUTHOR="<!-- ATNMY_FIELD PREFIX="" SUFFIX="" MATCH = "*/AUTHOR" TRIM="0" ESCAPE="0" -->"

3.

Save the template file as contentbody.txt (if your templates directory already contains a
contentbody.txt file, you need to delete this first).

4.

Issue the following command from your web browser:


http://<IDOL server_host>:<query_port>/qmethod=G&xoptions=outfile=<file.idx>+nocombine
<IDOL server_host>
Enter the IP address (or name) of the machine on which the IDOL server Suir is
installed.



<query_port>
Enter the port number that client machines use to communicate with the IDOL server
Suir (this is specified by the QueryPort setting in the IDOL server Suir configuration
file's [Server] section).
<file.idx>
Enter the name of the file to which you want to export the IDOL server Suir content
(the file you specify must have the extension idx). The file will be created in the IDOL
server Suir installation directory.
5.

Stop IDOL server Suir after the command has finished (you can check this using
http://<IDOL server_host>:<ACI_Port>/action=GetRequestLog).

If you indexed XML files into IDOL server Suir


1.

Issue the following command from your web browser:


http://<IDOL server_host>:<ACI_port>/action=List&output=File&fileName=<file.xml>
Note: you can also use this command to export IDX files or a mix of IDX and XML files
from IDOL server Suir. However, this will change the number of sections that the IDOL
server contains once the data has been transferred.
<IDOL server_host>
Enter the IP address (or name) of the machine on which the IDOL server Suir is
installed.
<ACI_port>
Enter the port number that client machines use to communicate with the IDOL server
Suir (this is specified by the Port setting in the IDOL server Suir configuration file's
[Server] section).
<file.xml>
Enter the name of the file to which you want to export the IDOL server Suir content
(the file you specify must have the extension xml). The file will be created in the
IDOL server Suir installation directory.

2.

Stop the IDOL server Suir after the command has finished (you can check this using
http://<IDOL server_host>:<ACI_Port>/action=GetRequestLog).


Step 2: Copy categories, taxonomies and clusters to IDOL server 5
If you are upgrading from a Classification Server or IDOL server that contains categories, taxonomies
or clusters which you want to keep, you need to copy the following subdirectories to your IDOL server
5 installation.
If your old installation does not contain any categories, taxonomies or clusters that you want to
keep, you can skip Step 2 and continue with Step 3.

If you're upgrading from:            copy the content of the      to:
                                     following subdirectories:

Classification Server 4.0 or higher  CATEGORY                     IDOL\category\category
                                     CLUSTER                      IDOL\category\cluster
                                     TAXONOMY                     IDOL\category\taxonomy

a previous version of IDOL server    LAUNE\LAUNE\CATEGORY         IDOL\category\category
                                     LAUNE\LAUNE\CLUSTER          IDOL\category\cluster
                                     LAUNE\LAUNE\TAXONOMY         IDOL\category\taxonomy

Step 3: Copy users to IDOL server 5


If you are upgrading from a UAServer or IDOL server that contains users which you want to keep, you
need to copy the following subdirectories to your IDOL server 5 installation.
If your old installation does not contain users that you want to keep, you can skip Step 3 and
continue with Step 4.

If you're upgrading from:            copy the content of the      to:
                                     following subdirectories:

UAServer 4.0 or higher               UAServer\data                IDOL\community\users

a previous version of IDOL server    NORE\NORE\data               IDOL\community\users
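
The copy operations in Step 2 and Step 3 can be scripted if you upgrade several installations. The following Python sketch covers the case of upgrading from a previous version of IDOL server. It assumes that the source subdirectories are relative to the root of the old installation and that the destinations are relative to the root of the IDOL server 5 installation; the function name copy_contents and the paths in the usage comment are hypothetical.

```python
import shutil
from pathlib import Path

# Old subdirectory -> destination in the IDOL server 5 installation,
# transcribed from the tables in Step 2 and Step 3.
PREVIOUS_IDOL_SERVER = {
    "LAUNE/LAUNE/CATEGORY": "IDOL/category/category",
    "LAUNE/LAUNE/CLUSTER": "IDOL/category/cluster",
    "LAUNE/LAUNE/TAXONOMY": "IDOL/category/taxonomy",
    "NORE/NORE/data": "IDOL/community/users",
}


def copy_contents(old_root, new_root, mapping):
    """Copy the content of each mapped subdirectory; return those copied."""
    copied = []
    for src_rel, dst_rel in mapping.items():
        src = Path(old_root) / src_rel
        if not src.is_dir():
            continue  # the old installation has nothing of this kind to keep
        # dirs_exist_ok (Python 3.8+) merges into an existing destination.
        shutil.copytree(src, Path(new_root) / dst_rel, dirs_exist_ok=True)
        copied.append(src_rel)
    return copied


# Usage (hypothetical installation paths):
# copy_contents("C:/Autonomy/IDOL4", "C:/Autonomy/IDOL5", PREVIOUS_IDOL_SERVER)
```

Stop both installations before you copy, and keep the old installation until you have completed Step 5 and checked the result.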


Step 4: Import data into IDOL server 5


If you are upgrading from a DRE version 4.0 or higher
1.

Start IDOL server.

2.

Issue the following command from your web browser:


http://<IDOL server_host>:<Index_port>/DREADD?<file.idx_or_file.xml>
<IDOL server_host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<Index_port>
Enter the port number by which index commands are sent to IDOL server (this is specified by
the IndexPort setting in the IDOL server configuration file's [Server] section).
<file.idx_or_file.xml>
Enter the name of the file (including the path to the file) to which you exported your DRE's
content in Step 1: Export data. This must not contain any spaces.

3.

Stop IDOL server once the process has finished (you can check this using
http://<IDOL server_host>:<ACI_Port>/action=IndexerGetStatus).

If you are upgrading from an IDOL server 4.5


1.

Start IDOL server.

2.

Issue the following command from your web browser:


http://<IDOL server_host>:<Index_port>/DREADD?<file.idx_or_file.xml>
<IDOL server_host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<Index_port>
Enter the port number by which index commands are sent to IDOL server (this is specified by
the IndexPort setting in the IDOL server configuration file's [Server] section).
<file.idx_or_file.xml>
Enter the name of the file (including the path to the file) to which you exported your IDOL server
Suir content in Step 1: Export data. This must not contain any spaces.

3.

Stop IDOL server once the process has finished (you can check this using
http://<IDOL server_host>:<ACI_Port>/action=IndexerGetStatus).
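
The import command in Step 4 is the same for both upgrade paths, so a small helper can build it and enforce the rule that the file path must not contain spaces. The following Python sketch only constructs the URL. The function name dreadd_url, the port number and the file path in the example call are hypothetical.

```python
def dreadd_url(host, index_port, export_file):
    """Build the DREADD index command for an exported IDX or XML file."""
    # The guide states that the file name and path must not contain spaces.
    if " " in export_file:
        raise ValueError("the export file path must not contain spaces")
    return "http://%s:%s/DREADD?%s" % (host, index_port, export_file)


# Hypothetical index port (9001) and export file location.
print(dreadd_url("12.3.4.56", 9001, "/data/export/content.idx"))
```

Rejecting the path early is a design choice: a path with spaces would otherwise only show up as a failed index job after you have sent the command.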


Step 5: Synchronize IDOL server 5


Note: you only need to execute this step if you executed Step 2 or Step 3. If you did not execute either
of these steps, you can skip Step 5 and are ready to run IDOL server 5.

1.

Start IDOL server.

2.

If you executed Step 2 to copy categories, taxonomies or clusters to IDOL server 5, issue the
following command from your web browser in order to index all your categories:
http://<host>:<ACI_port>/action=CategorySyncCatDre
<host>
Enter the IP address (or name) of the machine on which IDOL server's Category index is
located.
<ACI_port>
Enter the port number by which action commands are sent to IDOL server (this is specified by
the Port setting in the IDOL server configuration file's [Server] section).

3.

If you executed Step 3 to copy users to IDOL server 5, issue the following command from your
web browser in order to index all your users' agents and profiles:
http://<host>:<ACI_port>/action=Index
<host>
Enter the IP address (or name) of the machine on which IDOL server's Agent index is located.
<ACI_port>
Enter the port number by which action commands are sent to IDOL server (this is specified by
the Port setting in the IDOL server configuration file's [Server] section).

4.

You have now finished upgrading and can run your IDOL server.


Requesting support
Check that content has been moved successfully to IDOL server 5. If IDOL server does not behave as
expected, please check your log files and contact Autonomy Support.

To issue a support ticket:


1.

Enter the following URL in your web browser's Address field:


http://automater.autonomy.com

2.

Enter your Username and Password, and click on the Login button.

3.

Click on the New Request menu option and issue your ticket (see
http://automater.autonomy.com/helpdesk/help/Submitting_a_new_support_request.htm
for details).

To contact Autonomy Support by email:


Send an email to your support team.
Europe:

uksupport@autonomy.com

USA:

support@us.autonomy.com


4. Running IDOL server


Starting and stopping IDOL server
Starting IDOL server
Once you have installed IDOL server, you are ready to run it.

To start IDOL server:


1.

Start the DiSH licensing server by doing one of the following:

by double-clicking on the <InstallationName>.exe file in your installation directory (for NT)

using the start script (for UNIX)

using services (for NT):


1. Display the Windows Services dialog.
2. Select the <InstallationName>DiSH service, and click on the Start button to start
DiSH.
3. Click on the Close button to close the Services dialog.

2.

Start IDOL server by doing one of the following:

by double-clicking on the <InstallationName>.exe file in your installation directory (for NT)

using the start script (for UNIX)

using services (for NT):


1. Display the Windows Services dialog.
2. Select the <InstallationName>IDOL server service, and click on the Start button to
start IDOL server.
3. Click on the Close button to close the Services dialog.


Stopping IDOL server


You can stop IDOL server from running using:

the stop script (for UNIX)

services (for NT):

1.

Display the Windows Services dialog.

2.

Select the <InstallationName>IDOL server service, and click on the Stop button to stop
IDOL server.

3.

Click on the Close button to close the Services dialog.

the service port:


Send the following command to IDOL server's service port (you need to have specified a
service port in IDOL server's configuration file):
http://<host>:<Service_Port>/action=stop
<host>
The IP address (or name) of the machine on which IDOL server is running.
<Service_Port>
IDOL server's service port (which is specified in the [Service] section of the IDOL server
configuration file).


Sending action commands to IDOL server


Displaying online help
Enter the following command to display help on IDOL server action commands:

http://<host>:<port>/action=Help
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the ACI port by which commands are sent to IDOL server (this is specified by the Port
setting in the IDOL server configuration file's [Server] section).

Example:

http://12.3.4.56:4000/action=Help

This command uses port 4000 to request Help on action commands from IDOL server which is located
on a machine with the IP address 12.3.4.56.

Note: to display help on configuration settings, click on the config help link in the top right-hand
corner (see Displaying help on configuration settings on page 389).


Action command syntax


IDOL server is queried via action commands which you can send from your web browser. The general
syntax of these commands is as follows:

http://<host>:<port>/action=<action>&<mandatory_parameters>&<optional_parameters>

<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the ACI port by which commands are sent to IDOL server (this is set by the Port
parameter in the IDOL server configuration file's [Server] section).
<action>
Enter the name of the action that you want IDOL server to execute (for example, Query).
<mandatory_parameters>
Enter the parameters that the action that you have specified requires (not all actions require
parameters).
<optional_parameters>
You can enter optional parameters for the action that you have specified (optional parameters
are not available for all actions).

Note: you must separate individual parameters with an ampersand.
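
The general syntax above can be captured in a small helper that joins the action and its parameters with ampersands. The following Python sketch only builds the URL. The function name action_url is hypothetical; the host and port come from the Help example earlier in this chapter, and the List parameters are the ones used in the upgrade procedure (the file name export.xml is illustrative).

```python
from urllib.parse import urlencode


def action_url(host, port, action, **params):
    """Build an action command; parameters are separated by ampersands."""
    url = "http://%s:%s/action=%s" % (host, port, action)
    if params:
        # urlencode also escapes characters that are not URL-safe.
        url += "&" + urlencode(params)
    return url


print(action_url("12.3.4.56", 4000, "Help"))
print(action_url("12.3.4.56", 4000, "List", output="File", fileName="export.xml"))
```

Passing parameters as keyword arguments keeps mandatory and optional parameters in one place; the helper does not check which parameters a given action requires.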


5. Before storing content in IDOL server


IDOL server stores the content of documents in its Data index (by default this comprises the IDOL
server databases News and Archive). The process of storing content in IDOL server is called indexing.
Before you start to index files into IDOL server you need to:

decide how you want to store content

set up field indexing

configure IDOL server to process required languages

optimize the indexing process according to your system

You can also configure IDOL server to process documents that it receives (for example from an
Autonomy connector) before it indexes them. You can set up a simple process by configuring IDOL
server to execute a single task on incoming documents, or set up a complex process by configuring
IDOL server to combine a number of tasks.
The available tasks allow you to do one or more of the following:

execute an ACI action

alert users to documents that match their agents

categorize documents

extract information from unstructured data and store it in structured fields

modify document fields

write files to disk

send an http call

import and categorize the legacy profiles in BIF files

evaluate the quality of files produced as a result of optical character recognition

route documents to different tasks

index documents

See Processing data before indexing it on page 73 for details.


Storing content
Disabling content storage
If you don't require IDOL server to return the content of fields or summaries with results, you can set
NodeTableStoreContent in the [Server] section of IDOL server's configuration file to false in order to
save the memory that storing fields normally requires.
If you disable content storage, the following actions are affected:

GetContent
Only the references and the title of results are returned.
GetTagValues
Disabled.
List
Only the references and the title of results are returned.
Query
Only the references and the title of results are returned. You cannot restrict by fields.
Suggest
Only the references and the title of results are returned. You cannot restrict by fields.
SuggestOnText
Only the references and the title of results are returned. You cannot restrict by fields.
Summarize
Disabled for indexed documents. Summaries can only be generated if text is supplied.
TermGetBest
IDOL server saves a document's best terms on indexing. These are the only terms available.

Storing IDOL server's data files on multiple disks


If your IDOL server becomes too big to be stored on one disk (as the terms, references, content and so
on that it stores increase in size), you can store its data files across multiple partitions within your PC in
order to gain space.
For example:
[PATHS]
DyntermPath=C:\autonomy\idolserver\dynterm
NodetablePath=D:\autonomy\idolserver\nodetable
RefIndexPath=E:\autonomy\idolserver\refindex
MainPath=F:\autonomy\idolserver\main
TagPath=G:\autonomy\idolserver\tagindex


Allocating files to IDOL server databases


You can configure IDOL server to read the database into which it should index a document from a field
in that document.

To configure IDOL server to read a document's database from a field:

1. Open IDOL server's configuration file in a text editor.

2. In the [FieldProcessing] section, list a process that identifies database fields.

For example:
[FieldProcessing]
Number=3
0=MyFirstProcess
1=MySecondProcess
2=DatabaseFields

3. Create a section for the database field identifying process, in which you create a property for the
process (a property is later defined by one or more applicable configuration parameters). Identify
the fields that you want to associate with the process.
You can use the PropertyMatch parameter to identify a specific value that fields must have in
order to be processed.
Note: the properties that you create must not have the same name as processes.
For example:
[MyFirstProcess]
Property=MyFirstProperty
PropertyFieldCSVs=*/MyField,*/MySecondField
PropertyMatch=*myString*
[MySecondProcess]
Property=MySecondProperty
PropertyFieldCSVs=*/MyOtherField,*/MyOtherSecondField
[DatabaseFields]
Property=Database
PropertyFieldCSVs=*/DREDBNAME,*/DB,*/Database

4. List the properties that you have created in a [Properties] section.

For example:
[Properties]
0=MyFirstProperty
1=MySecondProperty
2=Database



5. Create a section for each property. In the section for your database property, set the
DatabaseType parameter to true.
For example:
[MyFirstProperty]
HiddenType=true
[MySecondProperty]
Index=true
[Database]
DatabaseType=TRUE

6. Save IDOL server's configuration file and restart IDOL server to apply your changes.
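
The chain set up in the steps above (process, property, property section) can be checked before you restart IDOL server. The following Python sketch is illustrative only: check_field_processing is a hypothetical helper, and the rule that entries are numbered from 0 to Number-1 without gaps is an assumption drawn from the examples in this section.

```python
import configparser

EXAMPLE = """\
[FieldProcessing]
Number=3
0=MyFirstProcess
1=MySecondProcess
2=DatabaseFields
[MyFirstProcess]
Property=MyFirstProperty
[MySecondProcess]
Property=MySecondProperty
[DatabaseFields]
Property=Database
PropertyFieldCSVs=*/DREDBNAME,*/DB,*/Database
[Properties]
0=MyFirstProperty
1=MySecondProperty
2=Database
[MyFirstProperty]
HiddenType=true
[MySecondProperty]
Index=true
[Database]
DatabaseType=true
"""


def check_field_processing(config_text):
    """Return a list of problems found in the process/property chain."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep parameter names case-sensitive
    parser.read_string(config_text)

    problems = []
    processes = dict(parser["FieldProcessing"])
    count = int(processes.pop("Number"))
    # Assumption: processes are numbered 0..Number-1 without gaps.
    if set(processes) != {str(i) for i in range(count)}:
        problems.append("process numbering does not match Number")

    listed = set()
    if parser.has_section("Properties"):
        listed = set(parser["Properties"].values())

    for process in processes.values():
        if not parser.has_section(process):
            problems.append(f"no [{process}] section")
            continue
        prop = parser[process].get("Property")
        if prop is None:
            problems.append(f"[{process}] has no Property")
        elif prop not in listed:
            problems.append(f"{prop} is not listed in [Properties]")
        elif not parser.has_section(prop):
            problems.append(f"no [{prop}] section")
    return problems


print(check_field_processing(EXAMPLE))  # an empty list means no problems found
```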


Setting up field indexing


Autonomy connectors aggregate content and metadata, process it and then index it into IDOL server in
the form of fields. In order to improve IDOL server's performance, you should divide these fields into
the following groups:

prevented from storing


Prevent IDOL server from storing fields that you do not want to query by setting the
CantHaveCSVs parameter in your IDOL server configuration file, or by adding the
CantHaveFields parameter to your DREADD or DREADDDATA indexing command.

Index fields
Store fields that contain text which you want to query frequently as Index fields. Index fields
are processed linguistically when they are stored in IDOL server. This means that stemming
and stoplists are applied to the text in Index fields before it is stored, which allows IDOL
server to process queries for these fields more quickly (typically DRETITLE and
DRECONTENT are fields that should be set up as Index fields).
You should not store URLs or content that you are unlikely to use in Index fields. You should
also not store a field as an Index field if it will be queried frequently but its value is only ever
going to be queried in its entirety. It is more efficient to query such values using a field
specifier (for example, MATCH).
Indexing all fields in documents could slow down the indexing process and increase
disk usage and requirements.
See Index fields on page 285 for details on how to set up Index fields.

numeric fields
Store fields that contain numerical values or dates as numeric fields and numeric date fields.
When these fields are indexed, IDOL server stores them in a fast-look-up table in memory
which enables it to return the fields more quickly.
See Numerical fields on page 289 and NumericDateType fields on page 287 for details on
how to set up numeric and numeric date fields.


FieldCheckType fields
If a large number of the documents that you want to store in IDOL server contain a field
whose entire value will frequently be used to restrict results, you should store this field as a
FieldCheckType field. When this field is indexed, IDOL server stores it in a fast-look-up table
in memory which enables it to return the field more quickly.
See FieldCheckType fields on page 291 for details on how to set up FieldCheckType fields.

ordinary fields
By default IDOL server stores all fields that are not identified as special fields as ordinary
fields.

Note: you can query all stored fields using field specifiers in field text queries (see Field text
queries on page 199). Index fields can also be queried using text queries.


Indexing XML attributes


You can index XML attributes in the same way that you index ordinary fields. However, you need to
refer to them using the following format in order for IDOL server to be able to read them:

*/<tag_name>/_ATTR_<attribute_name>
<tag_name>
Enter the name of the tag.
<attribute_name>
Enter the name of the attribute you want IDOL server to read.

For example:

<FARM ANIMAL="sheep" COLOR="white"> Farmer Joe </FARM>


To identify the ANIMAL attribute to IDOL server, you need to refer to it as:
*/FARM/_ATTR_ANIMAL
To identify the COLOR attribute to IDOL server, you need to refer to it as:
*/FARM/_ATTR_COLOR

<ROOM Name="The Kitchen">
<FURNITURE>Table</FURNITURE>
<ITEM Type="China">Dish</ITEM>
</ROOM>
To identify the Name attribute to IDOL server, you need to refer to it as:
*/ROOM/_ATTR_Name
To identify the Type attribute to IDOL server, you need to refer to it as:
*/ITEM/_ATTR_Type
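
The naming format above can be derived mechanically from a document. The following Python sketch is illustrative only (attribute_field_names is a hypothetical helper, not an IDOL server tool); it lists the field name for every attribute found in a piece of XML.

```python
import xml.etree.ElementTree as ET


def attribute_field_names(xml_text):
    """Return */<tag_name>/_ATTR_<attribute_name> for every XML attribute."""
    root = ET.fromstring(xml_text)
    names = []
    # iter() yields the root element first, then its descendants in document order.
    for element in root.iter():
        for attribute in element.attrib:
            names.append(f"*/{element.tag}/_ATTR_{attribute}")
    return names


room = """<ROOM Name="The Kitchen">
<FURNITURE>Table</FURNITURE>
<ITEM Type="China">Dish</ITEM>
</ROOM>"""

print(attribute_field_names(room))
# → ['*/ROOM/_ATTR_Name', '*/ITEM/_ATTR_Type']
```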



To store XML attributes in Index fields:

1. Open IDOL server's configuration file in a text editor.

2. List an indexing process in the [FieldProcessing] section.

For example:
[FieldProcessing]
Number=2
0=MyFirstProcess
1=IndexingFields

3. Create a section for the indexing process, in which you create a property for the process (a
property is later defined by one or more applicable configuration parameters). Identify the fields
that you want to associate with the process.
Note: the properties that you create must not have the same name as processes.
For example:
[MyFirstProcess]
Property=MyFirstProperty
PropertyFieldCSVs=*/MyField,*/MySecondField
PropertyMatch=*myString*
[IndexingFields]
Property=IndexFields
PropertyFieldCSVs=*/FARM/_ATTR_ANIMAL,*/FARM/_ATTR_COLOR,*/ROOM/_ATTR_Name,*/ITEM/_ATTR_Type

4. List the properties that you have created in a [Properties] section.

For example:
[Properties]
0=MyFirstProperty
1=IndexFields

5. Create a section for your indexing property in which you set the Index parameter to true.
For example:
[MyFirstProperty]
HiddenType=true
[IndexFields]
Index=true

6. Save IDOL server's configuration file and restart IDOL server to apply your changes.


Configuring IDOL server to process required languages

Before you index documents that contain different languages into IDOL server, you need to configure it
to recognize the language and encoding of documents, so it can deal with them appropriately.
If your IDOL server license includes Automatic Language Detection and you have set
AutoDetectLanguagesAtIndex to true in the [Server] section of the IDOL server configuration file, IDOL
server automatically identifies the language and encoding of a document when it is indexed.
If your license does not include this functionality, you have to specify the language and encoding of
documents that you are indexing into IDOL server or set up a field process in the IDOL server
configuration file's [FieldProcessing] section that allows IDOL server to read the language of a
document from one of its fields.
For details on how to configure IDOL server to process multiple languages, please see Languages on
page 307.


Optimizing indexing
The speed of the indexing process is usually less critical than the speed of the query process.
However, when large amounts of data are indexed into IDOL server, it is still important to improve the
efficiency of the process where possible. In addition, the way you configure the indexing process can
affect the efficiency of the query process.

The indexing process


The indexing process works in two stages:

1. IDOL server creates a representation of the new data in the index cache.

2. The cache is synchronized with data that IDOL server currently contains, and the new data is
stored on disk and removed from the index cache.

When you are scheduling indexing, you should consider this chapter's recommendations on IDOL
server content (particularly on selecting fields to be indexed), and on running indexing and querying
processes at different times. In addition, the delayed synchronization feature allows you to change the
stage at which the index cache is synchronized with IDOL server, depending on whether your priority is
achieving fast query speeds or making new information available to the user as quickly as possible.

Delayed synchronization
The delayed synchronization feature allows you to select how the index cache is synchronized with
IDOL server's data. This is useful in systems where indexing tasks are scheduled at times when IDOL
server is also handling queries.
By default, synchronization occurs as soon as a representation of data has been made in the index
cache. New data is available to the user (as query results) quickly, so you should use this setting in
systems where up-to-date data is the priority. However, synchronization uses resources that IDOL
server could otherwise use for querying. Delayed synchronization reduces this impact by
collecting multiple data representations in the index cache and then synchronizing them all with IDOL
server's data in one go. This is useful in systems where query speed is more important than having
up-to-date data.
Note: delayed synchronization is recommended if you are indexing a lot of small files (files that are
smaller than 100MB).
The following parameter in the [Server] section of IDOL server's configuration file allows you to specify
whether the indexing process uses delayed synchronization:
DelayedSync
Enter true if you want IDOL server to delay synchronization. If you set DelayedSync to true,
IDOL server only stores data on disk when:

the index cache is full

the index cache contains some data and the time out specified by MaxSyncDelay has
expired
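
The two conditions above can be summarized as a simple decision rule. The following Python sketch is a model for illustration, not IDOL server's implementation: the function and argument names are hypothetical, and measuring MaxSyncDelay in seconds is an assumption.

```python
def should_synchronize(cache_used, cache_capacity, seconds_since_sync,
                       max_sync_delay, delayed_sync=True):
    """Decide whether the index cache would be written to disk.

    Hypothetical model of the rules described above.
    """
    if cache_used == 0:
        return False  # nothing to synchronize
    if not delayed_sync:
        return True   # default: synchronize as soon as data is in the cache
    if cache_used >= cache_capacity:
        return True   # the index cache is full
    # The cache holds some data; wait until the MaxSyncDelay time-out expires.
    return seconds_since_sync >= max_sync_delay


print(should_synchronize(10, 100, 5, 60))    # False: cache not full, time-out not expired
print(should_synchronize(100, 100, 5, 60))   # True: cache is full
print(should_synchronize(10, 100, 60, 60))   # True: time-out expired
```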


Processing data before indexing it


IDOL server can process documents that it receives (for example from an Autonomy connector) before
it indexes them. You can set up a simple process by configuring IDOL server to execute a single task
on incoming documents, or set up a complex process by configuring IDOL server to combine a number
of tasks.
IDOL server can execute the following tasks:
ACI
To execute an action command.
Alert
To alert users to new documents that IDOL server has received, if these documents are similar
to agents that the users own.
Cat
To categorize incoming documents.
Educe
To extract information embedded in unstructured data and store it in structured fields (see
Eduction on page 167).
FieldOp
To modify the content of fields, or add new fields to documents.
FileWriter
To write incoming documents to disk.
HTTP
To send an HTTP call out to a web interface (for example, you can connect to a third-party web
application in order to store your data on a legacy SQL database).
LP
To import and categorize legacy profiles from BIF files.
OCR
To evaluate the quality of files produced as a result of optical character recognition, and to
route the files according to their quality.

In addition IDOL server can execute the following tasks:

Route
If you want to combine multiple tasks, you can use Route tasks in order to specify conditions
that determine which task IDOL server executes next (you can, for example, use a Route task to
route documents that IDOL server receives to an OCR task or a Cat task).
Index
If you want to index data that has been processed by tasks into IDOL server, you need to use
an Index task.


Setting up tasks to process data before indexing


To set up tasks:

1. Open the IDOL server configuration file in a text editor.

2. In the [Server] section, use the StartTask parameter to specify the first task that you want IDOL
server to execute on incoming data.

3. Create a section for the specified StartTask and for any other task that you want IDOL server to
execute before indexing.
Note:

you can give each task section a name of your choice; the type of task that each section
contains is identified by the Module parameter.

refer to the online help for details on which settings are available for the different task
types (see Displaying help on configuration settings on page 389).

you can use the NextTask and OnFailureTask settings to determine which task IDOL
server executes next, after it has carried out a task.

you can set up a Route task which allows you to direct data to appropriate processes
depending on fields that the data contains.

set up an Index task if you want to index data into IDOL server's Data index after it has
been processed.

4. Save and close the configuration file.

5. Restart IDOL server to apply your changes.


Example: Automatic generation of titles for documents before indexing

In this example, IDOL server is instructed to execute the MyACITask on any documents that it
receives.
The MyACITask automatically generates titles for incoming documents. Every time IDOL server
performs this task on an incoming document, it executes a Summarize action with the parameter
Summary set to Concept, the parameter Sentences set to 1 and the parameter Text set to the
content of the incoming document's DRECONTENT field:
action=Summarize&Summary=Concept&Sentences=1&Text=<DRECONTENT_value>
When the action returns its result, IDOL server creates a DRETITLE field in the document and uses it
to store the content of the result's autn:summary field.

Configuration:
[Server]
StartTask=MyACITask
[MyACITask]
Module=ACI
Action=Summarize
Params=Summary,Sentences
Values=Concept,1
Fields=DRECONTENT
ReMapToFields=Text
XMLPaths=autnresponse/responsedata/autn:summary
XMLFieldNames=DRETITLE
NextTask=MyIndexTask
[MyIndexTask]
Module=Index
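
The relationship between the MyACITask settings and the resulting action can be sketched in Python. This is a hypothetical illustration, not IDOL server's implementation: it assumes that Params and Values are paired by position, and that each field listed in Fields supplies the value of the parameter at the same position in ReMapToFields.

```python
from urllib.parse import quote, urlencode


def build_task_action(action, params, values, fields, remap_to_fields, document):
    """Compose the action string that an ACI task would send.

    Hypothetical helper; `document` maps field names to their content.
    """
    pairs = [("action", action)]
    # Fixed parameters: Params and Values are comma-separated, paired by position.
    pairs += zip(params.split(","), values.split(","))
    # Per-document parameters: each field's content is sent under its remapped name.
    for field, parameter in zip(fields.split(","), remap_to_fields.split(",")):
        pairs.append((parameter, document[field]))
    return urlencode(pairs, quote_via=quote)


document = {"DRECONTENT": "Some text"}
print(build_task_action("Summarize", "Summary,Sentences", "Concept,1",
                        "DRECONTENT", "Text", document))
# → action=Summarize&Summary=Concept&Sentences=1&Text=Some%20text
```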


Example: Automatic generation of titles for documents before indexing into a different IDOL server

In this example, IDOL server is instructed to execute the MyACITask on any documents that it
receives.
The MyACITask automatically generates titles for incoming documents. Every time IDOL server
performs this task on an incoming document, it executes a Summarize action with the parameter
Summary set to Concept, the parameter Sentences set to 1 and the parameter Text set to the
content of the incoming document's DRECONTENT field:
action=Summarize&Summary=Concept&Sentences=1&Text=<DRECONTENT_value>
When the action returns its result, IDOL server creates a DRETITLE field in the document and uses it
to store the content of the result's autn:summary field. The document is then indexed into a different
IDOL server specified using the IdolServer setting. This server's index port is requested via its ACI
port, so that the document to be indexed can automatically be routed to the correct index port.

Configuration:
[Server]
StartTask=MyACITask
[MyACITask]
Module=ACI
Action=Summarize
Params=Summary,Sentences
Values=Concept,1
Fields=DRECONTENT
ReMapToFields=Text
XMLPaths=autnresponse/responsedata/autn:summary
XMLFieldNames=DRETITLE
NextTask=MyIndexTask

[MyIndexTask]
Module=Index
IdolServer=123.4.5.67:8000


Example: Categorizing data and adding it to a legacy SQL database

In this example, IDOL server is instructed to execute the MyCatTask on any documents that it
receives.
The MyCatTask matches documents that it receives against categories that IDOL server's Category
index contains and returns matching categories. It then tags the incoming documents according to
which categories they match, and forwards them to the MyHTTPTask.
The MyHTTPTask maps incoming documents' Category and DRECONTENT fields to equivalent
fields in a SQL database, and sends the following HTTP call via a web interface to this SQL database:
http://sqlengine/insert?Category=<Category field value>&Content=<Content field value>
This HTTP call stores the content of the specified fields in the SQL database.
Configuration:
[Server]
StartTask=MyCatTask
[MyCatTask]
Module=Cat
TextFields=DRECONTENT
TagField=CategoryTag
NextTask=MyHTTPTask
[MyHTTPTask]
Module=HTTP
URL=http://sqlengine/insert
Fields=Category,DRECONTENT
RemapToFields=Category,Content

Example: Simple routing of documents according to type

In this example, IDOL server is instructed to execute the MyRouteTask on any documents that it
receives.
The MyRouteTask checks if incoming documents contain an OCR field. If they do, the documents are
forwarded to the MyOCRTask, otherwise they are forwarded to the MyCatTask.
The MyOCRTask evaluates the quality of the files that contain an OCR field. Files whose quality is
satisfactory are forwarded to the MyIndexTask. Files whose quality is unsatisfactory are forwarded to
the MyFileWriterTask which writes the files to disk.
The MyCatTask matches documents that it receives from the MyRouteTask against categories that
IDOL server's Category index contains and returns matching categories. It then tags the incoming
documents according to which categories they match, and forwards them to the MyIndexTask.
The MyIndexTask indexes the files it receives into IDOL server's Data index.

Configuration:
[Server]
StartTask=MyRouteTask
[MyRouteTask]
Module=Route
Condition=Exists
Parameter1=OCR
OnTrueTask=MyOCRTask
OnFalseTask=MyCatTask

[MyOCRTask]
Module=OCR
GoodTask=MyIndexTask
BadTask=MyFileWriterTask
[MyFileWriterTask]
Module=FileWriter
BatchSize=10
OutputDirectory=C:\Autonomy\IDXFiles
[MyCatTask]
Module=Cat
TextFields=DRECONTENT
TagField=CategoryTag
NextTask=MyIndexTask
[MyIndexTask]
Module=Index
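
Because tasks refer to each other by section name, a misspelled name can leave documents without a destination. The following Python sketch is illustrative only (trace_tasks is a hypothetical helper): it follows the task links used in this chapter's examples from StartTask and reports any task that is referenced but has no section.

```python
import configparser

# Settings that name another task, as used in this chapter's examples.
LINK_SETTINGS = ("NextTask", "OnFailureTask", "OnTrueTask", "OnFalseTask",
                 "GoodTask", "BadTask")

EXAMPLE = r"""
[Server]
StartTask=MyRouteTask
[MyRouteTask]
Module=Route
Condition=Exists
Parameter1=OCR
OnTrueTask=MyOCRTask
OnFalseTask=MyCatTask
[MyOCRTask]
Module=OCR
GoodTask=MyIndexTask
BadTask=MyFileWriterTask
[MyFileWriterTask]
Module=FileWriter
BatchSize=10
OutputDirectory=C:\Autonomy\IDXFiles
[MyCatTask]
Module=Cat
TextFields=DRECONTENT
TagField=CategoryTag
NextTask=MyIndexTask
[MyIndexTask]
Module=Index
"""


def trace_tasks(config_text):
    """Return (reachable, missing) task names, starting from StartTask."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep setting names case-sensitive
    parser.read_string(config_text)

    reachable, missing = [], []
    pending = [parser["Server"]["StartTask"]]
    while pending:
        name = pending.pop(0)
        if name in reachable or name in missing:
            continue  # already visited
        if not parser.has_section(name):
            missing.append(name)
            continue
        reachable.append(name)
        for setting in LINK_SETTINGS:
            target = parser[name].get(setting)
            if target:
                pending.append(target)
    return reachable, missing


reachable, missing = trace_tasks(EXAMPLE)
print("reachable:", reachable)
print("missing:", missing)
```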


Example: Advanced routing of documents according to type

In this example, IDOL server is instructed to execute the MyFirstRouteTask on any documents that it
receives.
The MyFirstRouteTask checks if incoming documents contain an OCR field. If they do, the
documents are forwarded to the MyOCRTask, otherwise they are forwarded to the
MySecondRouteTask.
The MyOCRTask evaluates the quality of the files that contain an OCR field. Files whose quality is
satisfactory are forwarded to the MyIndexTask. Files whose quality is unsatisfactory are forwarded to
the MyFileWriterTask which writes the files to disk.
The MySecondRouteTask checks if documents that it receives are BIF files (for example, by
checking if they contain a BIF field). If they are, the documents are forwarded to the MyLPTask,
otherwise they are forwarded to the MyCatTask.
The MyLPTask converts the legacy profiles in the BIF files that it receives from the
MySecondRouteTask, and stores them in IDOL server's Category index.
The MyCatTask matches documents that it receives from the MySecondRouteTask against
categories that IDOL server's Category index contains and returns matching categories. It then tags
the incoming documents according to which categories they match, and forwards them to the
MyIndexTask.
The MyIndexTask indexes the files it receives into IDOL server's Data index.

Configuration:
[Server]
StartTask=MyFirstRouteTask
[MyFirstRouteTask]
Module=Route
Condition=Exists
Parameter1=OCR
OnTrueTask=MyOCRTask
OnFalseTask=MySecondRouteTask
[MyOCRTask]
Module=OCR
GoodTask=MyIndexTask
BadTask=MyFileWriterTask
[MyFileWriterTask]
Module=FileWriter
BatchSize=10
OutputDirectory=C:\Autonomy\IDXFiles
[MySecondRouteTask]
Module=Route
Condition=Exists
Parameter1=BIF
OnTrueTask=MyLPTask
OnFalseTask=MyCatTask
[MyLPTask]
Module=LP
ProfileField=MyField
CategoryNameField=name
DocField=MyDoc
Timeout=60
[MyCatTask]
Module=Cat
TextFields=DRECONTENT
TagField=CategoryTag
NextTask=MyIndexTask
[MyIndexTask]
Module=Index


6. Storing content in IDOL server


IDOL server stores the content of documents in its Data index (by default this comprises the IDOL
server databases News and Archive). The process of storing content in IDOL server is called indexing.
Only files in XML or IDX format can be indexed into IDOL server. If the data that you want to index into
IDOL server is in XML format, you can index it directly into IDOL server (using a DREADD or
DREADDDATA command), without having to import it first.
If your data is not in XML format, you can import it into XML or IDX format:

using a connector
The Autonomy connectors (for example, File System Fetch, HTTPFetch, Oracle Fetch
and so on) allow you to retrieve documents from different repositories and import them
into IDX file format only. Please refer to the appropriate connector manual for further
information on how to import documents.

manually
You can create a text file in XML or IDX format (see Appendix D: manually creating
IDX files on page 431), which contains the information that you want to index into your
IDOL server in specific IDOL server fields.

Once documents have been imported into XML or IDX file format, you can index them into IDOL
server:

using a connector
The Autonomy connectors allow you to index the IDX files that they have created into the
IDOL server that they connect to. Please refer to the appropriate connector manual for
further information on how to index documents.

directly
You can index XML and IDX files into an IDOL server using an HTTP request that you
can issue from your web browser.

Note: depending on where the data that IDOL server indexes is located, the indexing process takes
place in the following order:

IDOL server indexes a locally accessible file:

1. IDOL server receives a filename.
2. IDOL server opens the file and reads the data.
3. The indexing process takes place.

IDOL server receives data over the indexing port:

1. IDOL server receives a stream of data over the port.
2. IDOL server saves the data locally.
3. IDOL server opens the file and reads the data.
4. The indexing process takes place.


Index commands
Index commands are used by Autonomy connectors to index data into IDOL server. You can also use
them to directly index data into IDOL server.
Note: before you index data into IDOL server, you should consider the points outlined in Storing
content in IDOL server on page 83.

DREADD: directly indexing IDX and XML files


http://<host>:<port>/DREADD?<mandatory_parameter>&<optional_parameters>
The DREADD command (case sensitive) allows you to index IDX or XML files that are located on the
same machine as IDOL server directly into an IDOL server. Note that parameters that you use with
DREADD override any equivalent settings that you may have specified in IDOL server's configuration
file.
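
The command syntax above can be assembled programmatically. The following Python sketch is illustrative only: build_dreadd_url is a hypothetical helper, port 4001 is a made-up IndexPort value, and leaving the characters /, * and comma unescaped is an assumption made so that field specifications appear as they are written in this guide.

```python
from urllib.parse import quote


def build_dreadd_url(host, index_port, file_name, database=None, **optional):
    """Assemble http://<host>:<port>/DREADD?<file>&<parameters>.

    Hypothetical helper. Pass True for a parameter that takes no value
    (such as Delete); pass strings such as "true" for valued parameters.
    """
    parts = [quote(file_name, safe="/")]
    if database is not None:
        parts.append("DREDbName=" + quote(database))
    for name, value in optional.items():
        if value is True:
            parts.append(name)  # bare flag, for example Delete
        else:
            parts.append(f"{name}={quote(str(value), safe='/*,')}")
    # Individual parameters are separated with ampersands.
    return f"http://{host}:{index_port}/DREADD?" + "&".join(parts)


print(build_dreadd_url("12.3.4.56", 4001, "news.idx", "News",
                       CantHaveFields="*/StandardHeader", Delete=True))
# → http://12.3.4.56:4001/DREADD?news.idx&DREDbName=News&CantHaveFields=*/StandardHeader&Delete
```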
Command parameters:
Mandatory:

<file_name> or <path>
DREDbName=<database_name>

Optional:

ACLFields=<ACL_fields>
CantHaveFields=<forbidden_fields>
DatabaseFields=<database_fields>
DateFields=<date_fields>
Delete
DocumentDelimiters=<doc_delimiters>
DocumentFormat=<doc_format>
ExpiryDateFields=<expiry_date_fields>
FlattenIndexFields=<fields>
IDXFieldPrefix=<prefix>
IndexFields=<index_fields>
KeepExisting=<true/false>
KillDuplicates=<kill_duplicates_option>
LanguageFields=<language_fields>


LanguageType=<language_type>
MustHaveFields=<required_fields>
SectionFields=<section_fields>
SecurityFields=<security_fields>
SecurityType=<security_type>
TitleFields=<title_fields>
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the IndexPort that you have specified in the [Server] section of the IDOL server
configuration file.
<file_name>
The IDX or XML file that you want to index.
<path>
The full path to the IDX or XML file that you want to index.
DREDbName=<database_name>
The IDOL server database into which you want the document to be indexed. You don't need to
specify this if your IDX or XML files already contain a database field (IDOL server is by default
configured to read from this field which database files should be indexed into).
<optional_parameters>
You can enter one or more of the following parameters (note that you must separate individual
parameters with an ampersand):
ACLFields=<ACL_fields>
Allows you to specify the fields in the document from which you want IDOL server to read ACLs
(Access Control Lists).
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.



For example:
&ACLFields=*/AUTONOMYMETADATA
In this example, IDOL server reads ACLs from any fields that are called
AUTONOMYMETADATA.
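
The field identification rules that apply to ACLFields and to the other field parameters below can be sketched in Python. This is an illustration of the stated rules, not IDOL server's parser: normalize_field_specs is a hypothetical helper, and it leaves relative paths such as Document/DREDate unchanged because this guide does not say how they are expanded.

```python
def normalize_field_specs(csv_value):
    """Split a comma-separated field list and expand bare field names.

    Hypothetical helper modelling the rules described above.
    """
    if ", " in csv_value or " ," in csv_value:
        raise ValueError("there must be no space before or after a comma")
    specs = []
    for spec in csv_value.split(","):
        if "/" not in spec:
            # A bare FieldName is treated as */FieldName.
            spec = "*/" + spec
        specs.append(spec)
    return specs


print(normalize_field_specs("/Title,*/myDB,AUTONOMYMETADATA"))
# → ['/Title', '*/myDB', '*/AUTONOMYMETADATA']
```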

CantHaveFields=<forbidden_fields>
Allows you to specify the fields in XML documents that are discarded before the document is
indexed. By default all fields are stored in IDOL server.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&CantHaveFields=*/StandardHeader
In this example, any StandardHeader fields that a document contains are discarded before the
document is indexed.
DatabaseFields=<database_fields>
Allows you to specify the fields in the document that contain the name of the database in which
you want the document to be stored.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&DatabaseFields=Document/DREDBName,*/myDB
In this example, IDOL server indexes the document into the database with the name that is
contained in any DREDBName field below the Document level and with the name that is
contained in any fields called myDB.



DateFields=<date_fields>
Allows you to specify the fields in the document from which you want IDOL server to read the
document's date.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&DateFields=Document/DREDate,*/myDocDate
In this example, IDOL server extracts dates from any fields that are called DREDate that are
contained below the Document level and from any fields that are called myDocDate.
Delete
IDOL server deletes the file that you are indexing after it has been indexed.

DocumentDelimiters=<doc_delimiters>
Allows you to specify the fields in a file that indicate the beginning and end of a document, so
the documents are indexed individually. Make sure that document delimiters are not nested.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&DocumentDelimiters=*/DOCUMENT,*/SPEECH
In this example, the beginning and end of individual documents in a file is marked by opening
and closing DOCUMENT and SPEECH tags.
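
Because document delimiters must not be nested, it can help to check a file before indexing it. The following Python sketch is illustrative only: nested_delimiters is a hypothetical helper that compares tag names alone and ignores path prefixes and wildcards.

```python
import xml.etree.ElementTree as ET


def nested_delimiters(xml_text, delimiter_tags):
    """Return the tags of delimiter elements found inside another delimiter."""
    root = ET.fromstring(xml_text)
    nested = []

    def walk(element, inside_delimiter):
        is_delimiter = element.tag in delimiter_tags
        if is_delimiter and inside_delimiter:
            nested.append(element.tag)
        for child in element:
            walk(child, inside_delimiter or is_delimiter)

    walk(root, False)
    return nested


sample = ("<FILE><DOCUMENT><SPEECH>hello</SPEECH></DOCUMENT>"
          "<DOCUMENT>world</DOCUMENT></FILE>")
print(nested_delimiters(sample, {"DOCUMENT", "SPEECH"}))
# → ['SPEECH']
```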

DocumentFormat=<doc_format>
If a document that you are indexing has an ambiguous format that IDOL server cannot easily
identify as XML or IDX, DocumentFormat allows you to specify the format of the file. Enter
XML or IDX.

ExpiryDateFields=<expiry_date_fields>
Allows you to specify the fields in the document that contain the expiry date of the document
(that is the date when the document is deleted, unless you have set ExpireIntoDatabase in
IDOL server's configuration file to move the document into another database).
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&ExpiryDateFields=Document/DREExpiryDate,*/myExpiryDate
In this example, IDOL server reads the expiry date from any DREExpiryDate field below the
Document level and from any fields called myExpiryDate.
FlattenIndexFields=<fields>
Allows you to specify the fields in a hierarchically structured document whose content you want
to index as one level.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
<documents>
<article id="_21498602">
<url>http://example.com/21490.html</url>
<hltext_display>The history of pharmacogenetics </hltext_display>
<source>Science Online</source>
<media_type>text</media_type>
<subject>
<text>The prologue to pharmacogenetics began to play out around 1850 and
spanned some 60 years into the 1900s.</text>
<text>In 1953, the molecular basis of heredity, the double helix of DNA, was
described.</text>
</subject>
<valid_time>Jul 13 2001 5:00AM</valid_time>
</article>
</documents>

If you specify FlattenIndexFields=*/subject and index the above, the content of each subject
field, including the content of any fields nested within it, is indexed as that subject field's content.
If you now query for a particular term in the subject field that is actually contained in a level
below the subject field, for example the term "pharmacogenetics", the above text is returned. If
you had not flattened the subject field the query would fail, as the subject field itself does not
contain this term.
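The effect of flattening can be illustrated outside IDOL server with a short Python sketch. It uses the standard xml.etree.ElementTree module on a shortened copy of the example above; it only models the idea of treating nested content as the parent field's content and makes no claim about how IDOL server implements it.

```python
import xml.etree.ElementTree as ET

XML = """<documents>
<article id="_21498602">
<subject>
<text>The prologue to pharmacogenetics began to play out around 1850 and
spanned some 60 years into the 1900s.</text>
<text>In 1953, the molecular basis of heredity, the double helix of DNA, was
described.</text>
</subject>
</article>
</documents>"""

subject = ET.fromstring(XML).find("./article/subject")

# Text held directly by <subject> itself: only whitespace in this example.
direct_text = (subject.text or "").strip()

# "Flattened" view: the text of <subject> together with all nested fields.
flattened_text = " ".join("".join(subject.itertext()).split())

print("pharmacogenetics" in direct_text)     # False
print("pharmacogenetics" in flattened_text)  # True
```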

IDXFieldPrefix=<prefix>
When you index an IDX file it is transformed into XML by placing it under the Document
subtree (each of the IDX file's fields is prefixed with Document, so that a simple XML hierarchy
is constructed). If you don't want this subtree to be called Document, IDXFieldPrefix allows
you to specify an alternative name.
IndexFields=<index_fields>
Allows you to specify the fields in the document that you want to index explicitly into IDOL
server. Indexing fields explicitly optimizes the query process when you restrict queries using
these fields. Index fields should hold data that is particularly significant to you (for example the
title of the document), and that you are likely to use frequently in order to restrict queries.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&IndexFields=*/DRECONTENT,*/DRETITLE
In this example, the DRECONTENT and DRETITLE fields in documents are explicitly indexed
into IDOL server.
KeepExisting=<true/false>
If you have set KillDuplicates to Reference, ReferenceMatch<N> or <FieldName>, you can
set KeepExisting to true if you want IDOL server to discard the document it has received for
indexing and keep the matching document that it already contains instead.

KillDuplicates=<kill_duplicates_option>
You can enter one of the following options to determine how IDOL server handles duplicate
text. Note that if you postfix any of these options with =2, the KillDuplicates process is applied
to all IDOL server databases (rather than just the database into which the current IDX or XML
file is being indexed):
NONE
Documents in IDOL server are never replaced with new documents.
REFERENCE
If IDOL server receives a document for indexing that has the same DREREFERENCE
field value as a document that IDOL server already contains, IDOL server deletes the
document that it already contains and replaces it with the new one.
REFERENCEMATCH<N>
If IDOL server receives a document for indexing whose content is more than <N>
percent similar to the content of a document that this IDOL server database already
contains, IDOL server deletes the document that it already contains and replaces it with
the new one.
<FieldName>
If IDOL server receives a document for indexing that contains a <FieldName>
Reference field, which has the same content as the <FieldName> Reference field in a
document that IDOL server already contains, IDOL server deletes the document that it
already contains and replaces it with the new one.
You can specify multiple Reference fields, in which case IDOL server deletes
documents that contain any of the specified fields with identical content. If you want to
specify multiple Reference fields, you must separate them with a plus symbol, a space
or an underscore symbol.
Note: fields are identified as Reference fields through field processes in the IDOL server
configuration file (see Reference fields on page 293). If you use a <FieldName>
Reference field to eliminate duplicate documents, IDOL server automatically reads any
fields that are listed alongside this field for the PropertyFieldCSVs parameter in the
field process, and also uses these fields to eliminate duplicate documents. If you want to
define multiple Reference fields but don't want them all to be used for document
elimination, you need to set up multiple field processes (see Using Reference fields to
eliminate duplicate copies of documents during indexing on page 105).
If you don't set KillDuplicates, it defaults to the option that you have specified for
KillDuplicates in the [Server] section of IDOL server's configuration file.
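The interaction between the REFERENCE option and KeepExisting can be modelled with a small Python sketch. This is a toy model under stated assumptions, not IDOL server's implementation: the store is a plain list of dictionaries, the function index_document is hypothetical, and NONE is modelled simply as "always add".

```python
def index_document(store, doc, kill_duplicates="REFERENCE", keep_existing=False):
    """Toy model of duplicate handling keyed on the DREREFERENCE field."""
    ref = doc["DREREFERENCE"]
    if kill_duplicates == "REFERENCE":
        has_match = any(d["DREREFERENCE"] == ref for d in store)
        if has_match and keep_existing:
            # KeepExisting=true: discard the incoming document.
            return "kept existing"
        if has_match:
            # Delete the stored document(s) and store the new one instead.
            store[:] = [d for d in store if d["DREREFERENCE"] != ref]
            store.append(doc)
            return "replaced"
    # NONE (assumption), or no matching reference found: just add the document.
    store.append(doc)
    return "added"


store = []
print(index_document(store, {"DREREFERENCE": "a", "DRECONTENT": "v1"}))  # added
print(index_document(store, {"DREREFERENCE": "a", "DRECONTENT": "v2"}))  # replaced
print(index_document(store, {"DREREFERENCE": "a", "DRECONTENT": "v3"},
                     keep_existing=True))                                # kept existing
```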

LanguageFields=<language_fields>
Allows you to specify the fields in the document that contain the language type of the
document.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&LanguageFields=Document/DRELanguageType,*/myLanguageType
In this example, IDOL server reads the language type of documents from any
DRELanguageType field below the Document level and any myLanguageType fields.

LanguageType=<language_type>
Allows you to specify the language type of documents (if the document does not contain fields
from which IDOL server can read the language type of the document).
For example:
&LanguageType=myEnglish
In this example, the file is indexed with the language type myEnglish. The way IDOL server
handles this language type is determined by the way it has been defined in IDOL server's
configuration file (that is by the settings that you have associated with this language type in the
configuration file).

MustHaveFields=<required_fields>
Allows you to specify the fields in a document (IDX only) that are stored in IDOL server. By
default all fields are stored in IDOL server. Document fields that are not listed are discarded
which means that they cannot be queried or printed.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&MustHaveFields=*/DRECONTENT,*/DRETITLE
In this example, IDOL server only stores a document's DRECONTENT and DRETITLE fields.

SectionFields=<section_fields>
Allows you to specify the fields in the document that indicate the start of a new section in the
document (if you are indexing XML, you don't need to specify section fields as IDOL server
automatically sections XML data).
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&SectionFields=Document/DRESection,*/mySection
In this example, any DRESection field below the Document level and any mySection fields
indicate the start of a new section.

SecurityFields=<security_fields>
Allows you to specify the fields in the document that contain the security type of the document.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&SecurityFields=Document/DRESecurity,*/mySecurity
In this example, IDOL server reads the security type of documents from any DRESecurity field
below the Document level and any mySecurity fields.

SecurityType=<security_type>
Allows you to specify the security type of documents (for example, if the document does not
contain fields from which IDOL server can read the security type of the document).
For example:
&SecurityType=mySecurity
In this example, the file is indexed with the security type mySecurity. The way IDOL server
handles this security type is determined by the way it has been defined in IDOL server's
configuration file (that is by the settings that you have associated with this security type in the
configuration file).

TitleFields=<title_fields>
Allows you to specify the field in the document from which you want IDOL server to read the
document's title.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&TitleFields=*/DRETITLE
In this example, IDOL server reads a document's title from its DRETITLE field.

DREADDDATA: indexing data over a socket


DREADDDATA?<optional_parameters><data>#DREENDDATA&<killduplicates_option>
Note: This command requires a POST request method
The DREADDDATA command (case sensitive) allows you to index data over a socket into an IDOL
server. Note that parameters that you use with DREADDDATA override any equivalent settings that
you may have specified in IDOL server's configuration file.

Command parameters:

Mandatory:

<data>

Optional:

ACLFields=<ACL_fields>
CantHaveFields=<forbidden_fields>
DatabaseFields=<database_fields>
DateFields=<date_fields>
Delete
DocumentDelimiters=<doc_delimiters>
DocumentFormat=<doc_format>
DREDbName=<database_name>
ExpiryDateFields=<expiry_date_fields>
FlattenIndexFields=<fields>
IDXFieldPrefix=<prefix>
IndexFields=<index_fields>
KeepExisting=<true/false>
LanguageFields=<language_fields>
LanguageType=<language_type>
MustHaveFields=<required_fields>
SectionFields=<section_fields>
SecurityFields=<security_fields>
SecurityType=<security_type>
TitleFields=<title_fields>

<killduplicates_option>

NONE
REFERENCE
REFERENCEMATCH<N>
<FieldName>

<data>
The data that you want to index. This has to be in IDX or XML format.
<optional_parameters>
You can enter one or more of the following parameters (note that you must separate individual
parameters with an ampersand):
ACLFields=<ACL_fields>
Allows you to specify the fields in the document from which you want IDOL server to read ACLs
(Access Control Lists).
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&ACLFields=*/AUTONOMYMETADATA
In this example, IDOL server reads ACLs from any fields that are called
AUTONOMYMETADATA.
CantHaveFields=<forbidden_fields>
Allows you to specify the fields in XML data that are discarded before the data is indexed.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&CantHaveFields=*/StandardHeader
In this example, any StandardHeader field that a document contains is discarded before the
data is indexed.

DatabaseFields=<database_fields>
Allows you to specify the fields in the data that contain the name of the database in which you
want the data to be stored.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&DatabaseFields=Document/DREDBName,*/myDB
In this example, IDOL server indexes the data into the database with the name that is contained
in any DREDBName field below the Document level and with the name that is contained in
any fields called myDB.
DateFields=<date_fields>
Allows you to specify the fields in the data from which you want IDOL server to extract the date.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&DateFields=Document/DREDate,*/myDocDate
In this example, IDOL server extracts dates from any fields that are called DREDate that are
contained below the Document level and from any fields that are called myDocDate.
Delete
IDOL server deletes the data after it has been indexed.

DocumentDelimiters=<doc_delimiters>
Allows you to specify the fields in a file that indicate the beginning and end of a document, so
the documents are indexed individually. Make sure that document delimiters are not nested.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&DocumentDelimiters=*/DOCUMENT,*/SPEECH
In this example, the beginning and end of individual documents in a file are marked by opening
and closing DOCUMENT and SPEECH tags.

DocumentFormat=<doc_format>
If data that you are indexing has an ambiguous format that IDOL server cannot easily identify
as XML or IDX, DocumentFormat allows you to specify the format of the data. Enter XML or
IDX.

DREDbName=<database_name>
Allows you to specify the IDOL server database into which you want the data to be indexed.

ExpiryDateFields=<expiry_date_fields>
Allows you to specify the fields in the data that contain the expiry date of the data (that is the
date when the data is deleted, unless you have set ExpireIntoDatabase in IDOL server's
configuration file to move the data to another database).
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&ExpiryDateFields=Document/DREExpiryDate,*/myExpiryDate
In this example, IDOL server reads the expiry date from any DREExpiryDate field below the
Document level and from any fields called myExpiryDate.

FlattenIndexFields=<fields>
Allows you to specify the fields in hierarchically structured data whose content you want to
index as one level.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
<documents>
<article id="_21498602">
<url>http://example.com/21490.html</url>
<hltext_display>The history of pharmacogenetics </hltext_display>
<source>Science Online</source>
<media_type>text</media_type>
<subject>
<text>The prologue to pharmacogenetics began to play out around 1850 and
spanned some 60 years into the 1900s.</text>
<text>In 1953, the molecular basis of heredity, the double helix of DNA, was
described.</text>
</subject>
<valid_time>Jul 13 2001 5:00AM</valid_time>
</article>
</documents>
If you specify FlattenIndexFields=*/subject and index the above, the content of each subject
field, including the content of any fields nested within it, is indexed as that subject field's content.
If you now query for a particular term in the subject field that is actually contained in a level
below the subject field, for example the term "pharmacogenetics", the above text is returned. If
you had not flattened the subject field the query would fail, as the subject field itself does not
contain this term.

IDXFieldPrefix=<prefix>
When you index IDX data it is transformed into XML by placing it under the Document subtree
(each of the IDX file's fields is prefixed with Document, so that a simple XML hierarchy is
constructed). If you don't want this subtree to be called Document, IDXFieldPrefix allows you
to specify an alternative name.

IndexFields=<index_fields>
Allows you to specify the fields in the document that you want to index explicitly into IDOL
server. Indexing fields explicitly optimizes the query process when you restrict queries using
these fields. Index fields should hold data that is particularly significant to you (for example the
title of the document), and that you are likely to use frequently in order to restrict queries.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&IndexFields=*/DRECONTENT,*/DRETITLE
In this example, the DRECONTENT and DRETITLE fields in documents are explicitly indexed
into IDOL server.
KeepExisting=<true/false>
If you have set KillDuplicates to Reference, ReferenceMatch<N> or <FieldName>, you can
set KeepExisting to true if you want IDOL server to discard the document it has received for
indexing and keep the matching document that it already contains instead.
LanguageFields=<language_fields>
Allows you to specify the fields in the data that contain the language type of the document.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&LanguageFields=Document/DRELanguageType,*/myLanguageType
In this example, IDOL server reads the language type of data from any DRELanguageType
field below the Document level and any myLanguageType fields.

LanguageType=<language_type>
Allows you to specify the language type of data (for example, if the data does not contain fields
from which IDOL server can read the language type of the data).
For example:
&LanguageType=myEnglish
In this example, the data is indexed with the language type myEnglish. The way IDOL server
handles this language type is determined by the way it has been defined in IDOL server's
configuration file (that is by the settings that you have associated with this language type in the
configuration file).
MustHaveFields=<required_fields>
Allows you to specify the fields in a document (IDX only) that are stored in IDOL server. By
default all fields are stored in IDOL server. Document fields that are not listed are discarded
which means that they cannot be queried or printed.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&MustHaveFields=*/DRECONTENT,*/DRETITLE
In this example, IDOL server only stores a document's DRECONTENT and DRETITLE fields.
SectionFields=<section_fields>
Allows you to specify the fields in the document that indicate the start of a new section in the
document (if you are indexing XML, you don't need to specify section fields as IDOL server
automatically sections XML data).
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&SectionFields=Document/DRESection,*/mySection
In this example, any DRESection field below the Document level and any mySection fields
indicate the start of a new section.

SecurityFields=<security_fields>
Allows you to specify the fields in the data that contain the security type of the data.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&SecurityFields=Document/DRESecurity,*/mySecurity
In this example, IDOL server reads the security type of data from any DRESecurity field below
the Document level and any mySecurity fields.

SecurityType=<security_type>
Allows you to specify the security type of documents (for example, if the document does not
contain fields from which IDOL server can read the security type of the document).
For example:
&SecurityType=mySecurity
In this example, the file is indexed with the security type mySecurity. The way IDOL server
handles this security type is determined by the way it has been defined in IDOL server's
configuration file (that is by the settings that you have associated with this security type in the
configuration file).

TitleFields=<title_fields>
Allows you to specify the field in the document from which you want IDOL server to read the
document's title. If a document contains several of these fields, IDOL server reads its title from
the first field it finds in the document.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields, */
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a */
to it.
For example:
&TitleFields=*/DRETITLE
In this example, IDOL server reads a document's title from its DRETITLE field.
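The optional parameters described above are separated from each other by ampersands, and multiple field names within one parameter are separated by commas without spaces. The Python sketch below assembles such a parameter string. It is only an illustration: the function build_optional_parameters is hypothetical, the database name News is an invented example value, and the sketch does not build or send the complete DREADDDATA command.

```python
def build_optional_parameters(params: dict) -> str:
    """Join optional parameters with '&'; lists become comma-separated values."""
    parts = []
    for name, value in params.items():
        if value is None:
            # Parameter without a value, such as Delete.
            parts.append(name)
        elif isinstance(value, bool):
            # Boolean parameters are written as true or false.
            parts.append(f"{name}={str(value).lower()}")
        elif isinstance(value, (list, tuple)):
            # Multiple fields: commas, with no space before or after.
            parts.append(f"{name}={','.join(value)}")
        else:
            parts.append(f"{name}={value}")
    return "&".join(parts)


query = build_optional_parameters({
    "DREDbName": "News",  # hypothetical database name
    "DateFields": ["Document/DREDate", "*/myDocDate"],
    "KeepExisting": True,
})
print(query)
# DREDbName=News&DateFields=Document/DREDate,*/myDocDate&KeepExisting=true
```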

<killduplicates_option>
You can enter one of the following options to determine how IDOL server handles duplicate
text. Note that if you postfix any of these options with =2, the KillDuplicates process is applied
to all IDOL server databases (rather than just the database into which the current IDX or XML
file is being indexed):
NONE
Documents in IDOL server are never replaced with new documents.
REFERENCE
If IDOL server receives a document for indexing that has the same DREREFERENCE
field value as a document that IDOL server already contains, IDOL server deletes the
document that it already contains and replaces it with the new one.
REFERENCEMATCH<N>
If IDOL server receives a document for indexing whose content is more than <N>
percent similar to the content of a document that this IDOL server database already
contains, IDOL server deletes the document that it already contains and replaces it with
the new one.
<FieldName>
If IDOL server receives a document for indexing that contains a <FieldName>
Reference field, which has the same content as the <FieldName> Reference field in a
document that IDOL server already contains, IDOL server deletes the document that it
already contains and replaces it with the new one.
You can specify multiple Reference fields, in which case IDOL server deletes
documents that contain any of the specified fields with identical content. If you want to
specify multiple Reference fields, you must separate them with a plus symbol, a space
or an underscore symbol.
Note: fields are identified as Reference fields through field processes in the IDOL server
configuration file (see Reference fields on page 293). If you use a <FieldName>
Reference field to eliminate duplicate documents, IDOL server automatically reads any
fields that are listed alongside this field for the PropertyFieldCSVs parameter in the
field process, and also uses these fields to eliminate duplicate documents. If you want to
define multiple Reference fields but don't want them all to be used for document
elimination, you need to set up multiple field processes (see Using Reference fields to
eliminate duplicate copies of documents during indexing on page 105).
If you don't set KillDuplicates, it defaults to the option that you have specified for
KillDuplicates in the [Server] section of IDOL server's configuration file.

Adding metadata to documents after indexing


When a document is indexed into IDOL server, all its metadata is automatically stored as fields in IDOL
server (see Setting up field indexing on page 67). You can add additional fields to a document after
it has been indexed by executing a DREREPLACE command (see Changing field values in IDOL
server documents on page 375).

Indexing hyphenated terms


IDOL server treats any term that contains a HyphenChars character (specified in the
individual language type sections in IDOL server's configuration file) as hyphenated.
When a hyphenated term is indexed, IDOL server stems each of its components and indexes them. It
also removes the hyphen from the term, stems the resulting term and indexes it.
For example:
The language type English has been configured in IDOL server's configuration file as follows:
[English]
LanguageCode=3
Language=ENGLISH
Encoding=ASCII
HyphenChars=-&
At indexing time, terms that contain a hyphen or an ampersand are indexed as several terms, for
example:
Cholmondley-Warner

is indexed as:

CHOLMONDLEI
WARNER
CHOLMONDLEYWARN

Barnes&Noble

is indexed as:

BARN
NOBL
BARNESNOBL
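The splitting step can be sketched in Python as follows. The sketch covers only the decomposition of a term into its components and the joined form; it deliberately omits stemming and upper-casing, which IDOL server applies afterwards (as the indexed forms above show). The function name hyphen_variants is hypothetical.

```python
import re


def hyphen_variants(term: str, hyphen_chars: str = "-&") -> list:
    """Split a term on any HyphenChars character and add the joined form."""
    pattern = "[" + re.escape(hyphen_chars) + "]"
    components = [c for c in re.split(pattern, term) if c]
    if len(components) < 2:
        # No hyphen character present: the term is left as it is.
        return [term]
    # Each component, plus the term with the hyphen characters removed.
    return components + ["".join(components)]


print(hyphen_variants("Cholmondley-Warner"))
# ['Cholmondley', 'Warner', 'CholmondleyWarner']
print(hyphen_variants("Barnes&Noble"))
# ['Barnes', 'Noble', 'BarnesNoble']
```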

At query time, queries for terms that contain a hyphen or an ampersand are treated as follows:
http://<host>:<port>/action=Query&Text=Cholmondley-Warner
This query returns documents that contain "Cholmondley-Warner", "Cholmondley" or "Warner",
but also documents that contain, for example, "Cholmondley-Smythe" (documents that contain
"Cholmondley-Warner" would be returned with the highest relevance).

http://<host>:<port>/action=Query&Text=Barnes%26Noble
Note that in this query the ampersand has been escaped because it forms part of the query text
and should not be treated as a query syntax character by IDOL server.
This query returns documents that contain "Barnes&Noble", "Barnes" or "Noble", but also
documents that contain, for example, "Barnes&Greenough" (documents that contain
"Barnes&Noble" would be returned with the highest relevance).
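When query text is generated by a script, the escaping shown above can be done with Python's standard urllib.parse.quote function. This is a minimal sketch: build_query_path is a hypothetical helper, and it produces only the part of the request that follows http://<host>:<port>.

```python
from urllib.parse import quote


def build_query_path(text: str) -> str:
    """Return the query action with the Text value percent-encoded."""
    # safe="" ensures that '&' is escaped rather than read as a parameter separator.
    return "/action=Query&Text=" + quote(text, safe="")


print(build_query_path("Barnes&Noble"))        # /action=Query&Text=Barnes%26Noble
print(build_query_path("Cholmondley-Warner"))  # /action=Query&Text=Cholmondley-Warner
```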

Using Reference fields to eliminate duplicate copies of documents during indexing

You can set up an operation that eliminates duplicate copies of documents. Note that you must do this
before you start indexing documents into IDOL server.
1. Open IDOL server's configuration file in a text editor.

2. In the [Server] section, set the KillDuplicates parameter to the Reference fields by which you
want to eliminate duplicates (you can identify fields that contain document references by setting
up an appropriate field process; see Setting up Reference fields on page 293). This ensures
that whenever a document is indexed that has the same Reference field value as a document that
IDOL server already contains, IDOL server deletes the document that it already contains and
replaces it with the new one.

3. Save IDOL server's configuration file and start IDOL server. You can now index documents into
IDOL server.

Note: fields are identified as Reference fields through field processes in the IDOL server configuration
file (see Reference fields on page 293). If you use a <FieldName> Reference field to eliminate
duplicate documents, IDOL server automatically reads any fields that are listed alongside this field for
the PropertyFieldCSVs parameter in the field process, and also uses these fields to eliminate
duplicate documents.
For example:
[SetReferenceFields]
Property=Reference
PropertyFieldCSVs=*/DREREFERENCE,*/URL
In this example, if KillDuplicates has been set to DREREFERENCE, IDOL server uses both a
document's DREREFERENCE field and URL field to eliminate duplicate copies.
If you want to define multiple reference fields but don't want them all to be used for document
elimination, you need to set up multiple field processes (see Using Reference fields to eliminate
duplicate copies of documents during indexing on page 105).
For example:
[SetReferenceFields]
Property=Reference
PropertyFieldCSVs=*/DREREFERENCE
[SetMoreReferenceFields]
Property=Reference
PropertyFieldCSVs=*/URL
In this example, if KillDuplicates has been set to DREREFERENCE, IDOL server uses only a
document's DREREFERENCE field to eliminate duplicate copies, not its URL field.
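
The rule described in the note above can be modelled in a few lines. The following Python sketch is an illustration of that rule only, not IDOL server's implementation: it reads the two example configurations with the standard configparser module and lists the fields that would take part in duplicate elimination for a given KillDuplicates value. The function name and the way field paths are reduced to field names are assumptions made for the sketch.

```python
import configparser

# The two example configurations from this section.
SINGLE_PROCESS = """
[SetReferenceFields]
Property=Reference
PropertyFieldCSVs=*/DREREFERENCE,*/URL
"""

SEPARATE_PROCESSES = """
[SetReferenceFields]
Property=Reference
PropertyFieldCSVs=*/DREREFERENCE

[SetMoreReferenceFields]
Property=Reference
PropertyFieldCSVs=*/URL
"""


def duplicate_fields(config_text: str, kill_duplicates: str) -> list[str]:
    """List the fields used for duplicate elimination (hypothetical helper).

    A field takes part if it is listed in the same Reference field process
    as the field named by KillDuplicates.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    fields: list[str] = []
    for section in parser.sections():
        if parser.get(section, "Property", fallback="") != "Reference":
            continue
        csv = parser.get(section, "PropertyFieldCSVs", fallback="")
        # Reduce a path such as */URL to the bare field name URL.
        names = [item.strip().split("/")[-1] for item in csv.split(",")]
        if kill_duplicates in names:
            fields.extend(name for name in names if name not in fields)
    return fields


print(duplicate_fields(SINGLE_PROCESS, "DREREFERENCE"))
print(duplicate_fields(SEPARATE_PROCESSES, "DREREFERENCE"))
```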


Checking if the indexing process was successful


You can check if the indexing of data into IDOL server has been successful by running your web
browser and entering the following:

http://<IPAddress>:<Port>/action=IndexerGetStatus
<IPAddress>
Enter the IP address (or name) of the machine on which IDOL server is installed.

<Port>
Enter the Port that you have specified in the [Server] section of the IDOL server
configuration file.

The IndexerGetStatus command displays the status of IDOL server's index queue:

Code  Status                            Description
-1    Finished                          The indexing process is finished.
-2    Out of disk space                 IDOL server ran out of disk space before the indexing process could be completed.
-3    File not found                    The index file could not be found.
-4    Database not found                The database into which you are trying to index could not be found.
-5    Bad parameter                     The indexing command syntax is incorrect.
-6    Database exists                   The database that you are trying to create already exists.
-7    Queued                            The indexing command is queued and will be executed when all preceding indexing commands are finished.
-8    Unavailable                       IDOL server is about to shut down or indexing is paused.
-9    Out of Memory                     IDOL server ran out of memory before the indexing process could be completed.
-10   Interrupted                       The indexing command was interrupted.
-11   XML is not well formed            Indexing failed because your XML is not well formed.
-12   Retrying interrupted command      IDOL server is executing an indexing command that has previously been interrupted.
-13   Backup in progress                IDOL server is performing a backup.
-14   Max index size reached            You have reached the maximum indexing size (the maximum indexing size depends on your license).
-15   Max number of documents reached   You have reached the maximum number of documents you can index (the number of documents you can index depends on your license).
-16   Index paused                      The indexing process has been paused.
-17   Index restarted                   The indexing process has been restarted.
-18   Index cancelled                   The indexing process has been cancelled.
-19   Index out of file descriptors     IDOL server has run out of file descriptors.
-20   Index languagetype not found      The language type of the index data could not be found.
-21   Index securitytype not found      The security type of the index data could not be found.

Note: if the IndexerGetStatus command returns a positive number, this number indicates the
percentage of the indexing queue that has been completed.
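
If you check the status from a script, the codes above can be translated into readable messages. The following Python sketch shows one way to do this. It assumes that you have already extracted the numeric status from the IndexerGetStatus response (how the number appears in the response is not covered here), and the function name is hypothetical.

```python
# Status codes and names as listed in the table above.
INDEX_STATUS = {
    -1: "Finished",
    -2: "Out of disk space",
    -3: "File not found",
    -4: "Database not found",
    -5: "Bad parameter",
    -6: "Database exists",
    -7: "Queued",
    -8: "Unavailable",
    -9: "Out of Memory",
    -10: "Interrupted",
    -11: "XML is not well formed",
    -12: "Retrying interrupted command",
    -13: "Backup in progress",
    -14: "Max index size reached",
    -15: "Max number of documents reached",
    -16: "Index paused",
    -17: "Index restarted",
    -18: "Index cancelled",
    -19: "Index out of file descriptors",
    -20: "Index languagetype not found",
    -21: "Index securitytype not found",
}


def describe_index_status(code: int) -> str:
    """Return a readable description for an IndexerGetStatus value."""
    if code > 0:
        # A positive number is the percentage of the queue that is complete.
        return f"{code}% of the indexing queue completed"
    return INDEX_STATUS.get(code, f"Unknown status code {code}")


print(describe_index_status(-1))
print(describe_index_status(42))
```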


Tracking documents through the import and indexing process

If your IDOL server is connected to a DiSH server version 4.2 (or higher), you can use the Autonomy
Service Dashboard (provided with DiSH) in order to track the progress of a document through the
import and indexing process.

To enable this in IDOL server:

1. Open the IDOL server configuration file in a text editor.

2. In the [DocumentTracking] section, enable document tracking by setting
DocumentTrackingActive to true, and configure IDOL server's connection to the DiSH server (if
this section doesn't exist in your configuration file, you have to add it).
For example:
[DocumentTracking]
DocumentTrackingActive=true
DiSHHost=12.34.56.78
DiSHACIPort=7002
DiSHRetries=3
DiSHTimeout=120000

3. Save IDOL server's configuration file and restart IDOL server for your changes to take
effect.

Once you have set up document tracking in IDOL server, you can add IDOL server as a child service
to your DiSH server and use the Autonomy Service Dashboard to track documents. Please refer to
your DiSH documentation and Autonomy Service Dashboard online help for details.


7. Storing users in IDOL server


You need to store users in IDOL server if you want to set up any of the following:

Agents
Users can store queries in the form of agents in order to always be up-to-date on the
latest available information. Users can edit and retrain their agents.

Profiling
A profile is a set of agents that are trained using the documents the user is looking at,
and return data that matches the user's interests. You can set up your application so that
every time a user looks at a document, the profile decides whether this document is
relevant to its agent's training. It then either updates the training with the document's
content or creates a new profile agent for the user.

Collaboration
You can match users with common agents or similar profiles.

Alerting
When IDOL server receives new content that matches a user's agents, it immediately
notifies the user by email or a third party system (for example, by SMS or a
pager).

Mailing
IDOL server matches the agents and profiles against its document content at regular
intervals, and automatically notifies users of documents that match their agents or
profiles by sending them email.

Expertise
IDOL server accepts a natural language or Boolean search string and returns users who
own matching agents or profiles. This allows instant identification of experts on any
subject at hand, eliminating time-consuming searches for specialists and unnecessary
research into subjects for which expert knowledge is already available.


Creating users
To create a flat user structure
Use the UserAdd action to create individual users.
For example:
http://<IPAddress>:<Port>/action=UserAdd&UserName=JaneBrown&Password=Sesame

<IPAddress>
Enter the IP address (or name) of the machine on which IDOL server is installed.

<Port>
Enter the Port that you have specified in the [Server] section of the IDOL server
configuration file.

To create a hierarchical user structure:

1. Decide how you want to structure your users. You can, for example, group them according to
their roles and responsibilities in a company.

2. Use the RoleAdd action to create a role for each user group.
For example:
http://<IPAddress>:<Port>/action=RoleAdd&RoleName=Sales

3. Use the RoleAddRoleToRole action to create a hierarchical structure of roles.
For example:
http://<IPAddress>:<Port>/action=RoleAddRoleToRole&RoleName=Sales&ParentRoleName=TeleSales

4. Use the UserAdd action to create individual users.
For example:
http://<IPAddress>:<Port>/action=UserAdd&UserName=JaneBrown&Password=Sesame

5. Use the RoleAddUserToRole action to associate each user with a role.
For example:
http://<IPAddress>:<Port>/action=RoleAddUserToRole&UserName=JaneBrown&RoleName=TeleSales
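
The steps above can be scripted. The following Python sketch generates the sequence of action URLs for a small role hierarchy. The action and parameter names are the ones shown in the steps; the host name, port and helper functions are hypothetical, and the sketch assumes that RoleName identifies the child role and ParentRoleName its parent. It only builds the URLs and sends nothing.

```python
from urllib.parse import urlencode


def action_url(host: str, port: int, action: str, **params: str) -> str:
    """Build an action URL; parameter values are URL-encoded."""
    return f"http://{host}:{port}/action={action}&{urlencode(params)}"


def hierarchy_urls(host: str, port: int, parent_role: str, child_role: str,
                   user: str, password: str) -> list[str]:
    """Return the URLs that create two roles, link them, and add one user."""
    return [
        action_url(host, port, "RoleAdd", RoleName=parent_role),
        action_url(host, port, "RoleAdd", RoleName=child_role),
        # Assumption: RoleName is the child, ParentRoleName is the parent.
        action_url(host, port, "RoleAddRoleToRole",
                   RoleName=child_role, ParentRoleName=parent_role),
        action_url(host, port, "UserAdd", UserName=user, Password=password),
        action_url(host, port, "RoleAddUserToRole",
                   UserName=user, RoleName=child_role),
    ]


# Hypothetical host and port, for illustration only.
for url in hierarchy_urls("idolhost", 9000, "Sales", "TeleSales",
                          "JaneBrown", "Sesame"):
    print(url)
```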


Integrating with a third party user structure


IDOL server provides the DeferLogin option, which allows you to integrate it with a third party system
(for example, SiteMinder, NT, LDAP, Notes), so that this system can manage authentication. The
entitlements of the users are set to the ones given to IDOL server's default role (the root role).

To use the DeferLogin option:

1. Set DeferLogin to true in the [Server] section of IDOL server's configuration file.

2. Add DeferLogin=true to any user command that is issued.

When a user utilizes IDOL server for the first time, IDOL server creates a user with that user name and
allocates the default role's permissions and settings to this user.
Note: you can set DeferLoginSyncDuration in the [Server] section of IDOL server's configuration file
in order to specify how often IDOL server syncs the users it stores with the users in the third party
system.


8. Setting up security
If IDOL server's default security settings do not suit your environment, you can apply specific
security settings to documents that are indexed into IDOL server by identifying fields in the documents
that determine which security settings are appropriate to each of the documents (unless you want to
specify the security property of a document every time you index a document by sending an additional
parameter).
For details on the settings that the [Security] section can contain and on how you can configure them,
please refer to IDOL server's online help (see Displaying help on configuration settings on
page 389).
To set up automatic security application for documents:
1. Open IDOL server's configuration file in a text editor.

2. In the [Security] section, list the security types that you want to use, and specify the security keys
that identify IDOL server's security type.
For example:
[Security]
SecurityInfoKeys=123,234,345,456
0=NT
1=Netware
2=Notes
3=Exchange

3. Define a section for each of the security types that you have defined (the section must have the
same name as the security type), and specify appropriate settings for each security type in order
to determine how IDOL server handles this security type.
For example:
[NT]
SecurityCode=1
Library=nt_security.dll
Type=AUTONOMY_SECURITY_V4_NT_MAPPED
ReferenceField=*/AUTONOMYMETADATA
[Netware]
SecurityCode=2
Library=netware_security.dll
Type=AUTONOMY_SECURITY_NETWARE_MAPPED
ReferenceField=*/AUTONOMYMETADATA
[Notes]
SecurityCode=3
Library=notes_security.dll
Type=AUTONOMY_SECURITY_V4_NOTES_MAPPED
ReferenceField=*/AUTONOMYMETADATA

[Exchange]
SecurityCode=4
Library=exchange_security.dll
Type=AUTONOMY_SECURITY_EXCHANGE_MAPPED
ReferenceField=*/AUTONOMYMETADATA
4. In the [FieldProcessing] section, set up processes that allow IDOL server to recognize the
security type of documents (unless you want to specify the security property of a document every
time you index a document by sending an additional parameter). If you are using a version 4
security type (for example, AUTONOMY_SECURITY_V4_NOTES_MAPPED), you must include a
process that defines how you want to handle metadata.
For example:
[FieldProcessing]
Number=5
0=DetectNT
1=DetectNetware
2=DetectNotes
3=DetectExchange
4=DefineMetaData

5. Create a section for each of the processes that you have listed, in which you create a property for
the process (security properties always point to a defined security type). Identify the field that you
want to associate with the processes (when identifying the fields from which IDOL server can read
a document's security type, you should use the format /FieldName to match root-level fields,
*/FieldName to match all fields except root-level, or /Path/FieldName to match fields that the
specified path points to).
You can use the PropertyMatch parameter to identify a specific value that fields must have in
order to be processed.
Note: the properties that you create must not have the same name as processes.
For example:
[DetectNT]
Property=SetNTProperty
PropertyFieldCSVs=*/DRESECURITYTYPE
PropertyMatch=*nt
[DetectNetware]
Property=SetNetwareProperty
PropertyFieldCSVs=*/DRESECURITYTYPE
PropertyMatch=*netware
[DetectNotes]
Property=SetNotesProperty
PropertyFieldCSVs=*/DRESECURITYTYPE
PropertyMatch=*notes

[DetectExchange]
Property=SetExchangeProperty
PropertyFieldCSVs=*/DRESECURITYTYPE
PropertyMatch=*exchange
[DefineMetaData]
Property=HideMetaData
PropertyFieldCSVs=*/AUTONOMYMETADATA
6. List all the Properties that you have created in a [Properties] section.
For example:
[Properties]
0=SetNTProperty
1=SetNetwareProperty
2=SetNotesProperty
3=SetExchangeProperty
4=HideMetaData

7. Create a section for each of the properties and specify appropriate configuration settings for each.
These configuration parameters define the processes that are applied to all the fields (or all
documents that contain the fields) that you have previously associated with the processes.
Note: if you are using a version 4 security type (for example,
AUTONOMY_SECURITY_V4_NOTES_MAPPED), you must set ACLType to true in the section
that sets up how IDOL server handles metadata, in order to implement optimized security.
[SetNTProperty]
SecurityType=NT
[SetNetwareProperty]
SecurityType=Netware
[SetNotesProperty]
SecurityType=Notes
[SetExchangeProperty]
SecurityType=Exchange
[HideMetaData]
HiddenType=true
ACLType=true

8. Save and close the configuration file.

9. Restart IDOL server for your changes to take effect.

Note: for details on ensuring security in an Autonomy infrastructure, please refer to your IAS manual.
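
To see how the example processes fit together, the following Python sketch models the detection step: it takes the value of a document's DRESECURITYTYPE field and returns the security type that the matching process would assign. This is an illustration under stated assumptions, not IDOL server's matching logic: it assumes that the asterisk in PropertyMatch is a wildcard for any run of characters, that matching ignores case, and that the first matching process wins. The function name is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional

# (PropertyMatch pattern, SecurityType) pairs from the example configuration.
DETECT_PROCESSES = [
    ("*nt", "NT"),
    ("*netware", "Netware"),
    ("*notes", "Notes"),
    ("*exchange", "Exchange"),
]


def detect_security_type(field_value: str) -> Optional[str]:
    """Return the security type for a DRESECURITYTYPE value, or None.

    Assumptions: '*' matches any characters, matching is case-insensitive,
    and the first process that matches determines the result.
    """
    value = field_value.lower()
    for pattern, security_type in DETECT_PROCESSES:
        if fnmatchcase(value, pattern):
            return security_type
    return None


print(detect_security_type("nt"))
print(detect_security_type("Notes"))
print(detect_security_type("unknown"))
```

Because the patterns only anchor the end of the value, it is worth checking that no pattern matches a value intended for another security type.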


9. Checking that IDOL server is running correctly

Once you have installed IDOL server and stored content in it, you can do the following to check that
IDOL server is running correctly:

execute GetRequestLog actions

execute a GetLicenseInfo action

execute a GetStatus command

use the Autonomy Service Dashboard

Executing GetRequestLog actions


You can send a GetRequestLog action to IDOL server to return a log of the requests that have been
made to IDOL server, including the date and time that a request was made, the client IP address that
made the request and the internal thread that handled the action command.
For example:
http://<host>:<port>/action=GetRequestLog
For further details on the GetRequestLog action, please refer to the IDOL server online help (see
Sending action commands to IDOL server on page 61).

Alternatively, you can display the IDOL server online help, and click on the request log link in the top
right-hand corner. This displays the help's Log page, which contains the log of requests that the
GetRequestLog action returns.


Executing a GetLicenseInfo action


You can send a GetLicenseInfo action to IDOL server to return information on your license. This
allows you to check if your license is valid, which IDOL server operations your license includes,
and which action commands you are permitted to execute.
For example:
Indicates that your license is valid:
<autn:license>
<autn:validlicense>true</autn:validlicense>
</autn:license>
Indicates that your license includes the IDOL server Agent operation:
<autn:section>
<autn:name>Agent</autn:name>
</autn:section>
Indicates that you are permitted to execute Query actions:
<autn:section>
<autn:name>Query</autn:name>
</autn:section>
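
If you want to check the license from a script, you can parse the response with a standard XML library. The following Python sketch uses xml.etree.ElementTree on a sample response. The sample is hypothetical: the wrapper elements and the namespace URI bound to the autn prefix are assumptions, and only the autn:license, autn:validlicense, autn:section and autn:name elements come from the fragments above. Compare it with a real response before relying on it.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the autn prefix; verify against a real response.
AUTN_NS = {"autn": "http://schemas.autonomy.com/aci/"}

# Hypothetical response; only the autn: elements are taken from this guide.
SAMPLE_RESPONSE = """<autnresponse xmlns:autn="http://schemas.autonomy.com/aci/">
  <responsedata>
    <autn:license>
      <autn:validlicense>true</autn:validlicense>
    </autn:license>
    <autn:section>
      <autn:name>Agent</autn:name>
    </autn:section>
    <autn:section>
      <autn:name>Query</autn:name>
    </autn:section>
  </responsedata>
</autnresponse>"""


def license_summary(xml_text: str) -> tuple[bool, list[str]]:
    """Return (license is valid, names listed in autn:section elements)."""
    root = ET.fromstring(xml_text)
    valid = root.findtext(".//autn:validlicense", default="",
                          namespaces=AUTN_NS)
    names = [(element.text or "").strip()
             for element in root.iterfind(".//autn:section/autn:name",
                                          AUTN_NS)]
    return valid.strip().lower() == "true", names


is_valid, sections = license_summary(SAMPLE_RESPONSE)
print(is_valid, sections)
```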

Executing a GetStatus command


You can use the GetStatus command to check if the IDOL server service is running. Please refer to
GetStatus on page 426 for details.

Using the Autonomy Service Dashboard


You can use the Autonomy Service Dashboard (which the IDOL server installer includes) to operate
and monitor IDOL server (and other Autonomy services). This allows you, for example, to audit IDOL
server or set up alerts that notify you if IDOL server does not perform correctly.
Please refer to your DiSH manual and the Autonomy Service Dashboard online help for further
information.


IDOL server operations

11. Agents
Agents automatically find documents that you are interested in. A user who is interested in
football and gardening could, for example, create a Real Madrid and a Pest Control agent. Each agent
is given training text when it is created. This training provides an example of the type of text the agent
is looking for, so that an agent only returns documents, profiles, categories or other agents that
conceptually match its training.
Note that while agents are by default matched against all IDOL server's databases (which store IDOL
server's data content, agents, profiles and categories), the matching can be restricted to one or more
databases (see Querying with an agent on page 122).
For example:
A user creates a Mortgage agent and trains it with text that is similar to the type of results he
expects the agent to return. The user can train the agent with text that he types himself or with
documents. Once the user has finished training the agent and specifying details for it (such as
the maximum number of results the agent can return, the minimum conceptual similarity of
results and so on), he can run the agent. The user can edit or retrain the agent at any time in
order to fine tune it.

Creating an agent
You can create agents using the AgentAdd action command. For details on this action, please refer to
the IDOL server online help (see Displaying online help on page 61).
For example:
http://12.3.4.56:4000/action=AgentAdd&UserName=Administrator&AgentName=Global+Warming&Training=Factors+affecting+global+warming&FieldMinScore=60
This command uses port 4000 to create an agent called Global Warming for the Administrator user.
The agent is stored in IDOL server's Agent index, which is situated on a machine with the IP address
12.3.4.56. The agent is trained to find documents whose concept matches the concept of the text
Factors affecting global warming. Only documents that have a conceptual relevance of at least 60%
to this text can be returned as results.


Editing an agent
You can edit agents using the AgentEdit action command. For details on this action, please refer to
the IDOL server online help (see Displaying online help on page 61).
For example:
http://12.3.4.56:4000/action=AgentEdit&UserName=Administrator&AgentName=Global+Warming&FieldMinScore=75
This command uses port 4000 to change the value of the Global Warming agent's MinScore field to
75.

Retraining an agent
You can retrain agents using the AgentRetrain action command. For details on this action, please
refer to the IDOL server online help (see Displaying online help on page 61). When an agent is
retrained, the concepts of its training are modified with the concepts of the text that is used for the
retraining.
For example:
http://12.3.4.56:4000/action=AgentRetrain&UserName=Administrator&AgentName=Global+Warming&PositiveDocs=534+352+4534
This command uses port 4000 to retrain the Administrator user's Global Warming agent with the
documents that have the IDs 534, 352 and 4534.

Querying with an agent


You can query with an agent using the AgentGetResults action command. For details on this action,
please refer to the IDOL server online help (see Displaying online help on page 61). Note that when
an agent is matched against IDOL server's databases, all the agent's terms are internally postfixed
with a tilde (~) to indicate that the terms have already been stemmed and should not be stemmed
again.
For example:
http://12.3.4.56:4000/action=AgentGetResults&UserName=Administrator&AgentName=Global+Warming&DREDatabaseMatch=News,Archive
This command uses port 4000 to request the results of the Administrator user's Global Warming
agent from IDOL server, which is situated on a machine with the IP address 12.3.4.56. The Global
Warming agent is matched against IDOL server's News and Archive databases.
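
The agent commands in this chapter all follow the same URL pattern, so they are easy to generate from code. The following Python sketch builds the AgentGetResults URL shown above from its parts. The function name is hypothetical; the action and parameter names are the ones used in the example. Spaces in values are encoded as plus signs, and the comma between database names is kept literal, as in the example.

```python
from urllib.parse import urlencode


def agent_results_url(host: str, port: int, user: str, agent: str,
                      databases: list[str]) -> str:
    """Build an AgentGetResults URL restricted to the given databases."""
    params = {
        "UserName": user,
        "AgentName": agent,
        "DREDatabaseMatch": ",".join(databases),
    }
    # safe="," keeps the comma in the database list unescaped.
    return (f"http://{host}:{port}/action=AgentGetResults&"
            f"{urlencode(params, safe=',')}")


print(agent_results_url("12.3.4.56", 4000, "Administrator",
                        "Global Warming", ["News", "Archive"]))
```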


Copying an agent
You can copy an agent using the AgentCopy action command. For details on this action, please refer
to the IDOL server online help (see Displaying online help on page 61). Copying an agent is useful if
you want to use one of your agents or another user's agent as a template. You can copy the agent and
then modify the copy.
For example:
http://12.3.4.56:4000/action=AgentCopy&UserName=Administrator&AgentName=Global+Warming&DestinationUserName=JSmith&DestinationAgentName=Environment
This command uses port 4000 to copy the Administrator user's Global Warming agent details. The
agent's details are copied to the JSmith user's Environment agent.

Viewing an agent's details


You can view an agent's details using the AgentRead action command. For details on this action,
please refer to the IDOL server online help (see Displaying online help on page 61).
For example:
http://12.3.4.56:4000/action=AgentRead&UserName=Administrator&AgentName=Global+Warming
This command requests the details of the Administrator user's Global Warming agent from IDOL
server.

Deleting an agent
You can delete an agent from IDOL server's Agent index using the AgentDelete action command. For
details on this action, please refer to the IDOL server online help (see Displaying online help on
page 61).
For example:
http://12.3.4.56:4000/action=AgentDelete&UserName=Administrator&AgentName=Global+Warming
This command deletes the Administrator user's Global Warming agent from IDOL server.


12. Alerting
IDOL server analyzes data in new documents (when it receives the documents) and compares the
concepts in documents with users' agents. If new data matches a user's agent, it immediately notifies
the user by email.

Alerting users to new content


In order to alert a user by email of new content that matches the user's agent, you need to set up an
Alert task (see Processing data before indexing it on page 73) in IDOL server's configuration file.
Note: see Writing templates for alert emails on page 127 for details of how you can customize alert
emails.
To set up an Alert task to send email alerts to users:
1. Open the IDOL server configuration file in a text editor.

2. Add a new section for the task. The name that you give this section must be unique.
For example:
[MyAlertTask]

3. Add the following line to the section, in order to identify the task as an Alert task:
Module=Alert

4. Use the IDOLserver parameter to specify the IP address (or name) of the machine that stores
IDOL server's Agent index, and the port that is used to query this Agent index.
For example:
IDOLserver=123.4.5.67

5. Specify the fields in the new documents that you want to use to query fields in IDOL server's
Agent index.
For example:
Fields=Text,Title

6. Specify the fields in IDOL server's Agent index that you want to query with the document fields
you have specified in step 5.
For example:
FieldMappings=DRECONTENT,DRETITLE

7. Specify settings for your mail server. For details on available settings, please refer to the IDOL
server online help (see Displaying help on configuration settings on page 389).
For example:
SMTPServer=smtp.company.com
SMTPPort=25
SMTPSendFrom=administrator@mycompany.com
SMTPSendFromUsername=administrator
SMTPSendFromPassword=secret
SMTPSubject=Alert: new document DREREFERENCE

8. Specify any other settings that you want to apply to your Alert task. For details on available
settings, please refer to the online help (see Displaying help on configuration settings on
page 389).
For example:
[MyAlertTask]
Module=Alert
IDOLserver=123.4.5.67
Fields=DRECONTENT,DRETITLE
FieldMappings=DRECONTENT,title
SMTPServer=smtp.company.com
SMTPPort=25
SMTPSendFrom=administrator@mycompany.com
SMTPSendFromUsername=administrator
SMTPSendFromPassword=secret
SMTPSubject=Alert: new document DREREFERENCE
AttachFileFromReference=true
AlwaysSendAttachment=true
Template=AlertTemplate1.txt
AttachmentTemplate=AlertTemplate2.txt

9. Save the IDOL server configuration file and restart IDOL server for your configuration changes to
take effect.


Writing templates for alert emails


The layout of alert emails is determined by templates. For each Alert task that you set up in IDOL
server's configuration file, you can use the following settings to specify which templates you want to
use:
Template
This is the template that IDOL server uses for alert emails without attachments.
AttachmentTemplate
This is the template that IDOL server uses for alert emails with attachments.
By default, both settings point to the alertTemplate.html template, which is located in the templates
subdirectory of IDOL server's res directory. You can save this template with a different name and
customize it according to your requirements.
To create a new template:
1. Open the alertTemplate.html template file in a text editor and save it with a new name.
Alternatively, you can create a new file.
To display any of the following in alert emails, enter the associated field in the template:

DREREFERENCE
Displays a document's reference.
For example:
Ref: DREREFERENCE
If you enter the above in the template and a document's DREREFERENCE field contains the value
http://news.bbc.co.uk/index.html, the alert email will contain the text:
Ref: http://news.bbc.co.uk/index.html

DRETITLE
Displays a document's title.
For example:
New document: DRETITLE
If you enter the above in the template and a document's DRETITLE field contains the value
Tropical fish, the alert email will contain the text:
New document: Tropical fish

DRECONTENT
Displays a document's content.
For example:
DRECONTENT
If you enter the above in the template and a document's DRECONTENT field contains the value
Brine shrimp: popular with aquarists. High in protein and a nice snack for many freshwater fish,
the alert email will contain the text:
Brine shrimp: popular with aquarists. High in protein and a nice snack for many freshwater fish

"FIELD<field_name>"
Displays the content of any other document field.
For example:
Author: "FIELDauthor"
If you enter the above in the template and a document's author field contains the value JR
Hartley, the alert email will contain the text:
Author: JR Hartley

RESULTLINKS
Displays link terms (terms in the document that match terms in the agent).
For example:
Link terms: RESULTLINKS
If you enter the above in the template, and the stemmed terms cat, dog and bird in a
document match terms in an agent, the alert email will contain the text:
Link terms: CAT,DOG,BIRD
Note that link terms are stemmed.

RESULTWEIGHT
Displays the agent's relevance to the document.
For example:
Relevance: RESULTWEIGHT %
If you enter the above in the template, and a result agent has a conceptual similarity of 78%
to the document, the alert email will contain the text:
Relevance: 78 %

AGENTTRAINING
Displays the agent's training.
For example:
Training text: AGENTTRAINING
If you enter the above in the template, and the agent's training contains the value Dry-fly
fishing for Woolhead Sculpins and Leadhead Muddlers, eager browns, rainbows, and cutts, the
alert email will contain the text:
Training text: Dry-fly fishing for Woolhead Sculpins and Leadhead Muddlers, eager browns,
rainbows, and cutts

Note: you should enter the fields in the position you want them to be displayed in the email.
2. If you have set the SendToList configuration parameter to false, you can also include the
AGENTNAME and USERNAME fields in a template. (If SendToList is set to true, a single email
is sent to all users, so it is not possible for separate user and agent names to be displayed in the
email.) The email that the template creates will display the name of the agent that the new
document matches, and the user to whom the email is sent.

3. Save the file.
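
To preview how a template will read before you deploy it, you can imitate the field substitution offline. The following Python sketch is an illustration of the behavior described above, not IDOL server's template engine: it replaces the built-in field names and the "FIELD<field_name>" form with sample values. The function name and the handling of fields without a value (they are left unchanged) are assumptions.

```python
import re

# Built-in field names described in this section.
BUILT_IN_FIELDS = ("DREREFERENCE", "DRETITLE", "DRECONTENT",
                   "RESULTLINKS", "RESULTWEIGHT", "AGENTTRAINING")

# Matches "FIELDname" (including the quotes) or one of the built-in names.
PLACEHOLDER = re.compile(
    r'"FIELD(\w+)"|\b(' + "|".join(BUILT_IN_FIELDS) + r")\b")


def render_template(template: str, values: dict[str, str]) -> str:
    """Replace template fields with sample values in a single pass."""
    def substitute(match: re.Match) -> str:
        name = match.group(1) or match.group(2)
        # Assumption: a field without a value is left as it is.
        return values.get(name, match.group(0))
    return PLACEHOLDER.sub(substitute, template)


sample = {"DRETITLE": "Tropical fish", "RESULTWEIGHT": "78",
          "author": "JR Hartley"}
template = ('New document: DRETITLE\n'
            'Author: "FIELDauthor"\n'
            'Relevance: RESULTWEIGHT %')
print(render_template(template, sample))
```

The substitution runs in a single pass, so text that is inserted from a document is not scanned again for field names.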


13. Categorization
IDOL server's Categorization operation allows you to do the following:

create a hierarchical category structure
You can create a hierarchical category structure by creating or importing categories, training
them and moving them.

view and administer categories
You can use a number of administrative actions to view and maintain your hierarchical
category structure.

categorize data
You can automatically tag, categorize and index documents.

suggest categories
You can suggest conceptually similar categories for documents, text and other categories.

match categories
You can match categories against data, agents, profiles and other categories.


Creating a hierarchical category structure


IDOL server provides a single category, the root category, which cannot be deleted or modified and
serves as a base for the hierarchical category structure that you can create. You can create a
hierarchical category structure under the root category:

from scratch

from clusters

from legacy topic sets

by copying categories

by generating a taxonomy

from XML

Note that all categories are stored on disk. They only become available for querying if they are indexed
into IDOL server's Category index.
Once you have created categories you can:

train the categories

retrain the categories

move the categories

Creating categories from scratch


You can use the CategoryCreate action to create categories from scratch.
Categories that you create from scratch are by default stored as child categories of the root category.
However, you can specify an alternative parent category when you create a category. Once you have
created categories, you need to train them. You can also move categories in order to create a
hierarchical structure or to modify their position in the category hierarchy (see Moving categories on
page 140).
For example:
http://<host>:<port>/action=CategoryCreate&Category=Botanics
In this example, IDOL server is instructed to create the Botanics category. The new category is a child
of the root category, which has the ID 0.
Note: IDOL server returns an ID for the category it creates. You can use this ID to identify the category,
for example, to add a child category to it:
http://<host>:<port>/action=CategoryCreate&Category=Perennials&Parent=0981239872348764
In this example, IDOL server is instructed to create the Perennials category. The new category is a
child of the category with the ID 0981239872348764 (for example, the Botanics category).


Creating categories from clusters


You can use the CategoryImportFromCluster action to create categories from clusters that IDOL
server has previously created (see Clustering on page 151 for details on how to generate clusters).
These categories are imported with training that is generated from the clusters' concepts.
IDOL server stores the categories that it imports from clusters in the root category, unless you specify
a parent category for them.
For example:
http://<host>:<port>/action=CategoryImportFromCluster&SourceJobName=Job1&BuildNow=true
In this example, IDOL server is instructed to import all the clusters in the cluster source job Job1 to
categories in the root category. The BuildNow parameter instructs IDOL server to build the categories
immediately, so they become active. You can also activate the category at a later point using the
CategoryBuild action (see Building categories on page 144).

Creating categories from legacy topic sets


You can use the CategoryImportFromTopic action to import categories from existing legacy topic
sets. IDOL server creates one category per topic set. When you import a topic set, you can specify
whether you want to maintain the original Boolean rules of the topic or import the topic as an Autonomy
concept matching agent.
All categories that are imported from legacy topic sets are child categories of the root category. You
can move them in order to create a hierarchical structure.
For example:
http://<host>:<port>/action=CategoryImportFromTopic&Topic=MyTopicFile.otl&BuildNow=true
In this example, IDOL server is instructed to import the topic sets that are stored in the MyTopicFile.otl
file to categories in the root category. The BuildNow parameter instructs IDOL server to build the
categories immediately, so they become active. You can also activate the category at a later point
using the CategoryBuild action (see Building categories on page 144).


Creating categories by copying categories


You can create a category by copying an existing category and retraining or editing it. IDOL server
stores the new category in the same position in the root category as the original category, unless you
specify a parent category for it.
For example:
http://<host>:<port>/action=CategoryCopy&Category=123456789012345&Name=BotanicsCopy&Parent=98765432109876&BuildNow=true
In this example, IDOL server is instructed to copy the category with the ID 123456789012345. The
new category is called BotanicsCopy and is stored as a child category of the category with the ID
98765432109876. The BuildNow parameter instructs IDOL server to build the categories immediately,
so they become active. You can also activate the category at a later point using the CategoryBuild
action (see Building categories on page 144).

Creating categories by generating a taxonomy


You can use the TaxonomyGenerate action to generate a taxonomy in order to build categories from
clusters or query results. IDOL server stores the imported categories in the root category, in a
hierarchical structure that reflects the hierarchical structure of the taxonomy.
Please see Generating a taxonomy from clusters on page 262 and Generating a taxonomy from
query results on page 262 for details.

Creating categories from XML


The CategoryImportFromXML action allows you to import an existing category XML hierarchy. This
can be a third party category XML hierarchy, provided that you convert it to Autonomy XML (see
Mandatory tags on page 133). The categories are imported with the training set in the XML file.
If you don't specify a parent category, IDOL server stores the imported categories in the root category,
in a hierarchical structure that reflects the hierarchical structure of the XML file.
For example:
http://<host>:<port>/action=CategoryImportFromXML&ImportFilename=MyCategory.xml&BuildNow=true
In this example, IDOL server is instructed to import the category information contained in the
MyCategory.xml file to categories in the root category. The BuildNow parameter instructs IDOL
server to build the categories immediately, so they become active. You can also activate the category
at a later point using the CategoryBuild action (see Building categories on page 144).
Note: the CategoryImportFromXML action can only import XML that contains the following
mandatory tags. The XML file you are importing can also contain the optional tags listed below.


Mandatory tags
If you want to create an XML file to set up your category structure, you must use the following
Autonomy tags:
<autn:categories>
Marks the beginning of the XML categories that IDOL server reads. When you use the
CategoryImportFromXML action, IDOL server reads the XML within the opening and
closing <autn:categories> tags.
Required tags within <autn:categories>:
Tag name                    Number allowed
<autn:category>             one or more
Note: you must include an XML namespace in the tag.
For example:
<autn:categories xmlns:autn="http://schemas.autonomy.com/aci/">
<autn:category>
The <autn:category> tag marks the limits of each category that you want to import in
your XML file. You must include at least one <autn:category> within the
<autn:categories> tags.
Required tags within <autn:category>:
Tag name                    Number allowed
<autn:name>                 one

Optional tags within <autn:category>:
Tag name                    Number allowed
<autn:positivetraining>     one
<autn:negativetraining>     one
<autn:details>              one
<autn:settings>             one

<autn:name>
Sets the name of the category. You must include one <autn:name> within each set of
<autn:category> tags.
Required tags within <autn:name>
none
Example content
<autn:name>UKpolitics</autn:name>

Optional tags
<autn:positivetraining>
Sets the positive training for a category. IDOL server identifies concepts that belong to
the category from this training set. You can include one <autn:positivetraining> within
each set of <autn:category> tags.
Required tags within <autn:positivetraining>:
At least one of the following:
Tag name                    Number allowed
<autn:trainingtext>         one
<autn:trainingdoc>          one or more

<autn:negativetraining>
Sets the negative training for a category. IDOL server identifies concepts that do not
belong to the category from this training set. You can include one
<autn:negativetraining> within each set of <autn:category> tags.
Required tags within <autn:negativetraining>:
At least one of the following:
Tag name                    Number allowed
<autn:trainingtext>         one
<autn:trainingdoc>          one or more

<autn:details>
Sets training details for the category. This can include the following:

Boolean conditions for the category

Terms and weights

You can include one set of <autn:details> within each set of <autn:category> tags.
Required tags within <autn:details>
none
Optional tags within <autn:details>:
Tag name                                           Number allowed
<autn:boolean>                                     one
<autn:modifiedterms> and <autn:modifiedweights>    one

<autn:settings>
Sets additional details for the category. You can include one set of <autn:settings>
within each set of <autn:category> tags.
Required tags within <autn:settings>:
Tag name                    Number allowed
<autn:categoryparameters>   one

<autn:trainingtext>
Sets a training text for a category. You can include only one <autn:trainingtext> within
each set of <autn:positivetraining> or <autn:negativetraining> tags.
Required tags within <autn:trainingtext>:
Tag name                    Number allowed
<autn:training>             one

<autn:trainingdoc>
Sets a training document for a category. You can include any number of training
documents within each set of <autn:positivetraining> or <autn:negativetraining>
tags; each one must be marked by its own <autn:trainingdoc> tag.
Required tags within <autn:trainingdoc>:
Tag name                    Number allowed
<autn:training>             one
<autn:title>                one

<autn:boolean>
Sets Boolean training for a category. You can include one <autn:boolean> within each
set of <autn:details> tags.
Required tags within <autn:boolean>
none
Example content
<autn:boolean>(phone AND mobile)</autn:boolean>

<autn:generatedterms>
Sets terms for a category. Only do this if you are editing an existing category from
which you can take the terms. You can include one <autn:generatedterms> within
each set of <autn:details> tags.
Note: if you are specifying terms for a category, then you must enter a corresponding list
of weights with <autn:generatedweights> tags.
Required tags within <autn:generatedterms>
none
Example content
<autn:generatedterms>LYMPH,MISDIAGNOS,PATHOLOGI</autn:generatedterms>
<autn:generatedweights>
Sets weights for a category's terms. Only do this if you are editing an existing category
from which you can take the weights. You can include one <autn:generatedweights>
within each set of <autn:details> tags.
Note: if you are specifying weights for a category, then you must enter a corresponding
list of terms with <autn:generatedterms> tags.

Required tags within <autn:generatedweights>
none
Example content
<autn:generatedweights>5960,4035,4001</autn:generatedweights>

<autn:categoryparameters>
Sets additional category information. This can include the following details:

Number of results you require from a category query

Threshold of results you require from a category query

Your own fields and values

You can specify one set of <autn:categoryparameters> within each set of <autn:settings> tags.
Required tags within <autn:categoryparameters>
none
Optional tags within <autn:categoryparameters>:
Tag name                    Number allowed
<autn:numresults>           one
<autn:threshold>            one
<autn:[My_Field]>           any number

<autn:training>
Sets the training text to be used for training a category. You can enter one text with
<autn:training> for each set of <autn:trainingtext> or <autn:trainingdoc> tags.
Required tags within <autn:training>
none
Example content
<autn:training>The internet is coming to the South Pole following a decision to lay a
fibre-optic cable nearly two thousand kilometres across the polar ice. It will be one of the
most dramatic and challenging engineering tasks ever carried out in Antarctica. It will
take years to design and construct, but when finished it will revolutionise
communications with the South Pole.</autn:training>

<autn:title>
Sets the title of a training document to be used for training a category. You can enter one
title with <autn:title> for each set of <autn:trainingdoc> tags.
Required tags within <autn:title>
none
Example content
<autn:title>Internet to reach South Pole. </autn:title>

<autn:numresults>
Sets the number of results you require from category queries. You can include one
<autn:numresults> for each category within <autn:categoryparameters> tags.
Required tags within <autn:numresults>
none
Example content
<autn:numresults>10</autn:numresults>

<autn:threshold>
Sets the threshold you require for results of category queries. You can include one
<autn:threshold> for each category within <autn:categoryparameters> tags.
Required tags within <autn:threshold>
none
Example content
<autn:threshold>25</autn:threshold>

Setting your own fields


You can set your own fields and values within <autn:categoryparameters> tags.
For example:


<autn:author>Dickens</autn:author>


Examples
The minimum information you can give in your XML:
<?xml version="1.0" encoding="UTF-8" ?>
<autn:categories xmlns:autn="http://schemas.autonomy.com/aci/">
  <autn:category>
    <autn:name>MyCategory</autn:name>
  </autn:category>
</autn:categories>
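Before you import a file, you may want to check it against the mandatory tags described above. The following is a minimal sketch using Python's standard library; the function name check_categories is hypothetical, and the checks cover only the rules stated in this section (an <autn:categories> root, at least one <autn:category>, and exactly one <autn:name> in each category). It does not reproduce whatever validation IDOL server itself performs.

```python
import xml.etree.ElementTree as ET

# Namespace used in the examples of this section.
NS = {"autn": "http://schemas.autonomy.com/aci/"}


def check_categories(xml_text: str) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    # Parsing from bytes lets the parser honour the encoding declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    problems = []
    if root.tag != "{%s}categories" % NS["autn"]:
        problems.append("root element is not <autn:categories>")
    categories = root.findall("autn:category", NS)
    if not categories:
        problems.append("no <autn:category> element found")
    for index, category in enumerate(categories, start=1):
        names = category.findall("autn:name", NS)
        if len(names) != 1:
            problems.append(f"category {index} has {len(names)} <autn:name> tags")
    return problems


minimal = """<?xml version="1.0" encoding="UTF-8" ?>
<autn:categories xmlns:autn="http://schemas.autonomy.com/aci/">
<autn:category>
<autn:name>MyCategory</autn:name>
</autn:category>
</autn:categories>"""

print(check_categories(minimal))  # prints []
```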

Example with one category:


<?xml version="1.0" encoding="UTF-8" ?>
<autn:categories xmlns:autn="http://schemas.autonomy.com/aci/">
  <autn:category>
    <autn:name>antarctic</autn:name>
    <autn:positivetraining>
      <autn:trainingtext>
        <autn:training>internet reaches the south pole and antarctica</autn:training>
      </autn:trainingtext>
    </autn:positivetraining>
    <autn:details>
      <autn:modifiedterms>ANTARCTICA,POLE,SOUTH</autn:modifiedterms>
      <autn:modifiedweights>4262,823,242</autn:modifiedweights>
    </autn:details>
    <autn:settings>
      <autn:categoryparameters>
        <autn:numresults>5</autn:numresults>
        <autn:threshold>30</autn:threshold>
        <autn:author>MyName</autn:author>
      </autn:categoryparameters>
    </autn:settings>
  </autn:category>
</autn:categories>
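If your category definitions live in another system, you can generate XML of this shape rather than writing it by hand. The following sketch uses Python's standard library to produce a category with a name and one positive training text. The function names tag and build_category_xml are hypothetical, and the output is a single unindented line, which is equivalent XML.

```python
import xml.etree.ElementTree as ET

AUTN = "http://schemas.autonomy.com/aci/"
# Make the serializer use the autn: prefix instead of a generated one.
ET.register_namespace("autn", AUTN)


def tag(name: str) -> str:
    """Return the namespace-qualified tag name that ElementTree expects."""
    return f"{{{AUTN}}}{name}"


def build_category_xml(name: str, training_text: str) -> str:
    """Build <autn:categories> holding one category with positive training text."""
    root = ET.Element(tag("categories"))
    category = ET.SubElement(root, tag("category"))
    ET.SubElement(category, tag("name")).text = name
    positive = ET.SubElement(category, tag("positivetraining"))
    text = ET.SubElement(positive, tag("trainingtext"))
    ET.SubElement(text, tag("training")).text = training_text
    return ET.tostring(root, encoding="unicode")


xml_text = build_category_xml(
    "antarctic", "internet reaches the south pole and antarctica"
)
print(xml_text)
```

Building the tree with a library rather than concatenating strings means that special characters in names or training text are escaped for you.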


Training categories
Note: you only need to train categories that you have created with the CategoryCreate action.
Categories that you have created or imported using another action are already trained (you can,
however, retrain them).
You can use the CategorySetTraining action to train a category. A category's training can comprise
text, documents, a Boolean expression and category content, or a combination of these. These
elements serve to identify text, documents, agents, profiles and other categories that match the
category.
For example:
http://<host>:<port>/action=CategorySetTraining&Category=323499876022105571056&DocID=238,785,9912&BuildNow=true
In this example, IDOL server is instructed to train the category with the ID 323499876022105571056
using the content of the documents with the IDs 238, 785 and 9912. The BuildNow parameter instructs
IDOL server to build the categories immediately, so they become active. You can also activate the
category at a later point using the CategoryBuild action (see Building categories on page 144).

Retraining categories
You can use the CategorySetTraining action to retrain a category. You can use text, documents, a
Boolean expression and category content, or a combination of these, to retrain a category. When a
category is retrained, its original training is merged with the new training supplied.
For example:
http://<host>:<port>/action=CategorySetTraining&Category=323499876022105571056&Boolean=dog AND NOT cat&BuildNow=true
In this example, IDOL server is instructed to retrain the category with the ID 323499876022105571056
using the Boolean expression dog AND NOT cat. The BuildNow parameter instructs IDOL server to
build the categories immediately, so they become active. You can also activate the category at a later
point using the CategoryBuild action (see Building categories on page 144).
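The Boolean expression in this example contains spaces, which many HTTP clients will not send unencoded. The following sketch shows one way to percent-encode the parameter values in Python. Encoding spaces as %20 is an assumption about what your client and server expect, and is not something this guide specifies.

```python
from urllib.parse import quote, urlencode

params = {
    "action": "CategorySetTraining",
    "Category": "323499876022105571056",
    "Boolean": "dog AND NOT cat",
    "BuildNow": "true",
}

# quote_via=quote encodes each space as %20 rather than as '+'.
query = urlencode(params, quote_via=quote)
print(query)
# prints action=CategorySetTraining&Category=323499876022105571056&Boolean=dog%20AND%20NOT%20cat&BuildNow=true
```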

Moving categories
You can use the CategoryMove action to move individual categories in the category hierarchy.
For example:
http://<host>:<port>/action=CategoryMove&Category=124365780934532&Parent=123098234987345876
In this example, IDOL server is instructed to move the category that has the ID 124365780934532 to
the category with the ID 123098234987345876 (to make category 123098234987345876 the new
parent of category 124365780934532).


Viewing and administering categories


IDOL server allows you to do the following in order to maintain your category hierarchy:

view category details

view category hierarchy details

view category terms and weights

view category training

change category fields

change category term weights

replace categories

activate categories

build categories

delete categories

delete category training

export categories to XML

synchronize IDOL server's Category index with the categories stored on disk

Viewing category details


You can use the CategoryGetDetails action to view a category's fields.
For example:
http://<host>:<port>/action=CategoryGetDetails&Category=124365780934532
In this example, IDOL server is instructed to return all fields in the category with the ID
124365780934532.

Viewing category hierarchy details


You can use the CategoryGetHierDetails action to view a category's hierarchy details.
For example:
http://<host>:<port>/action=CategoryGetHierDetails&Category=124365780934532
In this example, IDOL server is instructed to return the hierarchy details for the category with the ID
124365780934532.

Viewing category terms and weights


You can use the CategoryGetTNW action to view a category's stemmed terms and their weights.
For example:
http://<host>:<port>/action=CategoryGetTNW&Category=124365780934532
In this example, IDOL server is instructed to return the terms and weights of the category with the ID
124365780934532.

Viewing category training


You can use the CategoryGetTraining action to view a category's training.
For example:
http://<host>:<port>/action=CategoryGetTraining&Category=124365780934532
In this example, IDOL server is instructed to return the training of the category with the ID
124365780934532.

Changing category fields


You can use the CategorySetDetails action to set the value of one or more category fields, or to
create new fields in a category.
By default each category has a threshold of 0 and is set to return 6 results. The CategorySetDetails
action allows you to set a category's threshold and the number of results a category can return by
setting the fields THRESHOLD and NUMRESULTS for this category.
For example:
http://<host>:<port>/action=CategorySetDetails&Category=124365780934532&fields=THRESHOLD,NUMRESULTS&values=60,10&BuildNow=true
In this example, IDOL server is instructed to set the THRESHOLD field of the category with the ID
124365780934532 to 60 and its NUMRESULTS to 10. The BuildNow parameter instructs IDOL
server to build the categories immediately, so they become active. You can also activate the category
at a later point using the CategoryBuild action (see Building categories on page 144).
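The fields and values parameters are parallel comma-separated lists, so the nth value belongs to the nth field. The following sketch derives both lists from a single mapping so that they cannot get out of step. The function name details_params is hypothetical; only the parameter names come from the example above.

```python
def details_params(category_id: str, settings: dict) -> dict:
    """Turn a {field: value} mapping into CategorySetDetails parameters."""
    # Iterate over the field names once so that fields and values stay aligned.
    names = list(settings)
    return {
        "action": "CategorySetDetails",
        "Category": category_id,
        "fields": ",".join(names),
        "values": ",".join(str(settings[name]) for name in names),
    }


params = details_params("124365780934532", {"THRESHOLD": 60, "NUMRESULTS": 10})
print("&".join(f"{key}={value}" for key, value in params.items()))
# prints action=CategorySetDetails&Category=124365780934532&fields=THRESHOLD,NUMRESULTS&values=60,10
```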


Changing category term weights


You can use the CategorySetTNW action to increase or reduce the weights of terms in the category
that you believe are weighted inappropriately.
For example:
http://<host>:<port>/action=CategorySetTNW&Category=124365780934532&Terms=tax,monei,budget&Weights=2353,1223,1023&BuildNow=true
In this example, IDOL server is instructed to set the weight of the term tax to 2353, the weight of the
term monei to 1223 and the weight of the term budget to 1023 (tax, monei and budget are the stems to
which IDOL server reduces the words "Tax", "Money" and "Budget"). The BuildNow parameter instructs
IDOL server to build the categories immediately, so they become active. You can also activate the
category at a later point using the CategoryBuild action (see Building categories on page 144).

Replacing categories
You can use the CategoryReplace action to replace a category with another category.
For example:
http://<host>:<port>/action=CategoryReplace&FromCategory=123456789012345&ToCategory=98765432109876&BuildNow=true
In this example, IDOL server is instructed to replace the 98765432109876 category with the
123456789012345 category. The BuildNow parameter instructs IDOL server to build the categories
immediately, so they become active. You can also activate the category at a later point using the
CategoryBuild action (see Building categories on page 144).

Activating or deactivating categories


You can use the CategoryActivate action to activate or deactivate a category. Inactive categories
cannot be queried or returned as results. By default a category is activated every time it is
built. Use the CategoryGetHierDetails action to find out if categories are active or not (see Viewing
category hierarchy details on page 141).
For example:
http://<host>:<port>/action=CategoryActivate&Category=32349987602210557106&Active=true
In this example, IDOL server is instructed to activate the category with the ID
32349987602210557106.


Building categories
You can use the CategoryBuild action to build a category. You need to build a category after you have
created a new category and trained it, as well as every time you retrain a category. Building a category
identifies the concepts of the category's training and indexes the category into IDOL server's
Category index.
Note: if you have trained or retrained a category using the CategorySetTraining action with
BuildNow set to true, you do not have to execute a CategoryBuild action, as the category was built
immediately after it was trained.
For example:
http://<host>:<port>/action=CategoryBuild&Category=32349987602210557106
In this example, IDOL server is instructed to build the category with the ID 32349987602210557106.

Deleting categories
You can use the CategoryDelete action to delete a category. Deleting a category removes the
category from disk and from IDOL server's Category index.
For example:
http://<host>:<port>/action=CategoryDelete&Category=32349987602210557106
In this example, IDOL server is instructed to delete the category with the ID 32349987602210557106.

Deleting category training


You can use the CategoryDeleteTraining action to delete all or part of a category's training.
For example:
http://<host>:<port>/action=CategoryDeleteTraining&Category=32349987602210557106
In this example, IDOL server is instructed to delete the training of the category with the ID
32349987602210557106.


Exporting categories to XML


You can use the CategoryExportToXML action to export a category including its descendants,
training documents and terms and weights to XML format.
For example:
http://<host>:<port>/action=CategoryExportToXML
In this example, IDOL server is instructed to export the entire category structure to XML.

Synchronizing IDOL server's Category index with the categories stored on disk

You can use the CategorySyncCatDRE action to synchronize IDOL server's Category index with the
categories stored on disk. CategorySyncCatDRE deletes the current contents of the Category index,
and overwrites it with the category information stored on disk.
For example:
http://<host>:<port>/action=CategorySyncCatDRE
In this example, IDOL server is instructed to synchronize its Category index with the categories stored
on disk.


Categorizing data
You can configure IDOL server to automatically categorize data and index it.
To automatically categorize documents before they are stored in IDOL server, you need to set up a Cat
task. IDOL server matches incoming documents against categories that its Category index contains
and returns matching categories. It then tags the incoming documents according to which categories
they match.
For details on how to set up a Cat task, please see Processing data before indexing it on page 73.


Suggesting categories
IDOL server can suggest conceptually similar categories for:

documents

text

categories

Suggesting conceptually similar categories for documents


You can use the CategorySuggestFromDocument action to suggest categories from IDOL server's
Category index that are conceptually similar to a specified document.
For example:
http://<host>:<port>/action=CategorySuggestFromDocument&DocID=125
In this example, IDOL server is instructed to return categories that are conceptually similar to the
document with the ID 125.

Suggesting conceptually similar categories for text


You can use the CategorySuggestFromText action to suggest categories from IDOL server's
Category index that are conceptually similar to specified text.
For example:
http://<host>:<port>/action=CategorySuggestFromText&QueryText=Caring for passiflora incarnata
In this example, IDOL server is instructed to return categories that are conceptually similar to the text
Caring for passiflora incarnata.

Suggesting conceptually similar categories for categories


You can use the CategorySuggestFromCategory action to suggest categories from IDOL server's
Category index that are conceptually similar to a specified category.
For example:
http://<host>:<port>/action=CategorySuggestFromCategory&Category=32349987602210557106
In this example, IDOL server is instructed to return categories that are conceptually similar to the
category with the ID 32349987602210557106.


Matching categories
You can use the CategoryQuery action to match categories against data, agents, profiles and other
categories.
For example:
http://<host>:<port>/action=CategoryQuery&Category=32349987602210557106
In this example, IDOL server matches the category with the ID 32349987602210557106 against all its
databases and returns conceptually similar data, agents, profiles and categories.


14. Channels
IDOL server can automatically provide users with a set of hierarchical channels with highly relevant
information pertinent to the respective channel. Eliminating the requirement for manual intervention or
pre-tagging, real-time information is dynamically updated into the channels automatically, minimizing
the maintenance effort required. Moreover, the administrator can add and remove channels on the fly,
without having to re-categorize all of the data.

Setting up and using channels


You can set up channels using Autonomy Retina's Category Administration or Autonomy's Category
Administration Tool portlet. Users can access channels through Autonomy Retina's Channels page or
Autonomy's Channels portlet.
Please refer to the Autonomy Retina manual or Autonomy Portlets manual for details.


15. Clustering
IDOL server can automatically cluster information in order to make trends and developments in this
information visible. Clustering is the process of taking a large repository of unstructured data and
automatically partitioning it, so that similar information is clustered together. Each cluster represents a
concept area within the knowledge base and contains a set of items with common properties.
To cluster information, you need to take a snapshot of data that IDOL server stores. You can then
automatically cluster data within this snapshot (this does not require the setup of an initial taxonomy).

IDOL server takes a snapshot of the data it stores and, based on these snapshots, clusters related
information together. Each cluster represents a concept area that contains a set of items, which share
common properties.


Generating snapshots

The ClusterSnapshot action allows you to take a snapshot of the data stored in IDOL server's Data
index (by default this comprises the IDOL server databases News and Archive). A snapshot
represents the content of the Data index at a particular time, and enables you to generate cluster
information and spectrographs at a later point, even if the Data index has changed. You can use a
single snapshot to generate both cluster information and spectrograph data in order to save
processing time.
Each snapshot that is taken is time-stamped (with the number of seconds since 1st January 1970) and
stored in binary cls format in the Snapshots subdirectory of IDOL server's Cluster directory in your
IDOL server installation directory. This allows you to have several snapshots with the same name (for
example, of one particular IDOL server) and snapshots with different names (for example, of different
data sets).
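A time stamp expressed as seconds since 1st January 1970 can be converted to a readable date when you need to work out which snapshot is which. The following sketch assumes that the time stamp counts from midnight UTC, which this guide does not state explicitly. The function name snapshot_time and the sample value are illustrative only.

```python
from datetime import datetime, timezone


def snapshot_time(timestamp_seconds: int) -> datetime:
    """Convert seconds since 1st January 1970 (assumed UTC) to a datetime."""
    return datetime.fromtimestamp(timestamp_seconds, tz=timezone.utc)


# Illustrative value, not taken from a real snapshot.
print(snapshot_time(1104537600).isoformat())  # prints 2005-01-01T00:00:00+00:00
```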
You can set up a schedule that executes the ClusterSnapshot action at regular intervals (see
Setting up schedules on page 160).
Note: the IDOL server Data index of which you are taking a snapshot should ideally contain at least
several thousand documents with good-quality content (that is, relevant text for various topics).


Generating spectrograph data

The ClusterSGDataGen action allows you to generate spectrograph data from a set of snapshots that
you have taken using the ClusterSnapshot action.
Each spectrograph data set takes a succession of clusters from different time periods, calculates
cluster similarity measures across days, and applies a graph theoretic matching algorithm.
Calculations are made as to the conceptual spread of a cluster and its general quality. The size
(number of documents in a cluster) and quality of a cluster are represented by width and intensity on
the spectrograph.
All spectrograph data sets that you are generating are stored in the Sgdata subdirectory of the Cluster
directory in your IDOL server installation directory.
You can set up a schedule that executes the ClusterSGDataGen action at regular intervals (see
Setting up schedules on page 160).
You can retrieve the spectrograph image, data or documents using the ClusterSGPicServe,
ClusterSGDataServe and ClusterSGDocsServe actions, which are executed by the Spectrograph
applet.


Generating WhatsNew and WhatsHot information

The ClusterCluster action allows you to analyze clusters in a snapshot that you have taken using the
ClusterSnapshot action.
Clustering is a multi-stage, hybrid algorithm. After IDOL server's Adaptive Probabilistic Concept
Modelling (APCM) technology has identified similar documents, a hierarchical agglomerative
clustering algorithm groups documents into conceptually similar areas. Dynamic binding and fixating
produces the required clusters, whose title is generated automatically by cross-correlating important
concepts within a cluster with concepts within the titles of documents in that cluster.
You can set up a schedule that executes the ClusterCluster action at regular intervals (see Setting
up schedules on page 160).
Depending on which parameters you combine the action with, you can generate:

WhatsHot information
WhatsHot information is the most relevant information that is available for the clusters that
IDOL server identifies in your snapshot. Unlike WhatsNew information this is not restricted to
new information, which means that it can be used to follow the progress of particular news
items over time.
You can cluster WhatsHot information from a snapshot and use the Autonomy HotNews
portlet to display this information in a portal. You can also generate a 2D map from WhatsHot
information and display it in a portal using the Autonomy 2DMap portlet.
The 2D map gives a visualization of the similarities and differences between clusters. A
dimensionality reduction algorithm is used to maintain inter-cluster similarity measures so
that clusters that are close together have some similarity and clusters that are not similar are
not close together. The distribution of documents throughout the space, along with non-linear
remapping, is then used to create the landscape.


WhatsNew information
WhatsNew information is the latest information that is available for the clusters that IDOL
server identifies in your snapshot.
You can generate WhatsNew information by comparing two snapshots (that have the same
name or different names).

The results of the ClusterCluster action are saved in cfg files in the Clusters subdirectory of the IDOL
server installation's Cluster directory from where you can retrieve them in XML format using the
ClusterResults action.
If you have configured the ClusterCluster action to generate a 2D map of WhatsHot cluster
information, you can use the ClusterServe2DMap action to return this map in one of the supported
image formats (that is, GIF, PNG or JPEG).


Configuring clustering
You can take a snapshot of the data content IDOL server stores. This snapshot identifies clusters of
conceptually similar documents, which enables you to generate a view of trends in the data. You don't
need to generate an initial taxonomy in order to take a snapshot.
A set of data can contain a few large clusters or many small clusters, as well as a number of outliers
that aren't part of any cluster. Clusters may consist of highly similar documents or of less closely
related ones. What constitutes optimal clustering depends to some extent on how you intend to use
your clusters, but the aim of clustering is always to generate an accurate characterization of the data
content in your IDOL server.
By default IDOL server uses internal settings to produce clusters. These default settings do not usually
need to be changed, but in some cases you may require more or less detail in your clusters, or the
amount and nature of your data may mean that default clustering is not satisfactory. You can optimize
clustering in these cases by setting parameters that adjust the size of the units on which clusters are
based, the degree of conceptual similarity that documents within clusters must have, or the number of
clusters that are created.

Changing the number and size of clusters


There are two main stages to the clustering process:

building "seeds"

grouping seeds into clusters

Seed-building is implemented when the ClusterSnapshot action is executed. IDOL server takes a
sample of the documents it stores and tries to associate individual documents with each other, based
on the similarity of the concepts that the documents contain. Each group produced at this stage (a
sample document together with its similar documents) is a seed. IDOL server stops trying to build a seed when
the seed meets the requirements that SeedSize specifies or when there are no more documents that
meet the similarity requirement that SeedBindLevel specifies (whichever condition is reached first).
IDOL server discards any seeds that don't reach the required size. The number of clusters you specify
with NumClusters affects the number of sample documents from which IDOL server tries to create
seeds at this stage (note that you can adjust the relationship between the number you specify here and
the size of the sample used by changing the value of StartingSuggestOverrideFactor).
Grouping seeds into clusters is implemented when the ClusterSGDataGen or ClusterCluster actions
are executed. IDOL server tries to create clusters by grouping seeds together. The grouping is based
on the similarity of the concepts that the seeds or clusters contain. Clustering is complete when the
number of clusters specified by NumClusters has been created, or when no more clusters can be
created that meet the similarity requirement specified by BindLevel (whichever condition is reached
first). Clusters that don't meet the quality requirement set by BindLevel or the size requirement set by
MinClusterDocs are discarded.
For details of the clustering actions, and the settings you can make to generate the clusters from your
data, please refer to the IDOL server online help (see Displaying online help on page 61).
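The two stopping conditions for seed-building can be illustrated with a toy model. This is a minimal sketch and not IDOL server's actual algorithm: it assumes documents are plain sets of terms, uses Jaccard overlap as a stand-in for conceptual similarity, and the function names and the 0-1 similarity scale are hypothetical.

```python
# Toy illustration only: this is NOT IDOL server's clustering algorithm.
# Documents are modelled as sets of terms; Jaccard overlap stands in for
# conceptual similarity. All names and scales here are hypothetical.

def jaccard(a, b):
    """Similarity of two term sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_seeds(docs, samples, seed_size, seed_bind_level):
    """Grow one seed per sample document; discard seeds that stay too small."""
    seeds = []
    for s in samples:
        seed = [s]
        for i, terms in enumerate(docs):
            if len(seed) >= seed_size:
                break  # the seed has reached the required size
            if i != s and jaccard(docs[s], terms) >= seed_bind_level:
                seed.append(i)
        if len(seed) >= seed_size:  # seeds below the required size are dropped
            seeds.append(seed)
    return seeds

docs = [
    {"football", "goal", "league"},
    {"football", "goal", "cup"},
    {"football", "league", "cup", "goal"},
    {"tennis", "serve", "court"},
    {"tennis", "serve", "ace"},
]
strict = build_seeds(docs, samples=[0, 3], seed_size=3, seed_bind_level=0.5)
relaxed = build_seeds(docs, samples=[0, 3], seed_size=2, seed_bind_level=0.5)
print(strict, relaxed)
```

In this toy data set, lowering the required seed size lets the small tennis group survive as a seed, which mirrors the effect the SeedSize setting is described as having.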


Configuration settings
The ideal values for the parameters that affect clustering depend on the nature and amount of data in
your IDOL server. It is possible to make some general recommendations about how to change these
parameters according to your data. Parameters are closely interdependent, so you should make these
changes in combination with each other (rather than just changing one of the settings), and change
values in small increments or decrements.
Although you can make many changes to clustering, the number and size of clusters that IDOL server
can identify depends ultimately on the data content it contains:

clustering a small amount of data

clustering a large amount of data

clustering very similar data

clustering very different data

changing the data view

Clustering a small amount of data


If your IDOL server has a small amount of data, it is likely to identify fewer clusters, since it is less likely
that your data will contain a lot of similar documents for a number of different topics. You can edit the
following parameters in order to change clustering in this situation.
Note: ideally, your IDOL server should contain at least 500 documents.
SeedSize
Decrease SeedSize (by 3-4 points at a time). This reduces the size that seeds are required to
reach, which means that more seeds are likely to be successfully created.

MinClusterDocs
Decrease MinClusterDocs. This means that clusters that contain fewer documents are not discarded.

StartingSuggestOverrideFactor
Increase StartingSuggestOverrideFactor (by 1 or 2 points only). This increases the number of
sample documents from which IDOL server creates seeds, which in some cases increases the
possibility of finding clusters in the data.

SeedBindLevel
Decrease SeedBindLevel (by 1 point at a time). This reduces the similarity threshold for clusters.
You should not change this until you have tried changing SeedSize, since lowering SeedBindLevel
is more likely to allow into clusters documents that are less relevant.

Clustering a large amount of data
If your IDOL server has a large amount of data, you will probably not need to edit any clustering
settings - since this is the situation in which clustering is most successful. In some cases (for example,
if your IDOL server contains more than 1 million documents), it may be beneficial to alter the following
setting:
StartingSuggestOverrideFactor
Increase the value of StartingSuggestOverrideFactor. This increases the number of sample
documents from which IDOL server creates seeds. This is sometimes necessary in order to allow a
broader section of the data content to be represented by the clusters that are created.

Clustering very similar data


If the documents in your IDOL server contain highly similar concepts, this may be reflected by IDOL
server identifying a small number of large clusters. For example, if your IDOL server contains mostly
documents about sport, then you may get one large "sports" cluster. This situation is a realistic
characterization of the data in your IDOL server, but in many circumstances is not useful. You can edit
the following settings in order to generate smaller, more specific clusters (for example, breaking
"sports" into "football", "tennis", "golf" etc.):

SeedBindLevel
Increase SeedBindLevel. This requires greater similarity between the documents that form a seed,
which can have the effect of reducing the breadth of topics covered by the concepts in the
documents that a seed contains.
Note: increase SeedBindLevel 1 point at a time; increasing by too much can result in seeds getting
discarded because they don't contain enough documents.

BindLevel
Increase BindLevel. This requires greater similarity between the concepts in seeds or clusters that
are merged to create a cluster, which can have the effect of decreasing the size of clusters, as well
as increasing the number of clusters identified, because merging seeds and clusters together is
stopped at an earlier stage.

Clustering very different data
If the documents in your IDOL server contain a wide variety of concepts, there may not be enough
similar documents for IDOL server to create seeds or clusters that characterize the data it stores. You
can lower the similarity requirement with the following settings:
SeedBindLevel
Decrease SeedBindLevel. This reduces the similarity requirement between the documents that form
a seed, which can have the effect of increasing the breadth of topics covered by the concepts in the
documents that a seed contains.
Note: decrease SeedBindLevel 1 point at a time; decreasing by too much can result in seeds and
clusters containing documents that are less relevant, because the similarity requirement is too low.

BindLevel
Decrease BindLevel. This reduces the similarity requirement between the concepts in seeds or
clusters that are merged to create a cluster, which can have the effect of increasing the size of
clusters, as well as increasing the number of clusters identified (since fewer get discarded for not
meeting the quality requirement).

Changing the data view


It might be the case that although IDOL server identifies clusters that characterize your data
successfully, you want to change the view of the data that clustering creates. The following settings
enable you to change the data view that clusters generate:
NumClusters
Increase NumClusters to get a more low-level view of your data by identifying more clusters.
Decrease NumClusters to get a more high-level view by identifying fewer clusters.

MinClusterDocs
Decrease MinClusterDocs to reduce the number of clusters that are discarded. This allows smaller
clusters to be identified.
Increase MinClusterDocs to increase the number of clusters that are discarded. Only larger clusters
are kept.

BindLevel
Decrease BindLevel to reduce the similarity requirement between the concepts in seeds or clusters
that are merged to create a cluster. This can have the effect of increasing the size of clusters, as
well as increasing the number of clusters identified (since fewer get discarded for not meeting the
quality requirement).
Increase BindLevel to increase the similarity requirement between the concepts in seeds or clusters
that are merged to create a cluster. This can have the effect of decreasing the size of clusters, as
well as increasing the number of clusters identified, since merging seeds and clusters together is
stopped at an earlier stage.


Setting up schedules
You can set up as many as 1024 schedules, which allow you to run the following actions at regular intervals:

ClusterSnapshot

ClusterCluster

ClusterSGDataGen

TaxonomyGenerate

For details on the settings that each [AnalysisSchedule] section can contain and on how you can
configure them, please refer to IDOL server's online help (see Displaying help on configuration
settings on page 389).

To set up schedules:
1. Open IDOL server's configuration file in a text editor.

2. Create an [AnalysisSchedule<N>] section for each schedule that you want to run. Start the
numbering of the [AnalysisSchedule<N>] sections from 0 (so that the first schedule section is
called [AnalysisSchedule0]).
For example:
[AnalysisSchedule0]
[AnalysisSchedule1]
[AnalysisSchedule2]
[AnalysisSchedule3]
[AnalysisSchedule4]
[AnalysisSchedule5]
In this example 6 schedules have been created. Note that the schedules are listed in consecutive
order, starting from 0.

3. Specify the settings that you want to apply to each schedule in the appropriate schedule's section.
You can specify the action that should be scheduled, the interval at which each schedule should
be executed, the number of times each schedule should be executed and so on.
For example:
[AnalysisSchedule0]
schedulestarttime=now
scheduleinterval=1 day
schedulecycles=1
scheduleaction=CLUSTERSNAPSHOT
targetjobname=myjob

[AnalysisSchedule1]
schedulestarttime=now
scheduleinterval=1 day
schedulecycles=1
scheduleaction=CLUSTERCLUSTER
sourcejobname=myjob
targetjobname=myjob_clusters
domapping=true
[AnalysisSchedule2]
schedulestarttime=now
scheduleinterval=1 day
schedulecycles=1
scheduleaction=CLUSTERCLUSTER
sourcejobname=myjob
targetjobname=myjob_clusters_new
whatsnew=true
interval=86400
[AnalysisSchedule3]
schedulestarttime=now
scheduleinterval=1 day
schedulecycles=1
scheduleaction=CLUSTERSGDATAGEN
interval=604800
sourcejobname=myjob
targetjobname=myjob_sg
[AnalysisSchedule4]
schedulestarttime=now
scheduleinterval=1 day
schedulecycles=1
scheduleaction=CLUSTERSGDATAGEN
interval=86400
sourcejobname=myjob_content
targetjobname=compare_snapshots_sg
[AnalysisSchedule5]
schedulestarttime=now
scheduleinterval=1 day
schedulecycles=1
scheduleaction=TAXONOMYGENERATE
cluster=0,1,2,3,4,5,6,7,8,9
sourcejobname=myjob_clusters
targetjobname=myjob_taxonomy
writetaxonomy=true
numresults=25
4. Save the configuration file and restart IDOL server.
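When there are many schedules, numbering the sections by hand is easy to get wrong. The following is a minimal sketch of a helper that renders the sections with consecutive numbers starting from 0. The function name is hypothetical and the helper is not part of IDOL server; the setting names and values are taken from the example above.

```python
MAX_SCHEDULES = 1024  # upper limit stated in this guide

def schedule_sections(schedules):
    """Render [AnalysisSchedule<N>] sections, numbered consecutively from 0."""
    if len(schedules) > MAX_SCHEDULES:
        raise ValueError(f"at most {MAX_SCHEDULES} schedules are supported")
    lines = []
    for n, settings in enumerate(schedules):
        lines.append(f"[AnalysisSchedule{n}]")
        lines.extend(f"{key}={value}" for key, value in settings.items())
    return "\n".join(lines)

snapshot = {
    "schedulestarttime": "now",
    "scheduleinterval": "1 day",
    "schedulecycles": 1,
    "scheduleaction": "CLUSTERSNAPSHOT",
    "targetjobname": "myjob",
}
cluster = {
    "schedulestarttime": "now",
    "scheduleinterval": "1 day",
    "schedulecycles": 1,
    "scheduleaction": "CLUSTERCLUSTER",
    "sourcejobname": "myjob",
    "targetjobname": "myjob_clusters",
    "domapping": "true",
}
config_text = schedule_sections([snapshot, cluster])
print(config_text)
```

The printed text can be pasted into the configuration file; the helper only formats the sections and does not validate the individual settings.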


16. Collaboration
IDOL server automatically matches users with common explicit interest agents or similar implicit
profiles. This information can be used to create virtual expert knowledge groups.

To match users with community agents or profiles


You can use the Community action in order to find agents and / or profiles in the community that
match the agents and / or the profiles of a specific user.
For example:
http://<host>:<port>/action=Community&UserName=JSmith&Agents=true&Profiles=true&AgentsFindProfiles=true&ProfilesFindAgents=true
In this example, IDOL server is instructed to find agents and profiles in the user community that match
both the agents and the profiles of the user JSmith.
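If you send this action from a script rather than a browser, you can assemble the URL from its parameters. This is a minimal sketch: the helper name, the host localhost and the port 9000 are placeholders, not values from this guide, and the URL layout simply follows the example above.

```python
from urllib.parse import urlencode

def action_url(host, port, action, **params):
    """Build an action URL in the form used by the examples in this guide."""
    query = urlencode({"action": action, **params})
    return f"http://{host}:{port}/{query}"

# localhost and 9000 are placeholders for your own host and port.
url = action_url(
    "localhost", 9000, "Community",
    UserName="JSmith",
    Agents="true",
    Profiles="true",
    AgentsFindProfiles="true",
    ProfilesFindAgents="true",
)
print(url)
```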


17. Dynamic Thesaurus


When it executes a query, IDOL server can automatically generate query summaries that comprise the
most salient terms and phrases of the query's result documents. These terms and phrases can then be
used to suggest alternative queries to users, allowing them to refine their queries further and quickly
produce a variety of relevant result sets.

To automatically generate query summaries


Add the QuerySummary action parameter to a Query, Suggest or SuggestOnText action.
IDOL server identifies the most relevant terms and phrases that a query's result documents contain,
and returns them in an <autn:querysummary> field. Each term or phrase in this field can be used as
an alternative query suggestion.
For example:
http://<host>:<port>/action=Query&Text=Gene analysis discovered methods to determine
the exact sequence of nucleotides that compose a specific gene&QuerySummary=true
In this example, IDOL server is instructed to return results that match the specified Text and a single
summary of the best terms and phrases that these result documents contain.
For the specified query IDOL server could, for example, return the following query summary field (note
that the QuerySummaryLength parameter in the IDOL server configuration file's [Server] section
determines the maximum number of terms and phrases that this field can contain):
<autn:querysummary>abnormal gene, Cell Differentiation, defective genes, chromosomes,
clone, disorders, DNA</autn:querysummary>
Each of these terms and phrases can be used as an alternative query.
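An application that offers these suggestions needs to split the field into individual terms. The following is a minimal sketch that assumes the terms are comma-separated, as in the example above, and that the field is wrapped in a response that declares the autn namespace; the wrapper elements in the sample and the function name are assumptions.

```python
import xml.etree.ElementTree as ET

AUTN = "http://schemas.autonomy.com/aci/"

# Sample response: the surrounding elements are assumed for illustration;
# only the <autn:querysummary> content is taken from the example above.
response = (
    "<autnresponse xmlns:autn='http://schemas.autonomy.com/aci/'>"
    "<responsedata>"
    "<autn:querysummary>abnormal gene, Cell Differentiation, defective genes, "
    "chromosomes, clone, disorders, DNA</autn:querysummary>"
    "</responsedata>"
    "</autnresponse>"
)

def summary_terms(xml_text):
    """Return the query summary as a list of alternative query strings."""
    node = ET.fromstring(xml_text).find(f".//{{{AUTN}}}querysummary")
    if node is None or not node.text:
        return []
    return [term.strip() for term in node.text.split(",") if term.strip()]

terms = summary_terms(response)
print(terms)
```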


18. Eduction
IDOL server's eduction feature allows you to extract information that is embedded in unstructured data
and store it in fields. The information that you can extract comprises:

built-in data types
IDOL server automatically extracts common information types and stores them in predefined fields.

user-defined data types
You can define information types that you want IDOL server to recognize and extract,
and store them in custom fields.

Extracting built-in data types


If you enable the extraction of built-in data types, IDOL server automatically extracts the following
information types from any data that it receives and stores them in structured fields:

addresses

personal names

company names

dates

telephone numbers

other numbers

In order to extract built-in data types, you need to set up an eduction task in IDOL server's
configuration file before you start storing content in your IDOL server (see Processing data before
indexing it on page 73).
For example:
[EductionTask]
Module=Educe
NextTask=MyIndexTask

If you set up a task that extracts built-in data types, the eduction feature creates a field for each
built-in data type it finds in content that is indexed into IDOL server. Content is stored in fields as follows:

field name       data type stored
EDUCE_NAME       a person's name
EDUCE_ADDRESS    an address
EDUCE_DATE       a date
EDUCE_STREET     a street
EDUCE_CODE       a post code, serial number, license plate or any combination of letters and numbers
EDUCE_PERCENT    a percentage (note that this data can only be extracted if it includes a percentage
                 sign or the word "percent")
EDUCE_INTEGER    an integer (any positive or negative number that does not include a fraction or
                 decimal, including zero)
EDUCE_FLOAT      a float (any positive or negative number that includes a fraction or decimal)
EDUCE_PHONE      a phone number
EDUCE_COMPANY    a company name

Note: you can improve the automatic detection of names that are stored in the EDUCE_NAME field by
creating phrase set files that train IDOL server to recognize what good names are (names that it
should extract) and what bad names are (names that it should not extract):
1. Create a file that lists examples of good names and a file that lists examples of bad names. List
each name on a separate line.
For example:
In the good name file:
Tom
Kate
Richard
Harry
Fred
Anna

In the bad name file:
The
Her
With
Which
That
His
This

Note that the more training you specify, the better the name recognition will work.

2. Save each file with a .dat extension.

3. Add the PhraseSets parameter to your eduction task section, and use it to list both name types.
For example:
[EductionTask]
Module=Educe
NextTask=MyIndexTask
PhraseSets=GoodNames,BadNames

4. Create a configuration section for each of the name types you have listed. Use the
PhraseFileNames parameter to specify the location of the phrase list dat file that contains the
training and the ListType parameter to identify the training type of the file. Set the TagName
parameter to None to indicate that the content of the phrase lists is used for training and not for
exact matching.
For example:
[GoodNames]
PhraseFileNames=Good.dat
ListType=PosNames
TagName=None
[BadNames]
PhraseFileNames=Bad.dat
ListType=NegNames
TagName=None

5. Save the configuration file. You can now start indexing data into IDOL server.
Every time data is indexed that contains names, IDOL server extracts these names and stores
them in EDUCE_NAME fields. Note that only names that start with capital letters are extracted.
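The training files and their configuration sections from steps 1 to 4 can also be produced by a script. This is a minimal sketch: the helper names and the temporary output directory are hypothetical, while the file names, section names and parameter values repeat the examples above.

```python
import tempfile
from pathlib import Path

GOOD_NAMES = ["Tom", "Kate", "Richard", "Harry", "Fred", "Anna"]
BAD_NAMES = ["The", "Her", "With", "Which", "That", "His", "This"]

def write_phrase_file(path, phrases):
    """Write one phrase per line, as the training files require."""
    path.write_text("\n".join(phrases) + "\n", encoding="utf-8")

def training_section(name, file_name, list_type):
    """Render the configuration section for one training phrase set."""
    return "\n".join([
        f"[{name}]",
        f"PhraseFileNames={file_name}",
        f"ListType={list_type}",
        "TagName=None",  # the list is used for training, not exact matching
    ])

work_dir = Path(tempfile.mkdtemp())  # hypothetical location for the .dat files
write_phrase_file(work_dir / "Good.dat", GOOD_NAMES)
write_phrase_file(work_dir / "Bad.dat", BAD_NAMES)
sections = "\n".join([
    training_section("GoodNames", "Good.dat", "PosNames"),
    training_section("BadNames", "Bad.dat", "NegNames"),
])
print(sections)
```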


Extracting user-defined data types


If you define information types that you want IDOL server to recognize and extract, IDOL server
automatically extracts them from any data that it receives and stores them in custom fields.

To define and extract user-defined data types:


1. Create a phrase list for each data type that you want to extract, and save it with a .dat extension.
In each file you need to list all words that belong to the file's data type. List each word on a
separate line.
For example, if you want to extract country names, you could create a Countries.dat file, and list all
words that you want IDOL server to recognize as country data:
united kingdom
u.k.
uk
united states
u.s.a.
usa
u s a

2. Create an eduction task in IDOL server's configuration file and use the PhraseSets parameter to
list all your user-defined data types. If you have already set up an eduction task to extract built-in
data types, you can add the PhraseSets parameter to this section.
For example:
[EductionTask]
Module=Educe
NextTask=MyIndexTask
PhraseSets=CountryNames

3. Create a configuration section for each of the data types you have listed and use the
PhraseFileNames and TagName parameters to specify the location of the phrase list dat file that
defines which data should be extracted, and in which field this data should be stored.
For example:
[CountryNames]
PhraseFileNames=Countries.dat
TagName=Country

4. Save the configuration file. You can now start indexing data into IDOL server.
Every time data is indexed that contains a word that matches one of the words listed in one of the
phrase list dat files, IDOL server extracts this word and stores it in the appropriate field. Note that
the matching of phrase list words is not case sensitive.
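To preview which entries of a phrase list would be found in a piece of text before you index it, you can approximate the matching yourself. This is a rough sketch and not IDOL server's eduction logic: it assumes whole-phrase, case-insensitive matching, and the function name and sample text are hypothetical.

```python
import re

COUNTRY_PHRASES = [
    "united kingdom", "u.k.", "uk",
    "united states", "u.s.a.", "usa", "u s a",
]

def extract_phrases(text, phrases):
    """Return every case-insensitive, whole-phrase match found in the text."""
    found = []
    # Longer phrases are tried first; this only affects the order of results.
    for phrase in sorted(phrases, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        found.extend(m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE))
    return found

text = "Offices in the United Kingdom and the USA."
fields = {"Country": extract_phrases(text, COUNTRY_PHRASES)}
print(fields)
```

The dictionary key mirrors the TagName value from the example configuration, to show where a matched phrase would be stored.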


19. Expertise
IDOL server accepts a natural language or Boolean search string and returns users who own matching
agents or profiles. This allows instant identification of experts in any subject at hand, eliminating
time-consuming searches for specialists and unnecessary research into subjects for which expert
knowledge is already available.

To find specialist community agents or profiles


You can use the Community action in order to find agents and / or profiles in the community that
match a natural language or Boolean search string.
For example:
http://<host>:<port>/action=Community&Text=how does the cost of funds, such as the costs of performing a credit evaluation on the business requesting a loan, determine the spread between the federal funds rate and the prime rate&AgentsFindProfiles=true&ProfilesFindAgents=true
In this example, IDOL server is instructed to find agents and profiles in the user community that match
the specified text.
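A browser normally encodes the spaces and punctuation in such a URL for you. If you send the action from a script, you may need to percent-encode the text yourself. This is a minimal sketch using a shortened version of the text above; localhost and 9000 are placeholder values.

```python
from urllib.parse import quote

text = "how does the cost of funds determine the prime rate"

# quote() replaces spaces and other unsafe characters with %XX escapes.
# localhost and 9000 are placeholders for your own host and port.
url = (
    "http://localhost:9000/action=Community&Text="
    + quote(text)
    + "&AgentsFindProfiles=true&ProfilesFindAgents=true"
)
print(url)
```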


20. Hyperlinking
When IDOL server returns results, it automatically generates hyperlinks in real time. These point to
contextually similar content and can be used to recommend related articles, documents, affinity
products or services, or media content that relates to textual content.
Because links are automatically inserted at the time a document is retrieved, they can include
references to documents and articles written long before, or hyperlinks from archived material can link
to the latest news or material on that subject.
For example:

New Media
When viewing an article on a new media internet site, Autonomy can be used to dynamically
link to contextually similar content and recommend related articles in real time.

Corporate
Within a corporate environment, as an employee is reading or writing a document, contextually
similar documents from various sources can be suggested to the person through dynamic
hyperlink creation enabling the user to immediately view documents, multimedia content and
related e-mails.

E-Commerce
Through contextual association, e-commerce vendors are able to increase customer retention
of their site through the ability to cross-sell and push other relevant content or products as they
browse product catalogues or content.

Legal
Typically within the legal arena, Autonomy facilitates the ability to suggest contextually relevant
legal content pertinent to the legal issues being researched. Through automatic Hyperlinking,
Autonomy significantly reduces the time taken to navigate to the right information, identify
previous precedents and facilitate reuse of existing material.

CRM
As a customer service representative attends a customer's enquiry, answers to frequently
asked questions and related e-mails are presented in the form of dynamic hyperlinks, enabling
the organization to raise its level of customer service, reduce the requirement for expertise in
the front line and ensure all issues are dealt with in shorter cycle times.


Implementing hyperlinking
If you are connecting IDOL server to an Autonomy interface application (for example, Retina),
hyperlinks are automatically generated, for example, when query results are returned or when a user
refines a query.
If you are connecting IDOL server to a third party interface application, you can implement automatic
hyperlink generation by executing a Suggest action when query results are returned. Please refer to
your Autonomy ACI API documentation for details on the functions you require for this.


21. Mailing
IDOL server matches users' agents and subscription channels against its document content at regular
intervals, and automatically sends users email to notify them of documents that match their agents and
the channels that they are subscribed to.

IDOL server can send the following email types to users:

scheduled automatic emails containing agent and channel results

custom emails containing details of a single document

The format of email that the IDOL server sends is determined by templates (see Mailer templates on
page 180 for details of these templates).


Automatically emailing agent and channel results


Note: IDOL server is configured to email users' agent and channel results by default.
You can configure IDOL server to automatically mail users the results that their agents and channels
produce. This enables you to schedule the emailing of agent and channel results and optionally store
lists of sent results to prevent the duplication of email.

To set up automatic emailing of agent and channel results


1. Open IDOL server's configuration file in a text editor, and find the [UserCustom] section. This
section lists all the custom processes that IDOL server executes.

2. Check if the [UserCustom] section lists a section for emailing. If it doesn't, you need to add one.
For example:
[UserCustom]
0=Email

3. Create a configuration file section for the emailing process you have listed.
For example:
[Email]

4. In your new section, set Library to <IDOL server installation location>/IDOL/modules/user_email,
RunMailer to true and DefaultSendEmail to true in order to enable IDOL server's mailing operation.

5. Specify a TestUser. While you are configuring mailing, all mail is sent to the TestUser email
address until you are ready to start mailing properly.

6. If you are using a proxy server, specify the ProxyHost, ProxyPort, your ProxyUsername and
your ProxyPassword.

7. Use SMTPHost and SMTPPort to specify the details of your mail server.

8. Use Cycles and Interval to determine how many times the mailing operation should run and the
time span that you want to elapse between the sending of email. Set StartTime to now, so you
can test the mailing operation immediately when you start IDOL server.

9. Set Retries to the number of times that IDOL server attempts to connect to its Agent index before
it times out, and use TimeoutMS to specify how long each of these attempts can take.

10. Use From, FromHost and FromName to set the details that are displayed for the sender of email
that the mailing operation sends. Specify the DefaultSubject that is displayed as the mail's
subject line.

11. Use XSLTemplate to specify which template you want to use for the email. The
DefaultEmailFormat and DefaultEmailResultsType settings allow you to specify the email
format and whether results are sent individually or in sets.

12. Use DefaultAddSetToReadDocuments and DefaultExcludeReadDocuments to determine if a
list of the results that a user has already viewed should be created, and if results that are
contained in this list should be excluded from mail that the IDOL server sends to users (so each
result can only be sent to them once). Set DreTemplateReferenceStart and
DreTemplateReferenceEnd to ensure that IDOL server can extract the reference of documents
and determine if they have been viewed.
13. If you want to include channel results in the email that the mailing operation sends, you need to
configure the following settings:
ClassificationServerXSLTemplate
The template that you want to use to display channel results.
ClassificationServerNumResults
The maximum number of channel results to include in the email.
ClassificationServerThreshold
The quality of channel results to include in the email.
ClassificationServerParams
Parameters that should be included in the channels query that the mailing operation sends to
IDOL server's Category index.
ClassificationServerValues
The values of the specified ClassificationServerParams parameters.
ClassificationServerRetries
The number of times that the mailing operation attempts to connect to IDOL server's
Category index.
ClassificationServerTimeout
Specifies how long each of the ClassificationServerRetries can take, before the mailing
operation times out.
Note:

users will only receive channel results for categories that you have subscribed them to. You
can subscribe a user to one or more categories by sending a UserEdit action to IDOL server.
Use the CategorySubscribe action parameter to specify the categories whose results you
want to be mailed to the user. (You can unsubscribe a user by issuing a UserEdit action with
the CategoryUnsubscribe action parameter set to the categories whose results should no
longer be mailed to the user).

if you want to include channel results from another IDOL server installation in the emails that
the mailing operation sends, you need to use ClassificationServerHost and
ClassificationServerPort to specify the location of that IDOL server.

14. If you need to minimize the impact that the mailing operation has on your system resources, you
can set SleepBetweenRequests and MaxEmailsPerUser to values that are appropriate for your
environment.

15. Save IDOL server's configuration file and restart IDOL server. The mailing operation will start
immediately because you have set StartTime to now, so mail should be sent to the TestUser
address you have specified. Check that the mail process is working smoothly.
16. Make any adjustments to your settings that you need, then save the configuration file again and
restart IDOL server. Note that you can enable VerboseLogging if you experience problems with
the mailing operation.
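Before you restart IDOL server, it can help to check that the mailing section is listed and that its basic settings are present. This is a minimal sketch of such a check, not an IDOL server tool: the function name is hypothetical, the choice of which settings to treat as required is an assumption based on the steps above, and the path, addresses and port in the sample are placeholder values.

```python
import configparser

# Sample configuration: all values are hypothetical placeholders.
SAMPLE = """
[UserCustom]
0=Email

[Email]
Library=/opt/autonomy/IDOL/modules/user_email
RunMailer=true
DefaultSendEmail=true
TestUser=admin@example.com
SMTPHost=mail.example.com
SMTPPort=25
StartTime=now
"""

# Settings treated as required here; this selection is an assumption.
REQUIRED = ("Library", "RunMailer", "DefaultSendEmail", "SMTPHost", "SMTPPort")

def check_mailer_config(text, section="Email"):
    """Return a list of problems found in the mailing configuration."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    problems = []
    listed = []
    if parser.has_section("UserCustom"):
        listed = [value for _, value in parser.items("UserCustom")]
    if section not in listed:
        problems.append(f"[UserCustom] does not list {section}")
    if not parser.has_section(section):
        problems.append(f"missing [{section}] section")
        return problems
    for key in REQUIRED:
        if not parser.has_option(section, key):
            problems.append(f"[{section}] is missing {key}")
    return problems

problems = check_mailer_config(SAMPLE)
print(problems or "no problems found")
```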

Once you are satisfied with the mailing operation:


1. Open IDOL server's configuration file in a text editor, and find the [UserCustom] section.

2. Delete the email address you have specified for TestUser and set StartTime to the time when you
want the mailing operation to start.

3. Save IDOL server's configuration file and restart IDOL server.


Sending custom emails


You can configure IDOL server to send an email containing details of a specific single document to a
user:
1. Open IDOL server's configuration file in a text editor, and find the [UserCustom] section.
If you have already added a custom section in order to automatically email results to users (see
Automatically emailing agent and channel results on page 176), the same settings enable the
sending of custom emails. If you are using this existing section, ensure that you specify the
template to use for custom emails with EmailActionXSLTemplate. Continue with step 7.
If you want IDOL server to send custom emails without enabling automatic agent and channel
results emailing, specify a new custom section. Continue with step 2.
Note: for details of configuration parameters, please refer to IDOL server's online help (see
Displaying help on configuration settings on page 389).

2. Add a section to the configuration file with the name that you specified in the [UserCustom]
section.

3. In your new section, set Library to <IDOL server installation location>/IDOL/modules/user_email
in order to enable the mailing operation.

4. If you are using a proxy server, specify the ProxyHost, ProxyPort, your ProxyUsername and
your ProxyPassword.

5. Use SMTPHost and SMTPPort to specify the details of your mail server.

6. Specify the template to use to create custom emails with EmailActionXSLTemplate.

7. Save IDOL server's configuration file and start IDOL server.

8. Send a Custom action to IDOL server, with Function set to email and Library set to the name
of the custom section in the IDOL server configuration file that sets up the mailing operation.
Please refer to the online help for details on the Custom action (see Displaying online help on
page 61).

9. The mailing operation uses the template you specified with EmailActionXSLTemplate to create
the email that it sends to the specified user.


Mailer templates
The IDOL server installation comprises the following XSL templates for the mailing operation:

For automatically emailing agent and channel results:

email.xss

Main template that the user_email library uses for results emails.
email.xss specifies the overall structure of emails and includes
specific instructions for displaying individual agent results.

channels.xss

Template that the user_email library uses for formatting the channel
results the email.xss template includes.

For sending custom emails:

ondemand.xss

Template that specifies how to display the emails that IDOL server
sends in response to a Custom action command.

You can modify these templates in order to customize email layout.


Editing templates
The XSL templates use XPath and XSLT to identify fields to sort and display from the XML returned in
response to action commands sent to IDOL server.
The XML fields that the template uses to create emails are identified by the select attribute in the
template's XSL tags. To identify the XML fields that a template can use, use a web browser to send the
HTTP action command for which IDOL server uses the template to display results. You can then
determine available field names from the autn tags in the XML that is returned. The action command
to send depends on the template you are editing:

template        action command
email.xss       AgentGetResults
channels.xss    CategoryQuery
ondemand.xss    Custom (see Sending custom emails on page 179)

For details of how to send these action commands, please refer to IDOL server's online help (see
Displaying online help on page 61).

For example, if you send an AgentGetResults action command to IDOL server, the following XML
could be returned:
<?xml version='1.0' encoding='UTF-8' ?>
<autnresponse xmlns:autn='http://schemas.autonomy.com/aci/'>
<action>AGENTGETRESULTS</action>
<response>SUCCESS</response>
<responsedata>
<autn:agent>
<autn:aid>2-A2</autn:aid>
<autn:training />
<autn:parent>2</autn:parent>
<autn:agentname>agent21</autn:agentname>
<autn:fields>
<retrained>true</retrained>
<private>false</private>
<fromdocument>true</fromdocument>
</autn:fields>
<autn:results>
<autn:numhits>1</autn:numhits>
<autn:hit>
<autn:reference>http://193.115.251.40/ArchiveData/encarta\38000\msdata39439.htm</autn:reference>
<autn:id>1254</autn:id>
<autn:section>0</autn:section>
<autn:weight>70.77</autn:weight>
<autn:links>TAPESTRI,REVIV,WEAV,REACH,EUROPEAN,OCCUR,PRACTIC,TRA
DIT,REMAIN,EUROP,EXAMPL,WESTERN,ALTHOUGH,EAR</autn:links>
<autn:database>News</autn:database>
<autn:title>Tapestry Tapestry weaving may have been practiced in Europe as ...</
autn:title>
<autn:summary>Tapestry Tapestry weaving may have been practiced in Europe as
... . Tapestry Tapestry weaving may have been practiced in Europe as early as the
8th century, although no examples remain. Western European tapestry reached its
greatest development between the 14th and 18th centuries. During the 19th and
20th centuries, however, revivals of the tapestry tradition occurred. . </autn:summary>
<autn:content>
<DOCUMENT>
<DREREFERENCE>http://193.115.251.40/ArchiveData/
encarta\38000\msdata39439.htm</DREREFERENCE>
<DRETITLE>Tapestry Tapestry weaving may have been practiced in Europe
as ... </DRETITLE>
<BLANK />
<IMAGE>archiv</IMAGE>
<PAPER />
<SUMMARY>Tapestry Tapestry weaving may have been practiced in
Europe as early as the 8th century, although no examples remain. Western
European tapestry reached its greatest development between the 14th and
18th centuries</SUMMARY>
<DOCTYPE>ARCHIVE</DOCTYPE>
<DREDATE>907347778</DREDATE>
<DREDBNAME>ARCHIVE</DREDBNAME>
<DRECONTENT>Tapestry Tapestry weaving may have been practiced in
Europe as early as the 8th century, although no examples remain. Western
European tapestry reached its greatest development between the 14th and
18th centuries. During the 19th and 20th centuries, however, revivals of the
tapestry tradition occurred. </DRECONTENT>
</DOCUMENT>
</autn:content>
</autn:hit>
</autn:results>
</autn:agent>
</responsedata>
</autnresponse>

In this example, you can see from the XML that IDOL server returns that the following fields are
available as values for the select attribute:
agent        private        section
aid          fromdocument   weight
training     results        links
parent       numhits        database
agentname    hit            title
fields       reference      summary
retrained    id             content

You can include these fields as values in the XSL tags. For example, to display the value of the
<autn:title> tag for each result document, include the following lines in your template:
<xsl:for-each select="responsedata/hit">
<xsl:value-of select="title"/>
</xsl:for-each>

Note that you should remove the autn: part of the tag from the XSL tag that you specify. For example,
if the XML that IDOL server returns contains a tag called autn:title you should specify the tag as title
(as in select="title", in the example here).
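
If you prefer to list the candidate field names programmatically rather than reading the XML by eye, the following Python sketch shows one possible approach. It is an illustration only: the function name available_field_names and the shortened SAMPLE response are assumptions made for this example and are not part of IDOL server. The sketch collects every element name below <responsedata> and drops the autn: prefix, as described above.

```python
import xml.etree.ElementTree as ET


def available_field_names(xml_text):
    """Return the distinct element names below <responsedata>, in document
    order, with any namespace prefix (such as autn:) removed."""
    root = ET.fromstring(xml_text)
    responsedata = root.find("responsedata")
    names = []
    if responsedata is None:
        return names
    for element in responsedata.iter():
        if element is responsedata:
            continue
        # ElementTree reports namespaced tags as "{uri}name"; keep "name".
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name not in names:
            names.append(local_name)
    return names


# Shortened, hypothetical response in the same shape as the example above.
SAMPLE = """<autnresponse xmlns:autn='http://schemas.autonomy.com/aci/'>
<action>AGENTGETRESULTS</action>
<response>SUCCESS</response>
<responsedata>
<autn:agent>
<autn:aid>2-A2</autn:aid>
<autn:agentname>agent21</autn:agentname>
<autn:results>
<autn:numhits>1</autn:numhits>
<autn:hit>
<autn:id>1254</autn:id>
<autn:title>Tapestry</autn:title>
</autn:hit>
</autn:results>
</autn:agent>
</responsedata>
</autnresponse>"""

field_names = available_field_names(SAMPLE)
```

Note that this sketch also lists any document fields nested inside <autn:content>, so you may want to filter its output before using it.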


22. Profiling
IDOL server automatically creates profiles for users, in real time. You can configure IDOL server to
create up to four different profile types. By default it creates an interest and an expertise profile for
each user.
An interest profile is created by tracking the content that a user views and extracting a conceptual
understanding of it. IDOL server then uses this understanding to keep the user's interest profile up to
date. Interest profiles can be used to target information at users, recommend content to users, alert
users to the existence of content and put users in touch with other users who have similar interests.
An expertise profile is created by tracking the content that a user creates and extracting a conceptual
understanding of it. IDOL server then uses this understanding to keep the user's expertise profile up to
date. Expertise profiles can be used to trace users who are experts in particular subject areas.

Profiling a user
You can profile a user using the ProfileUser action command. For details on this action, please refer
to the IDOL server online help (see Displaying online help on page 61).

To create an interest profile for a user:


Execute the ProfileUser action when a user views a document. IDOL server analyzes the document
the user is viewing and determines if it is similar to any of the concepts in the user's interest profile
(using MatchThreshold).
If the content of the viewed document is similar to an existing interest profile concept, IDOL server
updates the existing concept with the new document (if several concepts are similar, only the most
similar one is updated). If the viewed document's content is not similar to an existing interest profile
concept, IDOL server creates a new concept in the interest profile.
Note: IDOL server only uses the five strongest concepts in a user's interest profile for
recommendations, alerting and similar user matching.
For example:
http://12.3.4.56:4000/action=ProfileUser&UserName=Administrator&Document=3422+5776&
MatchThreshold=60&NamedArea=Interest
This command instructs IDOL server to analyze the content of documents 3422 and 5776. If the content has
a conceptual relevance of at least 60% to a concept in the Administrator user's interest profile, IDOL
server uses it to update the matching interest profile concept (if several concepts are similar, only the
most similar one is updated). If the documents' content does not have a conceptual relevance of at
least 60% to an existing interest profile concept, IDOL server creates a new interest profile concept
from it.
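
The update-or-create decision described above can be pictured with the following Python sketch. It is a simplified model and not IDOL server's implementation: the function choose_profile_update and the relevance scores passed to it are hypothetical, because this guide does not describe how IDOL server calculates conceptual relevance.

```python
def choose_profile_update(concepts, relevance, match_threshold):
    """Decide what happens to a profile when new content is profiled.

    concepts        -- names of the concepts already in the profile
    relevance       -- hypothetical mapping of concept name to the relevance
                       (0-100) of the new content to that concept
    match_threshold -- the MatchThreshold value sent with ProfileUser

    Returns ("update", concept_name) or ("create", None).
    """
    # Keep only the concepts whose relevance reaches the threshold.
    similar = [(relevance.get(name, 0), name) for name in concepts]
    similar = [pair for pair in similar if pair[0] >= match_threshold]
    if not similar:
        # Nothing is similar enough, so a new concept is created.
        return ("create", None)
    # Several concepts may be similar; only the most similar is updated.
    best_score, best_name = max(similar)
    return ("update", best_name)


decision = choose_profile_update(
    ["football", "genetics"], {"football": 72, "genetics": 65}, 60
)
```

With the values shown, both concepts reach the threshold of 60, and only the more similar one (football) is selected for updating.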

To create an expertise profile for a user:
Execute the ProfileUser action when a user creates text (for example, a document in IDOL server that
was authored by a user or text that a user enters in a helpdesk environment). IDOL server analyzes
the text the user has created and determines if it is similar to any concepts in the user's existing
expertise profile (using MatchThreshold).
If the content of the text is similar to an existing expertise profile concept, IDOL server updates
the existing concept with the text (if several concepts are similar, only the most similar one is updated).
If the text is not similar to an existing expertise profile concept, IDOL server creates a new concept in
the expertise profile.
Note: IDOL server only uses the five strongest concepts in a user's expertise profile for expertise
matching.
For example:
http://12.3.4.56:4000/action=ProfileUser&UserName=Administrator&Document=The
chemical structure of everyone's DNA is the same. The only difference between people (or
any animal) is the order of the base pairs&MatchThreshold=60&NamedArea=Expertise
This command instructs IDOL server to analyze the specified text. If it has a conceptual relevance of at
least 60% to any concept in the Administrator user's expertise profile, IDOL server uses it to update
the matching expertise profile concept (if several concepts are similar, only the most similar one is
updated). If the text does not have a conceptual relevance of at least 60% to an existing expertise
profile concept, IDOL server creates a new expertise profile concept from it.
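
When you send free text such as the Document value above from a script rather than a browser, the spaces, apostrophes and ampersands in the text need to be escaped. The following Python sketch shows one way to assemble such a URL. The helper build_action_url is hypothetical, and the sketch assumes that IDOL server decodes standard percent-encoding in parameter values; check the online help for the escaping rules that apply to your version.

```python
from urllib.parse import quote


def build_action_url(host, port, action, params):
    """Build a URL in the http://host:port/action=Name&Key=Value form used
    in this guide, percent-encoding every parameter name and value."""
    parts = ["action=" + quote(action, safe="")]
    for key, value in params.items():
        parts.append(quote(key, safe="") + "=" + quote(str(value), safe=""))
    return "http://{}:{}/{}".format(host, port, "&".join(parts))


url = build_action_url(
    "12.3.4.56",
    4000,
    "ProfileUser",
    {
        "UserName": "Administrator",
        "Document": "DNA base pairs & genes",
        "MatchThreshold": 60,
        "NamedArea": "Expertise",
    },
)
```

Encoding each value separately keeps an ampersand inside the text from being read as a parameter separator.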


Editing a profile
IDOL server stores interest and expertise profiles in the form of terms and weights. You can edit a
profile's terms and weights using the ProfileEdit action command. For details on this action, please
refer to the IDOL server online help (see Displaying online help on page 61).
For example:
http://12.3.4.56:4000/action=ProfileEdit&PID=1-P2.3&TermCOLOR=2322
This command changes the weight of the COLOR term in the 1-P2.3 profile to 2322.

Querying with a profile


You can query with a profile using the ProfileGetResults action command. For details on this action,
please refer to the IDOL server online help (see Displaying online help on page 61). When you query
with a profile, the profile is by default matched against IDOL server's Data index.

Viewing a profile's details
You can view a profile's details using the ProfileRead action command. For details on this action,
please refer to the IDOL server online help (see Displaying online help on page 61).
For example:
http://12.3.4.56:4000/action=ProfileRead&UserName=Administrator&PID=3422
This command requests the details of the Administrator user's 3422 profile from IDOL server.

Deleting a profile
You can delete a profile from IDOL server's Profile index using the ProfileClear action command. For
details on this action, please refer to the IDOL server online help (see Displaying online help on
page 61).
For example:
http://12.3.4.56:4000/action=ProfileClear&UserName=Administrator&PID=450-P0.1
This command deletes the Administrator user's 450-P0.1 profile from IDOL server.


23. Retrieval
You can query IDOL server with action commands using a web browser, an Autonomy interface
application (for example, Retina) or a third party portal that uses Autonomy portlets.

Action commands
IDOL server is queried via action commands. The following action commands are available to all clients that have permission to query IDOL server (set by QueryClients in the IDOL server configuration
file's [Server] section):
GetContent          Allows you to display the content of one or more specified documents.

GetQueryTagValues   Allows you to return the values of parametric fields in query results.

GetTagNames         Allows you to return all fields of a specified type.

GetTagValues        Allows you to execute a parametric search.

Highlight           Allows you to highlight link terms in text.

Query               Allows you to submit different query types to IDOL server.

Suggest             Allows you to retrieve documents that are conceptually similar to
                    one or more specified documents.

SuggestOnText       Allows you to retrieve documents that are conceptually similar to
                    the terms with the highest weighting in specified text.

Summarize           Allows you to generate a summary for documents or text.

TermGetBest         Allows you to list the conceptually most important terms in one or
                    more specified documents.

TermGetInfo         Allows you to return the weight and other available information for
                    specified terms.

In addition, the following actions are available to administrative clients of IDOL server (set by
AdminClients in the IDOL server configuration file's [Server] section):
DetectLanguage      Allows you to determine the language of a piece of text.

GetStatus           Allows you to display configured details about IDOL server's setup.

IndexerGetStatus    Allows you to display the status of index commands in IDOL
                    server's index queue.

List                Allows you to list all documents that are stored in IDOL server or
                    any of its databases.

TermGetAll          Allows you to list all terms that are stored in IDOL server.

Note:

For further details on action commands, please refer to the online help (see Displaying online
help on page 61).

For details on action command syntax, please see Action command syntax on page 62.


Conceptual matching
You can use Query, Suggest and SuggestOnText action commands to perform conceptual matching.
IDOL server uses advanced pattern-matching technology to conceptually match the data with which it
is queried (via action commands) against the content it holds.

Content matching
You can submit natural language text or a piece of content to IDOL server, for which it returns
references to conceptually related documents ranked by relevance, or contextual distance.
Natural language queries make it possible for users to find the results they are looking for without
having to be familiar with search algorithms or syntax. Online shoppers, for example, can find specific
items without knowing the exact product or brand name.

Active matching
If you are using the Autonomy Desktop Suite (or one of the Active products that the Desktop Suite
comprises), IDOL server conceptually matches natural language text content in whichever application
a user is currently using, and returns a list of documents ordered by contextual relevance to the active
text.

Community matching
You can create agents from natural language and then match them conceptually. Profiles or natural
language text can also be submitted to IDOL server, for which it returns agents ranked by conceptual
similarity. This determines which users have similar interests (thus promoting collaboration) and
identifies experts in a field.

Category matching
You can submit a piece of content to IDOL server, for which it returns categories ranked by conceptual
similarity. This determines the categories for which the piece of content is most appropriate, so that the
piece of content can subsequently be tagged, routed or filed accordingly.

Clustering
You can use IDOL server to organize large volumes of content or large numbers of profiles into
self-consistent clusters. Clustering is an automatic agglomerative technique which allows IDOL server to
partition a corpus by grouping together information that contains similar concepts.


Example queries
Agent or category query:
http://localhost:5552/action=Query&Text=MAMMALIAN~[254] MICROBIOLOGI~[112]
GENOM~[103] GENET~[100] MOLECULAR~[75] BIOTECHNOLOGI~[71] BIOLOGI~[69]
GENE~[59] BIOLOG~[43] CELL~[37]
In this example, an agent (or category) query is sent to IDOL server. The query contains the terms that
the agent's training comprises and the weight of each of the terms. IDOL server can return agents,
profiles, categories or documents that conceptually match the terms of the query.
Profile query:
http://localhost:5552/action=Query&Text=CHAMPIONLEAGU~[551] EVERTON~[407]
BAYERN~[402] UEFA~[391] PREMIERSHIP~[383] FIFA~[257] STRIKER~[226]
WORLDCUP~[215] EURO~[124] SOCCER~[114] CUP~[66]
In this example, a profile query is sent to IDOL server. The query contains the terms that the profile's
training comprises and the weight of each of the terms. IDOL server can return agents, profiles,
categories or documents that conceptually match the terms of the query.
Text query:
http://localhost:5552/action=Query&Text=Gene analysis discovered methods to determine
the exact sequence of nucleotides that compose a specific gene
In this example, a text query is sent to IDOL server. IDOL server can return agents, profiles, categories
or documents that conceptually match the query text.
Suggest query:
http://localhost:5552/action=Suggest&ID=10
In this example, a Suggest query is sent to IDOL server. IDOL server can return agents, profiles,
categories or documents that conceptually match the specified document (that is the document with
the ID 10).

SuggestOnText query:
http://localhost:5552/action=SuggestOnText&Text=Gene analysis discovered methods to
determine the exact sequence of nucleotides that compose a specific gene
In this example, a SuggestOnText query is sent to IDOL server. IDOL server can return agents,
profiles, categories or documents that conceptually match the terms with the highest weighting in the
query text.
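
The agent, category and profile queries above all pass their terms and weights in the form TERM~[weight], separated by spaces. As an illustration, the following Python sketch assembles such a Text value from a list of term and weight pairs. The function name weighted_query_text is hypothetical, and the sketch assumes that the terms are already stemmed, as they are in the examples above.

```python
def weighted_query_text(term_weights):
    """Format (term, weight) pairs as TERM~[weight] entries separated by
    spaces, in the form used by the agent and profile query examples."""
    return " ".join(
        "{}~[{}]".format(term, weight) for term, weight in term_weights
    )


# A few of the stemmed terms and weights from the agent query example.
text = weighted_query_text([("GENOM", 103), ("GENET", 100), ("CELL", 37)])
```

The resulting string can then be supplied as the Text parameter of a Query action command.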


Advanced keyword searches


Advanced keyword search enables IDOL server to match any term or phrase that appears in quotation
marks in its exact pre-stemmed form.
For example:
http://<host>:<port>/action=Query&Text="Tony Browne"
This query only returns documents that contain "Tony Browne", "tony browne", "ToNy BrOwNe",
"Tony browne" or "tony Browne" (irrespective of whether stemming is enabled or disabled). It does
not return documents that contain "Tony Brown", "Tony Browning", "Toni Brown", "Tonis Browns"
and so on (IDOL server returns all of these if AdvancedSearch is disabled).

Note that if you enable advanced keyword searches, you can still execute a conceptual phrase search
that uses stemming by using DNEAR1.
For example:
http://<host>:<port>/action=Query&Text=Tony DNEAR1 Browne
This query returns documents that contain "Tony Brown", "Toni Browning" and so on.

To enable advanced keyword searches:

Note: you must enable advanced keyword searches before you index the data that you want to
search.

1. Open IDOL server's configuration file in a text editor.

2. In the [Server] section, set the AdvancedSearch parameter to true. (If the [Server] section
doesn't contain the AdvancedSearch parameter, you should add it). Note that if you are enabling
AdvancedSearch, it is recommended that you set ProperNames to 0 or 7 in the appropriate
language type sections of the configuration file.

3. Save the configuration file and start IDOL server.

4. Index documents into IDOL server. Once you have finished indexing, you can execute advanced
keyword searches using the Query action command.


Boolean and bracketed Boolean searches


You can use the Query action command to submit standard Boolean searches to IDOL server.
The following operators allow you to manipulate a query by applying them to words, exact phrases or
other Boolean expressions. Note that APCM (Adaptive Probabilistic Concept Modeling) is used to rank
the results that match the Boolean query.
AND

Binary operator. Ensures that both terms are matched in every document that is
returned.
For example:
action=Query&Text=cat+AND+dog
This query only returns documents that contain both "cat" and "dog".

NOT

Unary operator. Ensures that the term following NOT is excluded from any of the
returned documents.
For example:
action=Query&Text=cat+NOT+dog
This query only returns documents that contain "cat" but not "dog".
Note: if you want to use NOT to exclude multiple terms, you need to use brackets,
otherwise NOT only applies to the term that immediately follows it. If you want to use
NOT to exclude a phrase, you need to put the phrase in quotation marks and in
brackets.
For example:
Doc 1: I went to the city for the New Year
Doc 2: I went to New York City for the New Year
This query would match neither of the above documents:
action=Query&Text=city NOT (New York)
This query matches the first document but not the second:
action=Query&Text=city NOT ("New York")

OR

Binary operator. One or both terms must appear for the document to be returned. This
is the default behavior if no explicit operator is given between two terms.
For example:
action=Query&Text=cat+OR+dog
This query returns documents that contain "cat", "dog" or both terms.


EOR or XOR

Binary operator. Logical exclusive OR. Only one of the terms is permitted to appear for
the document to be returned. This is a rarely used operator.
For example:
action=Query&Text=cat+XOR+dog
This query only returns documents that contain either the term "cat" or the term "dog".
Documents that contain both "cat" and "dog" are not returned.

()

Bracketed expressions. These are evaluated left to right and can be nested. They
dictate the precedence and behavior of combined operator statements.
For example:
action=Query&Text=(fish EOR pie) AND (chips EOR mash)
This query only returns documents that contain one of the following:
fish and chips
fish and mash
pie and chips
pie and mash

Note: Boolean operators must be specified using capital letters.
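
To check your understanding of how the operators combine, you can model them in a few lines of Python. The sketch below is a toy model only: each document is treated as a set of lowercase words, and the helper functions (term, AND, OR, XOR and NOT) are hypothetical names for this example. It does not reproduce IDOL server's stemming, stopword handling or result ranking.

```python
def term(word):
    """Match documents (sets of words) that contain the given word."""
    return lambda doc: word in doc


def AND(left, right):
    return lambda doc: left(doc) and right(doc)


def OR(left, right):
    return lambda doc: left(doc) or right(doc)


def XOR(left, right):
    # EOR and XOR are the same operator: exactly one side may match.
    return lambda doc: left(doc) != right(doc)


def NOT(operand):
    return lambda doc: not operand(doc)


# (fish EOR pie) AND (chips EOR mash)
query = AND(XOR(term("fish"), term("pie")), XOR(term("chips"), term("mash")))

documents = {
    "doc1": {"fish", "chips"},
    "doc2": {"fish", "pie", "chips"},  # both fish and pie: excluded by EOR
    "doc3": {"pie", "mash"},
    "doc4": {"fish"},                  # neither chips nor mash
}
matching = sorted(name for name, words in documents.items() if query(words))
```

In this model, a query such as cat NOT dog corresponds to AND(term("cat"), NOT(term("dog"))).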

Precedence of Boolean and Proximity operators


Boolean and Proximity operators have the following precedence:

Highest precedence:   NOT
                      NEAR; DNEAR
                      AND; BEFORE; AFTER
Lowest precedence:    OR; XOR; EOR; WNEAR

Operators that have the same level of precedence have neither left nor right associativity. You can use
brackets to bind terms together as appropriate (note that Proximity operators must have terms on
either side and cannot be adjacent to brackets).


Exact Phrase searches


You can search for an exact phrase using one of the following:

quotation marks
Phrases are stemmed and then matched by IDOL server. As with all query text, any
stopwords that the phrases contain are removed before matching and any punctuation that a
phrase contains is ignored.

a TERMEXACTPHRASE{} field text query


Phrases are not stemmed before they are matched against the specified field by IDOL server.
Stopwords are not removed before matching. IDOL server ignores any punctuation that a
phrase contains.

a TERMPHRASE{} field text query


Phrases are stemmed and then matched against the specified field by IDOL server. Any
stopwords that the phrases contain are not removed before matching. IDOL server ignores
any punctuation that a phrase contains.

Quotation marks
You can use the Query action command's Text parameter to match a phrase by putting quotation
marks around it. Note that the phrase is stemmed and that any stopwords that the phrase contains are
removed before it is matched. IDOL server ignores any punctuation that the phrase contains.
If you specify multiple phrases, IDOL server matches any one of them. You must separate the
individual phrases with plus symbols or spaces and put quotation marks around each phrase.

Examples:
http://<host>:<port>/action=Query&Text="world market"
This query returns documents that contain, for example, "world market", "all over the world. This
market", "in world markets" and so on.

http://<host>:<port>/action=Query&Text="Bank of England"
This query returns documents that contain, for example, "Bank of England", "banking in
England", "the river bank. On England's shores" and so on.

http://<host>:<port>/action=Query&Text="bird watching" "birds of prey"
This query returns documents that contain, for example, "bird watching", "the bird was watching
the worm", "birds of prey", "bird of prey", "the bird is the prey" and so on.

http://<host>:<port>/action=Query&Text="Batman! and Robins"
This query returns documents that contain, for example, "Showing now, Batman and Robin's
film", "Showing now, 'Batman and Robin' the movie", "Batman, Robin and Penguin" and so on.

TERMEXACTPHRASE{ } field specifier


You can use the Query, Suggest and SuggestOnText action commands' FieldText parameter to
match a phrase against a field by using the TERMEXACTPHRASE field specifier. Note that the phrase
is not stemmed and that stopwords that the phrase contains are not removed before it is matched.
IDOL server ignores any punctuation that a phrase contains.
Example:
http://<host>:<port>/action=Query&FieldText=TERMEXACTPHRASE{Batman! and
Robins}:/DRETITLE
This query returns documents whose DRETITLE field contains, for example, "Showing
now, Batman and Robin's film", while a document whose DRETITLE field contains "Showing
now, 'Batman and Robin' the movie" or "Batman, Robin and Penguin" would not be returned.

TERMPHRASE{ } field specifier


You can use the Query, Suggest and SuggestOnText action commands' FieldText parameter to
match a phrase against a field by using the TERMPHRASE field specifier. Note that the phrase
is stemmed and that stopwords that the phrase contains are not removed before it is matched. IDOL
server ignores any punctuation that a phrase contains.
Example:
http://<host>:<port>/action=Query&FieldText=TERMPHRASE{Batman! and
Robins}:/DRETITLE
This query returns documents whose DRETITLE field contains, for example, "Showing now,
Batman and Robin's film" or "Showing now: 'Batman and Robin'", while a document whose
DRETITLE field contains "Batman, Robin and Penguin" would not be returned.


Field restrictions
You can use simple field restrictions within a Query action's Text parameter in order to return results
that contain specific values in specific fields or, if you combine query text with a field restriction,
increase the relevance of results that contain specific values in specific fields. Note that these fields
must have been stored as Index fields in IDOL server (see Setting up field indexing on page 67).
You can use wildcards, but you cannot match more than one value or a value that contains spaces or
punctuation. You cannot use field restrictions on terms in brackets.

Example queries
http://<host>:<port>/action=Query&Text=cat:DRETITLE
This query only returns documents that contain the value cat in their DRETITLE field.

http://<host>:<port>/action=Query&Text=cat dog:DRETITLE
This query returns documents that contain the term cat in any field and the term dog in their
DRETITLE field. Documents that contain either cat (in any field) or dog in their DRETITLE field
are also returned, but with a lower relevance.

http://<host>:<port>/action=Query&Text=cat:CREATURE:FAUNA dog:ANIMAL
This query returns documents that contain the value cat in their CREATURE or FAUNA
field and the value dog in their ANIMAL field. Documents that contain either cat in their
CREATURE or FAUNA field or dog in their ANIMAL field are also returned, but with a lower
relevance.

http://<host>:<port>/action=Query&Text=engin*:Title
This query only returns documents whose title field contains the specified string (for example,
"engineer", "engineering" and so on). Note that wildcard matching is carried out after stemming
has taken place.


Field text queries


Field text queries are Query, Suggest or SuggestOnText action commands that include the
FieldText parameter. You can combine this parameter with the Text parameter in order to restrict the
results that your query returns to documents that contain a specific value in a specified field. You can
also use the FieldText parameter on its own in order to query for documents that contain specific
values in specific fields (however, this is not recommended as it slows down IDOL server's processing
speed).
To specify how documents must match fields and field values in order to be returned as results, the
FieldText parameter uses field specifiers which identify the pattern of values in fields. These field
specifiers fall into the following groups:

for common restrictions (see page 199)

for advanced restrictions (see page 210)

for biasing result scores (see page 226)

Note:

When identifying fields you should use the format /FieldName to match root-level fields,
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. To identify XML attributes, use the format <tag_name>/
_ATTR_<attribute_name>, for example, FARM/_ATTR_ANIMAL. You can also use Wildcards
when identifying fields, for example, /Fi*d1, /Field* and so on.

All string matching is case insensitive, unless the parameter CaseSensitive=true is used.

Strings can contain punctuation (except curly brackets).


Ampersands (&) in strings must be URI escaped to %26.
If you want to match a string that contains a comma, you need to escape the comma with a
backslash, otherwise IDOL server reads it as a separator.
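
The escaping rules in this note can be applied mechanically when you build FieldText values in a script. The following Python sketch is one possible helper, shown for illustration only: the function name field_text is hypothetical, and it covers just the three rules stated above (no curly brackets in strings, backslash before commas, and %26 for ampersands).

```python
def field_text(specifier, values, fields):
    """Build a FieldText expression such as MATCH{a,b}:FIELD1:FIELD2.

    Commas inside a value are escaped with a backslash so that they are not
    read as separators, and ampersands are URI escaped to %26.
    """
    escaped = []
    for value in values:
        value = str(value)
        if "{" in value or "}" in value:
            raise ValueError("strings cannot contain curly brackets")
        escaped.append(value.replace(",", "\\,"))
    expression = "{}{{{}}}:{}".format(
        specifier, ",".join(escaped), ":".join(fields)
    )
    return expression.replace("&", "%26")


expression = field_text("MATCH", ["Smith, John", "R&D"], ["AUTHOR", "DEPT"])
```

Here the comma in the first value is escaped, while the comma that separates the two values is left as it is.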

Field specifiers for common restrictions


To find documents in which a specified field contains:          use the field specifier:

a value that exactly matches one or more specified strings      MATCH
a number                                                        EQUAL, GREATER, LESS or NRANGE
a date                                                          GTNOW, LTNOW or RANGE
a value that matches a specified Wildcard string                WILD


Finding documents with fields whose value exactly matches one or more strings
MATCH
The MATCH field specifier (case sensitive) allows you to find documents in which a specified field
contains a value that exactly matches a specified string.
Format:

FieldText=MATCH{<your strings>}:<your fields>


<your strings>
Enter one or more strings. A document is only returned if one of these strings is matched by
one of <your fields> exactly.
If you want to specify multiple strings, you must separate them with commas (there must be
no space before or after a comma). You can match strings that contain punctuation or
comprise several words. The matching is case independent.
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
the value in this field exactly matches one of <your strings>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

Examples:
FieldText=MATCH{Archive,Web,docs}:DB:DATABASE
A document's DB or DATABASE field must have the value Archive, Web or docs for this
document to be returned as a result.
FieldText=MATCH{Premier league}:DB
A document's DB field must have the value Premier League for this document to be
returned as a result.
FieldText=MATCH{0-226-10389-7}:ISBN
A document's ISBN field must have the value 0-226-10389-7 for this document to be
returned as a result.
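
The matching rule that MATCH applies (any listed string, in any listed field, compared as a whole value and without regard to case) can be modeled as follows. This Python sketch is an illustration rather than IDOL server's implementation: the function name match and the representation of a document as a dictionary with one value per field are assumptions made for the example.

```python
def match(document, strings, fields):
    """Return True if any of the listed fields holds a value that equals
    one of the strings exactly, ignoring case."""
    wanted = {s.lower() for s in strings}
    return any(
        str(document[field]).lower() in wanted
        for field in fields
        if field in document
    )


# Model of FieldText=MATCH{Premier league}:DB
hit = match({"DB": "Premier League"}, ["Premier league"], ["DB"])
miss = match({"DB": "Premier League table"}, ["Premier league"], ["DB"])
```

The second document is not matched because its field value only contains the string; it does not equal it.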


Finding documents with fields that contain a number


You can use the following field specifiers (case sensitive) to return documents with fields that contain
numbers. Note that to optimize the processing time of queries for fields that contain numbers, you
should store them as numeric fields in IDOL server during the indexing process (see Numerical
fields on page 289):

EQUAL
The EQUAL field specifier (case sensitive) allows you to find documents in which a specified field
contains a number that matches one of the numbers specified by you.
Format:

FieldText=EQUAL{<your numbers>}:<your fields>


<your numbers>
Enter one or more numbers. A document is only returned if one of <your fields> contains
one of these numbers.
If you want to specify multiple numbers you must separate them with commas (there must
be no space before or after a comma).
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
this field contains one of <your numbers>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

Examples:
FieldText=EQUAL{1234567890123}:ACCOUNT:KONTO
A document's ACCOUNT or KONTO field must contain the number 1234567890123 for this
document to be returned.
FieldText=EQUAL{3.9,4.9,7}:ID
A document's ID field must contain the number 3.9, 3.90, 4.9, 4.90, 7 or 7.0 for this
document to be returned.
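
As the second example shows, EQUAL compares numeric values rather than their written form, so 3.9 and 3.90 are treated as the same number. The following Python sketch models this behavior. It is a simplified illustration: the function name equal and the dictionary representation of a document are assumptions for this example, and field values that are not numbers are simply skipped.

```python
from decimal import Decimal, InvalidOperation


def equal(document, numbers, fields):
    """Return True if any of the listed fields contains one of the numbers,
    compared by numeric value (so "3.90" equals 3.9)."""
    wanted = {Decimal(str(number)) for number in numbers}
    for field in fields:
        if field not in document:
            continue
        try:
            if Decimal(str(document[field])) in wanted:
                return True
        except InvalidOperation:
            # The field does not hold a number, so it cannot match.
            continue
    return False


# Model of FieldText=EQUAL{3.9,4.9,7}:ID
same_value = equal({"ID": "3.90"}, ["3.9", "4.9", "7"], ["ID"])
```

Decimal is used here instead of float so that values such as 3.9 are compared exactly as written.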

GREATER
The GREATER field specifier (case sensitive) allows you to find documents in which a specified field
contains a number that is greater than a number specified by you.
Format:

FieldText=GREATER{<your number>}:<your fields>


<your number>
Enter a number. A document is only returned if one of <your fields> contains a number that
is greater than this number.
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
the number in this field is greater than <your number>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

Examples:
FieldText=GREATER{66}:ID
A document's ID field must contain a number greater than 66 for this document to be
returned.
FieldText=GREATER{5.59}:PRICE:PREIS
A document's PRICE or PREIS field must contain a number greater than 5.59 for this
document to be returned.

LESS
The LESS field specifier (case sensitive) allows you to find documents in which a specified field
contains a number that is smaller than a number specified by you.
Format:

FieldText=LESS{<your number>}:<your fields>


<your number>
Enter a number. A document is only returned if one of <your fields> contains a number that
is smaller than this number.
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
the number in this field is smaller than <your number>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

Examples:
FieldText=LESS{66}:ID
A document's ID field must contain a smaller number than 66 for this document to be
returned.
FieldText=LESS{5.59}:PRICE:PREIS
A document's PRICE or PREIS field must contain a smaller number than 5.59 for this
document to be returned.

NOTEQUAL
The NOTEQUAL field specifier (case sensitive) allows you to find documents in which a specified field
contains a number that does not match a number specified by you.
Format:

FieldText=NOTEQUAL{<your number>}:<your fields>


<your number>
Enter a number. A document is only returned if one of <your fields> does not contain this
number.
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
this field does not contain <your number>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

Examples:
FieldText=NOTEQUAL{1234567890123}:ACCOUNT:KONTO
A document's ACCOUNT or KONTO field must not contain the number 1234567890123 for
this document to be returned.
FieldText=NOTEQUAL{3.9}:ID
A document's ID field must not contain the number 3.9 for this document to be returned.

NRANGE
The NRANGE field specifier (case sensitive) allows you to find documents in which a specified field
contains a number that falls within the inclusive range of two numbers specified by you.
Format:

FieldText=NRANGE{<your numbers>}:<your fields>


<your numbers>
Enter two numbers separated by a comma (there must be no space before or after the
comma). A document is only returned if one of <your fields> contains a number that falls
within the inclusive range of the specified numbers (including decimal numbers).
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
this field contains a number that falls within the inclusive range of <your numbers>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

Examples:
FieldText=NRANGE{1,99}:CODE
A document's CODE field must contain a number between 1 and 99 (inclusive) for this
document to be returned.
FieldText=NRANGE{1234567890123,2345678901234}:ACCOUNT:KONTO
A document's ACCOUNT or KONTO field must contain a number between
1234567890123 and 2345678901234 (inclusive) for this document to be returned.
FieldText=NRANGE{36.5,42.3}:CODE
A document's CODE field must contain a number between 36.5 and 42.3 (inclusive) for this
document to be returned.
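The inclusive-range rule can be illustrated outside IDOL server. The following Python sketch models a document as a plain dictionary (an assumption made for illustration only) and applies the NRANGE condition described above; the function name nrange_matches is hypothetical.

```python
def nrange_matches(doc, low, high, fields):
    """Return True if any listed field holds a number in [low, high]."""
    for name in fields:
        if name not in doc:
            continue
        try:
            value = float(doc[name])
        except (TypeError, ValueError):
            continue  # non-numeric values never match
        if low <= value <= high:  # both ends are inclusive
            return True
    return False

print(nrange_matches({"CODE": "99"}, 1, 99, ["CODE"]))
print(nrange_matches({"CODE": "42.4"}, 36.5, 42.3, ["CODE"]))
```

The first call matches because 99 sits on the upper boundary; the second does not because 42.4 lies just outside it.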

Finding documents with fields that contain a date


You can use the following field specifiers (case sensitive) to return documents with fields that contain
dates. Note that to optimize the processing time of queries for fields that contain dates, you must store
them as numeric date fields in IDOL server during the indexing process (see NumericDateType
fields on page 287).

GTNOW
The GTNOW field specifier (case sensitive) allows you to find documents in which a specified field
contains a date that is greater than the current number of seconds since 1st January 1970 (or since
170 AD, if you have set ExtendedDateRange to true in IDOL server's configuration file).

Format:

FieldText=GTNOW{}:<your fields>
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
this field contains a date that is greater than the current number of seconds since 1st
January 1970 (or since 170 AD, if you have set ExtendedDateRange to true in IDOL
server's configuration file).
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

Examples:
FieldText=GTNOW{}:TIME
A document's TIME field must contain a date that is greater than the current number of
seconds since 1970 (that is all documents that were indexed with dates after the current
time) for this document to be returned.
FieldText=GTNOW{}:TIME:DATE
A document's TIME or DATE field must contain a date that is greater than the current
number of seconds since 1970 (that is all documents that were indexed with dates after the
current time) for this document to be returned.

LTNOW
The LTNOW field specifier (case sensitive) allows you to find documents in which a specified field
contains a date that is smaller than the current number of seconds since 1st January 1970 (or since
170 AD, if you have set ExtendedDateRange to true in IDOL server's configuration file).
Format:

FieldText=LTNOW{}:<your fields>
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
this field contains a date that is smaller than the current number of seconds since 1st
January 1970 (or since 170 AD, if you have set ExtendedDateRange to true in IDOL
server's configuration file).
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

Examples:
FieldText=LTNOW{}:*/TIME
A document's TIME field must contain a date that is smaller than the current number of
seconds since 1970 (that is all documents that were indexed with dates before the current
time) for this document to be returned.

FieldText=LTNOW{}:TIME:DATE
A document's TIME or DATE field must contain a date that is smaller than the current
number of seconds since 1970 (that is all documents that were indexed with dates before
the current time) for this document to be returned.
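As a rough illustration of the GTNOW and LTNOW comparisons, the following Python sketch assumes that each date field is held as epoch seconds (seconds since 1st January 1970) in a plain dictionary. The function names and the optional now argument are hypothetical and exist only to make the comparison easy to follow.

```python
import time

def gtnow(doc, fields, now=None):
    """True if any listed field holds a time later than the current time."""
    now = time.time() if now is None else now
    return any(name in doc and doc[name] > now for name in fields)

def ltnow(doc, fields, now=None):
    """True if any listed field holds a time earlier than the current time."""
    now = time.time() if now is None else now
    return any(name in doc and doc[name] < now for name in fields)

doc = {"TIME": 1012345000}           # 29 January 2002 in epoch seconds
print(ltnow(doc, ["TIME", "DATE"]))  # the stored time lies in the past
print(gtnow(doc, ["TIME", "DATE"]))
```

The sketch does not model the ExtendedDateRange setting, which changes the reference point for stored dates.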

RANGE
The RANGE field specifier (case sensitive) allows you to find documents in which a specified field
contains a date that falls within the inclusive range of two dates specified by you.
Format:

FieldText=RANGE{<your dates>}:<your fields>


<your dates>
Enter two dates separated by a comma (there must be no space before or after the
comma). A document is only returned if one of <your fields> contains a date that falls within
the inclusive time span specified by the specified dates.
You can use the following formats to specify each date (unless you have set
ExtendedDateRange to true in the IDOL server configuration file's [Server] section, in
which case you can only use DD/MM/YY and DD/MM/YYYY):
DD/MM/YY

A date.
For example, 1/3/05, 23/12/99 or 10/07/40.
If the year is a number less than 40, it is read as a year in the
2000s. If the year is a number between 40 and 99, it is read as a
year in the 1900s. For example, 1/02/1 is read as February 1st
2001, while 01/3/40 is read as March 1st 1940.

DD/MM/YYYY

A date.
For example, 1/3/2005, 23/12/1999 or 10/07/1940.

<N>

A positive or negative number of days from the current date.


For example, -1 specifies yesterday's date, 0 specifies today's
date, 1 specifies tomorrow's date, 2 specifies two days from
today (the current date plus two) and so on.

<N>s

A positive or negative number of seconds from now.


For example, -60s specifies 1 minute before now, -900s specifies 15 minutes before now, -3600s specifies 1 hour before now
and so on. 60s specifies 1 minute from now, 900s specifies 15
minutes from now, 3600s specifies 1 hour from now and so on.

<N>e

Epoch seconds (seconds since January 1st 1970).


For example, 1012345000e specifies 22:56:40 on January 29th
2002.

.

No restriction.
If you enter a full stop for the first point in time you are specifying, the beginning of the period is unrestricted (so the period
ranges up to the specified date, including any date before the
specified date).
If you enter a full stop for the second point in time you are
specifying, the end of the period is unrestricted (so the period
ranges from the specified date, including any date after the
specified date).

<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
this field contains a date that falls within the inclusive range of <your dates>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).
Examples:
FieldText=RANGE{01/01/90,1/1/01}:DATE
A document's DATE field must contain a date between 01/01/1990 and 1/1/2001 for this
document to be returned.
FieldText=RANGE{01/01/02,01/01/2003}:DATE:DATUM
A document's DATE or DATUM field must contain a date between 01/01/2002 and
01/01/2003 for this document to be returned.
FieldText=RANGE{-14,-7}:DATE
A document's DATE field must contain a date 14 to 7 days before the current date for this
document to be returned.
FieldText=RANGE{0,1}:DATE
A document's DATE field must contain today's or tomorrow's date (which is possible, for
example, if the document originates from a different time zone or if the field contains an
expiry date) for this document to be returned.
FieldText=RANGE{01/01/99,.}:DATE:FECHA
A document's DATE or FECHA field can contain any date after 01/01/1999 for this
document to be returned.
FieldText=RANGE{.,10/10/04}:DATE
A document's DATE field can contain any date before 10/10/2004 for this document to be
returned.

FieldText=RANGE{-172800s,-1}:DATE
A document's DATE field must contain a time between 48 and 24 hours ago.
FieldText=RANGE{198765e,.}:DATE
A document's DATE field must contain a date that is 198765 seconds after the epoch or
later (the full stop leaves the end of the period unrestricted).
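The date formats accepted by RANGE can be summarized in code. The following Python sketch resolves a single RANGE endpoint to a point in time. It is an illustration under stated assumptions, not IDOL server's parser: the function names are hypothetical, all times are treated as UTC, and a day offset is applied to the current moment rather than to a day boundary.

```python
from datetime import datetime, timedelta, timezone

def expand_year(yy):
    """Two-digit years below 40 are read as 20yy, 40 to 99 as 19yy."""
    return 2000 + yy if yy < 40 else 1900 + yy

def parse_range_point(text, now):
    """Resolve one RANGE endpoint; None means 'no restriction' (a full stop)."""
    if text == ".":
        return None
    if "/" in text:                       # DD/MM/YY or DD/MM/YYYY
        day, month, year = (int(part) for part in text.split("/"))
        if year < 100:
            year = expand_year(year)
        return datetime(year, month, day, tzinfo=timezone.utc)
    if text.endswith("e"):                # epoch seconds
        return datetime.fromtimestamp(int(text[:-1]), tz=timezone.utc)
    if text.endswith("s"):                # seconds relative to now
        return now + timedelta(seconds=int(text[:-1]))
    return now + timedelta(days=int(text))  # days relative to now

now = datetime(2005, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
for point in ["1/02/1", "01/3/40", "-1", "-3600s", "1012345000e", "."]:
    print(point, "->", parse_range_point(point, now))
```

Fixing now in the example keeps the relative formats reproducible; a real client would pass the current time instead.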

Finding documents with fields whose value matches one or more wildcard strings

WILD
The WILD field specifier (case sensitive) allows you to find documents in which a specified field
contains a string that matches a wildcarded string specified by you.
Format:

FieldText=WILD{<your strings>}:<your fields>


<your strings>
Enter one or more wildcarded strings. A document is only returned if one of <your fields>
matches one of these strings.
If you want to specify multiple strings you must separate them with commas (there must be
no space before or after a comma).
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
this field contains one of <your strings>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

Examples:
FieldText=WILD{*.html,*.htm}:URL
A document's URL field value must end with .html or .htm for this document to be returned
as a result.
FieldText=WILD{passi*incarnata}:Climbers:Plants
A document's Climbers or Plants field value must contain a phrase that begins with passi
and ends with incarnata (for example, passionflower incarnata or passiflora incarnata)
for this document to be returned as a result.

FieldText=WILD{*www.autonomy.com*.txt}:PATH
A document's PATH field value must contain a path that contains www.autonomy.com and
ends with .txt (for example, http://www.autonomy.com/files/doc.txt) for this document to
be returned as a result.
FieldText=WILD{wom?n}:Clothes
A document's Clothes field value must contain a word that matches the specified wildcard
string (for example, woman or women) for this document to be returned as a result.
Note:
You can also use the WILD field specifier to find documents in which one of the following meta fields
(see Meta fields on page 301) contains a string that matches a wildcarded string specified by you:
autn_database
autn_langtype
Examples:
FieldText=WILD{Ira*}:autn_database
A document's autn_database field must contain a value that starts with Ira (for example,
Irak or Iran) for this document to be returned as a result.
FieldText=WILD{eng*}:autn_langtype
A document's autn_langtype meta field must contain a value that starts with eng (for
example, englishASCII or English_UTF8) for this document to be returned as a result.
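To get a feel for how * and ? behave, the following Python sketch uses the standard library's fnmatch module as a stand-in for WILD matching. This is an approximation under assumptions: fnmatch matches the whole value, is case sensitive when fnmatchcase is used, and also interprets [seq] character classes, none of which is confirmed for WILD by this guide. The function name wild_matches is hypothetical.

```python
from fnmatch import fnmatchcase

def wild_matches(value, patterns):
    """True if the value matches at least one wildcard pattern."""
    return any(fnmatchcase(value, pattern) for pattern in patterns)

print(wild_matches("index.html", ["*.html", "*.htm"]))
print(wild_matches("woman", ["wom?n"]))
print(wild_matches("http://www.autonomy.com/files/doc.txt",
                   ["*www.autonomy.com*.txt"]))
print(wild_matches("index.php", ["*.html", "*.htm"]))
```

Here * stands for any run of characters (including none) and ? for exactly one character.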


Field specifiers for advanced restrictions


To find documents in which a specified field:                 use the field specifier:

contains a value that falls within a specific                 ARANGE
alphabetical range

contains a value that results in a non-zero value when        BITAND, BITANDHEX or
a bitwise AND operation is carried out against it             BITANDOFFHEX

contains a Boolean agent                                      BOOLEANFIELD

does not exist or does not contain a value                    EMPTY

exists, irrespective of its value                             EXISTS

contains a value that is similar to a specified string        FUZZY

contains a specified string                                   STRING, STRINGALL or
                                                              SUBSTRING

has a value that matches specific terms or phrases            TERM, TERMALL, TERMEXACT,
                                                              TERMEXACTALL,
                                                              TERMEXACTPHRASE or
                                                              TERMPHRASE

Finding documents with fields whose value falls within a specific alphabetical range

ARANGE
The ARANGE field specifier (case sensitive) allows you to find documents in which a specified field
contains a term that falls within the inclusive alphabetical range of two terms specified by you.
Format:

FieldText=ARANGE{<your terms>}:<your fields>


<your terms>
Enter two terms separated by a comma (there must be no space before or after the
comma). A document is only returned if one of <your fields> contains a term that falls
within the inclusive alphabetical range of the specified terms.
Note that Unicode tables are used to determine alphabetical order. This means that non-7-bit
ASCII characters (for example, ü and other accented letters) come after z in the alphabet.

<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
this field contains a term that falls within the inclusive alphabetical range of <your terms>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).
Examples:
FieldText=ARANGE{aardvark,alligator}:ANIMAL
A document's ANIMAL field must contain a value that alphabetically falls between aardvark
and alligator (inclusive). If a document's ANIMAL field contains the value aardvark, albatross
or alligator, the document is returned. If a document's ANIMAL field contains the value ant,
anteater, antelope or armadillo, it is not returned, because these values come after alligator
in alphabetical order.
FieldText=ARANGE{bear,buffalo}:ANIMAL:TIER
A document's ANIMAL or TIER field must contain a value that alphabetically falls between
bear and buffalo. If a document's ANIMAL or TIER field contains the value bear, bee,
Biene, bird or buffalo, the document is returned. If a document's ANIMAL or TIER field
contains the value Büffel or chipmunk, it is not returned.
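The ordering rule can be tried out with ordinary string comparison. The following Python sketch compares strings by Unicode code point, which is how Python orders strings, and folds case before comparing. The case folding is an assumption made so that capitalized values such as Biene sort among lowercase terms; this guide does not state how ARANGE treats case. The function name arange_matches is hypothetical.

```python
def arange_matches(value, low, high):
    """True if value lies in the inclusive alphabetical range [low, high]."""
    folded = value.lower()  # assumption: comparison ignores case
    return low.lower() <= folded <= high.lower()

for animal in ["bear", "bee", "Biene", "bird", "buffalo", "Büffel", "chipmunk"]:
    print(animal, arange_matches(animal, "bear", "buffalo"))
```

Büffel falls outside the range because ü has a higher code point than u, so it sorts after buffalo.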

Finding documents with fields whose values result in a non-zero value when a bitwise AND operation is carried out against a specified value

You can use the following field specifiers (case sensitive) to return documents with fields whose values
result in a non-zero value when a bitwise AND operation is carried out against a specified value.

BITAND
The BITAND field specifier (case sensitive) allows you to find documents with a field whose integer value
does not result in 0 when a bitwise AND operation is carried out between this value and an integer
value specified by you.
Format:

FieldText=BITAND{<your integer>}:<your bit fields>


<your integer>
Enter an integer. A document is only returned if one of <your bit fields> contains a value
that results in a non-zero value when a bitwise AND operation is carried out between this
value and the specified integer.

<your bit fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
this field contains an integer that results in a non-zero value when a bitwise AND operation
is carried out between it and <your integer>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).
For example:
FieldText=BITAND{128}:BitField
The binary representation of the integer value 128 is compared with the binary
representations of the integer values that BitField fields in IDOL server contain. Only
documents whose BitField values result in a non-zero value when they are compared to
the binary representation of 128 are returned.
If a document's BitField, for example, contains the integer value 129, it is returned, while a
document whose BitField contains the value 127 is not returned.
Field value comparison:

Integer         Binary
128             1000 0000
129             1000 0001
Bitwise AND     1000 0000    this evaluates to true

Integer         Binary
128             1000 0000
127             0111 1111
Bitwise AND     0000 0000    this evaluates to false
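The comparison in the table can be reproduced with Python's & operator. This is a minimal sketch of the arithmetic only; the function name bitand_matches is hypothetical.

```python
def bitand_matches(field_value, mask):
    """True if the bitwise AND of the field value and the mask is non-zero."""
    return (int(field_value) & mask) != 0

for value in (129, 127):
    # Show the 8-bit result next to the outcome.
    print(value, format(value & 128, "08b"), bitand_matches(value, 128))
```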

BITANDHEX
The BITANDHEX field specifier (case sensitive) allows you to find documents with a field whose
hexadecimal string value does not result in 0 when a bitwise AND operation is carried out between this
value and a hexadecimal string specified by you.
Format:

FieldText=BITANDHEX{<your hexadecimal string>}:<your bit fields>


<your hexadecimal string>
Enter a hexadecimal string. A document is only returned if one of <your bit fields> contains
a value that results in a non-zero value when a bitwise AND operation is carried out between
this value and the specified hexadecimal string.
<your bit fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
this field contains a hexadecimal string that results in a non-zero value when a bitwise AND
operation is carried out between it and <your hexadecimal string>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

For example:
FieldText=BITANDHEX{7F}:BitField
The binary representation of the hexadecimal value 7F is compared with the binary
representations of the hexadecimal values that BitField fields in IDOL server contain. Only
documents whose BitField values result in a non-zero value when they are compared to
the binary representation of 7F are returned.
If a document's BitField, for example, contains the hexadecimal value C0, it is returned,
while a document whose BitField contains the hexadecimal value 80 is not returned.
Field value comparison:

Hex             Binary
7F              0111 1111
C0              1100 0000
Bitwise AND     0100 0000    this evaluates to true

Hex             Binary
7F              0111 1111
80              1000 0000
Bitwise AND     0000 0000    this evaluates to false
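The same check works for hexadecimal strings once they are converted to integers. The following Python sketch illustrates the arithmetic; the function name bitandhex_matches is hypothetical.

```python
def bitandhex_matches(field_hex, mask_hex):
    """True if the AND of two hexadecimal strings is non-zero."""
    return (int(field_hex, 16) & int(mask_hex, 16)) != 0

print(bitandhex_matches("C0", "7F"))
print(bitandhex_matches("80", "7F"))
```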

BITANDOFFHEX
The BITANDOFFHEX field specifier (case sensitive) allows you to find documents with a field whose
hexadecimal string value does not result in 0 when a bitwise AND operation is carried out between this
value and a hexadecimal string specified by you.
Format:

FieldText=BITANDOFFHEX{<nn>,<your hexadecimal string>}:<your bit fields>


<nn>
The number of 16 bit chunks by which the value in <your hexadecimal string> and in
<your bit fields> is shifted before the bitwise AND operation is carried out (this allows
sparse bit masks to be stored more efficiently).
<your hexadecimal string>
Enter a hexadecimal string. A document is only returned if one of <your bit fields> contains
a value that results in a non-zero value when a bitwise AND operation is carried out between
this value and the specified hexadecimal string.
<your bit fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
this field contains a hexadecimal string that results in a non-zero value when a bitwise AND
operation is carried out between it and <your hexadecimal string>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

For example:
FieldText=BITANDOFFHEX{01,0a001}:BitOffField
The binary representation of the hexadecimal value 01,0a001 is compared with the binary
representations of the hexadecimal values that BitOffField fields in IDOL server contain.
Only documents whose BitOffField values result in a non-zero value when they are
compared to the binary representation of 01,0a001 (after they have been left shifted by one
16 bit chunk) are returned.
If a document's BitOffField, for example, contains the value 1,bc01, it is returned, while a
document whose BitOffField contains the value 0,5ffeffff is not returned.
Field value comparison:

nn,hexstring    Hex         Binary
01,0a001        A0010000    1010 0000 0000 0001 0000 0000 0000 0000
1,bc01          BC010000    1011 1100 0000 0001 0000 0000 0000 0000
Bitwise AND                 1010 0000 0000 0001 0000 0000 0000 0000    this evaluates to true

nn,hexstring    Hex         Binary
01,0a001        A0010000    1010 0000 0000 0001 0000 0000 0000 0000
0,5ffeffff      5FFEFFFF    0101 1111 1111 1110 1111 1111 1111 1111
Bitwise AND                 0000 0000 0000 0000 0000 0000 0000 0000    this evaluates to false
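The offset form can be expanded into an ordinary integer before the AND is applied. The following Python sketch reads the nn,hexstring notation as "hexadecimal value shifted left by nn chunks of 16 bits", which is consistent with the Hex column in the comparison above. The function names are hypothetical.

```python
def parse_offset_hex(text):
    """Expand 'nn,hexstring' into an integer shifted left by nn 16-bit chunks."""
    chunks, hex_digits = text.split(",")
    return int(hex_digits, 16) << (16 * int(chunks))

def bitandoffhex_matches(field_value, mask):
    """True if the AND of the two expanded values is non-zero."""
    return (parse_offset_hex(field_value) & parse_offset_hex(mask)) != 0

print(format(parse_offset_hex("01,0a001"), "08X"))
print(bitandoffhex_matches("1,bc01", "01,0a001"))
print(bitandoffhex_matches("0,5ffeffff", "01,0a001"))
```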

Finding documents with fields whose values are Boolean agents


BOOLEANFIELD
The BOOLEANFIELD field specifier (case sensitive) allows you to find documents in which a specified
Boolean agent field contains an expression that matches text specified by you. A Boolean agent is a
Boolean or Proximity expression that legacy technologies use to categorize documents.
Note: if you are using a Query action, it is recommended that you use the AgentBooleanField action
parameter rather than the BOOLEANFIELD field specifier. However, if you want to match more than
one Boolean agent field, you have to use the BOOLEANFIELD field specifier.
Format:

FieldText=BOOLEANFIELD{<your text>}:<your fields>


<your text>
Enter text. A document is only returned if one of <your fields> contains a Boolean or
Proximity expression that matches the specified text.
<your fields>
Enter one or more Boolean agent fields. A document is only returned if it contains one of
these fields, and if this field contains a Boolean or Proximity expression that matches <your
text>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

For example:
FieldText=BOOLEANFIELD{The cat sat on the mat}:MyFirstBooleanField:MySecondBooleanField
Any document that has a MyFirstBooleanField or MySecondBooleanField field which
contains a Boolean or Proximity expression that matches the specified text is returned. For
example, the Boolean/Proximity expressions cat AND mat, cat OR mat, cat BEFORE mat
and cat DNEAR1 sat could match The cat sat on the mat, therefore documents that
contain any of these Boolean/Proximity expressions would be returned.
Documents whose MyFirstBooleanField or MySecondBooleanField fields contain, for
example, cat AND mat AND dog or mat BEFORE cat would not be returned.


Finding documents in which specific fields don't exist or contain no value

EMPTY
The EMPTY field specifier (case sensitive) allows you to find documents in which a specified field
doesn't exist or contains no value.
Format:

FieldText=EMPTY{}:<your fields>
<your fields>
Enter one or more fields. A document is only returned if it doesn't contain any of these fields
or if these fields are empty.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

Examples:
FieldText=EMPTY{}:ID
A document must either not contain an ID field, or hold no value in its ID field, to be returned.
FieldText=EMPTY{}:ID:Name
A document must not contain an ID or Name field, or hold no value in its ID or Name field to
be returned.

Finding documents that contain specific fields, irrespective of their value

EXISTS
The EXISTS field specifier (case sensitive) allows you to find documents that contain a specified field
even if this field contains no value.
Format:

FieldText=EXISTS{}:<your fields>
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields (even
if the field is empty).
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

Examples:
FieldText=EXISTS{}:ID
A document must contain an ID field to be returned.
FieldText=EXISTS{}:ID:NAME
A document must contain an ID or NAME field (or both) to be returned.
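EXISTS and EMPTY are close to being opposites, which a small model makes visible. The following Python sketch treats a document as a dictionary (an illustrative assumption) with hypothetical function names. For EMPTY with several fields it assumes that every listed field must be missing or empty, following the wording of the <your fields> description; check this against your server's behavior.

```python
def exists(doc, fields):
    """True if the document contains at least one listed field, even if empty."""
    return any(name in doc for name in fields)

def empty(doc, fields):
    """True if every listed field is missing or holds no value (assumption)."""
    return all(doc.get(name) in (None, "") for name in fields)

doc = {"ID": "", "TITLE": "Report"}
print(exists(doc, ["ID"]))         # the field is present, although empty
print(empty(doc, ["ID"]))          # the field holds no value
print(empty(doc, ["ID", "Name"]))  # ID is empty and Name is missing
print(empty(doc, ["TITLE"]))
```

An empty field therefore satisfies both specifiers: it exists, and it holds no value.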

Finding documents with fields whose values are similar to a specified string

FUZZY
The FUZZY field specifier (case sensitive) allows you to find documents in which a specified field
contains a term that is similar to a specified term or phrase.
Format:

FieldText=FUZZY{<your terms>}:<your fields>


<your terms>
Enter one or more terms (or phrases). A document is only returned if one of these terms (or
phrases) is similar to a string in one of <your fields>.
If you want to specify multiple terms (or phrases) you must separate them with commas
(there must be no space before or after a comma).
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
the value in this field is similar to one of <your terms>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

For example:
FieldText=FUZZY{Bisiness News,Arkive}:DRETITLE
A document's DRETITLE field value must be similar to the term Bisiness News or Arkive
for this document to be returned. (A document whose DRETITLE field contains Business
News would be returned, while a document whose DRETITLE field contains Document
Arkive would not).


Finding documents with fields that contain a specified string


You can use the following field specifiers (case sensitive) to return documents with fields that contain a
specified string.
STRING
The STRING field specifier (case sensitive) allows you to specify one or more strings of which one
must be contained as a substring in a specified field.
Format:

FieldText=STRING{<your strings>}:<your fields>


<your strings>
Enter one or more strings. A document is only returned if one of these strings is a substring
of the value in one of <your fields>. You can match strings that contain punctuation
(commas must be escaped) or comprise several words.
If you want to specify multiple strings you must separate them with commas (there must be
no space before or after a comma).
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
the value in this field contains one of <your strings> as a substring.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

Examples:
FieldText=STRING{cat,dog}:ANIMAL
A document's ANIMAL field value must contain the substring cat or dog for this document
to be returned. If a document's ANIMAL field, for example, has the value scattering this
document will be returned.
FieldText=STRING{old cat}:ANIMAL:TOPIC
A document's ANIMAL or TOPIC field value must contain the substring old cat for this
document to be returned. If a document's ANIMAL field, for example, has the value old cat,
old caterpillar or bold cats, this document will be returned.
FieldText=STRING{autonomy.com}:COMPANY
A document's COMPANY field value must contain the substring autonomy.com for this
document to be returned. If a document's COMPANY field, for example, has the value
autonomy.com or http://www.autonomy.com/content/home, this document will be
returned.
FieldText=STRING{a\,b}:MISC
A document's MISC field value must contain the substring a,b for this document to be
returned. If a document's MISC field, for example, has the value a,b or a,b,c, this document
will be returned.
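Because commas separate the strings, a literal comma has to be escaped with a backslash. The following Python sketch shows one way to split such a value list and apply the STRING rule. It is an illustration with hypothetical function names; it assumes case-sensitive matching and does not handle an escaped backslash.

```python
import re

def split_values(raw):
    """Split on commas that are not preceded by a backslash, then unescape."""
    parts = re.split(r"(?<!\\),", raw)
    return [part.replace("\\,", ",") for part in parts]

def string_matches(value, raw):
    """True if the field value contains at least one of the strings."""
    return any(s in value for s in split_values(raw))

print(split_values("cat,dog"))
print(split_values(r"a\,b"))
print(string_matches("scattering", "cat,dog"))
print(string_matches("a,b,c", r"a\,b"))
```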

Note:
You can also use the STRING field specifier to find documents in which one of the following meta fields
(see Meta fields on page 301) contains a substring specified by you:
autn_database
autn_langtype
Examples:
FieldText=STRING{Archiv}:autn_database
A document's autn_database field must contain the substring Archiv for this document to
be returned. If a document's autn_database meta field, for example, has the value Archive
or Archives, this document will be returned.
FieldText=STRING{english}:autn_langtype
A document's autn_langtype meta field must contain the substring english for this
document to be returned. If a document's autn_langtype meta field, for example, has the
value englishASCII or English_UTF8, this document will be returned.

STRINGALL
The STRINGALL field specifier (case sensitive) allows you to specify one or more strings, which all
must be contained as a substring in a specified field.
Format:

FieldText=STRINGALL{<your strings>}:<your fields>


<your strings>
Enter one or more strings. A document is only returned if all of these strings are substrings
of the value in one of <your fields>. You can match strings that contain punctuation
(commas must be escaped) or comprise several words.
If you want to specify multiple strings you must separate them with commas (there must be
no space before or after a comma).
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
the value in this field contains all of <your strings> as substrings.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

Examples:
FieldText=STRINGALL{cat,dog}:ANIMAL
A document's ANIMAL field value must contain the substrings cat and dog for this
document to be returned. If a document's ANIMAL field, for example, has the value
grooming cats and dogs or doggedly scattering seeds, this document will be returned.
FieldText=STRINGALL{old cat,young dog}:ANIMAL:TOPIC
A document's ANIMAL or TOPIC field value must contain the substrings old cat and young
dog for this document to be returned. If a document's ANIMAL field, for example, has the
value old cat chases young dog or young doggedly chasing bold cats, this document
will be returned.
FieldText=STRINGALL{a\,b,e\,f}:MISC
A document's MISC field value must contain the substrings a,b and e,f for this document to
be returned. If a document's MISC field, for example, has the value a,b,c,d,e,f or 0=e,fx
1=da,ba, this document will be returned.

SUBSTRING
The SUBSTRING field specifier (case sensitive) allows you to return documents whose field value is a
substring of a specified string (or equal to a specified string).
Format:

FieldText=SUBSTRING{<your strings>}:<your fields>


<your strings>
Enter one or more strings. A document is only returned if one of <your fields> contains a
substring of one of the specified strings. You can match strings that contain punctuation
(commas must be escaped) or comprise several words.
If you want to specify multiple strings you must separate them with commas (there must be
no space before or after a comma).
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
the value in this field is a substring of <your strings>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

Examples:
FieldText=SUBSTRING{Telecommunications,Technology}:SECTOR
A document's SECTOR field must contain a string that is a substring of
Telecommunications or Technology. If a document's SECTOR field, for example, has the
value Telecom or Technology, the document will be returned. If a document's SECTOR
field has the value Latest Technology, the document will not be returned.
FieldText=SUBSTRING{scattering,doggedly}:ANIMAL
A document's ANIMAL field value must contain a substring of scattering or doggedly for
this document to be returned. If a document's ANIMAL field, for example, has the value cat
or dog, this document will be returned.
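The three specifiers differ only in the direction and quantity of the containment test. The following Python sketch puts them side by side. The function names are hypothetical, matching is assumed to be case sensitive, and an empty field value is assumed never to match SUBSTRING.

```python
def string_any(value, strings):
    """STRING: the field value contains at least one of the strings."""
    return any(s in value for s in strings)

def string_all(value, strings):
    """STRINGALL: the field value contains every one of the strings."""
    return all(s in value for s in strings)

def substring(value, strings):
    """SUBSTRING: the field value is contained in at least one of the strings."""
    return bool(value) and any(value in s for s in strings)

sectors = ["Telecommunications", "Technology"]
print(substring("Telecom", sectors))             # value is part of a string
print(substring("Latest Technology", sectors))   # value is longer than the string
print(string_any("Latest Technology", sectors))  # STRING works the other way round
print(string_all("doggedly scattering seeds", ["cat", "dog"]))
```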


Finding documents with fields whose values match specific terms or phrases

You can use the following field specifiers (case sensitive) to return documents in which specified fields
contain specified terms or phrases.

TERM
The TERM field specifier (case sensitive) allows you to find documents with a specified field whose
value contains a conceptual match of one or more terms specified by you. A conceptual match exists if
a term you specify matches a term in a specified field after it has been stemmed.
Note: if the language that you are using does not match the DefaultLanguageType that you have
specified in IDOL server's configuration file, you must add the LanguageType parameter to your query
command (see Specifying the language type of your query on page 321).
Format:

FieldText=TERM{<your terms>}:<your fields>


<your terms>
Enter one or more terms. A document is only returned if one of <your fields> contains a
value that includes a term which conceptually matches one of the specified terms.
If you want to specify multiple terms, you must separate them with commas (there must be
no space before or after a comma).
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
a term in this field conceptually matches one of <your terms>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

Examples:
FieldText=TERM{shopping,centers}:DRETITLE
A document's DRETITLE field must contain a term that conceptually matches shopping or
centers for this document to be returned. If a document's DRETITLE field, for example, has
the value shop this document will be returned, while if it has the value bookshopping, it
will not be returned.
FieldText=TERM{training,football}:ITEM:PRODUCT
A document's ITEM or PRODUCT field must contain a term that conceptually matches
training or football for this document to be returned. If a document's ITEM or PRODUCT
field, for example, has the value train or footballers, this document will be returned, while if
it has the value trainer or soccer, it will not be returned.

TERMALL
The TERMALL field specifier (case sensitive) allows you to find documents with a specified field
whose value contains conceptual matches of several terms specified by you. A conceptual match
exists if the terms you specify match terms in a specified field after they have been stemmed.
Note: if the language that you are using does not match the DefaultLanguageType that you have
specified in IDOL server's configuration file, you must add the LanguageType parameter to your query
command (see Specifying the language type of your query on page 321).
Format:

FieldText=TERMALL{<your terms>}:<your fields>


<your terms>
Enter multiple terms. A document is only returned if one of <your fields> contains a value
that includes terms which conceptually match all of the specified terms.
Separate the terms with commas (there must be no space before or after a comma).
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
terms in this field conceptually match all of <your terms>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

Examples:
FieldText=TERMALL{shopping,centers}:DRETITLE
A document's DRETITLE field value must contain terms that conceptually match both
shopping and centers for this document to be returned. If a document's DRETITLE field, for
example, has the value town center shop, this document will be returned.
FieldText=TERMALL{walk,climb}:DRETITLE:TITLE
A document's DRETITLE or TITLE field value must contain terms that conceptually
match both walk and climb for this document to be returned. If a document's DRETITLE
or TITLE field, for example, has the value hill walking and rock climbing, this document
will be returned.

TERMEXACT
The TERMEXACT field specifier (case sensitive) allows you to find documents with a specified field
that contains an exact match of any of the terms specified by you.
Note: if the language that you are using does not match the DefaultLanguageType that you have
specified in IDOL server's configuration file, you must add the LanguageType parameter to your query
command (see Specifying the language type of your query on page 321).

Format:

FieldText=TERMEXACT{<your terms>}:<your fields>


<your terms>
Enter one or more terms. A document is only returned if one of <your fields> contains a
value that exactly matches one of the specified terms.
If you want to specify multiple terms, you must separate them with commas (there must be
no space before or after a comma).
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
this field contains an exact match of one of <your terms>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

Examples:
FieldText=TERMEXACT{help,helped}:DRETITLE
A document's DRETITLE field value must contain the term help or helped for this
document to be returned. If a document's DRETITLE field, for example, has the value helps
or helping, the document will not be returned.
FieldText=TERMEXACT{Word,Excel}:FILE:DATEI
A document's FILE or DATEI field value must contain the term Word or Excel for this
document to be returned. If a document's FILE or DATEI field, for example, has the value
WordPerfect, the document will not be returned.

TERMEXACTALL
The TERMEXACTALL field specifier (case sensitive) allows you to find documents with a specified
field that contains an exact match of all terms specified by you.
Note: if the language that you are using does not match the DefaultLanguageType that you have
specified in IDOL server's configuration file, you must add the LanguageType parameter to your query
command (see Specifying the language type of your query on page 321).
Format:

FieldText=TERMEXACTALL{<your terms>}:<your fields>


<your terms>
Enter multiple terms. A document is only returned if one of <your fields> contains exact
matches of the specified terms.
Separate the terms with commas (there must be no space before or after a comma).

<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
this field contains an exact match of all <your terms>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).
Examples:
FieldText=TERMEXACTALL{rabbits,eating,carrots}:DRETITLE
This query returns only documents whose DRETITLE field contains all the specified terms
(in their specified form). For example, a document whose DRETITLE field has the value
Rabbits like eating carrots or The carrots were there but the rabbits were eating the
cabbage will be returned as a result, while a document with a field that contains Rabbits
like to eat a carrot each day will not be returned.
FieldText=TERMEXACTALL{flour,milk,eggs}:DRETITLE:TITLE
This query returns only documents whose DRETITLE or TITLE field contains all the
specified terms (in their specified form). For example, a document whose DRETITLE or
TITLE field has the value Most cake recipes include milk, eggs and flour will be
returned as a result, while a document with a field that contains Use a cup of milk, two
cups of flour and one egg will not be returned.

TERMEXACTPHRASE
The TERMEXACTPHRASE field specifier (case sensitive) allows you to return documents in which a
specified field contains an exact match of a phrase specified by you. Your phrase is matched before
stemming is applied (stopwords are not removed). Any punctuation in the specifier or field is ignored.
Note: if the language that you are using does not match the DefaultLanguageType that you have
specified in IDOL server's configuration file, you must add the LanguageType parameter to your query
command (see Specifying the language type of your query on page 321).
Format:

FieldText=TERMEXACTPHRASE{<your phrase>}:<your fields>


<your phrase>
Enter a phrase. A document is only returned if one of <your fields> contains an exact match
of the specified phrase.
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
this field contains an exact match of <your phrase>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

Examples:
FieldText=TERMEXACTPHRASE{Batman! and Robins}:FILM
A document whose FILM field contains Showing now, Batman and Robin's film, will be
returned as a result, while a document whose FILM field contains Showing now, 'Batman
and Robin' the movie will not be returned.
FieldText=TERMEXACTPHRASE{gift horse }:DRETITLE:TITLE
A document whose DRETITLE or TITLE field contains looking a gift horse in the mouth,
will be returned as a result, while a document whose DRETITLE or TITLE field contains the
gift horse's mouth had rotting teeth will not be returned.

TERMPHRASE
The TERMPHRASE field specifier (case sensitive) allows you to return documents in which a specified
field contains a conceptual match of a phrase specified by you. Your phrase is matched after stemming
is applied (stopwords are not removed). Any punctuation in the specifier or field is ignored.
Note: if the language that you are using does not match the DefaultLanguageType that you have
specified in IDOL server's configuration file, you must add the LanguageType parameter to your query
command (see Specifying the language type of your query on page 321).
Format:

FieldText=TERMPHRASE{<your phrase>}:<your fields>


<your phrase>
Enter a phrase. A document is only returned if one of <your fields> contains a conceptual
match of the specified phrase.
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
this field contains a conceptual match of <your phrase>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).

Examples:
FieldText=TERMPHRASE{Batman! and Robins}:FILM
A document whose FILM field contains Showing now: 'Batman and Robin', will be
returned as a result.
FieldText=TERMPHRASE{gift horse }:DRETITLE:TITLE
A document whose DRETITLE or TITLE field contains the gift horse's mouth had rotting
teeth will be returned.


Field specifiers for biasing result scores


BIAS
The BIAS field specifier (case sensitive) allows you to bias the score of results according to the
numerical proximity of the specified field to a given value.
Note that you can also boost the percentage relevance that is given to query results by setting up
a specific field process or by using multipliers. See Manipulating the relevance of query results on
page 266 for details on BIAS and other methods that allow you to manipulate result scores.


Fuzzy queries
If you are not sure how to spell some of the words that you want to query for, you can use the
Query action command to submit a fuzzy query to IDOL server. A fuzzy query returns results that
contain words that are similar to the entered string.
To submit a fuzzy query, specify the Query action's Text parameter using one of the following
formats:

Text=<my_query_text>DREFUZZY(fuzzy_query_text)

For example:
http://<host>:<port>/action=Query&Text=best selling author DREFUZZY(Rowlling)

Text=DREFUZZY(fuzzy_query_text)

For example:
http://<host>:<port>/action=Query&Text=DREFUZZY(Caroll Jabberwalky)
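A client program usually has to percent-encode the spaces in the Text value before it sends the request. The sketch below shows one way to assemble both formats. The function name fuzzy_query_url is hypothetical, and the assumption that the server accepts spaces encoded as %20 with unencoded parentheses is an illustrative choice rather than something this guide states.

```python
from urllib.parse import quote


def fuzzy_query_url(host, port, fuzzy_text, query_text=""):
    """Build a Query URL whose Text parameter contains DREFUZZY(...).

    Hypothetical helper. Spaces are percent-encoded (assumption);
    parentheses are left as written in the manual's examples.
    """
    text = (query_text + " DREFUZZY(" + fuzzy_text + ")").strip()
    encoded = quote(text, safe="()")
    return "http://%s:%s/action=Query&Text=%s" % (host, port, encoded)


# Ordinary query text combined with a fuzzy term.
print(fuzzy_query_url("localhost", 5552, "Rowlling", "best selling author"))
# Fuzzy text only.
print(fuzzy_query_url("localhost", 5552, "Caroll Jabberwalky"))
```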


Parametric searches
The GetTagValues and GetQueryTagValues action commands allow you to execute parametric
searches.
A parametric search allows you to search for items by their characteristics (values in certain fields).
When you provide fixed values in parametric fields, the parametric search returns consistent values in
the non-fixed parametric fields. For example, you can search an IDOL server wine database for
specific wine varieties from a specific region by specifying which fields must match these
characteristics, so that only wines that are of the specified variety and from the specified region are
returned.
Before you can execute parametric searches, you need to configure IDOL server to recognize
parametric fields.

To configure IDOL server to recognize parametric fields:


Note: you must configure parametric field recognition before you index the data that you want to
search.
1. Open IDOL server's configuration file in a text editor.

2. In the [Server] section, set the ParametricRefinement parameter to true (if the section doesn't
contain this parameter, you have to add it).

3. List a parametric field process in the [FieldProcessing] section.


For example:
[FieldProcessing]
Number=2
0=MyFirstProcess
1=ParametricFields

4. Create a section for each field process that you have listed, in which you create a property for the
process (a property is later defined by one or more applicable configuration parameters). Identify
the fields that you want to associate with the process.
Note: the properties that you create must not have the same name as processes.
For example:
[MyFirstProcess]
Property=MyProperty
PropertyFieldCSVs=*/MyField,*/MyOtherField
[ParametricFields]
Property=Parametric
PropertyFieldCSVs=*/Grape,*/Color,*/Region,*/Price

5. List the properties that you have created in the [Properties] section.
For example:
[Properties]
0=MyProperty
1=Parametric

6. Create a section for the parametric property in which you set the ParametricType parameter to
true. This enables IDOL server to recognize the associated PropertyFieldCSVs fields as
parametric fields.
For example:
[Parametric]
ParametricType=true

7. Save and close the configuration file. You can now index your data into IDOL server.
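The configuration steps above form a chain: a field process names a property, and the property's own section marks it as parametric. The following sketch follows that chain with Python's standard configparser module to list the fields that would be treated as parametric. It is an illustration only: the function name parametric_fields and the SAMPLE text (assembled from the example fragments above) are assumptions, and a real IDOL server configuration file may contain constructs that configparser does not accept.

```python
import configparser

# Sample assembled from the example fragments in the steps above.
SAMPLE = """
[Server]
ParametricRefinement=true

[FieldProcessing]
Number=2
0=MyFirstProcess
1=ParametricFields

[MyFirstProcess]
Property=MyProperty
PropertyFieldCSVs=*/MyField,*/MyOtherField

[ParametricFields]
Property=Parametric
PropertyFieldCSVs=*/Grape,*/Color,*/Region,*/Price

[Properties]
0=MyProperty
1=Parametric

[Parametric]
ParametricType=true
"""


def parametric_fields(config_text):
    """Return the fields whose process points at a parametric property."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep parameter names exactly as written
    parser.read_string(config_text)
    fields = []
    for index in range(parser.getint("FieldProcessing", "Number")):
        process = parser.get("FieldProcessing", str(index))
        prop = parser.get(process, "Property")
        # A property counts as parametric only if its own section says so.
        if parser.has_section(prop) and parser.getboolean(
                prop, "ParametricType", fallback=False):
            fields.extend(parser.get(process, "PropertyFieldCSVs").split(","))
    return fields


print(parametric_fields(SAMPLE))
```

A check like this can be run before indexing, because parametric field recognition must be configured before the data is indexed.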

To execute a parametric search:


Once you have configured IDOL server to index and recognize parametric fields, you can use the
following action commands to execute a parametric search:

GetTagValues
This action allows you to specify one or more parametric fields and return all values that are
stored within these fields in IDOL server. This includes values in documents that you don't have
access to and values in documents that have been deleted (unless you have compacted IDOL
server's Data index since they were deleted).
For example:
http://localhost:5552/action=GetTagValues&FieldName=Grape
This action command requests the different values that are stored in IDOL server's Grape
fields. This allows you to return a list of all grape varieties stored in an IDOL server wine
database, for example.
You can also restrict the command, so it only returns Grape field values if they are contained in
a document that also contains other specific fields that have specific values.
For example:
http://localhost:5552/action=GetTagValues&FieldName=Grape&Restriction=MATCH{Barossa Valley}:Region+MATCH{Red}:Color
This action command returns only Grape field values if they are contained in a document that
also contains a Region field that has the value Barossa Valley and a Color field that has the
value Red.
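A restricted GetTagValues request of this kind can be assembled as in the following sketch. The function name get_tag_values_url is hypothetical. The sketch joins the MATCH expressions with a plus sign exactly as in the example above, percent-encodes spaces inside the values, and leaves the braces unencoded, which is an assumption: some HTTP clients may require the braces to be encoded as well.

```python
from urllib.parse import quote


def get_tag_values_url(host, port, field_name, restrictions=None):
    """Build a GetTagValues URL, optionally with MATCH restrictions.

    restrictions is a list of (field, value) pairs. Hypothetical helper
    that only formats the URL shown in the manual's example.
    """
    url = "http://%s:%s/action=GetTagValues&FieldName=%s" % (
        host, port, field_name)
    if restrictions:
        parts = ["MATCH{" + quote(value) + "}:" + field
                 for field, value in restrictions]
        url += "&Restriction=" + "+".join(parts)
    return url


print(get_tag_values_url("localhost", 5552, "Grape"))
print(get_tag_values_url(
    "localhost", 5552, "Grape",
    [("Region", "Barossa Valley"), ("Color", "Red")]))
```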

GetQueryTagValues
This action allows you to combine query text with one or more parametric fields. When IDOL
server executes the query, it finds documents that match the specified query text, and returns
the values of the specified parametric fields for these documents. Unlike the GetTagValues
action, the GetQueryTagValues action does not return field values that are contained in
documents that you don't have access to or that have been deleted.
For example:
http://localhost:5552/action=GetQueryTagValues&FieldName=GRAPE,COUNTRY&Text=A smooth red wine that complements game
This action command requests the different values that are stored in the GRAPE and
COUNTRY fields of documents that are conceptually similar to the specified Text.

You can also restrict the command by combining it with various action parameters.
For example:
http://localhost:5552/action=GetQueryTagValues&FieldName=GRAPE,COUNTRY&Text=A smooth red wine that complements game&MaxValues=10&Sort=Alphabetical
This action command requests the 10 top values that are stored in the GRAPE and COUNTRY
fields of documents that are conceptually similar to the specified Text. IDOL server displays the
values in alphabetical order when it returns them.
http://localhost:5552/action=GetQueryTagValues&FieldName=GRAPE,COUNTRY&Text=A smooth red wine that complements game&DocumentCount=true
This action command requests the different values that are stored in the GRAPE and
COUNTRY fields of documents that are conceptually similar to the specified Text. The
DocumentCount parameter instructs IDOL server to return the number of documents that
contain each value.
http://localhost:5552/action=GetQueryTagValues&FieldName=GRAPE,COUNTRY&Text=A smooth red wine that complements game&FieldDependence=true
This action command requests the different values that are stored in the GRAPE and
COUNTRY fields of documents that are conceptually similar to the specified Text. The
FieldDependence parameter instructs IDOL server to find sets of values that occur together. If
IDOL server finds documents that contain the first parametric field listed, it checks if they also
contain the subsequently listed parametric fields.

For further details on available parameters for the GetTagValues and GetQueryTagValues actions,
please refer to the online help (see Displaying online help on page 61).


Proper Names queries


You can enable Proper Names queries if you want IDOL server to recognize names and treat them as
a unit.

To enable Proper Names queries:


Note: you must enable Proper Names queries before you index the data that you want to query
against.
1. Open IDOL server's configuration file in a text editor.

2. Before content is stored in IDOL server, individual terms are always stemmed and individual
stopwords are always discarded. If you want to store Proper Name terms (adjacent terms that
begin with a capital letter) in addition to the normal content, you can set the ProperNames
parameter in the [LanguageTypes] section to one of the following.
0    Proper Name terms are not stored.

1    Adjacent terms* are compounded, then stemmed and indexed as a unit.

2    Adjacent terms are compounded (regardless of capitalization), then stemmed
     and indexed as a unit. Note that this considerably increases the number of terms
     that are stored in IDOL server, which can slow down its performance.

The following ProperNames options are only required if you need to be able to query for Proper
Names that contain stopwords (for example, "The Who" or "The Queen"):

3    Adjacent stopwords* are compounded, then stemmed and indexed as a unit.
     Adjacent terms* are compounded, then stemmed and indexed as a unit.

4    Stopwords* that are adjacent to terms* are compounded with these, then
     stemmed and indexed as a unit.
     Adjacent stopwords* are compounded, then stemmed and indexed as a unit.
     Adjacent terms* are compounded, then stemmed and indexed as a unit.

5    Adjacent stopwords* are compounded and indexed unstemmed as a unit.
     Adjacent terms* are compounded and indexed unstemmed as a unit.

6    Stopwords* that are adjacent to terms* are compounded with these, and indexed
     unstemmed as a unit.
     Adjacent stopwords* are compounded and indexed unstemmed as a unit.
     Adjacent terms* are compounded and indexed unstemmed as a unit.

7    Stopwords* that are adjacent to terms* are compounded with these, and indexed
     unstemmed as a unit.
     Adjacent stopwords* are compounded and indexed unstemmed as a unit.
     Note: it is recommended that you use this setting if you have set
     AdvancedSearch to true in the IDOL server configuration file's [Server] section.

* these must begin with a capital letter (followed by lower case).

Note that you need to set this parameter for each of the languages that you want to enable name
recognition for (if a language's settings don't include the ProperNames parameter, you should
add it).
For example:
[LanguageTypes]
DefaultLanguageType=English
LanguageDirectory=C:\IDOLserver\IDOL\langfiles
0=English
1=Deutsch
2=Francais
[English]
LanguageCode=1
Language=ENGLISH
Encoding=ASCII
ProperNames=1
[Deutsch]
LanguageCode=2
Language=GERMAN
Encoding=ASCII
ProperNames=1
[Francais]
LanguageCode=2
Language=FRENCH
Encoding=ASCII
ProperNames=1
3. Save the configuration file and start IDOL server.

4. Index documents into IDOL server. Once you have finished indexing, any Query action command
is automatically treated by IDOL server as a Proper Name query.

Example

Depending on the ProperNames setting, IDOL server stores the following terms for the sentence Tom
Jones And His greatest hits:
0    TOM, JONE, GREAT, HIT
1    TOM, TOMJON, JONE, GREAT, HIT
2    TOM, TOMJON, JONE, GREAT, GREATESTHIT, HIT
3    TOM, TOMJON, JONE, ANDHI, GREAT, HIT
4    TOM, TOMJON, JONE, ANDHI, GREAT, HIT, JONESAND
5    TOM, TOMJONES, JONE, ANDHIS, GREAT, HIT
6    TOM, TOMJONES, JONE, ANDHIS, GREAT, HIT, JONESAND
7    TOM, JONE, ANDHIS, GREAT, HIT, JONESAND

If IDOL server contains the following documents, the queries below produce different results according
to what ProperNames has been set to:

Doc 1: Tom Waits and The The in concert with Norah Jones
Doc 2: Tom Jones and the the in concert with Katie Melua

action=Query&Text=Tom Jones
If ProperNames has been set to 0 or 7, both documents are returned with the same
relevance (in both cases, IDOL server is queried with the terms TOM and JONE which are
matched by both documents).
If ProperNames has been set to 1, 2, 3, 4, 5 or 6, Doc 2 is returned with a higher relevance
than Doc 1 (because it matches not just the terms TOM and JONE but also TOMJON or
TOMJONES).

action=Query&Text=tom jones
If ProperNames has been set to 0, 1, 3, 4, 5, 6 or 7, both documents are returned with the
same relevance (in both cases, IDOL server is queried with the terms TOM and JONE which
are matched by both documents).
If ProperNames has been set to 2, Doc 2 is returned with a higher relevance than Doc 1
(because it matches not just the terms TOM and JONE but also TOMJON).

action=Query&Text=The The
If ProperNames has been set to 0, 1 or 2, the query returns no results (because both
instances of the word "The" are discarded as stopwords).
If ProperNames has been set to 3, 4, 5, 6 or 7, only Doc 1 is returned (because in all these
cases IDOL server is queried with the term THETH or THETHE which are only matched by
Doc 1).

action=Query&Text=the the
If ProperNames has been set to 0, 1, 2, 3, 4, 5, 6 or 7, no results are returned (because both
instances of the word "the" are discarded as stopwords).
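The compounding rules can be illustrated with a toy sketch that pairs adjacent capitalized words and classifies each pair as term/term, stopword/stopword or mixed. This is not IDOL server's implementation: the function names, the two-word stopword list and the restriction to pairs of words are assumptions, and no stemming is applied, so the output corresponds to the unstemmed units (TOMJONES, ANDHIS, JONESAND) that appear in the example earlier in this section.

```python
def is_capitalized(word):
    """True if the word is a capital letter followed by lower case."""
    return word[:1].isupper() and (len(word) == 1 or word[1:].islower())


def compound_units(sentence, stopwords):
    """Compound adjacent capitalized words into unstemmed units.

    Toy illustration only: pairs of adjacent words, no stemming,
    no punctuation handling.
    """
    units = {"terms": [], "stopwords": [], "mixed": []}
    words = sentence.split()
    for first, second in zip(words, words[1:]):
        if not (is_capitalized(first) and is_capitalized(second)):
            continue  # both words must begin with a capital letter
        flags = (first.lower() in stopwords, second.lower() in stopwords)
        if all(flags):
            kind = "stopwords"   # adjacent stopwords
        elif any(flags):
            kind = "mixed"       # stopword adjacent to a term
        else:
            kind = "terms"       # adjacent terms
        units[kind].append((first + second).upper())
    return units


# The stopword list here is an assumption made for this example.
print(compound_units("Tom Jones And His greatest hits", {"and", "his"}))
```

Because "greatest" and "hits" do not begin with a capital letter, they are not compounded in this sketch, in line with the rule marked with an asterisk in the option list.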


Proximity searches
You can use the Query action command to submit proximity searches which allow you to give words
that appear close together in the search string a higher weighting.
You apply the following operators to words, exact phrases or Boolean expressions in order to execute
a Proximity search. Note that APCM (Adaptive Probabilistic Concept Modeling) is used to rank the
results that match the Boolean query.

NEAR<N>

Only returns documents in which the second term is within <N> words of the first
term. If you don't specify <N>, NEAR defaults to 6.
For example:
action=Query&Text=cat+NEAR1+dog
This query only returns documents in which the term cat is no more than 1 word
away from dog. This means that documents that contain "cats and dogs" and
documents that contain "dogs and cats" are returned, while documents that contain
"cats do not like dogs" are not returned (as the terms are not close enough to each
other).

DNEAR<N>

Directed NEAR. Only returns documents in which the second term is within <N>
words of the first term, in the specified order. If you don't specify <N>, DNEAR
defaults to 6.
For example:
action=Query&Text=cat+DNEAR1+dog
This query only returns documents in which the term "dog" follows the term "cat", but
is no more than 1 word away from the term "cat". This means that documents that
contain "cats and dogs" are returned, while documents that contain "dogs and cats"
or "cats do not like dogs" are not returned.

WNEAR<N>

Weighted NEAR. Proximity operator that promotes relevance when term spacing is
less than the specified <N> word distance (closer together implies higher relevance).
If you don't specify <N>, WNEAR defaults to 6.
For example:
action=Query&Text=dog+WNEAR7+cat
In this query extra relevance is given to documents in which "cat" and "dog" appear
within 7 words of each other in a piece of text. This weight increases as the terms
get closer to each other.


BEFORE

Only returns documents in which the first term precedes the second one.
For example:
action=Query&Text=cat+BEFORE+dog
This query only returns documents in which the term "dog" appears later than the
term "cat".

AFTER

Only returns documents in which the first term appears later than the second one.
For example:
action=Query&Text=cat+AFTER+dog
This query only returns documents in which the term "cat" appears later than the
term "dog".

Note: Proximity operators must be specified using capital letters.
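The difference between NEAR and DNEAR can be sketched with a small matcher that reproduces the outcomes of the "cat" and "dog" examples above. This is an illustration under stated assumptions, not IDOL server's algorithm: the function names are hypothetical, stemming is approximated by dropping a trailing "s", and distance is taken to be the number of words between the two terms. The manual's examples would also be consistent with a reading in which stopwords are removed before distances are measured.

```python
def _norm(word):
    # Crude stand-in for stemming: lower-case, drop a trailing "s".
    word = word.lower().strip('.,;:!?"')
    return word[:-1] if word.endswith("s") else word


def near(text, first, second, n=6, directed=False):
    """Return True if the two terms have at most n words between them.

    With directed=True the second term must follow the first (DNEAR).
    Toy illustration; the default of 6 mirrors the documented default.
    """
    words = [_norm(w) for w in text.split()]
    firsts = [i for i, w in enumerate(words) if w == _norm(first)]
    seconds = [i for i, w in enumerate(words) if w == _norm(second)]
    for i in firsts:
        for j in seconds:
            if directed and j < i:
                continue  # wrong order for a directed match
            if i != j and abs(i - j) - 1 <= n:
                return True
    return False


for sample in ("cats and dogs", "dogs and cats", "cats do not like dogs"):
    print(sample, near(sample, "cat", "dog", 1),
          near(sample, "cat", "dog", 1, directed=True))
```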

Precedence of Boolean and Proximity operators


Boolean and Proximity operators have the following precedence:

Highest precedence:

NOT
NEAR; DNEAR
AND; BEFORE; AFTER

Lowest precedence:

OR; XOR; EOR; WNEAR

Operators that have the same level of precedence have neither left nor right associativity. You can use
brackets to bind terms together as appropriate (note that Proximity operators must have terms on
either side and cannot be adjacent to brackets).


Soundex keyword searches


If the spelling of a keyword is not quite accurate but phonetically correct, a Soundex keyword search
returns results that contain the keyword and phonetically similar keywords (using a Soundex
algorithm).

To enable Soundex keyword searches:


Note: you must enable Soundex keyword searches before you index the data that you want to search.
1. Open IDOL server's configuration file in a text editor.

2. In the [Server] section, set the Soundex parameter to 1. (If the [Server] section doesn't contain
the Soundex parameter, you should add it.)

3. Save the configuration file and start IDOL server.

4. Index documents into IDOL server. Once you have finished indexing, you can perform Soundex
keyword searches using the Query action command.

Executing Soundex keyword searches:


You can use Soundex with a single or multiple keywords, or as part of a query text string.
Examples:
http://localhost:5552/action=Query&Text=SOUNDEX(einstine)
http://localhost:5552/action=Query&Text=albert SOUNDEX(einstine) examined the phenomenon discovered by max SOUNDEX(plank) in 1905
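To see why the misspelled keywords in these examples can still match, it helps to look at how a Soundex code is computed. The sketch below implements the widely used American Soundex scheme (first letter plus three digits). This guide does not state which Soundex variant IDOL server uses, so treat the function as an illustration of the general technique rather than of the server's behavior.

```python
def soundex(word):
    """Return the American Soundex code of a word, for example E523."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for letter in letters:
            codes[letter] = digit
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = codes.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "hw":
            continue  # h and w do not separate letters with the same code
        code = codes.get(letter, "")
        if code and code != previous:
            result += code
        previous = code  # vowels reset the previous code
    return (result + "000")[:4]  # pad or truncate to four characters


for misspelled, intended in (("einstine", "Einstein"), ("plank", "Planck")):
    print(misspelled, soundex(misspelled), intended, soundex(intended))
```

Under this scheme the misspelled and the intended spellings receive the same code, which is what allows a phonetic match.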


Synonym queries
A synonym query returns results which are conceptually similar to the terms in a Query action's Text
parameter and/or conceptually similar to the synonyms that are available for the Text terms.
To be able to send synonym queries to IDOL server, you need to:

1. Set up a synonym file

2. Configure IDOL server to use a synonym file

For details on the settings that the [Synonym] sections can contain and on how you can configure
them, please refer to IDOL server's online help (see Displaying help on configuration settings on
page 389).

To set up a synonym file:


Note: you must set up a synonym file before you index the data that you want to search.
1. Create a text file and save it in IDOL server's installation directory using the File name you have
specified in the IDOL server configuration file's [<Synonym_type>] section.

2. Create sections for each language type that you have defined in IDOL server's configuration file.
For example:
[English_ASCII]
[German_UTF8]

3. In each section create a line for each word that you want to list synonyms for (using the same
encoding that you are using for the associated language type).
For example:
[English_ASCII]
cat
dog
[German_UTF8]
Katze
Hund

4. List synonym strings next to each word and save the file. Note that you must separate the word
and each string with commas and that there must be no space before or after a comma. The
individual terms can contain spaces but must not contain any punctuation.
Note: the synonym file should not comprise more than 100 lines.
For example:
[English_ASCII]
cat,feline,grimalkin,moggy,mouser,puss,pussy,tabby
dog,bitch,cur,hound,mans best friend,mongrel,mutt,pooch,puppy
[German_UTF8]
Katze,Mietze,Mietzekatze,Mietzekater,Kater,Mulle,Kätzchen
Hund,Wau Wau,Hündin,Töle,Kläffer,Hündchen,Welpe
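Before you index, it can be useful to check a synonym file against the format rules described above. The following sketch parses the file into sections and rejects lines that have a space next to a comma. The function name parse_synonym_file is hypothetical, and applying the 100-line recommendation to the number of non-blank lines is one possible interpretation.

```python
SAMPLE = """[English_ASCII]
cat,feline,grimalkin,moggy,mouser,puss,pussy,tabby
dog,bitch,cur,hound,mans best friend,mongrel,mutt,pooch,puppy
"""


def parse_synonym_file(text, max_lines=100):
    """Parse synonym file text into {section: {word: [synonyms]}}."""
    sections = {}
    current = None
    line_count = 0
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        line_count += 1
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
            continue
        if current is None:
            raise ValueError("line %d: entry outside a section" % number)
        parts = line.split(",")
        # No empty strings and no space before or after a comma.
        if any(not part or part != part.strip() for part in parts):
            raise ValueError("line %d: empty string or space next to a comma"
                             % number)
        current[parts[0]] = parts[1:]
    if line_count > max_lines:
        # Interpretation of the recommended size limit (assumption).
        raise ValueError("more than %d lines" % max_lines)
    return sections


synonyms = parse_synonym_file(SAMPLE)
print(synonyms["English_ASCII"]["dog"])
```

Individual strings may contain spaces (as in "mans best friend"), so the sketch only rejects spaces that directly touch a comma.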

To configure IDOL server to use a synonym file:


Note: you must configure IDOL server to use the synonym file before you index the data that you want
to search.
1. Open the IDOL server configuration file in a text editor.

2. In the IDOL server configuration file's [FieldProcessing] section, set up a synonym process. The
synonym process allows IDOL server to determine when it should apply synonym settings.
For example:
[FieldProcessing]
Number=1
0=SynonymMatch

3. Create a section for the synonym process that you have listed, in which you create a property for
the process (synonym properties always point to a defined synonym job). Identify the fields that
you want to associate with the process (when identifying the fields that IDOL server uses for
synonym matching you should use the format /FieldName to match root-level fields, */FieldName
to match all fields except root-level or /Path/FieldName to match fields that the specified path
points to).
Note: the properties that you create must not have the same name as processes.
For example:
[SynonymMatch]
Property=ApplySynonymMatch
PropertyFieldCSVs=*/DRETITLE,*/DRECONTENT
In this example IDOL server will only return documents for synonym queries, if their DRETITLE or
DRECONTENT field values match the query.

4. List the property that you have created in a [Properties] section.
For example:
[Properties]
0=ApplySynonymMatch

5. Create a section for the property in which you set the SynonymType parameter to the name of
the synonym job that specifies which settings IDOL server should apply to synonym queries.
[ApplySynonymMatch]
SynonymType=Synonym_job

6. In the IDOL server configuration file's [Synonym] section, list the synonym job whose settings
you want to apply when a synonym query is sent to IDOL server. (You can set up multiple jobs;
however, normally you only require one.)
For example:
[Synonym]
0=Synonym_job

7. Define a section for your synonym job (the section must have the same name as the synonym job)
in which you specify the settings that you want to apply to synonym queries.
For example:
[Synonym_job]
File=animals.txt
MaxExpandLevel=1

8. Save the configuration file and restart IDOL server.

Sending a synonym query to IDOL server


Once you have created a synonym file and configured IDOL server to use it, you can turn any Query
action command that you send to IDOL server into a synonym query by adding &Synonym=true to it.
For example:
http://localhost:5552/action=Query&Text=Felix is a great mouser&Synonym=true
This query returns documents that conceptually match the term mouser, as well as documents that
conceptually match any of the terms that have been listed as synonyms for the term mouser in the
synonym file.


Combining different query types


You can combine the following query types:

Synonym and Boolean searches


If you combine a synonym query with a Boolean search, IDOL server obeys the Boolean rules while
executing the synonym query.
For example:
http://<host>:<port>/action=Query&Text=cat+AND+dog&Synonym=true
Provided the synonym file lists synonyms for the terms cat and dog, IDOL server only returns
documents that contain both the term cat or one of the synonyms for cat and the term dog or one of
the synonyms for dog (for example, documents that contain "mouser" and "cur", "cat" and "man's best
friend" and so on).

Synonym and Field restrictions


You can use field restrictions within a synonym query in order to only return documents that contain the
synonym within a specific field.
For example:
http://<host>:<port>/action=Query&Text=cat:Title&Synonym=true
Provided the synonym file lists synonyms for the term cat, IDOL server only returns documents that
contain the term cat or one of the synonyms for cat in their Title fields.

Soundex and Proper Names


You can send a proper name query to IDOL server which comprises a SOUNDEX keyword search.
However, as the SOUNDEX keywords are spelled phonetically, IDOL server cannot match proper
names for them.


Soundex and Boolean searches


If you combine a Soundex keyword search with a Boolean search, IDOL server obeys the Boolean
rules while executing the Soundex keyword search.
For example:
http://<host>:<port>/action=Query&Text=Munich+AND+SOUNDEX(einstine)
This query only returns documents that contain both the term Munich and a term that phonetically
matches "einstine" (for example, "Einstein").

Soundex and Proximity searches


If you combine a Soundex keyword search with a Proximity search, IDOL server obeys the Proximity
rules while executing the Soundex keyword search.
For example:
http://<host>:<port>/action=Query&Text=Munich+NEAR2+SOUNDEX(einstine)
This query only returns documents in which the term Munich is no more than 2 terms away from a
term that phonetically matches einstine (for example, "Einstein").

Soundex and Field restrictions


You can use field restrictions to restrict a Soundex keyword search to specific fields.
For example:
http://<host>:<port>/action=Query&Text=SOUNDEX(einstine:Name)
This query only returns documents that contain a term that phonetically matches einstine (for example,
"Einstein") in their Name fields.


Exact Phrase searches and Boolean searches


If you combine an exact phrase search for multiple phrases (in which the individual phrases are framed
by quotation marks) with a Boolean search, IDOL server obeys the Boolean rules while executing the
exact phrase search. Note that enclosing phrases in brackets ensures that Boolean operators tie to
entire phrases.
For example:
http://<host>:<port>/action=Query&Text=("Egyptian cats")+AND+("Phoenician sailors")
This query only returns documents that match the phrase Egyptian cats as well as the phrase
Phoenician sailors (after stemming and stopping).
For example:
http://<host>:<port>/action=Query&Text=("Egyptian cats")+NOT+("Phoenician sailors")
This query only returns documents that match the phrase Egyptian cats but not the phrase
Phoenician sailors (after stemming and stopping).

Exact Phrase searches and Proximity searches


If you combine an exact phrase search for multiple phrases (in which the individual phrases are framed
by quotation marks) with a Proximity search, IDOL server obeys the Proximity rules while executing
the exact phrase search. Note that enclosing phrases in brackets ensures that Proximity operators tie
to entire phrases.
For example:
http://<host>:<port>/action=Query&Text=("Egyptian cats")+NEAR2+("Phoenician sailors")
This query only returns documents in which a match of the phrase Egyptian cats is no further away
from a match of the phrase Phoenician sailors than 2 words (after stemming and stopping).
http://<host>:<port>/action=Query&Text=("Egyptian cats")+BEFORE+("Phoenician sailors")
This query only returns documents that match the phrase Egyptian cats and the phrase Phoenician
sailors (after stemming and stopping). The phrase Egyptian cats must occur before the phrase
Phoenician sailors.

Exact Phrase searches and Field restrictions


You can use field restrictions to restrict an exact phrase search to specific fields.
For example:
http://<host>:<port>/action=Query&Text="birds of prey" "bird watching":DRETITLE
This query returns documents that contain a match for birds of prey in any field or a match for bird
watching in the documents' DRETITLE field (after stemming and stopping).

Boolean searches and Proximity searches


Boolean and Proximity operators have the following precedence:

Highest precedence:

NOT
NEAR; DNEAR
AND; BEFORE; AFTER

Lowest precedence:

OR; XOR; EOR; WNEAR

Operators that have the same level of precedence have neither left nor right associativity. You can use
brackets to bind terms together as appropriate (note that Proximity operators must have terms on
either side and cannot be adjacent to brackets).
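
The precedence levels listed above can be captured in a small lookup table. The following Python sketch is illustrative only: the numeric levels and the helper names level and binds_tighter are assumptions introduced here and are not part of IDOL server.

```python
import re

# Precedence groups taken from the list above (a higher number binds tighter).
# The numeric values themselves are an assumption made for this sketch.
PRECEDENCE = {
    "NOT": 4,
    "NEAR": 3, "DNEAR": 3,
    "AND": 2, "BEFORE": 2, "AFTER": 2,
    "OR": 1, "XOR": 1, "EOR": 1, "WNEAR": 1,
}


def level(operator: str) -> int:
    """Return the precedence level of an operator such as AND or NEAR2."""
    # Strip a trailing distance (for example NEAR2 -> NEAR) before the lookup.
    name = re.sub(r"\d+$", "", operator.upper())
    return PRECEDENCE[name]


def binds_tighter(a: str, b: str) -> bool:
    """True if operator a has strictly higher precedence than operator b."""
    return level(a) > level(b)
```

Because operators on the same level have no associativity, a query that mixes, for example, AND and BEFORE is best written with explicit brackets.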

Boolean searches and Field restrictions


If you combine a field restriction keyword search with a Boolean search, IDOL server obeys the
Boolean rules while only returning documents that contain the specified values in the specified fields.
For example:
http://<host>:<port>/action=Query&Text=cat:Animal AND dog:Animal:Fauna
This query only returns documents that contain the term cat in their Animal fields and the term dog in
their Animal or Fauna fields.

Proximity searches and Field restrictions


If you combine a field restriction keyword search with a Proximity search, IDOL server obeys the
Proximity rules while only returning documents that contain the specified values in the specified fields.
For example:
http://<host>:<port>/action=Query&Text=cat NEAR2 dog:Animal
This query only returns documents that contain the term cat and term dog in their Animal fields. The
terms must be no more than 2 terms away from each other.

Using wildcards in queries


Note: wildcard matching is carried out after stemming has taken place. You should use wildcard
matching sparingly as it will slow down IDOL server's performance.
You can use the following wildcards in query text and field text queries:

?    to match one character.

*    to match zero, one or more characters.

Using wildcards in query text


You can use wildcards within a Query action's Text string.
You can use the Text action parameter to specify field restrictions for result documents, provided that
the fields that you are restricting results to have been stored as Index fields in IDOL server (see
Setting up field indexing on page 67). If you are matching fields, you can use wildcards and match
multiple fields simultaneously by separating them with colons.

Examples:

http://<host>:<port>/action=Query&Text=rollersk*
Wildcard matching is carried out after stemming has taken place. The term "rollerskating", for
example, is stemmed to rollersk when it is indexed into IDOL server.
This means that the query above returns documents that contain any terms that have been
stemmed to rollersk, for example, "rollerskating", "rollerskater", "rollerskate", "rollerskates".
The query http://<host>:<port>/action=Query&Text=rollerskat*, however, would not return
any results.

http://<host>:<port>/action=Query&Text=Mi?rotech
This query returns documents that contain the term "Mikrotech" or "Microtech".

http://<host>:<port>/action=Query&Text="Co*ins":Name:Author+Arm?dale:Title
This query returns documents that contain a Name or Author field whose value matches the
wildcard string Co*ins (for example, "Collins") and documents that contain a Title field whose
value matches the wildcard string Arm?dale (for example "Armadale").
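
The behavior of the two wildcards, and the effect of matching against stemmed terms, can be illustrated with a minimal Python sketch. It uses the standard library function fnmatchcase as a stand-in for IDOL server's matcher, which is an assumption: fnmatchcase also gives square brackets a special meaning that this guide does not describe. The stem rollersk is taken from the example above.

```python
from fnmatch import fnmatchcase


def wildcard_match(term: str, pattern: str) -> bool:
    """Approximate '?' (one character) and '*' (zero or more characters)."""
    return fnmatchcase(term, pattern)


# The term as it is held in the index after stemming (from the example above).
STEMMED_TERM = "rollersk"

# rollersk* matches the stored stem, rollerskat* does not, because the
# characters "at" were removed when the term was stemmed.
matches_stem = wildcard_match(STEMMED_TERM, "rollersk*")
matches_longer = wildcard_match(STEMMED_TERM, "rollerskat*")
```

Here matches_stem is true and matches_longer is false, which mirrors why the query for rollerskat* returns no results.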

Using wildcards in field text queries


If you are using a Query, Suggest or SuggestOnText action to send a field text query to IDOL server
(using the FieldText action parameter), you can use wildcards to match a single string or to match
multiple strings.
Note:

When identifying fields you should use the format /FieldName to match root-level fields,
FieldName to match all fields except root-level or /Path/FieldName to match fields that
the specified path points to.

All string matching is case insensitive.

Matching one or more strings


The field specifier WILD allows you to wildcard match:

one or more strings

one or more strings that comprise several words

one or more strings that contain punctuation

one or more strings that comprise several words and punctuation

Strings can contain punctuation (except curly brackets), which means that if you want to match a string
that contains html with IDOL server content, you may need to escape the html to avoid confusion with
"&" and so on.
If you want to match a string that contains a comma, you need to escape the comma with a backslash,
otherwise IDOL server reads it as a separator.
You can match multiple fields simultaneously by separating them with colons.

Examples:

http://<host>:<port>/action=Query&FieldText=WILD{wom?n }:Clothes
A document's Clothes field must contain a word that matches the specified wildcard string (for
example, "woman" or "women") for this document to be returned as a result.

http://<host>:<port>/action=Query&FieldText=WILD{of mice and m?n }:Title:Book


A document's Title or Book field must contain a string that matches the specified wildcard
string (for example, "of mice and man" or "of mice and men") for this document to be returned
as a result.
http://<host>:<port>/action=Query&FieldText=WILD{Glory is fleeting\, but * is forever}:QuotesNapoleon
A document's QuotesNapoleon field must contain a string that matches the specified wildcard
string (for example, "Glory is fleeting, but obscurity is forever") for this document to be returned
as a result.
http://<host>:<port>/action=Query&FieldText=WILD{*.html,*.htm}:URL
A document's URL field value must end with html or htm for this document to be returned as a
result.
http://<host>:<port>/action=Query&FieldText=WILD{passi*incarnata}:Climbers
A document's Climbers field must contain a phrase that begins with passi and ends with
incarnata (for example, "passionflower incarnata" or "passiflora incarnata") for this document to
be returned as a result.
http://<host>:<port>/action=Query&FieldText=WILD{passi*incarnata,passi*alata*}:Climbers
A document's Climbers field must contain a string that matches one of the specified wildcard
strings (for example, "passionflower incarnata", "passiflora incarnata", "passionflower alata",
"passiflora alata", "passionflower alata shannon" or "passiflora alata shannon") for this
document to be returned as a result.
http://<host>:<port>/action=Query&FieldText=WILD{*www.autonomy.com*.txt,*www.autonomy.com*.pdf}:PATH:URL
A document's PATH or URL field must contain a path that contains www.autonomy.com and
ends with .txt or .pdf (for example, "http://www.autonomy.com/files/doc.txt" or
"http://www.autonomy.com/fields/technicalbrief.pdf") for this document to be returned as a result.
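
The rules for building a WILD expression (commas separate alternative strings, a literal comma is escaped with a backslash, fields are joined with colons, curly brackets are not allowed) can be sketched as a small Python helper. The function name wild_fieldtext is hypothetical and exists only for this illustration. URL encoding of the result is left to the caller.

```python
from typing import Iterable


def wild_fieldtext(patterns: Iterable[str], fields: Iterable[str]) -> str:
    """Build a FieldText value of the form WILD{p1,p2}:Field1:Field2."""
    escaped = []
    for pattern in patterns:
        if "{" in pattern or "}" in pattern:
            # Curly brackets cannot appear in the strings to match.
            raise ValueError("curly brackets are not allowed in WILD strings")
        # A literal comma must be escaped, otherwise it acts as a separator.
        escaped.append(pattern.replace(",", "\\,"))
    return "WILD{" + ",".join(escaped) + "}:" + ":".join(fields)
```

For example, wild_fieldtext(["*.html", "*.htm"], ["URL"]) produces the value WILD{*.html,*.htm}:URL used in one of the examples above.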

Wildcard searches in Japanese, Chinese, Korean and Thai


Oriental languages do not include spaces or word boundaries. For this reason 'sentence breaking' is
applied to Oriental text when it is processed by IDOL server in order to split the text into individual
words or 'terms'.
You can carry out wildcard searches in Japanese, Chinese, Korean and Thai, provided you query IDOL
server with one or more terms rather than a single string in which words are not delimited by spaces.
Note that the question mark wildcard may not behave as expected because it represents a single
character, and each Oriental 'letter' actually consists of multiple characters (usually two). For example,
if you want to use a ? single-character wildcard in a multibyte language query, you have to use one ?
character for each byte (for example, ??? for a single Japanese character).
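
The number of ? characters needed for one character depends on how many bytes that character occupies in the encoding in use. The following Python sketch shows the arithmetic; the helper name single_char_wildcard is hypothetical, and the choice of encoding is an assumption, since the encoding that applies depends on your IDOL server setup.

```python
def single_char_wildcard(char: str, encoding: str = "utf-8") -> str:
    """Return one '?' per byte that the given character occupies."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    return "?" * len(char.encode(encoding))


# One Japanese character encoded as UTF-8 occupies three bytes.
utf8_wildcard = single_char_wildcard("日")
# The same character encoded as Shift_JIS occupies two bytes.
sjis_wildcard = single_char_wildcard("日", "shift_jis")
```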

Querying for non-alphanumeric characters


Although IDOL server does not store non-alphanumeric characters, you can query for them using field
text queries.
In order to speed up the processing time of the query, you should combine the field text queries with
an ordinary text query, by using the Text parameter and the FieldText parameter as follows:

Text
Specify your entire query string including the non-alphanumeric characters.
Note:

If the string you are searching for contains an ampersand (&), you must escape it (since it is a
special character used by the query syntax).

If any of the following characters occurs in the middle of the string you are searching for, you must
replace them with a space, unless you have explicitly removed them from the list of characters
that IDOL server uses as separators (using the DiminishSeparators parameter in IDOL server's
configuration file):
~[]*?:()"
Alternatively, you can set the IgnoreSpecials action parameter (which you can set for the Query
and GetQueryTagValues action) to true to instruct IDOL server to interpret the following
characters as normal characters in query syntax:
*?":() and Boolean / Proximity operators AND, NOT, OR, EOR, XOR, NEAR, DNEAR,
WNEAR, BEFORE, AFTER
This disables wildcarding, phrase queries, field restriction and Boolean operations.

FieldText
Use the STRING field specifier to search for your entire query string including the non-alphanumeric
characters in the appropriate field in IDOL server.
Note:

If the string you are searching for contains an ampersand (&), you must escape it (since it is a
special character used by the query syntax).

If the string you are searching for contains a comma, you must escape it by prefixing it with a
backslash (\).

Examples:
To search for "Auto*":
http://<host>:<port>/action=query&text=Auto&FieldText=STRING{Auto*}:DRECONTENT

To search for "yahoo!":


http://<host>:<port>/action=query&text=yahoo!&FieldText=STRING{yahoo!}:DRECONTENT

To search for "AT&T":


http://<host>:<port>/action=query&text=AT%26T&FieldText=STRING{AT%26T}:DRECONTENT

To search for "r-t":


http://<host>:<port>/action=query&text=r-t&FieldText=STRING{r-t}:DRECONTENT

To search for "eat, drink and be merry":


http://<host>:<port>/action=query&text=eat, drink and be merry&FieldText=STRING{eat\, drink and be merry}:DRECONTENT

To search for "politics [and] their effects":


http://<host>:<port>/action=query&text=politics and their effects&FieldText=STRING{politics [and] their effects}:DRECONTENT

To search for "*NSYNC":


http://<host>:<port>/action=query&text=*NSYNC&IgnoreSpecials=true
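
The escaping rules described above can be combined in a short Python sketch that derives both parameters from one search string. The function names are hypothetical. Percent-encoding every reserved character (not only the ampersand) is an assumption made here for safety; the examples above encode only the ampersand.

```python
from urllib.parse import quote

# Characters that must be replaced with a space in the Text parameter
# (taken from the list above).
SEPARATORS = '~[]*?:()"'


def text_param(search: str) -> str:
    """Replace separator characters with spaces and tidy the whitespace."""
    cleaned = "".join(" " if ch in SEPARATORS else ch for ch in search)
    return " ".join(cleaned.split())


def string_fieldtext(search: str, field: str) -> str:
    """Build STRING{...}:Field, escaping commas with a backslash."""
    return "STRING{" + search.replace(",", "\\,") + "}:" + field


def build_query(search: str, field: str = "DRECONTENT") -> str:
    """Return the query string for a combined Text and FieldText query."""
    text = quote(text_param(search), safe="")
    # Keep the FieldText syntax characters readable; '&' still becomes %26.
    fieldtext = quote(string_fieldtext(search, field), safe="{}:")
    return f"action=Query&Text={text}&FieldText={fieldtext}"
```

With this sketch, build_query("AT&T") yields the same Text and FieldText values as the AT&T example above.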

Optimizing the retrieval of tagged documents


In order to include one or more attributes that are specific to a document, you can create fields in the
document when you store it in IDOL server. When you send a query to IDOL server, you can then use
any of the fields you have created to restrict which documents IDOL server returns.
For example, you can create fields that contain authors, categories, folders, product types or any other
attribute, and then restrict a query to return only documents that were, for example, written by a
specific author or belong to one or more specific categories.
When a query is restricted in this way, its processing time may increase. In order to ensure that the
processing time is kept to a minimum, you need to use the appropriate query syntax for the fields you
have created.

Query syntaxes
A query's processing time depends on the syntax that the query uses. While different syntaxes are
available, some of them require fields to have been created in a specific way.
Fastest

Syntax:

action=Query&Text=<text>&FieldText=EQUAL{<numerical attribute>}:<field name>

Requires:

that you create a numeric field for each attribute

that you store these fields as numeric fields in IDOL server

Example:

4 attributes are available to indicate which categories a document belongs to:
England
France
Germany
USA
Numbers are used to indicate which attributes apply to a document:
1 - England
2 - France
3 - Germany
4 - USA
In documents a numeric field is created for each category that they belong
to. For example, if a document belongs to the categories France and USA:
#DREFIELD Cat1=2
#DREFIELD Cat2=4
The following query returns documents that match the specified Text and
contain the value 4 in one of their Cat fields (for example, Cat1). The value
4 indicates that the documents belong to the USA category.
action=Query&Text=presidential election&FieldText=EQUAL{4}:Cat*

Syntax:

action=Query&Text=<text>&FieldText=MATCH{<attribute>}:<field name>

Requires:

that you create a field for each attribute

Example:

4 attributes are available to indicate which categories a document belongs to:
England
France
Germany
USA
In documents a field is created for each category that they belong to. For
example, if a document belongs to the categories France and USA:
#DREFIELD Cat1=France
#DREFIELD Cat2=USA
The following query returns documents that match the specified Text and
contain the value France in one of their Cat fields (for example, Cat1).
action=Query&Text=presidential election&FieldText=MATCH{France}:Cat*
Syntax:

action=Query&Text=<text>&FieldText=BITAND{<numerical attribute>}:<field name>

Requires:

that you assign each attribute a bit

that you create a numeric field for all attributes (if you have more than
32 bits, you need more fields)

that you store this field as a numeric field in IDOL server

Example:

4 attributes are available to indicate which categories a document belongs to:
England
France
Germany
USA
A bit value is assigned to each attribute:
1 - France
2 - England
4 - Germany
8 - USA
In documents a field is created for the categories that they belong to. For
example, if a document belongs to the categories France and USA:
#DREFIELD Cat=9
The following query returns documents that match the specified Text and
whose Cat field has at least one bit in common with the value 10. The value
10 (2 + 8) indicates that documents must belong to the England or the USA category.
action=Query&Text=presidential election&FieldText=BITAND{10}:Cat
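
The bit arithmetic behind this example can be checked with a few lines of Python. The bit values are those assigned above. The helper names are hypothetical, and the matching rule (a document matches when its field shares at least one bit with the query value) is inferred from the "England or USA" wording of the example rather than from a formal definition.

```python
# Bit values as assigned in the example above.
FRANCE, ENGLAND, GERMANY, USA = 1, 2, 4, 8


def category_value(*bits: int) -> int:
    """Combine category bits into the single value stored in the Cat field."""
    value = 0
    for bit in bits:
        value |= bit
    return value


def bitand_matches(field_value: int, query_value: int) -> bool:
    """True if the field shares at least one bit with the query value."""
    return (field_value & query_value) != 0


# A document in the categories France and USA is stored with Cat=9.
doc_cat = category_value(FRANCE, USA)
# BITAND{10} asks for England (2) or USA (8).
query_value = category_value(ENGLAND, USA)
```

The document with Cat=9 matches BITAND{10} because both values have the USA bit (8) set, while a document in France and Germany only (Cat=5) does not.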

Slowest

Syntax:

action=Query&Text=<text>&FieldText=STRING{<attribute>}:<field name>

Requires:

that you create a field that contains a CSV of the document's attributes

Example:

4 attributes are available to indicate which categories a document belongs to:
England
France
Germany
USA
In documents a field is created that contains a CSV of the categories that
the documents belong to. For example, if a document belongs to the
categories France and USA:
#DREFIELD Cat=France,USA
The following query returns documents that match the specified Text and
contain the value France in their Cat field.
action=Query&Text=presidential election&FieldText=STRING{France}:Cat

24. Spelling correction


IDOL server can automatically spell check query text that it receives and suggest correct spelling for
terms that it doesn't contain.
For example, if you submit a query for "Ludwig von Beethofen", IDOL server returns matching results,
a field that suggests a correct spelling for the misspelled term "Beethofen" and a field that contains the
original query in corrected form. Note that if IDOL server contains a match for an incorrectly spelled
term, it will return this match in addition to the spelling suggestion.
If a query contains several words that IDOL server doesn't contain, it returns a spelling suggestion for
each of them in a comma-separated list.

To set up spelling correction:


1.

In the IDOL server configuration file, configure the following settings in the [Server] section:
SpellCheckCorrectMinDocOccs
The minimum number of documents that a term has to appear in before IDOL server can use
it as a spell check suggestion.
SpellCheckIncorrectMaxDocOccs
The maximum number of documents that a term can appear in for IDOL server to search for a
spelling correction for it.
SpellCheckMaxCheckTerms
IDOL server's spelling correction has no effect on queries that comprise more than the
specified number of non-stopword terms (ProperName and hyphenated terms are also
ignored).

2.

When executing a query, add Spellcheck=true to the query string.


For example:
http://<host>:<port>/action=Query&Text=Beethofen and Mozart&Spellcheck=true
IDOL server stems the query text and then checks if it can match any of the resulting terms. In this case,
the term Beethofen would not occur in any documents at all or only in a small number of
documents. If the number of documents that a term occurs in does not exceed the specified
SpellCheckIncorrectMaxDocOccs, IDOL server searches for a correction among terms that occur in at
least SpellCheckCorrectMinDocOccs documents, and returns a spelling suggestion for this term and
the original query in corrected form.
Note that IDOL server can only suggest terms that its Data index contains.
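
The way the three [Server] settings interact, as described in step 1, can be sketched in Python. This is an illustration of the documented thresholds only, not IDOL server's implementation: the function names are hypothetical, and treating each threshold as inclusive is an assumption.

```python
def spellcheck_applies(non_stopword_terms: int, max_check_terms: int) -> bool:
    """Spelling correction has no effect on queries with too many terms."""
    return non_stopword_terms <= max_check_terms


def needs_correction(doc_occs: int, incorrect_max_doc_occs: int) -> bool:
    """A term is checked only if it appears in few enough documents."""
    return doc_occs <= incorrect_max_doc_occs


def usable_suggestion(doc_occs: int, correct_min_doc_occs: int) -> bool:
    """A term can be suggested only if it appears in enough documents."""
    return doc_occs >= correct_min_doc_occs
```

For example, with hypothetical values SpellCheckIncorrectMaxDocOccs=5 and SpellCheckCorrectMinDocOccs=20, a term found in 1 document is a candidate for correction, and a term found in 300 documents can be offered as the suggestion.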

25. Summarization
IDOL server can automatically generate one of the following summary types for the results it produces.
All summaries are generated in real time.

Concept
A conceptual summary of each result document. A concept summary comprises sentences that
are typical of the result's content (these sentences can be from different parts of the result
document).

Context
A conceptual summary of each result document that is biased by the terms in the query's Text
and/or FieldText. A context summary comprises sentences that are particularly relevant to the
terms in the query (these sentences can be from different parts of the result document).

Quick
A brief summary of each result document. A quick summary comprises the first few sentences
of the result document.

ParagraphConcept
A conceptual summary of each result document which comprises the paragraphs that are most
typical of the result's content (these paragraphs can be from different parts of the result
document).

ParagraphContext
A conceptual summary of each result document that is biased by the terms in the query Text
and/or FieldText. This summary comprises paragraphs that are particularly relevant to the
terms in the query.

Returning summaries with query action results


To generate summaries for query action results:
1.

Send a Query, Suggest or SuggestOnText action to IDOL server that includes the Summary
parameter. Set the Summary parameter to the type of summary that you want to return for results
(Concept, Context, Quick, ParagraphConcept or ParagraphContext).
For example:
http://<host>:<port>/action=Query&Text=Undulant fever&Summary=Concept
Each result of this query is returned with a concept summary.

2.

You can optionally set the following settings in the IDOL server configuration file depending on
which type of summary you want to generate:
For Concept, Context or Quick summaries
In the [Summary] section, use the SourceFields parameter to specify the fields from which the
summary should be generated.
For Concept or Context summaries
In the [Summary] section, set the MinWordsPerSentence parameter to the minimum number
of words that a sentence must comprise in order to be considered as a sentence that can be
used in the summary.
For Context summaries
In the [Server] section, set the ContextSummaryQueryTermWeight parameter to the weight
that should be used for the terms in the user's query. The context summary will give this weight
to sentences that contain terms in common with the query text. The other terms will be given
their APCM weight.

3.

Save the IDOL server configuration file and restart IDOL server for your configuration changes to
take effect.

Summarizing text or documents


To generate summaries for text or documents:
Send a Summarize action to IDOL server that includes the Summary parameter. Set the Summary
parameter to the type of summary that you want to generate (Concept, Context, Quick,
ParagraphConcept or ParagraphContext).
If you want to generate a summary for text, you need to use the Text parameter to specify it. If you
want to generate a summary for one or more documents, you need to use the ID or Reference
parameter to identify them.
For example:
http://<host>:<port>/action=Summarize&ID=30&Summary=Concept
In this example a concept summary is generated from the content of the document with the ID 30.
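
A Summarize request can be assembled programmatically. The Python sketch below validates the summary type against the five values listed above and builds the query string. The function name summarize_params is hypothetical, and requiring exactly one of Text, ID or Reference is an assumption made to keep the sketch simple.

```python
from typing import Optional
from urllib.parse import urlencode

# Summary types listed in this chapter.
SUMMARY_TYPES = {
    "Concept", "Context", "Quick", "ParagraphConcept", "ParagraphContext",
}


def summarize_params(summary: str,
                     text: Optional[str] = None,
                     doc_id: Optional[int] = None,
                     reference: Optional[str] = None) -> str:
    """Return the query string for a Summarize action."""
    if summary not in SUMMARY_TYPES:
        raise ValueError(f"unknown summary type: {summary}")
    sources = {"Text": text, "ID": doc_id, "Reference": reference}
    chosen = {name: value for name, value in sources.items()
              if value is not None}
    if len(chosen) != 1:
        # Assumption for this sketch: supply exactly one source.
        raise ValueError("specify exactly one of text, doc_id or reference")
    return urlencode({"action": "Summarize", **chosen, "Summary": summary})
```

Calling summarize_params("Concept", doc_id=30) returns the query string used in the example above.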

26. Taxonomy generation


IDOL server's taxonomy generation feature allows you to automatically create hierarchical contextual
taxonomies of clusters or other information. This provides you with an overview of the "information"
landscape and an insight into specific areas of the information.

Generating taxonomies
The TaxonomyGenerate action allows you to generate a hierarchical taxonomy from one or more
clusters (see Clustering on page 151 for details on how to generate clusters) or query results.
The taxonomy generator adapts the Bayesian and information theoretic methods to concept selection.
Bayesian algorithms are applied to identify statistical relationships between concepts and sets of
concepts (at the document and document set level), which are then filtered to form the hierarchic
structure of the final taxonomy.
You can write the taxonomy to disk as a directory structure, or import the taxonomy into the category
hierarchy.
Note that before you create a taxonomy from an IDOL server, you must make sure that IDOL server
does not contain duplicate documents or text that is repeated in multiple documents (for example,
document headers). Ensure that these are stripped out at the import stage in order to gain optimal
results.
You can set up a schedule that executes the TaxonomyGenerate action at regular intervals.

Generating a taxonomy from clusters


Use the TaxonomyGenerate action command with the SourceJobName and Cluster parameters in
order to generate a taxonomy from one or more clusters.
For example:
http://<host>:<port>/action=TaxonomyGenerate&SourceJobname=Taxonomy1&Cluster=0,1
In this example, IDOL server is instructed to generate a taxonomy from clusters 0 and 1 of the
Taxonomy1 job.

Generating a taxonomy from query results


Use the TaxonomyGenerate action command with the DREQuery parameter in
order to generate a taxonomy from query results.
For example:
http://<host>:<port>/action=TaxonomyGenerate&DREQuery=new+tax+cuts
In this example, IDOL server is instructed to generate a taxonomy from the results that it returns from its
Data index for the query new tax cuts.

Scheduling taxonomy generation


You can set up a schedule in order to run the TaxonomyGenerate action at regular intervals.
Please see Setting up schedules on page 160 for details.

28. Results
Relevance ranking
In evaluating all types of queries, IDOL server employs complex algorithms based on a combination of
Information Theory and Bayesian methods to weight and rank the document returns by statistical
relevance. In doing so it makes use of information theoretic values calculated dynamically for all
concepts on indexing, allowing relevance to be evaluated both as a percentage, and in the case of
agents, as absolute values.
In practice, the relevance can be seen as a measure of the conceptual overlap between the query text
and the text within a document. This can be affected in several ways; certain fields can be given extra
weight by associating a weighting factor with them at indexing time. For example, extra weight can be
given when query terms appear in a document's title as opposed to the body of the text.

Manipulating the relevance of query results


You can boost the percentage relevance that is given to query results:

by setting up a field process


You can set up a field process in IDOL server's configuration file that boosts the percentage
relevance of a query's results according to the number of times terms in specified fields match
the query's terms.

using BIAS
You can use the BIAS field specifier at query time to boost the percentage relevance of a
query's results according to the numerical proximity of a specified field to a given value.

using multipliers
You can multiply the weight of individual query terms in order to boost the relevance of
results that match these terms accordingly.

Setting up a field process to boost result relevance


You can set up a field process that identifies specific fields in documents and manipulates the weight of
terms in these fields if they match a query's terms.
For example, if you want to boost the weight of results that contain a query's terms in their DRETITLE
and SUMMARY fields, you can do the following:
1.

For each field whose content you want to use to determine if a result's weight is boosted, list
a process that indexes the field and manipulates its term weights in the [FieldProcessing]
section. Note that if you want to boost terms in several fields by the same factor, you only need to
create one process for this.
For example:
[FieldProcessing]
Number=2
0=IndexAndWeightHigher1
1=IndexAndWeightHigher2

2.

Create a section for each of the processes that you have listed, in which you create a property for
the process (a property is later defined by one or more applicable configuration parameters).
Identify the fields that you want to associate with the processes.
Note: the properties that you create must not have the same name as processes.
For example:
[IndexAndWeightHigher1]
Property=IndexHigherWeight1
PropertyFieldCSVs=*/DRETITLE
[IndexAndWeightHigher2]
Property=IndexHigherWeight2
PropertyFieldCSVs=*/SUMMARY

3.

List all the properties that you have created in a [Properties] section.
For example:
[Properties]
0=IndexHigherWeight1
1=IndexHigherWeight2
4.

Create a section for each of the properties and specify configuration settings for each. The Index
parameter ensures that the fields that are associated with the field process are indexed, while the
Weight parameter determines the factor by which terms in the associated PropertyFieldCSVs
fields are boosted if they match query terms.
For example:
[IndexHigherWeight1]
Index=true
Weight=4
[IndexHigherWeight2]
Index=true
Weight=2
5.

Save the configuration file and restart IDOL server. When you send a query to IDOL server, the
percentages that indicate the results' conceptual similarity to the query will now be affected by
how many times a result's SUMMARY and DRETITLE field terms match the query's terms.
For example:
If you send the following query to IDOL server, results whose SUMMARY and DRETITLE fields
match the query's terms "cat" and "dog" are boosted:
http://<IP_address>:<port>/action=query&text=cat and dog

This means that the following results would be returned in the following order:
Result 1
Title = Cats & Dogs
Summary = Cats and dogs duke it out in this live action feature about a professor on the brink
of discovering a cure for dog allergies. The dogs assign an agent to protect the professor and
his family from a feline invasion
Content = Unbeknownst to humans, dogs have fought for thousands of years to keep mankind
from falling under the rule of cats. Using combinations of live animals, animatronic puppets,
and digital wizardry, this film has just enough imagination to match its effects, climaxing with a
feline global-domination scheme involving mice sprayed with chemicals that will make all
humans allergic to their canine friends.
Result 2
Title = Garfield
Summary = Garfield comes to life in an all new live action major motion picture.
Content = Garfield is a fat cat. A cat that eats lots of Lasagne. A cat that is lazy and sleeps as
much as possible. Nevertheless, Garfield is a clever cat, always able to outwit his owner, Jon
and the neighbor's dog, Odie. Garfield is a cool and sarcastic cat but he is also a cat with a
heart as is shown when he comes to the rescue of Odie the dog, in the movie that is coming
out this year. The hapless pup disappears and is kidnapped by a nasty dog trainer, and
Garfield feels responsible. Pulling himself away from the TV, Garfield springs into action.
Maybe it's friendship for cat and dog after all.
Result 3
Title = Tom and Jerry : The movie
Summary = The celebrated cat and mouse team meets a young run-away who desperately
needs their help to find her missing father. Along the way they run into her evil Aunt who tosses
them into a pet prison. Bonding together, Tom & Jerry outwit the Aunt and mastermind a great
escape to set off on the wildest adventures of their cat and mouse careers.
Content = The popular animated duo team up again to appear this time on the big screen.
Homeless, the 'toons end up helping out a young girl who stays with a nasty auntie while she is
separated from her father. Will the young Robyn be reunited with her loving father? Will the odd
pair make it on the streets? Will they find a home? Those are some of the burning questions
that may plague the minds of young viewers of this fun adventure.
If the weight of the SUMMARY and DRETITLE fields had not been boosted, Result 2 would have been
the top result, with Result 1 following in second place. Note that Result 3 is not ranked higher than
Result 2. Although its weight is slightly boosted because its SUMMARY field contains one of the query
terms, this boost is not sufficient to outrank Result 2, whose SUMMARY and DRETITLE fields do not
contain any of the query's terms (the conjunction "and" in the query text is stripped before matching).
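
The chain from process to property to weight in the field process configuration above can be traced with a short Python sketch. It reads the example settings with the standard library configparser module and reports the weight that ends up associated with each field. Treating the IDOL server configuration file as configparser-compatible is an assumption that holds for this small fragment but may not hold for a complete configuration file. The function name field_weights is hypothetical.

```python
import configparser

# The example configuration from the steps above.
CONFIG = """
[FieldProcessing]
Number=2
0=IndexAndWeightHigher1
1=IndexAndWeightHigher2

[IndexAndWeightHigher1]
Property=IndexHigherWeight1
PropertyFieldCSVs=*/DRETITLE

[IndexAndWeightHigher2]
Property=IndexHigherWeight2
PropertyFieldCSVs=*/SUMMARY

[Properties]
0=IndexHigherWeight1
1=IndexHigherWeight2

[IndexHigherWeight1]
Index=true
Weight=4

[IndexHigherWeight2]
Index=true
Weight=2
"""


def field_weights(config_text: str) -> dict:
    """Map each field in PropertyFieldCSVs to its property's Weight."""
    cfg = configparser.ConfigParser()
    cfg.read_string(config_text)
    weights = {}
    for i in range(cfg.getint("FieldProcessing", "Number")):
        process = cfg.get("FieldProcessing", str(i))
        prop = cfg.get(process, "Property")
        for field in cfg.get(process, "PropertyFieldCSVs").split(","):
            weights[field.strip()] = cfg.getfloat(prop, "Weight")
    return weights
```

For the example configuration, the sketch reports a weight of 4 for */DRETITLE and 2 for */SUMMARY, which is a quick way to confirm that every process points at a property that actually defines a Weight.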

Using the BIAS field specifier to boost result relevance


The BIAS field specifier allows you to bias the score of results at query time according to the numerical
proximity of the specified field to a given value. Initial $, or - characters in the field are ignored.
Specify BIAS using the format:

BIAS{<optimum>,<range>,<percentage>}
<optimum>
The value that the specified field must contain to increase or decrease the result's weight by the
maximum percentage.
<range>
A positive number that determines the range of the specified optimum. If the specified field
contains a value that is in the range of (optimum - range) to (optimum + range), the result's
weight is increased or decreased according to the specified percentage.
<percentage>
A percentage in the range -100 to 100. If the value of the specified field is within the specified
range, the score of the result is increased or decreased according to how close the value is to
the specified optimum.
For example:

http://<IP_address>:<port>/action=Query&FieldText=BIAS{100,50,10}:*/PRICE
A document whose PRICE field value is within the range 50 either side of 100 will have its
weight increased on a linear scale from 10% if the price is 100, to 0% if the price is 50 or 150.

http://<IP_address>:<port>/action=Query&FieldText=BIAS{100,50,-10}:*/PRICE
A document whose PRICE field value is within the range 50 either side of 100 will have its
weight decreased on a linear scale from -10% if the price is 100, to 0% if the price is 50 or 150.

Note:
You can also use the BIAS field specifier to bias the score of results according to the numerical
proximity of their autn_date meta field (see Meta fields on page 301) to a given value.
For example:
FieldText=BIAS{1103918400,259200,25}:autn_date
A document whose autn_date field value is within the range 259200 either side of 1103918400
will have its weight increased on a linear scale from 25% if the date is 1103918400, to 0% if
the date is 1103659200 or 1104177600.
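
The linear scale described above can be illustrated with a short Python sketch. It only reproduces the arithmetic of the examples in this section; the function name bias_percentage is hypothetical, and the exact calculation that IDOL server performs internally may differ.

```python
def bias_percentage(value, optimum, half_range, percentage):
    """Return the percentage change applied to a result's weight.

    Assumed behaviour, taken from the examples in this section: the full
    percentage applies at the optimum and falls linearly to 0 at
    (optimum - half_range) and (optimum + half_range).
    """
    if half_range <= 0:
        raise ValueError("range must be a positive number")
    if not -100 <= percentage <= 100:
        raise ValueError("percentage must be between -100 and 100")
    distance = abs(value - optimum)
    if distance >= half_range:
        return 0.0  # outside the range: no bias is applied
    return percentage * (1 - distance / half_range)


# BIAS{100,50,10}:*/PRICE
print(bias_percentage(100, 100, 50, 10))  # 10.0
print(bias_percentage(125, 100, 50, 10))  # 5.0
print(bias_percentage(150, 100, 50, 10))  # 0.0
```

A negative percentage works the same way, so BIAS{100,50,-10} gives -5.0 for a PRICE of 75 or 125.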


Using multipliers to boost result relevance


You can add multipliers to individual query terms in order to boost the relevance of results that match
these terms accordingly.
If you want to use a multiplier with a query term, you need to use the following format:
<query_term>[*<N>]
<query_term>
The query term whose weight you want to multiply.
<N>
The factor by which you want to multiply the specified query term's weight. This can be any
positive number.

For example:
http://<host>:<ACI_port>/action=Query&Text=bread[*2.5]+brown+loaf
In this example, the weight of the query term bread is multiplied by 2.5 while the weight of the
query terms brown and loaf does not change.
When results are returned for the query, the relevance of documents that contain the term
bread is boosted relative to those that do not.

http://<host>:<ACI_port>/action=Query&Text=SOUNDEX(bred)+bred[*4]
In this example, a supermarket wants to ensure that a customer's online search for bread
returns appropriate results. The supermarket has found that customers tend to misspell "bread"
as bred. If a customer queries for "bread", appropriate results are returned as usual. If a
customer queries for bred, the term is submitted twice: once as a Soundex keyword search
(see Soundex keyword searches on page 237) and once with a multiplier. This ensures that if
results exist that match bred (for example, a new CD by a band called bred), they are returned
with a higher relevance than results that match bred phonetically.

Similarly, multipliers can be used to reduce the influence of individual query terms.
For example:
http://<host>:<ACI_port>/action=Query&Text=cat[*0.5]+dog
In this example, the weight of the query term cat is halved by multiplying it by 0.5 while the
weight of the query term dog does not change.
When results are returned for the query, the relevance of documents that contain the term cat is
reduced relative to those that do not.
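
If you build query text in a client application, a small helper can apply the <query_term>[*<N>] format consistently. The following Python sketch is illustrative only; the names weighted_term and build_text are hypothetical and are not part of any Autonomy API.

```python
def weighted_term(term, factor=None):
    """Format one query term, optionally with a multiplier: term[*N]."""
    if factor is None:
        return term
    if factor <= 0:
        raise ValueError("the multiplier must be a positive number")
    return f"{term}[*{factor:g}]"


def build_text(terms):
    """Join (term, factor) pairs with '+' as in the examples above."""
    return "+".join(weighted_term(term, factor) for term, factor in terms)


print(build_text([("bread", 2.5), ("brown", None), ("loaf", None)]))
# bread[*2.5]+brown+loaf
print(build_text([("cat", 0.5), ("dog", None)]))
# cat[*0.5]+dog
```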


Using Reference fields to filter results at query time


When you send a Query, Suggest or SuggestOnText action to IDOL server, you can use the
following parameters to filter the query's results:

the MatchReference action parameter

the DontMatchReference action parameter

the Combine action parameter

MatchReference
The MatchReference action parameter allows you to specify one or more references that a
document's Reference field must match for the document to be returned as a result.
For example:
http://<host>:<port>/action=Query&Text=Bayes&MatchReference=http://www.autonomy.com/Content/Technology.html
This query only returns documents that have a Reference field with the value
http://www.autonomy.com/Content/Technology.html.

DontMatchReference
The DontMatchReference action parameter allows you to specify one or more references that a
document's Reference field must not match for the document to be returned as a result.
For example:
http://<host>:<port>/action=Query&Text=Bayes&DontMatchReference=http://www.autonomy.com/Content/Technology.html
This query only returns documents if they don't have a Reference field with the value
http://www.autonomy.com/Content/Technology.html.
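
Because a reference is often itself a URL, a client that assembles the action string may want to percent-encode parameter values. The following Python sketch shows one way to do this. The host name, port and the function name build_action_url are hypothetical, and it is an assumption (not stated in this guide) that your IDOL server accepts percent-encoded values in the same way as the unencoded examples above.

```python
from urllib.parse import quote


def build_action_url(host, port, action, **params):
    """Assemble an action URL of the form http://host:port/action=...&Name=value."""
    parts = [f"action={action}"]
    # Percent-encode every value so that characters such as ':' , '/' and
    # spaces cannot be confused with the URL's own delimiters.
    parts += [f"{name}={quote(str(value), safe='')}" for name, value in params.items()]
    return f"http://{host}:{port}/" + "&".join(parts)


print(build_action_url(
    "localhost", 9000, "Query",
    Text="Bayes",
    MatchReference="http://www.autonomy.com/Content/Technology.html",
))
# http://localhost:9000/action=Query&Text=Bayes&MatchReference=http%3A%2F%2Fwww.autonomy.com%2FContent%2FTechnology.html
```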

Combine
The Combine action parameter allows you to ensure that, if several results derive from the same
document or contain the same content or the same value in a specific Reference field, only one of
these results is displayed. By default this is the result with the highest relevance; however, you can
use the Sort action parameter to set alternative sorting methods.

You can set Combine to one of the following:
Simple
This is the recommended Combine option.
When very long texts are indexed into IDOL server, they are by default broken up into sections and
then indexed as individual documents (each document has its own ID but they all have the same
document reference). This makes the indexing process more stable and ensures that when you
query IDOL server, the most relevant section of a text is returned (rather than, for example, an
entire book). However, if several sections are relevant to the query, each of them is returned as a
result. This means that a query can return multiple results that have the same document reference
and belong to the same text, for example, different pages that belong to the same book (if you
displayed each of these results using Print=AllSections you would receive the same text every
time).
You can prevent IDOL server from returning different sections of the same source text by adding
Combine=Simple to the query. IDOL server will only display the section that has the highest
conceptual similarity to the query (unless you add Print=AllSections to the query, in which case
the entire source text would be displayed). If multiple sections have the same conceptual
relevance, IDOL server returns the one with the lowest section number.
For example:
http://<host>:<port>/action=Query&Text=The Moonstone&Combine=Simple
In this example, if several results derive from the same source text, only the result that has the
highest relevance to the query's text is displayed.
FieldCheck
Results are combined based on the hash value of their FieldCheckType field (see
FieldCheckType fields on page 291). The FieldCheckType field holds a value that is frequently
used to restrict results (for example, a field that stores category names). When a FieldCheckType
field is indexed, IDOL server stores it in a fast-look-up table in memory, so it can be returned
quickly.
Note: if you set URLAnalysis to true in the [Server] section of your IDOL server configuration file,
you cannot identify a field as a FieldCheckType field, as IDOL server automatically uses the domain of
the URL it finds in the documents' Reference fields as the FieldCheck value.
<reference_fields>
A plus, space or comma separated list of Reference fields. If a query produces several results that
contain the same value in one or more of the specified Reference fields, IDOL server only returns
the most relevant result. If several results have the same relevance, the result with the highest
DOCID is returned (unless a Sort option has been enabled that overrides this).
For example:
http://<host>:<port>/action=Query&Text=The Moonstone&Combine=DRETITLE
In this example, if several results contain the same value in the DRETITLE field, only the result that
has the highest relevance to the query's text is displayed.

Note:
When you instruct IDOL server to combine using a specific Reference field, it automatically uses
any field that is listed for PropertyFieldCSVs alongside this Reference field in IDOL server's
configuration file to combine as well. To ensure that IDOL server only combines using the specified
field, you can set up an individual process (See Processing fields and documents that contain
specific fields on page 281) that identifies this field as a Reference field. If you want to combine
using multiple Reference fields, it can be useful to set up a separate process that identifies each of
these fields as Reference fields.
For example:
[SetupReferenceFields]
Property=ReferenceFields
PropertyFieldCSVs=*/DREREFERENCE,*/url
[CombineField1]
Property=ReferenceFields
PropertyFieldCSVs=*/DRETITLE
[CombineField2]
Property=ReferenceFields
PropertyFieldCSVs=*/CombineField
If you instructed IDOL server to combine using the DRETITLE and CombineField fields and they
were listed alongside the DREREFERENCE and url fields in the [SetupReferenceFields] section,
IDOL server would automatically use the DREREFERENCE and url fields to combine as well.
Note:
You can combine the Simple and FieldCheck options, in which case you must specify Simple first.
For example:
Combine=Simple+FieldCheck
If you set Combine to <reference_fields>, you cannot combine the fields with another Combine
option.
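
The selection rule that this section describes for Combine=Simple (keep the most relevant section of each source text, and on a tie keep the lowest section number) can be sketched in Python as follows. This is an illustration of the rule only, not IDOL server's implementation; the result dictionaries and their keys are hypothetical.

```python
def combine_simple(results):
    """Keep one result per document reference.

    The result with the highest weight wins; if weights are equal, the
    result with the lowest section number wins.
    """
    best = {}
    for result in results:
        current = best.get(result["reference"])
        if (
            current is None
            or result["weight"] > current["weight"]
            or (
                result["weight"] == current["weight"]
                and result["section"] < current["section"]
            )
        ):
            best[result["reference"]] = result
    # Return the surviving results in order of decreasing relevance.
    return sorted(best.values(), key=lambda r: r["weight"], reverse=True)


results = [
    {"reference": "moonstone.txt", "section": 0, "weight": 81},
    {"reference": "moonstone.txt", "section": 7, "weight": 93},
    {"reference": "moonstone.txt", "section": 3, "weight": 93},
    {"reference": "woman_in_white.txt", "section": 1, "weight": 85},
]
for result in combine_simple(results):
    print(result["reference"], result["section"], result["weight"])
# moonstone.txt 3 93
# woman_in_white.txt 1 85
```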


Displaying additional fields with results


When IDOL server returns results, it displays by default only the result's reference, ID, section, weight,
links, database and title fields.
You can display additional fields for results using the following methods:

Configure IDOL server to automatically display additional fields

Display additional fields for individual queries

Configure IDOL server to automatically display additional fields
To configure IDOL server to automatically display additional fields:
1. Open IDOL server's configuration file in a text editor.

2. List a print fields process in the [FieldProcessing] section.


For example:
[FieldProcessing]
Number=2
0=MyFirstProcess
1=PrintFields

3. Create a section for the print fields process that you have listed, in which you create a property for
the process (a property is later defined by one or more applicable configuration parameters).
Identify the fields that you want to associate with the process.
Note: the properties that you create must not have the same name as processes.
For example:
[PrintFields]
Property=Print
PropertyFieldCSVs=*/AUTHOR,*/TITLE,*/ISBN

4. List the property that you have created in the [Properties] section.
For example:
[Properties]
0=MyFirstProperty
1=Print

5. Create a section for the property in which you set the PrintType parameter to true. This displays
the associated PropertyFieldCSVs fields for query results.
For example:
[Print]
PrintType=true

6. Save and close IDOL server's configuration file, and restart IDOL server to apply your changes.

Display additional fields for individual queries


If you are sending a Query, Suggest, SuggestOnText or GetContent action to IDOL server, you can
display additional fields for results by adding one of the following action parameters to the query string:
Print
Allows you to specify the type of field that you want to display in addition to the fields that IDOL
server displays by default (that is the fields it displays out-of-the-box and the fields that IDOL
server has been configured to display automatically). Please refer to the IDOL server online
help for details on the available field types (see Displaying online help on page 61).
For example:
http://12.3.4.56:4000/action=Query&Text=Hogwarts school of witchcraft and wizardry&Print=Index
This query returns results that are conceptually similar to the specified query text. Each result is
returned with fields that have been set up as Index fields in IDOL server's configuration file.
PrintFields
Allows you to specify the fields that you want to display for all results. This overrides any fields
that have been set up in IDOL server's configuration file to be automatically displayed for all
results (the fields that IDOL server displays out-of-the-box are still displayed).
For example:
http://12.3.4.56:4000/action=Query&Text=Hogwarts school of witchcraft and wizardry&PrintFields=Author,Title
This query returns results that are conceptually similar to the specified query text. Each result is
returned with its Author and Title fields in addition to the fields that IDOL server displays
out-of-the-box.


Fields

30. Fields
Data is passed to IDOL server (for example, from Autonomy Connectors) in the form of IDX or XML
fields. IDOL server stores all the fields that it receives, so that you can search any of the fields using
field text queries. However, in order to make sure that IDOL server's performance is optimized, you
need to determine how it should process and store the fields it receives (see Setting up field
indexing on page 67).
This is done through IDOL server's configuration file where you can associate some fields with special
properties, for example, in order to instruct IDOL server to treat these fields (or documents that contain
them) in a specific way or read specific information from them. Note that you can associate a field with
more than one property, provided the properties don't clash.
You can associate fields with the following properties:

ACLType               Fields that hold ACLs (Access Control Lists).
DatabaseType          Fields that hold the database that documents belong to.
DateType              Fields that hold the date of documents.
DocumentTrackingType  Fields that hold the tracking IDs of documents.
ExpireDateType        Fields that hold the expiry dates of documents.
FieldCheckType        A field that occurs in a large number of documents and holds a
                      value that is frequently used to restrict query results.
FlattenIndexType      Fields that originate from hierarchically structured documents
                      and whose content is stored as one level.
HiddenType            Fields whose content is hidden.
HighlightType         If fields contain terms that match a query, these terms are
                      highlighted (see Highlight fields on page 297).
Index                 Fields that are stored as Index fields (see Index fields on
                      page 285).
InvertedAgentType     Fields that are contained within inverted agents.
LanguageType          Fields that hold the language type of documents.
NumericDateType       Fields that hold numeric dates which should be memory
                      mapped (see NumericDateType fields on page 287).
NumericType           Fields that hold numeric data which should be memory
                      mapped (see Numerical fields on page 289).
ParametricType        Fields that hold parametric values.
PrintType             Fields whose content is displayed with results, if the query
                      action's Print parameter has been set to Fields.
ReferenceType         Fields that hold document references (see Reference
                      fields on page 293).
SectionBreakType      Fields that hold the section number of documents that have
                      been split up by the Import module.
SecurityType          The security type of documents that contain associated fields.
SourceType            Fields that are used to generate summaries and to suggest
                      conceptually similar documents.
SynonymType           Fields that hold the name of the synonym job whose settings
                      apply to documents that contain associated fields.
TitleType             Fields that hold document titles.
TrimSpaces            Fields from which multiple, leading or trailing spaces should be
                      removed before they are stored in IDOL server.
Weight                The factor by which the weight of terms in associated fields is
                      increased if they match query terms.

Please refer to the IDOL server online help (see Displaying online help on page 61) for further
details on the property settings that identify the field types.
For details on how to associate properties with fields, please refer to Processing fields and
documents that contain specific fields on page 281.


Processing fields and documents that contain specific fields
The [FieldProcessing] section in IDOL server's configuration file allows you to identify particular fields
in documents and, depending on their value, apply any type of processing to them or the document
that contains them during the indexing process.
This means that you can apply multiple processes to documents without having to set up a
configuration section for each process combination.
Note: when identifying fields you should use the format /FieldName to match root-level fields,
*/FieldName to match all fields except root-level or /Path/FieldName to match fields that the specified
path points to.

To apply processes to specific fields or documents that contain specific fields:


1. List the processes that you want to apply to fields in the [FieldProcessing] section.
For example:
[FieldProcessing]
Number=4
0=MyFirstProcess
1=IndexFields
2=MyCombinedProcess
3=IndexAndWeightHigher

2. Create a section for each of the processes that you have listed, in which you create a property for
the process (a property is later defined by one or more applicable configuration parameters).
Identify the fields that you want to associate with the processes.
You can use the PropertyMatch parameter to identify a specific value that fields must have in
order to be processed (this is useful if you are setting up a process that identifies security or
language fields).
Note: the properties that you create must not have the same name as processes.
For example:
[MyFirstProcess]
Property=MyFirstProperty
PropertyFieldCSVs=*/MyField,*/MySecondField
PropertyMatch=*myString*

[IndexFields]
Property=MySecondProperty
PropertyFieldCSVs=*/DRECONTENT,*/DRETITLE
[MyCombinedProcess]
Property=MyCombinedProperty
PropertyFieldCSVs=*/MyDateField,*/MyIndexField
[IndexAndWeightHigher]
Property=IndexHigherWeight
PropertyFieldCSVs=*/SUMMARIES
3. List all the properties that you have created in a [Properties] section.
For example:
[Properties]
0=MyFirstProperty
1=MySecondProperty
2=MyCombinedProperty
3=IndexHigherWeight

4. Create a section for each of the properties and specify appropriate configuration settings for each.
These configuration parameters define the processes that are applied to all the fields (or all
documents that contain the fields) that you have previously associated with the processes.
For example:
[MyFirstProperty]
HiddenType=true
[MySecondProperty]
Index=true
[MyCombinedProperty]
DateType=true
Index=true
[IndexHigherWeight]
Index=true
Weight=2

Note: for details on available configuration settings please refer to IDOL server's configuration online
help (See Displaying help on configuration settings on page 389).

Example:
[FieldProcessing]
Number=6
0=IndexFields
1=IndexAndWeightHigher
2=SectionBreakFields
3=DateFields
4=DatabaseFields
5=SetReferenceFields
[IndexFields]
// Controls which fields are indexed
Property=Index
PropertyFieldCSVs=*/DRECONTENT,*/DRETITLE
[IndexAndWeightHigher]
// Fields which are indexed with a weight
Property=IndexWeight
PropertyFieldCSVs=*/SUMMARIES
[SectionBreakFields]
// Field containing document section number
Property=Section
PropertyFieldCSVs=*/DRESECTION
[DateFields]
// Fields containing the document date
Property=Date
PropertyFieldCSVs=*/DREDATE,*/harvest_time
[DatabaseFields]
// CSV of field names that define the document's database
Property=Database
PropertyFieldCSVs=*/DREDBNAME
[SetReferenceFields]
//CSV of fields that define the document's URL
Property=Reference
PropertyFieldCSVs=*/DREREFERENCE,*/DRETITLE

//---------------------------Properties----------------------//
[Properties]
0=Index
1=IndexWeight
2=Section
3=Date
4=Database
5=Reference
[Index]
Index=TRUE
[IndexWeight]
Index=TRUE
Weight=2
[Section]
SectionBreakType=TRUE
[Date]
DateType=TRUE
[Database]
DatabaseType=TRUE
[Reference]
ReferenceType=TRUE
TrimSpaces=TRUE
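
Because each process must be numbered in [FieldProcessing], have its own section, and name a property that is both listed in [Properties] and defined in its own section, it can be useful to check a configuration before you restart IDOL server. The following Python sketch performs such a check with the standard configparser module. It is not an Autonomy tool; the function name check_field_processing and the sample configuration are hypothetical, and the sketch assumes that the file uses only the section, Name=Value and // comment syntax shown in the example above.

```python
import configparser


def check_field_processing(config_text):
    """Return a list of problems found in the field processing setup."""
    parser = configparser.ConfigParser(
        comment_prefixes=("//", "#", ";"), interpolation=None, strict=False
    )
    parser.optionxform = str  # keep parameter names exactly as written
    parser.read_string(config_text)

    if not parser.has_section("FieldProcessing"):
        return ["missing [FieldProcessing] section"]
    processing = parser["FieldProcessing"]
    listed = set()
    if parser.has_section("Properties"):
        listed = {v for k, v in parser["Properties"].items() if k.isdigit()}

    problems = []
    # Entries are expected to be numbered 0 .. Number-1 without gaps.
    for index in range(int(processing.get("Number", "0"))):
        process = processing.get(str(index))
        if process is None:
            problems.append(f"[FieldProcessing] has no entry {index}")
        elif not parser.has_section(process):
            problems.append(f"no [{process}] section")
        else:
            prop = parser[process].get("Property")
            if prop is None:
                problems.append(f"[{process}] sets no Property")
            elif prop not in listed:
                problems.append(f"property {prop} is not listed in [Properties]")
            elif not parser.has_section(prop):
                problems.append(f"no [{prop}] section")
    return problems


SAMPLE = """
[FieldProcessing]
Number=2
0=IndexFields
1=DateFields

[IndexFields]
// Controls which fields are indexed
Property=Index
PropertyFieldCSVs=*/DRECONTENT,*/DRETITLE

[DateFields]
Property=Date
PropertyFieldCSVs=*/DREDATE

[Properties]
0=Index
1=Date

[Index]
Index=TRUE

[Date]
DateType=TRUE
"""

print(check_field_processing(SAMPLE))  # []
```

An empty list means that every listed process resolves to a defined property. A gap in the numbering (for example, entries 0 and 2 with Number=2) is reported as a missing entry.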


Index fields
You should store fields that contain text which you want to query frequently as Index fields. Index fields
are processed linguistically when they are stored in IDOL server. This means that stemming and
stoplists are applied to the text in Index fields before it is stored, which allows IDOL server to process
queries for these fields more quickly (typically DRETITLE and DRECONTENT are fields that should be
set up as Index fields).
You should not store URLs or content that you are unlikely to use in Index fields. You should also not
store fields as Index fields that will be queried frequently but whose values are only ever going to be
queried in their entirety. It is more efficient to query such values using a field specifier (for example,
MATCH).
Also, you should not store fields that contain numeric values or dates as Index fields. Instead, store
these fields as numerical fields and numeric date type fields (see Numerical fields on page 289 and
NumericDateType fields on page 287).

Setting up Index fields


1. Open IDOL server's configuration file in a text editor.

2. List an indexing process in the [FieldProcessing] section.


For example:
[FieldProcessing]
Number=3
0=MyFirstProcess
1=MySecondProcess
2=IndexingFields

3. Create a section for the indexing process, in which you create a property for the process (a
property is later defined by one or more applicable configuration parameters). Identify the fields
that you want to associate with the process.
You can use the PropertyMatch parameter to identify a specific value that fields must have in
order to be processed.
Note: the properties that you create must not have the same name as processes.
For example:
[MyFirstProcess]
Property=MyFirstProperty
PropertyFieldCSVs=*/MyField,*/MySecondField
PropertyMatch=*myString*

[MySecondProcess]
Property=MySecondProperty
PropertyFieldCSVs=*/MyOtherField,*/MyOtherSecondField
[IndexingFields]
Property=IndexFields
PropertyFieldCSVs=*/DRECONTENT,*/DRETITLE
4. List the properties that you have created in a [Properties] section.


For example:
[Properties]
0=MyFirstProperty
1=MySecondProperty
2=IndexFields

5. Create a section for your indexing property in which you set the Index parameter to true.
For example:
[MyFirstProperty]
HiddenType=true
[MySecondProperty]
Index=true
[IndexFields]
Index=true

6. Save IDOL server's configuration file and restart your IDOL server in order to apply your
changes.


NumericDateType fields
You can configure IDOL server to identify fields that contain dates. When these fields are indexed,
IDOL server stores them in a fast lookup table in memory, so it can quickly return the fields.
IDOL server converts dates to numerical values (epoch seconds) and identifies the fields that contain
the numerical date values.

Setting up memory mapping for numerical date fields


1. Open IDOL server's configuration file in a text editor.

2. List a process that identifies numerical date fields in the [FieldProcessing] section.
For example:
[FieldProcessing]
Number=2
0=MyFirstProcess
1=NumericDateFields

3. Create a section for each process that you have listed, in which you create a property for it (a
property is later defined by one or more applicable configuration parameters). Identify the fields
that you want to associate with the process.
Note: the properties that you create must not have the same name as processes.
For example:
[MyFirstProcess]
Property=MyProperty
PropertyFieldCSVs=*/MyField,*/MyOtherField
[NumericDateFields]
Property=NumDate
PropertyFieldCSVs=*/BIRTHDAY,*/STARTDATE

4. List the property that you have created in the [Properties] section.
For example:
[Properties]
0=MyProperty
1=NumDate

5. Create a section for the property in which you set the NumericDateType parameter to true. This
enables IDOL server to memory map the associated PropertyFieldCSVs fields, and identify them
as fields that contain date values.
For example:
[NumDate]
NumericDateType=true

6. Save IDOL server's configuration file and restart IDOL server to apply your changes.

If you now send a query for a specific value that is stored in the BIRTHDAY field, IDOL server will
memory map the range that this value is in, so it can return results more quickly next time a value that
lies in this range is queried.
Example:
http://12.3.4.56:4000/action=Query&FieldText=RANGE{01/01/1980,31/12/1980}:BIRTHDAY
A document's BIRTHDAY field must contain a numerical date value that is between 01/01/1980 and
31/12/1980 for this document to be returned.
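
To illustrate what a date looks like as epoch seconds, the following Python sketch converts the two dates from the example above. It assumes the DD/MM/YYYY format used in the example and treats each date as midnight UTC; this section does not state which time zone or time of day IDOL server applies, so treat the numbers as an illustration of the concept. The function name to_epoch_seconds is hypothetical.

```python
import calendar
from datetime import datetime


def to_epoch_seconds(date_text):
    """Convert a DD/MM/YYYY date to seconds since 1 January 1970 (UTC)."""
    parsed = datetime.strptime(date_text, "%d/%m/%Y")
    # timegm interprets the time tuple as UTC, independent of the local zone.
    return calendar.timegm(parsed.timetuple())


print(to_epoch_seconds("01/01/1980"))  # 315532800
print(to_epoch_seconds("31/12/1980"))  # 347068800
```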


Numerical fields
You can configure IDOL server to identify fields that contain numerical values. When these fields are
indexed, IDOL server stores them in a fast-look-up table in memory, so it can quickly return the field.
Note that a numerical field can contain a comma-separated list of numbers, each of which will be
stored as a numeric value for this field, for this document.

Setting up numerical fields to speed up numerical queries


1. Open IDOL server's configuration file in a text editor.

2. List a process that identifies numerical fields in the [FieldProcessing] section.


For example:
[FieldProcessing]
Number=2
0=MyFirstProcess
1=PriceFields

3. Create a section for each process that you have listed, in which you create a property for it (a
property is later defined by one or more applicable configuration parameters). Identify the fields
that you want to associate with the process.
Note: the properties that you create must not have the same name as processes.
For example:
[MyFirstProcess]
Property=MyProperty
PropertyFieldCSVs=*/MyField,*/MyOtherField
[PriceFields]
Property=Price
PropertyFieldCSVs=*/PRICE

4. List the property that you have created in the [Properties] section.
For example:
[Properties]
0=MyProperty
1=Price

5. Create a section for the property in which you set the NumericType parameter to true. This
enables IDOL server to memory map the associated PropertyFieldCSVs fields.
For example:
[Price]
NumericType=true

6. Save IDOL server's configuration file and restart IDOL server to apply your changes.

If you now send a query for a specific value that is stored in the PRICE field, IDOL server will memory
map the range that this value is in, so it can return results more quickly next time a value that lies in
this range is queried.
Examples:
http://12.3.4.56:4000/action=Query&FieldText=NRANGE{50,100}:PRICE
A document's PRICE field must contain a number between 50 and 100 (including decimal numbers)
for this document to be returned.

http://12.3.4.56:4000/action=Query&Text=computer&Sort=PRICE:numberincreasing
The results that IDOL server returns for the query are sorted according to the values that their PRICE
fields contain. The result whose PRICE field contains the smallest value is listed first, followed by
results with increasing values in the PRICE field.


FieldCheckType fields
You can configure IDOL server to identify a field contained in a large number of documents whose
entire value is frequently used to restrict results (for example, a field that stores category names).
When this field is indexed, IDOL server stores it in a fast-look-up table in memory, so it can quickly
return the field.
Note: if you set URLAnalysis to true in the [Server] section of your IDOL server configuration file,
you cannot identify a field as a FieldCheckType field, as IDOL server automatically uses the domain it
finds in the documents' Reference fields as the FieldCheck value.

Setting up FieldCheckType fields


1. Open IDOL server's configuration file in a text editor.

2. List a process that identifies FieldCheckType fields in the [FieldProcessing] section.


For example:
[FieldProcessing]
Number=2
0=MyFirstProcess
1=FieldCheckTypeIdentification

3. Create a section for each process that you have listed, in which you create a property for it (a
property is later defined by one or more applicable configuration parameters). Identify the fields
that you want to associate with the process.
Note: the properties that you create must not have the same name as processes.
For example:
[MyFirstProcess]
Property=MyProperty
PropertyFieldCSVs=*/MyField,*/MyOtherField
[FieldCheckTypeIdentification]
Property=FieldCheck
PropertyFieldCSVs=*/CATEGORY

4. List the property that you have created in the [Properties] section.
For example:
[Properties]
0=MyProperty
1=FieldCheck

5. Create a section for the property in which you set the FieldCheckType parameter to true. This
enables IDOL server to memory map the associated PropertyFieldCSVs fields.
For example:
[FieldCheck]
FieldCheckType=true

6. Save IDOL server's configuration file and restart IDOL server to apply your changes.

When you now use a Query, Suggest or SuggestOnText action to query for results, you can:

use the Combine action parameter to restrict the result output to the most relevant result for
each available FieldCheckType field value (by setting it to FieldCheck).

use the FieldCheck action parameter to restrict the result output to documents whose
FieldCheckType field matches a specific value (this is also available for the
GetQueryTagValues action).

Combine parameter example


In this example, IDOL server is configured to store the Category field as a FieldCheckType field.
The following query is executed:
http://12.3.4.56:4000/action=Query&Text=The best thing to do in your spare time&Combine=FieldCheck
If IDOL server contains 50 documents that match the query text, of which 8 contain a Category field
with the value Sport, 5 contain a Category field with the value Gardening and 1 contains a Category
field with the value Cooking, the above query returns only 3 results:

the most relevant of the documents whose Category contains the value Sport

the most relevant of the documents whose Category contains the value Gardening

the document whose Category contains the value Cooking.
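
The counts in this example can be reproduced with a short Python sketch. It is an illustration only: the result dictionaries are hypothetical, and the sketch drops the 36 matching documents that have no Category value, which is inferred from the fact that the example returns 3 results rather than stated explicitly in this guide.

```python
def combine_field_check(results, field="Category"):
    """Keep only the most relevant result for each value of the given field."""
    best = {}
    for result in results:
        value = result.get(field)
        if value is None:
            continue  # assumption: results without a value are not returned
        if value not in best or result["weight"] > best[value]["weight"]:
            best[value] = result
    return sorted(best.values(), key=lambda r: r["weight"], reverse=True)


# 50 matching documents: 8 Sport, 5 Gardening, 1 Cooking, 36 without a Category.
matches = (
    [{"id": i, "weight": 50 + i, "Category": "Sport"} for i in range(8)]
    + [{"id": 8 + i, "weight": 40 + i, "Category": "Gardening"} for i in range(5)]
    + [{"id": 13, "weight": 30, "Category": "Cooking"}]
    + [{"id": 14 + i, "weight": 20} for i in range(36)]
)
combined = combine_field_check(matches)
print(len(matches), len(combined))  # 50 3
print([result["Category"] for result in combined])
# ['Sport', 'Gardening', 'Cooking']
```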

FieldCheck parameter example


In this example, IDOL server is configured to store the Color field as a FieldCheckType field.
The following query is executed:
http://12.3.4.56:4000/action=Query&Text=A fast sports car&FieldCheck=Red
The above query only returns results whose content matches the specified Text and whose
FieldCheckType field has the value Red.


Reference fields
Reference fields are used to identify documents. Before a document is indexed into IDOL server, you
have to set up a field process that determines which of the fields in a document will be used as its
Reference field (note that a document can have multiple Reference fields).
At index time Reference fields can be used to eliminate duplicate copies of documents (see Using
Reference fields to eliminate duplicate copies of documents during indexing on page 105). At
query time Reference fields can be used to filter results (for example, by using the Combine action
parameter or by specifying references that results must or mustn't match, see Using Reference fields
to filter results at query time on page 272).
Note that if you want to eliminate duplicate document copies and use the Combine action parameter,
you should set up separate Reference fields for these processes (see Simultaneously using
KillDuplicates and Combine on Reference fields on page 295).

Setting up Reference fields


Note that you must set up a field process to identify Reference fields before you start indexing
documents into IDOL server.
1. Open IDOL server's configuration file in a text editor.

2. In the [FieldProcessing] section add a process that identifies Reference fields.


For example:
[FieldProcessing]
Number=3
0=MyFirstProcess
1=MySecondProcess
2=SetReferenceFields

3. Create a section for the process that you have added, in which you create a property for the
process (a property is later defined by one or more applicable configuration parameters). Identify
the fields that you want to associate with the process.
Note: the properties that you create must not have the same name as processes.
For example:
[MyFirstProcess]
Property=MyFirstProperty
PropertyFieldCSVs=*/MyField,*/MySecondField

[MySecondProcess]
Property=MySecondProperty
PropertyFieldCSVs=*/MyThirdField
[SetReferenceFields]
Property=Reference
PropertyFieldCSVs=*/DREREFERENCE,*/URL
4. List all the properties that you have created in a [Properties] section.
For example:
[Properties]
0=MyFirstProperty
1=MySecondProperty
2=Reference

5. Create a section for each of the properties and specify appropriate configuration settings for each.
These configuration parameters define the processes that are applied to all the fields (or all
documents that contain the fields) that you have previously associated with the processes.
For example:
[MyFirstProperty]
HiddenType=true
[MySecondProperty]
Index=true
[Reference]
ReferenceType=TRUE
TrimSpaces=TRUE

6. Save IDOL server's configuration file and start IDOL server. You can now index documents into
IDOL server.

Note:
If you don't set up a field process that identifies Reference fields, IDOL server automatically allocates a
unique number to each document that is indexed. This number will be used as the document's
reference.
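
The relationships between the [FieldProcessing], process, [Properties] and property sections described in the steps above can be cross-checked before you start IDOL server. The following Python sketch is not an Autonomy tool; it assumes, based only on the examples in this guide, that processes are numbered from 0 without gaps and that the file can be read by Python's configparser module. The function names are hypothetical.

```python
import configparser

# Sample configuration copied from the steps above.
SAMPLE = """\
[FieldProcessing]
Number=3
0=MyFirstProcess
1=MySecondProcess
2=SetReferenceFields

[MyFirstProcess]
Property=MyFirstProperty
PropertyFieldCSVs=*/MyField,*/MySecondField

[MySecondProcess]
Property=MySecondProperty
PropertyFieldCSVs=*/MyThirdField

[SetReferenceFields]
Property=Reference
PropertyFieldCSVs=*/DREREFERENCE,*/URL

[Properties]
0=MyFirstProperty
1=MySecondProperty
2=Reference

[MyFirstProperty]
HiddenType=true

[MySecondProperty]
Index=true

[Reference]
ReferenceType=TRUE
TrimSpaces=TRUE
"""


def load(text):
    """Parse configuration text, keeping parameter names as written."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # do not lowercase parameter names
    cfg.read_string(text)
    return cfg


def check_field_processing(cfg):
    """Return a list of problems found (empty if the sections are consistent)."""
    problems = []
    processing = cfg["FieldProcessing"]
    count = int(processing.get("Number", "0"))
    listed = set()
    if cfg.has_section("Properties"):
        listed = {v for k, v in cfg["Properties"].items() if k.isdigit()}
    for i in range(count):
        process = processing.get(str(i))
        if process is None:
            # Assumption: entries are numbered 0..Number-1 without gaps.
            problems.append(f"[FieldProcessing] has no entry {i} (Number={count})")
            continue
        if not cfg.has_section(process):
            problems.append(f"no [{process}] section")
            continue
        prop = cfg[process].get("Property")
        if prop is None:
            problems.append(f"[{process}] sets no Property")
        elif prop not in listed:
            problems.append(f"{prop} is not listed in [Properties]")
        elif not cfg.has_section(prop):
            problems.append(f"no [{prop}] section")
    return problems


print(check_field_processing(load(SAMPLE)))  # prints []
```

An empty list means that every listed process has its own section, and that every property is both listed in [Properties] and defined in a section of its own.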


Simultaneously using KillDuplicates and Combine on Reference fields
When you instruct IDOL server to eliminate duplicate document copies at index time using a specific
Reference field (by setting the KillDuplicates parameter in IDOL server's configuration file, see Using
Reference fields to eliminate duplicate copies of documents during indexing on page 105), it
automatically also uses any field that is listed for PropertyFieldCSVs alongside this Reference field in
the IDOL server configuration to eliminate duplicate document copies.
However, IDOL server cannot use the same field both for deduplication and for the Combine action
parameter, because the Combine operation (which is carried out at query time) clashes with IDOL
server eliminating duplicate copies at index time. This means that, if you want to eliminate duplicate
document copies and use the Combine action parameter, you should set up separate Reference fields
for these processes:
1. Open IDOL server's configuration file in a text editor.

2. In the [FieldProcessing] section add two processes that identify Reference fields (note that you
must set up a field process to identify Reference fields before you start indexing documents into
IDOL server). One of them will be used to eliminate duplicate copies of documents and the other
one will be used for the Combine operation.
For example:
[FieldProcessing]
Number=4
0=MyFirstProcess
1=MySecondProcess
2=SetUpReferenceFields
3=SetUpMoreReferenceFields

3. Create a section for the processes that you have added, in each of which you create a property for
the respective process (a property is later defined by one or more applicable configuration
parameters). Identify the fields that you want to associate with each process.
Note: the properties that you create must not have the same name as processes.
For example:
[MyFirstProcess]
Property=MyFirstProperty
PropertyFieldCSVs=*/MyField,*/MySecondField
[MySecondProcess]
Property=MySecondProperty
PropertyFieldCSVs=*/MyThirdField
[SetUpReferenceFields]
Property=ReferenceFields
PropertyFieldCSVs=*/DREREFERENCE,*/URL

[SetUpMoreReferenceFields]
Property=MoreReferenceFields
PropertyFieldCSVs=*/DRETITLE
4. List all the properties that you have created in a [Properties] section.
For example:
[Properties]
0=MyFirstProperty
1=MySecondProperty
2=ReferenceFields
3=MoreReferenceFields

5. Create a section for each of the properties and specify appropriate configuration settings for each.
These configuration parameters define the processes that are applied to all the fields (or all
documents that contain the fields) that you have previously associated with the processes.
For example:
[MyFirstProperty]
HiddenType=true
[MySecondProperty]
Index=true
[ReferenceFields]
ReferenceType=TRUE
TrimSpaces=TRUE
[MoreReferenceFields]
ReferenceType=TRUE
TrimSpaces=TRUE

6. Save IDOL server's configuration file and start IDOL server.

Once you have indexed documents into IDOL server, you can use, for example, the
*/DREREFERENCE field to eliminate duplicate copies of documents. (IDOL server then automatically
also uses the */URL field for deduplication because it is listed alongside */DREREFERENCE for
PropertyFieldCSVs.) This leaves you free to use the */DRETITLE field for the Combine operation.
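
The grouping rule described above (every field that shares a PropertyFieldCSVs list with the deduplication field is also used for deduplication) can be illustrated with a short Python sketch. The function names are hypothetical and the data is copied from the example configuration; the sketch only models the rule as this guide states it and does not call IDOL server.

```python
# PropertyFieldCSVs values of the two Reference field processes in the example.
PROCESSES = {
    "SetUpReferenceFields": "*/DREREFERENCE,*/URL",
    "SetUpMoreReferenceFields": "*/DRETITLE",
}


def fields_listed_with(processes, field):
    """Return every field that shares a PropertyFieldCSVs list with `field`."""
    grouped = set()
    for csv in processes.values():
        fields = [name.strip() for name in csv.split(",")]
        if field in fields:
            grouped.update(fields)
    return grouped


def safe_for_combine(processes, dedup_field, combine_field):
    """True if `combine_field` is not drawn into deduplication on `dedup_field`."""
    return combine_field not in fields_listed_with(processes, dedup_field)


print(sorted(fields_listed_with(PROCESSES, "*/DREREFERENCE")))
print(safe_for_combine(PROCESSES, "*/DREREFERENCE", "*/DRETITLE"))
print(safe_for_combine(PROCESSES, "*/DREREFERENCE", "*/URL"))
```

In this model */DRETITLE is safe to use for Combine, whereas */URL is not, because it is listed alongside */DREREFERENCE.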


Highlight fields
When you execute a Query, Suggest or SuggestOnText action command, you can highlight
sentences or words in the results that are related to the terms in the query (or the terms in the text or
document that you are suggesting on).
IDOL server checks which fields highlighting applies to and then highlights, in the results that it
returns, all sentences or words that are based on these terms.

Setting up Highlight fields


1. Open IDOL server's configuration file in a text editor.

2. List a highlighting process in the [FieldProcessing] section.
For example:
[FieldProcessing]
Number=2
0=MyFirstProcess
1=HighlightFields

3. Create a section for each process that you have listed, in which you create a property for the
process (a property is later defined by one or more applicable configuration parameters). Identify
the fields that you want to associate with the process.
Note: the properties that you create must not have the same name as processes.
For example:
[MyFirstProcess]
Property=MyProperty
PropertyFieldCSVs=*/MyField,*/MyOtherField
[HighlightFields]
Property=Highlight
PropertyFieldCSVs=*/DRETITLE,*/DRECONTENT

4. List the property that you have created in the [Properties] section.
For example:
[Properties]
0=MyProperty
1=Highlight

5. Create a section for the property in which you set the HighlightType parameter to true. This
enables the highlighting of all matched terms that are contained in the associated
PropertyFieldCSVs fields.
For example:
[Highlight]
HighlightType=true

6. Save and close IDOL server's configuration file and restart IDOL server to apply your changes.


Agentboolean fields
If you are upgrading to IDOL server from legacy technologies that use Boolean agents (a Boolean or
Proximity expression) to categorize documents, you can store these agents in agentboolean IDOL
server fields. You can then query IDOL server with text and an agentboolean field to return categories
that this text matches.

Storing Boolean agents in agentboolean fields


You can store Boolean agents in agentboolean fields manually (see Appendix D: manually creating
IDX files on page 431) or using an appropriate Autonomy connector for your data source.
For example:
A legacy Boolean agent comprises the following Boolean expression:
cat AND mat
Use an Autonomy connector to import the Boolean agent into an IDX file where the Boolean agent
is identified as a DREFIELD (or do this manually), for example:
#DREREFERENCE 947344A0
#DRETITLE
Cat
#DREFIELD MyABField="cat AND mat"
#DRECONTENT
Professor Baldwick's old cat Ginger was curled up on the mat in front
of the fire, purring contentedly.
#DREENDDOC
This IDX file is stored in IDOL server's Category index, creating a Cat category that comprises the
Boolean expression cat AND mat.


Matching documents against agentboolean categories


You can combine a Query action with the AgentBooleanField action parameter in order to return
categories that contain a matching Boolean or Proximity expression in this field.
For example:
A document that needs to be categorized contains the following text:
The cat sat on the mat
You can use the document's text to query against IDOL server's Category index. The
AgentBooleanField action parameter allows you to specify the IDOL server field that stores the
legacy Boolean agent:
action=Query&Text=The cat sat on the mat&AgentBooleanField=MyABField
IDOL server returns the following categories for this:

categories that do not contain a MyABField field but match the query text in an Index
field (for example a DRECONTENT field).

categories that match the query text in an Index field (for example a DRECONTENT
field), and have a MyABField field which contains a Boolean or Proximity expression
that matches The cat sat on the mat (for example, cat AND mat, cat OR mat, cat
BEFORE mat and cat DNEAR1 sat all match The cat sat on the mat, therefore
categories that contain any of these Boolean/Proximity expressions in a MyABField field
would be returned).
Categories whose MyABField fields contain, for example, cat AND mat AND dog or
mat BEFORE dog would not be returned.
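
When you send such a query from a script, the query text should be percent-encoded so that spaces and special characters cannot break the action string. The following Python sketch builds the example query shown above. The helper name is hypothetical, and the host and port are placeholders; only the action and parameter names come from this guide.

```python
from urllib.parse import quote


def build_action_url(host, port, action, **params):
    """Build an action URL in the form used by the examples in this guide."""
    parts = [f"action={action}"]
    for name, value in params.items():
        # Percent-encode each value so that spaces and '&' cannot break the URL.
        parts.append(f"{name}={quote(str(value), safe='')}")
    return f"http://{host}:{port}/" + "&".join(parts)


url = build_action_url(
    "12.3.4.56", 4000, "Query",           # placeholder host and port
    Text="The cat sat on the mat",
    AgentBooleanField="MyABField",
)
print(url)
```

The resulting string can be pasted into a browser or passed to any HTTP client.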

Note: if you are always storing the Boolean agents in the same field, you can use the
AgentBooleanCacheField configuration parameter to load this field into memory, so that
agentboolean queries which use this field can be executed more quickly.

Tip:
You can use ACI, Alert and Cat tasks to automatically match documents that IDOL server receives
against agentboolean categories, automatically alert users to documents that match specific
categories and automatically categorize documents (see Processing data before indexing it on
page 73).


Meta fields
Meta fields are fields that IDOL server creates for documents at index time in order to display
information about the documents when they are returned as results for a query. Some of a document's
meta fields are always displayed when IDOL server returns this document as a query result. You can
display all of a document's meta fields by adding XMLMeta=true to your query.
The following meta fields are displayed for results:

<autn:baseid>
If the document has multiple sections, this is the ID of the document's first section. If the
document is not sectioned, this is the same as the document's ID.

<autn:content>
The document's text content.

<autn:database>
The IDOL server database in which the document is stored.

<autn:date>
The date (in epoch seconds) the document was created. This date is read from the field
that has been identified by the DateType parameter in IDOL server's configuration file. If
no field has been identified, the date the document was indexed is used instead.

<autn:expiredate>
The date (in epoch seconds) the document will expire. This date is read from the field
that has been identified by the ExpireDateType parameter in IDOL server's
configuration file. When a document expires it is deleted from IDOL server or moved to a
different database (depending on what ExpireIntoDatabase has been set to in IDOL
server's configuration file).

<autn:id>
The document's ID. A document's ID is assigned to it at index time. If IDOL server is
compacted, the IDs of documents change.

<autn:language>
<autn:languageencoding>
<autn:languagetype>
The language, encoding and language type associated with the document. The
document's language type is read from the field that has been identified by the
LanguageType parameter in IDOL server's configuration file. The language and
encoding of the document are read from the Language and Encoding parameters that
have been set for this language type in the configuration file.
If no field from which the language type can be read has been identified, the
DefaultLanguageType that has been set in the configuration file is used instead,
unless Automatic Language Detection is enabled, or the document has been submitted
to IDOL server with an index command that sets a specific language type for the
document.

<autn:links>
A list of stemmed terms that are contained both in the query and in the result document.

<autn:reference>
The document's reference. This is read from the field that has been identified by the
ReferenceType parameter in IDOL server's configuration file. If no field has been
identified, IDOL server automatically generates a reference for the document at index
time.

<autn:section>
The number of sections the document has been split up into at index time.

<autn:title>
The document's title. This is read from the field that has been identified by the TitleType
parameter in IDOL server's configuration file. If no field has been identified, the
document is not given a title.

<autn:weight>
The percentage relevance that the document has to the query.
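
Because <autn:date> and <autn:expiredate> are returned in epoch seconds, client code usually converts them before displaying them. The following Python sketch shows one way to do this; the function name is hypothetical and the value is an illustrative number, not output from IDOL server.

```python
from datetime import datetime, timezone


def epoch_to_utc(epoch_seconds):
    """Convert an epoch-seconds value (string or number) to a UTC datetime."""
    return datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)


# Illustrative value, as it might appear in an <autn:date> element.
print(epoch_to_utc("1117584000").isoformat())  # prints 2005-06-01T00:00:00+00:00
```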


Changing field values


You can use the DREREPLACE command to change the values of fields or add fields to a document
after you have indexed content into IDOL server. Please see Changing field values in IDOL server
documents on page 375 for details.


Languages

32. Languages
IDOL server is based on probabilistic modeling and therefore does not require any form of language
dependent parsing, dictionaries or translation modules.
Treating words as abstract symbols of meaning allows Autonomy's technology to derive understanding
through the context in which symbols occur rather than a rigid definition of grammar. Slang and other
variations in language do not confuse the software.
Because it builds up a statistical understanding of the patterns in the data that it processes, IDOL
server can be trained on any language. The more information IDOL server is given about a particular type of
information (for example, legal terms, pharmaceutical developments, technology and so on), the more
understanding it gains of those topics.
A new language can be thought of as simply another type of information, for which IDOL server needs
enough material to learn from. Therefore, it is possible to mix more than one language in IDOL server
as long as the amounts for each language are sufficient to build its understanding.
The choice of language does not compromise the accuracy of the concepts extracted by IDOL server.
The underlying algorithm is the same regardless of the language used.

Autonomy's internationalization functionality enables:

Automatic language detection


IDOL server can detect the language and encoding of documents that it processes
automatically. This allows you to set up processes that are automatically applied to documents
or document metadata if they are in a specific language. For example, if a document is
identified as Chinese, the appropriate preliminary linguistic tools are automatically applied to it.
Note: if a document contains multiple languages, IDOL server determines which language it
contains most, and processes the document according to the settings for this language.

Cross-lingual systems
IDOL server can be used to set up cross-lingual systems. This allows you to produce
multilingual results for queries or to restrict results to documents in a specific language or
encoding. For example, an English query may return information both in English and Spanish.

While Autonomy's technology is language independent, it can be beneficial to use language
dependent features in order to optimize IDOL server's ability to match concepts irrespective of their
appearance in text. Autonomy therefore provides the following features:

Stemming
In many languages, groups of words share a common morphological root. Autonomy provides stemming
algorithms that reduce words to this form. This is useful because it allows concepts to be
matched regardless of the grammatical use of words. In English for example, the words "help",
"helpful", "helping" and "helped" can all be stripped down to their stem "help" without significant
loss of meaning.
Autonomy provides as standard a set of stemming algorithms for the most commonly used
languages. Stemming is applied after stopwords have been discarded both at index time (when
content is stored in IDOL server) and at query time (query text is stopped and stemmed before
it is matched).
Stoplists
Every language has words that do not carry much significant meaning. In grammatical terms
these are normally prepositions, conjunctions, auxiliary verbs and so on (for example, words
such as "the", "a", "and", "to" in English). These words can be safely ignored when processing
content.
Autonomy provides as standard a set of stoplists for the most commonly used languages.
Multiple encodings
Autonomy supports multiple encodings for languages such as Greek and Russian. Different
encodings can be used interchangeably which means that it does not matter which encoding a
language is given in. This makes it, for example, possible to query in one recognized encoding
for a language and receive results that are in other encodings.
Transliteration schemes
Transliteration is the ability to represent letters that do not belong to the Latin alphabet or words
that comprise accented letters with the corresponding characters of another alphabet. This
makes familiarity with the accents and special characters of different languages unnecessary.

Canonicalization of characters
Some encodings have more than one way of representing a character. The Japanese katakana
script, for example, can be written in full width or half width characters. Regardless of its width
the character in itself carries the same meaning.
Autonomy's software infrastructure uses canonicalization to ensure that all character forms are
treated equally through automatic conversion to an internationally recognized canonical form.


Running IDOL server in multiple languages


You can combine multiple languages in one IDOL server. Use the outline below to determine what you
have to do:

Before indexing your documents:


1. Check that the IDOL server configuration file contains the languages you want to use (see
Checking which languages are set up in IDOL server on page 311).

2. If the configuration file does not contain all the languages you want to use, you need to add the
missing languages (see Defining language types in IDOL server's configuration file on
page 312), and set up a field process that enables IDOL server to associate these languages with
documents (see Configuring IDOL server to associate language types with documents on
page 314).

3. Check the documents that you want to index into IDOL server:

if your documents (or some of your documents) do not contain fields from which IDOL server
can read their language type (see Adding language type fields to documents on
page 318), IDOL server assumes that the default language type applies to the documents
(see Defining a default language type in IDOL server's configuration file on page 319).
If you don't want the default language type to be associated with your documents, you need
to enable automatic language detection (see Enabling Automatic Language Detection on
page 320).
Alternatively, you can manually index your documents into IDOL server, adding the language
type of the documents to each index command (see Index commands on page 84). In this
case you have to index your documents in batches, where each batch must have the same
language type (that is language and encoding).

documents that contain fields from which IDOL server can read their language type are
automatically processed correctly (provided you have added any missing languages to IDOL
server's configuration file in step 2).

When querying IDOL server:


By default IDOL server only returns documents that have the same language as the language type
of the query. Depending on what you want a query to return, you can do one of the following:

If the language type (that is the query's language and encoding) of a query is not the default
language type that you have defined in IDOL server's configuration file, you need to include the
query's language type in the query string (see Specifying the language type of your query on
page 321).


If you want to return results in a specific encoding for your query, you need to include the
OutputEncoding parameter in your query (see Converting results to a specific encoding on
page 322). Note that you can only return encodings that are compatible with the query's language.

If you want to return documents in multiple languages for your query, you need to include the
AnyLanguage parameter in your query (see Returning documents in multiple languages for
your query on page 323).

If you want to return documents in a specific language for your query, you need to include the
AnyLanguage and MatchLanguage parameters in your query (see Returning documents in a
specific language for your query on page 324).


Checking which languages are set up in IDOL server


You can find out which languages IDOL server can process by looking at its configuration file.

To check which languages are set up in IDOL server


Open IDOL server's configuration file in a text editor and find the [LanguageTypes] section. This
section lists all language types that IDOL server can process. Each language type is defined by a
language and an encoding.
For example:
[LanguageTypes]
DefaultLanguageType=englishASCII
LanguageDirectory=C:\IDOLserver\IDOL\langfiles
0=englishASCII
1=englishUTF8
2=afrikaansASCII
3=afrikaansUTF8
4=albanianASCII
5=albanianUTF8
6=arabicARABIC_ISO
7=arabicARABIC
8=arabicUTF8
9=basqueASCII
10=basqueUTF8
In this example, IDOL server can process the eleven listed language types. The specified
DefaultLanguageType is applied to documents that do not contain fields from which IDOL server can
read their language type. Resource files (for example, stoplists) that IDOL server uses when
processing languages are stored in the specified LanguageDirectory.


Defining language types in IDOL server's configuration file


In order to run IDOL server in multiple languages, you need to specify a language type for each
language and encoding combination that you want IDOL server to be able to process.

To specify language types


Note: you must specify language types before you index data into IDOL server.
1. Open IDOL server's configuration file in a text editor.

2. Find the [LanguageTypes] section and list the language types that you want IDOL server to be
able to process (note that you must use ASCII characters when specifying a language type).
For example:
[LanguageTypes]
LanguageDirectory=C:\IDOLserver\IDOL\langfiles
0=englishASCII
1=englishUTF8
2=afrikaansASCII
3=afrikaansUTF8
4=albanianASCII
5=albanianUTF8
Note: resource files (for example, stoplists) that IDOL server uses when processing languages
are stored in the specified LanguageDirectory.

3. For each of the language types that you have listed, create a section with the same name. In this
section, specify appropriate settings that determine how IDOL server handles this language type.
For details on the configuration settings that you can use, please refer to IDOL server's online help
(see Displaying help on configuration settings on page 389).
For example:
[englishASCII]
LanguageCode=1
Language=ENGLISH
Encoding=ASCII
Stoplist=english.dat
IndexNumbers=1
[englishUTF8]
LanguageCode=2
Language=ENGLISH
Encoding=UTF8
Stoplist=english.dat
IndexNumbers=1

[afrikaansASCII]
LanguageCode=3
Language=AFRIKAANS
Encoding=ASCII
IndexNumbers=1
[afrikaansUTF8]
LanguageCode=4
Language=AFRIKAANS
Encoding=UTF8
IndexNumbers=1
[albanianASCII]
LanguageCode=5
Language=ALBANIAN
Encoding=ASCII
IndexNumbers=0
Note: in IDOL server the StripLanguage and CharConv settings have been deprecated (the
functionality has been automated according to language and encoding).
4. Save the configuration file.

5. You can now configure IDOL server to associate the language types you have defined with
documents (see Configuring IDOL server to associate language types with documents on
page 314).
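
Step 3 requires a section for every language type that is listed in [LanguageTypes]. The following Python sketch checks this for a shortened version of the example above, in which albanianUTF8 is listed but (as in the example) has no section of its own. The function name is hypothetical, and the sketch assumes that the file can be read by Python's configparser module.

```python
import configparser

# Shortened sample based on the example in this procedure.
SAMPLE = r"""
[LanguageTypes]
DefaultLanguageType=englishASCII
LanguageDirectory=C:\IDOLserver\IDOL\langfiles
0=englishASCII
1=englishUTF8
2=albanianASCII
3=albanianUTF8

[englishASCII]
Language=ENGLISH
Encoding=ASCII

[englishUTF8]
Language=ENGLISH
Encoding=UTF8

[albanianASCII]
Language=ALBANIAN
Encoding=ASCII
"""


def missing_language_sections(text):
    """Return the listed language types that have no section of the same name."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep parameter names as written
    cfg.read_string(text)
    # Only the numbered entries name language types; other keys are settings.
    listed = [v for k, v in cfg["LanguageTypes"].items() if k.isdigit()]
    return [name for name in listed if not cfg.has_section(name)]


print(missing_language_sections(SAMPLE))  # prints ['albanianUTF8']
```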


Configuring IDOL server to associate language types with documents
Once you have defined all the language types that you want IDOL server to process (see Defining
language types in IDOL server's configuration file on page 312), you need to set up a field process
that enables IDOL server to associate these language types with documents.
The way you need to configure this field process depends on the documents that you want to index
into IDOL server:

if all the documents that you want to index into IDOL server contain a field that contains the
language type, you can configure your IDOL server as follows:
1. Set up a process for looking up the language of a document in the [FieldProcessing] section.
For example:
[FieldProcessing]
Number=1
0=LookForLanguage

2. Create a section for the process, in which you create a Property for the process and
identify the field that you want to apply the process to.
For example:
[LookForLanguage]
Property=SetLanguage
PropertyFieldCSVs=*/DRELANGUAGE,*/myLanguageType

3. List the Property that you have created in the [Properties] section.
For example:
[Properties]
0=SetLanguage

4. Create a section for this property, in which you set the LanguageType parameter to true
to map the values of the */DRELANGUAGE fields to the equivalent language type in the
[LanguageTypes] section.
For example:
[SetLanguage]
LanguageType=true
[LanguageTypes]
0=russianISO
1=russianKOI8
2=russianUTF8

[russianISO]
LanguageCode=1
Language=Russian
Encoding=CYRILLIC_ISO
[russianKOI8]
LanguageCode=2
Language=Russian
Encoding=CYRILLIC_KOI8
[russianUTF8]
LanguageCode=3
Language=Russian
Encoding=UTF8

5. Save the configuration file and start IDOL server.

6. You can now index documents into IDOL server (see Index commands on page 84).

if all the documents that you want to index into IDOL server contain a field that contains data that
can be used to identify the language type, you can configure your IDOL server as follows:
1. Use the [FieldProcessing] section of IDOL server's configuration file to list a process for each
language type that you want IDOL server to be able to detect.
For example:
[FieldProcessing]
Number=6
0=DetectArabic
1=DetectArabicISO
2=DetectEnglish
3=DetectChSimplified
4=DetectChTraditional
5=DetectFrench

2. For each of the processes that you have listed in the [FieldProcessing] section, you
need to define a section with the name of the respective process. In this section
you can then specify the fields that IDOL server should look for and the values that those
fields must have in order for the document to be recognized as a particular language
type.
For example:
[DetectArabic]
Property=SetArabicProperty
PropertyFieldCSVs=*/DRELANGUAGETYPE,*/LANG
PropertyMatch=arabic

[DetectArabicISO]
Property=SetArabicISOProperty
PropertyFieldCSVs=*/DRELANGUAGETYPE,*/LANG
PropertyMatch=arabicISO,ISOarab*
[DetectEnglish]
Property=SetEnglishProperty
PropertyFieldCSVs=*/DRELANGUAGETYPE,*/LANG
PropertyMatch=*eng*,uk,*british
[DetectChSimplified]
Property=SetChSimplifiedProperty
PropertyFieldCSVs=*/DRELANGUAGETYPE,*/LANG
PropertyMatch=*ChSimp*,ChineseSimp*
[DetectChTraditional]
Property=SetChTraditionalProperty
PropertyFieldCSVs=*/DRELANGUAGETYPE,*/LANG
PropertyMatch=*ChTrad*,ChineseTrad*
[DetectFrench]
Property=SetFrenchProperty
PropertyFieldCSVs=*/DRELANGUAGETYPE,*/DRELANGAGETYPE,*/LANG
PropertyMatch=*fre*,fran*
3. For each Property that you have defined in the process sections, you
need to define a section with the same name as the respective property. In this section
you can then specify the language type (which you also need to list in IDOL server's
[LanguageTypes] section where you define how you want IDOL server to handle the
individual languages).
For example:
[SetArabicProperty]
LanguageType=Arabic
HiddenType=TRUE
[SetArabicISOProperty]
LanguageType=ArabicISO
HiddenType=TRUE
[SetEnglishProperty]
LanguageType=English
HiddenType=TRUE

[SetChSimplifiedProperty]
LanguageType=ChSimplified
HiddenType=TRUE
[SetChTraditionalProperty]
LanguageType=ChTraditional
HiddenType=TRUE
[SetFrenchProperty]
LanguageType=French
HiddenType=TRUE
4. Save the configuration file and start IDOL server.

5. You can now index documents into IDOL server (see Index commands on page 84).
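
To get a feel for which process a given field value would trigger, you can approximate the PropertyMatch wildcards with Python's fnmatch module. This is only an approximation under two assumptions that this guide does not confirm: that matching is case-insensitive, and that * is the only wildcard in use (fnmatch also gives special meaning to ? and square brackets). The function name is hypothetical.

```python
from fnmatch import fnmatchcase

# PropertyMatch values copied from some of the example sections above.
DETECTORS = {
    "DetectArabic": "arabic",
    "DetectArabicISO": "arabicISO,ISOarab*",
    "DetectEnglish": "*eng*,uk,*british",
    "DetectFrench": "*fre*,fran*",
}


def matching_processes(value, detectors=DETECTORS):
    """Return the processes whose PropertyMatch patterns match `value`."""
    value = value.lower()  # assumption: matching ignores case
    hits = []
    for process, csv in detectors.items():
        patterns = [p.strip().lower() for p in csv.split(",")]
        if any(fnmatchcase(value, pattern) for pattern in patterns):
            hits.append(process)
    return hits


for sample in ("french", "arabicISO", "arabic", "UK English"):
    print(sample, "->", matching_processes(sample))
```

Under these assumptions, arabicISO triggers only DetectArabicISO, because the pattern arabic in DetectArabic contains no wildcard and therefore has to match the whole value.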


Adding language type fields to documents


You can configure any of the Autonomy connectors to add fields to documents from which IDOL server
can read the language type of the documents.

To add a language type field to documents during fetching


1. Open the configuration file of your Autonomy connector in a text editor.

2. Use the FixedField<N> and FixedFieldValue<N> settings to specify the name and the value of
the field that you want to add to documents the connector retrieves.
For example:
FixedField0=DRELanguage
FixedFieldValue0=englishASCII
Note: if you add these settings to a connector's fetch job section, they only apply to the fetch job
defined in that section. If you add the settings to a connector's default section, they apply to all
fetch jobs.

3. Save the configuration file.


Defining a default language type in IDOL server's configuration file
IDOL server applies the default language type to documents that do not contain a field from which it
can read their language type (unless automatic language detection is enabled).

To specify a default language type


1. Open IDOL server's configuration file in a text editor.

2. Find the [LanguageTypes] section and specify the language type that you want IDOL server to
associate with any document that doesn't contain a language type field (note that if you are using
automatic language detection, IDOL server uses this to determine the language type of
documents and not the default language type).
For example:
[LanguageTypes]
DefaultLanguageType=englishASCII
LanguageDirectory=C:\IDOLserver\IDOL\langfiles
Note: resource files (for example, stoplists) that IDOL server uses when processing languages
are stored in the specified LanguageDirectory.

3. For the default language type that you have specified, create a section with the same name. In this
section, specify appropriate settings that determine how IDOL server handles this language type.
For details on the configuration settings that you can use, please refer to IDOL server's online help
(see Displaying help on configuration settings on page 389).
For example:
[englishASCII]
LanguageCode=1
Language=ENGLISH
Encoding=ASCII
Stoplist=english.dat
IndexNumbers=1
Note: in IDOL server the StripLanguage and CharConv settings have been deprecated (the
functionality has been automated according to language and encoding).

4. Save and close the configuration file.

5. Restart IDOL server to apply your changes.


Enabling Automatic Language Detection


If your IDOL server license includes Automatic Language Detection, IDOL server can automatically
identify the language and encoding of a document when it is indexed.

To enable Automatic Language Detection


1. Open IDOL server's configuration file in a text editor.

2. Find the [Server] section and add the following setting to it:
AutoDetectLanguagesAtIndex=true

3. Use the DiscardUnconfiguredLanguagesAtIndex and DiscardUnknownLanguagesAtIndex
settings to determine how IDOL server handles documents whose language types have not been
defined in the configuration file or whose language it cannot recognize (for example, because the
document does not contain any natural-language text or the document's text is too short for IDOL
server to be able to determine its language).
If you set DiscardUnconfiguredLanguagesAtIndex and DiscardUnknownLanguagesAtIndex
to true, IDOL server discards documents whose language types have not been defined in the
configuration file or whose language it cannot recognize. By default IDOL server indexes such
documents using the default language type and logs a warning message in the index log, so that
you can add an appropriate language type.

4. Save and close the configuration file.

5. Restart IDOL server to apply your changes.

Note: if you have Automatic Language Detection enabled and a field process set up that reads a
document's language from one of its fields, IDOL server uses the field process rather than automatic
language detection to determine the document's language and encoding.


Specifying the language type of your query


When you send a query to IDOL server, it assumes by default that the query's language type is the
DefaultLanguageType that you have defined in IDOL server's configuration file (see Defining a
default language type in IDOL server's configuration file on page 319).
It is essential that IDOL server knows the language type of any query text that is submitted to it, so that
it can handle the text appropriately (for example, by applying the correct stemming). This means that if
you want to send a query that does not use the default language type, you need to add the
LanguageType parameter to your query command, which instructs IDOL server that this query uses
the language and encoding that have been set in IDOL server's configuration file for the specified
LanguageType.

For example:
This query uses the language and encoding that have been specified for the DefaultLanguageType, so
you can send it to IDOL server without adding the LanguageType parameter:
http://12.3.4.56:4000/action=Query&Text=The Bayes theory of probability

This query uses the language and encoding that have been specified for the GermanASCII language
type:
http://12.3.4.56:4000/action=Query&Text=Einsteins Relativitätstheorie&LanguageType=GermanASCII

Page 321


Converting results to a specific encoding


You can send the following types of query to IDOL server:

Text queries
Queries that contain some form of query text (for example, Query, SuggestOnText,
Summarize and so on).

Text-free queries
Queries that do not contain any query text (for example, Suggest, List, GetContent and so
on).

Text queries
When you send a query action to IDOL server, it returns by default results that use the same language
and encoding as the query text (that is, the language that has been specified for the LanguageType
that is sent with the query, or for the DefaultLanguageType if no LanguageType is sent with the
query).
If you want a query action to return results in a specific encoding, you must add the OutputEncoding
parameter to your query. This parameter allows you to convert the results of a query to any encoding that
is compatible with the query's language (if you specify an encoding that is not compatible with the
query's language, IDOL server indicates this in the results).
For example:
http://12.3.4.56:4000/action=Query&Text=Neurologia i Neurochirurgia&LanguageType=PolishEASTERNEUROPEAN&OutputEncoding=EASTERNEUROPEAN_ISO
In this example, IDOL server will convert all query results to EASTERNEUROPEAN_ISO.

Text-free queries
Query actions that do not contain any query text by default return results in the OutputEncoding that
has been specified for the DefaultLanguageType. If any of the query's results is not compatible with
this encoding, IDOL server indicates this in the results.
If you want a query action to return results in a specific encoding, you can add the OutputEncoding
parameter to your query. IDOL server converts all results to this encoding, provided they are compatible
with it. If any of the query's results is not compatible with this encoding, IDOL server returns an
appropriate message.
For example:
http://12.3.4.56:4000/action=Suggest&ID=9016&OutputEncoding=EASTERNEUROPEAN_ISO
In this example, IDOL server will convert all query results to EASTERNEUROPEAN_ISO.
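
The query URLs in this section can also be assembled programmatically. The following Python sketch is illustrative only. The helper name build_query_url is hypothetical, and the sketch assumes that IDOL server accepts percent-encoded spaces (%20) in parameter values, which this guide does not state. It follows the URL shape used in the examples above.

```python
from urllib.parse import quote


def build_query_url(host, port, action, **params):
    """Assemble an action URL in the shape used by the examples in this guide.

    Hypothetical helper: parameter names are passed through unchanged and
    values are percent-encoded (an assumption about what the server accepts).
    """
    parts = ["action=" + quote(action, safe="")]
    for name, value in params.items():
        parts.append(name + "=" + quote(str(value), safe=""))
    return "http://{}:{}/{}".format(host, port, "&".join(parts))


# Text query whose results are converted to a specific encoding.
text_query = build_query_url(
    "12.3.4.56", 4000, "Query",
    Text="Neurologia i Neurochirurgia",
    LanguageType="PolishEASTERNEUROPEAN",
    OutputEncoding="EASTERNEUROPEAN_ISO",
)

# Text-free query that uses the same OutputEncoding parameter.
suggest_query = build_query_url(
    "12.3.4.56", 4000, "Suggest", ID=9016, OutputEncoding="EASTERNEUROPEAN_ISO"
)

print(text_query)
print(suggest_query)
```

Keeping the parameters as keyword arguments means that omitting LanguageType simply leaves the query on the DefaultLanguageType, as described above.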

Page 322


Returning documents in multiple languages for your query


When you send a query to IDOL server, it returns by default results that use the same language as the
query text (that is, the language that has been specified for the LanguageType that is sent with the
query, or for the DefaultLanguageType if no LanguageType is sent with the query).
If you want to return documents in any language for your query rather than only in the query's
language, you need to add AnyLanguage=true to your query.
When IDOL server receives the query, it applies the stemming algorithm and stop list that are
appropriate for the query's language type, and only returns documents that contain words which match
the stopped and stemmed terms in the query (that is, the words in result documents must stem to the
same terms as the words in the query text).
For example:
http://12.3.4.56:4000/action=Query&Text=Innovative internet marketing solutions in Baghdad&AnyLanguage=true
In this example, IDOL server will return documents in multiple languages that contain terms that match
terms in the specified Text.

Note that the query will only return documents in multiple languages if they contain terms that match
terms in the query (for example, query text that contains the term "Baghdad" might return documents in
English, French, German and so on).

Page 323


Returning documents in a specific language for your query


When you send a query to IDOL server, it returns by default results that use the same language as the
query text (that is, the language that has been specified for the LanguageType that is sent with the
query, or for the DefaultLanguageType if no LanguageType is sent with the query).
If you want to return documents in one or more specific languages for your query rather than only in
the query's language, you need to add AnyLanguage=true to your query. This removes the restriction
that only documents in the query's language are returned. You can then add MatchLanguage to your
query to specify the languages in which you want documents to be returned.
For example:
http://<IP_address>:<port>/action=Query&Text=university of Birmingham&LanguageType=EnglishASCII&AnyLanguage=true&MatchLanguage=DutchASCII+GermanASCII
Note that if you are specifying MatchLanguage, you cannot specify MatchLanguageType or
MatchEncoding for your query.
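
As an illustration of how AnyLanguage and MatchLanguage combine, the following Python sketch builds a query URL of the kind shown above. The function name build_multilanguage_query is hypothetical, and percent-encoding the spaces in the query text is an assumption about what the server accepts. The language types in MatchLanguage are joined with plus symbols, as in the example.

```python
from urllib.parse import quote


def build_multilanguage_query(host, port, text, language_type, match_languages=()):
    """Build a Query URL that is not restricted to the query's own language.

    Hypothetical helper. AnyLanguage=true is always added, and MatchLanguage
    is added only when specific language types are requested.
    """
    parts = [
        "action=Query",
        "Text=" + quote(text, safe=""),
        "LanguageType=" + quote(language_type, safe=""),
        "AnyLanguage=true",
    ]
    if match_languages:
        # Language types are separated by plus symbols, as in the guide's example.
        joined = "+".join(quote(name, safe="") for name in match_languages)
        parts.append("MatchLanguage=" + joined)
    return "http://{}:{}/{}".format(host, port, "&".join(parts))


url = build_multilanguage_query(
    "12.3.4.56", 4000, "university of Birmingham", "EnglishASCII",
    match_languages=["DutchASCII", "GermanASCII"],
)
print(url)
```

Calling the function without match_languages produces a query that returns documents in any language.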

Page 324

33. Language settings and files


Encoding settings for supported languages
Note:

All IDOL server Encoding settings can alternatively be set to UTF8 or UCS2.

The internal IDOL server storage encoding is UTF8.

Afrikaans
Script: Latin
Set Language parameter to: AFRIKAANS
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

Albanian
Script: Latin
Set Language parameter to: ALBANIAN
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

Page 325

Arabic
Script: Arabic
Set Language parameter to: ARABIC
For encoding:                  set Encoding parameter to:
windows-CP1256                 ARABIC
iso-8859-6                     ARABIC_ISO
UTF-8                          UTF8

Azeri
Script: Cyrillic
Set Language parameter to: AZERI
For encoding:                  set Encoding parameter to:
windows-CP1251                 CYRILLIC
KOI8-R                         CYRILLIC_KOI8
iso-8859-5                     CYRILLIC_ISO
UTF-8                          UTF8

Basque
Script: Latin
Set Language parameter to: BASQUE
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

Page 326

Belarussian
Script: Cyrillic
Set Language parameter to: BELARUSSIAN
For encoding:                  set Encoding parameter to:
windows-CP1251                 CYRILLIC
KOI8-R                         CYRILLIC_KOI8
iso-8859-5                     CYRILLIC_ISO
UTF-8                          UTF8

Breton
Script: Latin
Set Language parameter to: BRETON
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

Bulgarian
Script: Cyrillic
Set Language parameter to: BULGARIAN
For encoding:                  set Encoding parameter to:
windows-CP1251                 CYRILLIC
KOI8-R                         CYRILLIC_KOI8
iso-8859-5                     CYRILLIC_ISO
UTF-8                          UTF8

Page 327

Catalan
Script: Latin
Set Language parameter to: CATALAN
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

Chinese Traditional
Script: Big-5
Set Language parameter to: CHINESE
For encoding:                  set Encoding parameter to:
Big-5                          CHINESETRADITIONAL
UTF-8                          UTF8

Chinese Simplified
Script: GB2312-80
Set Language parameter to: CHINESE
For encoding:                  set Encoding parameter to:
gb2312                         CHINESESIMPLIFIED
UTF-8                          UTF8

Page 328

Croatian
Script: Latin
Set Language parameter to: CROATIAN
For encoding:                  set Encoding parameter to:
windows-CP1250                 EASTERNEUROPEAN
iso-8859-2                     EASTERNEUROPEAN_ISO
UTF-8                          UTF8

Czech
Script: Latin
Set Language parameter to: CZECH
For encoding:                  set Encoding parameter to:
windows-CP1250                 EASTERNEUROPEAN
iso-8859-2                     EASTERNEUROPEAN_ISO
UTF-8                          UTF8

Danish*
Script: Latin
Set Language parameter to: DANISH
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

* A stemming algorithm is available for this language and is applied by default. If you do not want to
apply stemming to this language, set Stemming to false for this language in the configuration file.

Page 329

Dutch*
Script: Latin
Set Language parameter to: DUTCH
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

English*
Script: Latin
Set Language parameter to: ENGLISH
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

Estonian
Script: Latin
Set Language parameter to: ESTONIAN
For encoding:                  set Encoding parameter to:
windows-CP1257                 NORTHERNEUROPEAN
iso-8859-4                     NORTHERNEUROPEAN_ISO
UTF-8                          UTF8

* A stemming algorithm is available for this language and is applied by default. If you do not want to
apply stemming to this language, set Stemming to false for this language in the configuration file.

Page 330

Faroese
Script: Latin
Set Language parameter to: FAROESE
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

Finnish
Script: Latin
Set Language parameter to: FINNISH
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

French*
Script: Latin
Set Language parameter to: FRENCH
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

* A stemming algorithm is available for this language and is applied by default. If you do not want to
apply stemming to this language, set Stemming to false for this language in the configuration file.

Page 331

Gaelic
Script: Latin
Set Language parameter to: GAELIC
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

Galician*
Script: Latin
Set Language parameter to: GALICIAN
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

German*
Script: Latin
Set Language parameter to: GERMAN
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

* A stemming algorithm is available for this language and is applied by default. If you do not want to
apply stemming to this language, set Stemming to false for this language in the configuration file.

Page 332

Greek*
Script: Greek
Set Language parameter to: GREEK
For encoding:                  set Encoding parameter to:
windows-CP1253                 GREEK
iso-8859-7                     GREEK_ISO
UTF-8                          UTF8

Greenlandic
Script: Latin
Set Language parameter to: GREENLANDIC
For encoding:                  set Encoding parameter to:
windows-CP1257                 NORTHERNEUROPEAN
iso-8859-4                     NORTHERNEUROPEAN_ISO
UTF-8                          UTF8

Hebrew
Script: Hebrew
Set Language parameter to: HEBREW
For encoding:                  set Encoding parameter to:
windows-CP1255                 HEBREW
iso-8859-8                     HEBREW_ISO
UTF-8                          UTF8

* A stemming algorithm is available for this language and is applied by default. If you do not want to
apply stemming to this language, set Stemming to false for this language in the configuration file.

Page 333

Hindi
Script: UTF8
Set Language parameter to: HINDI
For encoding:                  set Encoding parameter to:
UTF-8                          UTF8

Hungarian
Script: Latin
Set Language parameter to: HUNGARIAN
For encoding:                  set Encoding parameter to:
windows-CP1250                 EASTERNEUROPEAN
iso-8859-2                     EASTERNEUROPEAN_ISO
UTF-8                          UTF8

Icelandic
Script: Latin
Set Language parameter to: ICELANDIC
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

Page 334

Indonesian
Script: Latin
Set Language parameter to: INDONESIAN
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

Italian*
Script: Latin
Set Language parameter to: ITALIAN
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

Japanese**
Script: Japanese
Set Language parameter to: JAPANESE
For encoding:                  set Encoding parameter to:
Shift-JIS                      SHIFTJIS
EUC                            EUC
JIS                            JIS
UTF-8                          UTF8

* A stemming algorithm is available for this language and is applied by default. If you do not want to
apply stemming to this language, set Stemming to false for this language in the configuration file.
** The language has stemming embedded in sentence breaking.

Page 335

Kazakh
Script: Cyrillic
Set Language parameter to: KAZAKH
For encoding:                  set Encoding parameter to:
windows-CP1251                 CYRILLIC
KOI8-R                         CYRILLIC_KOI8
iso-8859-5                     CYRILLIC_ISO
UTF-8                          UTF8

Korean**
Script: Hangul
Set Language parameter to: KOREAN
For encoding:                  set Encoding parameter to:
KS C 5601-1987                 KOREAN
KS C 5601-1992                 KOREAN
UTF-8                          UTF8

Kurdish
Script: Latin
Set Language parameter to: KURDISH
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

** The language has stemming embedded in sentence breaking.

Page 336

Kyrgyz
Script: Cyrillic
Set Language parameter to: KYRGYZ
For encoding:                  set Encoding parameter to:
windows-CP1251                 CYRILLIC
KOI8-R                         CYRILLIC_KOI8
iso-8859-5                     CYRILLIC_ISO
UTF-8                          UTF8

Lappish
Script: Latin
Set Language parameter to: LAPPISH
For encoding:                  set Encoding parameter to:
windows-CP1257                 NORTHERNEUROPEAN
iso-8859-4                     NORTHERNEUROPEAN_ISO
UTF-8                          UTF8

Latin
Script: Latin
Set Language parameter to: LATIN
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

Page 337

Latvian
Script: Latin
Set Language parameter to: LATVIAN
For encoding:                  set Encoding parameter to:
windows-CP1257                 NORTHERNEUROPEAN
iso-8859-4                     NORTHERNEUROPEAN_ISO
UTF-8                          UTF8

Lithuanian
Script: Latin
Set Language parameter to: LITHUANIAN
For encoding:                  set Encoding parameter to:
windows-CP1257                 NORTHERNEUROPEAN
iso-8859-4                     NORTHERNEUROPEAN_ISO
UTF-8                          UTF8

Luxembourgish
Script: Latin
Set Language parameter to: LUXEMBOURGISH
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

Page 338

Macedonian
Script: Cyrillic
Set Language parameter to: MACEDONIAN
For encoding:                  set Encoding parameter to:
windows-CP1251                 CYRILLIC
KOI8-R                         CYRILLIC_KOI8
iso-8859-5                     CYRILLIC_ISO
UTF-8                          UTF8

Malay
Script: Latin
Set Language parameter to: MALAY
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

Maltese
Script: UTF8
Set Language parameter to: MALTESE
For encoding:                  set Encoding parameter to:
UTF-8                          UTF8

Page 339

Maori
Script: Latin1
Set Language parameter to: MAORI
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

Mongolian
Script: Cyrillic
Set Language parameter to: MONGOLIAN
For encoding:                  set Encoding parameter to:
windows-CP1251                 CYRILLIC
KOI8-R                         CYRILLIC_KOI8
iso-8859-5                     CYRILLIC_ISO
UTF-8                          UTF8

Norwegian*
Script: Latin
Set Language parameter to: NORWEGIAN
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

* A stemming algorithm is available for this language and is applied by default. If you do not want to
apply stemming to this language, set Stemming to false for this language in the configuration file.

Page 340

Persian
Script: UTF8
Set Language parameter to: PERSIAN
For encoding:                  set Encoding parameter to:
UTF-8                          UTF8

Polish
Script: Latin
Set Language parameter to: POLISH
For encoding:                  set Encoding parameter to:
windows-CP1250                 EASTERNEUROPEAN
iso-8859-2                     EASTERNEUROPEAN_ISO
UTF-8                          UTF8

Portuguese*
Script: Latin
Set Language parameter to: PORTUGUESE
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

* A stemming algorithm is available for this language and is applied by default. If you do not want to
apply stemming to this language, set Stemming to false for this language in the configuration file.

Page 341

Romanian
Script: Latin
Set Language parameter to: ROMANIAN
For encoding:                  set Encoding parameter to:
windows-CP1250                 EASTERNEUROPEAN
iso-8859-2                     EASTERNEUROPEAN_ISO
UTF-8                          UTF8

Russian*
Script: Cyrillic
Set Language parameter to: RUSSIAN
For encoding:                  set Encoding parameter to:
windows-CP1251                 CYRILLIC
KOI8-R                         CYRILLIC_KOI8
iso-8859-5                     CYRILLIC_ISO
UTF-8                          UTF8

Serbian
Script: Cyrillic
Set Language parameter to: SERBIAN
For encoding:                  set Encoding parameter to:
windows-CP1251                 CYRILLIC
KOI8-R                         CYRILLIC_KOI8
iso-8859-5                     CYRILLIC_ISO
UTF-8                          UTF8

* A stemming algorithm is available for this language and is applied by default. If you do not want to
apply stemming to this language, set Stemming to false for this language in the configuration file.

Page 342

Slovak
Script: Latin
Set Language parameter to: SLOVAK
For encoding:                  set Encoding parameter to:
windows-CP1250                 EASTERNEUROPEAN
iso-8859-2                     EASTERNEUROPEAN_ISO
UTF-8                          UTF8

Slovenian
Script: Latin
Set Language parameter to: SLOVENIAN
For encoding:                  set Encoding parameter to:
windows-CP1250                 EASTERNEUROPEAN
iso-8859-2                     EASTERNEUROPEAN_ISO
UTF-8                          UTF8

Somali
Script: Latin
Set Language parameter to: SOMALI
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

Page 343

Sorbian
Script: Latin
Set Language parameter to: SORBIAN
For encoding:                  set Encoding parameter to:
windows-CP1250                 EASTERNEUROPEAN
iso-8859-2                     EASTERNEUROPEAN_ISO
UTF-8                          UTF8

Spanish*
Script: Latin
Set Language parameter to: SPANISH
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

Swahili
Script: Latin
Set Language parameter to: SWAHILI
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

* A stemming algorithm is available for this language and is applied by default. If you do not want to
apply stemming to this language, set Stemming to false for this language in the configuration file.

Page 344

Swedish*
Script: Latin
Set Language parameter to: SWEDISH
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

Tagalog
Script: Latin
Set Language parameter to: TAGALOG
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

Tatar
Script: Cyrillic
Set Language parameter to: TATAR
For encoding:                  set Encoding parameter to:
windows-CP1251                 CYRILLIC
KOI8-R                         CYRILLIC_KOI8
iso-8859-5                     CYRILLIC_ISO
UTF-8                          UTF8

* A stemming algorithm is available for this language and is applied by default. If you do not want to
apply stemming to this language, set Stemming to false for this language in the configuration file.

Page 345

Thai
Script: Thai
Set Language parameter to: THAI
For encoding:                  set Encoding parameter to:
windows-CP874 / iso-8859-11    THAI
UTF-8                          UTF8

Turkish
Script: Latin
Set Language parameter to: TURKISH
For encoding:                  set Encoding parameter to:
windows-CP1254 / iso-8859-9    TURKISH
UTF-8                          UTF8

Ukrainian
Script: Cyrillic
Set Language parameter to: UKRAINIAN
For encoding:                  set Encoding parameter to:
windows-CP1251                 CYRILLIC
KOI8-R                         CYRILLIC_KOI8
iso-8859-5                     CYRILLIC_ISO
UTF-8                          UTF8

Page 346

Urdu
Script: UTF8
Set Language parameter to: URDU
For encoding:                  set Encoding parameter to:
UTF-8                          UTF8

Uzbek
Script: Cyrillic
Set Language parameter to: UZBEK
For encoding:                  set Encoding parameter to:
windows-CP1251                 CYRILLIC
KOI8-R                         CYRILLIC_KOI8
iso-8859-5                     CYRILLIC_ISO
UTF-8                          UTF8

Valencian
Script: Latin
Set Language parameter to: VALENCIAN
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

Page 347

Vietnamese
Script: Vietnamese
Set Language parameter to: VIETNAMESE
For encoding:                  set Encoding parameter to:
windows-CP1258                 VIETNAMESE
UTF-8                          UTF8

Welsh
Script: Latin
Set Language parameter to: WELSH
For encoding:                  set Encoding parameter to:
windows-CP1252 / iso-8859-1    ASCII
UTF-8                          UTF8

Page 348


TermSize setting for supported languages


The TermSize setting allows you to specify the maximum number of characters that any term in IDOL
server's Data index can comprise. By default this is 20. The recommended values for the different
languages are:

TermSize:    Language:
20           English and other European languages
30           Arabic
30           Chinese
30           Hebrew
30           Korean
30           Japanese
30           Thai
30           German
40           Greek

Page 349


Transliteration settings for supported languages


Transliteration is the process of converting accented characters into equivalent non-accented
characters. This is useful in environments where accented keyboards are not available.

Transliteration is always applied to the following languages:

Language:      Transliteration:
Japanese       Half width katakana to full width katakana
               Full width 0-9, A-Z, a-z to single byte 0-9, A-Z, a-z
Chinese        Full width 0-9, A-Z, a-z to single byte 0-9, A-Z, a-z
Greek          Accented Greek characters to non accented characters
Spanish        Accented vowels to non accented vowels
Portuguese     Accented vowels to non accented vowels

Transliteration is optional for the following languages:


Language:

Transliteration:

Western
European

=a

=aa

=c

=e

=i

=o

=oe

=u

(oe)=oe

=ae

=ss

=nh

=y

=d

=th

German

Same as Western European apart from:


=ae

Scandinavian

Page 350

=ue

Same as Western European apart from:


=ae

Russian

=oe

=oe

=ue

All characters mapped to A-Z


SentenceBreaking files for supported languages


If you want to use languages in which words are not delimited by spaces (Japanese, Chinese, Thai
and Korean), you need to place external sentence breaking files into the IDOL server installation's
IDOL/res/languages directory. If you are running your IDOL server on a UNIX platform, you need to
set LD_LIBRARY_PATH to ensure that IDOL server can find the sentence breaking files that it
requires.
The following tables list the files that the individual languages require.

Japanese
Required files:

NT                      UNIX
japanesebreaking.dll    japanesebreaking.so
\dic\jtag.attr          /dic/system/jtag.attr
dic\JTAG.hash           /dic/system/jtag.hash
dic\jtag.id             /dic/system/jtag.id
\dic\jtag.mrph          /dic/system/jtag.mrph
dic\JTAG.offset         /dic/system/jtag.offset
\dic\jtag.table         /dic/system/jtag.table
jtag.dll                /dic/system/jtag.trie
jtag.ini                jtag.ini
jtag_at.dll             libcodeconv.so
dic\JTAG.trie

Traditional Chinese
Required files:

NT                      UNIX
chinesebreaking.dll     chinesebreaking.so
big5togb.txt            big5togb.txt
wordlist.txt            wordlist.txt.so

Page 351

Simplified Chinese
Required files:

NT                      UNIX
chinesebreaking.dll     chinesebreaking.so
big5togb.txt            big5togb.txt
wordlist.txt            wordlist.txt

Thai
Required files:

NT                      UNIX
thaibreaking.dll        thaibreaking.so
thaidict.txt            thaidict.txt

Korean
Required files:

NT                      UNIX
koreanbreaking.dll      koreanbreaking.so
Koma.dll (NT only)      main.dat
HanTag.dll (NT only)    prob.dat
main.dat                main.fst
prob.dat                prob.fst
main.fst                pos.nam
prob.fst                tag.nam
pos.nam                 tagout.nam
tag.nam                 connection.txt
tagout.nam              stopposnam.txt
connection.txt          tagname.txt
stopposnam.txt
tagname.txt

Page 352


Stoplists for supported languages


Each supported language needs a stoplist, which contains a list of common words that are not indexed
into IDOL server (if a stoplist isn't provided for a language, you can create one). Words such as
"the" or "a" are used too frequently to carry any significance, and IDOL server does not
require them to understand the concept of text.
You can use a standard text editor to edit the stoplist that your IDOL server uses (stoplists are located
in IDOL server's IDOL/res/languages directory), for example, if you want to add other words that are
common to most or all of your documents.
You can list the words in the stoplist in any of the valid encodings for that language (for example, in
Russian you could specify stopwords in KOI8, UTF8, ISO and so on). You can use different encodings
within the same stoplist file (see Encoding settings for supported languages on page 325).
You only need to specify each word once. There is, for example, no need to specify a word in several
different encodings.

For all operations, IDOL server will recognize words as stopwords irrespective of the encoding they
are given in. For example, in Russian you could list a stopword in the KOI8 encoding in the stoplist file
and it would be recognized if it occurred in a document in UTF8.

Note:
For each encoding that you want to use you must create a section in your stoplist file. Name the
section after the language type that you are using (the language types are listed in the "Set encoding
parameter to" column of the "Encoding settings for supported languages" list). Words can be in
upper or lower case, and can be separated by spaces or new lines.
For example:
[cyrillic_koi8]



[cyrillic_iso]
s
In this example, a Russian stoplist contains ten words, of which five are in the CYRILLIC_KOI8 encoding
and five are in the CYRILLIC_ISO encoding.
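
To make the stoplist layout concrete, the following Python sketch reads a stoplist that mixes encodings, one section per encoding. It is not part of IDOL server. The function name parse_stoplist and the mapping from IDOL encoding names to Python codec names are assumptions made for this illustration, and the sketch assumes that section headers are written in plain ASCII.

```python
import re

# Assumed mapping from IDOL encoding names (lower case) to Python codecs.
CODECS = {
    "cyrillic": "cp1251",
    "cyrillic_koi8": "koi8_r",
    "cyrillic_iso": "iso8859_5",
    "utf8": "utf_8",
}

SECTION = re.compile(rb"\s*\[([A-Za-z0-9_]+)\]\s*")


def parse_stoplist(raw):
    """Return the stopwords in a stoplist as a set of lower-case strings.

    `raw` is the file content as bytes. Each [section] header switches the
    codec that is used to decode the lines that follow it.
    """
    words = set()
    codec = None
    for line in raw.splitlines():
        header = SECTION.fullmatch(line)
        if header:
            codec = CODECS[header.group(1).decode("ascii").lower()]
            continue
        if codec is None:
            continue  # ignore anything that appears before the first section
        # Words can be separated by spaces or new lines, in upper or lower case.
        words.update(word.lower() for word in line.decode(codec).split())
    return words


sample = (
    b"[cyrillic_koi8]\n" + "и в".encode("koi8_r") + b"\n"
    b"[cyrillic_iso]\n" + "НА".encode("iso8859_5") + b"\n"
)
print(sorted(parse_stoplist(sample)))
```

Decoding every section to a common form mirrors the behaviour described above, where a stopword is recognized irrespective of the encoding it was listed in.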

Page 353


Page 354

Administration

35. Administering IDOL server


You can administer IDOL server by doing the following:

execute IDOL server configuration changes

delete documents from IDOL server by reference

delete individual documents and ranges of documents from IDOL server

create new IDOL server databases

delete a database and its documents

delete all documents from a database

expire documents

export IDX documents from IDOL server

export XML documents from IDOL server

change the date, expire date or database of IDOL server documents

change field values in IDOL server documents

compact IDOL server's Data index

back up IDOL server's Data index

initialize IDOL server's Data index

export users, roles, agents and profiles

import users, roles, agents and profiles

set up log streams

Page 357


Executing configuration changes


If you have made changes to IDOL server's configuration file, you need to stop and restart IDOL server
to ensure that it acknowledges them:

To stop and restart IDOL server


For Windows:
Display the Windows Services dialog, stop the IDOL server service and then start it again.

For UNIX:
Use the Stop.sh stop script to stop IDOL server and then start it again using the Start.sh script
(the scripts are supplied in the IDOL server installer).

Page 358


Deleting documents from IDOL server by reference


You can identify one or more documents by their reference, and delete them from IDOL server's Data
index by issuing a DREDELETEREF command (case sensitive) from your web browser:

http://<host>:<port>/DREDELETEREF?docs=<document references>&field=<fields>&DREdbname=<database name>

<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the IndexPort that you have specified in the [Server] section of the IDOL server
configuration file.
<document references>
Enter the escaped references of the documents that you want to delete. If you want to specify
multiple references, you must separate them with plus symbols (there must be no space before or
after a plus symbol).
<fields>
This parameter is optional.
Allows you to restrict which documents are deleted by specifying one or more fields that a
document must contain in order to be deleted. Only documents that have one of the specified
references and at least one of the specified fields are deleted.
If you want to specify multiple fields, you must separate them with commas or spaces (there must
be no space before or after a comma).
<database name>
This parameter is optional.
Allows you to specify the name of the database, which contains the documents that you want to
delete. If you don't specify a database and the specified document is contained in several
databases, it is deleted from all of them.

For example:
http://12.3.4.56:4001/DREDELETEREF?docs=http%3A%2F%2Fnews%2Enewssite%2Ecom%2Findex%2Ehtml+http%3A%2F%2Fnews%2Enewssite%2Ecom%2Fcoverstory%2Ehtml
This command uses port 4001 to delete the documents with the specified URLs from IDOL server
which is located on a machine with the IP address 12.3.4.56.
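
Escaping references by hand is error-prone, so the following Python sketch shows one way to build a DREDELETEREF URL. The function name build_delete_ref_url is hypothetical. Note that urllib.parse.quote leaves the "." character unescaped, whereas the example above writes it as %2E. Under standard URL rules the two forms are equivalent, but treating them as interchangeable for IDOL server is an assumption.

```python
from urllib.parse import quote


def build_delete_ref_url(host, index_port, references, fields=(), database=None):
    """Build a DREDELETEREF URL for one or more document references.

    Hypothetical helper: each reference is percent-encoded and the encoded
    references are joined with plus symbols, without surrounding spaces.
    """
    if not references:
        raise ValueError("at least one document reference is required")
    docs = "+".join(quote(ref, safe="") for ref in references)
    url = "http://{}:{}/DREDELETEREF?docs={}".format(host, index_port, docs)
    if fields:
        # Optional: only delete documents that contain one of these fields.
        url += "&field=" + ",".join(fields)
    if database:
        # Optional: restrict the deletion to a single database.
        url += "&DREdbname=" + quote(database, safe="")
    return url


url = build_delete_ref_url(
    "12.3.4.56", 4001,
    ["http://news.newssite.com/index.html",
     "http://news.newssite.com/coverstory.html"],
)
print(url)
```

The function only builds the URL. Sending it to the index port is left to the caller, because a delete cannot be undone once the Data index has been compacted.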

Page 359


Deleting individual documents and ranges of documents from IDOL server


You can identify individual documents and / or ranges of documents by their ID, and delete them from
IDOL server's Data index by issuing a DREDELETEDOC command (case sensitive) from your web
browser:

http://<host>:<port>/DREDELETEDOC?docs=<doc IDs>
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the IndexPort that you have specified in the [Server] section of the IDOL server
configuration file.
<doc IDs>
Specify one or more individual documents and / or a range of documents that you want to delete.
Use one or a combination of the following formats to do this. If you want to combine the two formats,
you must separate them with plus symbols (there must be no space before or after a plus symbol):
doc ID
Specify the IDs of one or more documents. If you want to specify multiple document IDs,
you must separate them with plus symbols (there must be no space before or after a plus
symbol).
range=[<min doc ID>,<max doc ID>]
Enter the document ID of the first and last document in a range of documents that you want
to delete. You can delete up to 5000 documents at a time.

For example:
http://12.3.4.56:4001/DREDELETEDOC?docs=3+5+range=[7,10]
This command uses port 4001 to delete the documents with the document IDs 3, 5, 7, 8, 9 and 10 from
IDOL server, which is located on a machine with the IP address 12.3.4.56.
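
The docs parameter format can be sketched in Python as follows. The function names are hypothetical. The guide does not say whether the limit of 5000 documents applies to each range or to the whole command, so this sketch applies it conservatively to the total number of documents named in the command, which is an assumption.

```python
MAX_DOCS_PER_COMMAND = 5000  # limit quoted in the guide; applied to the total here


def build_docs_parameter(doc_ids=(), ranges=()):
    """Format individual IDs and (min, max) ranges for the docs parameter."""
    parts = [str(int(doc_id)) for doc_id in doc_ids]
    total = len(parts)
    for low, high in ranges:
        if low > high:
            raise ValueError("range minimum must not exceed range maximum")
        total += high - low + 1
        parts.append("range=[{},{}]".format(int(low), int(high)))
    if not parts:
        raise ValueError("specify at least one document ID or range")
    if total > MAX_DOCS_PER_COMMAND:
        raise ValueError("too many documents for a single command")
    # Items are joined with plus symbols, with no spaces around them.
    return "+".join(parts)


def build_delete_doc_url(host, index_port, doc_ids=(), ranges=()):
    """Build a DREDELETEDOC URL (hypothetical helper)."""
    docs = build_docs_parameter(doc_ids, ranges)
    return "http://{}:{}/DREDELETEDOC?docs={}".format(host, index_port, docs)


url = build_delete_doc_url("12.3.4.56", 4001, doc_ids=[3, 5], ranges=[(7, 10)])
print(url)
```

Validating the ranges before the URL is built avoids sending a command that names more documents than intended.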

Page 360


Restoring deleted documents to IDOL server


If you have used a DREDELETEDOC command to delete documents from IDOL server's Data index,
you can use a DREUNDELETEDOC command (case sensitive) to restore some or all of the individual
documents that you have deleted. Note that this is only possible if you have not executed a
DRECOMPACT command in the meantime (as this command removes unused documents and
space from IDOL server):

http://<host>:<port>/DREUNDELETEDOC?docs=<doc IDs>
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the IndexPort that you have specified in the [Server] section of the IDOL server
configuration file.
<doc IDs>
Specify one or more individual documents and / or a range of deleted documents that you want to
restore. Use one or a combination of the following formats to do this. If you want to combine the
two formats, you must separate them with plus symbols (there must be no space before or after a
plus symbol):
doc ID
Specify the IDs of one or more deleted documents. If you want to specify multiple document
IDs, you must separate them with plus symbols (there must be no space before or after a
plus symbol).
range=[<min doc ID>,<max doc ID>]
Enter the document ID of the first and last document in a range of deleted documents that
you want to restore. You can restore up to 5000 documents at a time.

For example:
http://12.3.4.56:4001/DREUNDELETEDOC?docs=3+5+range=[7,10]
This command uses port 4001 to restore the documents with the document IDs 3, 5, 7, 8, 9 and 10 to
IDOL server, which is located on a machine with the IP address 12.3.4.56.

Page 361


Creating a new database in IDOL server


You can create a new database in IDOL server (for example, to store documents that relate to one
particular subject or to store documents that are relevant to a particular user group), by doing one of
the following:

send a DRECREATEDBASE command

add the database to the IDOL server configuration file

To send a DRECREATEDBASE command to IDOL server


Issue a DRECREATEDBASE command (case sensitive) from your web browser:
http://<host>:<port>/DRECREATEDBASE?DREdbname=<database name>

<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the IndexPort that you have specified in the [Server] section of the IDOL server
configuration file.
<database name>
Enter the name of the database that you want to create in IDOL server.

For example:
http://12.3.4.56:4001/DRECREATEDBASE?DREdbname=Archive
This command uses port 4001 to create a new Archive database in IDOL server which is
located on a machine with the IP address 12.3.4.56.
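If you prefer to assemble index commands in a script rather than typing them into a web browser, the URL can be built as in the following Python sketch. The helper name index_command_url is hypothetical; the host, port and database name are the example values used above.

```python
def index_command_url(host, port, command, **params):
    """Build an index command URL in the form http://<host>:<port>/<COMMAND>?<name>=<value>."""
    query = "&".join(f"{name}={value}" for name, value in params.items())
    return f"http://{host}:{port}/{command}?{query}"

url = index_command_url("12.3.4.56", 4001, "DRECREATEDBASE", DREdbname="Archive")
print(url)  # http://12.3.4.56:4001/DRECREATEDBASE?DREdbname=Archive
```

The sketch only builds the URL. Sending it (for example with urllib.request.urlopen) has the same effect as issuing the command from a web browser, so it is left out here to avoid changing a live server by accident.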

Page 362


To add a database to IDOL server's configuration file


1.

Open IDOL server's configuration file in a text editor and find the [Databases] section. This
section contains the NumDBs setting, which indicates how many databases IDOL server
currently contains. It also contains a section for each of these databases with settings that
apply to these databases. Note that the names of the individual database sections use the
format Database<N>, where <N> numbers the databases in consecutive order, starting from
0.
For example:
[Databases]
NumDBs=2
[Database0]
Name=News
[Database1]
Name=Archive

2.

Increase the NumDBs setting by one.


For example:
[Databases]
NumDBs=3

3.

Create a new section for the database that you want to add. Note that the name of the section
must use the format Database<N>, where <N> numbers the databases in consecutive order,
starting from 0.
Use the Name setting to specify a name for your new database. Please refer to the IDOL
server online help for details on which other settings are available for databases (see
Displaying help on configuration settings on page 389).
For example:
[Databases]
NumDBs=3
[Database0]
Name=News
[Database1]
Name=Archive
[Database2]
Name=myNewDatabase

4.

Save and close the configuration file.

5.

Restart IDOL server to execute your changes.
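The edit described in steps 1 to 3 can be sketched in Python with the standard configparser module. This is an illustration only: the function name add_database is hypothetical, the sketch assumes that the configuration text can be parsed as a plain INI file, and configparser discards comments when it writes the result, so check the output before using it on a real configuration file.

```python
import configparser
import io

def add_database(config_text, name):
    """Return config_text with NumDBs increased by one and a new Database<N> section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep setting names such as NumDBs exactly as written
    parser.read_string(config_text)
    # Sections are numbered from 0, so the current NumDBs value is the next free <N>.
    index = int(parser["Databases"]["NumDBs"])
    parser["Databases"]["NumDBs"] = str(index + 1)
    parser[f"Database{index}"] = {"Name": name}
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()

SAMPLE = """[Databases]
NumDBs=2
[Database0]
Name=News
[Database1]
Name=Archive
"""
updated = add_database(SAMPLE, "myNewDatabase")
print(updated)
```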


Page 363


Deleting a database and all the documents it contains


You can delete an IDOL server database and all the documents it contains by issuing a
DREREMOVEDBASE command (case sensitive) from your web browser:

http://<host>:<port>/DREREMOVEDBASE?DREdbname=<database name>
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the IndexPort that you have specified in the IDOL server configuration file's [Server]
section.
<database name>
Enter the name of the database that you want to delete from IDOL server. The documents in this
database are deleted from IDOL server as well.

For example:
http://12.3.4.56:4001/DREREMOVEDBASE?DREdbname=Archive
This command uses port 4001 to delete the Archive database and all documents that this database
contains from IDOL server which is located on a machine with the IP address 12.3.4.56.

Page 364


Deleting all documents from a database


You can delete all documents from an IDOL server database by issuing a DREDELDBASE command
(case sensitive) from your web browser:

http://<host>:<port>/DREDELDBASE?DREdbname=<database name>
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the IndexPort that you have specified in the IDOL server configuration file's [Server]
section.
<database name>
Enter the name of the database from which you want to delete all documents.

For example:
http://12.3.4.56:4001/DREDELDBASE?DREdbname=Archive
This command uses port 4001 to delete all documents from IDOL server's Archive database. IDOL
server is located on a machine with the IP address 12.3.4.56.

Page 365


Expiring documents
In order to ensure that the documents in your IDOL server are up to date, you can execute an Expiry
operation which deletes or archives documents that have reached a specific age. You can expire
documents:

immediately by sending a DREEXPIRE command

in regular intervals by setting up a scheduled Expiry operation

By default documents are deleted when they expire. If you want to archive them instead, enter the
name of the database that you want to use for archiving for the ExpireIntoDatabase setting in each of
the IDOL server configuration file's database sections.
The date that determines whether a document should be expired can be read from a field in the
document or from the expiry time that has been set for the database that contains the document. If
IDOL server is unable to determine whether a document should be expired (because the document
does not contain a field that sets its expiry date and the document's database has no expiry time set),
IDOL server does not expire the document.

To set up fields that determine when documents should expire


1.

Open IDOL server's configuration file in a text editor and find the [FieldProcessing]
section.

2.

Add a new field process to the list of field processes that the [FieldProcessing] section
contains and increase the Number setting by one.
For example:
[FieldProcessing]
Number=2
0=IndexFields
1=IndexAndWeightHigher
The above [FieldProcessing] section lists two field processes. To add a new field
process, you need to add a new line to the list:
[FieldProcessing]
Number=3
0=IndexFields
1=IndexAndWeightHigher
2=ExpireDateFields
Note that the listed field processes are numbered in consecutive order, starting from 0.

Page 366



3.

Create a section for your new field process in the configuration file. Create a property for
the new process and use the PropertyFieldCSVs setting to identify the document fields
that determine when documents expire (a document expires once the time in this field
has elapsed).
For example:
[ExpireDateFields]
Property=SetExpireDate
PropertyFieldCSVs=*/DREEXPIRE,*/valid_time

4.

Find the [Properties] section and add your new property to the list of properties that the
[Properties] section contains.
For example:
[Properties]
0=Index
1=IndexWeight
2=SetExpireDate

5.

Create a section for your new property in the configuration file and set the
ExpireDateType to true in order to indicate that the associated PropertyFieldCSVs
fields hold the document expiry date.
For example:
[SetExpireDate]
ExpireDateType=TRUE
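Numbered lists such as the ones in the [FieldProcessing] and [Properties] sections are easy to get wrong when you add an entry by hand. The following Python sketch checks that the entries are numbered consecutively from 0 and, where a Number setting is given, that it matches the number of entries. The function name check_numbered_list is hypothetical and the check is not something IDOL server itself provides.

```python
def check_numbered_list(entries, declared_count=None):
    """Return a list of problems found in a numbered list such as {'0': 'IndexFields'}."""
    problems = []
    numbers = sorted(int(key) for key in entries)
    if numbers != list(range(len(numbers))):
        problems.append(f"keys {numbers} are not consecutive from 0")
    if declared_count is not None and declared_count != len(numbers):
        problems.append(f"Number={declared_count} but {len(numbers)} entries listed")
    return problems

field_processes = {"0": "IndexFields", "1": "IndexAndWeightHigher", "2": "ExpireDateFields"}
print(check_numbered_list(field_processes, declared_count=3))  # []
print(check_numbered_list({"0": "A", "2": "B"}))  # ['keys [0, 2] are not consecutive from 0']
```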

To expire documents immediately


Issue a DREEXPIRE command (case sensitive) from your web browser:
http://<host>:<port>/DREEXPIRE
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the IndexPort that you have specified in the IDOL server configuration file's [Server]
section.
For example:
http://12.3.4.56:4001/DREEXPIRE
This command uses port 4001 to expire the documents from IDOL server which is located on a
machine with the IP address 12.3.4.56.

Page 367



To expire documents in regular intervals
1.

Open IDOL server's configuration file in a text editor and find the [Schedule] section. If
the configuration file does not contain a [Schedule] section, you can add one.

2.

Specify the following settings in the [Schedule] section:


Expire
Enter true to enable an Expiry schedule.
ExpireTime
Specify the time (hh:mm) when you want the Expiry operation to start.
ExpireInterval
Specify the number of hours that elapse between individual Expiry operations. Enter
0 if you want the operation to take place daily.
For example:
[Schedule]
Expire=true
ExpireTime=00:00
ExpireInterval=24

3.

Set the following setting in the individual database sections to specify where a
database's documents are archived when they expire:
ExpireIntoDatabase
Enter the name of the database that you want to use to archive expired documents.
If you want documents to be deleted when they expire, don't specify this setting.
For example:
[News]
ExpireIntoDatabase=Archive

4.

Save and close the file.

Page 368
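To see which run times a given ExpireTime and ExpireInterval combination produces, you can work them out with a short Python sketch. It rests on two assumptions that this guide does not confirm: that the first run happens at the next occurrence of ExpireTime, and that each later run follows exactly ExpireInterval hours after the previous one. The function name expiry_runs is hypothetical.

```python
from datetime import datetime, timedelta

def expiry_runs(now, expire_time, interval_hours, count):
    """List the next `count` run times for an hh:mm start time and an interval in hours."""
    hours, minutes = map(int, expire_time.split(":"))
    first = now.replace(hour=hours, minute=minutes, second=0, microsecond=0)
    if first < now:
        first += timedelta(days=1)  # today's start time has already passed
    # An interval of 0 is treated as daily, as described for ExpireInterval above.
    step = timedelta(hours=interval_hours or 24)
    return [first + i * step for i in range(count)]

for run in expiry_runs(datetime(2005, 4, 12, 15, 0), "00:00", 24, 3):
    print(run.isoformat())
```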


Exporting IDX documents from IDOL server


You can issue a DREEXPORTIDX command (case sensitive) from your web browser in order to
export IDX documents from one or more IDOL databases (use DREEXPORTXML to export XML
documents):

http://<host>:<port>/DREEXPORTIDX?FileName=<file name>&Compress=<true / false>&DatabaseMatch=<database CSV>&BatchSize=<size>&MinDate=<min date>&MaxDate=<max date>
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.

<port>
Enter the IndexPort that you have specified in the IDOL server configuration file's [Server]
section.

<file name>
The path to the directory where the IDX files that are exported will be stored. The path must include
a basic file name which IDOL server will postfix with incremental numbers and an appropriate
extension. If you don't specify a file name the files are exported to the current working directory
(IDOLserver\IDOL\content), and IDOL server creates a filename in the format AUTN-IDX-EXPORT-<date>-<time>-<incremental number>.<extension>.

<true / false>
Enter true if you want to compress the exported files (this is the default). Enter false if you don't
want to compress the files.

<database CSV>
If you don't want to export documents from all IDOL server databases, enter one or more
databases to which you want to restrict the export. If you want to specify multiple databases, you
must separate them with plus symbols, commas or spaces (there must be no space before or after
plus symbols or commas).

<size>
The number of document sections that you want to export to one IDX file. By default this is 100,000
sections.

Page 369



<min date>
The earliest creation date or time that a document can have in order to be exported.

<max date>
The latest creation date or time that a document can have in order to be exported.

Examples:

http://12.3.4.56:4001/DREEXPORTIDX?FileName=/export/data/backup/output&Compress=true&DatabaseMatch=News,Archive&BatchSize=1000&mindate=01/01/2003&maxdate=01/01/2004
In this example, all IDX documents that have dates between the 1st of January 2003 and the 1st of
January 2004 are exported from the News and Archive databases to a series of compressed files
in the /export/data/backup directory. The files that are created in this directory will be called
output-0.idx.gz, output-1.idx.gz and so on.

http://12.3.4.56:4001/DREEXPORTIDX?
In this example, all IDX documents in IDOL server are exported to a series of compressed files in
the IDOL server's current working directory (IDOLserver\IDOL\content). The files that are created
in this directory will be called AUTN-IDX-EXPORT-12.04.2005-02.15.41-0.idx.gz, AUTN-IDX-EXPORT-12.04.2005-02.15.41-1.idx.gz and so on.

Note:

Multisection documents are not split across chunks, so the specified BatchSize is not used
exactly if this would require a multisection document to be split.

You don't need to uncompress compressed IDX files before indexing them. For example, the
command DREADD?output-0.idx.gz indexes the output-0.idx.gz file correctly without you
having to uncompress the file first.
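Because every DREEXPORTIDX parameter is optional, it can be convenient to build the command from only the parameters you want to set. The following Python sketch does this with the standard urllib.parse module. The function name export_idx_url is hypothetical; slashes and commas are left unescaped so that the result matches the form of the examples above.

```python
from urllib.parse import urlencode

def export_idx_url(host, port, **params):
    """Build a DREEXPORTIDX URL from the parameters that were actually supplied."""
    # Parameters passed as None are left out, so IDOL server uses its default for them.
    supplied = {name: value for name, value in params.items() if value is not None}
    query = urlencode(supplied, safe="/,")
    return f"http://{host}:{port}/DREEXPORTIDX?{query}"

url = export_idx_url(
    "12.3.4.56", 4001,
    FileName="/export/data/backup/output",
    Compress="true",
    DatabaseMatch="News,Archive",
    BatchSize=1000,
    MinDate="01/01/2003",
    MaxDate="01/01/2004",
)
print(url)
```

Calling export_idx_url("12.3.4.56", 4001) with no parameters yields the second example above, which exports all documents using the default settings.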

Page 370


Exporting XML documents from IDOL server


You can issue a DREEXPORTXML command (case sensitive) from your web browser in order to
export XML documents from one or more IDOL databases (use DREEXPORTIDX to export IDX
documents):

http://<host>:<port>/DREEXPORTXML?FileName=<file name>&Compress=<true / false>&DatabaseMatch=<database CSV>&BatchSize=<size>&MinDate=<min date>&MaxDate=<max date>
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.

<port>
Enter the IndexPort that you have specified in the IDOL server configuration file's [Server]
section.

<file name>
The path to the directory where the XML files that are exported will be stored. The path must
include a basic file name which IDOL server will postfix with incremental numbers and an
appropriate extension. If you don't specify a file name the files are exported to the current working
directory (IDOLserver\IDOL\content), and IDOL server creates a filename in the format AUTN-XML-EXPORT-<date>-<time>-<incremental number>.<extension>.

<true / false>
Enter true if you want to compress the exported files (this is the default). Enter false if you don't
want to compress the files.

<database CSV>
If you don't want to export documents from all IDOL server databases, enter one or more
databases to which you want to restrict the export. If you want to specify multiple databases, you
must separate them with plus symbols, commas or spaces (there must be no space before or after
plus symbols or commas).

<size>
The number of document sections that you want to export to one XML file. By default this is
100,000 sections.

Page 371



<min date>
The earliest creation date or time that a document can have in order to be exported.

<max date>
The latest creation date or time that a document can have in order to be exported.

Examples:

http://12.3.4.56:4001/DREEXPORTXML?FileName=/export/data/backup/output&Compress=true&DatabaseMatch=News,Archive&BatchSize=1000&mindate=01/01/2003&maxdate=01/01/2004
In this example, all XML documents that have dates between the 1st of January 2003 and the 1st
of January 2004 are exported from the News and Archive databases to a series of compressed
files in the /export/data/backup directory. The files that are created in this directory will be called
output-0.xml.gz, output-1.xml.gz and so on.

http://12.3.4.56:4001/DREEXPORTXML?
In this example, all XML documents in IDOL server are exported to a series of compressed files in
the IDOL server's current working directory (IDOLserver\IDOL\content). The files that are created
in this directory will be called AUTN-XML-EXPORT-12.04.2005-02.15.41-0.xml.gz, AUTN-XML-EXPORT-12.04.2005-02.15.41-1.xml.gz and so on.

Note:

Multisection documents are not split across chunks, so the specified BatchSize is not used
exactly if this would require a multisection document to be split.

You don't need to uncompress compressed XML files before indexing them. For example, the
command DREADD?output-0.xml.gz indexes the output-0.xml.gz file correctly without you
having to uncompress the file first.

Page 372


Changing the index date, expire date or database of IDOL server documents
You can identify individual documents and / or document ranges by their references or IDs (note that
ranges can only be identified using IDs) and change the value of their index date, expire date or
database. Issue a DRECHANGEMETA command (case sensitive) from your web browser:

http://<host>:<port>/DRECHANGEMETA?Type=<type>&Refs=<doc refs>&Docs=<doc IDs>&NewValue=<value>
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.

<port>
Enter the IndexPort that you have specified in the IDOL server configuration file's [Server]
section.

<type>
Enter one of the following to specify which value type you want to set to the specified new <value>:
date
The index date of the documents. Note that the date you specify must be in a format that you
have set for DateFormatCSVs in IDOL server's configuration file.
expiredate
The expire date of the documents. Note that the date you specify must be in a format that you
have set for DateFormatCSVs in IDOL server's configuration file.
database
The database of the documents.
<doc refs>
Enter the references of the documents whose index date, expire date or database values you want
to change (you must escape the references). If you want to specify multiple references, you must
separate them with plus symbols (there must be no space before or after a plus symbol).

Page 373



<doc IDs>
Specify one or more individual documents and / or a range of documents whose index date, expire
date or database values you want to change. Use one or a combination of the following formats to
do this. If you combine the two formats, you must separate them with plus symbols (there
must be no space before or after a plus symbol):
doc ID
Specify the IDs of one or more documents. If you want to specify multiple document IDs,
you must separate them with plus symbols (there must be no space before or after a plus
symbol).
range=[<min doc ID>,<max doc ID>]
Enter the document ID of the first and last document in a range of documents whose index
date, expire date or database values you want to change. You can change the values in up
to 5000 documents at a time.
<value>
Enter the value that you want to change the specified <type> to.

For example:
http://12.3.4.56:4001/DRECHANGEMETA?Type=database&Docs=3+5+range=[7,10]&NewValue=Archive
This command uses port 4001 to change the database that stores the documents with the ID 3, 5, 7, 8,
9 and 10 to the Archive database. IDOL server is located on a machine with the IP address 12.3.4.56.
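When you identify documents by reference rather than by ID, the references must be escaped before they are placed in the command. The following Python sketch assumes that escaping means standard URL percent-encoding, which this guide does not spell out. The function name change_meta_url is hypothetical.

```python
from urllib.parse import quote

def change_meta_url(host, port, meta_type, new_value, refs):
    """Build a DRECHANGEMETA URL that identifies documents by escaped references."""
    # Each reference is percent-encoded on its own, then the references are joined with
    # plus symbols (no spaces), as required for the Refs parameter.
    escaped = "+".join(quote(ref, safe="") for ref in refs)
    return (f"http://{host}:{port}/DRECHANGEMETA?Type={meta_type}"
            f"&Refs={escaped}&NewValue={new_value}")

url = change_meta_url(
    "12.3.4.56", 4001, "database", "Archive",
    ["http://www.autonomy.com/autonomy/dynamic/autopage442.shtml"],
)
print(url)
```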

Page 374


Changing field values in IDOL server documents


You can change the values of fields in documents using the following command:
DREREPLACE?<data>#DREENDDATA
Note: This command requires a POST request method

<data>
The fields that you want to replace in IDOL server. You need to specify each field as follows:
#DREDOCID <N> or #DREDOCREF <N>
#DREFIELDNAME <X>
#DREFIELDVALUE <Y>
<N>
The DocID or reference (URL) of the document that contains the field that you want to
replace.
<X>
The name of the field whose value you want to change.
<Y>
The value that you want field <X> to change to. For example:
#DREDOCID 1
#DREFIELDNAME Price
#DREFIELDVALUE 10
#DREDOCREF http://www.autonomy.com/autonomy/dynamic/autopage442.shtml
#DREFIELDNAME Country
#DREFIELDVALUE UK
#DREENDDATA
In this example, the value of the Price field in the document with the DocID 1 is changed to
10. The value of the Country field in the document with the reference
http://www.autonomy.com/autonomy/dynamic/autopage442.shtml is changed to UK.
If the fields whose values you are changing are Index or ACL fields (see page 279 and page 285),
IDOL server needs to reindex the documents in which you are making the changes. If you are
changing numerical fields, numerical date fields (see page 287 and page 289) or fields that you have
assigned another property to, IDOL server can execute your changes without reindexing, so that these
changes are made very quickly.
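The data block for a DREREPLACE command can be generated from a list of changes, as in the following Python sketch. The function name build_replace_body and the dictionary keys (docid, docref, field, value) are hypothetical; the sketch only produces the text shown in the example above and does not send the POST request.

```python
def build_replace_body(changes):
    """Build the DREREPLACE data block, terminated by #DREENDDATA."""
    lines = []
    for change in changes:
        # A document is identified either by its DocID or by its reference.
        if "docid" in change:
            lines.append(f"#DREDOCID {change['docid']}")
        else:
            lines.append(f"#DREDOCREF {change['docref']}")
        lines.append(f"#DREFIELDNAME {change['field']}")
        lines.append(f"#DREFIELDVALUE {change['value']}")
    lines.append("#DREENDDATA")
    return "\n".join(lines)

body = build_replace_body([
    {"docid": 1, "field": "Price", "value": 10},
    {"docref": "http://www.autonomy.com/autonomy/dynamic/autopage442.shtml",
     "field": "Country", "value": "UK"},
])
print(body)
```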

Page 375


Compacting IDOL server's Data index


You can reduce the space that the documents in IDOL server's Data index take up by executing a
Compact operation. This operation fills up the space that has been created through the deletion of
documents with new documents (similar to the defragmentation process).
You can compact IDOL server's Data index:

immediately by sending a DRECOMPACT command

in regular intervals by setting up a scheduled Compact operation

Note: you can automatically back up IDOL server's Data index whenever a DRECOMPACT
command is issued (see To back up the Data index automatically whenever a DRECOMPACT
command is issued: on page 380). It is good practice to back up IDOL server (see Backing up IDOL
server's Data index on page 378) before compacting it.

To compact IDOL server immediately


Issue a DRECOMPACT command (case sensitive) from your web browser:

http://<host>:<port>/DRECOMPACT

<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the IndexPort that you have specified in the IDOL server configuration file's [Server]
section.

For example:
http://12.3.4.56:4001/DRECOMPACT
This command uses port 4001 to compact the data content of an IDOL server that is located on
a machine with the IP address 12.3.4.56.

Page 376



To compact IDOL server in regular intervals
1.

Open IDOL server's configuration file in a text editor and find the [Schedule] section. If
the configuration file does not contain a [Schedule] section, you can add one.

2.

Specify the following settings within the [Schedule] section:


Compact
Enter true to enable a Compact schedule.
CompactTime
Specify the time (hh:mm) when you want the Compact operation to start.
CompactInterval
Specify the number of hours that elapse between individual Compact operations.
Enter 0 if you want the operation to take place daily.

For example:
[Schedule]
Compact=true
CompactTime=00:00
CompactInterval=24

Page 377


Backing up IDOL server's Data index


You can back up IDOL server's Data index:

immediately by sending a DREBACKUP command

in regular intervals by setting up a scheduled backup

automatically whenever a DRECOMPACT command is issued

To back up the Data index immediately:


1.

Issue a DREBACKUP command (case sensitive) from your web browser to copy all the
IDOL server Data index's *.DB files to a new location:
http://<host>:<port>/DREBACKUP?<path>
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the IndexPort that you have specified in the IDOL server configuration file's
[Server] section.
<path>
Enter the path to the location where you want to create IDOL server's backup.
For example:
http://12.3.4.56:4001/DREBACKUP?E:\Backup
This command uses port 4001 to create a backup of IDOL server's Data index on
E:\Backup. The IDOL server whose Data index is backed up is located on a machine
with the IP address 12.3.4.56.

2.

Issue a DREINITIAL command (case sensitive) from your web browser in order to
restore the files to an IDOL server:
http://<host>:<port>/DREINITIAL?<path>
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
Alternatively, if you are using multiple IDOL servers, enter the IP address (or name)
of the machine on which your DIH is installed.

Page 378



<port>
Enter the IndexPort that you have specified in the IDOL server configuration file's
[Server] section.
Alternatively, if you are using multiple IDOL servers, enter the DIHPort that you have
specified in the DIH configuration file's [Configuration] section.
<path>
Enter the path to the location of the backup of IDOL server's Data index.

For example:
http://12.3.4.56:4001/DREINITIAL?E:\DataIndex_Backup
This command uses port 4001 to restore the files backed up on E:\DataIndex_Backup
to an IDOL server that is located on a machine with the IP address 12.3.4.56.

To back up the Data index in regular intervals:


1.

Open IDOL server's configuration file in a text editor and find the [Schedule] section. If
the configuration file does not contain a [Schedule] section, you need to add one.

2.

Specify the following settings within the [Schedule] section:


Backup
Enter true to enable a scheduled backup.
BackupCompression
Enter true to compress IDOL server's Data index before it is backed up.
BackupTime
Specify the time (hh:mm) when you want the backup to start.
BackupInterval
Specify the number of hours that elapse between individual backups. Enter 0 if you
want the backup to take place daily.
BackupDir<N>
Enter the path to the location where you want to create the Data index backup. You
must specify one directory for each of the NumberOfBackups.
BackupMaintainStructure
Enter true to maintain the directory structure when the Data index is backed up.

Page 379



NumberOfBackups
Specify the number of times you want to back up IDOL server's Data index. You can
cycle the backing up procedure by specifying multiple backups (note that the number
of backups you specify must correspond to the number of BackupDir<N> directories
you specify). Multiple backups are executed as follows:
The first backup is created at the specified BackupTime, the next one is created
after the specified BackupInterval and so on. Once IDOL server's Data index has
been backed up as many times as you have specified for NumberOfBackups, the
first backup is overwritten the next time the specified BackupInterval elapses. This
means that you always have an up to date set of Data index backups.
For example:
[Schedule]
Backup=true
BackupCompression=true
BackupTime=00:00
BackupInterval=24
BackupMaintainStructure=true
NumberOfBackups=3
BackupDir0=E:\DataIndex_Backup0
BackupDir1=E:\DataIndex_Backup1
BackupDir2=E:\DataIndex_Backup2
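The backup cycle described for NumberOfBackups amounts to reusing the BackupDir<N> directories in turn. The following Python sketch shows which directory a given scheduled backup would use, assuming (as the description above suggests) that backups are written to BackupDir0, BackupDir1 and so on before the first directory is overwritten. The function name backup_dir_for_run is hypothetical.

```python
def backup_dir_for_run(run_index, backup_dirs):
    """Return the directory used by the scheduled backup with the given index (0 = first)."""
    # After NumberOfBackups runs the cycle starts again and overwrites the first backup.
    return backup_dirs[run_index % len(backup_dirs)]

BACKUP_DIRS = [
    r"E:\DataIndex_Backup0",
    r"E:\DataIndex_Backup1",
    r"E:\DataIndex_Backup2",
]
for run in range(5):
    print(run, backup_dir_for_run(run, BACKUP_DIRS))
```

With three directories, the fourth backup (index 3) overwrites the contents of E:\DataIndex_Backup0, so the three most recent backups are always available.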

To back up the Data index automatically whenever a DRECOMPACT command is issued:


Open IDOL server's configuration file in a text editor and specify the following settings in the
[Schedule] section:
PreCompactionBackup
Enter true if you want a backup of key files to be performed automatically whenever a
DRECOMPACT command is issued (via a web browser or a schedule that has been set up
in the IDOL server configuration file). IDOL server's Data index is compressed before it is
backed up and its directory structure is maintained.
The files are backed up before IDOL server is compacted, so that you can restore them if
corruption should occur.
PreCompactionBackupPath
If you have set PreCompactionBackup to true, you can use PreCompactionBackupPath
to specify the path to the directory where you want files to be backed up.
If IDOL server shuts down without completing a compaction, it uses the contents of this
directory to restore itself.

Page 380


Initializing IDOL server's Data index


You can delete the data that your IDOL server's Data index contains and reset it to the state it was in
when you first installed IDOL server. Your configuration file is not reset. Issue a DREINITIAL
command (case sensitive) from your web browser to reset the Data index:

http://<host>:<port>/DREINITIAL?
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the IndexPort that you have specified in the IDOL server configuration file's [Server]
section.

For example:

http://12.3.4.56:4001/DREINITIAL?
This command uses port 4001 to reset the Data index of an IDOL server that is located on a machine
with the IP address 12.3.4.56 to its original state.

Page 381


Exporting users, roles, agents and profiles


You can export IDOL server's users, roles, agents and profiles to a specified XML file from where they
can be imported into an IDOL server again. This is useful, for example, if you want to transfer IDOL
server to a different platform:

http://<host>:<port>/action=Export&FileName=<file name>
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the Port that you have specified in the IDOL server configuration file's [Server] section.
<file name>
Enter the name of the XML file to which you want to export IDOL server's users, roles, agents and
profiles. If the XML file is not stored in the same directory as IDOL server, you must specify the
path to the file as well.

For example:

http://12.3.4.56:4000/action=Export&FileName=MyFile.xml
This command uses port 4000 to export IDOL server's users, roles, agents and profiles to the
MyFile.xml file.

Page 382


Importing users, roles, agents and profiles


You can import IDOL server's users, roles, agents and profiles from a specified XML file (into which
you have previously exported them from an IDOL server using the Export action). This is useful, for
example, if you want to transfer IDOL server to a different platform:

http://<host>:<port>/action=Import&FileName=<file name>&UserFields=<field CSV>


<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the Port that you have specified in the IDOL server configuration file's [Server] section.
<file name>
Enter the name of the XML file that contains the users, roles, agents and profiles that you want to
import. If the XML file is not stored in the same directory as IDOL server, you must specify the path
to the file as well.
<field CSV>
This parameter is optional.
Allows you to restrict the import of user fields by specifying a wild card list of fields you want to
import. If you want to specify multiple fields, you must separate them with commas (there must be
no space before or after a comma). Only user fields that match a listed field are imported.

For example:

http://12.3.4.56:4000/action=Import&FileName=MyFile.xml
This command uses port 4000 to import IDOL server's users, roles, agents and profiles from the
MyFile.xml file.

Page 383


Setting up log streams


If IDOL server's default logging does not suit your environment, you can set up your own log
streams. Each log stream creates a separate log file in which specific log message types (for example,
query, index and application) are logged.
For details on the settings that the [Logging] section can contain and on how you can configure them,
please refer to IDOL server's online help (see Displaying help on configuration settings on
page 389).
To set up log streams:
1.

Open IDOL server's configuration file in a text editor.

2.

Find the [Logging] section. (If the configuration file does not contain a [Logging] section, you
need to create one).

3.

Under the [Logging] section's heading, create a list of the log streams that you want to set up
using the format <N>=<log stream name>.
For example:
[Logging]
0=INDEX_LOG_STREAM
1=QUERY_LOG_STREAM
2=APP_LOG_STREAM
In this example 3 log streams have been defined, which log index, query and application
occurrences. Note that the log streams are listed in consecutive order, starting from 0.

4.

Create a new section for each of the log streams that you have defined. Each section must have
the same name as the log stream.
For example:
[INDEX_LOG_STREAM]
[QUERY_LOG_STREAM]
[APP_LOG_STREAM]

5.

Specify the settings that you want to apply to each log stream in the appropriate log stream's
section. You can specify the type of logging that should be performed (for example, full logging), if
log messages should be displayed on the console, the maximum size of log files and so on.
For example:
[INDEX_LOG_STREAM]
logfile=logs/index.log
loghistorysize=50
logtime=true
logecho=false
maxlogsizekbs=1024
logtypecsvs=index
loglevel=full

Page 384



[QUERY_LOG_STREAM]
logfile=logs/query.log
loghistorysize=50
logtime=true
logecho=false
maxlogsizekbs=1024
logtypecsvs=query
loglevel=full
[APP_LOG_STREAM]
logfile=application.log
loghistorysize=50
logtime=true
logecho=false
maxlogsizekbs=1024
logtypecsvs=application
loglevel=full
6.

Save and close the configuration file.

7.

Restart IDOL server to execute your changes.
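Before you restart IDOL server, you may want to confirm that every log stream listed in the [Logging] section has a section of its own (step 4). The following Python sketch performs that check with the standard configparser module. The function name missing_stream_sections is hypothetical, and the sketch assumes that the configuration text can be parsed as a plain INI file.

```python
import configparser

def missing_stream_sections(config_text):
    """Return the log streams listed under [Logging] that have no section of their own."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # Only the numbered entries (0=..., 1=...) name log streams.
    streams = [name for key, name in parser["Logging"].items() if key.isdigit()]
    return [name for name in streams if not parser.has_section(name)]

SAMPLE = """[Logging]
0=INDEX_LOG_STREAM
1=QUERY_LOG_STREAM
[INDEX_LOG_STREAM]
logfile=logs/index.log
"""
print(missing_stream_sections(SAMPLE))  # ['QUERY_LOG_STREAM']
```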

Page 385


Page 386

Appendices

Appendix A: The IDOL server configuration file
The settings that determine how IDOL server operates are contained in the <InstallationName>
configuration file, which is located in your installation directory. You can modify these settings in order
to customize IDOL server according to your requirements.

Displaying help on configuration settings


For details on the settings that the individual configuration file sections can contain and on how you
can configure them, please refer to IDOL server's online help.

To display the online help


1.

Issue the following command from your web browser:


http://<host>:<port>/action=Help

<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the port number that client machines use to communicate with IDOL server (this is
specified by the Port setting in the IDOL server configuration file's [Server] section).

2.

Click on the config help link in the top right-hand corner to display the configuration parameter
help (by default the action command help is displayed).
Note: the configuration file sections that each configuration parameter can be used in are listed
under Allowed in Sections.

Note:
You can also generate configuration help without starting IDOL server. Issue the following command
from the command line to generate html files in your installation directory:
<IDOLserver_installation_directory_path><IDOLserver_installation_name>.exe -help

Page 389

Modifying configuration parameter values


Note: when setting configuration parameter values, you must use ASCII (the only exception to this is
the PropertyFieldCSVs parameter which accepts UTF8).

Entering Boolean values


For parameters that require Boolean settings, the following values are interchangeable:
TRUE = true = ON = on = Y = y = 1
FALSE = false = OFF = off = N = n = 0
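If you read configuration values in a script of your own, the equivalences above can be mirrored as in the following Python sketch. The function name parse_boolean is hypothetical. The sketch accepts only the spellings listed above and rejects anything else, since this guide does not say how IDOL server treats other spellings.

```python
TRUE_VALUES = {"TRUE", "true", "ON", "on", "Y", "y", "1"}
FALSE_VALUES = {"FALSE", "false", "OFF", "off", "N", "n", "0"}

def parse_boolean(value):
    """Map one of the listed Boolean spellings to True or False."""
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"not a listed Boolean value: {value!r}")

print(parse_boolean("ON"), parse_boolean("n"))  # True False
```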

Entering string values


If the value that you want to enter for a parameter that requires a string contains quotation marks, you
must put the value into quotation marks and escape each quotation mark that the string contains by
putting a backslash in front of it.
For example:
FIELDSTART0="<font face=\"arial\"size=\"+1\"><b>"
Here the beginning and end of the string are indicated by quotation marks while all quotation marks that
are contained in the string are escaped.

If you want to enter a comma separated list of strings for a parameter, and one of the strings contains
a comma, you must indicate the start and the end of this string with quotation marks.
For example:
ParameterName=cat,dog,bird,"wing,beak",turtle

If any string within a comma separated list contains quotation marks, you must put this string into
quotation marks and escape the quotation marks in the string by putting a backslash in front of them.
For example:
ParameterName="<font face=\"arial\"size=\"+1\"><b>",dog,bird,"wing,beak",turtle
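The quoting rules above can be sketched in Python as a pair of helper functions, one that quotes a single list item when necessary and one that splits a comma separated value back into its items. The names quote_item and split_items are hypothetical, and the sketch assumes that a backslash is only special when it directly precedes a quotation mark (so Windows paths are left untouched); IDOL server's own handling may differ.

```python
def quote_item(item: str) -> str:
    """Quote one list item if it contains a comma or a quotation mark."""
    if '"' in item or "," in item:
        return '"' + item.replace('"', '\\"') + '"'
    return item


def split_items(value: str) -> list:
    """Split a comma separated value, honouring quoted items."""
    items, current, in_quotes, i = [], [], False, 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value) and value[i + 1] == '"':
            current.append('"')  # escaped quotation mark inside a string
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes  # start or end of a quoted item
        elif ch == "," and not in_quotes:
            items.append("".join(current))  # separator between items
            current = []
        else:
            current.append(ch)
        i += 1
    items.append("".join(current))
    return items


print(split_items('cat,dog,bird,"wing,beak",turtle'))
```

Joining the output of quote_item with commas and passing the result to split_items returns the original items, which is a convenient self-check when generating parameter values from a script.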

Applying modifications to IDOL server's operation


New configuration settings only take effect once the IDOL server service is stopped and restarted.

Configuration file sections


IDOL server's configuration file comprises a number of sections, which represent different areas that
you can configure by setting appropriate configuration parameters. For details on all available
configuration parameters, please refer to IDOL server's HTML online help (see Displaying help on
configuration settings).
Note that the configuration file sections that each configuration parameter can be used in are listed
under Allowed in Sections.

The configuration file can contain the following configurable segments:

[License]
[Service]
[Server]
[TermCache]
[IndexCache]
[SectionBreaking]
[Paths]
[Databases]
[Schedule]
[Summary]
[FieldProcessing]
[Properties]
[Security]
[User]
[UserSecurityFields]
[UserSecurity]
[Role]
[Agent]
[Profile]
[ProfileNamedAreas]
[Community]
[UserCustom]
[UserStructure]
[DRE]
[DataDRE]
[Cluster]
[Taxonomy]
[AnalysisSchedules]
[IndexTasks]
[DocumentTracking]
[Synonym]
[Templates]
[Logging]
[LanguageTypes]

Note: which of the above listed configuration segments you require depends on which operations you
want your IDOL server to carry out.

[License] section
The [License] section contains licensing details which you should not change.
For example:
[License]
LicenseServerHost=127.0.0.1
LicenseServerACIPort=20000
LicenseServerTimeout=600000
LicenseServerRetries=1

[Service] section
The [Service] section contains settings that determine which machines are permitted to use
and control the IDOL server service.
For example:
[Service]
ServicePort=40010
ServiceControlClients=127.0.0.1
ServiceStatusClients=127.0.0.1

[Server] section
The [Server] section contains general settings.
For example:
[Server]
IndexClients=*.*.*.*
AdminClients=*.*.*.*
IndexPort=9001
Port=9000
Threads=4
MaxInputString=16000
DelayedSync=TRUE
AutoDetectLanguagesAtIndex=TRUE
XSLTemplates=FALSE
DateFormatCSVs=SHORTMONTH#SD+#SYYYY,DD/MM/YYYY,YYYY/MM/DD,YYYY-MM-DD,EPOCHSECONDS
KillDuplicates=*/DREREFERENCE
DocumentDelimiterCSVs=*/DOCUMENT
CantHaveFieldCSVs=*/DRESTORECONTENT,*/CHECKSUM,*/DREWORDCOUNT,*/DRETYPE,*/IMPORTBODYLEN,*/IMPORTMETALEN,*/IMPORTLINKLEN,*/IMPORTTITLELEN,*/IMPORTQUALITY,*/DREPAGE,*/DREFILENAME,*/dredoctype
InactiveSchedules=all

[TermCache] section
The [TermCache] section contains settings that determine how much memory IDOL server
uses to cache query terms.
For example:
[TermCache]
TermCacheMaxSize=102400

[IndexCache] section
The [IndexCache] section contains settings that determine how much memory IDOL server
uses to cache data for indexing.
For example:
[IndexCache]
IndexCacheMaxSize=102400

[SectionBreaking] section
The [SectionBreaking] section contains settings that determine the size of the sections that
documents are broken up into before they are indexed.
For example:
[SectionBreaking]
MinFieldLength=80
MaxSectionLength=2000

[Paths] section
The [Paths] section contains settings that allow you to split the database into multiple partitions
and settings that indicate the location of files that IDOL server uses.
For example:
[Paths]
DyntermPath=./dynterm
NodetablePath=./nodetable
RefIndexPath=./refindex
MainPath=./main
StatusPath=./status
Main=./extendedindex
UserPath=./users
Modules=./res/modules
ClusterDirectory=./cluster
TaxonomyDirectory=./taxonomy
CategoryDirectory=./category
ImExDirectory=./imex
TemplateDirectory=./templates

[Databases] section
The [Databases] section lists the databases in which IDOL server stores its data and contains
a subsection for each of the databases, in which you specify settings that only apply to this
database. Note that if you are indexing documents in multiple languages, you don't need to
create a database for each of the languages.
For example:
[Databases]
NumDBs=2
[Database0]
Name=News
[Database1]
Name=Archive
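The relationship between NumDBs and the [DatabaseN] subsections can be checked before you restart IDOL server. The sketch below is an illustration only: parse_sections and database_names are hypothetical helpers, the parser handles just bracketed section names, key=value lines and // comments, and it assumes that subsections are numbered from 0 without gaps, as in the example above.

```python
def parse_sections(text):
    """Return {section name: {key: value}} for a bracketed key=value file."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return sections


def database_names(sections):
    """List database names, checking that NumDBs subsections exist."""
    count = int(sections["Databases"]["NumDBs"])
    names = []
    for index in range(count):
        section = sections.get(f"Database{index}")
        if section is None or "Name" not in section:
            raise ValueError(f"[Database{index}] is missing or has no Name")
        names.append(section["Name"])
    return names


EXAMPLE = """\
[Databases]
NumDBs=2
[Database0]
Name=News
[Database1]
Name=Archive
"""
print(database_names(parse_sections(EXAMPLE)))
```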

[Schedule] section
The [Schedule] section contains settings that allow you to schedule when IDOL server is
compacted and when documents are expired from databases.
For example:
[Schedule]
Compact=true
Expire=true
CompactTime=00:00
CompactInterval=672
ExpireTime=00:00
ExpireInterval=24

[Summary] section
The [Summary] section contains settings that determine summary details.
For example:
[Summary]
MinWordsPerSentence=10

[FieldProcessing] section
The [FieldProcessing] section lists the processes that you want to apply to fields, and
contains a subsection for each of the processes, in which you define the process.
For example:
[FieldProcessing]
Number=15
0=SetIndexFields
1=SetIndexAndWeightHigher
2=SetSectionBreakFields
3=SetDateFields
4=SetDatabaseFields
5=SetReferenceFields
6=SetTitleFields
7=SetHighlightFields
8=SetSourceFields
9=DetectNT_V4Security
10=DetectNotes_V4Security
11=DetectNetware_V4Security
12=DetectExchange_V4Security
13=DetectDocumentum_V4Security
14=HideAutonomyMetaDataField
[SetIndexFields]
Property=IndexFields
PropertyFieldCSVs=*/DRECONTENT,*/DRETITLE
[SetIndexAndWeightHigher]
Property=IndexWeightFields
PropertyFieldCSVs=*/SUMMARIES
[SetSectionBreakFields]
Property=SectionFields
PropertyFieldCSVs=*/DRESECTION
[SetDateFields]
Property=DateFields
PropertyFieldCSVs=*/DREDATE,*/DATE
[SetDatabaseFields]
Property=DatabaseFields
PropertyFieldCSVs=*/DREDBNAME,*/DATABASE
[SetReferenceFields]
Property=ReferenceFields
PropertyFieldCSVs=*/DREREFERENCE,*/REFERENCE
[SetTitleFields]
Property=TitleFields
PropertyFieldCSVs=*/DRETITLE,*/TITLE
[SetHighlightFields]
Property=HighlightFields
PropertyFieldCSVs=*/DRETITLE,*/DRECONTENT
[SetSourceFields]
Property=SourceFields
PropertyFieldCSVs=*/DRETITLE,*/DRECONTENT
[DetectNT_V4Security]
Property=SecurityNT_V4
PropertyFieldCSVs=*/SECURITYTYPE
PropertyMatch=nt
[DetectNotes_V4Security]
Property=SecurityNotes_V4
PropertyFieldCSVs=*/SECURITYTYPE
PropertyMatch=*notes_v4
[DetectNetware_V4Security]
Property=SecurityNetware_V4
PropertyFieldCSVs=*/SECURITYTYPE
PropertyMatch=*netware_v4
[DetectExchange_V4Security]
Property=SecurityExchange_V4
PropertyFieldCSVs=*/SECURITYTYPE
PropertyMatch=*exchange_v4
[DetectDocumentum_V4Security]
Property=SecurityDocumentum_V4
PropertyFieldCSVs=*/SECURITYTYPE
PropertyMatch=*documentum
[HideAutonomyMetaDataField]
Property=HideMetaDataFields
PropertyFieldCSVs=*/AUTONOMYMETADATA

[Properties] section
The [Properties] section lists the properties that you have created for the processes that you
have listed in the [FieldProcessing] section, and contains a subsection for each of the
properties, in which you set configuration parameters that are applied to associated fields.
For example:
[Properties]
0=IndexFields
1=IndexWeightFields
2=SectionFields
3=DateFields
4=DatabaseFields
5=ReferenceFields
6=TitleFields
7=HighlightFields
8=SourceFields
9=SecurityNT_V4
10=SecurityNotes_V4
11=SecurityNetware_V4
12=SecurityExchange_V4
13=SecurityDocumentum_V4
14=HideMetaDataFields
[IndexFields]
Index=TRUE
[IndexWeightFields]
Index=TRUE
Weight=2
[SectionFields]
SectionBreakType=TRUE
[DateFields]
DateType=TRUE
[DatabaseFields]
DatabaseType=TRUE
[ReferenceFields]
ReferenceType=TRUE
TrimSpaces=TRUE
[TitleFields]
TitleType=TRUE
[HighlightFields]
HighlightType=TRUE
[SourceFields]
SourceType=TRUE
[SecurityNT_V4]
SecurityType=NT_V4
[SecurityNotes_V4]
SecurityType=Notes_V4
[SecurityNetware_V4]
SecurityType=Netware_V4
[SecurityExchange_V4]
SecurityType=Exchange_V4
[SecurityDocumentum_V4]
SecurityType=Documentum_V4
[HideMetaDataFields]
HiddenType=TRUE
ACLType=TRUE
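Because each process in [FieldProcessing] names a property that must in turn be listed and defined in [Properties], a small consistency check can catch broken links between the two sections. The following Python sketch is one possible approach; parse_sections and unresolved_properties are hypothetical helpers, the sample is a shortened version of the examples above, and exact (case-sensitive) name matching is an assumption.

```python
def parse_sections(text):
    """Return {section name: {key: value}} for a bracketed key=value file."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return sections


def unresolved_properties(sections):
    """Return (process, property) pairs that do not resolve to a property."""
    props = sections.get("Properties", {})
    listed = {value for key, value in props.items() if key.isdigit()}
    problems = []
    for key, process in sections.get("FieldProcessing", {}).items():
        if not key.isdigit():
            continue  # skip Number= and other non-list settings
        prop = sections.get(process, {}).get("Property")
        if prop is None or prop not in listed or prop not in sections:
            problems.append((process, prop))
    return problems


EXAMPLE = """\
[FieldProcessing]
Number=2
0=SetIndexFields
1=SetDateFields
[SetIndexFields]
Property=IndexFields
PropertyFieldCSVs=*/DRECONTENT,*/DRETITLE
[SetDateFields]
Property=DateFields
PropertyFieldCSVs=*/DREDATE,*/DATE
[Properties]
0=IndexFields
1=DateFields
[IndexFields]
Index=TRUE
[DateFields]
DateType=TRUE
"""
print(unresolved_properties(parse_sections(EXAMPLE)))
```

An empty list means that every listed process points to a property that is both listed in [Properties] and has its own subsection.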

[Security] section
The [Security] section lists the security modules that you are using, and contains a subsection
for each of the security modules, in which you can specify the settings that you want to apply to
each module.
For example:
[Security]
SecurityInfoKeys=123,144,564,231
0=NT_V4
1=Netware_V4
2=Notes_V4
3=Exchange_V4
4=Documentum_V4
[NT_V4]
SecurityCode=1
Library=C:\IDOLserver\IDOL\modules\mapped_security
Type=AUTONOMY_SECURITY_V4_NT_MAPPED
ReferenceField=*/AUTONOMYMETADATA
[Netware_V4]
SecurityCode=2
Library=C:\IDOLserver\IDOL\modules\mapped_security
Type=AUTONOMY_SECURITY_V4_NETWARE_MAPPED
ReferenceField=*/AUTONOMYMETADATA
[Notes_V4]
SecurityCode=3
Library=C:\IDOLserver\IDOL\modules\mapped_security
Type=AUTONOMY_SECURITY_V4_NOTES_MAPPED
ReferenceField=*/AUTONOMYMETADATA
[Exchange_V4]
SecurityCode=4
Library=C:\IDOLserver\IDOL\modules\mapped_security
Type=AUTONOMY_SECURITY_V4_EXCHANGE_GRPS_MAPPED
ReferenceField=*/AUTONOMYMETADATA
[Documentum_V4]
SecurityCode=5
Library=C:\IDOLserver\IDOL\modules\mapped_security
Type=AUTONOMY_SECURITY_V4_DOCUMENTUM_MAPPED
ReferenceField=*/AUTONOMYMETADATA

Note:
- Use the [FieldProcessing] and [Properties] sections to identify fields that determine the
  security type of documents and the processes that should be applied to these fields or
  documents.
- If you are running your IDOL server on a UNIX platform, you need to specify the
  LD_LIBRARY_PATH to ensure that IDOL server can find the shared objects that it
  requires in order to implement security.

[User] section
The [User] section contains settings that determine how many agents each user can have and
which fields belong to these agents.
For example:
[User]
XmlTempDirectory=C:\IDOLserver\IDOL\community\temp\userxml
MaxAgents=10
IndexFieldCSVs=drelanguagetype

[UserSecurityFields] section
The [UserSecurityFields] section lists the security fields.
For example:
[UserSecurityFields]
0=username
1=password
2=group
3=domain

[UserSecurity] section
The [UserSecurity] section lists your security repositories, specifies generic settings for them,
and contains a subsection for each of the listed security repositories, in which you can specify
the settings that you want to apply to this security repository.
Note: you can list up to 8 security types.
For example:
[UserSecurity]
DefaultSecurityType=0
DocumentSecurity=TRUE
SyncRolesFromGroups=FALSE
SecurityUsernameDefaultToLoginUsername=FALSE
0=Autonomy
1=NT
2=Notes
3=LDAP
4=Documentum
5=Exchange
6=Netware
[Autonomy]
Library=C:\IDOLserver\IDOL\modules\user_autnsecurity
EnableLogging=FALSE
DocumentSecurity=FALSE
SecurityFieldCSVs=none
[NT]
CaseSensitiveUserNames=FALSE
CaseSensitiveGroupNames=FALSE
Library=C:\IDOLserver\IDOL\modules\user_ntsecurity
EnableLogging=FALSE
DocumentSecurity=TRUE
V4=TRUE
SecurityFieldCSVs=username,domain
Domain=DOMAIN
DocumentSecurityType=NT_V4
[Notes]
Library=C:\IDOLserver\IDOL\modules\user_notessecurity
EnableLogging=FALSE
NotesAuthURL=http://notesserver/names.nsf
DocumentSecurity=TRUE
CaseSensitiveUserNames=FALSE
CaseSensitiveGroupNames=FALSE
SecurityFieldCSVs=username
DocumentSecurityType=Notes_V4
[LDAP]
Library=C:\IDOLserver\IDOL\modules\user_ldapsecurity
EnableLogging=FALSE
RDNAttribute=CN
Group=OU=Users,O=Company
LDAPServer=127.0.0.1
LDAPPort=389
FieldCSVs=email,emailaddress,telephone
LDAPAllAttributeValues=TRUE
LDAPAttributeValueSeparatorChar=,
SecurityFieldCSVs=none
DocumentSecurity=FALSE
CaseSensitiveUserNames=FALSE
CaseSensitiveGroupNames=FALSE
[Documentum]
DocumentSecurity=TRUE
SecurityFieldCSVs=username
DocumentSecurityType=Documentum_V4
CaseSensitiveUserNames=FALSE
CaseSensitiveGroupNames=FALSE
[Exchange]
DocumentSecurity=TRUE
V4=FALSE
SecurityFieldCSVs=username,domain
DocumentSecurityType=Exchange_V4
CaseSensitiveUserNames=FALSE
CaseSensitiveGroupNames=FALSE
[Netware]
DocumentSecurity=TRUE
DocumentSecurityType=Netware_V4
SecurityFieldCSVs=username
CaseSensitiveUserNames=FALSE
CaseSensitiveGroupNames=FALSE

[Role] section
The [Role] section contains role details.
For example:
[Role]
DefaultRolename=everyone
AutoSetDatabases=TRUE
DatabasePrivilege=databases

[Agent] section
The [Agent] section determines how agents are going to operate.
For example:
[Agent]
DynamicAgentFields=TRUE
DreCombine=Simple
DreSentences=3
DreCharacters=300
DrePrint=All
DreSummary=Context
DontCopyAgentFields=emailaddress
ResultsCacheDuration=60
AgentResultsCacheDuration=60
AgentIndexFieldCSVs=drelanguagetype

[Profile] section
The [Profile] section contains settings that apply to profiles.
For example:
[Profile]
DreCombine=Simple
DreSentences=3
DreCharacters=300
DrePrint=All
DreSummary=Context
ResultsCacheDuration=60
AgentResultsCacheDuration=60
DreMaxQueryTerms=20

[ProfileNamedAreas] section
The [ProfileNamedAreas] section determines the names of the areas that contain the profiles
that are created when users read or write documents.
For example:
[ProfileNamedAreas]
0=default
1=authored

[Community] section
The [Community] section determines how community queries operate.
For example:
[Community]
DreMinScore=20
DreWeighFieldText=FALSE
ExpandQuery=FALSE
ExpandQueryLog=FALSE
ExpandQueryMinScore=60
ExpandQueryMaxResults=30
ExpandQueryMaxScore=80

[UserCustom] section
The [UserCustom] section allows you to add custom functionality to IDOL server. It lists the
functionality that you are adding and contains a subsection for each functionality in which you
can specify the settings that apply to this functionality (for example which shared library it uses).
For example:
[UserCustom]
0=Email
[Email]
Library=C:\IDOLserver\IDOL\modules\user_email
FromHost=127.0.0.1
SmtpHost=smtp.company.com
SMTPPort=25
DrePrint=all
XSLTemplate=C:\IDOLserver\IDOL\templates/email.xss
EmailActionXSLTemplate=C:\IDOLserver\IDOL\templates/ondemand.xss
ClassificationServerXSLTemplate=C:\IDOLserver\IDOL\templates/channels.xss
RunMailer=FALSE
Retries=2
TimeoutMS=15000
StartTime=9:00
Interval=1 day
Cycles=-1
FromName=IdolMailer
DefaultSendEmail=TRUE
DefaultEmailFormat=text/html
DefaultExcludeReadDocuments=TRUE
DefaultAddSetToReadDocuments=TRUE
DefaultSubject=USERNAME's Results
MaxEmailsPerUser=20
From=user@company.com

[UserStructure] section
The [UserStructure] section comprises settings that determine the structure of the binary data
files that are stored on disk. You cannot change these settings once you have finished setting
up IDOL server.
For example:
[UserStructure]
MaxAgents=20
AgentTrainingLength=512
TermSize=20
SecurityFieldLength=64
AgentFixedFieldLength=1

[DRE] section
The [DRE] section allows you to list Query, TermGetBest and TermGetInfo action
parameters that you want to be available for Agent and Profile queries (by setting them for the
DRE<QueryParameter> parameter in the [Agent] or [Profile] section).
For example:
[DRE]
AdditionalDREQueryParameters=Characters,MaxDate
AdditionalDRETermGetBestParameters=Weights
AdditionalDRETermGetInfoParameters=OnlyExisting

[DataDRE] section
If you are distributing your IDOL server across multiple machines, the [DataDRE] section
allows you to specify Data index settings.
For example:
[DataDRE]
Host=7.89.01.2
AciPort=6002
Timeout=5000

[Cluster] section
The [Cluster] section contains the details for clustering.
For example:
[Cluster]
ResultExpiryDays=30
SnapshotExpiryDays=30
SGExpiryDays=30
DownloadDocAction=drecontents
TitleFromSummary=TRUE
SummaryField=autn:summary

[Taxonomy] section
The [Taxonomy] section contains the taxonomy details.
For example:
[Taxonomy]
MaxConcepts=100
RelevanceThreshold=20
DistributionThreshold=10
ConceptThreshold=400
MinConceptOccs=15
CompoundRelevance=40
SiblingStrength=20
MinChildren=1
OnlyMatchSubset=0
MaxQNum=5000

[AnalysisSchedules] section
The [AnalysisSchedules] section specifies the number of classification schedules that you
want to execute, and contains a subsection for each of these schedules in which you can
specify details for this schedule. You can schedule the following actions:
- ClusterSnapshot
- ClusterCluster
- ClusterSGDataGen
- TaxonomyGenerate

For example:
[AnalysisSchedules]
Number=5
[AnalysisSchedule0]
ScheduleStartTime=now
ScheduleInterval=1 day
ScheduleCycles=-1
ScheduleAction=CLUSTERSNAPSHOT
TargetJobname=myjob
[AnalysisSchedule1]
ScheduleStartTime=now
ScheduleInterval=1 day
ScheduleCycles=-1
ScheduleAction=CLUSTERCLUSTER
SourceJobName=myjob
TargetJobName=myjob_clusters
DoMapping=TRUE
[AnalysisSchedule2]
ScheduleStartTime=now
ScheduleInterval=1 day
ScheduleCycles=-1
ScheduleAction=CLUSTERCLUSTER
SourceJobName=myjob
TargetJobName=myjob_clusters_new
WhatsNew=TRUE
Interval=86400
[AnalysisSchedule3]
ScheduleStartTime=now
ScheduleInterval=1 day
ScheduleCycles=-1
ScheduleAction=CLUSTERSGDATAGEN
Interval=604800
SourceJobName=myjob
TargetJobName=myjob_sg
[AnalysisSchedule4]
ScheduleStartTime=now
ScheduleInterval=1 day
ScheduleCycles=-1
ScheduleAction=TAXONOMYGENERATE
Cluster=0,1,2,3,4,5,6,7,8,9
SourceJobName=myjob_clusters
TargetJobName=myjob_taxonomy
NumResults=25
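A quick way to check an [AnalysisSchedules] setup is to confirm that a subsection exists for each schedule counted by Number and that every ScheduleAction is one of the four actions listed above. The Python sketch below shows one possible check; parse_sections and check_schedules are hypothetical helpers, the sample is shortened, and comparing action names without regard to case is an assumption based on the upper-case values in the example.

```python
ALLOWED_ACTIONS = {
    "clustersnapshot", "clustercluster", "clustersgdatagen", "taxonomygenerate",
}


def parse_sections(text):
    """Return {section name: {key: value}} for a bracketed key=value file."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return sections


def check_schedules(sections):
    """Return a list of problems found in the analysis schedules."""
    problems = []
    count = int(sections["AnalysisSchedules"]["Number"])
    for index in range(count):
        name = f"AnalysisSchedule{index}"
        action = sections.get(name, {}).get("ScheduleAction")
        if action is None:
            problems.append(f"[{name}] is missing or has no ScheduleAction")
        elif action.lower() not in ALLOWED_ACTIONS:
            problems.append(f"[{name}] has unknown action {action}")
    return problems


EXAMPLE = """\
[AnalysisSchedules]
Number=2
[AnalysisSchedule0]
ScheduleAction=CLUSTERSNAPSHOT
TargetJobname=myjob
[AnalysisSchedule1]
ScheduleAction=CLUSTERCLUSTER
SourceJobName=myjob
TargetJobName=myjob_clusters
"""
print(check_schedules(parse_sections(EXAMPLE)))
```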

[IndexTasks] section
The [IndexTasks] section determines which tasks IDOL server performs on data before
indexing. It includes the StartTask setting, which identifies which task should be executed first,
and a subsection for each of the tasks in which you can specify details for each task.
For example:
[IndexTasks]
StartTask=CatTask
[AlertTask]
Module=Alert
IdolServer=localhost:9000
NextTask=IndexTask
SMTPServer=1.23.45.6
SMTPPort=25
SMTPSubject="Alert from IDOLServer"
SMTPSendFrom=postmaster
Template=res/templates/alertTemplate.html
AttachmentTemplate=res/templates/alertTemplate.html
Fields=DRECONTENT
FieldMappings=text
Queryparameters=MinScore=80
ContentType=text/html
[CatTask]
Module=Cat
IdolServer=localhost:9000
NextTask=AlertTask
TextFields=DRECONTENT
TagField=CategoryTag
[IndexTask]
Module=Index
IdolServer=127.0.0.1:9001
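In the example above, StartTask and the NextTask settings chain the tasks in the order CatTask, AlertTask, IndexTask, regardless of the order in which the subsections appear in the file. The following Python sketch follows that chain; parse_sections and task_chain are hypothetical helpers, the sample keeps only the settings that matter for ordering, and treating a task without NextTask as the end of the chain is an assumption.

```python
def parse_sections(text):
    """Return {section name: {key: value}} for a bracketed key=value file."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return sections


def task_chain(sections):
    """Follow StartTask and NextTask to list tasks in execution order."""
    chain = []
    task = sections["IndexTasks"]["StartTask"]
    while task is not None:
        if task in chain:
            raise ValueError(f"task loop at {task}")
        if task not in sections:
            raise ValueError(f"no [{task}] subsection")
        chain.append(task)
        task = sections[task].get("NextTask")  # None ends the chain
    return chain


EXAMPLE = """\
[IndexTasks]
StartTask=CatTask
[AlertTask]
Module=Alert
NextTask=IndexTask
[CatTask]
Module=Cat
NextTask=AlertTask
[IndexTask]
Module=Index
"""
print(task_chain(parse_sections(EXAMPLE)))
```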

[DocumentTracking] section
The [DocumentTracking] section contains settings that enable the tracking of documents
through import and indexing using an Autonomy Service Dashboard.
For example:
[DocumentTracking]
DiSHACIPort=7002
DiSHHost=1.23.45.6
DiSHRetries=4
DiSHTimeout=120000
DocumentTrackingActive=true

[Synonym] section
The [Synonym] section lists the settings that determine how IDOL server handles synonym
queries. A synonym query returns results which are conceptually similar to the query's terms
and / or conceptually similar to the synonyms that are available for the query's terms.
For example:
[Synonym]
0=PC_Syn
[PC_Syn]
File=myfile.txt
MaxExpandLevel=1
Note: to be able to send synonym queries to IDOL server, you need to set up a synonym file
and add the Synonym action parameter to your query.

[Templates] section
The [Templates] section lists the templates that are required to output results in a manner that
is compatible with non-ACI compatible applications. It contains a section for each of the listed
templates, in which the templates' components are defined.
For example:
[TEMPLATES]
0=results_template
1=content_template
DefaultResultsTemplate=results_template
DefaultContentTemplate=content_template
[results_template]
TemplateHeader=templates/resultsheader.txt
TemplateBody=templates/resultsbody.txt
TemplateFooter=templates/resultsfooter.txt
TemplateMimeType=text/plain
[content_template]
TemplateHeader=templates/contentheader.txt
TemplateBody=templates/contentbody_limitedfields.txt
TemplateFooter=templates/contentfooter.txt

[Logging] section
The [Logging] section lists the logging streams that you want to set up in order to create
separate log files for different log message types (query, index and application). It also contains
a subsection for each of the listed logging streams, in which you can configure the settings that
determine how each stream is logged.
For example:
[Logging]
LogArchiveDirectory=C:\IDOLserver\IDOL\logs\archive
LogDirectory=C:\IDOLserver\IDOL\logs
// These values apply to all streams, override on an individual basis
LogTime=TRUE
LogEcho=TRUE
LogLevel=normal
OldLogFileAction=compress
LogOldAction=move
LogHistorySize=50
MaxLogSizeKbs=10240
//log streams
0=ApplicationLogStream
1=QueryLogStream
2=IndexLogStream
3=QueryTermsLogStream
4=UserLogStream
5=CategoryLogStream
6=ClusterLogStream
7=TaxonomyLogStream
8=ScheduleLogStream
9=CommunityTermLogStream
[ApplicationLogStream]
LogFile=application.log
LogTypeCSVs=application
[QueryLogStream]
LogFile=query.log
LogTypeCSVs=query
[IndexLogStream]
LogFile=index.log
LogTypeCSVs=index
[QueryTermsLogStream]
LogFile=queryterms.log
LogTypeCSVs=queryterms
[UserLogStream]
LogFile=user.log
LogTypeCSVs=user
[CategoryLogStream]
LogFile=category.log
LogTypeCSVs=category
[ClusterLogStream]
LogFile=cluster.log
LogTypeCSVs=cluster
[TaxonomyLogStream]
LogFile=taxonomy.log
LogTypeCSVs=taxonomy
[ScheduleLogStream]
LogFile=schedule.log
LogTypeCSVs=schedule
[CommunityTermLogStream]
LogFile=term.log
LogTypeCSVs=term
Note: all queries are truncated to 4000 characters in query logs.

[LanguageTypes] section
The [LanguageTypes] section lists the language types that you want to use. It contains a
section for each of the listed language types, in which you configure the settings that determine
how each language type is handled.
For example:
[LanguageTypes]
DefaultLanguageType=englishASCII
DefaultEncoding=UTF8
LanguageDirectory=C:\IDOLserver\IDOL\langfiles
0=englishASCII
1=englishUTF8
2=chineseCHINESESIMPLIFIED
3=chineseCHINESETRADITIONAL
4=chineseUTF8
5=frenchASCII
6=frenchUTF8
7=germanASCII
8=germanUTF8
[englishASCII]
LanguageCode=1
Language=ENGLISH
Encoding=ASCII
Stoplist=english.dat
IndexNumbers=1
[englishUTF8]
LanguageCode=2
Language=ENGLISH
Encoding=UTF8
Stoplist=english.dat
IndexNumbers=1
[chineseCHINESESIMPLIFIED]
LanguageCode=21
Language=CHINESE
Encoding=CHINESESIMPLIFIED
SentenceBreaking=chinesebreaking
IndexNumbers=1
[chineseCHINESETRADITIONAL]
LanguageCode=22
Language=CHINESE
Encoding=CHINESETRADITIONAL
SentenceBreaking=chinesebreaking
IndexNumbers=1
[chineseUTF8]
LanguageCode=23
Language=CHINESE
Encoding=UTF8
SentenceBreaking=chinesebreaking
IndexNumbers=1
[frenchASCII]
LanguageCode=38
Language=FRENCH
Encoding=ASCII
Stoplist=french.dat
IndexNumbers=1
[frenchUTF8]
LanguageCode=39
Language=FRENCH
Encoding=UTF8
Stoplist=french.dat
IndexNumbers=1
[germanASCII]
LanguageCode=42
Language=GERMAN
Encoding=ASCII
Stoplist=german.dat
IndexNumbers=1
[germanUTF8]
LanguageCode=43
Language=GERMAN
Encoding=UTF8
Stoplist=german.dat
IndexNumbers=1
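Several sections in this appendix, including [LanguageTypes], [Logging], [Security] and [UserSecurity], list names under numbered keys and expect a subsection for each name. A mistyped name in the list is easy to overlook, so the following Python sketch reports listed names that have no matching subsection. The helpers parse_sections and missing_subsections are hypothetical, and exact (case-sensitive) name matching is an assumption.

```python
def parse_sections(text):
    """Return {section name: {key: value}} for a bracketed key=value file."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return sections


def missing_subsections(sections, parent):
    """Return names listed under numbered keys that have no subsection."""
    listed = [v for k, v in sections.get(parent, {}).items() if k.isdigit()]
    return [name for name in listed if name not in sections]


EXAMPLE = """\
[LanguageTypes]
DefaultLanguageType=englishASCII
0=englishASCII
1=englishUTF8
[englishASCII]
LanguageCode=1
Language=ENGLISH
Encoding=ASCII
"""
# englishUTF8 is listed but has no subsection in this sample.
print(missing_subsections(parse_sections(EXAMPLE), "LanguageTypes"))
```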

Appendix B: Error codes and messages


Error codes
-1     Error: Bad Parameter
-2     Error: Out Of Memory
-3     Error: User Not Found
-4     Error: File Error
-5     Error: Maximum Fields
-6     Error: User Exists
-7     Error: Maximum Users
-8     Error: Agent Exists
-9     Error: Agent Not Found
-10    Error: Data Index Error, Data Index not found
-11    Error: Agent Index Error
-12    Error: Maximum Agents
-13    Error: Field Not Found
-14    Error: Role Exists
-15    Error: Role Not Found
-16    Error: Maximum Roles
-17    Error: Privilege Exists
-18    Error: Privilege Not Found
-19    Error: Maximum Privileges
-20    Error: Maximum Values
-21    Error: Profile Not Found
-22    Error: Disk Error
-23    Error: Numeric terms must be specified alone
-24    Error: Data Index Error, no terms found
-25    Error: Dynfields not found
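If you process IDOL server responses in a script, the error codes above can be kept in a lookup table. The following Python sketch copies the codes and messages from this appendix; the names ERROR_CODES and describe_error are hypothetical, and how an error code reaches your script is outside the scope of this example.

```python
ERROR_CODES = {
    -1: "Bad Parameter",
    -2: "Out Of Memory",
    -3: "User Not Found",
    -4: "File Error",
    -5: "Maximum Fields",
    -6: "User Exists",
    -7: "Maximum Users",
    -8: "Agent Exists",
    -9: "Agent Not Found",
    -10: "Data Index Error, Data Index not found",
    -11: "Agent Index Error",
    -12: "Maximum Agents",
    -13: "Field Not Found",
    -14: "Role Exists",
    -15: "Role Not Found",
    -16: "Maximum Roles",
    -17: "Privilege Exists",
    -18: "Privilege Not Found",
    -19: "Maximum Privileges",
    -20: "Maximum Values",
    -21: "Profile Not Found",
    -22: "Disk Error",
    -23: "Numeric terms must be specified alone",
    -24: "Data Index Error, no terms found",
    -25: "Dynfields not found",
}


def describe_error(code: int) -> str:
    """Return the documented message for an error code."""
    message = ERROR_CODES.get(code)
    if message is None:
        return f"Unknown error code {code}"
    return f"Error: {message}"


print(describe_error(-3))
```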

Error messages
VQL conversion error messages
The following error messages are produced by Legacy Profile tasks if they encounter ill-formed or
unexpected content during the VQL conversion that they perform in order to import a VQL category into
IDOL server.
If you have set up a log stream in IDOL server's configuration file (see Setting up log streams)
that has LogTypeCSVs set to ExtendedIndex, VQL conversion error messages are written
to log files under the following line of text:
[LP - VQLTask] VQL conversion failed

Note: while corrupt VQL may produce several errors, only a single error will be reported.

General errors
Unknown operator <operator_name>
The category that the Legacy Profile task is importing includes an element in angled brackets
that is not one of the operators which the task can convert.
For example:
<cat>

Unable to parse adjacent operators


The category that the Legacy Profile task is importing contains an expression in which certain
operators appear immediately adjacent to each other. It is not always meaningful to place
operators immediately adjacent to each other without any intervening text.

VQL could not be fully parsed - invalid use of operators or
Unable to fully parse VQL
The Legacy Profile task cannot complete the VQL conversion stage of legacy profile importing
for some reason other than those identified by the other error messages listed in this appendix.

Proximity errors
NEAR requires at least two operands or
Problematic proximity operator - at least two operands required
The category that the Legacy Profile task is importing contains an expression in which a
proximity operator occurs with only one operand.
The proximity operators NEAR, PARAGRAPH and SENTENCE require two or more words or
phrases to be within a specified distance from each other.

Unrecognized
The category that the Legacy Profile task is importing contains an expression in which it does not
recognize the usage of the NEAR operator. This can be for one of the following reasons:
- The VQL category input is corrupt.
- The usage of NEAR is unsupported. For example, the Legacy Profile task does not
  convert nested NEAR statements such as:
  dog <NEAR/10> (kennel <NEAR/10> bone)

Adjacent NEAR values should not be different


The category the Legacy Profile task is importing contains an expression in which the distances
specified for NEAR operators are not identical.
For example:
dog <NEAR/10> kennel <NEAR/5> bone
If the proximity operator NEAR occurs adjacent to another occurrence of NEAR within an
expression, the distance specified between operands by each use of NEAR must be the same.
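Before importing a category, you could scan its expressions for NEAR operators with differing distances. The Python sketch below is a rough pre-check and not the Legacy Profile task's own parser: the function names are hypothetical, and it simply compares every <NEAR/n> distance found in one expression, without analyzing brackets or verifying that the operators are actually adjacent.

```python
import re

# Matches NEAR operators that specify a distance, for example <NEAR/10>.
NEAR_PATTERN = re.compile(r"<NEAR/(\d+)>", re.IGNORECASE)


def near_distances(expression: str) -> list:
    """Return the distances of all NEAR operators in the expression."""
    return [int(distance) for distance in NEAR_PATTERN.findall(expression)]


def has_mixed_near_distances(expression: str) -> bool:
    """Report whether the expression uses more than one NEAR distance."""
    return len(set(near_distances(expression))) > 1


print(has_mixed_near_distances("dog <NEAR/10> kennel <NEAR/5> bone"))
```

An expression that this check flags is worth reviewing by hand, since it may trigger the error described above.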

Adjacent SENTENCE operators


The proximity operator SENTENCE occurs adjacent to another instance of SENTENCE within
an expression. This is not a valid VQL expression.
For example:
dog <SENTENCE> cat <SENTENCE> fish

ORDER must come directly before another operator


The category that the Legacy Profile task is importing contains an expression in which the
ORDER operator does not occur immediately before another operator.
The ORDER operator must apply to another operator.

ORDER is followed by an unsupported operator


The category that the Legacy Profile task is importing contains an expression in which the
ORDER operator does not occur immediately before a proximity operator.
The ORDER operator can only apply to the proximity operators NEAR, PARAGRAPH and
SENTENCE.

Unable to process <ORDER><NEAR>


The category that the Legacy Profile task is importing contains an expression in which the
ORDER operator occurs immediately before a NEAR operator that does not specify a distance.
When the ORDER operator applies to the proximity operator NEAR, there must be some
distance included for NEAR (for example, <NEAR/5>).

Boolean operator errors


Unmatched brackets
The category that the Legacy Profile task is importing contains a closing bracket that is
missing a corresponding opening bracket.
For example:
(dogs) <OR> cats)
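A simple bracket scan can locate the unmatched bracket in an expression such as the one above. The following Python sketch is an illustration only; the function name is hypothetical, it also reports an opening bracket that is never closed (a case this appendix does not describe), and it does not treat brackets inside quotation marks specially.

```python
def first_unmatched_bracket(expression: str):
    """Return the index of the first unmatched bracket, or None."""
    open_positions = []
    for index, ch in enumerate(expression):
        if ch == "(":
            open_positions.append(index)
        elif ch == ")":
            if not open_positions:
                return index  # closing bracket without an opening bracket
            open_positions.pop()
    # Any position left over is an opening bracket that was never closed.
    return open_positions[0] if open_positions else None


print(first_unmatched_bracket("(dogs) <OR> cats)"))
```

For the example expression, the function returns the position of the final closing bracket, which has no corresponding opening bracket.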

NOT cannot be terminal or
Problematic NOT / Bracket - single operand required
The category that the Legacy Profile task is importing contains an expression in which NOT is
not followed by a Boolean expression.
The NOT operator must occur before the Boolean expression which it applies to.
Note: the NOT, AND and OR operators, unlike other operators, do not need to occur within
angular brackets (<NOT> and NOT are both valid). For this reason, you must put occurrences
of not that you don't want to use as operators into double quote marks ("not"). This error
message is often returned when the word not occurs without quotation marks at the end of a
phrase.

Prefix unavailable for comma, OR, AND and IN


The category that the Legacy Profile task is importing contains an expression in which one of
the operators below appears to be used in prefix form.
Some operators can occur either as infixes to their operands or as prefixes. Because of
potential ambiguities, the Legacy Profile task does not accept the following operators as
prefixes:
,
OR
AND
IN
Note: the NOT, AND and OR operators, unlike other operators, do not need to occur within
angular brackets (<OR> and OR are both valid). For this reason, you must put occurrences of
or or and that you don't want to use as operators into double quote marks ("or", "and"). This
error message is often returned when the words or or and occur without quotation marks.

Problematic AND / OR - at least two operands required


The category that the Legacy Profile task is importing contains an expression in which the OR
or AND operator appears to be used with only one operand.
For example:
dog AND
The OR and AND operators must apply to at least two operands (for example dog AND cat).
Note: the NOT, AND and OR operators, unlike other operators, do not need to occur within
angular brackets (<OR> and OR are both valid). For this reason, you must put occurrences of
or or and that you don't want to use as operators into double quote marks ("or", "and"). This
error message is often returned when the words or or and occur without quotation marks.

Field restriction errors


Nested IN operators
The category that the Legacy Profile task is importing contains an expression in which an IN
operator is nested within another IN expression.
For example:
dog <IN> (pet <IN> animal)
The IN operator requires that a word or phrase occurs within a specified field. For example,
dog <IN> pet is only matched if the word dog occurs in the pet field. Since nested fields cannot
exist, it is not possible for IN operators to be nested.

Invalid use of IN operator or
Problematic IN - two operands required
The category that the Legacy Profile task is importing contains an expression in which the task
cannot determine the expression and field operands for an IN operator.
The IN operator requires a word or phrase to occur within a specified field, with two operands in
the correct order: an expression (which defines the words or phrases to look for) and the field in
which it must occur.

Problematic IN - single field required


The category that the Legacy Profile task is importing contains an expression in which more
than one field operand appears to be specified for an IN operator.
The IN operator requires a word or phrase to occur within a specified field, with two operands in
the correct order: an expression (which defines the word or phrase to look for) and the field in
which it must occur.

Word and phrase errors


Mismatched quotes
The category that the Legacy Profile task is importing contains an expression in which an
operator in angular brackets occurs within double quote marks.
For example:
"dogs <AND> cats"
Double quote marks indicate that words or phrases should be used without modification (for
example, to indicate that a word should be matched exactly, or a phrase containing and, or or
not should not be used as a Boolean expression).
Note: while it is possible that the expression "dogs <AND> cats" is correct and valid, it is more
likely to be a corrupt form of an expression such as:
"dogs" <AND> "cats"

Unmatched quotes
The category that the Legacy Profile task is importing contains an odd number of double quote
marks.

Unable to resolve WORD / PHRASE or Word / Phrase has children
The category that the Legacy Profile task is importing contains an expression with the WORD
or PHRASE operator used in a way that the task cannot parse.

Invalid use of WORD operator


The category that the Legacy Profile task is importing contains an expression in which the
WORD operator occurs before more than one word.
The WORD operator must occur before a single word.

Invalid use of PHRASE operator


The category that the Legacy Profile task is importing contains an expression in which the
PHRASE operator does not appear to be followed by the components of a phrase.
The PHRASE operator must be followed by the components of a phrase.


Unsupported use of PHRASE operator or Unrecognized use of PHRASE
The category that the Legacy Profile task is importing contains an expression in which the
PHRASE operator occurs before elements from which the task cannot build a phrase.
The PHRASE operator builds phrases from one or more words or phrases or a comma-separated list of ORed words or phrases.

Expression errors
VQL is not in conjunctive IN format
The conjunctive IN format requires that each line of VQL consists of one or more component
expressions that are connected with the AND operator.
Each component must exclude the IN operator, or be of the form expression <IN> field, where
expression excludes the IN operator.
For example:
expression A
In this example, expression A does not use the IN operator.

(expression A) AND (expression B) AND (expression C)


In this example, expression A, expression B and expression C do not use the IN
operator.

(expression A) AND (expression B <IN> field1) AND (expression C)


In this example, expression A, expression B and expression C do not use the IN
operator. Note that while expression B is part of an IN expression, it does not itself use
the IN operator.
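
As a rough illustration of the rule above, the following Python sketch checks a single line of VQL. It is a simplification under stated assumptions and not the parser that the Legacy Profile task uses: the function names are hypothetical, only upper-case operators (AND, <AND>, IN, <IN>) are recognized, and the check only confirms that no AND-connected component uses the IN operator more than once. It does not verify that IN is the outermost operator of a component.

```python
import re

# Assumption: operators are upper case, with or without angular brackets.
AND_OP = re.compile(r"<AND>|\bAND\b")
IN_OP = re.compile(r"<IN>|\bIN\b")


def _scan(line):
    """Return, for each character, (parenthesis depth, inside double quotes)."""
    state, depth, quoted = [], 0, False
    for ch in line:
        if ch == '"':
            quoted = not quoted
        elif not quoted and ch == "(":
            depth += 1
        elif not quoted and ch == ")":
            depth -= 1
        state.append((depth, quoted))
    return state


def is_conjunctive_in(line):
    """Simplified check: every top-level AND component uses IN at most once."""
    state = _scan(line)
    spans, start = [], 0
    for m in AND_OP.finditer(line):
        depth, quoted = state[m.start()]
        if depth == 0 and not quoted:  # an AND that joins top-level components
            spans.append((start, m.start()))
            start = m.end()
    spans.append((start, len(line)))
    for lo, hi in spans:
        # Count IN operators in this component, ignoring quoted text.
        uses = [m for m in IN_OP.finditer(line, lo, hi) if not state[m.start()][1]]
        if len(uses) > 1:  # nested or repeated IN inside one component
            return False
    return True


print(is_conjunctive_in("(dog) AND (cat <IN> title) AND (fish)"))  # True
print(is_conjunctive_in("dog <IN> (pet <IN> animal)"))  # False
```

The second call fails the check because a single component contains two IN operators, which is also the nested IN case described under Field restriction errors.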


Appendix C: Service port commands


IDOL server behaves as a standard Autonomy service. If the ServicePort, ServiceStatusClients and
ServiceControlClients settings are added to the [Service] section of the IDOL server configuration
file, the service port is enabled and will accept the following standard status and control commands:
GetConfig
Returns the service's configuration file settings.
GetLogStream
Returns a specific log stream for the service.
GetLogStreamNames
Returns the names of the log streams that have been set up for the service.
GetStatistics
Returns statistics for the service.
GetStatus
Returns the service's status (running or stopped).
GetStatusInfo
Returns status information for the service (for example, the service's product name, version
number and so on).
MergeConfig
Merges a configuration file fragment with the service's configuration file (this requires a POST
request method). Alternatively, you can use it to set or delete individual configuration
parameters.
SetConfig
Sets the service's configuration file. This command requires a POST request method.
Stop
Stops the service.


GetConfig
The GetConfig command returns the service's configuration file settings.
http://<host>:<port>/action=GetConfig
<host>
The IP address (or name) of the machine that hosts the service.
<port>
Enter the ServicePort that you have specified in the IDOL server configuration file's [Service] section.

GetLogStream
The GetLogStream command returns a specific log stream for the service.
http://<host>:<port>/action=GetLogStream&Name=<name>&FromDisk=<true/false>&Tail=<number>
<host>
The IP address (or name) of the machine that hosts the service.
<port>
Enter the ServicePort that you have specified in the IDOL server configuration file's [Service] section.
<name>
Enter the name of the log stream that you want to return.
<true/false>
Enter true if you want the log stream to be read from disk rather than from memory. By default this is
false.
<number>
Enter the number of lines that you want to return from the log stream. The lines are read from the top
(that is, the most recent lines are returned). Enter -1 to return all entries (this is the default).
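
The following Python sketch shows one way to assemble a GetLogStream request URL from the parameters described above. The function name, the host, the port 40010 and the log stream name application are placeholders chosen for illustration; substitute the ServicePort and a stream name reported by GetLogStreamNames for your own installation.

```python
from urllib.parse import urlencode


def get_log_stream_url(host, port, name, from_disk=False, tail=-1):
    """Build http://<host>:<port>/action=GetLogStream&Name=...&FromDisk=...&Tail=..."""
    params = {
        "action": "GetLogStream",
        "Name": name,
        # The manual documents the values true and false in lower case.
        "FromDisk": "true" if from_disk else "false",
        "Tail": tail,  # -1 returns all entries (the documented default)
    }
    return f"http://{host}:{port}/{urlencode(params)}"


# Hypothetical host, port and stream name.
print(get_log_stream_url("localhost", 40010, "application", tail=50))
# http://localhost:40010/action=GetLogStream&Name=application&FromDisk=false&Tail=50
```

The resulting URL can be opened in a browser or fetched with urllib.request.urlopen from a machine that is allowed to send status commands to the service port.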


GetLogStreamNames
The GetLogStreamNames command returns the names of the log streams that have been set up for
the service.
http://<host>:<port>/action=GetLogStreamNames
<host>
The IP address (or name) of the machine that hosts the service.
<port>
Enter the ServicePort that you have specified in the IDOL server configuration file's [Service] section.

GetStatistics
The GetStatistics command returns statistics for the service.
http://<host>:<port>/action=GetStatistics
<host>
The IP address (or name) of the machine that hosts the service.
<port>
Enter the ServicePort that you have specified in the IDOL server configuration file's [Service] section.


GetStatus
The GetStatus command returns the service's status (running or stopped).
http://<host>:<port>/action=GetStatus
<host>
The IP address (or name) of the machine that hosts the service.
<port>
Enter the ServicePort that you have specified in the IDOL server configuration file's [Service] section.

GetStatusInfo
The GetStatusInfo command returns status information for the service (for example, the service's
product name, version number and so on).
http://<host>:<port>/action=GetStatusInfo
<host>
The IP address (or name) of the machine that hosts the service.
<port>
Enter the ServicePort that you have specified in the IDOL server configuration file's [Service] section.


MergeConfig
The MergeConfig command allows you to merge IDOL server's configuration file with one or more
configuration file sections. Alternatively, you can use it to set or delete individual configuration
parameters.

Using MergeConfig to merge a configuration file with one or more configuration file sections
If IDOL server's configuration file already contains a section that has the same name as the section
with which it is going to be merged, any settings that only the new section contains are added to the
existing section. If the new section contains settings that are already present in the existing section,
the new section's settings overwrite the settings of the old section.
Note: This command requires a POST request method.
action=MergeConfig&Config=<configuration_file_content>
<configuration_file_content>
Enter the configuration file content that you want to merge with the content of IDOL server's
configuration file.
Note that you must escape the configuration file content.
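
As a minimal sketch of the escaping step, the Python below percent-encodes a configuration fragment and builds the POST body in the action=MergeConfig&Config=... form shown above. The function name and the sample fragment are hypothetical, and sending the body as form-encoded POST data to http://<host>:<port>/ is an assumption about how the service port accepts the request, not something this guide specifies.

```python
from urllib.parse import quote, urlencode


def merge_config_body(config_text):
    """Return the escaped POST body for a MergeConfig request."""
    # quote_via=quote percent-encodes spaces as %20 instead of '+'.
    body = urlencode(
        {"action": "MergeConfig", "Config": config_text},
        quote_via=quote,
    )
    return body.encode("ascii")


# Hypothetical fragment: one section with one setting.
fragment = "[Server]\nDeleteAfterAdd=true\n"
print(merge_config_body(fragment))
# b'action=MergeConfig&Config=%5BServer%5D%0ADeleteAfterAdd%3Dtrue%0A'

# Assumed way to send it (not executed here):
#   from urllib.request import Request, urlopen
#   urlopen(Request("http://<host>:<port>/", data=merge_config_body(fragment), method="POST"))
```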

Using MergeConfig to set individual configuration parameters


The MergeConfig command allows you to set one or more configuration parameters.
http://<host>:<port>/action=MergeConfig&Key<N>=<param>&Value<N>=<value>
<host>
The IP address (or name) of the machine that hosts the service.
<port>
Enter the ServicePort that you have specified in the IDOL server configuration file's [Service] section.
<N>
A unique number that identifies which <param> belongs to which <value>.
<param>
The configuration file section that contains the parameter you want to set, and the parameter whose
value you want to set. Note that you need to specify this using the format:
<config_file_section>/<parameter_name>


<value>
The value that you want to set for the corresponding <param>.

For example:
http://1.23.45.6:10000/action=MergeConfig&Key0=Server/DeleteAfterAdd&Value0=true&Key1=UserEmail/RunMailer&Value1=true
In this example, the MergeConfig command is used to set the value of the DeleteAfterAdd parameter
in the configuration file's [Server] section to true, and to set the value of the RunMailer parameter in
the configuration file's [UserEmail] section to true.
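
The numbering of Key<N> and Value<N> pairs can be generated rather than typed by hand. The Python sketch below is one possible helper (its name is hypothetical) that takes a mapping from <config_file_section>/<parameter_name> to the value to set and numbers the pairs from 0.

```python
from urllib.parse import quote


def merge_config_set_url(host, port, settings):
    """Build a MergeConfig URL that sets each "<section>/<parameter>" to its value."""
    pairs = []
    for n, (param, value) in enumerate(settings.items()):
        # The same number N ties each Key<N> to its Value<N>.
        pairs.append(f"Key{n}={quote(param, safe='/')}&Value{n}={quote(str(value), safe='')}")
    return f"http://{host}:{port}/action=MergeConfig&" + "&".join(pairs)


print(merge_config_set_url("1.23.45.6", 10000, {
    "Server/DeleteAfterAdd": "true",
    "UserEmail/RunMailer": "true",
}))
# http://1.23.45.6:10000/action=MergeConfig&Key0=Server/DeleteAfterAdd&Value0=true&Key1=UserEmail/RunMailer&Value1=true
```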

Using MergeConfig to delete individual configuration parameters


The MergeConfig command allows you to delete one or more configuration parameters.
http://<host>:<port>/action=MergeConfig&DeleteKey<N>=<param>
<host>
The IP address (or name) of the machine that hosts the service.
<port>
Enter the ServicePort that you have specified in the IDOL server configuration file's [Service] section.
<N>
A unique number for each <param> you want to delete.
<param>
The configuration file section that contains the parameter you want to delete, and the parameter you
want to delete. Note that you need to specify this using the format:
<config_file_section>/<parameter_name>

For example:
http://1.23.45.6:10000/action=MergeConfig&DeleteKey0=Server/DeleteAfterAdd&DeleteKey1=UserEmail/RunMailer
In this example, the MergeConfig command is used to delete the DeleteAfterAdd parameter from the
configuration file's [Server] section, and to delete the RunMailer parameter from the configuration
file's [UserEmail] section.
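
A delete request that follows the DeleteKey<N>=<param> syntax given above can be assembled in the same way. This Python sketch uses a hypothetical helper name and numbers the parameters from 0.

```python
from urllib.parse import quote


def merge_config_delete_url(host, port, params):
    """Build a MergeConfig URL that deletes each "<section>/<parameter>" listed."""
    keys = "&".join(
        f"DeleteKey{n}={quote(param, safe='/')}" for n, param in enumerate(params)
    )
    return f"http://{host}:{port}/action=MergeConfig&{keys}"


print(merge_config_delete_url("1.23.45.6", 10000,
                              ["Server/DeleteAfterAdd", "UserEmail/RunMailer"]))
# http://1.23.45.6:10000/action=MergeConfig&DeleteKey0=Server/DeleteAfterAdd&DeleteKey1=UserEmail/RunMailer
```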


SetConfig
The SetConfig command allows you to set IDOL server's configuration file.
Note: This command requires a POST request method.
action=SetConfig&Config=<configuration_file_content>
<configuration_file_content>
Enter the configuration file content with which you want to overwrite the current content of the IDOL
server configuration file.
Note that you must escape the configuration file content.

Stop
The Stop command stops the service.
http://<host>:<port>/action=Stop

<host>
The IP address (or name) of the machine that hosts the service.
<port>
Enter the ServicePort that you have specified in the IDOL server configuration file's [Service] section.


Appendix D: Manually creating IDX files


To manually create an IDX file, you need to create a text file, which contains the information that you
want to index into IDOL server in IDOL server fields. The IDOL server fields give the data a format that
can be indexed.
IDOL server fields:
#DREREFERENCE

Enter a unique reference string for the document. Usually this is a
file name, URL or a unique code number.

#DRETITLE

Enter the title of the document. You can enter multiple lines.

#DRECONTENT

Enter the content of the document. You can enter multiple lines.
(This parameter is optional. However, if you don't enter
#DRECONTENT, you should specify one or more #DREFIELD
<Name><N>= as otherwise the document will not have any
content).

#DREFIELD
<Name><N>

Specify the name of each DREFIELD that you are defining, and
enter an appropriate value for it. For example, if you want to index
customer details:
#DREFIELD surname1="Smith"
#DREFIELD forename1="Peter"
#DREFIELD title1="Mr."
#DREFIELD surname2="Miller"
#DREFIELD forename2="Susan"
#DREFIELD title2="Dr."

Note: if your document only contains one instance of the
DREFIELD that you are defining, you do not need to add a qualifier
to the name of the field. For example:
#DREFIELD company="Autonomy"
(This parameter is optional. However, if you don't enter #DREFIELD
<Name><N>=, you should specify #DRECONTENT as otherwise
the document will not have any content).
#DREDATE

Enter the creation date of the document using the format that you
have specified for DateFormat in IDOL server's configuration file.
By default this is yyyy/mm/dd.

#DREDBNAME

Enter the name of the database into which you want to index the
document.


#DRESTORECONTENT

This is only needed in DRE3.
Enter y if you want to store the document's content in IDOL server,
or n if you don't want to store the content.

#DRESECTION <N>

If you are indexing a large document and want to split it up into
smaller sections, you can give each section a DRESECTION
number in order to index the defined sections as individual
documents into IDOL server.
Note: If you split up a document into sections:
the first section must be #DRESECTION 0
the section numbers must be in numerical order
apart from the #DRESECTION number and the #DRECONTENT, each section must contain the same IDOL server field values.
(For further information, please see Sectioning a document on
page 433).

#DREENDDOC

Indicates the end of the document. You must enter this delimiter.

Note: The text file must start with #DREREFERENCE and end with #DREENDDOC.
Example text file
The following is an example of a text file that can be indexed into IDOL server:
#DREREFERENCE 392348A0
#DREFIELD authorname1="Brown"
#DREFIELD authorname2="Edgar"
#DREFIELD title="Dr."
#DREDATE 1998/08/06
#DRETITLE
Jurassic Molecules
#DRECONTENT
Scientists announced last week the successful reproduction of a
possible precursor to all life on Earth. The molecules consist of a
part of DNA and the molecular "scissors" responsible for destroying
messenger RNA in humans.
Using a technique called test tube evolution, scientists created a
nucleic acid enzyme, the first known enzyme that uses an amino acid to
start chemical activity. Scientists hope that the creation of this
molecule will lead to the elusive precursor. The precursor, by
definition, will have to contain both the genetic code for replication
and an enzyme to trigger self replication.
#DRETYPE text
#DREDBNAME Science
#DRESTORECONTENT y
#DREENDDOC
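
If you create many IDX files, you may prefer to generate them from a script. The Python sketch below writes one document using only the fields described in this appendix, in the same order as the example above. The function name is hypothetical, and the sketch assumes that field values do not themselves contain double quote marks.

```python
def to_idx(reference, title, content, fields=None, date=None, dbname=None):
    """Return the text of one IDX document, from #DREREFERENCE to #DREENDDOC."""
    lines = [f"#DREREFERENCE {reference}"]
    for name, value in (fields or {}).items():
        # Assumption: values contain no double quote marks.
        lines.append(f'#DREFIELD {name}="{value}"')
    if date:
        lines.append(f"#DREDATE {date}")  # must match the configured DateFormat
    lines += ["#DRETITLE", title, "#DRECONTENT", content]
    if dbname:
        lines.append(f"#DREDBNAME {dbname}")
    lines.append("#DREENDDOC")
    return "\n".join(lines) + "\n"


print(to_idx(
    "392348A0",
    "Jurassic Molecules",
    "Scientists announced last week the successful reproduction of a\n"
    "possible precursor to all life on Earth.",
    fields={"authorname1": "Brown", "authorname2": "Edgar", "title": "Dr."},
    date="1998/08/06",
    dbname="Science",
))
```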

Sectioning a document
If a document that you want to index contains more than 500 words, you should split it up into sections
in order to make it more manageable for IDOL server. If you want to index XML rather than IDX, you
don't need to section your data as IDOL server automatically applies sectioning to it.
Declare a separate document for each of the sections into which you are splitting the original
document, and give each section a #DRESECTION number. Note that if you split up a document into
sections:

the first section must be #DRESECTION 0

the section numbers must be in numerical order

you must put the content of each section into the #DRECONTENT field

no #DRECONTENT field should contain more than 500 words.

apart from the #DRESECTION number and the #DRECONTENT each section must contain
the same DRE field values.

Example text file


The following is an example of a text file, in which a document has been split up into sections:
#DREREFERENCE 392348A0
#DREFIELD authorname1="Brown"
#DREFIELD authorname2="Edgar"
#DREFIELD title="Dr."
#DREDATE 1998/08/06
#DRETITLE
Jurassic Molecules
#DRESECTION 0
#DRECONTENT
Scientists announced last week the successful reproduction of a
possible precursor to all life on Earth. The molecules consist of a
part of DNA and the molecular "scissors" responsible for destroying
messenger RNA in humans.
Using a technique called test tube evolution, scientists created a
nucleic acid enzyme, the first known enzyme that uses an amino acid to
start chemical activity. Scientists hope that the creation of this
molecule will lead to the elusive precursor. The precursor, by
definition, will have to contain both the genetic code for replication
and an enzyme to trigger self replication.
At this point, no naturally occurring hybrid enzymes have been found.
Scientists speculate that such enzymes may exist in nature and most
certainly existed in Earth's early history.
#DRETYPE text
#DREDBNAME Science
#DRESTORECONTENT y
#DREENDDOC
#DREREFERENCE 392348A0
#DREFIELD authorname1="Brown"
#DREFIELD authorname2="Edgar"
#DREFIELD title="Dr."
#DREDATE 1998/08/06
#DRETITLE
Jurassic Molecules
#DRESECTION 1
#DRECONTENT
Scientists have known for some time that the key ingredients for life
are DNA, RNA, and proteins. An interesting chicken-egg dilemma has
developed: which came first, RNA, DNA, or proteins? Many believe that
a replicating RNA molecule is the likely precursor to all life on
Earth.
RNA serves as both a genetic molecule and an enzyme in the body, which
scientists believe strongly suggests the likelihood of an RNA
precursor to all life. They speculate that RNA was first, followed by
DNA, the much more stable of the two. It would serve as an efficient
storehouse for the genetic code. Proteins, better catalysts than RNA,
likely evolved later as well. At some point, the current three-based
system developed from the initial one-based system of RNA.
Scientists hope that these scissors molecules may also have practical
uses in medicine, since the molecules can efficiently shred specific
DNA. Theoretically, it may be possible to tailor such a molecule to
attack and shred harmful DNA from pathogenic organisms. These
molecules could be made to be activated only in specific
circumstances.
#DRETYPE text
#DREDBNAME Science
#DRESTORECONTENT y
#DREENDDOC
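
The sectioning rules can also be applied by a script. The following Python sketch splits content into chunks of at most 500 words and emits one IDX document per chunk, numbered from #DRESECTION 0, with identical field values in every section. The function names are hypothetical, and splitting purely on word count is an assumption made for brevity: it discards the original line breaks, whereas a hand-made file such as the example above would normally break at paragraph boundaries.

```python
def split_words(text, limit=500):
    """Split text into chunks that each contain at most `limit` words."""
    words = text.split()
    return [" ".join(words[i:i + limit]) for i in range(0, len(words), limit)]


def sectioned_idx(reference, title, content, dbname, limit=500):
    """Return IDX text with one document per section of the content."""
    docs = []
    for n, chunk in enumerate(split_words(content, limit)):
        docs.append("\n".join([
            f"#DREREFERENCE {reference}",  # same reference in every section
            "#DRETITLE",
            title,
            f"#DRESECTION {n}",  # numbered from 0 in numerical order
            "#DRECONTENT",
            chunk,
            f"#DREDBNAME {dbname}",
            "#DREENDDOC",
        ]))
    return "\n".join(docs) + "\n"


long_text = " ".join(f"word{i}" for i in range(1200))
idx = sectioned_idx("392348A0", "Jurassic Molecules", long_text, "Science")
print(idx.count("#DRESECTION"))  # 3
```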


Glossary
ACI (Autonomy Content Infrastructure)
The Autonomy Content Infrastructure is a technology layer that automates operations on unstructured
information for cross enterprise applications, thus enabling an automated and compatible business-to-business, peer-to-peer infrastructure.
The ACI allows enterprise applications to understand and process content that exists in unstructured
formats, such as e-mail, Web pages, office documents, and Lotus Notes.

Agent index
IDOL server stores agents and profiles in its Agent index. By default the Agent index comprises the
Agent and Profile databases. The Agent index is configured automatically and should not be modified.

Agentboolean fields
IDOL server can store Boolean agents (a Boolean or Proximity expression that legacy technologies
use to categorize documents) in agentboolean fields. You can then query IDOL server with text and an
agentboolean field to return categories whose Boolean agent matches this text.

Agents
An agent searches for information about a specific topic. An administrator can create agents for users
or allow users to create their own agents.

APCM (Adaptive Probabilistic Concept Modeling)


Terms are given a weight according to their statistical importance in IDOL server. Terms can have a
weight between 0 and 255.

Category index
IDOL server stores categories in its Category index. By default the Category index comprises the
Activated and Deactivated databases. The Category index is configured automatically and should not
be modified.


Clusters
Cluster information is hierarchically agglomerated data that has been extracted from snapshots (this
does not require the setup of an initial taxonomy). Each cluster represents a concept area that contains
a set of items, which share common properties. Clustering data allows you to make trends and
developments in data visible.

Community
The community comprises all people in a user's network neighborhood. It allows a user to find other
people in the community who have been looking at similar documents or have agents that are similar
to the user's agents.

Concept summary
A brief summary of each result document that is returned for a query. The concept summary displays a
few sentences that are typical of the result's content (these sentences can be from different parts of the
result document).

Connector
A connector is an Autonomy fetching solution (for example HTTPFetch, Oracle Fetch, File System
Fetch and so on) that allows you to retrieve information from any type of local or remote repository (for
example, a database or a web site). It imports the fetched documents into IDX or XML file format and
indexes them into IDOL server from where you can retrieve them (for example by sending queries to
IDOL server).

Context summary
Returns a conceptual summary of the result document that is biased by the terms in the query. A
context summary comprises sentences that are particularly relevant to the terms in the query (these
sentences can be from different parts of the result document).

Data index
IDOL server stores content data in its Data index. By default the Data index comprises the News and
Archive databases. You can customize how data is stored in the Data index by configuring appropriate
settings in the IDOL server configuration file.


Database
An Autonomy database is an IDOL server data pool that stores indexed information. The administrator
can set up one or more databases and specify how data is fed to the databases. By default IDOL
server contains the databases Profile, Agent, Activated, Deactivated, News and Archive.

Default user
By default IDOL server gives users the default user role. That means that the user only has the
privileges that have been allocated to this role.

IAS (Intellectual Asset Protection System)


The Intellectual Asset Protection System provides an integrated security solution to protect your data.
At the front end, authentication checks that users are allowed to access the system on which result
data is displayed. At the back end, entitlement checking and authentication combine to ensure that
query results only comprise documents that the user is allowed to see, from repositories that the user
is allowed to access.

IDOL server
Using Autonomy connectors, Autonomy's Intelligent Data Operating Layer (IDOL) server integrates
unstructured, semi-structured and structured information from multiple repositories through an
understanding of the content, delivering a real time environment in which operations across
applications and content are automated, removing all the manual processes involved in getting the
right information to the right people at the right time.

IDX
Apart from XML files, only files that are in IDX format can be indexed into IDOL server. You can use a
connector to import files into this format or manually create IDX files (see Storing content in IDOL
server on page 83).

Indexing
The process of storing data in IDOL server. Data can be stored in different field types (index, numeric
and ordinary fields) or prevented from being stored. It is important to store data in appropriate field
types in order to ensure optimized performance. IDOL server can return any fields it stores for queries;
however, you can only query for terms in Index fields.


Index fields
You should store fields that contain text which you want to query frequently as Index fields. Index fields
are processed linguistically when they are stored in IDOL server. This means that stemming and
stoplists are applied to text in Index fields before it is stored, which allows IDOL server to process
queries for these fields more quickly (typically DRETITLE and DRECONTENT are fields that should be
set up as Index fields).

Link term
Link terms (also referred to as "Links") are terms in query text that are also contained in the result
documents that IDOL server returns for this query.

Privilege
The privileges of a user depend on the roles that have been allocated to him within IDOL server.
Privileges determine, for example, whether a user is allowed to access specific data.

Profiles
A user's profile is based on the concepts of the documents that the user reads. Every time a user
opens a document, the user's profile is updated. This allows the administrator to bring new documents
that match the interests recorded in the profile to the user's attention.

Query
You can submit a natural language query to IDOL server which analyzes the concept of the query and
returns documents that are conceptually similar to the query. You can also submit other query and
search types to IDOL server, for example, Boolean, bracketed Boolean and keyword searches.

Quick summary
A brief summary of each result document that is returned for a query. The quick summary displays the
first few sentences of the result document.

Reference fields
Reference fields are used to identify documents. At index time Reference fields can be used to
eliminate duplicate copies of documents. At query time Reference fields can be used to filter results.


Retrain
You can retrain agents by indicating which of the results that have been returned to you are most
relevant to your query. The retrained agent will then return more relevant results.

Role
Each user is allocated one or more roles within IDOL server by the administrator. The roles that a user
has determine which privileges he has.

Section breaking
IDOL server indexes documents in sections (the number of sections that a document is split up into
increases proportionally with the size of the document). This ensures that when you, for example,
query for text that is relevant to a specific part of a book, IDOL server can find the appropriate section
and return it to you (if the book was not indexed in sections, IDOL server might not be able to find the
text you are looking for, as it may not be conceptually relevant to the entire book).

Snapshot
Internal raw data from which you can extract clusters. You can thus generate cluster information and
spectrographs.

Stemming
In languages some words have a common morphological root. Autonomy provides stemming
algorithms that reduce words to this form. This is useful because it allows concepts to be matched
regardless of the grammatical use of words. In English for example, the words "help", "helpful",
"helping" and "helped" can all be stripped down to their stem "help" without significant loss of meaning.
Autonomy provides as standard a set of stemming algorithms for the most commonly used languages.
Stemming is applied after stopwords have been discarded both at index time (when content is stored in
IDOL server) and at query time (query text is stopped and stemmed before it is matched).

Stoplist
Each language that is supported needs a stoplist (located in IDOL server's langfiles directory) which
contains a list of common words that are not stored in IDOL server. Words such as "the" or "a"
are used too frequently to carry any significance and IDOL server does not require them to understand
the concept of text.


Stopword
A common word that is used too frequently to carry any significance (for example, "the" or "a").
Stoplists list the stopwords for the languages that IDOL server supports. Stopwords are not stored in
IDOL server.

Synonym file
A synonym file allows IDOL server to handle synonym queries (IDOL server also needs to be
configured to enable synonym queries). A synonym query returns results which are conceptually
similar to the query's terms and / or conceptually similar to the synonyms that are available for the
query's terms. A synonym file contains comma separated lists of synonym strings for words. You can
specify lists for each language type you have set up in IDOL server within this file.

Taxonomy
IDOL server's taxonomy generation feature allows you to automatically create hierarchical contextual
taxonomies of clusters or other information. This provides you with an overview of the 'information'
landscape and an insight into specific areas of the information.

Term
The basic entity that is indexed into IDOL server (for example, a word in a document after stemming
has been applied to it).


Index
A
ACI 435
API 14
Task 73
ACLType (configuration setting) 115, 279
Action commands 189
AdminRevokeLicense 43
AgentAdd 121
AgentCopy 123
AgentDelete 123
AgentEdit 122
AgentGetResults 122, 181
AgentRead 123
AgentRetrain 122
CategoryActivate 143
CategoryBuild 144
CategoryCopy 132
CategoryCreate 130
CategoryDelete 144
CategoryDeleteTraining 144
CategoryExportToXML 145
CategoryGetDetails 141
CategoryGetHierDetails 141
CategoryGetTNW 142
CategoryGetTraining 142
CategoryImportFromCluster 131
CategoryImportFromTopic 131
CategoryImportFromXML 132
CategoryMove 140
CategoryQuery 148, 181
CategoryReplace 143
CategorySetDetails 142
CategorySetTNW 143
CategorySetTraining 140
CategorySuggestFromCategory 147
CategorySuggestFromDocument 147
CategorySuggestFromText 147
CategorySyncCatDRE 145
CategorySyncCatDre 56
ClusterCluster 29, 36, 154, 155, 156, 160,
407
ClusterResults 155
ClusterServe2DMap 29, 36, 155

ClusterSGDataGen 29, 36, 153, 156, 160,
407
ClusterSGDataServe 153
ClusterSGDocsServe 153
ClusterSGPicServe 29, 36, 153
ClusterSnapshot 29, 36, 152, 153, 154, 156,
160, 407
ClusterWriteToDisk 29, 36
Community 163, 171
Custom 179, 180, 181
DetectLanguage 190
Export 382
GetContent 189, 276
GetLicenseInfo 118
GetQueryTagValues 189, 228, 230
GetRequestLog 117
GetStatus 190
GetTagNames 189
GetTagValues 189, 228, 229, 230
Help 61, 389
Highlight 189
Import 383
Index 56
IndexerGetStatus 55, 106, 190
LicenseInfo 41
List 50, 52, 190
Online help 61
ProfileClear 187
ProfileEdit 187
ProfileGetResults 187
ProfileRead 187
ProfileUser 185, 186
Query 165, 189, 191, 193, 194, 196, 197,
198, 199, 227, 232, 235, 237, 238, 240,
246, 247, 258, 272, 276, 297
RoleAdd 110
RoleAddRoleToRole 110
RoleAddUserToRole 110
Suggest 165, 174, 189, 191, 197, 199, 247,
258, 272, 276, 297
SuggestOnText 165, 189, 191, 197, 199,
247, 258, 276, 297
Summarize 189, 259
Syntax 62

TaxonomyGenerate 132, 160, 261, 262, 407
TermGetAll 190
TermGetBest 189
TermGetInfo 189
UserAdd 110
Activating or deactivating categories 143
Adaptive Probabilistic Concept Modeling 435
Adding language type fields to documents 318
AdminClients (configuration setting) 190
Administering
Categories 141
IDOL server 357
Administration 4
AdminRevokeLicense action 43
Advanced keyword search 193, 232
AdvancedSearch (configuration setting) 193, 232
Afrikaans encoding settings 325
AFTER operator 195, 236
Agent index 435
[Agent] configuration file section 403
AgentAdd action 121
Agentboolean
Categories 300
Fields 299
Agentboolean fields 435
Storing Boolean agents 299
AgentCopy action 123
AgentDelete action 123
AgentEdit action 122
AgentGetResults action 122, 181
AgentRead action 123
AgentRetrain action 122
Agents 8, 109, 121, 435
AgentAdd action 121
AgentCopy action 123
AgentDelete action 123
AgentEdit action 122
AgentGetResults action 122, 181
AgentRead action 123
AgentRetrain action 122
Copying 123
Creating an agent 121
Deleting 123
Editing 122
Emailing agent results to users 176
Exporting 382


Importing 383
Querying with an agent 122
Retraining 122
Training 121
Viewing an agent's details 123
Albanian encoding settings 325
Alert task 73, 125
Alerting 8, 109, 125
Email templates 127
Users to new content 125
Users to new documents 73
alertTemplate.html 127
[AnalysisSchedules>] configuration file section
407
AND operator 194, 195, 236
APCM 435
Application server 39
Arabic encoding settings 326
ASCII 390
AttachmentTemplate (configuration setting) 127
Attributes
XML 69
AutoDetectLanguagesAtIndex (configuration
setting) 71, 320
Automater iii
Automatic Language Detection
Enabling 320
Automatic language detection 307
Autonomy
Content Infrastructure 435
Data flow and security 5
Infrastructure 1
Autonomy Service Dashboard 118, 410
Azeri encoding settings 326
B
Backing up IDOL server's Data index 378
Backup (configuration setting) 379
BackupCompression (configuration setting) 379
BackupDir<N> (configuration setting) 379
BackupInterval (configuration setting) 379
BackupMaintainDirStructure (configuration
setting) 379
BackupTime (configuration setting) 379
Basque encoding settings 326

Before indexing
Data processing 73
BEFORE operator 195, 236
Before storing content in IDOL server 63
Belarussian encoding settings 327
BIAS field specifier 266, 269
BIF files 63, 81
BindLevel (configuration setting) 158, 159
Boolean
Agents 299
Operators 194
AND 194, 195, 236
EOR 195, 236
NOT 194, 195, 236
OR 194, 195, 236
Precedence of Boolean and Proximity
operators 195
XOR 195, 236
Search 194
Boosting result relevance 266, 269, 271
Bracketed expressions 195
Breton encoding settings 327
Building categories 144
Bulgarian encoding settings 327
C
Canonicalization 308
CantHaveCSVs (configuration setting) 67
CantHaveFields (configuration setting) 67
Cat task 73
Catalan encoding settings 328
Categories
Activating 143
Administering 141
Building 144
Changing Fields 142
Changing term weights 143
Deactivating 143
Deleting 144
Deleting training 144
Exporting to XML 145
Matching 148
Moving 140
Replacing 143
Retraining 140
Suggesting 147

Synchronizing 145
Training 140
Viewing 141
Viewing terms and weights 142
Viewing training 142
Categorization 8, 129
CategoryActivate 143
CategoryBuild 144
CategoryCopy action 132
CategoryCreate action 130
CategoryDelete 144
CategoryDeleteTraining 144
CategoryExportToXML 145
CategoryGetDetails 141
CategoryGetHierDetails 141
CategoryGetTNW 142
CategoryGetTraining 142
CategoryImportFromCluster action 131
CategoryImportFromTopic action 131
CategoryImportFromXML 132
CategoryMove 140
CategoryQuery 148
CategoryReplace 143
CategorySetDetails 142
CategorySetTNW 143
CategorySetTraining 140
CategorySuggestFromCategory 147
CategorySuggestFromDocument 147
CategorySuggestFromText 147
CategorySyncCatDRE 145
Creating a hierarchical category structure
130
Categorizing
Data 146
Documents 73
Legacy profiles from BIF files 73
Category index 435
CategoryActivate action 143
CategoryBuild action 144
CategoryCopy action 132
CategoryCreate action 130
CategoryDelete action 144
CategoryDeleteTraining action 144
CategoryExportToXML action 145
CategoryGetDetails action 141
CategoryGetHierDetails action 141
CategoryGetTNW action 142
CategoryGetTraining action 142
CategoryImportFromCluster action 131
CategoryImportFromTopic action 131
CategoryImportFromXML action 132
CategoryMove action 140
CategoryQuery action 148, 181
CategoryReplace action 143
CategorySetDetails action 142
CategorySetTNW action 143
CategorySetTraining action 140
CategorySuggestFromCategory action 147
CategorySuggestFromDocument action 147
CategorySuggestFromText action 147
CategorySyncCatDRE action 56, 145
Changing
Category fields 142
Category term weights 143
Field values in documents 375
Channels 9, 149
Emailing channel results to users 176
Setting up and using 149
channels.xss template 180, 181
CharConv (configuration setting) 313, 319
Checking
That IDOL server is running correctly 117
The indexing process 106
Chinese encoding settings 328
ClassificationServerHost (configuration setting) 177
ClassificationServerNumResults (configuration setting) 177
ClassificationServerParams (configuration setting) 177
ClassificationServerPort (configuration setting) 177
ClassificationServerRetries (configuration setting) 177
ClassificationServerThreshold (configuration setting) 177
ClassificationServerTimeout (configuration setting) 177
ClassificationServerValues (configuration setting) 177
ClassificationServerXSLTemplate (configuration setting) 177
[Cluster] configuration file section 407
ClusterCluster action 29, 36, 154, 155, 156, 160,
407
Clustering 9, 151
A large amount of data 158
A small amount of data 157
Changing the data view 159
Changing the number and size of clusters
156
ClusterCluster action 29, 36, 154, 155, 156,
160
ClusterResults action 155
ClusterServe2DMap action 29, 36, 155
ClusterSGDataGen action 29, 36, 153, 156,
160
ClusterSGDataServe action 153
ClusterSGDocsServe action 153
ClusterSGPicServe action 29, 36, 153
ClusterSnapshot action 29, 36, 152, 153,
154, 156, 160
ClusterWriteToDisk action 29, 36
Configuring clustering 156
Generating snapshots 152
Generating WhatsNew and WhatsHot
information 154
Setting up schedules 160
Spectrograph data generation 153
Very different data 159
Very similar data 158
ClusterResults action 155
Clusters 436
ClusterServe2DMap action 29, 36, 155
ClusterSGDataGen action 29, 36, 153, 156, 160,
407
ClusterSGDataServe action 153
ClusterSGDocsServe action 153
ClusterSGPicServe action 29, 36, 153
ClusterSnapshot action 29, 36, 152, 153, 154,
156, 160, 407
ClusterWriteToDisk action 29, 36
Collaboration 9, 109, 163
Community action 163
Combine action 295
Combining different query types 241
Commands
Action 189
Index 84
Service 423
Community 436
Community action 163, 171
[Community] configuration file section 404
Compact (configuration setting) 377
Compacting IDOL server's Data index 376
CompactInterval (configuration setting) 377
CompactTime (configuration setting) 377
Concept summary 257, 436
Configuration
Executing changes 358
Configuration file
[Agent] section 403
[AnalysisSchedules] section 407
ASCII versus UTF8 390
[Cluster] section 407
[Community] section 404
[Databases] section 394
[DataDRE] section 406
[DocumentTracking] section 410
[DRE] section 406
[FieldProcessing] section 395
[IndexCache] section 393
[IndexTasks] section 409
[LanguageTypes] section 413
[License] section 392
[Logging] section 411
Modifying parameter values 390
[Paths] section 394
[Profile] section 404
[ProfileNamedAreas] section 404
[Properties] section 397
[Role] section 403
[Schedule] section 395
[SectionBreaking] section 394
Sections 391
[Security] section 399
[Server] section 393
[Service] section 392
[Summary] section 395
[Synonym] section 410
[Taxonomy] section 407
[Templates] section 410
[TermCache] section 393
[User] section 400
[UserCustom] section 405
[UserSecurity] section 401
[UserSecurityFields] section 400
[UserStructure] section 406
Configuration settings
ACLType 115, 279
AdminClients 190
AdvancedSearch 193, 232
AttachmentTemplate 127
AutoDetectLanguagesAtIndex 71, 320
Backup 379
BackupCompression 379
BackupDir<N> 379
BackupInterval 379
BackupMaintainDirStructure 379
BackupTime 379
BindLevel 158, 159
CantHaveCSVs 67
CantHaveFields 67
CharConv 313, 319
ClassificationServerHost 177
ClassificationServerNumResults 177
ClassificationServerParams 177
ClassificationServerPort 177
ClassificationServerRetries 177
ClassificationServerThreshold 177
ClassificationServerTimeout 177
ClassificationServerValues 177
ClassificationServerXSLTemplate 177
Compact 377
CompactInterval 377
CompactTime 377
ContextSummaryQueryTermWeight 258
Cycles 176
DatabaseType 66, 279
DateFormatCSVs 373
DateType 279
DefaultAddSetToReadDocuments 177
DefaultEmailFormat 176
DefaultEmailResultsType 176
DefaultExcludeReadDocuments 177
DefaultLanguageType 311, 321, 322, 323,
324
DefaultSendEmail 176
DefaultSubject 176
DeferLogin 111
DelayedSync 72
DiscardUnconfiguredLanguagesAtIndex
320
DiscardUnknownLanguagesAtIndex 320
DocumentTrackingActive 108
DocumentTrackingType 279
DreTemplateReferenceEnd 177
DreTemplateReferenceStart 177
EmailActionXSLTemplate 179
Encoding 325
Expire 368
ExpireDateType 279, 367
ExpireInterval 368
ExpireIntoDatabase 366, 368
ExpireTime 368
FieldCheckType 279, 292
FixedField<N> 318
FixedFieldValue<N> 318
FlattenIndexType 279
From 176
FromHost 176
FromName 176
HiddenType 279
HighlightingType 298
HighlightType 279
HyphenChars 104
IDOLserver 125
Index 70, 267, 279, 286
IndexPort 26, 33, 85, 359, 360, 361, 362,
364, 365, 367, 369, 371, 373, 376, 378,
379, 381
Interval 176
InvertedAgentType 279
KillDuplicates 105, 295
Language 325
LanguageDirectory 311, 312, 319
LanguageType 279, 314
Library 176, 179
LogTypeCSVs 416
MaxEmailsPerUser 177
MaxSyncDelay 72
MinClusterDocs 157, 159
MinWordsPerSentence 258
Module 74
Name 363
NextTask 74
NodeTableStoreContent 64
Number 366
NumberOfBackups 380
NumClusters 159
NumDBs 363
NumericDateType 279, 288
NumericType 279, 290
OnFailureTask 74
Online help 389
ParametricType 229, 280
Port 26, 33, 62, 106
PrintType 276, 280
ProperNames 231
Property 314, 316
PropertyFieldCSVs 229, 267, 276, 288, 290,
292, 295, 298, 367, 390
PropertyMatch 65, 114, 281, 285
ProxyHost 176, 179
ProxyPassword 176, 179
ProxyPort 176, 179
ProxyUsername 176, 179
QueryClients 189
ReferenceType 280
Retries 176
RunMailer 176
SectionBreakType 280
SecurityType 280
SeedBindLevel 157, 158, 159
SeedSize 157
SendToList 128
SentenceBreaking 351
ServicePort 26, 33, 43
SleepBetweenRequests 177
SMTPHost 176, 179
SMTPPort 176, 179
Soundex 237
SourceFields 258
SourceType 280
SpellCheckCorrectMinDocOccs 255
SpellCheckIncorrectMaxDocOccs 255
SpellCheckMaxCheckTerms 255
StartingSuggestOverrideFactor 157, 158
StartTask 74
StartTime 176, 178
StripLanguage 313, 319
SynonymType 240, 280
Template 127
TermSize 349
TestUser 176, 178
TimeoutMS 176
TitleType 280
Transliteration 350
TrimSpaces 280
VerboseLogging 178
Weight 267, 280
XSLTemplate 176
Configuring
Clustering 156
IDOL server 389
Connector 3, 436
Content
Indexing 83
Storing 83
Context summary 257, 436
ContextSummaryQueryTermWeight
(configuration setting) 258
Converting results to a specific encoding 322
Copying agents 123
Creating
A hierarchical category structure 130
A new database 363
Agents 121
Categories
By copying categories 132
By generating a taxonomy 132
From clusters 131
From legacy topic sets 131
From scratch 130
From XML 132
Databases 362
Users 110
Croatian encoding settings 329
Cross-lingual systems 307
Custom action 179, 180, 181
Custom emails 179
Cycles (configuration setting) 176
Czech encoding settings 329
D
Danish encoding settings 329
Data
Before indexing 63
Categorizing 146
Distributing across multiple disks 64
Indexing 83
Data index 436
Databases 437
Allocating files 65
Changing a document's database 373
Creating 362, 363
Deleting 364
Deleting all documents 365
Expiring documents 366
Exporting IDX documents 369
Exporting XML documents 371
[Databases] configuration file section 394
DatabaseType (configuration setting) 66, 279
[DataDRE] configuration file section 406
DateFormatCSVs (configuration setting) 373
Dates
Storing dates in fields 287
DateType (configuration setting) 279
Deduplication 105
Default
User 437
DefaultAddSetToReadDocuments (configuration
setting) 177
DefaultEmailFormat (configuration setting) 176
DefaultEmailResultsType (configuration setting)
176
DefaultExcludeReadDocuments (configuration
setting) 177
DefaultLanguageType (configuration setting)
311, 321, 322, 323, 324
DefaultSendEmail (configuration setting) 176
DefaultSubject (configuration setting) 176
DeferLogin (configuration setting) 111
Delayed synchronization 72
DelayedSync (configuration setting) 72
Deleting
Agents 123
Categories 144
Category training 144
Documents from IDOL server 359, 360, 365
IDOL server databases 364
Profiles 187
Deploying Retina to your application server 39
DetectLanguage action 190
DiscardUnconfiguredLanguagesAtIndex
(configuration setting) 320
DiscardUnknownLanguagesAtIndex (configuration setting) 320
DiSH 40, 118
Displaying
Additional fields for individual queries 276
Additional fields with results 275
IDOL server license information 41
Online help 61
Distributed systems 3
Distributing IDOL server 46
Example 47
DNEAR<N> operator 195, 235, 236
Documents
Changing field values 375
Changing the index date, expire date or
database of documents 373
Deleting 359, 360, 364, 365
Expiring 366
Sectioning 433
Tracking documents through import and
indexing 108
Undeleting 361
[DocumentTracking] configuration file section
410
DocumentTrackingActive (configuration setting)
108
DocumentTrackingType (configuration setting)
279
[DRE] configuration file section 406
DREADD (index command) 84
DREADDDATA (index command) 94
DREBACKUP (index command) 378
DRECHANGEMETA (index command) 373
DRECOMPACT (index command) 376, 378, 380
DRECREATEDBASE (index command) 362
DREDELDBASE (index command) 365
DREDELETEDOC (index command) 360
DREDELETEREF (index command) 359
DREEXPIRE (index command) 366
DREEXPORTIDX (index command) 369, 371
DREINITIAL (index command) 381
DREREMOVEDBASE (index command) 364
DREREPLACE (index command) 375
DreTemplateReferenceEnd (configuration
setting) 177
DreTemplateReferenceStart (configuration
setting) 177
DREUNDELETEDOC (index command) 361
Dutch encoding settings 330
Dynamic Thesaurus 9, 165
E
Editing
Agents 122
Mailing operation templates 181
Profiles 187
Templates for alert emails 127
Educe task 73
Eduction 9
Eliminating duplicate documents during indexing
105
Email
Agent results 175, 176
Channel results 175, 176
Custom 179
Sending
Alert email 125
email.xss template 180, 181
EmailActionXSLTemplate (configuration setting)
179
Enabling Automatic Language Detection 320
Encoding (configuration setting) 325
Encodings 308, 325
Afrikaans 325
Albanian 325
Arabic 326
Azeri 326
Basque 326
Belarussian 327
Breton 327
Bulgarian 327
Catalan 328
Chinese 328
Croatian 329
Czech 329
Danish 329
Dutch 330
English 330
Estonian 330
Faroese 331
Finnish 331
French 331
Gaelic 332
Galician 332
German 332
Greek 333
Greenlandic 333
Hebrew 333
Hindi 334
Hungarian 334
Icelandic 334
Indonesian 335
Italian 335
Japanese 335
Kazakh 336
Korean 336
Kurdish 336
Kyrgyz 337
Lappish 337
Latin 337
Latvian 338
Lithuanian 338
Luxembourgish 338
Macedonian 339
Malay 339
Maltese 339
Maori 340
Mongolian 340
Norwegian 340
Persian 341
Polish 341
Portuguese 341
Romanian 342
Russian 342
Serbian 342
Slovak 343
Slovenian 343
Somali 343
Sorbian 344
Spanish 344
Swahili 344
Swedish 345
Tagalog 345
Tatar 345
Thai 346
Turkish 346
Ukrainian 346
Urdu 347
Uzbek 347
Valencian 347
Vietnamese 348
Welsh 348
English encoding settings 330
EOR operator 195, 236
Error
Codes 415
Messages 416
VQL conversion 416
Estonian encoding settings 330
Evaluating the quality of OCR files 73
Exact Phrase search 196
Executing an action command 73
Expertise 9, 109, 171
Community action 171
Expire (configuration setting) 368
ExpireDateType (configuration setting) 279, 367
ExpireInterval (configuration setting) 368
ExpireIntoDatabase (configuration setting) 366,
368
ExpireTime (configuration setting) 368
Expiring documents 366
Export action 382
Exporting
Categories to XML 145
Users, roles, agents and profiles from IDOL
server 382
Exporting IDX documents from IDOL server 369
Exporting XML documents from IDOL server 371
Extracting information from unstructured data 73
F
Faroese encoding settings 331
Field process
Setting up a field process to boost result
relevance 266
Field search 198
Field specifiers
BIAS 266, 269
MATCH 67, 285
STRING 249
TERMEXACTPHRASE 197
TERMPHRASE 197
WILD 247
Field Text query 199
FieldCheckType (configuration setting) 279, 292
FieldCheckType fields 291
FieldOp task 73
[FieldProcessing] configuration file section 395
Fields 279
Adding metadata to documents after
indexing 103
Agentboolean 435
Agentboolean fields 299
Associating properties with fields 281
Changing field values in documents 375
FieldCheckType fields 291
Highlight fields 297
Index fields 285, 438
Language type 318
Numerical fields 289
NumericDateType fields 287
Processing fields and documents that
contain specific fields 281
Properties 279
Reference fields 105, 272, 295, 438
Setting up
Highlight fields 297
Indexing 67
Speeding up numerical queries 289, 291
Files
Importing 83
FileWriter task 73
Finnish encoding settings 331
FixedField<N> (configuration setting) 318
FixedFieldValue<N> (configuration setting) 318
FlattenIndexType (configuration setting) 279
French encoding settings 331
From (configuration setting) 176
FromHost (configuration setting) 176
FromName (configuration setting) 176
Functionality matrix 18
Fuzzy queries 227
G
Gaelic encoding settings 332
Galician encoding settings 332
Generating
Snapshots 152
Taxonomies 261
WhatsNew and WhatsHot information 154
German encoding settings 332
GetConfig (service port command) 423, 424
GetContent action 189, 276
GetLicenseInfo action 118
GetLogStream (service port command) 423, 424
GetLogStreamNames (service port command)
423, 425
GetQueryTagValues action 189, 228, 230
GetRequestLog action 117
GetStatistics (service port command) 423, 425
GetStatus (service port command) 118, 423, 426
GetStatus action 190
GetStatusInfo (service port command) 423, 426
GetTagNames action 189
GetTagValues action 189, 228, 229, 230
Greek encoding settings 333
Greenlandic encoding settings 333
H
Hebrew encoding settings 333
Help action 61, 389
Helpdesk 57
HiddenType (configuration setting) 279
Highlight action 189
Highlighting 297
Query action 297
Setting up 297
Suggest action 297
SuggestOnText action 297
HighlightingType (configuration setting) 298
HighlightType (configuration setting) 279
Hindi encoding settings 334
HTTP calls 73
HTTP task 73
Hungarian encoding settings 334
Hyperlinking 10, 173
Implementation 174
Hyphenated terms
Indexing 104
HyphenChars (configuration setting) 104
I
IAS 437
Icelandic encoding settings 334
IDOL server 3, 437
Administration 357
Backing up the Data index 378
Before storing content 63
Changing the index date, expire date or
database of documents 373
Checking that IDOL server is running
correctly 117
Clustering 151
Compacting the Data index 376
Configuration 389
Configuration file 391
Creating a new database 362
Data flow and security 5
Database 437
Deleting
A database and all the documents it
contains 364
All documents from a database 365
Documents by reference 359
Individual documents and ranges of
documents 360
Directory structure 28, 35
Distributing 46
Executing configuration changes 358
Expiring documents 366
Exporting IDX documents 369
Exporting users, roles, agents and profiles
382
Exporting XML documents 371
Functionality matrix 18
IDOL server
Profiling 10, 109, 185
Importing users, roles, agents and profiles
383
Initializing IDOL server's Data index 381
Installation 23, 25, 32
Integrating with a third party user structure
111
Introduction 7
Licensing 40, 41, 42, 43, 44
Revoking a client license 42
Modifying configuration parameter values
390
Online help 61, 389
Operations 7
Agents 8, 109, 121
Alerting 8, 109, 125
Categorization 8, 129
Channels 9, 149
Clustering 9
Collaboration 9, 109, 163
Dynamic Thesaurus 9, 165
Eduction 9
Expertise 9, 109, 171
Hyperlinking 10, 173
Mailing 10, 109, 175
Retrieval 10, 189
Spelling Correction 12, 255
Summarization 12, 257
Taxonomy generation 13, 261
Profiling 10, 109, 185
Restoring deleted documents 361
Starting 59
Stopping 60
Storing
Content 83
Users 109
System
Architecture 14
Requirements 23
System architecture 5
Upgrading to 50
Using multiple languages 307
IDOLserver (configuration setting) 125
IDX files 437
Creating 431
Import action 383
Importing
Data
Tracking documents 108
Files 83
Legacy profiles from BIF files 73
Users, roles, agents and profiles from IDOL
server 383
Index (configuration setting) 70, 267, 279, 286
Index action 56
Index commands 84
DREADD 84
DREADDDATA 94
DREBACKUP 378
DRECHANGEMETA 373
DRECOMPACT 376, 378, 380
DRECREATEDBASE 362
DREDELDBASE 365
DREDELETEDOC 360
DREDELETEREF 359
DREEXPIRE 366
DREEXPORTIDX 369, 371
DREINITIAL 381
DREREMOVEDBASE 364
DREREPLACE 375
DREUNDELETEDOC 361
Index fields 285, 438
Setting up 285
Index task 73
[IndexCache] configuration file section 393
IndexerGetStatus action 55, 106, 190
Indexing 437
Considerations 63
Content 83
Data
Checking if the indexing process was
successful 106
Tracking documents 108
Data over a socket 94
Directly indexing IDX and XML files 84
Eliminate duplicate documents 105
Fields 67
Hyphenated terms 104
Optimizing 72
Process 72
Users 109
XML attributes 69
Indexing Delayed Synchronization 72
IndexPort (configuration setting) 26, 33, 85, 359,
360, 361, 362, 364, 365, 367, 369, 371, 373,
376, 378, 379, 381
[IndexTasks] configuration file section 409
Indonesian encoding settings 335
Initializing IDOL server's Data index 381
Installing IDOL server 23, 25, 32
Windows directory structure 28, 35
Integrating with a third party user structure 111
Intellectual Asset Protection System 437
Interfaces 3
Interval (configuration setting) 176
Introduction
IDOL server 7
InvertedAgentType (configuration setting) 279
Italian encoding settings 335
J
Japanese encoding settings 335
K
Kazakh encoding settings 336
KillDuplicates (configuration setting) 105, 295
KillDuplicates configuration parameter 295
Korean encoding settings 336
Kurdish encoding settings 336
Kyrgyz encoding settings 337
L
Language (configuration setting) 325
LanguageDirectory (configuration setting) 311,
312, 319
Languages 307, 309
Adding language type fields to documents
318
Automatic language detection 307
Canonicalization 308
Converting results to a specific encoding 322
Cross-lingual systems 307
Enabling Automatic Language Detection 320
Encoding settings 325
Encodings 308
Processing 71
Required files 325
Returning documents
In a specific language for your query
324
In multiple languages for your query 323
SentenceBreaking files 351
Settings 325
Specifying the language type of your query
321
Stemming 308
Stoplists 308, 353
TermSize setting 349
Transliteration
Schemes 308
Settings 350
LanguageType (configuration setting) 279, 314
[LanguageTypes] configuration file section 413
Lappish encoding settings 337
Latin encoding settings 337
Latvian encoding settings 338
Legacy profiles 73
Library (configuration setting) 176, 179
[License] configuration file section 392
LicenseInfo action 41
Licensing 40
Displaying information 41
Forcibly revoking licenses from inaccessible
clients 43
Licensing errors 44
Revoking a client license 42
Link term 438
List action 50, 52, 190
Lithuanian encoding settings 338
Logging
Setting up log streams 384
[Logging] configuration file section 411
LogTypeCSVs (configuration setting) 416
LP task 73
Luxembourgish encoding settings 338
M
Macedonian encoding settings 339
Mailing 10, 109, 175
Templates 180
Malay encoding settings 339
Maltese encoding settings 339
Mongolian encoding settings 340
Manipulating the relevance of query results 266,
269, 271
Manually creating IDX files 431
Maori encoding settings 340
MATCH field specifier 67, 285
Matching
Categories 148
Matching documents against agentboolean
categories 300
MaxEmailsPerUser (configuration setting) 177
MaxSyncDelay (configuration setting) 72
Memory mapping 289, 291
MergeConfig (service port command) 423, 427
Metadata 279
Adding metadata to documents after
indexing 103
MinClusterDocs (configuration setting) 157, 159
MinWordsPerSentence (configuration setting)
258
Modifying field content 73
Module (configuration setting) 74
Moving categories 140
Multipliers 271
N
Name (configuration setting) 363
NEAR<N> operator 195, 235, 236
NextTask (configuration setting) 74
NodeTableStoreContent (configuration setting)
64
Norwegian encoding settings 340
NOT operator 194, 195, 236
Number (configuration setting) 366
NumberOfBackups (configuration setting) 380
Numbers
Storing numbers in fields 289
NumClusters (configuration setting) 159
NumDBs (configuration setting) 363
Numeric fields 289, 291
Numerical fields 289
NumericDateType (configuration setting) 279,
288
NumericDateType fields 287
NumericType (configuration setting) 279, 290
O
OCR task 73
ondemand.xss template 180, 181
OnFailureTask (configuration setting) 74
Online help 61, 389
Operations 7
Agents 8, 109, 121
Alerting 8, 109, 125
Categorization 8, 129
Channels 9, 149
Clustering 9
Collaboration 9, 109, 163
Dynamic Thesaurus 9, 165
Eduction 9
Expertise 9, 109, 171
Hyperlinking 10, 173
Mailing 10, 109, 175
Retrieval 10, 189
Spelling Correction 12, 255
Summarization 12, 257
Taxonomy generation 13, 261
Operators
AFTER 195, 236
AND 194, 195, 236
BEFORE 195, 236
Boolean 194
DNEAR<N> 195, 235, 236
EOR 195, 236
NEAR<N> 195, 236
NOT 194, 195, 236
OR 194, 195, 236
Precedence of Boolean and Proximity
operators 195, 236
Proximity 235
WNEAR<N> 195, 235, 236
XOR 195, 236
Optimizing
Content storage 72
Indexing 72
OR operator 194, 195, 236
P
ParagraphConcept summary 257
ParagraphContext summary 257
Parametric search 228
ParametricType (configuration setting) 229, 280
[Paths] configuration file section 394
Persian encoding settings 341
PODS 4
Polish encoding settings 341
Port (configuration setting) 26, 33, 62, 106
Portuguese encoding settings 341
Precedence of Boolean and Proximity operators
195, 236
Preventing term stemming 122
PrintType (configuration setting) 276, 280
Privilege 438
Processing data before indexing it 73
Examples 75, 76, 78, 79, 81
Processing fields and documents that contain specific fields 281
[Profile] configuration file section 404
ProfileClear action 187
ProfileEdit action 187
ProfileGetResults action 187
[ProfileNamedAreas] configuration file section
404
ProfileRead action 187
Profiles 438
Deleting 187
Editing 187
Exporting 382
Importing 383
Querying with a profile 187
Viewing a profile's details 187
ProfileUser action 185, 186
Profiling 10, 109, 185
ProfileClear action 187
ProfileEdit action 187
ProfileGetResults action 187
ProfileRead action 187
ProfileUser action 185, 186
Users 185
Proper Names queries 231
Proper Names query 231
ProperNames (configuration setting) 231
[Properties] configuration file section 397
Property (configuration setting) 314, 316
PropertyFieldCSVs (configuration setting) 229,
267, 276, 288, 290, 292, 295, 298, 367, 390
PropertyMatch (configuration setting) 65, 114,
281, 285
Proximity
Operators
AFTER 195, 236
BEFORE 195, 236
DNEAR<N> 195, 235, 236
NEAR<N> 195, 235, 236
Precedence of Boolean and Proximity
operators 236
WNEAR<N> 195, 235, 236
Search 235
ProxyHost (configuration setting) 176, 179
ProxyPassword (configuration setting) 176, 179
ProxyPort (configuration setting) 176, 179
ProxyUsername (configuration setting) 176, 179
Q
Queries 438
Specifying the language type of your query
321
Query 6
Query action 165, 189, 191, 193, 194, 196, 197,
198, 199, 227, 232, 235, 237, 238, 240, 246,
247, 258, 272, 276, 297
Query results
Converting to a specific encoding 322
Displaying additional fields 276
Displaying additional fields with results 275
Filtering 272
Manipulating relevance 266, 269, 271
Relevance ranking 265
Returning documents in a specific language
324
Returning multiple languages 323
Query types
Advanced keyword 193, 232
Boolean 194
Exact Phrase 196
Field search 198
Field Text query 199
Fuzzy 227
Parametric 228
Proper Names 231
Proximity 235
Soundex 237
Synonym 238
QueryClients (configuration setting) 189
Querying
Agents 122
BIAS field specifier 269
For non-alphanumeric characters 249
Numeric fields 289, 291
With profiles 187
Quick summary 257, 438
R
Reference fields 438
Eliminate duplicate documents during
indexing 105
Filtering results at query time 272
Simultaneously using KillDuplicates and
Combine 295
ReferenceType (configuration setting) 280
Relevance ranking 265
Manipulating result relevance 266, 269, 271
Replacing categories 143
Requesting support 57
Restoring deleted documents 361
Results
Converting to a specific encoding 322
Displaying additional fields 276
Displaying additional fields with results 275
Filtering 272
Manipulating relevance 266, 269, 271
Relevance ranking 265
Returning documents in a specific language
324
Returning multiple languages 323
Retina
Deploying Retina to your application server
39
Retraining 439
Agents 122
Categories 140
Retries (configuration setting) 176
Retrieval 10, 189, 191
Advanced keyword search 193, 232
Boolean search 194
Combining different query types 241
Conceptual matching 191
Custom action 179, 180
DetectLanguage action 190
Exact Phrase search 196
Field search 198
Field Text query 199
Fuzzy query 227
GetContent action 189, 276
GetQueryTagValues action 189, 228, 230
GetStatus action 190
GetTagNames action 189
GetTagValues action 189, 228, 229, 230
Highlight action 189
IndexerGetStatus action 190
List action 190
Parametric search 228
Precedence of Boolean and Proximity
operators 195, 236
Proper Names queries 231
Proper Names query 231
Proximity search 235
Query action 165, 189, 191, 193, 194, 196,
197, 198, 199, 227, 232, 235, 237, 238,
240, 246, 247, 272, 276
Querying for non-alphanumeric characters
249
Soundex keyword search 237
Suggest action 165, 174, 189, 191, 197, 199,
247, 272, 276
SuggestOnText action 165, 189, 191, 197,
199, 247, 276
Summarize action 189
Synonym query 238
TermGetAll action 190
TermGetBest action 189
TermGetInfo action 189
Using wildcards in queries 246
Returning documents
In a specific language for your query 324
In multiple languages for your query 323
Revoking client licenses 42, 43
[Role] configuration file section 403
RoleAdd action 110
RoleAddRoleToRole action 110
RoleAddUserToRole action 110
Roles 439
Exporting 382
Importing 383
Romanian encoding settings 342
root category 130
Route task 73
Routing documents to multiple tasks 73
RunMailer (configuration setting) 176
Running IDOL server 59
Checking IDOL server is running correctly
117
In multiple languages 309
Russian encoding settings 342
S
[Schedule] configuration file section 395
Scheduling
Clustering 160
Data backup 378
Data compaction 376
Document expiry 366
Snapshots 160
Spectrograph data generation 160
Taxonomy generation 160, 262
Section breaking 439
[SectionBreaking] configuration file section 394
SectionBreakType (configuration setting) 280
Sectioning documents 433
Security 113, 437
[Security] configuration file section 399
SecurityType (configuration setting) 280
SeedBindLevel (configuration setting) 157, 158,
159
SeedSize (configuration setting) 157
Sending action commands to IDOL server 61
Sending HTTP calls 73
SendToList (configuration setting) 128
SentenceBreaking (configuration setting) 351
Serbian encoding settings 342
[Server] configuration file section 393
Service port commands 423
GetConfig 423, 424
GetLogStream 423, 424
GetLogStreamNames 423, 425
GetStatistics 423, 425
GetStatus 118, 423, 426
GetStatusInfo 423, 426
MergeConfig 423, 427
SetConfig 423, 429
Stop 423, 429
[Service] configuration file section 392
ServicePort (configuration setting) 26, 33, 43
SetConfig (service port command) 423, 429
Setting up
Clustering schedules 160
Index fields 285
Log streams 384
Security 113
Tasks to process data before indexing 74
Sizing 46
SleepBetweenRequests (configuration setting)
177
Slovak encoding settings 343
Slovenian encoding settings 343
SMTPHost (configuration setting) 176, 179
SMTPPort (configuration setting) 176, 179
Snapshots 439
Generating 152
Somali encoding settings 343
Sorbian encoding settings 344
Soundex (configuration setting) 237
Soundex keyword search 237
SourceFields (configuration setting) 258
SourceType (configuration setting) 280
Spanish encoding settings 344
Specifying the language type of your query 321
Spectrograph data generation 153
SpellCheckCorrectMinDocOccs (configuration
setting) 255
SpellCheckIncorrectMaxDocOccs (configuration
setting) 255
SpellCheckMaxCheckTerms (configuration
setting) 255
Spelling Correction 12, 255
Starting IDOL server 59
StartingSuggestOverrideFactor (configuration
setting) 157, 158
StartTask (configuration setting) 74
StartTime (configuration setting) 176, 178
Stemming 308, 439
Tilde 122
Stop (service port command) 423, 429
Stoplists 308, 353, 439
Stopping IDOL server 60
Stopword 440
Storing fields 67
Storing Boolean agents in agentboolean fields 299
Storing content in IDOL server 83, 437
Allocating files to IDOL server databases 65
Checking the indexing process 106
Delayed synchronization 72
Disabling content storage 64
Hyphenated terms 104
Index commands 84
Optimizing 72
Storing IDOL server's data files on multiple disks 64
Tracking documents 108
XML attributes 69
Storing users in IDOL server 109
STRING field specifier 249
StripLanguage (configuration setting) 313, 319
Suggest action 165, 174, 189, 191, 197, 199,
247, 258, 272, 276, 297
Suggesting
Categories 147
SuggestOnText action 165, 189, 191, 197, 199,
247, 258, 276, 297
Summaries
Concept 257, 436
Context 257, 436
Generating 258
ParagraphConcept 257
ParagraphContext 257
Quick 257, 438
Summarization 12, 257
Query action 258
Returning summaries with query action
results 258
Suggest action 258
SuggestOnText action 258
Summarize action 259
Summarizing text or documents 259
Summarize action 189, 259
[Summary] configuration file section 395
Support 57
Swahili encoding settings 344
Swedish encoding settings 345
Synchronizing categories 145
Synonym
File 440
Query 238
[Synonym] configuration file section 410
SynonymType (configuration setting) 240, 280
Syntax
Action commands 62
Index commands 84, 94
Service commands 424
System
Architecture 14
Requirements 23
T
Tagalog encoding settings 345
Tasks 73
ACI 73
Alert 73
Cat 73
Educe 73
Examples 75, 76, 78, 79, 81
FieldOp 73
FileWriter 73
HTTP 73
Index 73
LP 73
OCR 73
Processing data before indexing 74
Route 73
Tatar encoding settings 345
Taxonomy 440
Generation 13, 261
Scheduling 262
TaxonomyGenerate action 132, 160,
261, 262
[Taxonomy] configuration file section 407
TaxonomyGenerate action 132, 160, 261, 262,
407
Template (configuration setting) 127
Templates 180
alertTemplate.html 127
channels.xss 180, 181
Editing mailing operation templates 181
email.xss 180, 181
ondemand.xss 180, 181
Writing templates for alert emails 127
[Templates] configuration file section 410
Term 440
[TermCache] configuration file section 393
TERMEXACTPHRASE field specifier 197
TermGetAll action 190
TermGetBest action 189
TermGetInfo action 189
TERMPHRASE field specifier 197
TermSize (configuration setting) 349
TestUser (configuration setting) 176, 178
Thai encoding settings 346
Tilde 122
TimeoutMS (configuration setting) 176
TitleType (configuration setting) 280
Tracking documents 108
Through the import and indexing process 108
Training
Agents 121
Categories 140
Transliteration (configuration setting) 350
Transliteration schemes 308
TrimSpaces (configuration setting) 280
Turkish encoding settings 346
Typographical conventions iii
U
Ukrainian encoding settings 346
Undeleting documents 361
Upgrading to IDOL server 50
Urdu encoding settings 347
[User] configuration file section 400
UserAdd action 110
[UserCustom] configuration file section 405
Users
Creating 110
Exporting 382
Importing 383
Indexing 109
Integrating with a third party user structure 111
Profiling 185
Storing 109
[UserSecurity] configuration file section 401
[UserSecurityFields] configuration file section 400
[UserStructure] configuration file section 406
Using 246
Multiple languages 309
Encoding settings for supported languages 325
UTF8 390
Uzbek encoding settings 347
V
Valencian encoding settings 347
VerboseLogging (configuration setting) 178
Vietnamese encoding settings 348
Viewing
Agent details 123
Categories 141
Category details 141
Category hierarchy details 141
Category terms and weights 142
Category training 142
Profile details 187
VQL conversion error messages 416
W
Weight (configuration setting) 267, 280
Welsh encoding settings 348
WhatsHot 154
WhatsNew 154
WILD field specifier 247
Wildcards 246
Searches in Japanese, Chinese, Korean and Thai 248
Using 247
WNEAR<N> operator 195, 235, 236
Writing documents to disk 73
X
XML
Attributes 69
Importing 83
Indexing 83
XOR operator 195, 236
XSLTemplate (configuration setting) 176
