Technical Specification Architecture For Captiva

Introduction:
This Technical Specification Architecture document describes the solution methodology
to be implemented using the EMC Captiva product suite.

Overview:
InputAccel is a web-based, global input management system that converts business
critical information from paper or electronic sources into digital content and delivers
it to back-end systems. Immediate access to indexed information promotes
expedient business decisions and efficient transactions. By working in a global
environment, InputAccel reduces transmission costs and takes advantage of time
zone differences and low network traffic.
The Captiva product suite consists of server components and client components:
1. InputAccel Server
2. Dispatcher
3. eInput

InputAccel Server:
The InputAccel Server is a Windows-based service or application that stores all data
and acts as a controller. It communicates with InputAccel client modules using
TCP/IP and administers the document capture process performed by the client
modules. It manages the sequencing of pages through the capture process,
balances the system workload across available resources, serves as a temporary
repository for image and data files, and provides centralized status reporting.
EMC Documentum Advanced Export:
Exports documents to new or existing objects in the Documentum system. Objects
are any items that can be manipulated, such as documents and document renditions,
as well as the cabinets and folders in which they are stored.
Dispatcher:
Dispatcher is a high-speed automatic classification, indexing, extraction and routing
solution designed to help businesses handle the large volume of incoming paper-based
information they receive daily, quickly and efficiently. With
Dispatcher, you can identify all kinds of documents (structured, semi-structured
and unstructured) using sophisticated image- and text-based classification
technology that automatically learns and detects documents based on layout and
attributes such as keywords, logos, and text phrases. Dispatcher includes
sophisticated recognition capabilities to extract data from all types of documents,
including free-form data extraction from less structured documents.
This solution uses modules such as Dispatcher Project, Dispatcher
Classification and Dispatcher Recognition.
Dispatcher Manager:
Dispatcher Manager is the development module in Dispatcher. Dispatcher Manager
is used to customize the way documents are processed by Dispatcher. The
Dispatcher Manager development environment features an automatic learning
module which builds templates based on sample documents. Dispatcher Manager
also has a complete range of modules used to define and test the project before
exporting it to the production environment.
Dispatcher Classification:
The Dispatcher Classification module quickly identifies documents
using a combination of text- and image-based analysis technologies to classify
document types. Dispatcher performs most classification without any user intervention.

After documents have been imported to the InputAccel Server using the eScan module,
they can be classified using the Dispatcher Classification module. This module
classifies documents automatically by assigning each document to a template
defined in a Dispatcher project.
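The idea of assigning a document to the best-matching template can be illustrated with a toy keyword matcher. This is only a sketch under strong assumptions: Dispatcher's actual classification combines image- and text-based analysis, and the function name `classify`, the template dictionary, and the scoring rule below are all hypothetical and are not part of any Captiva API.

```python
# Toy illustration only: real Dispatcher classification is far richer than this.
def classify(text, templates):
    """Return (template name, score) for the best keyword match, or (None, 0.0).

    `templates` maps a hypothetical template name to a list of keywords.
    The score is the fraction of a template's keywords found in the text.
    """
    lowered = text.lower()
    best_name, best_score = None, 0.0
    for name, keywords in templates.items():
        if not keywords:
            continue  # a template without keywords cannot match
        hits = sum(1 for keyword in keywords if keyword.lower() in lowered)
        score = hits / len(keywords)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


templates = {
    "Invoice": ["invoice", "total amount"],
    "Order": ["purchase order", "delivery date"],
}
result = classify("INVOICE no. 42, total amount 10", templates)
```

A document that matches no keywords is left unassigned, which mirrors the idea that some documents cannot be classified automatically.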
Dispatcher Recognition:
The Dispatcher Recognition module is an automatic process that extracts data from
structured forms by using character, hand print, and optical mark recognition
processes to identify and digitally encode the printed or handwritten characters that
make up a document. In simpler terms, the text and images of paper
documents are extracted, translated, and encoded into a form that the workstation
can manipulate.

eInput:
eInput is a set of web-based modules that enable remote scanning and indexing of
documents while disconnected from the InputAccel Server. Batch processing is
directed by the InputAccel Server, and, although the eInput modules can
individually process data while disconnected, periodic connection to the InputAccel
Server is necessary during batch creation, setup, and moving batches between
modules in a process workflow.
eInput includes data processing modules such as eScan and eIndex.
eScan:
eScan creates eInput batches and populates them with images obtained
either from a scanner or from image files. After images are scanned or imported,
they can be sent via the eInput Server to the InputAccel Server, where other
modules can obtain the batch data for additional data processing. eScan can run in
offline mode, which enables operators in remote locations to scan documents while
disconnected from the InputAccel Server. Connection to the server is only necessary
to upload batches and make them available for the next module in the batch, or to
create new batches.
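The offline behaviour described above can be pictured with a small model: batch creation and upload need a connection, while scanning does not. The sketch below is illustrative only; the class `OfflineScanStation` and its methods are hypothetical names and do not correspond to any eScan or InputAccel interface.

```python
# Hypothetical model of an eScan workstation; not an InputAccel API.
class OfflineScanStation:
    def __init__(self):
        self.connected = False
        self.batches = {}   # batch name -> pages held locally
        self.uploaded = []  # names of batches already sent to the server

    def create_batch(self, name):
        # Creating a batch requires a connection to the server.
        if not self.connected:
            raise ConnectionError("creating a batch needs a server connection")
        self.batches[name] = []

    def scan_page(self, name, page):
        # Scanning into an existing batch works while disconnected.
        self.batches[name].append(page)

    def upload(self, name):
        # Uploading requires a connection; otherwise the batch stays local.
        if not self.connected:
            return False
        self.batches.pop(name)
        self.uploaded.append(name)
        return True


station = OfflineScanStation()
station.connected = True
station.create_batch("batch-001")
station.connected = False            # operator goes offline
station.scan_page("batch-001", "page1.tif")
offline_attempt = station.upload("batch-001")
station.connected = True             # connection restored
online_attempt = station.upload("batch-001")
```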
eIndex:
eIndex is a web-deployable module that transforms data from paper or electronic
sources into indexed digital content. This content can then be exported to back-end
systems for fast, efficient data and image storage and retrieval. In setup mode,
administrators customize eIndex by creating index fields, image zones, keyboard
layouts, keyboard shortcuts, and also defining default configuration settings for
optimum efficiency and productivity. Operators can index and validate data from
both structured and unstructured documents.

Business Requirement:
The Captiva product suite should provide the processes and modules that capture,
clean, classify, and extract information (OCR/ICR) from documents, and import documents and metadata
into the repository.

Solution:
The following modules are involved in the solution:
1. Process Developer:
In Process Developer, the steps are created or defined. Here, the steps
define the way in which documents should flow in order to achieve the solution.
The steps are included in the process builder using MDF (Module Definition File)
files.
The process builder can be started from
START > PROGRAM FILES > INPUTACCEL > PROCESS BUILDER
In Process Builder, select the FILE menu and click New Project. The Define Steps pop-up
window appears. In it, click SELECT MDF to select the modules required for the
solution. After selecting the required module, click the ADD
button so that the ADD STEPS window appears. In that window, define the flow in which
the documents or files should be processed. Trigger levels can also be set.
A step refers to a specific configuration of an InputAccel module in an Integrated
Process Project file (IPP). In an IPP, you can use the same module in different steps
to perform different tasks.
For example, the document should flow through the following steps:
1. eScan
2. Dispatcher Classification
3. Dispatcher Recognition
4. eIndex
5. EMC Documentum Advanced Export
These are the steps created in the Process builder. In each step, some code is
written in Visual Basic to trigger the next step and to perform
other operations. Once the IPP file is created in Process builder, save it. The IPP
file is then compiled into an IAP file, which is installed in the Administration
Console of InputAccel.
To compile, go to the File menu in Process Developer and click Make IAP. The file is
compiled and can then be saved. Errors found during compilation are shown and can
be rectified.
After compilation, use INSTALL PROCESS in the File menu of Process
Developer to install the process in the Administration Console.
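The example step order given in this section can be pictured as an ordered list through which each batch advances. The Python sketch below is purely illustrative and is not InputAccel or IPP code: the names `PROCESS_STEPS` and `next_step` are hypothetical, and in the real system the trigger logic is written in the IPP file.

```python
# Hypothetical model of the example step order; not an InputAccel API.
PROCESS_STEPS = [
    "eScan",
    "Dispatcher Classification",
    "Dispatcher Recognition",
    "eIndex",
    "EMC Documentum Advanced Export",
]


def next_step(current):
    """Return the step that follows `current`, or None after the last step."""
    position = PROCESS_STEPS.index(current)  # raises ValueError for unknown steps
    if position + 1 < len(PROCESS_STEPS):
        return PROCESS_STEPS[position + 1]
    return None


after_scan = next_step("eScan")
after_export = next_step("EMC Documentum Advanced Export")
```

Keeping the order in a single list makes it easy to see that eScan is the entry point and that export is the final step.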

2. Administration Console:
The Administration Console is a web-based module that allows administrators to
monitor, configure, and control
an InputAccel system over the Internet. It controls all process and batch
administration, including monitoring batch traffic and finding batches, and performs
administrative tasks related to InputAccel modules, eInput and Dispatcher.
In the Administration Console window, select Systems from the navigation panel.
The Systems pane displays. Click View Processes. The Processes pane displays the
list of processes added to the system. Select the process containing the module
step to configure. The Steps for process table lists all the module steps associated
with the selected process. Double-click the module step to configure or right-click
the step and select Settings. The corresponding module is run in setup mode.
Configure the module step and click OK to return to the Processes pane.
In the Administration Console, clicking the installed IAP file displays the list of
processes created and installed using the process builder. Clicking a
process displays the steps defined for it, and double-clicking
a step opens its setup window. Configure the step based on the
requirement, save it, and return to configure the other steps.
3. eScan:
Because eScan is the image entry point for the capture solution, it must be the first
module instance or step in the process. Furthermore, the eScan module must be
used to create the batches. eScan setup must be performed at the process level.
Launch eScan from Start > Program Files > InputAccel > eScan.
An authentication window appears for connecting to the InputAccel Server. Enter the
login credentials. The Action page then appears.
In the Actions panel of the eScan Production window, click New Batch. The New
Batch window opens. Select the process from the Based on Process list box. Every
batch must be based on a process that has been defined and installed on the
InputAccel Server. Only processes that include instances of eScan appear in this list
box.
Type a unique name for the new batch in the Batch name text box. Click OK to
create the batch. The main window appears.
To import pages:
In the Actions panel, click Import.

Select the files to import. Several files can be included in the selection.
Click Open to close the Select Files to Import window and import the selected
images. Thumbnail images of the imported pages appear in the Tree pane and the
last page imported appears in the Image pane.
After the images are imported through eScan, the batch is sent to the InputAccel Server
so that the other steps can access it. In the Actions panel, click Send Batch, then
when prompted, confirm to proceed with the operation.
3. Dispatcher Classification:
Files or batches are retrieved for classification in Dispatcher Classification. It can
be launched from
Start > Program Files > Dispatcher > Dispatcher Classification.
The main purpose of this module is to classify each document by
type. Classification is based on defined templates, which are
created using Dispatcher Manager or in the classification module itself. Dispatcher
Classification refers to these templates when classifying a document. Many
templates can be created, depending on the requirement.
In classification module, the New Project Wizard features the automatic learning
functionality. You use this functionality when you have a set of unsorted images and
you want Dispatcher to analyze the images and create the classification templates
automatically.
To create standard templates automatically:
Select File > New or click the New button in the toolbar. The New Project Wizard
appears.
Click the Start button. The Project Parameters window appears. Fill in the Project
Name, Author, and Company.
You can also add notes about the project.
Specify how many images you want to keep in the template image base, regardless
of the number of images used to run automatic learning. Specify this value in the
Maximum number of images linked to an image reference field.
For example, if you specify 10 images and use 100 during automatic learning,
Dispatcher will automatically keep in the image base the 10 images that are most
representative of the template. The image base is used to run template tests and
classification tests. The default value is 50. It is recommended to provide 10 to 20
images to enable efficient automatic learning. At least 10 images are necessary.
Select Remove black edges from documents to instruct Dispatcher to ignore black
edges present on the images during classification (the black edges are just ignored,
the image files are not modified). This option is recommended as images with black
edges are not typical of the template and may not be correctly classified.
Select Next to carry out automatic learning. The Select a Base of Image window
appears.
Select images. The following options are available:
To load selected images from a directory
To load all the images from a directory
To load all the images from a directory and subdirectories
To unload the selected images
To unload all the images
After loading the images, you can click the icon to run the Image Analyzer to check
the image format and resolution, or you can go to the next step, where the Image
Analyzer will run automatically.
Select Learn to start the automatic learning process. The Image Analyzer runs and
automatic learning is performed.
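The effect of the Maximum number of images linked to an image reference setting can be sketched as a top-N selection. How Dispatcher actually measures how representative an image is of its template is not described here, so the numeric score below is a hypothetical stand-in, as is the function name `select_image_base`.

```python
import heapq


def select_image_base(scored_images, max_images=50):
    """Keep at most `max_images` image paths with the highest scores.

    `scored_images` is an iterable of (path, score) pairs, where score is a
    hypothetical measure of how representative the image is of the template.
    The default of 50 mirrors the default value mentioned in the text.
    """
    best = heapq.nlargest(max_images, scored_images, key=lambda item: item[1])
    return [path for path, _score in best]


# 100 learning images, of which only 10 are kept in the image base.
learning_set = [(f"img{i}.tif", i) for i in range(100)]
image_base = select_image_base(learning_set, max_images=10)
```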
Once the template is created, select the Test > Classification Test menu from the
main interface of Dispatcher Manager.
Select File > Open to select the images to be tested. When you select the
Test > Classification Test menu in Dispatcher Manager, the Classification Test
window opens and all the project images are loaded by default in the Test window.
You can delete images and reload others using the File menu options.
Select the Test menu. You can choose to run classification tests on all templates in
the project, or simply on specific types of template. This possibility is very useful
when running tests on projects that have templates of different types that each
contain a large number of images. Running classification tests on such projects can
be a very lengthy process, so limiting the test to a specific template type can
considerably reduce the time it takes to carry out the classification test. You can
select several types of template in this way so that only those templates are tested.
Select Test > Run. The Classification Test window appears.
When the test completes (i.e., when the progress bar displays 100%), click the OK
button.
The tested images are displayed in the three following tabs: Classified, To confirm,
Not classified. The status bar displays the classification performance, namely: the
number of processed images, the number of images processed per second and the
number of images classified.
The images that have been successfully classified to the project templates appear
in the Classified tab. For each image the File column indicates the complete path of
the image; the Name and Code columns indicate the name and code of the
template to which the image is classified. If the detection of rotation and/or side
flipping is enabled, columns appear indicating rotated and/or side-flipped images.
Once classification is completed, the files or images are sent to the InputAccel
Server.
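The three result tabs and the status bar figures can be illustrated with a short sketch. The rule Dispatcher uses to decide between Classified, To confirm and Not classified is not given in this document, so the confidence thresholds `accept` and `review` below are assumptions chosen only for illustration, and `summarize` is a hypothetical name.

```python
def summarize(results, elapsed_seconds, accept=0.8, review=0.5):
    """Group test results into the three tabs and compute status-bar figures.

    `results` is a list of (path, template, confidence) tuples; `template` is
    None when no template matched. The thresholds are illustrative assumptions.
    """
    tabs = {"Classified": [], "To confirm": [], "Not classified": []}
    for path, template, confidence in results:
        if template is not None and confidence >= accept:
            tabs["Classified"].append(path)
        elif template is not None and confidence >= review:
            tabs["To confirm"].append(path)
        else:
            tabs["Not classified"].append(path)
    processed = len(results)
    per_second = processed / elapsed_seconds if elapsed_seconds > 0 else 0.0
    stats = {
        "processed": processed,
        "per_second": per_second,
        "classified": len(tabs["Classified"]),
    }
    return tabs, stats


sample = [
    ("a.tif", "Invoice", 0.9),
    ("b.tif", "Invoice", 0.6),
    ("c.tif", None, 0.0),
]
tabs, stats = summarize(sample, elapsed_seconds=2.0)
```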
4. Dispatcher Recognition:
Launch Dispatcher Recognition from Start > Programs > Dispatcher >
Dispatcher Recognition.

Dispatcher has a set of recognition engines to carry out both zonal and Free
Form recognition. Zonal recognition is mostly used on structured documents and
Free Form recognition is used for semi-structured or unstructured documents. Free
Form recognition requires a definition file created with the module Free Form
Designer.
To configure recognition in Dispatcher, you create fields within one or several
indexing families that you associate with classification templates. An indexing
family is a set of fields to be extracted from a specific type of document. For
example, you can create an indexing family specifically for all invoice documents in
which you create fields, such as invoice number, supplier reference, and total
amount. Then, you assign this indexing family to all the classification templates that
you have created specifically to process invoices.
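The relationship between indexing families, fields and classification templates can be sketched as a small data structure. This is a minimal illustration, not Dispatcher's internal model: the class `IndexingFamily`, the helper `fields_for_template`, and the template name "Supplier A invoice" are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IndexingFamily:
    """A named set of fields to extract from one type of document."""
    name: str
    fields: list
    templates: set = field(default_factory=set)  # templates using this family

    def assign_to(self, template_name):
        self.templates.add(template_name)


def fields_for_template(families, template_name):
    """Collect the fields of every family assigned to the given template."""
    found = []
    for family in families:
        if template_name in family.templates:
            found.extend(family.fields)
    return found


invoices = IndexingFamily(
    "Invoices", ["invoice number", "supplier reference", "total amount"]
)
invoices.assign_to("Supplier A invoice")
invoice_fields = fields_for_template([invoices], "Supplier A invoice")
```

Defining the fields once per family and then assigning the family to many templates avoids repeating the same field list for every invoice layout.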
The Dispatcher Recognition module performs the following functions:
Extracts data from the image and outputs this data in the form of IA Values,
including:
Field data: Data from predefined fields on an image.
Zonal OCR data: Data from zones on the image.
Free-Form data: Full-page reading during which data is extracted from fields.
Character-level data: Character-level data is only available for zonal OCR data.
Table data: The name and zone coordinates of the table and the comma-separated
data values of all the cells in the table as defined in the template having the highest
confidence match with the image being processed.
Inputs IA Values containing field values: If field values are known before recognition,
their values can be sent to the Dispatcher Recognition module via the IA Value
InputFieldNames that contains a list of the field names to be input during
recognition. Dynamic IA values can be used to send the value of each input field.
Outputs IA Values containing the recognized and validated text: The result of
recognition on an image is an IA Value containing the name of the class and its
confidence score.
5. eIndex:
Before launching the module, setup should be done for this module in
Administration Console.
Run Administration Console and select Systems from the navigation panel.
Click View Processes to display the Processes pane.
Select the process to set up from the Processes table.
The steps included in the process display in the Steps for process table.
From the list of available steps, double-click the step to run for setup,
or right-click on the step and select Settings. The setup window for the step
displays.
Configure settings for the step as required.
Click OK or Save as appropriate to save the module configuration settings.
All new batches based on the process will use these settings.

Launch the eIndex module from Start > Programs > InputAccel > eIndex.
An authentication window appears for connecting to the InputAccel Server.
Once authentication succeeds, the main window appears. Select one of the following
options from the Action panel:
1. Run Single Batch
2. Run All Batches
3. Open Batch
Selecting any one of these options retrieves the batch for validation.
Once the option is selected, the main window appears, showing the image and the
fields entered.
The image is displayed at the top and the fields at the bottom. Once
validation is completed by
clicking the Accept Task button, send the batch to the InputAccel
Server by clicking the Send Batch button.
6. EMC Documentum Advanced Export: