Version 10.5
October 2006
Opening this package indicates your acceptance of the terms and conditions of the Harte-Hanks license agreement. The customer acknowledges and agrees that (a) the System and all related documentation are confidential trade secrets of Harte-Hanks or Harte-Hanks licensors and (b) title to and intellectual property rights in the System and related documentation (including without limitation all copyright, trademark, trade secret and patent rights) are and shall remain the confidential proprietary property and information of Harte-Hanks and Harte-Hanks licensors.
The customer shall use the System only in accordance with this Agreement. The customer shall not disclose, copy, or reproduce any portion of the System or documentation in any form to any third person without the prior written consent of Harte-Hanks, nor allow third parties to do the same. The customer shall keep the System and all confidential information in the strictest confidence.
CHAPTER 1
Introduction
This book is intended for users who wish to learn how to use TS
Quality. It provides step-by-step instructions to set up a project and
process data. The book assumes that the users have installed TS
Quality Server, TS Quality Client, TS Quality Country Template
Projects and Postal Tables according to Installing TS Quality, and
read the introductory book, Getting Started with TS Quality.
This book covers the basic functions of TS Quality, but users should also consult companion materials, such as the TS Quality Reference Guide and TS Quality Online Help, to utilize the full capabilities of TS Quality.
See Getting Started with TS Quality for the complete
list of TS Quality documentation and materials.
Sample Project
In this book, a global sample project (TMT project) is used to
illustrate various TS Quality functions. The TMT (TrilMedTech)
project contains customer data from the United States, United
Kingdom, Canada and Germany. The record data consists of typical
business database fields:
Contact name
Phone number
Address information
Product information
Various dates
Account representative
Account status
CHAPTER 2
In order to process your data, we strongly recommend that you first
create a project. A project includes a set of steps (core modules) for
centralized access and allows you to manage data processing tasks
easily. Projects are created in the Control Center, the graphical user
interface. Within a project, you can run processes, view data, create
and edit DDLs, modify settings, analyze output and tune the overall
process. Projects within the Control Center are mainly used to
create and test batch process flows for later use in a production
environment.
This chapter focuses on these topics:
Project types
Starting and setting up the Control Center
Creating and working with projects
For an overview of the TS Quality Control Center and
projects, refer to Getting Started with TS Quality.
Types of Projects
A project is a combination of one or more modules and tasks that
process a particular set of data in a job flow. Each module in a
project is called a step. A project includes all required data files,
DDL files, settings files, output, statistics files, user-defined tables
and batch scripts for modules. Within a project, you can run the
entire job flow, from the Transformer to the Relationship Linker, or
only part of the flow.
There are two types of projects:
Standard Project - a basic project which includes
predefined modules
Custom Project - a complex project for advanced users
The Create New Project Wizard will guide you through creating a
project. You will be prompted to select a type at the beginning of
the Wizard. Both standard and custom projects may later be
modified by adding and deleting steps, or can be customized by
adding user-defined components.
Option and description:
On Startup
Other Startup Options
Default Project Directory: Enter the directory where project and step files will be stored. Default: C:\TrilliumSoftware\tsq10r5s\mynewdir
Input Staging Directory: Enter the directory where input data files for the project or step will be stored. Default: C:\TrilliumSoftware\tsq10r5p
Help Directory
My Editor: Enter the path and executable file of your text editor to display and edit text files within the Control Center.
My Statistics Viewer
Discovery Launch Directory
4. Click OK.
Creating a Project
The Control Center allows you to create a standard or a custom
project. The standard project option is recommended for new users
and may later be modified to meet your specific data cleansing
needs. The custom project option is used to create a more complex
project and is recommended for more experienced users. The
Project Wizard will guide you through the project creation
process.
In order to create a TS Quality project you will need certain
information:
The name and location of your input data file(s). The input data file(s) should be either:
a delimited file
a fixed field file
To select a project type
1. From the main menu select File, New Project. The Create New Project Wizard appears.
3. Select Next.
Project type options: Standardize; Standardize and Enrich; Standardize, Enrich and Link; Other Custom Process.
Project settings: Project Name, Project Directory Path, Single or Multiple Country Project, Input Files, Country of Origin.
Select Next.
provide the input file name, format, and DDL in the
Specify Multiple Inputs window. Click Next.
Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed by quotation marks.
If you are using a delimited file as input in the Wizard, the subsequent input files and all output files in the project become fixed field files.
found on the input DDL file. Select the field and drag it to the
Name and Address Palette. The actual record data is
displayed in the Preview Name Address area.
After dragging selected fields to the palette, you can
make multiple fields single-line by editing them in the
palette.
The Apply button must be selected for the Control Center to accept your desired name and address format.
Project Panel
The Project Panel is displayed when the Control Center is opened.
Existing projects appear as a suitcase icon labeled with the user's hostname and the project name.
To explore the Project Panel
1. Click to close an open project and to view the Project Panel.
Right-click and select Properties.
Project Viewer
The Project Viewer displays all modules or steps within a project.
To explore the Project Viewer
Step Viewer
In the Step Viewer you can set up the module, specify input and
output files, modify program tasks and conditions, customize rules,
run the module, and view and analyze output files, statistics and
logs.
To open the Step Viewer
Graphics View
In the Graphics View, you can perform various step-specific tasks:
run, rename, and move steps
delete and connect steps
copy steps
change settings files
Right-click on a step.
To rename steps
To move steps
1. To move a single step, click and hold the step and drag it to a new location.
2. To move the entire job flow, click on the first step, hold down the CTRL key, and click all the other steps in the job flow. You may now drag the complete flow to a new location. Or, right-click a step and select the Select All Downstream option, then drag it to a new location.
To connect steps
To remove a connection
2. In the list view, select the module to copy from the list and click the Copy Selected Step button in the toolbar menu above.
To add a comment

Select a step from the Step Palette: drag and drop it on the DFA. Choose a country, and provide a name for this step. Then click OK.
List View
In the List View, you can view steps in the order in which they will
be processed. A step may be opened by double-clicking it. From the
List View you can perform several tasks:
Open, rename, add, delete, and reorder steps
Generate a batch script to run selected steps
For information on batch scripts, see Batch Script on page 14-3.
To open the List View
2. Click a step in the List View and the tool bar options become available.

To add steps
2. Click the button on the tool bar. The Step Palette appears on the left. Drag and drop the desired step into the List View.

To delete steps

To move steps
2. Use the up and down arrow keys to move the steps into the desired order for processing.
2-28
Input
Settings
tab
Use the Input Settings tab to specify the Input File Name and
Input DDL Name.
To specify input files
1.
Type a file name in the Input File Name and Input DDL
Name text boxes. You can use the File Chooser button to
select the files.
2.
Type a file name in the Input File Name and Input DDL
Name text boxes. You can use the File Chooser button to
select the input files.
Click Replace. The file names in the Input Data File Name and Input DDL Name column are replaced with the files you just specified.
Highlight the row in the Input Data File Name and Input DDL Name column that contains the file names you want to delete.
2. Click Delete.
Output Settings tab
The Output Settings tab lets you specify the Output File Name, the Output DDL Name, the Statistics File Name, and the Process Log Name.
To specify output files
1. Type a file name in the Output File Name and Output DDL Name text boxes. You can use the File Chooser button to select the files.
To run a step
Keywords in a DDL
For delimited files, every field in the DDL should reflect the
maximum field length.
For example, if you have a field on input called ADDR_LINE_1 and
the value is "10 Main St", then a field length of 10 bytes for that
field will be sufficient, but a field length of 8 bytes will truncate to
"10 Main ". If you have that field on output and the line was
changed to "10 Main Street" by processing, then a field length of 10
will truncate the output to "10 Main St". Make sure that you have
enough field length for each field on the DDL for delimited files.
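For illustration, here is a minimal sketch of a DDL field entry sized for the longer output value in the example above. It follows the DDL syntax described later in this chapter; the length is an illustrative assumption chosen to leave room for growth, and the entry assumes this is the first field in the record.

// Sized to hold "10 Main Street" (14 bytes) with room for longer street names
Field is ADDR_LINE_1
Type is ASCII
Starts in column 0
Length is 30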
A DDL uses the keywords shown in Table 2.1. Required keywords are listed in bold.

Table 2.1 DDL Keywords

Record Name
Record Length: The total record length in bytes. The total length of the record must be equal to the sum of the lengths of all fields.
Field Name
Type: Data type for the field. You can specify the appropriate character encoding or other type of value. See Type Keyword on page 2-42.
Redefine
Start Position: The relative byte position of a field within the record. DDLs are zero-based. Therefore, the first field of a record generally begins in column zero.
Length: The length of a field in bytes. The number must be a positive integer greater than zero. If the entity is a field, the length must be less than the Record Length. Two fields cannot occupy the same space, unless one field is a redefinition of that field. If the entity is a subfield, the length must be less than the parent field. The sum of all field lengths must equal the length of the record.
Default: The default value for the field. The value must agree in type with the Type. Numbers may be positive or negative. Values: SPACES fills the field length with spaces; -1 for a numeric with a negative value; 0 for a numeric; '0' for a character field; "0" for a string field.
Comment
Attributes
CLASS: Converts any 2-digit year into a 4-digit year. If used, it must immediately follow a Field statement. Values: DATE BACKWARD, DATE FORWARD, DATE WINDOW {nnn}. If used, CLASS is required to be on the input DDL. See CLASS Keyword on page A-10 to learn more about the Class specifications.
Open the DDL Editor from the Control Center. Select New from the File menu. A new empty DDL opens.
6. Repeat this process until all fields are defined in the DDL.
If you are using the Project Wizard to create a project and you don't have a DDL file for delimited input, it will be created automatically using the header as field names.
2. Select New from the File menu. A new empty DDL opens.
For delimited files, every field in the DDL will reflect the maximum field length.
Syntax
Use the following syntax:
Keyword [is, are, in] Parameter
Keywords are case-insensitive.
For example, the following keywords all mean the same thing: "Field", "FIELD", and "field".
Brackets
The actual brackets [ ] are not physically entered on a DDL file. All punctuation and noise words such as "is", "are", and "in" can be used. They are highly recommended to make subsequent reading more understandable.
Parameters are case sensitive.
All name and string value parameters are case sensitive.
String values are enclosed within double quotes (for example, "Hello World").
Tab characters are not allowed in a DDL.
Always define until the last carriage return.
Comments can be enclosed between the string pairs "/*"
and "*/", or can be indicated by the prefix string "//".
Example
/* This is a comment that extends over two lines
delimited by the slash and asterisk pairs */
//This is a comment to the end of this line
Field is input_line_1
Type is ASCII
Starts in column 0
Length is 50
Field is input_line_2
Type is NOTRANS
Starts in column 50
Length is 50
Default is 0
Field is input_line_3
Class is DATE FORWARD
Type is ASCII
Starts in column 100
Length is 50
Field is input_line_4
Type is ASCII
Starts in column 150
Length is 50
Attributes are NOVALIDATION
of the field or the same field with a different name in the output.
Redefining fields requires listing two fields: the field to be redefined,
followed by a field listing that is the redefinition.
The Starts in position may be maintained manually. However, the
automatic renumbering of Starts in position is facilitated through
the //REDEFINE statement. When the Recalculate Positions
function in the DDL Editor encounters the string //REDEFINE ahead
of a pair of field definitions, it will not increment the Starts in
number for the second field definition.
Type is FIXED
Length is 200

//REDEFINE
Field is ORIGINAL_RECORD
Type is ASCII
Starts in COLUMN 0
Length is 200

Field is input_line_1
Type is ASCII
Starts in COLUMN 0
Length is 100

Field is input_line_2
Type is ASCII
Starts in COLUMN 100
Length is 100
Type Keyword
The Type is required for every field entity. There are two Type
categories: encoding (code page), and date format. The
following list shows the main values used for the Type keyword.
Encoding (Code Page)
Encoding is a mapping of binary values to code positions to represent characters of data. It is also called a code page. The main character encodings used in TS Quality include ASCII, Latin1, and Latin2.
See Appendix A for the complete list of Encoding.
Date format
Date format is a type of data which may contain only valid
dates.
See Appendix A for the complete list of Date format.
Class keyword
The Class keyword specifies the format to be used for a date field. By using the Class keyword, you can convert any 2-digit year into a 4-digit year.
See Appendix A for the complete list of Class
keywords.
CHAPTER 3
After you create a project, you must investigate your data before
working with any processes. Investigation helps you determine how
well your data conforms to rules that govern acceptable limits and
requirements for data elements, and helps you understand what
data quality processes need to be put in place. Investigate your
data with the Data Browser, DDL Editor, and TS Discovery.
This chapter focuses on four tasks:
View data using the Data Browser
View DDL using the DDL Editor
Analyze data using TS Discovery
Identify problems with the data
The Data Browser lets you view a data file to verify its format as described by the data dictionary language (DDL) file. You can verify the format on either a record-by-record or a field-by-field basis.
To open the Data Browser and view the input data
On the Input Settings tab, select the first entry in the entry listing options. The input file name and corresponding DDL file name will already be populated. These files were specified during the Create New Project Wizard process (see Creating a Project on page 2-9).
5. Select the fields you want to display in the upper pane and click Add. To select all the fields, click Add All.
7. After the fields appear in the Selected Fields list box, you have several options:
See the next procedure To save the view for more details
on saving the fields.
Click Display.
Browse the data and verify that the field names reflect the data contained within them.
3. Click Save. The Save window opens. Name this view and save it in the desired directory.
Click the view name and select OK. The fields will be loaded
in the Selected Fields window. Select Display to view the
stored fields.
The Data Dictionary Editor (DDL Editor) lets you view existing
data dictionary language (DDL) files.
To open the DDL Editor and view a DDL
3. The upper frame shows the Record Name, Record Length, and Update ORIGINAL_RECORD Length options.
The columns in the DDL Editor are:
Field Name: DDL fields listed row by row, in the order that they appear in the DDL. The standard field names are displayed in blue. Other unique field names are displayed in black.
Type
Redef (Redefine)
Start Pos.
Length
Default
Comment
Attribute
Class
CHAPTER 4
After you have investigated the data and identified the issues, you
can begin to process the data. First, use the Global Data Router to
separate the multi-country input file into country-specific files. One
advantage to running the Router step before cleansing and
standardizing your data is that it enables data to be standardized at
the country level. This ensures that further processing is done at a
country-specific level.
In this chapter, you will perform these tasks:
Specify the input and output files
Identify the rules files used to determine the country of origin
Identify the Global Geography table, which contains state,
city, locality, post code and word/pattern structures
Define the settings for the Global Data Router. These include
the ability to:
Since the Global Data Router step is usually the first step in the
project, it uses the Input File Name and Input DDL Name specified
in the Project Wizard as the default inputs.
To specify input and output files
1. Open the Global Data Router step and click the Input Settings tab.
2. Enter file names in the Input File Name and Input DDL Name text boxes.
4. Enter file names in the Output File Name and Output DDL Name text boxes.
Separate Output
If you want a separate output file for each country, select
Generate a separate output file per country. When
this option is selected, an underscore(_) and an asterisk
(*) will be added automatically to the filename you
specified in the Output File Name text box. After
processing, each output filename will include a country
suffix in lower case. For example, the US data will be
named <filename>_us, and the Canadian data will be
named <filename>_ca.
A red flag indicates a REQUIRED field for this operation.
If you are using a delimited file for input and/or output, you must specify delimited settings.
Process Settings
Once you have specified input and output files, you are ready to
specify the settings to process your data. Do this in the Advanced
Settings window.
Rules Files
The Global Data Router uses two rules files to determine country of
origin. Rules files contain entries that define the resource tables
used by the Global Data Router program, as well as country-specific
data.
Global Rules File: Defines rules that apply to all countries. It also contains translation tables, street types, city definitions, and other rules that require lengthy entries.
Country Rules File: Defines rules that apply to specific countries.
See Global Data Router in the TS Quality Reference
Guide for details of these rules files.
To specify the Rules Files
Tip: You can either edit the file names manually or click the File Chooser icon to browse for and select the file.
3. Locate the Global Rules File and Country Rules File and specify the files.
Default Global Rules File:
\TrilliumSoftware\tsq10r5s\tables\
general_resources\rtrules1.win
Default Country Rules File:
\TrilliumSoftware\tsq10r5s\tables\
general_resources\rtrules2.win
You can edit the Rules Files. You may also use the Customer Rules File, which allows you to add your own user-defined rules. See Global Data Router in the TS Quality Reference Guide for details.
Country Settings
If the data has a country code field, you must specify the field name
for the country code. This ensures that the Global Data Router uses
the data in this field to identify and score country of origin.
To specify a Country Code Field
Review the list and confirm that the Country List identifies
the valid country choices for your data.
Fields Settings
You must tell the Global Data Router which fields contain country of
origin codes. When there is no valid country code or the country
code is suspect, the Field Settings will determine which fields the
GDR will inspect.
To specify fields to scan for country of origin data
Navigate to Fields, Field. Select the field name that
contains information for country of origin.
If you have a valid country code field, you can select that
field. This means that the program will only scan that field
for country of origin data.
DDL Settings
If you choose, you can specify separate output DDLs for each
country. If this is not specified, the output DDL specified in the
Output Settings will be used.
To specify a separate DDL for each country
1.
2.
Select the DDL file for each country from the drop-down list.
Additional Settings
You can specify the following additional settings:
See Global Data Router in the TS Quality Reference Guide for the complete settings information.
3. In the Debug File text box, accept the default path and file name, or enter a new file name. Debugging information will be written to this file.
4. Select OK.
On the Results tab, select the Statistics sub-tab. The Statistics sub-tab will show the number of records included in each country-specific file. The NOMATCH file contains any records where the Global Data Router was unable to determine country of origin.
CHAPTER 5
After you separate the input data into country-specific data, you can
start the cleansing process. This chapter explains how to cleanse
the data using the Transformer.
In this chapter, you will perform these tasks:
Specify the input and output files
Use character translation to convert particular hexadecimal
values
Use field scanning to change field values
Use table recoding to recode the values in a field using a
literal or mask shape
Use conditionals to control the field scan and table recode
settings
Run the Transformer and review the results
2. Specify a file name in the Input File Name and Input DDL Name text boxes.
Tip: You can either edit the file names manually or click the File Chooser icon to browse for and select the file. To view the contents of your data file, click the Data Browser icon.
Click Replace. The default file names in the Input Data File Name and Input DDL Name column are replaced with the files you just specified.
The Transformer can use up to ten input files simultaneously.
Use the Dictionary Editor to view the contents of the DDL file.
2. Select the appropriate input file from the Input Files text box on top.
If you are using a delimited file for input and/or output, you must specify delimited settings.
The File Source and Source Field work together. If you specify one of these values, you must specify the other value. If you delete one of these values, you must delete the other value.
2. In the Input Data File field, type or browse to the input file you wish to use.
3. In the Input DDL File field, type or browse to the input DDL file associated with the input data file you specified in Step 2.
4. Click Add.
5. Repeat Steps 2-3 until you've added all DDL files you want to use to create the common output format.
Use the Input DDL drop-down menu to select the DDL file you want to use to map fields to an output DDL file. The input DDL fields appear in the left pane and the final output DDL fields appear in the right pane.
9. Use the buttons in the center panel to refine the output DDL list of fields. You can choose from these options:
Add: Adds the selected input DDL field to the output DDL list.
Delete: Deletes a selected output DDL field from the list.
When you are ready, click Save to save the output DDL field
mapping. When the Transformer step runs, it will create an
output DDL file that uses this mapping.
Process Settings
Once you have specified the input and output files, you can
configure the settings to process your data. The settings for
processing are managed in the Advanced Settings window.
Character Translation
The Transformer lets you convert the original hexadecimal value to
another hexadecimal value.
To convert the hex value
3. Specify a value for From Hex Value. This is the original hex value which will be translated to another hex value.
Field Scanning
The Field Scanning function converts the values in the field. You can scan values and then Change, Copy, Cut, and Flag the values.
A red flag indicates a REQUIRED field for this operation.
The Field Scanning settings are:
Scan Field: Field in the DDL file that specifies the location in which to perform the scan.
Field Justification
Scan Format
Scan Value
Change Value
Change Occurrences: Numeric value that indicates how many times to scan for a value in a particular word or field.
Scan Position
Scan Level
Scan Direction
Between Substring
And Substring
Retain Between Characters
Scan Value Encoding
Change Value Encoding
Between Substring Encoding
Example
In this example, the phone number currently has dashes and spaces. To match more accurately, you should remove the dashes and spaces from the phone number. To change the phone number format, scan the Phone field for the Literal value - (a dash) using the following criteria:
Scan Field: Phone
Field Justification: Left Pack
Scan Format: Literal Value
Scan Value: - (a dash)
Scan Position: Default
Scan Level: Field
Change Value: (nothing)
Change Occurrences: A (for All)
These settings will cause the Transformer to scan the Phone field for the literal value - at the Field level. If the value is found, the Transformer will left-pack the value and change it to nothing.
Phone Field before: 207-555-4423
Phone Field after: 207555442
A red flag indicates a REQUIRED field for this operation.
Refer to the following settings: Scan Field, Target Field, Field Justification, Scan Format, Scan Value, Retain Scan Value, Scan Level, Scan Position, Scan Direction, Scan Capture, Word Delimiter, Between Substring, And Substring, Retain Between Substring, Scan Value Encoding, Word Delimiter Encoding, and Between Substring Encoding.
2. Refer to the following table and specify values for Flag in the Field Scanning window:
A red flag indicates a REQUIRED field for this operation.
Scan Field
Target Field
Scan Value
Retain Scan Value: When checked, retains the scanned-for value in the target field.
Scan Level
Scan Position
Scan Direction
Word Delimiter
Flag Value
Between Substring
And Substring
Retain Between Substring
Scan Value Encoding
Word Delimiter Encoding
Flag Value Encoding
Between Substring Encoding
Example
For example, to flag the Doctor_flag field in this example, scan the Title field for the Literal value DR using the following criteria.
Literal values are always case sensitive.
Scan Field: Title
Target Field: Doctor_flag
Field Justification: No Justification
Scan Value: DR
Retain Scan Value: Check on
Scan Position: Default
Scan Level: Field
Flag Value: Y
These options direct the Transformer to scan the Title field for the literal value DR at the Field level. If the value is found, the Transformer will retain the scan value (DR) in the source field, and place the flag value Y in the Doctor_flag field.
Title Field: DR
Doctor_flag Field: Y
Table Recoding
The Transformer's Table Recoding function converts the values in a field using an external user-defined recode table. You can recode literal or mask values.
Mask
Masks are character representations of a data value which define each character in the data value as follows: a represents an alphabetic character, n represents a numeric character, and any other character represents its explicit value.
Value and its mask:
Jane Smith: aaaa aaaaa
5.00E+02: n.nna+nn
$400.00: $nnn.nn
05/31/2005: nn/nn/nnnn
jane_smith@abc.com: nnnn_nnnnn@nnn.nnn
Example
In this example, the Start_date field has a variety of data shapes and formats, such as 1/1/2005 and 1/01/2005. Create a recode table as shown to change the mask shapes for the Start_date field, so that every Start_date has the format of MM-DD-YYYY. The recode table has two columns, Original Mask and Recode Mask (N = Numeric).
The table requires a DDL that assigns field names to the two columns. Create a DDL file that corresponds to the recode table. For example, a DDL file for the table above would look like this:
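A minimal sketch of what that DDL might look like, using the DDL syntax from Chapter 2; the lengths are illustrative assumptions, and the field names match the originalmask and recodemask fields referenced in the settings below.

// Recode table DDL: one field per column of the recode table
Field is originalmask
Type is ASCII
Starts in column 0
Length is 20

Field is recodemask
Type is ASCII
Starts in column 20
Length is 20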
6. Enter names for the Table File and Table DDL File.
In this example, the entry TBL1 uses the Table File datamask.csv with the Table DDL File datamask.ddx, a Comma File Delimiter, the originalmask and recodemask fields from the recode table, the Start_date field as the field to recode, and Mask Value as the Recode Table Fields Format.
Lookup Fields Case-Sensitive: Enables or disables the case-sensitive table lookup. By default, the lookup is case-insensitive. For example, Rick will match either RICK or riCK.
Conditionals
Conditionals control the flow of TS Quality processes by
performing specific operations on data records, or by running
functions. In the Transformer, the Conditionals function controls all
other functions including character translation, field scanning and
table recoding. The conditionals settings are specified in the
Advanced Settings, Conditionals window. This section explains
the conditionals syntax and sample usage, and then teaches you
how to build a conditional statement.
If you are using translation, recode, and/or scan
functions in the Transformer, you must specify
Conditionals. See Build a Conditional Statement on
page 5-35 for instructions.
In addition to the Transformer, you can use conditional statements
for the following TS Quality modules:
Customer Data Parser
Business Data Parser
File Display Utility
File Update Utility
Set and Selection Utility
Syntax
An IF/ELSE statement is used to describe the condition. The following syntax must be used to build the conditional statement:

IF [condition]
  RUN [function1]
  SET [function2]
ELSE
  RUN [function3]
  SET [function4]
ENDIF

The IF keyword allows you to conduct conditional tests on values in the field. When conditions are True, the RUN and/or SET keywords following IF are executed. When condition(s) are False, the RUN and/or SET keywords following the ELSE keyword are executed. A conditional statement always closes with ENDIF. Refer to the following table for a list of keywords used in conditional statements.
Table 5.1 Keywords of Conditional Statements: IF, RUN, SET, ELSE, ELSEIF, ENDIF.
IF Statement
The IF statement sets the condition. The IF statement is defined by:
DDL field names
Operators (arithmetic/comparison/logical)
Field value(s)
Literal field values such as "Boston" must be enclosed in double quotation marks. Field names and numeric values do not need the quotation marks. If numeric values such as 123 are enclosed in the quotation marks, they are read as literal values instead of numeric values.
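For instance, a minimal sketch of a condition that follows this quoting rule (the field names city and age are illustrative, not taken from the sample project):

IF (city = "Boston" AND age > 18)
  RUN FIELD_SCANNING(1)
ENDIF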
Example
IF (age > 18 AND state IN (NY, MA)) OR first_name LIKE *ob

The IF statements can be nested as long as the corresponding ENDIF statement closes out each IF statement. See the nested IF sample below.

IF [condition1]
  IF [condition2]
    SET [function1]
  ELSE
    RUN [function2]
  ENDIF
  SET [function3]
ELSE
  RUN [function4]
ENDIF
RUN/SET Statements
The RUN/SET statement contains the function to perform.
RUN
The RUN statement is defined by:
Function names as defined in the Transformer's settings file
Entry ID (list of entries) to be executed (comma-delimited
values or ranges of values)
Example
IF (age > 18)
RUN FIELD_SCANNING(2,3)
RUN CHARACTER_TRANSLATION(3-5)
ENDIF
In the first RUN statement of this example, the numbers in
parentheses (2,3) apply to ENTRY_ID 1 and ENTRY_ID 2 under
FIELD_SCANNING. In the second RUN statement in this example, the
numbers in parentheses (3-5) apply to ENTRY_ID 3, 4, and 5 under
CHARACTER_TRANSLATION.
SET
The SET statement takes as arguments:
DDL field names
The equal sign assignment operator (=)
Value or field data arithmetic
Example
IF (age > 18)
  SET age = processing_date - birth_date
ENDIF
ELSE Statement
The ELSE statement will run certain statements if a specified
condition is False. In other words, you can use an IF/ELSE
statement to define two blocks of executable statements: one block
to run if the condition is True, the other block to run if the condition
is False.
Example
IF (age > 18)
  RUN FIELD_SCANNING(2, 3)
  SET age = processing_date - birth_date
ELSE
  SET record_notes = "Invalid"
ENDIF
ELSEIF Statement
A variation on the IF/ELSE statement allows you to choose from
several alternatives. Adding ELSEIF clauses expands the
functionality of the statement so you can control program flow
based on different possibilities.
Example
IF (age > 21)
RUN FIELD_SCANNING(2,3)
ELSEIF (age > 18)
RUN CHARACTER_TRANSLATION(3-5)
ELSE
RUN FIELD_SCANNING(1)
ENDIF
In this example, if (age > 21) evaluates as True, FIELD_SCANNING
(2, 3) is executed. If (age > 21) evaluates as False, the ELSEIF
(age > 18) condition is performed. If ELSEIF condition (age > 18)
evaluates as True, CHARACTER_TRANSLATION (3-5) is executed. If
all conditions (age > 21) and (age > 18) evaluate as False, then the
statement RUN FIELD_SCANNING (1) is executed.
You can add as many ELSEIF statements as you need to
provide alternative choices. However, note that extensive
use of ELSEIF clauses often becomes cumbersome.
The operators and functions available in conditional statements include:
ALL
ALWAYS
AND
OR
UCASE
Example: SET last_name=UCASE(NAME)
This example tests the field for the literal of any case combination of SMITH, and if TRUE, it makes the string in the field uppercase.
LCASE
Example: SET last_name=LCASE(name)
This example tests the field for the literal of any case combination of smith, and if TRUE, it makes the string in the field lowercase.
= : Is equal to
!=, <> : Is NOT equal to
> : Is greater than
< : Is less than
>= : Is greater than or equal to
<= : Is less than or equal to
LIKE: Links a literal with a wild card asterisk (*) in a field that is used to look for a match. You can place the asterisk before the literal (for example, *LE) to search for all matches to the beginning of a string, or place it after the literal (for example, LE*) to search for matched endings. You cannot place an asterisk in the middle of a literal (for example, L*E).
Example: IF first_name LIKE *OB
IN
BETWEEN
+ : Sum of
- : Difference of
|| : String concatenation
/ : Divided by
* : Multiplied by
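As a minimal sketch combining several of these operators (the field names here are illustrative):

IF (age >= 18 AND age <= 65)
  SET full_name = first_name || last_name
ENDIF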
JTOKATAKANA (Japan)
JTOHIRAGANA (Japan)
CJKTOHALF (China, Japan, Korea, Taiwan)
CJKTOFULL (China, Japan, Korea, Taiwan). Example: Harte-hanks
JKANATOROMAN (Japan). Example: jouzousho
JROMANTOKANA (Japan). Example: haatohankusu
KTOROMAN (Korea). Example: daechidong
HIRAGANASTOL (Japan): Zenkaku, Hankaku
CTOTRADCHINESE (China, Taiwan)
CTOSIMPCHINESE (China, Taiwan)
CJKTOARABICNUM (China, Japan, Korea, Taiwan). Example: 150
Make sure that you apply this operator only to fields where Chinese numbers represent numbers; otherwise, unintended conversions may occur.
Character categories: Alphabet (ABCDEFGHIJKLMNOPQRSTUVWXYZ, abcdefghijklmnopqrstuvwxyz), Number (0123456789), Symbol (~ ! @ # $ % ^ & * _ + ` - = { } | [ ] \ : ; < > ? , . /), Katakana.
Hiragana and Katakana romanization table: each kana character is listed with its Hebon (Hepburn) and Kunrei romanization (a, i, u, e, o; ka, ki, ku, ke, ko; sa, shi/si, su, se, so; ta, chi/ti, tsu/tu, te, to; na, ni, nu, ne, no; ha, hi, fu, he, ho; ma, mi, mu, me, mo; ya, yu, yo; ra, ri, ru, re, ro; wa, n, wo; the voiced and semi-voiced rows ga, za, da, ba, pa; and the contracted sounds sha/sya, cha/tya, ja/zya and their -u and -o forms).
Syntax 1
This syntax is used to convert a literal value or field data in the DDL
field.
IF [condition]
SET [DDL field name] = [Operator](DDL field name)
ENDIF
Example 1
In this example, all full-width characters in the INPUT_LINE_01 field are converted to their half-width form.
IF (ALWAYS)
  SET INPUT_LINE_01 = CJKTOHALF(INPUT_LINE_01)
ENDIF
Syntax 2
This syntax is used to convert a literal value or field data in the DDL
field 2 and compare it against the value in the DDL field 1 to
evaluate the IF statement.
IF [DDL field name 1] = [Operator] (DDL field name 2)
RUN [function]
ENDIF
Example 2
In this example, the program converts the Traditional Chinese
characters in the CUSTOMER_NAME field to Simplified Chinese, and
compares it against the value in the INPUT_LINE_01 field. If that
value is equal to the value in INPUT_LINE_01, it will run
FIELD_SCANNING. If the value is not equal, it will run the
TABLE_RECODING function.
IF INPUT_LINE_01 = CTOSIMPCHINESE (CUSTOMER_NAME)
RUN FIELD_SCANNING(ALL)
ELSE
RUN TABLE_RECODING(ALL)
ENDIF
Click the button on the upper right and select your Qualifiers For Input Data Files from the pop-up list.
Additional Settings
You can also specify the following additional settings:
See Transformer in the TS Quality Reference Guide for the complete settings information.
3. In the Debug File text box, accept the default or specify the path and file name of the debug file.
2. In the Mask File text box, specify the path and file name for the mask file.
3. Select OK.
7. View the fields on the Output Settings tab using the Data Browser to be sure the scan and recode occurred.
CHAPTER 6
In this chapter, you will standardize the name and address elements
using the Customer Data Parser, then standardize the non-name
and address elements using the Business Data Parser.
This chapter explains the parsing logic used to standardize data
elements. You will perform these tasks:
Specify input and output files
Define the settings for the Customer Data Parser and
Business Data Parser
Use name generation to determine how many additional
records are generated
Set line definitions for input data
Run the Customer Data Parser and Business Data Parser and
view results
For Asia-Pacific countries (China, Japan, Korea, and
Taiwan), the Customer Data Parser identifies and
standardizes the name elements only. Parsing and
standardization of address elements for those countries'
data is performed by country-specific Postal Matchers.
4. Generate output.

Example
Assume that you have the following name and address data in an input file:
INPUT_LINE_01: Lexington Drug
INPUT_LINE_02: Ben K Pike MD
INPUT_LINE_03: 10 Lois Lane
INPUT_LINE_04: Lexington 02420
Step 1
Name line: BEN K PIKE MD
Possible name attributes: GVN-NM1, 1ALPHA, ALPHA, TITLE-SUFFIX, RELATIONSHIP
Possible street attributes: ALPHA, 1ALPHA, TYPE, ALPHA
Possible geography attributes: COUNTRY, 1ALPHA, ALPHA, STATE
First, the CDP assigned all possible attributes for each component of data in INPUT_LINE_02.

Step 2
Name line: BEN K PIKE MD
Attributes: GVN-NM1, 1ALPHA, ALPHA, TITLE-SUFFIX
The CDP identified this line as a Name line because it had more name definitions than street or geography definitions. BEN is no longer considered a RELATIONSHIP attribute since it is not located at the END of the name line.

Step 3
Name line: BEN K PIKE MD
Pattern: GVN-NM1, 1ALPHA, ALPHA, TITLE-SUFFIX
Recode: GVN-NM1, GVN-NM2, SURNAME, TITLE-SUFFIX
Once the CDP identified the line types and the attributes on those lines, a pattern was created. The CDP then looks the pattern up in the Parser Definitions Table. If the pattern is found, the recode value is returned, as in this example. If the pattern is not found, the CDP will not be able to recode the unknown attributes and it will send the bad name pattern to the parsing exception file for review.
Entry from Parser Definitions Table (using allowable abbreviations):
GVN-NM1 1ALPHA ALPHA TITLE-SUFFIX PATTERN NAME
RECODE=GVN-NM1(1) GVN-NM2(1) SURNAME(1) TITLE-SUFFIX(1)
Step 4: Generate Output

Original Input Data → Standardized Output:
BUS-NAME: Lexington Drug → BUS-NAME: LEXINGTON DRUG
GVN-NAME1: Ben → GVN-NM1: BENJAMIN
GVN-NAME2: → GVN-NM2:
SURNAME: Pike → SURNAME: PIKE
TITLE-SUFFIX: MD → TITLE-SUFFIX: MD
HSNO: 10 → HSNO: 10
STREET-NAME: Lois → STREET-NAME: LOIS
STREET-TYPE: Lane → STREET-TYPE: LN
CITY: Lexington → CITY-NAME: LEXINGTON
STATE: MA → STATE:
POST CODE: 02420 → POST CODE: 02420
The CDP can identify name and address elements for many
countries, using country-specific definitions tables. The CDP
identifies up to ten lines (100 bytes each) of input Name/Address
data. It can also identify up to ten names per input record.
Examples for China, Korea, and Taiwan show sample input data in the local script, the initial token results (one name token), and the revised token results (two tokens), together with the previous results, new results, and the reasoning for each change.
The name components identified for these countries include First Name, Honorific, Principle Name, Business Name, Business Type, and Branch.
For Japan, an example shows the Zenkaku field (INPUT_LINE_01) and the Hankaku field (FURIGANA_NAME).
PREPOS
The CDP then passes a comprehensive data block called the
PREPOS (Parser Repository). The PREPOS contains fixed-fielded
character data including error codes, identification indicators, name
information, street information and geographic information. The
Output DDL determines which of these fields are returned to the
Output file.
See Appendix B of TS Quality Reference Guide for a
complete list of PREPOS fields and descriptions.
Open the Customer Data Parser step and select the Input Settings tab.
2. Specify a file name in the Input File Name and Input DDL Name text boxes.
3. Click the Dictionary Editor icon and view the input DDL.
Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed by quotation marks.
In the Mask File text box, enter or select the path and file name for the mask file.
Process Settings
Once you have specified input and output files, you can specify the
settings used to process your data. The settings for processing are
managed in the Advanced Settings window.
The navigation pane of the Advanced Settings window contains
two tabs:
Parser
Prcustom
The Parser tab contains settings for the Customer Data Parser. The
Prcustom tab is used to define settings for the Parser
Customization process. The Parser Customization process is
explained in the next chapter.
The settings for China, Japan, Korea, and Taiwan differ
slightly from other countries. Refer to the online Help or
the TS Quality Reference Guide for those countries'
settings.
Parser Tables
The Customer Data Parser uses two table files to parse the name and address elements of the input data.
Word Pattern Definition File: Defines word patterns for a given country. It contains standard definitions for words and phrases (tokens), and the patterns associated with each line type.
City Directory File: Defines state names, city names, and postal codes for a given country.
To specify the Parser tables
Business Attribute
You must specify whether to enable or disable the business assignment function.
To specify the business attribute
2. Refer to the table below and select one of these options for Assigned Business Attribute: Automatic Business, Business via Pattern, or No Business Assignment.
2. Refer to the table below and select one of these options for Preprocess House Number:
No Preprocessing: Disable preprocessing.
Minimum Preprocessing
Maximum Preprocessing
Line Definitions
In this example, the input file has two names on each record: the first input line consists of a business name and the second line is a personal name (contact name). This is a very common data structure. You can pre-define these two line types to the CDP, thus allowing the CDP to work more efficiently.
To set line definitions
The available settings are Name Line, Business Name Line, Street Line, Geography Line, No Pre-definition, and Prohibit Name Line definition.
Select Create.
Name Generation
After the Parser processes the input data, it generates name and address records. This process is called name generation. In many cases, one record in the input data contains multiple business or personal names. You must specify how many records to generate when more than one business or personal name is found in the input data.
To define name generation settings
3. Click the Field Settings tab. Refer to the following table and specify the values for these settings: Generate Business Records for Additional Names, and Generate Personal Records for Additional Names.
Additional Settings
You can also specify the following settings:
See Customer Data Parser in the TS Quality Reference Guide for the complete settings information.
format for the value specified in From Line End Value and To Line Begin Value.
If an address has ten lines and a split line is performed, then the last line will be dropped.
To enable the debug function
3. In the Debug File text box, accept the default path and file name, or specify the name of the file to which debugging information will be sent.
3. Select OK.
Analyze Results
After running the CDP, the Parser generates Completion Codes
and Review Codes to identify specific conditions which occurred
for each record being parsed. You can review those codes to
analyze the Parser results.
The completion codes are written to the CDP Repository Output
Record (PREPOS) in the following field:
pr_completion_code
The review codes are written to the CDP Repository Output Record
(PREPOS) in three character pairs in the following fields:
pr_name_review_codes
pr_street_review_codes
pr_geog_review_codes
pr_misc_review_codes
pr_global_review_codes
To change the review group order, Review Group Order (Process, Settings) can be used to specify the review group hierarchy.
Statistics File
The Parsing Statistics Report is generated by the CDP and
summarizes the number and percentage of records distributed over
each review group. A brief description of each review group also
appears on the statistics report.
Review Group  # of Records  %      Description
0             945           94.5%  No Targeted Conditions Found
1             0             0.0%   Unidentified Item
2             22            2.2%   Mixed Name Forms
3             0             0.0%   Hold Mail
4             0             0.0%   Foreign Address
5             0             0.0%   No Names Identified
6             0             0.0%   No Street Identified
7             2             0.2%   No Geography Identified
8             4             0.4%   Unknown Name Pattern
9             8             0.8%   Derived Genders Conflict
10            11            1.1%   More Than One Middle Name
11            1             0.1%   Unknown Street Pattern
12            0             0.0%   Invalid Directional
13            0             0.0%   Unusual or Long Address
14            3             0.3%   No City or County Identified
Step 1
Attributes assigned: 1995 (YEAR), Toyota (MAKE), Camry (MODEL)

Step 2

Step 3
1995 (YEAR), Toyota (MAKE), Camry (MODEL)

Step 4: Generate Output
The BDP produces a comprehensive data block called the BPREPOS (Business Data Parser Repository). The BPREPOS consists of fixed-fielded character data including error codes and identification indicators. The Output DDL determines which of these fields are returned to the Output file, and can be customized by the user.
See Appendix B of the TS Quality Reference Guide for a complete list of BPREPOS Fields and descriptions.
1. Open the Business Data Parser step and select the Input Settings tab.
2. Specify a file name in the Input File Name and Input DDL Name text boxes.
Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed by quotation marks.
See Encoding (Code Page) on page A-3 for more
information on encoding.
You can specify records to either Select or Bypass under certain
conditions in both input and output files. See Select or Bypass
Records on page 5-37 for instructions on how to specify select or
bypass definitions.
Process Settings
Once you have specified input and output files, you can specify
settings used to process your data. The settings for processing are
managed in the Advanced Settings window.
The navigation pane of the Advanced Settings window contains
two tabs:
Parser
Prcustom
Settings for the Business Data Parser are shown on the Parser tab.
The Prcustom tab contains settings for the Parser Customization
process. The Parser Customization process is explained in the next
chapter.
Parser Tables
The Business Data Parser uses the Word Pattern Definition table file
to parse the non-name and address elements of the data. The Word
Pattern Definition table for the Business Data Parser is created from
the Parsing Customization process.
For instructions on the Parsing Customization process,
see Chapter 7, Tuning the Parsing Rules and
Appendix B.
Word Pattern Definition File: Defines word patterns for a given country. It contains definitions for words and phrases (tokens), and the patterns associated with each line type. These tables use a two-letter prefix to indicate the country.
Example
For example, you can create a Word Pattern Definitions table for automobile classification. At least one definition and one pattern entry must be present in the Word Pattern Definitions table.
Entry from Word Pattern Definitions Table:
'ACURA'       INSERT MISC DEF ATT=MAKE
'ALFA'        INSERT MISC DEF ATT=MAKE,RECODE='ALFA ROMEO'
'ALFA ROMEO'  INSERT MISC DEF ATT=MAKE
'AMC'         INSERT MISC DEF ATT=MAKE
'AUDI'        INSERT MISC DEF ATT=MAKE
'BERTONE'     INSERT MISC DEF ATT=MAKE
'BMW'         INSERT MISC DEF ATT=MAKE
'BUICK'       INSERT MISC DEF ATT=MAKE
'CADDY'       INSERT SYNONYM='CADILLAC'
'CADI'        INSERT SYNONYM='CADILLAC'
'CADILLAC'    INSERT MISC DEF ATT=MAKE,RECODE='CADILLAC'
'CADY'        INSERT SYNONYM='CADILLAC'
'CHEVROLET'   INSERT MISC DEF ATT=MAKE
'CHEVY'       INSERT MISC DEF ATT=MAKE,RECODE='CHEVROLET'
Additional Settings
You can also specify the following settings:
See Business Data Parser in the TS Quality Reference Guide for the complete settings information.
3. In the Debug File text box, accept the default path and file name, or enter a file name where debugging information will be written.
3. Select OK.
After you run the BDP, the Parser generates Completion Codes
and Review Codes to identify specific conditions which occurred
for each record being parsed. You can review those codes to
analyze the parser results.
The completion codes are written to the BDP Repository Output
Record (BPREPOS) in the following field:
bp_completion_code
The review codes are written to the BDP Repository Output Record
(BPREPOS) in three character pairs in the following fields:
bp_misc_review_codes
See Appendix B for the complete list of Completion
Codes and Review Codes for the Business Data Parser.
There are no Review Groups for the Business Data
Parser.
CHAPTER 7
If the Customer Data Parser cannot recognize the name or address
component such as city name or surname on a record, an exception
is reported. When that occurs, you must change the parsing rules
using Parsing Customization. To use Parsing Customization, you
must first understand how the parser definition tables work.
This chapter explains the parser definition tables. You will also
perform these tasks:
View parser exceptions
Identify and create an entry for a misspelled city name
Identify and create an entry for a bad name pattern
Review the new entries in the Customized Definitions file
Run Parsing Customization and re-run the Customer Data
Parser
Check errors in the Parsing Customization process
This chapter focuses on the Parsing Customization process
for the Customer Data Parser. See Online Help to tune the
parsing rules for the Business Data Parser.
Syntax of Definitions
Entries in Standard and User Definitions tables require a special syntax. This section describes the syntax for definition entries.
Syntax
TOKEN [OPERATION] LINE-TYPE [POSITION] KEYWORD=ATTRIBUTE, [ATTRIBUTE MODIFIER]
Example
MARY INS NAME BEG ATT=GVN-NM1,GEN=F
Here MARY is the Token, INS is the Operation, NAME is the Line Type, BEG is the Position, ATT=GVN-NM1 is the Keyword=Attribute, and GEN=F is the Attribute Modifier.
Token
A token is any word or phrase in the data, or a mask of any word or phrase. Tokens are informally called the "left side of the equation" in a definition table entry. In this example, the token is MARY.
Token entries can be no more than 100 characters in length.
Sub-token
Phrase
Mask
The table below describes the token structures and provides
examples.
Table 7.1 Parser Token Structures
Token
The smallest entity that has a meaning by itself.
A token may or may not contain one or more sub-tokens.
Example: 'PIZZA' NAME ATT=BUS
Sub-token
String entity that has a meaning within a token (e.g., strasse). A sub-token may appear at the beginning or end of the token.
If your data contained BERGENSTRASSE:
Example: STRASSE STREET END-TKN ATT=STR-TYPE
Where:
STREET
the line type
END-TKN
location of the sub-token within the word (also indicates this is a sub-token)
ATT=STR-TYPE
the attribute assignment for table lookup
Beginning-Token (BEG-TKN)
Used for the sub-token position. This keyword indicates that the sub-token position lies at the
beginning of a token.
Example: STRASSE STREET BEG-TKN ATT=STR-TYPE
Ending-Token (END-TKN)
Used for the sub-token position. This keyword indicates that the sub-token position lies at the
end of a token.
Example: STRASSE STREET END-TKN ATT=STR-TYPE
BEG-TKN and END-TKN are only allowed on street lines. See line types in the following
section for more information.
Phrase
One or more tokens grouped together that have a meaning.
Example: 'HOLD MAIL' STREET ATT=HOLD
7-6
Mask
A mask is a description of a word or phrase, using alpha, numeric or special characters to
represent letters, numbers, and special characters. Masks define characters of data elements
using:
n to represent a number (0 -9)
a -z to represent alphabetic letters (lowercase only)
Every character that is not a letter or number is represented by the character itself:
/ (forward slash), @ (at symbol), and so forth.
For example, a mask can define any series of five numerals as a ZIP code, instead of entering
each of the 99,999 possible combinations in the table. This mask entry looks like:
nnnnn
Masks may include special characters if they are part of the word representation. For example,
a mask for a nine-digit ZIP code is:
nnnnn-nnnn
MASK GEOG DEF ATT=POSTCODE
Appendix D in the TS Quality Reference Guide lists the valid token tags for Asia-Pacific
countries.
Operations
The Parser identifies three types of operations:
Insert
Modify
Delete
In this example, the operation is INS (INSERT).
MARY
Operations
7-7
DELETE
Deleting Synonyms: With the SYNONYM keyword, you must enter the actual synonym:
Example:
BV
DELETE SYNONYM=BOULEVARD
Deleting Patterns: You must enter the actual pattern followed by DELETE PATTERN.
Example:
GVN-NM1 1ALPHA ALPHA
DELETE PATTERN
7-8
Line Types
Each definition entry requires a line type assignment. The Parser
identifies four types of lines:
Name
Street
Geography
Miscellaneous
Note that attributes do not cross line types. For instance, an attribute of GVN-NM1 cannot be used with a line type of STREET.
In this example, the line type is NAME.
MARY INS NAME BEG ATT=GVN-NM1,GEN=F
Line Type
Description
NAME
Name of a person or business. Names are usually the first one or two
lines in an address record.
BOOKSTORE
STREET
GEOGRAPHY
The city, state, postal code, and country in the address. Geography
line(s) are usually at the end of an address record.
MASSACHUSETTS GEOG END ATT=STATE,REC=MA
MISCELLANEOUS
Information that does not fit into the other line types, such as account
name or a comment.
HOLD MAIL
Positions
7-9
Positions
A token may be defined in relation to its position within the name or
address line. There are three types of positions:
Beginning
Ending
Default
In this example, the position is BEG (BEGINNING).
MARY INS NAME BEG ATT=GVN-NM1,GEN=F
(optional)
When the physical location of the word in the line is irrelevant, Default is used.
A default word may appear anywhere on the line, including the beginning or end. If this
keyword is omitted from the entry, Default is assumed.
ENDING
The last word and any further non-alphabetic characters are the ending of a line. For
example, consider the line
BRIARWOOD ESTATES APT 3
Both APT and the apartment number 3 are considered to be at the end of the line.
7-10
Attributes
Attributes (ATT=) are line-specific definitions and assign a specific
meaning to a word or mask shape. The following table lists available
attributes organized by line type.
For the complete list of Attributes, see Appendix D in the
TS Quality Reference Guide.
Note that attributes do not cross line types. For instance, an attribute of GVN-NM1 cannot be used with a line type of STREET.
User-Defined Attributes
Geography Line Attributes
Miscellaneous Line Attributes
Attribute Modifiers
Attributes can be further described by various Attribute Modifiers.
The following section lists all definition modifiers that can be used
after the attribute assignment. All modifiers must be separated
from the attribute by a comma. Valid attribute modifiers are
Gender, Category, Function and Recode.
Attribute Modifiers
Gender
7-11
Category
Function
7-12
Synonym
Recode
In the above example, the parser recodes the word ROAD to RD.
ROAD would be the value stored in the original data field
on Parser output. This is the pr_street_type1_original
field in the Parser repository.
RD would be the value stored in the recoded data field on
Parser output. This is the pr_street_type1_recoded field
in the Parser repository.
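As a sketch of the kind of entry that produces this behavior (the street-type attribute is taken from the token examples earlier in this chapter; check Appendix D for the exact attribute names used by your tables):

ROAD  INS STREET ATT=STR-TYPE,REC=RD

With such an entry, ROAD is kept in pr_street_type1_original and RD is written to pr_street_type1_recoded.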
Recode for
Masks
Synonym
A synonym is a shortcut for defining a token entry with the same
value as a prior entry. For example:
PBOX
SYNONYM=PO BOX
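Because a synonym points at a prior entry, the token it names must have a definition of its own. As a sketch (the attribute name BOX is illustrative, not taken from the standard tables):

'PO BOX'  INS STREET ATT=BOX
PBOX      INS SYNONYM='PO BOX'

The first line defines the phrase; the second line makes PBOX parse exactly as if PO BOX had been entered.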
Synonym
7-13
Example
The definitions table contains the following entry:
'CENTRE COMMERCIAL'
SYNONYM=CENTRE COMMERCIAL
7-14
Special Entries
In addition to the basic syntax described in the previous sections,
the Parser uses some special entries. This section explains special
entries including:
US city name changes
Non-US city name changes
Multiple definitions for one entry
Patterns
Example
MABEVERLEY_ GEOG DEF ATT=CITY-CHG,
REC=MABEVERLY
CASAN FRAN_
7-15
Description
Post
Town
Locality
Patterns
A pattern consists of attributes and/or intrinsic attributes, which
include any alpha, numeric, or special character representation of a
data element.
Changes can be made to an existing pattern by adding another tag
to the first line, using the MODIFY operation.
7-16
Patterns
'ALPHA ALPHA' MODIFY PATTERN NAME
REC=GVN-NM1(1) SRNM(1)
See MODIFY on page 7-7 for details.
Token identification is converted into meaningful information
through pattern processing. Patterns are created in the same text
file as the Definition entries. The Parser understands the difference
between a definition and a pattern and processes each
appropriately. Because of this, it is not necessary to create the
various entries in any particular order. For organizational purposes,
however, it makes sense to organize the entries by type.
Pattern
Structure
The pattern structure uses one or two lines, using the following
structure.
FIRST LINE: the inbound combination of tokens (for example, the word Smith is represented by the intrinsic attribute ALPHA), followed by a keyword indicating to which line type this pattern entry applies.
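Putting the pieces together, the earlier entry can be read against this structure (an informal reading that uses only attributes shown in this chapter):

'ALPHA ALPHA' MODIFY PATTERN NAME
    REC=GVN-NM1(1) SRNM(1)

The first line gives the inbound combination of tokens (two purely alphabetic words, such as MARY SMITH) and the line type the pattern applies to; the REC= line is the outbound recode, assigning the words to the given-name and surname attributes.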
7-17
Intrinsic Attributes
An intrinsic attribute is one that represents an individual entity
that did not have a definition entry in the table. This table lists the
main intrinsic attributes used for patterns.
For the complete list of Intrinsic Attributes, see Appendix D in the TS Quality Reference Guide.
Only the inbound portion of the pattern entry may contain
intrinsic attributes. All outbound portions (recode line)
must contain only non-intrinsic attribute values.
INTRINSIC
ATTRIBUTE
ABBR.
DESCRIPTION
ALPHA
Letters only
HYPHEN
A hyphen (-)
7-18
INTRINSIC
ATTRIBUTE
NUMERIC
ABBR.
DESCRIPTION
Numerals only
7-19
New pattern:
(A) HAWTHORNE COTTAGE
B1F
(S) 10 MAIN STREET
7-20
7-21
Comment
Lines
Line
Lengths
Table entries longer than one line may span multiple lines. Each additional line within an entry must be indented, and each new entry must begin in column 1.
The maximum line length for entries is 189 characters, including the newline character. The entry definition length may not exceed 100 characters. A single component of an entry may not span more than one line.
Quotation
Marks
7-22
SYNONYM=TRUSTEE FOR
7-23
Clue Table
The Clue table (jp_clue.txt) is used to store keywords that the
Parser uses to separate input text into tokens and to determine
business/personal classification. You can customize this table. The
following types of keywords are included in the Clue table.
Table 7.3 Tokens for jp_clue.txt
Token Type
Item
Description
Business Keyword
T
Business Type
Business Name
Words such as , .
Branch Name
7-24
Clue Table
Table 7.3 Tokens for jp_clue.txt
Business Keyword
Honorific
Title (position)
Region
Ex. ,
Format
The table consists of the following 4 items. The delimiter for each
item is a comma. If the format is not correct, that line will be
ignored and the subsequent lines will not be recognized properly.
Table 7.4 Format for jp_clue.txt
Position 1: Token type (null not allowed)
Position 2: Zenkaku field (null allowed)
Position 3: Hankaku field (null allowed)
Position 4: User comment (null allowed)
Clue Table
7-25
Example:
D, , , user comment
T,( ),( )
If the user comment is null, the comma between the third
item and the fourth item can be omitted.
Input
Output
Business
type
(T)
Unknown
word
Branch
name
(D)
Business
type
Field
Business name
Field
Branch name
Field
7-26
Name Tables
spaces is N type. If you register a keyword that includes spaces, delete all spaces before and after the entry and change all spaces within the entry to a single hankaku space.
Ex. N, ,Hart Hanks
Name Tables
Name tables contain additional personal and business names that are not included in the personal and business name dictionaries. They also include principal business names. You can customize these tables.
Table 7.5
File
Description
jp_bnp_name.txt
Contains business principal name patterns and business type standard patterns (for zenkaku field)
jp_bnp_name_h.txt
Contains business principal name patterns and business type standard patterns (for hankaku field)
jp_pnp_name.txt
jp_bnp_name.txt
7-27
jp_bnp_name.txt
This table is used to register principal business names and principal
business type for zenkaku field. It is not used to separate business
name and business type.
Table 7.6 Token for jp_bnp_name.txt
Token Type
Item
Description
Business name
Business type
Format
The table consists of the following 4 items. The delimiter for each
item is comma.
Table 7.7 Format for jp_bnp_name.txt
Position 1: Token type (null not allowed)
Position 2: Business name (null not allowed)
Position 3: Principal name (null not allowed)
Position 4: User comment (null allowed)
7-28
jp_bnp_name_h.txt
Example:
B,JR ,
T,, ,
If the user comment is null, the comma between the third
item and fourth item can be omitted.
Input
JR
Output
Business
type
Field
Principal
business type
Field
JR
Business
name
Field
Principal business
name
Field
Branch
name
Field
By standardizing the business data using this table, you can achieve
more accurate matching.
Duplicate words: when you register a new keyword, avoid registering the same word under more than one type.
Character Code: use CP932 for registration.
jp_bnp_name_h.txt
This table is used to register principal business names and principal business types for the hankaku field. It is not used to separate business name and business type. The usage and function of this table are the same as for jp_bnp_name.txt, except that the field for this table is kana.
jp_pnp_name.txt
7-29
jp_pnp_name.txt
This table is used to register additional personal names. If you find last names or first names that are not in the personal name dictionary, you can add them to this table.
Table 7.8 Token for jp_pnp_name. txt
Token Type
Item
Description
Last name
First name
Format
This table consists of the following 5 items. The delimiter for each
item is comma.
Table 7.9 Format for jp_pnp_name.txt
Position
Item
Token type
Not allowed
Allowed
Allowed
When parsing zenkaku first/last
name, use this field as reading
of the name.
Not used
Allowed
User comment
Allowed
7-30
jp_pnp_name.txt
Example:
F, , ,,user comment
Duplicate words: when you register a new keyword, avoid registering the same word under more than one type.
Character Code: use CP932 for registration.
Words with Spaces: for integration purposes, small
characters must be converted to large characters when
adding hankaku kana last and first names.
7-31
For detailed
information
on the Parser
Customization
Editor, see the
Online Help.
Customization
Editor button
7-32
4.
From the Main Menu select Search, Find Entry to review the
entries in this file.
6.
7-33
2.
3.
Click below the line of asterisks. This will position your cursor
to enter customized definitions.
Be sure to position the cursor below the line of
asterisks before applying an entry.
7-34
Navigation Area
Cursor position
Cursor Position
7-35
Cursor position
7-36
8.
If you
accidentally
hit the Apply
button or if
an entry is
incorrect, you
can modify or
delete entries
directly in the
Customized
Definitions
file.
9.
10.
7-37
2.
Unknown attribute
4.
7-38
If there are
elements of
data that you
do not wish
to maintain,
assign an
IGNORE
attribute to
the piece of
data.
Corrected attribute
7-39
Confirm button
8.
7-40
Run the Customer Data Parser step. When asked Would you
like to run parsing customization prior to running the
step?, select Yes.
2.
3.
4.
7-41
2.
The log displays the error message and indicates the line
number, as well as the entry where the error occurred. A
sample error log is shown below:
7-42
4.
8-1
CHAPTER 8
8-2
Sometimes users need to test and analyze the results of cleansing,
standardization and linking on a single data record. TS Quality
Analyzer allows the user to parse, geocode, and match name and
address data interactively. It is a useful way to test and view
modifications you make to the parsing rules.
In this chapter, you will perform these tasks:
Start the TS Quality Analyzer
Input name and address data
View the cleansed results
Show details for name/address parsing and standardization
Show details for address validation
Match data against your database
Review results of matching
The TS Quality Analyzer is not available for Asia-Pacific
countries.
8-3
2.
8-4
Main Menu
Tool Bar
8-5
2.
Select File from the main menu to select the input and
output mode for the record. For input, select from Input
Mode, Input Fields or Free Form Input. For output, select
Output Mode, Output Fields or Free Form Input.
3.
If you select Input Fields mode, enter the record line by line.
If you select Free Form mode, you can enter the record in
free text format.
4.
To clear the
Input frame,
click Clear or
select Input
from the
Reset menu.
8-6
in the tool
8.
Advanced Details
8-7
are shown in the lower left window and the results of the
Postal Matcher are shown in the lower right window.
Advanced Details
In addition to the parsing, standardization, and validation details, you can review the advanced details of the Customer Data Parser and Postal Matcher results.
To review advanced details of data
1.
Select...
To...
8-8
Matching
Once the Cleansing step has run, you can match the record against
records in your database.
To match data against database
1.
Select the Matching tab. Notice the window key for the
cleansed record is shown.
Window Key
Click the Plus sign (+) to show the Master Database. The
records in the database are shown in the lower window.
Matching
8-9
3.
4.
Click Match.
You can also click the Match button
bar.
5.
in the tool
8-10
Organize Database
1.
2.
3.
The Relationship Linker Rule Editor opens with the field and/
or pattern files for this matching process.
4.
5.
6.
Organize Database
To add data to database
At this point, if you decide to keep the input record in the
database, you can add the record.
1.
2.
Click
To reset database
3.
4.
9-1
CHAPTER 9
9-2
9-3
Tip:
You can either
edit the file names
manually or click
the File Chooser
icon to browse to
the
appropriate
file and
select it.
To view the
contents of your
data file, click the
Data
Browser
icon.
Use the
Dictionary
Editor to view the
contents
of the DDL
file.
1.
Open the Sorting Utility step and click the Input Settings
tab.
2.
Enter file names in the Input File Name and Input DDL
Name text boxes.
3.
4.
5.
Enter the Output File Name and Output DDL Name file
names. The Output File Name must have the extension .srt
to indicate this is a sorted file.
6.
2.
9-4
2.
Valid
delimiters are
Tab, Space,
Semicolon,
Comma, and
Pipe.
Characters
other than
those listed
must be
enclosed by
quotation
marks.
1.
2.
2.
3.
4.
Process Settings
9-5
Process Settings
Once you have identified the input and output files, you are ready
to specify the settings used to process your data. The settings for
processing are managed in the Advanced Settings window.
Sort Fields
To specify sort fields
A red flag
indicates a
REQUIRED
field for this
operation.
1.
2.
3.
Select the input DDL fields from the drop-down list in the
Key box. These are the fields used in the sort process.
Sort fields are pre-determined according to the
country-specific step. You can change the default
fields by selecting different sort fields.
4.
Select the sort order from the drop-down list in the Order
box. Values are either Ascending Order or Descending
Order.
Geographic
fields used in the
sort process
9-6
Additional Settings
1.
2.
3.
Additional Settings
You can specify the following additional settings.
See Sort in
the TS Quality
Reference
Guide for the
complete
settings
information.
2.
3.
2.
3.
Additional Settings
9-7
2.
3.
4.
In the Debug File text box, accept the default path and file
name, or enter a new file name to receive debugging
information.
2.
3.
2.
3.
9-8
1.
2.
3.
Click OK.
4.
9-9
2.
If you are using the Census tables and/or DPV tables, select
the Include Census Tables or Include DPV Tables box.
3.
9-10
2.
3.
4.
2.
2.
2.
Process Settings
Valid
delimiters are
Tab, Space,
Semicolon,
Comma, and
Pipe.
Characters
other than
those listed
must be
enclosed by
quotation
marks.
9-11
1.
2.
3.
4.
Process Settings
Once you have identified input and output files, you are ready to
specify settings to process your data. The settings for processing
are managed in the Advanced Settings window.
Postal Directories
The country-specific postal directories are included in TS Quality
and were installed when you installed the software. These
directories must be accessible to all projects.
See Installing TS Quality for a complete list of postal
directories for all countries and the locations of those
tables.
To specify postal directories
1.
9-12
Postal Directories
The Process Settings window will vary from country
to country. See TS Quality Reference Guide for a
complete list of settings for each country.
2.
A red flag
indicates a
REQUIRED
field for this
operation.
Setting
Description
Additional Settings
9-13
Setting
Description
4.
Additional Settings
You can also specify the following additional settings.
See Postal
Matchers in
the TS Quality
Reference
Guide for the
complete
settings
information.
2.
3.
In the Debug File text box, choose the default path and file
name, or enter a different file name to receive debugging
information.
2.
9-14
2.
1.
2.
3.
Select OK.
4.
Match Levels
9-15
After running the Postal Matchers, the Match Level Codes are
generated to identify specific conditions which occur for each record
being processed. You should review those codes to analyze the
postal matcher results.
Match Levels
The Match Level Codes indicate the accuracy of the match
between country geography data to the appropriate postal table.
The match level codes are written to the output record in the xx_gout_match_level field.
In actual use, the xx in the description above is replaced with a two-letter country code (for example, US = United States, CA = Canada, GB = Great Britain, and DE = Germany). Thus, xx_gout_match_level becomes US_gout_match_level for United States data.
9-16
Match Levels
There are several Match Level Codes. Some common codes include:
A 0 in the US_GOUT_MATCH_LEVEL field indicates that the input data successfully matched to the Directory
A Y in the US_GOUT_STREET_NAME_CHANGE field indicates that the street name was changed; for example:
A misspelled street name was corrected
The full street name was given in place of the abbreviated name
See the TS Quality Reference Guide for a complete
list of Match Level Codes for the Postal Matchers.
9-17
2.
If one of the addresses is a post office box (PO box), then us_
gin_street_name will contain the other address, and a P is
set in the first position of us_gout_secondary_type. The PO
box number is also stored, starting at the second position of
us_gout_secondary_type.
3.
4.
9-18
us_gin_street_
name
street name /
general delivery
street name
general delivery /
street name
street name
street name /
PO box
street name
PO box
number
PO box /
street name
street name
PO box
number
general delivery /
PO box
PO box
PO box /
general delivery
PO box
us_gout_
secondary_
type[1]
us_gout_
secondary_
number[1]
9-19
us_gin_street_
name
us_gout_
secondary_
type[1]
us_gout_
secondary_
number[1]
rural route /
general delivery
rural route
general delivery /
rural route
rural route
rural route /
PO box
rural route
PO box
number
10
PO box /
rural route
rural route
PO box
number
11
street name /
rural route
street name
route
number
PO box
number
12
rural route /
street name
street name
route
number
PO box
number
9-20
1.
3.
9-21
Click OK. The City Level window for the selected country
opens. This window lists cities, zip codes, and finance codes.
6.
7.
9-22
Street Details
street level window contains all the street names for the
selected city.
3.
Street Details
To browse the street details
1.
2.
Street Details
3.
9-23
4.
9-24
Street Details
10-1
CHAPTER 10
10-2
This chapter explains how to link your data. Linking is the process of
identifying records with a matching relationship (consumer/
business) in a file or duplicates in several files. Linking compares
records to determine the level of similarity between them.
The result of the comparisons is categorized as either a passed,
suspect, or failed match, based on the similarity of data elements in
the records, as well as the assigned score of their exceptions.
Data linking involves three steps:
Create window keys using the Window Key Generator
Sort records by the window key using the Sort Utility
Match records using the Relationship Linker
10-3
Example
Input Records:
CENTER HOSPITAL
25 BRATTLE LN
ARLINGTON MA 02476
CHEMIST ASSOCIATES
12 BRANTWOOD RD
ARLINGTON MA 02476
Window Keys are generated from one of the window key rules
provided by the Window Key Generator. For example, Key_List_10
is set to generate the window key as follows:
Key_List_10 rule:
Use the first three characters of the postal code.
Append to this the first character of the business name.
Append to this the first character and subsequent consonants of the street name.
Append to this a 1 if this is a personal name and a 2 if this is a business name.
10-4
024CBR2
The same window key is generated for both records, bringing them
into the same match window for comparison purposes. Subsequent
matching rules will indicate that these records are not matches.
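Reading the generated key against the Key_List_10 rule (an informal breakdown; how many street-name consonants are kept depends on the key-list definition itself):

024   the first three characters of the postal code 02476
C     the first character of the business names CENTER HOSPITAL and CHEMIST ASSOCIATES
BR    the first character and following consonant of the street names BRATTLE and BRANTWOOD
2     the business-name indicator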
Open the Window Key Generator step and click the Input
Settings tab.
2.
3.
Click the Output Settings tab and enter the Output File
Name and Output DDL Name.
4.
2.
2.
Process Settings
10-5
2.
1.
2.
3.
4.
Process Settings
Once you have specified input and output files, you can define the
settings to process your data. The settings for processing are
specified in the Advanced Settings window.
10-6
1.
2.
3.
4.
5.
10-7
2.
In Window Key Field Name, select the field name from the
drop-down list. The generated window key will be placed into
that field on the output record. In this example, the
generated window key from KEY_LIST_10 will be placed into
the field named WINDOW_KEY_01:
Additional Settings
You can also specify these additional settings:
To enable debug function
1.
2.
3.
In the Debug File text box, accept the default path and file
name, or enter the name of a file to receive debugging
information.
10-8
Additional Settings
2.
1.
2.
2.
In the Mask File text box, enter the path and file name for
the mask file.
10-9
1.
2.
3.
Select OK.
4.
5.
10-10
1.
2.
3.
Enter file names in the Input File Name and Input DDL
Name text boxes.
4.
5.
6.
Enter the Output File Name and Output DDL Name. The
Output File Name should have the extension of .srt to
indicate that it is a sorted file.
7.
Process Settings
10-11
Process Settings
Once you have identified input and output files, you are ready to
define the settings to process your data. The settings for processing
are managed in the Advanced Settings window.
A red flag
indicates a
REQUIRED
field for this
operation.
1.
2.
3.
Select the input DDL fields from the drop-down list in the
Key box.
4.
Select the sort order from the drop-down list in the Order
box. Values are either Ascending Order or Descending
Order.
10-12
1.
2.
3.
4.
Select OK.
On the Results tab, the Statistics sub-tab appears. The
Sort Key Summary is shown on this tab.
Be sure the file to be used in the Relationship Linking
step is sorted by the appropriate window key.
10-13
10-14
Linking Examples
This section contains detailed examples for each stage of matching,
beginning with input data.
10-18
Window Linking
Window Linking compares records to other records in the same
file. A group of records is matched to each other, one window key
set at a time.
2.
Specify a file name in the Input File Name and Input DDL
Name text boxes.
3.
To view the
contents of your
data file,
click the
Data
Browser icon.
Use the
Dictionary
Editor to view the
contents
of the DDL
file.
Click Replace. The default file names in the Input Data File
Name and Input DDL Name column are replaced with the
files you just specified.
4.
5.
Enter file names in the Output File Name and Output DDL
Name text boxes.
6.
7.
10-19
2.
3.
4.
2.
Valid
delimiters are
Tab, Space,
Semicolon,
Comma, and
Pipe.
Characters
other than
those listed
must be
enclosed by
quotation
marks.
1.
2.
2.
10-20
Basic Settings
2.
3.
4.
Basic Settings
You must specify the match method and the name form field.
To select match method
A red flag
indicates a
REQUIRED
field for this
operation.
1.
2.
2.
In Name Form Field, select the name form field from the
drop-down list. The Name Form Field contains the
Consumer/Business flag. This field is created by the
Transformer or Customer Data Parser, and is used by the
Relationship Linker to distinguish between consumer and
business records.
Flag
Description
Consumer
Business
10-21
10-22
2.
2.
In Window Key Field, select the window key field you are
using for matching. In this example, Window Key Field is set
to WINDOW_KEY_01.
Window Size
You can control how many records are added to the match window.
If there are more records of one window key than the value
specified, additional windows are created for the remaining records.
For example, if you have 1000 records and set the value at 500,
additional match windows are created for the remaining records.
10-23
2.
1.
2.
Select OK.
3.
4.
10-24
Reference Linking
Reference Linking compares records in your input file to an
existing reference file. It is mainly used to update new records
within the existing master file in the database.
For example, suppose you've received a new set of records after running the initial linking. In this case, you would take the new records as your input file and the initial matched records as your reference file. You can compare the input file with the reference file, verify whether the new records already exist in the reference file, and update the file if necessary.
If a match is found, a matching key number is copied from the
reference record to the input record. If no match is found, a new
key number is generated and appended onto the input record. The
number of output records in reference linking is the same as in the
input records. Users can use the matching key numbers to update
the reference file.
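As an illustration of this behavior (the key values are invented for the example):

Input record matches reference record with key 00000123  ->  00000123 is copied onto the input record
Input record matches no reference record                 ->  a new key (for example, 00000124) is generated and appended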
See Relationship Linker in the TS Quality Reference
Guide for detailed information on Reference Linking.
2.
Enter file names in the Input File Name and Input DDL
Name text boxes.
3.
10-25
4.
5.
6.
7.
Enter file names in the Output File Name and Output DDL
Name text boxes.
8.
9.
2.
10-26
Basic Settings
The steps for settings of Match Method, Name Form Field, Field
Pattern, and Window Key are the same as those for window linking.
See Basic Settings on page 10-20 for details.
2.
3.
10-27
1.
2.
3.
2.
2.
10-28
2.
3.
Additional Settings
For both Window Linking and Reference Linking, you can configure
these additional settings:
To enable debug function
See
Relationship
Linker in the
TS Quality
Reference
Guide for the
complete
settings
information.
1.
2.
3.
In the Debug File text box, accept the default path and file
name, or enter the name of the file which will receive
debugging information.
2.
10-29
2.
2.
In the Mask File text box, enter the path and file name for
the mask file.
1.
2.
3.
Select OK.
4.
5.
10-30
11-1
CHAPTER 11
11-2
The output of the Relationship Linking process is displayed in the
Relationship Linker Results Analyzer. This tool allows you to
view and analyze linked results. After viewing these results, you can
determine if there is a need to customize the rules of the link
process to meet your business requirements.
In this chapter, you will perform these tasks:
Use the Results Analyzer to view and analyze the results of
the Relationship Linker process
Use the Rule Editor to analyze the linking rules and add a
field to compare in the link process
Customize the field and pattern lists by adding fields and
patterns to the process
Re-run the Relationship Linker using the new linking rules
and view results
Use the Data Comparison Calculator to test the comparison
routine and appropriate score
11-3
Results
Analyzer
button
11-4
Click on
the tab and
view data
Matched
records
Match key
11-5
2.
When you view suspect matches, the field for the matched
level is highlighted in red and italicized, in addition to the
field that contains the match key (highlighted in bold and in
11-6
5.
11-7
If you select
Show
Standard
Fields in the
Format menu,
it displays only
the standard
DDL fields.
1.
2.
The left window shows all Available Fields. Any field can be
highlighted and dragged into the Selected Fields window. A
field can also be highlighted and moved by clicking Add. If
you want to move all fields, click Add All.
3.
4.
11-8
To search for a field, enter the field name in the Search text
box. Click Show.
11-9
In the Save window, name the view, and then identify the
desired location for the file.
To view a stored view, select the name of the view from the
drop-down menu in Select a Selected Customized View.
The fields will be loaded in the Selected Fields window.
Click Show to view the stored fields.
You can use Back
and Forward
to display the previous or next view.
11-10
If you notice
breaks in the
record number
sequence, it is
because each
record is either
a Consumer or
Business level
record.
Previous Block
Next Block
11-11
Matches in
groups of 2 or
more
Pattern number
11-12
Rules
Editor
button
11-13
2.
Click on the
tab and view
different levels
of field and
patterns.
Click on a
column
heading and
drag it to the
desired
location to
rearrange the
columns.
11-14
Description
Describe all fields in the field settings file. Double-click the cell to
edit it.
Score A - E
Comparison
Routine
Propagation
Routine
Field Name 1
-3
Routine
Modifier
Pattern ID
11-15
2.
3.
4.
11-16
6.
2.
11-17
Select an A for the grade for the last_name field and for
the grade for the account_number. The grade A means
Score A (100) for those fields.
Select File, Save to save the file. When asked Do you want
to continue? select Yes. When asked Do you want to
delete subsequent duplicate patterns? select No.
5.
6.
7.
11-18
Pattern ID
102 is
duplicated
11-19
4.
5.
11-20
3.
4.
Pattern Number
box
11-21
1.
Example
In this example, two values in the Account_number field are compared using the ABSOLUTE and PARTIAL1 routines.
ABSOLUTE compares two fields and looks for an exact match; Score 100 is an exact match, including blank vs. blank. PARTIAL1 compares two fields and also looks for an exact match, but applies different scores for blanks; Score 100 is an exact match, excluding blank vs. blank.
11-22
3.
11-23
11-24
12-1
CHAPTER 12
12-2
The Create Common Utility lets you select the best record of a matched set of records (called the survivor), and copy data from one record to a field in the other records across the matched set. This selection process is defined by decision routines. You can commonize data in the current field or in a new field, using data that originates in another field.
In this chapter, you will perform these tasks:
Understand commonization and survivorship
Determine match key level settings
Identify common fields
Assign a survivor record
Run Create Common and view its results
Use the Data Browser to view the actual record data
12-3
The Create Common Utility allows you to set options that copy
data across a linked record set. This module has two major
functions:
Commonization: Copy data in one field to other fields in records linked by a match key. You can commonize data in an existing field or in a new field. You can also commonize data sourced from another field.
Survivorship: Select a user-defined survivor record among a group of records, using survivor selection rules. This function flags a single record at any level, indicating the best record of the linked set.
Input data file must be sorted by match keys (such as
LEV1_MATCHED) prior to being processed by this module.
If you run this module right after the Relationship Linker
step, the input file is automatically sorted by the match
keys. If you run this module separately, be sure to sort
the input file by match keys.
Example
Assume that the best record is determined according to the most
recent date in the Last_contact_date field. In this example, you
want to copy the account representative information with the most
recent contact date to the set of linked records, and then identify
one account representative per business.
Commonize the account representative from the record that
has the most recent Last_contact_date field.
Once the data is copied, place an indicator of 1 into the
Survivor_flag field for the record that has the most recent
Last_contact_date.
This indicator will be used later to select the best records
from the file.
12-4
2.
Specify a file name in the Input File Name and Input DDL
Name text boxes.
3.
4.
5.
2.
3.
4.
12-5
2.
2.
1.
2.
3.
4.
12-6
Process Settings
Once you have specified input and output files, you can specify the
settings to process your data. The settings for processing are
managed in the Advanced Settings window.
2.
In Key Field, select the match key from the drop-down list
of DDL fields.
Common Fields
The Common Fields designate the decision routines used to copy
data from one field into other fields in the records linked by a
common key.
A red flag
indicates a
REQUIRED
field for this
operation.
Common Fields
2.
12-7
Setting
Description
Level ID
Test Field
From Field
Target Field
Example
This example uses a decision routine called HIGHCHAR_NBNZ.
The HIGHCHAR_NBNZ routine commonizes the highest value
(non-blank, non-zero) that occurs in the Last_contact_
date field of all records at a record level of 1.
It copies the values in the Acct_rep field with the most
recent Last_contact_date (HIGHCHAR_NBNZ) and puts
this value into the Common_rep field.
12-8
Survivor Record
Input                                                     Output
           LEV1_MATCHED   Last_contact_date   Acct_rep               Common_rep
Record 1   00000013       2005-03-17          JLS         Record 1   JLS
Record 2   00000013       2003-01-07          BPL         Record 2   JLS
Record 3   00000013       2004-02-08          JCN         Record 3   JLS
Record 4   00000014       2005-01-18          KJP         Record 4   KJP
Record 5   00000014       2003-11-09          MMR         Record 5   KJP
Survivor Record
You can designate a survivor record from a group of records linked
by a match key. Any record flagged as the survivor is assigned a
flag number. The Assign Survivor function defines the test field,
decision routine and target field for survivor identification.
A red flag
indicates a
REQUIRED
field for this
operation.
To assign survivor
1.
2.
Setting
Description
Level ID
Test Field
Decision Routine
Survivor Record
12-9
Setting
Description
Assigned Value
Example
This example uses a decision routine called HIGHCHAR_NBNZ.
Assume that the best record is the one with the most recent date
(HIGHCHAR_NBNZ) in the Last_contact_date field. This record
needs a survivor flag of 1 in the Survivor_flag field to identify it
as the best record for the LEV1_MATCHED grouping.
Input                                                     Output
           LEV1_MATCHED   Last_contact_date   Acct_rep               Common_rep   Survivor_flag
Record 1   00000013       2005-03-17          JLS         Record 1   JLS          1
Record 2   00000013       2003-01-07          BPL         Record 2   JLS
Record 3   00000013       2004-02-08          JCN         Record 3   JLS
Record 4   00000014       2005-01-18          KJP         Record 4   KJP          1
Record 5   00000014       2003-11-09          MMR         Record 5   KJP
12-10
Additional Settings
You can also specify the following settings:
See Create
Common in
the TS Quality
Reference
Guide for
complete
settings
information.
2.
3.
In the Debug File text box, accept the default path and file
name, or specify a different file to receive debugging
information.
2.
2.
12-11
1.
2.
3.
Select OK.
4.
5.
6.
In the Field Selection window, select the fields you used for
the Create Common process, such as LEV1_MATCHED, Acct_
rep, Last_contact_date, Survivor_flag, and Common_rep.
7.
8.
12-12
Description
LOWEST
LOWEST_NB
LOWEST_NZ
LOWEST_NBNZ
HIGHEST
HIGHEST_NB
HIGHEST_NZ
HIGHEST_NBNZ
LOWCHAR
LOWCHAR_NB
LOWCHAR_NZ
LOWCHAR_NBNZ
HIGHCHAR
HIGHCHAR_NB
HIGHCHAR_NZ
12-13
Description
HIGHCHAR_NBNZ
LEAST
LEAST_NB
LEAST_NZ
LEAST_NBNZ
LITERAL
LONGEST
Compares the length of the test field data on one record against the
length of the data in the same field on another record. System
commonizes the longer of the two fields.
Field1 = Smith
Field2 = Smit
In this case, the contents of test field, Smith (the longer of the
two) is commonized.
MOST
MOST_NB
MOST_NZ
MOST_NBNZ
SHORTEST
Compares the length of the test field data on one record against the
length of the data in the same field on another record. System
commonizes the shorter of the two fields.
Test field = Smith
Test field = Smit
In this case, Smit (the shorter of the two) is commonized.
12-14
Description
SURVIVOR
HIGHEST
456 (Record 3)
LOWEST
LOWEST_NB
LOWEST_NZ
LOWEST_NBNZ
LEAST
456 (Record 3)
MOST
12-15
Routine
MOST_NZ
MOST_NBNZ
12-16
13-1
CHAPTER 13
13-2
In some cases you may want to manipulate and reconstruct data
elements at certain stages of data processing. Use the Data
Reconstructor to manage various data manipulation tasks. The
Data Reconstructor is particularly useful when global data needs to
be standardized into an identical format at the end of a project.
This chapter explains how to use the Data Reconstructor. You will
perform these tasks:
Specify input, output, and DDL files for the Data
Reconstruction step
Define specific Data Reconstruction rules for each country
Set the Use Rule
Run Data Reconstruction and view results
Use the Data Browser to view the reconstructed data
Generate a single file of all your global data
13-3
Rules File
The Rules file is a plain text file that contains data reconstruction
rules, which are constructed with a special scripting language.
Country-specific rules files are included in the installation package.
These rules use nested IF/ELSE logic that includes selection and
conditional data reconstruction features.
A rules file can contain a single rule or many rules; however, only
one rule can be executed at a time.
Default Rules Files:
C:\TrilliumSoftware\tsq10r5s\<project name>
settings\xxdrrules.sto
xx is a two-letter country code such as ca, de, gb, or us.
13-4
Fields
Syntax
n.
out.
IN.
OUT.
Literal
Values
[n:n]
field name
[n:*]
(n:*)
literal value
(n:n)
OR
literal value
BLANKS
ZEROS
NULLS
13-5
or
TS Quality'
A literal string must begin and end with the same type of
quotation mark.
If you need to include an actual quote character in the string, you
can either enter it twice in a row or quote the entire string with the
other quote character.
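For example (the field name is illustrative), both of the following store the same value:

move 'O''BRIEN', out.last_name;    // doubled quote inside a single-quoted string
move "O'BRIEN",  out.last_name;    // the same string quoted with the other mark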
Although there is no practical limitation to the length of a
literal value, this version of the Data Reconstructor limits
the total combined size of all literals to 100 KBytes.
The rules language recognizes the following keywords:
alphanumeric, and, AND, append, append:0spaces, append:2spaces, append:pack, append_pack, BLANKS, contains, CONTAINS, copy, copy_all, else, endif, endrule, ends_with, ENDS_WITH, EQ, GE, GT, if, in, IN, is, LE, left_justify, left_justify:full, lower_case, LT, move, NE, NULLS, numeric, or, OR, Out, OUT, pack, perform, proper_case, proper_case:a, proper_case:A, proper_case:anyline, proper_case:g, proper_case:G, proper_case:geography, proper_case:n, proper_case:N, proper_case:name, proper_case:s, proper_case:S, proper_case:street, right_justify, right_justify:full, rule, STARTS_WITH, then, title_case, upper_case, ZEROS
Precedence
Precedence controls which operators are executed first in an
expression. Operators are grouped into the following levels
(from highest to lowest):
Operator type            Keyword or symbol
Relational operators     GT, GE, LT, LE, >, >=, <, <=
Equality operators       EQ, NE, ==, =, !=, <>
String operators         contains, starts_with, ends_with
Logical AND operator     AND, and
Logical OR operator      OR, or, ||
Example
In the following expression, the relational operations are performed first (==, >=, and <), followed by the logical AND operation, and finally the logical OR operation:
if(state == "CA" or zip_code >= 10000 AND zip_code < 20000) then
    //statement(s);
endif;
endrule
Associativity
Associativity controls how operators at the same precedence level
are grouped. All operations have left-to-right associativity.
Examples
The following expressions perform the same action:
13-7
Example 1
PE OR prov =
Example 2
Comments
PE) OR prov
C Style
Begin with /*, end with */ and include all characters in between.
Comments can span multiple lines.
/*
#... Example of C style comments.
*/
Only C style comments can be embedded in the middle of a line.
C++ Style
Begin with // and extend to the end of the line. If multi-line comments
are required, the comment portion of each line must begin with //.
//
//... This is an example of C++ style comments.
//
Shell Style
Begin with # and extend to the end of the line.
13-8
Input or Output Dictionary?
Example
move out.newline2, OUT.newline1;
You may declare your fields explicitly as input or output by always including the IN. or OUT. prefix (this will also improve your script's readability):
if(in.gout_fail_level != "0") then
move in.line1, out.line1;
move in.line2, out.line2;
move in.line3, out.line3;
move in.line4, out.line4;
endif;
Selecting a
Portion of a
Field
13-9
Substring notation can only be used with DDL fields and cannot be
used with literal values. For example, each of these statements will
generate an error message:
move BLANKS[1:10] , OUT.newline1; // will generate an error
move "CANADA"[2:*], OUT.newline2; // will generate an error
Binary Data Strings
Concatenating Literal Values
Example
move "----------------------------------" +
     "-------------------------------------" +
     "--------------------------------", dashed_line_120ch;
move 'Network Pathways Inc., ' +
     'Suite 100-401, ' +
     '1600 Bedford Hwy, ' +
     'Bedford, NS, ' +
     'Canada B4A 1E8', return_address;
BLANKS, ZEROS and NULLS
The BLANKS, ZEROS and NULLS keywords can be used to set a field
entirely to blanks, zeros or binary-zeros. They can also test to see if
a field contains only blanks, zeros or binary zeros.
Manipulating Your Data
13-10
Whenever these keywords are used, a literal value is created
dynamically with exactly the right number of blanks, zeros or
NULLS to match the size of the other fields used in the expression.
If, for some reason, all fields in an expression are BLANKS, ZEROS
or NULLS keywords, the length of the resulting literal values will be
one.
Examples
In this example, all fields used within the IF-conditions are one byte
long:
If(BLANKS == BLANKS) then
// always true
endif;
If(BLANKS == ZEROS) then
// always false
endif;
In this example, the length of the BLANKS literal will be ten bytes to
match the 10-byte substring selected from the 30-byte city field
using the city[2:10] notation.
If(city[2:10] == BLANKS) then
// characters 2 through 11 of city are blank
endif;
IF Statements
Syntax
IF [condition [and/or/AND/OR] condition]
[action statement]
[else action statement;]
ENDIF;
IF statements consist of three parts: the IF/THEN test, an optional ELSE clause, and the closing ENDIF.
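As a minimal sketch of this syntax (the field and value names are illustrative):

if(in.country == "CA") then
    move in.postal_code, out.line5;
else
    move BLANKS, out.line5;
endif;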
13-11
Conditions
Relational Conditions
Description
field1 GT field2
field1 > field2
Greater Than
True if field1 is greater than field2
field1 GE field2
field1 >= field2
Greater Than or Equal To
True if field1 is greater than or equal to field2
field1 LT field2
field1 < field2
Less Than
True if field1 is less than field2
field1 LE field2
field1 <= field2
Less Than or Equal To
True if field1 is less than or equal to field2
13-12
Table 13.1 Data Reconstructor Rules Conditions
Equality Conditions
Description
field1 EQ field2
field1 == field2
field1 = field2
Equal To
True if field1 is equal to field2
field1 NE field2
field1 != field2
field1 <> field2
Not Equal To
True if field1 is not equal to field2
String Conditions
Description
field1 is numeric
field1 is alphabetic
field1 is alphanumeric
13-13
Example
This is an example using all twelve conditions.
if(zip_code GT "10000" AND
   zip_code LT "50000" AND
   pr_rev_group GE "008" AND
   pr_rev_group LE "010" AND
   pr_gout_fail_level == "0" AND
   state != "NY" AND
   first_name starts_with "PH" AND
   last_name ends_with "ING" AND
   company_name contains "TAXI" AND
   in.birth_date is numeric AND
   postal_code[1:1] is alphabetic AND
   company_name is alphanumeric) then
    move "1", flag;
else
    move "0", flag;
endif;
Logical Operators
Description
condition1 OR condition2
condition1 or condition2
condition1 || condition2
13-14
Example
if((pr_rev_group == "000" OR pr_rev_group == "009") AND
pr_gout_fail_level == "0") then
/* Construct a new address from postal matcher output
fields */
Nested IF
Statements
Example
rule LABEL1
if(gb_out_match_level = "0") then
if(gb_out_dpndthorough_name <> BLANKS) then
move
gb_out_house_number,
nwaddrl3;
append gb_out_dpndthorough_name, nwaddrl3;
append gb_out_dpndthorough_desc, nwaddrl3;
append gb_out_thorough_name
, nwaddrl4;
append gb_out_through_desc
, nwaddrl4;
else
move
gb_out_house_number
, nwaddrl3;
append gb_out_thorough_name
, nwaddrl3;
append gb_out_through_desc
, nwaddrl3;
endif;
endrule
The sample shows one rule definition called LABEL1, which will
either populate output fields nwaddrl3 or nwaddrl4, depending on
whether the field gb_out_dpndthorough_name is blank or not, as
long as the record had a match level of 0. Both nwaddrl3 and
13-15
Action
Statements
Syntax
verb [:modifier] [source field] [,] [destination field] ;
- orperform rule_name;
Some action statements may include a modifier that changes their
operation slightly. The modifier must immediately follow the verb
and be delimited from it with a single colon.
For example, the append:2spaces statement works like the
append statement, with the exception that two spaces are used for
a delimiter instead of one. The comma separating the source-field
from the destination-field is optional.
Specific action statements take either no, one, or two arguments as
described in the following sections.
Action
Statements
Require
No
Arguments
Statements
Description
copy_all
13-16
that has no corresponding input field. For this reason, it
should always be used at the beginning of your script.
Action
Statements
Require
One
Argument
The script language has six action statements that require a single
argument. In each of these statements, the lone argument is used
to specify a destination field and cannot be a literal value.
Statements
Description
pack
upper_case
lower_case
title_case
right_justify
left_justify
right_justify:full
EXPIRY 20001127"
13-17
Statements
Description
left_justify:full
Left justifies the contents of the destination field and converts each
occurrence of multiple blanks to a single blank. For example, for a
20-character field containing this value:
"
THE
"
proper_case
proper_case:a
proper_case:A
proper_case:anyline
proper_case:g
proper_case:G
proper_
case:geography
perform
proper_case:n
proper_case:N
proper_case:name
proper_case:s
proper_case:S
proper_case:street
Example: perform fix_name_line
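As a sketch of how the single-argument statements are applied to a destination field (the field names are illustrative):

upper_case out.last_name;    // convert the destination field to upper case
left_justify out.line1;      // shift the contents of the destination field to the left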
13-18
Action
Statements
Require
Two
Arguments
Statements
Description
copy
Copies the contents of one field to another, adjusting the data type (if
necessary) to match the description of the output field in the DDL. If
the first argument is a literal, a move operation is performed instead of
a copy.
move
Moves one text field to another. Unlike copy, no conversion from one
data type to another is attempted.
If source-field is longer than destination-field, it is truncated during the move. If source-field is shorter than destination-field, the destination-field is padded with blanks after the move.
append
Appends the contents of one field to the end of the contents of another
field, after first adding a single blank character as a separator.
If the destination-field is currently empty (all blanks) then a move
operation is performed instead of append. This makes it possible to
perform a series of append operations on the same destination-field
without creating unwanted blanks at the beginning of the field.
If there is not enough room at the end of the destination-field, the
source-field will be truncated to fit. There must be at least 2 blanks at
the end of the destination-field before an append operation will be
attempted.
append_pack
append:pack
append:0spaces
Works like the append statement, but without the blank separator.
Appends the contents of one field directly to the end of contents of
another field.
There must be at least 1 blank at the end of the destination-field before
this operation will be attempted.
13-19
Statements
Description
append:2spaces
Appends contents of one field to the end of the contents of another field
after first adding two blank characters as a separator. May be required
in some countries (e.g. Canada) to separate the postal-code from the
remainder of the line.
If there is not enough room at the end of the destination-field, the
source-field will be truncated to fit. There must be at least 3 blanks at
the end of the destination-field before an append:2spaces operation
will be attempted.
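Putting the two-argument statements together, a sketch of a rule that builds a last line from city, state, and postal-code fields (the field names are illustrative, not from a standard DDL):

rule build_lastline
    move   in.city, out.lastline;
    append in.state, out.lastline;
    append:2spaces in.postal_code, out.lastline;
endrule

The state is appended with a single blank separator and the postal code with two blanks, as described above.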
Overlapping
Fields
Example
move "TRILLIUM", out.temp;
move out.temp[2:4], out.temp[1:4];  // following this move the out.temp
                                    // field will contain "RILLIUM"
String
Variables
Example
String variables may be used any place in a rule that a DDL field
name can be used.
13-20
// 30ch long
// 256ch long
//10,000ch long
rule
sample1
STRING $name[50];
move in.first_name, $name;
append in.last_name, $name;
move $name, out.full_name;
endrule;
2.
Specify a file name in the Input File Name and Input DDL
Name text boxes.
13-21
3.
4.
2.
3.
4.
2.
Valid delimiters
are Tab, Space,
Semicolon,
Comma, and
Pipe. Characters
other than those
listed must be
enclosed by
quotation marks.
1.
2.
2.
13-22
4.
2.
13-23
Example
A sample usdrrules.sto file might look like this:
rule label_line
#---------------------------------------#
# Output Alignment Section
#---------------------------------------#
if(out.NEWADDRL4(1:5) = "     ") then
    move out.NEWADDRL5, NEWADDRL4;
    move " ", NEWADDRL5;
endif;
endrule
The rule keyword begins the rule and label_line is the rule name; the endrule keyword ends it.
2.
Specify the rule name in the Use Rule field. For the example
above, enter label_line. If you are using multiple rules in
the Rules File, place a comma after each rule.
Rule File
Use Rule
13-24
Additional Settings
You can also specify the following settings:
See Data
Reconstructor
in the TS
Quality
Reference
Guide for
complete
settings
information.
2.
2.
Additional Settings
13-25
2.
2.
3.
In the Debug File text box, accept the default path and file
name, or specify the file to which debugging information will
be written.
2.
2.
13-26
1.
2.
3.
4.
5.
6.
In the Field Selection window, select the fields you used for the Data Reconstruction process, such as NEWADDRL1 through NEWADDRL5.
7.
Click Display to review the data and ensure that it has been
reconstructed properly.
13-27
1.
From the Main menu, click Edit, Add new project step
from palette. The palette displays all the available steps.
You can also select the List View tab and click Add
New Step from Palette. Right-click anywhere in the
Data Flow Architect area and select Add New Step
from Palette.
2.
3.
4.
5.
6.
13-28
13-29
Tip:
You can either
edit the file names
manually or click
the File Chooser
icon to
browse
for and
select the file.
To view the
contents of your
data file,
click the
Data
Browser
icon.
Use the
Dictionary
Editor to view the
contents
of the
DDL file.
1.
2.
Select the Input Data and Input DDL files for the first country
(for example, Canada). Click Add.
5.
13-30
1.
2.
Click the first condition row and select Edit Condition. The
Logic Builder window appears.
3.
4.
5.
Process Settings
13-31
6.
Select the next input file name under Input Files and repeat
the steps.
7.
Process Settings
Once you have specified input and output files, you can specify
settings used to process your data. The settings for processing are
managed in the Advanced Settings window.
Source Identification
Since you are merging multiple records into one file, an input
source identifier should be applied. This indicator designates the
origin of the record.
To specify source identification
The File
Source field is
set to four
characters in
the standard
DDL.
1.
2.
13-32
Select the next input file name under Input Files and repeat
the steps.
4.
1.
2.
3.
Select OK.
4.
13-33
5.
Notice that only selected records from each input file are
included.
6.
7.
8.
13-34
14-1
CHAPTER 14
Packaging Projects
14-2
Most users create a script from the steps in the project. This
chapter will describe how to create and run a script, and provides a
summary of Real-Time Processing. Real-Time processing can ensure
that new data entering the database is transformed, cleansed,
enriched and linked at the point of entry. The TS Quality Analyzer
tool can demonstrate a Real-Time processing environment.
This chapter focuses on several tasks:
Use the List View to order and select steps for inclusion to a
batch script
Generate a script to run all selected steps
Save, view and run the script
Export and import projects
Understand the Director architecture and the role of the
cleansing/matching servers
Move from a batch environment to a real-time environment
Understand the role of the business rules
Use the TS Quality Analyzer to sample real-time cleansing
and matching
Batch Script
14-3
Batch Script
You can combine project steps into a batch script that can be run on UNIX or Windows platforms. The Windows interface lets you set up a project that exists on UNIX (by using mapped network drives), but that can be run using the local Windows client.
Create a Script
Before generating a script to run your project, you must select the
steps to be included.
To select steps
1.
1.
14-4
Edit a Script
2.
3.
Edit a Script
To edit a batch script
1.
The Modify tab allows you to view and edit the batch file.
Select the Modify tab. The steps in the batch file are listed
on the left and the file contents appear on the right.
If you click a step on the left, the right pane will
automatically page down to the steps section in the file.
Any remarks are preceded with a *rem*.
Run a Script
14-5
2.
To edit the script, click Edit. This will open the script in the
WordPad editor.
3.
4.
Click Refresh
Run a Script
To run a batch script
1.
Select the Run tab. From this tab, choose one of the
following methods to run the batch file:
Run as an attached process - If you run the batch file
as an attached process, the Control Center must remain
open. The Notify when job is done running check box
is active only when this option is selected.
Run as a detached process - If you run the batch file
as a detached process, the process will run independently
of the Control Center.
Packaging Projects
14-6
14-7
Exporting/Importing Projects
The Export and Import functions take a Control Center project and
relocate it to a new release directory, or to another machine,
without loss of project steps. This feature lets you move project
contents to different platforms and drive locations.
Whether an upgrade is taking place or a project is being moved
from one licensed server to another, Import/Export procedures
make it easy to move your data quality process from one location to
another. This feature also allows previous versions of TS Quality
to be migrated successfully into the current environment.
Projects created with TS Quality are fully exportable and can be
imported without loss of user-defined steps. These projects can be
exported into a TS Quality directory.
In order to export and import a project to another physical
machine, the project directory and its accompanying
subdirectories must be physically copied to a media device
and transferred to the other machine.
Export Projects
The first step of the export/import procedure is to export the
project.
To export a project
1.
2.
4.
5.
Select OK.
Import Projects
The second step of the export/import procedure is to import the
project into the current environment.
To import a project
1.
Enter file names for the Project to Import (the .zip file),
New Project Directory and New Project Name. Use the
navigation buttons to browse for the file and directory.
3.
Under Original Path, you should see the old path for the
project, postal table, census table and software. If you want
to change this information, specify the new location under
the New Path area and click Substitute.
4.
5.
Real-Time Processing
Now that the batch process is in place, you can leverage the
business rules which you have developed in batch and use them in
a real-time environment. By implementing a real-time solution,
your data can be transformed, cleansed, enriched and linked at
point of entry. This section provides an overview of the TS Quality
real-time processing with the Director. The Director for TS
Quality is an optional application. For information on the Director
for TS Quality, please contact your Trillium Software sales
representative.
The Director
The Director acts as a registry for cleansing and matching servers
that are made available to the calling environment.
Cleansing Server
TS Quality uses a single API approach to simplify the task of moving
data through the various country-specific modules. The simple API
eliminates the need for the programmer to know the internal
workings of each TS Quality function. This interface uses a single
configuration file to enable simple construction of complex
transactional data quality processing. The business rules developed
in the batch application are used in this configuration file.
Matching Server
The Matching Server supports reference matching, allowing you to
compare an incoming record to the database of existing records.
Match results are returned to the calling application.
Process Method
When a record is entered into the data entry window, it is collected
and sent to the real-time cleansing engine when the user clicks the
Cleanse button. The cleansed results are displayed on the screen.
Next, the cleansed transaction record can be compared to records
in the master database with the Relationship Linkers reference
match function. Candidate records are retrieved from the database
for the given window key. The transaction record and the retrieved
records are then sent to the reference match function for
comparison.
The calling application must retrieve the records from the
database using the window key from the transaction
record. The retrieval of records is not a function of TS
Quality.
If a match is found, the matched record is displayed. If no match is
found, a message will be displayed on the screen.
See Matching on page 8-8 for instructions on how to run
the Matching function using the TS Quality Analyzer.
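The flow above can be summarized in a short sketch. The helper functions below are invented stand-ins for illustration only; they are not part of the TS Quality or Director APIs, and the field names are hypothetical.

def cleanse_record(record):
    # Stand-in for the real-time cleansing engine: trim and normalize values.
    return {field: " ".join(str(value).split()) for field, value in record.items()}

def reference_match(transaction, candidates):
    # Stand-in for the Relationship Linker reference match: a naive exact
    # comparison on a street field, purely for illustration.
    return [c for c in candidates if c.get("street") == transaction.get("street")]

def process_transaction(record, master_db):
    cleansed = cleanse_record(record)
    # The calling application must fetch the candidate records itself,
    # using the window key from the cleansed transaction record.
    candidates = master_db.get(cleansed.get("window_key"), [])
    matches = reference_match(cleansed, candidates)
    return matches[0] if matches else "No match found"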
CHAPTER 15
This chapter describes commands that run the TS Quality modules
on UNIX and 32-bit PC platforms. Use these commands for two
purposes:
run modules
create log files
Syntax
The syntax for command line execution is:
<program_name>
<settings_file_name>
<log_file_name>
where:
program_name is the name of the module program to run (see Table 15.1)
settings_file_name is the settings (.stx) file for the module
log_file_name is the file to which errors and messages are written
Example
..\settings\ustranfrmr.stx ..\data\error.log
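The example above shows only the settings-file and log-file arguments. A complete command pairs them with a program name from Table 15.1. The sketch below assumes the Transformer program (tranfrmr) and the file names shown above, and simply runs the command from Python; the pairing of this program with these particular files is an assumption made for illustration.

import subprocess

# Hypothetical invocation: program name from Table 15.1, settings and log
# file paths from the example above (adjust to your installation).
result = subprocess.run(
    ["tranfrmr", r"..\settings\ustranfrmr.stx", r"..\data\error.log"],
    capture_output=True,
    text=True,
)
print(result.returncode)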
Program Names
Table 15.1 contains a complete list of program names. Use the
appropriate program name in your command line:
Program Name
Transformer
tranfrmr
cusparse
apparse (China, Japan, Korea, and Taiwan)
Parsing Customization
prcustom
apcustom (China, Japan, Korea, and Taiwan)
busparse
Postal Matchers
globrtr
winkey
Relationship Linker
rellink
Create Common
common
Data Reconstructor
datarec
File Display Utility
tsqdisp
File Update Utility
fileupdt
Frequency Count Utility
tsqfreq
Merge Split Utility
mrgsplit
Resolve Utility
resolve
Set Selection Utility
tsqsetsl
Sort Utility
tsqsort
CHAPTER 16
TS Quality offers a number of utilities to perform specific tasks. All
utilities can be executed from the TS Quality Control Center, in a
batch process, or from the command line. This chapter explains
how to use these utilities:
File Display utility
File Update utility
Frequency Count utility
Merge Split utility
Resolve utility
Set Selection utility
Sort utility
Each utility's basic settings, such as input and output settings, are
the same as those of the TS Quality core modules. This chapter focuses on
the process settings information (the Advanced Settings window)
for each utility.
Refer to the TS Quality Reference Guide for
complete settings information on each utility.
Outer Key Group
Inner Key Group
2.
Select the Outer Key Field and Inner Key Field from the
drop-down menu.
In addition, you can specify the following settings for the outer key
and inner key:
Setting
Description
Outer Key Delimiters
Inner Key Delimiters
A red flag indicates a REQUIRED setting.
1.
2.
In Title 1, enter the first title line of the report. This line
must be enclosed within quotation marks (" ") if it contains a
space. For example: "Matching Report"
3.
4.
2.
3.
4.
5.
Setting
Description
Carriage Return
Maximum Lines Per Page
Compress Blank Lines
Inner Break Spacing
Outer Break Spacing
Outer Key Minimum
Field Settings
The fields that will be displayed in the report should also be
identified.
LINE_01
LINE_02
LINE_03
LINE_04
Field Settings
2.
See TS Quality Reference Guide and Online Help for complete settings information.
Setting
Description
Report Value
Report Value Encoding
Example
In these sample master and transaction files, the following fields
are used (the Match Key is the Record_Key field):
Tran File: Record#, Record_Key (Match Key), Street, City, State
Master File: Record#, Record_Key (Match Key), Name
If the output DDL has all fields from the master and transaction
files, the match master file includes the following fields. Therefore,
the value in the common field, Record#, will be overwritten by the
transaction file:
Match Master File
The values of these fields are inserted from the tran file.
Record #
Record_Key
Name
Street
City
State
Example
Input
In this example, the master file contains the customers' names and
the transaction file contains the customers' addresses.
Match key: Record_Key field.
Master Input File (master.dat)
Record_Key  Name
0001  John Nicoli
0001  J Nicoli
0002  Mary Rogers
0003  Kevin McCarthy
Transaction Input File (tran.dat)
Rec #  Record_Key  Street  City  State
100  0001  25 Linnell Circle  Billerica  MA
200  0001  1 Elm St.  Nashua  NH
300  0002  25 Linnell Circle  Billerica  MA
400  0002  12 Oak St.  Waltham  MA
500  0004  3 Royal Court  Boston  MA
Output DDL
The following fields are used in the DDL for ALL master output files
(match_master.ddx, match_dup_master.ddx, unmatch_master.ddx):
Rec #, Record_Key, Name, Street, City, State
Output
The program searches for records in master.dat that have the
same key values as tran.dat. In this case, the records with
0001 and 0002 in the Record_Key field are matched
records.
1.
2.
Record_Key
Name
Street
City
State
100
0001
John Nicoli
25 Linnell Circle
Billerica
MA
300
0002
Mary Rogers
25 Linnell Circle
Billerica
MA
3.
ON
Rec #
Record_Key
Name
Street
City
State
100
0001
J Nicoli
25 Linnell Circle
Billerica
MA
Record_Key
Name
0001
J Nicoli
4.
OFF
Street
City
State
Record_Key
Name
0003
Kevin McCarthy
5.
Street
City
State
Record_Key
Street
City
State
100
0001
25 Linnell Circle
Billerica
MA
300
0002
25 Linnell Circle
Billerica
MA
Record_Key
Street
City
State
200
0001
1 Elm St.
Nashua
NH
400
0002
12 Oak St.
Waltham
MA
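The key-based behaviour illustrated by this example can be sketched as follows. The records are shown as Python dictionaries already loaded from the master and transaction files; this is only an illustration of the matching logic, not the File Update utility itself, and it ignores duplicate handling.

def split_by_match_key(master_records, tran_records, key="Record_Key"):
    # Records whose match key appears in both files are "matched";
    # the rest fall into the unmatched master / unmatched tran outputs.
    tran_keys = {r[key] for r in tran_records}
    master_keys = {r[key] for r in master_records}
    matched = [r for r in master_records if r[key] in tran_keys]
    unmatched_master = [r for r in master_records if r[key] not in tran_keys]
    unmatched_tran = [r for r in tran_records if r[key] not in master_keys]
    return matched, unmatched_master, unmatched_tran

master = [{"Record_Key": "0001", "Name": "John Nicoli"},
          {"Record_Key": "0002", "Name": "Mary Rogers"},
          {"Record_Key": "0003", "Name": "Kevin McCarthy"}]
tran = [{"Record_Key": "0001", "Street": "25 Linnell Circle"},
        {"Record_Key": "0002", "Street": "12 Oak St."},
        {"Record_Key": "0004", "Street": "3 Royal Court"}]
matched, unmatched_master, unmatched_tran = split_by_match_key(master, tran)
# matched -> keys 0001 and 0002; unmatched master -> 0003; unmatched tran -> 0004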
2.
Specify Match Keys by selecting field names from the drop-down list.
3.
4.
2.
1.
2.
FIRST_NAME LAST_NAME
John
John
Nicoli
John
Nicoli
John
Smith
John
Smith
Bernard
Bernard
LeCuyer
Bernard
LeCuyer
Bernard
LCuyer
Bernard
LCuyer
Clara
Clara
Currier
Clara
Currier
Iulia
Iulia
Andrei
Iulia
Andrei
Jack
Jack
Sweeney
Jack
Sweeney
STREET_ADDR
13 Yellow Way
23 Purple Circle
19 Blue St
19 Blue St
18 Red Road
14 Orange Parkway
17 Black Street
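A literal frequency count over data such as the sample above can be sketched in a few lines; the field names follow the sample and the sorting mirrors the Sort Option described below. This is an illustration only, not the Frequency Count utility itself.

from collections import Counter

records = [
    {"FIRST_NAME": "John", "LAST_NAME": "Nicoli"},
    {"FIRST_NAME": "John", "LAST_NAME": "Smith"},
    {"FIRST_NAME": "Bernard", "LAST_NAME": "LeCuyer"},
    {"FIRST_NAME": "Bernard", "LAST_NAME": "LCuyer"},
    {"FIRST_NAME": "Clara", "LAST_NAME": "Currier"},
]

counts = Counter(r["FIRST_NAME"] for r in records)

by_count = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)  # sort by Count
by_value = sorted(counts.items())                                      # sort by Value
print(by_count)
print(by_value)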
Count Settings
To specify fields to count
A red flag indicates a REQUIRED setting.
See TS Quality Reference Guide and Online Help for complete settings information.
1.
2.
Click Entry Settings. Select Field Name from the drop-down list. This is the field which will be counted.
3.
Select either Literal or Mask for Count Type.
4.
Click Field Settings. Select Sort Type (Descending Order or
Ascending Order) and Sort Option (Count or Value). The Sort Option
specifies whether to sort the results by Count or by field Value.
5.
2.
In the Input Data File field, type or browse to the input file
you wish to use.
3.
In the Input DDL File field, type or browse to the input DDL
file associated with the input data file you specified in Step 2.
4.
Click Add.
5.
Repeat Steps 2-3 until you've added all DDL files you want to
use to create the common output format.
6.
7.
Use the Input DDL drop-down menu to select the DDL file
you want to use to map fields to an output DDL file. The
input DDL fields appear in the left pane and the final output
DDL fields appear in the right-pane.
9.
Use the buttons in the center panel to refine the output DDL
list of fields. You can choose from these options:
Add: adds the selected input DDL field to the output DDL list.
Delete: deletes a selected output DDL field from the list.
When you are ready, click Save to save the output DDL field
mapping. When the Merge Split Utility step runs, it will create
an output DDL file that uses this mapping.
Merge Files
For a merge operation, all input files that will be merged and the
output file MUST be the same shape. In other words, they must use
the same DDL.
Input files must be sorted by match keys.
Example
In this example, Input 1 will be merged into Input 2 using the
Name field as Match Key.
Input 1
Customer_ID#
Name
0000001
John Nicoli
0000002
Mary Nicoli
Input 2
Customer_ID#
Name
9000001
Alice Rogers
9000002
Kevin McCarthy
The following DDLs are used for ALL input files and output files:
Customer_ID#
Name
On output, the program copies Match Key values from Input 1 and
Input 2 along with other components of data. The record order will
be determined by the order of key values. As a result, the total
number of records is the sum of the number of records from both
Input 1 and Input 2.
Output File
Customer_ID# Name
9000001
Alice Rogers
0000001
John Nicoli
9000002
Kevin McCarthy
0000002
Mary Nicoli
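The merge behaviour shown above can be sketched as follows, assuming both inputs share the same DDL and are already sorted by the Match Key (here, the Name field). This sketch is for illustration only, not the Merge Split utility itself.

import heapq

input1 = [{"Customer_ID#": "0000001", "Name": "John Nicoli"},
          {"Customer_ID#": "0000002", "Name": "Mary Nicoli"}]
input2 = [{"Customer_ID#": "9000001", "Name": "Alice Rogers"},
          {"Customer_ID#": "9000002", "Name": "Kevin McCarthy"}]

# Both inputs are already sorted by Name, so a key-ordered merge reproduces
# the output order shown above: Alice, John, Kevin, Mary.
merged = list(heapq.merge(input1, input2, key=lambda r: r["Name"]))
for r in merged:
    print(r["Customer_ID#"], r["Name"])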
To merge files
A red flag indicates a REQUIRED setting.
1.
2.
3.
On the Field Settings tab, select Match Key from the drop-down list.
You can specify up to five fields for the Match Key,
separated by commas.
Split a File
Splitting a file is useful when your system has a file-size limit or you
want to separate a file into manageable pieces. The pieces can later
be re-assembled using the Merge operation. You can split the input
file by the number of records or bytes per segment.
To split a file
1.
2.
3.
Partition Method and Description:
Round Robin Number: Split the file by the number of records. If this is selected, the Round Robin Number must be specified.
Round Robin Keys: Split the file by the key (field). The input file must be sorted by this field. If this is selected, Round Robin Keys must be specified.
Ranges / Ranges Stable: Split the file by the field name and field length. The field name and field length are specified by Match Key.
Records Per Segment: Split the file by segment. If this is selected, Records Per Segment must be specified.
Bytes Per Segment: Split the file by the number of bytes per segment. If this is selected, Bytes Per Segment must be specified.
4.
Examples
Round Robin Keys
In this example, the input file will be split to Output 1 and Output 2
using the Round Robin Keys method, and Lev2_matched field as
the Match Key.
Input File
Name
Lev2_matched
B McCarthy
000001
Bob McCarthy
000001
Catherine Rogers
000002
Cathy Rogers
000002
On output, the program splits the input file. The first Lev2_matched
group will be written to Output1, and the second Lev2_matched
group will be written to Output2.
Output 1
Name
Lev2_matched
B McCarthy
000001
Bob McCarthy
000001
Output 2
Name
Lev2_matched
Catherine Rogers
000002
Cathy Rogers
000002
Round Robin Number
In this example, the input file is split to Output 1, Output 2 and
Output 3 using the Round Robin Number method.
Input File
Name  Lev2_matched
B McCarthy  000001
Bob McCarthy  000001
Catherine Rogers  000002
Cathy Rogers  000002
Output 1
Name  Lev2_matched
B McCarthy  000001
Cathy Rogers  000002
Output 2
Name  Lev2_matched
Bob McCarthy  000001
Output 3
Name  Lev2_matched
Catherine Rogers  000002
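The Round Robin Keys example earlier in this section can be sketched as follows: the input is sorted by the Match Key (Lev2_matched), and whole key groups are dealt out to the output files in rotation. This is an illustration only, not the Merge Split utility itself.

from itertools import groupby

records = [
    {"Name": "B McCarthy", "Lev2_matched": "000001"},
    {"Name": "Bob McCarthy", "Lev2_matched": "000001"},
    {"Name": "Catherine Rogers", "Lev2_matched": "000002"},
    {"Name": "Cathy Rogers", "Lev2_matched": "000002"},
]

outputs = [[], []]  # two output files
for i, (_, group) in enumerate(groupby(records, key=lambda r: r["Lev2_matched"])):
    outputs[i % len(outputs)].extend(group)

# outputs[0] holds the 000001 group and outputs[1] holds the 000002 group,
# matching Output 1 and Output 2 in the Round Robin Keys example above.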
Partition Method and Description:
Round Robin Number: Split the file by the number of records. If this is selected, a Round Robin Number must be specified.
Round Robin Keys: Split the file by the key (field). The input file must be sorted by this field. If this is selected, Round Robin Keys must be specified.
Ranges / Ranges Stable: Split the file by field name and field length. The field name and field length are specified by Match Key.
Records Per Segment: Split the file by segment. If this is selected, Records Per Segment must be specified.
Bytes Per Segment: Split the file by the number of bytes per segment. If this is selected, Bytes Per Segment must be specified.
3.
On the Field Settings tab, select Match Key from the drop-down list.
See TS Quality Reference Guide and Online Help for complete settings information.
Resolve Utility
See TS Quality Reference Guide and Online Help for complete settings information.
Example
Merge Files
Resolve
Recid  Type  Pat
0007   P     329
0015   P     210
0009   P     230
0022   P     230
The Resolve Utility processes this file and produces the following
output:
Recid  Recid
0007   0002
0015   0002
0009   0002
0022   0002
Link Field
To link files
A red flag indicates a REQUIRED setting.
1.
2.
In From Link Field, select the DDL field from the drop-down
list. This DDL field contains the starting key of the link
(generally located in the left column of the Relationship
Linker's link output file).
3.
Description
Process Group Records
Process Group Memory
Example
For example, if the Match Key is Household_Number field, the
program first selects records that have the Household_Number field
in the input file.
Assume that, in the Select Record Conditions or Bypass Record
Conditions, the condition is set to Household_number=00001. In
this case, the program selects records if the values in the
Household_Number field equal 00001.
After running the program, you can verify the results of the select
operation by viewing the output file in the Data Browser.
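The selection described above amounts to a simple filter on the Match Key field. The sketch below uses the Household_Number condition from the example; it illustrates the logic only and is not the Set Selection utility itself.

records = [
    {"Household_Number": "00001", "Name": "John Nicoli"},
    {"Household_Number": "00002", "Name": "Mary Rogers"},
    {"Household_Number": "00001", "Name": "J Nicoli"},
]

# Keep only records whose Household_Number equals 00001.
selected = [r for r in records if r["Household_Number"] == "00001"]
print(len(selected))  # 2 records written to the output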
Select Records
The selection will be applied to records in the input file based on the
Match Key field specified by the user. Records with the same
match key values will be selected and written to the output file.
To select records
1.
2.
In Match Key, select the DDL field from the drop-down list.
The program selects records which have this field.
2.
Setting
Description
Maximum Total Records
Setting
Description
Minimum Records Per Set: Numeric value. Any key set with a record
count that is less than or equal to this value will be discarded
without processing.
See TS Quality Reference Guide and Online Help for complete settings information.
Maximum Records Per Set
Maximum Set
2.
Sort Utility
See TS Quality Reference Guide and Online Help for complete settings information.
The Sort Utility reads records from input data files and sorts them
to produce a single output file. The single output file is created in a
common shape with a single associated Data Dictionary Language
(DDL) file.
The sort functions support up to 99 sort keys. During the sort
step, you can select fields from input records to be written to the
output. This process is controlled through the input and output DDL
field-mapping function.
See Chapter 9 and Chapter 10 for details of the Sort
Utility.
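A multi-key sort with output field mapping can be sketched as follows; the field names are hypothetical, and the sketch only illustrates the idea of sorting on several keys and writing selected fields to a common output shape, not the Sort Utility itself.

records = [
    {"LAST_NAME": "Rogers", "FIRST_NAME": "Mary", "STATE": "MA"},
    {"LAST_NAME": "Nicoli", "FIRST_NAME": "John", "STATE": "NH"},
    {"LAST_NAME": "Nicoli", "FIRST_NAME": "J", "STATE": "MA"},
]

sort_keys = ["LAST_NAME", "FIRST_NAME"]   # the utility supports up to 99 sort keys
output_fields = ["LAST_NAME", "STATE"]    # input-to-output DDL field mapping

records.sort(key=lambda r: tuple(r[k] for k in sort_keys))
output = [{f: r[f] for f in output_fields} for r in records]
print(output)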
CHAPTER 17
TS Quality allows you to customize your work area by changing
several configuration options:
fonts used within the Control Center
size and style of text
color used within the Control Center
1.
Select the Display tab. In the Category box, click the item
that you want to change.
2.
In Font Selection, select the new font, style, and size from
the pull-down menus on the right. As you make your
changes, the text in the Sample box reflects the changes.
3.
Click OK. The Preferences tab closes and the new font
settings are applied to the selected item.
Select the Display tab. In the Category box, click the item
that you want to change. The Color section becomes active.
b.
HSB
The HSB tab lets you define the color by Hue (the color's tint),
Saturation (the hue's purity), and Brightness (the color's
brightness):
a.
b.
c.
d.
RGB
The RGB tab allows you to define a color as a combination of the
Red, Green, and Blue primary colors.
a.
When you are satisfied with your selection, click OK. The
Foreground/Background window closes.
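For reference, the HSB model (often called HSV) relates to RGB as illustrated below using Python's standard colorsys module; this is general color-model behaviour, not something specific to TS Quality.

import colorsys

# Hue, saturation and brightness are each expressed in the range 0.0-1.0.
r, g, b = colorsys.hsv_to_rgb(0.0, 1.0, 1.0)
print(r, g, b)  # 1.0 0.0 0.0 -> pure red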
2.
3.
2.
3.
APPENDIX A
Description
NOTRANS
ASCII
BIG5
Traditional Chinese
CCSID937
Traditional Chinese
CP037
EBCDIC, IBM037
CP1250
Description
CP1251
Cyrillic (Slavic)
CP1252
Latin1 (ANSI)
CP1253
Greek
CP1254
Turkish
CP1255
Hebrew
CP1256
Arabic
CP1257
Baltic
CP1258
Vietnamese
CP932
CP936
CP949
Korean
CP950
Traditional Chinese
EUCCN
EUCJP
EUCKR
EUCTW
GB12345
Traditional Chinese
HZGB2312
IBM-83-4040
IBM-83-4242
ISO2022JP
Japanese, ISO-2022-JP
ISO-8859-7
Latin/Greek
ISO 8859-9
Description
JEF-83-A1A1
JEF-83-4040
JEF-78-A1A1
JEF-78-4040
JOHAB
Korean
KEIS-83-A1A1
KEIS-83-4040
KEIS-78-A1A1
KEIS-78-4040
LATIN1
ISO 8859-1
LATIN2
ISO 8859-2
LATIN4
Baltic
LATIN7
Baltic
LATIN9
UCS2
UTF7
UTF8
UNICODE20:BIGENDIAN
Unicode with the most significant byte first. Other name: bigendian
UNICODE20:LITTLEENDIAN
Unicode with the least significant byte first. Other name: littleendian
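The byte-order difference between the two Unicode encodings above is general Unicode behaviour and can be seen with Python's UTF-16 codecs; this is only an illustration, not a TS Quality feature.

text = "AB"
print(text.encode("utf-16-be").hex())  # 00410042 - most significant byte first
print(text.encode("utf-16-le").hex())  # 41004200 - least significant byte first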
Trillium Types
A Trillium Type is the data type of a DDL field. The following table lists
the Types used in TS Quality. Many of these Types can be used
together (for example, PACKED DECIMAL).
Description
ASCII NUMERIC
BITFIELD
BOOLEAN
BINARY
INTEGER
PACKED
Description
ZONED DECIMAL
Date Format
Date Format
Date format is a type of data which may contain only valid dates.
The following table contains a list of valid date formats.
Data Format
ASCII AMERICAN
MM(/)DD(/)YYYY. 8 or 10 bytes.
ASCII EUROPEAN
DD(/)MM(/)YYYY. 8 or 10 bytes.
ASCII JULIAN
(YY)YY(/-)DDD. 5, 7, or 8 bytes.
YYYY(/-)DDD. 7 or 8 bytes.
YYYY(/)MM(/)DD. 8 or 10 bytes.
EBCDIC AMERICAN
MM(/)DD(/)YYYY. 8 or 10 bytes.
EBCDIC EUROPEAN
DD(/)MM(/)YYYY. 8 or 10 bytes.
EBCDIC JULIAN
(YY)YY(/-)DDD. 5, 7, or 8 bytes.
YYYY(/-)DDD. 7 or 8 bytes.
YYYY(/)MM(/)DD. 8 or 10 bytes.
PACKED AMERICAN
0MMDDYYYY. 5 bytes.
PACKED EUROPEAN
0DDMMYYYY. 5 bytes.
PACKED JULIAN
(YY)YYDDD. 3 or 4 bytes.
YYYYDDD. 4 bytes.
0YYYYMMDD. 5 bytes.
MMDDYYYY. 4 bytes.
DDMMYYYY. 4 bytes.
0(YY)YYDDD. 3 or 4 bytes.
0YYYYDDD. 4 bytes.
YYYYMMDD. 4 bytes.
30 1 1
You must use valid month/day
combinations. If the month/day is invalid,
the output data is blanked out.
SJIS JAPANESE DATE
1997 1 1
You must use valid month/day
combinations. If the month/day is invalid,
the output data is blanked out.
ASCII ROMAJI IMPERIAL DATE
CLASS Keyword
The CLASS keyword specifies the format to be used for the date field. By
using the CLASS keyword, you can convert any 2-digit year into a 4-digit year.
The following table describes all specifications for the CLASS
keyword.
Description
DATE FORWARD
Converts any 2-digit year into a 4-digit year when the data value is
equal to, or greater than, the current year.
Top of date window = current year + 99
Bottom of date window = current year
Example
If the current year is 2005:
Top of date window = 2104 (2005 + 99 = 2104)
Bottom of date window = 2005
DATE BACKWARD
Converts any 2-digit year into a 4-digit year when the data value is
equal to, or less than, the current year.
Top of date window = current year
Bottom of date window = current year - 99
Example
If the current year is 2005:
Top of date window = 2005
Bottom of date window = 1906 (2005 - 99 = 1906)
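The DATE FORWARD and DATE BACKWARD window arithmetic described above can be sketched as follows; this illustrates only the arithmetic, not the TS Quality implementation.

def expand_year(two_digit_year, current_year, mode):
    # DATE FORWARD: window runs from the current year up to current year + 99.
    # DATE BACKWARD: window runs from current year - 99 up to the current year.
    if mode == "FORWARD":
        bottom, top = current_year, current_year + 99
    else:
        bottom, top = current_year - 99, current_year
    # Choose the century that places the 2-digit year inside the window.
    for century in range(bottom // 100, top // 100 + 1):
        candidate = century * 100 + two_digit_year
        if bottom <= candidate <= top:
            return candidate
    return None

print(expand_year(3, 2005, "FORWARD"))   # 2103 (window 2005-2104)
print(expand_year(3, 2005, "BACKWARD"))  # 2003 (window 1906-2005)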
DATE WINDOW {nnn}
Converts a 2-digit year into a 4-digit year, according to a user-specified
date window. You can specify a 1- to 4-digit number in {nnn}.
If {nnn} > 100 and {nnn} > current year, then {nnn} is the top of the date
window, and the bottom of the date window = top of the date window - 99.
Example: If the current year is 1999 and CLASS IS DATE WINDOW 2030:
Top of date window = 2030 (2030 > 100 and > the current year)
Bottom of date window = 1931 (2030 - 99 = 1931)
If {nnn} > 100 and {nnn} < current year, then {nnn} is the bottom of the
date window, and the top of the date window = bottom of the date window + 99.
Example: If the current year is 1999 and CLASS IS DATE WINDOW 1967:
Top of date window = 2066 (1967 + 99 = 2066)
Bottom of date window = 1967 (1967 > 100 but < the current year)
If {nnn} > 0 and {nnn} < 100, then the top of the date window = current
year + nnn, and the bottom of the date window = current year + nnn - 99.
Example: If the current year is 1999 and CLASS IS DATE WINDOW 30:
Top of date window = 2029 (30 > 0 but < 100, 1999 + 30)
Bottom of date window = 1930 (1999 + 30 - 99)
If {nnn} < 0, then the bottom of the date window = current year + nnn
(note that {nnn} is negative), and the top of the date window = bottom + 99.
Example: If the current year is 1999 and CLASS IS DATE WINDOW -30:
Top of date window = 2068 (1999 - 30 + 99)
Bottom of date window = 1969 (-30 < 0, 1999 - 30)
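The four DATE WINDOW cases above reduce to the following window calculation; again, this is only a sketch of the arithmetic, not the TS Quality implementation.

def date_window(nnn, current_year):
    if nnn > 100 and nnn > current_year:      # {nnn} is the top of the window
        top = nnn
        bottom = top - 99
    elif nnn > 100 and nnn < current_year:    # {nnn} is the bottom of the window
        bottom = nnn
        top = bottom + 99
    elif 0 < nnn < 100:                       # positive offset from the current year
        top = current_year + nnn
        bottom = top - 99
    else:                                     # nnn < 0: negative offset
        bottom = current_year + nnn
        top = bottom + 99
    return bottom, top

print(date_window(2030, 1999))  # (1931, 2030)
print(date_window(1967, 1999))  # (1967, 2066)
print(date_window(30, 1999))    # (1930, 2029)
print(date_window(-30, 1999))   # (1969, 2068)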
APPENDIX B
Parser Results
The Parser generates Completion Codes and Review Codes to
identify specific conditions that occur for each record being parsed.
You can review these codes to analyze the Parser results.
Description
No error
Pattern-Word-City Table Error: Pattern, Word and/or City tables not readable.
Description
Parser not successfully initialized. Settings file may not be correctly defined. Check path and
file name.
For each of these fields, a flag value of '1' is placed in the position in
the field that corresponds to the value of the condition. So in our
earlier example, where a review code of 26 was reported, you would
find a '1' in the field pr_street_review_codes at position 26.
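Checking a review-code flag field follows directly from the description above: a '1' at position N means review code N was reported. The sketch below invents a sample field value for illustration.

def has_review_code(review_codes_field, code):
    # Positions are 1-based, as in the description above.
    return len(review_codes_field) >= code and review_codes_field[code - 1] == "1"

pr_street_review_codes = "0" * 25 + "1" + "0" * 34   # flag set at position 26
print(has_review_code(pr_street_review_codes, 26))   # True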
Table B-2 lists review codes, review groups, and descriptions for the
Customer Data Parser.
Review
Group
Description
000
008
009
009
009
009
009
009
009
009
10
009
11
009
12
009
13
009
14
009
Name Codes
15
009
16
009
17
009
18
009
19
009
20
009
21
009
22
010
26
011
27
011
28
011
29
011
30
012
31
012
32
012
33
013
34
013
35
013
36
013
37
013
38
013
Street Codes
39
013
40
013
41
013
42
013
43
013
44
013
45
013
46
013
47
013
48
013
49
013
50
013
51
013
52
013
53
013
54
013
55
013
56
013
57
013
58
013
59
020
Geography Codes
61
014
62
014
63
014
64
014
66
015
67
015
70
015
71
015
72
015
73
015
74
015
75
015
76
015
77
016
78
000
79
017
80
018
001
Unidentified token
84
019
Unidentified line
85
001
86
001
87
001
88
001
89
001
90
002
91
003
92
004
93
005
No names identified
94
006
No street identified
95
007
No geography identified
96 - 99
Currently unassigned
001
Unidentified token
Description
005
No names identified
006
No street identified
007
No geography identified
014
No city or county
identified
019
Unidentified Line
008
011
013
012
Invalid directional
Direction is inconsistent
017
Conflicting geography
types
015
018
016
020
010
009
004
Foreign address
003
Hold mail
002
000
Description
Unidentified pattern
087
086
088
083
Unknown token
090
No data found
000
Description
000
009
010
Unknown tokens remaining after matching front and back parts of string.
041
042
043
044
No business name.
045
051
052
053
054
No surname.
055
No given name.
056
057
058
Unidentified token.
Review Groups
Review Code
Description
921
Review Groups
Review groups are groups of review codes that illustrate types of
conditions present in the data, whereas review codes describe
actual specific conditions. Thus, review groups provide a way for
users to quickly understand the general types of conditions
occurring in the data. Review groups are written to the
pr_review_group field.
Review Group
Description
002
Missing name (no business name and missing either first or last name).
003
Business name does not contain a business keyword from the word/
pattern file.
004
005
Exception Status
The Customer Data Parser for Japan generates an exception status for
each record to highlight specific conditions. Exception status values are
written to the pr_status/pr_h_status fields. The following table lists the
individual exception status values.
Value
Description
Mode
00
ALL
20
ALL
22
ALL
30
BNP
CLUE
31
BNP
CLUE
32
BNP
CLUE
33
All of the parsed string is alphabetic with word delimiters, and no business clue was found.
BNP
CLUE
34
All of the parsed string is in katakana with word delimiters, and no business clue was found.
BNP
CLUE
35
All of the parsed string is in hiragana with word delimiters, and no business clue was found.
BNP
CLUE
36
BNP
CLUE
40
PNP
Index
A
ABSOLUTE
Data Comparison Calculator 11-21
ALPHA
Intrinsic Attributes
Parsing Customization 7-17
Asian Characters
operators 5-33
Associativity
Data Reconstructor 13-6
Asterisks (*)
Parsing Customization 7-21
Attribute
DDL Editor 3-11
Attribute Modifiers
Category 7-11
Function 7-11
Gender 7-11
Parsing Customization 7-10
Recode 7-12
Attributes
Parsing Customization 7-10
B
Batch Script
create a script 14-3
Edit a script 14-4
Run a script 14-5
batch script 14-3
Binary Data Strings
Data Reconstructor 13-9
blanks
Frequency Count Utility 16-17
BNP 6-11
BNP_CLUE 6-11
Build a Conditional Statement 5-35
C
Category
Attribute Modifiers 7-11
Character Translation 5-9
City Directory File 6-17
City Name Changes
Locality 7-15
Post Town 7-15
CJKTOARABICNUM operator 5-30
CJKTOFULL operator 5-29
CJKTOHALF operator 5-28
Class
DDL A-10
DDL Editor 3-11
CLASS Keyword A-10
Class keyword
DDL 2-42
collating sequence
Sort Utility 9-5
Collating sequence
ASCII 9-6
EBCDIC 9-6
FOLDED_ASCII 9-6
FOLDED_EBCDIC 9-6
MULTI_NATIONAL 9-6
Command line execution
Program names 15-4
syntax 15-3
Comment
DDL Editor 3-11
Comment Lines
Parsing Customization 7-21
Comments
Data Reconstructor 13-7
Common Fields
Create Common Utility 12-6
Commonization
Create Common Utility 12-3
Comparison Routine
Data Comparison Calculator 11-21
Comparison Routines
Relationship Linker 10-13
Completion Codes
Business Data Parser 6-36
Customer Data Parser 6-25
COMPOSE or COMP 5-30
Conditionals 5-21
Logic Builder 5-35
Operators 5-26
Syntax 5-21
IF/ELSE Statement 5-21
Control Center
Data Flow Architect 2-20
Graphics View 2-21
List View 2-26
Project Panel 2-16
Project Viewer 2-17
Step Viewer 2-19
Conventions
Parsing Customization 7-21
Country Settings 4-7
Create Common
Decision Routines 12-12
Create Common Utility 12-3
Common Fields 12-6
Commonization 12-3
D
Data Browser 3-3
E
Encoding
DDL 2-42
Error Report
Field and Pattern Lists 11-18
Exporting projects 14-7
F
Field and Pattern Lists
Error Report 11-18
Field Files
Relationship Linker 10-21
Field List Editor
Comparison Routine 11-14
Description 11-14
Field Name 11-14
Propagation Routine 11-14
Routine Modifier 11-14
Score 11-14
Field Name
DDL Editor 3-10
Field Scanning 5-10
Field Selection
Data Browser 3-4
Field Settings
File Display Utility 16-8
Fields
Data Reconstructor 13-4
Function
Attribute Modifiers 7-11
G
Gender
Attribute Modifiers 7-11
Global Data Router 4-3
Country Rules file 4-6
Country Settings 4-7
DDL Settings 4-9
Fields Settings 4-9
Global Geography Table 4-6
Global Rules file 4-6
NOMATCH file 4-4
Rules Files 4-6
Separate Output 4-3
Single Output 4-4
Global Geography Table 4-6
Grade Pattern Editor
Category 11-14
Field Name Columns 11-15
Pattern ID 11-14
Graphics View
Control Center
Data Flow Architect 2-21
H
Help
Control Center 2-8
HIRAGANASTOL operator 5-30
How to Use Operators for Asian
Characters 5-33
HYPHEN
Intrinsic Attributes
Parsing Customization 7-17
I
IF Statements
Data Reconstructor 13-10
IF/ELSE
Data Reconstructor 13-3
Import projects
Windows to Unix 14-10
Importing projects 14-7, 14-9
Inner Key
File Display Utility 16-3
Insert
Parsing Customization 7-7
Intrinsic Attributes
Parsing Customization 7-17
J
JKANATOROMAN operator 5-29
JROMANTOKANA operator 5-29
K
KTOROMAN operator 5-29
L
Length
DDL Editor 3-10
Line Definitions 6-19
Line Lengths
Parsing Customization 7-21
Line Type
Geography 7-8
Miscellaneous 7-8
Name 7-8
Street 7-8
Line Types
Parsing Customization 7-8
Linking File
Relationship Linker 10-18, 10-25
List View
Control Center
Data Flow Architect 2-26
literal data string
Frequency Count Utility 16-17
Literal Values
Data Reconstructor 13-4
M
MALINK
Resolve Utility 16-27
Mask 5-17
Transformer 5-17
mask shapes
Frequency Count Utility 16-17
Masks
Parsing Customization 7-6
Recodes
Parsing Customization 7-12
master file
File Update Utility 16-10
Match Key
File Update Utility 16-10, 16-15
Set Selection Utility 16-31
Match Key Level Settings
Create Common Utility 12-6
Match Level Codes
Postal Matchers 9-15
Match Master Duplicate File
File Update Utility 16-13
Match Master File
N
Name and Address Format
project 2-13
NUMERIC
Intrinsic Attributes
Parsing Customization 7-18
O
Operations
Parsing Customization 7-6
Operators
Data Reconstructor 13-6
Operators for Asian Characters 5-28
Outer Key
File Display Utility 16-3
P
Parser Customization Editor
Q
Quotation Marks
Parsing Customization 7-21
R
Real-Time Processing 14-11
Director 14-11
Recode
Attribute Modifiers 7-12
Record Length
DDL Editor 3-10
Record Name
DDL Editor 3-10
Redef
DDL Editor 3-10
Redefine
DDL 2-40
reference file
Relationship Linker 10-24
Reference Level1 Number
Relationship Linker 10-26
Reference Level2 Number
Relationship Linker 10-26
Reference Linking
Relationship Linker 10-13, 10-24
Reference Record ID
Relationship Linker 10-26
Relationship Linker 10-3
Business level 10-13
Comparison Routines 10-13
Consumer level 10-13
Field Files 10-21
Pattern Files 10-21
Reference File 10-24
Reference Level1 Number 10-26
Reference Level2 Number 10-26
Reference Linking 10-13, 10-24
Reference Record ID 10-26
Window Key 10-3
Window Linking 10-13, 10-18
Window Size 10-22
Relationship Linker Results Analyzer
11-2, 11-3
fields to display 11-7
matched records 11-5
records to display 11-9
suspect records 11-5
Relationship Linker Rule Editor 11-12
Field List Editor 11-13, 11-14
Grade Pattern Editor 11-13, 11-14
Reserved Words
Data Reconstructor 13-5
Resolve Utility 16-27
MALINK 16-27
multi-linking 16-27
transitivity 16-27
Review Codes
Business Data Parser 6-36
Customer Data Parser 6-25
Review Groups
Customer Data Parser 6-26
ROMAJITOHIRAGANA or RTH 5-29
Round Robin Keys
Merge Split Utility 16-24
Round Robin Number
Merge Split Utility 16-24
Routine Modifier
Comparison Routine 11-21
Rule Script Language
Data Reconstructor 13-4
Rules File
Data Reconstructor 13-22
rules file
Data Reconstructor 13-3
Rules Files 4-6
S
Save view
Data Browser 3-6
Score
Data Comparison Calculator 11-21
Select and Bypass Records
Data Reconstructor 13-30
Select or Bypass Records 5-37
Select/Bypass Records
Logic Builder 5-37
Set Selection Utility 16-30
Sort Fields
Sort Utility 9-5
Sort Utility 16-33
.srt 9-2
Collating sequence 9-5
for Postal Matchers 9-2
JUST_DUPS 9-7
KEEP_ALL 9-6
KEEP_NONE 9-6
KEEP_ONE 9-6
Sort Fields 9-5
Source Identification
Transformer 13-31
Special Entries
Parsing Customization 7-14
Split a File
Merge Split Utility 16-23
split rules
Merge Split Utility 16-19
Standard Definitions Table
Parsing Customization 7-3
Standardization
TS Quality Analyzer 8-4
Start Pos
DDL Editor 3-10
Step Viewer
Control Center 2-19
Sub-tokens
Parsing Customization 7-5
Survivor record
Create Common Utility 12-8
Survivorship
Create Common Utility 12-3
Synonym
Parsing Customization 7-12
Syntax
Command line execution 15-3
Syntax of Definitions
Parsing Customization 7-4
T
Table Recoding 5-17
Title
File Display Utility 16-5
Tokens
Parsing Customization 7-4
transaction file
File Update Utility 16-10
Transformer 5-2
Character Translation 5-9
Field Scanning 5-10
File Trace Key 5-38
hex conversion 5-9
Source Identification 13-31
Table Recoding 5-17
transitivity
Resolve Utility 16-27
Trillium Types A-6
TS Discovery 3-12
TS Quality Analyzer 8-3
Cleansing 8-4
Master Database 8-8
Matching 8-4, 8-8
Standardization 8-4
Type
DDL Editor 3-10
U
Underscores
in city name changes 7-14
Unmatch Master File
File Update Utility 16-14
Update ORIGINAL_RECORD Length
DDL Editor 3-10
US City Problems
Parsing Customization 7-34
User Rule
Data Reconstructor 13-22
User-Defined Attributes
Parsing Customization 7-10
Using Multiple Input Files to Create an
Output DDL 5-7
V
View input data
Data Browser 3-3
W
Window Key
Sort 10-10
Window Key Field 10-7
Window Key Generator 10-3
Window Key Rule 10-3
Window Keys 10-3
Window Key Rule