Data Distribution, Hashing, and Index Access

Module 2: Data Distribution, Hashing, and Index Access
After completing this module, you will be able to:
- Describe the data distribution form and method.
- Describe hashing.
- Describe Primary Index hash mapping.
- Describe the reconfiguration process.
- Describe the data block layout.
- Describe File System read access.
Data Distribution
Data distribution is dependent on the hash value of the primary index.
[Figure: Records arrive in random sequence (2 32 67 12 90 6 54 75 18 25 80 41), as EBCDIC from a host or as ASCII from other clients. The Parsing Engine(s) convert and hash each record, the Message Passing Layer distributes the rows to AMP 1 through AMP 4, and each AMP formats and stores the rows it receives.]
Hashing
The Hashing Algorithm creates a fixed-length value from an input string of any length.
- Input to the algorithm is the Primary Index (PI) value of a row.
- The output from the algorithm is the Row Hash:
  - A 32-bit binary value.
  - The logical storage location of the row.
  - Used to identify the AMP of the row.
  - Table ID + Row Hash is used to locate the Cylinder and Data Block.
  - Used for distribution, placement, and retrieval of the row.
- Row Hash uniqueness depends directly on PI uniqueness.
- Good data distribution depends directly on Row Hash uniqueness.
- The algorithm produces random, but consistent, Row Hashes.
- The same PI value and data type combination always hash identically.
- Rows with the same Row Hash always go to the same AMP.
- Different PI values rarely produce the same Row Hash (collisions).
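The properties above can be illustrated with a stand-in. The sketch below uses `zlib.crc32` in place of the Teradata Hashing Algorithm (it does not reproduce Teradata's values) because it also turns input of any length into a fixed 32-bit value, and then places the twelve records from the Data Distribution figure on four AMPs. The helper names `toy_row_hash` and `toy_amp`, and the modulo placement rule, are hypothetical; real placement goes through the Hash Maps described later in this module, so the AMP assignments will not match the figure.

```python
import zlib
from typing import Dict, List


def toy_row_hash(pi_value: str) -> int:
    """Stand-in for the Hashing Algorithm (NOT Teradata's): input of any
    length becomes a fixed 32-bit value, identical for identical input."""
    return zlib.crc32(pi_value.encode("utf-8"))


def toy_amp(row_hash: int, amp_count: int = 4) -> int:
    """Hypothetical placement rule: first 16 bits of the hash, modulo the
    AMP count (AMPs numbered from 0). Real systems use a Hash Map lookup."""
    return (row_hash >> 16) % amp_count


records = [2, 32, 67, 12, 90, 6, 54, 75, 18, 25, 80, 41]  # PI values from the figure

placement: Dict[int, List[int]] = {}
for pi in records:
    amp = toy_amp(toy_row_hash(str(pi)))
    placement.setdefault(amp, []).append(pi)

for amp in sorted(placement):
    print(f"AMP {amp}: {placement[amp]}")
```

Because the hash is deterministic, running the placement again sends every PI value to the same AMP, which is the "random, but consistent" property listed above.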
Hash Related Expressions

The SQL hash functions are:
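- HASHROW (column(s)): returns the Row Hash of the column value(s).
- HASHBUCKET (hashrow): returns the hash bucket number for a Row Hash.
- HASHAMP (hashbucket): returns the primary AMP for a hash bucket.
- HASHBAKAMP (hashbucket): returns the fallback AMP for a hash bucket.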
Example 1:
SELECT HASHROW ('Teradata')                                AS "Hash Value"
      ,HASHBUCKET (HASHROW ('Teradata'))                   AS "Bucket Num"
      ,HASHAMP (HASHBUCKET (HASHROW ('Teradata')))         AS "AMP Num"
      ,HASHBAKAMP (HASHBUCKET (HASHROW ('Teradata')))      AS "AMP Fallback Num";

Hash Value  Bucket Num  AMP Num  AMP Fallback Num
F66DE2DC    63085       2        3

Example 2:
SELECT HASHROW ('Teradata')   AS "Hash Value 1"
      ,HASHROW ('Teradata ')  AS "Hash Value 2"
      ,HASHROW (' Teradata')  AS "Hash Value 3";

Hash Value 1  Hash Value 2  Hash Value 3
F66DE2DC      F66DE2DC      53F30AB4

A trailing space does not change the Row Hash; a leading space does.
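With the 16-bit DSW used throughout this module, the bucket number in Example 1 is simply the first 16 bits of the Row Hash: 0xF66D is 63085. The sketch below reproduces that step. The two one-entry dictionaries are hypothetical stand-ins that only copy the AMP numbers from Example 1's output, since real Hash Map contents depend on the system configuration.

```python
def hash_bucket(row_hash: int) -> int:
    """Bucket number = first (high-order) 16 bits of the 32-bit Row Hash."""
    return (row_hash >> 16) & 0xFFFF


# Hypothetical map entries for this one bucket, copied from Example 1's output.
primary_map = {63085: 2}
fallback_map = {63085: 3}

row_hash = 0xF66DE2DC            # HASHROW('Teradata')
bucket = hash_bucket(row_hash)   # HASHBUCKET(...)
print(f"{row_hash:08X} -> bucket {bucket}")   # F66DE2DC -> bucket 63085
print("AMP", primary_map[bucket], "fallback AMP", fallback_map[bucket])
```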
Hashing Numeric Data Types

The Hashing Algorithm hashes the same numeric value in different numeric data types to the same hash value. The following data types hash the same: BYTEINT, SMALLINT, INTEGER, DECIMAL(x,0), and DATE.
Example:
CREATE TABLE tableA
  (c1_bint   BYTEINT
  ,c2_sint   SMALLINT
  ,c3_int    INTEGER
  ,c4_dec    DECIMAL(8,0)
  ,c5_dec2   DECIMAL(8,2)
  ,c6_float  FLOAT
  ,c7_char   CHAR(10))
UNIQUE PRIMARY INDEX (c1_bint, c2_sint);

INSERT INTO tableA VALUES (5, 5, 5, 5, 5, 5, '5');

SELECT HASHROW (c1_bint)   AS "Hash Byteint"
      ,HASHROW (c2_sint)   AS "Hash Smallint"
      ,HASHROW (c3_int)    AS "Hash Integer"
      ,HASHROW (c4_dec)    AS "Hash Dec80"
      ,HASHROW (c5_dec2)   AS "Hash Dec82"
      ,HASHROW (c6_float)  AS "Hash Float"
      ,HASHROW (c7_char)   AS "Hash Char"
FROM tableA;

Output from SELECT:
Hash Byteint  Hash Smallint  Hash Integer  Hash Dec80  Hash Dec82  Hash Float  Hash Char
609D1715      609D1715       609D1715      609D1715    BD810459    E40FE360    334EC17C
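One way to picture this result is to assume that exact numeric values are reduced to their stored integer form before hashing. Under that model, 5 as BYTEINT, SMALLINT, INTEGER, or DECIMAL(8,0) is the integer 5, while DECIMAL(8,2) stores 5.00 as the scaled integer 500 and therefore hashes differently; a DATE is stored as (year - 1900) * 10000 + month * 100 + day. The sketch below encodes that model with `zlib.crc32` as a stand-in hash. The normalization is an illustrative assumption, and the sketch does not reproduce Teradata's actual hash values.

```python
import datetime
import zlib
from decimal import Decimal


def stored_integer(value, scale: int = 0) -> int:
    """Model (assumption): exact numerics hash on their stored integer form.
    DECIMAL(p, s) is the value scaled by 10**s; DATE is yyyymmdd - 19000000."""
    if isinstance(value, datetime.date):
        return (value.year - 1900) * 10000 + value.month * 100 + value.day
    return int(Decimal(value) * (10 ** scale))


def toy_hash(n: int) -> int:
    # Stand-in hash over an 8-byte signed integer; not Teradata's algorithm.
    return zlib.crc32(n.to_bytes(8, "big", signed=True))


# (value, scale) for the value 5 in each column type of tableA
columns = {
    "BYTEINT": (5, 0), "SMALLINT": (5, 0), "INTEGER": (5, 0),
    "DECIMAL(8,0)": (5, 0), "DECIMAL(8,2)": (5, 2),
}
hashes = {name: toy_hash(stored_integer(v, s)) for name, (v, s) in columns.items()}
for name, h in hashes.items():
    print(f"{name:13} {h:08X}")
```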
Multi-Column Hashing

The Hashing Algorithm uses multiplication and addition to create the hash value for a multi-column index. Assume PI = (A, B)
[Hash(A) * Hash(B)] + [Hash(A) + Hash(B)] = [Hash(B) * Hash(A)] + [Hash(B) + Hash(A)]
Example: A PI of (3, 5) will hash the same as a PI of (5, 3) if c1 and c2 are equivalent data types.
CREATE TABLE tableB
  (c1_int  INTEGER
  ,c2_dec  DECIMAL(8,0))
UNIQUE PRIMARY INDEX (c1_int, c2_dec);

INSERT INTO tableB VALUES (5, 3);
INSERT INTO tableB VALUES (3, 5);

SELECT c1_int AS c1
      ,c2_dec AS c2
      ,HASHROW (c1_int)          AS "Hash c1"
      ,HASHROW (c2_dec)          AS "Hash c2"
      ,HASHROW (c1_int, c2_dec)  AS "Hash c1c2"
FROM tableB;

*** Query completed. 2 rows found. 5 columns returned.

c1  c2  Hash c1   Hash c2   Hash c1c2
 5   3  609D1715  6D27DAA6  6C964A82
 3   5  6D27DAA6  609D1715  6C964A82

These two rows will hash the same and will produce a hash synonym.
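The formula above is symmetric in A and B, which is exactly why (3, 5) and (5, 3) collide when both columns hash alike. The sketch evaluates the slide's formula using the single-column hashes from the output above. Truncating to 32 bits is an assumption (the slide does not say how overflow is handled), and the sketch does not attempt to reproduce the value 6C964A82 that Teradata returns; it demonstrates only the symmetry. With DECIMAL(8,2), the two rows feed four different input hashes into the formula, so symmetry no longer forces a match.

```python
MASK32 = 0xFFFFFFFF


def combine(hash_a: int, hash_b: int) -> int:
    """Slide formula [H(A) * H(B)] + [H(A) + H(B)], truncated to 32 bits (assumption)."""
    return ((hash_a * hash_b) + (hash_a + hash_b)) & MASK32


h5 = 0x609D1715   # HASHROW of 5 (INTEGER / DECIMAL(8,0)) from the output above
h3 = 0x6D27DAA6   # HASHROW of 3

print(f"PI (5, 3): {combine(h5, h3):08X}")
print(f"PI (3, 5): {combine(h3, h5):08X}")   # same value: the formula is symmetric
```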
Multi-Column Hashing (cont.)
A PI of (3, 5) will hash differently than a PI of (5, 3) if the two columns are not equivalent data types. Example, with tableB re-created (after dropping the earlier version) so that c2 is DECIMAL(8,2):
CREATE TABLE tableB
  (c1_int  INTEGER
  ,c2_dec  DECIMAL(8,2))
UNIQUE PRIMARY INDEX (c1_int, c2_dec);

INSERT INTO tableB VALUES (5, 3);
INSERT INTO tableB VALUES (3, 5);

SELECT c1_int AS c1
      ,c2_dec AS c2
      ,HASHROW (c1_int)          AS "Hash c1"
      ,HASHROW (c2_dec)          AS "Hash c2"
      ,HASHROW (c1_int, c2_dec)  AS "Hash c1c2"
FROM tableB;

*** Query completed. 2 rows found. 5 columns returned.

c1  c2    Hash c1   Hash c2   Hash c1c2
 5  3.00  609D1715  A4E56902  0E452DAE
 3  5.00  6D27DAA6  BD810459  336B8C96

These two rows will not hash the same and probably will not produce a hash synonym.
Additional Hash Examples

A NULL value for numeric data types is treated as 0. Upper and lower case characters hash the same.
Example:
CREATE TABLE tableD
  (c1_int   INTEGER
  ,c2_int   INTEGER
  ,c3_char  CHAR(4)
  ,c4_char  CHAR(4))
UNIQUE PRIMARY INDEX (c1_int, c2_int);

INSERT INTO tableD VALUES (0, NULL, 'EDUC', 'Educ');

SELECT HASHROW (c1_int)   AS "Hash c1"
      ,HASHROW (c2_int)   AS "Hash c2"
      ,HASHROW (c3_char)  AS "Hash c3"
      ,HASHROW (c4_char)  AS "Hash c4"
FROM tableD;

Result:
Hash c1  00000000  (hash of 0)
Hash c2  00000000  (hash of NULL)
Hash c3  34D30C52  (hash of 'EDUC')
Hash c4  34D30C52  (hash of 'Educ')
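The behaviors shown so far for character and NULL values (trailing spaces ignored in Example 2, upper and lower case hashing alike, NULL hashing like 0) can be modeled as a normalization step in front of the hash. The sketch below is that model, with `zlib.crc32` as a stand-in hash for non-zero values. The normalization rules are inferred from the examples in this module rather than taken from a Teradata specification, and the function names are hypothetical.

```python
import zlib
from typing import Optional


def hash_numeric(value: Optional[int]) -> int:
    """NULL is treated as 0, and 0 hashes to 00000000, as in the result above.
    Other values use a stand-in hash, not Teradata's algorithm."""
    n = 0 if value is None else value
    if n == 0:
        return 0
    return zlib.crc32(n.to_bytes(8, "big", signed=True))


def normalize_char(value: str) -> str:
    """Model inferred from the examples: drop trailing pad spaces and fold
    case before hashing; leading spaces are kept."""
    return value.rstrip(" ").upper()


def hash_char(value: str) -> int:
    return zlib.crc32(normalize_char(value).encode("utf-8"))


print(f"{hash_numeric(0):08X} {hash_numeric(None):08X}")   # 00000000 00000000
print(hash_char("EDUC") == hash_char("Educ"))              # True
print(hash_char("Teradata") == hash_char("Teradata "))     # True
print(repr(normalize_char(" Teradata")))                   # ' TERADATA'
```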
Using Hash Functions to View Distribution

Hash functions can be used to calculate the impact of NUPI duplicates and synonyms for a PI.

SELECT HASHROW (Last_Name, First_Name) AS "Hash Value"
      ,COUNT(*)
FROM customer
GROUP BY 1
ORDER BY 2 DESC;

Hash Value  Count(*)
2D7975A8    12
14840BD7     7
(output cut due to length)
E7A4D910     1
AAD4DC80     1

The largest number of NUPI duplicates or synonyms is 12.

SELECT HASHAMP (HASHBUCKET (HASHROW (Last_Name, First_Name))) AS "AMP #"
      ,COUNT(*)
FROM customer
GROUP BY 1
ORDER BY 2 DESC;

AMP #  Count(*)
  7     929
  6     916
  4     899
  5     891
  2     864
  3     864
  1     833
  0     821

AMP #7 has the largest number of rows.
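The AMP-level query above is effectively a histogram. Given the eight counts it returned, a few lines of Python confirm the conclusion and quantify how uneven the spread is. Comparing the busiest AMP with the average is one common way to express unevenness; it is used here as an illustrative choice, not a Teradata-defined metric.

```python
# (AMP #, COUNT(*)) pairs from the HASHAMP query output above
amp_counts = {7: 929, 6: 916, 4: 899, 5: 891, 2: 864, 3: 864, 1: 833, 0: 821}

total = sum(amp_counts.values())
average = total / len(amp_counts)
busiest = max(amp_counts, key=amp_counts.get)

print(f"total rows: {total}")                     # 7017
print(f"average per AMP: {average:.1f}")          # 877.1
print(f"busiest AMP: {busiest} ({amp_counts[busiest]} rows)")
print(f"busiest / average: {amp_counts[busiest] / average:.3f}")
```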
Primary Index Hash Mapping

The Primary Index value for a row is input to the Hashing Algorithm, which produces the Row Hash (32 bits).
Row Hash (32 bits) = DSW (first 16 bits) + remaining 16 bits, where DSW = Destination Selection Word.
The DSW selects one entry in the Hash Map (65,536 entries, memory resident). That entry names the AMP, and the Message Passing Layer (PDE and BYNET) delivers the row to it (AMP 0 through AMP 9 in the figure).
Hash Maps

Hash Maps are the mechanism for determining which AMP gets a row. There are four (4) Hash Maps on every TPA node, and they are loaded into the PDE memory space of each TPA node when the PDE software boots:
- Current Configuration Primary
- Current Configuration Fallback
- Reconfiguration Primary
- Reconfiguration Fallback
Each Hash Map is an array of 65,536 entries and is approximately 128 KB in size.
The Communications Layer Interface (on each PE and AMP) checks all incoming messages against the designated Hash Map.
For a PI or USI operation, only the AMP whose number appears in the referenced Hash Map entry is interrupted.
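The 128 KB figure is consistent with two bytes per entry: 65,536 × 2 bytes = 131,072 bytes. The sketch below pictures a Hash Map as a flat array of 65,536 unsigned 16-bit entries (Python's `array` with type code `"H"`) indexed by the DSW. The round-robin fill and the 10-AMP configuration (mirroring AMP 0 to AMP 9 in the mapping figure) are hypothetical, since real map contents are generated by the system.

```python
from array import array

ENTRIES = 65_536   # one entry per possible 16-bit DSW
AMP_COUNT = 10     # hypothetical configuration (AMP 0 .. AMP 9)

# Hypothetical round-robin fill; a real map is produced by the system.
current_primary = array("H", (dsw % AMP_COUNT for dsw in range(ENTRIES)))

size_bytes = len(current_primary) * current_primary.itemsize
print(size_bytes, size_bytes // 1024, "KB")   # 131072 128 KB


def target_amp(row_hash: int, hash_map: array) -> int:
    """Use the DSW (first 16 bits of the Row Hash) as the index into the map."""
    return hash_map[row_hash >> 16]


print(target_amp(0x00231AB2, current_primary))   # DSW 0x0023 = 35 -> AMP 5
```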
Primary Hash Map

Row Hash (32 bits) = DSW (first 16 bits) + remaining 16 bits

PRIMARY HASH MAP (partial)
       000  001  002  003  004  005
  0     15   13   10   15   15   01
  1     14   14   10   15   04   00
  2     15   14   13   13   05   05
  3     15   10   14   14   07   04
  4     13   15   11   06   09   08
  5     14   08   11   08   06   10
  6     12   11   12   13   09   10
  7     14   11   12   14   07   05
  8     13   15   11   13   15   08
  9     15   09   11   13   15   08
  A     15   10   14   14   03   06
  B     12   12   12   14   08   09
  C     11   09   13   07   15   07
  D     12   09   14   08   15   06
  E     13   10   12   15   02   05
  F     14   13   12   07   06   11

Note: This partial hash map is associated with a 16 AMP System. The first three hex digits of the DSW select the column and the fourth hex digit selects the row; for example, DSW 0023 selects column 002, row 3 (AMP 14).
The first 16 bits of a Row Hash are the Destination Selection Word (DSW). The DSW points to one map and one entry within that map. The referenced Hash Map entry identifies the AMP for the row hash.
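To make the reading of the partial map concrete, the sketch stores the 16 rows shown above as a Python dictionary keyed by the fourth hex digit of the DSW, with one list position per value of the first three hex digits (000 to 005). Only these 96 DSWs are covered, so the lookup raises `IndexError` for any DSW above 005F. The dictionary layout is an illustration choice, not how PDE stores the map.

```python
from typing import Dict, List

# Partial 16-AMP primary hash map from the slide:
# key = fourth hex digit of the DSW, list index = first three hex digits (000..005).
PRIMARY_16: Dict[int, List[int]] = {
    0x0: [15, 13, 10, 15, 15, 1], 0x1: [14, 14, 10, 15, 4, 0],
    0x2: [15, 14, 13, 13, 5, 5], 0x3: [15, 10, 14, 14, 7, 4],
    0x4: [13, 15, 11, 6, 9, 8], 0x5: [14, 8, 11, 8, 6, 10],
    0x6: [12, 11, 12, 13, 9, 10], 0x7: [14, 11, 12, 14, 7, 5],
    0x8: [13, 15, 11, 13, 15, 8], 0x9: [15, 9, 11, 13, 15, 8],
    0xA: [15, 10, 14, 14, 3, 6], 0xB: [12, 12, 12, 14, 8, 9],
    0xC: [11, 9, 13, 7, 15, 7], 0xD: [12, 9, 14, 8, 15, 6],
    0xE: [13, 10, 12, 15, 2, 5], 0xF: [14, 13, 12, 7, 6, 11],
}


def lookup(hash_map: Dict[int, List[int]], row_hash: int) -> int:
    """Return the AMP number for a 32-bit Row Hash using only its DSW."""
    dsw = row_hash >> 16                  # first 16 bits of the Row Hash
    prefix, digit = dsw >> 4, dsw & 0xF   # column (000..005), row (0..F)
    return hash_map[digit][prefix]        # IndexError outside the partial map


print(lookup(PRIMARY_16, 0x00231AB2))   # 14
```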
Hash Maps for Different Systems

Row Hash (32 bits) = DSW (first 16 bits) + remaining 16 bits

PRIMARY HASH MAP 8 AMP System (partial)
       000  001  002  003  004  005
  0     07   07   01   07   04   01
  1     06   07   00   06   04   00
  2     07   02   05   03   05   05
  3     06   04   05   03   07   04
  4     07   01   03   06   05   03
  5     04   00   02   06   06   02
  6     05   05   04   02   07   06
  7     06   04   03   02   07   05
  8     05   03   01   01   03   01
  9     05   02   00   00   02   00
  A     06   03   06   01   03   06
  B     06   05   02   00   04   05
  C     07   01   04   07   01   07
  D     07   00   04   07   00   06
  E     03   02   01   05   02   05
  F     04   06   00   07   06   07

PRIMARY HASH MAP 16 AMP System: the same partial 16-AMP primary hash map shown under Primary Hash Map above.

Assume a row hash of 0023 1AB2. The DSW 0023 selects column 002, row 3:
- 8 AMP system: AMP 05
- 16 AMP system: AMP 14
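Placing the two partial maps side by side shows that the same Row Hash lands on a different AMP depending on the configuration. It also shows a property you would expect if the 16-AMP map had been derived from the 8-AMP map by adding AMPs 8 to 15 (the slide does not say so; this is an assumption): every entry either keeps its old AMP or moves to one of the new AMPs. The dictionary layout mirrors the tables above and is an illustration choice.

```python
from typing import Dict, List

HashMap = Dict[int, List[int]]   # fourth DSW hex digit -> AMPs for columns 000..005

PRIMARY_8: HashMap = {
    0x0: [7, 7, 1, 7, 4, 1], 0x1: [6, 7, 0, 6, 4, 0],
    0x2: [7, 2, 5, 3, 5, 5], 0x3: [6, 4, 5, 3, 7, 4],
    0x4: [7, 1, 3, 6, 5, 3], 0x5: [4, 0, 2, 6, 6, 2],
    0x6: [5, 5, 4, 2, 7, 6], 0x7: [6, 4, 3, 2, 7, 5],
    0x8: [5, 3, 1, 1, 3, 1], 0x9: [5, 2, 0, 0, 2, 0],
    0xA: [6, 3, 6, 1, 3, 6], 0xB: [6, 5, 2, 0, 4, 5],
    0xC: [7, 1, 4, 7, 1, 7], 0xD: [7, 0, 4, 7, 0, 6],
    0xE: [3, 2, 1, 5, 2, 5], 0xF: [4, 6, 0, 7, 6, 7],
}
PRIMARY_16: HashMap = {
    0x0: [15, 13, 10, 15, 15, 1], 0x1: [14, 14, 10, 15, 4, 0],
    0x2: [15, 14, 13, 13, 5, 5], 0x3: [15, 10, 14, 14, 7, 4],
    0x4: [13, 15, 11, 6, 9, 8], 0x5: [14, 8, 11, 8, 6, 10],
    0x6: [12, 11, 12, 13, 9, 10], 0x7: [14, 11, 12, 14, 7, 5],
    0x8: [13, 15, 11, 13, 15, 8], 0x9: [15, 9, 11, 13, 15, 8],
    0xA: [15, 10, 14, 14, 3, 6], 0xB: [12, 12, 12, 14, 8, 9],
    0xC: [11, 9, 13, 7, 15, 7], 0xD: [12, 9, 14, 8, 15, 6],
    0xE: [13, 10, 12, 15, 2, 5], 0xF: [14, 13, 12, 7, 6, 11],
}


def lookup(hash_map: HashMap, row_hash: int) -> int:
    dsw = row_hash >> 16
    return hash_map[dsw & 0xF][dsw >> 4]


row_hash = 0x00231AB2
print("8 AMP system:", lookup(PRIMARY_8, row_hash))    # 5
print("16 AMP system:", lookup(PRIMARY_16, row_hash))  # 14

# Every entry either keeps its old AMP or moves to one of the new AMPs 8-15.
stayed = moved = 0
for digit in range(16):
    for old, new in zip(PRIMARY_8[digit], PRIMARY_16[digit]):
        if old == new:
            stayed += 1
        elif new >= 8:
            moved += 1
        else:
            raise ValueError(f"entry moved between old AMPs: {old} -> {new}")
print(f"{stayed} entries stayed, {moved} moved to new AMPs")
```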
Fallback Hash Map

Row Hash (32 bits) = DSW (first 16 bits) + remaining 16 bits

Note: 16 AMP System with 2 AMP clusters.

PRIMARY HASH MAP 16 AMP System: the same partial 16-AMP primary hash map shown under Primary Hash Map above.

FALLBACK HASH MAP 16 AMP System (partial)
       000  001  002  003  004  005
  0     07   05   02   07   07   09
  1     06   06   02   07   12   08
  2     07   06   05   05   13   13
  3     07   02   06   06   15   12
  4     05   07   03   14   01   00
  5     06   00   03   00   14   02
  6     04   03   04   05   01   02
  7     06   03   04   06   15   13
  8     05   07   03   05   07   00
  9     07   01   03   05   07   00
  A     07   02   06   06   11   14
  B     04   04   04   06   00   01
  C     03   01   05   15   07   15
  D     04   01   06   00   07   14
  E     05   02   04   07   10   13
  F     06   05   04   15   14   03

Assume a row hash of 0023 1AB2 (column 002, row 3 of each map): Primary AMP 14, Fallback AMP 06.
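The fallback lookup works exactly like the primary lookup, only against the other map. The sketch below repeats the lookup for the example Row Hash and also checks a pattern visible in this partial data: each fallback AMP is the primary AMP plus or minus 8, consistent with 2-AMP clusters (0, 8), (1, 9), ..., (7, 15). That pairing is an observation about these 96 entries, not a general rule for how clusters are assigned.

```python
from typing import Dict, List

HashMap = Dict[int, List[int]]   # fourth DSW hex digit -> AMPs for columns 000..005

PRIMARY_16: HashMap = {
    0x0: [15, 13, 10, 15, 15, 1], 0x1: [14, 14, 10, 15, 4, 0],
    0x2: [15, 14, 13, 13, 5, 5], 0x3: [15, 10, 14, 14, 7, 4],
    0x4: [13, 15, 11, 6, 9, 8], 0x5: [14, 8, 11, 8, 6, 10],
    0x6: [12, 11, 12, 13, 9, 10], 0x7: [14, 11, 12, 14, 7, 5],
    0x8: [13, 15, 11, 13, 15, 8], 0x9: [15, 9, 11, 13, 15, 8],
    0xA: [15, 10, 14, 14, 3, 6], 0xB: [12, 12, 12, 14, 8, 9],
    0xC: [11, 9, 13, 7, 15, 7], 0xD: [12, 9, 14, 8, 15, 6],
    0xE: [13, 10, 12, 15, 2, 5], 0xF: [14, 13, 12, 7, 6, 11],
}
FALLBACK_16: HashMap = {
    0x0: [7, 5, 2, 7, 7, 9], 0x1: [6, 6, 2, 7, 12, 8],
    0x2: [7, 6, 5, 5, 13, 13], 0x3: [7, 2, 6, 6, 15, 12],
    0x4: [5, 7, 3, 14, 1, 0], 0x5: [6, 0, 3, 0, 14, 2],
    0x6: [4, 3, 4, 5, 1, 2], 0x7: [6, 3, 4, 6, 15, 13],
    0x8: [5, 7, 3, 5, 7, 0], 0x9: [7, 1, 3, 5, 7, 0],
    0xA: [7, 2, 6, 6, 11, 14], 0xB: [4, 4, 4, 6, 0, 1],
    0xC: [3, 1, 5, 15, 7, 15], 0xD: [4, 1, 6, 0, 7, 14],
    0xE: [5, 2, 4, 7, 10, 13], 0xF: [6, 5, 4, 15, 14, 3],
}


def lookup(hash_map: HashMap, row_hash: int) -> int:
    dsw = row_hash >> 16
    return hash_map[dsw & 0xF][dsw >> 4]


row_hash = 0x00231AB2
print("primary AMP:", lookup(PRIMARY_16, row_hash))    # 14
print("fallback AMP:", lookup(FALLBACK_16, row_hash))  # 6

# Observed in this partial data: fallback AMP = primary AMP with bit 3 flipped.
pairs_ok = all(
    f == p ^ 8
    for digit in range(16)
    for p, f in zip(PRIMARY_16[digit], FALLBACK_16[digit])
)
print("fallback == primary XOR 8 for all shown entries:", pairs_ok)   # True
```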
Reconfiguration
Example: a system with 4 existing AMPs is expanded with 2 new AMPs.
- Before: each of the 4 existing AMPs owns 16,384 of the 65,536 Hash Map entries; the 2 new AMPs are empty.
- After: the 65,536 Hash Map entries are spread over 6 AMPs, four AMPs with 10,923 entries and two with 10,922.

The system creates new Hash Maps to accommodate the new configuration, and the old and new maps are compared.
Each AMP reads its rows and moves only those that hash to a new AMP.
It is not necessary to offload and reload data due to a reconfiguration.

Percentage of rows moved to new AMPs = Number of new AMPs / (Old AMPs + New AMPs) = 2 / 6 = 1 / 3 = 33.3%
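The entry counts and the 33.3% figure follow from simple arithmetic, reproduced below. The sketch assumes, as the figure implies, that entries are spread as evenly as possible and that rows are spread evenly across Hash Map entries. The helper names are hypothetical, and the even split says nothing about which specific entries a real reconfiguration moves.

```python
from fractions import Fraction
from typing import List

TOTAL_ENTRIES = 65_536


def entries_per_amp(amp_count: int) -> List[int]:
    """Spread the Hash Map entries as evenly as possible across amp_count AMPs."""
    base, extra = divmod(TOTAL_ENTRIES, amp_count)
    return [base + 1] * extra + [base] * (amp_count - extra)


def fraction_moved(old_amps: int, new_amps: int) -> Fraction:
    """Share of rows expected to move: new AMPs / (old + new AMPs)."""
    return Fraction(new_amps, old_amps + new_amps)


print(entries_per_amp(4))   # [16384, 16384, 16384, 16384]
print(entries_per_amp(6))   # [10923, 10923, 10923, 10923, 10922, 10922]
moved = fraction_moved(4, 2)
print(moved, f"{float(moved):.1%}")   # 1/3 33.3%
```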
Row Retrieval via PI Value Overview

SELECT * FROM tablename WHERE primaryindex = value(s);
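The Parsing Engine's Parser passes the Primary Index value(s) from the SQL request to the Hashing Algorithm, which returns a 32-bit Row Hash.
The PE sends the request, identified by the 48-bit Table ID, the Row Hash, and the index value, through the Message Passing Layer, which uses the DSW to select the destination AMP.
Only the AMP whose number appears in the referenced Hash Map entry is interrupted.
On that AMP, the File System uses a logical block identifier to locate the Data Block on the Vdisk and a logical row identifier to locate the row within it.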
Names and Object IDs

DBC.Next (1 row): NEXT DATABASE ID, NEXT TVM ID, and 4 other counters.
- Each Database/User/Profile/Role is assigned a globally unique numeric ID.
- Each Table, View, Macro, Trigger, Stored Procedure, Join Index, and Hash Index is assigned a globally unique numeric ID.
- Each Column is assigned a numeric ID unique within its Table ID.
- Each Index is assigned a numeric ID unique within its Table ID.

The Data Dictionary (DD) keeps track of all SQL names and their numeric IDs. The PE's Resolver uses the DD to verify names and convert them to IDs.
The AMPs use the numeric IDs supplied by the Resolver.
Table ID
A Unique Value for Tables, Views, Macros, Triggers, and Stored Procedures comes from the DBC.Next dictionary table. The Unique Value also defines the type of table: normal data table, permanent journal, global temporary, or spool file.

Table ID = UNIQUE VALUE (32 bits) + SUB-TABLE ID (16 bits)

The Sub-table ID identifies the part of a table the system is looking at (values shown in decimal):
  Table Header                              0
  Primary data copy                      1024
  Fallback data copy                     2048
  First secondary index primary copy     1028
  First secondary index fallback copy    2052
  Second secondary index primary copy    1032
  Second secondary index fallback copy   2056
  Third secondary index primary copy     1036
  and so on
Table ID plus Row ID makes every row in the system unique. Examples shown in this manual use the Unique Value to represent the entire Table ID.
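The Sub-table IDs in the list above follow a regular pattern: secondary index n adds 4 × n to the primary (1024) or fallback (2048) base. The sketch below extrapolates that pattern and packs a 48-bit Table ID from the two fields. The bit ordering (Unique Value in the high 32 bits) is an assumption based on the field order in the figure, and the function names are hypothetical.

```python
TABLE_HEADER = 0
PRIMARY_DATA = 1024
FALLBACK_DATA = 2048


def secondary_index_subtable(index_number: int, fallback: bool = False) -> int:
    """Sub-table ID of the n-th secondary index (n = 1, 2, 3, ...), extrapolating
    the 1028/2052, 1032/2056, 1036, ... pattern from the list above."""
    if index_number < 1:
        raise ValueError("secondary indexes are numbered from 1")
    base = FALLBACK_DATA if fallback else PRIMARY_DATA
    return base + 4 * index_number


def table_id(unique_value: int, subtable_id: int) -> int:
    """Pack a 48-bit Table ID: 32-bit Unique Value followed by a 16-bit
    sub-table ID (bit order assumed from the field order in the figure)."""
    if not (0 <= unique_value < 2**32 and 0 <= subtable_id < 2**16):
        raise ValueError("field out of range")
    return (unique_value << 16) | subtable_id


print(f"{table_id(100, PRIMARY_DATA):012X}")   # 000000640400
print(secondary_index_subtable(2), secondary_index_subtable(2, fallback=True))   # 1032 2056
```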
Row ID
On INSERT, the system stores both the data values and the Row ID.
ROW ID = ROW HASH and UNIQUENESS VALUE
Row Hash: The Row Hash is based on the Primary Index value. Multiple rows in a table could have the same Row Hash; NUPI duplicates and hash synonyms have the same Row Hash.
Uniqueness Value: The system creates a numeric 32-bit Uniqueness Value. The first row for a Row Hash has a Uniqueness Value of 1, and additional rows have ascending Uniqueness Values.
- Row IDs determine sort sequence within a Data Block.
- Row IDs support Secondary Index performance.
- The Row ID makes every row within a table uniquely identifiable.
Duplicate Rows: Row ID uniqueness does not imply data uniqueness.
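Row ID assignment can be modeled in a few lines. The hypothetical `ToyTable` class below hands out Uniqueness Values per Row Hash starting at 1 and keeps rows sorted by (Row Hash, Uniqueness Value). The Row Hash values and employee numbers are borrowed from the data block example later in this module; keeping rows in a Python list is only an illustration of the ordering, not of the storage format.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

RowId = Tuple[int, int]   # (Row Hash, Uniqueness Value)


class ToyTable:
    """Hypothetical helper that assigns Row IDs as described above."""

    def __init__(self) -> None:
        self._next_uniq: DefaultDict[int, int] = defaultdict(lambda: 1)
        self.rows: List[Tuple[RowId, str]] = []

    def insert(self, row_hash: int, data: str) -> RowId:
        uniq = self._next_uniq[row_hash]       # first row for a Row Hash gets 1
        self._next_uniq[row_hash] = uniq + 1   # later rows get ascending values
        row_id = (row_hash, uniq)
        self.rows.append((row_id, data))
        self.rows.sort()                       # Row ID order = sort order in a block
        return row_id


table = ToyTable()
for row_hash, empno in [(1000, "1006"), (999, "2968"), (1000, "3755"), (999, "6324")]:
    table.insert(row_hash, empno)
table.insert(998, "4219")
table.insert(998, "4219")   # a duplicate row (MULTISET) still gets its own Row ID

for row_id, data in table.rows:
    print(row_id, data)
```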
AMP File System Locating a Row via PI

1. The PE sends the request (Table ID, Row Hash, PI value) to an AMP via the Message Passing Layer (PDE & BYNET).
2. The AMP accesses its Master Index (always memory resident). An entry in the Master Index identifies a Cylinder #, and the AMP accesses the Cylinder Index (frequently memory resident; accessed in FSG Cache).
3. An entry in the Cylinder Index identifies the Data Block. The Data Block is the physical I/O unit and may or may not be memory resident (accessed in FSG Cache).
4. A search of the Data Block locates the row(s).
Teradata File System Overview

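Master Index (AMP memory resident): CID, CID, CID, CID, ...
[Figure: The VDisk is divided into cylinders of 3872 sectors, each beginning with a Cylinder Index. In the first cylinder shown, the Cylinder Index holds SRD - A (DBD - A1, DBD - A2) and SRD - B (DBD - B1, DBD - B2), which describe Data Blocks A1, A2, B1, and B2. In the second, the Cylinder Index holds SRD - B (DBD - B3, DBD - B4, DBD - B5), which describe Data Blocks B3, B4, and B5.]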
Master Index Format

- Memory resident structure specific to each AMP.
- Contains Cylinder Index Descriptors (CID), one for each allocated Cylinder.
- Each CID identifies the lowest Table ID / Part# / Row ID and the last Table ID / Part# / Row Hash for a cylinder.
- The range of Table ID / Part# / Row IDs for a cylinder does not overlap with any other cylinder.
- Sorted list of CIDs.
[Figure: Cylinder 0, Segment 0 of the Vdisk holds the FIB, which contains the Free Cylinder List. The Master Index is a sorted list CID 1, CID 2, CID 3, ..., CID n; each CID points to a cylinder whose Cylinder Index (CI) sits at the start of the cylinder.]

V2R5 Notes:
The Master Index and Cylinder Index entries are 4 bytes larger to include the Partition #s to support Partitioned Primary Index (PPI) tables.
For non-partitioned (NPPI) tables, the partition number is 0, and the Master and Cylinder Index entries use 0 as the partition number.
Cylinder Index Format

- Located at the beginning of each Cylinder.
- There is one SRD (Subtable Reference Descriptor) for each subtable that has data blocks on the cylinder.
- Each SRD references a set of DBD(s). A DBD is a Data Block Descriptor.
- One DBD per data block identifies the location and the first Part# / Row ID and the last Part# / Row Hash within a block.
- FSE (Free Segment Entry) identifies free sectors.
[Figure: the Cylinder Index at the start of the cylinder lists SRD A (DBD A1, DBD A2, ...), SRD B (DBD B1, DBD B2, ...), and FSEs; the rest of the cylinder holds Data Blocks A1, A2, B1, and B2 and ranges of free sectors.]

V2R5 Note: Cylinder Index entries are 4 bytes larger to include the Partition #s.
Data Block Layout

- Contains rows with the same Table ID.
- Contains rows within the range of Row IDs of the associated DBD entry; the range of Row IDs does not overlap with any other data block.
- Logically sorted list of rows.

Variable block sizes:
  V2R2 (and before)              1 to 63 sectors (approx. 32 KB)
  V2R3 and V2R4                  1 to 127 sectors (approx. 64 KB)
  V2R4.1, V2R5.0, and V2R5.1     1 to 255 sectors (approx. 128 KB)

A row has a maximum size of 64,256 bytes with releases V2R3 through V2R5.1.
[Figure: block layout with a Header (36 bytes), Rows 1 to 4, the Row Reference Array (entries -3, -2, -1, 0), and a Trailer (2 bytes).]
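The approximate sizes above follow from 512-byte sectors (the sector size implied later in this module, where 1 to 255 sectors spans 512 bytes to 127.5 KB). The short calculation below converts each release's maximum sector count and also checks the maximum row size.

```python
SECTOR_BYTES = 512

max_sectors = {
    "V2R2 (and before)": 63,
    "V2R3 and V2R4": 127,
    "V2R4.1, V2R5.0, and V2R5.1": 255,
}

for release, sectors in max_sectors.items():
    size = sectors * SECTOR_BYTES
    print(f"{release}: {sectors} sectors = {size:,} bytes = {size / 1024} KB")

MAX_ROW_BYTES = 64_256
print(f"max row: {MAX_ROW_BYTES / 1024} KB")   # 62.75 KB
```

This prints 31.5 KB, 63.5 KB, and 127.5 KB for the three releases, which the table above rounds to approximately 32 KB, 64 KB, and 128 KB.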
General Row Layout

V2R4.1 and V2R5 tables with NPPI:
  Row Length                 2 bytes
  Row ID                     Row Hash (4 bytes) + Uniqueness Value (4 bytes)
  Additional Overhead        2 or more bytes
  Column Data Values         variable
  Row Ref. Array entry       2 bytes

The Primary Index value determines the Row Hash. The system generates the Uniqueness Value.
NPPI = Non-Partitioned Primary Index (the typical Teradata primary index). For an NPPI table, the Row ID is unique for every row in a table (for both SET and MULTISET tables).
Rows in a table may vary in length. The maximum row length is 64,256 bytes (or 62.75 KB).
In V2R5, if the Primary Index is not partitioned, the row is implicitly assumed to be in partition #0.
Partitioned Primary Indexes will be covered in another module.
Example of Locating a Row Master Index

Search key: Table ID = 100, Part # = 0, Row Hash = 1000 (empno = 3755)
SELECT * FROM employee WHERE empno = 3755;

Free Cylinder List, Pdisk 0:  :  124 125 168 170 183 189 201 217 220 347 702  :
Free Cylinder List, Pdisk 1:  :  761 780 895 896 914 935 941 1012 1234 1375 1520  :

Master Index:
            Lowest                        Highest                 Pdisk and
Table ID  Part #  Row ID      Table ID  Part #  Row Hash    Cylinder Number
   :                                                              :
  078       0     58234, 2      095       0      72194             204
  098       0     00107, 1      100       0      00676             037
  100       0     00773, 3      100       0      01361             169
  100       0     01361, 2      100       0      02884             777
  100       0     02937, 1      100       0      03602             802
  100       0     03662, 1      100       0      03999             117
  100       0     04123, 2      100       0      05888             888
  100       0     05974, 1      100       0      07328             753
  100       0     07353, 1      120       0      00469             477
  123       1     00343, 2      123       2      01864             529
  123       2     06923, 1      123       3      00231             943
   :                                                              :

Part # = Partition Number (V2R5).
The entry for cylinder 169 (Table ID 100, Row ID 00773, 3 through Row Hash 01361) contains Row Hash 1000, so the search continues to the Cylinder Index of cylinder 169.
What cylinder would have Table ID = 100, Row Hash = 00598?
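Because the CIDs are sorted and their ranges do not overlap, finding the cylinder is a binary search. The sketch below encodes the Master Index rows from the example and searches on (Table ID, Part #, Row Hash). It drops the Uniqueness Value from each lowest Row ID for simplicity, and it stops at the first matching cylinder even though a Row Hash can continue onto the next cylinder (01361 appears in both 169 and 777); both are simplifications. The `CID` tuple and `find_cylinder` names are hypothetical.

```python
from bisect import bisect_left
from typing import List, NamedTuple, Optional, Tuple

Key = Tuple[int, int, int]   # (Table ID, Part #, Row Hash)


class CID(NamedTuple):
    lowest: Key      # lowest Table ID / Part # / Row Hash on the cylinder
    highest: Key     # last Table ID / Part # / Row Hash on the cylinder
    cylinder: int    # Pdisk and Cylinder Number


MASTER_INDEX: List[CID] = [
    CID((78, 0, 58234), (95, 0, 72194), 204),
    CID((98, 0, 107), (100, 0, 676), 37),
    CID((100, 0, 773), (100, 0, 1361), 169),
    CID((100, 0, 1361), (100, 0, 2884), 777),
    CID((100, 0, 2937), (100, 0, 3602), 802),
    CID((100, 0, 3662), (100, 0, 3999), 117),
    CID((100, 0, 4123), (100, 0, 5888), 888),
    CID((100, 0, 5974), (100, 0, 7328), 753),
    CID((100, 0, 7353), (120, 0, 469), 477),
    CID((123, 1, 343), (123, 2, 1864), 529),
    CID((123, 2, 6923), (123, 3, 231), 943),
]
HIGHEST: List[Key] = [cid.highest for cid in MASTER_INDEX]


def find_cylinder(table_id: int, part: int, row_hash: int) -> Optional[int]:
    key = (table_id, part, row_hash)
    i = bisect_left(HIGHEST, key)   # first CID whose last key >= search key
    if i < len(MASTER_INDEX) and MASTER_INDEX[i].lowest <= key:
        return MASTER_INDEX[i].cylinder
    return None                     # key falls in a gap: no such row


print(f"{find_cylinder(100, 0, 1000):03d}")   # 169
```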
Example of Locating a Row Cylinder Index

Search key: Table ID = 100, Part # = 0, Row Hash = 1000 (empno = 3755)
SELECT * FROM employee WHERE empno = 3755;
Cylinder Index - Cylinder #169

SRDs:
          Table ID   First DBD Offset   DBD Count
SRD #1    100        FFFF               12

DBDs:
             Lowest              Highest
         Part #  Row ID      Part #  Row Hash   Start Sector  Sector Count  Row Count
   :
DBD #4     0     00867, 2      0      00902       1010            4            5
DBD #5     0     00938, 1      0      00996       0093            7           10
DBD #6     0     00998, 1      0      01010       0789            6            8
DBD #7     0     01010, 3      0      01177       0525            3            4
DBD #8     0     01185, 2      0      01258       0056            5            6
DBD #9     0     01290, 1      0      01333       1138            5            6
   :

Free Block List - Free Sector Entries:
Start Sector   Sector Count
   :
   0270             3
   0301             5
   0349             5
   0470             4
   0481             6
   0550             5
   :

This example assumes that only 1 table ID has rows on this cylinder and the table is not partitioned.
Part # = Partition Number (V2R5).
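The same binary-search idea applies within the Cylinder Index: the DBDs are sorted by Row ID, so the AMP can find the block whose range covers Row Hash 1000 and read its sectors. The sketch below encodes DBD #4 to #9 from cylinder 169 (Table ID 100, Part # 0, so those two fields are omitted) and locates DBD #6, sectors 789 to 794. The `DBD` tuple and `find_block` names are hypothetical.

```python
from bisect import bisect_left
from typing import List, NamedTuple, Optional, Tuple


class DBD(NamedTuple):
    number: int
    lowest: Tuple[int, int]   # (Row Hash, Uniqueness Value) of the first row
    highest_hash: int         # last Row Hash in the block
    start_sector: int
    sector_count: int
    row_count: int


# DBD #4..#9 of cylinder 169 (Table ID 100, Part # 0) from the example above.
DBDS: List[DBD] = [
    DBD(4, (867, 2), 902, 1010, 4, 5),
    DBD(5, (938, 1), 996, 93, 7, 10),
    DBD(6, (998, 1), 1010, 789, 6, 8),
    DBD(7, (1010, 3), 1177, 525, 3, 4),
    DBD(8, (1185, 2), 1258, 56, 5, 6),
    DBD(9, (1290, 1), 1333, 1138, 5, 6),
]
HIGHS = [d.highest_hash for d in DBDS]


def find_block(row_hash: int) -> Optional[DBD]:
    i = bisect_left(HIGHS, row_hash)   # first DBD whose last Row Hash >= target
    if i < len(DBDS) and DBDS[i].lowest[0] <= row_hash:
        return DBDS[i]
    return None                        # Row Hash falls between blocks


block = find_block(1000)
if block is not None:
    last = block.start_sector + block.sector_count - 1
    print(f"DBD #{block.number}: sectors {block.start_sector}-{last}, {block.row_count} rows")
```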
Example of Locating a Row Data Block

[Figure: the data block at sectors 789 to 794, with a 36-byte header, the row heap (Rows 1 to 8), the Row Reference Array, and a 2-byte trailer.]
- A block is the physical I/O unit.
- The block header contains the Table ID (6 bytes).
- Only rows for the same table reside in the same data block.
- Rows are not split across block boundaries.
- Blocks within a table vary in size; the system adjusts block sizes dynamically.
- Blocks may be from 512 bytes to 127.5 KB (1 to 255 disk sectors). With V2R3 and V2R4.0, the maximum block size is 127 sectors (63.5 KB).
- Data blocks are not chained together.
- Row Reference Array pointers are stored (sorted) in reverse sequence based on Row ID within the block.
Accessing the Row within the Data Block

Within the data block, the Row Reference Array is used to locate the first row with a matching Row Hash value within the block.
The Primary Index data value is used as a row qualifier to eliminate synonyms.

SELECT *
FROM   employee
WHERE  employee_number = 3755;

Search key: Index Value 3755, Row Hash 1000

Data Block (sectors 789 to 794):
Row Hash  Uniq Value  Index Value  Data Columns
   998        1          4219       Row data
   999        1          2968       Row data
   999        2          6324       Row data
  1000        1          1006       Row data
  1000        2          3755       Row data
  1002        1          6838       Row data
  1008        1          8825       Row data
  1010        1          0250       Row data

Two rows have Row Hash 1000. The PI value 3755 qualifies the row with Uniqueness Value 2; the row with index value 1006 is a hash synonym.
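The two-step search described above (locate the first row with the target Row Hash, then compare the PI value to skip synonyms) can be sketched with the rows from this example. A sorted Python list stands in for the Row Reference Array, which in a real block is a set of 2-byte pointers stored in reverse order; the `find_row` name is hypothetical. Employee number 0250 is written as 250 because Python integer literals cannot have leading zeros.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Row = Tuple[int, int, int]   # (Row Hash, Uniqueness Value, employee number)

# The 8 rows in sectors 789-794, in Row ID order.
BLOCK_ROWS: List[Row] = [
    (998, 1, 4219), (999, 1, 2968), (999, 2, 6324), (1000, 1, 1006),
    (1000, 2, 3755), (1002, 1, 6838), (1008, 1, 8825), (1010, 1, 250),
]


def find_row(row_hash: int, pi_value: int) -> Optional[Row]:
    # Binary search for the first row whose Row Hash matches ...
    i = bisect_left(BLOCK_ROWS, (row_hash, 0, 0))
    # ... then scan rows with that hash, using the PI value to skip synonyms.
    while i < len(BLOCK_ROWS) and BLOCK_ROWS[i][0] == row_hash:
        if BLOCK_ROWS[i][2] == pi_value:
            return BLOCK_ROWS[i]
        i += 1
    return None


print(find_row(1000, 3755))   # (1000, 2, 3755)
```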
AMP Read I/O Summary

The Master Index is always memory resident. The AMP reads the Cylinder Index if it is not memory resident, and reads the Data Block if it is not memory resident.
AMP memory, cache size, and locality of reference determine whether either of these steps requires physical I/O. Often the Cylinder Index is memory resident, and a Unique Primary Index retrieval requires only one (1) I/O.
[Figure: the request (Table ID, Row Hash, PI value) arrives via the Message Passing Layer; the Master Index is in AMP memory, the Cylinder Index and Data Block are accessed in FSG Cache, and the Vdisk holds the CI, the data block, and the row.]
Review Questions
1. The Row Hash for a PI value of 824 is the same for the data types of INTEGER and DECIMAL(18,0). True or False? _______
2. The first 16 bits of the Row Hash are referred to as the _________ or the _______ _________ .
3. The Hash Map consists of 65,536 entries which identify an _____ number for the Row Hash.
4. The Current Configuration ___________ Hash Map is used to locate the AMP to locate/store a row based on PI value.
5. The ____________ utility is used to redistribute rows to a new system configuration with more AMPs.
6. The Unique Value of a Table ID comes from the dictionary table named DBC.________ .
7. The Row ID consists of the _______ ________ and the __________ _____ .
8. The _______ _______ contains a Cylinder Index Descriptor (CID) for each allocated Cylinder.
9. The _______ _______ contains an entry for each data block in the cylinder.
10. The ____ __________ ________ consists of a set of 2-byte pointers to the data rows in the data block.
11. For Teradata V2R5.0, the maximum block size is approximately _______ and the maximum row size is approximately _______ .
12. The Primary Index data value is used as a row qualifier to eliminate hash _____________ .
Module 2: Review Question Answers

1. The Row Hash for a PI value of 824 is the same for the data types of INTEGER and DECIMAL(18,0). True or False? True
2. The first 16 bits of the Row Hash are referred to as the DSW or the bucket number.
3. The Hash Map consists of 65,536 entries which identify an AMP number for the Row Hash.
4. The Current Configuration Primary Hash Map is used to locate the AMP to locate/store a row based on PI value.
5. The Reconfig utility is used to redistribute rows to a new system configuration with more AMPs.
6. The Unique Value of a Table ID comes from the dictionary table named DBC.Next.
7. The Row ID consists of the Row Hash and the Uniqueness Value.
8. The Master Index contains a Cylinder Index Descriptor (CID) for each allocated Cylinder.
9. The Cylinder Index contains an entry for each data block in the cylinder.
10. The Row Reference Array consists of a set of 2-byte pointers to the data rows in the data block.
11. For Teradata V2R5.0, the maximum block size is approximately 128 KB and the maximum row size is approximately 64 KB.
12. The Primary Index data value is used as a row qualifier to eliminate hash synonyms or collisions.
