M.S.Prasad
165916
Contents
Introduction to Teradata
Teradata Architecture
Data Distribution
PI characteristics
Data Access
Teradata's scalability
Data Protection features
Introduction to Teradata
Teradata is a Relational Database Management System (RDBMS):
1. Designed to run the world's largest commercial databases.
2. Preferred solution for enterprise data warehousing (OLAP).
3. Executes on UNIX MP-RAS or NT-based system platforms.
4. Compliant with ANSI industry standards.
5. Runs on single (SMP) or multiple (MPP) nodes.
6. Acts as a database server to client applications throughout the enterprise.
7. Uses parallelism to manage terabytes of data.
8. Built on a Shared-Nothing architecture.
Advantage Teradata
1. Unlimited, proven scalability
2. Unlimited parallelism - parallel sorts/aggregations, temporary tables, Shared-Nothing architecture
3. Mature optimizer - complex queries, joins per query, ad-hoc processing. It is a cost-based optimizer.
4. Model the business - 3NF, robust view processing, star schema
5. Lowest TCO - ease of setup and maintenance, robust parallel utilities, no re-orgs, lowest disk-to-data ratio, robust expansion utility
6. High availability - no single point of failure, scalable data loading, parallel load utilities
Note: If the table demographics are well defined, the optimizer will choose the
best plan for the query execution.
Advantage Teradata
7. Enormous capacity
Billions of rows
Terabytes of data
8. High-performance parallel processing
9. Single database server for multiple clients - a single version of the truth
10. Network and Mainframe connectivity
11. Industry standard access language (SQL)
12. Manageable growth via modularity
13. Fault tolerance at all levels of hardware and software
14. Data integrity and reliability
Advantage Teradata DBA
Things Teradata DBAs NEVER Have to Do!
They know that if data doubles, the system can expand easily to
accommodate it.
[Figure: end users reach the Teradata data warehouse through access tools such as Cognos, Access and BO.]
Architecture
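[Figure: a channel-attached system connects through CLI, TDP and the channel; a network-attached system connects through CLI, MTDP and MOSI. Both reach the Parsing Engine, which communicates over the BYNET.]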
PDE (Parallel Database Extensions): a software layer that runs on top of the operating system on each node. It was created by NCR to support the parallel environment.
MTDP (Micro Teradata Director Program): performs many of the TDP functions, including session management but not session balancing.
[Figure: Parsing Engine components (Session Control, Parser, Optimizer, Dispatcher) connected to the BYNET.]
[Figure: the 32-bit Row Hash consists of the DSW (Destination Selection Word, the first 16 bits) followed by the remaining 16 bits. The figure shows ten AMPs numbered 0 to 9.]
Data Distribution
[Figure: records arrive from the client in random sequence (2, 32, 67, 12, 90, 6, 54, 75, 18, 25, 80, 41). Data from the host, in EBCDIC or ASCII, is converted to ASCII and hashed by the Parsing Engine(s), then stored on the AMPs.]
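The distribution flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the deck does not describe Teradata's hashing algorithm, so `zlib.crc32` stands in for it, and the helper names `row_hash` and `amp_for`, the ten-AMP configuration and the modulo bucket map are all hypothetical.

```python
import zlib
from collections import defaultdict

NUM_AMPS = 10  # assumption: the ten AMPs (0-9) shown in the row-hash figure


def row_hash(pi_value):
    """Stand-in 32-bit row hash (CRC32 is NOT Teradata's real algorithm)."""
    return zlib.crc32(str(pi_value).encode("ascii")) & 0xFFFFFFFF


def amp_for(pi_value, num_amps=NUM_AMPS):
    """Pick an AMP from the DSW, the first 16 bits of the row hash."""
    dsw = row_hash(pi_value) >> 16
    # Assumption: a simple modulo replaces the real hash-bucket-to-AMP map.
    return dsw % num_amps


# Primary Index values of the incoming records, as listed on the slide.
records = [2, 32, 67, 12, 90, 6, 54, 75, 18, 25, 80, 41]

placement = defaultdict(list)
for value in records:
    placement[amp_for(value)].append(value)

for amp in sorted(placement):
    print(f"AMP {amp}: {placement[amp]}")
```

The point of the sketch is that the same PI value always hashes to the same AMP, so the placement of a row is determined by its PI value alone and not by arrival order.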
PI Characteristics
Primary Indexes (UPI and NUPI)
1. A Primary Index may be different from the Primary Key.
2. Every table has exactly one Primary Index.
3. A Primary Index may contain null(s).
4. Single-value access uses ONE AMP and, typically, one I/O.
The column chosen as the PI should be unique or nearly unique to achieve good distribution of data. The more evenly the rows are distributed across the AMPs, the higher the parallelism.
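To see why a nearly unique PI column matters, the sketch below compares a unique column with a two-value column. It rests on assumptions: CRC32 stands in for the real hash, the system has 8 AMPs, and both sample columns are invented for the example.

```python
import zlib
from collections import Counter

NUM_AMPS = 8  # assumption: an 8-AMP system


def amp_for(pi_value):
    """Stand-in hash (CRC32), not Teradata's real algorithm."""
    return zlib.crc32(str(pi_value).encode("ascii")) % NUM_AMPS


def rows_per_amp(pi_values):
    """Count how many rows each AMP would receive."""
    counts = Counter(amp_for(v) for v in pi_values)
    return [counts.get(amp, 0) for amp in range(NUM_AMPS)]


# Hypothetical columns: a unique ID versus a code with only two values.
unique_pi = range(10_000)
skewed_pi = ["M" if i % 2 else "F" for i in range(10_000)]

print("unique PI :", rows_per_amp(unique_pi))
print("2-value PI:", rows_per_amp(skewed_pi))
```

With only two distinct PI values, at most two AMPs can hold any rows, so the remaining AMPs sit idle during a full-table operation.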
AMP Operations
1. Single-AMP operation (typical UPI access)
2. Multi-AMP operation
3. All-AMP operation
Single AMP operation - Illustration
Table SAMPLE (NUMBER is the UPI):
NUMBER  LETTER
1       P
2       U
3       Y
4       T
5       R
6       E
7       W
8       Q
9       A
10      S
11      D
12      F
13      G
14      H
15      J
16      K
17      L
18      M
19      N
20      B
21      V
22      C
23      X
24      Z

SQL Request:
SELECT LETTER
FROM SAMPLE
WHERE NUMBER = 19;

ANSWER: N
Application to PE
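[Figure: applications APPL 1 and APPL 2 send requests to PE 1 and PE 2, which reach AMPs 1 to 8.]

SQL Request:
SELECT LETTER
FROM SAMPLE
WHERE NUMBER = 19;

Rows of SAMPLE as distributed across the AMPs (NUMBER LETTER):
AMP 1   AMP 2   AMP 3   AMP 4   AMP 5   AMP 6   AMP 7   AMP 8
13 G    15 J    20 B    7 W     14 H    9 A     22 C    1 P
6 E     4 T     19 N    16 K    23 X    21 V    5 R     10 S
12 F    17 L    2 U     11 D    3 Y     24 Z    18 M    8 Q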
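A single-AMP lookup can be sketched as follows, using the row placement shown on the slide. One assumption is made explicit: the `AMP_OF` lookup table stands in for hashing the UPI value, which is how the real system would locate the owning AMP. The function name `select_letter` is hypothetical.

```python
# Row placement copied from the slide: AMP number -> {NUMBER: LETTER}
AMPS = {
    1: {13: "G", 6: "E", 12: "F"},
    2: {15: "J", 4: "T", 17: "L"},
    3: {20: "B", 19: "N", 2: "U"},
    4: {7: "W", 16: "K", 11: "D"},
    5: {14: "H", 23: "X", 3: "Y"},
    6: {9: "A", 21: "V", 24: "Z"},
    7: {22: "C", 5: "R", 18: "M"},
    8: {1: "P", 10: "S", 8: "Q"},
}

# Assumption: a lookup table stands in for hashing the UPI value to an AMP.
AMP_OF = {number: amp for amp, rows in AMPS.items() for number in rows}


def select_letter(number):
    """UPI access: route the request to the single AMP that owns the row."""
    amp = AMP_OF[number]
    touched = [amp]  # only one AMP does any work
    return AMPS[amp][number], touched


letter, touched = select_letter(19)
print(letter, touched)
```

For NUMBER = 19 only AMP 3 is touched; the other seven AMPs remain free to serve other requests.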
SQL Request
SELECT NUMBER, LETTER
FROM SAMPLE
WHERE NUMBER > 9
ORDER BY LETTER ;
AMP to Merge (All-AMP Query with Sort)
Each AMP sends its first block of sorted data to the BYNET merge process.
Plan
1. GET NUMBER, LETTER
WHERE NUMBER > 9
2. SORT ON LETTER
3. MERGE ON LETTER
1. The merge process continues to request sorted blocks from the AMPs until all AMPs have exhausted their spool supply.
2. When the merge process has received an EOF from each AMP, the answer set is complete.
Note: Spool is temporary space used by the AMPs to store intermediate results.
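The three-step plan can be sketched in Python with the row placement from the earlier slide. This is a simplification under stated assumptions: `heapq.merge` pulls one row at a time rather than requesting blocks, and the names `AMP_ROWS` and `amp_step` are hypothetical.

```python
import heapq

# Row placement copied from the slide: one list of (NUMBER, LETTER) per AMP
AMP_ROWS = [
    [(13, "G"), (6, "E"), (12, "F")],
    [(15, "J"), (4, "T"), (17, "L")],
    [(20, "B"), (19, "N"), (2, "U")],
    [(7, "W"), (16, "K"), (11, "D")],
    [(14, "H"), (23, "X"), (3, "Y")],
    [(9, "A"), (21, "V"), (24, "Z")],
    [(22, "C"), (5, "R"), (18, "M")],
    [(1, "P"), (10, "S"), (8, "Q")],
]


def amp_step(rows):
    """Steps 1 and 2 on one AMP: keep NUMBER > 9, then sort on LETTER."""
    spool = [row for row in rows if row[0] > 9]
    return sorted(spool, key=lambda row: row[1])


# Each AMP works on its own rows independently (in parallel on a real system).
spools = [amp_step(rows) for rows in AMP_ROWS]

# Step 3: merge the already-sorted spools into one ordered answer set.
answer = list(heapq.merge(*spools, key=lambda row: row[1]))

for number, letter in answer:
    print(number, letter)
```

Because every spool is already sorted, the merge never needs to re-sort the full answer set; it only compares the next unread row from each AMP.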
PE to Application
All-AMP
Query with Sort
[Figure: parallel processing. Each node runs PEs and AMPs, each AMP has its own disk space and data, and the nodes are connected by the BYNET.]
[Figure: primary rows and their fallback copies distributed across the AMPs; each fallback row is stored on a different AMP from its primary row.]
Benefits of Fallback
1. Permits access to table data while an AMP is off-line
2. Adds a level of data protection beyond disk array RAID
3. Automatically restores data changed while the AMP was off-line
4. Critical for high-availability applications
Cost of Fallback
1. Twice the disk space is needed for table storage
2. Twice the I/O is needed for INSERTs, UPDATEs and DELETEs
Fallback Cluster
A defined number of AMPs treated as a fault-tolerant unit.
Fallback rows for AMPs in a cluster reside within the same cluster.
Loss of one AMP in the cluster permits continued table access.
Loss of two AMPs in the same cluster causes the RDBMS to halt.
Example with two clusters of four AMPs each (AMPs 1-4 and AMPs 5-8):
Lose AMP 3 -> AMPs 1, 2 and 4 each experience a 33% increase in workload.
Lose AMP 6 -> AMPs 5, 7 and 8 each experience a 33% increase in workload.
Then lose AMP 7 (a second AMP in the same cluster) -> the system halts.
System performance can be adversely affected when any AMP carries a disproportionate burden.
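The 33% figure follows from spreading one AMP's work over the three survivors. The sketch below works through that arithmetic; the function name `workload_increase` and the assumption that the lost AMP's work is shared evenly are illustrative, not taken from the deck.

```python
def workload_increase(cluster_size, amps_down=1):
    """Extra load on each surviving AMP, as a fraction of its normal load.

    Assumption: the down AMP's work is spread evenly over the survivors.
    Returns None when two or more AMPs in the cluster are down, because
    the slide states that the RDBMS halts in that case.
    """
    if amps_down >= 2:
        return None
    if amps_down == 0:
        return 0.0
    return amps_down / (cluster_size - amps_down)


for size in (2, 4, 8):
    print(f"cluster of {size}: +{workload_increase(size):.0%} per surviving AMP")
```

For a four-AMP cluster this gives 1/3, the 33% quoted on the slide. Smaller clusters concentrate the extra load on fewer AMPs, while larger clusters spread it more thinly.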
Fallback vs. Non-Fallback Tables
FALLBACK TABLES, ONE AMP DOWN: data fully available.
NON-FALLBACK TABLES, ONE AMP DOWN: data partially available; queries avoiding the down AMP succeed.
RAID-5 (Parity)
1. For every 3 blocks of data, there is a parity block on a 4th disk.
2. A parity algorithm is applied to compute the parity block.
3. If a disk fails, any missing block can be reconstructed from the other three disks.
4. Parity reduces available disk space by 25% in a 4-disk rank.
5. Array controller reconstruction of a failed disk takes longer than with RAID 1.
Block 0 Block 1 Block 2 Parity
Parity Block 3 Block 8 Block 4
Block 5 Parity Block 6 Block 7
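The slide does not name the parity algorithm; standard RAID-5 uses byte-wise XOR, which is what the sketch below assumes. The block contents and the helper name `xor_blocks` are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks on three disks (contents are invented for the example).
data = [b"TERA", b"DATA", b"AMPS"]
parity = xor_blocks(data)  # stored on the fourth disk

# Simulate losing the disk that held data[1] and rebuild it from the rest.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

Reconstruction has to read every surviving disk in the rank, which is one reason a rebuild is slower than copying from a RAID-1 mirror.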
Summary
RAID-1 - Good performance with disk failures; higher cost in terms of disk space
RAID-5 - Reduced performance with disk failures; lower cost in terms of disk space
Recovery Journal For Down AMPs
Recovery Journal is:
Automatically activated when an AMP is taken off-line
Maintained by other AMPs in the cluster
Totally transparent to users of the system
While the AMP is off-line:
The journal is active.
Table updates continue as normal.
The journal logs the Row-IDs of changed rows for the down AMP.
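The behaviour described above can be modelled roughly as follows. Everything here is a hypothetical model for illustration: the class `DownAmpRecoveryJournal`, the dictionaries standing in for primary and fallback storage, and the sample row IDs and values.

```python
class DownAmpRecoveryJournal:
    """Hypothetical model: survivors log row IDs changed while an AMP is down."""

    def __init__(self):
        self.changed_row_ids = []

    def log(self, row_id):
        # Each changed row only needs to be remembered once.
        if row_id not in self.changed_row_ids:
            self.changed_row_ids.append(row_id)


primary = {41: "a", 66: "b", 7: "c"}  # rows owned by the AMP that goes down
fallback = dict(primary)              # fallback copies on other cluster AMPs
journal = DownAmpRecoveryJournal()

# AMP is off-line: updates still succeed against the fallback copies.
for row_id, value in [(66, "b2"), (7, "c2"), (66, "b3")]:
    fallback[row_id] = value
    journal.log(row_id)

# AMP comes back: only the journaled rows are refreshed from fallback.
for row_id in journal.changed_row_ids:
    primary[row_id] = fallback[row_id]
journal.changed_row_ids.clear()

print(primary)
```

Logging only Row-IDs keeps the journal small: the changed data itself already lives in the fallback copies.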
AMP vprocs can run on any node within the clique and still have full access to
their disk array space. If a node fails, AMPs migrate to another node in the
clique.
[Figure: a clique of four nodes (SMP 1 to SMP 4), with AMP 3 and AMP 4 shown.]
Note: Failure of a node within a clique increases the workload for the other nodes within the clique.
Transient Journal
1. Consists of a journal of transaction before images.
2. Provides rollback in the event of transaction failure.
3. Is automatic and transparent.
4. Before images are reapplied to the table if the transaction fails.
5. Before images are discarded upon transaction completion.
Successful transaction:
BEGIN TRANSACTION
UPDATE Row A (Add $100 to checking)         -> Before image of Row A recorded
UPDATE Row B (Subtract $100 from savings)   -> Before image of Row B recorded
END TRANSACTION                             -> Discard before images

Failed transaction:
BEGIN TRANSACTION
UPDATE Row A                                -> Before image of Row A recorded
UPDATE Row B                                -> Before image of Row B recorded
(Failure occurs)
(Rollback occurs)                           -> Reapply before images
(Terminate TXN)                             -> Discard before images
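The two scenarios above can be sketched with a minimal before-image journal. The `Transaction` class, its method names and the account balances are illustrative assumptions, not Teradata's implementation.

```python
class Transaction:
    """Minimal before-image journal; names and structure are illustrative."""

    def __init__(self, table):
        self.table = table
        self.before_images = []  # plays the role of the transient journal

    def update(self, row, new_value):
        # Record the before image, then change the row.
        self.before_images.append((row, self.table[row]))
        self.table[row] = new_value

    def commit(self):
        self.before_images.clear()  # discard before images

    def rollback(self):
        # Reapply before images in reverse order, then discard them.
        for row, old_value in reversed(self.before_images):
            self.table[row] = old_value
        self.before_images.clear()


accounts = {"checking": 500, "savings": 900}  # made-up starting balances

ok = Transaction(accounts)
ok.update("checking", accounts["checking"] + 100)
ok.update("savings", accounts["savings"] - 100)
ok.commit()

failed = Transaction(accounts)
failed.update("checking", accounts["checking"] + 100)
failed.update("savings", accounts["savings"] - 100)
failed.rollback()  # failure occurs, so the before images are reapplied

print(accounts)
```

After the rollback the balances match the state left by the successful transaction, and in both cases the journal ends up empty.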
The Permanent Journal
An optional, user-specified, system-maintained journal used for database recovery to
a specified point in time.
1. Used for recovery from unexpected hardware or software disasters. May be specified for:
   One or more tables
   One or more databases
2. Permits capture of BEFORE images for database rollback.
3. Permits capture of AFTER images for database rollforward.
4. Permits archiving change images during table maintenance.
5. Reduces need for full-table backups.
6. Provides a means of recovering NO FALLBACK tables.
7. Requires additional disk space for change images.
8. Requires user intervention for archive and recovery activity.
Note:
The user cannot directly query the permanent journal table.
The Permanent Journal occupies permanent space and hence needs to be cleaned up periodically.
Archiving and Recovering Data
ARC Utility
1. The Archive/Restore utility
2. Runs on IBM, UNIX and NT
3. Archives data from RDBMS
4. Restores data from archive media
5. Permits data recovery to a specified checkpoint
Summary
1. The Teradata architecture and how it achieves parallelism and scalability.
2. The concept of Shared-Nothing architecture.
3. The way data is distributed using a hashing algorithm.
4. The significance of the PI in row distribution.
5. How data rows are fetched.
6. The various protection features in Teradata.
References
Teradata Basics, official curriculum. Published by NCR Teradata Solutions Group.