www.compactsolutionsllc.com
2013 Compact Solutions LLC. All rights reserved.
Compact Solutions Confidential and Proprietary DO NOT DISCLOSE OUTSIDE OF THE COMPANY
Indexes
Introduction
Indexing is one of the most important features of the Teradata RDBMS.
In the Teradata RDBMS, an index is used to define row uniqueness and to
retrieve data rows. It can also be used to enforce the primary key and unique
constraints for a table.
The Teradata RDBMS supports five types of indexes:
Unique Primary Index (UPI)
Unique Secondary Index (USI)
Indexes (Contd.)
The following rules apply to the indexes used in the Teradata
relational database:
An index is a scheme used to distribute and retrieve rows of a data table. It can be
based on the values in one or more columns of the table.
A table can have a number of indexes, including one primary index, and up to 32
secondary indexes.
An index for a relational table may be primary or secondary, and may be unique or
non-unique. Each kind of index affects system performance, and can be important
to data integrity.
An index is usually defined on a table column whose values are frequently used in
specifying WHERE constraints or join conditions.
An index is used to enforce PRIMARY KEY and UNIQUE constraints.
Primary Index
The Primary Index will determine on which AMP a row will reside.
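As a rough illustration of how a Primary Index value decides the AMP, the sketch below hashes the value and maps the hash to an AMP number. It is a simplification under stated assumptions: `NUM_AMPS`, `amp_for`, and the sample order IDs are hypothetical, MD5 merely stands in for Teradata's own hashing algorithm, and a plain modulo replaces the real hash map.

```python
import hashlib

NUM_AMPS = 4  # hypothetical system size, chosen only for the example


def amp_for(primary_index_value, num_amps=NUM_AMPS):
    """Map a Primary Index value to an AMP number (illustrative only)."""
    digest = hashlib.md5(str(primary_index_value).encode("utf-8")).digest()
    row_hash = int.from_bytes(digest[:4], "big")  # 32-bit stand-in for a row hash
    return row_hash % num_amps                    # simplification of the hash map


# Distribute ten hypothetical orders by their Primary Index (order ID).
placement = {}
for order_id in range(1, 11):
    placement.setdefault(amp_for(order_id), []).append(order_id)

for amp in sorted(placement):
    print(f"AMP {amp}: orders {placement[amp]}")
```

The same value always hashes to the same AMP, which is why a lookup on the Primary Index can go straight to a single AMP.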
Secondary Index
Secondary Indexes provide an alternate path to the data.
So far we have learned that every table has one and only one Primary Index
and we have learned that the Primary Index is much faster than the Full Table
Scan.
Secondary Indexes are not as fast as the Primary Index, but they can be pretty
fast, and they can be much faster than a Full Table Scan.
There can be up to 32 Secondary Indexes on a table.
Every Secondary Index creates a Subtable on every AMP designed to point to
the real Primary Index Row-ID.
There are two types of Secondary Index and they are Unique Secondary
Indexes, which are called USIs and Non-Unique Secondary Indexes called
NUSIs.
A USI is always a two-AMP operation, so it is almost as fast as a Primary
Index. A NUSI is an all-AMP operation, but it is not a Full Table Scan.
Teradata runs extremely well without Secondary Indexes. Because Secondary Indexes
consume space and add maintenance overhead, they should only be used for KNOWN QUERIES,
that is, queries that are run over and over again.
Every time the Parsing Engine sees the USI column in the WHERE clause, it comes up
with a plan that involves only two AMPs.
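The two-AMP path can be sketched as a small simulation: one AMP holds the USI subtable row, and that row points to the AMP holding the base row. Everything here is an assumption made for illustration: the `Amp` class, the `emp_id`/`ssn` columns, the MD5-based `amp_for`, and the use of the Primary Index value as a stand-in for the Row-ID.

```python
import hashlib

NUM_AMPS = 4  # hypothetical system size


def amp_for(value, num_amps=NUM_AMPS):
    """Illustrative hash-to-AMP mapping (not Teradata's real algorithm)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


class Amp:
    def __init__(self):
        self.base_rows = {}     # Primary Index value -> full row
        self.usi_subtable = {}  # USI value -> Primary Index value (Row-ID stand-in)


amps = [Amp() for _ in range(NUM_AMPS)]


def insert(emp_id, ssn, name):
    """Store the base row by its Primary Index and the subtable row by its USI."""
    amps[amp_for(emp_id)].base_rows[emp_id] = {"emp_id": emp_id, "ssn": ssn, "name": name}
    amps[amp_for(ssn)].usi_subtable[ssn] = emp_id


def select_by_usi(ssn):
    """Return (row, AMPs touched): the subtable AMP first, then the base-row AMP."""
    touched = [amp_for(ssn)]
    emp_id = amps[touched[0]].usi_subtable.get(ssn)
    if emp_id is None:
        return None, touched
    touched.append(amp_for(emp_id))
    return amps[touched[1]].base_rows[emp_id], touched


insert(1, "123-45-6789", "Ada")
insert(2, "987-65-4321", "Grace")
row, touched = select_by_usi("123-45-6789")
print(row["name"], "found via AMPs", touched)
```

No other AMP is consulted, regardless of how many AMPs the system has.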
Primary Key        Foreign Key            Primary Index          Secondary Index
One PK             Multiple FKs           One PI                 0 to 32 SIs
Unique values      Unique or non-unique   Unique or non-unique   Unique or non-unique
No NULLs           NULLs allowed          NULLs allowed          NULLs allowed
No column limit    No column limit        64-column limit        64-column limit
n/a
n/a
n/a
Join Index
The Join Index is an index structure that contains columns from one or more
tables. Once created, it becomes an option available to the optimizer but is never
directly accessed by the user.
A Join Index creates a new physical table and therefore requires Permanent Space.
Join Indexes are automatically updated when the base table changes.
A Join Index can have a different Primary Index than the base table.
Figure: Order Table (UPI) with columns Order_ID, Cust_ID, Order_Date, Total, Cust_Name, shown next to a second structure with columns Cust_Name, Order_ID, Order_Date, Total.
Aggregate Join Index: Retrieve Total Order Amount by Customer ID and Year
Figure: Aggregate Join Index (UPI) with columns Cust_ID, Order_Date_Year, TOTL_YER_AMT.
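To make the aggregate join index example concrete, the sketch below precomputes the total order amount per customer and year from a handful of order rows. The sample rows and the `build_aggregate` helper are invented for illustration. In Teradata the optimizer maintains and uses such a structure itself, whereas this code only shows what the stored result would contain.

```python
from collections import defaultdict
from datetime import date

# Hypothetical Order Table rows: (Order_ID, Cust_ID, Order_Date, Total)
orders = [
    (1, 100, date(2012, 3, 1), 50.0),
    (2, 100, date(2012, 9, 15), 25.0),
    (3, 100, date(2013, 1, 10), 10.0),
    (4, 200, date(2013, 5, 20), 40.0),
]


def build_aggregate(order_rows):
    """Sum Total by (Cust_ID, year of Order_Date), like the aggregate in the figure."""
    totals = defaultdict(float)
    for _order_id, cust_id, order_date, total in order_rows:
        totals[(cust_id, order_date.year)] += total
    return dict(totals)


aggregate = build_aggregate(orders)
for (cust_id, year), amount in sorted(aggregate.items()):
    print(cust_id, year, amount)
```

A query asking for yearly totals per customer could then be answered from these few rows instead of scanning every order.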
Cliques
Journals
RAID
Locks
Locking prevents multiple users who are trying to access or change the same data
simultaneously from violating data integrity.
This concurrency control is implemented by locking the target data.
Locks are automatically acquired during the processing of a request and released
when the request is terminated.
Teradata has four types of locks, which can be acquired at three
different levels.
Lock types are automatically applied based on the SQL command:
SELECT: applies a Read lock
UPDATE: applies a Write lock
CREATE TABLE: applies an Exclusive lock
Locks (Contd.)
Levels of Locking
Locks may be applied at three levels:
Database Locks: Apply to all tables and views in the database.
Table Locks: Apply to all rows in the table or view.
Row Hash Locks: Apply to a group of one or more rows in a table.
Locks (Contd.)
There are four types of locks.
Access Lock: An access lock allows data to be read while modifications are
in process. Access locks are designed for decision support on tables that
are updated only by small, single-row changes. Access locks are not
concerned with data consistency. Access locks prevent other users from
obtaining Exclusive locks on the locked data.
Read Lock: Read locks are used to ensure consistency during read operations.
Several users may hold concurrent read locks on the same data. During
this time, no data modification is permitted. Read locks prevent other users
from obtaining Exclusive locks and Write locks on the locked data.
Locks (Contd.)
Write Lock: Write locks enable users to modify data while maintaining data
consistency. While the data has a write lock on it, other users can
only obtain an access lock. During this time, all other lock requests are
held in a queue until the write lock is released.
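The rules above can be summarized as a compatibility table. The sketch below encodes them in a hypothetical `COMPATIBLE` mapping and a `can_grant` helper. The Access, Read, and Write rows follow the descriptions above, while the Exclusive row (compatible with nothing) is an assumption based on general Teradata behaviour, since the slides do not describe that lock type.

```python
# For each lock already held, the set of lock types another user may still obtain.
COMPATIBLE = {
    "ACCESS": {"ACCESS", "READ", "WRITE"},  # blocks only Exclusive
    "READ": {"ACCESS", "READ"},             # blocks Write and Exclusive
    "WRITE": {"ACCESS"},                    # only Access locks get through
    "EXCLUSIVE": set(),                     # assumption: blocks everything
}


def can_grant(requested, held_locks):
    """Return True if the requested lock is compatible with every lock already held."""
    return all(requested in COMPATIBLE[held] for held in held_locks)


print(can_grant("READ", ["READ", "ACCESS"]))  # concurrent readers
print(can_grant("WRITE", ["READ"]))           # writer must wait for readers
print(can_grant("ACCESS", ["WRITE"]))         # access lock reads through a write
```

A request that cannot be granted would be queued until the conflicting lock is released.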
Fallback
Fallback is a Teradata database feature that protects data in the case of an AMP
vproc failure.
Fallback guarantees the maximum availability of data.
We can specify Fallback protection at the table or database level. It ensures high
availability of the applications.
Fallback protects our data by storing a second copy of each row of a table on a
different AMP in the same cluster (a cluster contains more than one AMP).
Fallback provides AMP fault tolerance at the table level. With Fallback tables, if one
AMP fails, all data is still available and we can continue to use Fallback tables without
any loss of access to data.
During table creation or after a table is created, we may specify whether or not the
system should keep a Fallback copy of the table.
Fallback (Contd.)
Benefits of fallback
Permits access to table data during AMP off-line period
Adds a level of data protection beyond disk array RAID
Automatically restores data changed during AMP off-line
Critical for high availability applications
Cost of fallback
Twice the disk space for table storage is needed
Twice the I/O for INSERTs, UPDATEs and DELETEs is needed
Fallback (Contd.)
Fallback Clusters
A defined number of AMPs treated as a fault-tolerant unit.
Fallback rows for AMPs in a cluster reside in the cluster.
Loss of an AMP in the Cluster permits continued table access.
Loss of two AMPs in the cluster causes the RDBMS to halt.
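The cluster rules can be illustrated with a small simulation: each row has a primary copy on one AMP and a fallback copy on a different AMP of the same cluster. The four-AMP `CLUSTER`, the modulo-based placement, and the helper names are assumptions made for the example and do not reflect how Teradata actually assigns fallback rows.

```python
CLUSTER = [0, 1, 2, 3]  # hypothetical four-AMP cluster


def place(row_key, cluster=CLUSTER):
    """Return (primary AMP, fallback AMP); the fallback is always a different AMP."""
    primary = cluster[row_key % len(cluster)]
    others = [amp for amp in cluster if amp != primary]
    fallback = others[row_key % len(others)]
    return primary, fallback


def available(row_key, down_amps):
    """A row stays readable while at least one of its two copies is on a live AMP."""
    primary, fallback = place(row_key)
    return primary not in down_amps or fallback not in down_amps


keys = range(100)
one_down = all(available(key, {0}) for key in keys)
two_down = all(available(key, {0, 1}) for key in keys)
print("all rows readable with one AMP down:", one_down)
print("all rows readable with two AMPs down:", two_down)
```

With one AMP down, every row still has a live copy. With two AMPs down in the same cluster, some rows have lost both copies, which is consistent with the system halting in that case.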
Cliques
A clique (pronounced, "kleek") is a group of nodes that share access to the same disk arrays.
Each multi-node system has at least one clique. The cabling determines which nodes are in
which cliques -- the nodes of a clique are connected to the disk array controllers of the same
disk arrays.
Teradata CLIQUES are a method of system protection against the failure of an entire node.
Each node hosts AMP vprocs in memory.
Each AMP is attached to one virtual disk (Vdisk), and that AMP is the only vproc allowed access
to its Vdisk.
Within a clique, each node also has access to the disks used by the other nodes. If a node fails, its AMP vprocs can
migrate to a node that has backup access to their virtual disks.
A migrated AMP can continue to read and write to its Vdisk while its home node is down.
When the home node is fixed and available again, the vprocs return home.
Cliques (Contd.)
Vprocs are distributed across all nodes in the system. Multiple cliques in the system should have
the same number of nodes.
The diagram below shows three cliques. The nodes in each clique are cabled to the same disk
arrays. The overall system is connected by the BYNET. If one node goes down in a clique the
vprocs will migrate to the other nodes in the clique, so data remains available. However, system
performance decreases due to the loss of a node. System performance degradation is
proportional to clique size.
Cliques (Contd.)
Clique provides protection against the failure of an entire node
Figure: CLIQUE-1 and CLIQUE-2 connected by the BYNET; the AMPs in each clique are attached to shared disk arrays.
Journals In Teradata
Journaling is a data protection mechanism in Teradata. Journals maintain
pre-images and post-images of the rows changed by a DML transaction, relative to a
checkpoint. When a DML transaction fails, the table is restored to the last available
checkpoint using the journal images.
A journal is simply a record of some kind of processing or activity.
1) Single image: One copy of the data is taken.
2) Dual image: Two copies of the data are taken.
3) Before image: A copy of the rows is taken before the changes are applied.
4) After image: A copy of the rows is taken after the changes are applied.
Transient Journal:
Is an automatic feature that provides Data Integrity
Automatic rollback of changed rows in the event of transaction failure
Data is always returned to its original state after a transaction failure
Takes Before Image (BI) of changes for rollback purpose
BI is stored in the AMP's transient journal
AMP transient journals are maintained in user DBC's Perm Space
When the transaction is committed, the BI in transient journal is purged automatically
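The before-image mechanism can be sketched in a few lines. The `Table` class below is a hypothetical in-memory stand-in, not Teradata's implementation: each update first saves the row's before image, a rollback replays those images in reverse order, and a commit purges them.

```python
_MISSING = object()  # marks "row did not exist before the change"


class Table:
    def __init__(self, rows):
        self.rows = dict(rows)
        self.transient_journal = []  # list of (key, before image)

    def update(self, key, new_value):
        """Record the before image, then apply the change."""
        self.transient_journal.append((key, self.rows.get(key, _MISSING)))
        self.rows[key] = new_value

    def commit(self):
        """On commit the before images are no longer needed and are purged."""
        self.transient_journal.clear()

    def rollback(self):
        """Undo changes in reverse order using the saved before images."""
        while self.transient_journal:
            key, before = self.transient_journal.pop()
            if before is _MISSING:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before


table = Table({1: "a"})
table.update(1, "b")   # change an existing row
table.update(2, "c")   # add a new row
table.rollback()       # transaction failed: restore the original state
print(table.rows)
```

After the rollback the table holds exactly the rows it had before the transaction started.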
Permanent Journal
The Permanent Journal is an optional, user specified, system-maintained journal
which is used for recovery of a database to a specified point in time.
Is used for recovery from unexpected hardware or software disasters.
May be specified for one or more tables.
Permits capture of Before Images for database rollback.
When a failed AMP comes back online, the Down AMP Recovery Journal (DARJ) catches the AMP up by
applying the transactions it missed.
Once the AMP has caught up, the DARJ is dropped.
RAID
RAID (Redundant Array of Independent Disks) provides protection against
disk failure.
Teradata uses RAID-1
RAID 1
Transparent Mirroring.
Provides high data availability and performance, but storage costs are high.
Characteristics:
Data is fully replicated.
Mirrored striping is possible with multiple pairs of disks in a drive group.
Transparent to operating system.
Advantages:
Maximum data availability, read performance gains.
No performance penalty with write operations.
Fast recovery and restoration.
Disadvantages:
50% of disk space for mirrored data.
RAID (Contd.)
RAID 5
Data Parity Protection, Interleaved Parity.
Characteristics
Data and parity are striped and interleaved across multiple disks.
XOR logic is used to calculate parity.
Data is reconstructed after a disk failure.
Transparent to operating system.
Advantages
Provides high availability with minimum disk space (e.g., 25%) used for
parity overhead.
Disadvantages
Write performance penalty.
Performance degradation during data recovery and reconstruction.
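The XOR parity idea can be shown on a single stripe. In this minimal sketch, three hypothetical data blocks are protected by one parity block (matching the 25% overhead example above), and a lost block is rebuilt from the survivors. The `xor_blocks` helper and the sample bytes are illustrative. A real RAID 5 array also rotates parity across the disks, which the sketch leaves out.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks on three disks, parity on a fourth disk.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the disk holding data[1] and rebuild it from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

Every write must also update the parity block, and rebuilding must read every surviving disk, which is where the write penalty and the degraded performance during reconstruction come from.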
Spaces
Perm Space
Perm Space is the maximum amount of space available for storing:
Tables, Secondary Indexes (SI), Permanent Journals
Perm Space defines an upper limit; it is not allocated at table creation time
Perm Space is released when data is deleted or when objects are dropped
The following require no Perm Space:
Views, Triggers, Macros.
All Perm Space specifications are subtracted from the creator's Perm Space
Perm Space is a zero-sum game: the total of all Perm Space allocations must equal
the total amount of disk space available
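The zero-sum rule can be illustrated with a toy model in which every new database takes its Perm Space from its creator. The `Database` class, the database names, and the sizes are hypothetical and serve only to show that the overall total never changes.

```python
class Database:
    def __init__(self, name, perm):
        self.name = name
        self.perm = perm      # Perm Space limit still held by this database
        self.children = []

    def create_child(self, name, perm):
        """Give part of this database's Perm Space to a newly created child."""
        if perm > self.perm:
            raise ValueError("creator does not have enough Perm Space")
        self.perm -= perm     # subtracted from the creator
        child = Database(name, perm)
        self.children.append(child)
        return child


def total_perm(database):
    """Sum Perm Space over a database and everything created beneath it."""
    return database.perm + sum(total_perm(child) for child in database.children)


dbc = Database("DBC", 1000)
sales = dbc.create_child("Sales", 400)
hr = sales.create_child("HR", 100)
print(dbc.perm, sales.perm, hr.perm, total_perm(dbc))
```

However the space is divided among databases, the sum of all allocations stays equal to the starting total.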
Spaces (Contd.)
Spool Space and Temp Space
Spool Space is the maximum amount of work space available for requests
Spool Space is used to hold intermediate and final query result sets
Spool Space is literally unused Perm Space
The Spool Space specified is the upper limit for a query's answer set
If a query exceeds the limit, it is aborted immediately
Spool Space is not added or subtracted when it is given to someone else
Temp Space is also unused Perm Space; it is used for Global Temporary Tables
Types of Tables
SET Table
MULTISET Table
VOLATILE Table
DERIVED Table