
Teradata Basics Certification Study guide

07/09/2013

Anju M (Emp id: 477398)


Hitech IOU

anj.m@tcs.com

Confidentiality Statement
Confidentiality and Non-Disclosure Notice
The information contained in this document is confidential and proprietary to TATA
Consultancy Services. This information may not be disclosed, duplicated or used for any
other purposes. The information contained in this document may not be released in
whole or in part outside TCS for any purpose without the express written permission of
TATA Consultancy Services.

Tata Code of Conduct


We, in our dealings, are self-regulated by a Code of Conduct as enshrined in the Tata
Code of Conduct. We request your support in helping us adhere to the Code in letter and
spirit. We request that any violation or potential violation of the Code by any person be
promptly brought to the notice of the Local Ethics Counselor or the Principal Ethics
Counselor or the CEO of TCS. All communication received in this regard will be treated
and kept as confidential.

Table of Contents
1. Teradata Architecture
2. Space Management
3. Application Development
4. Data Distribution
5. Partitioning
6. Access Methods
7. Join Index
8. Table level lock compatibility
9. Active Data Warehouse
10. Data Protection
11. Teradata Fault Tolerance
12. Teradata Concurrency Control
13. System monitoring
14. Teradata Utilities
15. References

1. Teradata Architecture

There are 3 major components in Teradata Architecture:

1. Parsing Engine Processors (PE)
2. Access Module Processors (AMP)
3. BYNET

1.1 Parsing Engine Processors (PE)


It is a virtual processor (vproc) that communicates with the client system on one side and with the AMPs (via the
BYNET or Boardless BYNET) on the other side.
The parsing engine software has 3 elements:
Parser, which decomposes SQL into relational data management processing steps
Dispatcher, which receives processing steps from the Parser and sends them to the appropriate AMPs
Session Control, which provides user session management such as establishing and terminating sessions

1.2 Access Module Processors (AMP)


It is a vproc that functions as the heart of Teradata RDBMS. It provides a BYNET interface and performs database
and file management tasks. Each AMP is assigned to a virtual disk. Since each AMP may read and write only its own
disk, Teradata is said to be an example of a Shared Nothing Architecture. The AMP also performs join
processing between multiple tables and converts ASCII to EBCDIC.

1.3 BYNET
It is the channel of communication between the Parsing Engine and the Access Module Processors. It is also called the
message passing layer in Teradata. There are 2 BYNET systems: BYNET 0 and BYNET 1. If one of the connections
fails, the second is completely independent and can continue to function. Therefore, communications continue
between all nodes.

1.4 Node
It is the basic building block in Teradata. It contains a large number of hardware and software components. The
processing for the database occurs in the node.

1.5 Parallel Database Extensions(PDE)


The PDE software is a software interface layer between the operating system and Teradata Database. The PDE
provides services such as the priority scheduler, session scheduling and a parallel environment. The
key function of the Teradata PDE, however, is to execute vprocs. AMPs and PEs are vprocs running under the control of PDE.

1.6 Teradata File System


It is a software layer between Teradata RDBMS layer and PDE layer. It allows Teradata RDBMS to store and retrieve
data regardless of low-level operating system interface.

1.7 Trusted Parallel Applications


The PDE provides a series of parallel operating system services to a special class of tasks called a trusted parallel
application (TPA). A TPA uses PDE to implement vprocs. The Teradata RDBMS is a TPA on an SMP or MPP system.

1.8 Symmetric Multi Processing (SMP)


In a symmetric multiprocessing environment, the CPUs share the same memory, so code running on
one CPU can affect the memory used by another.

1.9 Massively Parallel Processing (MPP)


When multiple SMP nodes are connected to form a larger configuration, it is called an MPP system. The nodes are
connected using BYNET, which allows communication between multiple vprocs on multiple system nodes.
Exam Tips:
The component in Teradata Architecture that has the greatest influence over scalable parallelism is the BYNET.
The node is a part of Teradata's open architecture.
The BYNET is used by PEs and AMPs for internal communication.
All LAN PEs on a node that fails will migrate to another node in the same clique.
To connect to Teradata from a LAN, the physical connections are: PC to network to Ethernet card to
Gateway software to PE.
The typical purpose of the semantic layer in an enterprise data warehouse architecture is data views.

Space Management

1.10 Permanent Space (Perm Space)

It is the amount of data storage allowed for a specific user or database. Upon new installation of Teradata, all Perm
space in Teradata is owned by the system master account, DBC. The total amount of Perm space on the Teradata
system is the sum of the Perm space available on all AMPs. The Perm space limit of a database or user is divided
evenly across the AMPs, so the limit on each AMP is the total limit divided by the number of AMPs.
Databases, users, tables etc. are created and stored in Perm space.
Spool space and Temp space are taken from unused Perm space.
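
The division by the number of AMPs mentioned above can be illustrated with a short Python sketch. It assumes that a
Perm limit is split evenly across AMPs and that space runs out as soon as any single AMP exceeds its share. The function
names and the 4-AMP figures are hypothetical and serve only as an illustration.

```python
def per_amp_perm_limit(max_perm_bytes: int, amp_count: int) -> int:
    """Share of a Perm limit that each AMP may hold (assumed even split)."""
    if amp_count <= 0:
        raise ValueError("amp_count must be positive")
    return max_perm_bytes // amp_count


def amps_over_limit(used_per_amp: list, max_perm_bytes: int) -> list:
    """Return the numbers of the AMPs whose usage exceeds their per-AMP share."""
    limit = per_amp_perm_limit(max_perm_bytes, len(used_per_amp))
    return [amp for amp, used in enumerate(used_per_amp) if used > limit]


# Hypothetical database: 4 GB Perm limit on a 4-AMP system, so 1 GB per AMP.
GB = 1024 ** 3
usage = [GB // 2, GB // 2, GB // 2, GB + 1]  # skewed: AMP 3 holds too much
print(per_amp_perm_limit(4 * GB, 4) // GB, "GB per AMP")
print("AMPs over their share:", amps_over_limit(usage, 4 * GB))
```

In this example the database uses well under its 4 GB total, yet one AMP is over its share, which is why skewed
data can exhaust space early.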

1.11 Spool Space

Spool space is unused Perm space that is used to temporarily build answer sets when users run queries. Spool space
accumulates the result rows and holds them until the query completes. Once the query is completed, all
rows are returned by the AMPs through the BYNET.

1.12 Temporary Space (Temp Space)

Temp space is unused Perm space that is used to materialize global temporary tables (volatile tables are
materialized in spool space instead). The data is active only for the current session, while tables created in Temp space
survive a restart. Temp space is available to the user until their session is terminated.
Exam Tips:
If the DBA creates the maximum of 32 secondary indexes on a table, then 32 subtables are
created, each taking up Perm space.
A database may have Perm space allocated to it. This Perm space establishes the maximum amount of disk
space for storing user data rows in any table located in the database. However, if no tables are stored
within a database, it is not required to have Perm space.
Although a database without Perm space cannot store tables, it can store views and macros because they
are physically stored in the Data Dictionary (DD) Perm space and require no user storage space. The DD is
in a database called DBC.
A database or user with no Perm space can still own views, macros, and triggers and execute queries.
A password is what differentiates a user from a database.
Global temporary tables survive system restarts.

2. Application Development
Application development for Teradata RDBMS falls into one of the following categories:
Explicit SQL
Implicit SQL
Under explicit SQL application development you have the following tools:
Embedded SQL
Macros
Stored Procedures
BTEQ
CLI
ODBC
Queryman
Third-party products that package and submit SQL
EXPLAIN statement
Under implicit SQL application development, you have tools such as Teradata and third-party products that translate
various fourth-generation languages and application generators into SQL.
Here we discuss only the topics that are most important for certification.

2.1 Macro
A Teradata macro is a set of SQL statements that the server stores and executes. The advantages of using macros
include less channel traffic and easy execution of frequently used SQL operations. Macros are
particularly useful for enforcing data integrity rules, providing data security, and improving performance.

2.2 Stored Procedures


Stored procedures consist of a set of SPL control and condition handling statements that provide a procedural
interface to the Teradata RDBMS. The following can be specified in stored procedures:
Multiple SQL statements
Input and output parameters
Local variables
Procedural constructs
Condition handlers

3. Data Distribution
3.1 Primary Key
It is a logical concept in Teradata. Primary key cannot be NULL and is not mandatory. It is the designated column
(or columns) whose unique values will be used to identify each row in the table. Only one primary key can exist in
a table. The best choice for PK would be one small column, which appropriately represents the data in the row.

3.2 Primary Index


Primary Indexes are mainly used in Teradata for storage and distribution of rows. A PI is the most significant
component of a Teradata table. When a new row is inserted, a hash code is derived by applying a hashing algorithm
to the values in the primary index columns, and that hash code determines the AMP that stores the row. Rows having
the same primary index value are therefore stored on the same AMP. The PI is a physical concept. Unlike a primary key,
a primary index column can contain NULLs, and every table has a primary index unless it is explicitly created as a
NoPI table (see 3.8). A primary index is the most efficient method of accessing data and the best primary index has
the greatest level of uniqueness. There are 3 main purposes for a PI:
(1) Data distribution: the selection of PI columns directly determines the distribution of the table's rows.
(2) Data retrieval: the PI is the fastest way to retrieve table data.
(3) Join performance: the PI can significantly impact the performance of query joins.
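
The effect of hashing on row distribution can be sketched in Python. This is only an illustration: Teradata's actual
hashing algorithm and hash maps are not reproduced here, so MD5 and a simple modulo are used as stand-ins, and the
4-AMP system, the function names and the sample values are all hypothetical.

```python
import hashlib
from collections import Counter

AMP_COUNT = 4  # hypothetical small system


def row_hash(pi_value) -> int:
    """Stand-in 32-bit row hash (MD5-based, not Teradata's real algorithm)."""
    digest = hashlib.md5(repr(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def amp_for(pi_value, amp_count: int = AMP_COUNT) -> int:
    """Map a primary index value to an AMP number (modulo is a simplification)."""
    return row_hash(pi_value) % amp_count


def rows_per_amp(pi_values, amp_count: int = AMP_COUNT) -> Counter:
    """Count how many rows land on each AMP for the given PI values."""
    return Counter(amp_for(value, amp_count) for value in pi_values)


# Unique PI values spread across the AMPs; a few repeated values pile up.
print("unique PI:    ", sorted(rows_per_amp(range(10_000)).items()))
print("non-unique PI:", sorted(rows_per_amp(["Chennai"] * 9_000 + ["Pune"] * 1_000).items()))
```

The same value always hashes to the same AMP, which is what makes single-AMP retrieval by PI possible and also what
makes a PI with few distinct values distribute rows unevenly.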

3.3 Partitioned Primary Index


It is a feature that was introduced in version V2R5.
PPI allows users to access a portion of the data of a large table, so the overhead of scanning the entire table is
avoided and performance improves. It doesn't distribute data; it creates partitions on data that has already been
distributed based on the PI. On each AMP, a PPI orders rows first by partition and then by row hash, rather than by
row hash alone. This makes SQL operations much faster when range scans are needed.
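
The benefit for range scans can be illustrated with a small Python model of one AMP. It is a sketch under stated
assumptions, not Teradata's storage format: the monthly partitioning expression, the class and the column names are
hypothetical.

```python
from collections import defaultdict
from datetime import date


def partition_no(order_date: date) -> int:
    """Hypothetical partitioning expression: one partition per calendar month."""
    return order_date.year * 12 + order_date.month


class PartitionedAmp:
    """Rows held by one AMP, grouped by partition number."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, row: dict) -> None:
        self.partitions[partition_no(row["order_date"])].append(row)

    def range_scan(self, start: date, end: date):
        """Return matching rows and the number of partitions actually read."""
        wanted = range(partition_no(start), partition_no(end) + 1)
        touched = [p for p in wanted if p in self.partitions]
        rows = [row for p in touched for row in self.partitions[p]
                if start <= row["order_date"] <= end]
        return rows, len(touched)


amp = PartitionedAmp()
for month in range(1, 13):
    amp.insert({"order_id": month, "order_date": date(2013, month, 15)})

rows, touched = amp.range_scan(date(2013, 3, 1), date(2013, 4, 30))
print(len(rows), "rows read from", touched, "of", len(amp.partitions), "partitions")
```

Only the partitions that can contain qualifying rows are read; the other partitions on the AMP are skipped.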

3.4 Unique Primary Index


With UPIs, columns cannot have duplicate values, guaranteeing uniform distribution of table rows.
The UPI will evenly distribute the rows across all the AMPs. Therefore a UPI is usually the best type of PI to select
when creating a table. UPI ensures that data is retrieved on the fastest physical path. Since the guaranteed unique
values obviate the need for duplicate checking, the performance will also be increased.

3.5 Non-Unique Primary Index


A table created with a NUPI can have duplicate values in its PI columns, which can skew the distribution of data.
Although NUPI rows may not be evenly distributed, a NUPI can still be one of the fastest ways to retrieve data. NUPIs
can be very efficient for query access and joins if the like rows are located on the same AMPs.

3.6 Unique Secondary Index


Unique secondary indexes are built using the following process:
Each AMP accesses its subset of base table rows.
The secondary index value is copied and appended to the ROWID of the base table row.
A row hash is created on the secondary index value.
All three values, Rowid, Row hash and secondary index value are placed onto the BYNET.
The appropriate AMP receives the data and creates a row in the index subtable.
If the row already exists in the index subtable, an error is reported.
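
The build steps above can be sketched in Python. This is a simplified model, assuming an MD5-based stand-in for the row
hash and a modulo to pick the receiving AMP. The data layout, the function names and the sample rows are hypothetical.

```python
import hashlib

AMP_COUNT = 4  # hypothetical small system


def row_hash(value) -> int:
    """Stand-in 32-bit hash (not Teradata's real algorithm)."""
    digest = hashlib.md5(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_usi(base_rows_by_amp, index_column):
    """Build one index subtable per AMP from {row_id: row} dictionaries."""
    subtables = [dict() for _ in range(AMP_COUNT)]
    for base_rows in base_rows_by_amp:               # each AMP reads its own rows
        for row_id, row in base_rows.items():
            si_value = row[index_column]             # copy the index value
            si_hash = row_hash(si_value)             # hash the index value
            target = subtables[si_hash % AMP_COUNT]  # "send over the BYNET"
            if si_value in target:                   # row already in the subtable
                raise ValueError(f"duplicate value for unique index: {si_value!r}")
            target[si_value] = (si_hash, row_id)     # index subtable row
    return subtables


base = [
    {("amp0", 1): {"emp_id": 1, "ssn": "111"}},
    {("amp1", 1): {"emp_id": 2, "ssn": "222"}},
]
for amp_no, subtable in enumerate(build_usi(base, "ssn")):
    print(amp_no, subtable)
```

Because the subtable row is placed by the hash of the index value, a later lookup by that value needs at most two
AMPs: the one holding the subtable row and the one holding the base row.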

3.7 Non Unique Secondary Index


Non-unique secondary indexes are a good choice for oft-queried columns and for columns that have recurring
values. For each NUSI created on a table (up to 32 are allowed), a subtable is created on every AMP. Access through a
NUSI is an all-AMP operation. A NUSI is known as AMP-local because every AMP has a secondary index subtable that points
to its own base rows.

3.8 NoPI
A NoPI table is a MULTISET nontemporal table (a nontemporal table is a table that doesn't support
TransactionTime) that does not have a primary index. The chief purpose of nonpartitioned NoPI tables is to
enhance the performance of FastLoad and Teradata Parallel Data Pump Array INSERT data loading operations.

4. Partitioning
4.1 Partitioned Primary Index
The default PI of Teradata Database is a non-partitioned PI, though both UPIs and NUPIs can be partitioned. A
partitioned primary index still provides a path, using PI values, to rows in base tables, global temporary tables,
volatile tables, and non-compressed join indexes. When a PPI is used to create a table or join index, rows are
hashed to AMPs based on the PI columns and assigned to the appropriate partitions. Within a partition, rows are stored
in row hash order.

4.2 Multi-Level Partitioned Primary Index


A Multi-Level PPI allows partitions to be subpartitioned. There can be up to 15 levels of partitioning. Each level of
an MLPPI must define at least two partitions. The product of the numbers of partitions at all levels should not
exceed 65535.
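
The three limits stated above are easy to check mechanically. The following Python sketch takes the figures (15 levels,
at least two partitions per level, a product of at most 65535) directly from the text; the function name and the sample
definitions are hypothetical.

```python
from math import prod

MAX_LEVELS = 15
MAX_COMBINED_PARTITIONS = 65535


def validate_mlppi(partitions_per_level):
    """Return the combined partition count, or raise ValueError if a limit is broken."""
    if not 1 <= len(partitions_per_level) <= MAX_LEVELS:
        raise ValueError("between 1 and 15 partitioning levels are allowed")
    if len(partitions_per_level) > 1 and any(n < 2 for n in partitions_per_level):
        raise ValueError("each level must define at least two partitions")
    combined = prod(partitions_per_level)
    if combined > MAX_COMBINED_PARTITIONS:
        raise ValueError(f"{combined} combined partitions exceed {MAX_COMBINED_PARTITIONS}")
    return combined


# Hypothetical definition: 12 months x 50 regions x 100 product groups.
print(validate_mlppi([12, 50, 100]))
```

Raising the second level from 50 to 60 partitions would give 72000 combined partitions, which the check rejects.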

4.3 Teradata columnar


Besides partitioning tables or join indexes based on rows, Teradata allows us to partition tables on their
columns too. This form of column partitioning is called Columnar. Column partitioning is possible only if a table
or join index has no PI. Column partitioning is a physical database design choice that is not suitable for all
workloads that access a table or join index.

5. Access Methods
The following are the basic data access methods in Teradata in order of best-to-worst performance:

5.1 Primary Index


Whenever possible, Teradata uses PI to retrieve data, as this is the fastest method. Since PI points directly to the
AMP where the data is located, the Parsing engine knows exactly where to go to get the data.

5.2 Secondary Index


If an SI exists on a table and the PE determines that the PI can't be used, then the secondary index is used to
retrieve data.

5.3 Full table scan


If neither PI nor SI is used, Teradata resorts to a Full Table Scan on the entire table. At this point, Teradata invokes
its powerful parallel processing engine to retrieve data from all the AMPs concurrently.
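
The order of preference among the three access methods can be sketched as a small Python function. This is a
deliberate simplification: it only checks whether the query supplies values for all columns of an index, whereas the
real optimizer also weighs costs and statistics. The function and column names are hypothetical.

```python
def choose_access_path(where_columns, primary_index, secondary_indexes):
    """Pick the first usable path in best-to-worst order."""
    supplied = set(where_columns)
    if primary_index and set(primary_index) <= supplied:
        return "primary index"       # the PI value identifies the AMP directly
    for index_columns in secondary_indexes:
        if index_columns and set(index_columns) <= supplied:
            return "secondary index"
    return "full table scan"         # all AMPs read their rows in parallel


pi = ("emp_id",)
si = [("ssn",)]
print(choose_access_path({"emp_id"}, pi, si))
print(choose_access_path({"ssn"}, pi, si))
print(choose_access_path({"last_name"}, pi, si))
```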

6. Join Index
Join Indexes are file structures that permit queries to be resolved by accessing an index rather than its base
table. Join indexes can be defined on one or more tables.
Multitable join indexes store and maintain joined rows of two or more tables and can aggregate selected
columns. These are used for join queries that are performed with high frequency.
An Aggregate Join Index is a cost-effective, highly efficient method of resolving queries that frequently specify
aggregate operations on the same column or set of columns. As a result, the aggregate calculations do not have to be
repeated for every query.
Single-table Join Indexes are used to resolve joins on large tables without redistributing the joined rows across the
AMPs. These types of join indexes hash a frequently joined subset of base table columns to the same AMP. As
a result, BYNET traffic is eliminated.

7. Table level lock compatibility


7.1 Lock levels
The Teradata lock manager implicitly locks the following objects:
Database: Locks rows of all tables in the database
Table: Locks all rows in the table and any index and fallback subtables
Row hash: Locks the primary copy of a row and all rows that share the same hash code


7.2 Levels of lock types


Users can apply four different types of locking on Teradata resources.

Exclusive: The requester has exclusive rights to the locked resource. No other process can read from, write to, or
access the locked resource in any way.
Write: The requester has exclusive rights to the locked resource except for readers not concerned with data
consistency.
Read: Several users can hold Read locks on a resource, during which the system permits no modification of that
resource.
Access: The requester is willing to accept minor inconsistencies of the data while accessing the database.

Exam tips:

The following are compatible lock types:
Access-Access, Access-Read, Access-Write
Read-Access, Read-Read
Write-Access

A user can lock the following resource types in Teradata:
Database
Table
Row hash

An Access lock is used to perform a dirty read and to be able to read data that is currently being written by
another process.
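
The compatible pairs listed above can be written down as a lookup table. The Python sketch below encodes exactly those
pairs and treats every other combination, including anything paired with Exclusive, as incompatible. The names
COMPATIBLE and is_compatible are hypothetical.

```python
# For each held lock type: the lock types that may be granted at the same time.
COMPATIBLE = {
    "ACCESS": {"ACCESS", "READ", "WRITE"},
    "READ": {"ACCESS", "READ"},
    "WRITE": {"ACCESS"},
    "EXCLUSIVE": set(),
}


def is_compatible(held: str, requested: str) -> bool:
    """True if `requested` can be granted while `held` is in place."""
    return requested.upper() in COMPATIBLE[held.upper()]


for held in COMPATIBLE:
    granted = [req for req in COMPATIBLE if is_compatible(held, req)]
    print(f"{held:<9} allows: {', '.join(granted) or 'nothing'}")
```

An Access request is blocked only by an Exclusive lock, which is why it can read rows that another process is
currently writing.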

8. Active Data Warehouse


Teradata has combined quick OLTP transactions with Decision Support Systems (DSS) to achieve the
concept of the Active Data Warehouse. The Active Data Warehouse allows companies to take their OLTP transactions
and load them into the data warehouse in near real-time so users can analyze data and make decisions before
their competitors. Some of the characteristics of an active data warehouse environment are mission-critical
applications, tactical queries and a need for 24/7 reliability. Active data warehouses provide scalability in order to
support large amounts of detail data. Users are allowed to update the operational data store directly, and an
integrated environment supporting a wide mix of queries is created.

The order of evolution of a data warehouse is:


Reporting, Analyzing, Predicting, Operationalizing, Active Warehousing

8.1 Operational Data Store (ODS)


Companies run their daily business by taking orders, and these come in the form of transactions. A business
transaction is designed to make the company money. These transactions are stored in what is called an Operational
Data Store. The Operational Data Store holds the daily, weekly or even monthly transactions.
A data warehouse takes these transactions from the Operational Data Store and keeps them in a central,
enterprise-wide database. There are often years of transactions kept in the data warehouse so users can query the
detail data to gather information that will help to make strategic and tactical business decisions.

8.2 Processing types in ADW

OLTP
DSS
OLCP
OLAP
Data mining

Active Data Warehousing combines the best features of all of the above so that an enterprise can utilize each
processing type to run its business better.

8.3 Data Models


A data model is a graphic representation of the data for a specific area of interest. That area of interest may be as
broad as all the integrated data requirements of a complete business organization ("Enterprise Data Model") or as
focused as a single business area or application. Concentrating on one subject area at a time, the EDM is developed
from a top-down approach using an enterprise view, not drawn from just one business area or specific
application. An enterprise data model serves as a neutral data model that is normalized to address all business
areas and is not specific to any function or group, whereas an application model is built for a specific business area.
An application data model looks at only one aspect of the business, whereas an enterprise logical data model integrates
all aspects of the business. An enterprise data model is therefore more extensive than an application data model.

8.4 Data Marts


A data mart connects the data warehouse to the enterprise.
There are two types of data marts: logical and physical. A logical data mart is an existing part of the
data warehouse, but a physical data mart can reside on another platform, or on the data warehouse itself.
Dependent data marts are created from detail data directly from the data warehouse. If you never create a data
mart directly from another data mart and only create data marts that come from the detail data, you will be
following best practices. You should refresh the data mart directly from the data warehouse when it makes logical
sense. That could be hourly, daily, nightly, weekly, monthly or yearly depending on the purpose of the data mart.


Logical data marts are not separated from the detail data in a separate computer system. Many companies use a
best practice of keeping the detail data and the dependent data marts on the same data warehouse platform. This
allows users to query the data mart for summarized or aggregated information while still being able to ask
questions about the detail data. A dependent data mart has nothing to do with a logical data mart. You can take a
dependent data mart from Teradata and place the summarized information on Oracle, or you can choose to keep
the dependent data mart with the detailed Teradata data warehouse.
It is important to have dependent data marts. Dependent data marts are extracted directly from the detail data.
They are always one extract away from the detail. Independent data marts are extracted from other data marts or
directly from operational systems.
Exam tips:
Row v/s Set processing:
Row processing is the type of processing in which rows are fetched one at a time. After calculations are done
on a row, it is inserted or updated; then the next row is fetched and the process continues as
before. Fetching rows one by one makes the processing very slow, although there is less lock
contention when row processing is used.
Set processing is built on the concept of handling groups of rows at one time. The biggest advantage of set
processing over row processing is performance, and the benefit of using row processing over set
processing is less lock contention.
Throughput v/s Response time:
Response time is the elapsed time per query and throughput is the number of queries executed in an
hour. While throughput measures the amount of work processed, response time is a measure of process
completion.
Macros and stored procedures are two methods used by Teradata RDBMS to limit data access.
The typical purpose of the semantic layer in an enterprise data warehouse architecture is data views.
An inverted list database is built around both set processing and row-at-a-time processing.
X-views in data dictionary:
The views present in a data dictionary including DBC.AllTempTablesV, DBC.TablesV and DBC.UsersV can be
accessed using an EXPLAIN modifier preceding a DDL or DCL statement.
Some views are user-restricted: they apply only to the user submitting the query against the
view. These views are identified by an X appended to the system view name and are sometimes called X
views. These views report only a subset of the available information. The only difference between an X
view and a non-X view is the existence of a WHERE clause that ensures a user can view only those objects the
user owns, is associated with, has been granted privileges on, or has been assigned a role with privileges on.

9. Data Protection
9.1 RAID1:
Using RAID1, data is mirrored across paired disks. It provides the highest level of protection, although the disk
space overhead is almost 50%.

9.2 RAID5:
Data and parity are stored by striping across multiple disks. The data is not mirrored.
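
How parity allows a lost disk to be reconstructed can be shown with a few lines of Python. The sketch uses byte-wise
XOR parity, which is the usual RAID5 scheme; the block contents and function names are hypothetical, and real arrays
rotate the parity block across the disks rather than keeping it in one place.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid5_usable_fraction(disk_count: int) -> float:
    """Fraction of raw capacity left for data when one disk's worth holds parity."""
    return (disk_count - 1) / disk_count


data = [b"AMP0", b"AMP1", b"AMP2"]   # one stripe spread over three data disks
parity = xor_blocks(data)            # stored on a fourth disk

# Disk 1 fails: rebuild its block from the surviving blocks and the parity.
recovered = xor_blocks([data[0], data[2], parity])
print(recovered == data[1])
print(raid5_usable_fraction(4))
```

With four disks, three quarters of the raw capacity holds data, compared with half under RAID1 mirroring.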

9.3 Call Level Interface and ODBC:


The Teradata RDBMS uses the Call Level Interface (CLI) or ODBC for all communication between a user terminal
and the Teradata RDBMS. Whether used explicitly or implicitly, CLI and ODBC are the basis for all communication
between users and the Teradata RDBMS.
The CLI is a library of routines that resides in the user address space and provides the interface between the
application program and the TDP or Gateway. The CLI packages SQL requests on a client for routing to the
Teradata server. When a results set is returned to the client, the CLI unpackages the results for the system to
display to the user or write in a report.
The ODBC Driver for the Teradata RDBMS provides an alternate, CLI-independent interface to Teradata databases
using the industry-standard ODBC application programming interface.

Exam Tips
The three interfaces that enable access to the Teradata Database from a network-attached client are:
1. Call level Interface V2 (CLIv2)
2. Java Database Connectivity (JDBC)
3. Open Database Connectivity (ODBC)

Two application interfaces used by Windows applications are:
1. ODBC
2. WinCLI


10. Teradata Fault Tolerance

Both hardware and software provide fault tolerance, some of which is mandatory and some of which is optional.
Teradata RDBMS facilities for software fault tolerance are:

Vproc migration
Fallback tables
AMP clusters
Journaling
Archive/Recovery
Table Rebuild utility

10.1 Vproc Migration:

Because the Parsing Engine (PE) and Access Module Processor (AMP) are software, they can migrate from their home
node to another node within the same hardware clique if the home node fails for any reason.
Although the system normally determines which vprocs migrate to which nodes, a user can configure preferred
migratory destinations.

10.2 Fallback:

A fallback table is a duplicate copy of a primary table. Each row in a fallback table is stored on an AMP different
from the one to which the primary row hashes. This reduces the likelihood of data loss, because data is lost only if
both AMPs or their associated disk storage fail at the same time.
The disadvantage of this method is that it requires twice the storage space and twice the I/O (on inserts, updates,
and deletes) of tables maintained without fallback. The advantage is that data is almost never lost because of a
down AMP. Data is fully available during an AMP or disk outage, and recovery is automatic after repairs have been
made.

10.3 AMP Clusters:

Clustering is a means of logically grouping AMPs to minimize (or eliminate) data loss that might occur from losing
an AMP. AMP clusters are used only for fallback data. The fallback copy of any row is always located on an AMP
different from the AMP that holds the primary copy. This is an entry-level fault tolerance strategy.
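
The relationship between fallback rows and AMP clusters can be sketched in Python. The placement rule used here (the
next AMP in the same cluster) is a hypothetical stand-in chosen for simplicity and is not Teradata's actual
distribution of fallback rows. The availability check assumes that a row is lost only when both of its copies sit on
down AMPs of one cluster.

```python
def fallback_amp(primary_amp: int, cluster_size: int, offset: int = 1) -> int:
    """Pick a different AMP in the same cluster for the fallback copy."""
    if cluster_size < 2 or offset % cluster_size == 0:
        raise ValueError("fallback needs another AMP in the same cluster")
    cluster_start = (primary_amp // cluster_size) * cluster_size
    position = primary_amp - cluster_start
    return cluster_start + (position + offset) % cluster_size


def data_available(down_amps, cluster_size: int) -> bool:
    """True while no cluster has more than one AMP down."""
    clusters = [amp // cluster_size for amp in set(down_amps)]
    return len(clusters) == len(set(clusters))


# Hypothetical system: 8 AMPs in two clusters of 4 (AMPs 0-3 and 4-7).
print([fallback_amp(amp, 4) for amp in range(8)])
print(data_available([0, 5], 4))   # one AMP down in each cluster
print(data_available([0, 1], 4))   # two AMPs down in the same cluster
```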


10.4 Journaling:

The different journaling capabilities of Teradata RDBMS are:

Down AMP recovery journal:
Is active during an AMP failure only
Journals fallback tables only
Is discarded after the down AMP recovers

Transient Journal:
Logs BEFORE images for all transactions
Is used by the system to roll back failed transactions aborted either by the user or by the system
Captures:
    BT/ET images for all transactions
    Before images for updates and deletes
    Row IDs for inserts
    Control records for creates and drops
Keeps each image on the same AMP as the row it describes
Discards images when the transaction or rollback completes
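
The way before images support rollback can be illustrated with a minimal Python model. It is a sketch of the idea
only: the class, its methods and the in-memory dictionaries are hypothetical and do not reflect how Teradata stores
its journal.

```python
class JournaledTable:
    """Toy table that keeps a transient journal for the current transaction."""

    def __init__(self):
        self.rows = {}
        self.journal = []  # entries: (operation, row_id, before_image)

    def insert(self, row_id, row):
        self.journal.append(("insert", row_id, None))  # row ID only
        self.rows[row_id] = dict(row)

    def update(self, row_id, row):
        self.journal.append(("update", row_id, dict(self.rows[row_id])))
        self.rows[row_id] = dict(row)

    def delete(self, row_id):
        self.journal.append(("delete", row_id, self.rows.pop(row_id)))

    def commit(self):
        self.journal.clear()  # images are discarded once the transaction ends

    def rollback(self):
        for operation, row_id, before in reversed(self.journal):
            if operation == "insert":
                del self.rows[row_id]         # undo the insert by row ID
            else:
                self.rows[row_id] = before    # restore the before image
        self.journal.clear()


table = JournaledTable()
table.insert(1, {"qty": 5})
table.commit()

table.update(1, {"qty": 9})
table.insert(2, {"qty": 1})
table.delete(1)
table.rollback()
print(table.rows)
```

After the rollback the table holds only the row committed by the first transaction, with its original value.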

Permanent Journal:

Is active continuously
Is available for tables or databases
Provides rollforward for hardware failure recovery
Provides rollback for software failure recovery
Provides full recovery of nonfallback tables
Reduces need for frequent, full-table archives

10.5 Archive/Recovery:

Teradata Archive and Restore Utility (ARC) has 3 main uses:

1. Archive a database, table or selected partitions of a PPI table
2. Restore a database, table or selected partitions of a PPI table
3. Copy an archived database, table or selected partitions of a PPI table to a Teradata database on a different
system.


10.6 Table Rebuild Utility:

Table Rebuild provides for rebuilds of entire databases and all tables in a database, including:
The primary portion of a table
The fallback portion of a table
The entire table (both primary and fallback portions).
All fallback tables that reside on an AMP
All tables that reside on an AMP

Some of the Teradata RDBMS facilities for hardware fault tolerance are:
Dual BYNETs:
In a Teradata system there are two BYNETs, called BYNET-0 and BYNET-1. If one BYNET fails, the other one is
still available, so interprocessor traffic is not interrupted unless both of them fail.
RAID disk units:
These are used to protect against disk failure. The most common level of RAID is RAID-1 (transparent mirroring).
Each primary disk has an exact copy of all its data on another disk. This provides the highest level of
protection, although the mirror copy doubles the disk space required (a 100% overhead relative to the data, or 50% of
the raw capacity).
With RAID-5, data and parity are striped across a rank of disks.

Cliques:
They are a group of nodes that share access to the same disk arrays. Cliques support the migration of vprocs when
nodes fail. If a node in a clique fails, then the vprocs in that node migrate to other nodes in the same clique. This
migration minimizes the performance impact on the system.

Exam Tips:
The following can be archived by the Teradata ARC:
Database, table, partition

The ARC statement COPY restores a copy of an archived file to a specified Teradata database system.


One of the reasons to use a USI rather than a NUSI when creating a table is that USIs are needed for journaling and
ensure that only unique data is inserted into the table.
Tables defined with FALLBACK create a Down AMP Recovery Journal in the event of an AMP failure.
The Transient Journal (TJ) ensures data integrity by keeping a before-image copy of changed rows in
memory. Upon transaction failure, the changes are rolled back.
The TJ is maintained automatically. It provides rollback of changed rows for transactions that are not
completed.
Two of the reasons why a customer would choose table partitioning are:
1. To reduce the I/O for range-constrained queries
2. For the ability to archive specific partitions in a table.
The 3 ways by which Teradata protects data are:
1. Archive to tape
2. Archive to disk
3. RAID technology
When using RAID1, 50% of the disk space is used as overhead.
A Hot Standby Node is a member of a clique.
When a node fails, all LAN PEs and AMPs will migrate to another node in the clique.
Hot standby nodes:
In case of a node failure, Teradata will reset. When it does, the AMPs and PEs in the down node
will be instructed to migrate to the hot standby node.

11. Teradata Concurrency Control

Concurrency control involves preventing concurrently running processes from improperly inserting, deleting, or
updating the same data. There are two mechanisms to achieve this:
Transactions
Locks


A transaction is the unit of work and the unit of recovery. A partial transaction cannot exist: either all statements
execute or none of them do. A set of transactions is said to be serializable if and only if it produces the
same result as some arbitrary serial execution of those same transactions for arbitrary input. A set of transactions is
correct only if it is serializable.
A lock is a means of claiming usage rights to some resource. A user can lock the following resource types in a
Teradata database:
1. Database 2. Table 3. View 4. Row Hash

11.1 System restarts:

Unscheduled system restarts occur for the following reasons:

AMP/disk/node failure
Software failure
Parity error

When such an unscheduled restart occurs, 2 types of automatic transaction recovery can occur:
1. Single transaction recovery:
It aborts a single transaction for reasons such as a user error, a user-initiated abort command, or a transaction
deadlock timeout.
2. RDBMS recovery:
It is caused by a hardware/software failure or a user command.

11.2 Down AMP recovery:

If an AMP fails to come online during system recovery, the RDBMS continues to process transactions with the
help of fallback data. When the AMP comes back online, down AMP recovery procedures bring the data for the AMP up to
date. If there are a large number of rows to be processed, the AMP recovers offline. If there are only a few, it
recovers online.
Exam Tips:
For a table-level lock, the following lock type combinations are compatible:
Access-Access, Access-Read, Access-Write
Read-Access, Read-Read
Write-Access


The purpose of a lock is to serialize access in situations where concurrent access would compromise data consistency.
Two reasons an access lock is used:
To perform a dirty read
To be able to read data that is currently being written by another process
The impact on a database when a node failure occurs is that the database is restarted.

12. System Monitoring

Teradata Manager is a monitoring system for production and performance, which is used to control one or more
Teradata servers. It is mainly used for reviewing historical workload.
Some of the monitoring tools in Teradata Analyst Pack to help the users and the DBA are:

12.1 Index Wizard:

It is used to analyze a workload of SQL. Index wizard creates a series of reports and index recommendations
describing the costs and statistics associated with those recommendations.

12.2 Statistics Wizard:

It is primarily used to help with collecting statistics. It recommends the collection of new statistics and the
recollection of existing statistics.

12.3 TSET:

TSET (Teradata System Emulation Tool) allows the user to capture cost parameters, statistics etc. It doesn't export
user data. It allows you to quickly model a production environment by emulating a larger production system in a
smaller test or development environment. This reduces the cost of query plan analysis and your overall
development efforts.

12.4 Visual Explain:

It depicts the query execution plan generated by the PE optimizer in a visual format with easy-to-read icons.

12.5 Teradata Analyst Pack:

Teradata Analyst Pack, in conjunction with other Teradata tools and utilities, enhances the user's ability to analyze and
understand the detailed steps involved in query plans along with the influences of the system configuration, data
demographics, and secondary index structure.

13. Teradata Utilities

13.1 BTEQ - Querying Utility:

It is the earliest Teradata query tool and can be used to:
1. Submit SQL in either a batch or interactive environment
2. Output SQL results in a report format
3. Import data into and export data from Teradata in a report format
BTEQ is an SQL front-end utility that runs on all client platforms. It resides on the client portion of either a
channel-attached or network-attached system and communicates with one or more Teradata RDBMS systems residing on
the server. BTEQ allows you to create and submit SQL queries either interactively or in batch mode from an
interactive terminal.
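
As a rough illustration of batch mode, the Python sketch below assembles the text of a small BTEQ script that logs on, exports one query result in report format and quits. The helper name build_bteq_script and the values tdprod, study_user, secret and today.txt are placeholders invented for this example; check the exact command options against the BTEQ reference for your release.

```python
def build_bteq_script(tdpid: str, user: str, password: str,
                      sql: str, report_file: str) -> str:
    """Assemble a BTEQ batch script that runs one SQL statement and
    writes the result to a file in report format."""
    # Normalize the statement so it ends with exactly one semicolon.
    statement = sql.strip().rstrip(";")
    lines = [
        f".LOGON {tdpid}/{user},{password}",     # connect to the database
        f".EXPORT REPORT FILE = {report_file}",  # send output to a report file
        f"{statement};",                         # the SQL request itself
        ".EXPORT RESET",                         # stop exporting
        ".LOGOFF",
        ".QUIT",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder logon values; replace them with real ones before use.
    print(build_bteq_script("tdprod", "study_user", "secret",
                            "SELECT DATE;", "today.txt"))
```

The generated text would typically be saved to a file and fed to the bteq client on standard input. That step is left out here because it needs a reachable Teradata system.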

13.2 Load and Extract Utilities:

Teradata Parallel Transporter (TPT) provides scalable, high-speed, parallel data extraction, loading and updating by
using and expanding on the traditional Teradata extraction and load utilities such as FastLoad, MultiLoad,
FastExport and TPump.
TPump:
Uses row-level locking
Used for continuous updates to rows in a table
Does not support Multi-SET tables
Loads data to Teradata from a Mainframe or LAN flat file
SQL operations can be performed on many tables simultaneously

FastLoad:
Uses table-level locking
Only one table may be loaded at a time, and that table must be empty
Loads (INSERTs) data to Teradata from a Mainframe or LAN flat file
Runs in two operating modes: Interactive mode and Batch mode
Duplicate rows will not be loaded
Uses two error tables: the first error table records errors where the format of the data is not correct, and
the second error table records violations of the UPI.

MultiLoad:
Uses table-level locking
Up to 20 inserts, updates or deletes can be done on up to 5 tables
Loads data to Teradata from a Mainframe or LAN flat file
Extremely fast
Duplicate rows allowed
Usually the receiving tables are populated
Uses two error tables: Acquisition phase error table is related to data error and Application phase error table is
related to UPI violations

FastExport:
Used to export data into a flat file on a mainframe or LAN
Export can be done in record mode also
Two ways to export data: 1. FastExport (faster) 2. BTEQ (used less often than FastExport)
Uses multiple sessions to export from multiple tables

TPT combines all the above Teradata utilities into one comprehensive language. It can insert data into
tables, export data from tables and update tables, all with in-line filtering.
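
The rules of thumb in this section (FastLoad for one empty table, MultiLoad for up to 5 tables that may already be populated, TPump for continuous row-at-a-time updates, FastExport for exports) can be summarized in a short Python sketch. The function choose_utility and its task labels are hypothetical, written only for this guide. Real decisions also depend on data volume, indexes and scheduling, and TPT can drive all of these operations.

```python
def choose_utility(task: str, target_tables: int = 1,
                   target_empty: bool = False) -> str:
    """Pick a load/extract utility from the rules of thumb in this section.

    task is one of the made-up labels "load", "update", "stream" or "export".
    """
    task = task.lower()
    if task == "export":
        return "FastExport"   # exports large amounts of data to a flat file
    if task == "stream":
        return "TPump"        # continuous, row-at-a-time updates
    if task == "load" and target_tables == 1 and target_empty:
        return "FastLoad"     # one empty table at a time
    if task in ("load", "update") and 1 <= target_tables <= 5:
        return "MultiLoad"    # up to 5 tables, which may be populated
    raise ValueError(f"no single utility fits task={task!r} "
                     f"with {target_tables} target table(s)")


if __name__ == "__main__":
    print(choose_utility("load", target_tables=1, target_empty=True))
    print(choose_utility("update", target_tables=3))
    print(choose_utility("stream"))
    print(choose_utility("export"))
```

The sketch raises an error for more than 5 target tables instead of guessing, because the section gives no rule for that case.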

Exam tips:
BTEQ allows import/export across all supported platforms.
Teradata allows you to set multiple sessions in a BTEQ script. However, this will work only if your SQL is
using the Primary Index or a Unique Secondary Index. So UPI, NUPI and USI are the only options that will
utilize multiple sessions.
BTEQ connects to the database by means of CLIv2
BTEQ enables users on a workstation to easily access one or more Teradata Database systems for ad hoc
queries, report generation, data movement (suitable for small volumes) and database administration.

It is possible to run queries in BTEQ through a batch script.


Tables loaded with TPump can have Secondary Indexes, Referential Integrity, and Triggers, unlike tables loaded
with FastLoad or MultiLoad.
Teradata loads using FastLoad, updates using MultiLoad, streams using TPump and exports using
FastExport.
TPump doesn't move data in large blocks. It loads data one row at a time using row hash locks. Because
of this, it is possible to run multiple TPump jobs against the same table at the same time.
TPump is used to read from queues or flat files
TPT supports table loading, updating and streaming of updates.
For mini-batch data loads, it may be faster to FastLoad the data into an empty staging table, and then
use a Teradata INSERT ... SELECT process to populate the target table.
FastExport is used to export large amounts of data
Two features of MultiLoad:
Insert data into empty tables
Upsert data to multiple tables
The FastLoad utility loads the data into staging as the first step of loading.

