
TERADATA
Swapnil Mahalle (176191)
swapnil.mahalle@cognizant.com

What is Teradata?

Teradata is a relational database management
system (RDBMS) that drives a company's data
warehouse.
Teradata is an ideal foundation for many
applications, including:
Enterprise data warehousing
Active data warehousing
Customer relationship management
Internet and E-Business
Data Marts

Enterprise Data Warehousing

Teradata Database is ideal for enterprise
data warehousing, which is commonly
characterized by:
Multiple subject areas
Many concurrent users
Many concurrent queries, including ad-hoc
queries
Large quantity of tables
Hundreds of gigabytes (and terabytes) of detail
data
Historical data stored (months or years)

Active Data Warehousing

Active Data Warehousing is the technical ability to capture
transactions when they change, and integrate them into the
warehouse.
An active data warehouse must deliver performance, scalability,
availability, and data freshness.
The Teradata Warehouse supports active data warehousing
with:
Capability to handle thousands of additional users and
mixed workloads.
High availability and reliability to support mission-critical
applications.
Scalability to accommodate an increase in the amount of
data, the number of data sources, and the number of
applications supported in the data warehouse environment.

Customer Relationship Management

CRM is the common terminology used to describe
the managing of prospects all the way through the
entire sales process. CRM is often an entire data
system that can be manipulated manually.
Teradata Database's detailed data and analysis
capabilities help identify and optimize business
relationships with the highest potential for profitability
and growth.
Teradata's CRM solution, Teradata Relationship
Manager, consists of software, professional and
customer services, and the Teradata Database to
create, maintain, and enhance customer
relationships.

Internet and E-Business

The Teradata Database provides a single repository for
customer information that helps E-Businesses build and
maintain one-to-one customer relationships that are
critical to their success on the Internet.

The Teradata Database allows E-Businesses to:
Capture massive amounts of click-stream data.
Enable multiple users to ask complex questions of the
customers' click-stream data with near real-time
response.
Protect customers' privacy with consumer opt-in/opt-out
preferences and the ability for consumers to check
and revise their information stored on the Teradata
Database through the Internet or a company call
center.

Data marts

A data mart is a special purpose subset of a
company's enterprise data used by a particular
department, function, or application.
Often, these single-subject area data marts contain
data that was aggregated or transformed in some
way to better handle the requests of a specific user
community.

The Teradata Database is ideal for the logical data
mart environment, where different user
communities view subsets of a single repository of
enterprise data.

Unique Features

Single data store
Scalability
Unconditional parallelism (parallel architecture)
Ability to model the business
Mature, parallel-aware Optimizer

Single Data Store

Scalability
"Linear scalability" means that as you add components to the system, the
performance increase is linear.
Hardware
SMP: Symmetric Multiprocessing Platform
MPP: Massively Parallel Processing systems
Complexity
Teradata is adept at complex data models that
satisfy the information needs throughout an
enterprise. It has the ability to perform large
aggregations during query run time and can
perform up to 64 joins in a single query.

Concurrent Users
Teradata can handle the most concurrent users,
who are often running multiple, complex
queries.

Unconditional Parallelism

Teradata provides exceptional performance to achieve a single
answer faster than a non-parallel system. Parallelism uses
multiple processors working together to accomplish a task
quickly.
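
The idea of several processors cooperating on one answer can be sketched in a few lines of Python. This is only an illustration of the split, work, combine pattern and not Teradata code: the function names and the round-robin partitioning are assumptions made for the example, and Python threads show the structure rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(values, n_workers):
    """Deal the values round-robin into n_workers partitions (hypothetical scheme)."""
    return [values[i::n_workers] for i in range(n_workers)]


def parallel_sum(values, n_workers=4):
    """Sum each partition in its own worker, then combine the partial sums."""
    partitions = split_into_partitions(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, partitions))
    # The single answer is assembled from the per-worker results.
    return sum(partial_sums)


if __name__ == "__main__":
    print(parallel_sum(list(range(1, 1001))))
```

Each worker touches only its own partition, which is what lets the work proceed side by side before the final combine step.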

Ability To Model Business

A data warehouse built on a business model (truly normalized)
contains information from across the enterprise. Individual
departments can use their own assumptions and views of the data
for analysis, yet these varying perspectives have a common basis
for a "single version of the truth."

Mature, Parallel-Aware Optimizer


Teradata's Optimizer is the most robust in the
industry, able to handle:
Multiple complex queries
Multiple Joins per query
Unlimited ad-hoc processing

DATA WAREHOUSING

Evolution

Various Stages of DW

Reporting: The initial stage typically focuses on reporting from a single
source of truth to drive decision-making across functional and/or product
boundaries.
Analyzing: Users perform ad-hoc analysis, slicing and dicing the data at a
detail level, and are concerned with drilling down beneath the numbers on a
report.
Predicting: Sophisticated analysts heavily utilize the system to leverage
information to predict what will happen next in the business to proactively
manage the organization's strategy. This stage requires data mining tools and
building predictive models using historical detail.
Operationalizing: Providing access to information for immediate decision-making
in the field enters the realm of active data warehousing. Stages 1 to 3
focus on strategic decision-making within an organization. Stage 4 focuses on
tactical decision support. Tactical decision support is not focused on developing
corporate strategy, but rather on supporting the people in the field who execute
it.
Active Warehousing: The larger the role an ADW plays in the operational
aspects of decision support, the more incentive the business has to automate
the decision processes. As technology evolves, more and more decisions
become executed with event-driven triggers to initiate fully automated decision

Evolution of Data Processing


An RDBMS is used in the following main
processing environments:
OLAP
OLTP
DSS

Environments

Data Marts

A data mart is a special purpose subset of enterprise data used by a
particular department, function or application. Data marts may have both
summary and detail data for a particular use rather than for general use.

Independent Data Marts

Logical Data Marts

Dependent Data Marts

Data Marts

A Teradata System

A Teradata system contains one or more nodes.

A node is a term for a processing unit under the control of a single
operating system.

Node Components

Software Components

A Teradata node requires three distinct pieces of software:

TPA, PDE, OS

Parallel Database Extensions (PDE)

The Parallel Database Extensions (PDE) software layer was added to the
operating system by NCR to support the parallel software environment.

Trusted Parallel Application (TPA)

A Trusted Parallel Application (TPA) uses PDE to implement virtual
processors (vprocs). The Teradata Database is classified as a TPA. The
four components of the Teradata TPA are:

AMP

PE

Channel Driver

Teradata Gateway

Parsing Engine

A Parsing Engine (PE) is a vproc that manages the dialogue between a
client application and the Teradata Database.

Each PE can support a maximum of 120 sessions.

Session Control

Parser

Optimizer

Dispatcher
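
The 120-session limit per PE gives a quick way to estimate how many PEs a workload needs. The sketch below is simple arithmetic under the assumption that sessions can be spread freely across PEs; the function name is made up for the example.

```python
import math

MAX_SESSIONS_PER_PE = 120  # per-PE limit stated on this slide


def parsing_engines_needed(concurrent_sessions):
    """Return the minimum number of PEs for the given number of sessions."""
    if concurrent_sessions < 0:
        raise ValueError("session count cannot be negative")
    return math.ceil(concurrent_sessions / MAX_SESSIONS_PER_PE)


if __name__ == "__main__":
    for sessions in (120, 121, 1000):
        print(sessions, "sessions ->", parsing_engines_needed(sessions), "PE(s)")
```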

AMP

The AMP is a vproc that controls its portion of the data on the system. The
AMPs work in parallel, each AMP managing the data rows stored on its
vdisk.

Data Distribution
When data is loaded, inserted, or updated, the AMP:

Receives incoming data from the PE.

Formats rows and distributes them on its vdisk.

Data Access

Returns responses over BYNET to dispatcher
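
The slides do not spell out how a row ends up on a particular AMP. A common way to picture it is hashing a key column and using the hash to pick the AMP. The sketch below assumes exactly that; the CRC32 hash, the modulo step, and all names are illustrative stand-ins, not Teradata's actual hashing algorithm.

```python
import zlib
from collections import defaultdict


def amp_for_key(key, n_amps):
    """Pick an AMP number for a key value (illustrative hash, not Teradata's)."""
    digest = zlib.crc32(str(key).encode("utf-8"))
    return digest % n_amps


def distribute_rows(rows, key_column, n_amps):
    """Group rows by the AMP that would own them; each list stands in for a vdisk."""
    vdisks = defaultdict(list)
    for row in rows:
        vdisks[amp_for_key(row[key_column], n_amps)].append(row)
    return dict(vdisks)


if __name__ == "__main__":
    sample = [{"cust_id": i, "name": f"customer-{i}"} for i in range(10)]
    for amp, stored in sorted(distribute_rows(sample, "cust_id", 4).items()):
        print("AMP", amp, "holds", len(stored), "row(s)")
```

Because the AMP is derived from the key alone, the same key always maps to the same AMP, so a lookup by key only needs to visit one AMP.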

BYNET

The BYNET (pronounced, "bye-net") is a high-speed interconnect (network)
that enables multiple nodes in the system to communicate.

Features:

Scalable

High performance

Fault tolerant

Load balanced

Communication Between Nodes

Communication Between Vprocs

Point-to-point

Multicast

Broadcast

Point-to-Point Messages

Communication Between Vprocs

Multicast Messages

Broadcast Messages
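
The three message types differ only in who receives the message: one vproc, a chosen group, or everyone. The sketch below models that distinction with plain Python lists. The function and the vproc names are hypothetical and say nothing about how BYNET is actually implemented.

```python
def recipients(message_type, all_vprocs, targets=()):
    """Return the vprocs that would receive a message of the given type."""
    targets = list(targets)
    unknown = [t for t in targets if t not in all_vprocs]
    if unknown:
        raise ValueError(f"unknown vprocs: {unknown}")
    if message_type == "point-to-point":
        # Exactly one sender talks to exactly one receiver.
        if len(targets) != 1:
            raise ValueError("point-to-point needs exactly one target")
        return targets
    if message_type == "multicast":
        # One sender talks to a selected group.
        if not targets:
            raise ValueError("multicast needs at least one target")
        return targets
    if message_type == "broadcast":
        # One sender talks to every vproc; targets are ignored.
        return list(all_vprocs)
    raise ValueError(f"unknown message type: {message_type}")


if __name__ == "__main__":
    amps = ["AMP0", "AMP1", "AMP2", "AMP3"]
    print(recipients("point-to-point", amps, ["AMP2"]))
    print(recipients("multicast", amps, ["AMP0", "AMP3"]))
    print(recipients("broadcast", amps))
```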

Cliques

A clique (pronounced, "kleek") is a group of nodes that share access to the
same disk arrays. Each multi-node system has at least one clique. The
cabling determines which nodes are in which cliques -- the nodes of a
clique are connected to the disk array controllers of the same disk arrays.
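
Since cabling alone decides clique membership, cliques can be derived by grouping nodes that are wired to the same set of disk arrays. The sketch below does that for a made-up cabling map; the node and array names are invented for the example.

```python
from collections import defaultdict


def find_cliques(cabling):
    """Group nodes by the exact set of disk arrays they are cabled to.

    cabling maps a node name to the disk arrays it can reach.
    """
    groups = defaultdict(list)
    for node, arrays in cabling.items():
        groups[frozenset(arrays)].append(node)
    return sorted(sorted(nodes) for nodes in groups.values())


if __name__ == "__main__":
    example = {
        "node1": ["arrayA", "arrayB"],
        "node2": ["arrayB", "arrayA"],
        "node3": ["arrayC"],
    }
    print(find_cliques(example))
```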

Database

In Teradata, a "database" provides a logical
grouping of information.

Databases

Tables

Views

Macros

Triggers

Stored procedures

USER

User: A Special Kind of Database

Teradata Objects

Tables: A table in a relational database management system is a two-dimensional
structure made up of columns and physical rows stored in data
blocks on the disk drives.
Views: A view is like a "window" into tables that allows multiple users to look at
portions of the same base data. A view may access one or more tables, and
may show only a subset of columns from the table(s).
Macros: Macros are pre-defined, stored sets of one or more SQL
commands and/or report-formatting (BTEQ) commands. Macros can also
contain comments.
Triggers: A trigger is a set of SQL statements usually associated with a column
or table that are programmed to be run (or "fired") when specified changes are
made to the column or table. The pre-defined change is known as a triggering
event, which causes the SQL statements to be processed.
Stored Procedures: A stored procedure is a pre-defined set of statements
invoked through a single CALL statement in SQL. While a stored procedure
may seem like a macro, it is different in that it can contain:
Teradata SQL data manipulation statements (non-procedural)
Procedural statements (in Teradata, referred to as Stored Procedure Language)
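
Views and triggers are easier to grasp with a small runnable example. The sketch below uses SQLite through Python's built-in sqlite3 module purely as a stand-in, so the SQL syntax is SQLite's and differs from Teradata SQL. The table, view, and trigger names are invented for the example.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with one table, one view, and one trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
        CREATE TABLE salary_audit (employee_id INTEGER, old_salary INTEGER, new_salary INTEGER);

        -- View: a "window" that shows only a subset of the base table's columns.
        CREATE VIEW employee_public AS SELECT id, name FROM employee;

        -- Trigger: fires when the salary column of employee is updated.
        CREATE TRIGGER log_salary_change AFTER UPDATE OF salary ON employee
        BEGIN
            INSERT INTO salary_audit VALUES (OLD.id, OLD.salary, NEW.salary);
        END;

        INSERT INTO employee VALUES (1, 'Asha', 50000);
    """)
    return conn


def view_columns(conn):
    """Return the column names exposed by the view."""
    cursor = conn.execute("SELECT * FROM employee_public")
    return [column[0] for column in cursor.description]


def raise_salary(conn, employee_id, new_salary):
    """Update a salary (the triggering event) and return the audit rows."""
    conn.execute("UPDATE employee SET salary = ? WHERE id = ?", (new_salary, employee_id))
    return conn.execute("SELECT * FROM salary_audit").fetchall()


if __name__ == "__main__":
    connection = build_demo()
    print(view_columns(connection))
    print(raise_salary(connection, 1, 55000))
```

The view hides the salary column from its users, and the audit row appears without the updating statement mentioning the audit table at all.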

Creating Databases and Users

In Teradata, Databases (including the special
category of Databases called Users) have
attributes assigned to them:
Access Rights
Perm Space
Spool Space
Temp Space

Space Management
Perm Space:

Objects (databases, tables, users, macros) are created
and physically stored here.
Perm space is divided evenly among all the AMPs to
support reasonable data distribution.
Teradata does not allocate space when an object is
created; it assigns a perm space limit, and objects
consume space dynamically up to that limit.
When an object is dropped or its data is deleted, the
perm space is freed.
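
The "limit, not allocation" behavior can be modelled with a small accounting class. This is a simplified sketch under stated assumptions: it tracks one total per database and only reports the even per-AMP share, without modelling per-AMP enforcement. The class and method names are hypothetical.

```python
class PermSpace:
    """Track perm space usage against an assigned limit (simplified model)."""

    def __init__(self, limit_bytes, n_amps):
        self.limit_bytes = limit_bytes
        self.n_amps = n_amps
        self.used_bytes = 0  # nothing is allocated up front

    @property
    def per_amp_limit(self):
        """Even share of the limit for each AMP."""
        return self.limit_bytes // self.n_amps

    def store(self, n_bytes):
        """Consume space as data arrives; fail once the limit would be exceeded."""
        if self.used_bytes + n_bytes > self.limit_bytes:
            raise RuntimeError("perm space limit exceeded")
        self.used_bytes += n_bytes

    def free(self, n_bytes):
        """Release space after a drop or delete."""
        self.used_bytes = max(0, self.used_bytes - n_bytes)


if __name__ == "__main__":
    space = PermSpace(limit_bytes=1_000_000, n_amps=4)
    space.store(300_000)
    space.free(100_000)
    print(space.used_bytes, "of", space.limit_bytes, "bytes used;",
          space.per_amp_limit, "bytes per AMP")
```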

Space Management
Spool Space:

Space on the system not allocated to any object, used
to store intermediate results for further processing
during Teradata query execution.
Defining a spool space limit is not required when an
object is created, but it is recommended so that a
single query cannot consume all the available space.
Once query processing completes, spool space is
freed.

Space Management

Temporary Space:
The amount of space used by global temporary tables
during query processing.
Perm space that is not yet occupied is used as temp space in
Teradata.
The results or inserted data remain available only
for the duration of the session.
Temp space is freed when the session ends.

Data Dictionary

The Data Dictionary is a set of relational tables that contains
information about the RDBMS and database objects within it. It is
the metadata or "data about the data" for a Teradata Database. The
Data Dictionary resides in Database DBC. Some of the major items
it tracks are:

Disk space

Access authorizations

Ownership

Data definitions

Data Protection

LOCKS
Exclusive, Write, Read and Access
RAID: Redundant Array of Inexpensive Disks
FALLBACK
JOURNALS
Permanent and Recovery

Locks

Database Locks: Apply to all tables and views in the database.

Table Locks: Apply to all rows in the table.

Row Hash Locks: Apply to a group of one or more rows in a table.

Types :

Exclusive Locks

Write

Read

Access
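
The slide lists the four lock types without saying which ones can coexist. The sketch below encodes the compatibility rules as they are commonly described for Teradata (Access is blocked only by Exclusive, Read coexists with Read and Access, Write coexists only with Access, Exclusive coexists with nothing). Treat the table as an assumption brought in for illustration, and the names as hypothetical.

```python
# For each requested lock type: the held lock types it can coexist with.
COMPATIBLE = {
    "ACCESS": {"ACCESS", "READ", "WRITE"},
    "READ": {"ACCESS", "READ"},
    "WRITE": {"ACCESS"},
    "EXCLUSIVE": set(),
}


def can_grant(requested, held):
    """Return True if the requested lock is compatible with every held lock."""
    return all(lock in COMPATIBLE[requested] for lock in held)


if __name__ == "__main__":
    print(can_grant("READ", ["READ", "ACCESS"]))
    print(can_grant("WRITE", ["READ"]))
    print(can_grant("ACCESS", ["WRITE"]))
```

A request that cannot be granted would wait until the conflicting locks are released; the sketch only answers the compatibility question.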

RAID 1 & RAID 5

FALLBACK

JOURNALS

Permanent Journals
Optional, user specified, system maintained
(Rollback transaction / recovery of DB)
DBA/User intervention required for recovery.

Recovery Journals
An interrupted transaction (Transient Journal)
An AMP failure (Down-AMP Recovery Journal)

THANK YOU
