
SAP HANA Architecture:

This section gives an architecture overview of the In-Memory Computing Engine of SAP
HANA. The SAP HANA database is developed in C++ and runs on SUSE Linux Enterprise
Server. The SAP HANA database consists of multiple servers: Index Server, Name Server,
Statistics Server, Pre-processor Server and XS Engine. The most important component is
the Index Server.
1. Index Server contains the actual data and the engines for processing the data. It
also coordinates and uses all the other servers.
2. Name Server holds information about the SAP HANA database topology. This is
used in a distributed system with instances of HANA database on different hosts.
The name server knows where the components are running and which data is
located on which server.
3. Statistics Server collects information about Status, Performance and Resource
Consumption from all the other server components. From the SAP HANA Studio
we can access the Statistics Server to get status of various alert monitors.
4. Pre-processor Server is used for analysing text data and extracting the
information on which the text search capabilities are based.
5. XS Engine is an optional component. Using XS Engine clients can connect to SAP
HANA database to fetch data via HTTP.

The following section describes the architecture components of the SAP HANA Index Server.








SAP HANA Index Server Architecture:

1. Connection and Session Management component is responsible for creating and
managing sessions and connections for the database clients. Once a session is
established, clients can communicate with the SAP HANA database using SQL
statements. For each session a set of parameters is maintained, such as auto-commit and
the current transaction isolation level. Users are authenticated either by the SAP
HANA database itself (login with user and password), or authentication can be delegated
to an external authentication provider such as an LDAP directory.

2. The client requests are analyzed and executed by the set of components
summarized as Request Processing and Execution Control. The Request Parser
analyses the client request and dispatches it to the responsible component. The
Execution Layer acts as the controller that invokes the different engines and routes
intermediate results to the next execution step.

For example, Transaction Control statements are forwarded to the Transaction
Manager. Data Definition statements are dispatched to the Metadata Manager and
Object invocations are forwarded to the Object Store.
Data Manipulation statements are forwarded to the Optimizer which creates an
Optimized Execution Plan that is subsequently forwarded to the execution layer.
The SQL Parser checks the syntax and semantics of the client SQL statements and
generates the Logical Execution Plan. Standard SQL statements are processed directly
by DB engine.
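
The routing described above can be sketched roughly as follows. This is a toy illustration in Python, not SAP HANA code: the function name `dispatch` and the classification by first keyword are assumptions made for this example, and the real Request Parser performs full parsing. Object invocations are left out because the text does not say what they look like.

```python
# Hypothetical sketch: map a statement to the component named in the text.
# Classifying by the first keyword is a simplification for illustration only.
ROUTES = {
    "COMMIT": "Transaction Manager",    # transaction control
    "ROLLBACK": "Transaction Manager",
    "CREATE": "Metadata Manager",       # data definition
    "ALTER": "Metadata Manager",
    "DROP": "Metadata Manager",
    "INSERT": "Optimizer",              # data manipulation
    "UPDATE": "Optimizer",
    "DELETE": "Optimizer",
}


def dispatch(statement):
    """Return the name of the component that would receive the statement."""
    words = statement.strip().split()
    if not words:
        raise ValueError("empty statement")
    # Statements this sketch does not classify are reported as "unknown".
    return ROUTES.get(words[0].upper(), "unknown")
```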
The SAP HANA database has its own scripting language named SQLScript that is
designed to enable optimizations and parallelization. SQLScript is a collection of
extensions to SQL. SQLScript is based on side effect free functions that operate on tables
using SQL queries for set processing. The motivation for SQLScript is to offload data-
intensive application logic into the database.
Multidimensional Expressions (MDX) is a language for querying and manipulating the
multidimensional data stored in OLAP cubes.
The SAP HANA database also contains a component called the Planning Engine that
allows financial planning applications to execute basic planning operations in the
database layer. One such basic operation is to create a new version of a dataset as a copy
of an existing one while applying filters and transformations. For example: Planning
data for a new year is created as a copy of the data from the previous year. This requires
filtering by year and updating the time dimension. Another example for a planning
operation is the disaggregation operation that distributes target values from higher to
lower aggregation levels based on a distribution function.
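
The two planning operations above can be illustrated with a minimal Python sketch. The function names `copy_version` and `disaggregate`, the row layout and the proportional distribution function are assumptions chosen for this example; the Planning Engine itself is not exposed this way.

```python
def copy_version(rows, source_year, target_year):
    """Copy the rows of one year into a new version for another year.

    Filters by year and updates the time dimension, as described above.
    """
    return [
        {**row, "year": target_year}
        for row in rows
        if row["year"] == source_year
    ]


def disaggregate(target, reference):
    """Distribute `target` over the keys of `reference`.

    The distribution function used here is "proportional to the
    reference values", which is only one possible choice.
    """
    total = sum(reference.values())
    if total == 0:
        raise ValueError("reference values must not sum to zero")
    return {key: target * value / total for key, value in reference.items()}


actuals = [
    {"year": 2010, "region": "north", "amount": 100.0},
    {"year": 2009, "region": "north", "amount": 80.0},
]
plan_2011 = copy_version(actuals, 2010, 2011)
regional_plan = disaggregate(1200.0, {"north": 100.0, "south": 300.0})
```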
The SAP HANA database also has built-in support for domain-specific models (such as
for financial planning) and it offers scripting capabilities that allow application-specific
calculations to run inside the database.
The SAP HANA database features such as SQLScript and Planning operations are
implemented using a common infrastructure called the Calculation Engine (Calc Engine).
SQLScript, MDX, Planning Models and Domain-Specific Models are converted into
Calculation Models. The Calc Engine creates the Logical Execution Plan for Calculation
Models. It breaks up a model, for example some SQLScript, into operations that can be
processed in parallel. The engine also executes the user-defined functions.
3. In HANA database, each SQL statement is processed in the context of a transaction.
New sessions are implicitly assigned to a new transaction. The Transaction Manager
coordinates database transactions, controls transactional isolation and keeps track of
running and closed transactions. When a transaction is committed or rolled back, the
transaction manager informs the involved engines about this event so they can execute
necessary actions. The transaction manager also cooperates with the persistence layer
to achieve atomic and durable transactions.

4. Metadata can be accessed via the Metadata Manager. The SAP HANA database
metadata comprises a variety of objects, such as definitions of relational tables,
columns, views, and indexes, definitions of SQLScript functions and object store
metadata. Metadata of all these types is stored in one common catalogue for all SAP
HANA database stores (in-memory row store, in-memory column store, object store,
disk-based). Metadata is stored in tables in the row store. SAP HANA database features
such as transaction support and multi-version concurrency control are also used for
metadata management. In distributed database systems central metadata is shared
across servers. How metadata is actually stored and shared is hidden from the
components that use the metadata manager.

5. The Authorization Manager is invoked by other SAP HANA database components to
check whether the user has the required privileges to execute the requested operations.
SAP HANA allows granting of privileges to users or roles. A privilege grants the right to
perform a specified operation (such as create, update, select, execute, and so on) on a
specified object (for example a table, view, SQLScript function, and so on).
The SAP HANA database supports Analytic Privileges that represent filters or
hierarchy drilldown limitations for analytic queries. Analytic privileges grant access to
values with a certain combination of dimension attributes. This is used to restrict access
to a cube to certain values of the dimension attributes.
6. Database Optimizer gets the Logical Execution Plan from the SQL Parser or the
Calc Engine as input and generates the optimised Physical Execution Plan based on
the database statistics. The optimizer determines the best plan for accessing the row
or column stores.

7. Database Executor executes the Physical Execution Plan to access the row
and column stores and also processes all the intermediate results.

8. The Row Store is the SAP HANA database row-based in-memory relational data
engine. It is optimized for high performance of write operations and is interfaced from
the calculation / execution layer. Optimised write and read operations are possible due
to storage separation, i.e. Transactional Version Memory and Persisted Segment.

Transactional Version Memory contains temporary versions i.e. recent
versions of changed records. This is required for Multi-Version
Concurrency Control (MVCC). Write Operations mainly go into
Transactional Version Memory. INSERT statement also writes to the
Persisted Segment.
Persisted Segment contains data that may be seen by any ongoing active
transaction, i.e. data that was committed before any active transaction
was started.
Version Memory Consolidation moves the recent version of changed
records from Transactional Version Memory to Persisted Segment based on
Commit ID. It also clears outdated record versions from Transactional
Version Memory. It can be considered the garbage collector for MVCC.
Segments contain the actual data (content of row-store tables) in pages.
Row store tables are linked lists of memory pages. Pages are grouped in
segments. Typical page size is 16 KB.
Page Manager is responsible for Memory allocation. It also keeps track of
free/used pages.
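
As a rough illustration of Version Memory Consolidation, the following Python sketch moves versions into a persisted part once no active transaction can still need the older state. The class `RowStoreSketch`, its fields and the "oldest active snapshot" parameter are assumptions for this example; pages, segments and the real consolidation rules are not modelled.

```python
class RowStoreSketch:
    """Toy model of Transactional Version Memory and Persisted Segment."""

    def __init__(self):
        self.persisted = {}        # key -> value visible to every transaction
        self.version_memory = []   # (commit_id, key, value), in commit order

    def write(self, commit_id, key, value):
        # Assumption: writes arrive in ascending commit_id order.
        self.version_memory.append((commit_id, key, value))

    def consolidate(self, oldest_active_snapshot):
        """Move versions committed at or before the oldest active snapshot."""
        remaining = []
        for commit_id, key, value in self.version_memory:
            if commit_id <= oldest_active_snapshot:
                # A later version of the same key overwrites the earlier
                # one, which clears the outdated record version.
                self.persisted[key] = value
            else:
                remaining.append((commit_id, key, value))
        self.version_memory = remaining


store = RowStoreSketch()
store.write(1, "a", "v1")
store.write(2, "a", "v2")
store.write(5, "b", "x")
store.consolidate(oldest_active_snapshot=3)
```

After the call, only the newest version of `a` committed up to snapshot 3 is persisted, and the version with commit ID 5 stays in version memory.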
9. The Column Store is the SAP HANA database column-based in-memory relational
data engine. Parts of it originate from TREX (Text Retrieval and Extraction), i.e. SAP
NetWeaver Search and Classification. For the SAP HANA database this proven technology
was further developed into a full relational column-based data store. It offers efficient
data compression, is optimized for high performance of read operations and is interfaced
from the calculation / execution layer. Optimised read and write operations are possible
due to storage separation, i.e. Main and Delta.



Main Storage contains the compressed data in memory for fast read.
Delta Storage is meant for fast write operation. The update is performed by inserting a
new entry into the delta storage.
Delta Merge is an asynchronous process to move changes in delta storage into the
compressed and read optimized main storage. Even during the merge operation the
columnar table will be still available for read and write operations. To fulfil this
requirement, a second delta and main storage are used internally.
During a read operation data is always read from both main and delta storage and the
result sets are merged. The engine uses multi-version concurrency control (MVCC) to
ensure consistent read operations.
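
The interplay of main storage, delta storage and delta merge can be sketched in Python as below. This is a simplified model under stated assumptions: the class name `ColumnSketch` is hypothetical, dictionary encoding stands in for "compressed" (the text does not name the compression scheme), and the merge here is synchronous, whereas the text describes an asynchronous merge with a second delta.

```python
class ColumnSketch:
    """Toy column with a dictionary-encoded main part and an append-only delta."""

    def __init__(self, values):
        self._rebuild_main(list(values))

    def _rebuild_main(self, values):
        # Main storage: sorted dictionary plus one integer code per row.
        self.dictionary = sorted(set(values))
        position = {value: i for i, value in enumerate(self.dictionary)}
        self.main = [position[value] for value in values]
        self.delta = []

    def insert(self, value):
        # Writes only touch the delta, so they stay cheap.
        self.delta.append(value)

    def scan(self):
        # Reads combine the decoded main part with the delta.
        decoded = [self.dictionary[i] for i in self.main]
        return decoded + self.delta

    def delta_merge(self):
        # Fold the delta into a freshly encoded main part.
        self._rebuild_main(self.scan())


column = ColumnSketch(["b", "a", "b"])
column.insert("c")
before_merge = column.scan()
column.delta_merge()
```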
As row tables and columnar tables can be combined in one SQL statement, the
corresponding engines must be able to consume intermediate results created by each
other. A main difference between the two engines is the way they process data: Row
store operators process data in a row-at-a-time fashion using iterators. Column store
operations require that the entire column is available in contiguous memory locations.
To exchange intermediate results, row store can provide results to column store
materialized as complete rows in memory while column store can expose results using
the iterator interface needed by row store.

10. The Persistence Layer is responsible for durability and atomicity of transactions. It
ensures that the database is restored to the most recent committed state after a restart
and that transactions are either completely executed or completely undone. To achieve
this goal in an efficient way the persistence layer uses a combination of write-ahead
logs, shadow paging and savepoints. The persistence layer offers interfaces for writing
and reading data. It also contains SAP HANA's logger that manages the transaction log.
Log entries can be written implicitly by the persistence layer when data is written via
the persistence interface or explicitly by using a log interface.
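
How a savepoint and a write-ahead log together restore the most recent committed state can be sketched in Python. The function `recover` and the tuple-based log record layout are assumptions for this illustration; the sketch also assumes the savepoint image contains committed data only, and it does not model shadow paging.

```python
def recover(savepoint, log):
    """Rebuild the last committed state from a savepoint plus the log.

    Log records (hypothetical layout):
      ("write", txn_id, key, value), ("commit", txn_id), ("rollback", txn_id)
    """
    state = dict(savepoint)
    pending = {}  # txn_id -> list of (key, value) not yet committed
    for record in log:
        kind, txn_id = record[0], record[1]
        if kind == "write":
            pending.setdefault(txn_id, []).append((record[2], record[3]))
        elif kind == "commit":
            # Redo the writes of a committed transaction.
            for key, value in pending.pop(txn_id, []):
                state[key] = value
        elif kind == "rollback":
            pending.pop(txn_id, None)
    # Transactions without a commit record are left out entirely,
    # so they appear completely undone.
    return state


restored = recover(
    {"x": 1},
    [("write", 1, "x", 2), ("write", 2, "y", 9), ("commit", 1)],
)
```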















Distributed System and High Availability
The SAP HANA Appliance software supports High Availability. SAP HANA scales systems
beyond one server and can remove the possibility of a single point of failure. A typical
Distributed Scale-out Cluster Landscape has many server instances in a cluster. Large
tables can therefore be distributed across multiple servers, and queries can be executed
across servers. The SAP HANA Distributed System also ensures transaction safety.
Features
N Active Servers or Worker hosts in the cluster.
M Standby Server(s) in the cluster.
Shared file system for all Servers. Several instances of SAP HANA share the
same metadata.
Each Server hosts an Index Server & Name Server.
Only one Active Server hosts the Statistics Server.
During start up one server gets elected as Active Master.
The Active Master assigns a volume to each starting Index Server or no volume in
case of cold Standby Servers.
Up to 3 Master Name Servers can be defined or configured.
Maximum of 16 nodes is supported in High Availability configurations.
Name Server        Name Server    Index Server       Index Server
Configured Role    Actual Role    Configured Role    Actual Role
Master 1           Master         Worker             Master
Master 2           Slave          Worker             Slave
Master 3           Slave          Worker             Slave
Slave              Slave          Standby            Standby

Failover

High Availability enables the failover of a node within one distributed SAP HANA
appliance. Failover uses a cold Standby node and is triggered automatically. When
an Active Server X fails, Standby Server N+1 reads indexes from the shared
storage and connects to the logical connection of the failed server X.
If the SAP HANA system detects a failover situation, the work of the services on
the failed server is reassigned to the services running on the standby host. The
failed volume and all the included tables are reassigned and loaded into memory
in accordance with the failover strategy defined for the system. This
reassignment can be performed without moving any data, because all the
persistency of the servers is stored on a shared disk. Data and logs are stored on
shared storage, where every server has access to the same disks.
The Master Name Server detects an Index Server failure and executes the
failover. During the failover the Master Name Server assigns the volume of the
failed Index Server to the cold Standby Server. In case of a Master Name Server
failure, another of the remaining Name Servers will become Active Master.
Before a failover is performed, the system waits for a few seconds to determine
whether the service can be restarted. Standby node can take over the role of a
failing master or failing slave node.
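
The volume reassignment performed during failover can be pictured with a small Python sketch. The function `fail_over`, the host names and the dictionary layout are hypothetical; the point is only that the volume moves to a standby host while the data itself stays on shared storage.

```python
def fail_over(assignments, standbys, failed_host):
    """Return new host -> volume assignments after `failed_host` goes down.

    `assignments` maps active hosts to volumes; `standbys` lists cold
    standby hosts, which hold no volume until they take over.
    """
    if failed_host not in assignments:
        raise KeyError(failed_host + " is not an active host")
    if not standbys:
        raise RuntimeError("no standby host available")
    new_assignments = dict(assignments)
    # No data is copied: only the volume assignment changes hands.
    volume = new_assignments.pop(failed_host)
    new_assignments[standbys[0]] = volume
    return new_assignments


after = fail_over({"host1": "vol1", "host2": "vol2"}, ["host3"], "host2")
```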









SAP HANA Appliance
SAP In-Memory Appliance (SAP HANA) software is a flexible, multipurpose, data source
agnostic in-memory appliance that combines SAP software components optimized on
hardware provided and delivered by SAP's leading hardware partners.
The SAP HANA Appliance is a well-defined and validated stack of hardware and
software components. SAP HANA 1.0 SPS 03 consists of SAP HANA Database, SAP
HANA Client, and SAP HANA Studio.

SAP HANA In-Memory Computing Engine is an ACID-compliant, Massively Parallel
Processing hybrid Relational Database for Storing Data In-Memory.
SAP HANA Clients are provided for various operating systems, delivering the required
database clients to connect to SAP HANA via JDBC, ODBC, ODBO or BICS.
SAP HANA Studio is an Eclipse based tool used to administer SAP HANA Database,
create analytic models in SAP HANA and for data provisioning.


Software Components
SAP HANA Appliance software is available in different editions:
SAP HANA Appliance Software Platform Edition
SAP HANA Appliance Software Enterprise Edition
SAP HANA Appliance Software Enterprise Extended Edition

Process Flow- SAP HANA
SAP HANA software enables Organizations to instantly explore and analyze huge
volumes of detailed transactional data in real time from virtually any data source.
Operational data is captured in memory while business is happening, and flexible views
expose analytic information at the speed of thought. Below are the steps involved to
gain analytical insight from the transactional data of the source system.
1. Import Source System Metadata- Create Table Definition or Import table
metadata from the source system.
2. Data Provisioning- Initial Data loading and Replication of the subsequent changes
of the source system tables into SAP HANA target tables.
3. Create Information Models- Create analytical data models on the top of the
Physical Tables. Information models are used to create multiple database views
of the transactional data that can be used for analytical purposes.
4. Consume Analytic Data- Consume or retrieve data from SAP HANA database with
a wide variety of client tools. Various connectivity options are provided by SAP
HANA: ODBC, JDBC, ODBO, BICS or SQLDBC.




Data Provisioning
One of the promises of SAP HANA is to deliver real-time analytic insight on vast data
volumes. For the real-time aspect, data acquisition in real time is required. This is the
task of Sybase Replication Server. Tables from SAP ERP system are initially loaded into
SAP HANA. All subsequent changes to these ERP tables are immediately replicated into
the HANA server. To this end Replication Server makes use of the database logs in the
ERP system. The tool that helps selecting the tables to be loaded and replicated is
integrated into the In-Memory Computing Studio.
Data Modelling
Once the tables are created in HANA and loaded from the source system, the semantic
relationships between the tables need to be modelled. Modelling can be done in several
places.
1. If Data Services is used to create and populate the table, a first layer of modelling
can be implemented there.
2. Analytical Data Models can be created in the Information Modeller perspective of
In-Memory Computing Studio.
3. Depending on the front-end tool used to retrieve data from the In-Memory
Computing Engine, further modelling decisions can be made in Universes
(Information Design Tool) or other semantic layers.
Analytical Reporting
Due to choice of various connectivity options a wide variety of Reporting tools can be
used for the purpose of showcasing data insights. Below are the reporting tools that
cater to various needs of Enterprise wide reporting.
SAP Business Objects Explorer
SAP Business Objects Analysis, Office Edition
Microsoft Excel 2010
SAP Business Objects BI 4.0 Suite (Web Intelligence, Dashboards, Crystal
Reports)


We can configure SAP Business Objects BI suite for Reporting, Interactive Analysis and
front-end data visualization. SAP Business Objects Universe Designer allows SQL-like
access to Data tables and views stored in SAP HANA. Configure SAP Business Objects
Query as a Web Service (QAAS) to expose query functionality in Universe as a Web
Service.
Data Replication Methods
In-memory reporting and analyzing of business data requires the replication of the data
from a source system to the SAP HANA database. This section provides an overview of
the possible replication methods that are available for the SAP HANA appliance. Three
main types of replication methods are Trigger, Log, and ETL based.
Trigger-Based Replication: Trigger-Based Data Replication Using SAP
Landscape Transformation (LT) Replication Server is based on capturing
database changes at a high level of abstraction in the source ERP system. This
method of replication benefits from being database-independent, and can also
parallelize database changes on multiple tables or by segmenting large table
changes.


ETL-Based Replication: Extraction-Transformation-Load (ETL) Based Data
Replication uses SAP Business Objects Data Services to specify and load the
relevant business data in defined periods of time (Batches) from an ERP system
into the SAP HANA database. We can reuse the ERP application logic by reading
extractors or utilizing SAP function modules. In addition, the ETL-based method
offers options for the integration of third-party data providers.

Log-Based Replication: Transaction Log-Based Data Replication Using Sybase
Replication is based on capturing table changes from low-level database log files.
This method is database-dependent. Database changes are propagated on a per
database transaction basis, and they are then replayed on the SAP HANA
database. This means consistency is maintained, but at the cost of not being able
to use parallelization to propagate changes. Consider using a dedicated private
network to the data acquisition systems.


SAP HANA Information Modeller:
Information Models are multiple database views of transactional data stored in the
physical tables of the SAP HANA database, used for analytical purposes. Analytical data
modelling is only possible for column tables, i.e. the Information Modeller only works
with column storage tables.
For that reason Replication Server creates SAP HANA tables in column store by
default. Data Services also creates target tables in column store by default for the SAP
HANA database. The SQL command to create a column table is "CREATE COLUMN
TABLE <table_name> ...". The data storage type of a table can also be changed from row to
column storage with the SQL command "ALTER TABLE <table_name> COLUMN".
We can choose to publish and consume SAP HANA tables data at four levels of
modelling using SAP HANA Studio Information Modeller Perspective. They
are Attribute View, Analytic View, Calculation View and Analytic Privilege. These
content data models are basically the combination of Attributes and Measures.
Attributes
Attributes are individual non-measurable analytical elements. Attributes add context to
data. These are qualitative descriptive data similar to Characteristics of SAP BW. For
example, MATERIAL_NAME. There are three types of Attributes in Information
Modelling:
Simple Attributes are individual non-measurable analytical elements that are
derived from the data foundation. For example, MATERIAL_ID and
MATERIAL_NAME are attributes of a MATERIAL subject area.
Calculated Attributes are derived from one or more existing attributes or
constants. The attribute is based on static value or dynamic calculation. For
example, extracting the year part from the customer registration date, assigning
a constant value to an attribute which can be used for arithmetic calculations.
Private Attributes are used to model Analytic Views and cannot be used outside
the view. Private attributes add more information to the data model. Private
attributes of Fact tables are used to link to the subject area or dimensions i.e.
Attribute Views. For example, we create an analytic view ANV_SALES to analyze
the sales of materials, and select MATERIAL_ID as a private attribute from the
database table SALES_ITEM. In this case, MATERIAL_ID could be used only for
modelling data for ANV_SALES. We will learn about private attributes later when
we design Analytic Views.
Measures
Measures are simple measurable analytical elements. Data that can be quantified and
calculated are called measures. They are similar to Key Figures in SAP BW. Measures
are defined in Analytic and Calculation Views. Three types of measures can be defined in
Information Modelling:
Simple Measure is a measurable analytical element that is derived from the data
foundation i.e. defined in the fact table. For example, SALES_AMOUNT.
Calculated Measures are defined based on a combination of data from OLAP
cubes, arithmetic operators, constants, and functions. For example, Net Revenue
equals Gross Revenue - Sales Deduction, assigning a constant value to a measure
for some calculation.
Restricted Measures are used to filter the value based on the user-defined rules
for the attribute values. For example, Gross Revenue of a material for country =
US.
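
The difference between a calculated measure and a restricted measure can be shown with a few lines of Python. The sample rows, column names and the helper functions `net_revenue` and `restricted_gross_revenue` are made up for this illustration and are not generated by the Information Modeller.

```python
# Hypothetical fact rows; gross revenue and sales deduction are simple measures.
sales = [
    {"material": "M1", "country": "US", "gross_revenue": 100.0, "sales_deduction": 10.0},
    {"material": "M1", "country": "DE", "gross_revenue": 80.0, "sales_deduction": 5.0},
]


def net_revenue(row):
    """Calculated measure: Net Revenue = Gross Revenue - Sales Deduction."""
    return row["gross_revenue"] - row["sales_deduction"]


def restricted_gross_revenue(rows, country):
    """Restricted measure: Gross Revenue counted only where country matches."""
    return sum(row["gross_revenue"] for row in rows if row["country"] == country)


us_gross = restricted_gross_revenue(sales, "US")
total_net = sum(net_revenue(row) for row in sales)
```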
Attribute View
Attribute Views are the Reusable Dimensions or subject areas used for business
analysis. Attribute Views are defined in Information Modelling to separate Master Data
Modelling from Fact data. Examples of Attribute Views can be Customer, Material, and
Time. We define the Key or Non-key Attribute of the physical database tables of Master
Data. We can join Text tables to Master data tables or two Master data tables like
product and product group. Also tables can be selected from multiple schemas and are
not restricted to one schema per Attribute View. Activated Attribute Views can be
consumed for reporting or can be linked to the fact tables in Analytical Views.
Table Joins and Properties
The various Join Types available while modelling Attribute Views
are Referential, Inner, Left Outer, Right Outer and Text Join. In addition, the Join
Condition and Cardinality (1:1, 1:N or N:1) need to be defined accordingly. If we
select the Join Type as Text Join then we need to define the Language Column and
Description Mapping.
The Output structure of the Attribute View must be explicitly defined. At least one Key
Attribute is mandatory. However any number of Non-key Attributes may be defined.
We can also apply static Filter values (List of Values) on any column of the tables
selected in the Attribute View. Such a column does not need to be selected as a Non-key
Attribute for output.
Hierarchies
Hierarchies are used to structure and define the relationship between attributes of an
Attribute View that are used for business analysis. Exposed models that consist of
attributes in hierarchies simplify the generation of reports. For example, consider the
TIME Attribute View with YEAR, QUARTER, and MONTH attributes. We can use these
YEAR, QUARTER, and MONTH attributes to define a hierarchy for the TIME Attribute
View. Two types of hierarchies are supported in Attribute Views of Information
Modeller:
Level Hierarchy - This hierarchy is rigid in nature, where the root and the child
nodes can be accessed only in the defined order. It needs one attribute per
hierarchy level and the number of levels is fixed. For example, COUNTRY,
STATE and CITY.
Parent/Child Hierarchy - This hierarchy is very similar to BOM (Parent and
Child) and Employee Master (Employee and Manager). The hierarchy can be
explored based on a selected parent, and there can be cases where the child can
be a parent. This hierarchy is derived based on the value. Variable number of
levels for sub-trees within the hierarchy is possible. For example, EMPID, MGRID.
At present, Hierarchies defined in the Information Modeller are only accessible via MDX.
This means that at present such hierarchies can only be used from MS Excel.
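
To make the two hierarchy types more concrete, here is a small Python sketch. The functions `level_path` and `subordinates` are hypothetical helpers; the parent/child traversal assumes the EMPID/MGRID data contains no cycles.

```python
def level_path(row, levels=("COUNTRY", "STATE", "CITY")):
    """Level hierarchy: one attribute per level, in a fixed order."""
    return tuple(row[level] for level in levels)


def subordinates(pairs, manager_id):
    """Parent/child hierarchy: walk (EMPID, MGRID) pairs below one parent.

    Depth is not fixed; a child can itself be a parent.
    Returns employee ids in depth-first order.
    """
    children = {}
    for emp_id, mgr_id in pairs:
        children.setdefault(mgr_id, []).append(emp_id)
    result = []
    stack = list(reversed(children.get(manager_id, [])))
    while stack:
        emp_id = stack.pop()
        result.append(emp_id)
        stack.extend(reversed(children.get(emp_id, [])))
    return result


path = level_path({"COUNTRY": "US", "STATE": "CA", "CITY": "Palo Alto"})
team = subordinates([(2, 1), (3, 1), (4, 2)], manager_id=1)
```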
Time Dimension Attribute View
Two types of Time Dimension Attribute Views are supported in Information Modeller.
For Gregorian type Time Dimension the data is stored
in _SYS_BI.M_TIME_DIMENSION. For Fiscal type Time Dimension data is stored
in _SYS_BI.M_FISCAL_CALENDAR. Time Dimension Attribute Views are very often used
while defining the Analytical View.
Analytic View
Analytic views are the Multidimensional Views or OLAP cubes. Analytic Views are used
to analyze values from a single fact table of the data foundation based on the related
attributes from the Attribute Views. The result looks very similar to a Star Schema. We
create a Cube-like view by joining Attribute Views to the Fact table data. For example,
total sales of a material in a given region at a given time.
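
Conceptually, an Analytic View joins the fact table to its dimensions and aggregates the measures. The Python sketch below mimics this with plain dictionaries; the table contents and the function `total_sales_by` are invented for illustration, and the join is modelled as an inner join.

```python
from collections import defaultdict

# Hypothetical Attribute View: material master data keyed by MATERIAL_ID.
materials = {
    "M1": {"MATERIAL_NAME": "Bolt"},
    "M2": {"MATERIAL_NAME": "Nut"},
}

# Hypothetical fact table with one measure, SALES_AMOUNT.
sales_items = [
    {"MATERIAL_ID": "M1", "SALES_AMOUNT": 10.0},
    {"MATERIAL_ID": "M1", "SALES_AMOUNT": 5.0},
    {"MATERIAL_ID": "M2", "SALES_AMOUNT": 7.0},
    {"MATERIAL_ID": "M9", "SALES_AMOUNT": 3.0},  # no matching master data
]


def total_sales_by(fact_rows, dimension, key_column, attribute):
    """Join fact rows to a dimension and sum SALES_AMOUNT per attribute value."""
    totals = defaultdict(float)
    for row in fact_rows:
        dimension_row = dimension.get(row[key_column])
        if dimension_row is None:
            continue  # inner join: facts without a dimension match are dropped
        totals[dimension_row[attribute]] += row["SALES_AMOUNT"]
    return dict(totals)


by_material = total_sales_by(sales_items, materials, "MATERIAL_ID", "MATERIAL_NAME")
```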
Data Foundation & Logical View
In the Data Foundation tab we need to select the physical fact table. Next we define
the Attributes and Measures of the Fact table.
We must define at least one Attribute and one Measure. In the Output structure the
attributes of the fact table will appear under the Private Attributes as these are related
only to the fact table. Optionally we can apply static Filter values on attributes of the
fact table. We can also define Calculated Measures or Restricted Measures while
designing the data foundation. Optionally we can also join database tables. We can
select attributes from several tables but they must be joinable. But we can select
measures from only one table (transactional data).
In the Logical View tab we can join as many Attribute Views from any package to the
Data Foundation. Attribute views are joined to the Private Attributes of the Data
Foundation. Typically we include all key attributes of the Attribute View in the join
definition. The default join type is Inner Join and the default Cardinality is N:1.
The foundation view shows the physical table with all fields that can be incorporated
into the final model. The logical view displays only those fields which have been chosen to
be included in the data model including the restricted and calculated measures defined.
Calculation View
Calculation Views are used to create a data foundation using database tables, Attribute
Views, Analytic Views, and Calculation Views to address a complex business
requirement. If joins are not sufficient, we create a calculation view with SQLScript.
Calculation Views are also required if the Key Figures span across tables. A Calculation
View is a composite column view visible to the reporting tools. When the view is accessed a
function is implicitly executed. Calculation Views can be modelled graphically or with
SQLScript. Calculation Views support UNION. An example is comparing the sales of a
material in a particular region for the last two years.
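
The year-over-year comparison mentioned above is a typical use of UNION. A minimal Python sketch of the idea follows; the function `union_by_year` and the input layout are assumptions for this example, and a real Calculation View would express this graphically or in SQLScript.

```python
def union_by_year(rows_by_year):
    """Combine per-year sales and total them per material and year.

    rows_by_year: {year: [(material, amount), ...]}
    returns:      {material: {year: total_amount}}
    """
    # UNION step: put all rows into one list, tagged with their year.
    unioned = [
        (material, year, amount)
        for year, rows in rows_by_year.items()
        for material, amount in rows
    ]
    # Aggregation step: sum the amounts per material and year.
    result = {}
    for material, year, amount in unioned:
        per_year = result.setdefault(material, {})
        per_year[year] = per_year.get(year, 0.0) + amount
    return result


comparison = union_by_year({
    2010: [("M1", 5.0)],
    2011: [("M1", 7.0), ("M1", 1.0)],
})
```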
Analytic Privilege
Analytic Privileges define privileges to partition data among various users sharing the
same Data Foundation. Analysis Authorizations for row-level security can be based on
Attributes in an Analytic View. The SAP HANA database supports Analytic Privileges that
represent filters or hierarchy drilldown limitations for analytic queries. Analytic
Privileges grant access to values with a certain combination of dimension attributes. For
example, we may want to restrict access to a cube with sales data to values with the
dimension attributes region = US and year = 2010. As Analytic Privileges are defined on
dimension attribute values and not on metadata, they are evaluated dynamically during
query execution.
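
Because an Analytic Privilege acts as a filter on dimension attribute values, its effect can be sketched as a row filter in Python. The function `apply_privilege` and the representation of a privilege as a mapping from attribute to allowed values are assumptions for this example only.

```python
def apply_privilege(rows, privilege):
    """Keep only rows whose attributes all fall within the allowed values.

    privilege: {attribute_name: set_of_allowed_values}
    The filter is evaluated against the data at query time.
    """
    return [
        row
        for row in rows
        if all(row.get(attribute) in allowed for attribute, allowed in privilege.items())
    ]


cube = [
    {"region": "US", "year": 2010, "sales": 100.0},
    {"region": "US", "year": 2011, "sales": 120.0},
    {"region": "DE", "year": 2010, "sales": 90.0},
]
visible = apply_privilege(cube, {"region": {"US"}, "year": {2010}})
```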


Package
SAP HANA Packages are used to group various related information objects in a
structured way. Attribute Views do not need to be in the same package as the Analytic
View that uses them. Packages do not restrict access to information objects for
modelling.
Procedure
An SAP HANA Database Stored Procedure defines a set of SQL statements that can process
data. We will learn how to write Procedures in a later section. They follow
constructs similar to T-SQL of Microsoft SQL Server or PL/SQL of Oracle database.