BW on SAP HANA
Understanding HANA technology, what it means to your business, and what to expect during data migration
The SAP Business Warehouse (BW) is a core part of the SAP NetWeaver technology. Serving as a powerful Enterprise Data Warehouse application platform, BW provides flexible reporting and analysis tools, and businesses are able to make well-founded decisions on the basis of this analysis. Business information from SAP and external data sources is integrated and consolidated in BW on HANA.
SAP HANA (HANA) is a new database and analytics engine. Data now resides in main memory (RAM) and no longer on a hard disk. Complex calculations on data are no longer carried out in the application layer but are moved to the database. By running BW on HANA, your business will experience significant gains in speed when running analytical queries and reports.
In this section, we introduce core concepts of SAP HANA in-memory computing, and how these concepts can help your SAP NetWeaver 7.3 Business Warehouse run better. We also consider some of the technical implications of upgrading your current data warehouse to version 7.3, and of migrating it to the SAP HANA database. Lastly, we cover what you might expect during the process of transitioning to this new technology.
Row Storage - The data sequence consists of the data fields in one table row. Column Storage - The data sequence consists of the entries in one table column.
Traditional databases store data simply in rows. The HANA in-memory database stores data in both rows and columns. It is this combination of both storage approaches that produces the speed, flexibility and performance of the HANA database. On row tables, OLAP queries on huge amounts of data take a lot of time because every single row is touched to collect the data for the query response. In columnar tables, the values of one column are stored physically next to each other, which significantly increases the speed of certain data queries. Data is also compressed, enabling shorter loading times. The following example shows the different usage of column and row storage, and positions them relative to row and column queries. Column storage is most useful for OLAP queries because these queries read just a few attributes from every data entry. For traditional OLTP queries, however, it is more advantageous to store all attributes side by side in row tables. HANA combines the benefits of both row- and column-storage tables.
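The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration: plain lists stand in for database storage, and the table contents, column names and function names are made up for the sketch rather than taken from HANA.

```python
# Illustrative only: plain Python lists stand in for database storage.
# Table contents and column names are made up for this sketch.
rows = [
    ("DE", "Laptop", 1200),
    ("US", "Phone", 800),
    ("DE", "Phone", 750),
    ("FR", "Laptop", 1100),
]

# Row storage: one record after another, all fields of a row side by side.
row_store = [field for row in rows for field in row]

# Column storage: one column after another, all entries of a column side by side.
column_store = {
    "country": [r[0] for r in rows],
    "product": [r[1] for r in rows],
    "revenue": [r[2] for r in rows],
}

def total_revenue_row_store(store, width=3, revenue_pos=2):
    """OLAP-style aggregate on the row layout: strides across every record."""
    return sum(store[i] for i in range(revenue_pos, len(store), width))

def total_revenue_column_store(store):
    """Same aggregate on the column layout: reads one contiguous list."""
    return sum(store["revenue"])

def fetch_record_row_store(store, index, width=3):
    """OLTP-style lookup on the row layout: one contiguous slice."""
    return tuple(store[index * width:(index + 1) * width])

def fetch_record_column_store(store, index):
    """Same lookup on the column layout: one access per column."""
    return tuple(store[name][index] for name in ("country", "product", "revenue"))
```

The aggregate touches a single contiguous list in the column layout, while the single-record lookup is one contiguous slice in the row layout, which mirrors the OLAP versus OLTP trade-off described above.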
In-memory Technology
In-memory technology moves data and information sources from remote databases into local memory so the results of analyses and transactions are available immediately. The elements of in-memory computing are not new. However, dramatically improved hardware economics and software technology innovations have made it possible to realize The Realtime Enterprise with in-memory business applications.
The cost of main memory has decreased significantly. It is now cost effective to store all data of a large enterprise in main memory. The SAP HANA Appliance is a combination of in-memory software and SAP-partner hardware that allows you to query multiple types of sources at speeds and in volumes as never before. All data are kept in main memory and can be processed at an incredible speed. HANA's real-time platform combines high-volume transactions with analytics to help create solutions that take your business performance to the next level. The HANA in-memory database can help your applications zero-in on the information you need without wasting time sifting through irrelevant data. The result: Instant answers to your complex queries, and better decision making across your enterprise. With optimized loading routines, system data can be restored quickly in case of power failures. The SAP HANA Appliance can fail over to a cold standby server to guarantee high availability.
Multi-core Processors
Processing speed no longer depends primarily on clock speed but rather on the degree of system parallelism. Modern server boards have many CPUs with several cores each. The HANA database is optimized to use the capabilities of multi-core processors in order to enable very fast queries. Parallelism can be achieved on different levels, from the application level to query execution on the database level. Processing multiple queries at the same time is handled by multi-threaded applications which map each query to a single core. Query processing also involves data processing, i.e., the database needs to be queried in parallel. HANA distributes the workload across multiple cores of a single system.
Using column-based tables enables easier data partitioning, and parallel processing wherever allowed. HANA uses multi-core systems on different layers to achieve highly-parallelized query execution.
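As a rough sketch of why a column partitions easily, the following Python splits one column into contiguous chunks, aggregates each chunk on its own worker, and combines the partial results. The function names and data are hypothetical, and this is not how HANA is implemented; note also that threads in CPython do not run CPU-bound work truly in parallel, so the sketch shows the structure of the approach rather than a speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(column, parts):
    """Split a column into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(column), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(column[start:end])
        start = end
    return chunks

def parallel_sum(column, workers=4):
    """Sum each partition on its own worker, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, partition(column, workers)))
    return sum(partial_sums)

revenue = list(range(1, 1001))  # made-up column values
total = parallel_sum(revenue)
```

Because each partition is an independent slice of a single column, no worker needs data from another until the final combine step.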
SAP HANA Modeler
SAP HANA Modeler is a graphical data modeling tool used to design analytical models and, later, analytical privileges that govern access to those models. SAP HANA Modeler supports:
ERP table metadata upload:
- Mass ERP table metadata upload using the Load Controller API
- Selective ERP table metadata import using Data Services integration
Extractor metadata upload:
- Extractor table metadata upload using the Load Controller API
- Selective Extractor table metadata import using Data Services integration
Business Value
BW on HANA enables:
- Faster information for better, more timely business decisions
- Lower Total Cost of Ownership (TCO)
- Tight integration with other parts of your SAP landscape
- Simplified configuration and operation
- Improved BW performance
- Answering many important business questions immediately
- Significantly faster analytics and reporting
- Access to the most current and complete business information
- Realtime access to transactional data
- Development of deeper insights into your business
- Elimination of data aggregation
- Cost effective management of large volumes of data
- New possibilities applying groundbreaking in-memory hardware innovations to your business needs
BUSINESS VALUE:
Faster decision-making
Having the right information when you need it
Increasingly sophisticated business decision models depend on fast access to and manipulation of massive data stores. Insight into business operations demands data volumes and velocity that are beyond the capabilities of traditional disk-based systems. HANA helps your SAP NetWeaver 7.3 Business Warehouse run better than ever. HANA enables you to analyze large amounts of data, from virtually any source, in near real time, making it possible to access reports with up-to-the-minute information. As an example, having the most current order and logistics information makes it possible to manage your inventory more efficiently, and to predict Available to Promise (ATP) more accurately.
eliminate entire models in many cases. The result is that the same amount of data for BW on HANA requires significantly less storage.
Simplified Operations and Monitoring
With the integration of basic HANA administration capabilities into the BW Admin Cockpit, it is possible to perform and monitor most common database and data warehouse functions from one place. This reduces the number of tools that have to be installed, learned, and maintained, and reduces the skill set and training required to create, operate, and maintain your data warehouse.
In addition, HANA supports the BW Analysis Authorization Concept, and can be integrated with NetWeaver Identity Management to ensure security remains intact.
Migration Options
Various approaches to system implementation
Three options exist for implementing BW on SAP HANA. All three achieve the same result: copying your BW data into an SAP HANA database.
1. The easiest option: Create a totally new BW instance, and connect it to the SAP HANA database.
2. A more complex option: Upgrade an existing BW system to version 7.3 SPS5, then change the underlying database from a traditional disk-based relational database to the new in-memory HANA system.
3. The most popular option: Keep the current BW system running on a traditional database, while creating a new BW instance running on the HANA database.
This third option is important for companies that already have an active BW system which must function continuously and without interruption. If this is the case for your company, SAP strongly recommends you follow a parallel approach to data migration: keeping your production landscape in place while bringing up the BW on SAP HANA system. You can implement SAP NetWeaver BW scenario by scenario, with the assurance that the existing production landscape is still available as a fallback.
A parallel approach mitigates risk while simultaneously enabling you to familiarize yourself with the administration and capabilities of HANA. SAP also strongly recommends you consider the high availability and backup/recovery procedures of HANA before starting to use it in production systems.
1.4 Why is SAP HANA versionless and what is innovation without disruption?
SAP HANA was originally going to be numbered 1.0, 1.2, 1.5 and 2.0 and you will see this in some early literature. But what SAP have done is really interesting: they have removed the versions and provide innovations automatically when you update HANA. For the purposes of information and marketing, SAP HANA has patches - SP01 which was the ramp-up, SP02 which was the generally available version, SP03 which provided support for BW and SP04 which provides support for Text Analytics and High Availability. But the patches are just to let people know about the new features - there is no release of SP04. But the reality is that SAP HANA only comes released in Revisions. And for example, Revision 28 is SP04. So when last week, I had all our SAP HANA systems updated to SAP HANA Revision 28, we got the innovations from SP04 included. And this update takes about 10 minutes and can be done online in High Availability environments. This is what SAP call innovation without disruption and it seems to work really nicely.
- Loading Data from Flat Files (CSV, XLS, XLSX) including automatic table creation in HANA Studio
- Enhancements for Attribute/Calculation Views, Usability, Security, Multi-language and technical
- High Availability
- ETL-based Data Acquisition by SAP HANA Direct Extractor Connection
This is a perfect example of the simplification example I gave in the last question. With Oracle, you need to build your transactional database in Exadata, then you replicate this into the Exalytics Times-Ten database for reporting and into Essbase for forecasting. By contrast if you use SAP HANA, you store the information once in the SAP HANA appliance. From that one store you can do transaction processing, analytical reporting, forecasting and predictives. With HANA you are not moving information around the whole time and this simplifies the solution, enables the solution to be more easily changed and more agile. And you do not pay a performance penalty because everything happens in-memory.
The answer is it really depends on the number of unique values in your data. The fewer unique values, the better the compression. If you have raw flat files or uncompressed databases like DB2 or Oracle then I generally see 10x compression to be a good start point. If you are using DB2 or Oracle compression then you can expect that to reduce to 5x compression with HANA in an average scenario. Note that this is missing the point because HANA allows simplification. In one customer I have dealt with, they had 27TB of SAP BW database, but 20TB of this was aggregates and indexes used to improve performance. So when the database was moved to SAP HANA, they started with 7TB and got 5x compression. In real life this means compression of 27TB down to 1.5TB or 18:1.
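The arithmetic behind that customer example can be worked through explicitly. The three input figures come from the text above; the variable names are only for illustration. The exact result is 1.4TB and roughly 19:1, which the text rounds to 1.5TB and 18:1.

```python
total_tb = 27.0           # BW database size before migration (from the text)
aggregates_tb = 20.0      # aggregates and indexes no longer needed (from the text)
compression_factor = 5.0  # compression on the remaining data (from the text)

remaining_tb = total_tb - aggregates_tb         # 7.0
in_hana_tb = remaining_tb / compression_factor  # 1.4
effective_ratio = total_tb / in_hana_tb         # about 19.3

print(f"{remaining_tb:.1f} TB -> {in_hana_tb:.1f} TB, about {effective_ratio:.1f}:1 overall")
```

Most of the overall reduction comes from dropping the 20TB of aggregates and indexes, not from the compression factor itself.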
2. SAP HANA database hardware
2.1 What hardware is supported right now?
I have broken out the SAP HANA Hardware guide into a separate FAQ: The SAP HANA Hardware FAQ. There is a supported hardware list on SAP's website at: http://service.sap.com/pam (login required).
3. Technical FAQ
3.1 What source databases does SAP HANA support in real-time?
There are two mechanisms that HANA supports for near-real-time data loads. The first is the Sybase Replication Server (SRS), which works with SAP or non-SAP source systems running on Microsoft, IBM or Oracle databases. This was expected to be the most common mechanism for SAP data sources, but there remain some license challenges around replicating data out of Microsoft and Oracle databases, depending on how you license the database layer of SAP. If you buy your database license direct from the vendor then you are fine, but if you buy it through SAP then you may have a restricted license that does not allow for usage of SRS.
For those scenarios, SAP have a second choice of replication mechanism called System Landscape Transformation (SLT). SLT is also near-real-time and works from a trigger from the SAP Business Suite products. This is both database-independent and pretty neat, because it allows for application-layer transformations and therefore greater flexibility than the SRS model. Note that SLT has now been extended to work with non-SAP source systems.
In addition there is a new model, the Direct Extractor Connection (DXC). This provides a means to work with Business Content DataSources, which send data from an SAP Business Suite system to SAP HANA. With DXC, the Business Content extractors are redirected, and instead of flowing into SAP Business Warehouse, extracted data flows into SAP HANA directly.
SRS has additional restrictions which are worth bearing in mind. It can only replicate Unicode data and does not support IBM DB2 compressed tables at this time.
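To give a feel for what trigger-based replication with an application-layer transformation means, here is a minimal sketch using Python's built-in sqlite3 module. It is not SLT and makes no claim about how SLT works internally: two in-memory SQLite databases stand in for source and target, and the table names, the trigger and the replicate function are all hypothetical.

```python
import sqlite3

# Two in-memory SQLite databases stand in for the source system and the target.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);
    -- Logging table filled by the trigger; the replicator reads from here.
    CREATE TABLE orders_log (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                             id INTEGER, amount REAL);
    CREATE TRIGGER orders_after_insert AFTER INSERT ON orders
    BEGIN
        INSERT INTO orders_log (id, amount) VALUES (NEW.id, NEW.amount);
    END;
""")
target.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount_cents INTEGER)")

def replicate(last_seq, transform=lambda amount: amount):
    """Copy log entries newer than last_seq to the target; return the new high-water mark."""
    changes = source.execute(
        "SELECT seq, id, amount FROM orders_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, order_id, amount in changes:
        target.execute(
            "INSERT OR REPLACE INTO orders (id, amount_cents) VALUES (?, ?)",
            (order_id, transform(amount)),
        )
        last_seq = seq
    target.commit()
    return last_seq

source.execute("INSERT INTO orders (id, amount) VALUES (1, 100.0)")
source.execute("INSERT INTO orders (id, amount) VALUES (2, 250.0)")
source.commit()

# Transformation applied on the way to the target: convert amounts to cents.
high_water_mark = replicate(0, transform=lambda amount: int(amount * 100))
replicated = target.execute("SELECT id, amount_cents FROM orders ORDER BY id").fetchall()
```

The trigger makes change capture independent of how the application writes its data, and the transform hook is where an application-layer conversion would sit.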
3.2 What source databases does SAP HANA support for batch loads?
If you use SAP BusinessObjects Data Services 4.0 for bulk loads, then pretty much anything. BO-DS is a very flexible Extract, Transform & Load tool that supports many databases. Data Services was previously called Data Integrator, and before that Acta, prior to being acquired by Business Objects. You can reasonably load into HANA using Data Services every 10 minutes, and Data Services allows for excellent flexibility because you can take care of complex business transformations, e.g. address verification, outside of HANA, which may allow simplified modelling within HANA. I hear that SAP plan to open up a certification for third-party ETL tools later in 2012. However, there are plans to move the Data Services ETL engine into SAP HANA, which would allow transformations to happen in-memory. This would provide a significant benefit over any other ETL tool.
Q. What is the difference between SAP Business Warehouse Accelerator (SAP BWA) & SAP HANA?
SAP BW Accelerator (SAP BWA) is an in-memory accelerator for BW. HANA is a full-featured in-memory platform. BWA was specifically designed to accelerate BW queries by reducing the data acquisition time by persisting copies of the InfoCube data in-memory. SAP BWA is focused on improving the query performance of SAP NetWeaver BW. SAP BWA can be used today with any SAP BW 7.0 release and above. SAP HANA is an in-memory appliance and platform for delivering high-performance analytics and applications. As such, it includes a full-featured in-memory database. Data can be loaded into SAP HANA from SAP & non-SAP data sources and viewed using SAP BusinessObjects front end tools. In the near future, SAP HANA will also act as an in-memory database that will power SAP NetWeaver BW 7.3 and above. In this way it will be able to dramatically improve the overall performance of SAP NetWeaver BW by combining the value proposition of both the database & BWA into a single platform.
HANA & BW 7.3 PART -1
Over the past few months, I've presented on this topic to many customers and colleagues in and outside Walldorf. As there seems to be such a high demand, I've decided to convert the underlying slide presentation into two blogs, with the first focusing on the motivation, scenarios and use cases while HANA and BW 7.30 - Part 2 looks at the combination of HANA and BW from a technical angle. Before I start with the first blog please note that the usual disclaimer applies. Everything here has been announced at some SAP event - see The SAP Run Better Tour - BW Roadmap, for example. So I'm focusing on bringing pieces into context rather than revealing something that has not been known before.
Overview Part 1
- Review In-Memory
- In-Memory @ SAP
- How HANA affects Data Warehousing
- HANA Scenarios
- HANA as BWA
- Conclusion
Review In-Memory
For a start, let's review the fundamentals behind in-memory computing. To that end, let's have a look at the table in figure 1 that I've gratefully borrowed from Andy Bechtolsheim's presentation at HPTS 2009. It shows how the semiconductor industry predicts the listed components will evolve - see the ITRS.
Figure 1: CPU module roadmap
It is sufficient to look at the first two lines, the clock rate and the cores. Two things can be concluded from that:
A. Moore's law will continue to apply.
B. However, it will be based on scaling the number of CPU cores rather than the CPU clock rate - with power efficiency being the main reason for this change.
The "however part" (B) is fundamental and carries a big mandate for the software industry, namely that parallelism will be key on those future CPU architectures. SAP's response to this is what has been labeled in-memory computing. However, this term over-emphasizes the aspect of main memory and comes a bit short of some other aspects that are at the heart of the performance benefits achieved in this context. The logic goes along the following lines:
- parallelism: as seen in figure 1, supporting the multi-core architectures via software parallelism is key
- in-memory: a prerequisite for parallelism is to have the related data located close to the cores in local memory
- columnar data structures: this, in turn, is a prerequisite to fit data into main memory; the columnar approach is extremely I/O efficient and is an enabler for the next bullet
- compression: columnar data can be more efficiently compressed than row-based data due to a higher repetition of values and thus a higher potential to compress
- application-awareness: this is separate from the previous four technology arguments and comes down to building an engine tailored towards the SAP applications; the second blog will provide examples in the context of BW for this.
In my opinion, the last item is one of the most overlooked and undervalued in the current debate. Actually, it is something that many other companies already and successfully do, namely exploiting inherent properties of the underlying applications to relieve some of the traditional RDBMS constraints in order to build innovative data processing clusters, e.g. based on MySQL nodes or Hadoop. The CAP theorem is an instance of that; see here for a few examples implemented by Ebay. SAP's BWA is another good example as it is tailored towards the BW schema.
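The compression argument, that repeated values within a column compress well, can be illustrated with two common columnar techniques, dictionary encoding and run-length encoding. The source does not say which schemes the engine actually uses, so treat this Python sketch, its function names and its sample column as assumptions made for illustration.

```python
def dictionary_encode(column):
    """Replace each value by a small integer ID into a sorted dictionary of distinct values."""
    dictionary = sorted(set(column))
    position = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [position[value] for value in column]

def run_length_encode(ids):
    """Collapse runs of equal IDs into [id, run_length] pairs."""
    runs = []
    for value_id in ids:
        if runs and runs[-1][0] == value_id:
            runs[-1][1] += 1
        else:
            runs.append([value_id, 1])
    return runs

country = ["DE", "DE", "DE", "FR", "FR", "US", "US", "US", "US"]
dictionary, ids = dictionary_encode(country)
runs = run_length_encode(ids)
```

Nine strings shrink to a three-entry dictionary plus three runs. A row layout would interleave these values with other fields, so the repetition would not sit next to each other in the same way.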
In-Memory @ SAP
SAP's response to the imperative for a new software architecture is its In-Memory Computing Engine (IMCE; aka NewDB). I don't want to engage in a deep essay on IMCE and think that - for simple purposes - you can look at IMCE as
- an evolution of BWA, albeit not tied to BW alone anymore,
- SAP's implementation of an in-memory DB, tailored towards SAP applications,
- a full, stand-alone SQL database,
- an OLAP processor for MDX queries.
Now, HANA is the acronym for High Performance Analytical Appliance. Also, in a simplified (albeit not 100% technically correct) way, you can look at HANA as roughly:
- IMCE as an appliance
- however, it comprises more than just IMCE
- HANA is the term you likely hear in public
- for the remainder of this presentation: IMCE ≈ HANA (to avoid too much confusion)
How HANA affects Data Warehousing
The following pseudo equations originate from some jokey internal discussions that we had but have proven to be helpful:
1. Today: EDW = RDBMS + X
This means that an enterprise data warehouse (EDW) is not equal to a database system but requires a complement (here: X). Under X you can imagine code that is manually written or generated by tools, e.g.
- extraction programs
- DDL code (like CREATE TABLE statements)
- constraints, validation rules
- data transformations and harmonization
- process definitions, schedules and monitoring, failure handling (especially consistent restart)
- KPI definitions
- business semantics like rules on how to convert currencies or fiscal year definitions
- management of shared and private dimensions, including hierarchies
- defining and interpreting semantics on top of tables and columns, e.g.
  o column X is the parent column of a parent-child hierarchy H associated to dimension D
  o column Y is a unit key figure with the associated unit stored in column U
  o column Z is an attribute of dimension members whose key is compound in columns A and B
  o table T holds natural language descriptions for dimension member keys, whereby column L indicates the language and column C the description
  o column P in table Q is a foreign key of members of dimension D; referential integrity is guaranteed (yes/no)
  o time and calendar semantics, e.g. based on hierarchies like day - month - quarter - year, week - year
- table and data management like defining standards on how to store a dimension (tables and their respective layouts), how to index and/or partition those tables
- (meta data) lifecycle of models and tables, like versioning, changes including impact analysis and propagation, development / test / production setup
- (data) lifecycle: archiving and the underlying management of archives (what has been archived and what not, avoid overlapping data containers, etc.)
- security, especially modeling and management based on higher conceptual levels like dimensions, members, hierarchies
- logging, auditing and other compliance-related features
- etc etc etc
In summary, X addresses those requirements. It can be a bundle of generated code, meta data definitions, manually written programs etc. BW is an off-the-shelf instance of X.
2. Now: RDBMS → HANA
This indicates that traditional RDBMS technology gets overhauled by in-memory computing as implemented in HANA.
3. Thus: (new) EDW = HANA + Y
Now, 1. and 2. get combined into 3. As HANA is not an exact 1:1 replacement of an RDBMS and as the constraints and "physical rules" of in-memory computing change - especially the performance cost model - the software that sits on top (i.e. previously the X) needs to be adjusted to accommodate those new constraints and rules. This is indicated by moving from X in 1. to a Y in 3. Still, Y needs to address the same requirements as X but in a different way. Beyond that, there are even new and more opportunities given by the new constraints and rules, meaning that many more options are possible in Y in comparison to X. It is a paradigm shift similar to moving from analog to digital photography. Simply think of all the additional things that are possible with digital photography today!
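To make the role of the complement (X or Y) a bit more concrete, take one of the semantic rules listed earlier: a key figure whose unit is stored in another column. A plain table does not know that the two columns belong together; the layer on top has to. The following Python sketch uses hypothetical metadata and column names and is not BW's actual mechanism.

```python
from collections import defaultdict

# Hypothetical metadata: says which column holds the key figure and which its unit.
SEMANTICS = {"key_figure": "amount", "unit": "currency"}

rows = [
    {"amount": 100, "currency": "EUR"},
    {"amount": 50, "currency": "USD"},
    {"amount": 25, "currency": "EUR"},
]

def sum_key_figure(rows, semantics):
    """Sum the key figure per unit instead of blindly adding mixed units."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[semantics["unit"]]] += row[semantics["key_figure"]]
    return dict(totals)

totals = sum_key_figure(rows, SEMANTICS)
```

A naive SUM over the amount column would add euros to dollars; the metadata is what prevents that, and it is this kind of knowledge that sits in X rather than in the database.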
BW will follow this transformation from X to Y by tailoring it towards HANA. First steps will be visible with the BW 7.3 enablement of HANA planned for end of 2011.
HANA Scenarios
From my experience, the slides shown in figures 2, 3 and 4 are extremely helpful as they trigger fruitful discussions with customers. Essentially, I've discussed figures 2 and 3 in my blog on The BW - HANA Relationship. Please note that there is no "best" scenario but that each of the scenarios in figure 2 over-emphasizes a certain property at the expense of another one. So, there are trade-off decisions behind those scenarios. This confuses many people who would like SAP to give a simple answer.
But, I guess, it's like when you buy a car: you need to trade off various aspects for choosing the right model for your specific purposes.
Figure 3: HANA scenarios and their respective trade-offs.
HANA as BWA
There have been many questions on the BWA-HANA relationship, e.g. whether there will be new releases of BWA, whether investments into BWA would be safe etc. The basic plan is to enable HANA to play the role of a BWA in the future. In other words: in 2012 (plan!), it should be possible to buy a HANA box that can be set up and configured to run as an accelerator next to a BW like BWA did before. This offers two options to bring HANA into an existing BW 7.3 landscape - note that release 7.3 is a prerequisite for running BW with HANA:
- "conservative approach" (the two small arrows in figure 4): you bring in HANA as an accelerator for your existing BW. That way, you gain confidence with HANA, learn to operate HANA and already see a large amount of benefits. For example, HANA has a calculation engine that has been improved in comparison to the one in BWA.
- "progressive approach" (the long arrow in figure 4): this translates into migrating the DBMS server underlying your BW system to HANA. BWA as accelerator becomes obsolete as HANA already incorporates the BWA calculation capabilities.
Figure 4: Migration options for a classic towards a HANA-based BW.
Conclusion
This concludes this first part. Hopefully, it has clarified what role HANA will play in a BW context. It should become obvious that BW remains a significant complement to HANA, even though, on a technical level, performance-critical operators that are today implemented in the BW application stack are moved into the HANA engine. BW will eventually become a pure management software implementing a best practice approach that orchestrates the heavy data lifting inside HANA. HANA and BW 7.30 - Part 2 will describe some examples of what is possible.