
TDWI RESEARCH

TDWI CHECKLIST REPORT

Using and Choosing
a Cloud Solution for
Data Warehousing
By Colin White

Sponsored by: Snowflake Computing

tdwi.org

JULY 2015


TABLE OF CONTENTS

FOREWORD
NUMBER ONE
Understand the potential technology and business
advantages of the cloud for data warehousing
NUMBER TWO
Identify projects with pain points and needs that cloud
data warehousing can address
NUMBER THREE
Assess the differences between current cloud data
warehouse technologies and services
NUMBER FOUR
Identify the cloud offering that best fits with project
requirements and existing skills and tools
NUMBER FIVE
Assess the cost and complexity of deploying and
maintaining the selected cloud data warehousing solution
NUMBER SIX
Understand how the cloud data warehousing solution
will integrate with the existing information technology
environment
NUMBER SEVEN
Look for opportunities to use cloud data warehousing to
enhance the current data warehouse environment
ABOUT OUR SPONSOR
ABOUT THE AUTHOR
ABOUT TDWI RESEARCH
ABOUT THE TDWI CHECKLIST REPORT SERIES

555 S Renton Village Place, Ste. 700
Renton, WA 98057-3295
T 425.277.9126
F 425.687.2842
E info@tdwi.org
tdwi.org

© 2015 by TDWI, a division of 1105 Media, Inc. All rights reserved. Reproductions in whole or in
part are prohibited except by written permission. E-mail requests or feedback to info@tdwi.org.
Product and company names mentioned herein may be trademarks and/or registered trademarks of
their respective companies.

TDWI CHECKLIST REPORT: USING AND CHOOSING A CLOUD SOLUTION FOR DATA WAREHOUSING

FOREWORD

Cloud computing is a hot topic and more organizations are using the
cloud for data warehousing. The cloud environment offers a
pay-as-you-go, on-demand, and elastic scalability model that can
provide significant benefits for both the business and IT.

Compared to an on-premises IT environment, cloud computing
reduces up-front project costs and enables organizations to
scale their applications as required while paying only for the
resources they use. The cloud is, therefore, an ideal environment
for data warehousing projects given the large data volumes and
unpredictable nature of the analytic workloads involved.

Determining the data warehousing projects best suited to
cloud-based computing is not easy, especially given the significant
changes taking place in data warehousing. Companies are now
beginning to take advantage of new data sources, advances in
business analytics, and enhanced database technologies, and this
can require significant changes and upgrades to the data warehouse
architecture.

Organizations are also anxious that without proper management,
the cloud could simply become a way of bypassing IT bottlenecks
and budget constraints, which could lead to data warehousing
projects being deployed in the cloud that are not well suited to that
environment. Data proliferation, data security, poor quality data,
and inconsistent analytics results are also concerns if there is no
gatekeeper managing access to the cloud environment.

It is essential, therefore, that IT and business groups work together
to identify, develop, and manage those projects that can gain the
most from the business and technological advantages the cloud
offers for data warehousing and analytical processing.

Another challenge facing organizations moving to the cloud is
choosing the right platform for deploying and supporting data
warehousing as a service. A wide range of products and services
are offered by both traditional and start-up vendors. Selecting the
right vendor is difficult, and careful evaluation is required before
committing to a particular vendor and service.

For organizations with existing data warehousing and business
intelligence systems, one potential barrier to successful cloud
adoption is the complexity of integrating the cloud and on-premises
systems so that data can be efficiently and rapidly moved into and
out of the cloud. For these companies, the ability to seamlessly
integrate cloud services into their existing data warehouse
environment is a critical requirement.

To address these concerns and help readers succeed in their cloud
data warehousing projects, this checklist identifies the benefits the
cloud offers, outlines potential use cases, and presents key criteria
for using and choosing a cloud solution for data warehousing.


NUMBER ONE

UNDERSTAND THE POTENTIAL TECHNOLOGY AND BUSINESS
ADVANTAGES OF THE CLOUD FOR DATA WAREHOUSING

Aids data warehouse modernization. Data warehousing is
changing rapidly as vendors introduce powerful new technologies
that enable companies to process a wide range of new types of data
and employ sophisticated analytics to gain greater in-depth insight
into their business operations than ever before. To rapidly maximize
the business benefits these new technologies offer and keep ahead
of competitors, organizations need to deploy new analytic solutions
both faster and at a lower cost.

Unfortunately, traditional data warehouse approaches can
become a bottleneck to achieving this fast time to value, so many
organizations are modernizing their data warehouse architectures
and development processes to exploit the business benefits of new
data management and business analytics technologies. The main
goals of modernizing the data warehouse are to:

• Capture, manage, and analyze data from a broader set of both
  internal and external data sources, including data from Web
  operations, social media interactions, sensors, public information
  databases, and cloud-based operational systems
• Maintain business-user service levels despite ever-increasing
  and unpredictable data volumes and analytic workloads
• Increase business user self-sufficiency and support the rapid
  growth of mobile device use
• Provide an investigative computing environment that enables
  users to rapidly prototype potential analytic solutions against
  both existing and new types of data without adversely affecting
  the production data warehousing environment

The cloud environment is an important component of a data
modernization effort because, as discussed later in more detail, it
helps reduce costs, increase flexibility, and speed up deployment.

Pay-as-you-go pricing model reduces up-front costs. The
challenge facing IT is how to modernize the data warehouse while
enabling analytics solutions to be built faster and at a lower cost.
New technologies often require upgrades to hardware, operating
systems, data management systems, and data integration and
analytic tools and applications. These upgrades frequently increase
IT costs and create delays in implementing new business analytics
applications.

Cloud computing can help overcome the costs and delays often
incurred in deploying new technologies for prototyping, developing,
and operationalizing new analytic solutions. The
software-as-a-service cloud model eliminates the need to install and
maintain new hardware and software and reduces up-front
infrastructure costs by offering pay-as-you-go pricing.

Elastic capacity supports changing resource requirements.
Cloud processing and storage capacity adjusts to changes in
resource and workload needs, which is especially important given
the unpredictable nature of analytics workload processing needs
and growth. Some cloud-computing vendors also provide additional
services that make it easier for data warehousing applications to
adapt to changing resource requirements.

Provides fast time to value for the business. Although it can
be argued that business users should not be concerned about the
underlying technologies supporting the analytics they use, there is
still a direct correlation between IT benefits and business benefits.
For example, if the business user requires a certain type of
information, and new technologies enable this information to be
made available, modeled, and analyzed rapidly at a low cost, then
this is a direct benefit to the business user. The technology benefits
of cloud computing to the IT department are also of business
value because they enable IT to respond rapidly to the needs of the
business.
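
As a rough illustration of the pay-as-you-go and elastic capacity arguments in this section, the following Python sketch compares the yearly cost of capacity sized for the peak month with the cost of paying only for what each month uses. The demand figures, the unit price, and the premium charged for on-demand capacity are all hypothetical assumptions chosen for the example; they are not taken from this report or from any vendor's price list.

```python
def fixed_capacity_cost(monthly_demand, cost_per_unit):
    """Yearly cost when capacity is sized for the busiest month and
    paid for every month, whether or not it is used."""
    return max(monthly_demand) * cost_per_unit * len(monthly_demand)


def pay_as_you_go_cost(monthly_demand, cost_per_unit, premium=1.0):
    """Yearly cost when each month is billed only for the capacity used.
    `premium` is an assumed markup for on-demand capacity."""
    return sum(monthly_demand) * cost_per_unit * premium


# Hypothetical analytic workload (capacity units per month) with
# unpredictable spikes, for example at quarter end.
demand = [10, 12, 9, 40, 11, 10, 13, 45, 12, 10, 11, 60]

fixed = fixed_capacity_cost(demand, cost_per_unit=100)
elastic = pay_as_you_go_cost(demand, cost_per_unit=100, premium=1.5)

print(f"Provisioned for peak: {fixed}")
print(f"Pay-as-you-go:        {elastic}")
```

The gap between the two figures narrows as the workload becomes steadier, so a spiky, unpredictable demand profile is where elastic pricing is likely to matter most.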


NUMBER TWO

IDENTIFY PROJECTS WITH PAIN POINTS AND NEEDS THAT
CLOUD DATA WAREHOUSING CAN ADDRESS

Start with business units that recognize the value of cloud
computing. The best place to begin identifying potential projects
is in a business unit that recognizes the benefits of cloud
computing for data warehousing and that has specific pain points
or needs that are not being addressed by IT for priority, resource,
budget, technology, or skills reasons. The cloud data warehousing
environment has been especially successful in small and midsize
companies with limited IT resources and in business units that may
already run some of their operational business processes in the
cloud.

However, not all data warehousing projects are suited to cloud
computing, and it is important to clearly identify those projects that
lend themselves to a cloud environment. Examples of potential use
cases include the following:

• Standalone reporting and analysis of Web, social media, or
  sensor data: A cloud-based reporting and analysis system is
  a cost-effective way of capturing, storing, and analyzing
  high-volume Web log, clickstream, social media (Twitter, for
  example), or sensor (such as telemetric devices) data.
• Analysis and visualization of e-business data and processes:
  Many organizations (Web retailers, online gaming companies,
  etc.) run their entire businesses on the Web. The applications
  involved in e-business are often deployed on hundreds of servers
  and generate terabytes of data every day. A cloud-based system
  is well suited to analyzing and visualizing all of this data to
  help managers analyze business operations and performance.
• Data warehouse augmentation: A cloud-based data
  refinery or data lake is a cost-effective way of capturing,
  storing, transforming, and archiving raw data while providing
  connectivity to an in-house data warehouse for transferring
  data. The cloud can also be used for investigative computing
  (i.e., data exploration and discovery) that can be expensive to
  implement on premises.

The cloud can be used for prototyping, development, and
production. It is also essential that the complete project
management life cycle be considered when evaluating projects.
This is important because not all components of the life cycle will
necessarily occur in the cloud; some projects may be prototyped or
developed in the cloud but deployed on premises.


NUMBER THREE

ASSESS THE DIFFERENCES BETWEEN CURRENT CLOUD
DATA WAREHOUSE TECHNOLOGIES AND SERVICES

Look for vendors that understand the unique requirements of
data warehousing. When public cloud services initially became
available, the market was defined as three types of service:
software-as-a-service (SaaS), platform-as-a-service (PaaS), and
infrastructure-as-a-service (IaaS). As cloud use has grown and
more vendors have entered the market, this simple classification
scheme has become inadequate, and a variety of new schemes have
emerged.

These newer categorization schemes may be useful for selecting a
solution that addresses a specific type of technology, but they say
little about implementing a data warehousing project that employs a
variety of technologies. When assessing cloud vendors, do not focus
on terminology. Instead, identify vendors that understand the unique
requirements of data warehousing and can provide a complete
solution for implementing data warehousing in the cloud.

Evaluate the components of the data life cycle the vendor
supports. There are many components to the data life cycle, from
data integration and management to data analysis and delivery. The
actual components required will vary by project, but it is important
to assess the components of the life cycle supported by the vendor
or provided by partner organizations. Some vendors offer
data-warehousing-as-a-service (DWaaS) or analytics-as-a-service
(AaaS), which combine SaaS, PaaS, and IaaS and enable:

• Capturing and extracting data from trusted sources
• Managing and controlling data under comprehensive policy and
  governance guidelines
• Performing data integration, transformation, analysis, and
  visualization

Managed services for data warehousing are a key requirement.
To simplify deployment and administration in the cloud, some
vendor offerings include managed services that may involve free
or fee-based consulting services or additional capabilities that
simplify development and administration. The issue here is that the
term "managed services" is used differently by vendors. Often these
services are technology- or platform-specific and are not suited to
helping implement, manage, and administer a data warehousing
environment. During vendor evaluations, it's important to understand
what a vendor means by managed services and whether they provide
specific services for deploying and managing a data warehousing
environment.


NUMBER FOUR

IDENTIFY THE CLOUD OFFERING THAT BEST FITS WITH
PROJECT REQUIREMENTS AND EXISTING SKILLS AND TOOLS

Identify the best fit to system hardware and software
requirements. Hardware processing and storage requirements will
be largely dependent on the project's data volumes and analytics
workloads. These requirements are often difficult to determine and
this is where the elasticity of the cloud is beneficial. The system
software required will depend on the data warehousing software
used, so be sure you understand what tools you'll need to integrate
with your chosen solution.

Identify the best fit to data management requirements. Most
data warehouses have been implemented with relational technology,
but new developments provide open source and non-relational
options (there are several Hadoop-based products, for example) that
can reduce software costs and improve performance for certain
types of workloads. A barrier to success with these new options is
that their reliability and development costs are often misunderstood
and underestimated.

Identify the best fit to data integration tools and applications
requirements. Modern data warehousing projects often involve
the integration of new data types. This data may be structured
or multi-structured and may reside on internal and/or external
systems. Multi-structured data, such as Web data, is more difficult
to process, making integration more difficult. Data integration is
one of the most resource-intensive of data warehousing tasks, and
your cloud solution must fit with your data integration strategy and
software.

Identify the best fit to analytic tools and application
requirements. The technology involved in a modern data warehouse
may require installing new analytic products. The main objective
here is to provide a seamless user interface to data no matter where
it resides.

Assess the differences between the vendor solution and the
in-house IT environment. Additional factors to consider include
data import and export, workload management, authorization and
security, disaster recovery, and help desk support. The tools used
in a cloud solution may also differ from those used in house, and
this can impact skills and education requirements for both IT and
business users. Careful evaluation is also required of the vendor's
pricing model and the managed services provided.

NUMBER FIVE

ASSESS THE COST AND COMPLEXITY OF DEPLOYING AND
MAINTAINING THE SELECTED CLOUD DATA WAREHOUSING
SOLUTION

Consider the complete application and data life cycle. The total
cost of ownership (TCO) is a key metric for assessing the cost and
complexity of using a particular data warehousing solution. This
metric enables an organization to select the right cloud service
and also determine how much can be saved by using a cloud
environment. It is important to note that TCO considers more than
just the project's hardware and software costs. TCO calculations
must consider the complete application and data life cycle, from
initial conceptual design to final operation, administration, and
support.

Importing, exporting, and accessing data can be costly. It
is essential to consider the costs of importing, exporting, and
accessing data in a cloud service. These costs will depend on where
the data resides and its volume. Companies are often surprised by
the costs of data movement. Potential data growth and the archiving
of less-active data are also important cost factors to consider.

Understand the managed services provided. The availability and
cost of managed services vary by cloud vendor. Many vendors
provide some level of services and/or tools for helping companies
implement projects in the cloud. In many situations, however,
these services only provide support and administration of the
system hardware and software infrastructure. In the case of a
data warehousing project, managed services often do not support
data-related tasks in areas such as data design, acquisition,
transformation, loading, exporting, archiving, and analysis.

A data-warehouse-specific service is a distinguishing
factor. Support for data-warehouse-specific operations is a
key distinguishing feature among cloud vendors. An end-to-end
managed cloud service for data warehousing cuts time to value
when implementing a new cloud project. The managed service
model is especially attractive to companies with limited IT resources
because it eliminates many of the standard tasks in an in-house
project.
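
The TCO comparison described under Number Five can be sketched in a few lines of Python. This is a minimal illustration only: the cost categories are a simplified reading of the factors named in this checklist (infrastructure, data movement, data growth and archiving, and people costs across the life cycle), and every figure below is a hypothetical assumption rather than a benchmark.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class YearlyCosts:
    """Recurring costs for one year of operation (hypothetical categories)."""
    infrastructure: float  # hardware, software licences, or cloud subscription
    data_movement: float   # importing, exporting, and accessing data
    storage_growth: float  # extra storage and archiving as data grows
    people: float          # design, development, administration, support

    def total(self) -> float:
        return (self.infrastructure + self.data_movement
                + self.storage_growth + self.people)


def tco(up_front: float, yearly: Sequence[YearlyCosts]) -> float:
    """Total cost of ownership: up-front spend plus all recurring costs."""
    return up_front + sum(year.total() for year in yearly)


# Three-year comparison with made-up numbers.
on_prem_tco = tco(500_000, [YearlyCosts(50_000, 0, 20_000, 300_000)] * 3)
cloud_tco = tco(50_000, [YearlyCosts(120_000, 30_000, 25_000, 150_000)] * 3)

print(f"On-premises TCO: {on_prem_tco}")
print(f"Cloud TCO:       {cloud_tco}")
print(f"Difference:      {on_prem_tco - cloud_tco}")
```

Keeping data movement and storage growth as separate line items makes it harder to overlook the costs that, as noted above, often surprise companies.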


NUMBER SIX

UNDERSTAND HOW THE CLOUD DATA WAREHOUSING
SOLUTION WILL INTEGRATE WITH THE EXISTING
INFORMATION TECHNOLOGY ENVIRONMENT

Integrated data warehouse service simplifies implementation.
The main tasks in any data warehousing project involve acquiring
and integrating the raw source data, managing and processing
the data, and delivering the results to the systems and users that
require the processed data. As in an in-house environment, cloud
users have the choice of integrating various cloud products and
services themselves or using an integrated, end-to-end solution. In
the same way that an integrated hardware and software appliance
simplifies development, deployment, and administration for
on-premises projects, an integrated end-to-end cloud solution
simplifies these tasks for data warehousing in the cloud.

Data integration and movement can become a barrier to
success. One of the biggest barriers to cloud deployment is
data integration and data movement. Ideally, the data should be
processed where it resides, but even when the source data already
resides in the cloud, it may still have to be moved to a different
cloud system for processing in the same way that data is moved
from business transaction systems to a data warehouse in an
in-house environment.

Users need to access both cloud and in-house data. An added
complication is that the project may also involve a mixture of
in-house and cloud data. In this case, the in-house data may be
accessed dynamically (using data virtualization, for example) by a
cloud application or staged from the in-house environment to the
cloud for use by the cloud application. Again, this is the same as
in an in-house environment where data warehouse projects are
increasingly using data from a variety of sources in addition to
a data warehouse. It is important to realize, however, that data
movement in a cloud environment typically occurs across a public
Internet connection and this may have security, performance, and
cost implications.
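
To give a sense of the performance and cost implications of moving data over a network connection, the following Python sketch estimates how long a transfer takes and what it might cost. It is a back-of-the-envelope calculation under stated assumptions: the link speed, the efficiency factor (the share of nominal bandwidth actually achieved), and the per-gigabyte transfer price are hypothetical values, not figures from this report or from any provider.

```python
def transfer_hours(volume_gb, bandwidth_mbps, efficiency=0.8):
    """Estimated hours to move `volume_gb` gigabytes over a link rated at
    `bandwidth_mbps` megabits per second. `efficiency` is an assumed
    fraction of the nominal bandwidth that is actually usable."""
    megabits = volume_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


def transfer_cost(volume_gb, price_per_gb):
    """Transfer charge at a flat, assumed price per gigabyte."""
    return volume_gb * price_per_gb


# Staging 1 TB of in-house data to the cloud over a 100 Mbps connection.
hours = transfer_hours(volume_gb=1000, bandwidth_mbps=100)
cost = transfer_cost(volume_gb=1000, price_per_gb=0.09)

print(f"Estimated transfer time: {hours:.1f} hours")
print(f"Estimated transfer cost: {cost:.2f}")
```

Running the same estimate for the expected daily load volume shows quickly whether staging data to the cloud fits inside the available load window or whether dynamic access is the better option.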

Data integration is an important success factor. It is very
important in a cloud environment to look for solutions that simplify
development, deployment, and administration as well as provide
solid and well-performing data integration and data movement
capabilities.


NUMBER SEVEN

LOOK FOR OPPORTUNITIES TO USE CLOUD DATA
WAREHOUSING TO ENHANCE THE CURRENT DATA
WAREHOUSE ENVIRONMENT

Enhance the existing data warehouse. When considering
modernizing the traditional data warehouse and looking at the use
of cloud computing for data warehousing, much of the emphasis is
on building new applications and capturing and analyzing new data
sources for those applications. It should not be forgotten, however,
that data warehouse modernization and cloud computing can also be
used to enhance the current data warehouse environment.

Apply new technologies to existing projects. New types of data
can be used in existing applications to broaden and deepen an
organization's understanding of the business factors that affect
business operations and processes. New analytics processes enable
business analysts and managers to move beyond basic reporting
and descriptive (i.e., diagnostic) analytics to exploit advances
in predictive analytics and data visualization. The items in this
checklist can be used to evaluate the use of cloud-based data
warehousing for new projects as well as for enhancing existing ones.

Overcome in-house performance and cost issues. At a more
general level, cloud computing can be used to reduce the costs
and/or improve the performance of existing data warehouse
operations. In some cases, cloud data warehousing can even
make possible what cannot be achieved with an in-house system.
Increasing data volumes, aging hardware, rising software costs,
and loss of skills due to staff turnover can all affect the ability to
maintain existing service levels and manage costs.

Equally, in many organizations, administration and maintenance
costs are steadily increasing and becoming a higher percentage of
the total IT budget. A cloud service provides an elastic computing
environment that automatically adjusts to changing resource
requirements, and outsourcing resource-constrained projects to
cloud-based data warehousing can alleviate cost and service-level
issues.

Enable the organization to focus on data rather than
technology. Cloud computing eliminates the obstacles and pains
of managing infrastructure, enabling your organization to focus
on using its data rather than on dealing with the technology. The
inclusion of a managed service by the cloud vendor is also a key
success factor in using the cloud to reduce the costs and improve
the performance of existing data warehouse operations.


ABOUT OUR SPONSOR

www.snowflake.net
Snowflake Computing, the cloud data warehousing company, has
reinvented the data warehouse for the cloud and today's data. The
Snowflake Elastic Data Warehouse is built from the cloud up with
a patent-pending new architecture that delivers the power of data
warehousing, the flexibility of big data platforms, and the elasticity
of the cloud, at a fraction of the cost of traditional solutions.
The company is backed by leading investors including Altimeter
Capital, Redpoint Ventures, Sutter Hill Ventures, and Wing Ventures.
Snowflake is headquartered in Silicon Valley and can be found online
at snowflake.net.

ABOUT TDWI RESEARCH

TDWI Research provides research and advice for data professionals
worldwide. TDWI Research focuses exclusively on business
intelligence, data warehousing, and analytics issues and teams
up with industry thought leaders and practitioners to deliver both
broad and deep understanding of the business and technical
challenges surrounding the deployment and use of business
intelligence, data warehousing, and analytics solutions. TDWI
Research offers in-depth research reports, commentary, inquiry
services, and topical conferences as well as strategic planning
services to user and vendor organizations.

ABOUT THE AUTHOR

Colin White is the founder of BI Research and president of
DataBase Associates Inc. As an analyst, educator, and writer he
is well known for his in-depth knowledge of data management,
information integration, and business intelligence technologies and
how they can be used for building the smart and agile business.
With many years of IT experience, he has consulted for dozens
of companies throughout the world and is a frequent speaker at
leading IT events. Colin has written numerous articles and papers
on deploying new and evolving information technologies for business
benefit, and is a regular contributor to several leading print- and
Web-based industry journals. For 10 years he was the conference
chair of the Shared Insights Portals, Content Management, and
Collaboration conference. He was also the conference director of the
DB/EXPO trade show and conference.

ABOUT THE TDWI CHECKLIST REPORT SERIES

TDWI Checklist Reports provide an overview of success factors for
a specific project in business intelligence, data warehousing, or
a related data management discipline. Companies may use this
overview to get organized before beginning a project or to identify
goals and areas of improvement for current projects.
