Best Practices in Leveraging a Staging Area for SaaS-to-Enterprise Integration

David S. Linthicum
SaaS-to-enterprise integration requires a number of architectural decisions about the
integration process, including the frequency of data movement, the mechanism for
maintaining data integrity, and the underlying integration technology and where that
technology resides. There are many architectural options, including the direct migration of
information from source to target, or the use of a staging area to process information more
thoroughly on its way from source to target.
A staging area is a relational database that exists within the firewall. It is a place
to store temporary data for import, and thus allows you to check and alter this data before
committing it to the final target database.
Clearly, the use of a staging area as an approach to integrating SaaS-based systems to
enterprise-based systems provides a number of advantages, including the ability to better
maintain data integrity and data quality, as well as providing better control over the data.
However, a staging area approach to integration requires that the data integration architect
understand both the core notions and best practices of SaaS-to-enterprise integration.
Moreover, the architect must leverage the right technology.
In this paper we'll look at the notion of leveraging a staging area for SaaS-to-enterprise
integration, which includes suggesting a methodology that leverages best practices. We'll
examine the value of this approach, as well as provide information about the proper
enabling technology.
Integration Patterns
Those who work on SaaS-to-enterprise integration need to address a number of issues,
including semantic mediation, data quality, interface mediation, routing, filtering, and other
logical operations on the data moving from SaaS to enterprise, or enterprise to SaaS. The
idea is to make the data as useful as possible when it reaches the target system, including
any business intelligence operations that need to occur and/or any business operations on
that data, on-premise or SaaS-resident.
Thus, in order to figure out the best data integration approach, you first need to understand
your own integration requirements and your existing application and data infrastructure. This
includes the usage of the data when placed on the target system, and how that maps from
the source. To that end, two basic patterns are emerging for SaaS-to-enterprise data
integration: direct integration and staging integration.
Direct integration, as depicted in Figure 1, is really about moving information from one
data source and data schema to another, and translating the differences in semantics from
the source to the target system. While we depict databases here, this could as easily be
data bound to enterprise applications, such as ERP or CRM, or data bound to SaaS-based
applications, such as Salesforce.com.
The idea is to extract the information found in the source system, such as customer
information, and place it in the target system using whatever native schema the target system
uses, translating the differences in schema and content as needed and on the fly. This
allows information to flow from one system to another without having to change either the
source or the target system. The direct integration approach accounts for the differences in
schema using a translation and routing engine that exists between the two systems. This is more
along the lines of traditional integration, as has been defined for years.
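The on-the-fly translation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the field names, the mapping table, and the content rules are hypothetical and are not taken from any particular CRM or order entry product.

```python
# Hypothetical mapping from source (order entry) fields to target (CRM) fields.
FIELD_MAP = {
    "cust_name": "AccountName",
    "cust_phone": "Phone",
    "cust_country": "BillingCountry",
}

# Hypothetical content translations, keyed by target field.
COUNTRY_NAMES = {"US": "United States", "UK": "United Kingdom"}
CONTENT_RULES = {
    "BillingCountry": lambda value: COUNTRY_NAMES.get(value, value),
    "Phone": lambda value: "".join(ch for ch in value if ch.isdigit()),
}


def translate(source_record):
    """Map one source record to the target schema, transforming values in flight."""
    target = {}
    for source_field, target_field in FIELD_MAP.items():
        if source_field not in source_record:
            continue  # no intermediate store exists to hold incomplete data
        value = source_record[source_field]
        rule = CONTENT_RULES.get(target_field)
        target[target_field] = rule(value) if rule else value
    return target


order_entry_row = {
    "cust_name": "Acme Ltd",
    "cust_phone": "+44 20 7946 0000",
    "cust_country": "UK",
}
crm_record = translate(order_entry_row)
```

Each record is handled in isolation, with no intermediate store between source and target.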
Figure 1: Direct integration simply moves information from source to target (e.g., SaaS-to-Enterprise),
translating and transforming the data on the fly. While this is okay for simple data integration needs,
it falls short when the integration requirements are more complex.
Direct integration is typically leveraged when quick and simple information integration needs
to occur on operational-type data, usually from one system to another. An example is updating
the on-demand CRM system with customer data entered in the on-premise sales order
entry system.
Direct integration is characterized by:
• Limitations in supporting complex data operations; it generally handles only very simple
  data operations such as transformation and routing.
• Finer-grained data, meaning the data sets are usually small and recurring.
• An operational focus, where data needs to flow between systems to support ongoing
  light transactions.
In contrast to direct integration, a staging approach to integration supports more complex
and valuable data integration operations, including support for many large data sets and data
operations that are more complex and of higher value. Using a staging area, or a temporary
location where the data from the source system or systems is replicated, provides a logical
location to perform complex operations on the data that would be difficult if not impossible
to do when using direct integration approaches.
An example would be extracting data from multiple enterprise data stores, manipulating the
structure and content of the data, performing complex operations such as data replication,
data aggregation, and data cleansing, and then posting the data to a SaaS-based application
such as Salesforce.com. The same type of data integration could also run in reverse order,
from SaaS to the enterprise.
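As a rough sketch of that example, the following uses an in-memory SQLite database to stand in for the relational staging area. The table name, column names, and sample rows are all hypothetical, and the final post to the SaaS application is left out because it depends on the target's API.

```python
import sqlite3

# Hypothetical extracts from two enterprise data stores with differing conventions.
erp_rows = [("C001", " Acme Ltd ", 1200.0), ("C002", "Globex", 300.0)]
billing_rows = [("c001", "ACME LTD", 800.0), ("C003", None, 50.0)]

staging = sqlite3.connect(":memory:")  # stands in for the staging database
staging.execute(
    "CREATE TABLE stg_sales (source TEXT, customer_id TEXT, name TEXT, amount REAL)"
)
staging.executemany("INSERT INTO stg_sales VALUES ('erp', ?, ?, ?)", erp_rows)
staging.executemany("INSERT INTO stg_sales VALUES ('billing', ?, ?, ?)", billing_rows)

# Cleanse: normalize keys and names, then drop rows that lack a customer name.
staging.execute(
    "UPDATE stg_sales SET customer_id = UPPER(customer_id), name = UPPER(TRIM(name))"
)
staging.execute("DELETE FROM stg_sales WHERE name IS NULL")

# Aggregate: one row per customer across all sources.
payload = staging.execute(
    "SELECT customer_id, name, SUM(amount) FROM stg_sales "
    "GROUP BY customer_id, name ORDER BY customer_id"
).fetchall()
```

Here `payload` holds the cleansed, aggregated rows that would then be posted to the target application.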
Leveraging a staging area for integration is characterized by:
• The ability to perform more complex operations on data, including complete
  transformation of semantics and the data content using any number of dimensions since,
  in essence, you operate on an intermediary database that you control completely.
• The ability to leverage more coarse-grained and complex data sets that may not
  always repeat.
• An informational focus, supporting valuable information externalization approaches,
  including business intelligence.
• More flexibility around business cycles, data processing cycles, widely dispersed systems,
  and hardware and network limitations, where it may not be feasible to extract all
  operational databases at the same time.
• The ability to better support complex database functions, including replication, cleansing,
  and aggregation.
As Figure 2 depicts, the use of a staging area is really about the gathering of information from
many different sources that leverage different structures, data content, and operations on
that data. For instance, within some enterprise systems you may be able to extract sales data
on a daily basis. However, when considering financial data, daily extracts are not suitable or
not possible, since financial data typically requires a month-end reconciliation process. You
may find similar issues with geographically dispersed data, such as extracting sales data from
London and New York at the same time, when the data is posted at different times of the
day due to time zone differences.
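The differing extraction cycles above can be expressed as a small scheduling rule. This is a minimal sketch: the source names and cadence labels are assumptions made for illustration, and a real schedule would also need to account for time zones and for when reconciliation actually completes.

```python
import calendar
from datetime import date

# Hypothetical extraction cadence per source; the names are illustrative only.
SCHEDULES = {"sales": "daily", "financials": "month_end"}


def is_month_end(day):
    """True when `day` is the last calendar day of its month."""
    return day.day == calendar.monthrange(day.year, day.month)[1]


def sources_due(day):
    """Return the sources whose data should be pulled into staging on `day`."""
    due = []
    for source, cadence in SCHEDULES.items():
        if cadence == "daily" or (cadence == "month_end" and is_month_end(day)):
            due.append(source)
    return due


mid_month = sources_due(date(2009, 6, 4))
end_of_month = sources_due(date(2009, 6, 30))
```

Because the staging area holds each extract until the others arrive, the sources do not need to be read at the same moment.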
Figure 2: The use of a staging area is really about gathering the information from many different sources and
creating a single, unified view of the data for integration to a SaaS-based system, such as Salesforce.com.
Defining the Value
Leveraging staging for SaaS-to-enterprise integration offers many business advantages
over more traditional integration approaches, including:
• The cost advantage of keeping volatility within a single domain.
• The value of agility, as related to the ability for the integration patterns to change as
  the needs of the business change.
• The value of customized complex views for specific systems and users.
• The value of leveraging clean, reliable data.
The cost advantage of keeping volatility within a single domain refers to the fact that, since
a staging area is where all of the data from different sources is temporarily housed, using
different formats and different time sensitivities, any changes made to the source and/or
target systems are handled within the staging area. This is especially useful when considering
enterprise-to-SaaS integration, since the information on both the enterprise and SaaS
sources and targets constantly changes in terms of structure, content, and interfaces.
All of these changes can be abstracted within the staging area, which is
able to adjust using a configuration mechanism rather than forcing expensive redevelopment.
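One way to picture such a configuration mechanism is a mapping table that the load step consults, so that a renamed source field is absorbed by a configuration change rather than a code change. The source name, field names, and function below are hypothetical and serve only as a sketch.

```python
# Hypothetical mapping configuration kept with the staging area (a table or a file).
mapping_config = {
    "crm_account": {"acct_name": "name", "acct_region": "region"},
}


def load_into_staging(source_name, record, config):
    """Rename source fields to staging column names using configuration only."""
    field_map = config[source_name]
    return {
        staging_column: record[source_field]
        for source_field, staging_column in field_map.items()
    }


before = load_into_staging(
    "crm_account", {"acct_name": "Acme", "acct_region": "EMEA"}, mapping_config
)

# The source renames a field: update the configuration, not the code.
mapping_config["crm_account"] = {"account_name": "name", "acct_region": "region"}
after = load_into_staging(
    "crm_account", {"account_name": "Acme", "acct_region": "EMEA"}, mapping_config
)
```

The staged row is identical before and after the rename, so systems that read from the staging area are unaffected.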
The value of agility is an outcome of keeping volatility in the single domain of the staging
area, since the more a company changes, and thus requires changes to the enterprise
and SaaS-delivered systems, the more value agility brings. Remember, it's agility that allows
enterprises to adjust to take advantage of new market trends, or quickly reposition the
business to avoid the negative and take advantage of the positive. Thus, the value of agility is
substantial, considering that the ability to quickly move into a new market could make the
company millions of dollars in the forthcoming year, or provide the ability to move quickly
away from an unprofitable business process to avoid huge losses.
The value of customized, complex views for specific systems and users refers to the fact that,
when you use a staging area, you're able to provide the information that's needed to the
system that needs it, customized for users' specific needs. An example would be the ability
to extract customer sales information from a SaaS-based system and credit information
from a credit reporting system. The information is combined in the staging area and then
loaded, in a specific format, into a sales reporting system that is leveraged by the
executives within the company. This allows the information to be combined
and presented to the target system, in this case the target reporting system, using
views of the information that are specific to that system.
Finally, the value of leveraging clean, reliable data comes from a cleansing and validation
process that ensures there is no missing or invalid information. The value of this is obvious.
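A cleansing and validation pass over staged rows might look like the sketch below. The two rules shown (a required name and a loosely checked email address) and the sample rows are assumptions chosen for illustration. Real rules would come from the target system's requirements.

```python
import re

# Deliberately simple pattern; real email validation rules are left open here.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate(row):
    """Return the problems found in one staged row; an empty list means it is clean."""
    problems = []
    if not (row.get("name") or "").strip():
        problems.append("missing name")
    if not EMAIL_PATTERN.match(row.get("email") or ""):
        problems.append("invalid email")
    return problems


staged_rows = [
    {"name": "Acme Ltd", "email": "ops@acme.example"},
    {"name": "", "email": "sales@globex.example"},
    {"name": "Initech", "email": "not-an-email"},
]

# Only clean rows move on to the target; rejected rows keep their reasons for review.
clean_rows = [row for row in staged_rows if not validate(row)]
rejected_rows = [(row, validate(row)) for row in staged_rows if validate(row)]
```

Rejected rows stay in the staging area with their reasons attached, so they can be corrected before anything reaches the target database.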
Best Practices for Leveraging a Staging Area
In order to do effective staging for enterprise-to-SaaS problem domains, we need to follow a
few basic steps. These steps include:
• Understand the source and target data.
• Define the data staging approach.
• Design the data staging operations.
• Implementation and testing (see Figure 3).
Figure 3: For SaaS-to-Enterprise data staging integration, you need to carefully consider the approach and the
design, after understanding your own requirements and existing infrastructure. Testing needs to occur as well, in
order to validate the solution.
Understand the source and target data refers to the process of understanding all
source and target systems at a semantic level, meaning what the data means in the larger
context of the enterprise architecture. This is important since integration requires that we
have an understanding of the source information in order to perform the operations that
will be completed within the staging area, including aggregation, replication, cleansing, and
transformation.
Define the data staging approach refers to the process of creating a high-level approach
to the data staging for your particular problem domain, including definition of the core
purpose of the data staging operations, and a description of the transformation, replication,
cleansing, and aggregation operations that need to occur. This is the "what needs to
be done" step.
Design the data staging operations refers to what happens to the data as it's fed from
the operational data sources, including consumption of the information into a common
format, and the manipulation of the data for use by the target system, as defined by the
approach we created in the previous step. This is the "how something is done" step.
Implementation and testing refers to implementing the staging area and making sure that
it works as designed, and is able to maintain integrity of the data as it's manipulated within
the staging area. This typically requires the creation of a test plan, test data, and acceptance
testing before the staging area is placed into production.
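One simple integrity check that could be part of such a test plan is a reconciliation between a source extract and its staged copy. The sketch below compares row counts and amount totals. The function name, the `amount` field, and the test data are hypothetical.

```python
def reconcile(source_rows, staged_rows, amount_key="amount"):
    """Compare row counts and amount totals between a source extract and its staged copy."""
    source_total = round(sum(row[amount_key] for row in source_rows), 2)
    staged_total = round(sum(row[amount_key] for row in staged_rows), 2)
    return {
        "row_count_matches": len(source_rows) == len(staged_rows),
        "total_matches": source_total == staged_total,
    }


# Hypothetical test data: the second staged copy has silently lost a row.
source = [{"id": 1, "amount": 100.10}, {"id": 2, "amount": 250.25}]
good_copy = [{"id": 1, "amount": 100.10}, {"id": 2, "amount": 250.25}]
bad_copy = [{"id": 1, "amount": 100.10}]

good_result = reconcile(source, good_copy)
bad_result = reconcile(source, bad_copy)
```

A check like this catches lost or duplicated rows, but not every kind of corruption, so it complements acceptance testing against the target system and does not replace it.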
Leveraging Informatica PowerExchange for Salesforce CRM
Informatica PowerExchange for Salesforce CRM lets you use Informatica PowerCenter to
migrate, replicate, cleanse, and synchronize your Salesforce CRM or Force.com application
data, with your on-premise enterprise business applications and databases (see Figure 4).
PowerExchange takes advantage of the staging approach to integration, as defined in this paper.
Figure 4: Informatica PowerExchange for Salesforce CRM
Benefits of using PowerExchange for Salesforce CRM include the reduction of development
costs due to the configuration and visual approach that this technology provides. IT
productivity also increases, since the technology reduces or eliminates coding to the
Salesforce.com API. Because you capitalize on the value of data integration through the use
of staging, you take full advantage of your investment in Salesforce.com.
Other features of PowerCenter include:
• Execute bidirectional Salesforce integration without requiring code.
• Bring Salesforce data within your corporate firewall.
• Use one tool to transform and cleanse data before delivering it to your on-premise
  enterprise software and databases.
• Pass Salesforce data directly from source to target without intermediate staging in
  message queues.
• Simplify the tasks of migrating, replicating, cleansing, and synchronizing Salesforce data
  across the Internet with business analytics applications or master data hubs.
• Distill new and updated Salesforce data from unchanged background data, and send
  only the changes for further processing.
• Receive Salesforce data in a batch on any chosen schedule, or as a real-time stream
  of updates, all with a common architecture, and all without coding or scripting.
• Support any enterprise data integration initiative involving Salesforce data.
• Scale to support Web services in SOAs, event-driven architectures, and traditional
  data integration techniques on multiple platforms.
• Choose your preferred technology and evolve gradually as conditions change.
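The idea of separating new and updated records from unchanged background data can be illustrated generically. The sketch below is not how PowerCenter implements the feature. It is one common technique, comparing content fingerprints between runs, and the record layout and the `Id` key are assumptions.

```python
import hashlib
import json


def fingerprint(record):
    """Stable hash of a record's content, independent of key order."""
    encoded = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def detect_changes(previous_fingerprints, current_records, key="Id"):
    """Return records that are new or whose content changed since the last run."""
    changed = []
    for record in current_records:
        if previous_fingerprints.get(record[key]) != fingerprint(record):
            changed.append(record)
    return changed


# Hypothetical state saved from the previous run, keyed by record Id.
previous = {
    "001": fingerprint({"Id": "001", "Name": "Acme"}),
    "002": fingerprint({"Id": "002", "Name": "Globex"}),
}
current = [
    {"Id": "001", "Name": "Acme"},         # unchanged
    {"Id": "002", "Name": "Globex Corp"},  # updated
    {"Id": "003", "Name": "Initech"},      # new
]
changes = detect_changes(previous, current)
```

Only the updated and new records are passed on for further processing.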
Leveraging Informatica On-Demand
When it comes to populating the staging area, you should consider Informatica On-Demand.
Informatica On-Demand allows you to leverage a staging area, either to or from the SaaS-
delivered applications, such as Salesforce.com. The facilities of Informatica On-Demand allow
you to use this staging area without a significant amount of data transformation, and it is
easily leveraged by the application administrator.
Moreover, by leveraging Informatica On-Demand, the application administrator is able to
focus on the staging area, and not the transformation. The transformation is carried out by
mechanisms introduced by the developer, leveraging tools such as PowerCenter (discussed
previously). This allows for the implementation of strict validation rules and logic required by
the target systems.
Therefore, the use of both Informatica On-Demand and PowerCenter frees IT from the
requirements of understanding the natives of a particular application, and thus changes that
are requested by the end user (see Figure 5). Informatica is the only vendor that provides
the technology to serve both needs, and thus is a one-stop shop for those looking to do
data integration, from SaaS to enterprise, leveraging a staging area.
Figure 5: The use of both Informatica On-Demand and PowerCenter frees IT from the requirements of
understanding the natives of a particular application, and changes that are requested by the end user.
Informatica is the only vendor that solves both of those problems.
Data Staging Taking Center Stage
As we learned in this paper, data integration between SaaS-based applications, such as
Salesforce.com, and on-premise information systems is a core need as the interest in SaaS
and cloud computing grows. However, those who look
to address this integration using more traditional, tactically oriented tools are doomed to
replicate improperly processed information to the target system that receives it.
The use of staging as an approach to data integration provides the data or enterprise
architect with the ability to perform complex processing on an intermediate form of the
data. The staging area provides a location to gather the data when it's ready to be gathered,
aggregate the data so it has better business meaning, and clean the data where needed to
ensure data quality. In essence, staging provides a location where value can be added to the
data as it moves from on-premise to SaaS, and from SaaS to on-premise.
2009 David S. Linthicum, LLC
6986 (06/04/2009)
About the Author
David Linthicum (Dave) is an internationally known Enterprise Application Integration (EAI),
Service Oriented Architecture (SOA), and cloud computing expert. In his career, Dave has
formed or enhanced many of the ideas behind modern distributed computing including EAI,
B2B Application Integration, and SOA, approaches and technologies in wide use today.
Currently, Dave is the founder of David S. Linthicum, LLC, a consulting organization dedicated
to excellence in SOA product development, SOA implementation, corporate SOA strategy,
and leveraging cloud computing. Dave is the former CEO of BRIDGEWERX, former
CTO of Mercator Software, and has held key technology management roles with a number
of organizations including CTO of SAGA Software, Mobil Oil, EDS, AT&T, and Ernst and
Young. Dave is on the board of directors serving Bondmart.com, and provides advisory
services for several venture capital organizations and key technology companies.
In addition, Dave was an associate professor of computer science for eight years, and
continues to lecture at major technical colleges and universities including the University of
Virginia, Arizona State University, and the University of Wisconsin. Dave keynotes at many
leading technology conferences on application integration, SOA, Web 2.0, cloud computing,
and enterprise architecture, and has appeared on a number of TV and radio shows as a
computing expert.