
Guide to Reducing ETL and Data Integration Costs by 80%

This white paper shows how to save up to 80% on data integration and ETL (Extract, Transform, and Load) by using open source. You will learn about the trends in today's data integration market that help companies benefit from the lower Total Cost of Ownership (TCO) of open source ETL solutions.

Apatar Open Source Data Integration Tutorial Series


Table of Contents
1. Executive Summary
2. Driving Down Data Integration Costs
3. Enterprises Go Open Source
4. Major Companies Recognize Lower TCO
5. Measuring TCO Benefits of Open Source Data Integration
6. Apatar Open Source Data Integration
7. Best Practices
8. About Authors

Appendix: References


1. Executive Summary
Companies of all sizes are challenged to deliver their products and services to market faster and to manage more complex sales and marketing programs with limited budgets and shrinking time frames in order to accelerate revenue generation. To do so, having the right data integration and data quality model is critical. And the cost of data integration is certainly one of the most important aspects of that model.

Corporate developers spend approximately 65 percent of their effort building bridges between applications.

-- Gartner

Today, companies spend enough on data Extraction, Transformation, and Loading (ETL) to start wondering whether this technology is as beneficial as it is claimed to be. And the sums are still going up. Yet most companies seem to underestimate their own spending on ETL. License costs, though far from small, are typically the only factor taken into account, while the real Total Cost of Ownership (TCO) also includes labor and hardware costs, which can exceed license costs many times over. The true amount that an enterprise spends on ETL can reach millions of dollars. Non-IT executives would be horrified to realize how much money data integration consumes.

But there is a way out. This white paper will show how saving up to 80% on ETL cost is possible. We will analyze the Total Cost of Ownership of a typical data integration project, break down the cost structure, determine how each component can be cut, and then turn to major companies' experience of saving on data integration.

2. Why Is Data Integration So Expensive?

In 2003, total spending on data integration was about $9.3 billion. IDC expects it to exceed $13 billion in 2008. The reality is that data integration projects are becoming more complex, data volumes are expanding, and taking good care of data is becoming more and more expensive. Under tough economic conditions, the total cost of ownership of proprietary enterprise data integration solutions is becoming prohibitive for many companies.

And the issue is not only about license costs. According to Yankee Group, over a three-year period, the total cost of ownership (TCO) of an integration application is more than eight times the initial software license investment. So in fact, it's the running cost that is the most expensive. Comprising the yearly license fee, hardware costs, and labor costs, the real TCO of a data integration solution, according to Yankee Group, can rise to $509,600 annually, and the number seems to increase every year.

It is tempting to think of data integration as something that constantly consumes enormous human and financial resources, and there is some truth in that. But to reduce the total cost of ownership, you simply have to change the cost structure. Today you can save on most items in the data integration shopping cart, starting with license fees. A number of open source solutions are now available for evaluation and real-world projects at no charge. With open source, you can practically eliminate licensing fees altogether.
[Figure: Data integration software TCO: initial and follow-on (yearly) costs, $ thousands. Series: license costs, hardware costs, labor costs.]

As far as hardware costs are concerned, solutions that require their own servers and/or mainframes are unreasonably expensive to implement. While the hardware costs of implementing such solutions average about $300,400, the hardware costs of products that can run on existing servers and desktops drop drastically, to a mere $10,300 in some cases. So eliminating the vendors whose solutions do not provide an acceptable level of openness can result in considerable savings.

When it comes to labor costs, there are two main criteria. The first is whether the solution is straightforward enough for an untrained user and whether it is effective, meaning that the user can do the job in minimal time. The second is the flexibility of the product: the extent to which its configuration can be reused for follow-on tasks. Open source usually manages to provide both, being oriented toward and supported by a large community that works to improve the product consistently.

When choosing a data integration solution, there are many things to keep in mind. The most crucial is that, while many companies tend to underestimate the TCO of data integration, for every dollar spent on integration software, enterprises spend $6 on subsequent implementation and support. That figure applies to proprietary software.
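The figures quoted in this section lend themselves to some quick arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the only inputs are numbers cited above (the $300,400 and $10,300 hardware figures, and the ratio of $6 in implementation and support for every $1 of software).

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage by which a cost falls when it drops from `before` to `after`."""
    return (before - after) / before * 100


def software_share_pct(follow_on_per_dollar: float) -> float:
    """Share of total spend that goes to software when every software
    dollar is followed by `follow_on_per_dollar` dollars of other costs."""
    return 1 / (1 + follow_on_per_dollar) * 100


if __name__ == "__main__":
    # Hardware figures quoted in the text: dedicated servers vs. existing machines.
    print(f"Hardware cost reduction: {reduction_pct(300_400, 10_300):.1f}%")
    # $6 of implementation and support for every $1 of software.
    print(f"Software share of total spend: {software_share_pct(6):.1f}%")
```

With the quoted figures, the hardware line works out to roughly a 96.6% reduction, and software accounts for only about 14.3% of total spend, which is why looking at license fees alone understates the real cost.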

3. Enterprises Go Open Source

Thousands of companies worldwide entrust their enterprises to open source solutions like Linux and MySQL, including such industry leaders as Yahoo, Google, Cisco, Panasonic, Alcatel-Lucent, Nokia, Associated Press, and many others. Just ten years ago, using open source might have seemed at odds with a big company's ideology. Now that the benefits of freely distributed software have become more evident than ever and an abundance of such products has appeared on the market, more and more companies are going open source.


"Over the past few years open source has become the technology we consider when there's something we need," says Jeremy Zawodny, a member of Yahoo's technology development team. "When I joined Yahoo, the data-management part of that system was crude and written internally, and one of the first things I did was replace that with MySQL," Zawodny says.

Open source data integration is no exception. Freely available solutions are doubly beneficial: they bring license costs to a minimum and let companies save substantially on maintenance. This is why organizations like Continental Airlines, NASA, AXA, Fidelity, and many others rely on open source when it comes to integrating their data.

But is it all about money? In truth, open source has a number of further advantages over traditional software:

Better performance and reliability
Open source solutions have vast communities of developers, so a product's full range of functionality tends to be tested on different platforms before release. It also means that bugs are found and fixed rapidly. Required enhancements to the code are easier to make as well, thanks to the number of developers and the availability of the source code.

Available for many platforms
Typically, open source software supports numerous platforms, leaving it to users to choose the one that best fits their requirements. This freedom of platform choice is something many proprietary software solutions cannot offer.


Higher level of security
With the source code publicly available, open source software typically suffers fewer vulnerability attacks than proprietary solutions. And as soon as a vulnerability is revealed, it is reported to the developers, who fix the problem promptly.

Flexible
Highly tailored open source solutions are rare. Most open source projects allow a tremendous degree of flexibility and can be reused in a vast range of cases with little to no customization. With open source, you do not have to use multiple solutions to integrate data from Salesforce CRM to a MySQL database and from Goldmine to SugarCRM.

Easier to deploy
Open source software tends to concentrate on essential features instead of implementing dozens of secondary features that hardly anyone uses. Because of that, such software is usually more straightforward to use than proprietary products. Moreover, with the large communities that open source projects have, getting help on any feature of the product is a matter of hours.

Safety from vendor lock-in
When using traditional software, one is greatly dependent on the vendor. To a degree, this can be regarded as one of the reasons data integration is getting more expensive. Entrusting data to a vendor is a serious commitment, as moving it to a different vendor afterwards is such a hassle that it is often considered less costly to accept whatever conditions the current vendor lays down, even if these conditions differ from what was expected. Open source is the most reliable option in this respect. With the solution distributed freely and the source code out in the open, lock-in is effectively ruled out.

4. Major Companies Recognize Lower TCO

It certainly looks as if the trend of major companies switching to open source is not only here to stay, but is turning into a quiet revolution in the software market. The practical benefits of using open source often exceed expectations. And as many enterprises have already verified that for themselves, even more companies are following in their footsteps. Financially, for any open source convert, the advantages of going open source over staying with proprietary software are irrefutable.

NASA's Acquisition Internet Service saves over $4 million per year with open source.

NASA's Acquisition Internet Service (NAIS), which has grown vital to the agency's business, manages large acquisitions online with MySQL, the world's most popular open source database.


When their previous database vendor decided to restructure its license program, NASA was faced with fees that would cost more than twice their total annual budget for a simple upgrade, according to Dwight Clark, NASA Systems Analyst. Switching to MySQL helped NASA resolve the issue, saving over $4 million per year. Furthermore, using MySQL turned out to be beneficial in a number of unexpected ways, providing better reliability, higher productivity, and lower support costs.
[Figure: Initial costs of implementing data integration applications, $ thousands. Vendors shown: Informatica, Ascential. Cost categories: software, hardware, labor.]

The open source office application suite OpenOffice.org is used by 14% of large enterprises worldwide, including the French Gendarmerie, Bristol City Council, and Singapore's Ministry of Defense. General Brachet, the French Gendarmerie's head of IT, says using open source products helps the police save millions of euros per year.

According to Netcraft, the open source project Apache has been the #1 HTTP server on the Internet for more than 12 years now, and it is recognized as such by Hewlett Packard, Adobe, and Apple, to name just a few. Cost is likely a major reason: according to research conducted by TechRepublic, Apache saves about 60% to 90%, depending on which of the popular proprietary competitors it is compared with.

5. Measuring TCO Benefits of Open Source Data Integration

Open source ETL and data integration solutions can save you money in a number of areas. The key areas of savings that open source provides are:


1) License costs
Even though today's proprietary data integration software market is highly competitive, license costs vary widely from vendor to vendor. The license costs of solutions from some of the major vendors can be a substantial expense item for an enterprise, sometimes exceeding $500,000 annually. Reducing license costs by going open source frees that budget for other business tasks.

2) Lower operation and support costs
Many open source solutions are notably easier to use than proprietary tools. Even more importantly, because so many people participate, open source products are in most cases extensively documented, which makes it possible to master the product quickly and without unnecessary effort. On top of that, the large developer and user communities make it possible to get support from other users without paying for it. With such communities, most questions are answered quickly, which keeps the time and money lost to downtime to a minimum.

3) Ready customization schemes available from communities
Another benefit of open source product communities is the large number of ready-to-go customization schemes developed by their members and often available for free. After installing the application, you might not even need to spend time setting it up for your specific situation: you can use a scheme created by someone who has been in your shoes before. Proprietary software quite often allows creating reusable schemes too, but vendors seldom encourage sharing them freely. By choosing open source, you can save considerable labor time, relieving the user of the need to do everything manually.
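The three areas above can be combined into a simple before-and-after comparison. The sketch below is one way to do that arithmetic; the class and function names are hypothetical, and every dollar figure in the example is a made-up placeholder chosen only to show how an 80% figure could arise, not data from this paper.

```python
from dataclasses import dataclass


@dataclass
class AnnualCosts:
    """Yearly cost of running a data integration solution, in dollars."""
    license: float
    hardware: float
    labor: float

    @property
    def total(self) -> float:
        return self.license + self.hardware + self.labor


def savings_pct(before: AnnualCosts, after: AnnualCosts) -> float:
    """Percentage of the yearly total saved by moving from `before` to `after`."""
    return (before.total - after.total) / before.total * 100


if __name__ == "__main__":
    # All figures below are hypothetical placeholders, not data from this paper.
    proprietary = AnnualCosts(license=200_000, hardware=100_000, labor=200_000)
    open_source = AnnualCosts(license=0, hardware=20_000, labor=80_000)
    print(f"Estimated yearly savings: {savings_pct(proprietary, open_source):.0f}%")
```

Plugging in your own license, hardware, and labor estimates shows which of the three components dominates your savings, and whether labor costs would erode the benefit of a zero license fee.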

6. Apatar Open Source Data Integration

One of the open source tools that can help cut data integration costs is Apatar, an open source ETL project. A response to strong demand from companies whose business information is scattered across distinct applications, Apatar is distributed under the open source GNU General Public License (GPL 2.0). Apatar was designed to help developers and business users move data in and out of a variety of data sources and formats. As flexible as an ETL solution should be, it provides connectivity to MySQL, Salesforce.com, Goldmine, Flickr, Amazon S3, SugarCRM, XML, RSS, CSV, Microsoft Excel, Oracle, Microsoft SQL, FTP, POP3, WebDAV, Autodesk Buzzsaw, any JDBC data sources, and more.

"I was impressed how easy it is to use this tool and to obtain results quickly. The lack of knowledge on how to perform some specific functions was rapidly solved within the forum."

-- Fabio Pifferini, PMP, Internal SAP Consultant, INTERROLL Management AG


100% Java-based, Apatar is platform-independent and runs on Windows, Linux, and Mac OS. With the source code included, the solution is easily customizable.

[Figure: Apatar's visual work panel]

Apatar requires no coding skills, yet makes it possible to accomplish integration of virtually any level of complexity. Users can drag and drop data between databases and applications using a visual mapping interface. This saves time and effort, which results in lower labor costs.

Data integration jobs created with Apatar (called datamaps) can be stored on your local drive, which makes subsequent reuse possible. Saved datamaps can be shared via Apatarforge.org, the Apatar DataMap Repository, which hosts hundreds of ready-to-go datamaps for a variety of situations. Before creating your own datamap, you might want to check whether there is a ready one in the DataMap Repository, since one of the 5000+ Apatar users could have encountered the same task and posted a solution.

Apatar Scheduler, which can schedule datamaps to start automatically, allows running recurring jobs without any manual effort.

Not only does Apatar integrate data, it also helps improve its quality. By means of integrated data verification services such as StrikeIron US Address Verification, StrikeIron Email Verification, CDYNE Death Index, and others, Apatar can automatically filter out outdated and invalid data.
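To make the extract, transform, and load flow concrete, here is a minimal sketch of what a data integration job does conceptually. It is not Apatar's API or datamap format: it uses only Python's standard csv and sqlite3 modules, and the sample data, table name, and function names are hypothetical. The crude email check merely stands in for the kind of data quality filtering described above.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an export from a CRM or spreadsheet.
SOURCE_CSV = """name,email
Ada Lovelace,ADA@EXAMPLE.COM
Broken Record,not-an-email
Grace Hopper,grace@example.com
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: read rows from CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[tuple[str, str]]:
    """Transform: normalize emails and drop rows that fail a basic check."""
    cleaned = []
    for row in rows:
        email = row["email"].strip().lower()
        # Deliberately crude validity check, for illustration only.
        if "@" in email and "." in email.split("@")[-1]:
            cleaned.append((row["name"].strip(), email))
    return cleaned


def load(conn: sqlite3.Connection, records: list[tuple[str, str]]) -> None:
    """Load: write the cleaned records into a target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS contacts (name TEXT, email TEXT)")
    conn.executemany("INSERT INTO contacts VALUES (?, ?)", records)
    conn.commit()


def run_job(conn: sqlite3.Connection, source: str = SOURCE_CSV) -> int:
    """Run the whole pipeline once and return the number of rows loaded."""
    records = transform(extract(source))
    load(conn, records)
    return len(records)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_job(connection), "rows loaded")
```

A recurring job is then just `run_job` invoked on a timer, for example from cron or another scheduler, which is the role Apatar Scheduler plays for datamaps.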


[Figure: Apatar Scheduler automates data integration]

Large financial services company Credit Suisse Group, major software vendor Autodesk, the world's largest international insurance and financial services organization Allianz, major telecommunications player Alcatel-Lucent, and the Fortune 500 company R.R. Donnelley have at least one thing in common: they rely on Apatar, alongside many more users.

7. Best Practices
a) Go Open Source
The truth is that most data integration projects in today's enterprises never get built. The ROI (Return On Investment) on these small projects is simply too low to justify bringing in expensive middleware. That's why you may consider using commercially supported open source tools for your integration projects. You may want to consider Apatar's application to design and orchestrate data integration processes, as well as MySQL database to host data warehouse and staging tables.

b) Read the License
Take the time to clarify conditions of use, and make sure that what you are dealing with actually is open source, and the terms of use suit you well. Different licenses can have very different consequences. Besides, not every freely distributed product is open source. Be aware there can be pitfalls.

c) Remember the Source Code
One of the huge advantages of open source over proprietary software is the openness of the source code. Keep in mind that you are always free to view, fix, and modify it.

d) Make Sure to Use Version 1.0 or Later
It is not rare for open source software to be released while on rather early stages of development. Therefore, version numbers like 0.2.5 or 0.7.1.2 are no rarity either. Although these versions actually can be very stable and mature, it is generally believed that a product has to reach version 1.0 to be considered for enterprise use.

e) Check the Latest Version's Release Date
It is clear that not all open source solutions work out. Sometimes, they do for the users, but not the developers. If you want the product to be regularly updated with new features and bug fixes, you have to make sure that you are dealing with an active developer. Check the release date of the latest version of the product. If it's been a while, probably you should reconsider your choice.

f) Opt for Flexibility
While most open source products are rather flexible in their capabilities, there is a number of very narrow solutions which are applicable for just one particular situation. It's certainly up to you to decide whether such products suit you, but bear in mind that your specific needs might change slightly, and if your solution fails to match them anymore, it will take time and resources to move to a different platform.

g) Ponder over the Real Economy
Remember that license costs are not your only expense when implementing software. If an open source product is difficult to learn to use or implement, think again whether it is actually worth it.
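Practices d) and e) are mechanical enough to check automatically when screening candidate tools. The following sketch is one possible way to do so; the function names are hypothetical, and the one-year freshness threshold is an assumption of this example, since the text above does not say how long "a while" is.

```python
from datetime import date, timedelta


def is_stable_version(version: str) -> bool:
    """True when the major version number is 1 or higher (e.g. '1.0', '2.3.1')."""
    major = version.split(".")[0]
    return major.isdigit() and int(major) >= 1


def is_actively_released(last_release: date, today: date,
                         max_age_days: int = 365) -> bool:
    """True when the latest release is no older than `max_age_days`.

    The 365-day default is an assumed threshold, not a rule from the text.
    """
    return today - last_release <= timedelta(days=max_age_days)


def passes_basic_checks(version: str, last_release: date, today: date) -> bool:
    """Combine the version check (d) and the release-date check (e)."""
    return is_stable_version(version) and is_actively_released(last_release, today)


if __name__ == "__main__":
    today = date(2008, 6, 1)  # fixed date so the example is reproducible
    print(passes_basic_checks("0.7.1.2", date(2008, 5, 1), today))  # pre-1.0
    print(passes_basic_checks("1.2", date(2006, 1, 15), today))     # stale release
    print(passes_basic_checks("1.2", date(2008, 3, 10), today))     # passes both
```

A failed check is a prompt to look closer rather than a verdict: as noted in d), some pre-1.0 releases are stable and mature.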

8. About Authors

Michael Fedotov is Apatar Evangelist and has been working as a freelance journalist for a number of IT-related periodicals since 2003, covering nearly all aspects of IT, and software development in particular, in numerous articles. He has taken part in several scientific conferences and has helped many other participants create and deliver their presentations. He also has experience working as an interpreter and is presently studying Japanese.

Alex Khizhnyak is Chief Evangelist at Apatar, Inc. and co-founder of Belarus Java User Group. Since 1998, he has gained experience as an author, editor, media specialist, event manager, conference speaker, and blogger. His educational background combines IT, programming, economics, and journalism. You may also read his blog on Open Source and Data Integration at http://apatar.com/blogs/alex

Renat Khasanshyn is founder and CEO of Apatar, Inc. Mr. Khasanshyn is a subject matter expert on data mashups and open source business models, and speaks frequently at a wide range of events. Most recently Mr. Khasanshyn was selected as a finalist for the 2007 Emerging Executive of the Year award by Massachusetts Technology Leadership Council. In 2006, Mr. Khasanshyn founded Apatar, the world's first on-demand, open source data mashup software company. Prior to founding Altoros Systems in 2001, Renat was VP of Engineering for Tampa-based insurance company PriMed, Inc. Renat has a passion for emerging technologies and won the 2007 IBM Business Mashup Challenge. Mr. Khasanshyn is a co-founder of Belarusian Java User Group and studied Engineering at the Belarusian State Technical University. Renat writes a blog, Naked Open Source, found at http://www.nakedopensource.com

Apatar (www.apatar.com) is the leading provider of open source software tools for the data integration market. With powerful Extract, Transform, and Load (ETL) capabilities, Apatar enables its users to easily link information between databases (such as MySQL, Microsoft SQL, Oracle), applications (Salesforce.com, SugarCRM), and the top Web 2.0 destinations (Flickr, Amazon S3, RSS feeds). Apatar provides support, training, and consulting services for its integration solutions. Headquartered in Western Massachusetts, Apatar operates a development center in Minsk, Belarus. For a free download of Apatar Open Source Data Integration and more information on how to save money on open source data integration, please visit www.apatar.com and www.apatarforge.org.

Appendix: References
1. Uncovering the Hidden Costs in Data Integration. (Yankee Group)
2. Steve McClure. Market Analysis. Worldwide Data Integration Forecast: 2004-2008. (IDC)
3. The Future of Data Integration Technologies. (META Group)
4. Rick Banister. Choosing an ETL Technology. (Sesame Software)
5. Wayne Eckerson, Colin White. Evaluating ETL and Data Integration Platforms. (TDWI)
6. Philip Russom. How to Evaluate Enterprise ETL. (Forrester)
7. Colleen Graham, Nicole Latimer, Fabrizio Biscotti, Joanne Correia, Chad Eschinger, Chris Pang, Thomas Topolinski. Enterprise Software Industry Analysis. Software Market Research, Methodology and Definitions, 2003-2004. (Forrester)
8. Jason Hiner, MCSE, CCNA. Evaluating TCO for iPlanet, Apache, and IIS. (TechRepublic.com)
9. Larry Greenemeier. Open Source Goes Corporate. InformationWeek, September 26, 2005.
10. Sean Michael Kerner. Open Source ETL Takes On Proprietary Intelligence. Internetnews.com, July 25, 2007.
11. Randy Metcalfe. Top Tips for Selecting Open Source Software. OSS Watch, February 2004.
12. Dennis Kennedy. Best Legal Practices for Open Source Software: Ten Tips For Managing Legal Risks for Businesses Using Open Source Software. Llrx.com, February 7, 2006.
