E-guide

Why RDBMS Still Rules Database Roost
Your essential guide to RDBMS

In this e-guide
Section 1: Relational databases
Section 2: NoSQL databases
Getting more PRO+ essential content

Relational databases have enjoyed a long run as the database mainstay across a wide variety of businesses, and for good reasons. However, they haven't necessarily adapted well to changes in the types and quantities of data now being generated, such as the unstructured data that is prevalent in big data applications. In addition, expanding traditional databases to accommodate rapid growth is costly. As a result, NoSQL database technologies are challenging the monopoly of the relational database management system.

Yet, despite their modern designs and efficiency in managing large data sets, NoSQL databases aren't the right fit for all projects. Depending on your business goals, traditional databases, NoSQL databases or a hybrid of the two may be best to deliver the most value. The articles in this guide examine these technologies from different perspectives and explore the case for the ongoing relevance of relational databases.

Section 1: Relational databases

Large Internet companies like Facebook, Twitter, LinkedIn and Netflix are well-known users of NoSQL database technology, as it works well with the large data sets they need to manage. However, many organizations find that traditional databases are still best for their business needs. In this section, learn how relational database technologies are holding their own in the database world by evolving to meet higher levels of efficiency as well as specific business needs for various companies -- even Facebook.

Don't get distracted by new database technology
Joshua Greenbaum, Principal, Enterprise Applications Consulting

New database technologies are coming to market with increasing regularity, and if these products live up to the hype as superfast and crazy cheap, hundreds of thousands of workhorse relational databases in use today will be put out to pasture. Who needs a 20th-century relational database when you can have a decidedly more modern NoSQL, columnar or in-memory database -- or even the Hadoop Distributed File System?

Most organizations, it turns out. At least for now.

While the seductive powers of the new database technology are not to be denied, you should resist the siren song of the new, post-relational database vendors. Not because the new database options lack merit -- on the contrary -- but because making your company's next database move a technology decision is the wrong way to go about it. The choice of database should be secondary. Your business goal -- that comes first.

A very good place to start

Consider a battery of practical questions about your project: Are you creating net new applications in support of net new business processes or merely upgrading the ones you already have? Engaging new types of users, data or analysis? Supporting a new line of business or reinvigorating an existing one? Answers to these questions will provide essential criteria for understanding which new database technology, if any, to deploy.
Only then should you look around to see whether a new database is better for the job than something you already have.

Implementing a database of any kind isn't cheap. While many of the new varieties are open source, they aren't free -- and even more costs enter the equation when a project involves migrating an existing relational database to one of the newbies. Myriad complexity issues also stand in the way. New database technologies, particularly in-memory ones, often need new hardware. Many of the available options promise to lower total cost of ownership over time -- but new hardware will have to be obtained, and that up-front cost must be taken into consideration.

The fine print

Finding people with the right skills is an even bigger issue. The new models may require fewer administrators -- most proponents insist that their databases are less expensive to implement and manage than old-school relational databases are. And in many cases that's an easy argument to make: Top-tier database administrators are some of the highest paid people in the IT department, and their numbers -- most relational databases are notorious for the number of admins required to keep them finely tuned -- clearly add significant costs.

But the likelihood of finding a Hadoop or columnar database expert in a traditional relational database shop is slim, which means you'll have to go out and hire these in-demand people or get the required skills from a consulting company.

And, as anyone who has worked to bring a major application project to fruition can attest, the bulk of the complexity is centered on everything but the cost of the software license. Creating new algorithms, analytical models, transactional components and business processes that need to be engineered and implemented is where the real expense is.
Until they're well understood and the necessary stakeholder input and approvals have been obtained, the choice of database technology is at best a distraction. At worst, it's a great way to knock a project off its axis and send it spinning out of control.

Think big

This is particularly true in the era of big data, which is driving a considerable percentage of the new application projects in organizations. For many, big data projects involve data types that are new, unfamiliar and often unstructured -- time-series data, Web server logs, text. While some new database technology might eventually need to be deployed, figuring out what the new data sources are and what the new algorithms should look like must be the first order of business, right after you've reached agreement on what the new business processes are all about. To do otherwise is to march your company down the path of cost overruns, scope creep and eventual if not inevitable failure.

Relational databases are far from dead -- just ask Facebook
Nicole Laskowski, Senior News Writer

Hadoop is not enough! Just ask Ken Rubin, director of analytics for Facebook Inc., who delivered what CIOs probably consider a refreshing message at the Strata Conference + Hadoop World 2013 in New York: Facebook needs the relational database.

"We're a young-enough company that we started by using Hadoop as our core data technology rather than relational [databases]," he said.
"As we start thinking about big data from the perspective of business needs, we're realizing that Hadoop isn't always the best tool for everything we need to do."

When a Web 2.0 superstar -- and Hadoop exemplar, at that -- says there's a time and place for relational technology, CIOs have another bit of proof, if any were needed, that building for big data isn't the black-and-white proposition some Hadoop zealots make it out to be. It's shades of gray, because what matters for the business at the end of the day is solving business problems. Thinking about big data in those terms rather than in terms of tools or architecture "opens up the possibilities of using a much broader range of technologies," Rubin said.

So when exactly does Facebook's analytics team use relational technology rather than Hadoop? That depends on what they're looking for and when and how they want to see the data. "Exploratory analysis," such as pinpointing what metrics really matter, is done in Hadoop; "operational analysis," such as slicing and dicing data, is done in a relational database, Rubin said.

Particularity matters. "If we look at the granularity of the data, we keep the lowest level of grain in our Hadoop system. So whenever you want to look at something at the lowest level of detail, Hadoop is optimized for that," he said. "However, if we want to look at transformed data and aggregated data, relational is easier for doing that."

And timing is important. All of Facebook's data streams directly into Hadoop, which can be used for real-time monitoring. But if the analytics team wants to do trending analysis over several days, weeks, months or years, "relational is a better technology," he said.

Social television

Not surprisingly, open data was a central theme at Strata Conference + Hadoop World.
Shawndra Hill, assistant professor at the University of Pennsylvania, and her work on the intersection of tweets and TV was a prime example. Social television, according to Hill, is going to be "a $256 billion business by 2017."

She's looking into how Twitter can spur viewer engagement for television shows and advertisers. She's also using datasets from GetGlue and Viggle, apps that let viewers "check into" a television show the same way they would check into a location on Foursquare. Combining this kind of data with tweets might just become the next generation of Nielsen ratings.

"Can we predict customer lifetime value for shows and the network? Can we measure time shifting -- so, for which shows are people checking in when the show is aired for the first time and which shows are people waiting to watch?" she said. And -- so critical to advertisers -- can it be done "at the individual level as opposed to the household level?" Stay tuned.

Say what!?!

"You can use science and technology and statistics to figure out what the answers are, but it is still an art to figure out what the right questions are." -- Ken Rubin, director of analytics, Facebook

"If you have more eyeballs working on data, you're more likely to get better insights and better analysis." -- Michael Chui, researcher, McKinsey Global Institute

"It took Facebook around nine months to achieve the same number of subscribers/users as it took the radio community 40 years to achieve." -- David Parker, vice president of big data technologies, SAP

"How much investment is going into big data?
Venture capital money, last count I saw, is about $2.6 billion. That's the equivalent of a Navy destroyer coming after your wallet." -- John Choi, director of product management, IBM

"Big data doesn't really exist. How do I know? It is a long truth in technology that anything that appears in the press in capital letters and surrounded by quotes isn't real." -- Douglas Merrill, CEO and founder, ZestFinance

"When we're talking about data science -- and big data as well -- one of the fundamental principles we should keep in mind is that data should be thought of as an asset." -- Foster Provost, professor of information systems, New York University's Stern School of Business

"Hadoop is 1 of the top 10 fastest growing technologies overall in terms of job growth." -- Jack Norris, chief marketing officer, MapR Technologies (Of course, he would say that.)

In-memory technology gets the relational treatment
Jack Vaughan, Senior News Writer

As if overnight, in-memory technology has crept out of the rare worlds of high-performance computing and Wall Street trading and entered into the mainstream.

In-memory technology that bypasses disk drives and resides in main semiconductor memory got a big boost in recent years from SAP AG, which loudly trumpeted its HANA in-memory database management system, and its use continues to widen.

The technique is seen in analytics appliances, as well as in Hadoop, NoSQL and NewSQL territories. The activity is hard to overlook.

Incumbent relational database makers have also taken notice -- adding in-memory technology to their leading SQL products to improve performance. IBM, Oracle and Microsoft have added in-memory traits to their flagship offerings, in no small part to keep up with the high velocity of business today.
Speed increases of 10 times or more for transaction processing have been reported, with data warehouse analytics speed boosts going even higher.

High-performance applications are the "sweet spot" for in-memory offerings generally, and for in-memory relational database offerings specifically, according to William McKnight, president of Plano, Texas-based McKnight Consulting Group. The usefulness of faster in-memory performance can come to play in both analytical and operational applications, he said.

High performance gets the nod

Speed-sensitive applications are a good fit for in-memory relational databases, said Andrew Mendelsohn, executive vice president of server technologies at Oracle, especially ones that "require access to large amounts of data in order to answer business-driving questions."

Oracle's in-memory lineage is deep. Since 2006, it has offered the TimesTen in-memory database, which it acquired from HP Labs. Also, beginning in 2007, Oracle fielded the Coherence Java-based in-memory data grid for middleware software object persistence. Last year at Oracle OpenWorld 2013, the company announced the Oracle Database In-Memory option for Oracle Database 12c, which is currently in beta.

Like McKnight, Mendelsohn sees benefits for both operations and analytics. New classes of "hybrid applications" that combine analytics with transactions for real-time commerce can drive immediate returns, he said.

Am I BLU and in-memory too?
IBM's DB2 BLU Acceleration software also got a notable in-memory refresher in 2013. Like Oracle, IBM has offered a variety of in-memory technologies across its middleware and data processing portfolios. Now, in-memory data handling is one of the many enhancements that so-called BLU Acceleration brings to IBM's mainstay relational database.

"Anything that needs online analytical processing or [data] 'cubing' is a beneficiary of in-memory," said Nancy Kopp, director of database software and systems at IBM. "Reporting, data mining and data discovery all benefit."

What some viewers describe as "real-time analytics" has been something of a holy grail, Kopp admitted, and it comes closer with the application of in-memory methods. Often, data applications have been limited by I/O latency, and that in turn may have limited what Kopp calls "the speed of thought" for human analysts. In-memory has special value where "latency is critical and the number of users is really high," she said.

"People want to get answers as fast as they can ask the questions. With in-memory [technology], we can get more toward operational BI [business intelligence]."

Like others, she sees in-memory capabilities bringing a new blend of applications to the relational database. Eventually, there will be less of a line between the transactional world and the analytical world, she said.

Batch is out the window

People used to waiting for overnight batch jobs will quickly become accustomed to real-time execution as in-memory finds greater use in relational databases, according to Tiffany Wissner, director of product marketing for SQL Server at Microsoft.
Moreover, she said, such capabilities prepare customers for a move to larger-scale, cloud-style processing.

She said Microsoft has included in-memory of sorts as part of the basic SQL Server database offering since 2008, when PowerPivot support allowed people to analyze billions of rows of Excel in memory. "With SQL Server 2012, we expanded the footprint with an in-memory columnar store," she said.

This week, SQL Server 2014 became generally available, which has new in-memory transaction-processing support. Wissner emphasized that, as part of the core offering, SQL Server 2014 jobs can be optimized for online transactional processing (OLTP) with high numbers of read/write operations, or can be optimized to run in a data warehouse-style column store that is fine-tuned for high search query speed.

Placing a bet on in-memory technology

In-memory adaptations to relational database performance can reduce stress on large-scale transactional systems, according to Wolfgang "Rick" Kutschera, who is manager for database engineering at bwin.party in Vienna, Austria, and whose team has gone into full production with Microsoft's latest SQL Server incarnation.

Kutschera's data group was a beta user of "Hekaton," which was the pre-release codename for the new version of SQL Server 2014 with in-memory OLTP. For bwin.party -- which offers online gaming for soccer and tennis, as well as poker and other casino games -- Microsoft's latest SQL Server version helped meet the need for transaction scalability and data consistency. The transition was fairly straightforward, he said.

"We started on an application that had hit an actual performance limit -- it could not scale up or out in an easy way.
With Hekaton, it took us a day or two to convert to the in-memory technology, and once we did, we could scale to a factor 20 times [faster] than what we had before," he said. "A lot of performance-critical [application] parts were converted." Now that it is established, people are finding more things to do with it.

Like other high-transaction websites, bwin.party has looked at in-memory NoSQL alternatives to established relational systems, Kutschera said. But there is a difference between a tweet and a bet.

"The main problem is the websites that use NoSQL in most cases have no problem if they lose one record. If you lose, for example, one Twitter message, nobody cares, but," he continued, "if you lose a bet that might be a $20,000 or $30,000 return, it is a big deal."

The in-memory technology trend is like a catchy song heard everywhere of late. In-memory approaches are appearing in analytics engines of all kinds. Their appearance in relational databases may soon turn out to be one of the most influential of these uses.

In relational database design, don't shortchange requirements stage
Jack Vaughan, Senior News Writer

In many organizations, relational database design is an afterthought or a lost art. But Michael J. Hernandez considers it an important undertaking -- one in which core principles still bear deep consideration. Hernandez is the author of Database Design for Mere Mortals, which was published in its third edition in February 201

A long-time database developer, Hernandez has worked as a program manager at Microsoft and an instructor for companies such as AppDev Training Co.
and Deep Training. Originally published in 1996, his book focuses on database design and configuration practicalities -- from requirements-gathering interviews on.

Hernandez champions the cause of flexible but well-structured relational databases that can underlie quickly launched Web applications but that support data growth and business changes. In a world that often asks if data modeling and full-fledged database planning and design are really necessary -- can't we just start coding? -- his message has always been: Don't shortchange the design process.

SearchDataManagement spoke recently to Hernandez about database design best practices. Excerpts from the interview follow.

In your book, you suggest that data professionals are often in too much of a hurry to start coding, without doing the requirements gathering that is part of good relational database design. Why is the requirements gathering stage so important? And how should it be approached?

Michael Hernandez: Many times, people make a lot of assumptions and then create the database and roll it out. Later, there is pushback from the users. The fact is it's important to have conversations with the business users ahead of time and get a sense of what is going on and what they need. That informs a lot of what is going to be built.

Basically, you want to be talking to different individuals at different stages [of a project] so that you're sure you're capturing the proper concepts [and] that you understand the ideas they have about their business. So, I emphasize interviews.

To do this right, you need to understand the different relationships of aspects of the organization and its processes. As you work on the relationships, you find the details and concepts that have to be represented in the database.
You have a conversation with the users and understand what they need. Then that informs what is going to be built.

What are the ways toward effective interviewing?

Hernandez: Interviewing for database design isn't an exact science. But it is a skill that can be learned. The people who do it have to have very good analytical skills and good people skills. You ask people how they define their daily work. You ask them what is the first task that they do in the day and [about] what they are dealing with conceptually. Usually they're dealing with customers. So you ask, "What is a customer to you?" The answer is different for different companies and different departments within companies.

You have to learn their perspective, and their semantics. High-level concepts like customers, visits, schedules, orders or tasks -- those are main concepts that have details around them that you have to record.

And what are some things that stand in the way of efforts to do the requirements gathering that forms the basis for good database design?

Hernandez: It's all about time. Today, people want to do it quickly. They want to get it out there, and then they'll see what happens. They say, if we get pushback, we'll fix it as we go. To me, that is such a bad way to do it. You can avoid a lot of headaches and problems ahead of time if you just invest the time to plan.

That's one of the things I tell people: This is not a waste of time. You are investing the time to go through this process in a considered manner and to create a quality data product that probably has a higher success rate than if you just rushed right through it.

So, personally, I am not a big fan of Agile design, Agile computing and the whole Agile concept. I think that is the opposite way than the one we should be going.
A lot of people try to shortchange or avoid interviewing. But it drives what you design. It's what makes it successful, what makes it usable by the people that are going to work with [the data].

To make sure you establish the proper specification, you need to revisit the design with the users and [business] managers. Users and managers are going to have different perspectives on how the data is used. That's why discussing the evolution of the relational data structures with them is useful. People that don't do that often expect to fix things later with coding. What they end up with is just a mess -- a railroad wreck.

SQL Server 2014 adds In-Memory OLTP power boost, hybrid cloud support
Jessica Sirkin and Mark Fontecchio

Product of the Month: SQL Server 2014, from Microsoft
Release date: April 2014

What it does

SQL Server 2014 is the latest version of Microsoft's relational database management system, released to general availability at the start of this month. Among other enhancements, it offers increased processing speed, greater cloud connectivity and higher memory limits -- all part of Microsoft's ongoing effort to improve SQL Server's ability to handle enterprise-class transaction processing and analytics applications.

What sets it apart

SQL Server 2014 can be run on-premises, entirely in the cloud or in hybrid cloud environments that include on-premises data. Its most-anticipated new feature is In-Memory OLTP, a memory-optimized online transaction processing engine integrated into the database that lets tables stored in memory be processed alongside disk-based tables. Microsoft boasts that using In-Memory OLTP can boost transaction processing performance by as much as 30 times compared to conventional approaches with data stored on disks. SQL Server 2014 also accelerates the In-Memory ColumnStore data warehousing technology introduced in the 2012 version, giving the new database a powerful one-two punch on in-memory processing.

What users say

Wolfgang "Rick" Kutschera is team leader of database engineering at Bwin.Party Digital Entertainment, a SQL Server 2014 beta tester. The Gibraltar-based company, which specializes in online betting, has 180 servers with a total of 4,000 SQL Server instances. Kutschera said that using In-Memory OLTP enabled Bwin.Party to scale up its processing capacity to support business growth without spending money on more hardware. He described the in-memory feature as "one of the most amazing things Microsoft has done in a while."

Organizations should go into SQL Server 2014 implementations "with open eyes," Kutschera said. But, he added, the database "is so flexible and so stable that we're comfortable using the beta in production."

Edgenet Inc., an Atlanta-based software and services provider for the retail industry, has had a similar experience with SQL Server 2014.
Vice President of IT Michael Steineke said the in-memory computing capabilities allow Edgenet to process product pricing and availability data from clients' stores in near real time.

"We needed to leverage the in-memory functionality to do continuous updates to live systems without having a lot of latch contention or lock contention," Steineke said. "That way, we could update product availability information from various retailers as quickly as they can provide it to us."

Drilldown

• Can be deployed on-premises, in the cloud or in hybrid environments.
• Adds a new in-memory transaction processing engine to boost OLTP performance.
• Improves on SQL Server's AlwaysOn Availability Groups high-availability technology.

Price

SQL Server 2014 has three main editions: Standard, Business Intelligence and Enterprise. Each edition has a set list of features -- for example, In-Memory OLTP is available only in the Enterprise Edition. The editions are priced either per CPU core or by server and client access licenses. Microsoft wouldn't disclose specific pricing, but a representative said there are "no pricing changes to SQL Server 2014" from SQL Server 2012's licensing costs.

Oracle Database In-Memory option something to remember
Jessica Sirkin, Associate Site Editor

The Oracle Database In-Memory option, released today, promises a 100x speed increase for analytics, an improvement that could help customers provide near-real-time information to their business users. That is a pretty tall order, and Oracle customers will get to see if the add-on can live up to the hype.

While conceptually similar to SAP HANA, the Oracle Database In-Memory option is an add-on to the Oracle Database, and does not require alterations to the database infrastructure or data migration for implementation. Because of that, it is not confined to a specific platform, and can run on more Oracle systems.

Users of the in-memory add-on don't have to place the entire database in memory, but can spread the database across clusters and select only specific clusters for in-memory, according to Tim Shetler, vice president of product management at Oracle.

The Oracle Database In-Memory option has a new fault tolerance system, also designed around Real Application Clusters, with in-memory data distributed over multiple clusters. This way, when one cluster goes down, there is an immediate transparent switch to another cluster. According to Shetler, this will keep faults from interfering with database performance.

Christo Kutrovsky, senior consultant with the Pythian Group, emphasized the importance of the Oracle Database In-Memory option's compression capabilities. He said it allows the Oracle Database to keep very large tables in compressed memory. According to Kutrovsky, the add-on can compress a 100 GB table by 30x. That means a tremendous amount of compression over the whole database, which means more data can be loaded into the database with in-memory.

"The compression is what makes it really worth it," he said. "This feature applies to more use cases than anything Oracle's released in a couple years."

Real time is one of the terms Oracle has brought up again and again when discussing the in-memory add-on. But, when Oracle says real-time, it doesn't mean instantaneous. "It really means 'don't wait,'" said Shetler, "just do things immediately.
Maybe it'll take a few seconds." He explained that processes that used to take half the day were reduced to 10 minutes, and analytics give responses in less than a second.

Holger Mueller, vice president and principal analyst at Constellation Research, defined real time as having no batch process, no storing of aggregates and no intermediate steps. "You don't use other time delay constructs," he said. "You can go back to the data." Mueller described the difference between previous processing speeds and real time as the difference between the telegram and the email.

"If it used to take hours and now it takes minutes, then that is real time to the people who used to wait hours," Oracle's Shetler said. He added that with the Oracle Database In-Memory option, analytics and transactions can be run at the same time. In-memory also opens the possibility of running high-performance transactions against production.

The Oracle Database In-Memory option has a dual-format architecture, which means it has both a memory-optimized columnstore and a row store. The optimizer sorts incoming work between the in-memory columnstore and the row store: updates go to the row format, while analytics go to the columnstore. "The whole dual-format architecture is what's really unique," Mueller said.

In the Oracle Database In-Memory option, the columnstore and row store are synchronized, and changes to the row store are reflected in the columnstore. The changes are made in the background asynchronously. However, updates are made immediately for needed data and queries. "You're never going to see old data," Shetler said.
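The dual-format idea described above can be illustrated with a toy table that keeps the same data in a row layout and a column layout. This is only a sketch, not Oracle's implementation: the class name `DualFormatTable`, its methods and the sample data are hypothetical, and the column copy is updated synchronously here for simplicity, whereas the article says Oracle propagates changes in the background.

```python
class DualFormatTable:
    """Toy table holding the same data in a row store and a column store.

    Hypothetical illustration only; it does not model Oracle's internals.
    """

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = {}  # row store: key -> {column: value}
        self.column_store = {name: {} for name in self.columns}  # column -> {key: value}

    def upsert(self, key, **values):
        """Transactional path: write the row, then mirror it into the columns."""
        unknown = set(values) - set(self.columns)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        row = self.rows.setdefault(key, dict.fromkeys(self.columns))
        row.update(values)
        # Simplification: synchronous propagation instead of a background task.
        for name, value in values.items():
            self.column_store[name][key] = value

    def get_row(self, key):
        """Row-format read, the shape a single-record lookup wants."""
        return dict(self.rows[key])

    def column_sum(self, name):
        """Column-format read, the shape an analytic aggregate wants."""
        return sum(v for v in self.column_store[name].values() if v is not None)


table = DualFormatTable(["sku", "price", "qty"])
table.upsert(1, sku="A", price=10.0, qty=2)
table.upsert(2, sku="B", price=5.0, qty=4)
table.upsert(1, qty=3)  # an update goes to the row store and is mirrored

print(table.get_row(1))         # {'sku': 'A', 'price': 10.0, 'qty': 3}
print(table.column_sum("qty"))  # 7
```

The aggregate touches only the `qty` column, which is the reason analytic queries favor a columnar layout, while the update touches one row.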
"The issues you generally have with in-memory, columnstore, compression: these features let you work around these things," Kutrovsky said.

DB2 BLU Acceleration boosts IBM's flagship RDBMS

Jack Vaughan, Senior News Writer

Data managers have lately directed a lot of attention at advances in specialized data warehouse engines and NoSQL databases, but flagship relational databases are not standing still, as IBM's DB2 BLU Acceleration software shows.

IBM's stalwart DB2 relational database management system (RDBMS), for example, has added numerous capabilities, including enhanced in-memory data handling, data skipping, improved compression, support for columnar analytical processing and more. Some of these traits are just the kind of thing that has given new-generation relational analytic engines and NoSQL upstarts their allure.

Columnar processing, often coupled with compression, has become associated with the new breed of analytical engines that arose from the likes of Aster Data (now part of Teradata), Vertica (now part of HP), ParAccel (now part of Actian) and others. But several mainstay relational databases have come out with columnar enhancements. Columnar processing focuses processing efforts more narrowly on data sets specifically needed for common queries. It has multiple advantages, including reduced I/O and improved use of cache.

Recent updates that arrived in DB2 10.5, known collectively as "BLU Acceleration," can support sped-up "I/O bound"
operations while still capitalizing on available in-house RDBMS skills, according to Kent Collins, who is database engineer and architect with Burlington Northern Santa Fe Railway (BNSF) Corp., based in Fort Worth, Texas. Improved data compression has had an immediate helpful effect in cutting memory requirements, he said.

"It's been very positive for us. We just moved a 400 GB database, and when we finished it was 80 GB," he said. BNSF has also seen speed increases of as much as a hundredfold for some queries with BLU.

Stepping down big data and turbocharging queries is important to BNSF, a railroad that is collecting more and more types of data on far-flung operations that saw it in 2012 haul more than 1 million carloads of agricultural commodities, 2.2 million coal shipments, 47 million trailer or container shipments, and 1.7 million carloads of industrial products.

Said Collins, whose data feeds include text messages, radio messages and video, "I am up to my elbows in unstructured data." He then quickly recalibrated the estimate. "I am up to my eyeballs." He said column-level data processing that can be programmed using established SQL methods has been a big step toward taming the unstructured data deluge.

Take me out to the new RDBMS game

In a way, additions to relational databases are mirroring larger changes in data architecture, said Bernie Spang, director for strategy and marketing for IBM Database Software and Systems.

"We've moved from the world where you defined your data problem and then decided which relational database to use. Now the question is, 'What data technology should I use?' And even in the RDBMSs, there is a difference between the old generation and the new generation.
It's a new ball game."

IBM has applied some state-of-the-art data technology with DB2 BLU, said IBM Distinguished Engineer Sam Lightstone. The compression is "actionable," he said, meaning that the mode of compression adapts to the kind of data being processed. It allows analytics to run on compressed data directly without decompression steps that add processing overhead, according to Lightstone.

"BLU is compression-optimized, in-memory-optimized and it's columnar," he said. It supports data skipping (in which irrelevant data is ignored), parallelism and vector-processing scans too. "It is the combination of these things that gives DB2 huge speedups," Lightstone said.

Narrowing the analytics gap

Many advances in data technology in recent years have been in the realm of specialized analytical relational database management systems, according to industry observer Curt Monash, president of Monash Research and editor and publisher of DBMS2 and other blogs. But in general, flagship relational databases are "narrowing the gap," he said.

Monash said that DB2 BLU could be seen as a first step. "In its first iteration, it is a single-server product, and 'in-memory single server' is definitely a limitation." As well, he points out that the first version of BLU is optimized for 10 TB databases, although it is capable of ramping up to 20 TB. Monash noted that IBM has other specialized analytical RDBMS approaches beyond DB2, one of which is its Netezza data warehouse appliance.

IBM is far from alone in the race to enhance the major RDBMSs. As data-related challenges grow, resurgent RDBMS technology could well be welcomed by many.
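The data skipping that Lightstone mentions (ignoring data that cannot be relevant to a query) is often explained with per-block minimum and maximum values. The sketch below shows that general idea only; it is an assumption-laden illustration, not a description of how DB2 BLU is built, and the function names and sample data are hypothetical.

```python
def build_blocks(values, block_size):
    """Split a column into blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append({"min": min(chunk), "max": max(chunk), "values": chunk})
    return blocks


def count_in_range(blocks, low, high):
    """Count values in [low, high], skipping blocks that cannot match.

    Returns (matching_values, blocks_actually_scanned).
    """
    matched = 0
    scanned = 0
    for block in blocks:
        # The block summary proves no value can qualify, so skip the block.
        if block["max"] < low or block["min"] > high:
            continue
        scanned += 1
        matched += sum(1 for v in block["values"] if low <= v <= high)
    return matched, scanned


# Hypothetical column of 100 ordered values stored in 10 blocks of 10.
blocks = build_blocks(list(range(1, 101)), block_size=10)
print(count_in_range(blocks, 35, 42))  # (8, 2): only 2 of 10 blocks are read
```

Skipping pays off most when the data is at least loosely ordered, as it is here; on randomly ordered data most block ranges overlap the query range and little is skipped.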
Experts debate big data vs. SQL development

Mark Brunelli, Former News Director

With the rise of big data technologies like Apache Hadoop, MapReduce and the bevy of open source products growing up around them, the good old traditional SQL database has been losing mindshare with some fairly influential application developers. And it's not difficult to understand why.

Big data is hot, and there have been plenty of headlines over the last few years questioning the long-term viability of SQL in the era of unstructured data. It's no surprise that many developers want to follow suit with the big data pioneers at Google and Facebook, but the desire to go big isn't always a practical one.

Just ask Tim O'Brien, an author and independent consultant who specializes in helping companies work more effectively with developers. O'Brien, who spoke about the future of relational databases at the recent O'Reilly Strata Conference in Santa Clara, believes that when one looks at the history of IT over the last several years, it's easy to understand why attitudes have changed.

"There is a certain kind of developer that is really focused on the trends that are being set by that group of 50 people that do big architecture at a place like Facebook or Google," O'Brien said during a phone call after the conference. "The conclusion that they came to in the last couple of years is 'We would never use a relational database.
Relational databases don't scale.'"

While well-funded startups and big data crunching organizations like the Chicago Mercantile Exchange, NASDAQ, the Internal Revenue Service and others will follow suit with the likes of Google, the average company will continue to find that SQL is the right tool for most development projects for the foreseeable future, according to O'Brien.

O'Brien offers three main reasons why organizations in general can't escape SQL development. For starters, it's a language that has a great deal of inertia, he said. The majority of development tools and platforms, such as Ruby on Rails, are using SQL. Secondly, it's the best query language available. Lastly, SQL was originally created as a way to help organizations work more easily with multiple vendors' databases, and O'Brien predicts that SQL's ability to unify will continue to be important for years to come.

"I think the big data community is focused on creating this perception that the world is changing right now, and if you continue to use that old relational database technology, you're just going to be an old useless man working on old useless systems," O'Brien said. "And I think that's false."

O'Brien went on to suggest that in the next few years the "traditional" SQL database may evolve into something better and far more scalable, something that blurs the lines between big data technology and more familiar database management systems. He pointed to Google's Spanner database as one possible example of things to come.

"I think Spanner points the way toward the future of big data for most companies," he said. "The important thing about Spanner is that it's SQL-based, it provides transactions, it is horizontally scalable, and that's the big difference."
Another company that offers a possible glimpse of how SQL fits into the future is Drawn to Scale, which bills its Spire product as "the first database for large, user-facing applications built on Hadoop." Spire supports SQL and MongoDB queries in addition to MapReduce, and is built to power large-scale websites, mobile deployments and other applications.

"There is no reason why you can't use SQL to query everything, right? That is already happening. People are using SQL to query Hadoop," O'Brien said. "Fast forward 20 years and I don't care how the database is deployed to me as a developer. I'm just executing a SQL query and getting a result back. It's like the difference between a cloud-based Linux machine and a real Linux machine. It's the interface that defines the experience."

When it comes time to develop a big application or website, it's important to avoid the hype and simply pick the right tool for the job. While there may be temptation to discount relational altogether and go straight to big data technologies, it's important to weigh both approaches against the needs of the job at hand.

Conference attendee Felix Giguere Villegas, a distributed systems specialist who runs the Big Data Montreal user group, said he agrees with that point.

"For analyzing logs, you're probably better off with a tool like Hadoop," he said. "But for a lot of use cases, SQL does the trick quite well, especially at the scale most of us run at and especially considering the skills that are available in the marketplace at the moment."

Giguere Villegas went on to say that he would welcome any big data technologies that incorporate SQL.
He said some of the SQL engines that run on top of Hadoop, such as Cloudera Impala, are proving that horizontal scalability for SQL is possible. The only problem is that these offerings do not boast the same level of maturity as the popular relational databases of today.

"SQL is a very useful abstraction and, of course, there is momentum behind the fact that a bunch of people know it," Giguere Villegas said. "But it's not just momentum that is going to keep it there. It is genuinely useful to have SQL, and if we can have a mature, working, interactive, scalable SQL solution on top of a big data platform, that would be a big boon for everyone."

Section 2: NoSQL databases

While myriad NoSQL database options have emerged to help businesses address big data requirements and scalability concerns, they aren't full replacements for traditional databases. Some companies are choosing NoSQL systems to support big data applications in completely non-relational environments, but others are combining them with a relational database management system or data warehouse, an approach that illustrates the frequent use of NoSQL to mean "not only SQL." The articles in this section examine the varied roles of NoSQL technologies and how they relate to mainstream relational databases.

NoSQL databases dent relational software's data processing dominance

Jack Vaughan, Senior News Writer

In 1922, automaker Henry Ford famously wrote that his customers could have a car painted any color they wanted, as long as it was black.
Until recently, IT managers, application developers and business executives faced similarly limited choices in selecting database technologies. Relational databases built on top of the SQL programming language were the dominant engines powering corporate IT and business systems, with no real challengers in sight.

But things have changed. Starting in the mid-2000s, SQL's absolute supremacy was undone by the likes of Yahoo, Google, Facebook, Amazon.com and eBay. At those Internet giants and other companies, the need to run colossally scalable Web applications with varied and fast-changing data requirements prompted efforts to find alternatives to mainstream relational databases. That ushered in first a stream, and over the past few years a torrent, of new technologies that eschewed rigid SQL development principles in favor of more flexible and scalable data designs.

Those databases are spread across several distinct product categories based on different data models. But they share a pithy umbrella term with a stake-in-the-ground sound: NoSQL.

[Sidebar: All in the NoSQL Family]

The truth is, though, that the NoSQL movement isn't really an up-against-the-wall revolution seeking to eradicate relational databases. Yes, some NoSQL vendors do talk like that's their ultimate goal. But the term NoSQL has been softened to also mean "not only SQL," in recognition of the fact that many of the databases do incorporate some elements of SQL.
More substantively, NoSQL technologies aren't positioned as wholesale replacements for relational software; they tend to be built for specific uses, usually involving large data sets that need to be accessed and updated frequently. And that's how things are playing out on the ground thus far: NoSQL databases have become must-have items for companies with fast-growing vaults of Web, social media, demographic and machine data, but often they're sharing data processing and analysis workloads with SQL-based software.

For example, Crittercism Inc. is a startup that helps organizations monitor the performance of their mobile applications, based on real-time data collected from more than 800 million mobile devices. In application performance management parlance, a user interaction with an app is called a request; Crittercism pulls in information about more than 30,000 requests per second, a rate that adds up to nearly 3 billion a day. That has created a pool of more than 20 terabytes of data, and the total only keeps growing, said Lars Kamp, vice president of business development at the San Francisco company.

Included in the mix is data on application errors, crash diagnostics and what Crittercism calls "network breadcrumbs" documenting the trail of network calls and other processing events leading up to app problems.
That data "is very unstructured and non-uniform, and varies widely from customer to customer and application to application," said Mike Chesnut, the company's director of operations engineering.

Meeting the old way halfway

The sheer amount of information involved, and its variable nature, mandated a fresh approach to formatting the data. Using relational software would have required substantial processing overhead to maintain a database schema that could accommodate all of the information, plus frequent downtime for making changes to the schema, Chesnut said; he added that the company had to be able to modify how it collects and stores data "on the fly, often several times a day." Kamp was even blunter: "Crittercism as a company would not have been possible 10 years ago," when SQL was the only choice, he said.

Enter MongoDB, a NoSQL database running on the Amazon Web Services cloud. Like other NoSQL technologies, it offered schema design flexibility. That made it possible for Crittercism to store the error and crash data in a single "collection" (the MongoDB equivalent of a relational table) without imposing a strict schema on the information. In turn, the lack of a fixed data structure with uniform fields has enabled the company's performance management service to "evolve organically" to meet the needs of different customers, Chesnut said.

Crittercism also uses Amazon.com's DynamoDB NoSQL database to store data on a specific request path that requires particularly fast performance, according to Chesnut. But there's SQL in the company's database architecture, too. A PostgreSQL open source database holds highly relational operations data, and all of the information is summarized in a SQL-based Amazon Redshift data warehouse for analysis and reporting.
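To make the idea of a collection without a strict schema concrete, here is a minimal sketch using plain Python dictionaries. It deliberately does not use MongoDB or its API; the `insert` and `find` helpers, the field names and the sample records are all hypothetical, chosen only to show documents of different shapes living side by side.

```python
collection = []  # stands in for a single schema-less collection


def insert(document):
    """Store a document as-is; no fixed set of fields is enforced."""
    collection.append(dict(document))


def find(**criteria):
    """Return documents that contain every given field with the given value."""
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


# Three hypothetical records with different fields in the same collection.
insert({"app": "shop", "type": "crash", "signal": "SIGSEGV"})
insert({"app": "shop", "type": "error", "message": "timeout", "retries": 3})
insert({"app": "chat", "type": "crash", "os_version": "7.1"})

print(len(find(type="crash")))                    # 2
print(sorted({key for doc in collection for key in doc}))
# ['app', 'message', 'os_version', 'retries', 'signal', 'signal' is listed once, 'type']
```

Adding a new field to tomorrow's records needs no schema change, which is the property Chesnut describes; the cost is that every reader must cope with fields that may be absent.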
Chesnut and his colleagues aren't NoSQL purists: "We're very engaged with exploring any and all technology offerings that can help us solve our problems and better serve our customers," he said.

Recent surveys show that NoSQL databases are making inroads with big data users, but overall, adoption is still relatively low. For example, TechTarget's 2013 Analytics & Data Warehousing Reader Survey found that 21% of 222 respondents with active or in-the-works big data programs were using or planning to deploy NoSQL systems as part of the efforts. Another survey conducted last year by Enterprise Management Associates Inc. and 9sight Consulting produced an almost identical result: In that case, 22% of the 259 respondents said they had NoSQL platforms in place.

In a third survey, done by The Data Warehousing Institute, 32% of 189 respondents said their organizations were using NoSQL software. Even there, though, NoSQL technology was last on the adoption list, trailing behind relational databases, data appliances, columnar software and big-data fellow traveler Hadoop (see figure).

[Figure: What's in Your Big Data Environment?]
Greater penetration of data centers is expected going forward: Analyst group Wikibon forecast last year that worldwide revenue for NoSQL software and services would grow from $286 million in 2012 to $1.826 billion in 2017. And venture capitalists are betting big on that kind of growth. MongoDB Inc., which leads the development of its namesake database, raised $150 million in new funding last fall. That came shortly after $45 million and $25 million funding rounds by DataStax Inc. and Couchbase Inc., two other NoSQL vendors.

Relational players hit from both sides

Even the big relational database vendors have gotten into the NoSQL game. Oracle introduced a NoSQL database in late 2011 and was one of the lead
"Most of the NoSQL databases are new. They stil need to be battle tested," Olofson said. "If you're constantly changing data definitions and you can't ‘change your relational database fast enough, you might look at NoSQL. But there is risk” For one thing, NoSQL technologies typically don't provide full ACID capabilities — atomicity, consistency, isolation and durability — for ‘guaranteeng transaction integrity, as relational databases do. In adltion, they often lack enterprise-class services in areas such as disaster recovery, security and data quality, according to Olofson, Like other analysts, he also ‘expects a whittling of the welkpopulated ranks of NoSQL vendors as the market matures. oO aa Inthise-guide INSection : Relational databases p2 ‘Section 2: NoSL databases p36 [AGetting more PRO+ essential content 60 Page 44ot60 MMM E-guide Content MMM *NoSQL databases are really good for handling XML and JSON data, which includes a lot of things Java developers are working on these days,” said Wayne Eckerson, a TechTarget industry analyst and president of consultancy Eckerson Group Inc. In particular, they're well suited to high- performance Web applications "with a high volume of reads and writes,” Eckerson said. But, he added, they aren't such a good fit for “long-running ‘queries® and other complex analytics jobs. NoSQL software provides speed boost ‘That maps to the database architecture at Exelate, a marketing data ‘services and technology provider that uses a diverse range of tools to ‘supply information on household demographics and purchases to online advertisers and publishers. "Data is what we do," sald Elad Efraim, co- founder and chief technology officer at the New York company. That makes performance paramount, he added. 
And while Exelate didn't start out with NoSQL technology when it was founded seven years ago, the need for speed eventually led Efraim and his team to deploy Aerospike, an in-memory NoSQL database that has helped scale the company's infrastructure to rapidly handle as many as one trillion real-time data transactions a month.

Aerospike provides a high-performance repository for data on the user session activity of website visitors that is constantly being updated, Efraim said. "We're talking about a large-scale system with a very high capacity of reads and writes that have to complete in some milliseconds. It's very important for us to make sure we can access the data in a way so that it can be made available [to our customers] for decision making."

The database runs on servers at four fully replicated data centers worldwide, indexing everything to memory and holding it in the server cluster for further processing. From there, the data can be mined and correlated to other information in analytics and back-office systems. To make that happen, though, Exelate's applications don't solely use NoSQL software. One layer above the Aerospike repository is a "pretty standard" MySQL relational database that lets customers aggregate data, Efraim said. The company also uses an IBM Netezza appliance and relational database as a data warehouse for analytics uses.

To put things in Henry Ford's terms, users like Exelate and Crittercism no longer have to limit themselves to basic-black relational databases, and they're taking advantage of NoSQL's new color choices to drive applications that mainstream relational software isn't suited for. But SQL black isn't going completely out of style with IT shoppers. For now, the two technologies are likely to share space in database garages.
Slew of disparate NoSQL databases vie to displace RDBMSs, fit by fit

Jack Vaughan, Senior News Writer

Cassandra, MongoDB, HBase: they're just a few of the many NoSQL databases now proliferating. These databases look to solve one problem or another encountered by the steadfast relational database systems (RDBMSs) that have long ruled in the enterprise. But the very variety that makes the NoSQL sector so vibrant can make comparing different products a challenging, and often fruitless, proposition for would-be users.

Before looking more at that issue, it's reasonable to ask why any of these NoSQL things matter at all. The short answer is that large-scale distributed processing is taking hold in more applications, thus exposing some of the creaky flooring on which the RDBMS sits. In Web applications and enterprise apps alike, a common theme has been emerging: The relational database may not always be the best fit.

Examples of RDBMS misfits are common. The relational database can be too expensive to grow out in a widely distributed version. It doesn't easily adapt to new styles of data: for example, the unstructured information that's common in big data applications. It struggles with the massive data volumes coming from in-the-field sensors or Web server activity logs.

As people have found more and more reasons to move work off of incumbent relational databases, what has emerged is a "fit for purpose" mentality of the kind that was a bit more prevalent in the days before the RDBMS became the all-purpose flour in the database server pantry.
And the number of NoSQL database options developed to fit various purposes has grown greatly.

Searching for Cassandra

Apache Cassandra is a good example. Like some other NoSQL technologies, the Cassandra database came about because of a big Web 2.0 fish, in this case Facebook. The purpose for which Facebook created Cassandra was to enable users of the social network to search their inboxes. When the database was launched in 2008, it supported replication across geographically distributed data centers to quickly service the searches of as many as 100 million users.

Inside, Cassandra is a distributed key-value database that uses a row store scheme and a peer-to-peer (or shared nothing) architecture. Its design incorporates some of the characteristics of Google BigTable and Amazon Dynamo, two early and influential NoSQL databases. Along the way, Cassandra has added support for MapReduce, gained a query language and triggers, and refined its support for lightweight transactions and database compaction.

Facebook eventually replaced the Cassandra-based search system with a Hadoop and HBase implementation, but the company ceded the software to open source; a community arose to carry it forward, and Cassandra became a top-level Apache Software Foundation project in 2010.

Mapping to the problems

Cassandra represented a good fit for the needs of Internet Identity, according to Jason Atlas, vice president of technology and engineering at the Tacoma, Wash.-based security services company. Known as IID, the company had a rapidly growing database of IP addresses running on a MySQL RDBMS cluster. But for cost and other reasons, the MySQL path didn't seem tenable going forward.

IID was harvesting and collecting 600,000 unique IPv4 addresses and host names per week. Related metadata collections were also growing.
"We started to see that we couldn't store more than 30 days of information at ‘one time,” Atlas said. "The problems largely revolved around scale.” He added that the IPv4 data “lent itself to a key-value approach,” which Ultimately led IID to the DataStax Enterprise version of Cassandra. Cassandra is built to run on commodity clusters, as might be expected given its Google-Amazon-Facebook lineage. Its focus on scalability bears frut, in ‘Atlas's estimation: He said its "coming as close to linear scaling" as anything he has previously seen He also gives points to DataStax for a Cassandra-MapReduce integration that he expects to use going forward But he cautioned those who are looking to embrace Cassandra or other NoSQL databases, offering a reminder that itis unwise to forceit Pr eer Inthise-guide MMM E-guide Content MMM technologies onto problems. “It's always best to map the problem onto the solution, Atlas said, INSection : Relational databases p2 ‘Section 2: NoSL databases p36 [AGetting more PRO+ essential content age490t60 p60 How do | NoSQL? Let me count the ways Sorting through the variety in the NoSQL. spaceis nothing short of daunting ‘Some NoSQL vendors are becoming household names in database circles for example, DataStax and a quartet of other NoSQL database makers (Basho Technologies Inc, Couchbase Inc, MarkLogie Corp. and MongoDB Inc.) were listed among the top vendors of operational database management systems in a recent Gartner Inc. Magic Quadrant report. But there are dozens of NeSQL offerings in several distinct product categories - ~ and different databases in the same category were bul to support different uses. Is all abit of a maze to navigate. | caught up with Gartner analyst Merv Adrian on this issue in the ‘Twittersphere. In a tweet, he had pointed to a Linux Journal reader poll ‘comparing NoSQL databases. Adrian deadpanned: “in related news --do you prefer apples, cocktails or broccoli?" 
While rolling on the floor laughing, I tweeted him that I thought I understood his point. He tweeted back: "It's useless -- and meaningless -- to compare 'NoSQL' products that are so wildly different in structure and intent."

Atlas made a similar point. "Mongo and Cassandra have nothing to do with one another, but are still both called 'NoSQL.' Their use cases are very different," he said.

Ultimately, we should expect some thinning of the NoSQL ranks. Cassandra is showing signs that it could be one of the survivors. But despite being fit for some specific purposes, it and others under the NoSQL umbrella may need to find more general uses to truly thrive.

When does a NoSQL DB trump a traditional database?

Mark Whitehorn, Emeritus Professor of Analytics

We're looking at the issue of NoSQL vs. SQL databases. When does it make sense to consider using a NoSQL DB rather than a relational database?

Put simply, NoSQL databases are a better choice when you have data that doesn't fit well into tables. We have worked on SQL-based relational databases for about 40 years now; the result of all that work is that they are very good at handling transactions involving tabular data stored in rows and columns. We can also analyze such data very effectively in dimensional databases.

The kind of data that fits well into relational tables is known as atomic data, which simply means that we split the data up into the smallest components we want to manipulate. For example, we don't usually store the complete name of a customer in one field.
If we're adding data about a customer named "Mr. James Mason" to a relational database, but we want to be able to find all of the people with the title "Mr." in the database and sort customers by both first and last name, we would store the name data in three distinct columns in a table.

However, a great deal of the data we are now collecting doesn't tabularize well. We're talking about images, sensor data, Word documents, Twitter feeds and so on, or what is often called big data. Even when we can put such data in tables, it may not be efficient to do so. For example, you could store every pixel of an image as a row in a relational table. But then you have to ask yourself, "What SQL code could I write to determine if the image includes a person?" I can't even begin to imagine what that would look like.

The good news is we have specific database engines that are built to hold and manage big data: NoSQL databases. Relational databases require us to impose what is called a schema on the data. Think of the schema as a way of organizing the data: In relational databases, we have to split up data into atomic units and then organize it as columns and rows in tables.

NoSQL database engines come in a variety of different types, so too much generalization can be misleading. But in general they require only a very simple schema and sometimes no schema at all. For example, in some NoSQL database systems, we could put image files straight into the database without altering their structure. We could also put audio files into the same database.
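
To make the contrast concrete, here is a minimal Python sketch that is not part of the original article. The first half uses the standard-library sqlite3 module to store the "Mr. James Mason" name as three atomic columns, so rows can be filtered by title and sorted by last and first name. The second half uses a plain dictionary as a stand-in for a schema-less key-value store that accepts image and audio bytes unaltered. The table name, column names, extra sample customers, keys and payload bytes are all assumptions chosen for illustration, and the dictionary is not the API of any real NoSQL product.

```python
import sqlite3


def build_customer_table():
    """Create an in-memory table that stores a name as three atomic columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (title TEXT, first_name TEXT, last_name TEXT)"
    )
    # Sample rows; only "Mr. James Mason" comes from the article.
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [
            ("Mr.", "James", "Mason"),
            ("Ms.", "Ada", "Lovelace"),
            ("Mr.", "Alan", "Turing"),
        ],
    )
    conn.commit()
    return conn


def customers_with_title(conn, title):
    """Find everyone with a given title, sorted by last name, then first name."""
    cursor = conn.execute(
        "SELECT first_name, last_name FROM customer "
        "WHERE title = ? ORDER BY last_name, first_name",
        (title,),
    )
    return cursor.fetchall()


# Hypothetical stand-in for a schema-less store: any bytes go in under a key,
# with no columns or types declared in advance.
blob_store = {}


def put_blob(key, payload):
    """Store the payload unaltered under the given key."""
    blob_store[key] = payload


def get_blob(key):
    """Return the payload exactly as it was stored."""
    return blob_store[key]


conn = build_customer_table()
misters = customers_with_title(conn, "Mr.")

put_blob("photo-001", b"\x89PNG placeholder image bytes")
put_blob("clip-001", b"RIFF placeholder audio bytes")
```

The relational half can answer "who has the title Mr.?" only because the name was split up before it was stored. The dictionary half accepts both payloads with no preparation, but it offers no way to query inside them, which is the trade-off the article describes.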
Getting data into the database becomes much simpler, and there's more flexibility in how that data is structured.

So if you have data that can't be put into tabular form in an elegant way, or that needs queries that can't be comfortably expressed in SQL, think about looking at the range of NoSQL database engines that are available.

NoSQL security: Do NoSQL database security features stack up to RDBMS?

Michael Cobb, CISSP-ISSAP

NoSQL, or Not Only SQL, is an approach to data storage and retrieval that's very fashionable with startups developing interactive Web applications and enterprises dealing with huge quantities of data. The main reason for its popularity is that it provides better scalability and availability, as well as faster access to data, than traditional relational database management systems (RDBMS), including Oracle's MySQL and Microsoft's SQL Server.

Data held in an RDBMS has to be predictable so it can be stored in organized tables and rows, with relationships defined between different elements. Data in a NoSQL database, on the other hand, doesn't need to be so structured or follow a fixed schema. When performance and real-time access are more important than consistency, such as when indexing and retrieving a large number of records, NoSQL is a better fit than a relational database. Data can also be more easily held across multiple servers, providing improved fault tolerance and scalability. Companies like Google and Amazon use their own cloud-friendly NoSQL database technologies, and there are a number of commercial and open source NoSQL databases available, such as Couchbase, MongoDB, Cassandra and Riak.

For all the advantages of storing data in a NoSQL database, NoSQL security is adversely impacted by the need to access data quickly and easily.
To store information securely, a database needs to provide confidentiality, integrity and availability (CIA). Enterprise RDBMS databases provide CIA through integrated security features such as role-based security, encrypted communications, support for row and field access control, as well as access control through user-level permissions on stored procedures. RDBMS databases also have ACID (atomicity, consistency, isolation, durability) properties that guarantee database transactions are processed reliably; data replication and logging ensure durability and data integrity. These features increase the time it takes to retrieve large amounts of data, so they are not implemented in NoSQL databases.

In order to maintain fast access to data, NoSQL databases come with little built-in security. They have what's called BASE (basically available, soft state, eventually consistent) properties; rather than requiring consistency after every transaction, the database just needs to eventually reach a consistent state. For example, when users view data, such as the number of items in stock, they may see the last snapshot taken of the data rather than a current view. Because transactions aren't written to the database immediately, there is a possibility that simultaneous transactions could interfere with each other. This inherent race condition, in which users do not necessarily see the same data at the same time, means a NoSQL database could never be used for handling financial transactions.

NoSQL databases also lack confidentiality and integrity. As NoSQL databases don't have a schema, permissions on a table, column or row can't be segregated.
The absence of a schema can also lead to multiple copies of the same data, which can make it hard to keep data consistent, particularly as changes to multiple tables can't be wrapped in a transaction where a logical unit of insert, update or delete operations is executed as a whole. With more than 20 different implementations of NoSQL available, a lack of standards also increases the complexities of keeping data secure.

Confidentiality and integrity have to be provided entirely by the application accessing the NoSQL data. It is not a sound practice to have the last line of defense for any valuable data at the application level. Application developers are not renowned for implementing security features, and new code usually means new bugs. Any requests sent to a NoSQL database need to be escaped, filtered and validated, while the database itself needs to reside in a hardened environment.

Interestingly, some NoSQL projects are now starting to add back RDBMS-type security features. Oracle, for example, added transactional control over data written to one node. Cassandra supports transaction logging and automatic replication, and MongoDB supports master-slave replication.

If scalability and availability are the key database requirements for an organization, then NoSQL may be the right choice for certain large data sets. However, system architects should take a close look at their requirements for security, privacy and data integrity before choosing a NoSQL database. The lack of NoSQL security features, namely authentication or authorization support, means that sensitive data is best kept in a traditional RDBMS.
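
The article's point about wrapping changes to multiple tables in a transaction, so that a logical unit of operations is executed as a whole, can be illustrated with a short Python sketch. It is not from the original article: it uses the standard-library sqlite3 module as a small relational engine, and the stock and orders tables, the CHECK constraint and the place_order helper are hypothetical names invented for this example.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two related tables."""
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint forbids negative stock levels.
    conn.execute(
        "CREATE TABLE stock ("
        "item TEXT PRIMARY KEY, "
        "quantity INTEGER CHECK (quantity >= 0))"
    )
    conn.execute("CREATE TABLE orders (item TEXT, quantity INTEGER)")
    conn.execute("INSERT INTO stock VALUES ('widget', 5)")
    conn.commit()
    return conn


def place_order(conn, item, quantity):
    """Record an order and reduce stock as one all-or-nothing unit."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back every statement in it if an error occurs.
        with conn:
            conn.execute(
                "INSERT INTO orders VALUES (?, ?)", (item, quantity)
            )
            conn.execute(
                "UPDATE stock SET quantity = quantity - ? WHERE item = ?",
                (quantity, item),
            )
        return True
    except sqlite3.IntegrityError:
        # The stock update violated the CHECK constraint, so the order
        # row inserted just before it has been rolled back as well.
        return False


conn = make_db()
first_ok = place_order(conn, "widget", 3)    # enough stock
second_ok = place_order(conn, "widget", 10)  # would drive stock negative
remaining = conn.execute("SELECT quantity FROM stock").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

In the second call the order row is inserted before the stock update fails, yet no orphaned order survives because both statements belong to one transaction. Without that guarantee, the application itself would have to detect and undo the partial write, which is the burden the article warns about.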
How non-relational database technologies free up data to create value

Nick Millman and Pankaj Sodhi

The proliferation of multiple non-relational databases is transforming the data management landscape. Instead of having to force structures onto their data, organisations can now choose NoSQL database architectures that fit their emerging data needs, as well as combining these new technologies with conventional relational databases to drive new value from their information.

Until recently, data's potential as a source of rich business insight has been limited by the structures that have been imposed upon it. Without access to the new database technologies now available, standard back-end design practice has been to force data into rigid architectures (regardless of variations in the structure of the actual data).

Inherently inflexible, these legacy architectures have prevented organisations from developing new use cases for the exploitation of structured and unstructured information.

The ongoing proliferation of non-relational database architectures marks a watershed in data management. What is emerging is a new world of horizontally scaling, unstructured databases that are better at solving some problems, along with traditional relational databases that remain relevant for others.

Technology has evolved to the extent that organisations need no longer be held hostage by a lack of choice in database architectures. As frontrunners have moved to identify the database options that match their specific data needs, three key changes are becoming increasingly prevalent:

1. A rebalancing of the database landscape, as data architects began to embrace the fact that their architecture and design toolkit has evolved from being relational database-centric to also including a varied and maturing set of non-relational options (NoSQL database systems).
2. The increasing pervasiveness of hybrid data ecosystems powered by disruptive technologies and techniques (such as the Apache Hadoop software framework for cost-effective processing of data at extreme scale).
3. The emergence of more responsive data management ecosystems to provide the flexibility needed to undertake prototyping-enabled delivery (test-prove-industrialise) at lower cost and at scale.

From now on, savvy analytical leaders will be seeking to crystallise the use cases to which platforms are best suited. Instead of becoming overly focused on the availability of new technologies, they will identify the "sweet spots" where relational and non-relational databases can be combined to create value for information above and beyond its original purpose.

By taking advantage of the new world of choice in data architectures, more organisations will be equipped to identify and exploit breakthrough opportunities for data monetisation.

Just as communications operators have created valuable B2B revenue streams from the wealth of customer data at their disposal, so better usage of their existing data will empower other companies to build potent new business models.

Implementing a rethink of how data is stored, processed and enriched means re-evaluating the traditional world of data management. Until now, data has been viewed as a structured asset and a cost centre that must be maintained.

The availability of new database architectures means that this mindset will change forever.
Data management in a services-led world will require IT leaders to think about how the business can most easily take advantage of the data they have and the data they may previously have been unable to harness.

Agile data services architecture

As more architecture options become available, data lifecycles will shrink and become more agile. Rather than seeking to "over control" data, approaches to data management will become much less rigid. One key aim will be to open up new possibilities by encouraging and facilitating data sharing.

Amazon stands out as a pioneer in this field. By building a service-oriented platform with an agile data services architecture, the company has been able to offer new services around cloud storage and data management, as well as giving itself the flexibility needed to cope with future demand for as yet unknown services.

Unprecedented accessibility to non-relational databases is reinvigorating the role of conventional architectures and "traditional" data management disciplines.
From now on, analytics leaders will increasingly move to adopt hybrid architectures that combine the best of both worlds to leverage fresh new insights from the surging volumes of structured and unstructured information that are now the norm.

In summary, there has never been a more exciting time to be a data management professional.

Getting more PRO+ exclusive content

This e-guide is made available to you, our member, through PRO+ Offers, a collection of free publications, training and special opportunities specifically gathered from our partners and across our network of sites. PRO+ Offers is a free benefit only available to members of the TechTarget network of sites.

Take full advantage of your membership by visiting http://pro.techtarget.com/ProLP/
