
FEATURE: BIG DATA SOFTWARE SYSTEMS

Distribution, Data, Deployment: Software Architecture Convergence in Big Data Systems

Ian Gorton and John Klein, Software Engineering Institute

// Big data systems present many challenges to software architects. In particular, distributed software architectures become tightly coupled to data and deployment architectures. This causes a consolidation of concerns; designs must be harmonized across these three architectures to satisfy quality requirements. //

THE EXPONENTIAL GROWTH of data over the last decade has fueled a new specialization for software technology: data-intensive, or big data, software systems.1 Internet-born organizations such as Google and Amazon are on this revolution's cutting edge, collecting, managing, storing, and analyzing some of the largest data repositories ever constructed. Their pioneering efforts,2,3 along with those of numerous other big data innovators, have provided a variety of open source and commercial data management technologies that let any organization construct and operate massively scalable, highly available data repositories.

Addressing the challenges of software for big data systems requires careful design tradeoffs spanning the distributed software, data, and deployment architectures. It also requires extending traditional software architecture design knowledge to account for the tight coupling that exists in scalable systems. Scale drives a consolidation of concerns, so that distribution, data, and deployment architectural qualities can no longer be effectively considered separately. To illustrate this, we'll use an example from our current research in healthcare informatics.

The Challenges of Big Data
Data-intensive systems have long been built on SQL database technology, which relies primarily on vertical scaling—faster processors and bigger disks—as workload or storage requirements increase. SQL databases' inherent vertical-scaling limitations4 have led to new products that relax many core tenets of relational databases. Strictly defined normalized data models, strong data consistency guarantees, and the SQL standard have been replaced by schemaless and intentionally denormalized data models, weak consistency, and proprietary APIs that expose the underlying data management mechanisms to the programmer. These NoSQL products4 typically are designed to scale horizontally across clusters of low-cost, moderate-performance servers. They achieve high performance, elastic storage capacity, and availability by partitioning and replicating datasets across a cluster. Prominent examples of NoSQL databases include Cassandra, Riak, and MongoDB (see the sidebar "NoSQL Databases").



NOSQL DATABASES

The rise of big data applications has caused a significant flux in database technologies. While mature relational database technologies continue to evolve, a spectrum of databases called NoSQL has emerged over the past decade. The relational model imposes a strict schema, which inhibits data evolution and causes difficulties scaling across clusters. In response, NoSQL databases have adopted simpler data models. Common features include schemaless records, allowing data models to evolve dynamically, and horizontal scaling, by sharding (partitioning and distributing) and replicating data collections across large clusters. Figure A illustrates the four most prominent data models, whose characteristics we summarize here. More comprehensive information is at http://nosql-database.org.

Document databases (see Figure A1) store collections of objects, typically encoded using JSON (JavaScript Object Notation) or XML. Documents have keys, and you can build secondary indexes on nonkey fields. Document formats are self-describing; a collection might include documents with different formats. Leading examples are MongoDB (www.mongodb.org) and CouchDB (http://couchdb.apache.org).

Key-value databases (see Figure A2) implement a distributed hash map. Records can be accessed primarily through key searches, and the value associated with each key is treated as opaque, requiring reader interpretation. This simple model facilitates sharding and replication to create highly scalable and available systems. Examples are Riak (http://riak.basho.com) and DynamoDB (http://aws.amazon.com/dynamodb).

Column-oriented databases (see Figure A3) extend the key-value model by organizing keyed records as a collection of columns, where a column is a key-value pair. The key becomes the column name; the value can be an arbitrary data type such as a JSON document or binary image. A collection can contain records with different numbers of columns. Examples are HBase (http://hbase.apache.org) and Cassandra (https://cassandra.apache.org).

Graph databases (see Figure A4) organize data in a highly connected structure—typically, some form of directed graph. They can provide exceptional performance for problems involving graph traversals and subgraph matching. Because efficient graph partitioning is an NP-hard problem, these databases tend to be less concerned with horizontal scaling and commonly offer ACID (atomicity, consistency, isolation, durability) transactions to provide strong consistency. Examples include Neo4j (www.neo4j.org) and GraphBase (http://graphbase.net).

NoSQL technologies have many implications for application design. Because there's no equivalent of SQL, each technology supports its own query mechanism. These mechanisms typically make the application programmer responsible for explicitly formulating query executions, rather than relying on query planners that execute queries based on declarative specifications. The programmer is also responsible for combining results from different data collections. This lack of the ability to perform JOINs forces extensive denormalization of data models so that JOIN-style queries can be efficiently executed by accessing a single data collection. When databases are sharded and replicated, the programmer also must manage consistency when concurrent updates occur and must design applications to tolerate stale data due to latency in update replication.

FIGURE A. Four major NoSQL data models: (1) a document store, (2) a key-value store, (3) a column store, and (4) a graph store. The figure shows the same two employee records (John and Ian, both employed by SEI; Ian previously employed by PNNL) represented in each model.
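To make the four data models concrete, the following sketch encodes Figure A's two employee records in each style, using plain Python structures. The field names and the simple adjacency-list graph encoding are illustrative only; each product has its own native representation and API.

```python
# Figure A's employee records, rendered in the four NoSQL data-model styles.
# Plain Python structures for illustration; real products use their own formats.

# (1) Document store: self-describing documents, possibly with different fields.
documents = [
    {"id": "1", "Name": "John", "Employer": "SEI"},
    {"id": "2", "Name": "Ian", "Employer": "SEI", "Previous": "PNNL"},
]

# (2) Key-value store: the value is opaque to the database; the reader interprets it.
key_value = {
    "1": '{"Name": "John", "Employer": "SEI"}',
    "2": '{"Name": "Ian", "Employer": "SEI", "Previous": "PNNL"}',
}

# (3) Column-oriented store: each row holds a variable collection of named columns.
columns = {
    "1": {"Name": "John", "Employer": "SEI"},
    "2": {"Name": "Ian", "Employer": "SEI", "Previous": "PNNL"},
}

# (4) Graph store: employees and employers are nodes; edges carry the relationship.
nodes = {
    "emp:1": {"Name": "John"},
    "emp:2": {"Name": "Ian"},
    "org:SEI": {"Name": "SEI"},
    "org:PNNL": {"Name": "PNNL"},
}
edges = [
    ("emp:1", "is employed by", "org:SEI"),
    ("emp:2", "is employed by", "org:SEI"),
    ("emp:2", "previously employed by", "org:PNNL"),
]

# A JOIN-free lookup: who works at SEI? Each model answers it differently.
print([d["Name"] for d in documents if d["Employer"] == "SEI"])
print([src for src, rel, dst in edges
       if rel == "is employed by" and dst == "org:SEI"])
```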


Distributed databases have fundamental quality constraints, defined by Eric Brewer's CAP (consistency, availability, partition tolerance) theorem.5 When a network partition occurs (causing an arbitrary message loss between cluster nodes), a system must trade consistency (all readers see the same data) against availability (every request receives a success or failure response). Daniel Abadi's PACELC provides a practical interpretation of this theorem.6 If a partition (P) occurs, a system must trade availability (A) against consistency (C). Else (E), in the usual case of no partition, a system must trade latency (L) against consistency (C).

Additional design challenges for scalable data-intensive systems stem from the following three issues.

First, achieving high scalability and availability leads to highly distributed systems. Distribution occurs in all tiers, from webserver farms and caches to back-end storage.

Second, the abstraction of a single system image, with transactional writes and consistent reads using SQL-like query languages, is difficult to achieve at scale.7 Applications must be aware of data replicas; handle inconsistencies from conflicting replica updates; and continue operation in spite of inevitable processor, network, and software failures.

Third, each NoSQL product embodies a specific set of quality attribute tradeoffs, especially in terms of performance, scalability, durability, and consistency. Architects must diligently evaluate candidate database technologies and select databases that can satisfy application requirements. This often leads to polyglot persistence—using different database technologies to store different datasets in a single system, to meet quality attribute requirements.8

Furthermore, as data volumes grow to petascale and beyond, the required hardware resources grow from hundreds to tens of thousands of servers. At this deployment scale, many widely used software architecture patterns are unsuitable. Architectural and algorithmic approaches that are sensitive to hardware resource use can significantly reduce overall costs. For more on this, see the sidebar "Why Scale Matters."

Big Data Application Characteristics
Big data applications are rapidly becoming pervasive across a range of business domains. One example domain in which big data analytics looms prominently on the horizon is aeronautics. Modern commercial airliners produce approximately 0.5 Tbytes of operational data per flight.9 This data can be used to diagnose faults in real time, optimize fuel consumption, and predict maintenance needs. Airlines must build scalable systems to capture, manage, and analyze this data to improve reliability and reduce costs.

Another domain is healthcare. Big data analytics in US healthcare could save an estimated $450 billion.10 Analysis of petabytes of data across patient populations, taken from diverse sources such as insurance payers, public health entities, and clinical studies, can reduce costs by improving patient outcomes. In addition, operational efficiencies can extract new insights for disease treatment and prevention.

Across these and many other domains, big data systems share four requirements that drive the design of suitable software solutions. Collectively, these requirements represent a significant departure from traditional business systems, which are relatively well constrained in terms of data growth, analytics, and scale.

First, from social media sites to high-resolution sensor data collection in the power grid, big data systems must be able to sustain write-heavy workloads.1 Because writes are costlier than reads, systems can use data sharding (partitioning and distribution) to spread write operations across disks and can use replication to provide high availability (a simplified sketch of this write path appears at the end of this section). Sharding and replication introduce availability and consistency issues that the systems must address.

The second requirement is to deal with variable request loads. Business and government systems experience highly variable workloads for reasons including product promotions, emergencies, and statutory deadlines such as tax submissions. To avoid the costs of overprovisioning to handle these occasional spikes, cloud platforms are elastic, letting applications add processing capacity to share loads and release resources when loads drop. Effectively exploiting this deployment mechanism requires architectures with application-specific strategies to detect increased loads, rapidly add new resources, and release them as necessary.
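The write-sharding and replication approach behind the first requirement can be sketched roughly as follows. This is a minimal illustration, not any particular product's mechanism: the shard count, replication factor, hash-based placement, and record names are assumptions chosen for simplicity.

```python
import hashlib
from typing import Optional

NUM_SHARDS = 3           # assumed shard count
REPLICAS_PER_SHARD = 2   # assumed replication factor

# shards[i][j] is the j-th replica of shard i; plain dicts stand in for database nodes.
shards = [[{} for _ in range(REPLICAS_PER_SHARD)] for _ in range(NUM_SHARDS)]

def shard_for(key: str) -> int:
    """Hash the record key to pick a shard, spreading writes across shards."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def write(key: str, record: dict) -> None:
    """Write to every replica of the owning shard; replication provides availability."""
    for replica in shards[shard_for(key)]:
        replica[key] = record

def read(key: str, replica_index: int = 0) -> Optional[dict]:
    """Read one replica; with asynchronous replication this copy could be stale."""
    return shards[shard_for(key)][replica_index].get(key)

write("patient:42", {"Name": "Ian", "Insurer": "Acme Health"})  # illustrative record
print(shard_for("patient:42"), read("patient:42"))
```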

WHY SCALE MATTERS
Scale has many implications for software architecture; here we look at several.

The first implication focuses on how scale changes our designs' problem space. Big data systems are inherently distributed. Their architectures must explicitly handle partial failures, communication latencies, concurrency, consistency, and replication. As systems grow to thousands of processing nodes and disks and become geographically distributed, these issues are exacerbated as the probability of a hardware failure increases. One study found that 8 percent of servers in a typical datacenter experience a hardware problem annually, with disk failure most common.1 Applications must also deal with unpredictable communication latencies and network connection failures. Scalable applications must treat failures as common events that are handled gracefully to ensure uninterrupted operation.

To deal with these issues, resilient architectures must fulfill two requirements. First, they must replicate data to ensure availability in the case of a disk failure or network partition. Replicas must be kept strictly or eventually consistent, using either master-slave or multimaster protocols. The latter need mechanisms such as Lamport clocks2 to resolve inconsistencies due to concurrent writes.

Second, architecture components must be stateless, replicated, and tolerant of failures of dependent services. For example, by using the Circuit Breaker pattern3 and returning cached or default results whenever failures are detected, an architecture limits failures and allows time for recovery (a sketch of this pattern follows the sidebar).

Another implication is economics based. At very large scales, small optimizations in resource use can lead to very large cost reductions in absolute terms. Big data systems can use many thousands of servers and disks. Whether these are capital purchases or rented from a service provider, they remain a major cost and hence a target for reduction. Elasticity can reduce resource use by dynamically deploying new servers as the load increases and releasing them as the load decreases. This requires servers that boot and initialize quickly and application-specific strategies to avoid premature resource release.

Other strategies target the development tool chain to maintain developer productivity while decreasing resource use. For example, Facebook built HipHop, a PHP-to-C++ transformation engine that reduced the CPU load for serving Web pages by 50 percent.4 At the scale of Facebook's deployment, this creates significant operational-cost savings. Other targets for reduction are software license costs, which can be prohibitive at scale. This has led some organizations to create custom database and middleware technologies, many of which have been released as open source. Leading examples of technologies for big data systems are from Netflix (http://netflix.github.io) and LinkedIn (http://linkedin.github.io).

Other implications of scale include testing and fault diagnosis. Owing to these systems' deployment footprints and the massive datasets they manage, comprehensively validating code before deployment to production can be impossible. Canary testing and Netflix's Simian Army are examples of the state of the art in testing at scale.5 When problems occur in production, advanced monitoring and logging are needed for rapid diagnosis. In large-scale systems, log collection and analysis itself quickly becomes a big data problem. Solutions must include a low-overhead, scalable logging infrastructure such as Blitz4j.6

References
1. K.V. Vishwanath and N. Nagappan, "Characterizing Cloud Computing Hardware Reliability," Proc. 1st ACM Symp. Cloud Computing (SoCC 10), 2010, pp. 193-204.
2. L. Lamport, "Time, Clocks, and the Ordering of Events in a Distributed System," Comm. ACM, vol. 21, no. 7, 1978, pp. 558-565.
3. M.T. Nygard, Release It! Design and Deploy Production-Ready Software, Pragmatic Bookshelf, 2007.
4. H. Zhao, "HipHop for PHP: Move Fast," blog, 2 Feb. 2010; https://developers.facebook.com/blog/post/2010/02/02/hiphop-for-php--move-fast.
5. B. Schmaus, "Deploying the Netflix API," blog, 14 Aug. 2013; http://techblog.netflix.com/2013/08/deploying-netflix-api.html.
6. K. Ranganathan, "Announcing Blitz4j—a Scalable Logging Framework," blog, 20 Nov. 2012; http://techblog.netflix.com/search/label/appender.

The third requirement is to support computation-intensive analytics. Most big data systems must support diverse query workloads, mixing requests that require rapid responses with long-running requests that perform complex analytics on significant portions of the data collection. This leads to software and data architectures explicitly structured to meet these varying latency demands. Netflix's Recommendations Engine is a pioneering example of how to design software and data architectures to partition simultaneously between low-latency requests and requests for advanced analytics on large data collections, to continually enhance personalized recommendations' quality.11


The fourth requirement is high availability. With thousands of nodes in a horizontally scaled deployment, hardware and network failures inevitably occur. So, distributed software and data architectures must be resilient. Common approaches for high availability include replicating data across geographical regions,12 stateless services, and application-specific mechanisms to provide degraded service in the face of failures.

These requirements' solutions crosscut the distribution, data, and deployment architectures. For example, elasticity requires

• processing capacity that can be acquired from the execution platform on demand,
• policies and mechanisms to appropriately start and stop services as the application load varies, and
• a database architecture that can reliably satisfy queries under an increased load.

This coupling of architectures to satisfy a particular quality attribute is common in big data applications. It can be regarded as a tight coupling of the process, logical, and physical views in the 4 + 1 view model.13

An Example of the Consolidation of Concerns
At the Software Engineering Institute, we're helping to evolve a system to aggregate data from multiple petascale medical-record databases for clinical applications. To attain high scalability and availability at low cost, we're investigating using NoSQL databases for this aggregation. The design uses geographically distributed datacenters to increase availability and reduce latency for globally distributed users.

Consider the consistency requirements for two categories of data in this system: patient demographics (for example, name and insurance provider) and diagnostic-test results (for example, blood or imaging test results). Patient demographic records are updated infrequently. These updates must be immediately visible at the local site where the data was modified ("read your writes"), but a delay is acceptable before the update is visible at other sites ("eventual replica consistency"). In contrast, diagnostic-test results are updated more frequently, and changes must be immediately visible everywhere to support telemedicine and remote consultations with specialists ("strong replica consistency").

We're prototyping solutions using several NoSQL databases. We focus here on one prototype using MongoDB to illustrate the architecture drivers and design decisions. The design segments data across three shards and replicates data across two datacenters (see Figure 1).

FIGURE 1. A MongoDB-based healthcare data management prototype. Web or application servers in two datacenters access patient data and test results data, each segmented into three shards and replicated between the datacenters over a global network. Geographically distributed datacenters increase availability and reduce latency for globally distributed users.

MongoDB enforces a master-slave architecture; every data collection has a master replica that serves all write requests and propagates changes to other replicas. Clients can read from any replica, opening an inconsistency window between writes to the master and reads from other replicas.

MongoDB allows tradeoffs between consistency and latency through parameter options on each write and read. A write can be unacknowledged (no assurance of durability, low latency), durable on the master replica, or durable on the master and one or more replicas (consistent, high latency). A read can prefer the closest replica (potentially inconsistent, low latency), be restricted to the master replica (consistent, partition intolerant), or require most replicas to agree on the data value to be read (consistent, partition tolerant).
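As a rough illustration of these per-operation options, the sketch below uses the PyMongo driver to configure the two data categories in the prototype: demographics with a primary-acknowledged write and nearest-replica reads, and diagnostic-test results with majority-acknowledged writes and reads. The connection string, database, and collection names are placeholders, and the exact write- and read-concern levels shown are our reading of the tradeoffs described above, not the project's settled configuration.

```python
from pymongo import MongoClient, ReadPreference, WriteConcern
from pymongo.read_concern import ReadConcern

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
db = client["healthcare"]                          # placeholder database name

# Patient demographics: durable on the primary only; reads go to the closest
# replica for low latency, accepting potentially stale responses.
demographics = db["demographics"].with_options(
    write_concern=WriteConcern(w=1),
    read_preference=ReadPreference.NEAREST,
)

# Diagnostic-test results: writes must be acknowledged by a majority of replicas;
# a majority read concern keeps reads consistent at higher latency.
test_results = db["test_results"].with_options(
    write_concern=WriteConcern(w="majority"),
    read_preference=ReadPreference.NEAREST,
    read_concern=ReadConcern("majority"),
)

demographics.insert_one({"patient_id": "42", "name": "Ian", "insurer": "Acme Health"})
test_results.insert_one({"patient_id": "42", "test": "CBC", "status": "complete"})
print(demographics.find_one({"patient_id": "42"}))
```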

The application developer must choose appropriate write and read options to achieve the desired performance, consistency, and durability and must handle partition errors to achieve the desired availability. In our example, patient demographic data writes must be durable on the primary replica, but reads can be directed to the closest replica for low latency. This makes patient demographic reads insensitive to network partitions at the cost of potentially inconsistent responses.

In contrast, writes for diagnostic-test results must be durable on all replicas. Reads can be performed from the closest replica because the write ensures that all replicas are consistent. This means writes must handle failures caused by network partitions, whereas reads are insensitive to partitions.

Today, our healthcare informatics application runs atop an SQL database, which hides the physical data model and deployment topology from developers. SQL databases provide a single-system-image abstraction, which separates concerns between the application and database by hiding the details of data distribution across processors, storage, and networks behind a transactional read/write interface.14 In shifting to a NoSQL environment, an application must directly handle the faults that will depend on the physical data distribution (sharding and replication) and the number of replica sites and servers. These low-level infrastructure concerns, traditionally hidden under the database interface, must now be explicitly handled in application logic.

Systematic Design Using Tactics
In designing an architecture to satisfy quality drivers such as those in this healthcare example, one proven approach is to systematically select and apply a sequence of architecture tactics.15 Tactics are elemental design decisions, embodying architectural knowledge of how to satisfy one design concern of a quality attribute. Tactic catalogs enable reuse of this knowledge. However, existing catalogs don't contain tactics specific to big data systems.

Figures 2 and 3 extend the basic performance and availability tactics15 to big data systems. Figure 4 defines scalability tactics, focusing on the design concern of increased workloads. Each figure shows how the design decisions span the data, distribution, and deployment architectures.

FIGURE 2. Performance tactics for big data systems. The design decisions span the data, distribution, and deployment architectures: controlling resource demand and managing resources (reduce overhead, increase resources, increase concurrency, manage multiple copies of data); at the data level, database partitioning to distribute read load and a data model denormalized to support a single query per use case; at the distribution level, distributed webserver caching to reduce database read load and replicated stateless webservers; at the deployment level, a database replicated across clusters.

For example, achieving availability requires masking faults that inevitably occur in a distributed system. At the data level, replicating data items is an essential step to handle network partitions. When an application can't access any database partition, another tactic to enhance availability is to design a data model that can return meaningful default values without accessing the data. At the distributed-software layer, caching is a tactic to achieve the default-values tactic defined in the data model. At the deployment layer, an availability tactic is to geographically replicate the data and distributed application software layers to protect against power and network outages. Each of these tactics is necessary to handle the different types of faults that threaten availability. Their combined representation in Figure 3 provides architects with comprehensive guidance to achieve highly available systems.
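The layered availability tactics just described (replicated data, a cached copy at the distributed-software layer, and a data model with meaningful defaults) can be sketched as a simple read path. This is an illustrative composition under our own assumptions about the cache and default values, not a prescription from the tactic catalog.

```python
DEFAULT_DEMOGRAPHICS = {"name": "unknown", "insurer": "unknown"}  # modeled default values

class PartitionError(Exception):
    """Raised when no database partition holding the record is reachable."""

def read_with_fallback(key, database_read, cache):
    """Availability tactics composed on the read path:
    1. try the (replicated) database;
    2. on a partition, serve the last cached copy;
    3. with no cached copy, return meaningful defaults from the data model."""
    try:
        value = database_read(key)
        cache[key] = value          # refresh the cache on every successful read
        return value
    except PartitionError:
        return cache.get(key, dict(DEFAULT_DEMOGRAPHICS))

# Hypothetical usage with an unreachable database and a warm cache.
def unreachable_read(key):
    raise PartitionError(key)

cache = {"patient:42": {"name": "Ian", "insurer": "Acme Health"}}
print(read_with_fallback("patient:42", unreachable_read, cache))
print(read_with_fallback("patient:99", unreachable_read, cache))
```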


FIGURE 3. Availability tactics for big data systems. The tactics cover detecting faults (health monitoring) and recovering from faults (masking faults); at the data level, a data model that supports replication to eliminate a single point of failure and a model for default data values; at the distribution level, handling write inconsistencies and handling database failures through caching or employing default responses; at the deployment level, balancing data distribution and placing data replicas in geographically distributed datacenters. The tactics' combined representation provides architects with comprehensive guidance to achieve highly available systems.

FIGURE 4. Scalability tactics for big data systems. These tactics focus on the design concern of increased workloads: responding to increased load by automatically or manually increasing capacity, and responding to decreased load by automatically releasing capacity. At the data level, the data distribution mechanism supports distributing data over new nodes; at the distribution level, server resources are provisioned on demand as load increases and released as load decreases; at the deployment level, the cluster management platform supports dynamic provisioning and dynamic decommissioning of nodes.

Big data applications are pushing the limits of software engineering on multiple horizons. Successful solutions span the design of the data, distribution, and deployment architectures. The body of software architecture knowledge must evolve to capture this advanced design knowledge for big data systems.

This article is a first step on this path. Our research is proceeding in two complementary directions. First, we're expanding our collection of architecture tactics and encoding them in an environment that supports navigation between quality attributes and tactics, making crosscutting concerns for design choices explicit. Second, we're linking tactics to design solutions based on specific big data technologies, enabling architects to rapidly relate a particular technology's capabilities to a specific set of tactics.

Acknowledgments
This material is based on research funded and supported by the US Department of Defense under contract FA8721-05-C-0003 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center. References herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, doesn't necessarily constitute or imply its endorsement, recommendation, or favoring by Carnegie Mellon University or its Software Engineering Institute. This material has been approved for public release and unlimited distribution. DM-0000810.

References
1. D. Agrawal, S. Das, and A. El Abbadi, "Big Data and Cloud Computing: Current State and Future Opportunities," Proc. 14th Int'l Conf. Extending Database Technology (EDBT/ICDT 11), 2011, pp. 530-533.

2. W. Vogels, "Amazon DynamoDB—a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications," blog, 18 Jan. 2012; www.allthingsdistributed.com/2012/01/amazon-dynamodb.html.
3. F. Chang et al., "Bigtable: A Distributed Storage System for Structured Data," ACM Trans. Computing Systems, vol. 26, no. 2, 2008, article 4.
4. P.J. Sadalage and M. Fowler, NoSQL Distilled, Addison-Wesley Professional, 2012.
5. E. Brewer, "CAP Twelve Years Later: How the 'Rules' Have Changed," Computer, vol. 45, no. 2, 2012, pp. 23-29.
6. D.J. Abadi, "Consistency Tradeoffs in Modern Distributed Database System Design: CAP Is Only Part of the Story," Computer, vol. 45, no. 2, 2012, pp. 37-42.
7. J. Shute et al., "F1: A Distributed SQL Database That Scales," Proc. VLDB Endowment, vol. 6, no. 11, 2013, pp. 1068-1079.
8. M. Fowler, "PolyglotPersistence," blog, 16 Nov. 2011; www.martinfowler.com/bliki/PolyglotPersistence.html.
9. M. Finnegan, "Boeing 787s to Create Half a Terabyte of Data per Flight, Says Virgin Atlantic," Computerworld UK, 6 Mar. 2013; www.computerworlduk.com/news/infrastructure/3433595/boeing-787s-to-create-half-a-terabyte-of-data-per-flight-says-virgin-atlantic.
10. B. Kayyali, D. Knott, and S. Van Kuiken, "The 'Big Data' Revolution in Healthcare: Accelerating Value and Innovation," McKinsey & Co., 2013; www.mckinsey.com/insights/health_systems_and_services/the_big_data_revolution_in_us_health_care.
11. X. Amatriain and J. Basilico, "System Architectures for Personalization and Recommendation," blog, 27 Mar. 2013; http://techblog.netflix.com/2013/03/system-architectures-for.html.
12. M. Armbrust et al., "A View of Cloud Computing," Comm. ACM, vol. 53, no. 4, 2010, pp. 50-58.
13. P.B. Kruchten, "The 4 + 1 View Model of Architecture," IEEE Software, vol. 12, no. 6, 1995, pp. 42-50.
14. J. Gray and A. Reuter, Transaction Processing: Concepts and Techniques, Morgan Kaufmann, 1993.
15. L. Bass, P. Clements, and R. Kazman, Software Architecture in Practice, 3rd ed., Addison-Wesley, 2012.

ABOUT THE AUTHORS

IAN GORTON is a senior member of the technical staff on the Carnegie Mellon Software Engineering Institute's Architecture Practices team, where he investigates issues related to software architecture at scale. This includes designing large-scale data management and analytics systems and understanding the inherent connections and tensions between software, data, and deployment architectures. Gorton received a PhD in computer science from Sheffield Hallam University. He's a senior member of the IEEE Computer Society. Contact him at igorton@sei.cmu.edu.

JOHN KLEIN is a senior member of the technical staff at the Carnegie Mellon Software Engineering Institute, where he does consulting and research on scalable software systems as a member of the Architecture Practices team. Klein received an ME in electrical engineering from Northeastern University. He's the secretary of the International Federation for Information Processing Working Group 2.10 on Software Architecture, a member of the IEEE Computer Society, and a senior member of ACM. Contact him at jklein@sei.cmu.edu.

