APPS OPERATIONS
WITH DC/OS
Table of Contents
Executive Summary
A New Battlefront for Enterprises
The Modern Enterprise App
  Architectural Components of Modern Enterprise Apps
  Microservices
  Containerization
  Enterprise Big Data
  Open Source
Modern App Engineering and Operations
  Engineering Challenges of the Modern Enterprise App
  Resource Management Challenges
  Hyperscale Computing vs. IaaS Operating Models
The Datacenter Operating System Model
  Developer and Operator Experience
  Apache Mesos the Enabling Technology
Platform Considerations for Modern Enterprise Apps
DC/OS Model Business Outcomes
  Hyperscale Operations
  Developer Agility
  Data Agility
Transitioning Towards the DC/OS Operating Model
Conclusions
About the Authors
WHITE PAPER | 2
Executive Summary
As Marc Andreessen foresaw almost five years ago, "software is eating the world." Businesses of all types need
to develop and deploy new software services quickly to stay competitive. This turns out to be quite
challenging because enterprises must: (1) quickly adopt entirely new processes for building and deploying
software, including such modern concepts as microservices, containers, and continuous integration and
deployment; (2) ingest and store vast amounts of data in real time, such as from machine sensors and
customer and business activities; and (3) derive actionable insight from that data, again in real time, in order to
save money, respond to market conditions more quickly, and deliver better products and services.
IT organizations must meet these challenges while addressing the traditional concerns of efficiency, security,
service quality, and operational flexibility. Early web companies like Google and Facebook were the first to
encounter these challenges and found the answer in hyperscale computing: modern applications composed
of distributed microservices with big data built in, often running on commodity hardware. For mainstream
enterprises, building and operating modern apps can be a significant challenge. The Datacenter Operating
System (DC/OS) applies best practices established by early web companies, and is powered by the
production-proven Apache Mesos distributed systems kernel. With DC/OS, modern apps are now practical for
mainstream enterprises.
A New Battlefront for Enterprises
About a decade ago, web companies like Google, Facebook and Netflix addressed the challenge of
serving millions of users in real time and processing unprecedented volumes of data. This started a
quiet revolution in the datacenter and launched the mobile-cloud era, dramatically
raising user expectations, creating new businesses, and placing competitive pressures on industries like retail,
manufacturing, healthcare, and financial services, among others.
Today, mainstream enterprises are looking for ways to better engage customers, improve operational
decision-making, and capture new value streams. Doing this requires the enterprise to develop two related
capabilities. The first is developer agility: rolling out new services or product enhancements quickly, and
reducing time to value of new services. The second is data agility: gaining actionable insights from the large
volume of data enterprises are collecting.
The idea that enterprises need to get faster and smarter is not controversial. The real battlefront, and the
question business leaders need to answer, is how to find the right technologies, operating models and talent
that can collectively enable developer and data agility.
Developer agility isn't about getting developers to code faster. It's about getting code to production
faster. Rolling out new services quickly means enterprises need to change how they build and
operate software. Enterprises can no longer afford to build legacy-style, monolithic applications
that run in virtual machines (VMs). Instead, the modern enterprise needs to adopt new ways of doing things
(e.g., microservices in containers) and select a technology stack that eliminates the latency from concept to
development to production, while managing risk.
In today's mobile-cloud era, software is evolving towards distributed systems and microservices (sometimes
called cloud native). This approach provides service scalability and maintainability, and increases the speed
at which software can be deployed, especially when used in conjunction with continuous integration and
continuous delivery models. Many think of cloud native apps as inherently stateless. However, modern
enterprise apps cannot be entirely stateless since they need to rely on data either to support a business
function, or to engage customers in a meaningful way. And it's not just about data for stateful apps. The way
enterprises are using big data is shifting.
There have been two major waves of big data in the enterprise. The first enterprise big data efforts
essentially built large data warehouses (and tools) for use by analysts or data scientists. In the
current second wave, successful enterprises are building business applications powered by big data,
capturing the value of actionable insights in real time.
First wave big data efforts had mixed results. Gartner's 2015 Hadoop Adoption Survey [1] reported that 49 percent of
respondents were struggling to figure out how to get value from Hadoop. There are three main reasons.
First, the technology stack these enterprises have sought to build is just too complex and inflexible. Second,
most mainstream enterprises do not have the engineering expertise of companies like Google. Third, big data
initiatives were often ill-defined [2] projects akin to "give us your data, and we will find new patterns and insights
to drive your business." The result has been heavy investment yielding some insights, with little impact on the
business.
Successful enterprises leading the second wave of big data are shifting from building systems of record (e.g.,
what were last quarter's sales of a product in a particular segment) to systems of real-time insight and
prediction (e.g., what is an individual customer likely to buy, and what types of engagement can best
influence their behavior?). These businesses are deriving true value out of their big data efforts, and for those
still stuck in the first wave, competitive pressures are mounting.
The battlefront for business leaders today is finding the right technology, operating model, and
talent to deliver new services quickly and leverage insights from stateful big data services. The
technology should also provide flexibility and not be locked into a particular vendor, or cloud.
Because the answer lies in distributed application architectures, the virtual machine is the wrong abstraction
for addressing these challenges; a new approach to building apps is needed. Among the small number of
companies that have been successful, a clear pattern has emerged: the modern enterprise
application architecture.
Microservices: Enable teams to work self-sufficiently, and deliver and iterate quickly
Containers: Simplify packaging of code (often stateless)
Big Data: Process information and retain the state of modern apps
Open Source: Leverage the work of others in building the application
Microservices
The basic principle of microservices architecture is to decompose the application into a set of collaborating
services, rolled out in the smallest deployable units. Each microservice implements a set of narrowly defined
functions. Most commonly, these microservices interface with each other through REST APIs over HTTP. The
benefits of microservices include the ability to have many teams working on different components in parallel,
building and deploying independently, scaling only portions of an application that need the additional
capacity, and being able to update/upgrade portions of the application without impacting users, as long as
API contracts between microservices are maintained.
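The point about API contracts can be illustrated with a minimal Python sketch. It assumes a hypothetical pricing service whose contract is a set of required response fields; the service, field names, and values are invented for illustration and do not come from any DC/OS component.

```python
from typing import Any, Dict

# Hypothetical contract for a "pricing" microservice: field name -> required type.
PRICE_CONTRACT_V1 = {"sku": str, "price_cents": int, "currency": str}


def satisfies_contract(response: Dict[str, Any], contract: Dict[str, type]) -> bool:
    """Return True if every contracted field is present with the expected type.

    Extra fields are allowed, so a service can add data without breaking callers.
    """
    return all(
        name in response and isinstance(response[name], expected)
        for name, expected in contract.items()
    )


def pricing_service_v1(sku: str) -> Dict[str, Any]:
    """Original implementation of the (hypothetical) pricing service."""
    return {"sku": sku, "price_cents": 1999, "currency": "USD"}


def pricing_service_v2(sku: str) -> Dict[str, Any]:
    """Upgraded implementation: adds a field and keeps every contracted field."""
    response = pricing_service_v1(sku)
    response["discount_cents"] = 200
    return response


def pricing_service_broken(sku: str) -> Dict[str, Any]:
    """A breaking change: the contracted price_cents field has been renamed."""
    return {"sku": sku, "price": 19.99, "currency": "USD"}
```

Under these assumptions, the v2 service can be deployed independently because callers written against the v1 contract still find every field they rely on, while the renamed field in the last version would fail the same check before reaching users.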
The real power of microservices architecture is in enabling small cross-functional teams to build and deploy
functions independently using continuous integration and continuous deployment pipelines. These
techniques enable enterprises to create and sustain autonomous and innovative teams that can build highly scalable applications easily and deliver new functionality quickly.
Containerization
When applications are engineered as independently-deployed microservices, those microservices need a
nimble infrastructure on which to run. For modern enterprise apps, containerization is the simplest and best
way to run microservices. Containers are a better fit than virtual machines because they are faster to spin up (and kill) and allow
greater workload density. Containerization consists of two main components: the container image and the
container runtime.
Container images are an elegant way for developers to package their apps, provide their code access to all the
libraries it needs, and give their code the illusion of the entire machine's filesystem, without actually
including a full operating system, unlike a virtual machine. Docker has become the most commonly used
container image format.
The container runtime is responsible for executing what's defined in a container image and creating a running
process from the image. Several container runtimes exist, and all support the Docker container image format.
Below is an overview of container runtimes and the container images they support.
[Table: container runtimes (DC/OS with Mesos, Docker, Rocket) and the container images they support]
Enterprise Big Data
Enterprise big data is challenging because the relevant components (data streaming, batch analytics,
databases, machine learning) are each themselves complex distributed systems that are difficult to set up,
maintain and operate in a 24x7 environment. For enterprises with multiple teams or business units, this
means multiple deployments of each distributed system to cover different versions of the technology being
used and different phases of the software development lifecycle, significantly driving up costs for
infrastructure, engineering talent and time.
Open Source
Enterprises increasingly recognize that to be competitive they cannot afford to start from the ground floor
when building new systems. Many of the modern enterprise application components described above each
have their own ecosystem of open source projects and companies behind those technologies. Enterprises
that successfully use these technologies together dramatically accelerate their pace in developing new services.
[Table: modern enterprise app components: containers, databases, analytics, message queues, and file systems (NFS, HDFS)]
Deployment complexity: While monolithic apps can be launched and monitored by a single
administrator, there simply aren't enough admins to launch and monitor all of the decomposed
services needed by a modern app.
Partitioning: A large number of microservices means many partitions of machines within the
infrastructure, one for each service, and tracking which partitions are being used by which services.
Otherwise, it requires a new architecture that does not need partitions.
Extremely low utilization: Infrastructure is typically configured for peak expected demand for each
service, wasting hardware, power and cooling, and hardware administration time.
Maintaining high availability: There is a severe loss of capacity when services in static partitions
fail, and a labor-intensive process of working out which services need to be restarted or recovered.
Configuration and snowflaking: Many services used by the application are distributed systems, and
specific versions of each service may be used. The result is partitions that can only be used with one
application.
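The utilization problem with static partitions can be made concrete with a small worked example. The Python sketch below uses invented hourly demand figures for three hypothetical services; it compares sizing a separate partition for each service's peak against sizing one shared pool for the combined peak.

```python
from typing import Dict, List

# Hypothetical demand, in servers, for three services over six hours.
DEMAND: Dict[str, List[int]] = {
    "web": [10, 40, 80, 40, 10, 5],
    "analytics": [60, 20, 5, 5, 20, 60],
    "queue": [15, 15, 30, 15, 15, 10],
}


def static_partition_capacity(demand: Dict[str, List[int]]) -> int:
    """Each service gets its own partition, sized for that service's peak."""
    return sum(max(series) for series in demand.values())


def pooled_capacity(demand: Dict[str, List[int]]) -> int:
    """One shared pool, sized for the peak of the combined demand."""
    return max(sum(hour) for hour in zip(*demand.values()))


def average_utilization(demand: Dict[str, List[int]], capacity: int) -> float:
    """Fraction of provisioned server-hours that actually serve demand."""
    used = sum(sum(series) for series in demand.values())
    hours = len(next(iter(demand.values())))
    return used / (capacity * hours)


static = static_partition_capacity(DEMAND)  # 80 + 60 + 30 = 170 servers
pooled = pooled_capacity(DEMAND)            # busiest combined hour = 115 servers
print(static, round(average_utilization(DEMAND, static), 2))
print(pooled, round(average_utilization(DEMAND, pooled), 2))
```

With these made-up numbers, static partitions need 170 servers at roughly 45 percent average utilization, while a shared pool needs 115 servers at roughly 66 percent, because the services do not peak at the same time.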
Besides being slower than containers, traditional VM-based infrastructure clouds can't meet the
operational requirements of modern enterprise apps for two main reasons. First, running microservices in VMs
means a proliferation of VMs, overwhelming the administrators who need to manage them, even with the use of
configuration automation tools. Second, VMs effectively partition servers, but what's needed in this
new era is aggregation of the datacenter to run multiple distributed systems, each with its own logic for
resourcing and scheduling.
For Twitter, using modern enterprise apps powered by Apache Mesos (the core of datacenter operating
system technology) enabled the company to manage an unprecedented scale of users, while also improving
manageability, rolling out new services quickly, and running at higher utilization.
A user running an app on her PC does not care about which CPU core the application will run on, nor does she
care whether the app is taking up one or two cores. The PC's operating system manages this for her. The
Datacenter Operating System (DC/OS) model applies this basic principle to the full logical datacenter. There
are two key differences. First, the form factor that the user engages with is the complete logical datacenter (as
opposed to a PC). Second, apps running on DC/OS are not monolithic apps running on a single PC; they are
Modern Enterprise Apps made of distributed systems of stateless and stateful services.
Developers code against the logical datacenter, spending more time on application code and spending less
time on datacenter plumbing. The datacenter developer builds distributed systems that can dynamically
leverage all the resources available in the datacenter, with DC/OS handling actions like task management,
deployment, resource allocation, isolation and quality of service.
Operators in a DC/OS model run the logical datacenter using policies, and do not spend time managing
individual machines (physical or virtual), dramatically reducing time and effort. A key benefit for operators is
the ability to run at high levels of utilization, even as demand from multiple distributed services changes over
time.
But the largest benefit of the DC/OS model is perhaps reducing the technical skills hurdle for running modern
apps. With DC/OS, complex distributed systems like Spark, Kafka, Cassandra and many other services become
dramatically easier to install and operate. Single commands in a DC/OS UI can launch datacenter-wide
services, scale those services and maintain those services. DC/OS effectively applies prescriptive best
practices in running these services, based on the production operational experience of others.
Operating Model | Traditional Apps: Stateful, Non-distributed Applications | Modern Apps: Microservices | ... | ...
DC/OS | Yes | Yes | Yes | Yes
CaaS | Some static partitioning only | Yes | Some static partitioning, manual failure management | Some not elastic
IaaS | Yes | No | No | No
PaaS | No | Yes (stateless only) | No | No
Data Agility: Enabling applications and services that can capture value from ubiquitous data
Hyperscale Operations
Every company is a software company, and enterprises with the DC/OS model get hyperscale
infrastructure adaptable to new technologies, without requiring an engineering department with
the size or expertise of Google. With DC/OS, enterprises can rely on an open production-proven
platform, run a mix of traditional and modern apps, provide administrators the greatest flexibility and
empowerment, and use a platform that is at its core, future-ready.
The distributed systems kernel at the heart of DC/OS (Apache Mesos) has been production-proven for over five
years in the datacenters of web companies running at massive scale (on the order of tens of thousands of nodes).
DC/OS provides a highly available architecture with no single point of failure. DC/OS's native container
orchestrator (Marathon) has similarly been proven in production.
DC/OS enables a single flexible infrastructure for the full range of enterprise applications, from traditional to
modern enterprise apps. Two properties make this possible. First, DC/OS only requires that apps run on a
modern Linux distribution (with Windows support planned for the near future). Second, it has an extensible two-level
scheduler design. Two-level scheduling essentially allows stateful and distributed services to apply their own
business logic when requesting infrastructure capacity, while still protecting the quality of service for other
workloads. This approach covers today's distributed systems (e.g., Spark, Kafka, Cassandra) as well as those
yet to come, as new technologies can be adapted to run as DC/OS services.
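The two-level idea can be sketched as a toy simulation in Python. This is an illustration only, not the Mesos implementation: the class names, agents, tasks, and the simple first-come offer order are all assumptions made for the example. Level one (the allocator) owns the free resources and offers them out; level two (each framework) applies its own logic to decide which offers to use.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Resources:
    cpus: float
    mem_mb: int


@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int


class Framework:
    """Level 2: a service-specific scheduler that decides what to run on an offer."""

    def __init__(self, name: str, pending: List[Task]) -> None:
        self.name = name
        self.pending = list(pending)
        self.placed: Dict[str, str] = {}  # task name -> agent name

    def consider(self, agent: str, offer: Resources) -> List[Task]:
        """Pick pending tasks that fit the offer; an empty list declines it."""
        cpus, mem_mb = offer.cpus, offer.mem_mb
        accepted: List[Task] = []
        for task in list(self.pending):
            if task.cpus <= cpus and task.mem_mb <= mem_mb:
                accepted.append(task)
                self.pending.remove(task)
                self.placed[task.name] = agent
                cpus -= task.cpus
                mem_mb -= task.mem_mb
        return accepted


class Allocator:
    """Level 1: owns the free resources of every agent and offers them out."""

    def __init__(self, agents: Dict[str, Resources]) -> None:
        self.free = agents

    def offer_cycle(self, frameworks: List[Framework]) -> None:
        # Simplification: frameworks receive offers in list order. A real
        # allocator would apply a fairness policy when choosing who is offered what.
        for framework in frameworks:
            for agent, free in self.free.items():
                offer = Resources(free.cpus, free.mem_mb)
                for task in framework.consider(agent, offer):
                    free.cpus -= task.cpus
                    free.mem_mb -= task.mem_mb


allocator = Allocator({"agent-1": Resources(4.0, 8192), "agent-2": Resources(2.0, 4096)})
analytics = Framework("analytics", [Task("executor-1", 2.0, 4096), Task("executor-2", 2.0, 4096)])
queue = Framework("queue", [Task("broker-1", 2.0, 2048)])
allocator.offer_cycle([analytics, queue])
print(analytics.placed, queue.placed)
```

The design choice worth noting is the split of responsibility: the allocator never needs to understand what an analytics executor or a message broker is, and each framework never needs a global view of the datacenter. That separation is what lets new distributed systems be added without changing the core.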
A key benefit with the DC/OS model is running workloads with very different latency needs. For example,
DC/OS can run workloads with tight SLAs (e.g., real-time transaction processing) along with latency-tolerant
or less time-sensitive workloads (e.g., transcoding streaming media, batch analytics), driving extremely high
utilization while meeting workload service level requirements.
The most important benefit of DC/OS is empowering the operators and administrators responsible for running
the infrastructure. First, because DC/OS is an open source project backed by dozens of leading companies that use it in
production, enterprises have the flexibility of running their infrastructure on open source software, and the
confidence of knowing DC/OS applies the lessons learned from production experience. Second, DC/OS
enables operators to perform datacenter-wide actions with single commands (as opposed to managing
individual hosts, virtual machines, or containers). Third, DC/OS automates many of the responsibilities of IT
operators, including monitoring, application health checks, automatic recovery in case of failure, and applying
dynamic scaling policies. Most importantly, DC/OS supports non-disruptive upgrades to the infrastructure.
Lastly, DC/OS enables a single operating experience: from on-premises infrastructure to cloud-based
services, the operating experience is identical.
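To show what a declarative, datacenter-wide action might look like, the Python sketch below builds an app definition of the kind a container orchestrator such as Marathon accepts as JSON. The field names are intended to follow Marathon's app definition format and should be checked against the documentation for the version in use; the app id, image name, health path, and all numbers are hypothetical.

```python
import copy
import json
from typing import Any, Dict


def marathon_app(app_id: str, image: str, instances: int,
                 cpus: float, mem_mb: int) -> Dict[str, Any]:
    """Build an app definition as a plain dictionary (assumed field names)."""
    return {
        "id": app_id,
        "instances": instances,       # desired number of running copies
        "cpus": cpus,                 # CPU share per instance
        "mem": mem_mb,                # memory per instance, in MB
        "container": {
            "type": "DOCKER",
            "docker": {"image": image},
        },
        "healthChecks": [{
            "protocol": "HTTP",
            "path": "/health",        # hypothetical health endpoint
            "gracePeriodSeconds": 30,
            "intervalSeconds": 10,
            "maxConsecutiveFailures": 3,
        }],
    }


def scaled(app: Dict[str, Any], instances: int) -> Dict[str, Any]:
    """Return a copy of the definition with a new desired instance count."""
    updated = copy.deepcopy(app)
    updated["instances"] = instances
    return updated


app = marathon_app("/storefront/pricing", "example/pricing:1.0", 3, 0.5, 256)
print(json.dumps(scaled(app, 10), indent=2))
```

The operator states the desired outcome (ten healthy instances of this container) rather than choosing hosts; placement, restarts after failed health checks, and scaling are left to the platform.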
Developer Agility
Businesses using the DC/OS model dramatically improve time to value of new services in two ways.
First, by enabling teams to roll out new services quickly and continuously refine these services
across all relevant functional teams, an approach sometimes described as DevOps. Second, by enabling
teams to experiment with new technologies at will, so they can find the right services (e.g., a new open-source
message queue or analytics engine) to power their application.
DevOps is an operational and cultural model where teams work cross-functionally to ensure that, as new services
are developed, they are also operational in a predictable and maintainable way. Key enablers for this model
are automation tools for continuous integration and continuous delivery (CI/CD).
A common challenge for enterprises is scaling out CI/CD platforms and ensuring enough capacity to build the
code for testing prior to release. In many instances, different development teams have their own CI/CD cluster
because these platforms are often difficult to scale out to other teams, or teams use different versions of
toolchain components. One team may be running out of capacity to build code while other teams' clusters sit
idle. Under a DC/OS model, CI/CD platforms can be easily scaled with simple datacenter-scale commands, and
DC/OS's elastic scheduling capability enables dev teams to share resources through build-bursting.
Enterprises recognize that using open source technologies is key to developing new services quickly. A developer
may hear of a new open source technology at a meetup and decide to try it. She might spend several weeks
trying to understand the technology, find the right folks to set it up, research best practice implementations,
and configure it (provided she can find an environment) before she can use the technology
with her app. In a DC/OS model, popular open source services are part of the DC/OS Universe, or ecosystem.
Developers can use a service by installing it with a simple command against the datacenter, while isolation,
security and access controls ensure production systems are not impacted.
Data Agility
For enterprises, the difference between success and failure in the mobile-cloud era is the ability to
capture value from actionable insights; all enterprises are already collecting vast amounts of data.
Using DC/OS, big data becomes significantly easier to use and integrate with applications, the
infrastructure is more performant in processing big data, and developers have the flexibility to adopt the next
generation of big data services.
Modern enterprise apps need to ingest, analyze, and store data, and present insights to users or trigger
actions. Data scientists may also use some of the same services in analyzing data. In the traditional approach,
data engineers or developers might research a set of open source technologies and pull together relevant
experts, and arrive at a non-production configuration useful for R&D several months later, leaving
production readiness for the next phase. DC/OS enables data engineers and developers to install and run
these services with simple commands and no additional engineering work, as DC/OS services are built with best
practices. When it's time to build modern enterprise apps with these services, and later pass them to
operations, all can run on the same DC/OS platform, which runs both stateless and stateful services.
DC/OS infrastructure is more performant running big data due to elastic and fine-grained resource sharing.
Successful enterprises have been able to run their datacenters at over 95% utilization, prioritizing latency-sensitive production workloads, and running batch analytics with remaining capacity.
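The prioritization described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how DC/OS schedules work: capacity and demand are plain server counts, the numbers are invented, and batch work is assumed to be divisible and able to wait.

```python
from typing import List, Tuple


def fill_capacity(capacity: int, production_demand: List[int],
                  batch_backlog: int) -> List[Tuple[int, int]]:
    """Per interval, serve production first, then give leftover capacity to batch."""
    schedule: List[Tuple[int, int]] = []
    for demand in production_demand:
        production = min(demand, capacity)
        batch = min(capacity - production, batch_backlog)
        batch_backlog -= batch
        schedule.append((production, batch))
    return schedule


# Hypothetical cluster of 100 servers over three intervals.
schedule = fill_capacity(100, [50, 90, 70], 100)
utilization = [(production + batch) / 100 for production, batch in schedule]
print(schedule, utilization)
```

In this example production alone would use 50, 90, and 70 percent of the cluster; letting batch analytics absorb the remainder keeps every interval fully used without taking capacity away from the latency-sensitive work.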
Lastly, the DC/OS services model enables enterprises to easily deploy and use new distributed stateful
services. Apache Spark began as a project built on Mesos to demonstrate the power of Mesos. Other big data
services like Apache Flink and Apache Storm can also run as DC/OS services and benefit from resource pooling
and best-practice implementations.
Copyright Mesosphere, Inc. 2016
Mode 1 (Goal: Reliability): Best price for performance, with plan-based governance, a waterfall
delivery model, built using enterprise suppliers, resourced with talent good for conventional
processes and projects, with a culture that is IT-centric and removed from the customer
Mode 2 (Goal: Agility): Focused on revenue, brand, and customer experience, with governance that is
empirical and continuous, an agile delivery model, built using small (innovative) new vendors,
resourced with talent good for new and uncertain projects, with a culture that is business-centric and
close to the customer
While the modern enterprise app delivers agility and facilitates capturing business value, it is very likely that it
will also become the model for Mode 1 IT applications over time. Enterprises rolling out DC/OS today fall
into a range of adoption patterns.
The first is a greenfield model where the enterprise is rolling out customer-facing services based on the
modern enterprise app, and recognizes the need for a next generation infrastructure and operating model.
Here, modern enterprise apps are being developed and operated entirely on DC/OS environments.
In a second model, the enterprise is already running silos of big data and microservices environments, and
rolls out DC/OS in phases. First to transition to DC/OS are often the stateless microservices-based apps,
followed by analytics, followed by more complex stateful services. This enables engineering champions to
demonstrate the impact of DC/OS as they syndicate with additional business units or application teams.
The third and most conservative approach is one where workloads are transitioned to DC/OS, but running as
statically partitioned to start. The rationale here is to give comfort to stakeholders who want to know where
their workloads are running. The immediate benefit is a more scalable infrastructure, with the long term goal
of enabling elastic scaling when the organization is ready.
[3] Gartner - http://www.gartner.com/it-glossary/bimodal/
[4] McKinsey Business Technology. December 2014 - http://www.mckinsey.com/business-functions/business-technology/our-insights/a-two-speed-it-architecture-for-the-digital-enterprise
Conclusions
The modern enterprise application, composed of microservices, containers, and stateful big data services, is
key for enterprises to capture new value chains in the mobile-cloud era. The Datacenter Operating System
(DC/OS) model differs from IaaS, PaaS, or CaaS in that only DC/OS is fully capable of running all the
components of the modern enterprise app.
DC/OS is based on technologies that have been proven in production at scale, and engineered based on
established distributed systems production best practices. Mainstream enterprises using DC/OS gain from
these built-in best practices, and have the flexibility of using an open-source platform that lets their modern
enterprise apps run wherever they choose, on premises or in the cloud.
For enterprises moving towards microservices, containers, stateful services and open source software, DC/OS
reduces the skills and effort required to be successful. The result is broader adoption of these technologies
and faster capture of the impact these technologies deliver.