This eBook gives an overview of what big data is and its growing importance. It talks about some of the different kinds of big data, as well as some of the different things you would do with it.

The functional section of this book discusses applications, tools, managed services and clouds, used together or separately, that will help you benefit most from big data.

You can skip directly to any section and focus on what’s most important to you, or read the book straight through.

Canonical is involved in Big Data
Canonical, the company behind Ubuntu, works closely with its partners on all aspects of infrastructure and partner solutions to support storing, managing, and analysing big data.
Big Data Explained, Analysed, Solved
The Author
Bill Bauman
Strategy & Content, Canonical
Contents
Predefined datasets
datasets represent.
Rapid growth
The increasing importance of Big Data
Collect
Organisations of all sizes and functions are increasingly gathering more information about their interactions and transactions. They are also looking to third parties to provide additional data. Regardless of how they gather data, the types and quantities are increasing. In a modern, data-driven world, an organisation that isn’t taking advantage of big data collection, analytics, and action is likely to become uncompetitive with those that are.

Analyse
The analysis of big data can have big returns. The ability to understand the types of data that are collected, to correlate one type of data with another, to observe trends, to identify outliers, and to perform many other analytic functions is increasingly valuable in organisations of all types.

Without thorough analysis using modern big data analytics tools, it can be easy to miss or overlook important trends, shifts in perspective, or subtle changes in customer interaction. Through analysis, you can learn patterns and predict actions before they occur and even begin to direct them via actions discussed here in the Act section.

Act
The ability to do something with the data that is collected and analysed is the most compelling part of big data. Corporations can offer more compelling products and solutions. Governments can better predict and serve the needs of citizens. Even small businesses can identify short- and long-term trends in their sales and interactions with customers, as well as other businesses. All of these outcomes are about improved efficiencies and experiences for everyone involved, from the provider to the consumer.
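
As a rough illustration of two of the analytic functions mentioned above, correlating one type of data with another and identifying outliers, here is a minimal Python sketch. It uses only the standard library; the function names and the sample figures (site visits, daily sales, order values) are invented for this example and do not come from any real dataset or from a Canonical tool.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def outliers(values, threshold=2.0):
    """Return (index, value) pairs more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical sample data, for illustration only.
site_visits = [120, 135, 150, 160, 175, 190, 205]
daily_sales = [12, 14, 15, 17, 18, 20, 21]
order_values = [40, 42, 39, 41, 43, 40, 250, 38]

visits_vs_sales = pearson(site_visits, daily_sales)  # close to 1: the series rise together
unusual_orders = outliers(order_values)              # flags the single 250 order
```

At big data scale the same ideas would normally run inside a distributed analytics tool rather than in a single script; the sketch only shows what the calculations mean.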
Big data is most commonly associated with unstructured data. Unstructured data, like photos and IoT datasets, was largely the genesis of modern big data.

Every day, millions of Internet users post pictures, videos, short messages, audio, and more. Much of this data is completely unassociated with a category or field. Essentially, it is completely unstructured, and it is the function of targeted big data applications to aggregate, cull, present, and analyse these datasets.

This is generally the data that is generated without specific intent or interaction from users. For example, cell phones are perpetually updating GPS coordinates of their users’ respective locations. Logistics information, bar code scans, and delivery information are all data that are passively updated but can provide valuable insights when analysed.
Even though big data was born in the cloud, it doesn’t mean you need a cloud to take advantage of big data solutions or to act on the data. The most important aspects of working with big data are that you have chosen the right tools and the right applications for your solution. Canonical can help you with both.

Canonical has created an open source solution for system design and service modeling called Juju. Juju simplifies the process of designing your solution, then configuring, associating, and deploying the applications in it. Having a tool like Juju means that selecting the right big data applications for your needs is the most important remaining factor.

For more information on Juju, see section Design, deploy, package Big Data solutions.

Although it isn’t necessary, a cloud can be tremendously beneficial to big data processing. The nature of big data is that it is constantly changing, and the purpose of that data, the analysis of that data, and the storage of that data can change just as quickly. A tool like Juju can help you keep up with the change in usage by deploying new big data charmed solutions. But Juju can’t do it all.

For system scalability and the ability to easily access different types of storage for different needs, a cloud is recommended. Juju can talk directly to both public and private cloud solutions, like AWS and Canonical OpenStack, respectively.

For more on building your own private cloud, see the sections OpenStack is a Big Data warehouse and BootStack for Big Data later in this eBook.
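
To make the idea of service modeling a little more concrete, the following Python sketch models the flow described above: design a solution as a set of applications, configure them, associate them, and produce a list of deployment steps. It is a conceptual toy only. The `ServiceModel` class, its methods, and the application names are hypothetical and are not Juju’s API or command set.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceModel:
    """Toy service model: applications plus the relations that associate them."""

    applications: dict = field(default_factory=dict)  # name -> configuration
    relations: list = field(default_factory=list)     # (consumer, provider) pairs

    def add_application(self, name, **config):
        """Design and configure: register an application with its settings."""
        self.applications[name] = dict(config)

    def relate(self, consumer, provider):
        """Associate: record that `consumer` uses a service offered by `provider`."""
        for name in (consumer, provider):
            if name not in self.applications:
                raise KeyError(f"unknown application: {name}")
        self.relations.append((consumer, provider))

    def plan(self):
        """Deploy: list every application first, then every relation."""
        steps = [("deploy", name, config) for name, config in self.applications.items()]
        steps += [("relate", consumer, provider) for consumer, provider in self.relations]
        return steps


# Hypothetical two-application solution.
model = ServiceModel()
model.add_application("storage", replicas=3)
model.add_application("analytics")
model.relate("analytics", "storage")
steps = model.plan()
```

The point of the sketch is that the whole solution is described once, as data, so it can be changed and redeployed when the purpose of the data changes.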
• Pig
• Flume
• Kafka
• Tez
• Storm
• Hue

• Docker
• Kubernetes
• Mesos
Ubuntu Server is the most popular cloud operating system in use. There are many reasons why Ubuntu is so popular, but one of the primary reasons is that Canonical started to focus on OS scalability many years ago. When you’re working with big data, you need a cloud-ready platform, like Ubuntu, that is designed for scalability and reliability.

Ubuntu Server can be used as a traditional operating system. There are also optimised variants for low latency and other task-specific solutions, like big data processing.

Ubuntu allows you to process your big data anywhere. Keep sensitive information in-house, leverage the public cloud for unpredictable workloads, and trusted private cloud partners for both.

Where Ubuntu runs:
• On-premises, in your own cloud
• In an external, private cloud
• Private cloud guest instance
• Container on bare metal
• Container as a virtual machine
• Container as a cloud instance
The section Do I need a cloud for Big Data in this book addresses some of the benefits of clouds for big data. Specifically, an OpenStack cloud is the most popular private cloud solution for big data.

OpenStack is a community-based private cloud solution. It is not a single product, but a collection of individual projects designed to seamlessly interact to create a functional cloud. Canonical OpenStack is a production-ready, supported OpenStack distribution, and more.

The best way to build an OpenStack cloud is using Autopilot. Autopilot is a graphical installation tool that allows you to select the components of OpenStack you would like to install and deploys them for you. It can even deploy them with high availability.

Autopilot is designed to work with an extended tool set beyond just OpenStack. MAAS (Metal as a Service) automates the configuration of the physical nodes in your OpenStack environment. Juju, discussed further in the Design, deploy, package Big Data solutions section of this eBook, allows you to automatically deploy applications and their respective relationships within your OpenStack cloud. Landscape manages the Autopilot experience, as well as the cloud itself and the guest instances within it.

The comprehensive tool set that comes with a Canonical OpenStack cloud makes it easier, faster, and more robust to deploy big data solutions, from the bare metal to the platform operating system to the applications themselves.

The base platform of Canonical OpenStack is Ubuntu. Ubuntu is not only the most popular cloud operating system, it is also the most popular OpenStack infrastructure operating system. Ubuntu runs on the OpenStack physical nodes, providing critical services like compute, networking, and storage. It is also the platform for your guest instances, whether they are LXD machine containers or virtual machines, where you run your big data applications.

Combining OpenStack with Canonical’s feature-rich tools and Ubuntu creates a scalable, reliable, automated platform for deploying and managing big data solutions for any type of analytics, monitoring, and more. Canonical even guarantees upgradability of your OpenStack Big Data cloud.
BootStack is a unique, managed Canonical OpenStack offering. It is unique in that you may choose to run the solution in your own datacenter, on your own hardware, or in a third-party hosted facility, like IBM SoftLayer, an Ubuntu Certified Public Cloud partner.

Canonical’s engineers have years of OpenStack experience. With BootStack, you can leverage their know-how and best practices and have a Canonical OpenStack cloud ready for big data processing in days.

With BootStack, you focus on the data, and Canonical takes care of the infrastructure. Additionally, when you want, Canonical can transfer total control of your OpenStack environment to you.

All of the tools that make Canonical OpenStack the platform of choice for big data are included in BootStack. Even better, they can be preconfigured for you and ready for use. As soon as your BootStack cloud is ready, you can start using all the big data solutions in the Juju Charm Store. You’ll find the core big data solutions you expect and can even start discovering new big data solutions from all our Charm partners.

BootStack is billed on a pay-for-use model. The model is similar to that of Ubuntu Advantage Storage. These unique and innovative price models are part of the initiative to make private cloud usage and consumption as easy to calculate and predict as that of public clouds.

Whether you just want to try it out, don’t have the in-house skills, or want to get up and running quickly, BootStack can provide the answer to a big data cloud. To learn more about BootStack, and use the BootStack calculator to calculate potential savings, visit the BootStack managed cloud page.
Ubuntu Advantage Storage is a unique and ideal storage solution for big data storage and real-time processing. It is based on Software Defined Storage (SDS) solutions, allowing for flexibility and modern data management approaches.

Choose the right technology
Ceph, NexentaEdge, Swift and SwiftStack are all supported by Ubuntu Advantage Storage. That means you choose the right technology for your solution, and it is all directly supported by Canonical. The hardware you choose to run the solution on is just as important, and Canonical’s partners and engineers can help you with that, as well.

Pay for what you use
Another unique feature of Ubuntu Advantage Storage is its pay-for-use, metered model. As opposed to paying for all the storage in your datacenter, you just pay for the storage that’s actively in use. Additionally, you don’t pay for replicas or online backups. The cost savings compared to other SDS-based and managed storage solutions can be 2x to 3x, or even more.

[Figure: three capacity diagrams, each marking total capacity, unused capacity, used capacity, and what you pay for. Panel labels: Your Content, New Storage, Redundant Data. Captions: "Grow your capacity, without growing your bill" and "Increase your redundancy, pay the same!"]

The pay-for-use model of Ubuntu Advantage Storage is similar to that of our managed OpenStack solution, BootStack. These unique and innovative price models are part of the initiative to make private cloud usage and consumption as easy to calculate and predict as that of public clouds.
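
The difference between paying for content in use and paying for everything installed can be shown with simple arithmetic. The Python sketch below compares the capacity that would be billed under the metered description above (content only, with replicas and unused capacity excluded) with the total raw capacity in the datacenter. The function names and the sample figures are assumptions made for illustration. The sketch counts terabytes only and says nothing about actual prices, which are not given in this book.

```python
def metered_billable_tb(content_tb):
    """Metered model as described above: only the content actively in use is billed."""
    return content_tb


def raw_capacity_tb(content_tb, copies, unused_tb):
    """Total installed capacity: every copy of the content plus unused space.

    `copies` is the total number of stored copies, including the original.
    """
    return content_tb * copies + unused_tb


# Hypothetical deployment: 100 TB of content, 3 copies, 50 TB unused.
billed = metered_billable_tb(100)          # 100 TB billed
installed = raw_capacity_tb(100, 3, 50)    # 350 TB installed

# Growing capacity or adding redundancy changes what is installed,
# but not what is billed under the metered description.
installed_more_space = raw_capacity_tb(100, 3, 150)   # 450 TB installed
installed_more_copies = raw_capacity_tb(100, 4, 50)   # 450 TB installed
billed_after_growth = metered_billable_tb(100)        # still 100 TB billed
```

This mirrors the two figure captions: adding capacity or redundancy raises the installed total while the billed amount stays tied to the content.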
Machine containers are a relatively new technology in the virtualisation ecosystem. Delivered by Ubuntu as a technology called LXD, they provide the management of traditional virtual machines without the system overhead.

Many big data solutions execute optimally when run at bare metal speed. That can limit the use of virtualisation, though, and restrict system placement. By using LXD, multiple services can share a single system and all have direct hardware access.

LXD isn’t just about performance. There are big data workloads that run in public clouds as guest instances. Almost all of those instances are virtual machines. One of the benefits of LXD machine containers is that they provide process isolation and application mobility (live migration) to running processes. That means increased manageability for public cloud instances, as well as bare metal and private cloud solutions.
Canonical as a strategic partner for Big Data
Working with Canonical as your valued partner will maximise your success with big data.
Some attributes to keep in mind and that Canonical delivers are:
Conclusion
There are many kinds of big data.

There are many big data applications, services, and solutions.

Canonical has domain expertise, understands big data, has strong industry partnerships, and can provide a scalable, supported solution.

Your best next step is to contact Canonical today.

Your data is important. You need to know how to store, process, and act on your data. The overview, explanations, and solutions outlined in this book will get you started or accelerate your journey to maximising the benefits of the data you have and the new data you will start collecting.

If you’re excited to hear more and talk to us directly, you can reach us on our Contact Us page.

To learn more about a managed solution for big data, download the paper BootStack Your Big Data Cloud.

If you want to start trying things out immediately, we highly encourage you to visit Juju solutions for big data.
About Canonical