
BIG DATA: FROM DATA TO DECISIONS, QUEENSLAND UNIVERSITY OF TECHNOLOGY


© QUT 2016

The big data ecosystem


The big data ecosystem includes many libraries and frameworks that
interoperate with each other.

To be a big data ninja you need to know many different technologies and
how they work together to create platforms that target your specific use
cases. Platforms are built from libraries and frameworks. Libraries
usually solve specific problems; for instance, applying neural-network
methods to your data. Frameworks integrate various libraries to provide
broader functionality. Here are a few examples:

Frameworks: Hadoop Ecosystem, Apache Spark, Apache Storm, Apache Pig,
Facebook Presto

Patterns: MapReduce, Actor Model, Data Pipeline

Platforms: Cloudera, Pivotal, Amazon Redshift, Hortonworks, IBM,
Google Compute Engine
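
To make one of the patterns listed above more concrete, here is a minimal
single-machine sketch of MapReduce in plain Python. It is not Hadoop's API
and it does no distribution at all; the function names (map_phase,
shuffle_phase, reduce_phase, word_count) and the sample sentences are
assumptions invented for this illustration. It only shows the shape of the
pattern: emit key/value pairs, group them by key, then reduce each group.

```python
from collections import defaultdict


def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Chain the three phases; a real framework would spread them over a cluster."""
    return reduce_phase(shuffle_phase(map_phase(documents)))


if __name__ == "__main__":
    # Hypothetical sample input, made up for this sketch.
    docs = ["big data big decisions", "data to decisions"]
    print(word_count(docs))
```

In a framework such as Hadoop, the map and reduce steps would run in
parallel on many machines and the grouping step would move data between
them; the sketch keeps everything in one process so the three stages are
easy to see.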

The Hadoop ecosystem is complemented and surrounded by many different
tools. Some of them are covered in our series of courses, such as:

Apache Mahout: a scalable machine learning and data mining library

Apache Pig: a high-level data-flow language and execution framework
for parallel computation

Apache Spark: a fast and general compute engine for Hadoop data.
Spark provides a simple and expressive programming model that
supports a wide range of applications, including Extract, Transform and
Load (ETL), machine learning, stream processing, and graph
computation.

Where to go for information


The company website is a good place to start when you need detailed
information about a particular tool. The websites we’ve listed below
provide user documentation and other sources of support. We’ve also
compiled a list of recommended books for you. These are books you’re
likely to find on a big data ninja’s bookshelf that you might like to borrow
from your library.

© QUT 2016. All rights reserved. CRICOS No. 00213J


DOWNLOADS

RECOMMENDED BOOKS (PDF)

SEE ALSO

APACHE HADOOP

Apache Hadoop is an open-source framework that enables the distributed storage and processing
of very large datasets.

APACHE MAHOUT

Apache Mahout is an environment for creating scalable machine-learning algorithms.

APACHE PIG

Apache Pig is a platform for analysing large datasets using MapReduce programs with Hadoop.

APACHE SPARK

Apache Spark is an open-source, cluster computing framework.

CLOUDERA

See what industry is doing with big data frameworks and the Hadoop ecosystem.

HADOOP HDFS AND MAPREDUCE

A brief overview of Hadoop HDFS and MapReduce for beginners, with a pointer to the ‘Dummies’ series
on Big Data.
