1612010
1) List any 20 modern data analytic tools and explain any three of them
with their applications
MODERN DATA ANALYTIC TOOLS:
These are tools that bring cost efficiency and better time management to data
analytics tasks.
LIST OF 20 TOOLS:
The tools are:
● Cassandra
● Hadoop
● Plotly
● Bokeh
● Neo4j
● Cloudera
● OpenRefine
● Storm
● KNIME
● OpenText
● Orange
● RapidMiner
● Pentaho
● Weka
● Gephi
● Datawrapper
● Infogram
● Semantria
● Opinion Crawl
● Octoparse
1)HPCC:
Anitha R
1612010
2)Storm:
Storm is a free and open source big data computation system. It offers a
distributed, fault-tolerant processing system with real-time computation
capabilities.
Features:
● It has been benchmarked at processing one million 100-byte messages per
second per node
● It uses parallel calculations that run across a cluster of machines
● If a node dies, the affected worker is automatically restarted on another
node
● Storm guarantees that each unit of data will be processed at least once or
exactly once
● Once deployed, Storm is among the easier tools to operate for big data
analysis
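The "at least once" guarantee listed above can be illustrated with a small simulation. This is a minimal sketch in plain Python and does not use Storm's API: the names process_at_least_once and make_flaky_upper are hypothetical, and a raised exception stands in for a worker failure. A message that fails is put back in the queue and replayed until its handler succeeds.

```python
from collections import deque


def process_at_least_once(messages, handler, max_attempts=3):
    """Deliver every message to handler until it succeeds (is "acked").

    Returns (results, attempts), where attempts maps each message index
    to the number of times that message was delivered.
    """
    pending = deque(enumerate(messages))
    attempts = {}
    results = {}
    while pending:
        idx, msg = pending.popleft()
        attempts[idx] = attempts.get(idx, 0) + 1
        try:
            results[idx] = handler(msg)  # a normal return acts as the ack
        except Exception:
            if attempts[idx] >= max_attempts:
                raise  # give up after too many replays
            pending.append((idx, msg))  # failure: replay the message later
    return [results[i] for i in range(len(messages))], attempts


def make_flaky_upper(fail_first_on):
    """Hypothetical handler that fails once on one chosen message."""
    seen = set()

    def handler(msg):
        if msg == fail_first_on and msg not in seen:
            seen.add(msg)
            raise RuntimeError("simulated worker failure")
        return msg.upper()

    return handler


results, attempts = process_at_least_once(["a", "b", "c"], make_flaky_upper("b"))
print(results)   # every message was processed
print(attempts)  # the failed message was delivered more than once
```

Because a replayed message may be delivered more than once, handlers in an at-least-once system are usually written so that repeated processing is harmless.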
3)Pentaho:
Pentaho provides big data tools to extract, prepare and blend data. It offers
visualizations and analytics intended to change the way a business is run. This
big data tool aims to turn big data into big insights.
Features:
● Data access and integration for effective data visualization
● It lets users architect big data at the source and stream it for
accurate analytics
● It can seamlessly switch or combine data processing with in-cluster
execution to get maximum processing
● It allows checking data with easy access to analytics, including charts,
visualizations, and reporting
4)Cloudera:
Cloudera is presented as a fast, easy-to-use and highly secure modern big data
platform. It allows anyone to get any data across any environment within a
single, scalable platform.
Features:
● High-performance analytics
● It supports multi-cloud deployment
● Deploy and manage Cloudera Enterprise across AWS, Microsoft Azure and
Google Cloud Platform
● Spin up and terminate clusters, and pay only for what is needed, when it is
needed
● Developing and training data models
● Reporting, exploring, and self-servicing business intelligence
● Delivering real-time insights for monitoring and detection.
form for querying. Queries from users are submitted to OLAP engines for
execution.
Massively Parallel Processing (MPP)
Massively Parallel Processing is the "shared nothing" approach to parallel
computing. It is a type of computing in which many CPUs, each with its own
memory and storage, work in parallel to execute a single program.
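The shared-nothing idea can be sketched as a partitioned aggregation. This is a minimal illustration, not the API of any MPP product: the function names and the sample sales rows are made up for the example, and the "nodes" run one after another in a single process to keep the sketch simple. Each node sees only its own partition, and a coordinator merges the partial results.

```python
import zlib
from collections import Counter


def node_for(key, num_nodes):
    """Pick the node that owns a key, using a stable hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def partition_rows(rows, num_nodes):
    """Distribute (region, amount) rows so each node holds its own slice."""
    partitions = [[] for _ in range(num_nodes)]
    for region, amount in rows:
        partitions[node_for(region, num_nodes)].append((region, amount))
    return partitions


def local_totals(partition):
    """Work done by one node, using only the data in its partition."""
    totals = Counter()
    for region, amount in partition:
        totals[region] += amount
    return totals


def mpp_sum_by_region(rows, num_nodes=3):
    """Coordinator: partition the rows, run each node, merge partial sums."""
    merged = Counter()
    for partition in partition_rows(rows, num_nodes):
        merged.update(local_totals(partition))
    return dict(merged)


sales = [("north", 10), ("south", 5), ("north", 7), ("east", 3), ("south", 1)]
totals = mpp_sum_by_region(sales)
print(totals)
```

In a real MPP system the per-node step would run simultaneously on separate machines; the partitioning and merge steps are what let those machines work without sharing memory or disk.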
Cloud Computing
Cloud computing is the delivery of computing services over the Internet. Cloud
services allow individuals and businesses to use software and hardware that are
managed by third parties at remote locations. Examples of cloud services include
online file storage, social networking sites, webmail, and online business
applications.
Grid Computing
Grid computing is a form of distributed computing in which a virtual
supercomputer is composed of a cluster of networked, loosely coupled computers
acting in concert to perform very large tasks.
Apache Hadoop
Hadoop is an open-source software framework for the storage and large-scale
processing of data sets on clusters of commodity hardware. The two main building
blocks inside this runtime environment are MapReduce and the Hadoop Distributed
File System (HDFS). Hadoop MapReduce is a software framework for easily
writing applications that process vast amounts of data.
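The MapReduce programming model can be sketched with a word count. This is a minimal single-machine illustration in plain Python, not Hadoop's API: the function names are chosen for this example, and on a real cluster the map and reduce steps would run in parallel over data stored in HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run map, shuffle and reduce over a list of documents."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


counts = word_count(["big data tools", "Big data insights"])
print(counts)
```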