
Anitha R

1612010

1) List any 20 modern data analytic tools and explain any three of them with their applications
MODERN DATA ANALYTIC TOOLS:

Modern data analytic tools bring cost efficiency and better time management to data analysis tasks.

LIST OF 20 TOOLS:
The tools are:
● Cassandra.
● Hadoop.
● Plotly.
● Bokeh.
● Neo4j.
● Cloudera.
● OpenRefine.
● Storm
● KNIME
● OpenText
● Orange
● RapidMiner
● Pentaho
● Weka
● Gephi
● Datawrapper
● Infogram
● Semantria
● Opinion Crawl
● Octoparse

FEATURES OF SOME TOOLS

1)HPCC:

HPCC is a big data tool developed by LexisNexis Risk Solutions. It delivers a single platform, a single architecture and a single programming language for data processing.
Features:
● Highly efficient: accomplishes big data tasks with far less code
● Offers high redundancy and availability
● Can be used both for complex batch data processing on a Thor cluster and for high-performance online query applications on a Roxie cluster
● Graphical IDE simplifies development, testing and debugging
● Automatically optimizes code for parallel processing
● Provides enhanced scalability and performance
● ECL code compiles into optimized C++, and it can also be extended using C++ libraries

2)Storm:

Storm is a free and open-source big data computation system. It offers a distributed, real-time, fault-tolerant processing system with real-time computation capabilities.
Features:
● It is benchmarked at processing one million 100-byte messages per second per node
● It uses parallel calculations that run across a cluster of machines
● It will automatically restart in case a node dies; the worker will be restarted on another node
● Storm guarantees that each unit of data will be processed at least once or exactly once (see the sketch after this list)
● Once deployed, Storm is surely one of the easiest tools for big data analysis.
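To illustrate the "at least once" guarantee mentioned above, here is a minimal, Storm-independent Python sketch (this is not Storm's actual API): a message counts as acknowledged only when processing succeeds, and unacknowledged messages are redelivered, so a message may be processed more than once but is never lost.

```python
import queue

def at_least_once(pending: "queue.Queue[str]", process) -> None:
    """Redeliver any message whose processing fails; ack only on success."""
    while not pending.empty():
        msg = pending.get()
        try:
            process(msg)          # may raise on a simulated worker failure
        except Exception:
            pending.put(msg)      # not acked: requeue for redelivery
        # success acts as the "ack": the message is not requeued

# usage: messages survive one simulated failure and all get processed
q = queue.Queue()
for m in ["a", "b", "c"]:
    q.put(m)

failed_once = set()
def flaky(msg):
    # hypothetical handler that crashes once on message "b"
    if msg == "b" and msg not in failed_once:
        failed_once.add(msg)
        raise RuntimeError("simulated worker crash")
    print("processed", msg)

at_least_once(q, flaky)   # "b" is processed after one redelivery
```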

3)Cassandra:

The Apache Cassandra database is widely used today to provide effective management of large amounts of data.
Features:
● Supports replication across multiple data centers, providing lower latency for users (a minimal sketch follows this list)
● Data is automatically replicated to multiple nodes for fault tolerance
● It is most suitable for applications that can't afford to lose data, even when an entire data center is down
● Support contracts and services for Cassandra are available from third parties
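As a small illustration of the replication features above, here is a sketch using the DataStax cassandra-driver for Python; the node address, keyspace name, data-center name and replication factor are illustrative assumptions.

```python
from cassandra.cluster import Cluster  # pip install cassandra-driver

# connect to a local Cassandra node (address is an assumption)
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# replicate every row 3 times within the (hypothetical) data center "dc1";
# NetworkTopologyStrategy is what enables multi-data-center replication
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3}
""")
```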
4)Pentaho:

Pentaho provides big data tools to extract, prepare and blend data. It offers visualizations and analytics that change the way a business is run. This big data tool turns big data into big insights.
Features:
● Data access and integration for effective data visualization
● It empowers users to architect big data at the source and stream it for accurate analytics
● Seamlessly switch or combine data processing with in-cluster execution to get maximum processing power
● Allows checking data with easy access to analytics, including charts, visualizations, and reporting

● Supports a wide spectrum of big data sources by offering unique capabilities.


5)Cloudera:

Cloudera is the fastest, easiest and most secure modern big data platform. It allows anyone to get at any data across any environment within a single, scalable platform.
Features:
● High-performance analytics
● It offers provision for multi-cloud
● Deploy and manage Cloudera Enterprise across AWS, Microsoft Azure and Google Cloud Platform
● Spin up and terminate clusters, and pay only for what is needed, when it is needed
● Developing and training data models
● Reporting, exploring, and self-service business intelligence
● Delivering real-time insights for monitoring and detection.

2) Explain the evolution of analytic scalability.


The amount of data organizations process continues to increase. More data crosses the internet every second than was stored in the entire internet just 20 years ago. The old methods for handling such huge data won't work anymore. Hence we need technologies that can handle and tame big data.

Traditional Analytic Architecture

Traditional analytics collects data from heterogeneous data sources and pulls it all together into a separate analytics environment to do the analysis; this environment can be an analytical server or a personal computer with greater computing capability. A minimal sketch of this pattern is shown below.
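A minimal Python sketch of the traditional "pull the data to the analysis" pattern, assuming two illustrative local sources (the file names, table and column names are hypothetical): all the data is copied into one local environment before any analysis runs.

```python
import sqlite3
import pandas as pd

# pull data from two heterogeneous sources into one local environment
sales_csv = pd.read_csv("sales_2023.csv")                  # hypothetical file
with sqlite3.connect("erp.db") as conn:                    # hypothetical DB
    sales_db = pd.read_sql_query("SELECT region, amount FROM sales", conn)

# all analysis happens locally, on the consolidated copy
combined = pd.concat([sales_csv, sales_db], ignore_index=True)
print(combined.groupby("region")["amount"].sum())
```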
Modern In-Database Architecture
Data from heterogeneous sources is collected, transformed and loaded into a data warehouse for final analysis by decision makers. The processing stays in the database where the data has been consolidated, and the data is presented in aggregated form for querying. Queries from users are submitted to OLAP engines for execution, as in the sketch below.
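A minimal sketch of the in-database idea using Python's built-in sqlite3 (standing in for a real warehouse and OLAP engine; the table and columns are hypothetical): the aggregation runs inside the database, and only the small aggregated result leaves it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")          # stand-in for a data warehouse
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 120.0), ("south", 80.0), ("north", 40.0)])

# the aggregation is executed inside the database engine;
# the client receives only the aggregated result set
for region, total in conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"):
    print(region, total)
```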
Massively Parallel Processing (MPP)
Massively Parallel Processing is the "shared nothing" approach to parallel computing. It is a type of computing wherein the process is carried out by many CPUs working in parallel to execute a single program, as sketched below.
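A minimal Python sketch of the shared-nothing idea, using the standard multiprocessing module: each worker process owns its own partition of the data (nothing is shared), computes a partial result independently, and the partial results are combined at the end.

```python
from multiprocessing import Pool

def partial_sum(partition):
    # each worker owns its partition; no memory is shared between workers
    return sum(partition)

if __name__ == "__main__":
    # split the data into disjoint partitions, one per worker
    partitions = [range(0, 25), range(25, 50), range(50, 75), range(75, 100)]
    with Pool(processes=4) as pool:
        partials = pool.map(partial_sum, partitions)  # runs in parallel
    print(sum(partials))  # combine the independent partial results: 4950
```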
Cloud Computing
Cloud computing is the delivery of computing services over the Internet. Cloud
services allow individuals and businesses to use software and hardware that are
managed by third parties at remote locations. Examples of cloud services include
online file storage, social networking sites, webmail, and online business
applications.
Grid Computing
Grid computing is a form of distributed computing whereby a super virtual computer is composed of a cluster of networked, loosely coupled computers acting in concert to perform very large tasks.
Apache Hadoop
Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. The two main building blocks of this runtime environment are MapReduce and the Hadoop Distributed File System (HDFS). Hadoop MapReduce is a software framework for easily writing applications which process vast amounts of data; a minimal sketch follows.
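As a small illustration of MapReduce, here is the classic word-count pair written for Hadoop Streaming in Python (a minimal sketch: Streaming feeds lines on stdin, expects tab-separated key/value pairs on stdout, and delivers keys to the reducer in sorted order).

```python
# mapper.py: emit ("word", 1) for every word on stdin
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
# reducer.py: keys arrive sorted, so counts for one word are contiguous
import sys

current, count = None, 0
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    word, n = line.rsplit("\t", 1)
    if word != current:
        if current is not None:
            print(f"{current}\t{count}")
        current, count = word, 0
    count += int(n)
if current is not None:
    print(f"{current}\t{count}")
```

The pair can be tested locally, without a cluster, with `cat input.txt | python mapper.py | sort | python reducer.py`; on a cluster it would be submitted through the Hadoop Streaming jar (the jar path and job options are installation-specific).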
