
Introduction to Big Data Analytics

Analytics is an encompassing and multidimensional field. It uses mathematics,
statistics, predictive modeling and machine-learning techniques to find meaningful
patterns and knowledge in recorded data.

What is Big Data Analytics?

Big data analytics examines large amounts of data to uncover hidden patterns,
correlations and other insights. With today's technology, it is possible to analyze
your data and get answers from it almost immediately. Big data analytics helps you to
understand your organization better, and with it you can make informed decisions
instead of relying blindly on guesswork.

Big data analytics is the often complex process of examining large and varied data sets,
or big data, to uncover information including hidden patterns, unknown correlations,
market trends and customer preferences that can help organizations make informed
business decisions.

The term “big data” refers to data sets that become so large that they cannot be
processed using conventional methods. The size at which data counts as big data is a
constantly moving target, and newer tools are continuously being developed to handle it.
Big data is changing our world and shows no signs of being a passing fad that will
disappear in the near future. To make sense of this overwhelming amount of data, big
data is often characterized in terms of five V's: Velocity, Volume, Value, Variety, and
Veracity.

Velocity

Velocity refers to the speed at which vast amounts of data are generated, collected and
analyzed. Every day the number of emails, Twitter messages, photos, video clips and so
on increases at lightning speed around the world; data grows every second of every day.
Not only must this data be analyzed, but the speed of transmission and access to the
data must also remain near-instantaneous to allow real-time uses such as website access,
credit card verification and instant messaging. Big data technology now allows us to
analyze data while it is being generated, without ever putting it into a database.
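
As a minimal sketch of this idea, the snippet below keeps a running aggregate over a
simulated stream of events instead of storing the raw records first. The event source
and field names are invented for illustration; a real feed would come from a message
queue, log tail or socket.

```python
# Sketch: analyse events as they arrive, without first loading them into a database.
import random
import time


def event_stream(n_events=10):
    """Simulate a continuous feed of events (hypothetical source)."""
    for _ in range(n_events):
        yield {"user": random.randint(1, 5), "latency_ms": random.uniform(5, 50)}
        time.sleep(0.01)  # events keep arriving over time


def running_average(stream):
    """Update an aggregate per event instead of storing the raw data."""
    count, total = 0, 0.0
    for event in stream:
        count += 1
        total += event["latency_ms"]
        yield total / count  # current average, available in real time


if __name__ == "__main__":
    for avg in running_average(event_stream()):
        print(f"average latency so far: {avg:.1f} ms")
```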

Volume

Volume refers to the incredible amounts of data generated each second from social
media, cell phones, cars, credit cards, M2M sensors, photographs, video and so on. These
amounts have become so large that we can no longer store and analyze the data using
traditional database technology; instead we use distributed systems, where parts of the
data are stored in different locations and brought together by software. On Facebook
alone there are 10 billion messages, 4.5 billion presses of the “like” button, and over
350 million new pictures uploaded every day. Collecting and analyzing this data is
clearly an engineering challenge of vast proportions.
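
The sketch below is a toy, in-process illustration of the split-the-data, process-the-parts,
combine-the-results idea behind distributed processing. Here the "nodes" are just local
worker processes and the input is a tiny made-up text; a real system such as Hadoop or
Spark spreads the chunks across many machines.

```python
# Toy map/reduce-style word count over chunks processed independently.
from collections import Counter
from multiprocessing import Pool


def count_words(chunk_of_lines):
    """Map step: each worker counts words in its own chunk."""
    counts = Counter()
    for line in chunk_of_lines:
        counts.update(line.lower().split())
    return counts


def merge(counters):
    """Reduce step: software brings the partial results back together."""
    total = Counter()
    for c in counters:
        total.update(c)
    return total


if __name__ == "__main__":
    lines = ["big data is big", "data about data", "big volume of data"]
    chunks = [lines[i::2] for i in range(2)]        # split across two "locations"
    with Pool(2) as pool:
        partial = pool.map(count_words, chunks)     # process the parts independently
    print(merge(partial).most_common(3))
```
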
Value

When we talk about value, we are referring to the worth of the data being
extracted. Having endless amounts of data is one thing, but unless it can be turned into
value it is useless. While there is a clear link between data and insights, this does not
always mean there is value in big data. The most important part of embarking on a big
data initiative is to understand the costs and benefits of collecting and analyzing the
data, to ensure that the data that is gathered can ultimately be monetized.

Variety

Variety refers to the different types of data we can now use. Data today looks very
different from data in the past. We no longer have only structured data (names, phone
numbers, addresses, financials, etc.) that fits neatly into a data table. Much of today's
data is unstructured; in fact, an estimated 80% of all the world's data fits into this
category, including photos, video sequences, social media updates and more. New big data
technology now allows structured and unstructured data to be harvested, stored, and used
together.
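
As a small illustration, the sketch below keeps structured records and free-text posts
side by side and works with both in the same pass. The records and field names are
invented for the example.

```python
# Handling structured rows and unstructured text together.
import json

records = [
    {"type": "structured", "name": "A. Diallo", "phone": "555-0101", "balance": 120.5},
    {"type": "unstructured", "text": "Loved the new release! Great photos"},
    {"type": "structured", "name": "B. Chen", "phone": "555-0102", "balance": 89.0},
]

# Structured part: fields fit neatly into columns and can be aggregated directly.
total_balance = sum(r["balance"] for r in records if r["type"] == "structured")

# Unstructured part: free text has to be searched or tagged instead.
posts_about_photos = [r["text"] for r in records
                      if r["type"] == "unstructured" and "photos" in r["text"].lower()]

print(json.dumps({"total_balance": total_balance,
                  "posts_about_photos": posts_about_photos}, indent=2))
```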

Veracity

Last, but certainly not least, there is veracity: the quality or trustworthiness of the
data. Just how accurate is all this data? Think, for example, of Twitter posts with
hashtags, abbreviations and typos, and the reliability and accuracy of that content.
Gathering loads of data is of little use if its quality or trustworthiness cannot be
relied on. Another good example relates to GPS data. A GPS fix will often “drift” off
course as you move through an urban area, because satellite signals are lost as they
bounce off tall buildings and other structures. When this happens, the location data has
to be fused with another data source, such as road data or data from an accelerometer,
to produce an accurate position.
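
The snippet below is a minimal sketch of that fusion idea: each drifting GPS fix is
blended with a position predicted from a second source (here a simple dead-reckoning
estimate based on an assumed velocity). The weights and readings are illustrative only,
not a real navigation filter.

```python
# Blend noisy GPS fixes with a prediction carried forward from the last estimate.
def fuse(gps_position, predicted_position, gps_trust=0.3):
    """Weighted average: trust the GPS less when it is known to drift."""
    return gps_trust * gps_position + (1 - gps_trust) * predicted_position


def track(gps_readings, velocity, dt=1.0, start=0.0):
    """Fuse each GPS fix with a position predicted from the previous estimate."""
    estimate = start
    for gps in gps_readings:
        predicted = estimate + velocity * dt   # where we expect to be
        estimate = fuse(gps, predicted)        # corrected by the (noisy) GPS fix
        yield estimate


if __name__ == "__main__":
    noisy_gps = [1.2, 1.8, 3.5, 3.9, 5.4]      # drifting fixes along a street
    for pos in track(noisy_gps, velocity=1.0):
        print(f"fused position: {pos:.2f}")
```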

Challenges and limitations of big data analytics


Data has become an essential part of every economy, industry, organization and
individual. The amount of data generated daily on the internet is not only growing
rapidly but also becoming more complex, and this has given rise to big data. Big data is
a term for massive data sets that have a complex structure and cannot be handled by
standard software. Big data is challenging in terms of effective storage, effective
computation and analysis.

Data that originates from a single source has a smaller volume, and storing, managing
and analysing that kind of data does not present great challenges; most processing is
done through a database or data warehouse. In a big data environment, however, the data
is of large volume and largely unstructured, so it cannot be processed through a
conventional database. The main problems of big data analytics are discussed below.

1. Huge data sources and poor data quality: Big data is characterized by heterogeneous
data sources such as images, videos and audio. Traditional methods for describing the
structure of data are not suitable for big data. Big data is also affected by noise, and
the data may be inconsistent. Efficient storage and processing are therefore
prerequisites for big data.

2. Efficient storage of big data: The way big data is stored affects not only cost but
also analysis and processing. To meet service and analysis requirements, reliable,
high-performance, highly available and low-cost storage needs to be developed. Because
data comes from different sources, it may contain redundancy; detecting and eliminating
redundant copies can reduce the storage space required.

3. Efficiently processing unstructured and semi-structured data: Databases and
warehouses are unsatisfactory for processing unstructured and semi-structured data. With
big data, read/write operations are highly concurrent across large numbers of users, and
as the size of the database grows, traditional algorithms may no longer be sufficient.
The CAP theorem states that it is impossible for a distributed system to guarantee
consistency, availability and partition tolerance all at once; only two of the three can
be chosen (Fig. 2 illustrates the CAP theorem). Because consistency is usually required,
a trade-off has to be made between availability and partition tolerance.

4. Mass data mining: Research has shown that the larger the data set, the more accurate
a machine learning algorithm tends to be. Most traditional data mining algorithms become
impractical on big data sets: when the size of the data reaches petabytes, a serial
algorithm may fail to complete within an acceptable time frame. Effective machine
learning and data mining algorithms for this scale still need to be developed.
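
One common workaround, sketched below, is to process the data a bounded chunk at a time
and accumulate a global statistic incrementally rather than running one serial pass over
the whole set. The read_chunks() function is a hypothetical stand-in for reading pieces
of a data set from files or a cluster; each chunk could equally be handed to a separate
worker to run in parallel.

```python
# Chunk-at-a-time computation: no single step needs the full data set in memory.
def read_chunks():
    """Yield manageable pieces of a data set that is too large to hold at once."""
    yield [3.0, 5.0, 4.0]
    yield [6.0, 2.0]
    yield [7.0, 1.0, 4.0]


def incremental_mean(chunks):
    """Accumulate per-chunk sums and counts, then combine them at the end."""
    count, total = 0, 0.0
    for chunk in chunks:
        count += len(chunk)
        total += sum(chunk)
    return total / count


if __name__ == "__main__":
    print("mean over all chunks:", incremental_mean(read_chunks()))
```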

LIMITATIONS OF BIG DATA


Although big data tools have benefits, they have limitations too. The main big data
limitations are:

1. Prioritizing correlations: “Correlation is not causation.” Big data is often used
to tease out correlations. A correlation exists when one variable is linked with
another, but just because two variables are linked does not mean that a causal
relationship exists between them.
2. Security: Security is a foremost aspect of every technology, and big data is prone to
data breaches. Important information entrusted to a third party may get leaked, so
proper encryption must be used to protect the data.
3. Large growth in data: Data is growing faster than processing power. Data volumes have
exploded in recent years, and new machines are needed to keep up; otherwise we will be
overrun by data. Large data centers can help address this problem.
4. Inconsistency in data collection: Sometimes the tools we use to gather big data sets
are imprecise, so the same collection process can yield different results. For example,
the results of a Google search for the same query may differ from one day to the next,
mainly because of inconsistency in data collection.
