
Sahil Raj

MBA Business Analytics


University of Petroleum & Energy Studies (Dehradun)

Big Data, as its name suggests, means data in great volumes. But there is a lot more to Big Data than just
volume, and I will try to address all those aspects in this article. Big Data has brought both great
opportunity and great change to the technology industry. Data scientists traditionally look at the existing
Vs, the ones that have classically been used to understand the key variables of any data set. We should also
look at other factors which determine the character and value of the voluminous data available.

Fig.1: http://goranxview.blogspot.in/2011/03/new-buzz-social-media-analytics.html

1960s: Legacy Systems
1970s: Mainframes
1990s: Personal Computers
1990s: .com Boom
2000s: Social media

Fig.2: Big Data Timeline

Let us go back and see how Big Data came into the picture. In the early 1960s we had traditional legacy
systems. By the 1970s we had moved to mainframes, which kept growing until the 1990s, when personal
computers surfaced. Around the mid to late 1990s, revolutionary changes occurred in storage as well as
computing capabilities. This was the time of the .com boom: companies like Yahoo, Google, eBay and
Amazon came into the picture and started generating huge data streams. After this period social media came
into existence, with big names like Orkut, Facebook, LinkedIn and Twitter. All this created a huge surge in
the amount of data being generated, as the following image illustrates:

Fig.3: http://freepress.intel.com/servlet/JiveServlet/showImage/38-4608-2199/InternetMinuteInfographic.jpg

As we all know, data is a gold mine of information. That is why companies planned to store it and mine it to
gather important information. But the data generated was not only huge in size, it was also heterogeneous:
it came in the form of text, video, audio, pictures, geospatial information and so on. This forced industry
bigwigs to come together and develop solutions for storing the data, mining it and gaining an advantage from
it. These efforts gave birth to the term Big Data and initiated the Hadoop project. To classify which data to
call big and which not, some guidelines in terms of Vs were created, which are discussed in this post.
Whenever we talk about big data, we generally come across 3 major Vs used to describe the issues of
information overload in our digital world. Let us talk about the 3 existing Vs, which other Vs can be added,
and how to deal with some of the problems these Vs raise.
The Existing Vs
Analyst Doug Laney first coined the 3 Vs of Big Data, the ones data scientists have classically used to
understand the key variables of any data set. These are:
1. Volume:
Every mouse click, like, phone call, text message, web search and purchase transaction is nowadays
catalogued and stored by organizations in a cloud of big data. The amount of data created in the
digital universe is around one zettabyte, which is equal to one sextillion (10^21) bytes. This shows in
what volumes data is being created and stored these days, and why it is called big data. With
technology spreading ever more widely, this amount is expected to increase. With the Internet of
Things being adopted rapidly by the industry, the figure is expected to reach brontobytes by the end
of this decade. The primary goal of storing this large volume of data is to make it useful to
companies as well as consumers by optimizing future results.
2. Variety:
In today's multi-faceted internet culture, the great volumes of data are also extremely varied in form.
So many variables can be thrown at a company that the true value of this information is often lost in
the sea of data. Examples include purchase transactions, website traffic, rewards programs, heat
maps, social media conversations, IoT, IT/OT and sensor data.
3. Velocity:
More than 90% of the data that we have stored or are using has been generated in the last 10 years or so.
This shows how fast data is being generated, and it makes velocity one of the most significant
attributes of big data. Information is being created at a faster pace than ever before, and the varied
channels of big data increase their output of content every day. Facebook alone has over 1.49 billion
users, which gives us an idea of the amount of data they generate every moment.
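
To put the units mentioned under Volume in perspective, here is a minimal Python sketch of the arithmetic. It assumes decimal (power-of-ten) units, and it assumes that "brontobyte", which is an informal name and not an official SI unit, means 10^27 bytes. The names UNITS and convert are made up for this illustration.

```python
# Decimal (power-of-ten) byte units.
# Assumption: "brontobyte" is an informal, non-SI name, taken here as 10**27 bytes.
UNITS = {
    "kilobyte": 10**3,
    "megabyte": 10**6,
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
    "yottabyte": 10**24,
    "brontobyte": 10**27,
}


def convert(amount, from_unit, to_unit):
    """Convert an amount from one unit in UNITS to another."""
    return amount * UNITS[from_unit] / UNITS[to_unit]


# One zettabyte written out in bytes (one sextillion).
print(f"{UNITS['zettabyte']:,}")  # 1,000,000,000,000,000,000,000

# How many one-terabyte drives would hold one zettabyte?
print(convert(1, "zettabyte", "terabyte"))

# How many zettabytes fit in one brontobyte (under the assumption above)?
print(convert(1, "brontobyte", "zettabyte"))
```

Under these assumptions one zettabyte corresponds to a billion one-terabyte drives, and a brontobyte is a million times larger again.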

The missing Vs:


With the passage of time, these three Vs alone are no longer enough to characterize big data. It is time to
look beyond them and introduce some new parameters that can be helpful. Two more Vs which can be added to
resolve some of the problems are:
4. Veracity:
This V concerns the accuracy of the data available to us. It may happen that less than 2% of the data
we are storing is actually useful. We need to understand problems such as inconsistency,
incompleteness and missing data, which can occur while data is being generated or stored. It may also
happen that we are storing data which is not even relevant to what we do.
5. Value:
This can trump all the Vs discussed so far. Every enterprise runs a defined business and, before
delving into Big Data initiatives, looks at the return it will get out of them. Unless the data is
useful to the company, there is no point in having access to it. So it is very important to understand
what value we want to derive from the data. We can also decide what to store and what not to store
for a specific business.
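
The veracity problems described under point 4 (missing, incomplete and inconsistent data) can be measured before any value is drawn from a data set. The following is only a minimal sketch in Python: the sample records, the field names and the two consistency rules (upper-case country codes, non-negative amounts) are all invented for illustration and are not taken from any real system.

```python
# Hypothetical sample records; field names and values are invented for illustration.
records = [
    {"customer_id": 1, "country": "IN", "amount": 250.0},
    {"customer_id": 2, "country": "", "amount": 99.5},
    {"customer_id": 3, "country": "in", "amount": None},
    {"customer_id": 4, "country": "US", "amount": -40.0},
]


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def missing_ratio(rows, field):
    """Fraction of rows in which the given field is absent or blank."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if is_missing(row.get(field)))
    return missing / len(rows)


def inconsistent_rows(rows):
    """Rows that break either assumed rule: upper-case country, non-negative amount."""
    bad = []
    for row in rows:
        country = row.get("country")
        amount = row.get("amount")
        if not is_missing(country) and country != country.upper():
            bad.append(row)
        elif amount is not None and amount < 0:
            bad.append(row)
    return bad


for field in ("customer_id", "country", "amount"):
    print(field, missing_ratio(records, field))

print([row["customer_id"] for row in inconsistent_rows(records)])
```

In this made-up sample, a quarter of the country and amount values are missing and two of the four records break a rule, which is the kind of figure worth knowing before deciding what to store and what to discard.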
In the end, we can represent Big Data in the following manner, which will not only help us understand it
well but also make it easier to work with.

Volume
Velocity
Variety

Veracity
Value
Links:
http://www.enterprisecioforum.com/en/blogs/jdodge/who-came-5-vs-big-data-0
https://hrboss.com/blog/2014-03-26/missing-vs-big-data-hr-5-v-model-here
http://davebeulke.com/big-data-impacts-data-management-the-five-vs-of-big-data/
https://www.linkedin.com/pulse/20140306073407-64875646-big-data-the-5-vs-everyone-must-know
http://dataconomy.com/seven-vs-big-data/
https://datafloq.com/read/3vs-sufficient-describe-big-data/166
http://www.pros.com/big-vs-big-data
Sources:
1. Big Data, for Better or Worse: 90% of World's Data Generated over Last Two Years.
2. New York Stock Exchange Ticks on Data Warehouse Appliances.
3. The Rising Data Deluge Opportunity.
