
Available online at www.sciencedirect.com

ScienceDirect
Procedia - Social and Behavioral Sciences 195 (2015) 1042-1050

World Conference on Technology, Innovation and Entrepreneurship

Yesterday, Today and Tomorrow of Big Data


Hakan Özköse a,*, Emin Sertaç Arı a, Cevriye Gencer b
a Gazi University, Kavaklıdere, Ankara, Turkey
b Gazi University, Maltepe, Ankara, Turkey

Abstract

Driven by the desire for self-improvement, human beings constantly try to reach current information and to generate
new information from the data at hand. They do so by processing data and transforming it into information.
Generating information from data is vitally important for organizing everyday life. Firms in particular need to
store data and transform it quickly and properly into information in order to gain a competitive edge, develop new
products, move the firm ahead and stabilize its internal dynamics. As the number of data sources grows, so does the
amount of data acquired. Storing and processing the data therefore becomes difficult, and classical approaches are
no longer adequate for these tasks. Big Data makes it possible to store, manage and process large amounts of widely
varied data. Moreover, by taking the properties of Volume, Value, Variety, Veracity and Velocity into consideration,
Big Data delivers sound information quickly and offers advantages and convenience to firms, researchers and
consumers. This study consists of 5 parts. The Introduction explains the features, classification, process, usage
areas and techniques of Big Data. The second part illustrates the emergence of the concept of Big Data and its
advantages with examples. The third part presents a detailed literature review, covering current studies and the
areas of Big Data that attract the most interest. The fourth part evaluates the future of Big Data and presents the
state and distribution of Big Data studies in Turkey and around the world. The Conclusion gives an overall
assessment and mentions probable difficulties.


© 2015 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of Istanbul University.
Keywords: Big data, data, information

* Corresponding author.
E-mail address: hakan_ozkose@hotmail.com

1877-0428 © 2015 The Authors. Published by Elsevier Ltd.
doi:10.1016/j.sbspro.2015.06.147

1. Introduction

Big Data has various definitions in the literature. Some of them are given below:
Big Data is an amount of data that is beyond the ability of technology to store, manage and process efficiently
(Manyika et al., 2011).

Big Data is a term describing high-speed, high-volume, complex and multivariate data that require advanced
technology to capture, store, distribute, manage and analyze the information (TechAmerica Foundation, 2014).

Big data is high volume, high velocity, and/or high variety information assets that require new forms of
processing to enable enhanced decision making, insight discovery and process optimization (Gartner, 2014;
Gürsakal, 2014).

Big Data technologies are new-generation technologies and architectures designed to extract value efficiently
from multivariate, high-volume data sets by enabling high-speed capture, discovery and analysis
(Gantz and Reinsel, 2011).

Hashem et al. combine various definitions in the literature and define Big Data as follows:
A set of methods and technologies that require new forms of integration to uncover hidden values in diverse,
complex and high-volume data sets (Hashem et al., 2015).

As the definitions suggest, size alone does not make a data set big data; the data should also be complex and
varied. Conventional methods therefore have difficulty analyzing big data sets, and new methods and technologies
are needed.

1.1. Characteristics of Big Data

Various studies in the literature attribute 3, 4 or 5 characteristics to big data. Three of them are common to
all of these studies: Volume, Velocity and Variety. The others are Veracity and Value (Hashem et al., 2015; Elragal,
2014; Fadiya et al., 2014; Yang et al., 2014; López et al., 2015). The 5 characteristics of big data are shown in
Figure 1.

Fig. 1. The 5 characteristics of Big Data (Elragal, 2014)

These 5 characteristics are explained as follows (Hashem et al., 2015; Elragal, 2014; Fadiya et al., 2014; Yang
et al., 2014; López et al., 2015):
Volume: The most important characteristic of big data. It represents the size of the big data set.
Variety: Data reach companies from numerous sources, both internal and external. Entries arriving from separate
sources make the data set heterogeneous. External data are hardly ever structured.
Velocity: Big data is produced at a notably high rate. The rapid growth of the data means that it has to be
analyzed more swiftly: the faster the data accumulate, the sooner they are needed, so processing must speed up
as well.
Veracity: The accuracy of the data. The data should be acquired from reliable sources and kept secure, and only
authorized people should be permitted to access them.
Value: All of these procedures should produce a result, and that result should enrich the process.

1.2. Classification of Big Data

The characteristics of big data can be understood better by dividing it into classes. These classes are Data Sources,
Content Format, Data Stores, Data Staging and Data Processing (Hashem et al., 2015).
Data Sources: Web & Social, Machine, Sensing, Transactions and IoT
Content Format: Structured, Semi-Structured and Unstructured
Data Stores: Document-oriented, Column-oriented, Graph based and Key-value
Data Staging: Cleaning, Normalization and Transform
Data Processing: Batch and Real time
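
The five classes can be read as independent axes along which a data set is described. The following Python sketch is only an illustration of that reading: the names `TAXONOMY` and `invalid_fields` and the example data set are assumptions introduced here, not part of Hashem et al. (2015). It records the categories listed above and checks a description against them.

```python
# Categories as listed in Section 1.2 (after Hashem et al., 2015).
TAXONOMY = {
    "data_sources": {"web & social", "machine", "sensing", "transactions", "iot"},
    "content_format": {"structured", "semi-structured", "unstructured"},
    "data_stores": {"document-oriented", "column-oriented", "graph based", "key-value"},
    "data_staging": {"cleaning", "normalization", "transform"},
    "data_processing": {"batch", "real time"},
}


def invalid_fields(description):
    """Return the classes whose value is missing or not in the taxonomy."""
    return sorted(
        cls for cls, allowed in TAXONOMY.items()
        if description.get(cls, "").lower() not in allowed
    )


# Hypothetical example: a stream of sensor readings.
sensor_feed = {
    "data_sources": "Sensing",
    "content_format": "Semi-Structured",
    "data_stores": "Key-value",
    "data_staging": "Cleaning",
    "data_processing": "Real time",
}
print(invalid_fields(sensor_feed))  # prints []
```

A description that omits a class, or uses a value outside the list, is reported by class name, which makes gaps in a data inventory easy to spot.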

1.3. Big Data Process

The big data process is outlined below:

• Data Management
  o Acquisition and Recording
  o Extraction, Cleaning and Annotation
  o Integration, Aggregation and Representation
• Analytics
  o Modeling and Analysis
  o Interpretation
Gandomi and Haider visualize the big data process as shown in Figure 2:

Fig. 2. Big Data Process (Gandomi and Haider, 2015)
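
The stages of this process can be sketched as a chain of small functions. The example below is a toy illustration under stated assumptions: the readings in `RAW` are invented, the function names are hypothetical, and a simple mean stands in for whatever model a real analysis would use.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw readings: (city, temperature as text); one is unusable.
RAW = [("Ankara", "21.5"), ("Istanbul", "19.0"), ("Ankara", "n/a"),
       ("Istanbul", "20.0"), ("Ankara", "22.5")]


def acquire():
    """Acquisition and recording: here, simply copy the in-memory list."""
    return list(RAW)


def clean(records):
    """Extraction, cleaning and annotation: drop values that cannot be parsed."""
    cleaned = []
    for city, value in records:
        try:
            cleaned.append((city, float(value)))
        except ValueError:
            continue  # skip records such as "n/a"
    return cleaned


def aggregate(records):
    """Integration, aggregation and representation: group values by city."""
    grouped = defaultdict(list)
    for city, value in records:
        grouped[city].append(value)
    return dict(grouped)


def analyze(grouped):
    """Modeling and analysis: a mean per city stands in for a real model."""
    return {city: mean(values) for city, values in grouped.items()}


def interpret(averages):
    """Interpretation: turn the numbers into a statement."""
    warmest = max(averages, key=averages.get)
    return f"{warmest} has the highest average reading"


averages = analyze(aggregate(clean(acquire())))
print(interpret(averages))  # prints Ankara has the highest average reading
```

The first three functions correspond to Data Management and the last two to Analytics; in practice each stage would be a separate system rather than a function call.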

1.4. Usage Areas of Big Data

Big data is used effectively in numerous fields. Some of them are listed below:
• Automotive industry,
• High technology and industry,
• Oil and gas,
• Telecommunication sector,
• Medical field,
• Retail industry,
• Packaged consumer products,
• Media and show business,
• Travel and transport sector,
• Financial services,
• Social media and online services,
• Public services,
• Education and research,
• Health services,
• Law enforcement and defense industry.

1.5. Methods Used in Big Data

1.5.1. Text Analytics

Text analytics is used to extract information from textual data. E-mails, blogs, online forums, news and call center
records are all examples of text data. Text analytics involves machine learning, statistical analysis and computational
linguistics. It makes it possible to extract meaningful summaries from large-scale data (Gandomi and Haider,
2015).

Information Extraction, Text Summarization, Question Answering and Sentiment Analysis are some of the
techniques used in text analytics.
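
Of these techniques, sentiment analysis is the easiest to illustrate briefly. The sketch below uses a lexicon-based approach, which is one of several possible methods and is not claimed to be the one used in the cited studies; the six-word `LEXICON` and the sample sentences are invented for illustration.

```python
import re

# Tiny hypothetical lexicon; real lexicons contain thousands of scored terms.
LEXICON = {"good": 1, "great": 1, "clean": 1, "bad": -1, "dirty": -1, "rude": -1}


def sentiment(text):
    """Sum the lexicon scores of the words in `text` and label the result."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(word, 0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("Great location and clean rooms"))              # prints positive
print(sentiment("The staff were rude and the room was dirty"))  # prints negative
```

A word-counting approach like this ignores negation and context ("not good" scores as positive), which is why practical systems combine lexicons with machine learning.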

1.5.2. Audio Analytics

Audio analytics is used to extract information from unstructured audio data. Call centers and health services are
its most common application areas. It can serve purposes such as improving the customer experience, the
performance of customer representatives and the sales rate, and understanding customer behavior and product
problems (Gandomi and Haider, 2015).

1.5.3. Video Analytics

Video analytics is the use of various techniques to monitor and analyze video streams and to extract meaningful
information from them. Marketing and operations management are the main application areas of video analytics
(Gandomi and Haider, 2015).

1.5.4. Social Media Analytics

Social media analytics is the analysis of structured and unstructured data from social media channels. Social
media can be categorized as follows (Gandomi and Haider, 2015):

• Social networks (Facebook, LinkedIn),
• Blogs (BlogSpot, WordPress),
• Microblogs (Twitter, Tumblr),
• Social news (Digg, Reddit),
• Social bookmarks (Delicious, StumbleUpon),
• Media sharing (Instagram, YouTube),
• Wiki (Wikipedia, Wikihow),
• Question-and-answer sites (Yahoo! Answers, Ask.com),
• Review sites (Yelp, TripAdvisor).

1.5.5. Predictive Analytics

Predictive analytics estimates future outcomes from current or historical data. It is used to capture the
relationships within the data and to discover patterns. Predictive analytics, which is primarily based on
statistical methods, is applicable to many disciplines (Gandomi and Haider, 2015).
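
As a minimal illustration of a statistical method used for prediction, the sketch below fits a straight line by ordinary least squares and extrapolates one step ahead. The sales figures are invented, and linear regression is chosen here only because it is the simplest such method; the cited study does not prescribe it.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly sales figures (month index, units sold).
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # estimate for month 6
print(forecast)  # prints 20.0
```

The fitted slope captures the relationship in the historical data, and the forecast is only as reliable as the assumption that this relationship continues to hold.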

2. Yesterday of Big Data

The roots of big data go a long way back; however, it has only lately been understood that many earlier
studies were in effect big data studies. For example, in 1839 Matthew Fontaine Maury, the head of the Depot of
Charts and Instruments of the U.S. Navy, collected data about the tides, winds and sea currents of the places he
visited. The depot where he worked held numerous navigation books, maps and charts, as well as the log books of
earlier voyages, which contained many records of wind, water and weather conditions. Maury realized that he could
produce a new navigation chart by combining all of the data at hand, and he generated new routes from them. To
expand his study he developed a standard form for U.S. Navy ships, which improved the accuracy of his route
information. He then extended the study to merchant ships and used their log book data. As a result he achieved
huge savings by cutting voyage durations by a third. He has since been remembered as the "Pathfinder of the Seas"
(Mayer-Schönberger and Cukier, 2013).

In academia, big data can be regarded as a new concept. The change in interest in big data over the years is
shown in Figure 3.

Fig. 3. The change in the interest in big data by years (Google Trends-Big Data, 2015)

As Figure 3 shows, interest in big data has risen sharply since 2011. At present the search rate for big data is
at its peak.

The concepts of big data and data mining are often confused. Frawley et al. define data mining as the discovery of
previously unknown and potentially useful information from data (Frawley et al., 1992). According to Dunham, it is
the detection of hidden information in a database (Dunham, 2006). Fayyad et al. describe it as the application of
specific algorithms to extract patterns from a data set (Fayyad et al., 1996). As these definitions show, even
though big data and data mining have several steps in common, data mining does not cover all properties of big
data.

Interest in big data increases day by day, whereas interest in data mining decreases. Google Trends gives the
comparison shown in Figure 4.

Fig. 4. Comparison of the interests in Big Data and Data Mining (Google Trends-Big Data vs. Data Mining, 2015)

As Figure 4 shows, search interest has been shifting from data mining toward big data.

3. Today of Big Data

As mentioned before, academic studies on big data have increased by leaps and bounds since 2012, and interest in
such studies grows day by day. There are numerous studies on big data in diverse fields. This section reviews
academic studies on big data.

The study of Xiang et al. aimed to determine the experiences and satisfaction of hotel guests. Online customer
reviews and satisfaction ratings on expedia.com were taken into consideration. After preprocessing, a classification
method was applied to the data. Finally, a statistical relationship analysis was performed (Xiang et al., 2015).
Hashem et al. studied the use of big data in cloud computing extensively. They discussed the relationship between
the two concepts in 5 case studies and then described big data storage systems. The importance of the MapReduce
algorithm was also addressed in the study (Hashem et al., 2015). Gandomi and Haider's study gave general
information about big data, including a specific definition, the characteristics of big data and the analytics used
with it (Gandomi and Haider, 2015).
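
The MapReduce programming model mentioned above can be illustrated with the classic word-count example. The sketch below runs in a single Python process and only mimics the map, shuffle and reduce phases; it does not use Hadoop or any other framework API, and the function names and input sentences are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


documents = ["big data needs new tools", "big data big value"]  # made-up input
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts["big"], counts["data"])  # prints 3 2
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread over many machines, which is what makes the model suitable for large data sets.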

Mahyoub et al. created an Arabic sentiment lexicon in their text mining study. The study was inspired by
WordNet, which was developed as an English dictionary database (Mahyoub et al., 2014). Elragal showed that
ERP and big data could be combined to build a strong platform (Elragal, 2014). Du et al. stated
that real estate firms could gain a competitive advantage by using big data technologies (Du et al., 2014).

Ackermann and Angus carried out a big data study to visualize spatial and temporal IP mobility. They visualized
temporal IP mobility in Los Angeles, New York, London, Moscow, Tokyo and Melbourne. Moreover, they plotted
the IP map of Australia and showed the hourly variation of Melbourne's IP mobility (Ackermann and Angus, 2014).
Young focused on HIV and on the importance of big data studies for HIV prevention. Young claimed that a new HIV
monitoring system could be set up by analyzing social media networks, so that new ideas could be generated for
early intervention and disease control. According to Young, big data is applicable to various fields (Young, 2015).

Shin and Choi's study took an ecological perspective. It analyzed the socio-ecological effects of big data, such as
social dynamics, political rhetoric and technological choices. Korea's big data studies were also
discussed, and some hints and tips were offered to Korean big data entrepreneurs (Shin and Choi, 2015).

Weichselbraun et al. studied opinion mining to enrich semantic knowledge bases. They gathered data from
Amazon (electronics and software) and IMDB (comedy, crime and drama) and carried out a quantitative evaluation
with sentiment analysis. Accuracy increased with contextualization, which also had a positive effect on recall and
precision. In addition, concept grounding made it possible to tell whether concepts were positive, negative or
neutral, and word grounding made it possible to tell whether words were synonyms, antonyms, emotions or
explanations (Weichselbraun et al., 2014).

Qian et al. first introduced granular computing. They then defined the hierarchical encoded decision table and
discussed some criteria. Finally, they proposed a MapReduce-based hierarchical attribute reduction algorithm and
tested its efficiency with examples (Qian et al., 2015). Jifa and Lingling explained the concepts of data,
DIKW, big data and data science and described the relationships between them (Jifa and Lingling, 2014).
Beyond academia, firms such as Google, Netflix, Amazon and Facebook carry out big data studies. Google
uses big data in numerous fields and integrates it successfully into its own workflow.

For example, in 2009, determining the area over which H1N1 had spread would take weeks, or even months, because
it depended on medical reports. Google, however, could estimate the area from its store of users' search terms.
This meant that the area could be determined earlier than the health organizations could manage
(Mayer-Schönberger and Cukier, 2013; Cook et al., 2011).

Google also presented another big data study, Google Flu Trends, which helped to analyze worldwide flu
trends by using Google search terms (Google Flu Trends, 2015). Similarly, Google Trends makes it
possible to follow trending topics and their fluctuation (Search Google Trends, 2015). Firms like Facebook, Flickr,
YouTube, Academia and The Marker need big data for link prediction, which makes it possible to suggest
network connections such as "You may also know" or "You may also be interested" (Fire et al., 2011). Firms
like eBay and Amazon use big data for recommendation systems. When a customer purchases an item, another item
that is likely to be purchased is offered before payment, based on the previous and current purchases.
Sales rates thus increase, and items that might otherwise be overlooked are brought to the customer's attention
(Chen et al., 2012).
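
The recommendation idea described above can be sketched with simple co-purchase counts: items that often appeared in the same past order as something in the current basket are suggested first. This is a deliberately simplified illustration, not the method used by eBay or Amazon; the order histories and the names `ORDERS`, `co_counts` and `recommend` are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories, one set of items per past order.
ORDERS = [
    {"camera", "memory card", "tripod"},
    {"camera", "memory card"},
    {"camera", "bag"},
    {"laptop", "bag"},
]

# Count how often each pair of items appeared in the same order (both directions).
co_counts = Counter()
for order in ORDERS:
    for a, b in combinations(sorted(order), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1


def recommend(basket, top_n=2):
    """Suggest items most often bought together with the basket's items."""
    scores = Counter()
    for (item, other), count in co_counts.items():
        if item in basket and other not in basket:
            scores[other] += count
    # Sort by score (descending), then by name so that ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend({"camera"}))  # prints ['memory card', 'bag']
```

With real transaction volumes the pair counts would be computed in a distributed fashion, and raw counts would usually be normalized so that very popular items do not dominate every suggestion.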

4. The Future of Big Data

Interest in big data increases day by day. Firms that are able to store large amounts of data take
their work one step further and gain an advantage in the market. Firms like Google, Amazon, Facebook,
YouTube and eBay have an advantage over others in self-improvement and competition because they hold a great
quantity of data. It should not be forgotten that data processing and information retrieval are as important as data
storage. The aforementioned firms have been highly successful not only in storing great quantities of data but also
in processing the data and putting it into service, and they have become movers and shakers.

Big data studies have been carried out in numerous fields. The importance of big data studies, which are used
in a wide range of industries from automotive and communication to finance and health, will increase in the future.
As Figures 3 and 4 show, interest in big data has increased day by day, whereas interest in data mining has
declined since the 2000s.

Turkey has only recently begun to place academic importance on big data. As Figure 5 shows, interest in big
data in Turkey began in 2012 and has increased over the years, reaching its peak in 2015. Turkey has fallen
behind the rest of the world in starting studies on big data.

Fig. 5. The change in interest over the years in Turkey (Google Trends-Big Data, 2015)

Ankara and Istanbul are the leading cities in Turkey in the importance placed on big data studies, and the most
searched term is "What is big data" (Google Trends- The change of the interest by years in Turkey, 2015).

As Figure 6 shows, India is the country most interested in big data (search volume index: 100), followed by
Singapore (search volume index: 71) and South Korea (search volume index: 57). Compared with the leading
countries, Turkey has a relatively low search volume index of 4 (Google Trends-
The interest distribution of Big Data by countries and cities, 2015).
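
The search volume index is a relative measure: Google Trends scales regional interest so that the region with the highest share of searches for the term receives 100 and the others are expressed in proportion to it. The sketch below shows that normalization under stated assumptions; the counts and region names are invented and are not the data behind Figure 6, and the exact procedure Google applies may differ in detail.

```python
def search_volume_index(term_searches, total_searches):
    """Scale each region's share of searches so that the top region is 100."""
    shares = {
        region: term_searches[region] / total_searches[region]
        for region in term_searches
    }
    top = max(shares.values())
    return {region: round(100 * share / top) for region, share in shares.items()}


# Invented counts for three hypothetical regions (not the data behind Figure 6).
term = {"A": 500, "B": 300, "C": 40}
total = {"A": 10_000, "B": 12_000, "C": 20_000}
print(search_volume_index(term, total))  # prints {'A': 100, 'B': 50, 'C': 4}
```

Read this way, an index of 4 does not mean that few searches were made in absolute terms; it means that the term's share of all searches in that region is about one twenty-fifth of the share in the leading region.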

Fig. 6. The interest distribution of Big Data by countries (Google Trends- The interest distribution of Big Data by countries and cities, 2015)

Figure 7 shows the interest distribution of big data by cities. Chinchwad (search volume index: 100) and
Bangalore (search volume index: 51) are the cities most interested in big data.

Fig. 7. The interest distribution of Big Data by cities (Google Trends- The interest distribution of Big Data by countries and cities, 2015)

Numerous related searches have been made on big data. Among them are big data analytics, Hadoop and big data
(Google Trends- The interest distribution of Big Data by countries and cities, 2015).

5. Conclusion and Recommendations

Several difficulties may arise in acquiring, storing and processing data. As interest in big data grows, such
difficulties will diminish or be resolved more quickly. Turkey has fallen behind the rest of the world in academic
interest in, and academic studies on, big data. There are plenty of academic studies on big data that researchers
can take as a model.

Data providers bear as much responsibility as researchers in big data. Data providers that cannot process their
data but keep it hidden will be harmed in terms of competition. Moreover, they prevent the data from being
transformed into knowledge. Data providers and researchers should cooperate transparently on the basis of trust.
This will increase the reliability of the results and keep the firms competitive.

Big firms like Facebook, Flickr, YouTube, Academia, The Marker, Google and Amazon have already turned
to big data because they all have broad visions. Each of these firms holds its own market, expands its dominance
and at the same time ensures customer satisfaction. They keep their leadership and increase their market value day
by day.
As studies on big data increase, technological development and customer satisfaction will increase, diseases
will be cured or precautions taken earlier, and the general price level will decrease.

References

Ackermann, K., & Angus, S. D. (2014). A Resource Efficient Big Data Analysis Method for the Social Sciences: The Case of Global IP Activity.
Procedia Computer Science, 29, 2360-2369.
Chen, H., Chiang, R. H., & Storey, V. C. (2012). Business Intelligence and Analytics: From Big Data to Big Impact. MIS quarterly, 36(4), 1165-
1188.
Cook, S., Conrad, C., Fowlkes, A. L., & Mohebbi, M. H. (2011). Assessing Google flu trends performance in the United States during the 2009
influenza virus A (H1N1) pandemic. PloS one, 6(8), e23610.
Du, D., Li, A., & Zhang, L. (2014). Survey on the Applications of Big Data in Chinese Real Estate Enterprise. Procedia Computer Science, 30,
24-33.
Dunham, M. H. (2006). Data mining: Introductory and advanced topics. Pearson Education India.
Elragal, A. (2014). ERP and Big Data: The Inept Couple. Procedia Technology, 16, 242-249.
Fadiya, S. O., Saydam, S., & Zira, V. V. (2014). Advancing big data for humanitarian needs. Procedia Engineering, 78, 88-95.
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI magazine, 17(3), 37.
Fire, M., Tenenboim, L., Lesser, O., Puzis, R., Rokach, L., & Elovici, Y. (2011, October). Link prediction in social networks using
computationally efficient topological features. 2011 IEEE International Conference on Privacy, Security, Risk, and Trust, and IEEE
International Conference on Social Computing,73-80. IEEE.
Frawley, W. J., Piatetsky-Shapiro, G., & Matheus, C. J. (1992). Knowledge discovery in databases: An overview. AI magazine, 13(3), 57.
Gandomi, A., & Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information
Management, 35(2), 137-144.
Gantz, J., & Reinsel, D. (2011). Extracting value from chaos. IDC iview, (1142), 9-10.
Gartner IT Glossary, What is Big Data?, URL: http://www.gartner.com/it-glossary/big-data
Google Flu Trends. Flu Trends (07.04.2015). URL: https://www.google.org/flutrends/
Gürsakal, N., Büyük Veri, Dora Kitabevi, Bursa, 2014.
Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S., Gani, A., & Khan, S. U. (2015). The rise of big data on cloud computing: Review and
open research issues. Information Systems, 47, 98-115.
Jifa, G., & Lingling, Z. (2014). Data, DIKW, Big data and Data science. Procedia Computer Science, 31, 814-821.
López, V., del Río, S., Benítez, J. M., & Herrera, F. (2015). Cost-sensitive linguistic fuzzy rule based classification systems under the
MapReduce framework for imbalanced big data. Fuzzy Sets and Systems, 258, 5-38.
Mahyoub, F. H., Siddiqui, M. A., & Dahab, M. Y. (2014). Building an Arabic Sentiment Lexicon Using Semi-Supervised Learning. Journal of
King Saud University-Computer and Information Sciences, 26(4), 417-424.
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., ... & McKinsey Global Institute. (2011). Big data: The next frontier for
innovation, competition, and productivity.
Mayer-Schönberger, V., & Cukier, K. (2013). Big data: A revolution that will transform how we live, work, and think. Houghton Mifflin
Harcourt.
Qian, J., Lv, P., Yue, X., Liu, C., & Jing, Z. (2015). Hierarchical attribute reduction algorithms for big data using MapReduce. Knowledge-Based
Systems, 73, 18-31.
Search Google Trends. Big Data (02.04.2015). URL: http://www.google.com.tr/trends/explore#q=Big%20Data
Search Google Trends. Big Data vs Data Mining (02.04.2015).
URL: http://www.google.com.tr/trends/explore#q=Big%20Data%2C%20Data%20Mining&cmpt=q&tz=
Search Google Trends. Google Trends (07.04.2015). URL: http://www.google.com.tr/trends/?hl=tr
Search Google Trends. The change of the interest by years in Turkey (08.04.2015).
URL: https://www.google.com.tr/trends/explore#q=big%20data&geo=TR&date=1%2F2009%2076m&cmpt=q&tz=
Search Google Trends. The interest distribution of Big Data by countries and cities (08.04.2015). URL:
https://www.google.com.tr/trends/explore#q=big%20data&date=1%2F2009%2071m&cmpt=q&tz=
Shin, D. H., & Choi, M. J. (2015). Ecological views of big data: Perspectives and issues. Telematics and Informatics, 32(2), 311-320.
TechAmerica Foundation's Federal Big Data Commission, Demystifying Big Data: A Practical Guide To Transforming The Business Of
Government, URL: http://www.techamerica.org/Docs/fileManager.cfm?f=techamerica-bigdatareport-final.pdf (Last accessed:
20.12.2014).
Weichselbraun, A., Gindl, S., & Scharl, A. (2014). Enriching semantic knowledge bases for opinion mining in big data applications. Knowledge-
Based Systems, 69, 78-85.
Xiang, Z., Schwartz, Z., Gerdes, J. H., & Uysal, M. (2015). What can big data and text analytics tell us about hotel guest experience and
satisfaction?. International Journal of Hospitality Management, 44, 120-130.
Yang, C., Zhang, X., Zhong, C., Liu, C., Pei, J., Ramamohanarao, K., & Chen, J. (2014). A spatiotemporal compression based approach for
efficient big data processing on Cloud. Journal of Computer and System Sciences, 80(8), 1563-1583.
Young, S. D. (2015). A big data approach to HIV epidemiology and prevention. Preventive medicine, 70, 17-18.
