
Big Data

What is Big Data?
o Big Data vs. small data: Big Data needs different approaches (techniques, tools, architectures)
o The aim is to help find solutions to new problems, and to solve existing problems with a better approach

What increases the difficulty of Big Data?
o Increase in the complexity of operations on data (e.g. modeling and reasoning over data of different types)

Application of Big Data: Business Scenario 1: Online advertising for NY Times

Application of Big Data: Business Scenario 2 - Analysis of Traffic loop detection data

Traffic loop detection data comprises measurements of traffic intensity. Each loop counts the number of vehicles per minute passing at that location, and measures their speed and length.

Usage of this data:
o Interesting for traffic and transport statistics
o Statistics on other economic phenomena related to transport

Data:
o Collected at 12,622 measurement locations on Dutch roads, and growing rapidly
o Centrally stored in the National Data Warehouse for Traffic Information (NDW)
o Managed by a collaboration of participating government organizations (NDW 2012)
o The NDW contains historic traffic data collected from 2010 onwards

To determine the usability of the NDW data for statistics and to get an idea of its peculiar features:
o Minute-level data was studied for all locations in the Netherlands for a single day: December 1st, 2011
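As a rough illustration of working with minute-level loop data, the sketch below aggregates per-minute vehicle counts into hourly totals per measurement location. It is written in Python for brevity and does not reproduce any NDW analysis. The record layout (location id, timestamp, vehicle count), the location ids, and the function name hourly_totals are assumptions made for this example, not the actual NDW format.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical minute-level records: (location_id, timestamp, vehicles_counted).
# The real NDW record layout is not described here; this shape is an assumption.
SAMPLE_RECORDS = [
    ("LOC001", "2011-12-01 08:00", 12),
    ("LOC001", "2011-12-01 08:01", 15),
    ("LOC001", "2011-12-01 09:00", 7),
    ("LOC002", "2011-12-01 08:00", 3),
]


def hourly_totals(records):
    """Sum per-minute vehicle counts into (location_id, hour) buckets."""
    totals = defaultdict(int)
    for location_id, timestamp, vehicles in records:
        hour = datetime.strptime(timestamp, "%Y-%m-%d %H:%M").hour
        totals[(location_id, hour)] += vehicles
    return dict(totals)


if __name__ == "__main__":
    for (location_id, hour), total in sorted(hourly_totals(SAMPLE_RECORDS).items()):
        print(f"{location_id} {hour:02d}:00 {total} vehicles")
```

With 12,622 locations and 1,440 minutes in a day, a single day already spans 12,622 x 1,440 = 18,175,680 location-minutes, so at full scale an aggregation like this would likely be streamed or processed in chunks rather than held in one in-memory list.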

The dataset extracted from the NDW contained 76 million records. It was analysed in the open source software environment R (R Development Core Team 2012).

Application of Big Data: Business Scenario 3 - Analysis of Social media messages

Estimate: 70% of the Dutch population actively posts messages on social media (Eurostat 2012).

The millions of messages generated each day
o May be an interesting data source for statistical analysis

Social media messages were studied from two points of view
o Content and sentiment

Studies of the content of Dutch Twitter messages, the dominant social medium in the Netherlands
o Revealed that nearly 50% of the messages were composed of pointless babble (Daas et al. 2012a)
o The rest included discussion of spare time activities, work, media (TV & radio) and politics

This suggests that these messages could be used to extract opinions, attitudes, and sentiments towards these topics. It opens up possibilities to collect a considerable amount of information quickly, without any response burden.

The major problem in social media is discriminating the informative from the non-informative messages. Because of the large share of non-informative babble messages, use of the more serious (informative) messages is negatively affected.

Text mining approaches used to automatically differentiate between the two groups of messages have not been very successful so far and require further research. Another potential use of social media messages is sentiment analysis.
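One simple way to picture sentiment analysis is a lexicon-based score: count the positive and negative words in a message and take the difference. The sketch below is a toy Python illustration, not the method used in the studies cited. The word lists, the function name sentiment_score, and the sample messages are all made up for this example, and real work on Dutch messages would need a Dutch lexicon and much more careful text handling.

```python
import re

# Tiny, made-up English word lists for illustration only (assumption).
POSITIVE_WORDS = {"good", "great", "happy", "love"}
NEGATIVE_WORDS = {"bad", "awful", "sad", "hate"}


def sentiment_score(message):
    """Return (number of positive words - number of negative words)."""
    words = re.findall(r"[a-z']+", message.lower())
    positive = sum(word in POSITIVE_WORDS for word in words)
    negative = sum(word in NEGATIVE_WORDS for word in words)
    return positive - negative


if __name__ == "__main__":
    samples = ["I love this great show", "Awful traffic today, so sad", "On the train"]
    for text in samples:
        print(sentiment_score(text), text)
```

A score of zero here cannot tell a neutral but informative message apart from uninformative babble, which is the harder filtering problem described in the text.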

Conclusion
Big Data exists everywhere
o We are just not used to dealing with it
Data is either free (e.g. Wikipedia) or paid
The hype around Big Data is very recent
Increasing growth
Lack of expertise in building Big Data applications
Can Big Data be handled without much investment?
o Usage of open source tools
o Computing systems are not very expensive
It is critical to have knowledge of how to deal with data
