Manoranjan Kr. Singh (Department of Mathematics, Magadh University, Bodh Gaya)
drmksingh_gaya@gmail.com
Deepak Mitra (Department of Computer Applications, Gaya College, Gaya, Bihar)
d_mitra123@yahoo.com

Introduction

We continue to create 2.5 quintillion bytes of data each day, and 90% of the data in the world today has been created in the last two years alone. This data comes from many sources: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, cell phone GPS signals, and more. This data is Big Data [1]. The term describes a volume of structured and unstructured data so massive that it is difficult to process using traditional database and software techniques. In most enterprise cases the data is too big, moves too fast, or exceeds current processing capacity. Big data also refers to the technology (including tools and processes) that an organization requires to handle such large amounts of data, and to the associated storage facilities.

Big Data Technology [2]
Big data technology must support search, development, governance and analytics services for all data types, from transaction and application data to machine and sensor data to social, image and geospatial data, and more.

Systems: Infrastructure must capitalize on real-time information flowing through the organization. It must be optimized for analytics so that it can respond dynamically, with automated business processes, better agility and improved economics, to the increasing demands of big data.

Privacy: To protect the organization's reputation and brand, the platform must implement strict policies and practices around privacy and data protection, safeguarding all of the data and insights on which the business relies.

Governance: Governance controls how information is created, shared, cleansed, consolidated, protected, maintained, retired and integrated within the enterprise.
Storage: To achieve economies and efficiencies, certain analytics must run close to the data while it is in motion. For data that must be stored, the infrastructure must embody a defensible disposal strategy that reduces the run rate of storage, legal expense and risk.

Security: As analytics becomes infused into the organization, data security becomes more central. Infrastructure must have strong security measures built in to guard the organization against internal and external threats.

Cloud: To relieve the pressure that big data places on IT infrastructure, big data and analytics solutions can be hosted in the cloud to achieve the scalability, flexibility, expandability and economics that will provide competitive advantage into the future.

Difference between Big Data and Open Data [3]
Big data and the newer phenomenon of open data are closely related, but they are not the same. Open data brings a perspective that can make big data more useful, more democratic, and less threatening. While big data is defined by size, open data is defined by its use. Big data is the term used to describe very large, complex, rapidly changing datasets. But judgments of what counts as "very large" or "complex" are subjective and dependent on technology: today's big data may not seem so big in a few years, as data analysis and computing technology improve. Open data is accessible public data that people, companies, and organizations can use to launch new ventures, analyze patterns and trends, make data-driven decisions, and solve complex problems. All definitions of open data include two basic features: the data must be publicly available for anyone to use, and it must be licensed in a way that allows for its reuse. Open data should also be relatively easy to use, although there are gradations of "openness". There is also general agreement that open data should be available free of charge or at minimal cost.
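The two basic features of open data described above, plus the general expectation of free or minimal-cost access, can be read as a simple checklist. The Python sketch below encodes that checklist for illustration only: the `DatasetInfo` record, its field names, the example datasets and the `max_cost` cut-off are all hypothetical assumptions, not part of any open data standard or licensing framework.

```python
from dataclasses import dataclass


@dataclass
class DatasetInfo:
    """Hypothetical metadata record describing a published dataset."""
    name: str
    publicly_available: bool    # can anyone access the data?
    license_allows_reuse: bool  # does the license permit reuse?
    access_cost: float          # price of access (assumed single currency unit)


def is_open_data(info: DatasetInfo, max_cost: float = 0.0) -> bool:
    """Return True if the dataset has both basic features of open data
    and costs no more than max_cost (an assumed 'minimal cost' cut-off)."""
    return (
        info.publicly_available
        and info.license_allows_reuse
        and info.access_cost <= max_cost
    )


if __name__ == "__main__":
    # Hypothetical example datasets.
    timetable = DatasetInfo("city-bus-timetable", True, True, 0.0)
    clickstream = DatasetInfo("store-clickstream-log", False, False, 0.0)
    print(is_open_data(timetable))    # True
    print(is_open_data(clickstream))  # False
```

Nothing in the check depends on how large the dataset is, which mirrors the distinction drawn above: a huge private clickstream log may be big data yet fail every openness criterion, while a small public timetable can be fully open.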
The relationship between big data and open data
This Venn diagram maps the relationship between big data and open data, and how they relate to the broad concept of open government. Both big data and open data can transform business, government, and society, and a combination of the two is especially potent. Big data gives us unprecedented power to understand, analyse, and ultimately change the world we live in. Open data ensures that this power will be shared and that the world we change will, with luck, become a fairer and more democratic one.

As far back as 2001, industry analyst Doug Laney (currently with Gartner) articulated the now mainstream definition of big data as the three Vs: volume, velocity and variety [4].
Volume: Many factors contribute to the increase in data volume. A typical PC might have had 10 gigabytes of storage in 2000. Today, Facebook ingests 500 terabytes of new data every day; a Boeing 737 will generate 240 terabytes of flight data during a single flight across the US; smartphones are proliferating, along with the data they create and consume; and sensors embedded into everyday objects will soon result in billions of new, constantly updated data feeds containing environmental, location, and other information, including video.

Velocity: Clickstreams and ad impressions capture user behavior at millions of events per second; high-frequency stock trading algorithms reflect market changes within microseconds; machine-to-machine processes exchange data between billions of devices; infrastructure and sensors generate massive log data in real time; and online gaming systems support millions of concurrent users, each producing multiple inputs per second.
Variety: Data today comes in all types of formats. Big data isn't just numbers, dates, and strings; it is also geospatial data, 3D data, audio and video, and unstructured text, including log files and social media. Traditional database systems were designed to address smaller volumes of structured data, fewer updates, and a predictable, consistent data structure. Big data databases, such as MongoDB, solve these problems and provide companies with the means to create tremendous business value.

SAS considers two additional dimensions of big data:

Variability: In addition to the increasing velocities and varieties of data, data flows can be highly inconsistent, with periodic peaks. Daily, seasonal and event-triggered peak data loads can be challenging to manage, even more so when unstructured data is involved.

Complexity: Today's data comes from multiple sources, and linking, matching, cleansing and transforming data across systems is still a considerable undertaking. However, it is necessary to connect and correlate relationships, hierarchies and multiple data linkages, or your data can quickly spiral out of control.

Importance of Big Data [5]
Organizations will be able to take data from any source, harness the relevant data and analyze it to find answers that enable cost reductions, time reductions, new product development and optimized offerings, and smarter business decision making.

Big Data for the Enterprise

With big data databases, enterprises can save money, grow revenue, and achieve many other business objectives, in any vertical.

Build new applications: Big data might allow a company to collect billions of real-time data points on its products, resources, or customers and then repackage that data instantaneously to optimize customer experience or resource utilization.

Improve the effectiveness and lower the cost of existing applications: Big data technologies can replace highly customized, expensive legacy systems with a standard solution that runs on commodity hardware. And because many big data technologies are open source, they can be implemented far more cheaply than proprietary technologies.
Realize new sources of competitive advantage: Big data can help businesses act more nimbly, allowing them to adapt to changes faster than their competitors.

Increase customer loyalty: Increasing the amount of data shared within the organization, and the speed with which it is updated, allows businesses and other organizations to more rapidly and accurately respond to customer demand.

Big Data is Big Business for Commerce [6]
Three ways big data can benefit your business

Detect, prevent and remediate financial fraud: Across consumer and B2B industries, every day around the world, criminals are busily at work trying to defraud companies through a constantly evolving portfolio of schemes and strategies. As the volume and sophistication of these schemes increase, many organizations are turning to powerful analytics to sift through massive data volumes and uncover hidden patterns, trends and suspicious events that can indicate criminal fraud.

Calculate risk on a large portfolio of loans: An industry-wide failure to properly assess the latent risks lurking in thousands of substandard loans led to billions of dollars of losses.

Execute high-value marketing campaigns: Companies face big data challenges in their marketing operations as well. One company runs a sophisticated marketing operation, with campaigns reaching millions of targets. However, as data volumes grew and its campaigns began to target 10 million to 15 million recipients, it could not physically process the data, which prevented it from maximizing its customer lifetime value and executing more efficient and effective cross-sell/up-sell campaigns. Using high-performance analytics, the company has achieved tremendous gains in the throughput of its database marketing (as much as 215 times faster), dramatically compressing the model development life cycle and enabling its teams to test and validate additional variables for greater reliability in their models.

Big Data and the Market in India [8]
IDC predicts that this year the big data market will reach $16.1 billion and grow six times faster than the IT market overall. India is also moving at a phenomenal pace, with escalating numbers of consumers moving online.
According to IDC, by 2020 the world is set to generate 50 times the amount of information and 75 times the number of information containers, while new information-taming technologies aim to drive down the cost of creating, capturing, managing, and storing information. Abhijit Potnis, Director Technology Services, EMC India & SAARC, highlights that the Digital Universe in India alone is set to grow to 2.9 zettabytes by 2020, and that the liability for 84 percent of this digital universe rests with enterprises. This percentage gives a fair idea of how enterprise storage infrastructure across industry verticals will need continual overhauls to accommodate escalating enterprise data requirements.

Big Data Tools [1]
Open source tools for big data can be divided into four arenas: data stores, development platforms, development tools, and integration, analytics and reporting tools.

Data Stores
Apache Hadoop: Cloud Foundry (VMware), Hortonworks, Hadapt
NoSQL databases: MongoDB, Cassandra, HBase
SQL databases: MySQL (Oracle), MariaDB, PostgreSQL, TokuDB

Development Platforms
On Apache Hadoop: Impala (a massively parallel processing (MPP) query engine that runs natively on Hadoop); Lingual (ANSI SQL); Pattern (analytics); Cascading (an application framework for Java developers building data analytics and data management apps)
On Apache Lucene and Solr: search from LucidWorks and Elasticsearch
OpenStack (open source software for building private and public clouds)
Red Hat (standard Linux distribution for Hadoop servers)
REEF (Microsoft's Hadoop development platform)
Storm (integrates with any queuing system and any database system)

Development Tools
Apache Mahout (machine learning library)
Python and R (programming languages for predictive analytics)

Integration, Analytics and Reporting Tools
Jaspersoft (reporting and analytics server)
Pentaho (data integration and business analytics)
Splunk (platform for IT analytics)
Talend (big data integration, data management and application integration)
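Apache Hadoop, the first of the data stores listed, is built around the MapReduce programming model: a map step turns each input split into key/value pairs, a shuffle step groups those pairs by key, and a reduce step combines each group. The Python sketch below imitates the three phases in a single process for a word count, to show the data flow only. It does not use Hadoop or any Hadoop API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(split: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values for one key (here, by summing them)."""
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    # A cluster framework would run map tasks in parallel, ideally on the
    # nodes holding each split; this sketch runs them one after another.
    emitted = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle_phase(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big value", "open data"]))
    # {'big': 2, 'data': 2, 'value': 1, 'open': 1}
```

Because each map call touches only its own split and each reduce call only its own key, both phases can be spread across many machines, which is what allows a framework such as Hadoop to run analytics close to where the data is stored.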