
2. Embracing Real-World Messiness


In the third chapter of the book, authors Viktor Mayer-Schönberger and Kenneth
Cukier put forward the idea that one of the fundamental shifts in moving from
small data to big data is treating erroneous or corrupted data not as a problem
to be eliminated but as an unavoidable part of real-world data.
While recognizing the historical importance of precision and exactitude, the
authors claim that imprecision, or messiness, can actually be a positive feature
rather than a shortcoming when it comes to Big Data. They argue that accepting
messiness in exchange for a much larger dataset is the core idea behind Big
Data; in this context, they put forward the notion that "more trumps better."
Although errors could often be overcome by throwing enough resources at them,
the authors claim that in many cases it is more fruitful to tolerate error than
to work at preventing it.
They provide an example from the field of natural language processing, where
the accuracy of a grammar-checking algorithm based on machine learning improved
from 75% to 95% when the data fed to it increased from 10 million words to a
billion words.
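The effect the authors describe can be sketched with a toy experiment. This is entirely synthetic and not the study they cite: a deliberately crude "model" that just takes a majority vote over noisy observations of a word's correct form, which gets more reliable as the (still messy) sample grows.

```python
import random
from collections import Counter

def noisy_samples(n, true_form="their", error_rate=0.3, seed=0):
    """Generate n noisy observations of a word's correct form.

    error_rate of the observations are replaced by a wrong variant,
    mimicking messy real-world text.
    """
    rng = random.Random(seed)
    wrong_forms = ["there", "they're"]
    return [true_form if rng.random() > error_rate else rng.choice(wrong_forms)
            for _ in range(n)]

def majority_vote(samples):
    """A deliberately simple 'model': predict the most frequent form seen."""
    return Counter(samples).most_common(1)[0][0]

# With tiny samples the noise can dominate; with lots of messy data,
# even this crude model converges on the correct form.
for n in (5, 50, 5000):
    print(n, majority_vote(noisy_samples(n)))
```

The point mirrors the chapter's claim: no cleaning step was added and the model stayed trivial, yet sheer volume of imperfect data is enough to wash out the errors.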
The chapter also presents an example of how a bigger, messier dataset can be
more effective than a smaller, cleaner one. To support this argument, the
authors compare the language-translation attempts of two different companies.
IBM used ten years' worth of Canadian parliamentary transcripts published in
French and English, a dataset containing about three million sentence pairs.
Google took a different approach to the problem of language translation: it
availed itself of a larger but also much messier dataset, the entire global
Internet and more, amounting to 95 billion English sentences. Despite the
messiness of the input, the writers assert that Google's translation works
best.

3. Correlation > Causation


In the fourth chapter of the book, titled "Correlation," the writers argue that
knowing what, not why, is good enough. Mayer-Schönberger and Cukier give the
example of how a computer might not know why a customer who read Ernest
Hemingway might also like to buy F. Scott Fitzgerald (on Amazon), but that
didn't matter.
Getting insights from Big Data doesn't necessarily require exploring causes;
the authors write that correlations "let us analyze a phenomenon not by
shedding light on its inner workings but by identifying a useful proxy for it."
Making sense of why the data says what it says is not nearly as important as
what it is actually saying. The writers support this argument with an example
of how correlation-based analysis revealed that the vital signs of a premature
baby become unusually constant prior to a serious infection. This surprising
finding comes from software that captures and processes patient data in real
time. While Big Data does not explain the reason behind this correlation, it
allows human caregivers to do what they do best.
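The idea of a correlation serving as a useful proxy can be illustrated with a small sketch. The numbers below are invented for illustration only (they are not the real patient data the authors describe): we compute a Pearson correlation coefficient between a vital-sign variability series and a later infection marker.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Synthetic hourly readings: how much a vital sign fluctuates, and a
# later infection marker -- invented numbers, purely illustrative.
variability = [0.9, 0.8, 0.7, 0.4, 0.2, 0.1]  # signs become eerily steady
infection   = [0.0, 0.1, 0.2, 0.5, 0.8, 0.9]  # infection risk rises

r = pearson(variability, infection)
print(round(r, 3))  # strongly negative: steadiness tracks trouble
```

A strongly negative coefficient says nothing about *why* steadying vital signs precede infection, but, as the chapter argues, the correlation alone is enough to act on: the variability series works as a proxy that lets caregivers intervene early.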
