
Lecture 1: An Overview of Big Data Analysis Using Python

Kevin Lee

Department of Statistics
Western Michigan University

August 28, 2019

Kevin Lee (WMU) Lecture 1 (8/28/2019) August 28, 2019 1 / 16


Outline

1 Overview of Big Data Analysis

2 Overview of Python



Big Data

“Between the dawn of civilization and 2003, we only created five
exabytes of information; now we’re creating that amount every two
days.”
– Eric Schmidt, CEO, Google



Big Data

To find out more about Big Data, go to
https://whatsthebigdata.com/2013/07/25/big-data-3-vs-volume-variety-velocity-infographic/.



Data Analysis

“The ability to take data - to be able to understand it, to process it,
to extract value from it, to visualize it, to communicate it - that’s
going to be a hugely important skill in the next decades, not only at
the professional level but even at the educational level for elementary
school kids, for high school kids, for college kids. Because now we
really do have essentially free and ubiquitous data.”
– Hal Varian, Chief Economist, Google



Python as a Programming Language

Python was created by Guido van Rossum and first released in 1991.

Python is an interpreted, high-level, general-purpose programming
language. Its uses include:
Web development
Scientific computing
Data analytics
Etc.
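
As a small illustration of what "interpreted" and "high-level" mean in practice, the sketch below summarizes a short list of numbers using only Python's standard library. The scores are hypothetical values invented for this example; they do not come from the course materials.

```python
from statistics import mean, median

# Hypothetical exam scores, invented for this illustration.
scores = [72, 85, 90, 66, 78]

# No type declarations and no separate compile step are needed:
# the interpreter runs these statements one after another.
average = mean(scores)
middle = median(scores)

# A list comprehension keeps only the scores above the average.
above_average = [s for s in scores if s > average]

print(f"mean={average}, median={middle}, above average={above_average}")
```

The same few lines can be typed one at a time into an interactive console, which is how much exploratory data analysis in Python is done.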



Python as a Data Analytics Tool

Several features of the Python environment make it a good fit for data
analytics:
Simple and easy to learn
Many extensions and active community support
Open access to an extensive set of libraries (or packages)



R vs. Python

R came from the statistics community as a programming language for
statistical computing and visualization.

Python came from the computer science community as a general-purpose
programming language.

Both were first released in the early 1990s.



R vs. Python

When it comes to statistical analysis (modeling) and solving inference
problems, R may be the better option.

When it comes to machine learning and solving prediction problems,
Python may be the better option.

Choose the one that best fits your needs!



Popular Python Data Analytics Libraries

Numerical & scientific computing: NumPy, SciPy

Data manipulation & aggregation: Pandas

Visualization: Matplotlib, Seaborn

Machine learning: scikit-learn

Deep learning: Keras, TensorFlow, Theano

Text mining: NLTK, Gensim
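
The name used in an `import` statement does not always match the library's name (scikit-learn, for example, is imported as `sklearn`). The sketch below pairs each library listed above with its import name and reports which ones the current Python installation can locate. The dictionary `IMPORT_NAMES` and the helper `check_libraries` are names made up for this illustration, and the result depends on what is installed on your machine.

```python
from importlib.util import find_spec

# Library name as listed above -> name used in an import statement.
IMPORT_NAMES = {
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Pandas": "pandas",
    "Matplotlib": "matplotlib",
    "Seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "Keras": "keras",
    "TensorFlow": "tensorflow",
    "Theano": "theano",
    "NLTK": "nltk",
    "Gensim": "gensim",
}


def check_libraries(import_names):
    """Return {library: True/False}, True if the module can be located.

    find_spec only looks the module up; it does not import it.
    """
    return {
        library: find_spec(module) is not None
        for library, module in import_names.items()
    }


for library, available in check_libraries(IMPORT_NAMES).items():
    status = "installed" if available else "not found"
    print(f"{library:<14}{status}")
```

A library reported as "not found" is not an error in itself; it only means that library would have to be installed before it can be imported.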



How to Install Python?

In this course, we will use Spyder, a free integrated development
environment (IDE) that is included in Anaconda.

Anaconda can be downloaded from https://www.anaconda.com/download.

Download the “Python Demo.py” file from Elearning and open it after
launching Spyder.
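
Once Spyder is running, a quick way to confirm that the installation works is to run a few lines in its console. The sketch below is not the contents of “Python Demo.py”; it is a generic check, and `describe_interpreter` is a helper name invented for this example.

```python
import sys


def describe_interpreter():
    """Return the version and location of the running Python interpreter."""
    return {
        # sys.version_info starts with (major, minor, micro).
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # Path of the Python executable that is running this code.
        "executable": sys.executable,
    }


info = describe_interpreter()
print("Python version:", info["version"])
print("Interpreter path:", info["executable"])
```

If Anaconda's Python is the one in use, the interpreter path will normally point inside the Anaconda installation folder.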

