Introduction to Python Pandas for Data Analytics

Srijith Rajamohan
Advanced Research Computing, Virginia Tech

Tuesday 19th July, 2016

Course Contents

This week:
  Introduction to Python
  Python Programming
  NumPy
  Plotting with Matplotlib
  Introduction to Python Pandas
  Case study
  Conclusion

Section 1

1 Introduction to Python
2 Python programming
3 NumPy
4 Matplotlib
5 Introduction to Pandas
6 Case study
7 Conclusion

Python Features

Why Python?
  Interpreted
  Intuitive and minimalistic code
  Expressive language
  Dynamically typed
  Automatic memory management

Python Features

Advantages
  Ease of programming
  Minimizes the time to develop and maintain code
  Modular and object-oriented
  Large community of users
  A large standard and user-contributed library

Disadvantages
  Interpreted and therefore slower than compiled languages
  Decentralized: much functionality lives in separately distributed packages

Code Performance vs Development Time

Versions of Python

Two versions of Python are in use: Python 2 and Python 3
Python 3 is not backward-compatible with Python 2
A lot of packages are available for Python 2
Check the version using the following command

Example
$ python --version

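The same check can be made from inside a script. The following is a minimal sketch using only the standard library sys module; the variable name major is chosen here for illustration.

```python
import sys

# sys.version_info is a tuple-like object: (major, minor, micro, releaselevel, serial)
major = sys.version_info[0]

if major >= 3:
    print("Running Python 3 or later")
else:
    print("Running Python 2")
```
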
Section 2

2 Python programming

Variables

Variable names can contain alphanumeric characters and underscores, and cannot start with a digit
It is common for variable names to start with a lower-case letter and class names to start with a capital letter
Some keywords are reserved, such as and, assert, break, lambda. A list of keywords is available at https://docs.python.org/2.5/ref/keywords.html
Python is dynamically typed: the type of a variable is derived from the value it is assigned
A variable is assigned using the = operator

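Dynamic typing and reserved keywords can both be inspected at run time. The sketch below is illustrative only (the variable names are not from the slides) and uses the standard library keyword module.

```python
import keyword

# The type of a variable follows the value currently bound to it.
count = 3
type_before = type(count)   # int
count = "three"
type_after = type(count)    # str

# keyword.kwlist holds the reserved words of the running interpreter.
is_reserved = "lambda" in keyword.kwlist
print(type_before, type_after, is_reserved)
```
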
Variable types

Variable types
  Integer (int)
  Float (float)
  Boolean (bool)
  Complex (complex)
  String (str)
  ...
  User Defined! (classes)
Documentation
  https://docs.python.org/2/library/types.html
  https://docs.python.org/2/library/datatypes.html

Variable types

Use the type function to determine the type of a variable

Example
>>> log_file = open("/home/srijithr/logfile", "r")
>>> type(log_file)
file    # Python 2; Python 3 reports _io.TextIOWrapper

Variable types

Variables can be cast to a different type

Example
>>> share_of_rent = 295.50 / 2.0
>>> type(share_of_rent)
float
>>> rounded_share = int(share_of_rent)
>>> type(rounded_share)
int

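Note that int() drops the fractional part rather than rounding. A short sketch of the difference, reusing share_of_rent from the example above (the other names are illustrative):

```python
share_of_rent = 295.50 / 2.0            # 147.75

truncated = int(share_of_rent)          # drops the fractional part
nearest = int(round(share_of_rent))     # rounds to the nearest integer first
as_text = str(share_of_rent)            # casting to a string

print(truncated, nearest, as_text)
```
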
Operators

Arithmetic operators: +, -, *, /, // (floor division, also for floating point numbers), ** (power)
Boolean operators: and, or, not
Comparison operators: >, <, >= (greater or equal), <= (less or equal), == (equality)

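A few of these operators in use. This is a small illustrative sketch; the values and names are arbitrary.

```python
# Arithmetic
floored = 7.0 // 2        # floor division on a float gives 3.0
power = 2 ** 10           # 1024

# Boolean and comparison operators combined
in_range = (power >= 1000) and not (power == 1025)

print(floored, power, in_range)
```
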
Strings (str)

Example
>>> dir(str)
[..., 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith',
 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha', 'isdigit',
 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower',
 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition',
 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase',
 'title', 'translate', 'upper', 'zfill']

Strings

Example
>>> greeting = "Hello world!"
>>> len(greeting)
12
>>> greeting
'Hello world!'
>>> greeting[0]  # indexing starts at 0
'H'
>>> greeting.replace("world", "test")
'Hello test!'

Printing strings

Example
# concatenates strings with a space
>>> print("Go", "Hokies")
Go Hokies
# concatenated without space
>>> print("Go" + "Tech" + "Go")
GoTechGo
# C-style string formatting
>>> print("Bar Tab = %f" % 35.28)
Bar Tab = 35.280000
# Creating a formatted string
>>> total = "My Share = %.2f. Tip = %d" % (11.76, 2.352)
>>> print(total)
My Share = 11.76. Tip = 2

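The same formatted string can also be built with str.format. This is an alternative sketch, not the style used in the slides; note that the d format code needs an integer, so the tip is cast explicitly.

```python
share, tip = 11.76, 2.352

# Same output as the %-style example, written with str.format
total = "My Share = {:.2f}. Tip = {:d}".format(share, int(tip))
print(total)
```
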
Lists

Array of elements of arbitrary type

Example
>>> numbers = [1, 2, 3]
>>> type(numbers)
list
>>> arbitrary_array = [1, numbers, "hello"]
>>> type(arbitrary_array)
list

Lists

Example
# create a new empty list
>>> characters = []
# add elements using append
>>> characters.append("A")
>>> characters.append("d")
>>> characters.append("d")
>>> print(characters)
['A', 'd', 'd']

Lists

Lists are mutable: their values can be changed.

Example
>>> characters = ["A", "d", "d"]
# Changing second and third element
>>> characters[1] = "p"
>>> characters[2] = "p"
>>> print(characters)
['A', 'p', 'p']

Lists

Example
>>> characters = ["A", "d", "d"]
# Inserting before "A", "d", "d"
>>> characters.insert(0, "i")
>>> characters.insert(1, "n")
>>> characters.insert(2, "s")
>>> characters.insert(3, "e")
>>> characters.insert(4, "r")
>>> characters.insert(5, "t")
>>> print(characters)
['i', 'n', 's', 'e', 'r', 't', 'A', 'd', 'd']

Lists

Example
>>> characters = ['i', 'n', 's', 'e', 'r', 't', 'A', 'd', 'd']
# Remove first occurrence of "A" from list
>>> characters.remove("A")
>>> print(characters)
['i', 'n', 's', 'e', 'r', 't', 'd', 'd']
# Remove an element at a specific location
>>> del characters[7]
>>> del characters[6]
>>> print(characters)
['i', 'n', 's', 'e', 'r', 't']

Tuples

Tuples are like lists except that they are immutable. The other difference is in performance.

Example
>>> point = (10, 20)  # Note () for tuples instead of []
>>> type(point)
tuple
>>> point = 10, 20
>>> type(point)
tuple
>>> point[0] = 40  # This will fail!
TypeError: 'tuple' object does not support item assignment

Dictionary

Dictionaries are collections of key-value pairs

Example
>>> prices = {"Eggs": 2.30,
...           "Sausage": 4.15,
...           "Spam": 1.59,}
>>> type(prices)
dict
>>> print(prices)
{'Eggs': 2.3, 'Sausage': 4.15, 'Spam': 1.59}
>>> prices["Spam"]
1.59

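Dictionaries are mutable, so entries can be added, updated and iterated over. A brief sketch building on the prices example; the "Toast" entry and the new Spam price are made up for illustration.

```python
prices = {"Eggs": 2.30, "Sausage": 4.15, "Spam": 1.59}

prices["Toast"] = 0.99            # add a new key-value pair (hypothetical item)
prices["Spam"] = 1.79             # update an existing value

# Iterate over key-value pairs; sorted() gives a repeatable order
for item, price in sorted(prices.items()):
    print("%s costs %.2f" % (item, price))

has_bacon = "Bacon" in prices     # membership tests look at the keys
```
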
Conditional statements: if, elif, else

Example
>>> I_am_tired = False
>>> I_am_hungry = True
>>> if I_am_tired is True:  # Note the colon for a code block
...     print("You have to teach!")
... elif I_am_hungry is True:
...     print("No food for you!")
... else:
...     print("Go on ...!")
...
No food for you!

Loops - For

Example
>>> for i in [1, 2, 3]:  # i is an arbitrary variable for use within the loop section
...     print(i)
1
2
3
>>> for word in ["scientific", "computing", "with", "python"]:
...     print(word)
scientific
computing
with
python

Loops - While

Example
>>> i = 0
>>> while i < 5:
...     print(i)
...     i = i + 1
0
1
2
3
4

Functions

Example
>>> def print_word_length(word):
...     """
...     Print a word and how many characters it has
...     """
...     print(word + " has " + str(len(word)) + " characters.")
>>> print_word_length("Diversity")
Diversity has 9 characters.

Functions - arguments

Passing immutable arguments like integers, strings or tuples acts like call-by-value
  They cannot be modified!
Passing mutable arguments like lists behaves like call-by-reference

Functions - arguments

Call-by-value

Example
>>> def make_me_rich(balance):
...     balance = 1000000
...
>>> account_balance = 500
>>> make_me_rich(account_balance)
>>> print(account_balance)
500

Functions - arguments

Call-by-reference

Example
>>> def talk_to_advisor(tasks):
...     tasks.insert(0, "Publish")
...     tasks.insert(1, "Publish")
...     tasks.insert(2, "Publish")
...
>>> todos = ["Graduate", "Get a job", "...", "Profit!"]
>>> talk_to_advisor(todos)
>>> print(todos)
['Publish', 'Publish', 'Publish', 'Graduate', 'Get a job', '...', 'Profit!']

Functions - arguments

However, assigning a new object to the argument does not change the caller's variable
  A new memory location is created for this list
  This becomes a local variable

Example
>>> def switcheroo(favorite_teams):
...     print(favorite_teams)
...     favorite_teams = ["Redskins"]
...     print(favorite_teams)
...
>>> my_favorite_teams = ["Hokies", "Nittany Lions"]
>>> switcheroo(my_favorite_teams)
['Hokies', 'Nittany Lions']
['Redskins']
>>> print(my_favorite_teams)
['Hokies', 'Nittany Lions']

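To replace the contents of the caller's list from inside a function, the list has to be mutated in place rather than rebound. The sketch below contrasts the two; the function names rebind and replace_in_place are hypothetical and not part of the slides.

```python
def rebind(teams):
    # Rebinding: 'teams' now names a new list; the caller's list is untouched.
    teams = ["Redskins"]

def replace_in_place(teams):
    # Slice assignment mutates the list object the caller passed in.
    teams[:] = ["Redskins"]

my_favorite_teams = ["Hokies", "Nittany Lions"]

rebind(my_favorite_teams)
after_rebind = list(my_favorite_teams)       # still the original two teams

replace_in_place(my_favorite_teams)
after_in_place = list(my_favorite_teams)     # now replaced

print(after_rebind, after_in_place)
```
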
Functions - Multiple Return Values

Example
>>> def powers(number):
...     return number ** 2, number ** 3
...
>>> squared, cubed = powers(3)
>>> print(squared)
9
>>> print(cubed)
27

Functions - Default Values

Example
>>> def likes_food(person, food="Broccoli", likes=True):
...     if likes:
...         print(str(person) + " likes " + food)
...     else:
...         print(str(person) + " does not like " + food)
...
>>> likes_food("Srijith", likes=False)
Srijith does not like Broccoli

Section 3

3 NumPy

NumPy

Used in almost all numerical computations in Python
Used for high-performance vector and matrix computations
Provides fast precompiled functions for numerical routines
Written in C and Fortran
Vectorized computations

Why NumPy?

Example
>>> from numpy import *
>>> import time
>>> def trad_version():
...     t1 = time.time()
...     X = range(10000000)
...     Y = range(10000000)
...     Z = []
...     for i in range(len(X)):
...         Z.append(X[i] + Y[i])
...     return time.time() - t1
...
>>> trad_version()
1.9738149642944336

Why NumPy?

Example
>>> def numpy_version():
...     t1 = time.time()
...     X = arange(10000000)
...     Y = arange(10000000)
...     Z = X + Y
...     return time.time() - t1
...
>>> numpy_version()
0.059307098388671875

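A single time.time() measurement can be noisy. The following sketch repeats the comparison with the standard library timeit module; the array size n and the repeat count are arbitrary choices made here so that it runs quickly, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

n = 100000  # much smaller than the 10,000,000 used above, to keep the run short

def trad_version():
    X = range(n)
    Y = range(n)
    return [X[i] + Y[i] for i in range(n)]

def numpy_version():
    X = np.arange(n)
    Y = np.arange(n)
    return X + Y

# timeit calls each function 'number' times and returns the total time in seconds
t_list = timeit.timeit(trad_version, number=5)
t_numpy = timeit.timeit(numpy_version, number=5)
print("list: %.4f s, numpy: %.4f s" % (t_list, t_numpy))
```
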
Arrays

Example
>>> from numpy import *
# the argument to the array function is a Python list
>>> v = array([1, 2, 3, 4])
# the argument to the array function is a nested Python list
>>> M = array([[1, 2], [3, 4]])
>>> type(v), type(M)
(numpy.ndarray, numpy.ndarray)

Arrays

Example
>>> v.shape, M.shape
((4,), (2, 2))
>>> M.size
4
>>> M.dtype
dtype('int64')
# Explicitly define the type of the array
>>> M = array([[1, 2], [3, 4]], dtype=complex)

Arrays - Using array-generating functions

Example
>>> x = arange(0, 10, 1)  # arguments: start, stop, step
>>> x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> linspace(0, 10, 11)  # arguments: start, end and number of points (start and end points are included)
array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.])

Diagonal and Zero matrix

Example
>>> diag([1, 2, 3])
array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])
>>> zeros((3, 3))
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

Array Access

Example
>>> M = random.rand(3, 3)
>>> M
array([
[ 0.37389376,  0.64335721,  0.12435669],
[ 0.01444674,  0.13963834,  0.36263224],
[ 0.00661902,  0.14865659,  0.75066302]])
>>> M[1, 1]
0.13963834214755588

Array Access

Example
# Access the second row (index 1; indexing starts at 0)
>>> M[1]
array(
[ 0.01444674,  0.13963834,  0.36263224])
# The same row can also be accessed using this notation
>>> M[1, :]
array(
[ 0.01444674,  0.13963834,  0.36263224])
# Access the second column (index 1)
>>> M[:, 1]
array(
[ 0.64335721,  0.13963834,  0.14865659])

Array Access

Example
# You can also assign values to an entire row or column
>>> M[1, :] = 0
>>> M
array([
[ 0.37389376,  0.64335721,  0.12435669],
[ 0.        ,  0.        ,  0.        ],
[ 0.00661902,  0.14865659,  0.75066302]])

Array Slicing

Example
# Extract slices of an array
>>> M[1:3]
array([
[ 0.        ,  0.        ,  0.        ],
[ 0.00661902,  0.14865659,  0.75066302]])
>>> M[1:3, 1:2]
array([
[ 0.        ],
[ 0.14865659]])

Array Slicing - Negative Indexing

Example
# Negative indices start counting from the end of the array
>>> M[-2]
array(
[ 0.,  0.,  0.])
>>> M[-1]
array(
[ 0.00661902,  0.14865659,  0.75066302])

Array Access - Strided Access

Example
# Strided access
>>> M[::2, ::2]
array([[ 0.37389376,  0.12435669],
       [ 0.00661902,  0.75066302]])

Array Operations - Scalar

These operations are applied to all the elements in the array

Example
>>> M * 2
array([
[ 0.74778752,  1.28671443,  0.24871338],
[ 0.        ,  0.        ,  0.        ],
[ 0.01323804,  0.29731317,  1.50132603]])
>>> M + 2
array([
[ 2.37389376,  2.64335721,  2.12435669],
[ 2.        ,  2.        ,  2.        ],
[ 2.00661902,  2.14865659,  2.75066302]])

Matrix multiplication

Example
>>> M * M  # Element-wise multiplication
array([
[1.397965e-01, 4.139085e-01, 1.546458e-02],
[0.000000e+00, 0.000000e+00, 0.000000e+00],
[4.381141e-05, 2.209878e-02, 5.634949e-01]])
>>> dot(M, M)  # Matrix multiplication
array([
[ 0.14061966,  0.25903369,  0.13984616],
[ 0.        ,  0.        ,  0.        ],
[ 0.00744346,  0.1158494 ,  0.56431808]])

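On Python 3.5 and later, the @ operator is an alternative spelling of the matrix product for NumPy arrays. The sketch below uses a small integer-valued matrix, chosen here for illustration, so the results are easy to check by hand.

```python
import numpy as np

M = np.array([[1.0, 2.0], [3.0, 4.0]])

elementwise = M * M        # multiplies matching entries
matrix_product = M @ M     # same result as np.dot(M, M); needs Python 3.5+

print(elementwise)
print(matrix_product)
```
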
Iterating over Array Elements

In general, avoid iteration over elements
Iterating is slow compared to a vector operation
If you must, use the for loop
In order to enable vectorization, ensure that user-written functions can work with vector inputs.
  Use the vectorize function
  Use the any or all function with arrays

Vectorize

Example
>>> def Theta(x):
...     """
...     Scalar implementation of the Heaviside step function.
...     """
...     if x >= 0:
...         return 1
...     else:
...         return 0
...
>>> Theta(1.0)
1
>>> Theta(-1.0)
0

Vectorize

Without vectorize we would not be able to pass v to the function

Example
>>> v
array([1, 2, 3, 4])
>>> Tvec = vectorize(Theta)
>>> Tvec(v)
array([1, 1, 1, 1])
>>> Tvec(1.0)
array(1)

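The NumPy documentation describes vectorize as a convenience whose implementation is essentially a for loop, so it does not bring the speed of a true vector operation. Where the function can be expressed with array operations, that is usually preferable. A possible sketch for the step function, using numpy.where; the name theta_vectorized is hypothetical.

```python
import numpy as np

def theta_vectorized(x):
    """Heaviside step function written with array operations only."""
    x = np.asarray(x)
    # np.where picks 1 where the condition holds and 0 elsewhere
    return np.where(x >= 0, 1, 0)

v = np.array([-2.0, -0.5, 0.0, 3.0])
result = theta_vectorized(v)
print(result)
```
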
Arrays in conditions

Use the any or all functions associated with arrays

Example
>>> v
array([1, 2, 3, 4])
>>> (v > 3).any()
True
>>> (v > 3).all()
False

Section 4

4 Matplotlib

Matplotlib

Used for generating 2D and 3D scientific plots
Support for LaTeX
Fine-grained control over every aspect
Many output file formats including PNG, PDF, SVG, EPS

Matplotlib - Customize matplotlibrc

The configuration file matplotlibrc is used to customize almost every aspect of plotting
On Linux, it looks in .config/matplotlib/matplotlibrc
On other platforms, it looks in .matplotlib/matplotlibrc
Use matplotlib.matplotlib_fname() to determine from where the current matplotlibrc is loaded
Customization options can be found at http://matplotlib.org/users/customizing.html

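Settings can also be changed for the current session through the rcParams dictionary, without editing the file. A minimal sketch; the two settings and their values are arbitrary examples.

```python
import matplotlib

# Path of the matplotlibrc file that was loaded
rc_path = matplotlib.matplotlib_fname()
print(rc_path)

# Override individual settings for this session only
matplotlib.rcParams["lines.linewidth"] = 2.0
matplotlib.rcParams["font.size"] = 12
```
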
Matplotlib

Matplotlib is the entire library
Pyplot: a module within Matplotlib that provides access to the underlying plotting library
Pylab: a convenience module that combines the functionality of Pyplot with NumPy
Pylab interface convenient for interactive plotting

Pylab

Example
>>> import pylab as pl
>>> pl.ioff()
>>> pl.isinteractive()
False
>>> x = [1, 3, 7]
>>> pl.plot(x)  # if interactive mode is off use show() after the plot command
[<matplotlib.lines.Line2D object at 0x10437a190>]
>>> pl.savefig('fig_test.pdf', dpi=600, format='pdf')
>>> pl.show()

Pylab

[Figure: "Simple Pylab plot", the values 1, 3, 7 plotted against their index 0 to 2]

Pylab

Example
>>> import numpy as np
>>> X = np.linspace(-np.pi, np.pi, 256, endpoint=True)
>>> C, S = np.cos(X), np.sin(X)
# Plot cosine with a blue continuous line of width 1 (pixels)
>>> pl.plot(X, C, color="blue", linewidth=1.0, linestyle="-")
>>> pl.xlabel("X"); pl.ylabel("Y")
>>> pl.title("Sine and Cosine waves")
# Plot sine with a green continuous line of width 1 (pixels)
>>> pl.plot(X, S, color="green", linewidth=1.0, linestyle="-")
>>> pl.show()

Pylab

[Figure: "Sine and Cosine waves", Y against X, with X from -pi to pi and Y from -1.0 to 1.0]

Pylab - subplots

Example
>>> pl.figure(figsize=(8, 6), dpi=80)
>>> pl.subplot(1, 2, 1)
>>> C, S = np.cos(X), np.sin(X)
>>> pl.plot(X, C, color="blue", linewidth=1.0, linestyle="-")
>>> pl.subplot(1, 2, 2)
>>> pl.plot(X, S, color="green", linewidth=1.0, linestyle="-")
>>> pl.show()

Pylab - subplots

[Figure: two side-by-side subplots, cosine on the left and sine on the right, each with Y from -1.0 to 1.0]

Pyplot

Example
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> plt.isinteractive()
False
>>> x = np.linspace(0, 3*np.pi, 500)
>>> plt.plot(x, np.sin(x**2))
[<matplotlib.lines.Line2D object at 0x104bf2b10>]
>>> plt.title('Pyplot plot')
<matplotlib.text.Text object at 0x104be4450>
>>> plt.savefig('fig_test_pyplot.pdf', dpi=600, format='pdf')
>>> plt.show()

Pyplot

[Figure: "Pyplot plot", sin(x**2) for x from 0 to 3*pi, with values between -1.0 and 1.0]

Pyplot - legend

Example
>>> import matplotlib.pyplot as plt
>>> line_up, = plt.plot([1, 2, 3], label='Line 2')
>>> line_down, = plt.plot([3, 2, 1], label='Line 1')
>>> plt.legend(handles=[line_up, line_down])
<matplotlib.legend.Legend at 0x1084cc950>
>>> plt.show()

Pyplot - legend

[Figure: two crossing lines with a legend showing "Line 2" and "Line 1"]

Pyplot - 3D plots

Surface plots

Visit http://matplotlib.org/gallery.html for a gallery of plots produced by Matplotlib

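The slide shows only an image of a surface plot. A minimal sketch of how such a plot could be produced is given below; the function being plotted, the grid size and the colormap are assumptions made here for illustration, and the Agg backend is selected only so that the code runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib versions

# Build a regular grid and evaluate an illustrative surface on it
x = np.linspace(-3, 3, 50)
y = np.linspace(-3, 3, 50)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
surface = ax.plot_surface(X, Y, Z, cmap="viridis")
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_zlabel("Z")
```

Call plt.show() in an interactive session, or fig.savefig with a file name of your choice, to view the result.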
Section 5

5 Introduction to Pandas

What is Pandas?

Pandas is an open source, BSD-licensed library
High-performance, easy-to-use data structures and data analysis tools
Built for the Python programming language.

Pandas - import modules

Example
>>> from pandas import DataFrame, read_csv
# General syntax to import a library but no functions:
>>> import pandas as pd  # this is how I usually import pandas

Pandas - Create a dataframe

Example
>>> d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
...      'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
>>> df = pd.DataFrame(d)
>>> df
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0

Pandas - Create a dataframe

Example
>>> names = ['Bob', 'Jessica', 'Mary', 'John', 'Mel']
>>> births = [968, 155, 77, 578, 973]
# To merge these two lists together we will use the zip function.
>>> BabyDataSet = list(zip(names, births))
>>> BabyDataSet
[('Bob', 968), ('Jessica', 155), ('Mary', 77), ('John', 578), ('Mel', 973)]

Pandas - Create a data frame and write to a csv file

Use the pandas module to create a dataset.

Example
>>> df = pd.DataFrame(data=BabyDataSet, columns=['Names', 'Births'])
>>> df.to_csv('births1880.csv', index=False, header=False)

Pandas - Read data from a file

Import data from the csv file

Example
>>> df = pd.read_csv(filename)
# Don't treat the first row as a header
>>> df = pd.read_csv(Location, header=None)
# Provide specific names for the columns
>>> df = pd.read_csv(Location, names=['Names', 'Births'])

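The write and read steps can be tried together without touching the disk. The sketch below is an illustration under that assumption: it writes to an in-memory io.StringIO buffer instead of births1880.csv, and the name df_back is hypothetical.

```python
import io
import pandas as pd

names = ["Bob", "Jessica", "Mary", "John", "Mel"]
births = [968, 155, 77, 578, 973]
df = pd.DataFrame({"Names": names, "Births": births}, columns=["Names", "Births"])

# Write without header or index, but to an in-memory buffer rather than a file
buffer = io.StringIO()
df.to_csv(buffer, index=False, header=False)
buffer.seek(0)

# The data has no header row, so supply the column names when reading
df_back = pd.read_csv(buffer, names=["Names", "Births"])
print(df_back)
```
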
Pandas - Get data types

Example
# Check data type of the columns
>>> df.dtypes
Names     object
Births     int64
dtype: object
# Check data type of Births column
>>> df.Births.dtype
dtype('int64')

Pandas - Take a look at the data

Example

>>> df.head(2)
     Names  Births
0      Bob     968
1  Jessica     155
>>> df.tail(2)
  Names  Births
3  John     578
4   Mel     973
>>> df.columns
Index([u'Names', u'Births'], dtype='object')

Pandas - Take a look at the data

Example

>>> df.values
array([['Bob', 968],
       ['Jessica', 155],
       ['Mary', 77],
       ['John', 578],
       ['Mel', 973]], dtype=object)
>>> df.index
Int64Index([0, 1, 2, 3, 4], dtype='int64')

Pandas - Working on the data

Example

>>> df['Births'].plot()
# Maximum value in the data set
>>> MaxValue = df['Births'].max()
# Name associated with the maximum value
>>> MaxName = df['Names'][df['Births'] == df['Births'].max()].values

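A self-contained sketch of the maximum lookup above, using the five-row baby-name data from the earlier slides. The idxmax variant is an addition for comparison, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'Names': ['Bob', 'Jessica', 'Mary', 'John', 'Mel'],
                   'Births': [968, 155, 77, 578, 973]})

# Maximum value in the data set
max_value = df['Births'].max()

# The boolean mask keeps every row that ties for the maximum
max_names = df['Names'][df['Births'] == max_value].values

# idxmax returns the index label of the first maximum only
first_max_name = df.loc[df['Births'].idxmax(), 'Names']

print(max_value, max_names, first_max_name)
```
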
Pandas - Describe the data

Example

>>> df['Names'].unique()
array(['Mary', 'Jessica', 'Bob', 'John', 'Mel'], dtype=object)
>>> print(df['Names'].describe())
count     1000
unique       5
top        Bob
freq       206
Name: Names, dtype: object

Pandas - Add a column

Example

>>> d = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# Create dataframe
>>> df = pd.DataFrame(d)
# Name the column
>>> df.columns = ['Rev']
# Add another one and set the value in that column
>>> df['NewCol'] = 5

Pandas - Accessing and indexing the data

Example

# Perform operations on columns
>>> df['NewCol'] = df['NewCol'] + 1
# Delete a column
>>> del df['NewCol']
# Edit the index name
>>> i = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
>>> df.index = i

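The column and index edits from the last two slides, collected into one runnable sketch. The variable new_col_values is introduced here only to record the intermediate state before the column is deleted.

```python
import pandas as pd

# Create the dataframe and name its single column
df = pd.DataFrame([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
df.columns = ['Rev']

# A scalar assignment is broadcast to every row
df['NewCol'] = 5

# Column arithmetic is element-wise
df['NewCol'] = df['NewCol'] + 1
new_col_values = df['NewCol'].tolist()

# Delete the column again and relabel the index
del df['NewCol']
df.index = list('abcdefghij')
print(df.head(3))
```
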
Pandas - Accessing and indexing the data

Example

# Find based on index value
>>> df.loc['a']
>>> df.loc['a':'d']
# Do integer position based indexing
>>> df.iloc[0:3]
# Access using the column name
>>> df['Rev']
# Access multiple columns
>>> df[['Rev', 'test']]
# Subset the data (.ix was removed in pandas 1.0; use loc or iloc instead)
>>> df.loc[df.index[:3], ['Rev', 'test']]

Pandas - Accessing and indexing the data

Example

# Find based on index value
>>> df.at['a', 'Rev']
0
>>> df.iat[0, 0]
0

Pandas - Accessing and indexing for loc

A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index; it is not an integer position along the index)
A list or array of labels, e.g. ['a', 'b', 'c']
A slice object with labels, e.g. 'a':'f' (note that, contrary to usual Python slices, both the start and the stop are included!)
A boolean array

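Each of the four input kinds accepted by loc can be tried on the ten-row Rev dataframe from the earlier slides. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'Rev': range(10)}, index=list('abcdefghij'))

single = df.loc['a']                # one label: the row comes back as a Series
several = df.loc[['a', 'b', 'c']]   # list of labels
sliced = df.loc['a':'f']            # label slice: both 'a' and 'f' are included
masked = df.loc[df['Rev'] > 7]      # boolean array

print(len(sliced), list(masked.index))
```
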
Pandas - Accessing and indexing for iloc

An integer, e.g. 5
A list or array of integers, e.g. [4, 3, 0]
A slice object with ints, e.g. 1:7

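The same dataframe illustrates the three input kinds for iloc. Unlike label slices with loc, an integer slice follows the usual Python rule and excludes the stop position.

```python
import pandas as pd

df = pd.DataFrame({'Rev': range(10)}, index=list('abcdefghij'))

row = df.iloc[5]             # a single integer position
picked = df.iloc[[4, 3, 0]]  # list of positions, returned in the order given
sliced = df.iloc[1:7]        # positions 1 through 6; the stop is excluded

print(list(picked.index), list(sliced.index))
```
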
Pandas - Accessing and indexing summarized

Example

loc: works only on index labels
iloc: works on integer position
ix: the most general; supports both index- and position-based retrieval (deprecated in pandas 0.20 and removed in 1.0; use loc or iloc instead)
at: gets scalar values; it is a very fast loc
iat: gets scalar values; it is a very fast iloc

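A short sketch of the scalar accessors, plus one way to mix positional rows with labelled columns without ix. The test column and its values are hypothetical, added only so that there are two columns to select.

```python
import pandas as pd

df = pd.DataFrame({'Rev': range(10), 'test': range(10, 20)},
                  index=list('abcdefghij'))

by_label = df.at['a', 'Rev']   # scalar lookup by row and column label
by_position = df.iat[0, 0]     # scalar lookup by integer position

# First three rows by position, columns by label:
# translate the positions to labels, then use loc
subset = df.loc[df.index[:3], ['Rev', 'test']]
print(subset)
```
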
Pandas - Missing data

How do you deal with data that is missing or contains NaNs?

Example

>>> df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'],
...                   columns=['one', 'two', 'three'])
>>> df.loc['a', 'two'] = np.nan
>>> df
        one       two     three
a -1.192838       NaN -0.337037
c  0.110718 -0.016733 -0.137009
e  0.153456  0.266369 -0.064127
f  1.709607 -0.424790 -0.792061
h -1.076740 -0.872088 -0.436127

Pandas - Missing data

How do you deal with data that is missing or contains NaNs?

Example

>>> df.isnull()
     one    two  three
a  False   True  False
c  False  False  False
e  False  False  False
f  False  False  False
h  False  False  False

Pandas - Missing data

You can fill this data in a number of ways.

Example

>>> df.fillna(0)
        one       two     three
a -1.192838  0.000000 -0.337037
c  0.110718 -0.016733 -0.137009
e  0.153456  0.266369 -0.064127
f  1.709607 -0.424790 -0.792061
h -1.076740 -0.872088 -0.436127

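Filling with a constant is only one option. The sketch below reuses the values printed on the slides (so the result is reproducible) and also shows filling with column means and dropping incomplete rows; those two variants are additions here, not part of the original slides.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'one': [-1.192838, 0.110718, 0.153456, 1.709607, -1.076740],
     'two': [np.nan, -0.016733, 0.266369, -0.424790, -0.872088],
     'three': [-0.337037, -0.137009, -0.064127, -0.792061, -0.436127]},
    index=['a', 'c', 'e', 'f', 'h'])

n_missing = df.isnull().sum()        # number of NaNs per column
filled_zero = df.fillna(0)           # replace NaN with a constant
filled_mean = df.fillna(df.mean())   # replace NaN with the column mean
dropped = df.dropna()                # drop every row that contains a NaN

print(n_missing)
```

All three calls return new dataframes and leave df unchanged.
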
Pandas - Query the data

Also, use the query method where you can embed boolean expressions on columns within quotes

Example

>>> df.query('one > 0')
        one       two     three
c  0.110718 -0.016733 -0.137009
e  0.153456  0.266369 -0.064127
f  1.709607 -0.424790 -0.792061
>>> df.query('one > 0 & two > 0')
        one       two     three
e  0.153456  0.266369 -0.064127

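The query calls above can be checked against ordinary boolean masks. This sketch reuses the slide values; the variable threshold is a hypothetical addition that shows how a Python variable can be referenced with the @ prefix.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'one': [-1.192838, 0.110718, 0.153456, 1.709607, -1.076740],
     'two': [np.nan, -0.016733, 0.266369, -0.424790, -0.872088],
     'three': [-0.337037, -0.137009, -0.064127, -0.792061, -0.436127]},
    index=['a', 'c', 'e', 'f', 'h'])

pos_one = df.query('one > 0')
both = df.query('one > 0 and two > 0')

# Equivalent boolean-mask form; the parentheses are required here
mask_equiv = df[(df['one'] > 0) & (df['two'] > 0)]

# @name refers to a variable in the surrounding Python scope
threshold = 0.15
above = df.query('one > @threshold')

print(list(pos_one.index), list(both.index), list(above.index))
```
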
Pandas - Apply a function

You can apply any function to the columns in a dataframe

Example

>>> df.apply(lambda x: x.max() - x.min())
one      2.902445
two      1.138457
three    0.727934

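By default apply passes each column to the function as a Series. The sketch below reproduces the column ranges from the slide values and, as an addition, applies the same function row-wise with axis=1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'one': [-1.192838, 0.110718, 0.153456, 1.709607, -1.076740],
     'two': [np.nan, -0.016733, 0.266369, -0.424790, -0.872088],
     'three': [-0.337037, -0.137009, -0.064127, -0.792061, -0.436127]},
    index=['a', 'c', 'e', 'f', 'h'])

# One result per column (max and min skip NaN by default)
col_range = df.apply(lambda x: x.max() - x.min())

# One result per row
row_range = df.apply(lambda x: x.max() - x.min(), axis=1)

print(col_range)
```
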
Pandas - Applymap a function

You can apply any function element-wise to the data in a dataframe

Example

>>> df.applymap(np.sqrt)
        one       two  three
a       NaN       NaN    NaN
c  0.332742       NaN    NaN
e  0.391735  0.516109    NaN
f  1.307520       NaN    NaN
h       NaN       NaN    NaN

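Two version-neutral ways to get element-wise behaviour, sketched on a small hypothetical dataframe of positive numbers. NumPy functions such as np.sqrt can be called on the whole dataframe directly, and an arbitrary Python function can be mapped over each column. Note that applymap was deprecated in pandas 2.1 in favour of DataFrame.map, so the forms below avoid both names.

```python
import numpy as np
import pandas as pd

# Hypothetical data, chosen so that the square roots are exact
df = pd.DataFrame({'one': [1.0, 4.0], 'two': [9.0, 16.0]})

# NumPy ufuncs operate element-wise on the whole dataframe
roots = np.sqrt(df)

# Arbitrary Python function: map it over each column in turn
labels = df.apply(lambda col: col.map(lambda v: 'big' if v > 5 else 'small'))

print(roots)
print(labels)
```
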
Pandas - Query data

Determine if certain values exist in the dataframe

Example

>>> s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')
>>> s.isin([2, 4, 6])
4    False
3    False
2     True
1    False
0     True

Pandas - Query data

Use the where method

Example

>>> s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')
>>> s.where(s > 3)
4    NaN
3    NaN
2    NaN
1    NaN
0    4.0

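The difference between isin and where is easiest to see side by side: isin produces a boolean mask, while where keeps the original shape and blanks out the entries that fail the condition. A minimal sketch using the same Series; the other=-1 variant is an addition.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')

present = s.isin([2, 4, 6])           # boolean mask, same index as s
selected = s[present]                 # keep only the matching entries

kept = s.where(s > 3)                 # same shape; failing entries become NaN
replaced = s.where(s > 3, other=-1)   # supply a replacement instead of NaN

print(selected)
print(kept)
```

Because NaN is a floating-point value, kept is converted to float64, whereas replaced can stay an integer Series.
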
Pandas - Grouping the data

Creating a grouping organizes the data and returns a groupby object

Example

grouped = obj.groupby(key)
grouped = obj.groupby(key, axis=1)
grouped = obj.groupby([key1, key2])

Pandas - Grouping the data

Example

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})

Pandas - Grouping the data

Example

     A      B         C         D
0  foo    one  0.469112 -0.861849
1  bar    one -0.282863 -2.104569
2  foo    two -1.509059 -0.494929
3  bar  three -1.135632  1.071804
4  foo    two  1.212112  0.721555
5  bar    two -0.173215 -0.706771
6  foo    one  0.119209 -1.039575
7  foo  three -1.044236  0.271860

Pandas - Grouping the data

Group by either A or B columns or both

Example

>>> grouped = df.groupby('A')
>>> grouped = df.groupby(['A', 'B'])
# Sorts by default, disable this for potential speedup
>>> grouped = df.groupby('A', sort=False)

Pandas - Grouping the data

Get statistics for the groups

Example

>>> grouped.size()
>>> grouped.describe()
>>> grouped.count()

Pandas - Grouping the data

Print the grouping

Example

>>> list(grouped)
[('bar',      A      B         C         D
1  bar    one -1.303028 -0.932565
3  bar  three  0.135601  0.268914
5  bar    two -0.320369  0.059366),
 ('foo',      A      B         C         D
0  foo    one  1.066805 -1.252834
2  foo    two -0.180407  1.686709
4  foo    two  0.228522 -0.457232
6  foo    one -0.553085  0.512941
7  foo  three -0.346510  0.434751)]

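Iterating over a groupby object yields (key, sub-dataframe) pairs, which is what list(grouped) materializes. The sketch below hard-codes the C and D values printed on this slide instead of drawing random numbers, so that the results are reproducible.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [1.066805, -1.303028, -0.180407, 0.135601,
          0.228522, -0.320369, -0.553085, -0.346510],
    'D': [-1.252834, -0.932565, 1.686709, 0.268914,
          -0.457232, 0.059366, 0.512941, 0.434751]})

grouped = df.groupby('A')

# Number of rows in each group
sizes = grouped.size()

# Each iteration yields the group key and the rows that belong to it
group_names = []
for name, group in grouped:
    group_names.append(name)
    print(name, group.shape)

# Grouping by two columns gives one group per (A, B) combination
pair_sizes = df.groupby(['A', 'B']).size()
```
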
Pandas - Grouping the data

Get the first and last elements of each grouping. Also, apply the sum function to each column

Example

>>> grouped.first()
       B         C         D
A
bar  one -1.303028 -0.932565
foo  one  1.066805 -1.252834
# Similar results can be obtained with grouped.last()
>>> grouped.sum()
            C         D
A
bar -1.487796 -0.604285
foo  0.215324  0.924336

Pandas - Grouping the data

Group aggregation

Example

>>> grouped.aggregate(np.sum)
            C         D
A
bar -1.487796 -0.604285
foo  0.215324  0.924336

Pandas - Grouping the data

Apply multiple functions to a grouped column

Example

>>> grouped['C'].agg([np.sum, np.mean])
          sum      mean
A
bar -1.487796 -0.495932
foo  0.215324  0.043065

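The aggregation results on the last three slides can be reproduced from the printed group data. This sketch passes the aggregation names as strings ('sum', 'mean'), which pandas accepts in place of the NumPy functions, and selects the numeric columns explicitly before summing; both choices are made here for portability across pandas versions and are not taken from the slides.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [1.066805, -1.303028, -0.180407, 0.135601,
          0.228522, -0.320369, -0.553085, -0.346510],
    'D': [-1.252834, -0.932565, 1.686709, 0.268914,
          -0.457232, 0.059366, 0.512941, 0.434751]})

grouped = df.groupby('A')

firsts = grouped.first()                    # first row of each group
totals = grouped[['C', 'D']].sum()          # per-group sums of numeric columns
stats = grouped['C'].agg(['sum', 'mean'])   # several aggregations at once

print(totals)
print(stats)
```
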
Pandas - Grouping the data

Visually inspecting the grouping

Example

>>> w = grouped['C'].agg([np.sum, np.mean]).plot()
>>> import matplotlib.pyplot as plt
>>> plt.show()

Pandas - Grouping the data

Apply a transformation to the grouping

Example

>>> f = lambda x: x * 2
>>> transformed = grouped.transform(f)
>>> print(transformed)

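Unlike an aggregation, transform returns a result with the same index as the original dataframe. The sketch below doubles the numeric columns as on the slide and adds a more typical use, subtracting each group's mean; the demeaning step is an illustration added here.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'C': [1.066805, -1.303028, -0.180407, 0.135601,
          0.228522, -0.320369, -0.553085, -0.346510],
    'D': [-1.252834, -0.932565, 1.686709, 0.268914,
          -0.457232, 0.059366, 0.512941, 0.434751]})

grouped = df.groupby('A')

# Same shape as the selected columns, one output row per input row
doubled = grouped[['C', 'D']].transform(lambda x: x * 2)

# Centre column C within each group
demeaned = grouped['C'].transform(lambda x: x - x.mean())

print(doubled.head(3))
```
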
Pandas - Grouping the data

Apply a filter to select a group based on some criterion.

Example

>>> grouped.filter(lambda x: sum(x['C']) > 0)
     A      B         C         D
0  foo    one  1.066805 -1.252834
2  foo    two -0.180407  1.686709
4  foo    two  0.228522 -0.457232
6  foo    one -0.553085  0.512941
7  foo  three -0.346510  0.434751

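filter keeps or drops whole groups: the function receives each group's sub-dataframe and must return a single True or False. A reproducible sketch using the values shown above; the size-based criterion is an additional example.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [1.066805, -1.303028, -0.180407, 0.135601,
          0.228522, -0.320369, -0.553085, -0.346510],
    'D': [-1.252834, -0.932565, 1.686709, 0.268914,
          -0.457232, 0.059366, 0.512941, 0.434751]})

# Keep only the groups whose C values sum to a positive number
kept = df.groupby('A').filter(lambda g: g['C'].sum() > 0)

# Keep only the groups with more than three rows
big = df.groupby('A').filter(lambda g: len(g) > 3)

print(kept)
```

Individual rows with negative C values survive because the test is applied to the group as a whole, not row by row.
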
Section 6

Case study

Cost of College

We are going to analyze the College Scorecard cost-of-college data provided by the federal government:
https://collegescorecard.ed.gov/data/

Cost of College

Find the top 10 schools by median 10-year debt
Find the top 10 schools by median earnings
Find the top 10 schools with the best SAT scores
Find the top 10 schools by return on investment
Find the average median earnings per state
Compute the correlation between the SAT scores and median income

Cost of College

Columns of interest

UNITID
INSTNM
STABBR
CITY
GRAD_DEBT_MDN_SUPP
SAT_AVG

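The kinds of operations the case study calls for can be sketched on a tiny stand-in table. Everything in this dataframe is made up for illustration: the school names, cities and numbers are placeholders, not College Scorecard values. Only the column names come from the list above. The sketch also assumes that suppressed debt values arrive as text (the string 'PrivacySuppressed' is an assumption about the file), which is why the column is converted with pd.to_numeric first.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real file; all values are placeholders
scorecard = pd.DataFrame({
    'UNITID': [1, 2, 3, 4, 5],
    'INSTNM': ['School A', 'School B', 'School C', 'School D', 'School E'],
    'STABBR': ['VA', 'VA', 'NC', 'NC', 'MD'],
    'CITY': ['City 1', 'City 2', 'City 3', 'City 4', 'City 5'],
    'GRAD_DEBT_MDN_SUPP': ['20000', '25000', 'PrivacySuppressed',
                           '15000', '30000'],
    'SAT_AVG': [1200.0, 1100.0, 1000.0, np.nan, 1300.0]})

# Non-numeric entries become NaN so that the column can be aggregated
scorecard['GRAD_DEBT_MDN_SUPP'] = pd.to_numeric(
    scorecard['GRAD_DEBT_MDN_SUPP'], errors='coerce')

# "Top N" questions: nlargest sorts and truncates in one step
top_sat = scorecard.nlargest(2, 'SAT_AVG')[['INSTNM', 'SAT_AVG']]

# "Per state" questions: group by the state abbreviation
debt_by_state = scorecard.groupby('STABBR')['GRAD_DEBT_MDN_SUPP'].mean()

# Correlation between two columns; rows with a NaN in either are ignored
corr = scorecard['SAT_AVG'].corr(scorecard['GRAD_DEBT_MDN_SUPP'])

print(top_sat)
print(debt_by_state)
print(corr)
```

With the real data the same calls would use 10 instead of 2 for the top-10 lists.
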
Cost of College - Generate metrics and create interactive visualizations using Bokeh

Generate metrics and create interactive visualizations using Bokeh
Create an interactive choropleth visualization
Sample given here at http://sjster.bitbucket.org/sub2/index.html

Interactive Choropleth for querying and visualization

Section 7

Conclusion

Questions

Thank you for attending!
