Rodrigo Belo
rbelo@cmu.edu
Spring 2015
NumPy
ndarray
array([[ ...,  0.25494037,  0.79516021],
       [ ...,  0.3420035 ,  0.08914765]])
You can get the shape of an array and the type of its elements by
accessing its shape and dtype attributes:
print(data.shape)
print(data.dtype)
(2, 3)
float64
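A minimal sketch of the two attributes, assuming a hypothetical 2x3 array of random floats stands in for data:

```python
import numpy as np

# Hypothetical stand-in for `data`: a 2x3 array of random floats.
data = np.random.rand(2, 3)

shape = data.shape   # tuple with the size of each dimension
dtype = data.dtype   # single element type shared by all entries
print(shape, dtype)
```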
Creating ndarrays
From a list:
import numpy as np
data1 = [1, 2, 3, 4]
arr1 = np.array(data1)
arr1
array([1, 2, 3, 4])
Creating ndarrays
array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])
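The constructors behind these outputs might look like the sketch below. The 3x6 shape is read off the zeros output above; the other calls are illustrative assumptions, not taken from the slides.

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4])        # from a flat list
arr2 = np.array([[1, 2], [3, 4]])    # nested lists give a 2-D array
zeros = np.zeros((3, 6))             # shape is passed as a tuple
ones = np.ones(4)                    # 1-D array of ones
steps = np.arange(0, 10, 2)          # like range, but returns an ndarray
```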
Example
arr = np.array(['Hello', np.random.rand])
arr
array(['Hello',
       <built-in method rand of mtrand.RandomState object at 0x1002b6708>], dtype=object)
Multiplication by a scalar
data * 10
array([[ 6.39219315,  6.8102819 ,  4.34637984],
       [ 0.34237044,  5.39243817,  1.26276343]])
Addition
data + data
array([[ 1.27843863,  1.36205638,  0.86927597],
       [ 0.06847409,  1.07848763,  0.25255269]])
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.]])
Multiplication
arr * arr
array([[  1.,   4.,   9.],
       [ 16.,  25.,  36.],
       [ 49.,  64.,  81.]])
Division
1 / arr
array([[ 1.        ,  0.5       ,  0.33333333],
       [ 0.25      ,  0.2       ,  0.16666667],
       [ 0.14285714,  0.125     ,  0.11111111]])
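All four operations above act element by element. A short sketch, assuming the 3x3 array of 1 through 9 that the outputs imply:

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])

scaled = arr * 10     # every element multiplied by the scalar
doubled = arr + arr   # elementwise sum
squared = arr * arr   # elementwise product, not a matrix product
inverse = 1 / arr     # elementwise reciprocal
```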
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.]])
arr[1]
array([ 4.,  5.,  6.])
arr[:, 1]
array([ 2.,  5.,  8.])
arr[1:, :-1]
array([[ 4.,  5.],
       [ 7.,  8.]])
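The same selections as a runnable sketch. The last line illustrates a point the slides do not state explicitly: basic slices are views, so writing through one changes the original array.

```python
import numpy as np

arr = np.arange(1., 10.).reshape(3, 3)   # [[1,2,3],[4,5,6],[7,8,9]]

row = arr[1]            # second row
col = arr[:, 1]         # second column
block = arr[1:, :-1]    # rows 1 onward, all columns but the last

# `col` is a view on `arr`, so this assignment also updates arr[0, 1].
col[0] = 20.
```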
Boolean Indexing
names = np.array(['Bob', 'Joe', 'Bill', 'Tess', 'Joe', 'Joe', 'Bob'])
data = np.random.randn(7, 4)
print(names)
data
['Bob' 'Joe' 'Bill' 'Tess' 'Joe' 'Joe' 'Bob']
array([[ 1.31273264, -1.4027545 , 1.76375476, -0.19194064],
[-0.41950145, -0.21455786, 0.28687505, 0.70312942],
[ 0.76013575, 0.68719731, 1.45771087, 0.07268093],
[ 1.20720934, -0.52305673, 0.56317445, 0.33062879],
[ 0.81500408, 1.12185486, 1.31608209, 0.80725464],
[-0.70584486, -0.86788517, -0.07373691, 0.83189097],
[-0.44775752, -1.67612963, -0.01396541, 0.2745861 ]])
We can create an array of Booleans and use it to select the relevant rows:
print(names == 'Bob')
data[names == 'Bob']
[ True False False False False False  True]
array([[ 1.31273264, -1.4027545 , 1.76375476, -0.19194064],
[-0.44775752, -1.67612963, -0.01396541, 0.2745861 ]])
Boolean Indexing
You can use different indexing methods at once:
data[names == 'Bob', 2:]
array([[ 1.76375476, -0.19194064],
       [-0.01396541,  0.2745861 ]])
Note: Selecting data from an array by Boolean indexing always creates a copy
of the data, even if the returned array is unchanged. Basic slicing, by
contrast, returns a view.
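A small sketch of the copy behaviour, assuming a deterministic 7x4 array in place of the random data:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Bill', 'Tess', 'Joe', 'Joe', 'Bob'])
data = np.arange(28.).reshape(7, 4)   # stand-in for the random values

mask = names == 'Bob'
selected = data[mask, 2:]   # rows 0 and 6, columns 2 and 3

# The selection is a copy, so this write does not reach `data`.
selected[0, 0] = -1.
```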
Boolean Indexing
You can use Boolean indexing to assign values to specific positions of the
array:
data[data < 0] = 0
data
array([[ 1.31273264,  0.        ,  1.76375476,  0.        ],
       [ 0.        ,  0.        ,  0.28687505,  0.70312942],
       [ 0.76013575,  0.68719731,  1.45771087,  0.07268093],
       [ 1.20720934,  0.        ,  0.56317445,  0.33062879],
       [ 0.81500408,  1.12185486,  1.31608209,  0.80725464],
       [ 0.        ,  0.        ,  0.        ,  0.83189097],
       [ 0.        ,  0.        ,  0.        ,  0.2745861 ]])
Boolean Indexing
array([[ 7.        ,  7.        ,  7.        ,  7.        ],
       [ 0.        ,  0.        ,  0.28687505,  0.70312942],
       [ 7.        ,  7.        ,  7.        ,  7.        ],
       [ 7.        ,  7.        ,  7.        ,  7.        ],
       [ 0.81500408,  1.12185486,  1.31608209,  0.80725464],
       [ 0.        ,  0.        ,  0.        ,  0.83189097],
       [ 7.        ,  7.        ,  7.        ,  7.        ]])
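The statement that produced the last output did not survive. The values are consistent with assigning 7 to every row whose name is not 'Joe'. A sketch under that assumption, using a deterministic array instead of the random data:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Bill', 'Tess', 'Joe', 'Joe', 'Bob'])
data = np.arange(-14., 14.).reshape(7, 4)   # stand-in with some negatives

data[data < 0] = 0          # elementwise mask: zero out negative entries
data[names != 'Joe'] = 7    # row mask (assumed): overwrite whole rows
```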
Fancy Indexing
Fancy indexing is a term adopted by NumPy to describe indexing using
integer arrays.
arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i
arr
array([[ 0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.],
       [ 4.,  4.,  4.,  4.],
       [ 5.,  5.,  5.,  5.],
       [ 6.,  6.,  6.,  6.],
       [ 7.,  7.,  7.,  7.]])
Example
arr[[3, 0, 2]]
array([[ 3.,  3.,  3.,  3.],
       [ 0.,  0.,  0.,  0.],
       [ 2.,  2.,  2.,  2.]])
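A sketch of a few fancy-indexing variants. The negative-index and paired-index forms are additions for illustration and are not shown in the slides.

```python
import numpy as np

arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i                 # row i is filled with the value i

picked = arr[[3, 0, 2]]        # rows returned in the requested order
from_end = arr[[-1, -2]]       # negative indices count from the end
pairs = arr[[1, 5], [0, 3]]    # elements (1, 0) and (5, 3), a 1-D result
```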
Transposing Arrays
arr.T
array([[ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.],
       [ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.],
       [ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.],
       [ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.]])
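A minimal transposition sketch, assuming the 8x4 array from the fancy indexing slide. The inner product with np.dot is an illustrative extra.

```python
import numpy as np

# Row i holds the value i in all four columns.
arr = np.repeat(np.arange(8.), 4).reshape(8, 4)

t = arr.T                    # shape (4, 8); no data is copied
gram = np.dot(arr.T, arr)    # (4, 8) x (8, 4) gives a 4x4 matrix
```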
Universal Functions
Examples
arr = np.arange(10)
np.sqrt(arr)
array([ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ,
        2.23606798,  2.44948974,  2.64575131,  2.82842712,  3.        ])
np.exp(arr)
array([  1.00000000e+00,   2.71828183e+00,   7.38905610e+00,
         2.00855369e+01,   5.45981500e+01,   1.48413159e+02,
         4.03428793e+02,   1.09663316e+03,   2.98095799e+03,
         8.10308393e+03])
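Universal functions apply an operation to every element without an explicit loop. A short sketch; np.maximum is added here as an example of a two-argument ufunc:

```python
import numpy as np

arr = np.arange(10)

roots = np.sqrt(arr)           # unary ufunc
expo = np.exp(arr)             # unary ufunc
at_least_5 = np.maximum(arr, 5)   # binary ufunc, the scalar is broadcast
```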
Example
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])
Suppose we want to create an array that takes the value in xarr when cond
is True and the value in yarr when cond is False:
np.where(cond, xarr, yarr)
array([ 1.1,  2.2,  1.3,  1.4,  2.5])
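For comparison, a sketch that sets np.where next to the equivalent list comprehension. The scalar variant at the end is an illustrative addition.

```python
import numpy as np

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

result = np.where(cond, xarr, yarr)

# Pure-Python equivalent: same values, but a Python-level loop.
slow = np.array([x if c else y for x, y, c in zip(xarr, yarr, cond)])

# The second and third arguments may also be scalars.
capped = np.where(xarr > 1.25, 0.0, xarr)
```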
Method          Description
sum             Sum of all the elements in the array or along an axis.
mean            Arithmetic mean. Zero-length arrays have NaN mean.
std, var        Standard deviation and variance, respectively.
min, max        Minimum and maximum.
argmin, argmax  Indices of minimum and maximum elements, respectively.
cumsum          Cumulative sum of elements starting from 0.
cumprod         Cumulative product of elements starting from 1.
Booleans are coerced to 1 and 0, so the sum method can be used to count
the number of True values in an array:
arr = np.random.randn(100)
(arr > 0).sum()
55
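A sketch of several of the methods above on a small fixed array, chosen so the results are easy to verify by hand:

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

total = arr.sum()              # all elements
col_means = arr.mean(axis=0)   # one mean per column
running = arr.cumsum()         # cumulative sum over the flattened array
first_max = arr.argmax()       # index into the flattened array
n_big = (arr > 2).sum()        # True counts as 1, False as 0
```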
Sorting
You can specify the axis along which to sort an n-dimensional array:
arr = np.random.randn(5, 3)
arr.sort(axis=0)
arr
array([[-1.17850016,  0.05609878, -1.11894931],
       [-0.15450684,  0.14064359, -0.12111114],
       [ 0.66674063,  0.39402912, -0.09261304],
       [ 0.79119149,  1.18169535,  0.09052968],
       [ 1.61247548,  1.48936384,  0.11534684]])
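arr.sort(axis=1)
arr
array([[-1.17850016, -1.11894931,  0.05609878],
       [-0.15450684, -0.12111114,  0.14064359],
       [-0.09261304,  0.39402912,  0.66674063],
       [ 0.09052968,  0.79119149,  1.18169535],
       [ 0.11534684,  1.48936384,  1.61247548]])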
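A sketch contrasting the two sort axes, and the in-place method with the np.sort function, which returns a sorted copy. The small array is hypothetical.

```python
import numpy as np

arr = np.array([[3., 1., 2.], [0., 5., 4.]])

by_col = np.sort(arr, axis=0)   # sorted copy, each column sorted independently
arr.sort(axis=1)                # in place, each row sorted independently
```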
Set Logic
We can get all the unique values of an array with the np.unique function:
names = np.array(['Bob', 'Joe', 'Bill', 'Tess', 'Joe', 'Joe', 'Bob'])
np.unique(names)
array(['Bill', 'Bob', 'Joe', 'Tess'],
      dtype='|S4')
Method             Description
unique(x)          Compute the sorted, unique elements in x
intersect1d(x, y)  Compute the sorted, common elements in x and y
union1d(x, y)      Compute the sorted union of elements
in1d(x, y)         Compute a boolean array indicating whether each element of
                   x is contained in y
setdiff1d(x, y)    Set difference, elements in x that are not in y
setxor1d(x, y)     Set symmetric difference, elements in either x or y but
                   not both
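A sketch of a few of these functions. The values array is hypothetical, and np.isin is used as the membership test because newer NumPy releases recommend it over in1d.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Bill', 'Tess', 'Joe', 'Joe', 'Bob'])
values = np.array([6, 0, 0, 3, 2, 5, 6])   # hypothetical example data

uniq = np.unique(names)                      # sorted unique names
member = np.isin(values, [2, 3, 6])          # elementwise membership test
common = np.intersect1d(values, [2, 3, 7])   # sorted common elements
diff = np.setdiff1d(values, [0, 6])          # in values but not in [0, 6]
```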
Example
Saving an array:
arr = np.arange(10)
print(arr)
np.save('my_array', arr)
[0 1 2 3 4 5 6 7 8 9]
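A round-trip sketch with np.load. Writing to a temporary directory is an assumption made to keep the example self-contained.

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)

# np.save appends the .npy extension when the name lacks one.
path = os.path.join(tempfile.mkdtemp(), 'my_array')
np.save(path, arr)

loaded = np.load(path + '.npy')
```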
Linear Algebra
Example
We can use the function np.dot to multiply two matrices:
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[2, 2, 2], [2, 2, 2], [2, 2, 2]])
np.dot(arr1, arr2)
array([[12, 12, 12],
       [30, 30, 30]])
np.dot(arr2, arr1.T)
array([[12, 30],
       [12, 30],
       [12, 30]])
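The same products as a sketch. The @ operator shown alongside np.dot is an addition and requires Python 3.5 or later.

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.full((3, 3), 2)

prod = np.dot(arr1, arr2)       # (2, 3) x (3, 3) gives (2, 3)
prod_at = arr1 @ arr2           # operator form of the same product
prod_t = np.dot(arr2, arr1.T)   # (3, 3) x (3, 2) gives (3, 2)
```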
Term Project
Requirements:
Teams of 2 or 3 students
Include all three components covered in this class:
1
2
3
Dates:
Project proposal and teams: Thursday, April 2
4 paragraphs:
Goals
Data collection strategy
Data storage strategy
Analysis strategy
pandas
Series
A Series is a one-dimensional array-like object containing an array of data
and an associated array of data labels, called its index.
Example
from pandas import Series
obj = Series([1, 3, 4, -5])
obj
0    1
1    3
2    4
3   -5
dtype: int64
You can get the array representation and the index object of a Series via
its attributes values and index:
obj.values
array([ 1,  3,  4, -5])
obj.index
Int64Index([0, 1, 2, 3], dtype='int64')
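A sketch of a Series with the default index and with explicit labels. The labelled variant and the Boolean filter are illustrative additions.

```python
import pandas as pd

obj = pd.Series([1, 3, 4, -5])   # default integer index 0..3
obj2 = pd.Series([1, 3, 4, -5], index=['d', 'b', 'a', 'c'])

val = obj2['a']                  # label-based lookup
positives = obj2[obj2 > 0]       # Boolean filtering keeps the labels
```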
Series
Series automatically aligns differently indexed data in arithmetic
operations.
Example
obj3 = Series([4, 7, -4, 3], index=['a', 'b', 'c', 'd'])
obj4 = Series([4, 7, -4, 3], index=['d', 'b', 'a', 'c'])
print(obj3)
print(obj4)
a    4
b    7
c   -4
d    3
dtype: int64
d    4
b    7
a   -4
c    3
dtype: int64
obj3 + obj4
a     0
b    14
c    -1
d     7
dtype: int64
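A sketch of alignment, including the case where a label exists on only one side. The obj5 Series is hypothetical.

```python
import pandas as pd

obj3 = pd.Series([4, 7, -4, 3], index=['a', 'b', 'c', 'd'])
obj4 = pd.Series([4, 7, -4, 3], index=['d', 'b', 'a', 'c'])

total = obj3 + obj4              # values are matched by label, not position

obj5 = pd.Series([1, 2], index=['a', 'z'])   # hypothetical partial overlap
partial = obj3 + obj5            # labels missing on either side give NaN
```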
DataFrame
A DataFrame represents a spreadsheet-like data structure containing
an ordered collection of columns.
Each column is a Series object.
Each column can contain a different data type.
A DataFrame can be seen as a dictionary of Series objects.
Example
from pandas import DataFrame
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = DataFrame(data)
frame
   pop   state  year
0  1.5    Ohio  2000
1  1.7    Ohio  2001
2  3.6    Ohio  2002
3  2.4  Nevada  2001
4  2.9  Nevada  2002
DataFrame
The order of the columns can be defined with the columns argument.
Example
DataFrame(data, columns=['year', 'state', 'pop'])
   year   state  pop
0  2000    Ohio  1.5
1  2001    Ohio  1.7
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2002  Nevada  2.9
DataFrame
Example
frame['state']
0      Ohio
1      Ohio
2      Ohio
3    Nevada
4    Nevada
Name: state, dtype: object
frame.year
0    2000
1    2001
2    2002
3    2001
4    2002
Name: year, dtype: int64
DataFrame
Example
frame['debt'] = 0
frame
   pop   state  year  debt
0  1.5    Ohio  2000     0
1  1.7    Ohio  2001     0
2  3.6    Ohio  2002     0
3  2.4  Nevada  2001     0
4  2.9  Nevada  2002     0
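A sketch that puts construction, column selection, and column assignment together. The eastern column is a hypothetical extra. Attribute access is avoided for the pop column because DataFrame already has a method named pop.

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = pd.DataFrame(data, columns=['year', 'state', 'pop'])

states = frame['state']                        # a column is a Series
frame['debt'] = 0                              # scalar is broadcast to all rows
frame['eastern'] = frame['state'] == 'Ohio'    # hypothetical derived column

# frame.pop is the DataFrame.pop method, so use frame['pop'] for the column.
pop_is_method = callable(frame.pop)
```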
Index Objects
pandas's Index objects are responsible for holding the axis labels and
other metadata (like the axis name or names).
Any array or other sequence of labels used when constructing a Series or
DataFrame is internally converted to an Index.
Example
obj = Series(range(3), index=['a', 'b', 'c'])
index = obj.index
index
Index([u'a', u'b', u'c'], dtype='object')
Index objects are immutable and thus can't be changed by the user.
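A sketch showing what immutability means in practice: item assignment on an Index raises a TypeError.

```python
import pandas as pd

obj = pd.Series(range(3), index=['a', 'b', 'c'])
index = obj.index

try:
    index[1] = 'd'        # not allowed: Index does not support item assignment
    mutated = True
except TypeError:
    mutated = False
```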
Reindexing
Example
obj = Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
obj
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64
obj2 = obj.reindex(['a', 'b', 'c', 'd'])
obj2
a   -5.3
b    7.2
c    3.6
d    4.5
dtype: float64
Reindexing
We can provide an optional fill value for index labels that do not exist
in the original Series.
Example
obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)
a   -5.3
b    7.2
c    3.6
d    4.5
e    0.0
dtype: float64
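A sketch comparing reindex with and without fill_value. Without it, a label that is new to the Series gets NaN.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
labels = ['a', 'b', 'c', 'd', 'e']

with_nan = obj.reindex(labels)                   # 'e' becomes NaN
with_zero = obj.reindex(labels, fill_value=0)    # 'e' becomes 0.0
```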
Dropping one or more entries from an axis can be performed using the
drop method.
Example
obj = Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])
obj.drop('c')
a    0
b    1
d    3
e    4
dtype: float64
Example
data = DataFrame(np.arange(16).reshape((4, 4)),
                 index=['Ohio', 'Colorado', 'Utah', 'New York'],
                 columns=['one', 'two', 'three', 'four'])
data.drop(['Colorado', 'Ohio'])
          one  two  three  four
Utah        8    9     10    11
New York   12   13     14    15
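A sketch of drop on both axes. Dropping a column with axis=1 is an assumption about what the slides went on to show; drop returns a new object and leaves the original unchanged.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

rows_dropped = data.drop(['Colorado', 'Ohio'])   # default axis=0: row labels
cols_dropped = data.drop('two', axis=1)          # axis=1: column labels
```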
Example
frame = DataFrame(np.random.randn(4, 3),
                  columns=list('bde'),
                  index=['Utah', 'Ohio', 'Texas', 'Oregon'])
frame
               b         d         e
Utah   -0.091392 -1.935977  0.271981
Ohio   -0.034697  0.823547  0.655560
Texas   0.316441 -0.603441  1.380851
Oregon  0.045986 -0.965604  0.227028
np.abs(frame)
               b         d         e
Utah    0.091392  1.935977  0.271981
Ohio    0.034697  0.823547  0.655560
Texas   0.316441  0.603441  1.380851
Oregon  0.045986  0.965604  0.227028
Example
f = lambda x: x.max() - x.min()
frame.apply(f)
b    0.407834
d    2.759525
e    1.153823
dtype: float64
frame.apply(f, axis=1)
Utah      2.207958
Ohio      0.858245
Texas     1.984292
Oregon    1.192632
dtype: float64
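A reproducible sketch of apply, using the values printed for frame in place of fresh random numbers:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame([[-0.091392, -1.935977, 0.271981],
                      [-0.034697, 0.823547, 0.655560],
                      [0.316441, -0.603441, 1.380851],
                      [0.045986, -0.965604, 0.227028]],
                     columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])

spread = lambda x: x.max() - x.min()

by_col = frame.apply(spread)           # called once per column
by_row = frame.apply(spread, axis=1)   # called once per row
magnitudes = np.abs(frame)             # ufuncs work elementwise on a DataFrame
```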
Example
frame.applymap(lambda x: '%.2f' % x)
            b      d     e
Utah    -0.09  -1.94  0.27
Ohio    -0.03   0.82  0.66
Texas    0.32  -0.60  1.38
Oregon   0.05  -0.97  0.23
Example
frame.sort_index()
               b         d         e
Ohio   -0.034697  0.823547  0.655560
Oregon  0.045986 -0.965604  0.227028
Texas   0.316441 -0.603441  1.380851
Utah   -0.091392 -1.935977  0.271981
Example
frame.sort_index(ascending=False)
               b         d         e
Utah   -0.091392 -1.935977  0.271981
Texas   0.316441 -0.603441  1.380851
Oregon  0.045986 -0.965604  0.227028
Ohio   -0.034697  0.823547  0.655560
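A sketch of sort_index on both axes, assuming a small deterministic frame. Sorting the column labels with axis=1 is an illustrative addition.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape(4, 3),
                     columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])

by_label = frame.sort_index()                         # rows, ascending
descending = frame.sort_index(ascending=False)        # rows, descending
by_column = frame.sort_index(axis=1, ascending=False)  # column labels
```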
Example
frame.describe()
              b         d         e
count  4.000000  4.000000  4.000000
mean   0.059085 -0.670369  0.633855
std    0.180594  1.143852  0.533834
min   -0.091392 -1.935977  0.227028
25%   -0.048871 -1.208198  0.260742
50%    0.005645 -0.784522  0.463770
75%    0.113600 -0.246694  0.836882
max    0.316441  0.823547  1.380851
Method      Description
count       Number of non-NA values
describe    Compute set of summary statistics
min, max    Compute minimum and maximum values
quantile    Compute sample quantile ranging from 0 to 1
sum         Sum of values
mean        Mean of values
median      Arithmetic median (50% quantile) of values
var         Sample variance of values
std         Sample standard deviation of values
cumsum      Cumulative sum of values
cumprod     Cumulative product of values
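A sketch of a few of these statistics on a hypothetical column with one missing value, to show that NA values are skipped by default:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1.0, 2.0, np.nan, 4.0]})   # hypothetical data

n = df['x'].count()         # non-NA values only
mean = df['x'].mean()       # NaN is skipped by default
median = df['x'].median()
summary = df.describe()     # count, mean, std, min, quartiles, max
```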
Example
Get stock prices and volumes from Yahoo! Finance:
import pandas.io.data as web
all_data = {}
for ticker in ['AAPL', 'IBM', 'MSFT', 'GOOG']:
    all_data[ticker] = web.get_data_yahoo(ticker, '1/1/2010', '3/22/2015')
price = DataFrame({tic: data['Adj Close']
                   for tic, data in all_data.iteritems()})
volume = DataFrame({tic: data['Volume']
                    for tic, data in all_data.iteritems()})
price.tail()
              AAPL    GOOG     IBM   MSFT
Date
2015-03-16  124.95  554.51  157.08  41.56
2015-03-17  127.04  550.84  156.96  41.70
2015-03-18  128.47  559.50  159.81  42.50
2015-03-19  127.50  557.99  159.81  42.29
2015-03-20  125.90  560.36  162.88  42.88
          AAPL      GOOG       IBM      MSFT
AAPL  1.000000  0.265999  0.368079  0.345835
GOOG  0.265999  1.000000  0.315613  0.409107
IBM   0.368079  0.315613  1.000000  0.500528
MSFT  0.345835  0.409107  0.500528  1.000000
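The table above has the form of a correlation matrix: symmetric, with ones on the diagonal. The code that produced it did not survive, so the following is only a sketch of how such a matrix can be computed with DataFrame.corr, using hypothetical random returns rather than the stock data.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
# Hypothetical returns for three made-up tickers.
returns = pd.DataFrame(rng.randn(50, 3), columns=['A', 'B', 'C'])

corr = returns.corr()   # pairwise Pearson correlation between columns
```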
Unique Values
To get unique values we can use the unique method of the Series object:
print(len(price.AAPL))
unique_prices = price.AAPL.unique()
print(len(unique_prices))
1312
1192
Value Counts
Example
price.AAPL.value_counts().head()
45.29    3
34.26    3
27.75    2
45.47    2
45.72    2
dtype: int64
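A sketch of unique and value_counts on a short hypothetical price series:

```python
import pandas as pd

# Hypothetical prices with repeated values.
s = pd.Series([45.29, 34.26, 45.29, 27.75, 45.29, 34.26])

uniq = s.unique()          # distinct values in order of first appearance
counts = s.value_counts()  # frequencies, most common first
```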
Missing Data
Missing data is common in most data analysis applications.
By default, pandas functions handle missing data gracefully.
Example
First, let's calculate the average price for GOOG:
price.GOOG.mean()
550.01818548387075
Now, let's calculate the mean without skipping the missing observations:
price.GOOG.mean(skipna=False)
nan
Data starts on 2014-03-27, the first date for which we have data for GOOG
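The sentence above describes what is left once rows with missing values are removed. A sketch with dropna, assuming a small hypothetical price table in which GOOG is missing before 2014-03-27 (all values are made up):

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2014-03-25', periods=4)
# Hypothetical prices; GOOG has no data on the first two dates.
price = pd.DataFrame({'GOOG': [np.nan, np.nan, 558.46, 559.99],
                      'MSFT': [39.00, 39.50, 38.33, 39.24]},
                     index=dates)

mean_skip = price['GOOG'].mean()             # NaN values are ignored
mean_all = price['GOOG'].mean(skipna=False)  # any NaN makes the result NaN
complete = price.dropna()                    # keep only fully observed rows
```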
Example
Filling with zeros:
price.fillna(0).head()
             AAPL  GOOG     IBM   MSFT
Date
2010-01-04  28.84     0  119.53  26.94
2010-01-05  28.89     0  118.09  26.95
2010-01-06  28.43     0  117.32  26.79
2010-01-07  28.38     0  116.92  26.51
2010-01-08  28.56     0  118.09  26.69
             AAPL        GOOG     IBM   MSFT
Date
2010-01-04  28.84  550.018185  119.53  26.94
2010-01-05  28.89  550.018185  118.09  26.95
2010-01-06  28.43  550.018185  117.32  26.79
2010-01-07  28.38  550.018185  116.92  26.51
2010-01-08  28.56  550.018185  118.09  26.69
             AAPL  GOOG     IBM   MSFT
Date
2010-01-04  28.84   NaN  119.53  26.94
2010-01-05  28.89   NaN  118.09  26.95
2010-01-06  28.43   NaN  117.32  26.79
2010-01-07  28.38   NaN  116.92  26.51
2010-01-08  28.56   NaN  118.09  26.69
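A sketch of two fillna strategies on a hypothetical table. Filling with the column mean is an assumption suggested by the table in which GOOG equals 550.018185, the mean computed earlier; the code for that slide did not survive.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2014-03-25', periods=4)
# Hypothetical prices; GOOG has no data on the first two dates.
price = pd.DataFrame({'GOOG': [np.nan, np.nan, 558.46, 559.99],
                      'MSFT': [39.00, 39.50, 38.33, 39.24]},
                     index=dates)

zeros = price.fillna(0)              # constant fill
means = price.fillna(price.mean())   # per-column fill with the column mean
```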
Hierarchical Indexing
Hierarchical indexing enables using multiple (two or more) index levels
on an axis.
It provides a way to work with higher dimensional data in a lower
dimensional form.
Example
data = Series(np.random.randn(10),
              index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],
                     [2010, 2011, 2012, 2010, 2011, 2012, 2010, 2011, 2011, 2012]])
data
a  2010    0.547634
   2011    0.792182
   2012   -0.821709
b  2010    0.172503
   2011    0.714497
   2012   -0.004165
c  2010   -0.095196
   2011    0.096810
d  2011    0.553003
   2012    0.167027
dtype: float64
Hierarchical Indexing
Example
Selecting the entries for 'a':
data['a']
2010    0.547634
2011    0.792182
2012   -0.821709
dtype: float64
Selecting the entries for 2011:
data[:, 2011]
a    0.792182
b    0.714497
c    0.096810
d    0.553003
dtype: float64
Example
data.sum(level=0)
a    0.518107
b    0.882835
c    0.001614
d    0.720031
dtype: float64
data.sum(level=1)
2010    0.624941
2011    2.156492
2012   -0.658846
dtype: float64
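Recent pandas releases no longer accept the level argument to sum. A sketch of the same selections and per-level sums, written with .loc and groupby(level=...) on a deterministic Series. The values 0 through 9 are a stand-in for the random data.

```python
import numpy as np
import pandas as pd

data = pd.Series(np.arange(10.),
                 index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],
                        [2010, 2011, 2012, 2010, 2011, 2012,
                         2010, 2011, 2011, 2012]])

inner = data['a']                        # all years for outer label 'a'
year_2011 = data.loc[:, 2011]            # one year across all outer labels
by_letter = data.groupby(level=0).sum()  # sum over the outer level
by_year = data.groupby(level=1).sum()    # sum over the inner level
```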