Académique Documents
Professionnel Documents
Culture Documents
for
Scientific Computing and
Data Science
• Numpy
• Pandas
>>> 10 in x
True
>>> 11 in x
False
>>> type(x)
<type 'numpy.ndarray'>
>>> x.size
3
>>> x.dtype
dtype('int32')
>>> x.ndim
1
>>> M
array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
>>> M.ndim
3
Acronyms:
i1 = int8, i2 = int16, i3 = int32, i4 = int64
f2 = float16, f4 = float32, f8 = float64
>>> np.arange(3)
array([0, 1, 2])
>>> np.arange(3.0)
array([0., 1., 2.])
>>> np.arange(3, 15, 2, dtype ='float')
array([ 3., 5., 7., 9., 11., 13.])
>>> a = np.arange(0,60,5)
>>> a
array([ 0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55])
>>> np.shape(a)
(12,)
>>> a.shape
(12,)
>>> np.reshape(a, (3,4))
array([[ 0, 5, 10, 15],
[20, 25, 30, 35],
[40, 45, 50, 55]])
>>> b = a.reshape(3,4)
>>> np.shape(b)
(3, 4)
>>> b.shape
(3, 4)
Abhijit Kar Gupta,
14
email: kg.abhi@gmail.com
Unique elements
>>> y = np.array([1,2,1,0.5,10,2,10])
>>> np.unique(y)
array([ 0.5, 1. , 2. , 10. ])
>>> A = np.arange(0,60,5).reshape(3, 4)
>>> for i in A:
print i
[ 0 5 10 15]
[20 25 30 35]
[40 45 50 55]
Abhijit Kar Gupta,
16
email: kg.abhi@gmail.com
>>> for i in np.nditer(A, order = 'F'):
print i
0
20
40
5
25
45
10
30
50
15
35
55
0
5
10
15
20
25
30
35
40
45
50
55
>>> np.vsplit(x, 3)
# same as above (row wise split)
>>> np.hsplit(x, 4)
[array([[1],
[5],
[9]]),
array([[ 2],
[ 6],
[10]]),
array([[ 3],
[ 7],
[11]]),
array([[ 4],
[ 8],
[12]])] Abhijit Kar Gupta,
23
email: kg.abhi@gmail.com
Append
>>> np.sum(x)
450
>>> np.sum(x, 0) # 0 axis
array([120, 150, 180])
>>> np.sum(x, 1) # 1 axis
Abhijit Kar Gupta,
27
email: kg.abhi@gmail.com
>>> np.trace(x)
150
>>> np.trace(x, 1)
80
>>> np.trace(x, -1)
120
>>> np.mean(x)
50.0
>>> np.mean(x,1)
array([20., 50., 80.])
>>> np.mean(x,0)
array([40., 50., 60.])
>>> np.median(x)
50.0
>>> np.median(x,0)
array([40., 50., 60.])
>>> np.median(x,1)
array([20., 50., 80.])
>>> x
array([[ 0, -1, 2, 5],
[10, 3, -2, 4]])
>>> np.ptp(x, 0)
array([10, 4, 4, 1])
>>> np.ptp(x, 1)
array([ 6, 12])
>>> x = np.array([10,20,30])
>>> y = np.array([40,50,60])
>>> np.concatenate((x,y))
array([10, 20, 30, 40, 50, 60])
>>> xx
array([[-5. , -5. , -5. , ..., -5. , -5. , -5. ],
[-4.9, -4.9, -4.9, ..., -4.9, -4.9, -4.9],
[-4.8, -4.8, -4.8, ..., -4.8, -4.8, -4.8],
...,
[ 4.7, 4.7, 4.7, ..., 4.7, 4.7, 4.7],
[ 4.8, 4.8, 4.8, ..., 4.8, 4.8, 4.8],
[ 4.9, 4.9, 4.9, ..., 4.9, 4.9, 4.9]])
>>> yy
array([[-5. , -4.9, -4.8, ..., 4.7, 4.8, 4.9],
[-5. , -4.9, -4.8, ..., 4.7, 4.8, 4.9],
[-5. , -4.9, -4.8, ..., 4.7, 4.8, 4.9],
...,
[-5. , -4.9, -4.8, ..., 4.7, 4.8, 4.9],
[-5. , -4.9, -4.8, ..., 4.7, 4.8, 4.9],
[-5. , -4.9, -4.8, ..., 4.7, 4.8, 4.9]])
>>> np.full((3,3),5)
array([[5, 5, 5],
[5, 5, 5],
[5, 5, 5]])
>>> x[::-1]
array([14, 11, 8, 5, 2])
>>> x[::-3]
array([14, 5])
>>> x.flatten()
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11])
>>> x.flatten(0)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11])
>>> x.flatten(1)
array([ 0, 3, 6, 9, 1, 4, 7, 10, 2,
5, 8, 11])
Abhijit Kar Gupta,
43
email: kg.abhi@gmail.com
Arithmetic Operations
>>> x = np.array([1, 2, 3, 4])
>>> y = np.array([5, 6, 7, 8])
>>> x*y
array([ 5, 12, 21, 32])
>>> x+y
array([ 6, 8, 10, 12])
>>> x-y
array([-4, -4, -4, -4])
>>> x/y
array([0, 0, 0, 0])
>>> x**y
array([ 1, 64, 2187, 65536])
>>> x + [2, 3, 4]
array([[ 3, 5, 7],
[ 6, 8, 10]])
* Broadcasting!
[[5, 6],
[7, 8]]])
>>> np.stack((x,y),axis = 1)
array([[[1, 2],
[5, 6]],
[[3, 4],
[7, 8]]])
>>> f1 = np.vectorize(f)
>>> type(f1)
<class 'numpy.vectorize'>
>>> f1(x)
array([ 1, 4, 9, 16])
>>> np.inner(u, 2)
array([2, 4, 6])
>>> np.inner(np.eye(3),5))
array([[5., 0., 0.],
[0., 5., 0.],
[0., 0., 5.]])
Abhijit Kar Gupta,
52
email: kg.abhi@gmail.com
# Multidimensional inner product
>>> A = np.array([[1,2,3],[4,5,6]])
>>> B = np.array([[1,0,1],[0,1,0]])
>>> np.inner(A,B)
array([[ 4, 2],
[10, 5]])
>>> np.savetxt('random.dat', R)
Volume of a Parallelepiped:
Three sides are given by three vectors: 𝑨 = 𝟐𝒊 − 𝟑𝒋, 𝑩 = 𝒊 + 𝒋 − 𝒌,
𝑪 = 𝟑𝒊 − 𝒌
Volume = 𝑨. 𝑩 × 𝑪 = 𝟒
>>> A.dot(B)
array([[22, 28],
[49, 64]])
>>> B.dot(A)
array([[ 9, 12, 15],
[19, 26, 33],
[29, 40, 51]])
AB ≠ BA
>>> X = np.matrix(A)
>>> Y = np.matrix(B)
>>> X*Y
matrix([[22, 28],
[49, 64]])
>>> Y*X
matrix([[ 9, 12, 15],
[19, 26, 33],
[29, 40, 51]])
>>> lin.eig(A)
(array([-0.37228132, 5.37228132]),
array([[-0.82456484, -0.41597356],[
0.56576746, -0.90937671]]))
>>> eigen_val, eigen_vec = lin.eig(A)
>>> eigen_vec[:,0]
array([-0.82456484, 0.56576746])
>>> eigen_vec[:,1]
array([-0.41597356, -0.90937671])
>>> p.integ(1)
poly1d([0.33333333, 1. , 3. , 0. ])
>>> p.integ(2)
poly1d([0.08333333, 0.33333333,1.5,0.,0. ])
𝑥 0 10 20 30 40 50 60 70 80 90
𝑦 76 92 106 123 132 151 179 203 227 249
Step 2:
>>> import numpy.polynomial.polynomial as poly
>>> coeffs = poly.polyfit(x, y, 2)
>>> coeffs
array([7.81909091e+01, 1.10204545e+00, 9.12878788e-03])
Step 3:
>>> yfit = poly.polyval(x,coeffs)
>>> yfit
array([ 78.19090909, 90.12424242, 103.88333333,
119.46818182,136.87878788, 156.11515152, 177.17727273,
200.06515152,224.77878788, 251.31818182])
Step 4:
>>> import matplotlib.pyplot as plt
>>> plt.plot(x, y, x, yfit )
>>> plt.show() Abhijit Kar Gupta,
71
email: kg.abhi@gmail.com
Python Script for
Polynomial fitting
# Polynomial fitting by Numpy (with plot)
import numpy as np
import numpy.polynomial.polynomial as poly
import matplotlib.pyplot as plt
x = np.array([0,10,20,30,40,50,60,70,80,90])
y = np.array([76,92,106,123,132,151,179,203,227,249])
coeffs = poly.polyfit(x, y, 2)
yfit = poly.polyval(x, coeffs)
N, T = 10000, 1000
t = np.arange(T)
steps = 2*np.random.randint(0, 2, (N, T)) - 1
plt.plot(t, rms)
plt.show()
Abhijit Kar Gupta,
79
email: kg.abhi@gmail.com
Abhijit Kar Gupta,
80
email: kg.abhi@gmail.com
To extract exponent
t = np.log(t[10:-1:50])
rms = np.log(rms[10:-1:50])
• Series
• DataFrame
• Panel
>>> s.values
array([10, 20, 30, 40], dtype=int64)
>>> s.size
4
>>> s.shape
(4,)
>>> s.ndim
1
>>> s.dtype
dtype('int64')
Abhijit Kar Gupta,
87
email: kg.abhi@gmail.com
>>> s['e'] = 50
>>> s
a 10
b 20
c 30
d 40
e 50
dtype: int64
>>> s*2
a 20
b 40
c 60
d 80
e 100
>>> np.sqrt(s)
a 3.162278
b 4.472136
c 5.477226
d 6.324555
e 7.071068
Abhijit Kar Gupta,
91
dtype: float64 email: kg.abhi@gmail.com
>>> sum(s)
150L
>>> min(s)
10L
>>> max(s)
50L
>>> s[1:4]
b 20
c 30
d 40
dtype: int64
>>> s.sum()
100
>>> s.mean()
25.0
>>> s.std()
12.909944487358056
>>> d = pd.DataFrame(x,index,columns =
['A', 'B', 'C', 'D'])
A B C D
a 10 20 30 40
b 50 60 70 80
>>> d[‘A’][‘a’]
10
>>> d['x'][‘b’]
4
Abhijit Kar Gupta,
97
email: kg.abhi@gmail.com
DataFrame from Dictionary of Series
>>> index = ['a','b','c','d']
>>> s1 = pd.Series([10,20,30,40],index)
>>> s2 = pd.Series([100,200,300,400],index)
>>> d = {'A':s1, 'B':s2}
>>> pd.DataFrame(d)
A B
a 10 100
b 20 200
c 30 300
d 40 400
>>> D = pd.DataFrame(d)
>>> D['A']
a 10
b 20
c 30
d 40
Name: A, dtype: int64
>>> D.iloc[1]
A 20
B 200
C 2000
Name: b, dtype: int64
>>> D[1:3]
A B C
b 20 200 2000
c 30 300 3000
>>> D[1:3]['A']
b 20
c 30
Name: A, dtype: int64
>>> D1
A B C
e 50 500 5000
>>> D.append(D1) # Append another DataFrame
A B C
a 10 100 1000
b 20 200 2000
c 30 300 3000
d 40 400 4000
e 50 500 5000
>>> p.minor_xs(1)
0 1
0 0.061099 0.254202
1 0.916231 0.034463
2 0.228343 0.853884
>>> p = pd.Panel(data)
>>> p
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 3 (major_axis)
x 4 (minor_axis)
Items axis: A to B
Major_axis axis: a to c
Minor_axis axis: 0 to 3
Abhijit Kar Gupta,
109
email: kg.abhi@gmail.com
>>> p.major_xs('a')
A B
0 0.422049 0.684155
1 0.922664 0.411938
2 0.644187 0.246746
3 0.213998 0.431654
>>> p.minor_xs(1)
A B
a 0.922664 0.411938
b 0.906779 0.573952
c 0.879191 0.233360
>>> p.axes
[Index([u'A', u'B'], dtype='object'), Index([u'a', u'b',
u'c'], dtype='object'), RangeIndex(start=0, stop=4,
step=1)]
>>> p.size
24
>>> p.ndim
3
>>> p.shape
(2, 3, 4)
>>> p.sum(2)
A B
a 2.202900 1.774494
b 2.619835 2.114777
c 1.867491 1.351817