Académique Documents
Professionnel Documents
Culture Documents
06 July 2012
Visixion GmbH
1 / 96
1 2
Useful
Python
Libraries
Array operations with NumPy Basic Array Operations Random Numbers 2d Plotting Exercise: NumPy in Action Selected Financial Topics Approximation Optimization Numerical Integration Case Study: Numerical Option Pricing Binomial Option Pricing Model Python Implementations Monte Carlo Approach Time Series Analysis with pandas Series class DataFrame class Plotting with pandas Exercise: pandas in Action Case Study: Analyzing Stock Quotes with Fast Data Mining with PyTables Introductory PyTables Example Exercise: PyTables with pandas Case Study: Simulation Research Project The Financial Model Python and PyTables Implementation Speeding-up Code with Cython Fundamentals about Cython Example Code for Cython Use Conclusion
Y. Hilpisch (Visixion GmbH)
6 7 8 9 10
pandas
2 / 96
1 2 3
However, it cannot be exhaustive with regard to We intend to provide an overview which allows
Python issues and libraries for Finance Python's capabilities in this eld
4 5
The majority of the content is included in the slides However, we will strive to provide hands-on experience through interactive parts
3 / 96
The Tao of
My
Python:
There is no mystery about my style. My lines of code are simple, direct and non-classical. The extraordinary part of it lies in its simplicity. Every line of code in my Python is being so of itself. There is nothing articial about it. I always believe that the easy way is the right way.
4 / 96
operations
5 / 96
library
6 / 96
Python
is general or
high productivity:
domain of
Python;
Python,
easy-to-maintain:
understand
Python
due to its compactness and readability, team members can easily code from others
low cost:
future-proof :
Python
experts
good performance:
compiled languages
7 / 96
www.dexision.com)
Python
and
research: Python
client projects: Python used to implement client specic nancial applications teaching: Python used to implement and illustrate nancial models in derivatives
course at Saarland University (see Course Web Site)
talks:
we have given a number of talks at Python conferences about the use of Python for Finance book: Python used to illustrate nancial models in our recent bookDerivatives Analytics with PythonMarket-Based Valuation of European and American Stock Index Options
8 / 96
Arrays (I)
NumPy
the convenience of
form and at high speed. The speed comes from the implementation in C. So you have
Python
>>> from numpy import * >>> a=arange(0.0,20.0,1.0) >>> a array([ 0., 1., 2., 3., 4., 5., 6., 11., 12., 13., 14., 15., 16., 17., >>> a.resize(4,5) >>> a array([[ 0., 1., 2., 3., 4.], [ 5., 6., 7., 8., 9.], [ 10., 11., 12., 13., 14.], [ 15., 16., 17., 18., 19.]]) >>> a[0] array([ 0., 1., 2., 3., 4.]) >>> a[3] array([ 15., 16., 17., 18., 19.]) >>> a[1,4] 9.0 >>> a[1,2:4] array([ 7., 8.]) >>>
9., 10.,
Care is to be taken with the conventions regarding array indices. The best way to learn these is to play with arrays.
Python for Finance
9 / 96
Arrays (II)
With
NumPy,
>>> a*0.5 array([[ 0. , [ 2.5, [ 5. , [ 7.5, >>> a**2 array([[ 0., [ 25., [ 100., [ 225., >>> a+a array([[ 0., [ 10., [ 20., [ 30., >>>
0.5, 3. , 5.5, 8. ,
1. , 3.5, 6. , 8.5,
1.5, 4. , 6.5, 9. ,
1., 4., 9., 36., 49., 64., 121., 144., 169., 256., 289., 324., 2., 4., 12., 14., 22., 24., 32., 34.,
10 / 96
Sometimes you need to loop over arrays to check something. Looping is easily done as
>>> b=arange(0.0,25.1,0.5) >>> b array([ 0. , 0.5, 1. , 4.5, 5. , 5.5, 9. , 9.5, 10. , 13.5, 14. , 14.5, 18. , 18.5, 19. , 22.5, 23. , 23.5, >>> for i in range(50): if b[i]==15.0: print "15.0 at
2.5, 3. , 3.5, 4. , 7. , 7.5, 8. , 8.5, 11.5, 12. , 12.5, 13. , 16. , 16.5, 17. , 17.5, 20.5, 21. , 21.5, 22. , 25. ])
index no.", i
15.0 at index no. 30 >>> for i in enumerate(b[0:6]): print i, (0, 0.0) (1, 0.5) (2, 1.0) (3, 1.5) (4, 2.0) (5, 2.5) >>>
The use of
arange
and
range
type while the latter can only generate integers; and indices of arrays are always integers that is why we loop over integers and not over oats or something else.
11 / 96
Random Numbers
Random Numbers
Sciences and Finance cannot live without random numbers, be them either pseudo-random or quasi-random.
NumPy
NumPy
sub-module
random.
>>> from numpy.random import * >>> a=random(20) >>> a array([ 0.66064392, 0.4315458 , 0.70880114, 0.00276342, 0.83383503, 0.24952601, 0.04636591, 0.10729739, 0.19072693, 0.82089409, 0.29784537, 0.35496562, 0.546188 , 0.52711541, 0.07060185, 0.60602829, 0.91907393, 0.52241082, 0.07597062, 0.27253169]) >>> b=standard_normal((4,5)) >>> b array([[-0.59317286, 0.27533818, -0.46122351, -0.05138033, -1.8371135 ], [-1.15520074, 1.04980946, 0.31082909, 0.32662006, -0.36752163], [ 0.66452767, -0.88077193, 1.18253972, 0.16836824, -1.40541028], [ 0.01481426, -0.88137549, 0.74594197, -0.97360666, -0.77270426]]) >>> c=random((2,3,4)) >>> shape(c) (2, 3, 4) >>> c array([[[ 0.09864194, 0.76069475, 0.54398641, 0.73081207], [ 0.81036431, 0.24343805, 0.38178278, 0.9414989 ], [ 0.0533329 , 0.0346994 , 0.67048989, 0.99188034]], [[ 0.27786962, 0.87359556, 0.14993006, 0.20461863], [ 0.59543661, 0.24566182, 0.47176266, 0.3328179 ], [ 0.8340118 , 0.96561975, 0.17854239, 0.81699292]]])
>>>
12 / 96
2d plotting (I)
module
2d Plotting
More often than not, one wants to visualze results from calculations or simulations. The
matplotlib
The most important types of graphics in general are lines, dots and bars.
>>> b=standard_normal((4,5)) >>> b array([[-0.57180547, -1.32783183, -0.27474264, 0.6301795 , 0.71101905], [ 0.29724602, 0.289595 , 0.1056877 , 0.06424294, -0.35708164], [ 0.25890926, 0.79000265, -0.47457278, 0.11719325, 0.39121246], [-0.24544426, 1.59194504, -1.6703606 , -0.00169267, -0.63803156]]) >>> from matplotlib.pyplot import * >>> plot(b) [<matplotlib.lines.Line2D object at 0x2b9e790>] >>> grid(True) >>> axis('tight') (0.0, 50.0, 0.0, 25.0) >>> show()
13 / 96
2d plotting (II)
2d Plotting
matplotlib produces 4 5.
0.5
1.0
1.5
2.0
2.5
3.0
14 / 96
2d plotting (III)
2d Plotting
The next example combines a dot sub-plot with a bar sub-plot the result of which is shown in the next gure. Here, due to resizing of the array we have only a one-dimensional set of numbers.
>>> d=standard_normal((4,5)) >>> d=resize(d,20) >>> d array([ 0.12709036, -1.19800928, 0.22527268, 0.39149983, 0.19080228, 0.57113933, -1.07355946, 0.8428513 , -2.22197056, 1.58069866, 0.6992034 , -1.45520777, 0.42116251, -0.26856476, 1.09870092, 0.83489701, -2.34729449, -0.58642723, 0.34725616, -0.56177434]) >>> subplot(211) <matplotlib.axes.AxesSubplot object at 0x3585590> >>> plot(d,'ro') [<matplotlib.lines.Line2D object at 0x3565490>] >>> subplot(212) <matplotlib.axes.AxesSubplot object at 0x3585250> >>> bar(range(20),d) [<matplotlib.patches.Rectangle object at ...] >>> grid(True) >>> show()
15 / 96
2d plotting (IV)
2.0 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.0 2.50 2.0 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.0 2.50
2d Plotting
10
15
20
10
15
20
16 / 96
We generate two series with 50 standard normal (pseudo-)random numbers (array of size
(50, 2),
call it
rn)
Calculate the sum of the two 50 number vectorsonce vector-wise and once using the
sum
rn_cum;
cumsum
it in an array
the vector
4 5
Plot the
rn_cum
vector
rn
17 / 96
Regression (I)
conclusions.
Approximation
It is often the case in Finance that one has to approximate something to draw
Two important approximation techniques are regression and interpolation. The type of regression we consider rst is called ordinary least squares regression (OLS). In its most simple form, ordinary monomials desired function
x, x2 , x3 , ...
a number
of obervations
f (x)
g (x) = a1 + a2 x + a3 x2
where the
ai
min
18 / 96
Regression (II)
Approximation
As an example, we want to approximate the cosine function over the interval given 20 observations.
[0, /2]
The code (see next slide) is straightforward since NumPy has built-in functions polyfit and polyval. From polyfit you get the minimizing regression parameters back, while you can them with polyval to generate values based on these parameters. The result is shown in the next gure for three dierent regression functions.
use
19 / 96
Regression (III)
Approximation
# # Ordinary Least Squares Regression # a_REG . py # from numpy import * from matplotlib . pyplot import * # Regression x = linspace (0.0 , pi /2 , 20 ) y = cos (x ) g1 = polyfit (x ,y ,0 ) g2 = polyfit (x ,y ,1 ) g3 = polyfit (x ,y ,2 ) g1y = polyval ( g1 , x) g2y = polyval ( g2 , x) g3y = polyval ( g3 , x) # Graphical Analysis plot (x ,y , ' y ' ) plot (x , g1y , ' rx ' ) plot (x , g2y , ' bo ' ) plot (x , g3y , ' g > ' )
20 / 96
Regression (IV)
1.2 1.0 0.8 0.6 0.4 0.2 0.0 0.2 0.0
Approximation
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
Figure: Approximation of cosine function (line) with constant regression (red crosses), linear regression (blue dots) and quadratic regression (green triangles)
Y. Hilpisch (Visixion GmbH)
21 / 96
Splines (I)
Approximation
The concept of interpolation is much more involved but nevertheless almost as straightforward in applications. The most common type of interpolation is with cubic splines for which you nd functions in the sub-module
scipy.interpolate.
The example remains the same and the code (see next slide) is as compact as before while the resultsee the respective gureseems almost perfect. Roughly speaking, cubic splines interpolation is (intelligent) regression between every two observation points with a polynomial of order 3. This is of course much more exible than a single regression with a polynomial of order 2. Two drawbacks in algorithmic terms are, however, that the observations have been ordered in the x-dimension. Furthermore, cubic splines are of limited or no use for higher dimensional problems where OLS regression is applicable as easy as in the two-dimensional world.
22 / 96
Splines (II)
Approximation
# # Cubic Spline Interpolation # b_SPLINE . py # from numpy import * from scipy . interpolate import * from matplotlib . pyplot import * # Interpolation x = linspace (0.0 , pi /2 , 20 ) y = cos (x ) gp = splrep (x ,y ,k= 3) gy = splev (x ,gp , der =0 ) # Graphical Analysis plot (x ,y , ' b ' ) plot (x ,gy , ' ro ' )
23 / 96
Splines (III)
1.2 1.0 0.8 0.6 0.4 0.2 0.0 0.0
Approximation
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
Figure: Approximation of cosine function (line) with cubic splines interpolation (red dots)
Y. Hilpisch (Visixion GmbH)
24 / 96
Optimization
Strictly speaking, regression and interpolation are two special forms of optimization (some kind of minimization). However, optimization techniques are needed much more often in science and nance. An important area is, for example, the calibration of derivatives model parameters to a given set of market-observed option prices or implied volatilities. The two major approaches are global and local optimization. While the rst looks for a global minimum or maximum of a function (which does not have to exist at all), the second looks for a local minimum or maximum.
25 / 96
Optimization
[, 0]
with a minimum
at
/2.
delivers respective functions via the sub-module
scipy
optimize.
# # Finding a Minimum # c_OPT . py # from numpy import * from scipy . optimize import * # Finding a Minimum def y(x ): if x <- pi or x > 0: return 0 .0 return sin (x ) gmin = brute (y ,(( - pi ,0 ,0. 01 ) ,) , finish = None ) lmin = fmin (y ,- 0.5 ) # Output print " Global Minimum is " , gmin print " Local Minimum is " , lmin
26 / 96
Optimization
brute
fmin
(local convex
optimization algorithm) also work in multi-dimensional settings. In general, the solution of the local optimization is strongly dependent on the initialization. Here the
0.5
/2
as the solution.
>>> ================================ RESTART ================================ >>> Optimization terminated successfully. Current function value: -1.000000 Iterations: 18 Function evaluations: 36 Global Minimum is -1.57159265359 Local Minimum is [-1.57080078] >>>
27 / 96
Numerical Integration
Numerical Integration
It is not always possible to analytically integrate a given function. Then numerical integration often comes into play. We want to check numerical integration where we can do it analytically as well: 1 x e dx 0 1 0 The value of the integral is e e 1.7182818284590451. For numerical integration, again the function
scipy helps out with the sub-module integrate quad, implementing a numerical quadrature scheme:
which contains
# # Numerically Integrate a Function # d_INT . py # from numpy import * from scipy . integrate import * # Numerical Integration def f(x ): return exp (x ) Int = quad ( lambda u: f(u ),0 ,1 )[ 0] # Output print " Value of the Integral is " , Int
28 / 96
To better understand how to implement the binomial option pricing model of Cox, Ross and Rubinstein (1979, henceforth: CRR), a little more background seems helpful. There are two securities traded in the model: a risky stock index and a risk-less zero-coupon bond. The time horizon
[0, T ]
T /t + 1
points in time
t {0, t, 2 t, ..., T }. r,
the stock
The zero-coupon bond grows p.a. in value with the risk-less short rate
Bt = B0 ert
where
B0 > 0. S0
at
Starting from a strictly positive, xed stock index level of index evolves according to the law
t = 0,
St+t St m
where Here,
m 0
is selected randomly from {u, d}. < d < ert < u e t as well as
a recombining tree.
29 / 96
St
= =
q=
The value of a European call option payos
ert d ud
discounting the nal
t = T t .
30 / 96
From an algorithmical point of view, one has to rst generate the index level values, determines then the nal payos of the call option and nally discounts them back. This is what we now will do, starting with a somewhat `naive' implementation. But before we do it, we generate a
Python
that we will need for dierent implementations afterwards. All parameters can be imported by using the module name is
import
a_Parameters).
a_Parameters.py
31 / 96
# # Model Parameters for European Call Option and Binomial Models # a_Parameters . py # from math import exp , sqrt # Option Parameters s0 = 105 . 00 # Initial Index Level K = 100 . 00 # Strike Level T = 1. # Call Option Maturity r = 0 . 05 # Constant Short Rate vola = 0 . 25 # Constant Volatility of Diffusion # Time Parameters t = 3 # Time Intervals delta = T/ t # Length of Time Interval df = exp (-r * delta ) # Discount Factor # u d q Binomial Parameters = exp ( vola * sqrt ( delta )) = 1 /u = ( exp ( r* delta )-d )/( u -d ) # Up - Movement # Down - Movement # Martingale Probability
The next slide presents the rst version of the binomial model which uses Excel-like cell iterations extensively. We will see that there are ways to a more compact and faster implementation.
32 / 96
Python Implementations
# # Valuation of European Call Option in CRR1979 Model # Naive Version (= Excel - like Iterations ) # b_CRR1979_Naive . py # from numpy import * from a_Parameters import * # Array Initialization for Index Levels s = zeros (( t+1 ,t+1) , ' float ' ) s[0 ,0] = s0 z = 0 for j in range (1 ,t+1 , 1 ): z = z+1 for i in range (z +1 ): s[i , j] = s[0 , 0 ]*( u ** j )*( d **( i*2 )) # Array Initialization for Inner Values iv = zeros (( t+1 ,t +1), ' float ' ) z = 0 for j in range (0 ,t+1 , 1 ): for i in range (z +1 ): iv [i ,j] = round ( max (s[i ,j]-K , 0),8) z = z+1 # Valuation pv = zeros (( t+1 ,t +1), ' float ' ) # Present Value Array pv [: , t] = iv [: , t] # Last Time Step Initial Values z = t+1 for j in range (t -1 ,-1 ,-1 ): z = z -1 for i in range (z ): pv [i ,j] = ( q* pv [i ,j+1 ]+( 1 -q )* pv [i+1 ,j+1 ])* df # Output print " Value of European call option is " , pv [0 ,0]
33 / 96
Python Implementations
A run of the module gives the following output and arrays where one can follow the three steps easily (index levels, inner values, discounting):
European call option is 16.2929324488 105. 0. 0. 0. 5. 0. 0. 0. , 121.30377267, 140.13909775, , 90.88752771, 105. , , 0. , 78.67183517, , 0. , 0. , 161.89905958], 121.30377267], 90.88752771], 68.09798666]])
34 / 96
Python Implementations
NumPy
the consequence is
# Valuation pv = maximum (s -K , 0) Qu = zeros (( t+1 ,t +1), ' float ' ) Qu [: ,:] = q Qd = 1 - Qu z = 0 for i in range (t -1 ,-1 ,-1 ): pv [0:t -z ,i] = ( Qu [ 0:t -z , i ]* pv [ 0:t -z , i+1 ]+ Qd [0:t -z ,i ]* pv [1:t -z+1 ,i +1 ])* df z = z+1 # Output print " Value of European call option is " , pv [0 ,0]
35 / 96
Python Implementations
The valuation result is, as expected, the same for the parameter denitions from before. However, three time intervals are of course not enough to come close to the Black-Scholes benchmark of 15.6547197268. With 1,000 time intervals, however, the algorithms come quite close to it:
>>> ================================ RESTART ================================ >>> Value of European call option is 15.6537846075 >>>
36 / 96
Comparison
Python Implementations
The major dierence between the two algorithms is execution time. The second implementation which avoids about 30 times faster than the rst one. You should make this a principle for your own coding eorts: whenever possible avoid necessary iterations in
Python
Python
NumPy.
Apart from time savings, you generally also get more compact and readable code. A direct comparison illustrates this point:
# Naive Version --- Iterations in Python # # Array Initialization for Inner Values iv = zeros((t+1,t+1),'float') z = 0 for j in range(0,t+1,1): for i in range(z+1): iv[i,j] = max(s[i,j]-K,0) z = z+1 # Advanced Version --- Iterations with NumPy/C # pv = maximum(s-K,0)
37 / 96
Python Implementations
To conclude this section, we apply the Fast Fourier Transform (FFT) algorithm to
Nowadays this numerical routine plays a central role in Derivatives Analytics and other areas of science. It is used regularly for plain vanilla option pricing in productive environments in investment banks or hedge funds. In general, however, it is not applied to a binomial model but the application in this case is straightforward and therefore a quick win for us. In this module (see next slide),
Python
since for European options only the nal payos are relevant. The speed advantage of this algorithm is again considerable: it is 30 times faster than our advanced algorithm from before and 900 times faster than the naive version.
38 / 96
Python Implementations
# # Valuation of European Call Option in CRR1979 Model # FFT Version # d_CRR1979_FFT . py # from numpy import * from numpy . fft import fft , ifft from a_Parameters import * # Array Generation for Index Levels md = arange (t+ 1) mu = resize ( md [-1] ,t+ 1) mu = u **( mu - md ) md = d ** md s = s0 * mu * md # Valuation by FFT C_T = maximum (s -K ,0) Q = zeros (t+1 , ' float ' ) Q[ 0] = q Q[ 1] = 1 - q l = sqrt ( t+1 ) v1 = ifft ( C_T )* l v2 = ( sqrt ( t+1 )* fft (Q )/( l *( 1 +r* delta )))** t C_0 = fft ( v1 * v2 )/ l # Output print " Value of European call option is " , real ( C_0 [0 ])
39 / 96
Finally, we apply Monte Carlo simulation (MCS) to value the same European call
Here it is where pseudo-random numbers come into play. Similarly to the FFT algorithm we only care about the nal index level at simulate it with pseudo-random numbers. We get the following simple simulation algorithm:
and
ST = S0 e(r 2
)T +
T wT
(1)
T T
iterate until i = I sum up all inner values at T , take the average and discount back to t = 0:
T
draw a standard normally distributed pseudo-random number w (i) determine at T the index level S (i) by applying the number w (i) to equation (1) determine the inner value of the call at T as max[S (i) K, 0]
C0 (K, T ) erT 1 I max[ST (i) K, 0]
I
this is the MCS estimator for the European call option value
Y. Hilpisch (Visixion GmbH)
40 / 96
Although the word `iterating' sounds like looping over arrays we can again avoid array loops completely on the
Python/NumPy
Python
level.
core algorithm With another 5 lines we can produce a histogram of the index levels at displayed in the respective gure.
as
# # Valuation of European Call Option via Monte Carlo Simulation # g_MCS . py # from numpy . random import * from matplotlib . pyplot import * from a_Parameters import * from numpy import * # Valuation via MCS paths = 500000 rand = standard_normal ( paths ) sT = s0 * exp (( r -0.5 * vola ** 2 )* T+ sqrt ( T )* vola * rand ) pv = sum ( maximum (sT -K ,0 )* exp (-r *T ))/ paths print " Value of European call option is " , pv # Graphical Analysis figure () hist ( sT , 100 ) xlabel ( ' index level at T ' ) ylabel ( ' frequency ' ) show ()
Y. Hilpisch (Visixion GmbH)
41 / 96
15000
frequency
10000
5000
0 50
50
200
250
42 / 96
The algorithm produces quite an accurate estimate for the European call option value although the implementation is rather simplistic (i.e. there are, for example, no variance reduction techniques involved):
>>> ================================ RESTART ================================ >>> Value of European call option is 15.6306695905 >>>
43 / 96
Series class
Series class is explicitly designed to handle indexed (time) series1 s is a Series object, s.index gives its index simple example is s=Series([1,2,3,4,5],index=['a','b','c','d','e'])
In [16]: s=Series([1,2,3,4,5],index=['a','b','c','d','e']) In [17]: s Out[17]: a 1 b 2 c 3 d 4 e 5 In [18]: s.index Out[18]: Index([a, b, c, d, e], dtype=object) In [19]: s.mean() Out[19]: 3.0 In [20]:
There are lots of useful methods in the
Series
class ...
44 / 96
Series class
pandas
DateRange
In [3]: x=standard_normal(250) In [4]: index=DateRange('01/01/2012',periods=len(x)) In [5]: s=Series(x,index=index) In [6]: s Out[6]: 2012-01-02 2012-01-03 2012-01-04 2012-01-05 ...
45 / 96
DataFrame class
data.frame
indexed
DataFrame
multiple,
maybe
hierarchically
class,
(time) series
The following example illustrates some convenient features of the i.e. data alignment and handling of missing data
DataFrame
In [35]: s=Series(standard_normal(4),index=['1','2','3','5']) In [36]: t=Series(standard_normal(4),index=['1','2','3','4']) In [37]: df=DataFrame({'s':s,'t':t}) In [38]: df['SUM']=df['s']+df['t'] In [39]: print df.to_string() s t SUM 1 -0.125697 0.016357 -0.109340 2 0.135457 -0.907421 -0.771964 3 1.549149 -0.599659 0.949491 4 NaN 0.734753 NaN 5 -1.236310 NaN NaN In [40]: df['SUM'].mean() Out[40]: 0.022728863312009556
Y. Hilpisch (Visixion GmbH)
46 / 96
The two main pandas classes have methods for easy plotting
The
Series
and
DataFrame
plot
and
hist
In [54]: index=DateRange(start='1/1/2013',periods=250) In [55]: x=standard_normal(250) In [56]: y=standard_normal(250) In [57]: df=DataFrame({'x':x,'y':y},index=index) In [58]: df.cumsum().plot() Out[58]: <matplotlib.axes.AxesSubplot at 0x3082c10> In [59]: df['x'].hist() Out[59]: <matplotlib.axes.AxesSubplot at 0x3468190> In [60]:
47 / 96
48 / 96
We generate three series with 100 standard normal (pseudo-)random numbers (array of size
(100, 3)).
DataFrame
2013 with 30-day steps and give the three series the names
3
['A','B','C']
We then generate a 4-th series with name sum of the three other series.
CUMSUM
4 5 6
We also generate a histogram for the 3rd series with We save the histogram as PDF le.
plot. matplotlib
and set
bins=20.
49 / 96
and store it in a
pandas DataFrame
object
Series
data analysis:
shift
method of the
object) and generate a new column with the log returns in the
calculate 252 day rolling means and standard deviations of the log returns as well as their rolling correlations and generate respective columns
plotting:
plot the log returns together with the daily DAX quotes into a single gure;
plot in another gure the rolling means and the standard deviations of the log returns as well as their correlation
data storage:
HDFStore
G
save the
pandas DataFrame
to a
PyTables/HDF5
data gathering:
50 / 96
1. Data Gathering
# # Analysis of Historical Stock Data # with pandas # RFE_Data . py # # (c) Visixion GmbH # Script for Illustration Purposes Only . # from pylab import * # 1. Data Gathering from pandas . io . data import * DAX = DataReader ( ' ^ GDAXI ' , ' yahoo ' , start = ' 01 / 01 / 2000 ' )
51 / 96
# 2. Data Analysis from pandas import * DAX [ ' Ret ' ]= log ( DAX [ ' Close ' ]/ DAX [ ' Close ' ]. shift (1 )) DAX [ ' rMe ' ]= rolling_mean ( DAX [ ' Ret ' ], 252 )* 252 DAX [ ' rSD ' ]= rolling_std ( DAX [ ' Ret ' ], 252 )* sqrt ( 252 ) DAX [ ' Cor ' ]= rolling_corr ( DAX [ ' rMe ' ], DAX [ ' rSD ' ], 252 )
52 / 96
print DAX.ix[-20:].to_string() Open High Low Date 2012-05-31 6297.68 6322.69 6208.09 2012-06-01 6259.76 6259.76 6008.47 2012-06-04 5976.46 6030.81 5942.38 2012-06-05 5999.86 6011.56 5914.43 2012-06-06 6028.36 6102.42 5996.41 2012-06-07 6117.76 6230.22 6099.08 2012-06-08 6082.63 6144.76 6053.95 2012-06-11 6255.65 6287.54 6130.28 2012-06-12 6141.92 6211.14 6083.81 2012-06-13 6183.80 6221.36 6093.61 2012-06-14 6146.92 6167.49 6078.22 2012-06-15 6164.56 6251.59 6158.78 2012-06-18 6304.77 6316.14 6221.87 2012-06-19 6254.77 6375.27 6233.25 2012-06-20 6364.06 6402.21 6333.97 2012-06-21 6357.25 6427.49 6331.79 2012-06-22 6273.10 6318.06 6256.34 2012-06-25 6229.43 6229.43 6118.72 2012-06-26 6157.84 6165.28 6109.93 2012-06-27 6155.91 6230.51 6131.30
Close 6264.38 6050.29 5978.23 5969.40 6093.99 6144.22 6130.82 6141.05 6161.24 6152.49 6138.61 6229.41 6248.20 6363.36 6392.13 6343.13 6263.25 6132.39 6136.69 6228.99
Volume Adj Close 33014600 42856100 23699300 22355900 32200300 28859800 22742300 29749700 28227200 28021500 29461700 70434200 28946700 25250900 22461300 30737700 25903100 25886800 25550800 25213500 6264.38 6050.29 5978.23 5969.40 6093.99 6144.22 6130.82 6141.05 6161.24 6152.49 6138.61 6229.41 6248.20 6363.36 6392.13 6343.13 6263.25 6132.39 6136.69 6228.99
Ret -0.002618 -0.034773 -0.011982 -0.001478 0.020657 0.008209 -0.002183 0.001667 0.003282 -0.001421 -0.002259 0.014683 0.003012 0.018263 0.004511 -0.007695 -0.012673 -0.021115 0.000701 0.014929
rMe -0.125673 -0.154371 -0.180338 -0.169200 -0.150697 -0.159234 -0.148888 -0.146535 -0.150797 -0.150285 -0.171289 -0.155601 -0.134741 -0.112545 -0.106139 -0.122593 -0.152372 -0.184679 -0.189818 -0.178054
rSD 0.302519 0.304405 0.304262 0.304029 0.304764 0.304395 0.304165 0.304173 0.304089 0.304087 0.303470 0.303859 0.303387 0.303949 0.303985 0.303932 0.303660 0.304118 0.304050 0.304430
Cor -0.698396 -0.695765 -0.692978 -0.690525 -0.687612 -0.684112 -0.680271 -0.676090 -0.671883 -0.666613 -0.661152 -0.654909 -0.648260 -0.640783 -0.633014 -0.625235 -0.617441 -0.609470 -0.600965 -0.592066
53 / 96
3. Plotting (I)
# 3. Plotting figure () subplot ( 211 ) DAX [ ' Close ' ]. plot () ylabel ( ' Index Level ' ) subplot ( 212 ) DAX [ ' Ret ' ]. plot () ylabel ( ' Log Returns ' ) DAX [[ ' rMe ' , ' rSD ' , ' Cor ' ]]. plot ()
54 / 96
3. Plotting (II)3
Index Level
3
9000 8000 7000 6000 5000 4000 3000 2000 001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 0.15 2 0.10 0.05 0.00 0.05 0.10 2 2 1 1 5 7 3 9 8 0 6 4 200 200 200 200 200 200 200 200 200 201 201 201
Log Returns
55 / 96
3. Plotting (III)4
1.0 0.5
0.0
0.5
1.0 2 2 1 1 5 7 3 9 8 0 6 4 200 200 200 200 200 200 200 200 200 201 201 201
56 / 96
# 4. Data Storage h5file = HDFStore ( ' DAX . h5 ' ) h5file [ ' DAX ' ]= DAX h5file . close ()
57 / 96
# # Analysis of Historical Stock Data with pandas # RFE_Data . py # # (c) Visixion GmbH # Script for Illustration Purposes Only . # from pylab import * # 1. Data Gathering from pandas . io . data import * DAX = DataReader ( ' ^ GDAXI ' , ' yahoo ' , start = ' 01 / 01 / 2000 ' ) # 2. Data Analysis from pandas import * DAX [ ' Ret ' ]= log ( DAX [ ' Close ' ]/ DAX [ ' Close ' ]. shift ( 1 )) DAX [ ' rMe ' ]= rolling_mean ( DAX [ ' Ret ' ], 252 )* 252 DAX [ ' rSD ' ]= rolling_std ( DAX [ ' Ret ' ], 252 )* sqrt ( 252 ) DAX [ ' Cor ' ]= rolling_corr ( DAX [ ' rMe ' ], DAX [ ' rSD ' ], 252 ) # 3. Plotting figure () subplot ( 211 ); DAX [ ' Close ' ]. plot (); ylabel ( ' Stock Price ' ) subplot ( 212 ); DAX [ ' Ret ' ]. plot (); ylabel ( ' Log Returns ' ) DAX [[ ' rMe ' , ' rSD ' , ' Cor ' ]]. plot () # 4. Data Storage h5file = HDFStore ( ' DAX . h5 ' ) h5file [ ' DAX ' ]= DAX h5file . close ()
58 / 96
PyTables
A Pythonic database5
PyTables is a package for managing hierarchical datasets and designed to eciently and easily cope with extremely large amounts of data. PyTables is built on top of the HDF5 library, using the Python language and the NumPy package. It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code ..., makes it a fast, yet extremely easy to use tool for interactively browse, process and search very large amounts of data. One important feature of PyTables is that it optimizes memory and disk resources so that data takes much less space (specially if on-ight compression is used) than other solutions such as relational or object oriented databases. One characteristic that sets PyTables apart from similar tools is its capability to perform extremely fast queries on your tables in order to facilitate as much as possible your main goal: get important information *out* of your datasets.
59 / 96
openFile: create new le or open existing le, like in h5=openFile('data.h5','w'); 'r'=read only, 'a'=read/write .close(): close database, like in h5.close() h5.createGroup: create a new group, as in group=h5.createGroup(root,'Name') IsDescription: class for column descriptions of tables, used as in: class Row(IsDescription): name = StringCol(20,pos=1) data = FloatCol(pos=2) h5.createTable: create new table, as in tab=h5.createTable(group,'Name',Row) tab.iterrows(): iterate over table rows tab.where('condition'): SQL-like queries with exible conditions tab.row: return current/last row of table, used as in r=tab.row row.append(): append row to table, as in r.append() tab.flush(): ush table buer to disk/le h5.createArray: create an array, as in arr=h5.createArray(group,'Name',zeros((10,5))
60 / 96
In [62]: tab=h5.createTable(h5.root,'Numbers',Row) In [63]: tab Out[63]: /Numbers (Table(0,)) '' description := { "number": Float64Col(shape=(), dflt=0.0, pos=0), "sqrt": Float64Col(shape=(), dflt=0.0, pos=1)} byteorder := 'little' chunkshape := (512,) In [64]: r=tab.row In [65]: for x in range(1000): ....: r['number']=x ....: r['sqrt']=sqrt(x) ....: r.append() ....:
Y. Hilpisch (Visixion GmbH)
61 / 96
In [66]: tab Out[66]: /Numbers (Table(0,)) '' description := { "number": Float64Col(shape=(), dflt=0.0, pos=0), "sqrt": Float64Col(shape=(), dflt=0.0, pos=1)} byteorder := 'little' chunkshape := (512,) In [67]: tab.flush() In [68]: tab Out[68]: /Numbers (Table(1000,)) '' description := { "number": Float64Col(shape=(), dflt=0.0, pos=0), "sqrt": Float64Col(shape=(), dflt=0.0, pos=1)} byteorder := 'little' chunkshape := (512,) In [69]: tab[:5] Out[69]: array([(0.0, 0.0), (1.0, 1.0), (2.0, 1.4142135623730951), (3.0, 1.7320508075688772), (4.0, 2.0)], dtype=[('number', '<f8'), ('sqrt', '<f8')]) In [70]:
62 / 96
In [8]: h5 Out[8]: File(filename=Test_Data.h5, title='', mode='a', rootUEP='/', filters=Filters(complevel=0, shuffle=False, fletcher32=False)) / (RootGroup) '' /Numbers (Table(1000,)) '' description := { "number": Float64Col(shape=(), dflt=0.0, pos=0), "sqrt": Float64Col(shape=(), dflt=0.0, pos=1)} byteorder := 'little' chunkshape := (512,) In [9]: tab=h5.root.Numbers In [10]: tab[:5]['sqrt'] Out[10]: array([ 0. , 1. , 1.41421356, 1.73205081, 2. ])
In [11]: from pylab import * In [12]: plot(tab[:]['sqrt']) Out[12]: [<matplotlib.lines.Line2D at 0x7fe65cf12d10>] In [13]: show()
63 / 96
create a le: rst create a PyTables database le row description: by sub-classing from IsDescription
Row
containing a
name,
number
and the
3 4
create group: create a group with name create table: in the group Tables create
sub-class
square Tables
of that number
Row
Numbers
using the
populate table:
square
i-th number,
the
number
Numbers;
the name
is the square of the random number determine the mean, meadian and standard deviation of both the
data analysis:
number column and the square colum; regress the square column against the number column using polyfit and polyval visualization: generate a histogram of the square column and plot the cumulative sum of the number column array: create a group named Arrays, create an array of size (1000,1000) in it,
polulate the array with random numbers, double each random number, save the array and close the le
64 / 96
1 2 3
transfer data: transfer the data stored in the table plot the data: plot the data in the DataFrame object close the le: after completion, close the le
65 / 96
Python
such as
physics engineering nance ... storing results automatically to a PyTables database or automatically generating Latex table output
Frequently, the results of such a project are to be documented and published in the form of a Latex document This case study is about potential approaches to automize numerical research with Python, e.g. by
The example project is taken from nance: we compare valuation results for European call options from Monte Carlo simulations with their analytical values
66 / 96
T, 0 < T < {, F , F, P }
0tT
(2)
with
0 < S0 < , r F,
i.e.
constant
Zt
Ft F (S0st )
dBt = rdt Bt
the time
(3)
t 0t<T
with
67 / 96
St ,
the SDE
St Stt Bt Btt
for
= =
r 2
t+
tzt
(4)
ert zt
(5)
t {t, ..., T }
this scheme is an Euler discretization which is known to be exact for the geometric Brownian motion (2)
68 / 96
value of an
FT measurable
contingent claim
VT hT (ST ) 0
Vt = EQ t (Bt (T )VT )
with
V0 = EQ 0 (B0 (T )VT )
and
P Q
follows
the Black-Scholes-Merton (BSM) model (2)(3) is known to be complete from which uniqueness of the risk-neutral measure the dening characteristic of martingale in the BSM model, the present value of a call option is given by
6 E () t
69 / 96
I index level paths with M + 1 St,i , t {0, ..., T }, i {1, ..., I } t=T t=0
the option value is calculate the MCS estimator
for for
VT,i = hT (ST,i )
by arbitrage
0M CS = erT 1 V I
VT,i
i=1
(6)
70 / 96
(7)
N(d) d1 d2
= = =
1 2 log log
S0 K
e 2 x dx
2 )T 2 2 )T 2
+ (r + T + (r T
S0 K
71 / 96
We set up a Monte Carlo simulation study for the valuation of European call options
We want to evaluate the impact of dierent simulation congurations on the accuracy of the MCS estimator (6) As benchmark we have available the analytical option value from formula (7) As model parameters we chose:
M {25, 50}
I {25000, 50000}
All in all, we get 16 dierent congurations for the simulation set-up We say that a valuation is accurate if the valuation error is smaller than 1 percent or smaller than 1 cent
72 / 96
# General Simulation Parameters write = True cL =[( False , False ) ,( False , True ) ,( True , False ) ,( True , True )] # 1st = moMatch -- Random Number Correction ( std + mean + drift ) # 2nd = antiPaths -- Antithetic Paths for Variance Reduction mL =[ 25 , 50 ] # Time Steps iL =[ 25000 , 50000 ] # Number of Paths per Valuation SEED = 100000 # Seed Value R= 10 # Number of Simulation Runs PY1 = 0. 010 # Performance Yardstick 1 : Abs . Error in Currency Units PY2 = 0. 010 # Performance Yardstick 2 : Rel . Error in Decimals tL =[ 1 .0/ 12 , 1.0/2 ,1 .0] # Maturity List kL =[ 90 , 100 , 110 ] # Strike List
for c in cL : # Variance Reduction Techniques moMatch , antiPaths = c for M in mL : # Number of Time Steps for I in iL : # Number of Paths ... # Name of the Simulation Setup name =( ' Call_ ' + str ( R )+ ' _ ' + str (M )+ ' _ ' + str (I / 1000 )+ ' _ ' + str ( moMatch )[ 0 ]+ str ( antiPaths )[ 0 ]+ ' _ ' + str ( PY1 * 100 )+ ' _ ' + str ( PY2 * 100 )) seed ( SEED ) # RNG seed value for i in range (R ): # Simulation Runs ... for T in tL : # Times -to - Maturity ... for K in kL : # Strikes
Y. Hilpisch (Visixion GmbH)
73 / 96
# Function for Random Numbers def RNG (M ,I ): if antiPaths == True : randh = standard_normal (( M +1 , I/2 )) rand = concatenate (( randh ,- randh ), 1) else : rand = standard_normal (( M+1 ,I )) if moMatch == True : rand = rand / std ( rand ) rand = rand - mean ( rand ) return rand
Matching of 1. moment for index level dynamics:
# Function for BSM Index Process def eulerSLog ( S0 , vol , r ): ran = RNG (M ,I ) sdt = sqrt ( dt ) S= zeros (( M+1 ,I), ' d ' ) S[0 ,:]= log ( S0 ) for t in range (1 , M+1 ,1 ): S[t ,:]+= S [t -1 ,:] S[t ,:]+=( r - vol ** 2/2 )* dt S[t ,:]+= vol * ran [t ]* sdt if moMatch == True : S[t ,:] -= mean ( vol * ran [t ]* sdt ) return exp (S )
74 / 96
for c in cL : # Variance Reduction Techniques moMatch , antiPaths = c for M in mL : # Number of Time Steps for I in iL : # Number of Paths ... seed ( SEED ) # RNG seed value for i in range (R ): # Simulation Runs ... for T in tL : # Times -to - Maturity ... for K in kL : # Strikes h= maximum (S[ -1] -K ,0 ) # Inner Value Vector ## MCS Estimator V0_MCS = exp (- r*T )* sum (h )/ I ## BSM Analytical Value V0 = BSM_Call (S0 ,K ,T ,r , vol , 0) ## Errors diff = V0_MCS - V0 rdiff = diff / V0 absError . append ( diff ) relError . append ( rdiff * 100 ) ... if abs ( diff )< PY1 or abs ( diff )/ V0 < PY2 : print " Accuracy ok !\ n"+ br CORR = True else : print " Accuracy NOT ok !\ n" + br CORR = False ; errors = errors +1
75 / 96
PyTables
# # Creating a Database for Simulation Results # (c) Visixion GmbH - Y. Hilpisch # Script for illustration purposes only # from tables import * from numpy import * # Record to store set of simulation results class SimResult ( IsDescription ): id_number = Int32Col ( pos =1 ) sim_name = StringCol ( 32 , pos =2) seed = Int32Col ( pos =3 ) runs = Int32Col ( pos =4 ) time_steps = Int32Col ( pos =5 ) paths = Int32Col ( pos = 6) ... # Record to store single simulation result class ValResult ( IsDescription ): id_number = Int32Col ( pos = 1) sim_name = StringCol (32 , pos =2) opt_T = Float32Col ( pos = 3) opt_K = Float32Col ( pos = 4) euro_ana = Float32Col ( pos = 5) euro_mcs = Float32Col ( pos = 6) correct = StringCol (8 , pos =7 ) val_err_abs = Float32Col ( pos =8 ) ...
76 / 96
PyTables
# Generate new hdf5 file for results storage filename = " MCS_Results_Comp . h5 " def CreateFile ( filename ): h5file = openFile ( filename , mode = "w" , title = " BSM_MCS_Results " ) ## Open / Generate hdf5 file in " write " mode group = h5file . createGroup ( "/" , ' results ' , ' Results ' ) ## Create a group called " Results " h5file . createTable ( group , ' Sim_Results ' , SimResult , " Simulation Results " ) ## In the group " Results ": ## Create a table called " Simulation Results " with Record " SimResult " h5file . createTable ( group , ' Val_Results ' , ValResult , " Valuation Results ") ## Create a table called " Valuation Results " with Record " ValResult " h5file . close ()
77 / 96
PyTables
# Fill the table with simulation results def ResWrite ( name , SEED ,R ,M ,I , moMatch , antiPaths ,l , atol , rtol , errors , absError , relError ,t1 , t2 , d1 , d2 ): h5file = openFile ( filename , mode = "a" ) table = h5file . root . results . Sim_Results simres = table . row idn = 1 if len ( table )> 0: for x in table . iterrows (): idn = max ( idn ,x[ ' id_number ' ]); idn = idn + 1 simres [ ' id_number ' ] = idn simres [ ' sim_name ' ] = name simres [ ' seed ' ] = SEED simres [ ' runs ' ] = R simres [ ' time_steps ' ] = M simres [ ' paths ' ] = I simres [ ' mo_match ' ] = moMatch simres [ ' anti_paths ' ] = antiPaths simres [ ' opt_prices ' ] = l simres [ ' abs_tol ' ] = atol simres [ ' rel_tol ' ] = rtol simres [ ' errors ' ] = errors simres [ ' error_ratio ' ]= float ( errors )/ l simres [ ' av_val_err ' ] = sum ( array ( absError ))/ l ... simres [ ' time_opt ' ] = (t2 - t1 )/ l simres [ ' start_date ' ] = str ( d1 ) simres [ ' end_date ' ] = str ( d2 ) simres . append () table . flush () h5file . close ()
Y. Hilpisch (Visixion GmbH)
78 / 96
PyTables
... from MCS_Results_PyTables import * ... write = True ... for c in cL : # Variance Reduction Techniques moMatch , antiPaths = c for M in mL : # Number of Time Steps for I in iL : # Number of Paths if write == True : h5file = openFile ( filename , mode = ' a ' ) ... if write == True : ValWrite ( h5file , name ,T ,K ,V0 , V0_MCS , str ( CORR ) , M ,I , str ( moMatch ), str ( antiPaths ), datetime . now ()) ... if write == True : h5file . close () ... if write == True : ResWrite ( name , SEED ,R ,M ,I , str ( moMatch ) , str ( antiPaths ),l , PY1 , PY2 , errors , absError , relError ,t1 ,t2 ,d1 , d2 )
79 / 96
PyTables
# Print simulation results ( Latex table output ) def PrintTex ( filename = filename , idl =0 , idh = 50 ): h5file = openFile ( filename , mode ="r ") table = h5file . root . results . Sim_Results for simres in table . where ( ''' idl <= id_number <= idh ''' ): print ( str ( simres [ ' runs ' ])+ ' & ' + str ( simres [ ' time_steps ' ])+ ' & ' + str ( simres [ ' paths ' ])+ ' & ' + str ( simres [ ' mo_match ' ])+ ' & ' + str ( simres [ ' anti_paths ' ])+ ' & ' + ' %. 3f ' % simres [ ' abs_tol ' ]+ ' & ' + ' %. 3f ' % simres [ ' rel_tol ' ]+ ' & ' + str ( simres [ ' opt_prices ' ])+ ' & ' + str ( simres [ ' errors ' ])+ ' & ' + ' %. 3f ' % simres [ ' av_val_err ' ]+ ' & ' + ' %. 3f ' % simres [ ' time_opt ' ]+ " \\ tn ") h5file . close ()
80 / 96
False & False & 0.010 & 0.010 & 90 & 22 & 0.013 & 0.059 \tn False & False & 0.010 & 0.010 & 90 & 9 & -0.010 & 0.093 \tn False & False & 0.010 & 0.010 & 90 & 8 & -0.005 & 0.088 \tn False & False & 0.010 & 0.010 & 90 & 6 & -0.017 & 0.152 \tn False & True & 0.010 & 0.010 & 90 & 10 & 0.002 & 0.051 \tn False & True & 0.010 & 0.010 & 90 & 5 & 0.008 & 0.081 \tn False & True & 0.010 & 0.010 & 90 & 12 & -0.005 & 0.078 \tn False & True & 0.010 & 0.010 & 90 & 1 & -0.008 & 0.143 \tn True & False & 0.010 & 0.010 & 90 & 5 & 0.010 & 0.066 \tn True & False & 0.010 & 0.010 & 90 & 2 & -0.006 & 0.108 \tn True & False & 0.010 & 0.010 & 90 & 3 & -0.008 & 0.104 \tn True & False & 0.010 & 0.010 & 90 & 5 & -0.007 & 0.188 \tn True & True & 0.010 & 0.010 & 90 & 13 & 0.001 & 0.061 \tn True & True & 0.010 & 0.010 & 90 & 3 & 0.009 & 0.101 \tn True & True & 0.010 & 0.010 & 90 & 11 & -0.004 & 0.100 \tn True & True & 0.010 & 0.010 & 90 & 1 & -0.006 & 0.197 \tn
81 / 96
\newcommand{\tn}{\tabularnewline} \def\TMP{\begin{center}\begin{tabular}{c c c c c r r r r r r} \hline\hline $R$ & $M$ & $I$ & \textbf{MM} & \textbf{AP}& \textbf{ATol}& \textbf{RTol}& \textbf{\#Op} & \textbf{Err} & \textbf{AvEr} & \textbf{Sec/O.} \tn [0.5ex] % inserts table heading \hline % inserts single horizontal line 10 & 25 & 25000 & False & False & 0.010 & 0.010 & 90 & 22 & 0.013 & 0.059 \tn 10 & 25 & 50000 & False & False & 0.010 & 0.010 & 90 & 9 & -0.010 & 0.093 \tn 10 & 50 & 25000 & False & False & 0.010 & 0.010 & 90 & 8 & -0.005 & 0.088 \tn 10 & 50 & 50000 & False & False & 0.010 & 0.010 & 90 & 6 & -0.017 & 0.152 \tn 10 & 25 & 25000 & False & True & 0.010 & 0.010 & 90 & 10 & 0.002 & 0.051 \tn 10 & 25 & 50000 & False & True & 0.010 & 0.010 & 90 & 5 & 0.008 & 0.081 \tn 10 & 50 & 25000 & False & True & 0.010 & 0.010 & 90 & 12 & -0.005 & 0.078 \tn 10 & 50 & 50000 & False & True & 0.010 & 0.010 & 90 & 1 & -0.008 & 0.143 \tn 10 & 25 & 25000 & True & False & 0.010 & 0.010 & 90 & 5 & 0.010 & 0.066 \tn 10 & 25 & 50000 & True & False & 0.010 & 0.010 & 90 & 2 & -0.006 & 0.108 \tn 10 & 50 & 25000 & True & False & 0.010 & 0.010 & 90 & 3 & -0.008 & 0.104 \tn 10 & 50 & 50000 & True & False & 0.010 & 0.010 & 90 & 5 & -0.007 & 0.188 \tn 10 & 25 & 25000 & True & True & 0.010 & 0.010 & 90 & 13 & 0.001 & 0.061 \tn 10 & 25 & 50000 & True & True & 0.010 & 0.010 & 90 & 3 & 0.009 & 0.101 \tn 10 & 50 & 25000 & True & True & 0.010 & 0.010 & 90 & 11 & -0.004 & 0.100 \tn 10 & 50 & 50000 & True & True & 0.010 & 0.010 & 90 & 1 & -0.006 & 0.197 \tn \hline \hline %inserts double line \end{tabular}\end{center}} \newdimen\TMPsize\settowidth{\TMPsize}{\TMP} \begin{table}[h]\begin{center}\begin{minipage}{\TMPsize} \footnotesize{\caption[Results]{\label{tab:RESULTS_1}Simulation results ...}}} \vspace{0.3ex} % title of Table \TMP \end{minipage}\end{center}\end{table} $R$ = number of runs, $M$ = number of time intervals, ...
Y. Hilpisch (Visixion GmbH)
82 / 96
Table: Simulation results for dierent congurations of the MCS algorithm and an accuracy level of P Y = 0.01 and P Y = 0.01.
R M I MM AP ATol RTol #Op Err AvEr Sec/O. 10 25 25000 False False 0.010 0.010 90 22 0.013 0.059 10 25 50000 False False 0.010 0.010 90 9 -0.010 0.093 10 50 25000 False False 0.010 0.010 90 8 -0.005 0.088 10 50 50000 False False 0.010 0.010 90 6 -0.017 0.152 10 25 25000 False True 0.010 0.010 90 10 0.002 0.051 10 25 50000 False True 0.010 0.010 90 5 0.008 0.081 10 50 25000 False True 0.010 0.010 90 12 -0.005 0.078 10 50 50000 False True 0.010 0.010 90 1 -0.008 0.143 10 25 25000 True False 0.010 0.010 90 5 0.010 0.066 10 25 50000 True False 0.010 0.010 90 2 -0.006 0.108 10 50 25000 True False 0.010 0.010 90 3 -0.008 0.104 10 50 50000 True False 0.010 0.010 90 5 -0.007 0.188 10 25 25000 True True 0.010 0.010 90 13 0.001 0.061 10 25 50000 True True 0.010 0.010 90 3 0.009 0.101 10 50 25000 True True 0.010 0.010 90 11 -0.004 0.100 10 50 50000 True True 0.010 0.010 90 1 -0.006 0.197 R = number of runs, M = number of time intervals, I = number of simulation paths, CV = control variates, MM = moment matching, AP = antithetic paths, ATol = absolute performance yardstick, RTol = relative performance yardstick, #Op = number of options, Err = number of errors, AvEr = average error in currency units, Sec/O. = seconds per option valuation.
Y. Hilpisch (Visixion GmbH)
83 / 96
Python
frequently, results of such projects are to be reported in the form of Latex documents using an example from mathematical nance, we show how to automate report generation through some simple methods:
iterating over several lists writing simulation results to a PyTables database reading results from database and printing strings in Latex format
once you have set up the numerical study, you only have three steps to generate a Latex table with the results
start your script by pressing F5 type PrintTex() copy the output to your Latex document
84 / 96
Cython
Python
suers from a
lack of speed in comparison with static typing languages as C or C++. Fortunately, the primary
Python
possible to access external modules written in C. But it is not trivial to write the necessary C glue code. This problem is solved by
Python
CythonCython Cython
The source code gets translated into optimized C/C++ code and compiled as
Python
extension modules, so
very fast program execution of C and the high-level, object-oriented and fast programming of
Python.
85 / 96
Write your
Python
and/or
.pyx. pyximport
(the
module.
86 / 96
Cython
Examples (I)
2 f (x)dx with f (x) x2 x. To approximate its value by its 0 lower sum we can use the following Python functions, saved in a module called a_Integrate_Norm.py, say.
# # Integration with Pure Python # a_Integrate_Norm . py # import time def f(x ): return x **2 - x def integrate (a ,b , N ): t0 = time . time () s= 0 dx =( b -a )/ N for i in range (N ): s += f(a +i* dx ) t1 = time . time () print " Approximate value is %s " %( s* dx ) print " Computation made in : %. 2f seconds " %( t1 - t0 )
87 / 96
Cython
Examples (II)
In [76]: from a_Integrate_Norm import * In [77]: integrate(0,2.0,10000) Approximate value is 0.66646668 Computation made in : 0.01 seconds In [78]: integrate(0,2.0,100000) Approximate value is 0.6666466668 Computation made in : 0.05 seconds In [79]: integrate(0,2.0,1000000) Approximate value is 0.666664666668 Computation made in : 0.40 seconds In [80]: integrate(0,2.0,10000000) Approximate value is 0.666666466667 Computation made in : 3.83 seconds
88 / 96
Cython
Examples (III)
and switch
After that, you can import and execute the functions as usual.
In [83]: from b_Integrate_Norm import * In [84]: integrate(0,2.0,1000) Approx. value is 0.664668 Computation made in : 0.00 seconds In [85]: integrate(0,2.0,10000000) Approx. value is 0.666666466667 Computation made in : 1.17 seconds In [86]:
Only the precompiling of the pure
Python
89 / 96
Cython
Examples (IV)
# # Integration with Static Typing in Cython # c_Integrate_with_static_typing . pyx # import time def f( double x ): return x **2 - x def integrate ( double a , double b , int N ): t0 = time . time () cdef int i cdef double s , dx s= 0 dx =( b -a )/ N for i in range (N ): s += f(a +i* dx ) t1 = time . time () print " Approximate value is %s " %( s* dx ) print " Computation made in : %. 2f seconds " %( t1 - t0 )
90 / 96
Cython
Examples (V)
In [93]: from c_Integrate_with_static_typing import * In [94]: integrate(0,2.0,10000) Approximate value is 0.66646668 Computation made in : 0.00 seconds In [95]: integrate(0,2.0,10000000) Approximate value is 0.666666466667 Computation made in : 0.98 seconds In [96]:
The static typing results in a further 20% reduction in time.
91 / 96
Cython
Examples (VI)
The return value of functions also has a type, so it could be a good idea to dene this type for the often called function f.
# # Integration with Static Typing in Cython # d_Integrate_with_static_typing_2 . pyx # import time cdef double f( double x ): return x **2 - x def integrate ( double a , double b , int N ): t0 = time . time () cdef int i cdef double s , dx s= 0 dx =( b -a )/ N for i in range (N ): s += f(a +i* dx ) t1 = time . time () print " Approximate value is %s " %( s* dx ) print " Computation made in : %. 2f seconds " %( t1 - t0 )
92 / 96
Cython
Examples (VII)
Using this version, reduces calculation time to 0.05 secondsa speed-up factor of 75 times compared to pure
Python.
In [98]: import pyximport In [99]: pyximport.install() In [100]: from d_Integrate_with_static_typing_2 import * In [101]: integrate(0,2.0,10000000) Approximate value is 0.666666466667 Computation made in : 0.05 seconds In [102]:
Caution: Functions with a static type are no longer callable from
Python
contexts.
93 / 96
Typically, one applies a development process described roughly as follows: write your
Python
code
NumPy
vectorization)
prole your program to identify the most time consuming parts re-write these parts with
Cython
94 / 96
Conclusions
1 2 3 4 5 6 7 8
Conclusion
Python
For Finance, it oers numerous really helpful libraries There are a number of good development tools available It allows high productivity levelsfor lone warriors as well as for teams It is easy-to-maintaincompact, readable code It is compact and nevertheless quite fast (when done right) It is low/no cost and future-proof It is fun to work withbe it in Finance or any other area
95 / 96
Contact
Dr. Yves J. Hilpisch Visixion GmbH Rathausstrasse 75-79 66333 Voelklingen Germany
www.visixion.com www.dxevo.com www.dexision.com
Conclusion
96 / 96