Overview
This tutorial is part of the hands-on introductory session at the Effective use of programming in
scientific research workshop, Newcastle University, 20th June 2011.
Tutorial Aim:
Introduce the user to numerical processing and plotting in Python using the Spyder interface
Learning Objectives:
Gain a basic understanding of the Python interpreter
Read numerical data from a file on disk using Numpy
Develop custom functions
Create a Python script containing a simple model to generate a simulated data set
Use Scipy to compare observed and simulated results using a statistical method
Use the Matplotlib module in Spyder to produce data plots
Contents
1 Introduction
2 Pre-requisites
3 The Spyder interface
4 Numerical Arrays
5 Plotting
6 Writing Scripts
7 Going further
1 Introduction
Python is a powerful and efficient programming language with a clear and readable syntax
that is easy to learn and fast to programme in. Python is platform independent, runs on all major
operating systems and is released under an open source license, meaning that it is freely usable
and distributable. Furthermore, Python is a full and complete object-oriented language
which can be used to develop anything from simple scripts to complete programmes with
graphical user interfaces. For more information on Python visit http://python.org/about/.
One of the key strengths of Python is the range of libraries (called modules) available
to the user, covering a wide variety of application domains. Two such libraries are Scientific
Python (SciPy: http://www.scipy.org/) and Numerical Python (Numpy: http://numpy.scipy.org/)
which provide tools for mathematical, scientific and engineering computing. These modules
allow scientists and engineers to quickly create fast and efficient software for numerical processing, making Python an ideal platform for many research scientists. For more examples of scientific computing in Python visit: http://www.python.org/about/success/#scientific. For examples
of scientific Python software visit: http://wiki.python.org/moin/NumericAndScientific.
This practical uses the Spyder (Scientific PYthon Development EnviRonment:
http://code.google.com/p/spyderlib) package, which provides a MATLAB-like environment for
the development of scientific software and numerical processing tools using Numpy, SciPy
and the graphical plotting package Matplotlib (http://www.matplotlib.sourceforge.net).
2 Pre-requisites
Users in the practical session will be able to access the software and data on the cluster PCs
and guide themselves through the tutorial. Staff will be on-hand to answer any queries.
Users outside the practical session will require the following:
Python 2.5 or later (http://python.org)
Spyder 2.0.11 or later (http://code.google.com/p/spyderlib/)
Microsoft Windows users can install the Python(xy) package (http://www.pythonxy.com/)
which provides all the necessary software in one installation.
Sample data files (http://conferences.ncl.ac.uk/sciprog)
4 + 5
The result will be shown on the next line.
4. This time we will store the result of our calculation in a variable. A variable is a named
space in the computer's memory that holds a piece of data.
a = 4 + 5
Nothing will be displayed, because the output of our calculation has been assigned to a. To see
the result, type:
print(a)
5. It's worth noting that, for convenience, in the Console we can avoid typing print each time
we wish to see the contents of a variable and just type the variable name instead. For example:
a
6. So far we've been entering data on the command line and letting Python infer the data
type; this can cause unexpected problems with numerical data. For example, now
do:
b = 8 / 3
b
In Python 2 the result is 2. This is because Python interprets both numbers as integers
(whole numbers) and uses integer division, which discards the fractional part of the result.
(In Python 3, / always performs true division, so the result is 2.6666666666666665; use //
for integer division.) Next, do:
d = 5.0 / 4.0
d
This time we get the expected answer, because the decimal point tells Python we're using
floating point numbers (numbers with a fractional part).
7. If you need to know an object's data type you can use the type function. For example,
to show the types of variables b and d do:
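A minimal sketch of checking types with type(). It assumes Python 3, where 8 / 3 returns a float, so floor division (//) is used here to reproduce the integer result described in step 6:

```python
b = 8 // 3     # floor (integer) division of two ints gives an int
d = 5.0 / 4.0  # division of two floats gives a float

print(type(b))  # the int type
print(type(d))  # the float type
```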
b = pi
b
9. Numpy includes a number of mathematical functions that we can use in the Command
line, such as cos, sin and tan. For example, to find the length of the third side (c) of a
triangle from the lengths of the other two sides (a and b) and the angle C between
them, we can use the law of cosines:
$c^2 = a^2 + b^2 - 2ab\cos(C)$
When C is a right angle, cos(C) = 0 and this reduces to Pythagoras' theorem for the hypotenuse.
Numpy's trigonometric functions expect angles in radians, so in Python we first convert C
from degrees:
C = radians(C)
Note that we have overwritten the value of C in degrees with its equivalent value in
radians.
10. Now try implementing the complete process:
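One way to implement the formula above (the law of cosines) end to end is sketched below. The side lengths and angle are example values chosen for illustration (a 3-4-5 right-angled triangle), and the functions are imported explicitly so the code also runs outside the Spyder console:

```python
from numpy import radians, cos, sqrt

a = 3.0   # length of side a (example value)
b = 4.0   # length of side b (example value)
C = 90.0  # angle between a and b, in degrees (example value)

C = radians(C)  # overwrite the angle in degrees with its value in radians
c = sqrt(a**2 + b**2 - 2 * a * b * cos(C))
print(c)  # very close to 5.0 (cos of pi/2 is not exactly zero in floating point)
```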
4 Numerical Arrays
4.1 Introduction
1. When doing numerical processing we often need to work with arrays or matrices of
numbers. We can use the Numpy array type to store numbers in an array. For example:
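A minimal sketch of creating a one-dimensional array; the values are hypothetical, and array is imported explicitly because a standalone script does not have the console's preloaded names:

```python
from numpy import array

# Hypothetical values: any list of numbers can be turned into an array
myarray = array([1.0, 2.0, 3.0, 4.0, 5.0])
print(myarray)
```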
2. We can view all the values of our array by calling the array name:
myarray
Note that, through IPython, the Spyder Console supports auto-completion of existing variable names and objects. After typing the first few letters of myarray, press the Tab key
to auto-complete the variable name.
3. Alternatively, if we need to view just the first element of the array we can do:
myarray[0]
Like many programming languages, Python counts from 0, so the first element has index 0,
the second has index 1 and so on.
4. Numpy arrays provide methods for performing numerical operations over
entire arrays. For example, we can sum all the values in the array:
Version 1.0.1 (06-06-2011)
myarray.sum()
5. Or we can perform arithmetic on each element in the array (an elementwise operation)
and store the result in a new array:
secondarray = myarray / pi
secondarray
6. Numpy can also create arrays with more than one dimension. In this example we'll create
an array with two rows and three columns. The numbers in the first set of inner square brackets
form the first row and the numbers in the second set form the second row:
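A minimal sketch of such a two-row, three-column array; the numbers are hypothetical placeholders:

```python
from numpy import array

# Hypothetical values: each inner list becomes one row
nextarray = array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
print(nextarray.shape)  # (2, 3)
```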
nextarray[0, 1]
8. To view all the numbers in one row or column we use the special : symbol. To see all the
numbers in the second column we can do:
nextarray[:, 1]
9. Using this method we can select all the numbers from a row and assign them to
new variables:
x = nextarray[0, :]
y = nextarray[1, :]
We will use these new variables to produce a plot in the following Plotting section.
temperatures
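Reading numerical data from a file can be done with Numpy's loadtxt function. A minimal sketch, assuming the CSV file holds two comma-separated columns (altitude in metres, temperature in Kelvin) and no header row; the inline sample values are made up so the example runs without the file. With the real file the call would be along the lines of np.loadtxt(r"C:\temp\temperature_profile.csv", delimiter=","):

```python
import io

import numpy as np

# Made-up stand-in for the file contents (assumed columns: altitude m, temperature K)
csv_text = "0,288.2\n100,287.4\n200,286.9\n300,286.1\n"

# loadtxt accepts either a filename or a file-like object
temperatures = np.loadtxt(io.StringIO(csv_text), delimiter=",")
print(temperatures)
```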
3. We can manipulate our array using the same techniques introduced above. Create a new
array with the temperatures (second column) converted from Kelvin to Celsius:
temperaturesC = temperatures[:, 1] - 273.15
temperaturesC
5. We can suppress printing of numbers in scientific format and view the array again by
doing:
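A minimal sketch using Numpy's set_printoptions with suppress=True, which prints small floating point values in fixed-point rather than scientific notation. The array values are hypothetical, and note that the setting applies to all subsequent array printing in the session:

```python
import numpy as np

# Hypothetical values mixing a very small number with ordinary temperatures
temperaturesC = np.array([0.00001, 15.05, 14.25])

np.set_printoptions(suppress=True)  # avoid scientific notation when printing arrays
print(temperaturesC)
```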
6. Lastly, we can create a new CSV file from our processing using the Numpy savetxt function:
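A minimal sketch of writing results with savetxt. The output filename, the hypothetical values, and the choice to save altitude alongside the Celsius temperatures (via column_stack) are illustrative assumptions; fmt controls the number format written to the file:

```python
import numpy as np

# Hypothetical processed data: altitude (m) and temperature (degrees C)
altitudes = np.array([0.0, 100.0, 200.0, 300.0])
temperaturesC = np.array([15.05, 14.25, 13.75, 12.95])

# Write two comma-separated columns, two decimal places each
np.savetxt("temperaturesC.csv", np.column_stack((altitudes, temperaturesC)),
           delimiter=",", fmt="%.2f")
```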
5 Plotting
5.1 Introduction
1. The Spyder console automatically loads the Pylab environment into the Console window,
which means that graphs can easily be generated from the command line using the Matplotlib
plot command:
plot(x, y)
This opens a Matplotlib plot window with our values plotted as a line. Using this window
we can visualise our plot (note that the plot window, titled Figure 1, may be minimised to the
taskbar). Figure 4 below shows an example of a plot window.
2. Close the plot window and plot again with the following command:
plot(temperatures[:, 1], temperatures[:, 0], "x", color="green")
Figure 5 shows an example of the plotted temperature data. Many more example plots
can be found at http://matplotlib.sourceforge.net/.
6 Writing Scripts
6.1 Introduction
Using the Console is good for short sequences of commands; however, for bigger projects it is
useful to create a file containing a sequential list of commands, called a script, which can then
be saved to disk and run again. Python files are denoted with a .py extension. Spyder includes an Editor
window (Figure 1) for editing Python scripts.
1. In the Editor window type the following commands and then click Run from the Run menu
(Figure 6):
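A hypothetical three-line first script reusing the Console examples from earlier (the exact commands in the original exercise are not reproduced here); print is called with parentheses so it runs under both Python 2 and Python 3:

```python
a = 4 + 5        # integer arithmetic
b = 5.0 / 4.0    # floating point division
print(a * b)     # prints 11.25
```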
2. Now we're going to write a script to read and plot our temperature data. As we're writing a
script and not using the Spyder console, we first need to tell Python to import the modules
we require (e.g. Numpy and Matplotlib). When importing modules we can give them a
short name to make scripting easier.
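A minimal sketch of such a script, assuming the conventional short names np and plt, and assuming the CSV file holds comma-separated altitude (m) and temperature (K) columns with no header row. The path is the one this tutorial uses for the data; if the file is not found, a small made-up placeholder array is used so the sketch still runs. The figure is saved to a hypothetical PNG file through the non-interactive Agg backend; removing the matplotlib.use line and calling plt.show() instead opens a plot window:

```python
import os

import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

DATA_FILE = r"C:\temp\temperature_profile.csv"

if os.path.exists(DATA_FILE):
    # assumes comma-separated values with no header row
    temperatures = np.loadtxt(DATA_FILE, delimiter=",")
else:
    # made-up placeholder: altitude (m), temperature (K)
    temperatures = np.array([[0.0, 288.2], [100.0, 287.4],
                             [200.0, 286.9], [300.0, 286.1]])

# Temperature on the x axis, altitude on the y axis, green crosses
plt.plot(temperatures[:, 1], temperatures[:, 0], "x", color="green")
plt.xlabel("Temperature (K)")
plt.ylabel("Altitude (m)")
plt.savefig("temperature_profile.png")
```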
6.2 Functions
The above scripts give an example of how to combine a sequence of commands to perform
a series of operations; however, we often need to create our own functions to perform a specific, repeatable process. The temperature data in this practical is based on an observed
air temperature profile taken at 100 meter intervals through the atmosphere. As we would expect,
higher altitudes mean lower temperatures. We can generate an idealised model of air temperature decrease with altitude based on the average environmental lapse rate (ELR) for air
temperature provided by the International Civil Aviation Organization (ICAO) of 6.49 K/1000 m
between 0 and 11000 m. That is, for every 1000 meters gained in altitude the air temperature
drops by 6.49 K, up to 11 kilometers.
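Written as a formula, with $T_0$ the sea-level temperature and $h$ the altitude in metres, the ELR model is:

$$T(h) = T_0 - \frac{6.49\,\mathrm{K}}{1000\,\mathrm{m}}\,h, \qquad 0 \le h \le 11000\,\mathrm{m}$$

As a worked example, taking $T_0 = 288.15$ K (an example sea-level value) gives $T(1000) = 288.15 - 6.49 = 281.66$ K and $T(11000) = 288.15 - 71.39 = 216.76$ K.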
1. In this section we're going to create a function to calculate air temperature from the ELR,
based on a given temperature at sea level and an altitude. Functions are defined using
the def keyword. All code inside the function must be indented by one level (one tab, or
four spaces as PEP 8 recommends) relative to the def line. For example:
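A minimal sketch of such a function. The name elr_temperature and the range check are illustrative choices, not necessarily what the original elr_model.py contains; the check simply enforces the 0 to 11000 m validity range stated above:

```python
ELR = 6.49 / 1000.0  # ICAO average environmental lapse rate, K per metre


def elr_temperature(sea_level_temp, altitude):
    """Return modelled air temperature (K) at `altitude` metres above sea level."""
    if altitude < 0 or altitude > 11000:
        raise ValueError("ELR model only valid between 0 and 11000 m")
    return sea_level_temp - ELR * altitude


print(elr_temperature(288.15, 1000))  # approximately 281.66
```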
6.3 Loops
Sometimes it is necessary to perform the same operation a number of times. Programming languages use loops for this kind of iteration: a loop tells a script to perform the same operation
a set number of times. In this practical we concentrate on one type of loop, the for loop, which
allows us to say "for this many times, do some operation". Here we're going to use
a for loop along with our ELR model to generate a modelled atmospheric temperature profile
matching our observed data in C:\temp\temperature_profile.csv, and then plot the observed
and modelled data to see the differences between them.
1. Open elr_model.py and save a new copy called elr_model_generator.py. We're going
to add a for loop to use our model to generate an atmospheric profile from sea level to
11000 m in 100 m intervals using the range function. The range function returns a
progression of numbers (a list in Python 2), which in this case will be the altitudes from 0 to
11000 in steps of 100. Note that range excludes its stop value, so range(0, 11001, 100) is
needed to include 11000 m. Code inside the loop must be indented by one level relative to
the for line. The loop should look like this:
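A minimal sketch of the loop. The function name elr_temperature and the sea-level temperature of 288.15 K are illustrative assumptions; the function is repeated here so the snippet runs on its own:

```python
ELR = 6.49 / 1000.0  # K per metre


def elr_temperature(sea_level_temp, altitude):
    """Modelled air temperature (K) at `altitude` metres."""
    return sea_level_temp - ELR * altitude


sea_level_temp = 288.15  # example sea-level temperature in K
modeltemps = []
for altitude in range(0, 11001, 100):  # 0, 100, ..., 11000 (stop value excluded)
    modeltemps.append(elr_temperature(sea_level_temp, altitude))

print(len(modeltemps))  # 111
```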
2. Here's an example of the complete script with the for loop:
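A sketch of a complete script combining the pieces of this section, under stated assumptions: the function name elr_temperature is hypothetical; the data file is assumed to be comma-separated altitude (m) and temperature (K) with no header, its first row at sea level; and the first observed temperature is used as the model's sea-level value. If the file is missing, a made-up placeholder profile is generated so the sketch runs. The figure goes to a hypothetical PNG file via the Agg backend:

```python
import os

import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove and call plt.show() for a window
import matplotlib.pyplot as plt

DATA_FILE = r"C:\temp\temperature_profile.csv"
ELR = 6.49 / 1000.0  # ICAO average environmental lapse rate, K per metre


def elr_temperature(sea_level_temp, altitude):
    """Modelled air temperature (K) at `altitude` metres (valid 0-11000 m)."""
    return sea_level_temp - ELR * altitude


# Observed profile: column 0 = altitude (m), column 1 = temperature (K)
if os.path.exists(DATA_FILE):
    temperatures = np.loadtxt(DATA_FILE, delimiter=",")  # assumes no header row
else:
    # Made-up placeholder profile, not real observations
    alts = np.arange(0.0, 11001.0, 100.0)
    temperatures = np.column_stack(
        (alts, 288.15 - 0.006 * alts + 1.5 * np.sin(alts / 1000.0)))

# Use the first observed temperature (lowest altitude) as the sea-level value
sea_level_temp = temperatures[0, 1]

# Generate the modelled profile from sea level to 11000 m in 100 m steps
altitudes = []
modeltemps = []
for altitude in range(0, 11001, 100):
    altitudes.append(altitude)
    modeltemps.append(elr_temperature(sea_level_temp, altitude))
modeltemps = np.array(modeltemps)

# Observed data as green crosses, modelled profile as a red line
plt.plot(temperatures[:, 1], temperatures[:, 0], "x", color="green", label="Observed")
plt.plot(modeltemps, altitudes, "-", color="red", label="ELR model")
plt.xlabel("Temperature (K)")
plt.ylabel("Altitude (m)")
plt.legend()
plt.savefig("observed_vs_model.png")
```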
7 Going Further
In the previous example we've seen how to develop a script to plot the difference between
observed and modelled data. Scipy and Numpy also contain functions for statistical analysis.
One such example is the Numpy corrcoef function, which returns a matrix of (Pearson) correlation
coefficients for two given data sets. To view the correlation coefficients between the observed and modelled data, add the following lines to your script before the exit(0) statement and run it again.
...
corrcoefmatrix = np.corrcoef(temperatures[:, 1], modeltemps)
print(corrcoefmatrix)
...
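For two data sets, corrcoef returns a 2x2 matrix whose diagonal entries are 1 and whose off-diagonal entry is the Pearson correlation coefficient r between them. The sketch below uses made-up observations and checks that entry against r computed directly from its definition; the short arrays are illustrative only:

```python
import numpy as np

altitudes = np.arange(0.0, 401.0, 100.0)                   # 0, 100, 200, 300, 400 m
observed = np.array([288.2, 287.4, 286.9, 286.1, 285.6])   # made-up observations (K)
modeltemps = 288.15 - 6.49 / 1000.0 * altitudes            # ELR model values (K)

corrcoefmatrix = np.corrcoef(observed, modeltemps)
r = corrcoefmatrix[0, 1]  # off-diagonal entry: Pearson r between the two series

# The same coefficient from its definition, for comparison
obs_dev = observed - observed.mean()
mod_dev = modeltemps - modeltemps.mean()
r_manual = (obs_dev * mod_dev).sum() / np.sqrt((obs_dev**2).sum() * (mod_dev**2).sum())

print(corrcoefmatrix)
print(r, r_manual)
```

A value of r close to 1 indicates that the observed and modelled temperatures vary together almost linearly; it does not measure how far apart the two profiles are in absolute terms.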