
1. What is Statistical Programming?

Computations which aid in statistical analysis to summarize and display data, fit a model to data, and display results.

Objective of this Course

Our aim is to provide a foundation for understanding how statistical software applications work.

What are the calculations done by the computer?

How could you do these kinds of calculations yourself?

Elements of Statistical Programming

Graphics, for one-, two- or higher-dimensional data.

Stochastic simulation: digital computers are naturally very good at exact, reproducible computations, but the real world is full of randomness. In stochastic simulation we program a computer to act as though it is producing random results, even though, if we knew enough, the results would be exactly predictable.
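As a small illustration in R (a sketch using the base functions set.seed() and runif(); the seed 2012 is an arbitrary choice), resetting the generator to the same starting state reproduces exactly the same "random" values:

```r
# Fix the starting state of R's pseudo-random number generator
set.seed(2012)          # 2012 is an arbitrary seed; any integer works
first <- runif(5)       # five values that look uniformly random on (0, 1)

# Reset to the same starting state and generate again
set.seed(2012)
second <- runif(5)

identical(first, second)   # TRUE: the "random" results were predictable
```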

Other kinds of numerical programming: optimization, approximation of mathematical functions, ...



Outline of This Course

Basic programming: telling a computer what to do.

Statistical graphics.

Controlling the flow of execution of a program. For example, doing repeated calculations as long as the input consists of positive integers, and stopping when an input value hits 0.
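A minimal sketch of that kind of loop, written with R's while statement. The input vector inputs and the choice of a running sum as the "repeated calculation" are hypothetical, and the sketch assumes the input always ends with a 0:

```r
# Hypothetical input values; the 0 marks the end of the input
inputs <- c(4, 7, 2, 0, 9)

total <- 0   # running result of the repeated calculation
i <- 1       # position of the current input value
while (inputs[i] > 0) {        # keep going while the current value is positive
  total <- total + inputs[i]   # the calculation repeated for each value
  i <- i + 1                   # move on to the next input value
}
total   # 13: the loop stopped at the 0, so the 9 was never used
```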

Basic logic: Boolean algebra, a formal way to manipulate logical statements.
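In R, the Boolean operations and, or and not are written &, | and !. A small sketch checking one of De Morgan's laws, not (p and q) = (not p) or (not q), over all four combinations of TRUE and FALSE (the vectors p and q are just illustrative names):

```r
# All four combinations of two logical values
p <- c(TRUE, TRUE, FALSE, FALSE)
q <- c(TRUE, FALSE, TRUE, FALSE)

p & q    # and: TRUE only when both are TRUE
p | q    # or:  TRUE when at least one is TRUE
!p       # not: reverses each value

# De Morgan's law holds in every case
lhs <- !(p & q)
rhs <- !p | !q
all(lhs == rhs)   # TRUE
```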



Planning the program: analyzing complex problems by breaking them down into simpler parts.

Simulating and applying random values with specified characteristics.

Computational linear algebra: solving systems of equations, working with matrices.

Numerical optimization: maximizing or minimizing a function, possibly subject to constraints.
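For a first taste of these two topics, here is a sketch using the base R functions solve() (for a system of linear equations) and optimize() (for minimizing a function of one variable over an interval). The particular system and function are made-up examples:

```r
# Linear algebra: solve the system
#   2x +  y =  5
#    x + 3y = 10
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
b <- c(5, 10)
sol <- solve(A, b)
sol   # x = 1 and y = 3

# Numerical optimization: minimize f(x) = (x - 2)^2 + 1 for x in [0, 5]
f <- function(x) (x - 2)^2 + 1
opt <- optimize(f, interval = c(0, 5))
opt$minimum     # approximately 2, where f is smallest
opt$objective   # approximately 1, the smallest value of f
```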



The R Package

This course uses R, which is an open-source package for statistical computing.

Open source has a number of different meanings; here the important one is that R is freely available, and its users are free to see how it is written and to improve it.

R is based on the computer language S, developed by John Chambers and others at Bell Laboratories in 1976. Robert Gentleman and Ross Ihaka developed an implementation, and named it R.

Gentleman and Ihaka made it open source in 1995, and hundreds of people around the world have contributed to its development.

The R Core Team is now responsible for the development and maintenance of R. At the time of writing, there was only one Canadian on this international team of 19: Duncan Murdoch of the University of Western Ontario (UWO).

R is available at http://cran.r-project.org (this site is referred to as CRAN).

Most users download and install a binary version. This is a version that has been translated (by compilers) into machine language for execution on a given operating system.

R is designed to be very portable: it will run on Microsoft Windows, Linux, Solaris, Mac OS X and other operating systems, but different binary versions are required for each.

Course Conventions

This course is about how to do computations in R.

Most of what we do would be the same on any system, but when we write system-specific instructions, we will assume you are using Microsoft Windows.

The user types input, and R responds with text or graphs as output. To indicate the difference, we have typeset the user input in blue, and text output in red.*

* Screendumps from the program will appear in red font.

Example of Font Conventions

> This was typed by the user
This is a response from R

In most cases other than this one and certain exercises, we will show the actual response from R.

Installation of R

Installation on Microsoft Windows is straightforward.

From CRAN (or a mirror site), download the setup program, a file with a name like R-2.14.0.win32.exe.

Clicking on this file will start an almost automatic installation of the R system. Though it is possible to customize the installation, the default responses will lead to a satisfactory installation in most situations, particularly for beginning users.

Starting R

One of the default settings of the installation procedure is to create an R icon on your computer's desktop. Double-clicking on the R icon starts the program.* The first thing that will happen is that R will open the console.

Something like the following will appear in the console.

* Other systems may install an icon to click, or may require you to type R at a command prompt.

R version 2.14.0 (2011-12-08)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i486-pc-linux-gnu (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>

The > sign tells you that R is ready for you to type in a command. For example, you can do addition: type the command below and hit the Enter key to see the result.

> 1234+4567
[1] 5801

The greater-than sign (>) is called the prompt symbol.

Simple Calculations with R

Some other arithmetic problems to try*:

> 29-45 # subtraction
[1] -16
> 325/25 # division
[1] 13

* Note that R ignores everything typed to the right of #.

More Simple Calculations

> 56*12 # multiplication
[1] 672
> 11*11
[1] 121
> 111*111
[1] 12321

> 1111111*1111111
[1] 1.234568e+12

The last example used scientific notation because, by default, R prints only 7 significant digits, which is not enough to show this product in full.

You can control the number of digits in the output by typing

> options(digits=14)

Now, you can try the example again:

> 1111111*1111111
[1] 1234567654321
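The digits setting only changes how numbers are printed, not how they are stored. A short sketch: the base functions print() and format() accept a digits argument that applies to a single call, and a comparison shows that the stored product was exact all along:

```r
x <- 1111111 * 1111111
x                        # printed using the current digits option
print(x, digits = 14)    # 14 significant digits for this call only
format(x, digits = 14)   # the same digits as a character string
x == 1234567654321       # TRUE: the stored value is exact
```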

What are the numbers in square brackets?

Each line of output begins with the position, in square brackets, of the first value on that line. In the following example, the second line starts with the 13th value and is so labelled.*

> options(width=40)
> 5:32
 [1]  5  6  7  8  9 10 11 12 13 14 15 16
[13] 17 18 19 20 21 22 23 24 25 26 27 28
[25] 29 30 31 32

* The position of the line break shown here depends on the optional setting options(width=40).
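The bracketed labels are positions (indices) in the vector, and the same positions can be used with square brackets to pull out individual values. A quick check against the output above, which assumes options(width=40):

```r
x <- 5:32
length(x)   # 28 values in all
x[1]        # 5, the value after [1]
x[13]       # 17, the value after [13]
x[25]       # 29, the value after [25]
```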

Simple Number Patterns

The colon operator : creates a sequence of consecutive integers. For example,

> 1:10
 [1]  1  2  3  4  5  6  7  8  9 10

Here is what happens when you add a number to the above sequence:

> (1:10)+3
 [1]  4  5  6  7  8  9 10 11 12 13

You can subtract or multiply a number too:

> (1:10)-3
 [1] -2 -1  0  1  2  3  4  5  6  7
> (1:10)*3
 [1]  3  6  9 12 15 18 21 24 27 30
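Arithmetic between a vector and a single number is applied to every element. The sketch below compares the one-step form with an explicit loop that does the same work one element at a time, and then shows the base function seq(), with its by argument, producing the same multiples of 3 directly:

```r
x <- 1:10

# Vectorized: every element is multiplied by 3 in one step
y_vec <- x * 3

# The same calculation written as a loop, one element at a time
y_loop <- numeric(length(x))
for (i in seq_along(x)) {
  y_loop[i] <- x[i] * 3
}

all(y_vec == y_loop)      # TRUE
seq(3, 30, by = 3)        # the same values as (1:10)*3
```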

More patterns

> (1:10)^2
 [1]   1   4   9  16  25  36  49  64  81
[10] 100
> (1:10)^3
 [1]    1    8   27   64  125  216  343
 [8]  512  729 1000

More Calculations with R

R can compute powers with the ^ operator. For example,

> 3^4
[1] 81

Modular arithmetic is also available. For example, we can compute the remainder after division of 31 by 7, i.e. 31 (mod 7):

> 31 %% 7
[1] 3

The integer part of a quotient (integer division) is calculated using %/%:

> 31 %/% 7
[1] 4
> 7 * 4 + 3 # check the calculation
[1] 31
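The check 7 * 4 + 3 is one case of a general identity: a equals b * (a %/% b) + (a %% b). A sketch that verifies it, including a negative case. Here %/% rounds down rather than towards zero, which means the remainder takes the sign of the divisor:

```r
a <- 31
b <- 7
quotient  <- a %/% b   # 4
remainder <- a %% b    # 3
a == b * quotient + remainder   # TRUE

# With a negative number, %/% rounds down (towards minus infinity)
-31 %/% 7   # -5, not -4
-31 %% 7    # 4, so that 7 * (-5) + 4 is -31
```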

Simple Plots

R can be used to create graphs.

You can plot ordered pairs such as (4, 6), (1, 2), (8, 4), (9, 3). To do this, first note that the x coordinate is the first coordinate of each pair, and the y coordinate is the second. The way to get R to plot the ordered pairs is to collect all of the x values together into something that we will call x:

> x <- c(4, 1, 8, 9)

Note the use of the two symbols <- to make an arrow. This indicates that what is to the right of the arrow is assigned to an object with the name given on the left side of the arrow.

You also need to collect the y values:

> y <- c(6, 2, 4, 3)
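Each ordered pair is made up of the values in the same position of x and y. One way to view the pairs together is a sketch using the base function cbind(), which binds vectors as the columns of a matrix (the name xy is just an illustrative choice):

```r
x <- c(4, 1, 8, 9)
y <- c(6, 2, 4, 3)

x[2]    # 1, the x coordinate of the second pair
y[2]    # 2, its y coordinate
xy <- cbind(x, y)   # one row per ordered pair, columns named x and y
xy
```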


Then you can plot the ordered pairs by plotting x and y:

> plot(x, y)

[Figure: scatterplot of the four points, with x on the horizontal axis and y on the vertical axis]

You can join the plotted points using the lines function:

> plot(x, y)
> lines(x, y)

[Figure: the same scatterplot, with the points joined by line segments]
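Note that lines() joins the points in the order they appear in x and y, which here is not left to right (x goes 4, 1, 8, 9). If you want the segments drawn from left to right, one option is to reorder both vectors with the base function order(). A sketch:

```r
x <- c(4, 1, 8, 9)
y <- c(6, 2, 4, 3)

o <- order(x)       # positions of x from smallest to largest: 2 1 3 4
plot(x, y)
lines(x[o], y[o])   # joins (1, 2), (4, 6), (8, 4), (9, 3) in that order
```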

Another Plotting Example

This time we can see a specific pattern, because we are plotting a function:

> x <- 1:10
> y <- x^2
> plot(x, y)

[Figure: scatterplot of y = x^2 against x for x = 1, ..., 10; y rises from 1 to 100]

> plot(x, y)
> lines(x, y)

[Figure: the same points joined by line segments]
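Base R graphics also offers two shortcuts for this kind of picture: type = "b" asks plot() to draw both points and lines in one call, and curve() draws a function as a smooth curve. A sketch:

```r
x <- 1:10
y <- x^2

# Points and connecting lines in a single call
plot(x, y, type = "b")

# Overlay the function itself as a dashed smooth curve over [1, 10];
# curve() generates its own x values, so the x vector above is not used
curve(x^2, from = 1, to = 10, add = TRUE, lty = 2)
```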

Pie Charts and Bar Charts

The amounts of time spent by a person watching 4 different types of TV shows were measured. 15% of the time was spent on sports, 10% on game shows, 30% on movies, and 45% on comedies. Set up a pie chart and a bar chart.

First, you need to set up an object that contains the information to be plotted. Here, an object called tv is assigned the required information.

> tv <- c("sports" = 15, "game shows" = 10,
+ "movies" = 30, "comedies" = 45)

The + at the start of the second line is R's continuation prompt. R displays it when a command is not yet complete, and the user does not type it.

The pie() function can then be used to create the pie chart:

> pie(tv)

[Figure: pie chart with slices labelled sports, game shows, movies and comedies]
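Since the four values add to 100, they can be read as percentages. A sketch that puts them into the slice labels through the labels argument of pie() (the label format is just one possible choice):

```r
tv <- c("sports" = 15, "game shows" = 10,
        "movies" = 30, "comedies" = 45)

sum(tv)   # 100

# Build labels such as "sports (15%)" from the names and the values
labs <- paste0(names(tv), " (", tv, "%)")
pie(tv, labels = labs)
```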

The Bar Chart

The barplot function is used to create the bar chart:

> barplot(tv)

[Figure: bar chart with bars labelled sports, game shows, movies and comedies; the vertical axis is marked at 10, 20, 30 and 40]
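barplot() also accepts a title and axis labels, and sorting the values first makes the categories easier to compare. A sketch (the title and label text are illustrative):

```r
tv <- c("sports" = 15, "game shows" = 10,
        "movies" = 30, "comedies" = 45)

# sort() keeps the names, so the bars stay labelled correctly
sorted_tv <- sort(tv, decreasing = TRUE)
barplot(sorted_tv,
        main = "Time spent on types of TV show",
        ylab = "Percentage of viewing time")
```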

Quitting R

To quit your R session, type

> q()

If you then hit the Enter key, you will be asked whether or not to save an image of the current workspace, or to cancel.

The workspace image contains the objects you have created during the session (your saved results).

Recording your work

It is often better to keep a record of the commands, so that the workspace can be reproduced if necessary.

An easy way to do this is to enter commands in R's script editor, available from the File menu.

Commands are executed by highlighting them and hitting Ctrl-R (R for run).

On non-Windows systems, a text editor and some form of cut and paste serve the same purpose.
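A saved script can also be rerun in one step with the base function source(). The sketch below writes a short script to a temporary file, which stands in for a script you would save from the editor, and then runs it. Setting echo = TRUE prints each command as it is executed:

```r
# Write a small script to a temporary file (stand-in for a saved script)
script <- tempfile(fileext = ".R")
writeLines(c("x <- c(4, 1, 8, 9)",
             "y <- c(6, 2, 4, 3)",
             "total <- sum(x * y)"), script)

# Rerun every command in the script, showing each one as it runs
source(script, echo = TRUE)
total   # 85
```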

Why Use a Command Line?

The R system is mainly command-driven, with the user typing in text and asking R to execute it.

Nowadays most programs use interactive graphical user interfaces (menus, etc.) instead. So why teach such an old-fashioned way of doing things?

Menu-based interfaces are very convenient when applied to a limited set of commands, from a few to one or two hundred. However, a command-line interface is open-ended.

As we will show in this course, if you want to program a computer to do something that no one has done before, you can easily do it by breaking down the task into the parts that make it up, and then building up a program to carry it out. This may be possible in some menu-driven interfaces, but it is much easier in a command-driven interface.

Keeping a reproducible record of all of your activities is also easier with a command-line interface, and finding and correcting errors is more straightforward.

Learning how to use one command-line interface will give you skills that carry over to others, and may even give you some insight into how a menu-driven interface is implemented.

Your goal should be understanding.

Learning to use a menu-based program makes you dependent on the particular organization of that program.

There is a fairly rich menu-driven interface to R available in the Rcmdr package*. After you have worked through this course, if you come upon a statistical task that you don't know how to start, you may find that the menus in Rcmdr give you an idea of what methods are available.

* A package is a collection of functions and programs that can be used within R.

More Information about R

In addition to your course textbook, you can find a lot of information on the web. One place to look is:

http://www.burns-stat.com/pages/Tutor/hints_R_begin.html

You might also want to explore the main R site at

http://www.r-project.org

Practice With R

1. Calculate 35 + 777, 13 * 47, 675/15 and 849 - 629.

2. What kind of pattern do you think you will see if you multiply 11111111 by itself? How about 111111111?*

3. The area of a rectangle can be calculated by multiplying the length by the width. You have 5 rectangles with lengths 3, 7, 12, 15 and 20. The widths are 2, 5, 8, 11 and 15. Find all of the areas of these rectangles.

4. Plot the rectangle lengths and widths as ordered pairs.

5. When you type

> (1:10)*2

you make R count by 2s. How would you make R count by 4s? How would you make R count by 6s? How about by 17s?

6. Joe has lots of homework. He spends 30 minutes on math, 20 minutes on English, 40 minutes on Biology and 10 minutes on French. Construct a bar chart and a pie chart which describe how much time Joe spends on these kinds of homework.

* There is an error in this case. What do you think caused it?

Solutions

1.
> 35 + 777
[1] 812
> 13*47
[1] 611
> 675/15
[1] 45
> 849 - 629
[1] 220

2.
> 11111111*11111111
[1] 123456787654321
> 111111111*111111111
[1] 12345678987654320

3.
> length <- c(3, 7, 12, 15, 20)
> width <- c(2, 5, 8, 11, 15)
> length*width
[1]   6  35  96 165 300

4.
> plot(width, length)

5.
> (1:10)*4
 [1]  4  8 12 16 20 24 28 32 36 40
> (1:10)*6
 [1]  6 12 18 24 30 36 42 48 54 60
> (1:10)*17
 [1]  17  34  51  68  85 102 119 136 153 170

6.
> homework <- c("Math"=30, "English"=20, "Biology"=40, "French"=10)
> barplot(homework)
> pie(homework)
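One way to explore the footnote to exercise 2 is the sketch below. It assumes R's usual double-precision numbers, which can store every whole number exactly only up to 2^53:

```r
2^53 == 9007199254740992            # TRUE
11111111^2 < 2^53                   # TRUE: this product is stored exactly
111111111^2 > 2^53                  # TRUE: this one is too large for that

(11111111^2 + 1) == 11111111^2      # FALSE: adding 1 still registers
(111111111^2 + 1) == 111111111^2    # TRUE: the 1 is lost to rounding

# Show every stored digit: the true product ends in 1, the stored value in 0
sprintf("%.0f", 111111111^2)        # "12345678987654320"
```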
