
Descriptive Statistics

- Tables and graphs (quantitative or categorical variables)
- Numerical descriptions of center, variability, position (quantitative variables)

1. Tables and Graphs

Frequency distribution: lists the possible values of a variable and the number of times each occurs

Example: student survey data (www.stat.ufl.edu/~aa/social/data.html), with political ideology measured as an ordinal variable with 1 = very liberal, ..., 4 = moderate, ..., 7 = very conservative

Histogram: Bar graph of frequencies or percentages

Shapes of histograms (for quantitative variables):

- Bell-shaped (e.g., SAT scores in the U.S.)
- Skewed right (annual income, no. of times arrested)
- Skewed left (score on an easy exam)
- Bimodal (polarized opinions)

Exercise 3.73: ordered response categories "always wrong, almost always wrong, wrong only sometimes, not wrong at all"

Stem-and-leaf plot (John Tukey, 1977)

Stem | Leaf
  3  | 6
  4  |
  5  | 37
  6  | 235899
  7  | 011346778999
  8  | 00111233568889
  9  | 02238

2. Numerical descriptions

Let y denote a quantitative variable, with sample observations y1, y2, y3, ..., yn.

Mean: ȳ = (y1 + y2 + ... + yn) / n = (Σ yi) / n

Example: Annual per capita carbon dioxide emissions (metric tons) for the n = 8 largest nations in population size, including Indonesia 1.4, Pakistan 0.7, Russia 9.9, U.S. 20.1

Ordered sample: with n = 8, the median is the average of the 4th and 5th ordered observations.

Mean = (0.3 + 0.7 + 1.2 + ... + 20.1)/8 = 4.7
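These two summaries are easy to check with Python's statistics module. The list below uses made-up illustrative values (the real eight-nation data are in the course data file), so the results differ from the 4.7 above:

```python
import statistics

# Hypothetical per capita CO2 emissions (metric tons); illustrative
# values only -- the real 8-nation dataset is in the course data file.
emissions = [0.3, 0.7, 1.2, 1.4, 3.9, 4.6, 9.9, 20.1]

avg = statistics.mean(emissions)    # sum of observations divided by n
med = statistics.median(emissions)  # n is even: average of 4th and 5th ordered values
print(avg, med)
```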

Properties of mean and median

- For symmetric distributions, mean = median
- For skewed distributions, the mean is drawn in the direction of the longer tail, relative to the median
- The mean is valid for interval scales, the median for interval or ordinal scales
- The mean is sensitive to outliers (the median is often preferred for highly skewed distributions)
- When the distribution is symmetric, mildly skewed, or discrete with few values, the mean is preferred because it uses the numerical values of the observations

Example: mean salary = $7.0 million, median salary = $2.9 million. With the mean far above the median, we expect a distribution skewed to the right.

b. Describing variability

Range: difference between the largest and smallest observations (but highly sensitive to outliers, insensitive to shape)

Deviation of an observation yi from the mean: yi − ȳ

The variance of the n observations is

s² = Σ(yi − ȳ)² / (n − 1) = [(y1 − ȳ)² + ... + (yn − ȳ)²] / (n − 1)

The standard deviation s is the square root of the variance: s = √s²

Example: Political ideology

For those in the student sample who attend religious services at least once a week (n = 9 of the 60):

y = 2, 3, 7, 5, 6, 7, 5, 6, 4

ȳ = 5.0

s² = [(2 − 5)² + (3 − 5)² + ... + (4 − 5)²] / (9 − 1) = 24/8 = 3.0

s = √3.0 ≈ 1.7

The rest of the sample has standard deviation = 1.6: that group tends to have similar variability but be more liberal.
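The same computation in Python (statistics.variance divides by n − 1, matching the formula above):

```python
import statistics

y = [2, 3, 7, 5, 6, 7, 5, 6, 4]  # political ideology scores, n = 9
ybar = statistics.mean(y)        # 5.0
s2 = statistics.variance(y)      # sum of squared deviations / (n - 1) = 24/8
s = statistics.stdev(y)          # square root of the variance
print(ybar, s2, round(s, 1))
```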

Properties of the standard deviation:

- s ≥ 0, and s = 0 only if all observations are equal
- s increases with the amount of variation around the mean
- Division by n − 1 (not n) is due to technical reasons (discussed later)
- s depends on the units of the data (e.g., measured in euros vs. dollars)
- Like the mean, s is affected by outliers

Empirical Rule: if the histogram is approximately bell-shaped,

- about 68% of the data fall within 1 standard deviation of the mean
- about 95% of the data fall within 2 standard deviations of the mean
- all or nearly all data fall within 3 standard deviations of the mean
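The 68/95/99.7 percentages are benchmarks from the normal distribution; a quick sketch with Python's statistics.NormalDist confirms them:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution (mean 0, sd 1)

def within(k):
    """Probability a normal variable falls within k standard deviations of its mean."""
    return z.cdf(k) - z.cdf(-k)

print(round(within(1), 3))  # 0.683
print(round(within(2), 3))  # 0.954
print(round(within(3), 3))  # 0.997
```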

Example: SAT with mean = 500, s = 100 (sketch a picture summarizing the data)

Example: the GSS variable frinum (number of close friends) has mean 7.4, s = 11.0, mode = 4, suggesting a highly skewed distribution.

Example: house selling prices (NY): if the mean = $130,000, which value of s is realistic?

c. Measures of position

pth percentile: p percent of the observations fall below it, (100 − p)% above it.

- p = 50: median
- p = 25: lower quartile (LQ)
- p = 75: upper quartile (UQ)

Quartiles are portrayed graphically by box plots (John Tukey)

Example: weekly TV watching for n = 60 from the student survey data file; 3 outliers.

Box plots have a box from LQ to UQ, with the median marked. They portray a five-number summary of the data: Minimum, LQ, Median, UQ, Maximum. Outliers are identified separately as points below LQ − 1.5(IQR) or above UQ + 1.5(IQR).
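A sketch of the five-number summary and the 1.5 × IQR outlier rule, using hypothetical TV-watching values (not the actual n = 60 survey data):

```python
import statistics

# Hypothetical weekly TV-watching hours, already ordered (n = 15).
hours = [0, 1, 2, 2, 3, 3, 4, 5, 5, 6, 7, 10, 14, 20, 25]

q1, med, q3 = statistics.quantiles(hours, n=4)  # quartiles (default 'exclusive' method)
iqr = q3 - q1
lo_fence = q1 - 1.5 * iqr
hi_fence = q3 + 1.5 * iqr
outliers = [h for h in hours if h < lo_fence or h > hi_fence]

# Five-number summary plus the points flagged as outliers
print(min(hours), q1, med, q3, max(hours), outliers)
```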

3. Bivariate description

Usually we want to study associations between two or more variables (e.g., how does the number of close friends depend on gender, income, education, age, working status, rural/urban residence, religiosity?)

Response variable: the outcome variable

Explanatory variable(s): defines the groups to compare

E.g., number of close friends is the response variable, while gender, income, ... are explanatory variables

Summarizing associations:

- Categorical variables: show data using contingency tables
- Quantitative variables: show data using scatterplots
- A mix of a categorical and a quantitative variable (e.g., number of close friends and gender) can be summarized numerically (mean, standard deviation) or with side-by-side box plots for the groups

Men: mean = 7.0, s = 8.4

Example: Income by highest degree

Contingency Tables

Cross-classifications in which the rows (typically) represent categories of the explanatory variable and the columns represent categories of the response variable. The cells contain the numbers of individuals at the corresponding combination of levels of the two variables.

Happiness and Family Income (GSS 2008 data: variables happy, finrela)

                           Happiness
Income         Very        Pretty      Not too      Total
---------------------------------------------------------
Above Aver.    164 (39%)   233 (55%)    26 (6%)      423
Average        293 (33%)   473 (54%)   117 (13%)     883
Below Aver.    132 (19%)   383 (56%)   172 (25%)     687
---------------------------------------------------------
Total          589        1089         315          1993

We can summarize by percentages on the response variable (happiness). E.g., the percentage "very happy" is 39% for above-average income (164/423 = 0.39), 33% for average income (293/883 = 0.33), and 19% for below-average income (132/687 = 0.19).
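The row percentages above can be reproduced directly from the counts; a minimal Python sketch:

```python
# Happiness-by-income counts from the contingency table above.
table = {
    "Above":   [164, 233, 26],
    "Average": [293, 473, 117],
    "Below":   [132, 383, 172],
}
labels = ["Very", "Pretty", "Not too"]

percents = {}
for income, counts in table.items():
    total = sum(counts)                                    # row total
    percents[income] = [round(100 * c / total) for c in counts]
    print(income, total, dict(zip(labels, percents[income])))
```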

Scatterplots (for quantitative variables)

Plot the response variable on the vertical axis and the explanatory variable on the horizontal axis.

Example: data for several nations on many variables, including fertility (births per woman), contraceptive use, literacy, female economic activity, per capita gross domestic product (GDP), cell-phone use, and CO2 emissions.

Data available at http://www.stat.ufl.edu/~aa/social/data.html

Example: Survey in Alachua County, Florida, on predictors of mental health (data for n = 40 on p. 327 of text and at www.stat.ufl.edu/~aa/social/data.html)

y = mental impairment (incorporates various dimensions of psychiatric symptoms, including aspects of depression and anxiety); min = 17, max = 41, mean = 27, s = 5

x = life events score (ranges from severe personal disruptions, such as a death in the family or an extramarital affair, to less severe events, such as a new job, the birth of a child, or moving); min = 3, max = 97, mean = 44, s = 23

Example: Bivariate data from the 2000 Presidential election; butterfly ballot, Palm Beach County, FL (text p. 290)

Example: The Massachusetts Lottery (data for 37 communities); the response variable is the % of income spent on the lottery.

Correlation describes the strength of association

- Falls between −1 and +1, with the sign indicating the direction of the association (formula later, in Chapter 9)
- The larger the correlation in absolute value, the stronger the association (in terms of a straight-line trend)

Examples:

- Mental impairment and life events: correlation = 0.37
- GDP and fertility: correlation = −0.56
- GDP and percent using the Internet: correlation = 0.89

Regression analysis gives a line predicting y using x

Example: y = mental impairment, x = life events

Predicted y = 23.3 + 0.09x

e.g., at x = 0, predicted y = 23.3
at x = 100, predicted y = 23.3 + 0.09(100) = 32.3
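The prediction equation is simple enough to code directly; a sketch using the intercept and slope above:

```python
# Prediction equation from the mental-impairment example:
# predicted y = 23.3 + 0.09 x
def predict_impairment(life_events):
    """Predicted mental impairment for a given life-events score."""
    return 23.3 + 0.09 * life_events

print(predict_impairment(0))    # 23.3
print(predict_impairment(100))  # 23.3 + 9.0 = 32.3 (up to float rounding)
```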

Example: student survey; y = college GPA, x = high school GPA (data at www.stat.ufl.edu/~aa/social/data.html)

Later in the course we study methods for finding the correlation and the best-fitting regression equation (with possibly several explanatory variables), but for now, try using software such as SPSS to find the answers.

Sample statistics / Population parameters

We distinguish between summaries of samples (statistics) and summaries of populations (parameters).

Statistics are denoted by Roman letters, parameters by Greek letters; e.g., the population mean and the population proportion are parameters.

Parameter values are usually unknown, and we make inferences about their values using sample statistics:

- The sample mean ȳ estimates the population mean µ (quantitative variable)
- The sample standard deviation s estimates the population standard deviation σ (quantitative variable)
- The sample proportion estimates a population proportion π (categorical variable)

Chapter 1: Statistics

Chapter Goals

Create an initial image of the field of

statistics.

Learn how to obtain sample data.

Example: A recent study examined the math and verbal SAT scores of high school seniors across the country. Which of the following statements are descriptive in nature, and which are inferential?

The mean math SAT score was 492.

The mean verbal SAT score was 475.

Students in the Northeast scored higher in

math but lower in verbal.

80% of all students taking the exam were

headed for college.

32% of the students scored above 610 on the

verbal SAT.

The math SAT scores are higher than they were

10 years ago.

1.2 Introduction to Basic

Terms

Population: A collection, or set, of

individuals or objects or events whose

properties are to be analyzed.

Two kinds of populations: finite or

infinite.

Variable: A characteristic about each individual

element of a population or sample.

Data (singular): The value of the variable

associated with one element of a population or

sample. This value may be a number, a word, or

a symbol.

Data (plural): The set of values collected for the

variable from each of the elements belonging to

the sample.

Experiment: A planned activity whose results

yield a set of data.

Parameter: A numerical value summarizing all

the data of an entire population.

Statistic: A numerical value summarizing the

sample data.

Example: A college dean is interested in learning about the average age of faculty. Identify the basic terms in this situation.

The population is the age of all faculty members at the college.

A sample is any subset of that population. For example, we might select 10 faculty members and determine their age.

The variable is the age of each faculty member.

One data value would be the age of a specific faculty member.

The data would be the set of values in the sample.

The experiment would be the method used to select the

ages forming the sample and determining the actual age

of each faculty member in the sample.

The parameter of interest is the average age of all

faculty at the college.

The statistic is the average age for all faculty in the

sample.

Two kinds of variables:

Qualitative, or Attribute, or

Categorical, Variable: A variable that

categorizes or describes an element of a

population.

Note: Arithmetic operations, such as

addition and averaging, are not

meaningful for data resulting from a

qualitative variable.

Quantitative, or Numerical, Variable:

A variable that quantifies an element of a

population.

Note: Arithmetic operations, such as addition and averaging, are meaningful for data resulting from a quantitative variable.

Example: Identify each of the following examples as attribute (qualitative) or numerical (quantitative) variables.

1. ... class. (Attribute)

2. The amount of gasoline pumped by the next 10

customers at the local Unimart. (Numerical)

3. The amount of radon in the basement of each of 25

homes in a new development. (Numerical)

4. The color of the baseball cap worn by each of 20

students. (Attribute)

5. The length of time to complete a mathematics

homework assignment. (Numerical)

6. The state in which each truck is registered when

stopped and inspected at a weigh station. (Attribute)

Qualitative and quantitative variables may be further subdivided:

Variable
- Qualitative: Nominal, Ordinal
- Quantitative: Discrete, Continuous

Nominal Variable: A qualitative variable that categorizes (or describes, or names) an element of a population.

Ordinal Variable: A qualitative variable that incorporates an ordered position, or ranking.

Discrete Variable: A quantitative variable that can assume a countable number of values. Intuitively, a discrete variable can assume values corresponding to isolated points along a line interval. That is, there is a gap between any two values.

Continuous Variable: A quantitative variable that can assume an uncountable number of values. Intuitively, a continuous variable can assume any value along a line interval, including every possible value between any two values.

Note:

1. In many cases, a discrete and continuous

variable may be distinguished by

determining whether the variables are

related to a count or a measurement.

2. Discrete variables are usually associated

with counting. If the variable cannot be

further subdivided, it is a clue that you are

probably dealing with a discrete variable.

3. Continuous variables are usually associated with measurements. The values of continuous variables are only limited by your ability to measure them.

Example: Identify each of the following as

examples of qualitative or numerical variables:

1. The temperature in Barrow, Alaska at 12:00

pm on any

given day.

2. The make of automobile driven by each

faculty member.

3. Whether or not a 6 volt lantern battery is

defective.

4. The weight of a lead pencil.

5. The length of time billed for a long distance

telephone call.

6. The brand of cereal children eat for breakfast.

7. The type of book taken out of the library by

an adult.

Example: Identify each of the following as

examples of (1) nominal, (2) ordinal, (3) discrete,

or (4) continuous variables:

1. The length of time until a pain reliever begins

to work.

2. The number of chocolate chips in a cookie.

3. The number of colors used in a statistics

textbook.

4. The brand of refrigerator in a home.

5. The overall satisfaction rating of a new car.

6. The number of files on a computer's hard disk.

7. The pH level of the water in a swimming pool.

8. The number of staples in a stapler.

1.3: Measure and Variability

No matter what the response

variable: there will always be

variability in the data.

One of the primary objectives of

statistics: measuring and

characterizing variability.

Controlling (or reducing) variability in

a manufacturing process: statistical

process control.

Example: A supplier fills cans of soda marked 12 ounces. How much soda does each can really contain?

It is very unlikely that any given can contains exactly 12 ounces of soda.

There is variability in any process.

Some cans contain a little more than 12

ounces, and some cans contain a little less.

On the average, there are 12 ounces in each

can.

The supplier hopes there is little variability in

the process, that most cans contain close to 12

ounces of soda.

1.4: Data Collection

First problem a statistician faces:

how to obtain the data.

It is important to obtain good, or

representative, data.

Inferences are made based on

statistics obtained from the data.

Inferences can only be as good as

the data.

Biased Sampling Method: A sampling method

that produces data which systematically differs

from the sampled population. An unbiased

sampling method is one that is not biased.

Two commonly used sampling methods that often produce biased samples:

1. Convenience sample: a sample selected from elements of a population that are easily accessible.

2. Volunteer sample: a sample collected from those elements of the population which chose to contribute the needed information on their own initiative.

Process of data collection:

1. Define the objectives of the survey or experiment.
   Example: Estimate the average life of an electronic component.

2. Define the variable and population of interest.
   Example: Length of time for anesthesia to wear off after surgery.

3. Defining the data-collection and data-

measuring schemes. This includes sampling

procedures, sample size, and the data-

measuring device (questionnaire, scale, ruler,

etc.).

4. Determine the appropriate descriptive or

inferential data-analysis techniques.

Methods used to collect data:

Experiment: The investigator controls or modifies the environment and observes the effect on the variable under study.

Survey: Data are obtained by sampling some of the population of interest. The investigator does not modify the environment.

Census: A 100% survey; every element of the population is listed. Seldom used: difficult and time-consuming to compile, and expensive.

Sampling Frame: A list of the elements

belonging to the population from which the

sample will be drawn.

The sampling frame should be representative of the population.

Sample Design: The process used to select sample elements from the sampling frame.

There are many different types of sample designs. Usually they all fit into two categories: judgment samples and probability samples.

Judgment Samples: Samples that are selected on the basis of being "typical" of the population. The validity of the results from a judgment sample reflects the soundness of the collector's judgment.

Probability Samples: Samples in which the elements to be selected are drawn on the basis of probability. Each element in a population has a certain probability of being selected as part of the sample.

Random Samples: A sample selected in such a way that every element in the population has an equal probability of being chosen. Equivalently, all samples of size n have an equal chance of being selected. Random samples are obtained either by sampling with replacement from a finite population or by sampling without replacement from an infinite population.

Note:

1. Inherent in the concept of randomness: the next result (or

occurrence) is not predictable.

2. Proper procedure for selecting a random sample: use a random

number generator or a table of random numbers.

Example: An employer is interested in the time it

takes each employee to commute to work each

morning. A random sample of 35 employees will

be selected and their commuting time will be

recorded.

Each employee is numbered: 0001, 0002, 0003,

etc. up to 2712.

Using four-digit random numbers, a sample is

identified: 1315, 0987, 1125, etc.
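A sketch of this selection scheme using Python's random module in place of a random-number table (the seed is arbitrary, chosen only to make the draw reproducible):

```python
import random

# Simple random sample of 35 employee IDs out of 2712 employees,
# mirroring the four-digit-number scheme in the example.
random.seed(1)                                 # fixed seed for a reproducible draw
sample = random.sample(range(1, 2713), k=35)   # sampling without replacement

print(len(sample))                             # 35 distinct IDs
print([f"{i:04d}" for i in sample[:3]])        # zero-padded, like the 1315, 0987 above
```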

Systematic Sample: A sample in which every kth item of the sampling frame is selected, starting from the first element, which is randomly selected from the first k elements.

A systematic sample is easy to design and execute. However, it has some inherent dangers when the sampling frame is repetitive or cyclical in nature. In these situations the results may not approximate a simple random sample.

Stratified Random Sample: A sample obtained by stratifying the sampling frame and then selecting a fixed number of items from each of the strata by means of a simple random sampling technique.

Proportional Sample (or Quota Sample): A

sample obtained by stratifying the sampling

frame and then selecting a number of items in

proportion to the size of the strata (or by quota)

from each strata by means of a simple random

sampling technique.

Cluster Sample: A sample obtained by stratifying the sampling frame and then selecting some or all of the items from some of, but not all, the strata.

1.5: Comparison of Probability and

Statistics

Probability: Properties of the population are assumed known. Answer questions about the sample based on these properties.

Statistics: Properties of the sample are known. Use the sample to draw a conclusion about the population.

Probability example: A jar of M&Ms contains 100 candy pieces, 15 of which are red. A handful of 10 is selected. What is the probability that 3 of the 10 selected are red?

Statistics example: A handful of 10 candies is selected from a jar containing 1000 candy pieces. Three M&Ms in the handful are red. What is your estimate of the proportion of red M&Ms in the entire jar?

1.6: Statistics and Technology

The electronic technology has had a

tremendous effect on the field of

statistics.

Many statistical techniques are

repetitive in nature: computers and

calculators are good at this.

Lots of statistical software packages:

MINITAB, SYSTAT, STATA, SAS,

Statgraphics, SPSS, and calculators.

Remember: Responsible use of statistical

methodology is very important. The

burden is on the user to ensure that the

appropriate methods are correctly applied

and that accurate conclusions are drawn

and communicated to others.

Procedures are illustrated using MINITAB, EXCEL 97, and the TI-83.

Chapter 1: Introduction to

Statistics


Variables

A variable is a characteristic or

condition that can change or take on

different values.

Most research begins with a general

question about the relationship

between two variables for a specific

group of individuals.


Population

The entire group of individuals is

called the population.

For example, a researcher may be

interested in the relation between

class size (variable 1) and academic

performance (variable 2) for the

population of third-grade children.


Sample

Usually populations are so large that

a researcher cannot examine the

entire group. Therefore, a sample is

selected to represent the population

in a research study. The goal is to

use the results obtained from the

sample to help answer questions

about the population.


Types of Variables

Variables can be classified as

discrete or continuous.

Discrete variables (such as class

size) consist of indivisible categories,

and continuous variables (such as

time or weight) are infinitely divisible

into whatever units a researcher may

choose. For example, time can be

measured to the nearest minute,

second, half-second, etc.


Real Limits

To define the units for a continuous

variable, a researcher must use real

limits which are boundaries located

exactly half-way between adjacent

categories.


Measuring Variables

To establish relationships between

variables, researchers must observe

the variables and record their

observations. This requires that the

variables be measured.

The process of measuring a variable

requires a set of categories called a

scale of measurement and a

process that classifies each individual

into one category.


4 Types of Measurement

Scales

1. A nominal scale is an unordered

set of categories identified only by

name. Nominal measurements only

permit you to determine whether

two individuals are the same or

different.

2. An ordinal scale is an ordered set

of categories. Ordinal

measurements tell you the direction

of difference between two

individuals.

4 Types of Measurement

Scales

3. An interval scale is an ordered series of

equal-sized categories. Interval

measurements identify the direction and

magnitude of a difference. The zero point

is located arbitrarily on an interval scale.

4. A ratio scale is an interval scale where a

value of zero indicates none of the

variable. Ratio measurements identify

the direction and magnitude of

differences and allow ratio comparisons

of measurements.


Correlational Studies

The goal of a correlational study is

to determine whether there is a

relationship between two variables

and to describe the relationship.

A correlational study simply

observes the two variables as they

exist naturally.


Experiments

The goal of an experiment is to

demonstrate a cause-and-effect

relationship between two variables;

that is, to show that changing the

value of one variable causes changes

to occur in a second variable.


Experiments (cont.)

In an experiment, one variable is

manipulated to create treatment

conditions. A second variable is observed

and measured to obtain scores for a group

of individuals in each of the treatment

conditions. The measurements are then

compared to see if there are differences

between treatment conditions. All other

variables are controlled to prevent them

from influencing the results.

In an experiment, the manipulated

variable is called the independent

variable and the observed variable is the

dependent variable.

Other Types of Studies

Other types of research studies, known as non-experimental or quasi-experimental, are similar to experiments because they also compare groups of scores.

experiments because they also

compare groups of scores.

These studies do not use a

manipulated variable to differentiate

the groups. Instead, the variable

that differentiates the groups is

usually a pre-existing participant

variable (such as male/female) or a

time variable (such as before/after).

Other Types of Studies

(cont.)

Because these studies do not use the

manipulation and control of true

experiments, they cannot

demonstrate cause and effect

relationships. As a result, they are

similar to correlational research

because they simply demonstrate

and describe relationships.


Data

The measurements obtained in a

research study are called the data.

The goal of statistics is to help

researchers organize and interpret

the data.


Descriptive Statistics

Descriptive statistics are methods

for organizing and summarizing data.

Tables and graphs are used to organize data, and descriptive values such as the average score are used to summarize data.

A descriptive value for a population is called a parameter, and a descriptive value for a sample is called a statistic.

Inferential Statistics

Inferential statistics are methods for

using sample data to make general

conclusions (inferences) about

populations.

Because a sample is typically only a part

of the whole population, sample data

provide only limited information about the

population. As a result, sample statistics

are generally imperfect representatives of

the corresponding population parameters.


Sampling Error

The discrepancy between a sample

statistic and its population parameter

is called sampling error.

Defining and measuring sampling

error is a large part of inferential

statistics.


Notation

The individual measurements or scores

obtained for a research participant will be

identified by the letter X (or X and Y if

there are multiple scores for each

individual).

The number of scores in a data set will be

identified by N for a population or n for a

sample.

Summing a set of values is a common operation in statistics and has its own notation. The Greek letter sigma, Σ, will be used to stand for "the sum of." For example, ΣX identifies the sum of the scores.

Order of Operations

1. All calculations within parentheses are done first.

2. Squaring or raising to other exponents is done second.

3. Multiplying and dividing are done third, and should be completed in order from left to right.

4. Summation with the Σ notation is done next.

5. Any additional adding and subtracting is done last and should be completed in order from left to right.
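Rule 4 is why ΣX², "square then sum", differs from (ΣX)², "sum then square"; a quick Python check:

```python
# With summation notation, where the squaring happens matters.
X = [1, 2, 3]

sum_x = sum(X)                           # ΣX   = 1 + 2 + 3 = 6
sum_of_squares = sum(x ** 2 for x in X)  # ΣX²  = 1 + 4 + 9 = 14
square_of_sum = sum(X) ** 2              # (ΣX)² = 6²       = 36

print(sum_x, sum_of_squares, square_of_sum)  # 6 14 36
```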

Basics of Statistics

Statistics is a discipline concerned with the collection, analysis, and interpretation of data.

Statistics presents a rigorous scientific method for gaining insight into data. For example, suppose we measure the weight of 100 patients in a study. With so many measurements, simply looking at the data fails to provide an informative account. However, statistics can give an instant overall picture of the data based on graphical presentation or numerical summarization, irrespective of the number of data points. Besides data summarization, another important task of statistics is to make inferences and predict relations between variables.

A Taxonomy of Statistics

Statistical Description of

Data

Statistics describes a numeric set of

data by its

Center

Variability

Shape

Statistics describes a categorical set

of data by

Frequency, percentage or proportion of each

category

Some Definitions

Variable - any characteristic of an individual or entity. A variable can

take different values for different individuals. Variables can be

categorical or quantitative. Per S. S. Stevens

Nominal - Categorical variables with no inherent order or ranking sequence, such as names or classes (e.g., gender). Values may be numerical labels, but without numerical meaning (e.g., I, II, III). The only operation that can be applied to nominal variables is enumeration.

Ordinal - Variables with an inherent rank or order, e.g. mild, moderate, severe.

Can be compared for equality, or greater or less, but not how much greater or less.

Interval - Values of the variable are ordered as in Ordinal, and additionally,

differences between values are meaningful, however, the scale is not absolutely

anchored. Calendar dates and temperatures on the Fahrenheit scale are examples.

Addition and subtraction, but not multiplication and division are meaningful

operations.

Ratio - Variables with all properties of Interval plus an absolute, non-arbitrary zero

point, e.g. age, weight, temperature (Kelvin). Addition, subtraction, multiplication,

and division are all meaningful operations.

Some Definitions

Distribution - (of a variable) tells us what values the variable takes

and how often it takes these values.

Unimodal - having a single peak

Bimodal - having two distinct peaks

Symmetric - left and right half are mirror images.

Frequency Distribution

Consider a data set of 26 children of ages 1-6 years. Then the

frequency distribution of variable age can be tabulated as

follows:

Frequency Distribution of Age

Age 1 2 3 4 5 6

Frequency 5 3 7 5 4 2

Grouped Frequency Distribution of Age:

Age Group 1-2 3-4 5-6

Frequency 8 12 6

Cumulative Frequency

Cumulative frequency of the data on the previous page:

Age                    1   2   3   4   5   6
Frequency              5   3   7   5   4   2
Cumulative Frequency   5   8  15  20  24  26

Age Group             1-2  3-4  5-6
Frequency               8   12    6
Cumulative Frequency    8   20   26
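The running totals can be generated with itertools.accumulate:

```python
from itertools import accumulate

ages = [1, 2, 3, 4, 5, 6]  # age values from the example
freq = [5, 3, 7, 5, 4, 2]  # frequency of each age, n = 26 children

cum_freq = list(accumulate(freq))  # running totals of the frequencies
print(cum_freq)                    # [5, 8, 15, 20, 24, 26]
```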

Data Presentation

Two types of statistical presentation of data - graphical and numerical.

Graphical Presentation: We look for the overall pattern and for striking deviations from that pattern. The overall pattern is usually described by the shape, center, and spread of the data. An individual value that falls outside the overall pattern is called an outlier.

Bar diagrams and pie charts are used for categorical variables. Histograms, stem-and-leaf plots, and box plots are used for numerical variables.

Data Presentation - Categorical Variable

Bar Diagram: Lists the categories and presents the percent or count of individuals who fall in each category.

Group   Count   Proportion      Percent
1        15     15/60 = 0.250    25.0
2        25     25/60 = 0.417    41.7
3        20     20/60 = 0.333    33.3
Total    60             1.000   100.0

Data Presentation - Categorical Variable

Pie Chart: Lists the categories and presents the percent or count of individuals who fall in each category.

Group   Count   Proportion      Percent
1        15     15/60 = 0.250    25.0
2        25     25/60 = 0.417    41.7
3        20     20/60 = 0.333    33.3

Graphical Presentation - Numerical Variable

Histogram: The overall pattern can be described by its shape, center, and spread. The following age distribution is right skewed. The center lies between 80 and 100. No outliers.

Summary statistics for the age distribution:

Mean                 90.41666667
Standard Error        3.902649518
Median               84
Mode                 84
Standard Deviation   30.22979318
Sample Variance     913.8403955
Kurtosis             -1.183899591
Skewness              0.389872725
Range                95
Minimum              48
Maximum             143
Sum                5425
Count                60

Graphical Presentation - Numerical Variable

Box-Plot: Describes the five-number summary.

Numerical Presentation

A fundamental concept in summary statistics is that of a central value for a set of

observations and the extent to which the central value characterizes the whole

set of data. Measures of central value such as the mean or median must be

coupled with measures of data dispersion (e.g., average distance from the

mean) to indicate how well the central value characterizes the data as a whole.

Let us consider the following two sets of data:

A: 30, 50, 70

B: 40, 50, 60

The mean of both data sets is 50. But the distance of the observations from the mean in data set A is larger than in data set B. Thus, the mean of data set B is a better representation of the data set than is the case for set A.

Methods of Center Measurement

Commonly used measures of the center of a dataset: mean, median, mode, etc.

Mean: Summing up all the observations and dividing by the number of observations. The mean of 20, 30, 40 is (20+30+40)/3 = 30.

Notation: Let x1, x2, ..., xn be n observations of a variable x. Then the mean of this variable is

x̄ = (x1 + x2 + ... + xn) / n = (Σ xi) / n

Methods of Center Measurement

Median: The middle value of the ordered dataset. That is, to find the median we need to order the data set and then find the middle value. In case of an even number of observations, the average of the two middle-most values is the median. For example, to find the median of {9, 3, 6, 7, 5}, we first sort the data, giving {3, 5, 6, 7, 9}, then choose the middle value 6. If the number of observations is even, e.g., {9, 3, 6, 7, 5, 2}, then the median is the average of the two middle values from the sorted sequence, in this case, (5 + 6) / 2 = 5.5.

Mode: The most frequently occurring value. The mode is undefined for sequences in which no observation is repeated.

Mean or Median

The median is less sensitive to outliers (extreme scores) than the mean, and thus a better measure than the mean for highly skewed distributions, e.g., family income. For example, the mean of 20, 30, 40, and 990 is (20+30+40+990)/4 = 270. The median of these four observations is (30+40)/2 = 35. Here 3 observations out of 4 lie between 20 and 40. So the mean, 270, really fails to give a realistic picture of the major part of the data. It is influenced by the extreme value 990.

Methods of Variability Measurement

Commonly used measures of variability include the range, variance, standard deviation, interquartile range, coefficient of variation etc.

Range: the difference between the largest and smallest observations. The range of 10, 5, 2, 100 is (100 − 2) = 98. It's a crude measure of variability.

Methods of Variability Measurement

Variance: the sum of the squares of the deviations of the observations from their mean, divided by n − 1. In symbols, the variance of the n observations x1, x2, ..., xn is

  S² = [(x1 − x̄)² + ... + (xn − x̄)²] / (n − 1)

Variance of 5, 7, 3? The mean is (5+7+3)/3 = 5, and the variance is

  [(5 − 5)² + (3 − 5)² + (7 − 5)²] / (3 − 1) = 4

Standard Deviation: Square root of the variance. The standard

deviation of the above example is 2.
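The variance and standard deviation of the 5, 7, 3 example can be sketched as follows (illustrative helpers, not from the slides):

```python
def variance(xs):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def std_dev(xs):
    """Standard deviation: square root of the variance."""
    return variance(xs) ** 0.5

print(variance([5, 7, 3]))  # 4.0
print(std_dev([5, 7, 3]))   # 2.0
```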

Methods of Variability Measurement

Quartiles: Data can be divided into four regions that cover the total

range of observed values. Cut points for these regions are known as

quartiles.

In symbols, the q-th quartile of a data set is the ((n+1)/4)·q th observation of the ordered data, where q is the desired quartile and n is the number of observations.

The first quartile (Q1) is the cut point below which 25% of the ordered data lie. The second quartile (Q2) is the 50% point, i.e. the median. The third quartile (Q3) is the cut point below which 75% of the data lie; equivalently, Q3 is the median of the second half of the ordered observations.

Methods of Variability Measurement

In the following example, Q1 = ((15+1)/4)·1 = 4th observation of the ordered data. The 4th observation is 11, so Q1 of this data is 11.

3 6 7 11 13 22 30 40 44 50 52 61 68 80 94

Q1 Q2 Q3

The first quartile is Q1=11. The second quartile is Q2=40 (This is

also the Median.) The third quartile is Q3=61.

Inter-quartile range: the difference between the third and first quartiles. The inter-quartile range of the previous example is 61 − 11 = 50. The middle half of the ordered data lies between 11 and 61.
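The ((n+1)/4)·q rule can be sketched in code; this assumes the computed position is a whole number, as in the 15-observation example (the helper is illustrative, not from the slides):

```python
def quartile(xs, q):
    """q-th quartile via the ((n + 1) / 4) * q position rule from the text.
    Assumes the position lands on an integer, as in the example."""
    s = sorted(xs)
    pos = (len(s) + 1) * q // 4   # 1-based position in the ordered data
    return s[pos - 1]

data = [3, 6, 7, 11, 13, 22, 30, 40, 44, 50, 52, 61, 68, 80, 94]
print(quartile(data, 1))  # 11
print(quartile(data, 2))  # 40 (the median)
print(quartile(data, 3))  # 61
```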

Deciles and Percentiles

Deciles: If data is ordered and divided into 10 parts, then cut points

are called Deciles

Percentiles: If data is ordered and divided into 100 parts, then cut

points are called Percentiles. 25th percentile is the Q1, 50th percentile

is the Median (Q2) and the 75th percentile of the data is Q3.

In symbols, the p-th percentile of a data set is the ((n+1)/100)·p th observation of the ordered data, where p is the desired percentile and n is the number of observations.

Coefficient of Variation: the standard deviation expressed relative to the mean. It is usually expressed in percent:

  Coefficient of Variation = (s / x̄) × 100

Five Number Summary

The five number summary of a distribution consists of the smallest (Minimum) observation, the first quartile (Q1), the median (Q2), the third quartile (Q3), and the largest (Maximum) observation, written in order from smallest to largest.

Box Plot: A box plot is a graph of the five number summary. The

central box spans the quartiles. A line within the box marks the

median. Lines extending above and below the box mark the

smallest and the largest observations (i.e., the range). Outlying

samples may be additionally plotted outside the range.

Boxplot

Distribution of Age in Month

Choosing a Summary

The five number summary is usually better than the mean and standard

deviation for describing a skewed distribution or a distribution with

extreme outliers. The mean and standard deviation are reasonable for

symmetric distributions that are free of outliers.

In real life we can't always expect symmetry of the data. It's common practice to include the number of observations (n), mean, median, standard deviation, and range for data summarization. We can include other summary statistics, like Q1, Q3 and the coefficient of variation, if they are considered important for describing the data.

Shape of Data

Shape of data is measured by

Skewness

Kurtosis

Skewness

Measures asymmetry of data

Positive or right skewed: Longer right tail

Negative or left skewed: Longer left tail

  Skewness = √n · Σ_{i=1}^{n} (x_i − x̄)³ / [ Σ_{i=1}^{n} (x_i − x̄)² ]^{3/2}

Kurtosis

Measures peakedness of the distribution of

data. The kurtosis of normal distribution is 0.

  Kurtosis = n · Σ_{i=1}^{n} (x_i − x̄)⁴ / [ Σ_{i=1}^{n} (x_i − x̄)² ]² − 3
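A sketch of the moment-based skewness and excess kurtosis coefficients (skewness = √n·Σd³/(Σd²)^{3/2}, kurtosis = n·Σd⁴/(Σd²)² − 3, so the normal distribution has kurtosis 0); the helpers are illustrative, not from the slides:

```python
def skewness(xs):
    """Moment coefficient of skewness: 0 for symmetric data,
    positive for a longer right tail."""
    n = len(xs)
    m = sum(xs) / n
    d2 = sum((x - m) ** 2 for x in xs)
    d3 = sum((x - m) ** 3 for x in xs)
    return n ** 0.5 * d3 / d2 ** 1.5

def kurtosis(xs):
    """Excess kurtosis: 0 for the normal distribution."""
    n = len(xs)
    m = sum(xs) / n
    d2 = sum((x - m) ** 2 for x in xs)
    d4 = sum((x - m) ** 4 for x in xs)
    return n * d4 / d2 ** 2 - 3

print(skewness([1, 2, 3]))      # 0.0 (symmetric)
print(skewness([1, 1, 1, 10]))  # positive (right-skewed)
```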

Summary of the Variable Age in the given data set

Mean                90.42
Median              84
Mode                84
Standard Deviation  30.23
Sample Variance     913.84
Kurtosis            −1.18
Skewness            0.39
Range               95
Minimum             48
Maximum             143
Sum                 5425
Count               60

[Histogram of Age in Month: Number of Subjects (0 to 10) versus Age in Month (40 to 160)]

Summary of the Variable Age in the given data set

[Box plot of Age (month), vertical scale 60 to 140]

Class Summary (First Part)

So far we have learned-

Graphical Presentation: Bar Chart, Pie Chart, Histogram, and Box Plot

Numerical Presentation: Measuring Central value of data (mean,

median, mode etc.), measuring dispersion (standard deviation,

variance, co-efficient of variation, range, inter-quartile range etc),

quartiles, percentiles, and five number summary

Any questions ?

Brief Overview of Statistical Software

Many software packages are available for statistical analysis of data. Some of them are SAS (Statistical Analysis System), S-plus, R, Matlab, Minitab, BMDP, Stata, SPSS, StatXact, Statistica, LISREL, JMP, GLIM, HIL, MS Excel etc. We will discuss MS Excel and SPSS in brief.

http://www.galaxy.gmu.edu/papers/astr1.html

http://ourworld.compuserve.com/homepages/Rainer_Wuerlaender/sta

tsoft.htm#archiv

http://www.R-project.org

Microsoft Excel

A Spreadsheet Application. It features calculation, graphing tools, pivot

tables and a macro programming language called VBA (Visual Basic for

Applications).

There are many versions of MS-Excel. Excel XP, Excel 2003, Excel 2007

are capable of performing a number of statistical analyses.

desktop or Click on Start --> Programs --> Microsoft Excel.

Worksheet: Consists of a multiple grid of cells with numbered rows down the page

and alphabetically-tilted columns across the page. Each cell is referenced by its

coordinates. For example, A3 is used to refer to the cell in column A and row 3.

B10:B20 is used to refer to the range of cells in column B and rows 10 through 20.

Microsoft Excel

Opening a document: File → Open (from an existing workbook). Change the directory area or drive to look for files in other locations.

Creating a new workbook: File → New → Blank Document

Saving a file: File → Save

Selecting more than one cell: Click on a cell (e.g. A1), then hold the Shift key and click on another (e.g. D4) to select the cells between A1 and D4, or click on a cell and drag the mouse across the desired range.

Creating Formulas: 1. Click the cell in which you want to enter the formula, 2. Type = (an equal sign), 3. Click the Function Button (fx), 4. Select the formula you want and step through the on-screen instructions.

Microsoft Excel

Entering Date and Time: Dates are stored as MM/DD/YYYY, but you need not enter them in exactly that format. For example, Excel will recognize jan 9 or jan-9 as 1/9/2007, and jan 9, 1999 as 1/9/1999. To enter today's date, press Ctrl and ; together. Use a or p to indicate am or pm; for example, 8:30 p is interpreted as 8:30 pm. To enter the current time, press Ctrl and : together.

Copy and Paste all cells in a Sheet: Ctrl+A for selecting, Ctrl +C for copying

and Ctrl+V for Pasting.

The Data Analysis ToolPak provides another method. If Data Analysis is not available, click on Tools → Add-Ins and then select Analysis ToolPak and Analysis ToolPak - VBA.

Microsoft Excel

Statistical and Mathematical Functions: Start with an = sign and then select the function from the function wizard fx.

Charts: give the input data range, update the chart options, and select the output range/worksheet.

Importing text files: choose the file type option (Delimited/Fixed Width), choose the delimiter options (Tab/Semicolon/Comma/Space/Other), then Finish.

Note that Excel's statistical routines can be subject to truncation errors and may produce inaccurate results in extreme cases.

Statistics Package

for the Social Science (SPSS)

A general purpose statistical package SPSS is widely used in the social

sciences, particularly in sociology and psychology.

SPSS can import data from almost any type of file to generate tabulated reports, plots of distributions and trends, descriptive statistics, and complex statistical analyses.

Starting SPSS: Double Click on SPSS on desktop or ProgramSPSS.

Data Editor

Various pull-down menus appear at the top of the Data Editor window. These

pull-down menus are at the heart of using SPSSWIN. The Data Editor menu

items (with some of the uses of the menu) are:

Statistics Package

for the Social Science (SPSS)

MENUS AND TOOLBARS

EDIT used to copy and paste data values; used to find data in a

file; insert variables and cases; OPTIONS allows the user to set

general preferences as well as the setup for the Navigator, Charts,

etc.

VIEW user can change toolbars; value labels can be seen in cells

instead of data values

Statistics Package

for the Social Science (SPSS)

MENUS AND TOOLBARS

advanced features)

statistical procedures)

Statistics Package

for the Social Science (SPSS)

MENUS AND TOOLBARS

When statistical procedures are run or charts are created, the output will appear

in the Navigator window. The Navigator window contains many of the pull-down

menus found in the Data Editor window. Some of the important menus in the

Navigator window include:

Statistics Package

for the Social Science (SPSS)

Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit the

table to create a desired look or add/delete information. Beginning with version 14.0, the user has a choice of editing the table in the Output or opening it in a separate Pivot Table window. Various pulldown menus are activated when the user double clicks on the table. These include:

EDIT undo and redo a pivot, select a table or table body (e.g., to

change the font)

Statistics Package

for the Social Science (SPSS)

Additional menus

CHART EDITOR used to edit a graph

Show or hide a toolbar

Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to

its new location

Customize a toolbar

Statistics Package

for the Social Science (SPSS)

Importing data from an EXCEL spreadsheet:

Data from an Excel spreadsheet can be imported into SPSSWIN as follows:

1. In SPSSWIN click on FILE OPEN DATA. The OPEN DATA FILE Dialog

Box will appear.

2. Locate the file of interest: Use the "Look In" pull-down list to identify the folder

containing the Excel file of interest

3. From the FILE TYPE pull down menu select EXCEL (*.xls).

4. Click on the file name of interest and click on OPEN or simply double-click on

the file name.

5. Keep the box checked that reads "Read variable names from the first row of data". This presumes that the first row of the Excel data file contains the variable names. [If the data resided in a different worksheet in the Excel

file, this would need to be entered.]

6. Click on OK. The Excel data file will now appear in the SPSSWIN Data

Editor.

Statistics Package

for the Social Science (SPSS)

Importing data from an EXCEL spreadsheet:

7. The former EXCEL spreadsheet can now be saved as an SPSS file (FILE

SAVE AS) and is ready to be used in analyses. Typically, you would label variable

and values, and define missing values.

Importing an Access table

SPSSWIN does not offer a direct import for Access tables. Therefore, we must follow

these steps:

1. Open the Access file

2. Open the data table

3. Save the data as an Excel file

4. Follow the steps outlined in the data import from Excel Spreadsheet to SPSSWIN.

Importing Text Files into SPSSWIN

Text data points typically are separated (or delimited) by tabs or commas.

Sometimes they can be of fixed format.

Statistics Package

for the Social Science (SPSS)

Importing tab-delimited data

In SPSSWIN click on FILE OPEN DATA. Look in the appropriate location for

the text file. Then select Text from Files of type: Click on the file name and then

click on Open. You will see the Text Import Wizard step 1 of 6 dialog box.

You will now have an SPSS data file containing the former tab-delimited data. You

simply need to add variable and value labels and define missing values.

Exporting to Excel: click on FILE → SAVE AS. Click on the File Name for the file to be exported. For

the Save as Type select from the pull-down menu Excel (*.xls). You will notice the

checkbox for write variable names to spreadsheet. Leave this checked as you will

want the variable names to be in the first row of each column in the Excel

spreadsheet. Finally, click on Save.

Statistics Package

for the Social Science (SPSS)

Running the FREQUENCIES procedure

1. Open the data file (from the menus, click on FILE → OPEN → DATA) of interest.

2. From the menus, click on ANALYZE → DESCRIPTIVE STATISTICS → FREQUENCIES

3. The FREQUENCIES Dialog Box will appear. In the left-hand box will be a listing

("source variable list") of all the variables that have been defined in the data file. The

first step is identifying the variable(s) for which you want to run a frequency analysis.

Click on a variable name(s). Then click the [ > ] pushbutton. The variable name(s)

will now appear in the VARIABLE[S]: box ("selected variable list"). Repeat these

steps for each variable of interest.

4. By default the output includes frequency counts and percentages (raw, adjusted and cumulative); then click on OK.

Statistics Package

for the Social Science (SPSS)

Requesting STATISTICS

Descriptive and summary STATISTICS can be requested for numeric variables. To

request Statistics:

1. From the FREQUENCIES Dialog Box, click on the STATISTICS... pushbutton.

2. This will bring up the FREQUENCIES: STATISTICS Dialog Box.

3. The STATISTICS Dialog Box offers the user a variety of choices:

DESCRIPTIVES (click on ANALYZE → DESCRIPTIVE STATISTICS → DESCRIPTIVES): the procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency analysis tables.

Statistics Package

for the Social Science (SPSS)

Requesting CHARTS

One can request a chart (graph) to be created for a variable or variables included in

a FREQUENCIES procedure.

2. The FREQUENCIES: CHARTS Dialog box will appear. Choose the intended chart (e.g. bar diagram, pie chart, histogram).

To copy a chart into a Word document:

1. Click on the chart.

2. Click on the pulldown menu EDIT COPY OBJECTS

3. Go to the Word document in which the chart is to be embedded. Click on EDIT

PASTE SPECIAL

4. Select Formatted Text (RTF) and then click on OK

5. Enlarge the graph to a desired size by dragging one or more of the black squares

along the perimeter (if the black squares are not visible, click once on the graph).

Statistics Package

for the Social Science (SPSS)

BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS

CROSSTABS.

2. The CROSSTABS Dialog Box will then open.

3. From the variable selection box on the left click on a variable you wish to

designate as the Row variable. The values (codes) for the Row variable make up

the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next,

click on a different variable you wish to designate as the Column variable. The

values (codes) for the Column variable make up the columns of the crosstabs

table. Click on the arrow (>) button for Column(s).

4. You can specify more than one variable in the Row(s) and/or Column(s). A cross

table will be generated for each combination of Row and Column variables

Statistics Package

for the Social Science (SPSS)

Limitations: SPSS users have less control over data manipulation and statistical output than with other statistical packages such as SAS, Stata etc. Nevertheless, SPSS is widely used in social science because it is easy to use and because it can be a good starting point for learning more advanced statistical packages.

Introduction to

Statistics

Colm ODushlaine

codushlaine@gmail.com

145

Overview

Presentation of Data

Statistical Inference

Hypothesis Tests & Confidence Intervals

T-tests (Paired/Two-sample)

Regression (SLR & Multiple Regression)

ANOVA/ANCOVA

Intended as an overview; slides will be provided after the lectures.

What's in the lectures?...

146

Lecture 1 Lecture 2 Lecture 3 Lecture 4

Descriptive Statistics and Graphical Presentation of

Data

1. Terminology

2. Frequency Distributions/Histograms

3. Measures of data location

4. Measures of data spread

5. Box-plots

6. Scatter-plots

7. Clustering (Multivariate Data)

147

Lecture 1 Lecture 2 Lecture 3

Lecture 4 Statistical Inference

2. Normal Distribution

3. Sampling Distribution & Central Limit

Theorem

4. Hypothesis Tests

5. P-values

6. Confidence Intervals

7. Two-Sample Inferences

8. Paired Data

148

Lecture 1 Lecture 2 Lecture 3

Lecture 4 Sample Inferences

1. Two-Sample Inferences

Paired t-test

Two-sample t-test

2. Inferences for more than two samples

One-way ANOVA

Two-way ANOVA

Interactions in Two-way ANOVA

3. DataDesk demo

149

Lecture 1 Lecture 2 Lecture 3

Lecture 4

1. Regression

2. Correlation

3. Multiple Regression

4. ANCOVA

5. Normality Checks

6. Non-parametrics

7. Sample Size Calculations

8. Useful tools and websites

150

FIRST, A REALLY USEFUL SITE

Explanations of outputs

Videos with commentary

Help with deciding what test

to use with what data

151

1. Terminology

Populations & Samples

Population: the complete set of

individuals, objects or scores of interest.

Often too large to sample in its entirety

It may be real or hypothetical (e.g. the results

from an experiment repeated ad infinitum)

A sample may be classified as random (each

member has equal chance of being selected

from a population) or convenience (whats

available).

Random selection attempts to ensure the

sample is representative of the population.

152

Variables

Variables are the quantities measured in a sample. They may be classified as:

Quantitative i.e. numerical

Continuous (e.g. pH of a sample, patient

cholesterol levels)

Discrete (e.g. number of bacteria

colonies in a culture)

Categorical

Nominal (e.g. gender, blood group)

Ordinal (ranked e.g. mild, moderate or

severe illness). Often ordinal variables

are re-coded to be quantitative. 153

Variables

Variables can be further classified as:

Dependent/Response. Variable of primary

interest (e.g. blood pressure in an

antihypertensive drug trial). Not controlled by

the experimenter.

Independent/Predictor

called a Factor when controlled by

experimenter. It is often nominal (e.g.

treatment)

Covariate when not controlled.

If the value of a variable cannot be

predicted in advance then the variable is

154

referred to as a random variable

Parameters & Statistics

Parameters: Quantities that

describe a population

characteristic. They are usually

unknown and we wish to make

statistical inferences about

parameters. Different to

perimeters.

Descriptive Statistics: Quantities and techniques used to describe a sample characteristic or data set. 155

2. Frequency Distributions

An (Empirical) Frequency Distribution

or Histogram for a continuous variable

presents the counts of observations

grouped within pre-specified classes or

groups

presents the corresponding proportions of

observations within the classes

categorical variable

156

Example Serum CK

Blood samples taken from 36 male

volunteers as part of a study to

determine the natural variation in CK

concentration.

The CK concentrations, measured in U/l, are as follows:

157

Serum CK Data for 36 male

volunteers

95 145 64 201 101 163

84 57 139 60 78 94

119 104 110 113 118 203

62 83 67 93 92 110

25 123 70 48 95 42

158

Relative Frequency Table

Serum CK (U/l)   Frequency   Relative Frequency   Cumulative Rel. Frequency
20-39                1            0.028                 0.028
40-59                4            0.111                 0.139
60-79                7            0.194                 0.333
80-99                8            0.222                 0.555
100-119              8            0.222                 0.777
120-139              3            0.083                 0.860
140-159              2            0.056                 0.916
160-179              1            0.028                 0.944
180-199              0            0.000                 0.944
200-219              2            0.056                 1.000
Total               36            1.000

159
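A table like this can be built programmatically. A minimal sketch, run on a small made-up sample rather than the full 36-observation CK data set (function name and test data are illustrative, not from the slides):

```python
def rel_freq_table(data, start, width):
    """Group data into classes [start, start+width), [start+width, ...), and
    return (class_start, count, relative, cumulative) rows."""
    hi = max(data)
    rows, cum, n = [], 0.0, len(data)
    lo = start
    while lo <= hi:
        count = sum(lo <= x < lo + width for x in data)
        cum += count / n
        rows.append((lo, count, round(count / n, 3), round(cum, 3)))
        lo += width
    return rows

# Small illustrative sample (not the full CK data)
rows = rel_freq_table([25, 42, 48, 57, 60, 62, 64], start=20, width=20)
for row in rows:
    print(row)
```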

Frequency Distribution

[Histogram of CK concentration (U/l) with a frequency axis, accompanied by a quantiles table: maximum, 99.5%, 97.5%, 90.0%, upper quartile, median, lower quartile, 10.0%, 2.5%, 0.5%, minimum]

160

Relative Frequency Distribution

[Relative-frequency histogram of CK concentration (U/l). The shaded area is the percentage of males with CK values between 60 and 100 U/l, i.e. 42%. The distribution has a long right tail, i.e. it is right-skewed.]

161

3. Measures of Central

Tendency (Location)

Measures of location indicate where on the

number line the data are to be found. Common

measures of location are:

(i) the Mean,

(ii) the Median, and

(iii) the Mode

162

The Mean

Let x1, x2, ..., xn be the values of a random variable X from a sample of size n. The sample arithmetic mean is defined as:

  x̄ = (1/n) Σ_{i=1}^{n} x_i

163

Example

Example 2: The systolic blood pressures of seven middle-aged men were as follows: 151, 124, 132, 170, 146, 124 and 113.

  x̄ = (151 + 124 + 132 + 170 + 146 + 124 + 113) / 7 = 137.14

164

The Median and Mode

If the sample data are arranged in

increasing order, the median is

(i) the middle value if n is an odd

number, or

(ii) midway between the two middle

values if n is an even number

The mode is the most commonly

occurring value.

165

Example 1: n is odd

The reordered systolic blood pressure data seen earlier are:

113, 124, 124, 132, 146, 151, 170

The median is the middle (4th) value of the ordered data, i.e. 132. The value 124 mm Hg occurs twice, so the Mode is 124.

166

Example 2: n is even

Six men took part in a study to investigate the effects of diet on cholesterol level. At the beginning of the study, their cholesterol levels (mg/dL) were as follows: 366, 327, 274, 292, 274 and 230.

Rearrange the data in numerical order: 230, 274, 274, 292, 327, 366. The median is midway between the two middle values, i.e. (274 + 292) / 2 = 283. Two men have the same cholesterol level, so the Mode is 274.

167

Mean versus Median

The mean is dragged in the direction of a long tail of extreme values; this will happen if the histogram of the data is right-skewed. The median is a better measure of centrality if the distribution is skewed; mean and median roughly coincide when the distribution is symmetrical.

For the serum CK data, the mean = 98.28 and the median = 94.5, i.e. the mean is larger than the median, indicating that the mean is inflated by the two large data values 201 and 203.

168

4. Measures of Dispersion

spread out the distribution is, i.e., how

variable the data are.

Commonly used measures of dispersion

include:

1. Range

2. Variance & Standard deviation

3. Coefficient of Variation (or relative standard

deviation)

4. Inter-quartile range

169

Range

the sample Range is the difference

between the largest and smallest

observations in the sample

easy to calculate;

Blood pressure example: min=113

and max=170, so the range=57

mmHg

useful for best or worst case

scenarios

sensitive to extreme values

170

Sample Variance

The sample variance, s², is the sum of the squared deviations from the sample mean, divided by n − 1:

  s² = Σ_{i=1}^{n} (x_i − x̄)² / (n − 1)

171

Standard Deviation

The sample standard deviation, s, is the square root of the variance:

  s = √[ Σ_{i=1}^{n} (x_i − x̄)² / (n − 1) ]

s has the same units as the original variable x.

172

Example

Data   Deviation   Deviation²
151      13.86        192.02
124     −13.14        172.73
132      −5.14         26.45
170      32.86       1079.59
146       8.86         78.45
124     −13.14        172.73
113     −24.14        582.88
Sum = 960.0   Sum = 0.00   Sum = 2304.86

x̄ = 137.14

173

Example (contd.)

Σ_{i=1}^{7} (x_i − x̄)² = 2304.86

Therefore,

  s² = 2304.86 / (7 − 1) = 384.14

  s = √384.14 = 19.6

174
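The blood pressure calculation can be reproduced in a few lines (the variable names are illustrative, not from the slides):

```python
bp = [151, 124, 132, 170, 146, 124, 113]

mean = sum(bp) / len(bp)
ss = sum((x - mean) ** 2 for x in bp)   # sum of squared deviations
var = ss / (len(bp) - 1)                # sample variance (divisor n - 1)
sd = var ** 0.5                         # standard deviation

print(round(mean, 2))  # 137.14
print(round(ss, 2))    # 2304.86
print(round(sd, 1))    # 19.6
```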

Coefficient of Variation

The coefficient of variation (CV) or relative standard deviation (RSD) is the sample standard deviation expressed as a percentage of the mean, i.e.

  CV = (s / x̄) × 100%

The CV is not affected by multiplicative changes in scale. Consequently, it is a useful way of comparing the dispersion of variables measured on different scales.

175

Example

The CV of the blood pressure data is:

  CV = (19.6 / 137.1) × 100% = 14.3%

i.e., the standard deviation of the sample is about 14% as large as the mean.

176

Inter-quartile range

The Median divides a distribution into two halves. The first and third quartiles (Q1 and Q3) are defined as follows:

25% of the data lie below Q1 (and 75% above Q1);
25% of the data lie above Q3 (and 75% below Q3).

The inter-quartile range (IQR) is the difference between the first and third quartiles, i.e. IQR = Q3 − Q1. 177

Example

The ordered blood pressure data is:

113, 124, 124, 132, 146, 151, 170
      Q1              Q3

IQR = Q3 − Q1 = 151 − 124 = 27

178

60% of slides complete!

179

5. Box-plots

A box-plot is a visual description of

the distribution based on

Minimum

Q1

Median

Q3

Maximum

Useful for comparing large sets of

data

180

Example 1

The pulse rates of 12 individuals

arranged in increasing order are:

62, 64, 68, 70, 70, 74, 74, 76, 76, 78,

78, 80


181

Example 1: Box-plot

182

Example 2: Box-plots of intensities

from 11 gene expression arrays

[Box plots of intensity for the 11 arrays, vertical scale 8 to 14]

183

Outliers

An outlier is an observation which

does not appear to belong with the

other data

Outliers can arise because of a

measurement or recording error or

because of equipment failure during

an experiment, etc.

An outlier might be indicative of a sub-population, e.g. an abnormally low or high value in a medical test could indicate a distinct sub-group of subjects. 184

Outlier Boxplot

An outlier boxplot draws the ends of the boxplots (the whisker lines) as:

Lower limit = Q1 − 1.5·IQR, and
Upper limit = Q3 + 1.5·IQR

Points are compared against these limits: if a data point is < lower limit or > upper limit, the data point is considered to be an outlier. 185
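The whisker-limit rule can be sketched as follows, using the median-of-halves convention for the quartiles (one common choice; other software uses slightly different quartile rules, and the helper names are illustrative):

```python
def outlier_limits(xs):
    """Whisker limits Q1 - 1.5*IQR and Q3 + 1.5*IQR.
    Quartiles computed as medians of the lower and upper halves."""
    s = sorted(xs)
    half = len(s) // 2

    def med(v):
        m = len(v) // 2
        return v[m] if len(v) % 2 else (v[m - 1] + v[m]) / 2

    q1, q3 = med(s[:half]), med(s[-half:])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

bp = [113, 124, 124, 132, 146, 151, 170]
low, high = outlier_limits(bp)
print(low, high)                                        # 83.5 191.5
print([x for x in bp + [260] if x < low or x > high])   # [260]
```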

Example: CK data

[Outlier boxplot of the CK data; the largest observations are flagged as outliers]

186

6. Scatter-plot

A scatter-plot displays the relationship between two continuous variables. It is a useful preliminary analysis when exploring data and determining whether a linear regression analysis is appropriate.

Example 1: Age versus Systolic

Blood Pressure in a Clinical Trial

188

Example 2: Up-regulation/Down-regulation

of gene expression across an array

(Control Cy5 versus Disease Cy3)

189

Example of a Scatter-plot matrix

(multiple pair-wise plots)

190

Other graphical representations

Dot-Plots, Stem-and-leaf plots

Not visually appealing

Pie-chart

Visually appealing, but hard to compare two

datasets. Best for 3 to 7 categories. A total must be

specified.

Violin-plots

=boxplot+smooth density

Nice visual of data shape

191

Multivariate Data

Clustering is useful for visualising

multivariate data and uncovering patterns,

often reducing its complexity

It is particularly relevant for high-dimensional data (p >> n): hundreds or perhaps thousands of variables, as in gel electrophoresis and microarray experiments where the variables are protein abundances or gene expression ratios.

192

7. Clustering

Clustering groups objects sharing similarity. Distances are computed between objects, quantifying a notion of (dis)similarity. Points are grouped on the basis of minimum distance apart (distance measures). The two closest points are then merged into a single point (using a linkage method), e.g. take their average. The process is then repeated. 193

Clustering

Clustering can be applied to rows or columns of a data

set (matrix) i.e. to the samples or variables

The result is displayed as a tree whose branch lengths are proportional to distances between linked clusters, called a Dendrogram. Clustering is unsupervised: no use is made of sample annotations, i.e. treatment groups, diagnosis groups.

194

UPGMA

Unweighted Pair-Group Method Average

Most commonly used clustering method

Procedure:

1. Each observation forms its own cluster

2. The two with minimum distance are grouped into

a single cluster representing a new observation-

take their average

3. Repeat 2. until all data points form a single

cluster

195

Contrived Example

5 genes of interest on 3 replicate arrays/gels:

           Array1   Array2   Array3
p53           9        3        7
mdm2         10        2        9
bcl2          1        9        4
cyclinE       6        5        5
caspase 8     1       10        3

Euclidean distance: d(x, y) = √[(x1 − y1)² + (x2 − y2)² + (x3 − y3)²]

e.g. d(p53, mdm2) = √[(9 − 10)² + (3 − 2)² + (7 − 9)²] = 2.5

196
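The pair-wise distances can be computed directly from the table; a minimal sketch (gene/profile data taken from the contrived example above, variable names illustrative):

```python
from math import sqrt

# The 5 genes x 3 arrays from the contrived example.
genes = {
    "p53":      (9, 3, 7),
    "mdm2":     (10, 2, 9),
    "bcl2":     (1, 9, 4),
    "cyclinE":  (6, 5, 5),
    "caspase8": (1, 10, 3),
}

def dist(x, y):
    """Euclidean distance between two expression profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Distance matrix of all pair-wise distances
names = list(genes)
pairs = {(a, b): round(dist(genes[a], genes[b]), 2)
         for i, a in enumerate(names) for b in names[i + 1:]}

print(pairs[("p53", "mdm2")])     # 2.45 (the slide rounds this to 2.5)
print(min(pairs, key=pairs.get))  # the closest pair is merged first
```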

Example

Construct a distance matrix of all pair-wise distances:

             p53   mdm2   bcl2   cyclinE   caspase 8
p53           0
mdm2          -      0    12.5     6.4       13.93
bcl2          -      -     0       6.48       1.41
cyclinE       -      -     -       0          7.35
caspase 8     -      -     -       -          0

The minimum distance (1.41) is between caspase 8 and bcl2. Take their average & re-calculate distances to the other genes.

197

                      p53   mdm2   cyclin E   {caspase-8 & bcl-2}
p53                    0     2.5      4.12           10.9
mdm2                         0        6.4             9.1
cyclin E                              0               6.9
{caspase-8 & bcl-2}                                   0

                      {p53 & mdm2}   cyclin E   {caspase-8 & bcl-2}
{p53 & mdm2}               0            3.7             9.2
cyclin E                                0               6.9

198

Example (contd)

The merging continues in the same way until all the genes form a single cluster:

199

Example of a gene expression

dendrogram

200

Variety of approaches to clustering

Clustering techniques

agglomerative -start with every element in its own

cluster, and iteratively join clusters together

divisive - start with one cluster and iteratively divide it

into smaller clusters

Distance Metrics

Euclidean (as-the-crow-flies)

Manhattan

Minkowski (a whole class of metrics)

Correlation (similarity in profiles: called similarity

metrics)

Linkage Rules

average: Use the mean distance between cluster

members

single: Use the minimum distance (gives loose clusters)

complete: Use the maximum distance (gives tight

clusters)

median: Use the median distance

centroid: Use the distance between the cluster averages (centroids) 201

Clustering Summary

The clusters & tree topology often depend highly on the distance measure and linkage method used. Different choices, such as Euclidean distance versus a correlation metric, can give quite different results. Note also that clustering always produces clusters, whether the data are organised in clusters or not!

202

What is Statistics?

Statistics is a way to get information from data:

  Data → Statistics → Information

Data: facts, especially numerical facts, collected together for reference or information.

Information: knowledge communicated concerning some particular fact.

1.203

Interval Data

Interval data are real numbers, e.g. heights, weights, prices, etc. They are also referred to as quantitative or numerical. Arithmetic operations can be performed on interval data, thus it is meaningful to talk about 2*Height, or Price + $1, and so on.

1.204

Nominal Data

Nominal Data

The values of nominal data are categories.

E.g. responses to questions about marital status,

coded as:

Single = 1, Married = 2, Divorced = 3, Widowed = 4

Because these codes are arbitrary, arithmetic operations don't make any sense (e.g. does Widowed ÷ 2 = Married?!). Nominal data are also called qualitative or categorical.

1.205

Ordinal Data

Ordinal Data appear to be categorical in nature, but their

values have an order; a ranking to them:

poor = 1, fair = 2, good = 3, very good = 4, excellent = 5

While arithmetic operations still don't make sense (e.g. does 2*fair = very good?!), we can say things like: excellent > poor or fair < very good

That is, order is maintained no matter what numeric

values are assigned to each category.

1.206

Graphical & Tabular Techniques for Nominal

Data

The only allowable calculation on nominal data is to count the frequency of each value of the variable. A table of the categories and their counts is called a frequency distribution. A relative frequency distribution lists the categories and the proportion with which each occurs.

1.207

Nominal Data (Tabular

Summary)

1.208

Nominal Data (Frequency)

1.209

Nominal Data

It is all the same information (based on the same data); just a different presentation.

1.210

Graphical Techniques for Interval

Data

There are several graphical methods that are used when the data are interval (i.e. numeric, non-categorical). The most important of these is the histogram. The histogram is not only a powerful graphical technique used to summarize interval data, but it is also used to help explain probabilities.

1.211

Building a Histogram

1) Collect the Data

2) Create a frequency distribution for

the data.

3) Draw the Histogram.

1.212

Histogram and Stem &

Leaf

1.213

Ogive

An ogive is a graph of a cumulative relative frequency distribution. To build one:

1) Calculate relative frequencies.

2) Calculate cumulative relative frequencies by adding the current class relative frequency to the previous class cumulative relative frequency. (For the first class, its cumulative relative frequency is just its relative frequency.)

1.214

Cumulative Relative

Frequencies

first class: .355
next class: .355 + .185 = .540
...
last class: .930 + .070 = 1.00

1.215
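The running-sum rule can be sketched as follows. Only the values .355, .185 and .070 appear in the slide; the middle class frequencies here are made up so that the whole list sums to 1:

```python
# Class relative frequencies (middle two values are hypothetical).
rel_freq = [0.355, 0.185, 0.160, 0.230, 0.070]

cum = []
total = 0.0
for f in rel_freq:
    total += f              # add current class to the running total
    cum.append(round(total, 3))

print(cum)  # first entry .355, second .54, last 1.0
```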

Ogive

The ogive can be used to answer questions like: what value is at the 50th percentile? Answer: around $35.

(Refer also to Fig. 2.13 in your textbook.)

1.216

Scatter Diagram

Example 2.9 A real estate agent wanted

to know to what extent the selling price

of a home is related to its size

2) Determine the independent variable (X = house size) and the dependent variable (Y = selling price)

3) Use Excel to create a scatter diagram

1.217

Scatter Diagram

It appears that in fact there is a

relationship, that is, the greater the

house size the greater the selling

price

1.218

Patterns of Scatter

Diagrams

Linearity and Direction are two

concepts we are interested in

Time Series Data

Observations measured at the same point in time are called cross-sectional data. Observations measured at successive points in time are called time-series data. Time-series data are graphed on a line chart, which plots the value of the variable on the vertical axis against the time periods on the horizontal axis.

1.220

Numerical Descriptive

Techniques

Measures of Central Location

Mean, Median, Mode

Measures of Variability

Range, Standard Deviation, Variance, Coefficient of

Variation

Percentiles, Quartiles

Covariance, Correlation, Least Squares Line

1.221

Measures of Central

Location

The arithmetic mean, a.k.a.

average, shortened to mean, is the

most popular & useful measure of

central location.

Mean = Sum of the observations / Number of observations

It is computed by simply adding up

all the observations and dividing by

the total number of observations:

1.222

Arithmetic Mean

Sample Mean

Population Mean

1.223

Statistics is a pattern

language

Population Sample

Size N n

Mean μ x̄

1.224

The Arithmetic Mean

is appropriate for describing

measurement data, e.g. heights of

people, marks of student papers, etc.

The mean is seriously affected by extreme values,
called outliers. E.g. as soon as a

billionaire moves into a neighborhood,

the average household income increases

beyond what it was previously!

1.225

Measures of Variability

Measures of central location fail to

tell the whole story about the

distribution; that is, how much are

the observations spread out around

the mean value?

For example, two sets of class grades
are shown. The mean (=50) is the
same in each case.

But, the red class has greater

variability than the blue class.

1.226

Range

The range is the simplest measure of variability,

calculated as:

E.g.

Data: {4, 4, 4, 4, 50} Range = 46

Data: {4, 8, 15, 24, 39, 50} Range = 46

The range is the same in both cases,

but the data sets have very different distributions

1.227

Statistics is a pattern

language

Population Sample

Size N n

Mean μ x̄

Variance σ² s²

1.228

The variance of a population (with mean μ and
population size N) is:

σ² = Σ(xᵢ − μ)² / N

The variance of a sample (with mean x̄) is:

s² = Σ(xᵢ − x̄)² / (n − 1)

Note! the denominator is sample size (n) minus one!

1.229

Application

Example 4.7. The following sample consists of the

number of jobs six randomly selected students

applied for: 17, 15, 23, 7, 9, 13.

Find its mean and variance.

(We use s and s² for the sample, as opposed to σ or σ²
for the population.)

1.230

Sample

Sample Mean

Mean & Variance

Sample Variance

1.231
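Example 4.7 can be checked in a few lines of Python; note that `statistics.variance` uses the n − 1 denominator, matching the sample variance formula above:

```python
import statistics

jobs = [17, 15, 23, 7, 9, 13]    # number of jobs each student applied for

mean = statistics.mean(jobs)      # (17+15+23+7+9+13)/6 = 14
var = statistics.variance(jobs)   # sum of squared deviations / (n - 1) = 166/5
sd = statistics.stdev(jobs)       # square root of the sample variance

print(mean, var, round(sd, 2))    # 14 33.2 5.76
```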

Standard Deviation

The standard deviation is simply the

square root of the variance, thus:

1.232

Standard Deviation

Consider Example 4.8 where a golf

club manufacturer has designed a

new club and wants to determine if it

is hit more consistently (i.e. with less

variability) than with an old club.

Using Tools > Data Analysis >
Descriptive Statistics in Excel [you may need to
add in the Analysis ToolPak], we produce
descriptive statistics tables for distance with
each club.

Interpretation: the smaller standard deviation means
you get more consistent distance with the new club.

1.233

The Empirical Rule: If the histogram is bell shaped,

Approximately 68% of all observations fall
within one standard deviation of the mean.

Approximately 95% of all observations fall
within two standard deviations of the mean.

Approximately 99.7% of all observations fall
within three standard deviations of the mean.

1.234

Chebysheff's Theorem (not often used because the interval
is very wide)

A more general interpretation of the
standard deviation is derived from

Chebysheff's Theorem, which applies to

all shapes of histograms (not just bell

shaped).

The proportion of observations in any
sample that lie within k standard deviations
of the mean is at least 1 − 1/k².

For k=2 (say), the theorem states that at
least 3/4 of all observations lie within 2
standard deviations of the mean.

This is a lower bound,
compared to the Empirical Rule's
approximation (95%).

1.235
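The Chebysheff lower bound 1 − 1/k² is simple to tabulate; a minimal sketch:

```python
def chebyshev_bound(k: float) -> float:
    """Minimum proportion of observations within k standard
    deviations of the mean (Chebysheff's Theorem, for k > 1)."""
    return 1 - 1 / k**2

for k in (2, 3, 4):
    print(k, chebyshev_bound(k))  # 2 -> 0.75, 3 -> ~0.889, 4 -> 0.9375
```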

Box Plots

These box plots are

based on data in

Xm04-15.

Wendy's service

time is shortest and

least variable.

Another chain's service
times show the
greatest variability,

while Jack-in-the-Box

has the longest

service times.

1.236

Methods of Collecting

Data

There are many methods used to

collect or obtain data for statistical

analysis. Three of the most popular

methods are:

Direct Observation

Experiments, and

Surveys.

1.237

Sampling

Recall that statistical inference permits us to draw

conclusions about a population based on a sample.

population) is often done for reasons of cost (it's less

expensive to sample 1,000 television viewers than

100 million TV viewers) and practicality (e.g.

performing a crash test on every automobile

produced is impractical).

In any case, the sampled population and the
target population should be similar to one another.

1.238

Sampling Plans

A sampling plan is just a method or

procedure for specifying how a sample will be

taken from a population.

We will focus our attention on these
methods:

Simple Random Sampling,

Stratified Random Sampling, and

Cluster Sampling.

1.239

Simple Random Sampling

A simple random sample is a sample

selected in such a way that every possible

sample of the same size is equally likely to

be chosen.

Drawing three names from a hat containing
all the names of the students in the class is

an example of a simple random sample: any

group of three names is as equally likely as

picking any other group of three names.

1.240

Stratified Random

Sampling

After the population has been

stratified, we can use simple

random sampling to generate the

complete sample:

we would draw 100 of them from the low income group

50 of them from the high income group.

1.241

Cluster Sampling

A cluster sample is a simple random sample of

groups or clusters of elements (vs. a simple

random sample of individual objects).

Cluster sampling is useful when it is difficult or costly
to develop a complete list of the population

members or when the population elements are

widely dispersed geographically.

However, sampling error may increase
due to similarities among cluster members.

1.242

Sampling Error

Sampling error refers to differences between the

sample and the population that exist only because of the

observations that happened to be selected for the

sample.

The difference in sample means calculated
for different samples (of the same size) is due to

sampling error:

happened to get the highest income level data points in

our first sample and all the lowest income levels in the

second, this delta is due to sampling error.

1.243

Nonsampling Error

Nonsampling errors are more serious and are

due to mistakes made in the acquisition of data or

due to the sample observations being selected

improperly. Three types of nonsampling errors:

Errors in data acquisition,

Nonresponse errors, and

Selection bias.

Note: increasing the sample size will not reduce
this type of error.

1.244

Approaches to Assigning

Probabilities

There are three ways to assign a probability, P(Oi),
to an outcome, Oi, namely:

Classical approach: based on assumptions
(such as equally likely, independence) about the
situation.

Relative frequency approach:
based on experimentation or historical data.

Subjective approach:
based on the assignor's judgment.

1.245

Interpreting Probability

One way to interpret probability is this:

If a random experiment is repeated an infinite
number of times, the relative frequency for any

given outcome is the probability of this outcome.

E.g., the probability of heads in the flip of a
balanced coin is .5, determined using the classical

approach. The probability is interpreted as being

the long-term relative frequency of heads if the

coin is flipped an infinite number of times.

1.246

Conditional Probability

Conditional probability is used to

determine how two events are

related; that is, we can determine

the probability of one event given

the occurrence of another related

event.

Conditional probability is denoted
as P(A | B), read as "the
probability of A given B", and is
calculated as P(A | B) = P(A and B) / P(B).

1.247

Independence

One of the objectives of calculating conditional

probability is to determine whether two events are

related.

Two events A and B are said to be
independent if the probability of one event
is not affected by the occurrence of the other event.

P(A|B) = P(A)

or

P(B|A) = P(B)

1.248

Complement Rule

The complement of an event A is the event that occurs

when A does not occur.

The complement rule gives the probability of an
event NOT occurring. That is:

P(Aᶜ) = 1 − P(A)

For example, in the roll of a die, the probability
of the number 1 being rolled is 1/6. The probability

that some number other than 1 will be rolled is
1 − 1/6 = 5/6.

1.249
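The die example can be verified with exact fractions:

```python
from fractions import Fraction

p_one = Fraction(1, 6)    # P(rolling a 1)
p_not_one = 1 - p_one     # complement rule: P(A^C) = 1 - P(A)

print(p_not_one)          # 5/6
```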

Multiplication Rule

The multiplication rule is used to

calculate the joint probability of

two events. It is based on the

formula for conditional probability

defined earlier:

If we multiply both sides of the equation by P(B) we have:

P(A and B) = P(A | B) P(B)

1.250

Addition Rule

Recall: the addition rule was introduced

earlier to provide a way to compute the

probability of event A or B or both A and B

occurring; i.e. the union of A and B.

It subtracts the joint probability P(A and B)
from the sum of the probabilities of A and B:

P(A or B) = P(A) + P(B) − P(A and B)

1.251

Addition Rule for Mutually Excusive

Events

If A and B are mutually exclusive, the occurrence of

one event makes the other one impossible. This means

that

P(A and B) = 0

probabilities calculated from a probability tree

1.252

Two Types of Random

Variables

Discrete Random Variable

one that takes on a countable number of values

E.g. values on the roll of dice: 2, 3, 4, …, 12

Continuous Random Variable

one whose values are not discrete, not countable

E.g. time (30.1 minutes? 30.10000001 minutes?)

Analogy:

Integers are Discrete, while Real Numbers are

Continuous

1.253

Laws of Expected Value

1. E(c) = c

The expected value of a constant (c) is just

the value of the constant.

2. E(X + c) = E(X) + c

3. E(cX) = cE(X)

We can pull a constant out of the

expected value expression (either as part of

a sum with a random variable X or as a

coefficient of random variable X).

1.254

Laws of Variance

1. V(c) = 0

The variance of a constant (c) is zero.

2. V(X + c) = V(X)

The variance of a random variable and a constant is

just the variance of the random variable (per 1 above).

3. V(cX) = c2V(X)

The variance of a random variable and a constant

coefficient is the coefficient squared times the variance

of the random variable.

1.255
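Both sets of laws can be verified numerically on a small discrete distribution (the distribution itself is made up for illustration):

```python
# A hypothetical discrete distribution: values and their probabilities.
xs = [0, 1, 2]
ps = [0.2, 0.5, 0.3]

def E(vals):
    # Expected value of a (possibly transformed) random variable.
    return sum(p * v for p, v in zip(ps, vals))

def V(vals):
    # Variance: E[(Y - E(Y))^2].
    m = E(vals)
    return sum(p * (v - m) ** 2 for p, v in zip(ps, vals))

c = 4
assert abs(E([c * x for x in xs]) - c * E(xs)) < 1e-12       # E(cX) = cE(X)
assert abs(V([c * x for x in xs]) - c**2 * V(xs)) < 1e-12    # V(cX) = c^2 V(X)
assert abs(E([x + c for x in xs]) - (E(xs) + c)) < 1e-12     # E(X+c) = E(X)+c
assert abs(V([x + c for x in xs]) - V(xs)) < 1e-12           # V(X+c) = V(X)
print("laws verified")
```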

Binomial Distribution

The binomial distribution is the probability

distribution that results from doing a binomial

experiment. Binomial experiments have the

following properties:

1. The binomial experiment consists of a fixed
number of trials, n.

2. Each trial has two possible outcomes, a success

and a failure.

3. P(success) = p (and thus: P(failure) = 1 − p), for all trials.

4. The trials are independent, which means that the

outcome of one trial does not affect the outcomes of

any other trials.

1.256

Binomial Random Variable

The binomial random variable

counts the number of successes in n

trials of the binomial experiment. It

can take on values from 0, 1, 2, …, n.

Thus, its a discrete random variable.

To calculate the probability
associated with each value x = 0, 1, 2, …, n
we use combinatorics:

P(X = x) = [n! / (x!(n − x)!)] p^x (1 − p)^(n−x)

1.257

Binomial Table

What is the probability that Pat fails

the quiz?

i.e. what is P(X ≤ 4), given

P(success) = .20 and n=10 ?

P(X ≤ 4) = .967

1.258

Binomial Table

What is the probability that Pat gets

two answers correct?

i.e. what is P(X = 2), given

P(success) = .20 and n=10 ?

P(X = 2) = P(X ≤ 2) − P(X ≤ 1)
(remember, the table shows cumulative probabilities)

1.259

=BINOMDIST() Excel

Function

There is a binomial distribution

function in Excel that can also be

used to calculate these probabilities.

# successes

P(success)

two answers correct?

cumulative

(i.e. P(X ≤ x)?)

P(X=2)=.3020

1.260

=BINOMDIST() Excel

Function

There is a binomial distribution

function in Excel that can also be

used to calculate these probabilities.

# successes

P(success)

the quiz?

cumulative

(i.e. P(X ≤ x)?)

P(X ≤ 4) = .9672

1.261
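Pat's quiz probabilities can be reproduced without a table or Excel, using the binomial formula directly (`math.comb` is the n-choose-x term):

```python
import math

def binom_pmf(x, n, p):
    """P(X = x) for a binomial random variable."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    """P(X <= x): cumulative sum of the pmf."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

n, p = 10, 0.20                       # 10 questions, P(success) = .20
print(round(binom_pmf(2, n, p), 4))   # P(X = 2)  -> 0.302
print(round(binom_cdf(4, n, p), 4))   # P(X <= 4) -> 0.9672
```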

Binomial Distribution

As you might expect, statisticians

have developed general formulas for

the mean, variance, and standard

deviation of a binomial random

variable. They are:

μ = np,  σ² = np(1 − p),  σ = √(np(1 − p))

1.262

Poisson Distribution

Named for Simeon Poisson, the Poisson

distribution is a discrete probability distribution

and refers to the number of events (a.k.a.

successes) within a specific time period or region

of space. For example:

The number of cars arriving at a service station in 1

hour. (The interval of time is 1 hour.)

The number of flaws in a bolt of cloth. (The specific

region is a bolt of cloth.)

The number of accidents in 1 day on a particular

stretch of highway. (The interval is defined by both time,

1 day, and space, the particular stretch of highway.)

1.263

The Poisson Experiment

Like a binomial experiment, a Poisson experiment

has four defining characteristic properties:

1. The number of successes that occur in any interval is

independent of the number of successes that occur

in any other interval.

2. The probability of a success in an interval is the

same for all equal-size intervals

3. The probability of a success is proportional to the

size of the interval.

4. The probability of more than one success in an

interval approaches 0 as the interval becomes

smaller.

1.264

Poisson Distribution

The Poisson random variable is the number

of successes that occur in a period of time or

an interval of space in a Poisson experiment.

E.g. the number of cars (successes) crossing
an intersection every hour (time period).

E.g. the number of typos (successes) in a new
textbook edition averages 1.5 per 100 pages (interval).

1.265

Poisson Probability

Distribution

The probability that a Poisson random

variable assumes a value of x is given by:

P(X = x) = e^(−μ) μ^x / x!   for x = 0, 1, 2, …

FYI: μ is the mean number of successes in the
interval, and e ≈ 2.71828.

1.266

Example 7.12

The number of typographical errors in new

editions of textbooks varies considerably

from book to book. After some analysis an
instructor concludes that the number of errors is

Poisson distributed with a mean of 1.5 per

100 pages. The instructor randomly selects

100 pages of a new book. What is the

probability that there are no typos?

P(X = 0) = e^(−1.5)(1.5)^0 / 0! = .2231

There is about a 22% chance of finding zero errors

1.267

Poisson Distribution

As mentioned on the Poisson experiment slide:

the probability of a success is
proportional to the size of the interval.

Since the mean is 1.5 typos per
100 pages, we can determine a mean value for

a 400 page book as:

μ = 1.5(4) = 6 typos per 400 pages

1.268

Example 7.13

For a 400 page book, what is the

probability that there are

no typos?

P(X=0) = e^(−6)(6)^0 / 0! = .0025

there is a very small chance there are no typos

1.269
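Examples 7.12 and 7.13 follow directly from the Poisson formula P(X = x) = e^(−μ) μ^x / x!:

```python
import math

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return math.exp(-mu) * mu**x / math.factorial(x)

print(round(poisson_pmf(0, 1.5), 4))  # 100 pages, mu = 1.5 -> 0.2231
print(round(poisson_pmf(0, 6.0), 4))  # 400 pages, mu = 6   -> 0.0025
```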

Example 7.13

Excel is an even better alternative:

1.270

Probability Density

Functions

Unlike a discrete random variable which

we studied in Chapter 7, a continuous

random variable is one that can assume

an uncountable number of values.

We cannot list the possible values

because there is an infinite number of

them.

Because there is an infinite number of

values, the probability of each individual

value is virtually 0.

1.271

Point Probabilities are Zero

Because there is an infinite number of values, the

probability of each individual value is virtually 0.

Thus, we can determine the probability of a range
of values only.

E.g. with a discrete random variable like the roll
of a die, it is
meaningful to talk about P(X=5), say.

In a continuous setting (e.g. with time as a random variable), the

probability the random variable of interest, say task length, takes

exactly 5 minutes is infinitesimally small, hence P(X=5) = 0.

It is meaningful to talk about P(X 5).

1.272

Probability Density

Function

A function f(x) is called a probability density

function (over the range a ≤ x ≤ b) if it meets

the following requirements:

1) f(x) ≥ 0 for all x between a and b

2) The total area under the curve between a and b is
1.0

[figure: f(x) over a ≤ x ≤ b, with area = 1]

1.273

The Normal Distribution

The normal distribution is the most important of

all probability distributions. The probability density

function of a normal random variable is given by:

Bell shaped,

Symmetrical around the mean

1.274

The Normal Distribution

Important things to note:

The normal distribution is fully defined by two parameters:

its standard deviation and mean

The normal distribution is bell shaped and
symmetrical about the mean

Normal distributions range from minus infinity to plus infinity

1.275

Standard Normal

Distribution

A normal distribution whose mean is zero and standard
deviation is one (μ = 0, σ = 1) is called the standard normal
distribution.

Any normal distribution can be
converted to a standard normal distribution with

simple algebra. This makes calculations much easier.

1.276

Calculating Normal

Probabilities

We can use the following function to

convert any normal random variable

to a standard normal random

variable:

Z = (X − μ) / σ

Some advice:

always draw a

picture!

1.277

Calculating Normal

Probabilities

Example: The time required to build a computer is

normally distributed with a mean of 50 minutes

and a standard deviation of 10 minutes:

assembled in a time between 45 and 60 minutes?

1.278

Calculating Normal

Probabilities

mean of 50 minutes and a

standard deviation of 10 minutes

P(45 < X < 60) ?

1.279

Calculating Normal

Probabilities

We can use Table 3 in

Appendix B to look-up

probabilities P(0 < Z < z)

P(45 < X < 60) = P(−.5 < Z < 1)
= P(−.5 < Z < 0) + P(0 < Z < 1)

By symmetry: P(−.5 < Z < 0) = P(0 < Z < .5)

Hence: P(−.5 < Z < 1) = P(0 < Z < .5) + P(0 < Z < 1)
= .1915 + .3413 = .5328

1.280

Calculating Normal

Probabilities

How to use Table 3

This table gives probabilities P(0 < Z < z)

First column = integer + first decimal

Top row = second decimal place

1.281

Using the Normal Table (

Table 3)

What is P(Z > 1.6)?

P(0 < Z < 1.6) = .4452

P(Z > 1.6) = .5 − .4452

= .0548

1.282

Using the Normal Table (

Table 3)

What is P(Z < −2.23)?

By symmetry, P(Z < −2.23) = P(Z > 2.23)

= .5 − P(0 < Z < 2.23) = .5 − .4871

= .0129

1.283

Using the Normal Table (

Table 3)

What is P(Z < 1.52)?

P(Z < 1.52) = P(Z < 0) + P(0 < Z < 1.52)

= .5 + .4357

= .9357

1.284

Using the Normal Table (

Table 3)

What is P(0.9 < Z < 1.9)?

P(0.9 < Z < 1.9) = P(0 < Z < 1.9) − P(0 < Z < 0.9)

= .4713 − .3159

= .1554

1.285
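All of the Table 3 lookups above can be reproduced with `statistics.NormalDist` from the standard library (the cdf gives P(Z < z) directly, so no symmetry tricks are needed):

```python
from statistics import NormalDist

Z = NormalDist()                          # standard normal: mean 0, sd 1

print(round(1 - Z.cdf(1.6), 4))           # P(Z > 1.6)   -> 0.0548
print(round(Z.cdf(-2.23), 4))             # P(Z < -2.23) -> 0.0129
print(round(Z.cdf(1.52), 4))              # P(Z < 1.52)  -> 0.9357
print(round(Z.cdf(1.9) - Z.cdf(0.9), 4))  # P(0.9 < Z < 1.9) -> 0.1553
                                          # (table gives .1554 from rounded entries)

# The computer-assembly example: X ~ N(50, 10)
X = NormalDist(mu=50, sigma=10)
print(round(X.cdf(60) - X.cdf(45), 4))    # P(45 < X < 60) -> 0.5328
```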

Finding Values of Z

Z.05 = 1.645

Z.01 = 2.33

1.286

Using the values of Z

Since P(Z > 1.96) = .025 and P(Z < −1.96) = .025,
it follows that we can state P(−1.96 < Z < 1.96) = .95.

Similarly,

P(−1.645 < Z < 1.645) = .90

1.287

Other Continuous

Distributions

Three other important continuous

distributions which will be used

extensively in later sections are

introduced here:

Student t Distribution,

Chi-Squared Distribution, and

F Distribution.

1.288

Student t Distribution

Here the letter t is used to represent the random

variable, hence the name. The density function

for the Student t distribution is as follows

where Γ (the Gamma function) satisfies
Γ(k) = (k−1)(k−2)⋯(2)(1) = (k−1)! for integer k

1.289

Student t Distribution

In much the same way that μ and σ define the normal

distribution, ν, the degrees of freedom, defines the

Student t Distribution:

Figure 8.24

As the number of degrees of freedom increases, the t

distribution approaches the standard normal distribution.

1.290

Determining Student t

Values

The student t distribution is used extensively in

statistical inference. Table 4 in Appendix B lists values of
t_A for various
degrees of freedom such that the area to the right
of t_A equals A.

These are the
critical values, typically in the

10%, 5%, 2.5%, 1% and 1/2% range.

1.291

Using the t table (Table 4) for

values

For example, if we want the value of
t with 10 degrees of freedom such
that the area under the Student t
curve to its right is .05:

t.05,10 = 1.812

(Area under the curve value (tA): COLUMN;
degrees of freedom: ROW)

1.292

F Distribution

The F density function is given by:

Like we've already seen, ν₁ and ν₂ are again degrees

of freedom:

ν₁ is the numerator degrees of freedom and

ν₂ is the denominator degrees of freedom.

1.293

Determining Values of F

For example, what is the value of F

for 5% of the area under the right

hand tail of the curve, with a

numerator degree of freedom of 3

and a denominator degree of

freedom of 7?

F.05,3,7 = 4.35

Solution: use the F look-up (Table 6)

Denominator Degrees of Freedom : ROW

Numerator Degrees of Freedom : COLUMN

(There are different tables
for different values of A.
Make sure you start with
the correct table!!)

1.294

Determining Values of F

For areas under the curve on the left

hand side of the curve, we can

leverage the following relationship:

F(1−A, ν₁, ν₂) = 1 / F(A, ν₂, ν₁)

1.295

Chapter 9

Sampling Distributions

1.296

Sampling Distribution of the

Mean

A fair die is thrown infinitely many times,

with the random variable X = # of spots on

any throw.

x 1 2 3 4 5 6

The probability distribution of X is:

P(x) 1/6 1/6 1/6 1/6 1/6 1/6

The mean (μ = 3.5) and variance (σ² ≈ 2.92)
of X can be calculated as well:

1.297

Sampling Distribution of Two

Dice

A sampling distribution is created by looking at

all samples of size n=2 (i.e. two dice) and their means

only 11 values for x̄, and some (e.g. x̄ = 3.5) occur more

frequently than others (e.g. x̄ = 1).

1.298

Sampling Distribution of Two Dice

The sampling
distribution of x̄ is
shown below:

x̄     P(x̄)
1.0   1/36
1.5   2/36
2.0   3/36
2.5   4/36
3.0   5/36
3.5   6/36
4.0   5/36
4.5   4/36
5.0   3/36
5.5   2/36
6.0   1/36

1.299

Compare

Compare the distribution of X
(x = 1, 2, 3, 4, 5, 6) with the sampling
distribution of x̄
(x̄ = 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0).

1.300
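The sampling distribution of the mean of two dice can be enumerated exhaustively, confirming the 1/36 … 6/36 … 1/36 pattern:

```python
from itertools import product
from collections import Counter

# All 36 equally likely ordered pairs from two fair dice.
means = [(a + b) / 2 for a, b in product(range(1, 7), repeat=2)]
counts = Counter(means)

for m in sorted(counts):
    print(m, f"{counts[m]}/36")   # e.g. 3.5 occurs 6/36 times, 1.0 only 1/36
```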

Central Limit Theorem

The sampling distribution of the

mean of a random sample drawn

from any population is

approximately normal for a

sufficiently large sample size.

The larger the sample size, the more
closely the sampling distribution of x̄

will resemble a normal distribution.

1.301

Central Limit Theorem

If the population is normal, then x̄ is normally

distributed for all values of n.

If the population is nonnormal, then x̄ is
approximately normal only for larger values of n.

In many practical situations, a sample size of 30
may be sufficiently large to allow us to use the

normal distribution as an approximation for the

sampling distribution of x̄.

1.302

Sampling Distribution of the Sample

Mean

1. μ_x̄ = μ

2. σ²_x̄ = σ²/n

3. If X is normal, x̄ is normal. If X is

nonnormal, x̄ is approximately normal for

sufficiently large sample sizes.

Note: the definition of sufficiently large

depends on the extent of nonnormality of x

(e.g. heavily skewed; multimodal)

1.303

Example 9.1(a)

The foreman of a bottling plant has

observed that the amount of soda in each

32-ounce bottle is actually a normally

distributed random variable, with a mean

of 32.2 ounces and a standard deviation of

.3 ounce.

If a customer buys one bottle, what is the
probability that the bottle will contain

more than 32 ounces?

1.304

Example 9.1(a)

We want to find P(X > 32), where X is

normally distributed with μ = 32.2

and σ = .3

P(X > 32) = P(Z > (32 − 32.2)/.3) = P(Z > −.67) = .7486

There is about a 75% chance
that a single bottle of soda
contains more than 32oz.

1.305

Example 9.1(b)

The foreman of a bottling plant has observed

that the amount of soda in each 32-ounce

bottle is actually a normally distributed

random variable, with a mean of 32.2 ounces

and a standard deviation of .3 ounce.

If a customer buys a carton of four bottles,
what is the probability that the mean

amount of the four bottles will be greater

than 32 ounces?

1.306

Example 9.1(b)

We want to find P(x̄ > 32), where X is normally

distributed

with μ = 32.2 and σ = .3

Things we know:

1) X is normally distributed, therefore so will x̄.

2) μ_x̄ = μ = 32.2 oz.

3) σ_x̄ = σ/√n = .3/√4 = .15 oz.

1.307

Example 9.1(b)

If a customer buys a carton of four bottles,

what is the probability that the mean

amount of the four bottles will be greater

than 32 ounces?

P(x̄ > 32) = P(Z > (32 − 32.2)/.15) = P(Z > −1.33) = .9082

There is about a 91% chance that the mean
of the four bottles will exceed 32oz.

1.308

Graphically Speaking

mean = 32.2

What is the probability that one
bottle will contain more than 32
ounces?

What is the probability that the
mean of four bottles will exceed 32
oz?

1.309
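Example 9.1 in code — the only change between (a) and (b) is replacing σ with the standard error σ/√n:

```python
import math
from statistics import NormalDist

mu, sigma = 32.2, 0.3

# (a) One bottle: X ~ N(32.2, 0.3)
p_one = 1 - NormalDist(mu, sigma).cdf(32)
print(round(p_one, 3))    # about 0.748 (table, with z = -.67: .7486)

# (b) Mean of a carton of n = 4 bottles: sd of x-bar is sigma / sqrt(n)
n = 4
p_mean = 1 - NormalDist(mu, sigma / math.sqrt(n)).cdf(32)
print(round(p_mean, 3))   # about 0.909 (table, with z = -1.33: .9082)
```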

Sampling Distribution: Difference

of two means

The final sampling distribution introduced is that of the

difference between two sample means. This

requires independent random samples
of two normal populations.

In that case,
the difference between the two sample means, i.e.
x̄₁ − x̄₂,

will be normally distributed.

(note: if the two populations are not both normally

distributed, but the sample sizes are large (>30), the

distribution of is approximately normal)

1.310

Sampling Distribution: Difference

of two means

The expected value and variance of the

sampling distribution of x̄₁ − x̄₂ are given by:

mean: μ₁ − μ₂

standard deviation: √(σ₁²/n₁ + σ₂²/n₂)

(also called the standard error of the difference
between two means)

1.311

Estimation

There are two types of inference: estimation

and hypothesis testing; estimation is

introduced first.

The objective of estimation is to determine
the approximate value of a population

parameter on the basis of a sample statistic.

E.g., the sample mean (x̄) is employed to
estimate the population mean (μ).

1.312

Estimation

The objective of estimation is to determine

the approximate value of a population

parameter on the basis of a sample statistic.

Point Estimator

Interval Estimator

1.313

Point & Interval Estimation

For example, suppose we want to estimate the mean

summer income of a class of business students. For

n=25 students, x̄

is calculated to be 400 $/week.

Point estimate: the mean income is 400 $/week.

Interval estimate: the mean income is between 380 and 420 $/week.

1.314

Estimating μ when σ is

known

the confidence

interval

We established in Chapter 9 that Z = (x̄ − μ)/(σ/√n)
is standard normal, so x̄ is in the center of
the interval

x̄ − z_{α/2} σ/√n  to  x̄ + z_{α/2} σ/√n

Thus, the probability that the interval contains μ
is 1 − α, and the interval is
a confidence interval estimator for μ.

1.315

Four commonly used confidence

levels

Confidence Level  1 − α   α     α/2    z_{α/2}
90%               .90     .10   .05    1.645
95%               .95     .05   .025   1.96
98%               .98     .02   .01    2.33
99%               .99     .01   .005   2.575

cut & keep handy!

Table 10.1

1.316

Example 10.1

A computer company samples demand during
lead time over 25 time periods:

235 374 309 499 253

421 361 514 462 369

394 439 348 344 330

261 374 302 466 535

386 316 296 332 334

It is known that the standard deviation of
demand over lead time is 75 computers. We

want to estimate the mean demand over lead

time with 95% confidence in order to set

inventory levels

1.317

CALCULATE

Example 10.1

In order to use our confidence interval estimator, we need
the following pieces of data:

x̄ = 370.16 (calculated from the data)

z_{α/2} = 1.96 (given: 95% confidence)

σ = 75 (given)

n = 25 (given)

therefore:

x̄ ± z_{α/2} σ/√n = 370.16 ± 1.96(75/√25) = 370.16 ± 29.40

The lower and upper confidence limits are 340.76 and
399.56.

1.318

INTERPRET

Example 10.1

The estimation for the mean demand during lead

time lies between 340.76 and 399.56 we can use

this as input in developing an inventory policy.

lead time falls between 340.76 and 399.56, and this

type of estimator is correct 95% of the time. That

also means that 5% of the time the estimator will be

incorrect.

as 19 times out of 20, which emphasizes the

long-run aspect of the confidence level.

1.319
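The interval arithmetic for Example 10.1:

```python
import math

xbar, sigma, n = 370.16, 75, 25
z = 1.96                               # z_{alpha/2} for 95% confidence

half_width = z * sigma / math.sqrt(n)  # 1.96 * 75 / 5 = 29.4
lower, upper = xbar - half_width, xbar + half_width

print(round(lower, 2), round(upper, 2))  # 340.76 399.56
```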

Interval Width

A wide interval provides little information.

For example, suppose we estimate with 95%

confidence that an accountants average starting

salary is between $15,000 and $100,000.

estimate of starting salaries between $42,000 and

$45,000.

accounting students more precise information about

starting salaries.

1.320

Interval Width

The width of the confidence interval

estimate is a function of the

confidence level, the population

standard deviation, and the sample

size

1.321

Selecting the Sample Size

We can control the width of the interval by

determining the sample size necessary to produce

narrow intervals.

Suppose we want to estimate the mean demand
to within 5 units; i.e. we want the interval

estimate to be: x̄ ± 5

Since the interval estimate is x̄ ± z_{α/2} σ/√n,

it follows that z_{α/2} σ/√n = 5.

Solve for n to get the requisite sample size!

1.322

Selecting the Sample Size

Solving the equation
n = (z_{α/2} σ / 5)² = (1.96 × 75 / 5)² = 864.36,
we find that to obtain the desired
interval estimate of the mean (5

units), we need to sample 865 lead

time periods (vs. the 25 data points

we have currently).

1.323

Sample Size to Estimate a

Mean

The general formula for the sample

size needed to estimate a population

mean with an interval estimate of x̄ ± W is
this large:

n = (z_{α/2} σ / W)²

1.324

Example 10.2

A lumber company must estimate the

mean diameter of trees to determine

whether or not there is sufficient lumber to

harvest an area of forest. They need to

estimate this to within 1 inch at a

confidence level of 99%. The tree

diameters are normally distributed with a

standard deviation of 6 inches.

1.325

Example 10.2

Things we know:

Confidence level 99%, so z_{α/2} = z.005 = 2.575.
We want to estimate to within W = 1 inch.
We are given that σ = 6.

1.326

Example 10.2

We compute

n = (z_{α/2} σ / W)² = (2.575 × 6 / 1)² = 238.7

That is, we will need to sample at

least 239 trees to have a

99% confidence interval of x̄ ± 1.

1.327
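Both sample-size calculations use the same formula n = (z_{α/2} σ / W)², rounded up to the next integer:

```python
import math

def sample_size(z, sigma, w):
    """Smallest n with z * sigma / sqrt(n) <= w (half-width w)."""
    return math.ceil((z * sigma / w) ** 2)

print(sample_size(1.96, 75, 5))    # lead-time demand: 865
print(sample_size(2.575, 6, 1))    # tree diameters:   239
```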

Nonstatistical Hypothesis Testing

A criminal trial is an example of hypothesis testing
without the statistics.

In a trial a jury must decide between two hypotheses.

The null hypothesis is

H0: The defendant is innocent

H1: The defendant is guilty

The jury must make a decision on the basis of evidence

presented.

1.328

Nonstatistical Hypothesis Testing

A Type I error occurs when we reject

a true null hypothesis. That is, a Type

I error occurs when the jury convicts

an innocent person.

A Type II error occurs when we don't
reject a false null hypothesis. That

occurs when a guilty defendant is

acquitted. 1.329

Nonstatistical Hypothesis Testing

The probability of a Type I error is
denoted as α (Greek letter alpha).

The probability of a Type II error is
β (Greek letter beta).

The two error probabilities are inversely
related. Decreasing one increases

the other.

1.330

Nonstatistical Hypothesis Testing

1. There are two hypotheses, the null and the alternative

hypotheses.

2. The procedure begins with the assumption that the

null hypothesis is true.

3. The goal is to determine whether there is enough

evidence to infer that the alternative hypothesis is true.

4. There are two possible decisions:

Conclude that there is enough evidence to support the

alternative hypothesis.

Conclude that there is not enough evidence to support

the alternative hypothesis.

1.331

Nonstatistical Hypothesis Testing

Type I error: Reject a true null

hypothesis

Type II error: Do not reject a false

null hypothesis.

P(Type I error) = α

P(Type II error) = β

1.332

Concepts of Hypothesis Testing (1)

There are two hypotheses: one is called the null
hypothesis and the other the alternative or research

hypothesis. The usual notation is:

H0 — the null hypothesis (pronounced
"H-nought"), and

H1 — the alternative hypothesis.

The null hypothesis (H0) will always state that the
parameter equals the value specified in the

alternative hypothesis (H1)

1.333

Concepts of Hypothesis

Testing

Consider Example 10.1 (mean demand for

computers during assembly lead time) again.

Rather than estimate the mean demand, our

operations manager wants to know whether the

mean is different from 350 units. We can

rephrase this request into a test of the hypothesis:

H0: μ = 350

This becomes:

H1: μ ≠ 350

(what we are interested in determining)

1.334

Concepts of Hypothesis Testing (4)

There are two possible decisions:

Conclude that there is enough evidence to support
the alternative hypothesis

(also stated as: rejecting the null hypothesis in favor of

the alternative)

Conclude that there is not enough evidence to
support the alternative hypothesis

(also stated as: not rejecting the null hypothesis in favor

of the alternative)

NOTE: we do not say that we accept the null

hypothesis

1.335

Concepts of Hypothesis

Testing

Once the null and alternative hypotheses are stated, the

next step is to randomly sample the population and

calculate a test statistic (in this example, the sample

mean).

If the test statistic's value is inconsistent with the null
hypothesis, we reject the null hypothesis and infer

that the alternative hypothesis is true.

For example, if we're trying to decide whether the mean is

not equal to 350, a large value of x̄ (say, 600) would

provide enough evidence. If x̄ is close to 350 (say, 355) we

could not say that this provides a great deal of evidence to

infer that the population mean is different than 350.

1.336

Types of Errors

A Type I error occurs when we reject a true null

hypothesis (i.e. Reject H0 when it is TRUE)

                  H0 is True        H0 is False
Reject H0         Type I error      correct decision
Do not reject H0  correct decision  Type II error

A Type II error occurs when we don't reject a false
null hypothesis (i.e. Do NOT reject H0 when it is FALSE)

1.337

Recap I

1) Two hypotheses: H0 & H1

2) ASSUME H0 is TRUE

3) GOAL: determine if there is enough

evidence to infer that H1 is TRUE

4) Two possible decisions:

Reject H0 in favor of H1

NOT Reject H0 in favor of H1

5) Two possible types of errors:

Type I: reject a true H0 [P(Type I) = α]

Type II: not reject a false H0 [P(Type II) = β]

1.338

Example 11.1

A department store manager determines that a

new billing system will be cost-effective only if

the mean monthly account is more than $170.

A random sample of 400 monthly accounts is
drawn, for which the sample mean is $178. The

accounts are approximately normally distributed

with a standard deviation of $65.

Can we conclude that the new system will
be cost-effective?

1.339

Example 11.1

The system will be cost effective if the mean account

balance for all customers is greater than $170.

We want to determine whether the mean is greater
than $170, so the alternative hypothesis
is: H1: μ > 170

(where μ, the mean account balance, is the
parameter of interest)

1.340

Example 11.1

What we want to show:

H1: μ > 170

H0: μ = 170 (we'll assume this is true)

We know:

n = 400,

x̄ = 178, and

σ = 65

1.341

Example 11.1

To test our hypotheses, we can use two

different approaches:

the rejection region method (preferred
when computing statistics manually), and

the p-value approach (generally used
with a computer and statistical software).

1.342

Example 11.1 Rejection

Region

The rejection region is a range of

values such that if the test statistic

falls into that range, we decide to

reject the null hypothesis in favor of

the alternative hypothesis.

1.343

Example 11.1

All that's left to do is calculate the critical value of x̄

and compare it to 170, at whatever level of

significance (α) we want.

1.344

Example 11.1

At a 5% significance level (i.e. α = 0.05), we get
the critical value x̄ = 170 + 1.645(65/√400) = 175.34.

Since our sample mean (178) is greater than the

critical value we calculated (175.34), we reject the null

hypothesis in favor of H1, i.e. that: μ > 170 and that

it is cost effective to install the new billing system

1.345

Example 11.1 The Big

Picture

H0: μ = 170, observed x̄ = 178 > 175.34:

Reject H0 in favor of H1.

1.346

Standardized Test Statistic

An easier method is to use the standardized test

statistic:

z = (x̄ − μ)/(σ/√n)

and reject H0 in favor of
H1 if z exceeds the critical z value.

1.347
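Example 11.1 with the standardized statistic and its p-value:

```python
import math
from statistics import NormalDist

xbar, mu0, sigma, n = 178, 170, 65, 400

z = (xbar - mu0) / (sigma / math.sqrt(n))  # (178 - 170)/3.25 = 2.46
p_value = 1 - NormalDist().cdf(z)          # right-tail test

print(round(z, 2), round(p_value, 4))      # 2.46 0.0069
# z = 2.46 > z_.05 = 1.645, so reject H0 at the 5% level.
```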


p-Value

The p-value of a test is the probability of

observing a test statistic at least as extreme

as the one computed given that the null

hypothesis is true.

what is the probability of observing a

sample mean at least as extreme as the

one already observed (i.e. x̄ = 178), given

that the null hypothesis (H0: μ = 170) is true?

p-value

1.349

Interpreting the p-value

The smaller the p-value, the more statistical evidence

exists to support the alternative hypothesis.

If the p-value is less than 1%, there is overwhelming

evidence that supports the alternative hypothesis.

If the p-value is between 1% and 5%, there is
strong evidence that supports the alternative

hypothesis.

If the p-value is between 5% and 10%, there is weak
evidence that supports the alternative hypothesis.

If the p-value exceeds 10%, there is no evidence that

supports the alternative hypothesis.

We observe a p-value of .0069, hence there is

overwhelming evidence to support H1: μ > 170.

1.350

Interpreting the p-value

Compare the p-value with the selected value of the

significance level (α):

If the p-value is less than α, we judge the p-value
to be small enough to reject the null hypothesis.

If the p-value is greater than α, we do not reject
the null hypothesis.

Here .0069 < α = .05, so we reject H0
in favor of H1.

1.351

Chapter-Opening Example

The objective is to draw a conclusion about the mean payment period. Thus, the parameter to be tested is the population mean. We want to know whether there is enough statistical evidence to show that the population mean is less than 22 days. Thus, the hypotheses are

H1: μ < 22

H0: μ = 22

Chapter-Opening Example

The test statistic is

z = (x̄ - μ) / (σ / √n)

We will reject H0 in favor of the alternative only if the sample mean, and hence the value of the test statistic, is small enough. As a result we locate the rejection region in the left tail of the sampling distribution. We set the significance level at 10%.

Chapter-Opening Example

Rejection region: z < -z.10 = -1.28

From the data: x̄ = Σxᵢ / n = 4,759 / 220 = 21.63

and

z = (x̄ - μ) / (σ / √n) = (21.63 - 22) / (6 / √220) = -.91

p-value = P(Z < -.91) = .5 - .3186 = .1814
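All the numbers for this example appear on the slides, so the computation can be checked end to end:

```python
from statistics import NormalDist

# Chapter-opening example: H0: mu = 22 vs H1: mu < 22, sigma = 6,
# n = 220, sum of the payment periods = 4,759 (all from the slides).
xbar = 4759 / 220                     # sample mean, ~21.63
z = (xbar - 22) / (6 / 220 ** 0.5)    # test statistic, ~ -0.91
p_value = NormalDist().cdf(z)         # left-tail p-value
z_crit = NormalDist().inv_cdf(0.10)   # 10% left-tail critical value, ~ -1.28
print(round(z, 2), round(p_value, 4), z > z_crit)   # -0.91 0.1814 True
```

Since z = -.91 does not fall below the critical value -1.28 (equivalently, the p-value .1814 exceeds α = .10), H0 is not rejected.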

Chapter-Opening Example

Since the test statistic does not fall in the rejection region (and the p-value of .1814 exceeds α = .10), there is not enough evidence to infer that the mean is less than 22, and hence not enough evidence to infer that the plan will be profitable.

We fail to reject H0: μ ≥ 22 at a 10% level of significance.

(Power curve plot omitted.)

Right-Tail Testing

Calculate the critical value of the mean (x̄_critical) and compare it against the observed value of the sample mean (x̄).

Left-Tail Testing

Calculate the critical value of the mean (x̄_critical) and compare it against the observed value of the sample mean (x̄).

Two-Tail Testing

Two-tail testing is used when we want to test a research hypothesis that a parameter is not equal (≠) to some value.

Example 11.2

AT&T argues that its rates are such that customers won't see a difference in their phone bills between them and their competitors. They calculate the mean and standard deviation for all their customers at $17.09 and $3.87 (respectively).

A sample of customers' bills is selected and each monthly phone bill is recalculated based on a competitor's rates.

We want to test whether the mean bill differs, i.e. H1: μ ≠ 17.09. We do this by assuming that:

H0: μ = 17.09

Example 11.2

The rejection region is set up so we can reject the null hypothesis when the test statistic is large or when it is small.

Since α is split between the two tails, the areas in the rejection region must sum to α, so we divide this probability by 2.

Example 11.2

At a 5% significance level (i.e. α = .05), we have α/2 = .025. Thus, z.025 = 1.96 and our rejection region is:

z < -1.96 or z > +1.96

Example 11.2

From the data, we calculate x̄ = 17.55.

We find that the test statistic falls between -1.96 and +1.96, so we cannot reject the null hypothesis in favor of H1. That is, there is insufficient evidence to infer that there is a difference between the bills of AT&T and the competitor.
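A sketch of the two-tail computation follows. The sample size n = 100 is an assumption (it does not appear on these slides); μ0 = 17.09, σ = 3.87, and x̄ = 17.55 are from the slides.

```python
from statistics import NormalDist

# Two-tail z test for Example 11.2; n = 100 is an assumed sample size,
# the other values come from the slides.
mu0, sigma, xbar, n = 17.09, 3.87, 17.55, 100
alpha = 0.05
z = (xbar - mu0) / (sigma / n ** 0.5)          # ~1.19
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96
print(round(z, 2), abs(z) > z_crit)            # 1.19 False -> cannot reject H0
```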

(Power curve plot omitted.)

Summary of One- and Two-Tail

Tests

(Summary table of left-tail, two-tail, and right-tail tests omitted.)

Inference About a Population (σ Unknown)

We draw inferences about a population parameter from a sample statistic.

In this chapter we test three population parameters:

Population Mean μ

Population Variance σ²

Population Proportion p

Inference With Variance Unknown

Previously, we looked at inference about the population mean when the population standard deviation (σ) was known or given. But how often do we really know the population variance? Instead, we use the Student t statistic, given by:

t = (x̄ - μ) / (s / √n)

Testing μ when σ is unknown

When the population standard deviation is unknown and the population is normal, the test statistic for testing hypotheses about μ is:

t = (x̄ - μ) / (s / √n)

which is Student t distributed with ν = n - 1 degrees of freedom. The confidence interval estimator of μ is

x̄ ± t_{α/2} (s / √n)

Example 12.1

Will new workers achieve 90% of the level of

experienced workers within one week of

being hired and trained?

Experienced workers process 500 packages/hour; thus, if our conjecture is correct, we expect new workers to be able to process .90(500) = 450 packages per hour.

IDENTIFY

Example 12.1

Our objective is to describe the population of the numbers of packages processed in 1 hour by new workers; that is, we want to know whether the new workers' productivity is more than 90% of that of experienced workers. Thus we have:

H0: μ = 450

COMPUTE

Example 12.1

Our test statistic is t = (x̄ - 450) / (s / √n), which is Student t distributed with ν = n - 1 degrees of freedom. Our hypothesis under question is:

H1: μ > 450

Our rejection region becomes t > t_{α,ν}, and we will reject H0 in favor of the alternative if our calculated test statistic falls in this region.

COMPUTE

Example 12.1

From the data, we calculate x̄ = 460.38 and s = 38.83, and thus the test statistic. Since the test statistic falls in the rejection region, there is sufficient evidence to conclude that the new workers are producing at more than 90% of the average of experienced workers.
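The t statistic can be sketched as follows. The sample size n = 50 is an assumption taken from the textbook version of this example (it is not on these slides); x̄ = 460.38 and s = 38.83 are from the slides.

```python
# One-sample t test for Example 12.1; n = 50 is assumed from the textbook,
# xbar = 460.38 and s = 38.83 are from the slides.
xbar, s, n, mu0 = 460.38, 38.83, 50, 450
t = (xbar - mu0) / (s / n ** 0.5)   # ~1.89

# Critical value t_{.05, 49} ~ 1.677, taken from a t table (the Python
# standard library has no t distribution; scipy.stats.t.ppf would compute it).
t_crit = 1.677
print(round(t, 2), t > t_crit)      # 1.89 True -> reject H0
```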

IDENTIFY

Example 12.2

Can we estimate the return on

investment for companies that won

quality awards?

We randomly sample n = 83 such companies. We want to construct a 95% confidence interval for the mean return, i.e. we want to compute

x̄ ± t_{α/2} (s / √n)

COMPUTE

Example 12.2

From the data, we calculate the sample mean and standard deviation, and so obtain the confidence interval.

Check Requisite

Conditions

The Student t distribution is robust, which means

that if the population is nonnormal, the results of

the t-test and confidence interval estimate are still

valid provided that the population is not

extremely nonnormal.

One way to check is to draw a histogram of the data and see how bell shaped the resulting figure is. If a histogram is extremely skewed (say, in the case of an exponential distribution), that could be considered extremely nonnormal, and hence t-statistics would not be valid in this case.

Inference About Population

Variance

If we are interested in drawing inferences about a population's variability, the parameter we need to investigate is the population variance σ². The sample variance s² is an unbiased, consistent and efficient point estimator for σ². Moreover, the statistic (n - 1)s²/σ² has a chi-squared distribution with n - 1 degrees of freedom.

Testing & Estimating Population

Variance

Combining the statistic χ² = (n - 1)s²/σ² with a probability statement yields the test statistic and the confidence interval estimator for σ²:

lower confidence limit: LCL = (n - 1)s² / χ²_{α/2}

upper confidence limit: UCL = (n - 1)s² / χ²_{1-α/2}

IDENTIFY

Example 12.3

Consider a container filling machine. Management wants a machine to fill 1 liter (1,000 cc) so that the variance of the fills is less than 1 cc². A random sample of n = 25 one-liter fills was taken. Does the machine perform as it should at the 5% significance level?

We want to show that the variance is less than 1 cc²:

H1: σ² < 1

(so our null hypothesis becomes H0: σ² = 1). We will use this test statistic:

χ² = (n - 1)s² / σ²

COMPUTE

Example 12.3

Since our alternative hypothesis is phrased as H1: σ² < 1, we reject H0 if the test statistic falls into the left-tail rejection region.

From the data we compute s² = .8088, and thus our test statistic takes on the value

χ² = (n - 1)s² / σ² = 24(.8088) / 1 = 19.41
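The variance test statistic above is a one-line computation; the left-tail critical value is quoted from a chi-squared table:

```python
# Test statistic for Example 12.3: chi2 = (n - 1) * s^2 / sigma0^2,
# with n = 25 and s^2 = .8088 from the slides, sigma0^2 = 1.
n, s2, sigma0_sq = 25, 0.8088, 1.0
chi2 = (n - 1) * s2 / sigma0_sq     # ~19.41

# Left-tail critical value chi2_{.95, 24} ~ 13.85 from a chi-squared table
# (scipy.stats.chi2.ppf would compute it).
chi2_crit = 13.85
print(round(chi2, 2), chi2 < chi2_crit)   # 19.41 False -> cannot reject H0
```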

Example 12.4

As we saw, we cannot reject the null hypothesis

in favor of the alternative. That is, there is not

enough evidence to infer that the claim is true.

Note: the result does not say that the variance is greater than 1; rather, it merely states that we are unable to show that the variance is less than 1.

Next, we construct a confidence interval estimate of the variance of the fills.

COMPUTE

Example 12.4

In order to create a confidence interval

estimate of the variance, we need these

formulae:

lower confidence limit: LCL = (n - 1)s² / χ²_{α/2}

upper confidence limit: UCL = (n - 1)s² / χ²_{1-α/2}

We know (n - 1)s² = 24(.8088) = 19.41 from the previous calculation, and we obtain the required χ² values from Table 5 in Appendix B.

Comparing Two

Populations

Previously we looked at techniques to

estimate and test parameters for one

population:

Population Mean (μ), Population Variance (σ²)

We will still consider these parameters when

we are looking at two populations,

however our interest will now be:

The difference between two means.

The ratio of two variances.


Difference of Two Means

In order to test and estimate the difference

between two population means, we

draw random samples from each of two

populations. Initially, we will consider

independent samples, that is, samples that

are completely unrelated to one another.

To draw inferences about the difference between the two population means, we use the statistic x̄1 - x̄2.

Sampling Distribution of x̄1 - x̄2

1. x̄1 - x̄2 is normally distributed if the original populations are normal, or approximately normal if the populations are nonnormal and the sample sizes are large (n1, n2 > 30).

2. The expected value of x̄1 - x̄2 is μ1 - μ2.

3. The variance of x̄1 - x̄2 is σ1²/n1 + σ2²/n2.

Making Inferences About μ1 - μ2

Since x̄1 - x̄2 is normally distributed if the original populations are normal, or approximately normal if the populations are nonnormal and the sample sizes are large (n1, n2 > 30), then

z = ((x̄1 - x̄2) - (μ1 - μ2)) / √(σ1²/n1 + σ2²/n2)

is a standard normal (or approximately normal) random variable. We could use this to build test statistics or confidence interval estimators for μ1 - μ2.

Making Inferences About μ1 - μ2

In practice, however, the z statistic is rarely used since the population variances are unknown. Instead we use a t statistic, substituting sample variances for the unknown population variances. We consider two cases: when we believe the population variances are equal, and conversely when they are not equal.

When are variances equal?

How do we know when the population

variances are equal?

Since the population variances are unknown, we can't know for certain whether they're equal, but we can examine the sample variances and informally judge their relative values to determine whether we can assume that the population variances are equal or not.

Test Statistic for μ1 - μ2 (equal variances)

1) Calculate the pooled variance estimator as

sp² = ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2)

2) Calculate the test statistic

t = ((x̄1 - x̄2) - (μ1 - μ2)) / √(sp² (1/n1 + 1/n2))

with ν = n1 + n2 - 2 degrees of freedom.
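The two steps can be sketched as follows; all of the sample quantities below are made-up illustration values, not taken from the examples in this deck.

```python
# Pooled-variance (equal variances) t statistic; the numbers are made up.
n1, n2 = 12, 10
x1bar, x2bar = 5.4, 4.9
s1_sq, s2_sq = 1.2, 1.5

# 1) pooled variance estimator
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# 2) test statistic under H0: mu1 = mu2, with nu = n1 + n2 - 2 df
t = (x1bar - x2bar) / (sp_sq * (1 / n1 + 1 / n2)) ** 0.5
print(round(sp_sq, 3), round(t, 2))   # 1.335 1.01
```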

CI Estimator for μ1 - μ2 (equal variances)

The confidence interval estimator for μ1 - μ2 when the population variances are equal is given by:

(x̄1 - x̄2) ± t_{α/2} √(sp² (1/n1 + 1/n2))

Test Statistic for μ1 - μ2 (unequal variances)

The test statistic for μ1 - μ2 when the population variances are unequal is given by:

t = ((x̄1 - x̄2) - (μ1 - μ2)) / √(s1²/n1 + s2²/n2)

with ν = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 - 1) + (s2²/n2)²/(n2 - 1) ] degrees of freedom. The confidence interval estimator is:

(x̄1 - x̄2) ± t_{α/2} √(s1²/n1 + s2²/n2)

IDENTIFY

Example 13.2

Two methods are being tested for assembling

office chairs. Assembly times are recorded (25

times for each method). At a 5% significance

level, do the assembly times for the two

methods differ?


COMPUTE

Example 13.2

The assembly times for each of the

two methods are recorded and

preliminary data are prepared. The sample variances appear similar, hence we will assume that the population variances are equal.

COMPUTE

Example 13.2

Recall, we are doing a two-tailed test, hence the rejection region will be |t| > t_{α/2,ν}. With n1 = n2 = 25, the number of degrees of freedom is ν = n1 + n2 - 2 = 48, and the critical value (bounding our rejection region) becomes t.025,48 ≈ 2.011.

COMPUTE

Example 13.2

In order to calculate our t-statistic,

we need to first calculate the pooled

variance estimator, followed by

the t-statistic


INTERPRET

Example 13.2

Since the calculated t statistic does not fall in the rejection region, we cannot reject H0 in favor of H1; that is, there is not sufficient evidence to infer that the mean assembly times differ.

INTERPRET

Example 13.2

Excel, of course, also provides us

with the information

Compare the calculated t statistic with the critical value, or look at the p-value.

Confidence Interval

We can compute a 95% confidence interval estimate for the difference in mean assembly times as (x̄1 - x̄2) ± t_{α/2} √(sp² (1/n1 + 1/n2)).

We estimate that the mean difference between the two assembly methods lies between -.36 and .96 minutes.

Note: zero is included in this confidence interval, consistent with our inability to reject H0.

Matched Pairs Experiment

Previously when comparing two populations,

we examined independent samples.

When an observation in one sample is matched with an observation in a second sample, this is called a matched pairs experiment. To illustrate, consider Example 13.4.

Identifying Factors

Factors that identify the t-test and estimator of μD (the mean paired difference):

Inference about the ratio of two

variances

So far we've looked at comparing measures of central location, namely the means of two populations. Here we compare the variances of two populations; the parameter of interest to us is the ratio σ1²/σ2².

The sample statistic s1²/s2² is F distributed with ν1 = n1 - 1 and ν2 = n2 - 1 degrees of freedom.

Inference about the ratio of two

variances

Our null hypothesis is always:

H0: σ1²/σ2² = 1

(i.e. we hypothesize that the two population variances are equal, hence their ratio will be one). The test statistic is F = s1²/s2² with

df1 = n1 - 1

df2 = n2 - 1

IDENTIFY

Example 13.6

In example 13.1, we looked at the variances of

the samples of people who consumed high fiber

cereal and those who did not and assumed

they were not equal. We can use the ideas just

developed to test if this is in fact the case.

Thus we test H1: σ1²/σ2² ≠ 1 (the variances are not equal to each other).

CALCULATE

Example 13.6

Since our research hypothesis is H1: σ1²/σ2² ≠ 1, we are doing a two-tailed test, and our rejection region is F > F_{α/2,ν1,ν2} or F < F_{1-α/2,ν1,ν2}.

CALCULATE

Example 13.6

Our test statistic is F = s1²/s2². This value falls in the rejection region (F < .58 or F > 1.61), hence there is sufficient evidence to reject the null hypothesis in favor of the alternative; that is, there is a difference in the variance between the two populations.

INTERPRET

Example 13.6

We may need to work with the Excel output before drawing conclusions. Our research hypothesis H1: σ1²/σ2² ≠ 1 requires two-tail testing, but Excel only gives us values for one-tail testing. We must double the one-tail p-value to match the test we're conducting (i.e. 2 × 0.0004 = 0.0008). Refer to the text and CD Appendices for more detail.

Show of Hands

Who is doing a study

that involves

statistical analysis of

data?

What type of

(quantitative) data are

you collecting?

Will there be enough

data to achieve

statistical

significance?

(adequate power vs.

pilot) If pilot:

Descriptive statistics

Chart/graph

9/14/2010 406

Types of data

Continuous

Equal

increments

Ordinal/Rank

In order but not

equal (Likert)

Categorical

Names


Continuous Data

If comparing 2 groups

(treatment/control)

t-test

If comparing > 2 groups

ANOVA (F-test)

If measuring association between 2

variables

Pearson r correlation

If trying to predict an outcome

(crystal ball)

Regression or multiple regression


Ordinal Data

Beyond the capability of Excel (just FYI)

If comparing 2 groups

Mann Whitney U (treatment vs. control)

Wilcoxon (matched pre vs. post)

If comparing > 2 groups

Kruskal-Wallis (median test)

If measuring association between 2

variables

Spearman rho ()

Likert-type scales are ordinal data


Categorical Data

Called a test of frequency: how often something is observed (AKA: Goodness of Fit Test, Test of Homogeneity)

Chi-Square (χ²)

Examples of burning research

questions:

Do negative ads change how people

vote?

Is there a relationship between marital status and health insurance coverage?

Words we use to describe

statistics

Mean (μ)

The arithmetic average (add all of the scores together, then divide by the number of scores)

μ = Σx / n

Median

The middle number

(just like the

median strip that

divides a highway

down the middle;

50/50)

Used when data is

not normally

distributed

Often hear about

the median price of

housing


Mode

The most

frequently

occurring number

(score,

measurement,

value, cost)

On a frequency distribution, it's the highest point (like the à la mode on pie).

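The three measures above can be computed with Python's standard library; the scores below are made up for illustration:

```python
import statistics

# Quick illustration of mean, median, and mode (made-up scores)
scores = [2, 3, 3, 5, 9]
print(statistics.mean(scores))    # 4.4
print(statistics.median(scores))  # 3
print(statistics.mode(scores))    # 3
```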

Standard Deviation (σ)

(Figure: normal curve showing the intervals that cover 95% and 99% of the distribution.)

We Make Mistakes!

Alpha level (α):

Set BEFORE we collect data and run statistics

Defines how much of an error we are willing to make to say we made a difference

AKA: level of significance

If we're wrong, it's an alpha error or Type 1 error

p value:

Calculated AFTER we gather the data

The calculated probability of a mistake by saying it works

Describes the percent of the population/area under the curve (in the tail) that is beyond our statistic

2-tailed Test

The critical value is the number that separates the blue zone from the middle (±1.96 in this example).

In a t-test, in order to be statistically significant the t score needs to be in the blue zone.

If α = .05, then 2.5% of the area is in each tail.

1-tailed Test

A 1-tailed test puts all of α in one tail, either + or -, but not both.

In this case, you would have statistical significance (p < .05) if t ≥ 1.645.
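The ±1.96 and 1.645 critical values quoted on these slides can be recovered from the standard normal distribution:

```python
from statistics import NormalDist

# Critical z values for alpha = .05, matching the figures described above
alpha = 0.05
z_two_tail = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 (2.5% in each tail)
z_one_tail = NormalDist().inv_cdf(1 - alpha)       # ~1.645 (5% in one tail)
print(round(z_two_tail, 2), round(z_one_tail, 3))  # 1.96 1.645
```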

Chi-Square (χ²)

χ² is a positive number; therefore, the area under the curve starts at 0 and goes to infinity.

To be statistically significant, χ² needs to be in the upper 5% (α = .05).

Compares observed frequency to what we expected.
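The comparison of observed and expected frequencies is the familiar sum Σ(O - E)²/E; the counts below are made up for illustration:

```python
# Chi-square goodness-of-fit statistic: sum of (observed - expected)^2 / expected.
# The counts are made-up illustration values.
observed = [50, 30, 15, 5]
expected = [40, 40, 15, 5]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)   # 5.0
```

The resulting statistic would then be compared against a χ² critical value with (number of categories - 1) degrees of freedom.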

- Exercise Lesson 4,5,6Transféré parmahmud_a
- Statistics and Correlation Answer KeyTransféré parcharlottedwn
- Assignment 2 -QuesTransféré parShehreen Mahiuddin
- CBSE Class 11 Economics Sample Paper-03Transféré parcbsestudymaterials
- ETS Math Arithmetic QuestionsTransféré parA.Saboor
- Q1Transféré parANTBLE
- SMC Addmath Mocks 2010Transféré parYogaraajaa Seemaraja
- STAT assignment 2010Transféré parLinzi Jacobs
- Important TopicsTransféré parAnabia Chodry
- Kuliah 3Transféré parJHazsmadhiee Adhiee
- CAPMTransféré parMayank Anand
- Lecture 01Transféré parWei Cong
- mb 0050Transféré parAmitosh Kumar
- Asymptotic Data Analysis on ManifoldsTransféré parVishesh Karwa
- Technical ReportTransféré parAlexandru Musteata
- Measurement ScalesTransféré parAnitha Purushothaman
- bbs13e_chapterhuTransféré parKesanapalli Sivarama Krishna
- Chapter_11_Revised(1).pptTransféré parMeth
- Boas -Bodily Forms 1912.pdfTransféré parpablomin2
- lab2 (1)Transféré parnurliyanaamin
- Lesson Structure and Assignments (1)Transféré parΔώρα Γεωργίου
- SamplingTransféré parSadi The-Darkraven Kafi
- copy of 16Transféré parapi-240724606
- SSC 10th Maths Code a Solution UpdatedTransféré parSunil Singh
- Almquist_Ashir_Brannstroem_Guide_1.0.1Transféré parMahrukh Khan
- Descriptive sTransféré parSuganthi Supaiah
- Pedro Data Activity 1.docxTransféré parPedro Galvão
- Hm 00387525Transféré parakila d
- Probability Statistics Chapter2 Lesson5Transféré parShanly Coleen Orendain
- binfet 6Transféré parapi-330564239

- IT3004 - Operating Systems and Computer Security 01 - Concepts.pptTransféré parMangala Semage
- Css Quick GuideTransféré parMangala Semage
- 2018 ජනවාරිTransféré parMangala Semage
- css_textTransféré parMangala Semage
- IT3004 - Operating Systems and Computer Security 06 - Trusted Operating Systems.pptxTransféré parMangala Semage
- dbTransféré parMangala Semage
- Css TablesTransféré parMangala Semage
- Css ImagesTransféré parMangala Semage
- jul_sep09.pdfTransféré parMangala Semage
- Computer Security NoteTransféré parMangala Semage
- Css BackgroundsTransféré parMangala Semage
- 2016-g12-pyt-operaTransféré parMangala Semage
- 2016 July Em IctTransféré parMangala Semage
- digitalTransféré parMangala Semage
- dbTransféré parMangala Semage
- cs_paperTransféré parMangala Semage
- Operating Systems and Computer Security-Network Security.pptxTransféré parMangala Semage
- OS and CS-5-Ideny and Authentication.pptxTransféré parMangala Semage
- OS and CS-4-Common vulnerabilities.pptxTransféré parMangala Semage
- Operating Systems and Computer Security-introduction.pptxTransféré parMangala Semage
- IT3004 - Operating Systems and Computer Security 05 - General Purpose Operating Systems.pptxTransféré parMangala Semage
- IT3004 - Operating Systems and Computer Security 02 - Cryptography.pptxTransféré parMangala Semage
- IT3004 - Operating Systems and Computer Security 07 - Database and Data Mining Security.pptxTransféré parMangala Semage
- IT3004 - Operating Systems and Computer Security 08 - Security in NetworksTransféré parMaajith Marzook
- 6 - Hashing and AuthenticationTransféré parMangala Semage
- IT3004 - Operating Systems and Computer Security 03 - Program security.pptxTransféré parMangala Semage
- 6 - Hashing and Authentication.pptTransféré parMangala Semage
- 2-Basic Cryptography.pptTransféré parMangala Semage
- ch20.pptTransféré parAnonymous 1ioUBbN

- Clerkship JournalTransféré parmruma01
- BURJ DUBAI.docxTransféré parsumitchoudharymbm91
- Supply Chain Management LectureTransféré parmuhib
- 58928-105782-1-PBsTransféré parRafael Saldanha Lopes
- PhRMA Marketing Brochure Influences on Prescribing FINALTransféré parJalwaz Tihami
- 16_Behavior of CostsTransféré parjenicejoy
- LI_FriezeTransféré parHepicentar Niša
- delta Pacific Case StudyTransféré paravi
- EVALUATION OF PEGUERO-LO PRESTI CRITERIA FOR ASSESSMENT OF LEFT VENTRICULAR HYPERTROPHY.Transféré parIJAR Journal
- Semantic Web Based Sentiment EngineTransféré parJames Dellinger
- Effects of Media to ChildrenTransféré parvishcu
- An operational planning is a subset of strategic work planTransféré parsumathikl
- Admin Directive for CPE Reporting 02-01-2012Transféré pargautammittal
- Roles of Solar UVB and Vitamin D in Reducing Cancer Risk and Increasing SurvivalTransféré pardear5643
- Ben & Jerry Marketing Research Question a IreTransféré pardineshnithiya
- Bertrand SchneiderTransféré pardevonchild
- Elements of Volatility at High FrequencyTransféré parCervino Institute
- C. Critique of Ax ToolTransféré parJosie Evangelista
- HIV DisertasiTransféré parachmadnashir
- Gyanodya at Aditya Birla GroupTransféré parRajan Kalal
- Electrotherapy for pain relief: does it work? A laboratory-based study to examine the analgesic effects of electrotherapy on cold-induced pain in healthy individualsTransféré parHéctor Knno Gómez
- Internship Program PolicyTransféré parm_ganea268973
- RIGTH-2Transféré parrichard sossa
- Research in Gender EqualityTransféré parZachée Wayag
- Teacher as Curriculum Leader.pdfTransféré parRachelleTapz
- Early Cinema Production RubricTransféré parfilmteacher
- Advance Anthropometry for Health DiagnoseTransféré parreinzig1
- Science Investigatory Project Format (1)Transféré parJasperOndap
- ISO 17025 StandardTransféré paratomicbrent
- Candesic HealthcareTransféré pargerald_templer

## Bien plus que des documents.

Découvrez tout ce que Scribd a à offrir, dont les livres et les livres audio des principaux éditeurs.

Annulez à tout moment.