
Descriptive Statistics

Dr. Ayaz Muhammad Khan

Statistics
Many studies generate large numbers of data points, and to make sense of all that data, researchers use statistics that summarize the data, providing a better understanding of overall tendencies within the distributions of scores.

Reasons for using statistics
1. Aids in summarizing the results
2. Helps us recognize underlying trends and tendencies in the data
3. Aids in communicating the results to others

Types of statistics
1. Descriptive (which summarize some characteristic of a sample):
measures of central tendency, measures of variability, measures of skewness

2. Inferential (which test for significant differences between groups and/or significant relationships among variables within the sample):
t-ratio, chi-square, beta-value

Descriptive statistics
Descriptive statistics is a series of procedures designed to illuminate the data, so that its principal characteristics and main features are revealed. This may mean sorting the data by size, putting it into a table, presenting it in an appropriate chart, or summarising it numerically.

Descriptive Statistics is a tool or technique that is used to describe and organize the characteristics of a collection of information or data. The collection is called a data set or just data.

Why use it?


Descriptive Statistics is used in research to answer five basic questions based on five key concepts.


Concept: Finding middle scores


Question: What is the middle set of scores for this data set?

Concept: Finding the spread of scores

Question: How spread out are the scores of this data set?

Concept: Finding the rank of scores

Question: How does a particular score compare to the rest of the set of scores for this data set?


Concept: Finding relationships between variables



Question: How are different variables related in this data set?

Key terms
Central tendency measures. They are computed to give a center around which the measurements in the data are distributed.

Variation or variability measures. They describe data spread, or how far away the measurements are from the center.

Relative standing measures. They describe the relative position of specific measurements in the data.

Measures of Central Tendency

Mean, median, mode

The Mean (average value)

The sum of all the scores divided by the number of scores.

A good measure of central tendency for roughly symmetric distributions.

Can be misleading in skewed distributions, since it can be greatly influenced by extreme scores; in that case other statistics, such as the median, may be more informative.

Formula

$$\mu = \frac{\sum X}{N} \quad \text{(population)}$$

$$\bar{x} = \frac{\sum x_i}{n} \quad \text{(sample)}$$

where $\mu$ is the population mean and $\bar{x}$ is the sample mean.

Example of Mean
MEAN = 40/10 = 4

Measurements x:       3   5   5   1   7   2   6   7   0   4   (sum = 40)
Deviation x - mean:  -1   1   1  -3   3  -2   2   3  -4   0   (sum = 0)

Notice that the sum of the deviations is 0, and that every single observation contributes to the computation of the mean.
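The calculation above can be checked with a short Python sketch. It uses the ten example measurements from this slide; the helper name `mean` is our own illustration (the standard library also offers `statistics.mean`). It confirms that the mean is 4 and that the deviations cancel out.

```python
# Example measurements from the slide.
data = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]

def mean(values):
    """Arithmetic mean: sum of the scores divided by the number of scores."""
    return sum(values) / len(values)

m = mean(data)
deviations = [x - m for x in data]   # x - mean for every observation

print(sum(data), len(data), m)       # 40 10 4.0
print(deviations)
print(sum(deviations))               # 0.0
```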

The mean
Features:
1. One advantage of the mean over the median is that it uses all of the information in the data set.
2. It is affected by skewness in the distribution and by the presence of outliers in the data.
3. It cannot meaningfully be used with ordinal data.

The median
The data are sorted from the lowest to the highest value, and the middle value is the median: half of the values will be equal to or less than the median, and half equal to or above it. With an even number of values, the median is the average of the two middle values.

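Exercise
The following are the survival times (in days) of 11 rats: 4, 10, 7, 50, 3, 15, 2, 9, 13, >60, >60.
Question: what is the average survival time?

Day:    2   3   4   7   9   10   13   15   50   >60   >60
Rank:   1   2   3   4   5    6    7    8    9    10    11

Because two survival times are known only to exceed 60 days, the mean cannot be computed exactly; the median, the 6th of the 11 ranked values, is 10 days.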
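The ranking in the exercise above can be reproduced in Python. This is a sketch under one stated assumption: the two censored values ">60" are encoded as `math.inf`, which is safe for the median because only their order matters, whereas the mean would need their exact size.

```python
import math
import statistics

# Survival days of the 11 rats. ASSUMPTION: ">60" is encoded as math.inf,
# since we only know these values are larger than every other observation.
survival = [4, 10, 7, 50, 3, 15, 2, 9, 13, math.inf, math.inf]

ranked = sorted(survival)
for rank, day in enumerate(ranked, start=1):
    print(rank, ">60" if math.isinf(day) else day)

# The median only depends on the ordering, so the censored values are harmless.
median = statistics.median(survival)
print("median survival:", median)   # 10
```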

The median
Features:
1. One advantage of the median is that it is not much affected by skewness in the distribution or by the presence of outliers.
2. It discards a lot of information, because it ignores most of the values apart from those in the centre of the distribution.
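To illustrate the contrast between the mean and the median, the sketch below takes the example data from the mean slide and replaces the last score (4) with a hypothetical outlier of 94; the outlier value is invented purely for illustration. The mean jumps from 4 to 13, while the median only moves from 4.5 to 5.

```python
import statistics

data = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]   # example data from the mean slide
with_outlier = data[:-1] + [94]         # HYPOTHETICAL: last score 4 replaced by 94

mean_before, median_before = statistics.mean(data), statistics.median(data)
mean_after, median_after = statistics.mean(with_outlier), statistics.median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after}")      # pulled strongly by one value
print(f"median: {median_before} -> {median_after}")  # barely changes
```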

Normal Distributions
The curve is bell shaped and extends from −∞ to +∞.

It is symmetric, with scores more concentrated in the middle (around the mean) than in the tails.
The mean, median and mode coincide.

Normal curves differ in how spread out they are.

The area under each curve is 1. The height of a normal distribution can be specified mathematically in terms of two parameters: the mean (μ) and the standard deviation (σ).
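The two-parameter description above corresponds to the standard normal density f(x) = exp(−(x − μ)² / (2σ²)) / (σ√(2π)). The sketch below evaluates it and approximates the area under the curve with the trapezoidal rule over μ ± 10σ (the integration width and step count are our own illustrative choices). Each curve has area close to 1, and larger σ gives a flatter, wider curve.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_curve(mu, sigma, width=10.0, steps=100_000):
    """Trapezoidal approximation of the area from mu - width*sigma to mu + width*sigma."""
    lo, hi = mu - width * sigma, mu + width * sigma
    h = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    total += sum(normal_pdf(lo + i * h, mu, sigma) for i in range(1, steps))
    return total * h

# Same mean, different spreads (illustrative parameter values).
areas, peaks = {}, {}
for sigma in (0.5, 1.0, 2.0):
    areas[sigma] = area_under_curve(0.0, sigma)
    peaks[sigma] = normal_pdf(0.0, 0.0, sigma)   # height at the mean
    print(f"sigma={sigma}: area={areas[sigma]:.6f}, peak height={peaks[sigma]:.4f}")
```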

Mode
The most frequently occurring score.
Look at the simple frequency of each score.
When using a nominal scale, report the mode as the most frequently occurring category.
If you have a rectangular distribution (all scores equally frequent), do not report the mode.

Features:
1. The mode is a measure of commonness or typicality.
2. The mode is not particularly useful with continuous metric data, where no two values may be the same.

Example of Mode
In this case the data have two modes, 5 and 7; both measurements are repeated twice.

Measurements x 3 5 5 1 7 2 6 7 0 4
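Finding the mode amounts to counting the simple frequency of each score. A minimal Python sketch on the example data, assuming Python 3.8 or later for `statistics.multimode`, which returns every value tied for the highest frequency (so both modes are reported):

```python
import statistics
from collections import Counter

data = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]

counts = Counter(data)               # simple frequency of each score
print(counts.most_common(3))         # the most frequent scores first

modes = statistics.multimode(data)   # all values tied for the highest count
print(modes)                         # [5, 7]
```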

Measures of Variability
Range, variance, standard deviation

Range
Distance between the highest and lowest scores in a distribution.
Sensitive to extreme scores; this can be compensated for by calculating the interquartile range (the distance between the 25th and 75th percentile points), which represents the range of scores for the middle half of a distribution.

Usually used in combination with other measures of dispersion.

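Range: example
Two sets of measurements:
unit 1: 9.7, 11.5, 11.6, 12.1, 12.4, 12.6, 13.1, 13.5, 13.6, 14.8, 16.3, 26.9
unit 2: 9.0, 11.2, 11.3, 11.7, 12.2, 12.5, 13.2, 13.8, 14.0, 15.5, 15.6, 16.2, 16.4

[Figure: back-to-back stem-and-leaf plot of unit 1 and unit 2, stems 9 to 26]

R: range(x) returns the minimum and maximum of x; the range is their difference (26.9 − 9.7 = 17.2 for unit 1, 16.4 − 9.0 = 7.4 for unit 2).

Because the single extreme value 26.9 inflates the range of unit 1, it would be better to use the midspread (interquartile range).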
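The point about the midspread can be checked on the two units from the range slide. This sketch uses `statistics.quantiles`, whose default "exclusive" quartile rule is one of several conventions (R's default `quantile` type, for example, can give slightly different quartiles). Unit 1 has the larger range only because of 26.9, yet the smaller midspread.

```python
import statistics

unit_1 = [9.7, 11.5, 11.6, 12.1, 12.4, 12.6, 13.1, 13.5, 13.6, 14.8, 16.3, 26.9]
unit_2 = [9.0, 11.2, 11.3, 11.7, 12.2, 12.5, 13.2, 13.8, 14.0, 15.5, 15.6, 16.2, 16.4]

def data_range(values):
    """Distance between the highest and lowest scores."""
    return max(values) - min(values)

def midspread(values):
    """Interquartile range Q3 - Q1, using statistics.quantiles' default
    'exclusive' method (other software may use another quartile rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

results = {}
for name, values in [("unit 1", unit_1), ("unit 2", unit_2)]:
    results[name] = (data_range(values), midspread(values))
    print(name, "range:", round(results[name][0], 2), "midspread:", round(results[name][1], 3))
```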

Variance
The difference between an observed value and the mean is called the deviation from the mean. The variance is the mean squared deviation from the mean,

i.e. you subtract the mean from each value, square each result and then take the average. Because the deviations are squared, the variance can never be negative.

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n}$$
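The variance recipe above (subtract the mean, square, average) translates directly into code. A minimal sketch on the example data from the mean slide; `variance_n` is a hypothetical helper name, and `statistics.pvariance` is the standard-library function that also divides by n.

```python
import statistics

data = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]   # example data from the mean slide

def variance_n(values):
    """Mean squared deviation from the mean (divides by n)."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / n

v = variance_n(data)
print(v)                            # 5.4  (sum of squared deviations 54, divided by 10)
print(statistics.pvariance(data))   # standard-library equivalent
```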

Standard Deviation (SD)


A summary statistic of how much scores vary from the mean.
The square root of the variance.
Expressed in the original units of measurement.
Represents the average amount of dispersion in a sample.
Used in a number of inferential statistics.

The standard deviation


What is a standard deviation (in English)?

the mean of deviations from the mean (sort of)

What is σ?

σ (lowercase sigma) is the population standard deviation; the sample standard deviation ŝ (s-hat) is the sample estimate of σ.

The deviation (definitional) formula for the population standard deviation:

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$

The larger the standard deviation, the more variability there is in the scores. The standard deviation is somewhat less sensitive to extreme outliers than the range (as N increases).

$x - \bar{x}$: the amount by which a score deviates from the mean.
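The definitional formula for σ can be written out step by step. This sketch treats the ten example measurements as a whole population (an assumption for illustration) and checks the result against `statistics.pstdev`. Doubling every score doubles the standard deviation, which shows that a larger SD means more spread, in the original units.

```python
import math
import statistics

data = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]   # ASSUMPTION: treated as a whole population

def population_sd(values):
    """Definitional formula: sqrt of the sum of (X - mu)^2 divided by N."""
    N = len(values)
    mu = sum(values) / N
    deviations = [x - mu for x in values]   # X - mu for each score
    return math.sqrt(sum(d * d for d in deviations) / N)

sigma = population_sd(data)
print(round(sigma, 4))                     # 2.3238, i.e. sqrt(5.4)
print(round(statistics.pstdev(data), 4))   # standard-library equivalent

spread_out = [2 * x for x in data]         # same shape, twice the spread
print(round(population_sd(spread_out), 4))
```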

Estimating the population standard deviation from a sample


S, the sample standard deviation computed with N in the denominator, is usually a little smaller than the population standard deviation. Why?
The sample mean minimizes the sum of squared deviations (SS). Therefore, if the sample mean differs at all from the population mean, the SS from the sample will be an underestimate of the SS from the population.

Therefore, statisticians alter the formula for the sample standard deviation by subtracting 1 from N, i.e. dividing by N − 1 instead of N.
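The effect of subtracting 1 from N can be seen in a small simulation. This is a sketch with illustrative settings of our own choosing (samples of size 5 drawn from a normal population with variance 1, 20,000 repetitions, fixed seed). Dividing by N gives an average close to (N − 1)/N = 0.8, an underestimate, while dividing by N − 1 averages close to the true value 1.

```python
import random
import statistics

data = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]
# Divide by N (population formula) vs divide by N - 1 (sample estimate).
print(statistics.pvariance(data), statistics.variance(data))

def var_n(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def var_n_minus_1(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Illustrative simulation: many small samples from a population with variance 1.
random.seed(1)
trials, n = 20_000, 5
samples = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(trials)]
avg_n = sum(var_n(s) for s in samples) / trials
avg_n_minus_1 = sum(var_n_minus_1(s) for s in samples) / trials
print(round(avg_n, 3), round(avg_n_minus_1, 3))   # roughly 0.8 and 1.0
```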

The population variance could be interpreted as the average squared difference from the population mean, and the sample variance has almost the same interpretation about the sample mean.

Features
The variance and the standard deviation are the most appropriate measures of variation when the data come from a symmetric distribution; they are used to describe the spread of a numeric variable.

Skewness of distributions
Skewness measures look at how lopsided distributions are, i.e. how far they depart from the ideal of the normal curve.
When the median and the mean are different, the distribution is skewed; the greater the difference, the greater the skew.
Distributions that trail away to the left are negatively skewed, and those that trail away to the right are positively skewed.
If the skewness is extreme, the researcher should either transform the data to make them better resemble a normal curve or else use a different set of statistics (nonparametric statistics) to carry out the analysis.
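Since the slide links skew to the gap between the mean and the median, one simple measure built on exactly that gap is Pearson's second (median) skewness coefficient, 3 × (mean − median) / SD. This is a swapped-in illustration; moment-based skewness coefficients are also widely used and give different values. The sketch applies it to unit 1 from the range slide, whose long right tail gives a positive value, and to a symmetric set, which gives 0.

```python
import statistics

def pearson_median_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / SD.
    Positive: tail to the right; negative: tail to the left."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)   # sample SD (N - 1 denominator)
    return 3 * (mean - median) / sd

unit_1 = [9.7, 11.5, 11.6, 12.1, 12.4, 12.6, 13.1, 13.5, 13.6, 14.8, 16.3, 26.9]
symmetric = [1, 2, 3, 4, 5]

skew_unit_1 = pearson_median_skewness(unit_1)
skew_symmetric = pearson_median_skewness(symmetric)
print(round(skew_unit_1, 2))   # positive: the extreme value 26.9 pulls the mean right
print(skew_symmetric)          # 0.0
```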

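Positive and negative Skewness

Negatively skewed (tail to the left), from left to right: mean, median, mode

Positively skewed (tail to the right), from left to right: mode, median, mean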

Percentiles
The pth percentile is a number such that at most p% of the measurements are below it and at most (100 − p)% of the measurements are above it.
Example: if in a certain data set the 85th percentile is 340, then at most 15% of the measurements are above 340, and at most 85% of the measurements are below 340.
Notice that the median is the 50th percentile.
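One way to compute a percentile that satisfies the definition above is the nearest-rank method, shown here as a sketch (other methods interpolate between values and can give different numbers). `nearest_rank_percentile` and `satisfies_definition` are hypothetical helper names, and the rat survival data reuse the assumption that ">60" is encoded as `math.inf`. The 50th percentile comes out as 10, the median found in the exercise.

```python
import math

def nearest_rank_percentile(values, p):
    """p-th percentile by the nearest-rank method (one of several definitions):
    the smallest value with at least p% of the data at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    k = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[k - 1]

def satisfies_definition(values, p, q):
    """At most p% of the data below q and at most (100 - p)% above q."""
    n = len(values)
    below = sum(x < q for x in values)
    above = sum(x > q for x in values)
    return below <= p * n / 100 and above <= (100 - p) * n / 100

# ASSUMPTION: the censored ">60" survival times are encoded as math.inf.
survival = [4, 10, 7, 50, 3, 15, 2, 9, 13, math.inf, math.inf]
q50 = nearest_rank_percentile(survival, 50)
print(q50)                                       # 10, the median
print(satisfies_definition(survival, 50, q50))   # True
```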

So
Descriptive statistics are used to summarize data from individual respondents, etc.
They help to make sense of large numbers of individual responses and to communicate the essence of those responses to others.

They focus on typical or average scores, the dispersion of scores over the available responses, and the shape of the response curve
