
Statistics

1st edition
Md. Khaledur Rahman Bhuiyan, B.Pharm (running) student
Department of Pharmacy, University of Asia Pacific, Dhaka-1209, Bangladesh
This is my first publication, so if you find any problems, please bear with me and contact us.
E-mail: bhuiyankhaled@gmail.com, bhuiyankhaled@ymail.com
Like us on Facebook: http://www.facebook.com/pages/ThePharmacist/330151363706425

To my father, for the uncompromising principles that guided my life.
To my mother, for leading her children into intellectual pursuits.
To my teachers: Abdul Mannan (Shere Bangla Nagar Govt. Boys High School), Debabrata Kumar Sen (Mohhammadpur Model College), Md. Tariqur Rahman (BCS 29th Batch, Police), Sahadat Bin Sayed (University of Asia Pacific).
To Shere Bangla Nagar Govt. Boys High School, Mohhammadpur Model College, and University of Asia Pacific.

Md.Khaledur Rahman Bhuiyan

This is not a book written by me. It is a collection of math problems that will help students understand statistics. It is a short collection of statistical problems intended especially for pharmacy students. I wish to express my thanks to the many people who helped in preparing this book, including my teacher Sahadat Bin Sayed of the Department of Pharmacy at the University of Asia Pacific, who provided all these valuable solutions. I am also grateful to Kaji Nusrat Jahan, Md. Rejaul Karim, and Md. Aktarujjaman Khan for their excellent secretarial services.

BAR GRAPH What it is:


A bar graph is a chart that uses either horizontal or vertical bars to show comparisons among categories. One axis of the chart shows the specific categories being compared, and the other axis represents a discrete value. Some bar graphs present bars clustered in groups of more than one (grouped bar graphs), and others show the bars divided into subparts to show cumulative effect (stacked bar graphs).

How to use it:


Determine the discrete range. Examine your data to find the bar with the largest value. This will help you determine the range of the vertical axis and the size of each increment. Then label the vertical axis.

Determine the number of bars. Examine your data to find how many bars your chart will contain. These may be single, grouped, or stacked bars. Use this number to draw and label the horizontal axis.

Determine the order of the bars. Bars may be arranged in any order. (A bar chart arranged from highest to lowest incidence is called a Pareto chart.) Normally, bars showing frequency will be arranged in chronological (time) sequence.

Draw the bars. If you are preparing a grouped bar graph, remember to present the information in the same order in each grouping. If you are preparing a stacked bar graph, present the information in the same sequence on each bar.

Label and title the graph.
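The ordering and drawing steps can be sketched in a few lines of Python. This is only an illustration: the category names and counts are hypothetical, and the helper names `pareto_order` and `text_bar_graph` are invented for this sketch. It sorts the bars from highest to lowest (the Pareto ordering) and prints a simple horizontal text bar graph.

```python
# Hypothetical counts per category (illustrative data, not from the text).
counts = {"Labeling": 4, "Dosage": 9, "Packaging": 2, "Delivery": 6}


def pareto_order(data):
    """Return (category, value) pairs sorted from highest to lowest value."""
    return sorted(data.items(), key=lambda item: item[1], reverse=True)


def text_bar_graph(pairs):
    """Build one text line per bar; the bar length equals the value."""
    width = max(len(name) for name, _ in pairs)
    return [f"{name:<{width}} | {'#' * value} {value}" for name, value in pairs]


ordered = pareto_order(counts)
for line in text_bar_graph(ordered):
    print(line)
```

For a chronological bar graph, the same drawing helper could be used with the pairs left in time order instead of sorted by value.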

Bar Graph Examples


Vertical Bar Graph
[Figure: vertical bar graph of Error Rate (vertical axis, 0% to 14%) by year (horizontal axis, 1990 to 1996).]

Horizontal Bar Graph
[Figure: horizontal bar graph of Error Rate (horizontal axis, 0% to 15%) by year (vertical axis, 1990 to 1996).]

Grouped Bar Graph
[Figure: grouped bar graph with bars for East, West, and North in each of four quarters (1st Qtr to 4th Qtr); vertical axis 0 to 90.]

Stacked Bar Graph
[Figure: stacked bar graph showing East, West, and North as shares of each quarter (1st Qtr to 4th Qtr); vertical axis 0% to 100%.]

Histogram

What is it?

A histogram is a bar graph representing the frequency of individual occurrences or classes of data. A histogram shows basic information about the data set, such as central location (mean, median, and mode), width of spread (range or standard deviation), and the shape. The purpose of making a histogram is to gain knowledge about the system. This knowledge, gained from the basic information given by the histogram (central location, spread, and shape), will act as a guide to improve the system. From a stable system, predictions can be made about the future performance of the system. If the system were unstable, it would change from time to time and the histogram would have little predictive value. The group uses a histogram to assess the system's current situation and to study results. The histogram's shape and statistical information help us know how to improve the system. After an improvement action is carried out, the group continues to collect data and make histograms to see if the theory has worked.


PQ Systems, Inc. Health Care


What does it look like?

A completed histogram is shown below. An outpatient clinic patient health educator constructed this histogram using data from the X̄-R chart for the Adult Asthmatic Patient Respiratory Capability. The X̄-R chart showed the system to be unstable. The patient and care provider successfully identified a special cause in the last four subgroups (the patient was out of town and forgot to take medications). After deleting the four subgroups that occurred due to the special cause, the educator used the remaining 23 subgroups to make this histogram. (See Step 9 in X̄-R for the stable control chart using the 23 subgroups.)

[Figure: histogram titled "Adult Asthma Resp. Cap. On Steroid Inhalants". Vertical axis: FREQUENCY, 0 to 24 in steps of 2. Horizontal axis: Class, with boundaries at 420, 434, 448, 462, 476, and 490.]

When is it used?

Use a histogram when you can answer yes to both these questions:

1. Do you have a data set of related values, either attributes (counts) or variables data (measurement)? For analyzing system performance, single readings or individual data points are of limited value. Much more can be learned from a group of data points because they reflect the system's variation. Using a histogram is one way to start learning from a group of data points.

2. Is it important to visualize central location, shape, and spread of the data? When it comes to data analysis, a picture is worth a thousand words. Seeing the form of the data makes it easier to understand the kind or pattern of variation the system is producing.

How is it made?

These steps assume that the data for the construction of the histogram has already been collected. The data can be collected especially to make a histogram or can come from the data entry section of a control chart. Once you have collected data for a control chart, that same data could be used to make a histogram. The data entry section of the control chart used for the example histogram is shown below.

VARIABLES CONTROL CHART (X̄ and R CHART)

[Data entry section of the control chart. Header fields: Product / Service: Asthma Care; Location: Home; Process: Respiratory Process; Measurement Device: Home Spirometer; Specification Limits: N/A; other header entries: SKM, MOSM. Columns are dated subgroups from 5-17 to 6-9. Rows are the sample measurements (rows 1 to 5, of which three are filled in), followed by SUM, AVERAGE (X̄), RANGE (R), and NOTES. For the first subgroup (5-17) the average is 430 and the range is 20. The row values are listed here in document order.]

Measurements, first three subgroups: row 1: 430 460 450; row 2: 420 480 470; row 3: 440 470 470
Sums, first three subgroups: 1290 1410 1390

Measurements, remaining subgroups: 470 475 440 480 420 480 450 430 470 475 480 500 450 465 460 445 430 450 500 420 420 430 470 450 450 460 480 470 450 445 480 450 450 430 470 470 450 450 470 440 440 430 420 485 460 465 430 470 465 440 440 470 470 470 430 480 485 430 430 470 430 450 455 385

Sums: 1430 1350 1395 1310 1430 1385 1320 1355 1425 1400 1540 1310 1415 1415 1325 1310 1390 1370 1310 1305 1235

Averages: 436.7 463.3 456.7 436.7 435.0 411.7 463.3 476.7 450.0 465.0 436.7 476.7 461.7 440.0 451.7 475.0 466.7 473.3 436.7 471.7 471.7 441.7

Ranges: 20 20 15 20 30 40 10 20 20 30 10 30 50 20 15 25 20 20 20 70 30 35 45

1. Select the classes.

a. Determine the number of classes. To find the number of classes (or subdivisions) needed for the histogram, first count the number of data points in the data set. Then use the following table to choose the number of classes. As the table indicates, it is best to use no fewer than 5 classes (or subdivisions) or more than 20.


No. of Data Points    No. of Classes
Under 50              5 - 7
50 - 100              6 - 10
100 - 250             7 - 12
Over 250              10 - 20
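The table lookup can be expressed as a small function. The sketch below is one possible reading of the table, not a definitive rule: the name `suggested_classes` is hypothetical, and because the printed ranges overlap at exactly 100 data points, the code assumes that 100 belongs to the "50 - 100" row.

```python
# Guideline table from the text: (largest number of data points, (min, max) classes).
# Assumption: exactly 100 data points is treated as part of the "50 - 100" row.
CLASS_GUIDE = [(49, (5, 7)), (100, (6, 10)), (250, (7, 12))]


def suggested_classes(n_points):
    """Return the recommended (minimum, maximum) number of classes."""
    for upper_limit, classes in CLASS_GUIDE:
        if n_points <= upper_limit:
            return classes
    return (10, 20)  # over 250 data points


print(suggested_classes(69))  # the example data set has 69 data points
```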

There are 69 data points in the example: 23 subgroups of 3 observations each. The table indicates that between six and ten classes should be used for this many data points. Choose 6 for the example. The choice of the number of classes is only a rough estimate at this point. You can decide later to use more or fewer classes.

b. Determine the class width and boundaries. The width of the class determines the range of data points in each class. Find the class width by dividing the range of the data set by the number of classes (found in Step a). The range is found by subtracting the smallest value in the data set from the largest.

Range = X highest - X lowest

In this example, the highest value in the data set is 500 and the lowest is 420. So the range is:

Range = 500 - 420 = 80

The class width for the example is:

Class Width = (range of data set) / (no. of classes) = 80 / 6 = 13.3, rounded up to 14

Round the class width to an easy number to work with. In the example, we rounded 13.33 to 14. Next, select a starting number for the lower boundary of the first class. The lower boundary should be chosen so the lowest value in the data set is included in the first class. A convenient lower boundary for the example is 420, since the lowest value in the data set is 420.
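The class width arithmetic can be checked with a short Python sketch. The function name `class_width` is hypothetical, and rounding up to the next whole number is an assumption that matches the example (13.33 becomes 14); the text only asks for "an easy number to work with".

```python
import math


def class_width(lowest, highest, n_classes):
    """Range divided by the number of classes, rounded up to a whole number."""
    data_range = highest - lowest
    return math.ceil(data_range / n_classes)


# Example from the text: lowest 420, highest 500, 6 classes.
print(class_width(420, 500, 6))  # 80 / 6 = 13.33..., rounded up to 14
```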

To determine the lower boundaries for the remaining classes, begin with the lower boundary of the first class and add the class width. Continue adding the class width until the number of classes is complete and all the data has been included. The lower class boundaries for this example are:

420 + 14 = 434
434 + 14 = 448
448 + 14 = 462
462 + 14 = 476
476 + 14 = 490
490 + 14 = 504

In some cases, an extra class may need to be added so the highest data point will be included. The upper boundary for each class is any number below the lower class boundary of the next class. For example, the upper class boundary for the first class is under 434. This means that any number greater than or equal to 420 but less than 434 falls into the first class. This is done so that no point will fall on the boundary between two classes. The classes for the example are:

420 to under 434
434 to under 448
448 to under 462
462 to under 476
476 to under 490
490 to under 504
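The boundary rule (greater than or equal to the lower boundary, less than the next one) maps directly onto integer division. The following is a minimal sketch using the example's starting value of 420 and class width of 14; the function names are hypothetical.

```python
def class_boundaries(start, width, n_classes):
    """Return (lower, upper) pairs; each class covers lower <= x < upper."""
    return [(start + i * width, start + (i + 1) * width) for i in range(n_classes)]


def class_index(value, start, width):
    """Zero-based index of the class that contains value."""
    return (value - start) // width


classes = class_boundaries(420, 14, 6)
for lower, upper in classes:
    print(f"{lower} to under {upper}")

# A value of exactly 434 belongs to the second class, not the first.
print(class_index(433, 420, 14), class_index(434, 420, 14))
```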


2. Record the data.

The easiest way to record the data is to create a check sheet listing the classes along the left side with space to the right to make tally marks. To record the data, make a tally mark beside the class in which each data point falls. Total the number of marks in each class. Shown below is the completed check sheet for the example.

CLASSES          TALLY                       TOTAL
420 UNDER 434    ||||| ||||| |||             13
434 UNDER 448    ||||| ||||                  9
448 UNDER 462    ||||| ||||| ||||| ||        17
462 UNDER 476    ||||| ||||| ||||| ||||      19
476 UNDER 490    ||||| ||||                  9
490 UNDER 504    ||                          2


Basic Tools for Process Improvement

What is a Histogram?
A Histogram is a vertical bar chart that depicts the distribution of a set of data. Unlike Run Charts or Control Charts, which are discussed in other modules, a Histogram does not reflect process performance over time. It's helpful to think of a Histogram as being like a snapshot, while a Run Chart or Control Chart is more like a movie (Viewgraph 1).

When should we use a Histogram?


When you are unsure what to do with a large set of measurements presented in a table, you can use a Histogram to organize and display the data in a more user-friendly format. A Histogram will make it easy to see where the majority of values fall on a measurement scale, and how much variation there is. It is helpful to construct a Histogram when you want to do the following (Viewgraph 2):

- Summarize large data sets graphically. When you look at Viewgraph 6, you can see that a set of data presented in a table isn't easy to use. You can make it much easier to understand by summarizing it on a tally sheet (Viewgraph 7) and organizing it into a Histogram (Viewgraph 12).

- Compare process results with specification limits. If you add the process specification limits to your Histogram, you can determine quickly whether the current process was able to produce "good" products. Specification limits may take the form of length, weight, density, quantity of materials to be delivered, or whatever is important for the product of a given process. Viewgraph 14 shows a Histogram on which the specification limits, or "goalposts," have been superimposed. We'll look more closely at the implications of specification limits when we discuss Histogram interpretation later in this module.

- Communicate information graphically. The team members can easily see the values which occur most frequently. When you use a Histogram to summarize large data sets, or to compare measurements to specification limits, you are employing a powerful tool for communicating information.

- Use a tool to assist in decision making. As you will see as we move along through this module, certain shapes, sizes, and the spread of data have meanings that can help you in investigating problems and making decisions.

But always bear in mind that if the data you have in hand aren't recent, or you don't know how the data were collected, it's a waste of time trying to chart them. Measurements cannot be used for making decisions or predictions when they were produced by a process that is different from the current one, or were collected under unknown conditions.

HISTOGRAM

[Viewgraph 1: What Is a Histogram? A bar graph that shows the distribution of data. A snapshot of data taken from a process. Sample Histogram with a vertical axis from 0 to 100 and a horizontal axis from 0 to 60 in steps of 5.]

[Viewgraph 2: When Are Histograms Used? Summarize large data sets graphically. Compare measurements to specifications. Communicate information to the team. Assist in decision making.]

What are the parts of a Histogram?


As you can see in Viewgraph 3, a Histogram is made up of five parts:

1. Title: The title briefly describes the information that is contained in the Histogram.

2. Horizontal or X-Axis: The horizontal or X-axis shows you the scale of values into which the measurements fit. These measurements are generally grouped into intervals to help you summarize large data sets. Individual data points are not displayed.

3. Bars: The bars have two important characteristics: height and width. The height represents the number of times the values within an interval occurred. The width represents the length of the interval covered by the bar. It is the same for all bars.

4. Vertical or Y-Axis: The vertical or Y-axis is the scale that shows you the number of times the values within an interval occurred. The number of times is also referred to as "frequency."

5. Legend: The legend provides additional information that documents where the data came from and how the measurements were gathered.

[Viewgraph 3: Parts of a Histogram. Sample Histogram with its five parts numbered. 1 Title: "DAYS OF OPERATION PRIOR TO FAILURE FOR AN HF RECEIVER". 2 Horizontal / X-axis: DAYS OF OPERATION, 0 to 60 in steps of 5. 3 Bars. 4 Vertical / Y-axis: FREQUENCY, 0 to 100 in steps of 20. 5 Legend: "MEAN TIME BETWEEN FAILURE (IN DAYS) FOR R-1051 HF RECEIVER. Data taken at SIMA, Pearl Harbor, 15 May - 15 July 94".]

How is a Histogram constructed?


There are many different ways to organize data and build Histograms. You can safely use any of them as long as you follow the basic rules. In this module, we will use the nine-step approach (Viewgraphs 4 and 5) described on the following pages.

EXAMPLE: The following scenario will be used as an example to provide data as we go through the process of building a Histogram step by step:

During sea trials, a ship conducted test firings of its MK 75, 76mm gun. The ship fired 135 rounds at a target. An airborne spotter provided accurate rake data to assess the fall of shot both long and short of the target. The ship computed what constituted a hit for the test firing as:

From 60 yards short of the target
To 300 yards beyond the target

[Viewgraphs 4 and 5: Constructing a Histogram]

Step 1 - Count number of data points
Step 2 - Summarize on a tally sheet
Step 3 - Compute the range
Step 4 - Determine number of intervals
Step 5 - Compute interval width
Step 6 - Determine interval starting points
Step 7 - Count number of points in each interval
Step 8 - Plot the data
Step 9 - Add title and legend

Step 1 - Count the total number of data points you have listed. Suppose your team collected data on the miss distance for the gunnery exercise described in the example. The data you collected was for the fall of shot both long and short of the target. The data are displayed in Viewgraph 6. Simply counting the total number of entries in the data set completes this step. In this example, there are 135 data points.

Step 2 - Summarize your data on a tally sheet. You need to summarize your data to make it easy to interpret. You can do this by constructing a tally sheet. First, identify all the different values found in Viewgraph 6 (-160, -010. . .030, 220, etc.). Organize these values from smallest to largest (-180, -130. . .380, 410). Then, make a tally mark next to the value every time that value is present in the data set. Alternatively, simply count the number of times each value is present in the data set and enter that number next to the value, as shown in Viewgraph 7. This tally helped us organize 135 mixed numbers into a ranked sequence of 51 values. Moreover, we can see very easily the number of times that each value appeared in the data set. This data can be summarized even further by forming intervals of values.


[Viewgraph 6: How to Construct a Histogram, Step 1 - Count the total number of data points]

Number of yards long (+ data) and yards short (- data) that a gun crew missed its target.

-180  -10 -130  260  160  210   50  140  210  -30  300  110  260  110   30
  30  220  190  180   40   20  220  130   80  260  -30   70  130  190   60
 170 -100  240   70   30  -40  350  270   20   50  100  120  380  230  130
 150  260  -70  280  290  250  320   40  240  140   30  330   90  -50  210
 -20  250  410   90  -20   30  -20  180   80   70  140  120  -80  140  -80
 360   70  100  230  240  250   50  190  160   10  180 -130   30  120  -10
 -30  180  120  310  130  100  270   50  100  130   80  -60   20  340  130
 100   40  200  270   10  250  110  150  240  -30  130   20  -30   20  200
 280  140  -90  180  200  370  130  200  170   80  210   70  190   60   80

TOTAL = 135

[Viewgraph 7: How to Construct a Histogram, Step 2 - Summarize the data on a tally sheet]

DATA  TALLY    DATA  TALLY    DATA  TALLY    DATA  TALLY    DATA  TALLY
-180    1       -20    3        90    2       190    4       290    1
-130    2       -10    2       100    5       200    4       300    1
-100    1        10    2       110    3       210    4       310    1
 -90    1        20    5       120    4       220    2       320    1
 -80    2        30    6       130    8       230    2       330    1
 -70    1        40    3       140    5       240    4       340    1
 -60    1        50    4       150    2       250    4       350    1
 -50    1        60    2       160    2       260    4       360    1
 -40    1        70    5       170    2       270    3       370    1
 -30    5        80    5       180    5       280    2       380    1
                                                             410    1
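Steps 1 and 2 can also be carried out programmatically. The sketch below is an illustration in Python, using the 135 miss distances listed in Viewgraph 6; the variable names are hypothetical. It counts the data points and builds the tally with `collections.Counter` from the standard library.

```python
from collections import Counter

# Miss distances in yards from Viewgraph 6 (negative = short, positive = long).
miss_distance = [
    -180, -10, -130, 260, 160, 210, 50, 140, 210, -30, 300, 110, 260, 110, 30,
    30, 220, 190, 180, 40, 20, 220, 130, 80, 260, -30, 70, 130, 190, 60,
    170, -100, 240, 70, 30, -40, 350, 270, 20, 50, 100, 120, 380, 230, 130,
    150, 260, -70, 280, 290, 250, 320, 40, 240, 140, 30, 330, 90, -50, 210,
    -20, 250, 410, 90, -20, 30, -20, 180, 80, 70, 140, 120, -80, 140, -80,
    360, 70, 100, 230, 240, 250, 50, 190, 160, 10, 180, -130, 30, 120, -10,
    -30, 180, 120, 310, 130, 100, 270, 50, 100, 130, 80, -60, 20, 340, 130,
    100, 40, 200, 270, 10, 250, 110, 150, 240, -30, 130, 20, -30, 20, 200,
    280, 140, -90, 180, 200, 370, 130, 200, 170, 80, 210, 70, 190, 60, 80,
]

# Step 1: count the data points.
print("TOTAL =", len(miss_distance))

# Step 2: tally how often each distinct value occurs, smallest value first.
tally = Counter(miss_distance)
for value in sorted(tally):
    print(f"{value:>5}  {tally[value]}")
```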


Step 3 - Compute the range for the data set. Compute the range by subtracting the smallest value in the data set from the largest value. The range represents the extent of the measurement scale covered by the data; it is always a positive number. The range for the data in Viewgraph 8 is 590 yards. This number is obtained by subtracting -180 from +410. The mathematical operation broken down in Viewgraph 8 is:

+410 - (-180) = 410 + 180 = 590

Remember that subtracting a negative (-) number is the same as adding the corresponding positive number.

Step 4 - Determine the number of intervals required. The number of intervals influences the pattern, shape, or spread of your Histogram. Use the following table (Viewgraph 9) to determine how many intervals (or bars on the bar graph) you should use.

If you have this many data points:    Use this number of intervals:
Less than 50                          5 to 7
50 to 99                              6 to 10
100 to 250                            7 to 12
More than 250                         10 to 20

For this example, 10 has been chosen as an appropriate number of intervals.
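The range calculation, including the sign handling for the negative smallest value, can be verified with a few lines of Python. This is a minimal sketch; `data_range` is a hypothetical helper name, and only the two extreme values of the example are needed.

```python
def data_range(values):
    """Largest value minus smallest value; never negative."""
    return max(values) - min(values)


# Only the extremes of the example data set matter for the range.
extremes = [-180, 410]
print(data_range(extremes))  # 410 - (-180) = 410 + 180 = 590

# Guideline from the text for 100 to 250 data points: 7 to 12 intervals.
chosen_intervals = 10
print(7 <= chosen_intervals <= 12)
```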

[Viewgraph 8: How to Construct a Histogram, Step 3 - Compute the range for the data set]

Largest value = +410 yards past target
Smallest value = -180 yards short of target
Range of values = 590 yards
Calculation: +410 - (-180) = 410 + 180 = 590

[Viewgraph 9: How to Construct a Histogram, Step 4 - Determine the number of intervals required. Less than 50 data points: 5 to 7 intervals. 50 to 99: 6 to 10 intervals. 100 to 250: 7 to 12 intervals. More than 250: 10 to 20 intervals.]

Step 5 - Compute the interval width. To compute the interval width (Viewgraph 10), divide the range (590) by the number of intervals (10). When computing the interval width, you should round the result up to the next higher whole number to come up with values that are convenient to use. For example, if the range of data is 17, and you have decided to use 9 intervals, then your interval width is about 1.89. You can round this up to 2. In this example, you divide 590 yards by 10 intervals, which gives an interval width of 59. This means that the length of every interval is going to be 59 yards. To facilitate later calculations, it is best to round off the value representing the width of the intervals. In this case, we will use 60, rather than 59, as the interval width.

Step 6 - Determine the starting point for each interval. Use the smallest data point in your measurements as the starting point of the first interval. The starting point for the second interval is the sum of the smallest data point and the interval width. For example, if the smallest data point is -180, and the interval width is 60, the starting point for the second interval is -120. Follow this procedure (Viewgraph 11) to determine all of the starting points (-180 + 60 = -120; -120 + 60 = -60; etc.).

Step 7 - Count the number of points that fall within each interval. These are the data points that are equal to or greater than the starting value and less than the ending value (also illustrated in Viewgraph 11). For example, if the first interval begins with -180 and ends with -120, all data points that are equal to or greater than -180, but still less than -120, will be counted in the first interval. Keep in mind that EACH DATA POINT can appear in only one interval.
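Steps 5 through 7 can be sketched as three small Python functions. The names are hypothetical, and the `step` parameter is an assumption introduced here to model "rounding to a convenient value": with `step=1` the width is rounded up to the next whole number, and with `step=10` it is rounded up to the next multiple of ten, which turns 59 into 60.

```python
import math


def interval_width(data_range, n_intervals, step=1):
    """Divide the range by the interval count, rounding up to a multiple of step."""
    return math.ceil(data_range / n_intervals / step) * step


def interval_starts(smallest, width, n_intervals):
    """Starting value of each interval, beginning at the smallest data point."""
    return [smallest + i * width for i in range(n_intervals)]


def interval_number(value, smallest, width):
    """One-based interval number, using start <= value < start + width."""
    return (value - smallest) // width + 1


width = interval_width(590, 10, step=10)   # 590 / 10 = 59, rounded up to 60
starts = interval_starts(-180, width, 10)
print(width, starts)

# A point equal to an interval's ending value belongs to the next interval.
print(interval_number(-121, -180, width), interval_number(-120, -180, width))
```

Because each value maps to exactly one interval number, no data point can be counted twice.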

[Viewgraph 10: How to Construct a Histogram, Step 5 - Compute the interval width]

Interval Width = Range / Number of Intervals = 590 / 10 = 59
Use 10 for the number of intervals. Round up to 60.

[Viewgraph 11: How to Construct a Histogram, Step 6 - Determine the starting point of each interval, Step 7 - Count the number of points in each interval]

INTERVAL   STARTING   INTERVAL   ENDING   NUMBER OF
NUMBER     VALUE      WIDTH      VALUE    COUNTS
1          -180       60         -120     3
2          -120       60         -60      5
3          -60        60         0        13
4          0          60         60       20
5          60         60         120      22
6          120        60         180      24
7          180        60         240      20
8          240        60         300      18
9          300        60         360      6
10         360        60         420      4

A data point is counted in an interval when it is equal to or greater than the STARTING VALUE but less than the ENDING VALUE.

Step 8 - Plot the data. A more precise and refined picture comes into view once you plot your data (Viewgraph 12). You bring all of the previous steps together when you construct the graph.

- The horizontal scale across the bottom of the graph contains the intervals that were calculated previously.
- The vertical scale contains the count or frequency of observations within each of the intervals.
- A bar is drawn for each interval. The bars look like columns.
- The height of each bar is determined by the number of observations or percentage of the total observations for that interval.
- The Histogram may not be perfectly symmetrical. Variations will occur. Ask yourself whether the picture is reasonable and logical, but be careful not to let your preconceived ideas influence your decisions unfairly.

Step 9 - Add the title and legend. A title and a legend provide the who, what, when, where, and why (also illustrated in Viewgraph 12) that are important for understanding and interpreting the data. This additional information documents the nature of the data, where it came from, and when it was collected. The legend may include such things as the sample size, the dates and times involved, who collected the data, and identifiable equipment or work groups. It is important to include any information that helps clarify what the data describes.
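As a rough illustration of Steps 8 and 9, the sketch below prints a text Histogram from the interval starting values and counts as listed in Viewgraph 11. It is not the plotting method used in the module: the bars run sideways rather than as columns, and the function name `histogram_rows` is hypothetical. The title and legend strings are taken from Viewgraph 12.

```python
# Interval starting values and counts as listed in Viewgraph 11.
starts = [-180, -120, -60, 0, 60, 120, 180, 240, 300, 360]
counts = [3, 5, 13, 20, 22, 24, 20, 18, 6, 4]
width = 60


def histogram_rows(starts, counts, width):
    """One text row per interval; the bar length equals the count."""
    rows = []
    for start, count in zip(starts, counts):
        label = f"{start:>5} to under {start + width:>4}"
        rows.append(f"{label} | {'#' * count} ({count})")
    return rows


# Step 9: the title and legend frame the plotted data.
print("MISS DISTANCE FOR MK 75 GUN TEST FIRING")
for row in histogram_rows(starts, counts, width):
    print(row)
print("LEGEND: USS CROMMELIN (FFG-37), PACIFIC MISSILE FIRING RANGE, "
      "135 BL&P ROUNDS/MOUNT 31, 25 JUNE 94")
```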

[Viewgraph 12: How to Construct a Histogram, Step 8 - Plot the data, Step 9 - Add the title and legend. Histogram titled "MISS DISTANCE FOR MK 75 GUN TEST FIRING". Vertical axis: SHOT COUNT, 0 to 25 in steps of 5. Horizontal axis: interval boundaries from -180 to 420 in steps of 60, labeled YARDS SHORT and YARDS LONG on either side of TARGET, with the regions marked MISSES, HITS, and MISSES. LEGEND: USS CROMMELIN (FFG-37), PACIFIC MISSILE FIRING RANGE, 135 BL&P ROUNDS/MOUNT 31, 25 JUNE 94]

How do we interpret a Histogram?


A Histogram provides a visual representation so you can see where most of the measurements are located and how spread out they are. Your Histogram might show any of the following conditions (Viewgraph 13):

- Most of the data were on target, with very little variation from it, as in Viewgraph 13A.
- Although some data were on target, many others were dispersed away from the target, as in Viewgraph 13B.
- Even when most of the data were close together, they were located off the target by a significant amount, as in Viewgraph 13C.
- The data were off target and widely dispersed, as in Viewgraph 13D.

This information helps you see how well the process performed and how consistent it was. You may be thinking, "So what? How will this help me do my job better?" Well, with the results of the process clearly depicted, we can find the answer to a vital question:

Did the process produce goods and services which are within specification limits?

Looking at the Histogram, you can see not only whether you were within specification limits, but also how close to the target you were (Viewgraph 14).
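The specification-limit question can also be answered numerically. The following Python sketch counts how many measurements fall between a lower and an upper specification limit. The sample values are hypothetical, the function name `within_limits` is invented for this illustration, and treating both limits as inclusive is an assumption. The limits of -60 and 300 yards come from the hit definition in the gunnery example.

```python
def within_limits(values, lsl, usl):
    """Return (count inside, fraction inside) for lsl <= value <= usl."""
    inside = sum(1 for v in values if lsl <= v <= usl)
    return inside, inside / len(values)


# Hypothetical miss distances in yards (illustrative only).
sample = [-90, -30, 10, 40, 120, 250, 310, 180]

# Hit definition from the example: 60 yards short to 300 yards beyond the target.
inside, fraction = within_limits(sample, lsl=-60, usl=300)
print(f"{inside} of {len(sample)} within limits ({fraction:.0%})")
```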

[Viewgraph 13: Interpreting Histograms, Location and Spread of Data. Four panels (A to D), each showing a distribution relative to a marked Target.]

[Viewgraph 14: Interpreting Histograms, Is Process Within Specification Limits? Two panels, WITHIN LIMITS and OUT OF SPEC, each marking LSL, Target, and USL. LSL = Lower specification limit. USL = Upper specification limit.]

Portraying your data in a Histogram enables you to check rapidly on the number, or the percentage, of defects produced during the time you collected data. But unless you know whether the process was stable (Viewgraph 15), you won't be able to predict whether future products will be within specification limits or determine a course of action to ensure that they are.

A Histogram can show you whether or not your process is producing products or services that are within specification limits. To discover whether the process is stable, and to predict whether it can continue to produce within spec limits, you need to use a Control Chart (see the Control Chart module). Only after you have discovered whether your process is in or out of control can you determine an appropriate course of action: to eliminate special causes of variation, or to make fundamental changes to your process.

There are times when a Histogram may look unusual to you. It might have more than one peak, be discontinued, or be skewed, with one tail longer than the other, as shown in Viewgraph 16. In these circumstances, the people involved in the process should ask themselves whether it really is unusual. The Histogram may not be symmetrical, but you may find out that it should look the way it does. On the other hand, the shape may show you that something is wrong: that data from several sources were mixed, for example, or different measurement devices were used, or operational definitions weren't applied. What is really important here is to avoid jumping to conclusions without properly examining the alternatives.

[Viewgraph 15: Interpreting Histograms, Process Variation. Four panels (Day 1 to Day 4), each showing a distribution relative to a marked Target.]

[Viewgraph 16: Interpreting Histograms, Common Histogram Shapes. Panels: Skewed (not symmetrical), Discontinued, Symmetrical (mirror imaged).]

How can we practice what we've learned?


Two exercises are provided that will take you through the nine steps for developing a Histogram. On the four pages that follow the scenario for Exercise 1 you will find a set of blank worksheets (Viewgraphs 17 through 23) to use in working through both of the exercises in this module.

You will find a set of answer keys for Exercise 1 after the blank worksheets, and for Exercise 2 after the description of its scenario. These answer keys represent only one possible set of answers. It's all right for you to choose an interval width or a number of intervals that is different from those used in the answer keys. Even though the shape of your Histogram may vary somewhat from the answer key's shape, it should be reasonably close unless you used a very different number of intervals.

EXERCISE 1: The source of data for the first exercise is the following scenario. A list of the data collected follows this description. Use the blank worksheets in Viewgraphs 17 through 23 to do this exercise. You will find answer keys in Viewgraphs 24 through 30.

Your corpsman is responsible for the semiannual Physical Readiness Test (PRT) screening for percent body fat. Prior to one PRT, the corpsman recorded the percent of body fat for the 80 personnel assigned to the command. These are the data collected:

PERCENT BODY FAT RECORDED

11  4  8 23 24 11 14 17 22 14 11 12 10 20 20 11 15 11 23 10
16 15 11 15  7 16 14 16 18 13 19 11 13 18 16 17 22  9 10 15
20 32 10 24 15 18 17 16 25 10  5 11 13 22 15 12 12 16 21 20
19 16 12 28 16 17 26  9 15 18 17 14 19 10 10 13 24  9 11 13

[Viewgraph 17: WORKSHEET, Step 1 - Count the number of data points. TOTAL NUMBER = ____]

[Viewgraph 18: WORKSHEET, Step 2 - Summarize the data on a tally sheet. Blank table with five pairs of VALUE and TALLY columns.]

[Viewgraph 19: WORKSHEET, Step 3 - Compute the range for the data set. Largest value = ____. Smallest value = ____. Range of values = ____.]

[Viewgraph 20: WORKSHEET, Step 4 - Determine the number of intervals. Less than 50 data points: 5 to 7 intervals. 50 to 99: 6 to 10 intervals. 100 to 250: 7 to 12 intervals. More than 250: 10 to 20 intervals.]

[Viewgraph 21: WORKSHEET, Step 5 - Compute the interval width. Interval Width = Range / Number of Intervals = ____ / ____ = ____. Round up to next higher whole number.]

[Viewgraph 22: WORKSHEET, Step 6 - Determine the starting point of each interval, Step 7 - Count the number of points in each interval. Blank table with columns INTERVAL NUMBER (1 to 10), STARTING VALUE, INTERVAL WIDTH, ENDING VALUE, and NUMBER OF COUNTS.]

[Viewgraph 23: WORKSHEET, Step 8 - Plot the data, Step 9 - Add title and legend. Blank plotting area.]

[Viewgraph 24: EXERCISE 1 ANSWER KEY, Step 1 - Count the number of data points. The 80 percent body fat values recorded for the exercise are listed again.]

TOTAL = 80

EXERCISE 1 ANSWER KEY Step 2 - Summarize the data on a tally sheet

% FAT  NO. OF PERS    % FAT  NO. OF PERS    % FAT  NO. OF PERS
  0        0            11        9           22        3
  1        0            12        4           23        2
  2        0            13        5           24        3
  3        0            14        4           25        1
  4        1            15        7           26        1
  5        1            16        8           27        0
  6        0            17        5           28        1
  7        1            18        4           29        0
  8        1            19        3           30        0
  9        3            20        4           31        0
 10        7            21        1           32        1

HISTOGRAM VIEWGRAPH 25

EXERCISE 1 ANSWER KEY Step 3 - Compute the range for the data set

Largest value   = 32 Percent body fat
Smallest value  =  4 Percent body fat
_________________________________________
Range of values = 28 Percent body fat

HISTOGRAM VIEWGRAPH 26

EXERCISE 1 ANSWER KEY Step 4 - Determine the number of intervals

IF YOU HAVE THIS MANY DATA POINTS    USE THIS NUMBER OF INTERVALS
Less than 50                         5 to 7 intervals
50 to 99                             6 to 10 intervals
100 to 250                           7 to 12 intervals
More than 250                        10 to 20 intervals

HISTOGRAM VIEWGRAPH 27

EXERCISE 1 ANSWER KEY Step 5 - Compute the interval width

Use 8 for the number of intervals.

Interval Width = Range / Number of Intervals = 28 / 8 = 3.5

Round up to 4

HISTOGRAM VIEWGRAPH 28

EXERCISE 1 ANSWER KEY
Step 6 - Determine the starting point of each interval
Step 7 - Count the number of points in each interval

INTERVAL NUMBER   STARTING VALUE   INTERVAL WIDTH   ENDING VALUE   NUMBER OF COUNTS
1                  4               +4                8              3
2                  8               +4               12             20
3                 12               +4               16             20
4                 16               +4               20             20
5                 20               +4               24             10
6                 24               +4               28              5
7                 28               +4               32              1
8                 32               +4               36              1

A value is counted in an interval when it is equal to or greater than the STARTING VALUE but less than the ENDING VALUE.

HISTOGRAM VIEWGRAPH 29

EXERCISE 1 ANSWER KEY Step 8 - Plot the data Step 9 - Add title and legend

[Histogram: JUNE 94 PRT PERCENT BODY FAT. Vertical axis: NO. OF PERSONNEL (0 to 20). Horizontal axis: PERCENT BODY FAT (0 to 36 in steps of 4). Annotation on the plot: SATISFACTORY % BODY FAT.]

LEGEND: USS LEADER (MSO-490), 25 JUNE 94, ALL 80 PERSONNEL SAMPLED

HISTOGRAM VIEWGRAPH 30
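Steps 3 through 7 of the Exercise 1 answer key can be checked with a short script. The following is only a sketch in Python (the handout itself uses paper worksheets); the function name `histogram_intervals` is made up for this illustration, and it assumes whole-number data with a nonzero range.

```python
import math

# Percent body fat for the 80 personnel in Exercise 1.
DATA = [
    11, 4, 8, 23, 24, 11, 14, 17, 22, 14,
    11, 12, 10, 20, 20, 11, 15, 11, 23, 10,
    16, 15, 11, 15, 7, 16, 14, 16, 18, 13,
    19, 11, 13, 18, 16, 17, 22, 9, 10, 15,
    20, 32, 10, 24, 15, 18, 17, 16, 25, 10,
    5, 11, 13, 22, 15, 12, 12, 16, 21, 20,
    19, 16, 12, 28, 16, 17, 26, 9, 15, 18,
    17, 14, 19, 10, 10, 13, 24, 9, 11, 13,
]


def histogram_intervals(data, n_intervals):
    """Hypothetical helper covering Steps 3-7 for integer data."""
    low, high = min(data), max(data)                 # Step 3: range = high - low
    width = math.ceil((high - low) / n_intervals)    # Step 5: round up
    # Add an interval if the largest value would fall past the last one.
    n = max(n_intervals, (high - low) // width + 1)
    starts = [low + i * width for i in range(n)]     # Step 6
    counts = [0] * n
    for value in data:                               # Step 7: start <= value < end
        counts[(value - low) // width] += 1
    return width, starts, counts


width, starts, counts = histogram_intervals(DATA, 8)
print(width, starts, counts)
```

Choosing a different number of intervals changes the width and the counts, which is why the answer key is described as only one possible set of answers.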


EXERCISE 2: The source of data for the second exercise is the following scenario. A listing of the data collected follows this description. Use the blank worksheets in Viewgraphs 17 through 23 to do this exercise. You will find answer keys in Viewgraphs 31 through 37.

A Marine Corps small arms instructor was performing an analysis of 9 mm pistol marksmanship scores to improve training methods. For every class of 25, the instructor recorded the scores for each student who occupied the first four firing positions at the small arms range. The instructor then averaged the scores for each class, maintaining a database on 105 classes. These are the data collected:

AVERAGE SMALL ARMS SCORES
160 175 270 180 255 255 230 195 220 210 220 190 190 265 245
180 245 255 235 215 240 230 155 210 255 270 260 210 235 230
225 215 225 300 225 235 200 240 225 195 215 250 230 215 280
275 170 200 245 225 220 225 220 220 225 185 240 175 220 170
235 210 235 245 225 250 170 185 265 205 230 235 225 195 200
285 185 195 270 260 230 240 200 235 235 200 215 200 250 215
195 200 245 225 215 165 220 260 230 185 225 220 230 230 240


EXERCISE 2 ANSWER KEY Step 1 - Count the number of data points


160 175 270 180 255 255 230 195 220 210 220 190 190 265 245 180 245 255 235 215 240 230 155 210 255 270 260 210 235 230 225 215 225 300 225 235 200 240 225 195 215 250 230 215 280 275 170 200 245 225 220 225 220 220 225 185 240 175 220 170 235 210 235 245 225 250 170 185 265 205 230 235 225 195 200 285 185 195 270 260 230 240 200 235 235 200 215 200 250 215 195 200 245 225 215 165 220 260 230 185 225 220 230 230 240

TOTAL = 105
HISTOGRAM VIEWGRAPH 31

EXERCISE 2 ANSWER KEY Step 2 - Summarize the data on a tally sheet

SCORE  TALLY    SCORE  TALLY    SCORE  TALLY
155      1      205      1      255      4
160      1      210      4      260      3
165      1      215      7      265      2
170      3      220      8      270      3
175      2      225     11      275      1
180      2      230      9      280      1
185      4      235      8      285      1
190      2      240      5      290      0
195      5      245      5      295      0
200      7      250      3      300      1

HISTOGRAM VIEWGRAPH 32

EXERCISE 2 ANSWER KEY Step 3 - Compute the range for the data set

Largest value   = 300 Points
Smallest value  = 155 Points
__________________________________
Range of values = 145 Points

HISTOGRAM VIEWGRAPH 33

EXERCISE 2 ANSWER KEY Step 4 - Determine the number of intervals

IF YOU HAVE THIS MANY DATA POINTS    USE THIS NUMBER OF INTERVALS
Less than 50                         5 to 7 intervals
50 to 99                             6 to 10 intervals
100 to 250                           7 to 12 intervals
More than 250                        10 to 20 intervals

HISTOGRAM VIEWGRAPH 34

EXERCISE 2 ANSWER KEY Step 5 - Compute the interval width

Use 10 for the number of intervals.

Interval Width = Range / Number of Intervals = 145 / 10 = 14.5

Round up to 15

HISTOGRAM VIEWGRAPH 35

EXERCISE 2 ANSWER KEY
Step 6 - Determine the starting point of each interval
Step 7 - Count the number of points in each interval

INTERVAL NUMBER   STARTING VALUE   INTERVAL WIDTH   ENDING VALUE   NUMBER OF COUNTS
1                 155              +15              170              3
2                 170              +15              185              7
3                 185              +15              200             11
4                 200              +15              215             12
5                 215              +15              230             26
6                 230              +15              245             22
7                 245              +15              260             12
8                 260              +15              275              8
9                 275              +15              290              3
10                290              +15              305              1

A value is counted in an interval when it is equal to or greater than the STARTING VALUE but less than the ENDING VALUE. The last interval therefore ends at 290 + 15 = 305, so that the score of 300 is included.

HISTOGRAM VIEWGRAPH 36

EXERCISE 2 ANSWER KEY Step 8 - Plot the data Step 9 - Add title and legend

[Histogram: MARKSMANSHIP SCORES FOR 9mm PISTOL. Vertical axis: NO. OF PERSONNEL (0 to 30). Horizontal axis: SCORES (155 to 300, interval boundaries every 15 points).]

LEGEND: MCBH KANEOHE BAY, HI; AVERAGE OF 4 SCORES PER CLASS, 105 CLASSES, 1 JUNE 94 - 15 JULY 94

HISTOGRAM VIEWGRAPH 37

REFERENCES:
1. Brassard, M. (1988). The Memory Jogger, A Pocket Guide of Tools for Continuous Improvement, pp. 36 - 43. Methuen, MA: GOAL/QPC.
2. Department of the Navy (November 1992). Fundamentals of Total Quality Leadership (Instructor Guide), pp. 6-44 - 6-47. San Diego, CA: Navy Personnel Research and Development Center.
3. Department of the Navy (September 1993). Systems Approach to Process Improvement (Instructor Guide), pp. 10-17 - 10-38. San Diego, CA: OUSN Total Quality Leadership Office and Navy Personnel Research and Development Center.
4. Naval Medical Quality Institute (Undated). Total Quality Leader's Course (Student Guide), pp. U-26 - U-28. Bethesda, MD.

Graphics Commands

PIE CHART

PURPOSE
Generates a pie chart.

DESCRIPTION
A pie chart is a graphical data analysis technique for summarizing the distributional information of a variable. It is a circular plot consisting of wedges where the size of each wedge is proportional to the frequency (= number of observations) in that wedge. The plot is to be read clockwise (where the first wedge is at 9 o'clock).

If a single variable is specified, DATAPLOT divides the values into frequency classes in the same manner as for a histogram. The histogram and the pie chart have the same information except the histogram has bars at the data values (where the height of the bar is proportional to the number of observations in the class), whereas the pie chart has wedges (where the area of the wedge is proportional to the number of observations in the class).

If two variables are specified, the first variable contains pre-computed frequencies and the second variable is a group identifier. This second form is more commonly used.

SYNTAX 1
PIE CHART <x> <SUBSET/EXCEPT/FOR qualification>
where <x> is the variable of raw data values;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
This syntax is used when you have raw data only.

SYNTAX 2
PIE CHART <y> <x> <SUBSET/EXCEPT/FOR qualification>
where <y> is the variable of pre-computed frequencies;
<x> is the variable of group identifiers;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
This syntax is used when you have pre-computed frequencies at each data value.

EXAMPLES
PIE CHART X
PIE CHART TEMP SUBSET TEMP > 0
PIE CHART F X SUBSET X > 2
PIE CHART COUNTS STATE

NOTE 1
Each wedge is drawn with a common set of attributes. The attributes of the wedge borders are set with the LINE, LINE COLOR, and LINE THICKNESS commands (typically they are all set the same). The attributes of the interior are set with the various REGION commands. Any labels for the wedges must be set with the LEGEND or TEXT commands. The CROSS HAIR command can help in positioning labels. The program example below shows how to set the attributes. DATAPLOT does not support features such as 3d pie charts or exploding slices that are common in many business graphics programs.

NOTE 2
Although pie charts are popular in business graphics, they are generally a poor graphics technique. See the book listed in the REFERENCE section below for more information.

NOTE 3
For the one variable form of the command, DATAPLOT uses a class width of 0.3 times the standard deviation of the variable. Use the CLASS WIDTH command to override this default. DATAPLOT also tends to generate a large number of zero frequency classes at the lower and upper tails. The CLASS LOWER and CLASS UPPER commands can be used to set lower and upper limits for the classes.

DEFAULT
None

SYNONYMS
None

DATAPLOT Reference Manual

March 10, 1997

RELATED COMMANDS
HISTOGRAM           = Generates a histogram.
FREQUENCY PLOT      = Generates a frequency plot.
PERCENT POINT PLOT  = Generates a percent point plot.
PLOT                = Generates a plot (including bar plots).
CLASS LOWER         = Sets the lower class minimum for histograms, frequency plots, and pie charts.
CLASS UPPER         = Sets the upper class maximum for histograms, frequency plots, and pie charts.
CLASS WIDTH         = Sets the class width for histograms, frequency plots, and pie charts.
LINE                = Sets the types for plot lines.
LINE COLOR          = Sets the colors for plot lines.
LINE THICKNESS      = Sets the thicknesses for plot lines.
REGION FILL         = Sets the on/off switches for region fills.

REFERENCE
The Elements of Graphing Data, William Cleveland, Wadsworth, 1985 (p. 264).

APPLICATIONS
Business Graphics

IMPLEMENTATION DATE
The ability to set the attributes of the pie wedges was implemented 93/11.

PROGRAM
LET X = DATA 81 82 83 84 85
LET Y = DATA 2 5 9 15 28
MULTIPLOT 2 2; MULTIPLOT CORNER COORDINATES 0 0 100 100
X1LABEL SALES IN MILLIONS OF DOLLARS
.
LINE THICKNESS .3 ALL; TITLE PIE CHART WITH THICKER LINES
PIE CHART Y X
.
REGION FILL ON ALL; REGION PATTERN COLOR G10 G30 G50 G70 G90
REGION FILL COLOR G10 G30 G50 G70 G90
TITLE PIE CHART WITH SOLID FILL SLICES
PIE CHART Y X
.
TITLE PIE CHART WITH LABELS
LET N = SIZE X
LEGEND SIZE 3
LOOP FOR K = 1 1 N
LET A = X(K)
LEGEND ^K 19^A
END OF LOOP
LEGEND 1 COORDINATES 8 58; LEGEND 2 COORDINATES 10 71; LEGEND 3 COORDINATES 28 92
LEGEND 4 COORDINATES 68 77; LEGEND 5 COORDINATES 67 30
PIE CHART Y X
.
REGION PATTERN COLOR BLACK ALL; REGION PATTERN D1 D2 D1D2 VERT HORI
REGION PATTERN SPACING 1.0 1.0 3.0 4.0 5.0; REGION PATTERN LINE SOLID SOLID SOLID DASH DOT
TITLE PIE CHART WITH HATCH PATTERN FILLS
PIE CHART Y X
MULTIPLOT OFF

[Figure: four pie charts produced by the program, titled PIE CHART WITH THICKER LINES, PIE CHART WITH SOLID FILL SLICES, PIE CHART WITH LABELS and PIE CHART WITH HATCH PATTERN FILLS. Each panel is labelled SALES IN MILLIONS OF DOLLARS; the last two panels label the wedges 1981 to 1985.]

STATISTICS

Statistics: In its first sense, the word statistics is defined by Professor Horace Secrist as follows: "By statistics we mean aggregates of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standards of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other."

A. L. Bowley has given three definitions:
Statistics may be called the science of counting.
Statistics may be called the science of averages.
Statistics is the science of the measurement of the social organism as a whole in all its manifestations.

In other words, statistics deals with data, which can be collected, documented, analyzed and interpreted.

Sampling: Sampling involves the selection of a number of study units from a defined study population. A study population may consist of individuals, villages, institutions, records etc.

Sampling method: There are two types of sampling method.
1. Probability sampling:
Random sampling
Stratified sampling
Systematic sampling
Cluster sampling
Multistage sampling
2. Non-probability sampling:
Quota sampling
Convenience sampling

Probability sampling: Probability sampling involves a random selection procedure to ensure that each sample unit is chosen on the basis of chance.

Systematic sampling: In systematic sampling a group of units is selected in a systematically random manner from a complete list of a given population. Systematic sampling is applied where very large numbers are included in the target population. For example:

1. Class interval: To calculate the class interval we divide the population size by the desired sample size. Example: we want to select 15 universities from a list of 40 in our sampling frame.
class interval = 40 / 15 = 2.67, taken as 2.6 in this example.

2. Random start: A random number is selected between 0 and 1, such as 0.178. Multiplying the random number by the class interval gives a fraction, and the next whole number above the fractional value is the first facility selected.
Random number x class interval = 0.178 x 2.6 = 0.463 (this is facility 1)
Adding the class interval to the previous value again gives a fraction, and the next whole number above it is the next facility selected.
0.463 + 2.6 = 3.063 (facility 4)
3.063 + 2.6 = 5.663 (facility 6)
And so forth.
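The systematic sampling steps above can be sketched in Python. This is only an illustration under stated assumptions: the function name `systematic_sample` is hypothetical, "the next interval to the fractional value" is read as rounding up to the next whole number, and the class interval is passed in as 2.6 to match the worked example.

```python
import math


def systematic_sample(sample_size, interval, random_start):
    """Hypothetical helper: positions picked from a numbered list.

    The running value starts at random_start * interval and grows by one
    class interval each time; each value is rounded up to a list position.
    """
    first = random_start * interval
    return [math.ceil(first + k * interval) for k in range(sample_size)]


# 15 universities from a list of 40, interval 2.6, random number 0.178.
picks = systematic_sample(sample_size=15, interval=2.6, random_start=0.178)
print(picks)
```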

CENTRAL TENDENCY
There are 3 measures of central tendency: a) Mean b) Median c) Mode

a) Mean: The mean is the sum of the observations divided by the number of observations.

$\bar{x} = \frac{\sum x_i}{n}$

where $\bar{x}$ = mean, $x_i$ = the individual observations, $n$ = total number of observations in the sample.

It is of 3 types: 1) Arithmetic Mean 2) Harmonic Mean 3) Geometric Mean

1) Arithmetic Mean: It is the sum of the data values divided by the number of values. For example: 1, 2, 3, 25, 9

$AM = \frac{1+2+3+25+9}{5} = \frac{40}{5} = 8$

2) Harmonic Mean: The harmonic mean is the reciprocal of the mean of the reciprocals of nonzero data. For example: 1, 2, 3, 9, 25

Step 1: mean of the reciprocals $= \frac{\frac{1}{1}+\frac{1}{2}+\frac{1}{3}+\frac{1}{9}+\frac{1}{25}}{5} = \frac{1+0.5+0.33+0.11+0.04}{5} = \frac{1.98}{5}$

Step 2: $HM = \frac{5}{1.98} = 2.53$

3) Geometric Mean: The geometric mean is defined as the nth positive root of the product of n non-zero, non-negative values. For example: 1, 2, 3, 9, 25

$GM = \sqrt[5]{1 \times 2 \times 3 \times 25 \times 9} = \sqrt[5]{1350} = 4.23$
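The three means can be checked numerically. The sketch below is in Python with function names chosen only for this illustration. Without rounding the intermediate reciprocals, the harmonic mean of the example data comes out as 2.52 rather than 2.53.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def harmonic_mean(values):
    # Reciprocal of the mean of the reciprocals (values must be nonzero).
    return len(values) / sum(1 / v for v in values)


def geometric_mean(values):
    # nth root of the product of n positive values.
    return math.prod(values) ** (1 / len(values))


data = [1, 2, 3, 9, 25]
am = arithmetic_mean(data)
hm = harmonic_mean(data)
gm = geometric_mean(data)
print(am, round(hm, 2), round(gm, 2))

# For exactly two positive numbers, AM x HM equals GM squared.
pair = [4, 9]
product = arithmetic_mean(pair) * harmonic_mean(pair)
```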

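Question: Prove the equation $AM \times HM = GM^2$.
Answer: It holds only for two non-zero positive numbers. Suppose one non-zero positive number = a and the other non-zero positive number = b.

$AM \times HM = \frac{a+b}{2} \times \frac{2}{\frac{1}{a}+\frac{1}{b}} = \frac{a+b}{2} \times \frac{2ab}{a+b} = ab = (\sqrt{ab})^2 = GM^2$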

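Question: For two non-zero positive numbers AM = 5 and HM = 4. What is the value of GM?
Answer: Suppose the two non-zero positive numbers are a and b.

$\frac{a+b}{2} = 5$, so $a + b = 10$ ..... (i)

$\frac{2}{\frac{1}{a}+\frac{1}{b}} = \frac{2ab}{a+b} = 4$, so $2ab = 40$ and $ab = 20$ ..... (ii)

$GM = \sqrt{ab} = \sqrt{20} = 4.47$

The two numbers themselves follow from

$(a-b)^2 = (a+b)^2 - 4ab = (10)^2 - 4 \times 20 = 100 - 80 = 20$, so $a - b = \sqrt{20} = 4.47$ ..... (iii)

(i) + (iii): $2a = 14.47$, so $a = 7.24$

From (i): $b = 10 - 7.24 = 2.76$

Answer: GM = 4.47; (a, b) = (7.24, 2.76).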

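Table 1: Different discrete positive numbers. Find AM, GM and HM. Show a relationship between them.
For example: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

$AM = \frac{1+2+3+4+5+6+7+8+9+10}{10} = \frac{55}{10} = 5.5$

HM, Step 1: mean of the reciprocals $= \frac{\frac{1}{1}+\frac{1}{2}+\frac{1}{3}+\frac{1}{4}+\frac{1}{5}+\frac{1}{6}+\frac{1}{7}+\frac{1}{8}+\frac{1}{9}+\frac{1}{10}}{10} = \frac{1+0.5+0.33+0.25+0.2+0.167+0.143+0.125+0.11+0.1}{10} = \frac{2.928}{10}$

Step 2: $HM = \frac{10}{2.928} = 3.415$

$GM = \sqrt[10]{1 \times 2 \times 3 \times 4 \times 5 \times 6 \times 7 \times 8 \times 9 \times 10} = \sqrt[10]{3628800} = 4.53$

AM > GM > HM [non-zero and non-equal values]

$AM \times HM = 5.5 \times 3.415 = 18.8$
$GM^2 = (4.53)^2 = 20.5$
So $AM \times HM \neq GM^2$ when there are more than two unequal values.

The relation is AM = GM = HM when all the values are equal [a = 5, b = 5, c = 5: non-zero and equal].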

Mean for ungrouped data:

Score (xi)   Frequency (fi)   fixi
0            4                 0
1            3                 3
2            5                10
3            5                15
4            6                24
5            7                35
             Σfi = 30         Σfixi = 87

$Mean = \frac{\sum f_i x_i}{\sum f_i} = \frac{87}{30} = 2.9$

Answer: 2.9

Mean for grouped data:

Height (cm)   Middle value (xi)   Frequency (fi)   fixi
150-155       152.5                4                610
155-160       157.5                7               1102.5
160-165       162.5               18               2925
165-170       167.5               11               1842.5
170-175       172.5                6               1035
175-180       177.5                4                710
                                  Σfi = 50         Σfixi = 8225

$Mean = \frac{\sum f_i x_i}{\sum f_i} = \frac{8225}{50} = 164.5$

Answer: 164.5
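As a check on the grouped-data mean, the calculation can be sketched in Python. The helper name `grouped_mean` and the representation of classes as (lower, upper) pairs are assumptions made for this illustration.

```python
def grouped_mean(class_limits, frequencies):
    """Mean of grouped data, using each class midpoint as its value."""
    midpoints = [(lower + upper) / 2 for lower, upper in class_limits]
    total = sum(f * x for f, x in zip(frequencies, midpoints))
    return total / sum(frequencies)


classes = [(150, 155), (155, 160), (160, 165), (165, 170), (170, 175), (175, 180)]
freqs = [4, 7, 18, 11, 6, 4]
height_mean = grouped_mean(classes, freqs)

# An ungrouped frequency table is the same calculation with the scores
# themselves in place of midpoints.
scores = [0, 1, 2, 3, 4, 5]
score_freqs = [4, 3, 5, 5, 6, 7]
score_mean = sum(f * x for f, x in zip(score_freqs, scores)) / sum(score_freqs)
print(height_mean, score_mean)
```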

Graph for median & mode:

Weekly wages (taka)   No. of workers   Mid value   Cumulative frequency
75-85                 14                80          14
85-95                 18                90          32
95-105                30               100          62
105-115               45               110         107
115-125               52               120         159
125-135               45               130         204
135-145               20               140         224
145-155                6               150         230

$Median = L_1 + \frac{\frac{n}{2} - f_c}{f_m} \times C$

Where,
$L_1$ = The lower limit of the median group.
$C$ = Class interval of the median group.
$n$ = The total frequency.
$f_m$ = The frequency of the median group.
$f_c$ = The cumulative frequency of the group preceding the median group.

Here, $L_1 = 115$, $C = 10$, $n = 230$, $f_c = 107$, $f_m = 52$

$Median = 115 + \frac{\frac{230}{2} - 107}{52} \times 10 = 115 + \frac{8}{52} \times 10 = 116.54$

$Mode = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times C$

Where,
$L$ = The lower limit of the modal group.
$C$ = Class interval of the modal class.
$\Delta_1$ = The difference in frequencies of the modal class and the pre-modal class.
$\Delta_2$ = The difference in frequencies of the modal class and the post-modal class.

Here, $L = 115$, $C = 10$, $\Delta_1 = 52 - 45 = 7$, $\Delta_2 = 52 - 45 = 7$

$Mode = 115 + \frac{7}{7+7} \times 10 = 120$
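The grouped median and mode formulas can be sketched in Python for the weekly wages table. The function names are hypothetical, equal class widths are assumed, and the mode helper assumes a single modal class whose frequency differs from at least one neighbour.

```python
def grouped_median(lower_limits, width, freqs):
    """Median = L1 + (n/2 - fc) / fm * C for equal-width classes."""
    n = sum(freqs)
    cumulative = 0  # fc: cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= n / 2:
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f


def grouped_mode(lower_limits, width, freqs):
    """Mode = L + d1 / (d1 + d2) * C, with d1, d2 the frequency gaps."""
    i = freqs.index(max(freqs))
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1, d2 = freqs[i] - before, freqs[i] - after
    return lower_limits[i] + d1 / (d1 + d2) * width


lowers = [75, 85, 95, 105, 115, 125, 135, 145]
workers = [14, 18, 30, 45, 52, 45, 20, 6]
wage_median = grouped_median(lowers, 10, workers)
wage_mode = grouped_mode(lowers, 10, workers)
print(round(wage_median, 2), wage_mode)
```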

Median: When all the observations of a set of data are arranged in either ascending or descending order, the middle observation is known as the median. If the number of observations is even, the mean of the two central values is taken as the median.

Median = the middle value of a set of data.

For ungrouped data, median = the $\left(\frac{n+1}{2}\right)$th observation.

Class interval   M.V   Frequency   C.F
5-9               7    12          12
9-13             11     8          20
13-17            15    15          35
17-21            19    19          54
21-25            23    14          68
25-29            27     7          75
                       N = 75

$Median = L + \frac{\frac{N}{2} - F}{f} \times c = 17 + \frac{\frac{75}{2} - 35}{19} \times 4 = 17 + \frac{2.5 \times 4}{19} = 17 + 0.526 = 17.526$

Here,
L = lower limit of the median class.
F = c.f of the class just preceding the median class.
N = total number of observations.
f = frequency of the median class.
c = class interval of the median class.

Mode: The mode is the value of a data set that occurs most frequently. It is the typical or commonly observed value which occurs the maximum number of times.

$Mode = L + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times c$ (grouped data)

Weight (C.I)   M.V    Frequency      Cumulative frequency
35-40          37.5    6              6
40-45          42.5    5 ($f_0$)     11
45-50          47.5   10 ($f_1$)     21
50-55          52.5    9 ($f_2$)     30
55-60          57.5    9             39
60-65          62.5    8             47
65-70          67.5    7             55

$Mode = L + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times c = 45 + \frac{10 - 5}{(10 - 5) + (10 - 9)} \times 5 = 45 + \frac{5 \times 5}{5 + 1} = 45 + 4.17 = 49.17$

(For ungrouped data the value with the highest frequency is the mode.)

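Determination of Mean, Median, Mode, M.D about the mean, M.D about the mode, M.D about the median, and Variance (grouped data).

Class   M.V (xi)   fi   C.F   fixi   |xi - mean|   fi|xi - mean|   |xi - median|   fi|xi - median|   |xi - mode|   fi|xi - mode|
10-20   15          5    5     75    23.27         116.35          23.33           116.65            21.25         106.25
20-30   25         10   15    250    13.27         132.70          13.33           133.30            11.25         112.50
30-40   35         15   30    525     3.27          49.05           3.33            49.95             1.25          18.75
40-50   45         12   42    540     6.73          80.76           6.67            80.04             8.75         105.00
50-60   55         13   55    715    16.73         217.49          16.67           216.71            18.75         243.75
Total              N = 55     2105                 596.35                          596.65                          586.25

$Mean = \frac{\sum f_i x_i}{N} = \frac{2105}{55} = 38.27$

$Median = L + \frac{\frac{N}{2} - F}{f} \times c = 30 + \frac{\frac{55}{2} - 15}{15} \times 10 = 38.33$

$Mode = L + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times c = 30 + \frac{15 - 10}{(15 - 10) + (15 - 12)} \times 10 = 30 + 6.25 = 36.25$

$M.D_{mean} = \frac{\sum f_i |x_i - \bar{x}|}{N} = \frac{596.35}{55} = 10.84$

$M.D_{median} = \frac{\sum f_i |x_i - median|}{N} = \frac{596.65}{55} = 10.85$

$M.D_{mode} = \frac{\sum f_i |x_i - mode|}{N} = \frac{586.25}{55} = 10.66$

For variance:

xi    (xi - mean)^2   fi(xi - mean)^2
15    541.49          2707.45
25    176.09          1760.90
35     10.69           160.35
45     45.29           543.48
55    279.89          3638.57
Total                 8810.75

$\sigma^2 = \frac{\sum f_i (x_i - \bar{x})^2}{N} = \frac{8810.75}{55} = 160.2$

The same calculation can be made with deviations taken from the median or from the mode. Each deviation is squared first and the squares are then summed; the total of the absolute deviations is not squared.

$\frac{\sum f_i (x_i - median)^2}{N} = \frac{8811.11}{55} = 160.2$

$\frac{\sum f_i (x_i - mode)^2}{N} = \frac{9035.94}{55} = 164.29$

Determination of M.D about the mean, M.D about the median and M.D about the mode from ungrouped data.

Score (xi)   Frequency (fi)   fixi   |xi - mean|   fi|xi - mean|   |xi - median|   fi|xi - median|
1             5                5     3.4           17.0            4               20
3             7               21     1.4            9.8            2               14
5            10               50     0.6            6.0            0                0
7             8               56     2.6           20.8            2               16
Total        N = 30           132                  53.6                            50

$Mean = \frac{\sum f_i x_i}{N} = \frac{132}{30} = 4.4$

Median = the $\frac{30+1}{2}$ = 15.5th observation. The cumulative frequencies are 5, 12, 22, 30, so the 15th and 16th observations are both 5 and the median is 5.

Mode = 5 (the score with the highest frequency, 10). Since the median and the mode are both 5, their deviation columns are identical.

$M.D_{mean} = \frac{53.6}{30} = 1.79$

$M.D_{median} = \frac{50}{30} = 1.67$

$M.D_{mode} = \frac{50}{30} = 1.67$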
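The mean deviations for the grouped example (class midpoints 15 to 55 with frequencies 5, 10, 15, 12, 13) can be reproduced with a short Python sketch. The helper `mean_deviation` is a name chosen for this illustration; the centre values are the mean, median and mode worked out for that table.

```python
def mean_deviation(values, freqs, center):
    """Frequency-weighted mean of absolute deviations from `center`."""
    n = sum(freqs)
    return sum(f * abs(x - center) for x, f in zip(values, freqs)) / n


midpoints = [15, 25, 35, 45, 55]
freqs = [5, 10, 15, 12, 13]

mean = sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)   # 2105 / 55
median = 30 + (55 / 2 - 15) / 15 * 10                              # grouped median
mode = 30 + (15 - 10) / ((15 - 10) + (15 - 12)) * 10               # grouped mode

md_mean = mean_deviation(midpoints, freqs, mean)
md_median = mean_deviation(midpoints, freqs, median)
md_mode = mean_deviation(midpoints, freqs, mode)
print(round(md_mean, 2), round(md_median, 2), round(md_mode, 2))
```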

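Determination of variance from ungrouped data: 1, 2, 3, 4, 5.

xi    (xi - mean)^2
1     4
2     1
3     0
4     1
5     4
Σxi = 15    Σ(xi - mean)^2 = 10

$\bar{x} = \frac{\sum x_i}{n} = \frac{15}{5} = 3$

$Variance\ \sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{10}{5} = 2$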

DISPERSION

Dispersion deals with how the values are scattered in a set of data. Dispersion is small if the values are closely bunched about their mean and it is large if the values are scattered widely about their mean. The important measures are:
Range
Variance
Standard deviation
Co-efficient of variation (CV)
Standard error

Range: Range is the absolute difference between the highest value and the lowest value in a series of observations.
Range = largest value - smallest value
For example: 1, 10, 10, 20, 30, 40. Range = 40 - 1 = 39

Variance: The mean of the squares of the deviations of every observation from their mean is a measure of spread and is called variance. Formula for grouped data:

$\sigma^2 = \frac{\sum f_i (x_i - \bar{x})^2}{N}$

Standard deviation: The standard deviation is the square root of the variance.

$S.D = \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{N}}$

Co-efficient of variation (CV): The co-efficient of variation provides a relative measure of data dispersion compared to the mean.

$C.V = \frac{S.D}{\bar{x}} \times 100\%$

Standard error: The standard error is the standard deviation of the sampling distribution of the sample statistic.

$S.E = \frac{S.D}{\sqrt{n}}$ (n = number of observations in the sample or population)
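The dispersion measures listed above can be computed together. The following Python sketch uses the data set 1, 2, 3, 4, 5 and divides by n for the variance, as these notes do; the function name and the dictionary keys are assumptions of this illustration.

```python
import math


def dispersion(values):
    """Range, variance (divisor n), S.D, C.V (%) and S.E for raw data."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    sd = math.sqrt(variance)
    return {
        "range": max(values) - min(values),
        "variance": variance,
        "sd": sd,
        "cv_percent": sd / mean * 100,   # relative to the mean
        "se": sd / math.sqrt(n),         # standard error of the mean
    }


result = dispersion([1, 2, 3, 4, 5])
print(result)
```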

1. The average of 10 observations is 40. One observation x is added and the average is reduced to 38. One more observation y is added and the average is 42. Find the values of x and y.

We know, $\sum x_i = n\bar{x} = 10 \times 40 = 400$ (here n = 10, $\bar{x}$ = 40).

After x is added, $n_1 = 10 + 1 = 11$ and $\bar{x}_1 = 38$:
$400 + x = 38 \times 11 = 418$
$x = 418 - 400 = 18$

After y is added, $n_2 = 12$ and $\bar{x}_2 = 42$:
$y + 418 = 42 \times 12 = 504$
$y = 504 - 418 = 86$

Answer: (x, y) = (18, 86).

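2. Suppose the mean and variance of 20 observations are 32 and 25 respectively. One observation, 36, is now added. Find the mean and variance of the 21 observations.

Here, $n_1 = 20$, $\bar{x}_1 = 32$, $\sigma_1^2 = 25$; the added observation is a group with $n_2 = 1$, $\bar{x}_2 = 36$.

$\bar{x} = \frac{\sum x_i + 36}{21} = \frac{(20 \times 32) + 36}{21} = \frac{640 + 36}{21} = 32.19$

Or, combined mean $= \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{32 \times 20 + 36 \times 1}{20 + 1} = 32.19$

Combined variance: $\sigma^2 = \frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}$, where $d_1 = \bar{x}_1 - \bar{x}$ and $d_2 = \bar{x}_2 - \bar{x}$.

The value of $\sigma_2^2$ is not given, but a group consisting of a single observation has no spread, so $\sigma_2^2 = 0$. With $d_1 = 32 - 32.19 = -0.19$ and $d_2 = 36 - 32.19 = 3.81$:

$\sigma^2 = \frac{20(25 + 0.036) + 1(0 + 14.512)}{21} = \frac{515.24}{21} = 24.54$

Question: What is the relation between M.D about the mean and S.D?

Answer: Suppose the two numbers are $x_1$ and $x_2$. (This relation applies to the case of two observations.)

$\bar{x} = \frac{x_1 + x_2}{2}$

$M.D_{mean} = \frac{|x_1 - \bar{x}| + |x_2 - \bar{x}|}{2} = \frac{\left|x_1 - \frac{x_1 + x_2}{2}\right| + \left|x_2 - \frac{x_1 + x_2}{2}\right|}{2} = \frac{\left|\frac{x_1 - x_2}{2}\right| + \left|\frac{x_2 - x_1}{2}\right|}{2} = \frac{|x_1 - x_2|}{2}$

$S.D = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2}{2}} = \sqrt{\frac{\left(\frac{x_1 - x_2}{2}\right)^2 + \left(\frac{x_2 - x_1}{2}\right)^2}{2}} = \sqrt{\frac{2(x_1 - x_2)^2}{4 \times 2}} = \sqrt{\frac{(x_1 - x_2)^2}{4}} = \frac{|x_1 - x_2|}{2}$

So, M.D about the mean = S.D.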

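Q: Relation between variance and S.D?
Answer: Suppose three numbers are a, b, c.

$\sigma^2 = \frac{a^2 + b^2 + c^2}{3} - \left(\frac{a + b + c}{3}\right)^2$

$= \frac{3a^2 + 3b^2 + 3c^2}{9} - \frac{a^2 + b^2 + c^2 + 2ab + 2bc + 2ca}{9}$

$= \frac{2a^2 + 2b^2 + 2c^2 - 2ab - 2bc - 2ca}{9}$

$= \frac{(a^2 - 2ab + b^2) + (b^2 - 2bc + c^2) + (c^2 - 2ca + a^2)}{9}$

$\sigma^2 = \frac{(a-b)^2 + (b-c)^2 + (c-a)^2}{9}$, and $S.D = \sigma = \frac{1}{3}\sqrt{(a-b)^2 + (b-c)^2 + (c-a)^2}$

Q: Determine the variance and S.D of the numbers 27, 28, 29.

$\sigma^2 = \frac{1}{9}\left[(a-b)^2 + (b-c)^2 + (c-a)^2\right] = \frac{1}{9}\left[(27-28)^2 + (28-29)^2 + (29-27)^2\right] = \frac{1}{9}(1 + 1 + 4) = \frac{6}{9} = \frac{2}{3}$

Answer: $\frac{2}{3}$

$S.D = \frac{1}{3}\sqrt{(27-28)^2 + (28-29)^2 + (29-27)^2} = \frac{1}{3}\sqrt{1 + 1 + 4} = \frac{\sqrt{6}}{3} = 0.816$

Answer: 0.816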

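Combined variance:

$\sigma^2 = \frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}$, where $d_1 = \bar{x}_1 - \bar{x}$ and $d_2 = \bar{x}_2 - \bar{x}$.

Here given $n_1 = 20$, $n_2 = 25$, $\bar{x}_1 = 30$, $\bar{x}_2 = 30$, $\sigma_1^2 = 5$, $\sigma_2^2 = 6$, $\sigma^2 = ?$

Both group means are 30, so the combined mean is 30 and $d_1 = d_2 = 0$:

$\sigma^2 = \frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2} = \frac{20 \times 5 + 25 \times 6}{20 + 25} = \frac{250}{45} = 5.56$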
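The combined mean and variance of two groups can be sketched in Python as below. The function name `combine_groups` is hypothetical, and the formula used is the one with the deviations of each group mean from the combined mean; when the two group means are equal these deviations vanish and only the weighted average of the variances remains.

```python
def combine_groups(n1, mean1, var1, n2, mean2, var2):
    """Combined mean and variance of two groups (variances use divisor n)."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    d1, d2 = mean1 - mean, mean2 - mean   # group mean minus combined mean
    variance = (n1 * (var1 + d1 ** 2) + n2 * (var2 + d2 ** 2)) / n
    return mean, variance


combined_mean, combined_var = combine_groups(20, 30, 5, 25, 30, 6)
print(combined_mean, round(combined_var, 2))
```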

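Q: Find two numbers for which the mean and variance are 6 and 16.
Answer:

$\frac{a+b}{2} = 6$, so $a + b = 12$ ..... (i) and $a = 12 - b$ ..... (ii)

$\frac{a^2 + b^2}{2} - \left(\frac{a+b}{2}\right)^2 = 16$

$\frac{(a+b)^2 - 2ab}{2} - (6)^2 = 16$

$\frac{(12)^2 - 2ab}{2} - 36 = 16$

$\frac{144 - 2ab}{2} = 52$

$72 - ab = 52$, so $ab = 20$

Using (ii): $b(12 - b) = 20$
$12b - b^2 - 20 = 0$
$b^2 - 12b + 20 = 0$
$b^2 - 10b - 2b + 20 = 0$
$(b - 10)(b - 2) = 0$
$b = 10$ or $b = 2$

$a = 12 - 10 = 2$ (when b = 10); $a = 12 - 2 = 10$ (when b = 2)

Answer: (a, b) = (2, 10); (10, 2)

Question: Determine the variance of the numbers 1, 2, 3, ..., n.

We know, $\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$

$\sum x^2 = 1^2 + 2^2 + 3^2 + \dots + n^2 = \frac{n(n+1)(2n+1)}{6}$

$\sum x = 1 + 2 + 3 + \dots + n = \frac{n(n+1)}{2}$

$\sigma^2 = \frac{(n+1)(2n+1)}{6} - \left(\frac{n+1}{2}\right)^2$

$= \frac{n+1}{2}\left[\frac{2n+1}{3} - \frac{n+1}{2}\right]$

$= \frac{n+1}{2} \times \frac{4n + 2 - 3n - 3}{6}$

$= \frac{(n+1)(n-1)}{12} = \frac{n^2 - 1}{12}$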

Question: Determine the variance and S.D of the numbers 1 to 50.

$Variance\ \sigma^2 = \frac{n^2 - 1}{12} = \frac{(50)^2 - 1}{12} = \frac{2500 - 1}{12} = 208.25$

Answer: 208.25

$S.D\ (\sigma) = \sqrt{\frac{n^2 - 1}{12}} = \sqrt{\frac{(50)^2 - 1}{12}} = \sqrt{208.25} = 14.43$

Answer: 14.43
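The closed form for the variance of the first n natural numbers can be compared with a direct calculation. This Python sketch uses function names invented for the illustration and the divisor n, as in the derivation.

```python
import math


def variance_first_n(n):
    """Closed form (n^2 - 1) / 12 for the numbers 1..n."""
    return (n ** 2 - 1) / 12


def variance_direct(n):
    """Mean of squared deviations of 1..n from their mean."""
    values = range(1, n + 1)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


var_50 = variance_first_n(50)
sd_50 = math.sqrt(var_50)
print(var_50, round(sd_50, 2))
```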

MOMENT, SKEWNESS AND KURTOSIS

Moment: The moments of a set of data describe the nature of its distribution. Moments are of two types:

i) Corrected moment: This moment is taken about the mean. The equation for the r-th corrected moment is:

For ungrouped data, $\mu_r = \frac{\sum (x_i - \bar{x})^r}{n}$

For grouped data, $\mu_r = \frac{\sum f_i (x_i - \bar{x})^r}{N}$

ii) Raw moment: The mean is replaced by an arbitrarily chosen number A to find the raw moment. The equation for the r-th raw moment is:

For ungrouped data, $\mu'_r = \frac{\sum (x_i - A)^r}{n}$

For grouped data, $\mu'_r = \frac{\sum f_i (x_i - A)^r}{N}$

Question: Why are there two types of moment?
Answer: In every case we need the corrected moments, but they can be difficult to find directly: if the mean of the data set is a fractional value, the calculation becomes tedious. To avoid this we use another type of moment, the raw moment, in which an arbitrary number is chosen that makes the calculation easier. Finally, a relation between the raw moments and the corrected moments gives the corrected moments. That is why there are two types of moment.

Skewness: Skewness refers to lack of symmetry or departure from symmetry of a distribution.

$Pearson\ co\text{-}efficient = \frac{Mean - Mode}{S.D}$

When the result is positive, it is called a positively skewed distribution.
When the result is negative, it is called a negatively skewed distribution.
When the skewness is zero, it is called a symmetrical distribution.

Kurtosis: It measures the peakedness of the graph of a data set. In these notes it is examined when the skewness is zero.
When the peakedness is high, it is called leptokurtic.
When the peakedness is medium, it is called mesokurtic.
When the peakedness is low, it is called platykurtic.

Parameters for finding the kurtosis:

$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$, $\beta_2 = \frac{\mu_4}{\mu_2^2}$, $\gamma_1 = \sqrt{\beta_1}$, $\gamma_2 = \beta_2 - 3$

$\beta_2$ and $\gamma_2$ represent the kurtosis.

When $\beta_2$ is greater than 3, $\gamma_2$ is positive and the distribution is leptokurtic.
When $\beta_2$ is equal to 3, $\gamma_2$ is zero and the distribution is mesokurtic.
When $\beta_2$ is less than 3, $\gamma_2$ is negative and the distribution is platykurtic.

1) Find the relation between the 2nd raw and corrected moments.

We know, with $\mu'_1 = \frac{\sum (x - A)}{n} = \bar{x} - A$,

$\mu_2 = \frac{\sum (x - \bar{x})^2}{n}$

$= \frac{\sum (x - A + A - \bar{x})^2}{n}$

$= \frac{\sum [(x - A) - (\bar{x} - A)]^2}{n}$

$= \frac{\sum [(x - A) - \mu'_1]^2}{n}$

$= \frac{\sum (x - A)^2}{n} - 2\mu'_1 \frac{\sum (x - A)}{n} + \mu_1'^2$

$= \mu'_2 - 2\mu'_1\mu'_1 + \mu_1'^2$

$\mu_2 = \mu'_2 - \mu_1'^2$

So, this is the relation between the 2nd raw and corrected moments.

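2) Find the relation between the 3rd raw and corrected moments.

We know, with $\mu'_1 = \bar{x} - A$,

$\mu_3 = \frac{\sum (x - \bar{x})^3}{n}$

$= \frac{\sum (x - A + A - \bar{x})^3}{n}$

$= \frac{\sum [(x - A) - (\bar{x} - A)]^3}{n}$

$= \frac{\sum [(x - A) - \mu'_1]^3}{n}$

$= \frac{\sum (x - A)^3}{n} - 3\mu'_1 \frac{\sum (x - A)^2}{n} + 3\mu_1'^2 \frac{\sum (x - A)}{n} - \mu_1'^3$

$= \mu'_3 - 3\mu'_1\mu'_2 + 3\mu_1'^2\mu'_1 - \mu_1'^3$

$\mu_3 = \mu'_3 - 3\mu'_1\mu'_2 + 2\mu_1'^3$

So, this is the relation between the 3rd raw and corrected moments.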

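3) Find the relation between the 4th raw and corrected moments.

We know, with $\mu'_1 = \bar{x} - A$,

$\mu_4 = \frac{\sum (x - \bar{x})^4}{n}$

$= \frac{\sum (x - A + A - \bar{x})^4}{n}$

$= \frac{\sum [(x - A) - (\bar{x} - A)]^4}{n}$

$= \frac{\sum [(x - A) - \mu'_1]^4}{n}$

$= \frac{\sum (x - A)^4}{n} - 4\mu'_1 \frac{\sum (x - A)^3}{n} + 6\mu_1'^2 \frac{\sum (x - A)^2}{n} - 4\mu_1'^3 \frac{\sum (x - A)}{n} + \mu_1'^4$

$= \mu'_4 - 4\mu'_1\mu'_3 + 6\mu'_2\mu_1'^2 - 4\mu'_1\mu_1'^3 + \mu_1'^4$

$\mu_4 = \mu'_4 - 4\mu'_1\mu'_3 + 6\mu'_2\mu_1'^2 - 3\mu_1'^4$

So, this is the relation between the 4th raw and corrected moments.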
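The three relations between raw and corrected moments can be verified numerically. In this Python sketch the data set and the arbitrary origin A = 4 are made up for illustration, and the function names are not from the notes.

```python
def raw_moments(data, origin):
    """First four raw moments about an arbitrary origin A."""
    n = len(data)
    return [sum((x - origin) ** r for x in data) / n for r in range(1, 5)]


def central_from_raw(m1, m2, m3, m4):
    """Corrected (central) moments from raw moments m1..m4."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4


data = [2, 4, 4, 5, 7, 9]   # illustrative values only
A = 4                       # arbitrary origin
converted = central_from_raw(*raw_moments(data, A))

# Direct calculation about the mean, for comparison.
mean = sum(data) / len(data)
direct = tuple(sum((x - mean) ** r for x in data) / len(data) for r in (2, 3, 4))
print(converted, direct)
```

Any other choice of A gives the same corrected moments, which is the point of working with raw moments first.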

CORRELATION, REGRESSION AND RANK CORRELATION

Correlation:

Correlation is the tendency of two or more variables to go together. Correlation can be defined as the probable tendency of two or more variables or series of items to vary together. It is also termed co-variation. The primary objective of correlation analysis is to measure the strength or degree of the linear association between two or more variables.

Correlation co-efficient: The correlation coefficient is a quantitative measure of the direction and strength of the linear relationship between two numerically measured variables. The coefficient of correlation between two variables is denoted by r. r is known as the product moment co-efficient of correlation. The formula is given below.

$r = \frac{Cov(x, y)}{\sqrt{Var(x)\,Var(y)}} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]\left[\sum y^2 - \frac{(\sum y)^2}{n}\right]}}$

Variance of x: $\sigma_x^2 = \frac{\sum (x - \bar{x})^2}{n}$

Co-variance of (x, x): $\sigma^2(x, x) = \frac{\sum (x - \bar{x})(x - \bar{x})}{n} = \frac{\sum (x - \bar{x})^2}{n} = \sigma_x^2$

Co-variance of (x, y): $\frac{\sum (x - \bar{x})(y - \bar{y})}{n}$

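Prove that $-1 \le r \le 1$, or that $0 \le r^2 \le 1$.

We know, $\sigma_x^2 = \frac{\sum (x - \bar{x})^2}{n}$, so $\sum (x - \bar{x})^2 = n\sigma_x^2$ ..... (i)

Again we know, $\sigma_y^2 = \frac{\sum (y - \bar{y})^2}{n}$, so $\sum (y - \bar{y})^2 = n\sigma_y^2$ ..... (ii)

From the definition of the correlation co-efficient,

$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{n\sigma_x^2 \cdot n\sigma_y^2}} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n\,\sigma_x \sigma_y}$

Now, a sum of squares cannot be negative:

$\frac{1}{n}\sum \left[\frac{x - \bar{x}}{\sigma_x} \pm \frac{y - \bar{y}}{\sigma_y}\right]^2 \ge 0$

$\frac{\sum (x - \bar{x})^2}{n\sigma_x^2} + \frac{\sum (y - \bar{y})^2}{n\sigma_y^2} \pm \frac{2\sum (x - \bar{x})(y - \bar{y})}{n\,\sigma_x \sigma_y} \ge 0$

$1 + 1 \pm 2r \ge 0$

$2(1 \pm r) \ge 0$

$1 + r \ge 0$ gives $r \ge -1$; $1 - r \ge 0$ gives $r \le 1$.

So, $-1 \le r \le 1$.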

Regression: Regression describes the nature of a relation. Regression analysis is a technique for studying the dependence of one variable (called the dependent variable) on one or more other variables (called explanatory variables), with a view to estimating or predicting the value of the dependent variable from known or fixed values of the independent variables.

Regression is of three types (each is a curve of y plotted against x):
(i) Linear
(ii) Curvilinear
(iii) Exponential

Calculation of the regression co-efficient:
1) The regression line of y on x: $y_i = a + b x_i + e_i$
2) The regression line of x on y: $x_i = a' + b' y_i + e_i$
Here the 2 regression co-efficients are b and b'.

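Regression co-efficient: The regression co-efficient is the average change in one variable corresponding to a unit change in another. The two quantities b and b' are called co-efficients of regression. The quantity b is the co-efficient of regression of y on x and b' is the co-efficient of regression of x on y.

(i) $b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = r \cdot \frac{\sigma_y}{\sigma_x}$

(ii) $b' = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = r \cdot \frac{\sigma_x}{\sigma_y}$

Proof of (i):

$R.H.S = r \cdot \frac{\sigma_y}{\sigma_x} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} \cdot \sqrt{\frac{\sum (y - \bar{y})^2}{\sum (x - \bar{x})^2}} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = b = L.H.S$

L.H.S = R.H.S

So, this is the relationship between the correlation co-efficient and the regression co-efficient.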

What is the importance of regression?
Ans:
(i) Estimate the relationship that exists on the average between the dependent variable and the explanatory variables.
(ii) Determine the effect of each of the explanatory variables on the dependent variable, controlling the effects of all other explanatory variables.
(iii) Predict the value of the dependent variable for a given value of the explanatory variable.

Rank correlation: The rank correlation method is applied when rank order data are available or when each variable can be ranked in some order. The rank co-efficient is a non-parametric counterpart of the conventional correlation coefficient. The measure based on the rank correlation method is known as the rank correlation co-efficient. It is denoted by the symbol $\rho$.

Prove this: The geometric mean of the two regression co-efficients is equal to the correlation co-efficient, $r = \sqrt{b\,b'}$.

Solve: We know the regression co-efficients are b and b'.

$b \times b' = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} \times \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\left[\sum (x - \bar{x})(y - \bar{y})\right]^2}{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2} = r^2$

$r = \sqrt{b\,b'}$

So the geometric mean of the two regression co-efficients is equal to the correlation co-efficient.

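Question: Prove that $\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$.

Answer: The ranks $x_i$ and $y_i$ each take the values 1, 2, ..., n, so $\bar{x} = \bar{y} = \frac{n+1}{2}$ and $\sigma_x^2 = \sigma_y^2$.

By the definition of variance:

$\sigma_x^2 = \frac{\sum x_i^2}{n} - \bar{x}^2 = \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4}$ ..... (i)

$= (n+1)\left[\frac{2n+1}{6} - \frac{n+1}{4}\right] = (n+1)\,\frac{4n + 2 - 3n - 3}{12} = \frac{(n+1)(n-1)}{12}$

$\sigma_x^2 = \frac{n^2 - 1}{12}$ ..... (ii)

$\sum (x_i - \bar{x})^2 = n\sigma_x^2 = \frac{n(n^2 - 1)}{12}$ ..... (iii)

We know, $d_i = x_i - y_i$. Since $\bar{x} = \bar{y}$,

$d_i = x_i - \bar{x} + \bar{y} - y_i = (x_i - \bar{x}) - (y_i - \bar{y})$

$\sum d_i^2 = \sum \left[(x_i - \bar{x}) - (y_i - \bar{y})\right]^2$

$= \sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2 - 2\sum (x_i - \bar{x})(y_i - \bar{y})$

$= 2\sum (x_i - \bar{x})^2 - 2\sum (x_i - \bar{x})(y_i - \bar{y})$ [since $\sum (x_i - \bar{x})^2 = \sum (y_i - \bar{y})^2$]

$\frac{1}{2}\sum d_i^2 = \sum (x_i - \bar{x})^2 - \sum (x_i - \bar{x})(y_i - \bar{y})$

$\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum (x_i - \bar{x})^2 - \frac{1}{2}\sum d_i^2$

Correlation co-efficient:

$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$

Rank correlation:

$\rho = \frac{\sum (x_i - \bar{x})^2 - \frac{1}{2}\sum d_i^2}{\sum (x_i - \bar{x})^2} = 1 - \frac{\frac{1}{2}\sum d_i^2}{\sum (x_i - \bar{x})^2} = 1 - \frac{\frac{1}{2}\sum d_i^2}{\frac{n(n^2 - 1)}{12}}$ [from (iii)]

$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$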

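Question 1: Math of correlation, regression and rank coefficient.

  x_i    y_i    x_i^2   y_i^2   x_i*y_i   R(x)   R(y)   d_i = R(x) - R(y)   d_i^2
  10     11     100     121     110       3      3      0                   0
  8      10     64      100     80        5      4      1                   1
  12     13     144     169     156       2      2      0                   0
  9      10     81      100     90        4      4      0                   0
  15     15     225     225     225       1      1      0                   0
  -----------------------------------------------------------------------------
  54     59     614     715     661                     1                   1

n = 5 (the two tied y values are both given rank 4 here).

Correlation coefficient:
\[ r = \frac{\sum x_i y_i - \dfrac{\sum x_i \sum y_i}{n}}{\sqrt{\left[\sum x_i^2 - \dfrac{(\sum x_i)^2}{n}\right]\left[\sum y_i^2 - \dfrac{(\sum y_i)^2}{n}\right]}} = \frac{661 - \dfrac{54 \times 59}{5}}{\sqrt{\left[614 - \dfrac{(54)^2}{5}\right]\left[715 - \dfrac{(59)^2}{5}\right]}} \]
\[ = \frac{23.8}{\sqrt{30.8 \times 18.8}} = \frac{23.8}{\sqrt{579.04}} \approx 0.99 \]

Regression coefficients ($b_{yx}$, $b_{xy}$):
\[ b_{yx} = \frac{\sum x_i y_i - \dfrac{\sum x_i \sum y_i}{n}}{\sum x_i^2 - \dfrac{(\sum x_i)^2}{n}} = \frac{661 - 637.2}{614 - 583.2} = \frac{23.8}{30.8} = 0.77 \]
\[ b_{xy} = \frac{\sum x_i y_i - \dfrac{\sum x_i \sum y_i}{n}}{\sum y_i^2 - \dfrac{(\sum y_i)^2}{n}} = \frac{661 - 637.2}{715 - 696.2} = \frac{23.8}{18.8} = 1.27 \]
\[ r = \sqrt{b_{yx}\, b_{xy}} = \sqrt{0.77 \times 1.27} = \sqrt{0.9779} \approx 0.99 \]

Rank coefficient:
\[ \rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 1}{5(5^2 - 1)} = 1 - \frac{6}{120} = 0.95 \]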
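The arithmetic of Question 1 can be reproduced with a few lines of Python. This is a minimal sketch, not the author's method: the variable names are assumptions, and the ranks are typed in exactly as they appear in the worked table (including the tied y values that both receive rank 4).

```python
from math import sqrt

x = [10, 8, 12, 9, 15]
y = [11, 10, 13, 10, 15]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xx = sum(a * a for a in x)
sum_yy = sum(b * b for b in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Corrected sums of products and squares (about 23.8, 30.8 and 18.8)
sxy = sum_xy - sum_x * sum_y / n
sxx = sum_xx - sum_x ** 2 / n
syy = sum_yy - sum_y ** 2 / n

r = sxy / sqrt(sxx * syy)   # correlation coefficient
b_yx = sxy / sxx            # regression coefficient of y on x
b_xy = sxy / syy            # regression coefficient of x on y

# Ranks copied from the worked table (rank 1 = largest value)
rank_x = [3, 5, 2, 4, 1]
rank_y = [3, 4, 2, 4, 1]
d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
rho = 1 - 6 * d2 / (n * (n ** 2 - 1))

print(round(r, 3), round(b_yx, 2), round(b_xy, 2), round(rho, 2))
```

The geometric mean sqrt(b_yx * b_xy) computed from these values coincides with r, which is the identity proved earlier.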

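Question 2: Math of correlation, regression and rank coefficient.

  Math (x)   Statistics (y)   x^2     y^2     xy      R(x)   R(y)   d_i = R(x) - R(y)   d_i^2
  80         81               6400    6561    6480    3      2      1                   1
  85         80               7225    6400    6800    1      3      -2                  4
  82         84               6724    7056    6888    2      1      1                   1
  78         76               6084    5776    5928    4      4      0                   0
  71         70               5041    4900    4970    5      5      0                   0
  ---------------------------------------------------------------------------------------
  396        391              31474   30693   31066                                     6

n = 5

Correlation coefficient:
\[ r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left[\sum x^2 - \dfrac{(\sum x)^2}{n}\right]\left[\sum y^2 - \dfrac{(\sum y)^2}{n}\right]}} = \frac{31066 - \dfrac{396 \times 391}{5}}{\sqrt{\left[31474 - \dfrac{(396)^2}{5}\right]\left[30693 - \dfrac{(391)^2}{5}\right]}} \]
\[ = \frac{98.8}{\sqrt{110.8 \times 116.8}} \approx 0.87 \]

Regression coefficients ($b_{yx}$, $b_{xy}$):
\[ b_{yx} = \frac{31066 - \dfrac{396 \times 391}{5}}{31474 - \dfrac{(396)^2}{5}} = \frac{98.8}{110.8} = 0.89 \]
\[ b_{xy} = \frac{31066 - \dfrac{396 \times 391}{5}}{30693 - \dfrac{(391)^2}{5}} = \frac{98.8}{116.8} = 0.85 \]
\[ r = \sqrt{b_{yx}\, b_{xy}} = \sqrt{0.89 \times 0.85} \approx 0.87 \]

Rank coefficient:
\[ \rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 6}{5(5^2 - 1)} = 1 - \frac{36}{120} = 0.7 \]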

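Question 3: Show with an example that $\sum d_i^2 = 0$ when the ranks are the same.
Answer: The rank coefficient is denoted by $\rho$:
\[ \rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} \]
Here, $d_i$ is the difference between the rank of x and the rank of y.

  Math (x)   Statistics (y)   R(x)   R(y)   d_i = R(x) - R(y)   d_i^2
  1          1                4      4      0                   0
  2          2                3      3      0                   0
  3          3                2      2      0                   0
  4          4                1      1      0                   0
  -------------------------------------------------------------------
                                            sum d_i = 0         sum d_i^2 = 0

So, $\sum d_i^2 = 0$, and therefore $\rho = 1 - 0 = 1$.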

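Binomial distribution :

1) $P(x) = \binom{n}{x} p^x q^{n-x}$
2) Expectation $E(x) = \sum x\,P(x)$ = mean of the distribution.
3) Variance $V(x) = E(x^2) - [E(x)]^2$, which can also be written $V(x) = E[x(x-1) + x] - [E(x)]^2$
4) $(a+b)^n = a^n + \binom{n}{1} a^{n-1} b + \binom{n}{2} a^{n-2} b^2 + \dots$
5) $p + q = 1$ [if p = 60% and q = 40%, then p + q = 100% = 1]
6) $\binom{n}{2} = \dfrac{n(n-1)}{2!}$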

Q. How can you find the mean and variance of the binomial distribution? Or, explain why the mean is greater than the variance.

Mean and variance of the distribution:

  x    P(x)                                       x(x-1)
  0    q^n                                        0
  1    n p q^(n-1)                                0
  2    [n(n-1)/2!] p^2 q^(n-2)                    2 x 1
  3    [n(n-1)(n-2)/3!] p^3 q^(n-3)               3 x 2

Here, $P(x) = \binom{n}{x} p^x q^{n-x}$, so
$P(0) = \binom{n}{0} p^0 q^{n-0} = q^n$
$P(1) = \binom{n}{1} p^1 q^{n-1} = npq^{n-1}$
$P(2) = \binom{n}{2} p^2 q^{n-2} = \dfrac{n(n-1)}{2!} p^2 q^{n-2}$
$P(3) = \binom{n}{3} p^3 q^{n-3} = \dfrac{n(n-1)(n-2)}{3!} p^3 q^{n-3}$

Mean of the distribution:
\[ E(x) = \sum x\,P(x) = 0 + npq^{n-1} + 2\,\frac{n(n-1)}{2!} p^2 q^{n-2} + 3\,\frac{n(n-1)(n-2)}{3!} p^3 q^{n-3} + \dots \]
\[ = npq^{n-1} + n(n-1)p^2 q^{n-2} + \frac{n(n-1)(n-2)}{2!} p^3 q^{n-3} + \dots \]
\[ = np\left[q^{n-1} + (n-1)p\,q^{n-2} + \frac{(n-1)(n-2)}{2!} p^2 q^{n-3} + \dots\right] \]
\[ = np\,(p+q)^{n-1} = np\,(1)^{n-1} = np \qquad [p + q = 1] \]

Variance of the distribution:
\[ V(x) = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2 = E(x^2) - [E(x)]^2 = E[x(x-1) + x] - [E(x)]^2 \]
\[ = E[x(x-1)] + E(x) - [E(x)]^2 \quad \text{(i)} \]
Here,
\[ E[x(x-1)] = \sum x(x-1)\,P(x) = 0 + 0 + 2 \cdot 1\,\frac{n(n-1)}{2!} p^2 q^{n-2} + 3 \cdot 2\,\frac{n(n-1)(n-2)}{3!} p^3 q^{n-3} + 4 \cdot 3\,\frac{n(n-1)(n-2)(n-3)}{4!} p^4 q^{n-4} + \dots \]
\[ = n(n-1)p^2 q^{n-2} + n(n-1)(n-2)p^3 q^{n-3} + \frac{n(n-1)(n-2)(n-3)}{2!} p^4 q^{n-4} + \dots \]
\[ = n(n-1)p^2\left[q^{n-2} + (n-2)p\,q^{n-3} + \frac{(n-2)(n-3)}{2!} p^2 q^{n-4} + \dots\right] \]
\[ = n(n-1)p^2 (p+q)^{n-2} = n(n-1)p^2 \qquad [p + q = 1] \]
From equation (i):
\[ V(x) = n(n-1)p^2 + np - n^2p^2 = np - np^2 = np(1-p) = npq \qquad [p + q = 1] \]
Since 0 < q < 1, np > npq. So, for the binomial distribution, Mean > Variance.

Math 1: The probability that a patient survives a delicate heart operation is 0.2. Out of 8 persons undergoing such an operation, what is the probability that
a) at least one will survive,
b) exactly two will survive,
c) all will survive?

Here, n = 8, p = 0.2 and q = 0.8.

a) $1 - P(0) = 1 - \binom{8}{0} (0.2)^0 (0.8)^8 = 0.83$
b) $P(2) = \binom{8}{2} (0.2)^2 (0.8)^6 = 0.29$
c) $P(8) = \binom{8}{8} (0.2)^8 (0.8)^0 = 2.56 \times 10^{-6}$

(Ans): a) 0.83, b) 0.29, c) $2.56 \times 10^{-6}$
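The three answers of Math 1, and the results mean = np and variance = npq, can be checked directly from the probability function. The sketch below is an illustration only; `binom_pmf` is a helper name assumed here, and it relies on `math.comb` from the Python standard library (Python 3.8 or later).

```python
from math import comb

def binom_pmf(x, n, p):
    """P(x) = C(n, x) * p^x * q^(n-x), with q = 1 - p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 8, 0.2

at_least_one = 1 - binom_pmf(0, n, p)   # part (a)
exactly_two = binom_pmf(2, n, p)        # part (b)
all_survive = binom_pmf(8, n, p)        # part (c)

# Mean and variance obtained by summing over every possible x
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
variance = sum(x * x * binom_pmf(x, n, p) for x in range(n + 1)) - mean ** 2

print(round(at_least_one, 2), round(exactly_two, 2), all_survive)
print(mean, variance)   # compare with n*p = 1.6 and n*p*q = 1.28
```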

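Poisson distribution :
\[ P(x) = \frac{e^{-\lambda} \lambda^x}{x!} \]
where $p \to 0$, $n \to \infty$ and $np = \lambda$.

Mean of the distribution:

  x    P(x)                      x(x-1)
  0    e^(-λ)                    0
  1    λ e^(-λ)                  0
  2    (λ^2 / 2!) e^(-λ)         1 x 2
  3    (λ^3 / 3!) e^(-λ)         2 x 3
  4    (λ^4 / 4!) e^(-λ)         3 x 4

Here, $P(x) = \dfrac{e^{-\lambda}\lambda^x}{x!}$, so
$P(0) = \dfrac{e^{-\lambda}\lambda^0}{0!} = e^{-\lambda}$, $P(1) = \dfrac{e^{-\lambda}\lambda^1}{1!} = \lambda e^{-\lambda}$, $P(2) = \dfrac{e^{-\lambda}\lambda^2}{2!}$, $P(3) = \dfrac{e^{-\lambda}\lambda^3}{3!}$, $P(4) = \dfrac{e^{-\lambda}\lambda^4}{4!}$

\[ E(x) = \sum x\,P(x) = 0 + \lambda e^{-\lambda} + 2\,\frac{\lambda^2}{2!} e^{-\lambda} + 3\,\frac{\lambda^3}{3!} e^{-\lambda} + 4\,\frac{\lambda^4}{4!} e^{-\lambda} + \dots \]
\[ = \lambda e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \dots\right) = \lambda e^{-\lambda} e^{\lambda} = \lambda e^{0} = \lambda = np \qquad [np = \lambda] \]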

Math 2: In a factory the probability that a product is defective is 0.001. Out of 500 products, what is the probability that 10 products are defective?

Here, n = 500, p = 0.001, x = 10 and $\lambda = np = 500 \times 0.001 = 0.5$.

We know the Poisson distribution is
\[ P(x) = \frac{e^{-\lambda} \lambda^x}{x!} \]
\[ P(10) = \frac{e^{-0.5} (0.5)^{10}}{10!} = 1.63 \times 10^{-10} \]

(Ans): $1.63 \times 10^{-10}$
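The same number can be obtained by evaluating the Poisson probability function directly. This is a small sketch under the problem's own values; `poisson_pmf` is a helper name assumed here.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(x) = e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

n, p, x = 500, 0.001, 10
lam = n * p                      # lambda = np = 0.5
prob = poisson_pmf(x, lam)

print(lam, prob)                 # prob is about 1.63e-10
```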

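Variance of the distribution:
\[ V(x) = E[x(x-1)] + E(x) - [E(x)]^2 \]
\[ E[x(x-1)] = \sum x(x-1)\,P(x) = 0 + 0 + 1 \cdot 2\,\frac{\lambda^2}{2!} e^{-\lambda} + 2 \cdot 3\,\frac{\lambda^3}{3!} e^{-\lambda} + 3 \cdot 4\,\frac{\lambda^4}{4!} e^{-\lambda} + \dots \]
\[ = e^{-\lambda}\lambda^2 + e^{-\lambda}\lambda^3 + e^{-\lambda}\frac{\lambda^4}{2!} + \dots = e^{-\lambda}\lambda^2\left(1 + \lambda + \frac{\lambda^2}{2!} + \dots\right) \]
\[ = e^{-\lambda}\lambda^2 e^{\lambda} = \lambda^2 = n^2p^2 \qquad [np = \lambda] \]
\[ V(x) = n^2p^2 + np - n^2p^2 = np = \lambda \]
So, for the Poisson distribution, Mean = Variance.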
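The equality of mean and variance can also be seen numerically by summing the series for a chosen value of the parameter. The sketch below is an illustration only: the value 0.5 and the cut-off of 60 terms are assumptions made for the check (the neglected tail is far below floating-point precision for such a small parameter).

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(x) = e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

lam = 0.5            # illustrative value of lambda
terms = range(60)    # truncate the infinite sum; later terms are negligible

mean = sum(x * poisson_pmf(x, lam) for x in terms)
second_moment = sum(x * x * poisson_pmf(x, lam) for x in terms)
variance = second_moment - mean ** 2

print(mean, variance)   # both are approximately lam
```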

Types of Hypothesis
Hypothesis tests are of two types:

1. Parametric test (assumes a normal distribution)
2. Non-parametric test (does not assume a normal distribution)

Parametric test: Statistical tests that are primarily based on assumptions about the form of the population distribution.

Non-parametric test: Tests that do not require rigorous assumptions about the populations.

Terminology:
- Null hypothesis ($H_0$)
- Alternative hypothesis
- Test statistic
- Critical region
- Level of significance ($\alpha$)
- Acceptance region
- Test of hypothesis

Null hypothesis ($H_0$): The null hypothesis is a statement which tells us that no difference exists between the parameter and the statistic being compared to it. "There is no difference between the rates of prevalence of malnutrition among male and female children" is an example of a null hypothesis.

Alternative hypothesis: The alternative hypothesis is the logical opposite of the null hypothesis. The rejection of a null hypothesis leads to the acceptance of the alternative hypothesis. The alternative hypothesis against the null hypothesis stated above may be formulated as: there exists a significant difference in the population between the rates of prevalence of malnutrition among male and female children.

Test statistic: The statistic used to provide evidence about the null hypothesis is called the test statistic, e.g. the t-test and z-test statistics.

Critical region: The set of values of the test statistic leading to rejection of the tested hypothesis.

Level of significance ($\alpha$): The probability of rejecting the tested hypothesis when it is true is called the level of significance. It depends on the amount of confidence that we want to attach to the test conclusion.
$\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true})$

Acceptance region: The values of the test statistic not included in the critical region.

Test of hypothesis: A procedure whereby the truth or falseness of the tested hypothesis is investigated by examining the value of the test statistic.

Two tailed test: A test of a statistical hypothesis where the alternative is two-sided, such as
$H_0: \mu = \mu_0$
$H_1: \mu \neq \mu_0$
is called a two tailed test.

One tailed test: A test of any statistical hypothesis where the alternative is one-sided, such as
$H_0: \mu = \mu_0$
$H_1: \mu > \mu_0$
or, perhaps,
$H_0: \mu = \mu_0$
$H_1: \mu < \mu_0$
is called a one tailed test.

Two types of error:
Type 1 error: A type 1 error for a statistical test is the error of rejecting the null hypothesis when the null hypothesis is true.
Type 2 error: A type 2 error for a statistical test is the error of accepting the null hypothesis when the null hypothesis is false.

Level of a test: The smallest value of $\alpha$ for which $H_0$ can be rejected. It is the actual risk of committing a type 1 error if $H_0$ is rejected based on the observed value of the test statistic.

Important steps in a test of significance:

1. Set up the null hypothesis.
2. Collect relevant data.
3. Decide whether the test is two tailed or one tailed.
4. Choose the level of significance.
5. Calculate a suitable test statistic to test the null hypothesis.
6. Draw the conclusion.

Application: Normal test (z-test): [n > 30]

1. To test for an assigned population mean.
2. Comparison of two independent sample means.
3. To test for an assumed population proportion.
4. To test for equality of two proportions.

To test for an assigned population mean: Math 1
A company claims that its bulbs have a life of 125 hours. A sample of 64 bulbs has a mean life of 127 hours with S.D. 4.8. Can we accept the claim?

Solve:
$H_0: \mu = \mu_0 = 125$
Here, $\bar{x} = 127$ (sample mean), $\mu_0 = 125$, $\sigma = 4.8$ and n = 64.
\[ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{127 - 125}{4.8 / \sqrt{64}} = 3.33 \]
$|z| = 3.33 > 1.96$, so $H_0$ is rejected. We cannot accept the claim. (Ans).

Comparison of two independent sample means: Math 2

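Drug A: 80 patients, mean 20 days, S.D. = 3
Drug B: 60 patients, mean 22.5 days, S.D. = 5
Are these drugs equally effective?

Solve:
$H_0: \mu_1 = \mu_2$
Here, $\bar{x}_1 = 20$ and $\bar{x}_2 = 22.5$, $\sigma_1 = 3$ and $\sigma_2 = 5$, $n_1 = 80$ and $n_2 = 60$.
\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{20 - 22.5}{\sqrt{\dfrac{9}{80} + \dfrac{25}{60}}} = -3.44 \]
$|z| = 3.44 > 1.96$, so $H_0$ is rejected. These drugs are not equally effective.

(Ans).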
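Both z-test calculations can be reproduced with two small functions. This is a sketch under the stated data; the function names are assumptions, and 1.96 is the two-tailed 5% critical value used in the worked solutions.

```python
from math import sqrt

def one_sample_z(xbar, mu0, sigma, n):
    """z = (xbar - mu0) / (sigma / sqrt(n))"""
    return (xbar - mu0) / (sigma / sqrt(n))

def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2):
    """z = (xbar1 - xbar2) / sqrt(sigma1^2/n1 + sigma2^2/n2)"""
    return (xbar1 - xbar2) / sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

Z_CRIT = 1.96  # two-tailed critical value at the 5% level

z_bulbs = one_sample_z(127, 125, 4.8, 64)        # Math 1
z_drugs = two_sample_z(20, 22.5, 3, 5, 80, 60)   # Math 2

reject_bulbs = abs(z_bulbs) > Z_CRIT
reject_drugs = abs(z_drugs) > Z_CRIT

print(round(z_bulbs, 2), reject_bulbs)
print(round(z_drugs, 2), reject_drugs)
```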

To test for an assumed population mean: Math 3
A random sample of 15 has a mean height of 66.4 inches with S.D. = 3.1. Can you say that the mean height is 65 inches?

Solve:
$H_0: \mu = 65$
Here, $\bar{x} = 66.4$ (sample mean), $\mu_0 = 65$ (claimed mean), s = 3.1 and n = 15.
\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \quad \text{with } (n-1) \text{ degrees of freedom} \]
\[ = \frac{66.4 - 65}{3.1 / \sqrt{15}} = 1.75 \quad \text{with } (15-1) = 14 \text{ degrees of freedom} \]
Since $|t| = 1.75$ is less than the tabulated two-tailed 5% value $t_{0.025} = 2.145$ with 14 degrees of freedom, $H_0$ is accepted. We can say that the mean height is 65 inches. (Ans).

To test the equality of two independent sample means: Math 4
Worker A over 10 days increases production by 7, 8, 8, 12, 10, 9, 10, 11, 6, 8 units;
Worker B over 12 days increases production by 10, 10, 11, 9, 12, 13, 13, 12, 11, 10, 9, 12 units.
Are they equally efficient?

Solve:
$H_0: \mu_1 = \mu_2$
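Here, $n_1 = 10$ and $n_2 = 12$, $\bar{x}_1 = \dfrac{89}{10} = 8.9$ and $\bar{x}_2 = \dfrac{132}{12} = 11$.
\[ s_1^2 = \frac{1}{n_1 - 1}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n_1}\right] = \frac{1}{10 - 1}\left[823 - \frac{(89)^2}{10}\right] = \frac{1}{9}\,[823 - 792.1] = \frac{30.9}{9} = 3.43 \]
\[ s_2^2 = \frac{1}{n_2 - 1}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n_2}\right] = \frac{1}{12 - 1}\left[1474 - \frac{(132)^2}{12}\right] = \frac{1}{11}\,[1474 - 1452] = 2 \]
We know,
\[ s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(10 - 1) \times 3.43 + (12 - 1) \times 2}{10 + 12 - 2} = \frac{9 \times 3.43 + 11 \times 2}{20} = 2.64 \]
\[ s = 1.63 \]
Again we know,
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \quad \text{with } (n_1 + n_2 - 2) \text{ degrees of freedom} \]
\[ = \frac{8.9 - 11}{1.63\sqrt{\dfrac{1}{10} + \dfrac{1}{12}}} \quad \text{with } (10 + 12 - 2) \text{ degrees of freedom} \]
\[ = \frac{-2.1}{0.696} = -3.02 \quad \text{with 20 degrees of freedom} \]
$|t| = 3.02$ with 20 degrees of freedom. The tabulated value is $t_{0.05} = 1.725$ (the two-tailed 5% value, 2.086, is also exceeded), so $|t| > t_{0.05}$.

So, $H_0$ is rejected. They are not equally efficient. (Ans).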
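The t statistics of Math 3 and Math 4 can be recomputed from the raw figures. The sketch below is an illustration, not a replacement for the table lookup: the function names are assumptions, and the critical values still have to be read from a t table for the relevant degrees of freedom.

```python
from math import sqrt

def sample_variance(data):
    """Unbiased sample variance with divisor (n - 1)."""
    n = len(data)
    mean = sum(data) / n
    return sum((v - mean) ** 2 for v in data) / (n - 1)

def one_sample_t(xbar, mu0, s, n):
    """t = (xbar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (xbar - mu0) / (s / sqrt(n))

def pooled_t(a, b):
    """Two-sample t with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sample_variance(a) + (n2 - 1) * sample_variance(b)) / df
    return (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2)), df

t_height = one_sample_t(66.4, 65, 3.1, 15)            # Math 3

worker_a = [7, 8, 8, 12, 10, 9, 10, 11, 6, 8]
worker_b = [10, 10, 11, 9, 12, 13, 13, 12, 11, 10, 9, 12]
t_workers, df_workers = pooled_t(worker_a, worker_b)  # Math 4

print(round(t_height, 2))
print(round(t_workers, 2), df_workers)
```

Working with unrounded intermediate values gives a two-sample statistic of about -3.02; rounding s and the denominator at each step shifts the second decimal slightly.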

Chi-square test ($\chi^2$-test) :
1. To test significance of population variance.
2. To test independence in a contingency table.
3. To test goodness of fit.
4. To test the equality of several correlation coefficients.
5. To test the equality of several variances.

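To test significance of population variance: Math 5
A machine produces 60, 62, 58, 55, 57, 54, 55, 56, 58, 56 items respectively per day for 10 days. Test the hypothesis that the population variance is 4.

Solve:
$H_0: \sigma^2 = 4$
We know,
\[ \chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{\sum (x_i - \bar{x})^2}{\sigma^2} \quad \text{with } (n-1) \text{ degrees of freedom} \]
Now,
\[ (n-1)s^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n} = 32659 - \frac{(571)^2}{10} = 54.9 \]
\[ \chi^2 = \frac{54.9}{4} = 13.73 \quad \text{with } (10-1) = 9 \text{ degrees of freedom} \]
The tabulated values are $\chi^2_{97.5\%} = 2.7$ and $\chi^2_{2.5\%} = 19.02$.

Answer: Since 2.7 < 13.73 < 19.02, the calculated value lies inside the acceptance region, so the hypothesis that the population variance is 4 is not rejected.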
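The statistic for Math 5 can be recomputed from the ten daily counts. This is a minimal sketch: the variable names are assumptions, and the two limits 2.70 and 19.02 are simply the tabulated values quoted in the solution for 9 degrees of freedom, typed in rather than computed.

```python
items = [60, 62, 58, 55, 57, 54, 55, 56, 58, 56]
sigma0_sq = 4          # hypothesised population variance

n = len(items)
mean = sum(items) / n
sum_sq_dev = sum((v - mean) ** 2 for v in items)   # equals (n - 1) * s^2

chi_sq = sum_sq_dev / sigma0_sq
df = n - 1

# Tabulated chi-square limits for 9 degrees of freedom (from the text)
lower, upper = 2.70, 19.02
reject = chi_sq < lower or chi_sq > upper

print(round(sum_sq_dev, 1), round(chi_sq, 3), df, reject)
```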

To test independence in a contingency table: Math 6

                   Attacked     Non-attacked    Total
  Inoculated       10 (a)       40 (b)          50 (a+b)
  Non-inoculated   30 (c)       20 (d)          50 (c+d)
  Total            40 (a+c)     60 (b+d)        100 (a+b+c+d)

Where degree of freedom = 1.

Solve:
$H_0$: Inoculation does not have any effect.
\[ \chi^2 = \frac{(a+b+c+d)(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)} \quad \text{with 1 degree of freedom} \]
\[ = \frac{100\,(200 - 1200)^2}{50 \times 50 \times 40 \times 60} = 16.67 \quad \text{with 1 degree of freedom} \]
The tabulated value is $\chi^2_{0.05} = 3.841$. Since 16.67 > 3.841, $H_0$ is rejected.

Answer: Whether a person is attacked by cholera depends on whether he or she has been inoculated or not.

F Test :
1. Comparison of two independent variances.
2. To test the equality of several means.
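The 2 x 2 contingency calculation of Math 6 fits in one short function. This is a sketch of the shortcut formula for a 2 x 2 table only; the function name is an assumption, and 3.841 is the tabulated 5% value for 1 degree of freedom quoted in the solution.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic N(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)] for a 2x2 table."""
    total = a + b + c + d
    return total * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Cell counts from the inoculation table: a, b on the first row; c, d on the second
chi_sq = chi_square_2x2(10, 40, 30, 20)

CRITICAL_5_PERCENT = 3.841   # chi-square table value, 1 degree of freedom
reject_independence = chi_sq > CRITICAL_5_PERCENT

print(round(chi_sq, 2), reject_independence)
```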

1. Pie chart
2. Systematic sampling
3. Prove the equation AM × HM = GM².
4. For two non-zero positive numbers, AM = 5 and HM = 4. What is the value of GM?
5. Form different discrete positive numbers. Find AM, GM and HM. Show a relationship between them.
6. Graph for median and mode.
7. The average of 10 observations is 40. One observation x is added and the average is reduced to 38. One more observation y is added and the average is 42. Find the values of x and y.
8. Suppose the mean and variance of 20 observations are 32 and 25 respectively. One observation, 36, is now added. Find the mean and variance of the 21 observations.
9. Relation between variance and S.D.?
10. Determine the variance and S.D. of the numbers 27, 28, 29. Combined variance. Q: Find two numbers for which the mean and variance are 6 and 16.
11. Determine the variance of the numbers 1, 2, 3, ..., n.
12. Determine the variance and S.D. of the numbers 1 to 50.
13. Find the relation between the 2nd, 3rd and 4th raw and corrected moments.
14. Prove that $-1 \le r \le 1$, or prove that $0 \le r^2 \le 1$.
15. What is the importance of regression?
16. Prove this: $\rho = 1 - \dfrac{6\sum d_i^2}{n(n^2 - 1)}$
17. Math of correlation, regression and rank coefficient.
18. $\sum d_i^2 = 0$ when the ranks are the same; show with an example.
19. Importance of normal distribution.
20. How can you find the mean and variance of the binomial distribution? Or, explain why the mean is greater than the variance.
21. The probability that a patient survives a delicate heart operation is 0.2. What is the probability, out of 8 persons undergoing such an operation?
22. How can you find the mean and variance of the Poisson distribution? Or, explain why the mean and variance are equal.
23. In a factory the probability that a product is defective is 0.001. Out of 500 products, what is the probability that 10 products are defective?
24. Two types of error, null hypothesis ($H_0$), important steps in a test of significance, normal test (z-test) [n > 30].
25. Define range, variance, standard deviation, standard error, and mean deviation from median/mean/mode.
26. Why are there two types of moment?
27. To test for an assigned population mean: A company claims that its bulbs have a life of 125 hours. A sample of 64 bulbs has a mean life of 127 hours with S.D. 4.8. Can we accept the claim?
28. Comparison of two independent sample means: Drug A: 80 patients, mean 20 days, S.D. = 3. Drug B: 60 patients, mean 22.5 days, S.D. = 5. Are these drugs equally effective?
29. To test for an assumed population mean: A random sample of 15 has a mean height of 66.4 inches with S.D. = 3.1. Can you say that the mean height is 65 inches?
30. To test the equality of two independent sample means: Worker A over 10 days increases production by 7, 8, 8, 12, 10, 9, 10, 11, 6, 8 units; Worker B over 12 days increases production by 10, 10, 11, 9, 12, 13, 13, 12, 11, 10, 9, 12 units. Are they equally efficient?
31. To test significance of population variance: A machine produces 60, 62, 58, 55, 57, 54, 55, 56, 58, 56 items respectively per day for 10 days. Test the hypothesis that the population variance is 4.
32. To test independence in a contingency table:

                   Attacked     Non-attacked    Total
  Inoculated       10 (a)       40 (b)          50 (a+b)
  Non-inoculated   30 (c)       20 (d)          50 (c+d)
  Total            40 (a+c)     60 (b+d)        100 (a+b+c+d)

Combined variance:
\[ \sigma^2 = \frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2} \]
(This formula is applicable when $\bar{x}_1 = \bar{x}_2 = \bar{x}$.)

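\[ \text{Median} = L_1 + \frac{\dfrac{n}{2} - f_c}{f_m} \times C \]
Where,
$L_1$ = the lower limit of the median group.
C = class interval of the median group.
n = the total frequency.
$f_m$ = the frequency of the median group.
$f_c$ = the cumulative frequency of the group preceding the median group.

\[ \text{Mode} = L + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times C \]
Where L is the lower limit of the modal group, $f_1$ is the frequency of the modal group, $f_0$ and $f_2$ are the frequencies of the groups before and after it, and C is the class interval.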
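The grouped-data median and mode formulas can be tried on a small frequency table. The sketch below is an illustration only: the class limits and frequencies are hypothetical, the function names are assumptions, and the mode uses the usual grouped-data form L + (f1 - f0) / ((f1 - f0) + (f1 - f2)) × C with f1, f0, f2 the frequencies of the modal group and its two neighbours. It assumes equal class widths and a single modal group.

```python
# Hypothetical frequency table: classes 0-10, 10-20, 20-30, 30-40, 40-50
lower_limits = [0, 10, 20, 30, 40]
freqs = [5, 8, 12, 6, 4]
width = 10   # class interval C, assumed equal for every class

def grouped_median(lower_limits, freqs, width):
    """Median = L1 + (n/2 - fc) / fm * C."""
    n = sum(freqs)
    cumulative = 0   # fc: cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= n / 2:   # first class reaching n/2 is the median group
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f

def grouped_mode(lower_limits, freqs, width):
    """Mode = L + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * C."""
    i = freqs.index(max(freqs))       # modal group = highest frequency
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0
    return lower_limits[i] + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * width

print(grouped_median(lower_limits, freqs, width))
print(grouped_mode(lower_limits, freqs, width))
```

For this table the median group is 20-30 (n/2 = 17.5, with 13 observations below it), and the same class is the modal group.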

Why are there two types of moment?
In every case we need to determine the corrected moment, but finding the corrected moment directly can be difficult: if the mean of the data set has a fractional value, the calculation becomes laborious. In statistics we always try to simplify calculation, so to solve this problem we take another type of moment, the raw moment. For the raw moment we take an arbitrary number as the origin, which makes the calculation easier. Finally, a relation between the raw moment and the corrected moment is used to obtain the corrected moment. That is why there are two types of moment.

Skewness: Skewness measures the lack of symmetry in a frequency distribution.
\[ \text{Pearson coefficient of skewness} = \frac{\text{Mean} - \text{Mode}}{\text{S.D.}} \]
When the result is positive, the distribution is called positively skewed.
When the result is negative, the distribution is called negatively skewed.
When the skewness is zero, the distribution is called symmetrical.

Fig: positively skewed distribution.
Fig: negatively skewed distribution.
Fig: symmetrical distribution.

Kurtosis: It measures the peakedness of the graph of a data set. It is only found when the skewness is zero.
When,
Peakedness is high, it is called leptokurtic.
Peakedness is medium, it is called mesokurtic.
Peakedness is low, it is called platykurtic.

Parameters for finding the kurtosis:
\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} \]
$\gamma_2 = \beta_2 - 3$ represents the kurtosis.

When,
$\beta_2$ is greater than 3, $\gamma_2$ will be positive and the distribution will be leptokurtic.
$\beta_2$ is equal to 3, $\gamma_2$ will be zero and the distribution will be mesokurtic.
$\beta_2$ is less than 3, $\gamma_2$ will be negative and the distribution will be platykurtic.
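The kurtosis parameters can be computed from the central (corrected) moments of a data set. The sketch below is an illustration: the function names and the five data values are assumptions chosen so that the arithmetic is easy to follow, and it uses the standard moment ratios β1 = μ3²/μ2³ and β2 = μ4/μ2².

```python
def central_moment(data, k):
    """k-th corrected (central) moment: mean of (x - mean)^k."""
    n = len(data)
    mean = sum(data) / n
    return sum((v - mean) ** k for v in data) / n

def beta_coefficients(data):
    """Return (beta1, beta2, gamma2) with gamma2 = beta2 - 3."""
    mu2 = central_moment(data, 2)
    mu3 = central_moment(data, 3)
    mu4 = central_moment(data, 4)
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    return beta1, beta2, beta2 - 3

def classify(gamma2):
    """Name the shape from the sign of gamma2."""
    if gamma2 > 0:
        return "leptokurtic"
    if gamma2 < 0:
        return "platykurtic"
    return "mesokurtic"

data = [1, 2, 3, 4, 5]   # hypothetical symmetric data set
beta1, beta2, gamma2 = beta_coefficients(data)
print(beta1, beta2, gamma2, classify(gamma2))
```

For these five values the third central moment is zero (symmetric data), β2 is 1.7 and γ2 is negative, so the set is classified as platykurtic.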

THANK YOU ALL
