1st edition
Md. Khaledur Rahman Bhuiyan, B.Pharm (Running) Student, Department of Pharmacy, University of Asia Pacific, Dhaka-1209, Bangladesh
This is my first publication, so if there is any problem, please take it lightly and contact us.
E-mail: bhuiyankhaled@gmail.com, bhuiyankhaled@ymail.com
Like us on Facebook: http://www.facebook.com/pages/ThePharmacist/330151363706425
To
My father, for his uncompromising principles that guided my life.
My mother, for leading her children into intellectual pursuits.
To my teachers:
Abdul Mannan (Shere bangla nagar govt boys high school)
Debabrata Kumar Sen (Mohhammadpur model college)
Md. Tariqur Rahman (BCS 29th Batch, Police)
Sahadat Bin Sayed (University of asia pacific)
To
Shere bangla nagar govt boys high school
Mohhammadpur model college
University of asia pacific
This is not a book written by me. It is a collection of worked math problems that will help students understand statistics. It is a short collection of statistical problems, intended especially for pharmacy students. I wish to express my thanks to the many people who helped in preparing this book, including my teacher Sahadat bin sayed in the Department of Pharmacy at the University of Asia Pacific, who provided all of these valuable solutions. I am also grateful to Kaji nusrat jahan, Md.Rejaul karim, and Md. Aktarujjaman khan for their excellent secretarial services.
Histogram
What is it? A histogram is a bar graph representing the frequency of individual occurrences or classes of data. A histogram shows basic information about the data set, such as central location (mean, median, and mode), width of spread (range or standard deviation), and shape.
The purpose of making a histogram is to gain knowledge about the system. This knowledge, gained from the basic information given by the histogram (central location, spread, and shape), acts as a guide to improving the system. From a stable system, predictions can be made about the future performance of the system. If the system is unstable, it changes from time to time and the histogram has little predictive value.
The group uses a histogram to assess the system's current situation and to study results. The histogram's shape and statistical information help us know how to improve the system. After an improvement action is carried out, the group continues to collect data and make histograms to see if the theory has worked.
What does it look like? A completed histogram is shown below. An outpatient clinic patient health educator constructed this histogram using data from the X̄-R chart for the Adult Asthmatic Patient Respiratory Capability. The X̄-R chart showed the system to be unstable. The patient and care provider successfully identified a special cause in the last four subgroups (the patient was out of town and forgot to take medications). After deleting the four subgroups that occurred due to the special cause, the educator used the remaining 23 subgroups to make this histogram. (See Step 9 in X̄-R for the stable control chart using the 23 subgroups.)
[Figure: completed histogram for Adult Asthmatic Patient Respiratory Capability. Vertical axis: FREQUENCY (0 to 24 in steps of 2). Horizontal axis: Class (420, 434, 448, 462, 476, 490).]
PQ Systems, Inc. Health Care
When is it used? Use a histogram when you can answer yes to both these questions:
1. Do you have a data set of related values, either attributes (counts) or variables data (measurements)? For analyzing system performance, single readings or individual data points are of limited value. Much more can be learned from a group of data points because they reflect the system's variation. Using a histogram is one way to start learning from a group of data points.
2. Is it important to visualize central location, shape, and spread of the data? When it comes to data analysis, a picture is worth a thousand words. Seeing the form of the data makes it easier to understand the kind or pattern of variation the system is producing.
How is it made? These steps assume that the data for the construction of the histogram has already been collected. The data can be collected especially to make a histogram or can come from the data entry section of a control chart. Once you have collected data for a control chart, that same data can be used to make a histogram. The data entry section of the control chart used for the example histogram is shown below.
X̄-R CHART (data entry section)
Quality Measure: Asthma Care    Chart No.:    Specification Limits: N/A    Unit of Measure:
Location: Home    Process: Respiratory Process    Measurement Device: Home Spirometer
Date: 5-17 5-18 5-19 5-20 5-21 5-22 5-23 5-24 5-25 5-26 5-27 5-28 5-29 5-30 5-31 6-1 6-2 6-3 6-4 6-5 6-6 6-7 6-8 6-9
First three subgroups (starting 5-17):
SAMPLE 1: 430 460 450
SAMPLE 2: 420 480 470
SAMPLE 3: 440 470 470
SUM:      1290 1410 1390
Remaining SAMPLE readings: 475 440 480 420 480 450 430 470 475 480 500 450 465 460 445 430 450 500 420 420 430 470 450 450 460 480 470 450 445 480 450 450 430 470 470 450 450 470 440 440 430 420 485 460 465 430 470 465 440 440 470 470 470 430 480 485 430 430 470 430 450 455 385
Remaining SUM values: 1430 1350 1395 1310 1430 1385 1320 1355 1425 1400 1540 1310 1415 1415
AVERAGE (X̄) values: 470 463.3 476.7 450.0 465.0 436.7 476.7 461.7 440.0 451.7 475.0 466.7 473.3 436.7 471.7 471.7 441.7
RANGE (R) values: 20 20 15 20 30 40 10 20 20 30 10 30 50 20 15 25 20 20 20 70 30 35 45
a. Determine the number of classes. To find the number of classes (or subdivisions) needed for the histogram, first count the number of data points in the data set. Then use the following table to choose the number of classes. As the table indicates, it is best to use no fewer than 5 classes (or subdivisions) or more than 20.
There are 69 data points in the example, 23 subgroups of 3 observations each. The table indicates that between six and ten classes should be used for this many data points. Choose 6 for the example. The choice of the number of classes is only a rough estimate at this point. You can decide later to use more or fewer classes.
b. Determine the class width and boundaries. The width of the class determines the range of data points in each class. Find the class width by dividing the range of the data set by the number of classes (found in Step a). The range is found by subtracting the smallest value in the data set from the largest.
Range = X highest - X lowest
In this example, the highest value in the data set is 500 and the lowest is 420. So the range is:
Range = 500 - 420 = 80
The class width for the example is:
Class width = Range / Number of classes = 80 / 6 = 13.33
Round the class width to an easy number to work with. In the example, we rounded 13.33 to 14. Next, select a starting number for the lower boundary of the first class. The lower boundary should be chosen so the lowest value in the data set is included in the first class. A convenient lower boundary for the example is 420, since the lowest value in the data set is 420.
To determine the lower boundaries for the remaining classes, begin with the lower boundary of the first class and add the class width. Continue adding the class width until the number of classes is complete and all the data has been included. The lower class boundaries for this example are:
420 + 14 = 434
434 + 14 = 448
448 + 14 = 462
462 + 14 = 476
476 + 14 = 490
490 + 14 = 504
In some cases, an extra class may need to be added so the highest data point will be included.
The upper boundary for each class is any number below the lower class boundary of the next class. For example, the upper class boundary for the first class is under 434. This means that any number greater than or equal to 420 but less than 434 falls into the first class. This is done so that no point will fall on the boundary between two classes. The classes for the example are:
420 to under 434
434 to under 448
448 to under 462
462 to under 476
476 to under 490
490 to under 504
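The width and boundary arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `class_boundaries` is hypothetical, and rounding the raw width up to the next whole number is an assumption that happens to match the rounding of 13.33 to 14 in this example.

```python
import math

def class_boundaries(lowest, highest, n_classes):
    """Return (class_width, boundaries) for a histogram.

    Hypothetical helper: the raw width (range / number of classes) is
    rounded up to a whole number, an assumption matching the example.
    """
    data_range = highest - lowest
    width = math.ceil(data_range / n_classes)
    # n_classes + 1 boundaries: each class runs from one boundary to under the next.
    boundaries = [lowest + i * width for i in range(n_classes + 1)]
    return width, boundaries

width, boundaries = class_boundaries(lowest=420, highest=500, n_classes=6)
print(width)       # 14
print(boundaries)  # [420, 434, 448, 462, 476, 490, 504]
```

Each adjacent pair of boundaries corresponds to one class, for example 420 to under 434.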
The easiest way to record the data is to create a check sheet listing the classes along the left side with space to the right to make tally marks. To record the data, make a tally mark beside the class in which each data point falls. Total the number of marks in each class. Shown below is the completed check sheet for the example.
CLASSES: 420 UNDER 434, 434 UNDER 448, 448 UNDER 462, 462 UNDER 476, 476 UNDER 490, 490 UNDER 504
TALLY: |||| |||| ||| |||| ||||
TOTAL: 13 9 17 19 9
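The check-sheet tally can also be sketched in Python. The function name `tally` is hypothetical, and the nine readings used here are only the first three subgroups from the data entry section, so the counts are a small illustration and are not expected to match the completed check sheet.

```python
def tally(values, boundaries):
    """Count values per class; class i holds boundaries[i] <= x < boundaries[i + 1]."""
    counts = [0] * (len(boundaries) - 1)
    for x in values:
        for i in range(len(counts)):
            if boundaries[i] <= x < boundaries[i + 1]:
                counts[i] += 1
                break
    return counts

boundaries = [420, 434, 448, 462, 476, 490, 504]
# First three subgroups only (an illustrative subset of the 69 data points).
readings = [430, 460, 450, 420, 480, 470, 440, 470, 470]
counts = tally(readings, boundaries)
for lower, upper, n in zip(boundaries, boundaries[1:], counts):
    print(f"{lower} to under {upper}: {'|' * n} {n}")
```

Because each class includes its lower boundary and excludes its upper one, no reading can be counted twice.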
What is a Histogram?
A Histogram is a vertical bar chart that depicts the distribution of a set of data. Unlike Run Charts or Control Charts, which are discussed in other modules, a Histogram does not reflect process performance over time. It's helpful to think of a Histogram as being like a snapshot, while a Run Chart or Control Chart is more like a movie (Viewgraph 1).
[Viewgraph 1: What Is a Histogram? A bar graph that shows the distribution of data. A snapshot of data taken from a process.]
[Viewgraph: Parts of a Histogram. Title: DAYS OF OPERATION PRIOR TO FAILURE FOR AN HF RECEIVER. Vertical axis: FREQUENCY (0 to 100). Horizontal axis: DAYS OF OPERATION (0 to 60). Legend: MEAN TIME BETWEEN FAILURE (IN DAYS) FOR R-1051 HF RECEIVER. Data taken at SIMA, Pearl Harbor, 15 May - 15 July 94.]
Constructing a Histogram (Viewgraphs 4 and 5)
Step 1 - Count number of data points
Step 2 - Summarize on a tally sheet
Step 3 - Compute the range
Step 4 - Determine number of intervals
Step 5 - Compute interval width
Step 6 - Determine interval starting points
Step 7 - Count number of points in each interval
Step 8 - Plot the data
Step 9 - Add title and legend
Step 1 - Count the total number of data points you have listed. Suppose your team collected data on the miss distance for the gunnery exercise described in the example. The data you collected was for the fall of shot both long and short of the target. The data are displayed in Viewgraph 6. Simply counting the total number of entries in the data set completes this step. In this example, there are 135 data points.
Step 2 - Summarize your data on a tally sheet. You need to summarize your data to make it easy to interpret. You can do this by constructing a tally sheet. First, identify all the different values found in Viewgraph 6 (-160, -010 ... 030, 220, etc.). Organize these values from smallest to largest (-180, -120 ... 380, 410). Then, make a tally mark next to the value every time that value is present in the data set. Alternatively, simply count the number of times each value is present in the data set and enter that number next to the value, as shown in Viewgraph 7. This tally helped us organize 135 mixed numbers into a ranked sequence of 51 values. Moreover, we can see very easily the number of times that each value appeared in the data set. This data can be summarized even further by forming intervals of values.
Viewgraph 6: How to Construct a Histogram, Step 1 - Count the total number of data points
Number of yards long (+ data) and yards short (- data) that a gun crew missed its target:
-180 -10 -130 260 160 210 50 140 210 -30 300 110 260 110 30
30 220 190 180 40 20 220 130 80 260 -30 70 130 190 60
170 -100 240 70 30 -40 350 270 20 50 100 120 380 230 130
150 260 -70 280 290 250 320 40 240 140 30 330 90 -50 210
-20 250 410 90 -20 30 -20 180 80 70 140 120 -80 140 -80
360 70 100 230 240 250 50 190 160 10 180 -130 30 120 -10
-30 180 120 310 130 100 270 50 100 130 80 -60 20 340 130
100 40 200 270 10 250 110 150 240 -30 130 20 -30 20 200
280 140 -90 180 200 370 130 200 170 80 210 70 190 60 80
TOTAL = 135
[Viewgraph 7: tally sheet]
Step 3 - Compute the range for the data set. Compute the range by subtracting the smallest value in the data set from the largest value. The range represents the extent of the measurement scale covered by the data; it is always a positive number. The range for the data in Viewgraph 8 is 590 yards. This number is obtained by subtracting -180 from +410. The mathematical operation broken down in Viewgraph 8 is:
+410 - (-180) = 410 + 180 = 590
Remember that subtracting a negative (-) number from another number is the same as adding the corresponding positive number.
Step 4 - Determine the number of intervals required. The number of intervals influences the pattern, shape, or spread of your Histogram. Use the following table (Viewgraph 9) to determine how many intervals (or bars on the bar graph) you should use.
If you have this many data points:    Use this number of intervals:
Less than 50                          5 to 7
50 to 99                              6 to 10
100 to 250                            7 to 12
More than 250                         10 to 20
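Steps 3 and 4 amount to one subtraction and one table lookup, sketched below in Python. The function name `suggested_intervals` is hypothetical; the thresholds are copied from the Viewgraph 9 table.

```python
def suggested_intervals(n_points):
    """Return the (minimum, maximum) number of intervals for n_points data points.

    Thresholds follow the Viewgraph 9 table; the function name is illustrative.
    """
    if n_points < 50:
        return (5, 7)
    if n_points < 100:
        return (6, 10)
    if n_points <= 250:
        return (7, 12)
    return (10, 20)

largest, smallest = 410, -180
data_range = largest - smallest   # subtracting a negative adds its magnitude
print(data_range)                 # 590
print(suggested_intervals(135))   # (7, 12)
```

With 135 data points, the choice of 10 intervals used in the example falls inside the suggested 7 to 12.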
[Viewgraph 8: How to Construct a Histogram, Step 3 - Compute the range for the data set. Largest value = +410 yards long of target. Smallest value = -180 yards short of target. Range of values = 590 yards. Calculation: +410 - (-180) = 410 + 180 = 590]
Step 5 - Compute the interval width. To compute the interval width (Viewgraph 10), divide the range (590) by the number of intervals (10). When computing the interval width, you should round the result up to the next higher whole number to come up with values that are convenient to use. For example, if the range of data is 17, and you have decided to use 9 intervals, then your interval width is 1.89. You can round this up to 2. In this example, you divide 590 yards by 10 intervals, which gives an interval width of 59. This means that the length of every interval is going to be 59 yards. To facilitate later calculations, it is best to round off the value representing the width of the intervals. In this case, we will use 60, rather than 59, as the interval width.
Step 6 - Determine the starting point for each interval. Use the smallest data point in your measurements as the starting point of the first interval. The starting point for the second interval is the sum of the smallest data point and the interval width. For example, if the smallest data point is -180, and the interval width is 60, the starting point for the second interval is -120. Follow this procedure (Viewgraph 11) to determine all of the starting points (-180 + 60 = -120; -120 + 60 = -60; etc.).
Step 7 - Count the number of points that fall within each interval. These are the data points that are equal to or greater than the starting value and less than the ending value (also illustrated in Viewgraph 11). For example, if the first interval begins with -180 and ends with -120, all data points that are equal to or greater than -180, but still less than -120, will be counted in the first interval. Keep in mind that EACH DATA POINT can appear in only one interval.
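Steps 5 through 7 can be sketched in Python as follows. The helper names are hypothetical, and the sample is only the first ten values from Viewgraph 6, chosen to keep the illustration short; it is not the full 135-point data set.

```python
def interval_starts(smallest, width, n_intervals):
    """Starting value of each interval: smallest, smallest + width, ..."""
    return [smallest + i * width for i in range(n_intervals)]

def count_per_interval(values, smallest, width, n_intervals):
    """Count points with start <= x < start + width for each interval."""
    counts = [0] * n_intervals
    for x in values:
        i = (x - smallest) // width   # index of the interval containing x
        if 0 <= i < n_intervals:
            counts[i] += 1
    return counts

raw_width = 590 / 10   # 59.0
width = 60             # rounded to a convenient value, as in the text
starts = interval_starts(-180, width, 10)
# Illustrative subset: the first ten values listed in Viewgraph 6.
sample = [-180, -10, -130, 260, 160, 210, 50, 140, 210, -30]
counts = count_per_interval(sample, -180, width, 10)
print(starts)
print(counts)
```

Integer floor division places a value that equals a starting point in the interval it starts, so each data point lands in exactly one interval.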
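Viewgraph 11: Step 6 - Determine the starting point for each interval; Step 7 - Count the number of points in each interval
INTERVAL NUMBER   STARTING VALUE   INTERVAL WIDTH   ENDING VALUE   NUMBER OF COUNTS
1                 -180             60               -120           3
2                 -120             60               -60            5
3                 -60              60               0              13
4                 0                60               60             20
5                 60               60               120            22
6                 120              60               180            24
7                 180              60               240            20
8                 240              60               300            18
9                 300              60               360            6
10                360              60               420            4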
Step 8 - Plot the data. A more precise and refined picture comes into view once you plot your data (Viewgraph 12). You bring all of the previous steps together when you construct the graph.
• The horizontal scale across the bottom of the graph contains the intervals that were calculated previously.
• The vertical scale contains the count or frequency of observations within each of the intervals.
• A bar is drawn for the height of each interval. The bars look like columns.
• The height is determined by the number of observations or percentage of the total observations for each of the intervals.
• The Histogram may not be perfectly symmetrical. Variations will occur. Ask yourself whether the picture is reasonable and logical, but be careful not to let your preconceived ideas influence your decisions unfairly.
Step 9 - Add the title and legend. A title and a legend provide the who, what, when, where, and why (also illustrated in Viewgraph 12) that are important for understanding and interpreting the data. This additional information documents the nature of the data, where it came from, and when it was collected. The legend may include such things as the sample size, the dates and times involved, who collected the data, and identifiable equipment or work groups. It is important to include any information that helps clarify what the data describes.
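As a rough illustration of Steps 8 and 9, the sketch below draws a sideways text histogram from the interval counts listed in Viewgraph 11 and adds the title. The function name `render_histogram` and the text layout are assumptions made for this example; a real chart would normally be drawn with vertical bars.

```python
def render_histogram(starts, counts, symbol="#"):
    """Return one text line per interval: starting value, bar, and count."""
    lines = []
    for start, count in zip(starts, counts):
        lines.append(f"{start:>5} | {symbol * count} {count}")
    return lines

starts = list(range(-180, 420, 60))            # interval starting values
counts = [3, 5, 13, 20, 22, 24, 20, 18, 6, 4]  # counts from Viewgraph 11
lines = render_histogram(starts, counts)

print("MISS DISTANCE FOR MK 75 GUN TEST FIRING")  # title (Step 9)
for line in lines:
    print(line)
print(f"Total shots: {sum(counts)}")
```

Each bar's length is proportional to the number of observations in its interval, which is the same information the vertical bars carry.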
[Viewgraph 12: How to Construct a Histogram, Step 8 - Plot the data, Step 9 - Add the title and legend. Title: MISS DISTANCE FOR MK 75 GUN TEST FIRING. Vertical axis: SHOT COUNT (0 to 25). Horizontal axis: YARDS SHORT and YARDS LONG of the TARGET, from -180 to 420 in steps of 60, with bars marked HITS or MISSES. LEGEND: USS CROMMELIN (FFG-37), PACIFIC MISSILE FIRING RANGE, 135 BL&P ROUNDS/MOUNT 31, 25 JUNE 94]
Interpreting Histograms
[Viewgraph 13: Interpreting Histograms. Labels: Target.]
[Viewgraph: Interpreting Histograms. Labels: LSL, Target, USL.]
Portraying your data in a Histogram enables you to check rapidly on the number, or the percentage, of defects produced during the time you collected data. But unless you know whether the process was stable (Viewgraph 15), you won't be able to predict whether future products will be within specification limits or determine a course of action to ensure that they are.
A Histogram can show you whether or not your process is producing products or services that are within specification limits. To discover whether the process is stable, and to predict whether it can continue to produce within spec limits, you need to use a Control Chart (see the Control Chart module). Only after you have discovered whether your process is in or out of control can you determine an appropriate course of action: to eliminate special causes of variation, or to make fundamental changes to your process.
There are times when a Histogram may look unusual to you. It might have more than one peak, be discontinued, or be skewed, with one tail longer than the other, as shown in Viewgraph 16. In these circumstances, the people involved in the process should ask themselves whether it really is unusual. The Histogram may not be symmetrical, but you may find out that it should look the way it does. On the other hand, the shape may show you that something is wrong: that data from several sources were mixed, for example, or different measurement devices were used, or operational definitions weren't applied. What is really important here is to avoid jumping to conclusions without properly examining the alternatives.
[Viewgraph 15: Interpreting Histograms, Process Variation. Labels: Day 1, Day 2, Target.]
[Viewgraphs 17 through 23: blank WORKSHEET pages. The worksheet for Step 6 (determine the starting point of each interval) and Step 7 (count the number of points in each interval) has columns INTERVAL NUMBER, STARTING VALUE, and INTERVAL WIDTH for intervals 1 through 10. The worksheet in Viewgraph 23 covers Step 8 (plot the data) and Step 9 (add title and legend).]
[Viewgraph 24: TOTAL = 80]
EXERCISE 1 ANSWER KEY
Step 3 - Compute the range for the data set
Largest value = 32 percent body fat
Smallest value = 4 percent body fat
Step 8 - Plot the data; Step 9 - Add title and legend
[Figure: JUNE 94 PRT PERCENT BODY FAT. Vertical axis: NO. OF PERSONNEL (0 to 20 in steps of 2). Horizontal axis: % BODY FAT (0 to 36 in steps of 4). Label: SATISFACTORY.]
EXERCISE 2: The source of data for the second exercise is the following scenario. A listing of the data collected follows this description. Use the blank worksheets in Viewgraphs 17 through 23 to do this exercise. You will find answer keys in Viewgraphs 31 through 37. A Marine Corps small arms instructor was performing an analysis of 9 mm pistol marksmanship scores to improve training methods. For every class of 25, the instructor recorded the scores for each student who occupied the first four firing positions at the small arms range. The instructor then averaged the scores for each class, maintaining a database on 105 classes. These are the data collected: AVERAGE SMALL ARMS SCORES 160 175 270 180 255 255 230 195 220 210 220 190 190 265 245 180 245 255 235 215 240 230 155 210 255 270 260 210 235 230 225 215 225 300 225 235 200 240 225 195 215 250 230 215 280 275 170 200 245 225 220 225 220 220 225 185 240 175 220 170 235 210 235 245 225 250 170 185 265 205 230 235 225 195 200 285 185 195 270 260 230 240 200 235 235 200 215 200 250 215 195 200 245 225 215 165 220 260 230 185 225 220 230 230 240
EXERCISE 2 ANSWER KEY
TOTAL = 105 (Viewgraph 31)
Tally of scores (Viewgraph 32):
SCORE  COUNT    SCORE  COUNT    SCORE  COUNT
155    1        205    1        255    4
160    1        210    4        260    3
165    1        215    7        265    2
170    3        220    8        270    3
175    2        225    11       275    1
180    2        230    9        280    1
185    4        235    8        285    1
190    2        240    5        290    0
195    5        245    5        295    0
200    7        250    3        300    1
Step 3 - Compute the range for the data set
Step 8 - Plot the data; Step 9 - Add title and legend
[Viewgraph 37: MARKSMANSHIP SCORES FOR 9mm PISTOL. Vertical axis: NO. OF PERSONNEL (0 to 30). Horizontal axis: SCORES (155 to 300). LEGEND: MCBH KANEOHE BAY, HI; AVERAGE OF 4 SCORES PER CLASS, 105 CLASSES, 1 JUNE 94 - 15 JULY 94]
REFERENCES:
1. Brassard, M. (1988). The Memory Jogger, A Pocket Guide of Tools for Continuous Improvement, pp. 36 - 43. Methuen, MA: GOAL/QPC.
2. Department of the Navy (November 1992). Fundamentals of Total Quality Leadership (Instructor Guide), pp. 6-44 - 6-47. San Diego, CA: Navy Personnel Research and Development Center.
3. Department of the Navy (September 1993). Systems Approach to Process Improvement (Instructor Guide), pp. 10-17 - 10-38. San Diego, CA: OUSN Total Quality Leadership Office and Navy Personnel Research and Development Center.
4. Naval Medical Quality Institute (Undated). Total Quality Leader's Course (Student Guide), pp. U-26 - U-28. Bethesda, MD.
Graphics Commands
PIE CHART
PURPOSE
Generates a pie chart.
DESCRIPTION
A pie chart is a graphical data analysis technique for summarizing the distributional information of a variable. It is a circular plot consisting of wedges where the size of each wedge is proportional to the frequency (= number of observations) in that wedge. The plot is to be read clockwise (where the first wedge is at 9 o'clock). If a single variable is specified, DATAPLOT divides the values into frequency classes in the same manner as for a histogram. The histogram and the pie chart have the same information except the histogram has bars at the data values (where the height of the bar is proportional to the number of observations in the class), whereas the pie chart has wedges (where the area of the wedge is proportional to the number of observations in the class). If two variables are specified, the first variable contains pre-computed frequencies and the second variable is a group identifier. This second form is more commonly used.
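To make the proportionality concrete, the following Python sketch (not DATAPLOT code) converts pre-computed frequencies into wedge angles. The function name `wedge_angles` is hypothetical, and the frequencies 2, 5, 9, 15, 28 are the Y values used in the PROGRAM section of this page.

```python
def wedge_angles(frequencies):
    """Return the angle in degrees of each wedge, proportional to its frequency."""
    total = sum(frequencies)
    return [360.0 * f / total for f in frequencies]

frequencies = [2, 5, 9, 15, 28]
angles = wedge_angles(frequencies)
for f, a in zip(frequencies, angles):
    print(f"frequency {f:>2}: {a:6.1f} degrees")
```

The angles always sum to 360 degrees, so doubling every frequency leaves the chart unchanged; only the relative sizes matter.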
SYNTAX 1
PIE CHART <x> <SUBSET/EXCEPT/FOR qualification>
where <x> is the variable of raw data values; and where the <SUBSET/EXCEPT/FOR qualification> is optional.
This syntax is used when you have raw data only.
SYNTAX 2
PIE CHART <y> <x> <SUBSET/EXCEPT/FOR qualification>
where <y> is the variable of pre-computed frequencies; <x> is the variable of group identifiers; and where the <SUBSET/EXCEPT/FOR qualification> is optional.
This syntax is used when you have pre-computed frequencies at each data value.
EXAMPLES
PIE CHART X
PIE CHART TEMP SUBSET TEMP > 0
PIE CHART F X SUBSET X > 2
PIE CHART COUNTS STATE
NOTE 1
Each wedge is drawn with a common set of attributes. The attributes of the wedge borders are set with the LINE, LINE COLOR, and LINE THICKNESS commands (typically they are all set the same). The attributes of the interior are set with the various REGION commands. Any labels for the wedges must be set with the LEGEND or TEXT commands. The CROSS HAIR command can help in positioning labels. The program example below shows how to set the attributes. DATAPLOT does not support features such as 3d pie charts or exploding slices that are common in many business graphics programs.
NOTE 2
Although pie charts are popular in business graphics, they are generally a poor graphics technique. See the book listed in the REFERENCE section below for more information.
NOTE 3
For the one variable form of the command, DATAPLOT uses a class width of 0.3 times the standard deviation of the variable. Use the CLASS WIDTH command to override this default. DATAPLOT also tends to generate a large number of zero frequency classes at the lower and upper tails. The CLASS LOWER and CLASS UPPER commands can be used to set lower and upper limits for the classes.
DEFAULT
None
SYNONYMS
None
RELATED COMMANDS
HISTOGRAM            = Generates a histogram.
FREQUENCY PLOT       = Generates a frequency plot.
PERCENT POINT PLOT   = Generates a percent point plot.
PLOT                 = Generates a plot (including bar plots).
CLASS LOWER          = Sets the lower class minimum for histograms, frequency plots, and pie charts.
CLASS UPPER          = Sets the upper class maximum for histograms, frequency plots, and pie charts.
CLASS WIDTH          = Sets the class width for histograms, frequency plots, and pie charts.
LINE                 = Sets the types for plot lines.
LINE COLOR           = Sets the colors for plot lines.
LINE THICKNESS       = Sets the thicknesses for plot lines.
REGION FILL          = Sets the on/off switches for region fills.
REFERENCE
The Elements of Graphing Data, William Cleveland, Wadsworth, 1985 (p. 264).
APPLICATIONS
Business Graphics
IMPLEMENTATION DATE
The ability to set the attributes of the pie wedges was implemented 93/11.
PROGRAM
LET X = DATA 81 82 83 84 85
LET Y = DATA 2 5 9 15 28
MULTIPLOT 2 2; MULTIPLOT CORNER COORDINATES 0 0 100 100
X1LABEL SALES IN MILLIONS OF DOLLARS
.
LINE THICKNESS .3 ALL; TITLE PIE CHART WITH THICKER LINES
PIE CHART Y X
.
REGION FILL ON ALL; REGION PATTERN COLOR G10 G30 G50 G70 G90
REGION FILL COLOR G10 G30 G50 G70 G90
TITLE PIE CHART WITH SOLID FILL SLICES
PIE CHART Y X
.
TITLE PIE CHART WITH LABELS
LET N = SIZE X
LEGEND SIZE 3
LOOP FOR K = 1 1 N
LET A = X(K)
LEGEND ^K 19^A
END OF LOOP
LEGEND 1 COORDINATES 8 58; LEGEND 2 COORDINATES 10 71; LEGEND 3 COORDINATES 28 92
LEGEND 4 COORDINATES 68 77; LEGEND 5 COORDINATES 67 30
PIE CHART Y X
.
REGION PATTERN COLOR BLACK ALL; REGION PATTERN D1 D2 D1D2 VERT HORI
REGION PATTERN SPACING 1.0 1.0 3.0 4.0 5.0; REGION PATTERN LINE SOLID SOLID SOLID DASH DOT
TITLE PIE CHART WITH HATCH PATTERN FILLS
PIE CHART Y X
MULTIPLOT OFF
STATISTICS
Statistics: The word statistics, in the first sense, is defined by Professor Secrist as follows: "By statistics we mean aggregates of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standards of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other."
A. L. Bowley has given three definitions:
Statistics may be called the science of counting.
Statistics may be called the science of averages.
Statistics is the science of the measurement of the social organism as a whole in all its manifestations.
In other words, it deals with data, which can be collected, documented, analyzed, and interpreted.
Sampling: Sampling involves the selection of a number of study units from a defined study population. A study population may consist of individuals, villages, institutions, records, etc.
Sampling method: There are two types of sampling method.
1. Probability sampling
Random sampling
Stratified sampling
Systematic sampling
Cluster sampling
Multistage sampling
Probability sampling: Probability sampling involves a random selection procedure to ensure that each sample unit is chosen on the basis of chance.
Systematic sampling: In systematic sampling, a group of people is selected in a systematically random manner from a complete list of a given population. Systematic sampling is applied where very large numbers are included in the target population. For example:
1. Class interval: To calculate the class interval, divide the population size by the desired sample size. Example: suppose we want to select 15 universities from a list of 40 in our sampling frame.
Class interval = 40 / 15 ≈ 2.67
2. Random number: Select a random number between 0 and 1, such as 0.178. Multiplying the random number by the class interval gives a fraction, and the next whole number above that fractional value is the first facility, facility-1.
Random number × class interval = 0.178 × 2.67 = 0.475 (this is facility-1)
Adding the class interval to this value gives a fraction, and the next whole number above it is facility-4.
0.475 + 2.67 = 3.145 (facility-4)
Adding the class interval again gives a fraction, and the next whole number above it is facility-6.
3.145 + 2.67 = 5.815 (facility-6)
And so forth.
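The selection rule in this example can be sketched in Python. The function name `systematic_sample` is hypothetical, and taking "the next whole number above the fractional value" as a ceiling is an interpretation of the worked example (0.475 gives facility 1, 3.145 gives facility 4). The sketch also assumes the random start is greater than 0.

```python
import math

def systematic_sample(population_size, sample_size, random_start):
    """Pick sample_size unit numbers from 1..population_size.

    random_start is a number with 0 < random_start <= 1 (assumed).
    Each fractional position is rounded up to the next whole unit number.
    """
    interval = population_size / sample_size   # class interval, e.g. 40 / 15
    position = random_start * interval
    chosen = []
    for _ in range(sample_size):
        chosen.append(math.ceil(position))
        position += interval
    return chosen

facilities = systematic_sample(40, 15, random_start=0.178)
print(facilities[:3])  # [1, 4, 6]
print(facilities)
```

Using the unrounded interval 40 / 15 keeps the last selected unit inside the list of 40.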
CENTRAL TENDENCY
There are three measures of central tendency: a) Mean b) Median c) Mode

Mean: There are three types: 1) Arithmetic Mean 2) Harmonic Mean 3) Geometric Mean

1) Arithmetic Mean: It is the sum of the data values divided by the number of values. For example: 1, 2, 3, 25, 9
\[ AM = \frac{1+2+3+25+9}{5} = \frac{40}{5} = 8 \]

2) Harmonic Mean: The harmonic mean is the reciprocal of the mean of the reciprocals of nonzero data. For example: 1, 2, 3, 9, 25
Step 1: mean of the reciprocals
\[ \frac{\frac{1}{1}+\frac{1}{2}+\frac{1}{3}+\frac{1}{9}+\frac{1}{25}}{5} = \frac{1+0.5+0.333+0.111+0.04}{5} = \frac{1.984}{5} \]
Step 2: reciprocal of that mean
\[ HM = \frac{5}{1.984} = 2.52 \]

3) Geometric Mean: The geometric mean is defined as the positive nth root of the product of n nonzero, non-negative values. For example: 1, 2, 3, 9, 25
\[ GM = \sqrt[5]{1 \times 2 \times 3 \times 25 \times 9} = \sqrt[5]{1350} = 4.23 \]
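The three means for the example data can be checked with Python's standard `statistics` module (the `geometric_mean` function requires Python 3.8 or later). This is only a verification sketch, not part of the original solution.

```python
from statistics import mean, harmonic_mean, geometric_mean

data = [1, 2, 3, 9, 25]

am = mean(data)            # sum of values / number of values
hm = harmonic_mean(data)   # reciprocal of the mean of the reciprocals
gm = geometric_mean(data)  # nth root of the product of the values

print(f"AM = {am}")
print(f"HM = {hm:.2f}")
print(f"GM = {gm:.2f}")
```

For this data set the output shows the ordering AM > GM > HM.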
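Question: Prove the equation AM × HM = GM².
Answer: In general this holds only for two nonzero positive numbers. Suppose one nonzero positive number = a and the other nonzero positive number = b.
\[ AM = \frac{a+b}{2}, \qquad HM = \frac{2}{\frac{1}{a}+\frac{1}{b}} = \frac{2ab}{a+b}, \qquad GM = \sqrt{ab} \]
\[ AM \times HM = \frac{a+b}{2} \times \frac{2ab}{a+b} = ab \]
\[ GM^2 = \left(\sqrt{ab}\right)^2 = ab \]
Therefore AM × HM = GM².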
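Question: For two nonzero positive numbers, AM = 5 and HM = 4. What is the value of GM?
Answer: Suppose the two nonzero positive numbers are a and b.
\[ \frac{a+b}{2} = 5 \;\Rightarrow\; a+b = 10 \quad \text{(i)} \]
\[ \frac{2}{\frac{1}{a}+\frac{1}{b}} = 4 \;\Rightarrow\; \frac{2ab}{a+b} = 4 \;\Rightarrow\; \frac{2ab}{10} = 4 \;\Rightarrow\; ab = 20 \]
\[ GM = \sqrt{ab} = \sqrt{20} = 4.47 \]
The two numbers themselves can also be found:
\[ (a-b)^2 = (a+b)^2 - 4ab = 100 - 80 = 20 \;\Rightarrow\; a-b = \sqrt{20} = 4.47 \quad \text{(ii)} \]
(i) + (ii):
\[ 2a = 14.47 \;\Rightarrow\; a \approx 7.2 \]
From (i):
\[ b = 10 - a \approx 2.8 \]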
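Table 1: For a set of different discrete positive numbers, find AM, GM and HM, and show the relationship between them. For example: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
\[ AM = \frac{1+2+3+4+5+6+7+8+9+10}{10} = \frac{55}{10} = 5.5 \]
HM, Step 1: mean of the reciprocals
\[ \frac{\frac{1}{1}+\frac{1}{2}+\frac{1}{3}+\frac{1}{4}+\frac{1}{5}+\frac{1}{6}+\frac{1}{7}+\frac{1}{8}+\frac{1}{9}+\frac{1}{10}}{10} = \frac{1+0.5+0.333+0.25+0.2+0.167+0.143+0.125+0.111+0.1}{10} = \frac{2.929}{10} \]
Step 2: reciprocal of that mean
\[ HM = \frac{10}{2.929} = 3.414 \]
\[ GM = \sqrt[10]{1 \times 2 \times 3 \times 4 \times 5 \times 6 \times 7 \times 8 \times 9 \times 10} = \sqrt[10]{3628800} = 4.53 \]
AM > GM > HM
\[ AM \times HM = 5.5 \times 3.414 = 18.78 \]
\[ GM^2 = (4.53)^2 = 20.52 \]
\[ AM \times HM \neq GM^2 \]
The relation AM = GM = HM holds only when all the values are equal and nonzero, for example a = 5, b = 5, c = 5.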
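The relationships above can be checked numerically with Python's standard `statistics` module. The pair (2, 8) is a hypothetical example chosen for illustration; it is not taken from the problems above.

```python
import math
from statistics import mean, harmonic_mean, geometric_mean

# Ten different values: AM > GM > HM, and AM x HM differs from GM squared.
values = list(range(1, 11))
am, gm, hm = mean(values), geometric_mean(values), harmonic_mean(values)
print(f"AM = {am:.3f}, GM = {gm:.3f}, HM = {hm:.3f}")
print(f"AM x HM = {am * hm:.2f}, GM^2 = {gm ** 2:.2f}")

# Exactly two values (hypothetical pair): AM x HM equals GM squared.
pair = [2.0, 8.0]
two_value_identity = math.isclose(
    mean(pair) * harmonic_mean(pair), geometric_mean(pair) ** 2
)
print(two_value_identity)

# All values equal: the three means coincide.
equal = [5.0, 5.0, 5.0]
all_equal = (
    math.isclose(mean(equal), geometric_mean(equal))
    and math.isclose(mean(equal), harmonic_mean(equal))
)
print(all_equal)
```

`math.isclose` is used instead of `==` because the geometric mean is computed in floating point.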
\[ \frac{87}{30} = 2.9 \]
Answer: 2.9

Mean for grouped data:
Height (cm)   Middle value (x)
150-155       152.5
155-160       157.5
160-165       162.5
165-170       167.5
170-175       172.5
175-180       177.5
\[ \text{Mean} = \frac{8225}{50} = 164.5 \]
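Graph for median & mode:
Weekly wages (taka)   No. of workers
75-85                 14
85-95                 18
95-105                30
105-115               45
115-125               52
125-135               45
135-145               20
145-155               6

\[ \text{Median} = L_1 + \frac{\frac{n}{2} - f_c}{f_m} \times C \]
Where,
L_1 = The lower limit of the median group.
C = Class interval of the median group.
n = The total frequency.
f_m = The frequency of the median group.
f_c = The cumulative frequency of the group preceding the median group.
Here n = 230, so n/2 = 115; the median group is 115-125, with f_c = 107 and f_m = 52.
\[ \text{Median} = 115 + \frac{115 - 107}{52} \times 10 = 115 + \frac{8}{52} \times 10 = 116.54 \]

\[ \text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times C \]
Where,
L = The lower limit of the modal group.
C = Class interval of the modal class.
\Delta_1 = The frequency of the modal group minus the frequency of the preceding group.
\Delta_2 = The frequency of the modal group minus the frequency of the following group.
Here,
L = 115, C = 10, \Delta_1 = 52 - 45 = 7, \Delta_2 = 52 - 45 = 7
\[ \text{Mode} = 115 + \frac{7}{7+7} \times 10 = 120 \]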
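The grouped median and mode formulas can be sketched in Python using the weekly wages table. The function names are hypothetical, and the sketch assumes equal class widths and a single modal class.

```python
def grouped_median(lower_limits, freqs, width):
    """Median = L1 + (n/2 - fc) / fm * C for the class containing the n/2-th value."""
    n = sum(freqs)
    cumulative = 0   # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= n / 2:
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f

def grouped_mode(lower_limits, freqs, width):
    """Mode = L + d1 / (d1 + d2) * C for the class with the highest frequency."""
    i = freqs.index(max(freqs))
    prev_f = freqs[i - 1] if i > 0 else 0
    next_f = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = freqs[i] - prev_f
    d2 = freqs[i] - next_f
    return lower_limits[i] + d1 / (d1 + d2) * width

lower_limits = [75, 85, 95, 105, 115, 125, 135, 145]
workers = [14, 18, 30, 45, 52, 45, 20, 6]

median = grouped_median(lower_limits, workers, width=10)
mode = grouped_mode(lower_limits, workers, width=10)
print(f"Median = {median:.2f}")  # 116.54
print(f"Mode = {mode:.2f}")      # 120.00
```

Both functions locate the relevant class first and then interpolate inside it, which is what the two formulas do by hand.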
Median: When all the observations of a set of data are arranged in either ascending or descending order, the middle observation is known as the median. If the number of observations is even, the mean of the two central values is taken as the median. Median = the middle value of a set of data.
For ungrouped data,
\[ \text{Median} = \left(\frac{n+1}{2}\right)\text{th observation} \]

M.V   Frequency   C.F
7     12          12
11    8           20
15    15          35
19    19          54
23    14          68
27    7           75
      N = 75

\[ \text{Median} = L + \frac{\frac{N}{2} - c.f.}{f} \times c = 17 + \frac{\frac{75}{2} - 35}{19} \times 4 = 17 + \frac{2.5}{19} \times 4 = 17 + 0.526 = 17.526 \]
Here,
L = lower limit of the median class.
c.f. = cumulative frequency of the class just preceding the median class.
f = frequency of the median class.
c = class interval.
N = total number of observations.
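Mode: The mode is the value of a data set that occurs most frequently. It is the typical or commonly observed value, which occurs the maximum number of times.
$$\text{Mode} = L + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times C.I$$
Here f_1 is the frequency of the modal class, f_0 the frequency of the class before it and f_2 the frequency of the class after it.

Weight    M.V
35-40     37.5
40-45     42.5
45-50     47.5
50-55     52.5
55-60     57.5
60-65     62.5
65-70     67.5
$$\text{Mode} = 45 + \frac{10 - 5}{(10 - 5) + (10 - 9)} \times 5 = 45 + \frac{5}{5+1} \times 5 = 45 + 4.17 = 49.17$$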
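Determination of Mean, Median, Mode, M.D about the mean, M.D about the mode, M.D about the median and Variance. (Grouped data)

M.V (xi)  fi   C.F   fixi   |xi-mean|  fi|xi-mean|  |xi-median|  fi|xi-median|  |xi-mode|
15        5    5     75     23.27      116.35       23.33        116.65         21.25
25        10   15    250    13.27      132.70       13.33        133.30         11.25
35        15   30    525    3.27       49.05        3.33         49.95          1.25
45        12   42    540    6.73       80.76        6.67         80.04          8.75
55        13   55    715    16.73      217.49       16.67        216.71         18.75
          N=55       2105              596.35                    596.65
$$\text{Mean} = \frac{\sum f_i x_i}{N} = \frac{2105}{55} = 38.27$$
$$\text{Median} = L + \frac{\frac{N}{2} - F}{f} \times c = 30 + \frac{\frac{55}{2} - 15}{15} \times 10 = 30 + 8.33 = 38.33$$
$$\text{Mode} = L + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times c = 30 + \frac{15 - 10}{(15 - 10) + (15 - 12)} \times 10 = 30 + 6.25 = 36.25$$
$$\text{M.D (mean)} = \frac{\sum f_i |x_i - \bar{x}|}{N} = \frac{596.35}{55} = 10.84$$
$$\text{M.D (median)} = \frac{\sum f_i |x_i - \text{median}|}{N} = \frac{596.65}{55} = 10.85$$
$$\text{M.D (mode)} = \frac{\sum f_i |x_i - \text{mode}|}{N} = \frac{586.25}{55} = 10.66$$

(xi-mean)²    fi(xi-mean)²
541.49        2707.45
176.09        1760.9
10.69         160.35
45.29         543.48
279.89        3638.61
              8810.79
$$\text{Variance } \sigma^2 = \frac{\sum f_i (x_i - \bar{x})^2}{N} = \frac{8810.79}{55} = 160.20$$
The quotients 355991.222/55 = 6472.567 and 343689.062/55 = 6248.892 square the totals Σfi|xi-median| = 596.65 and Σfi|xi-mode| = 586.25 before dividing, so they are not variances. A mean squared deviation about the median or the mode needs Σfi(xi-median)² or Σfi(xi-mode)² in the numerator.

Determination of M.D about the mean, M.D about the median and M.D about the mode from ungrouped data.

Score (xi)  Frequency (fi)  C.F   fixi   fi|xi-mean|  fi|xi-median|  fi|xi-mode|
1           5               5     5      17.0         20             20
3           7               12    21     9.8          14             14
5           10              22    50     6.0          0              0
7           8               30    56     20.8         16             16
            N = 30                132    53.6         50             50
$$\text{Mean} = \frac{\sum f_i x_i}{N} = \frac{132}{30} = 4.4$$
Median = mean of the $\frac{30}{2}$th and $\frac{30+2}{2}$th observations, i.e. the 15th and 16th. Both fall in the group with C.F = 22, so Median = 5.
Mode = 5 (the score with the highest frequency, 10).
$$\text{M.D (mean)} = \frac{53.6}{30} = 1.79, \qquad \text{M.D (median)} = \frac{50}{30} = 1.67, \qquad \text{M.D (mode)} = \frac{50}{30} = 1.67$$
The deviations must be taken from the value of the median and the mode (5 and 5), not from the position 15 or the frequency 10.

$$\text{Mean} = \frac{15}{5} = 3$$
$$\text{Variance } \sigma^2 = \frac{10}{5} = 2$$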
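The frequency-weighted mean deviation and variance can be checked with a short Python sketch. It uses the grouped data with mid values 15, 25, 35, 45, 55 and frequencies 5, 10, 15, 12, 13; the helper names are illustrative only, and the results use the unrounded mean, so they may differ from hand calculation in the last digit.

```python
def weighted_mean(values, freqs):
    """Mean of a frequency table: sum(f * x) / N."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

def mean_deviation(values, freqs, centre):
    """Mean absolute deviation about any centre (mean, median or mode)."""
    return sum(f * abs(x - centre) for x, f in zip(values, freqs)) / sum(freqs)

def variance(values, freqs):
    """Mean squared deviation about the mean (divisor N)."""
    m = weighted_mean(values, freqs)
    return sum(f * (x - m) ** 2 for x, f in zip(values, freqs)) / sum(freqs)

mid_values = [15, 25, 35, 45, 55]
frequencies = [5, 10, 15, 12, 13]
mean = weighted_mean(mid_values, frequencies)
md_mean = mean_deviation(mid_values, frequencies, mean)
md_mode = mean_deviation(mid_values, frequencies, 36.25)   # mode from the grouped formula
var = variance(mid_values, frequencies)
print(round(mean, 2), round(md_mean, 2), round(md_mode, 2), round(var, 2))
```

Passing a different centre to `mean_deviation` gives the deviation about the median or the mode without changing the function.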
DISPERSION
Dispersion deals with how the values are scattered in a set of data. Dispersion is small if the values are closely bunched about their mean, and it is large if the values are scattered widely about their mean. The important measures are:
Range
Variance
Standard deviation
Co-efficient of variation (CV)
Standard error

Range: Range is the absolute difference between the highest value and the lowest value in a series of observations.
Range = largest value - smallest value
For example: 1, 10, 10, 20, 30, 40
Range = 40 - 1 = 39

Variance: The mean of the squares of the deviations of every observation from their mean is a measure of spread and is called variance.
$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$$
Formula for grouped data:
$$\sigma^2 = \frac{\sum f_i (x_i - \bar{x})^2}{N}$$

Standard deviation: The standard deviation is the square root of the variance.
$$\text{S.D} = \sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$

Co-efficient of variation (CV): The co-efficient of variation provides a relative measure of data dispersion compared to the mean.
$$\text{C.V} = \frac{\text{S.D}}{\text{Mean}} \times 100\%$$

Standard error: The standard error is the standard deviation of the sampling distribution of the sample statistic.
$$\text{S.E} = \frac{\sigma}{\sqrt{n}}$$
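The five measures of dispersion can be computed for the range example 1, 10, 10, 20, 30, 40 as follows. This is a sketch using Python's standard `statistics` module, with the variance taken with divisor n as in the formulas of this section.

```python
import math
import statistics

data = [1, 10, 10, 20, 30, 40]

data_range = max(data) - min(data)          # largest value - smallest value
mean = statistics.mean(data)
variance = statistics.pvariance(data)       # divisor n (population form)
sd = math.sqrt(variance)                    # standard deviation
cv = sd / mean * 100                        # co-efficient of variation, in percent
se = sd / math.sqrt(len(data))              # standard error of the mean

print(data_range, round(variance, 2), round(sd, 2), round(cv, 1), round(se, 2))
```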
1. The average of 10 observations is 40. One observation x is added and the average is reduced to 38. One more observation y is added and the average is 42. Find the value of x and y.
We know, Σxi = n x̄
Here, n = 10, x̄ = 40
Σxi = 40 × 10 = 400
Here, n_1 = 10 + 1 = 11, x̄_1 = 38
400 + x = 38 × 11
400 + x = 418
x = 418 - 400 = 18
Here, n_2 = 12, x̄_2 = 42
418 + y = 42 × 12
418 + y = 504
y = 504 - 418 = 86
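2. Suppose the mean and variance of 20 observations are 32 and 25 respectively. One observation, 36, is now added. Find the mean and variance of the 21 observations.
Here, n_1 = 20, x̄_1 = 32, σ_1² = 25; n_2 = 1, x̄_2 = 36, σ_2² = 0 (a single observation has no spread).
We know,
$$\bar{x} = \frac{\sum x_i + 36}{21} = \frac{(20 \times 32) + 36}{21} = \frac{640 + 36}{21} = 32.19$$
Or, combined mean:
$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} = \frac{32 \times 20 + 36 \times 1}{20 + 1} = 32.19$$
Combined variance, with d_1 = x̄_1 - x̄ = -0.19 and d_2 = x̄_2 - x̄ = 3.81:
$$\sigma^2 = \frac{n_1 (\sigma_1^2 + d_1^2) + n_2 (\sigma_2^2 + d_2^2)}{n_1 + n_2} = \frac{20(25 + 0.036) + 1(0 + 14.51)}{21} = \frac{515.24}{21} = 24.54$$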
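The effect of adding one observation to a set with known mean and variance can be sketched in Python. The function name `add_observation` is illustrative; the sketch assumes the variance is defined with divisor n, so that Σx² = n(σ² + x̄²).

```python
def add_observation(n, mean, variance, x):
    """Return (n, mean, variance) after adding one observation x."""
    new_n = n + 1
    new_mean = (n * mean + x) / new_n
    sum_sq = n * (variance + mean ** 2) + x ** 2     # sum of squares of all values
    new_variance = sum_sq / new_n - new_mean ** 2
    return new_n, new_mean, new_variance

new_n, new_mean, new_variance = add_observation(20, 32, 25, 36)
print(new_n, round(new_mean, 2), round(new_variance, 2))
```

The same idea solves the previous problem: the added value equals the new total minus the old total, for example 38 × 11 - 400.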
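Question: What is the relation between M.D (about the mean) and S.D?
Answer: For two observations x_1 and x_2 the two measures are equal.
$$\bar{x} = \frac{x_1 + x_2}{2}$$
$$\text{M.D (mean)} = \frac{|x_1 - \bar{x}| + |x_2 - \bar{x}|}{2} = \frac{\left|x_1 - \frac{x_1 + x_2}{2}\right| + \left|x_2 - \frac{x_1 + x_2}{2}\right|}{2} = \frac{\left|\frac{x_1 - x_2}{2}\right| + \left|\frac{x_2 - x_1}{2}\right|}{2} = \frac{|x_1 - x_2|}{2}$$
$$\text{S.D} = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2}{2}} = \sqrt{\frac{\left(\frac{x_1 - x_2}{2}\right)^2 + \left(\frac{x_2 - x_1}{2}\right)^2}{2}} = \sqrt{\frac{(x_1 - x_2)^2}{4}} = \frac{|x_1 - x_2|}{2}$$
∴ M.D (mean) = S.D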
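Q: Relation between variance and S.D?
Answer: The S.D is the square root of the variance. Suppose three numbers are a, b, c.
$$\sigma^2 = \frac{a^2 + b^2 + c^2}{3} - \left(\frac{a + b + c}{3}\right)^2$$
$$= \frac{3a^2 + 3b^2 + 3c^2}{9} - \frac{a^2 + b^2 + c^2 + 2ab + 2bc + 2ca}{9}$$
$$= \frac{2a^2 + 2b^2 + 2c^2 - 2ab - 2bc - 2ca}{9}$$
$$= \frac{(a^2 - 2ab + b^2) + (b^2 - 2bc + c^2) + (c^2 - 2ca + a^2)}{9}$$
$$= \frac{(a - b)^2 + (b - c)^2 + (c - a)^2}{9}$$
For the numbers 27, 28, 29:
$$\sigma^2 = \frac{1}{9}(1 + 1 + 4) = \frac{6}{9} = \frac{2}{3}$$
$$\text{S.D} = \sqrt{\frac{2}{3}} = 0.816$$
Answer: Variance = 2/3 = 0.667
Answer: S.D = 0.816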
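Combined variance:
$$\sigma^2 = \frac{n_1 \sigma_1^2 + n_1 d_1^2 + n_2 \sigma_2^2 + n_2 d_2^2}{n_1 + n_2}$$
(d_1 = x̄_1 - x̄) (d_2 = x̄_2 - x̄)
When the two means are equal, d_1 = d_2 = 0 and
$$\sigma^2 = \frac{n_1 \sigma_1^2 + n_2 \sigma_2^2}{n_1 + n_2} = \frac{20 \times 5 + 25 \times 6}{20 + 25} = \frac{250}{45} = 5.56$$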
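Q: Find two numbers for which the mean and variance are 6 and 16.
Answer: Let the numbers be a and b.
$$\frac{a + b}{2} = 6 \quad \ldots(i)$$
a + b = 12
a = 12 - b ...(ii)
$$\frac{a^2 + b^2}{2} - \left(\frac{a + b}{2}\right)^2 = 16$$
$$\frac{(a + b)^2 - 2ab}{2} - (6)^2 = 16$$
$$\frac{(12)^2 - 2ab}{2} - 36 = 16$$
$$\frac{144 - 2ab - 72}{2} = 16$$
72 - 2ab = 32
2ab = 40
ab = 20
b(12 - b) = 20 [from (ii)]
12b - b² - 20 = 0
b² - 12b + 20 = 0
b² - 10b - 2b + 20 = 0
(b - 10)(b - 2) = 0
b = 10 or b = 2
a = 12 - 2 = 10 (when b = 2)
Answer: (a, b) = (2, 10); (10, 2)

Question: Determine the variance of the numbers 1, 2, 3, ..., n.
We know,
$$\text{Variance } \sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$$
$$\sum x = 1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2}, \qquad \sum x^2 = 1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}$$
$$\sigma^2 = \frac{n(n+1)(2n+1)}{6n} - \left(\frac{n(n+1)}{2n}\right)^2 = \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4}$$
$$= \frac{n+1}{2}\left(\frac{2n+1}{3} - \frac{n+1}{2}\right) = \frac{n+1}{2} \times \frac{4n + 2 - 3n - 3}{6} = \frac{(n+1)(n-1)}{12} = \frac{n^2 - 1}{12}$$
$$\text{S.D } (\sigma) = \sqrt{\frac{n^2 - 1}{12}}$$
For the numbers 1 to 50:
$$\sigma = \sqrt{\frac{(50)^2 - 1}{12}} = \sqrt{208.25} = 14.43$$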
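The closed form (n² - 1)/12 can be compared with a direct calculation. The sketch below uses Python's `statistics.pvariance` (divisor n); the function name `variance_first_n` is an illustrative choice.

```python
import math
import statistics

def variance_first_n(n):
    """Variance of the numbers 1, 2, ..., n from the closed form (n^2 - 1) / 12."""
    return (n * n - 1) / 12

n = 50
closed_form = variance_first_n(n)
direct = statistics.pvariance(range(1, n + 1))    # computed from the data itself
sd = math.sqrt(closed_form)
print(closed_form, round(sd, 2))
```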
ii) Raw moment:
The mean is replaced by an arbitrary value A (an assumed origin) to find the raw moment. The equation for the r-th raw moment is:
$$\mu_r' = \frac{\sum f_i (x_i - A)^r}{\sum f_i}$$
Question: Why are there two types of moment?
Answer: In every case we need the corrected (central) moment, but it is difficult to find directly: if the mean of the data set has a fractional value, the calculation becomes laborious. In statistics we always try to keep the calculation simple, so to solve the problem we take another type of moment, the raw moment. For the raw moment we take an arbitrary number A, which makes the calculation easier. Finally we use the relation between the raw moment and the corrected moment to obtain the corrected moment. That is why there are two types of moment.
Skewness: Skewness refers to lack of symmetry, or departure from symmetry, of a distribution.
$$\text{Pearson co-efficient} = \frac{\text{Mean} - \text{Mode}}{\text{S.D}}$$
When the result is positive, the distribution is called positively skewed.
When the result is negative, the distribution is called negatively skewed.
When the skewness is zero, the distribution is called symmetrical.

Kurtosis: It measures the peakedness of the graph of a data set. It is examined when the skewness is zero.
When the peakedness is high, it is called leptokurtic.
When the peakedness is medium, it is called mesokurtic.
When the peakedness is low, it is called platykurtic.
Parameters for finding the kurtosis:
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2}$$
$$\gamma_1 = \sqrt{\beta_1}, \qquad \gamma_2 = \beta_2 - 3$$
When β_2 is greater than 3, γ_2 will be positive and the distribution will be leptokurtic.
When β_2 is equal to 3, γ_2 will be zero and the distribution will be mesokurtic.
When β_2 is less than 3, γ_2 will be negative and the distribution will be platykurtic.
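Relation between the 2nd raw and corrected moments, where μ_1' = x̄ - A:
$$\mu_2 = \frac{\sum f (x - \bar{x})^2}{N} = \frac{\sum f \left[(x - A) - (\bar{x} - A)\right]^2}{N} = \frac{\sum f \left[(x - A)^2 - 2(x - A)\mu_1' + \mu_1'^2\right]}{N}$$
$$= \frac{\sum f (x - A)^2}{N} - 2\mu_1' \frac{\sum f (x - A)}{N} + \mu_1'^2 = \mu_2' - 2\mu_1' \mu_1' + \mu_1'^2$$
$$\mu_2 = \mu_2' - \mu_1'^2$$

Relation between the 3rd raw and corrected moments, where μ_1' = x̄ - A:
$$\mu_3 = \frac{\sum f (x - \bar{x})^3}{N} = \frac{\sum f \left[(x - A) - \mu_1'\right]^3}{N} = \frac{\sum f \left[(x - A)^3 - 3(x - A)^2 \mu_1' + 3(x - A)\mu_1'^2 - \mu_1'^3\right]}{N}$$
$$= \mu_3' - 3\mu_2' \mu_1' + 3\mu_1' \mu_1'^2 - \mu_1'^3$$
$$\mu_3 = \mu_3' - 3\mu_1' \mu_2' + 2\mu_1'^3$$

Relation between the 4th raw and corrected moments, where μ_1' = x̄ - A:
$$\mu_4 = \frac{\sum f (x - \bar{x})^4}{N} = \frac{\sum f \left[(x - A) - \mu_1'\right]^4}{N} = \frac{\sum f \left[(x - A)^4 - 4(x - A)^3 \mu_1' + 6(x - A)^2 \mu_1'^2 - 4(x - A)\mu_1'^3 + \mu_1'^4\right]}{N}$$
$$= \mu_4' - 4\mu_1' \mu_3' + 6\mu_2' \mu_1'^2 - 4\mu_1' \mu_1'^3 + \mu_1'^4$$
$$\mu_4 = \mu_4' - 4\mu_1' \mu_3' + 6\mu_2' \mu_1'^2 - 3\mu_1'^4$$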
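The relations between raw and corrected moments can be checked numerically. The sketch below is an illustration only: the frequency table (mid values 15 to 55) and the assumed origin A = 35 are chosen for convenience, and the function names are not from the notes.

```python
import math

def raw_moment(values, freqs, origin, r):
    """r-th moment about an arbitrary origin A: sum(f * (x - A)^r) / N."""
    n = sum(freqs)
    return sum(f * (x - origin) ** r for x, f in zip(values, freqs)) / n

def corrected_from_raw(m1, m2, m3, m4):
    """Convert raw moments about A into corrected (central) moments."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4

values = [15, 25, 35, 45, 55]
freqs = [5, 10, 15, 12, 13]
A = 35                                            # assumed origin

raw = [raw_moment(values, freqs, A, r) for r in (1, 2, 3, 4)]
mu2, mu3, mu4 = corrected_from_raw(*raw)

# Direct calculation about the true mean, for comparison.
mean = A + raw[0]                                 # since mu_1' = mean - A
direct = [raw_moment(values, freqs, mean, r) for r in (2, 3, 4)]
print(round(mu2, 2), round(direct[0], 2))
```

Working about A = 35 keeps every deviation a whole number, which is the practical reason the notes give for using raw moments.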
Correlation is the tendency of two or more variables to go together. Correlation can be defined as the probable tendency of two or more variables, or groups of items, to vary together. It is also termed co-variation. The primary objective of correlation analysis is to measure the strength or degree of the linear association between two or more variables.
Correlation co-efficient: The correlation coefficient is a quantitative measure of the direction and strength of the linear relationship between two numerically measured variables. The coefficient of correlation between two variables is denoted by r. r is known as the product moment co-efficient of correlation. The formula is given below:
$$r = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]\left[\sum y^2 - \frac{(\sum y)^2}{n}\right]}}$$
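Proof that -1 ≤ r ≤ 1:
$$\sigma_x^2 = \frac{\sum (x - \bar{x})^2}{n} \quad \ldots(i) \qquad \sigma_y^2 = \frac{\sum (y - \bar{y})^2}{n} \quad \ldots(ii) \qquad r = \frac{\sum (x - \bar{x})(y - \bar{y})}{n \sigma_x \sigma_y} \quad \ldots(iii)$$
Now, a sum of squares cannot be negative:
$$\sum \left[\frac{x - \bar{x}}{\sigma_x} \pm \frac{y - \bar{y}}{\sigma_y}\right]^2 \ge 0$$
$$\frac{\sum (x - \bar{x})^2}{\sigma_x^2} + \frac{\sum (y - \bar{y})^2}{\sigma_y^2} \pm \frac{2 \sum (x - \bar{x})(y - \bar{y})}{\sigma_x \sigma_y} \ge 0$$
$$n + n \pm 2nr \ge 0$$
$$2n(1 \pm r) \ge 0$$
1 + r ≥ 0, so r ≥ -1; and 1 - r ≥ 0, so r ≤ 1.
∴ -1 ≤ r ≤ 1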
Regression: Regression describes the nature of a relation. Regression analysis is a technique for studying the dependence of one variable (called the dependent variable) on one or more other variables (called explanatory variables), with a view to estimating or predicting the average value of the dependent variable in terms of the known or fixed values of the independent variables.
Regression is of three types:
(i) Linear
(ii) Curvilinear
(iii) Exponential
[Figure: sketches of the three regression types, each drawn on x and y axes with origin o]
Calculation of regression co-efficient:
1) The regression line of y on x: yi = a + b xi + ei
2) The regression line of x on y: xi = a' + b' yi + ei
Here, the two regression co-efficients are b and b'.
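Regression co-efficient: The regression co-efficient is the average change in one variable corresponding to a unit change in another. The two quantities b and b' are called the co-efficients of regression. The quantity b is the co-efficient of regression of y on x, and b' is the co-efficient of regression of x on y.
$$\text{(i)} \quad b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$
$$\text{(ii)} \quad b' = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2}$$
$$\text{R.H.S} = \sqrt{b\,b'} = \sqrt{\frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} \times \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2}} = \sqrt{\frac{\left\{\sum (x - \bar{x})(y - \bar{y})\right\}^2}{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = r$$
$$r = \sqrt{b\,b'}$$
So, this is the relationship between the correlation co-efficient and the regression co-efficients.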
What is the importance of regression?
Ans:
(i) Estimate the relationship that exists, on the average, between the dependent variable and the explanatory variables.
(ii) Determine the effect of each of the explanatory variables on the dependent variable, controlling for the effects of all other explanatory variables.
(iii) Predict the value of the dependent variable for a given value of the explanatory variable.

Prove this: The geometric mean of the two regression co-efficients is equal to the correlation co-efficient.
$$r = \sqrt{b\,b'}$$
Solve: we know
$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}, \qquad b' = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2}, \qquad \sqrt{b\,b'} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = r$$
So the geometric mean of the two regression co-efficients is equal to the correlation co-efficient.

Rank correlation: The rank correlation method is applied when rank order data are available or when each variable can be ranked in some order. The rank co-efficient is a non-parametric counterpart of the conventional correlation coefficient. The measure based on the rank correlation method is known as the rank correlation co-efficient. It is denoted by the symbol ρ.
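Derivation of the rank correlation co-efficient: Let xi and yi be the ranks (1, 2, ..., n) of n individuals on two characteristics.
$$\bar{x} = \bar{y} = \frac{1 + 2 + \cdots + n}{n} = \frac{n + 1}{2}, \qquad \sum x_i^2 = \sum y_i^2 = \frac{n(n+1)(2n+1)}{6} \quad \ldots(i)$$
$$\sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2 = \frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)^2}{4} = n(n+1)\left[\frac{2n+1}{6} - \frac{n+1}{4}\right]$$
$$= n(n+1) \times \frac{4n + 2 - 3n - 3}{12} = \frac{n(n+1)(n-1)}{12} = \frac{n(n^2 - 1)}{12} \quad \ldots(ii)$$
$$\text{Similarly,} \quad \sum (y_i - \bar{y})^2 = \frac{n(n^2 - 1)}{12} \quad \ldots(iii)$$
We know,
di = (xi - x̄) - (yi - ȳ)
di = xi - x̄ + ȳ - yi
di = xi - yi [since x̄ = ȳ]
$$\sum d_i^2 = \sum \left[(x_i - \bar{x}) - (y_i - \bar{y})\right]^2 = \sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2 - 2\sum (x_i - \bar{x})(y_i - \bar{y})$$
$$\sum d_i^2 = 2\sum (x_i - \bar{x})^2 - 2\sum (x_i - \bar{x})(y_i - \bar{y}) \qquad \left[\text{since } \sum (x_i - \bar{x})^2 = \sum (y_i - \bar{y})^2\right]$$
$$\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum (x_i - \bar{x})^2 - \frac{1}{2}\sum d_i^2$$
$$\rho = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{\sum (x_i - \bar{x})^2 - \frac{1}{2}\sum d_i^2}{\sum (x_i - \bar{x})^2} = 1 - \frac{\frac{1}{2}\sum d_i^2}{\sum (x_i - \bar{x})^2}$$
$$= 1 - \frac{\frac{1}{2}\sum d_i^2}{\frac{n(n^2 - 1)}{12}} \qquad [\text{from (ii)}]$$
$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$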
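Question-1: Math of correlation, regression and rank co-efficient.

xi     yi     xi²     yi²     xiyi    R(x)   R(y)   di = R(x)-R(y)   di²
10     11     100     121     110     3      3      0                0
8      10     64      100     80      5      4      1                1
12     13     144     169     156     2      2      0                0
9      10     81      100     90      4      4      0                0
15     15     225     225     225     1      1      0                0
Σxi=54 Σyi=59 Σxi²=614 Σyi²=715 Σxiyi=661           Σdi=1            Σdi²=1
n = 5
$$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]\left[\sum y^2 - \frac{(\sum y)^2}{n}\right]}} = \frac{661 - \frac{54 \times 59}{5}}{\sqrt{\left[614 - \frac{54^2}{5}\right]\left[715 - \frac{59^2}{5}\right]}} = \frac{661 - 637.2}{\sqrt{(614 - 583.2)(715 - 696.2)}} = \frac{23.8}{\sqrt{30.8 \times 18.8}} = 0.989$$
$$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{661 - 637.2}{614 - 583.2} = \frac{23.8}{30.8} = 0.77$$
$$b' = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}} = \frac{23.8}{715 - 696.2} = \frac{23.8}{18.8} = 1.27$$
$$r = \sqrt{b\,b'} = \sqrt{0.77 \times 1.27} = 0.989$$
$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 1}{5(5^2 - 1)} = 1 - \frac{6}{5 \times 24} = 0.95$$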
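The correlation co-efficient, both regression co-efficients and the rank co-efficient for the Question-1 data can be recomputed with a short Python sketch. The variable names are illustrative, and the ranks are entered exactly as tabulated in the notes (tied values are not given averaged ranks).

```python
import math

x = [10, 8, 12, 9, 15]
y = [11, 10, 13, 10, 15]
n = len(x)

# Corrected sums of products and squares.
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
sxx = sum(a * a for a in x) - sum(x) ** 2 / n
syy = sum(b * b for b in y) - sum(y) ** 2 / n

r = sxy / math.sqrt(sxx * syy)          # correlation co-efficient
b_yx = sxy / sxx                        # regression co-efficient of y on x (b)
b_xy = sxy / syy                        # regression co-efficient of x on y (b')

# Rank co-efficient from the tabulated ranks.
rank_x = [3, 5, 2, 4, 1]
rank_y = [3, 4, 2, 4, 1]
sum_d2 = sum((p - q) ** 2 for p, q in zip(rank_x, rank_y))
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(round(r, 3), round(b_yx, 2), round(b_xy, 2), round(rho, 2))
```

The product `b_yx * b_xy` equals `r ** 2`, which is the geometric-mean relation proved earlier.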
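Question-2: Math of correlation, regression and rank co-efficient.

Math (x)   Statistics (y)   x²      y²      xy      R(x)   R(y)   d = R(x)-R(y)   d²
80         81               6400    6561    6480    3      2      1               1
85         80               7225    6400    6800    1      3      -2              4
82         84               6724    7056    6888    2      1      1               1
78         76               6084    5776    5928    4      4      0               0
71         70               5041    4900    4970    5      5      0               0
Σx=396     Σy=391           31474   30693   31066                                 Σd²=6
$$\sum xy - \frac{\sum x \sum y}{n} = 31066 - \frac{396 \times 391}{5} = 98.8, \qquad \sum x^2 - \frac{(\sum x)^2}{n} = 31474 - \frac{396^2}{5} = 110.8, \qquad \sum y^2 - \frac{(\sum y)^2}{n} = 30693 - \frac{391^2}{5} = 116.8$$
$$b = \frac{98.8}{110.8} = 0.89, \qquad b' = \frac{98.8}{116.8} = 0.85$$
$$r = \sqrt{b\,b'} = \frac{98.8}{\sqrt{110.8 \times 116.8}} = 0.87$$
$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 6}{5(5^2 - 1)} = 1 - 0.3 = 0.7$$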
Question-3: Σd² = 0 when the ranks are the same. Show this with an example.
Answer: The rank co-efficient is denoted by ρ.
$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
Here, di is the difference between the rank of (x) and the rank of (y).

Math (x)   Statistics (y)   R(x)   R(y)   di = R(x)-R(y)   di²
1          1                4      4      0                0
2          2                3      3      0                0
3          3                2      2      0                0
4          4                1      1      0                0
                                          Σdi = 0          Σdi² = 0
So, Σd² = 0 and ρ = 1 - 0 = 1.
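Binomial distribution:
1) $P(x) = \binom{n}{x} p^x q^{n-x}$
2) Expectation $E(x) = \sum x P(x)$ = mean of the distribution.
3) Variance $V(x) = E(x^2) - [E(x)]^2 = E[x(x-1) + x] - [E(x)]^2$
4) $(a + b)^n = a^n + \binom{n}{1} a^{n-1} b + \binom{n}{2} a^{n-2} b^2 + \cdots$
5) p + q = 1 [if p = 60%, q = 40% then p + q = 100% = 1]
6) $\binom{n}{2} = \frac{n(n-1)}{2!}$

Q. How can you find the mean and variance of the binomial distribution? Or, explain why the mean is greater than the variance.
Mean of the distribution:

x      P(x)                                    x(x-1)
0      q^n                                     0
1      n p q^{n-1}                             0
2      [n(n-1)/2!] p² q^{n-2}                  2
3      [n(n-1)(n-2)/3!] p³ q^{n-3}             6
Here, $P(x) = \binom{n}{x} p^x q^{n-x}$, so $P(0) = \binom{n}{0} p^0 q^{n-0} = q^n$.
$$E(x) = \sum x P(x) = 0 + n p q^{n-1} + 2\,\frac{n(n-1)}{2!} p^2 q^{n-2} + 3\,\frac{n(n-1)(n-2)}{3!} p^3 q^{n-3} + \cdots$$
$$= n p q^{n-1} + n(n-1) p^2 q^{n-2} + \frac{n(n-1)(n-2)}{2!} p^3 q^{n-3} + \cdots$$
$$= np\left[q^{n-1} + (n-1) p q^{n-2} + \frac{(n-1)(n-2)}{2!} p^2 q^{n-3} + \cdots\right]$$
$$= np\,(p + q)^{n-1} = np\,(1)^{n-1} \quad [p + q = 1]$$
$$= np$$
Variance of the distribution:
$$V(x) = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2 = E(x^2) - [E(x)]^2 = E[x(x-1) + x] - [E(x)]^2 = E[x(x-1)] + E(x) - [E(x)]^2 \quad \ldots(i)$$
Here,
$$E[x(x-1)] = \sum x(x-1) P(x) = 0 + 0 + 2\,\frac{n(n-1)}{2!} p^2 q^{n-2} + 3 \cdot 2\,\frac{n(n-1)(n-2)}{3!} p^3 q^{n-3} + 4 \cdot 3\,\frac{n(n-1)(n-2)(n-3)}{4!} p^4 q^{n-4} + \cdots$$
$$= n(n-1) p^2 q^{n-2} + n(n-1)(n-2) p^3 q^{n-3} + \frac{n(n-1)(n-2)(n-3)}{2!} p^4 q^{n-4} + \cdots$$
$$= n(n-1) p^2 \left[q^{n-2} + (n-2) p q^{n-3} + \frac{(n-2)(n-3)}{2!} p^2 q^{n-4} + \cdots\right]$$
$$= n(n-1) p^2 (p + q)^{n-2} = n(n-1) p^2 (1)^{n-2} \quad [p + q = 1]$$
$$= n(n-1) p^2$$
From equation (i):
$$E[x(x-1)] + E(x) - [E(x)]^2 = n(n-1) p^2 + np - n^2 p^2 = np - np^2 = np(1 - p) = npq \quad [p + q = 1]$$
Since q < 1, npq < np. So, for the binomial distribution, Mean > Variance.

Math-1: The probability that a patient survives a delicate heart operation is 0.2. Out of 8 persons undergoing such an operation, what is the probability that
a) At least one will survive.
b) Exactly two will survive.
c) All will survive.
Here, n = 8, p = 0.2, q = 0.8.
a) $1 - P(0) = 1 - \binom{8}{0} (0.2)^0 (0.8)^8 = 1 - 0.17 = 0.83$
b) $P(2) = \binom{8}{2} p^2 q^{8-2} = \binom{8}{2} (0.2)^2 (0.8)^6 = 0.29$
c) $P(8) = \binom{8}{8} p^8 q^{8-8} = \binom{8}{8} (0.2)^8 \times 1 = 2.56 \times 10^{-6}$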
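The binomial probabilities of Math-1, and the results mean = np and variance = npq, can be verified with a small Python sketch. It uses `math.comb` from the standard library (Python 3.8 or later); the function name `binomial_pmf` is illustrative.

```python
from math import comb, isclose

def binomial_pmf(x, n, p):
    """P(x) = C(n, x) * p^x * q^(n-x) with q = 1 - p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 8, 0.2
at_least_one = 1 - binomial_pmf(0, n, p)
exactly_two = binomial_pmf(2, n, p)
all_survive = binomial_pmf(8, n, p)

# Mean and variance computed from the definition E(x) = sum of x * P(x).
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
second = sum(x * x * binomial_pmf(x, n, p) for x in range(n + 1))
variance = second - mean ** 2

print(round(at_least_one, 2), round(exactly_two, 2), all_survive)
```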
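Poisson distribution:
$$P(x) = \frac{e^{-\lambda} \lambda^x}{x!}$$
Mean of the distribution:

x      P(x)                   x(x-1)
0      e^{-λ}                 0
1      λ e^{-λ}               0
2      λ² e^{-λ} / 2!         2
3      λ³ e^{-λ} / 3!         6
4      λ⁴ e^{-λ} / 4!         12
Here,
$$P(0) = \frac{e^{-\lambda} \lambda^0}{0!} = e^{-\lambda}, \quad P(1) = \frac{e^{-\lambda} \lambda^1}{1!}, \quad P(2) = \frac{e^{-\lambda} \lambda^2}{2!}, \quad P(3) = \frac{e^{-\lambda} \lambda^3}{3!}, \quad P(4) = \frac{e^{-\lambda} \lambda^4}{4!}$$
$$E(x) = \sum x P(x) = 0 + \lambda e^{-\lambda} + 2\,\frac{\lambda^2 e^{-\lambda}}{2!} + 3\,\frac{\lambda^3 e^{-\lambda}}{3!} + 4\,\frac{\lambda^4 e^{-\lambda}}{4!} + \cdots$$
$$= \lambda e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots\right) = \lambda e^{-\lambda} \cdot e^{\lambda} = \lambda e^0 = \lambda = np \quad [np = \lambda]$$

Math-2: In a factory the probability that a product is defective is 0.001. Out of 500 products, what is the probability that 10 products are defective?
Here, n = 500, p = 0.001, x = 10.
λ = np = 500 × 0.001 = 0.5
$$P(10) = \frac{e^{-0.5} (0.5)^{10}}{10!} = 1.63 \times 10^{-10}$$

Variance of the distribution:
$$V(x) = E[x(x-1)] + E(x) - [E(x)]^2$$
$$E[x(x-1)] = \sum x(x-1) P(x) = 0 + 0 + 2\,\frac{\lambda^2 e^{-\lambda}}{2!} + 2 \cdot 3\,\frac{\lambda^3 e^{-\lambda}}{3!} + 3 \cdot 4\,\frac{\lambda^4 e^{-\lambda}}{4!} + \cdots$$
$$= \lambda^2 e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \cdots\right) = \lambda^2 e^{-\lambda} \cdot e^{\lambda} = \lambda^2 = n^2 p^2 \quad [np = \lambda]$$
$$V(x) = n^2 p^2 + np - n^2 p^2 = np = \lambda$$
So, for the Poisson distribution, Mean = Variance.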
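The Poisson probability of Math-2 and the equality of mean and variance can be checked with the following Python sketch. The function name `poisson_pmf` is illustrative, and the infinite series is truncated at 50 terms, which is an assumption that is more than adequate for λ = 0.5.

```python
from math import exp, factorial, isclose

def poisson_pmf(x, lam):
    """P(x) = e^(-lambda) * lambda^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

lam = 500 * 0.001                        # lambda = np = 0.5
p_ten_defective = poisson_pmf(10, lam)

# Mean and variance from the definition, truncating the series at 50 terms.
terms = range(50)
mean = sum(x * poisson_pmf(x, lam) for x in terms)
variance = sum(x * x * poisson_pmf(x, lam) for x in terms) - mean ** 2

print(p_ten_defective, round(mean, 6), round(variance, 6))
```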
Parametric test: Statistical tests that are primarily based on assumptions about the forms of the population distributions are called parametric tests.

Non-parametric test: Tests that do not require rigorous assumptions about the populations are called non-parametric tests.

Terminology:
Null hypothesis (H₀)
Alternative hypothesis (H₁)
Test statistic
Critical region
Level of significance (α)
Acceptance region
Test of hypothesis

Null hypothesis (H₀): The null hypothesis is a statement which tells us that no difference exists between the parameter and the statistic being compared to it. "There is no difference between the rates of prevalence of malnutrition among male and female children" is an example of a null hypothesis.

Alternative hypothesis: The alternative hypothesis is the logical opposite of the null hypothesis. The rejection of a null hypothesis leads to the acceptance of the alternative hypothesis. The alternative hypothesis against the null hypothesis stated above may be formulated as: there exists a significant difference in the population between the rates of prevalence of malnutrition among male and female children.

Test statistic: The statistic used to provide evidence about the null hypothesis is called the test statistic, e.g. t-test, z-test.

Acceptance region: The values of the test statistic not included in the critical region.

Test of hypothesis: A procedure whereby the truth or falseness of the tested hypothesis is investigated by examining the value of the test statistic.

Two tailed test: A test of a statistical hypothesis where the alternative is two-sided, such as
H₀: μ = μ₀
H₁: μ ≠ μ₀
is called a two tailed test.

One tailed test: A test of any statistical hypothesis where the alternative is one-sided, such as
H₀: μ = μ₀
H₁: μ > μ₀
or, perhaps,
H₀: μ = μ₀
H₁: μ < μ₀
is called a one tailed test.

Two types of error:
Type-1 error: A type-1 error for a statistical test is the error of rejecting the null hypothesis when the null hypothesis is true.
Type-2 error: A type-2 error for a statistical test is the error of accepting the null hypothesis when the null hypothesis is false.
The observed level of a test (p-value) is the smallest value of α for which H₀ can be rejected. It is the actual risk of committing a type-1 error if H₀ is rejected based on the observed value of the test statistic.
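To test for an assigned population mean:
Math 1: A company claims that its bulbs have a life of 125 hours. A sample of 64 bulbs has a mean life of 127 hours with S.D 4.8. Can we accept the claim?
Solve: H₀: μ = 125
Here, x̄ = 127, μ = 125, σ = 4.8, n = 64
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{127 - 125}{4.8 / \sqrt{64}} = \frac{2}{0.6} = 3.33$$
|z| = 3.33
[|z| > 1.96] [H₀ is rejected at the 5% level, so the claim cannot be accepted.]

Comparison of two independent sample means:
Math 2: Drug-A: 80 patients, mean 20 days, S.D = 3. Drug-B: 60 patients, mean 22.5 days, S.D = 5. Are these drugs equally effective?
Solve: H₀: μ₁ = μ₂
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{20 - 22.5}{\sqrt{\frac{9}{80} + \frac{25}{60}}} = \frac{-2.5}{0.727} = -3.44$$
|z| = 3.44 > 1.96, so H₀ is rejected at the 5% level. The drugs are not equally effective. (Ans).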
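Both z statistics can be recomputed with a few lines of Python. The function names are illustrative; the sketch assumes large samples, so that the sample standard deviations stand in for the population values, and a two-tailed 5% critical value of 1.96.

```python
import math

def one_sample_z(sample_mean, mu0, sd, n):
    """z = (sample mean - mu0) / (sd / sqrt(n))"""
    return (sample_mean - mu0) / (sd / math.sqrt(n))

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z = (mean1 - mean2) / sqrt(sd1^2 / n1 + sd2^2 / n2)"""
    return (mean1 - mean2) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

CRITICAL_5_PERCENT = 1.96               # two-tailed critical value of z

z_bulbs = one_sample_z(127, 125, 4.8, 64)
z_drugs = two_sample_z(20, 3, 80, 22.5, 5, 60)

for name, z in (("bulbs", z_bulbs), ("drugs", z_drugs)):
    decision = "reject H0" if abs(z) > CRITICAL_5_PERCENT else "accept H0"
    print(name, round(z, 2), decision)
```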
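To test for an assumed population mean:
Math-3: A random sample of 15 has a mean height of 66.4 inch and S.D = 3.1. Can you say that the mean height is 65 inch?
Solve: H₀: μ = 65
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{66.4 - 65}{3.1 / \sqrt{15}} = 1.75 \text{ with 14 degrees of freedom}$$
So, H₀ is accepted. We can say that the mean height is 65 inch. (Ans).

To test the equality of two independent sample means:
Math-4: Worker A, over 10 days, increases production by 7, 8, 8, 12, 10, 9, 10, 11, 6, 8 units; Worker B, over 12 days, increases production by 10, 10, 11, 9, 12, 13, 13, 12, 11, 10, 9, 12 units. Are they equally efficient?
Solve: H₀: μ₁ = μ₂
Here, x̄₁ = 8.9 and x̄₂ = 11.
$$s_1^2 = \frac{1}{n_1 - 1}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n_1}\right] = \frac{1}{10 - 1}\left[823 - \frac{(89)^2}{10}\right] = \frac{1}{9}[823 - 792.1] = \frac{30.9}{9} = 3.43$$
$$s_2^2 = \frac{1}{n_2 - 1}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n_2}\right] = \frac{1}{12 - 1}\left[1474 - \frac{(132)^2}{12}\right] = \frac{1}{11}[1474 - 1452] = \frac{22}{11} = 2$$
We know,
$$s^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} = \frac{30.9 + 22}{20} = 2.64, \qquad s = 1.62$$
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{8.9 - 11}{1.62\sqrt{\frac{1}{10} + \frac{1}{12}}} = \frac{-2.1}{0.696} = -3.02 \text{ with 20 degrees of freedom}$$
|t| = 3.02 with 20 degrees of freedom.
|t| > t₀.₀₅, since t₀.₀₅ = 1.725 with 20 degrees of freedom. So H₀ is rejected: the two workers are not equally efficient. (Ans).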
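The two t statistics can be recomputed from the data with the sketch below. The function names are illustrative; the two-sample version assumes equal population variances (a pooled variance), as in the worked solution.

```python
import math

def one_sample_t(sample_mean, mu0, sd, n):
    """t = (sample mean - mu0) / (sd / sqrt(n)), with n - 1 degrees of freedom."""
    return (sample_mean - mu0) / (sd / math.sqrt(n)), n - 1

def pooled_t(sample1, sample2):
    """Two-sample t with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = sum(sample1) / n1, sum(sample2) / n2
    ss1 = sum((v - mean1) ** 2 for v in sample1)      # (n1 - 1) * s1^2
    ss2 = sum((v - mean2) ** 2 for v in sample2)      # (n2 - 1) * s2^2
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

t_height, df_height = one_sample_t(66.4, 65, 3.1, 15)

worker_a = [7, 8, 8, 12, 10, 9, 10, 11, 6, 8]
worker_b = [10, 10, 11, 9, 12, 13, 13, 12, 11, 10, 9, 12]
t_workers, df_workers = pooled_t(worker_a, worker_b)

print(round(t_height, 2), df_height, round(t_workers, 2), df_workers)
```

Carrying full precision gives |t| close to 3.02 for the two workers; rounding the denominator to 0.69 by hand gives a slightly larger value, and the decision is the same either way.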
Chi-square test (χ²-test):
1. To test the significance of a population variance.
2. To test independence in a contingency table.
3. To test goodness of fit.
4. To test the equality of several correlation co-efficients.
5. To test the equality of several variances.
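To test the significance of a population variance:
Math 5: A machine produces 60, 62, 58, 55, 57, 54, 55, 56, 58, 56 items respectively per day for 10 days. Test the hypothesis that the population variance is 4.
Solve: H₀: σ² = 4
We know,
$$\chi^2 = \frac{(n - 1) s^2}{\sigma^2} = \frac{\sum (x_i - \bar{x})^2}{\sigma^2}$$
Now,
$$(n - 1) s^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n} = 32659 - \frac{(571)^2}{10} = 54.9$$
$$\chi^2 = \frac{54.9}{4} = 13.725 \text{ with 9 degrees of freedom}$$
The tabulated value of χ² at the 5% level with 9 degrees of freedom is 16.92. Since 13.725 < 16.92, H₀ is accepted: the population variance may be taken as 4.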
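To test independence in a contingency table:
Math 6:
                  Attacked      Non-attacked    Total
Inoculated        10 (a)        40 (b)          50 (a+b)
Non-inoculated    30 (c)        20 (d)          50 (c+d)
Total             40 (a+c)      60 (b+d)        100 (a+b+c+d)
Where degrees of freedom = 1.
Solve: H₀: Inoculation does not have any effect.
$$\chi^2 = \frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)} = \frac{100(200 - 1200)^2}{50 \times 50 \times 40 \times 60} = 16.67$$
The tabulated value of χ² at the 5% level with 1 degree of freedom is 3.84. Since 16.67 > 3.84, H₀ is rejected.
Answer: Whether a person is attacked by cholera depends on whether he or she is inoculated or not.

F-test:
1. Comparison of two independent variances.
2. To test the equality of several means.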
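The two χ² statistics can be recomputed with a short Python sketch. The function names are illustrative; the critical values 16.92 (9 degrees of freedom) and 3.84 (1 degree of freedom) are the standard 5% table values and are entered by hand rather than computed.

```python
import math

def chi_square_variance(data, sigma0_sq):
    """chi^2 = sum((x - mean)^2) / sigma0^2, with n - 1 degrees of freedom."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / sigma0_sq

def chi_square_2x2(a, b, c, d):
    """chi^2 = n(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)) for a 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

items = [60, 62, 58, 55, 57, 54, 55, 56, 58, 56]
chi_variance = chi_square_variance(items, 4)          # compare with 16.92 (9 d.f.)
chi_table = chi_square_2x2(10, 40, 30, 20)            # compare with 3.84 (1 d.f.)

print(round(chi_variance, 3), round(chi_table, 2))
```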
1. Pie chart
2. Systematic sampling
3. Prove the equation AM × HM = GM²
4. For two non-zero positive numbers AM = 5 and HM = 4. What is the value of GM?
5. From different discrete positive numbers, find AM, GM and HM. Show a relationship between them.
6. Graph for median & mode.
7. The average of 10 observations is 40. One observation x is added and the average is reduced to 38. One more observation y is added and the average is 42. Find the value of x and y.
8. Suppose the mean and variance of 20 observations are 32 and 25 respectively. One observation, 36, is now added. Find the mean and variance of the 21 observations.
9. Relation between variance and S.D?
10. Determine the variance and S.D of the numbers 27, 28, 29. Combined variance. Q: Find two numbers for which the mean and variance are 6 and 16.
11. Determine the variance of the numbers 1, 2, 3, ..., n.
12. Determine the variance and S.D of the numbers 1 to 50.
13. Find the relation between the 2nd / 3rd / 4th raw and corrected moments.
14. Prove that -1 ≤ r ≤ 1, or prove that r² lies in [0, 1].
15. What is the importance of regression?
16. Prove this: $\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$
17. Math of correlation, regression and rank co-efficient.
18. Σd² = 0 when the ranks are the same; show with an example.
19. Importance of the normal distribution.
20. How can you find the mean and variance of the binomial distribution? Or, explain why the mean is greater than the variance.
21. The probability that a patient survives a delicate heart operation is 0.2. What are the probabilities for 8 persons undergoing such an operation?
22. How can you find the mean and variance of the Poisson distribution? Or, explain why the mean and variance are equal.
23. In a factory the probability that a product is defective is 0.001. Out of 500 products, what is the probability that 10 products are defective?
24. Two types of error, Null hypothesis (H₀), Important steps in a test of significance, Normal test (z-test) [n > 30].
25. Define range, variance, standard deviation, standard error, mean deviation from median/mean/mode.
26. Why are there two types of moment?
27. To test for an assigned population mean: A company claims that its bulbs have a life of 125 hours. A sample of 64 bulbs has a mean life of 127 hours with S.D 4.8. Can we accept the claim?
28. Comparison of two independent sample means: Drug-A: 80 patients, mean 20 days, S.D = 3. Drug-B: 60 patients, mean 22.5 days, S.D = 5. Are these drugs equally effective?
29. To test for an assumed population mean: A random sample of 15 has a mean height of 66.4 inch and S.D = 3.1. Can you say that the mean height is 65 inch?
30. To test the equality of two independent sample means: Worker A, over 10 days, increases production by 7, 8, 8, 12, 10, 9, 10, 11, 6, 8 units; Worker B, over 12 days, increases production by 10, 10, 11, 9, 12, 13, 13, 12, 11, 10, 9, 12 units. Are they equally efficient?
31. To test the significance of a population variance: A machine produces 60, 62, 58, 55, 57, 54, 55, 56, 58, 56 items respectively per day for 10 days. Test the hypothesis that the population variance is 4.
32. To test independence in a contingency table:
                  Attacked      Non-attacked    Total
Inoculated        10 (a)        40 (b)          50 (a+b)
Non-inoculated    30 (c)        20 (d)          50 (c+d)
Total             40 (a+c)      60 (b+d)        100 (a+b+c+d)
Combined variance:
$$\sigma^2 = \frac{n_1 \sigma_1^2 + n_2 \sigma_2^2}{n_1 + n_2}$$
(This form applies when d_1 = d_2 = 0, i.e. x̄_1 = x̄_2 = x̄.)
$$\text{Median} = L_1 + \frac{\frac{n}{2} - f_c}{f_m} \times C$$
Where,
L_1 = The lower limit of the median group.
C = Class interval of the median group.
n = The total frequency.
f_m = The frequency of the median group.
f_c = The cumulative frequency of the group preceding the median group.

$$\text{Mode} = L + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times C.I$$

Why are there two types of moment?
In every case we need the corrected (central) moment, but it is difficult to find directly: if the mean of the data set has a fractional value, the calculation becomes laborious. In statistics we always try to keep the calculation simple, so to solve the problem we take another type of moment, the raw moment. For the raw moment we take an arbitrary number, which makes the calculation easier. Finally we use the relation between the raw moment and the corrected moment. That is why there are two types of moment.

Skewness: Skewness measures the lack of symmetry in a frequency distribution.
$$\text{Pearson co-efficient} = \frac{\text{Mean} - \text{Mode}}{\text{S.D}}$$
When the result is positive, the distribution is called positively skewed.
When the result is negative, the distribution is called negatively skewed.
When the skewness is zero, the distribution is called symmetrical.
Fig: positively skewed distribution. Fig: negatively skewed distribution. Fig: symmetrical distribution.
Kurtosis: It measures the peakedness of the graph of a data set. It is examined when the skewness is zero.
When the peakedness is high, it is called leptokurtic.
When the peakedness is medium, it is called mesokurtic.
When the peakedness is low, it is called platykurtic.
Parameters for finding the kurtosis:
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2}$$
$$\gamma_1 = \sqrt{\beta_1}, \qquad \gamma_2 = \beta_2 - 3$$
When β_2 is greater than 3, γ_2 will be positive and the distribution will be leptokurtic.
When β_2 is equal to 3, γ_2 will be zero and the distribution will be mesokurtic.
When,