
STAGE 2 QUANTITATIVE TECHNIQUES

1) Time series analysis (8)
2) Index numbers
3) Regression theory (4)
4) Correlation (5)
5) Probability distribution (1)
6) Numerical methods
7) Statistical tests (tests of hypothesis) (3)
8) Estimation theory (2)
9) Decision theory
10) Financial mathematics
11) Simulation (9)
12) Linear programming (6)
13) Network analysis (7)
14) Sampling

REFERENCES
1) Quantitative Techniques by Lucy
2) Statistics (any book)
3) Business statistics (any book)

1) PROBABILITY DISTRIBUTION

DEFINITION: Probability
It is the measurement of the likelihood of an event occurring. It is also the quantification of uncertainty. Probability theory is the branch of mathematics that deals with the study of random (nondeterministic) experiments.

EXAMPLE 1
A single toss of a coin can result in two possible outcomes:
i) Head turning up
ii) Tail turning up
Probability of a head, P(head) = measurement of its likelihood = one head out of two possible outcomes.
Probability of a tail, P(tail) = measurement of its likelihood = one tail out of two possible outcomes.
Total outcomes = 2
P(head) = 1/2
P(tail) = 1/2

TYPES OF PROBABILITY
i. Experimental probability / empirical probability
This is probability that is deduced by performing experimental trials, i.e. by observing the number of times an event occurs.

P(event) = (number of times the event occurs) / (total number of trials) = (frequency of the event) / (total number of trials) = f / N

Experimental probability is based on carrying out a large number of trials, i.e. N should be large for better results:

$P(\text{event}) = \lim_{N \to \infty} \frac{f(\text{event})}{N}$
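The idea that f/N settles down as N grows can be illustrated with a small simulation. This is only a sketch: the function name `relative_frequency`, the fixed seed and the trial counts are assumptions chosen for illustration, and the coin is assumed to be fair.

```python
import random

def relative_frequency(n_trials, seed=0):
    """Estimate P(head) as f/N from n_trials simulated tosses of a fair coin."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    heads = sum(1 for _ in range(n_trials) if rng.random() < 0.5)
    return heads / n_trials

# The estimate f/N should drift towards 0.5 as N grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

With few trials the estimate can be far from 1/2 (as in a run of only three tosses); with many trials it should be close to it.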

EXAMPLE 2
A coin is tossed three times and the number of heads and tails that turn up is recorded below.

Outcome   Frequency
Head      2
Tail      1
Total     3

P(head) = 2/3
P(tail) = 1/3

ii. Theoretical probability / classical probability
This is the probability deduced using mathematical formulas or calculations, i.e. without experiments. Experimental probability may be cumbersome when the number of trials needed is large, because many trials take a lot of time and effort. Theoretical probability provides a way of defining probability based on mutually exclusive, equally likely outcomes.

EXAMPLE 3
There are three lions, four cheetahs and five zebras (no other animals). Suppose an animal is picked at random. What is the probability of getting a lion or a zebra, or a cheetah?
P(lion) = 3/12
P(cheetah) = 4/12
P(zebra) = 5/12
P(lion or zebra) = P(lion) + P(zebra) = 3/12 + 5/12 = 8/12
P(zebra or lion or cheetah) = P(zebra) + P(lion) + P(cheetah) = 5/12 + 3/12 + 4/12 = 12/12 = 1

RULES OF PROBABILITY
1) P(A) = 0 means that event A is impossible.
2) P(A) = 1 means that event A is certain.
3) For any event A, 0 ≤ P(A) ≤ 1.
4) P(A or B) = P(A) + P(B) (for mutually exclusive events)
5) P(A and B) = P(A) × P(B) (for independent events)
6) The total probability of all possible events is equal to 1.

EXERCISE
1. A coin is tossed three times and the number of heads is determined theoretically. Calculate the probability of each outcome, i.e. the probability of 0 heads, 1 head, 2 heads and 3 heads.

The tree of three trials (1st, 2nd and 3rd toss) gives eight equally likely outcomes:

Outcome            TTT  TTH  THT  THH  HHH  HHT  HTH  HTT
No. of heads (x)   0    1    1    2    3    2    2    1

X (no. of heads)   F    P = f/N
0                  1    1/8
1                  3    3/8
2                  3    3/8
3                  1    1/8
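The tree-diagram count above can be checked by listing every outcome mechanically. The sketch below assumes a fair coin; the function name `heads_distribution` is illustrative and not part of the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def heads_distribution(n_tosses):
    """Return {number of heads: probability} for n_tosses of a fair coin."""
    outcomes = list(product("HT", repeat=n_tosses))    # e.g. ('H', 'T', 'H')
    counts = Counter(o.count("H") for o in outcomes)   # frequency f of each x
    total = len(outcomes)                              # N = 2 ** n_tosses
    return {x: Fraction(f, total) for x, f in sorted(counts.items())}

for x, p in heads_distribution(3).items():
    print(x, p)
```

Using `Fraction` keeps the probabilities as exact fractions such as 3/8 rather than decimals.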

DEFINITION: RANDOM VARIABLE (R.V.)
It is a variable whose value depends on chance, i.e. its value cannot be predetermined precisely. It is a variable whose value is a number determined by the outcome of an experiment.

Example
Experiment                  Outcomes                                     No. of heads (R.V.)
1. Single toss of a coin    (H, T)                                       (0, 1)
2. Double toss of a coin    (HH, HT, TH, TT)                             (0, 1, 2)
3. Triple toss of a coin    (HHH, HHT, HTH, THH, TTT, TTH, THT, HTT)     (0, 1, 2, 3)

TYPES OF RANDOM VARIABLES
1. Discrete random variables
2. Continuous random variables

1. DISCRETE RANDOM VARIABLES
One whose possible values are countable, distinct values, e.g. number of heads, number of cars a family owns, etc.

2. CONTINUOUS RANDOM VARIABLES
One whose possible value is any number within an open, closed or semi-closed interval, e.g.
4 < x < 5 (open)
4 ≤ x ≤ 5 (closed)
4 < x ≤ 5 (semi-closed)

NB: Random variables are denoted by capital letters and their values by the equivalent small letters, e.g. X = number of heads in a double toss of a coin, x = 0, 1, 2.

PROBABILITY DISTRIBUTION
It is an assignment of probabilities to all possible values of a random variable. A probability distribution is represented by a table, a graph or a mathematical formula.

EXAMPLE
Three coins are tossed at once. Determine:
i. The possible number of heads
ii. The probability distribution of the number of heads.

SOLUTION
i. The answer is a number determined by the outcome of the experiment, i.e. it may be 0, 1, 2 or 3. (In the same way, counts in a pack of cards are numbers fixed by what is drawn: hearts = 13, black cards = 26, kings = 4, red cards = 26.)

OUTCOME   X = NO. OF HEADS
HHH       3
HHT       2
HTH       2
THH       2
TTT       0
TTH       1
THT       1
HTT       1

ii. We cannot predict the number of heads exactly, but we can state the possible values and their probabilities. The probability distribution of the number of heads (the random variable) is given by

P(x) = (frequency of each value of x among the outcomes) / (total number of outcomes)

X = No. of heads   Frequency f   Probability P(x)
0                  1             1/8
1                  3             3/8
2                  3             3/8
3                  1             1/8
Total              Σf = 8        ΣP(x) = 1

EXERCISE
A coin is tossed four times. Determine the probability distribution of the number of tails.

Outcome   No. of tails      Outcome   No. of tails
HHHT      1                 HHHH      0
HHTT      2                 HHTH      1
THTH      2                 HTHH      1
THHT      2                 THHH      1
TTTT      4                 TTTH      3
TTHT      3                 TTHH      2
THTT      3                 HTHT      2
HTTT      3                 HTTH      2

No. of tails   Frequency   Probability
0              1           1/16
1              4           4/16
2              6           6/16
3              4           4/16
4              1           1/16
Total          Σf = 16     ΣP = 1

TYPES OF PROBABILITY DISTRIBUTION
1. Discrete probability distribution / probability mass function
2. Continuous probability distribution / probability density function

1. Discrete Probability Distribution / Probability Mass Function
It is an assignment of probabilities to all possible values of a discrete random variable. It lists all possible values of a discrete random variable and their corresponding probabilities.

2. Continuous Probability Distribution / Probability Density Function
It is an assignment of probabilities to all possible values of a continuous random variable. It lists all the classes of possible values of a continuous random variable and their corresponding probabilities.

Continuous Probability Distribution for Lives of 1000 Batteries

X (life of battery)    Probability P(x)
55 to less than 56     0.020
56 to less than 57     0.050
57 to less than 58     0.090
58 to less than 59     0.150
59 to less than 60     0.190
60 to less than 61     0.160
61 to less than 62     0.130
62 to less than 63     0.100
63 to less than 64     0.070
64 to less than 65     0.040
Total                  ΣP(x) = 1

CHARACTERISTICS OF A PROBABILITY DISTRIBUTION
1) 0 ≤ P(x) ≤ 1 for each value of x, i.e. the probability that x assumes a value in any interval or countable set lies in the range 0 to 1.
2) ΣP(x) = 1, i.e. the total probability of all the possible values of x in an interval or countable set is 1.

Examples of Probability Distribution Families
Discrete: 1) Binomial 2) Poisson
Continuous: 3) Normal 4) Gamma 5) Beta 6) Exponential 7) Student's t distribution 8) F-distribution

1. Binomial Probability Distribution
Definition
It is a discrete probability distribution used to describe the probability that an outcome will occur x times in n performances of an experiment.

Example
1. Find the probability that, in a random sample of 3 TVs manufactured at a firm, exactly 1 will be defective.
2. Find the probability that 4 out of 9 customers who visit a department store will make a purchase when 25% of all customers who visit this store make a purchase.

Solution
1. n = 3, x = number of defective TVs = 1, probability of success = p, probability of failure = q.
$P(X = x) = \binom{n}{x} p^x q^{n-x}$, so $P(X = 1) = \binom{3}{1} p^1 q^2$

2. n = 9, p = 0.25, q = 1 − 0.25 = 0.75
$P(X = 4) = \binom{9}{4} (0.25)^4 (0.75)^5 = 126 \times 0.003906 \times 0.237305 \approx 0.1168$
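The calculation in part 2 can be reproduced with a few lines of code. This is a minimal sketch; the helper name `binomial_pmf` is an assumption made for illustration, and it relies only on `math.comb` from the Python standard library.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial experiment with n trials and success probability p."""
    q = 1 - p                              # probability of failure
    return comb(n, x) * p**x * q**(n - x)

# Example 2: 4 purchases among 9 customers when p = 0.25.
print(round(binomial_pmf(4, 9, 0.25), 4))
```

Keeping full precision in the powers gives 0.1168; rounding 0.25^4 to 0.0039 first gives the slightly lower 0.1166.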

Characteristics of binomial experiments
The binomial probability distribution is applied to discrete dichotomous random variables that satisfy the 4 conditions of a binomial experiment:
i. There are n identical trials, i.e. a given experiment is repeated n times under identical conditions.
ii. Each trial has only 2 possible outcomes, usually called a success and a failure.
iii. The probabilities of the two outcomes remain constant, i.e. the probability of success, denoted by p, and the probability of failure, denoted by q, are the same for each trial, and p + q = 1.
iv. The trials are independent, i.e. the outcome of one trial does not affect the outcome of another.

Example
Consider the experiment consisting of 10 tosses of a coin. Determine if it is a binomial experiment.

Solution
It is a binomial experiment because it satisfies the 4 conditions:
i. There are ten identical trials, i.e. 10 tosses under the same conditions.
ii. Each trial (toss) has 2 possible outcomes, i.e. head or tail (success or failure).
iii. The probability of obtaining a head (success) is 1/2 and of a tail (failure) is 1/2 for any toss.
iv. The trials (tosses) are independent, i.e. the outcome of one toss has no effect on a succeeding toss.

Binomial Formula
For a binomial experiment, the probability of exactly x successes in n trials is given by the binomial formula

$P(X = x) = \binom{n}{x} p^x q^{n-x}$

where
n = total number of trials
p = P(success)
q = P(failure) = 1 − p
x = number of successes in n trials
n − x = number of failures in n trials

Example
A coin is tossed 4 times. Determine the probability distribution of the number of tails.

Solution
n = 4, x = 0, 1, 2, 3, 4, p = q = 1/2

$P(X = x) = \binom{4}{x} p^x q^{4-x} = \frac{4!}{x!\,(4-x)!} \left(\tfrac{1}{2}\right)^x \left(\tfrac{1}{2}\right)^{4-x}$

x      Working                                P(X = x)
0      4!/(0! 4!) × (1/2)^0 × (1/2)^4         0.0625
1      4!/(1! 3!) × (1/2)^1 × (1/2)^3         0.25
2      4!/(2! 2!) × (1/2)^2 × (1/2)^2         0.375
3      4!/(3! 1!) × (1/2)^3 × (1/2)^1         0.25
4      4!/(4! 0!) × (1/2)^4 × (1/2)^0         0.0625
Total                                         1.0

Example
According to an estimate, 50% of the people in Nairobi have at least 1 credit card. If a random sample of 30 is taken, what is the probability that 19 of them will have at least 1 credit card (use binomial)?

Solution
Let x be the number of people with at least one credit card.
Probability of success p = 50% = 0.5, so q = 0.5
Sample size n = 30
$P(X = 19) = \binom{30}{19} (0.5)^{19} (0.5)^{30-19} = 54\,627\,300 \times (0.5)^{30} = 54\,627\,300 \times 9.3132 \times 10^{-10} \approx 0.0509$

Example
It is expected that 10% of the production from a continuous process will be defective and scrapped. Determine the probability that in a sample of 10 units chosen at random:
i. Exactly 2 will be defective
ii. At most 2 will be defective.

Solution
Let x be the number of defectives.
P(success) = p = 10% = 0.1
P(failure) = q = 1 − p = 1 − 0.1 = 0.9
Sample size n = 10

i) $P(X = 2) = \binom{10}{2} (0.1)^2 (0.9)^{10-2} = 45 \times 0.01 \times 0.430467 = 0.1937$

ii) For at most 2 defective, we consider x = 0 or x = 1 or x = 2.
$P(X = 0) = \binom{10}{0} (0.1)^0 (0.9)^{10} = 1 \times 1 \times 0.348678 = 0.348678$
$P(X = 1) = \binom{10}{1} (0.1)^1 (0.9)^{9} = 10 \times 0.1 \times 0.38742 = 0.38742$
$P(X = 2) = \binom{10}{2} (0.1)^2 (0.9)^{8} = 45 \times 0.01 \times 0.43046721 = 0.1937$
P(X = 0 or X = 1 or X = 2) = 0.348678 + 0.38742 + 0.1937 = 0.9298

Exercise
An airline deliberately overbooks its local flights because it knows from past experience that not all passengers who book a given flight actually arrive for it. It is assumed that the probability of any booked passenger arriving for a flight is 0.8, independent of the chance of any other passenger arriving. The airline takes 10 bookings for an 8-seater aircraft. Using the binomial distribution, find the probability that, for a given flight, the aircraft takes off:
i. Full (6 marks)
ii. With at least 3 empty seats (4 marks)

n = 10, P(success) = p = 0.8, P(failure) = q = 1 − 0.8 = 0.2
Probability that exactly 8 of the 10 booked passengers arrive:
$P(X = 8) = \binom{n}{x} p^x q^{n-x} = \binom{10}{8} (0.8)^8 (0.2)^{10-8} = 45 \times 0.167772 \times 0.04 = 0.3020$
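"At most" and "at least" questions are sums of binomial terms, which are easy to accumulate in code. The sketch below uses illustrative helper names (`binomial_pmf`, `binomial_cdf`). The readings of the airline exercise are assumptions, not the notes' worked answer: "full" is shown both as exactly 8 arrivals and as 8 or more arrivals, and "at least 3 empty seats" is taken to mean that at most 5 passengers arrive.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_cdf(k, n, p):
    """P(X <= k), obtained by adding the terms for x = 0, 1, ..., k."""
    return sum(binomial_pmf(x, n, p) for x in range(k + 1))

# Defective units: n = 10, p = 0.1, at most 2 defective.
at_most_2 = binomial_cdf(2, 10, 0.1)

# Airline: n = 10 bookings, p = 0.8 that each passenger arrives.
exactly_8 = binomial_pmf(8, 10, 0.8)        # exactly 8 arrive
at_least_8 = 1 - binomial_cdf(7, 10, 0.8)   # assumed reading: 8, 9 or 10 arrive
at_most_5 = binomial_cdf(5, 10, 0.8)        # assumed reading: 3 or more empty seats

print(round(at_most_2, 4), round(exactly_8, 4), round(at_least_8, 4), round(at_most_5, 4))
```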

2. Razor blades are sold in packets of 5. The distribution below shows the number of faulty blades in 100 packets.

No. of faulty blades (x)   No. of packets (f)   xf
0                          84                   0
1                          10                   10
2                          3                    6
3                          2                    6
4                          1                    4
5                          0                    0
Total                      Σf = 100             Σxf = 26

Required
i. Calculate the mean number of faulty blades per packet.
ii. Assuming the distribution is binomial, estimate the probability that a blade taken randomly will be faulty.

NB
Binomial mean $\mu = np$, where n = sample size and p = P(success)
Binomial standard deviation $\sigma = \sqrt{npq}$
Variance $\sigma^2 = npq$

i. Mean = Σxf / Σf = 26/100 = 0.26

ii. Since $\mu = np$ with n = 5 blades per packet, 0.26 = 5p, so p = 0.26/5 = 0.052.

Let the number of faulty blades in a packet be x. Then
$P(X = x) = \binom{n}{x} p^x q^{n-x}$
where p = probability of success, x = number of successes in n trials, n − x = number of failures in n trials and q = probability of failure = 1 − p. For example,
$P(X = 1) = \binom{5}{1} (0.052)^1 (1 - 0.052)^4 = 5 \times 0.052 \times (0.948)^4 = 5 \times 0.052 \times 0.8077 = 0.2100$
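The razor blade estimate follows mechanically from the frequency table, as the sketch below shows. The variable names (`faulty_counts`, `p_hat` and so on) are illustrative only.

```python
from math import comb

# Frequency table: number of faulty blades per packet -> number of packets.
faulty_counts = {0: 84, 1: 10, 2: 3, 3: 2, 4: 1, 5: 0}
n_blades = 5                                          # blades per packet

total_packets = sum(faulty_counts.values())                                  # sum of f
mean_faulty = sum(x * f for x, f in faulty_counts.items()) / total_packets   # sum(xf)/sum(f)

# Binomial mean = n * p, so p is estimated by mean / n.
p_hat = mean_faulty / n_blades

# Probability that a packet contains exactly one faulty blade.
p_one_faulty = comb(n_blades, 1) * p_hat * (1 - p_hat) ** (n_blades - 1)

print(mean_faulty, p_hat, round(p_one_faulty, 4))
```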

2. Poisson Probability Distribution
It is a discrete probability distribution used to describe the probability of a certain number of occurrences in an interval.

   Interval    Occurrences (x)
1. Time        Breakdowns
2. Distance    Accidents
3. Volume      Errors

NB: The interval may be a time interval, a space interval, a volume interval, etc. The occurrence (i.e. the event in an interval) may be a breakdown, an accident, a phone call, an error, etc.

Examples
1) The number of accidents that occur in a company during a one-month period.
2) The number of customers coming to a grocery store during a 1-hour interval.
3) The number of TV sets sold at a department store during a given week.

CHARACTERISTICS OF POISSON DISTRIBUTION
The following 4 conditions must be satisfied to apply the Poisson probability distribution.
1) The occurrence x is a discrete random variable, i.e. 0, 1, 2, ... and not 1.5.
2) The occurrences are random, i.e. you cannot predict when the next occurrence will happen.
3) The occurrences are independent, i.e. occurrences do not influence each other.
4) The occurrences take place in an interval.

Example
Consider the number of customers arriving at the Moi Avenue branch of Standard Chartered Bank during a 1-hour interval. Determine whether this is a Poisson distribution case.

Solution
The above situation satisfies the 4 conditions of the Poisson probability distribution:
i. The occurrence x is discrete, i.e. the number of customers arriving at this bank is countable: 0, 1, 2, 3, etc.
ii. Occurrences are random, i.e. the arrival of 1 or more customers cannot be predicted.
iii. Occurrences are independent, i.e. the arrival of one customer is not related to the arrival of another customer.
iv. Occurrences take place in an interval, i.e. the arrival of customers in 1 hour.

Poisson Probability Distribution Formula
According to the Poisson probability distribution, the probability of x occurrences in an interval is given by

$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$

where
x = number of occurrences in the interval
λ (lambda) = mean (average) number of occurrences in that interval
e = 2.71828

Example
The ATM installed outside KCB Savings and Loan is used on average by 5 customers per hour. The bank closed this ATM for 1 hour for repairs. What is the probability that during that 1 hour 8 customers came to use this ATM?

Solution
Let λ = average number of customers who use this ATM per hour, so λ = 5.
x = number of customers who come to use this ATM in this 1 hour, so x = 8.

$P(X = 8) = \frac{\lambda^x e^{-\lambda}}{x!} = \frac{5^8 \times e^{-5}}{8!} = 0.0653$
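The Poisson formula translates directly into code. A minimal sketch, with `poisson_pmf` as an illustrative helper name and `lam` standing for λ:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) when occurrences arrive at an average rate of lam per interval."""
    return lam**x * exp(-lam) / factorial(x)

# ATM example: lambda = 5 customers per hour, x = 8 customers.
print(round(poisson_pmf(8, 5), 4))
```

"At most" questions are answered by adding terms, for example `poisson_pmf(0, lam) + poisson_pmf(1, lam)` for at most one occurrence.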

Mean and Standard Deviation of the Poisson Distribution
Mean $\mu = \lambda$
Variance $\sigma^2 = \lambda$
Standard deviation $\sigma = \sqrt{\lambda}$

Exercise
A washing machine in a laundromat breaks down on average 3 times per month. Find the probability that during the next month this machine will have
a) Exactly 2 breakdowns
b) At most 1 breakdown.

Solution
a) λ = 3, x = 2
$P(X = 2) = \frac{\lambda^x e^{-\lambda}}{x!} = \frac{3^2 e^{-3}}{2!} = \frac{9 \times 0.049787}{2} = 0.224$

b) For at most 1 breakdown, we consider x = 0 and x = 1.
$P(X = 1) = \frac{3^1 e^{-3}}{1!} = 3 \times 0.049787 = 0.14936$
$P(X = 0) = \frac{3^0 e^{-3}}{0!} = 1 \times 0.049787 = 0.049787$
P(X = 0 or 1) = 0.14936 + 0.049787 = 0.199147

3. Normal Probability Distribution
Def: It is a continuous probability distribution which, when plotted, gives a bell-shaped curve such that:
1) The total area under the curve is 1.0.
2) The curve is symmetric about the mean (the mean divides it into 2 equal parts).
3) The two tails of the curve extend indefinitely (they never touch the horizontal axis).
4) The shape of the curve is determined by the mean μ and the standard deviation σ.

[Figure: bell-shaped normal curve with tails extending from −∞ to +∞.]

N/B: Many phenomena in the real world are normally distributed, either exactly or approximately, e.g.
1. Weights of packages
2. Life of an item, e.g. bulb, TV, radio
3. Scores on an examination
4. Heights of people
5. Time taken to complete a certain job.

Normal Random Variable: It is a continuous random variable x that has a normal distribution.

The Standard Normal Distribution
Defn: It is a normal distribution with mean = 0 and standard deviation = 1, i.e. μ = 0, σ = 1.

[Figure: standard normal curve with μ = 0 and σ = 1, horizontal axis marked from z = −4 to z = 4.]

Standard Normal Variate
Defn: It is a normal random variable that has a standard normal distribution, i.e. μ = 0, σ = 1. It is denoted by z and its units are referred to as z values or z scores. The z values marked on the horizontal axis are one unit apart.
N/B: The z value for a point on the horizontal axis gives the distance between the mean and that point in terms of standard deviations.

Standard normal distribution table
It lists the areas under the standard normal curve for z values from z = 0 to z = 3.09. These areas give the probabilities of z taking those values.

Example
1. Find the area under the standard normal curve from
i. z = 0 to 3
ii. z = 2 to 2.5
iii. z = −1 to 0
iv. z = −2.17 to 0
v. z = −2.3 to 2.1

Solution
The area given in the table runs from −∞ to the chosen point, so the area between two points is the difference of their table values.
i) z = 0 to 3: area = 0.999 − 0.5 = 0.499
ii) z = 2 to 2.5: area = 0.994 − 0.977 = 0.017
iii) z = −1 to 0: by symmetry, move −1 to the positive side, so the area equals that from z = 0 to 1 = 0.841 − 0.5 = 0.341
iv) z = −2.17 to 0: by symmetry, move −2.17 to the positive side, so the area equals that from z = 0 to 2.17 = 0.985 − 0.5 = 0.485
v) z = −2.3 to 2.1: split into z = −2.3 to 0 and z = 0 to 2.1: area = (0.989 − 0.5) + (0.982 − 0.5) = 0.489 + 0.482 = 0.971

Standardizing a Normal Distribution
Standard normal tables provide areas under the standard normal curve. In real-world applications, a continuous random variable may have a normal distribution with values of the mean and standard deviation different from 0 and 1 respectively. Standardizing a normal distribution involves converting the given normal distribution to the standard normal distribution. For a normal random variable x, a particular value of x can be converted to a z value by using the formula

$z = \frac{x - \mu}{\sigma}$

where μ and σ are the mean and standard deviation of the normal distribution of x.

N/B: The z value for the mean of a normal distribution is always 0.

Example
Let x be a continuous random variable that has a normal distribution with a mean of 50 and a standard deviation of 10. Convert the following values to z values.
a) x = 55
b) x = 35

Solution
a) μ = 50, σ = 10
$z = \frac{x - \mu}{\sigma} = \frac{55 - 50}{10} = \frac{5}{10} = 0.5$

b) $z = \frac{35 - 50}{10} = \frac{-15}{10} = -1.5$

[Figure: normal curves for x with μ = 50 and σ = 10, with x = 55 and x = 35 marked, each shown against the standard normal curve with μ = 0 and σ = 1.]
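Standardizing, and the table look-up that follows it, can both be sketched in code. The function names below are illustrative; the cumulative area is computed from the error function `math.erf`, which stands in for the printed table and is an assumption about tooling rather than the method used in these notes.

```python
from math import erf, sqrt

def z_score(x, mean, sd):
    """Convert a value x of a normal variable to its z value."""
    return (x - mean) / sd

def standard_normal_cdf(z):
    """Area under the standard normal curve from minus infinity up to z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Mean 50, standard deviation 10.
print(z_score(55, 50, 10), z_score(35, 50, 10))

# Area between the mean (z = 0) and z = 0.5.
print(round(standard_normal_cdf(0.5) - standard_normal_cdf(0.0), 4))
```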

Example
Let x be a continuous random variable that is normally distributed with a mean of 25 and a standard deviation of 4. Find the area
a) Between x = 25 and x = 32
b) Between x = 18 and x = 35.

Solution
a) μ = 25, σ = 4
$z_1 = \frac{x - \mu}{\sigma} = \frac{25 - 25}{4} = 0$
$z_2 = \frac{x - \mu}{\sigma} = \frac{32 - 25}{4} = \frac{7}{4} = 1.75$
The area between x = 25 and x = 32 is the area between z = 0 and z = 1.75:
(area from −∞ to 1.75) − (area from −∞ to 0) = 0.960 − 0.5 = 0.46

b) $z_1 = \frac{18 - 25}{4} = \frac{-7}{4} = -1.75$
$z_2 = \frac{35 - 25}{4} = \frac{10}{4} = 2.5$
The area between x = 18 and x = 35 is the area between z = −1.75 and z = 2.5.
Area from −1.75 to 0 = area from 0 to 1.75 = 0.960 − 0.5 = 0.46
Area from 0 to 2.5 = 0.994 − 0.5 = 0.494
Total area = 0.46 + 0.494 = 0.954

[Figure: normal curve with x = 18, μ = 25 and x = 35 marked, corresponding to z = −1.75, z = 0 and z = 2.5.]

N/B: The area between x = 25 and x = 32 under the normal curve gives the probability that x assumes a value between 25 and 32, i.e.
P(25 < x < 32) = area between x = 25 and x = 32 = P(0 < z < 1.75) = 0.46

Example
Let x be a normal random variable with μ = 40 and σ = 5. Find the following probabilities for this normal distribution.
a) Probability of x > 55
b) Probability of x < 49

Solution
a) μ = 40, σ = 5
$z = \frac{x - \mu}{\sigma} = \frac{55 - 40}{5} = 3$
P(x > 55) = P(z > 3) = area to the right of z = 3 = (whole area) − (area from −∞ to 3) = 1 − 0.999 = 0.001

b) $z = \frac{49 - 40}{5} = \frac{9}{5} = 1.8$
P(x < 49) = P(z < 1.8) = area from −∞ to 1.8 = (area from −∞ to 0) + (area from 0 to 1.8) = 0.5 + (0.964 − 0.5) = 0.5 + 0.464 = 0.964

Approximation of Binomial Distribution to Normal Distribution
When approximating the binomial distribution by the normal distribution, it should be remembered that the normal curve is continuous while the binomial distribution takes integer values. If n (the number of trials) and r (the number of successes) are large, calculations in the binomial distribution can become tedious. In such a situation the binomial distribution approximates to the normal distribution, and normal tables can be used to find the probabilities.

Example
20 coins are tossed and the number of heads x is noted. Find the probability that the number of heads is
a) 15 or more
b) Between 6 and 13 inclusive.

Solution
NOTE: To find the probability that the number of heads is 15 or more using the normal approximation, we have to find the probability of a value greater than 14.5. It is therefore necessary to find the mean and standard deviation of the binomial distribution and to standardize the distribution to find the z score, i.e.
Mean $\mu = np = 20 \times \tfrac{1}{2} = 10$
Standard deviation $\sigma = \sqrt{npq} = \sqrt{20 \times \tfrac{1}{2} \times \tfrac{1}{2}} = \sqrt{5} \approx 2.236$

Standardizing (with the ½ correction for continuity):

$z = \frac{x - \mu}{\sigma} = \frac{(x \pm \tfrac{1}{2}) - np}{\sqrt{npq}}$

When each distribution applies:
Normal: The variables have a continuous range of possible values, e.g. time, weight, size, growth rate. As an approximation to the binomial, n is large and p is not close to 0 or 1, so that the distribution is approximately symmetric.
Binomial: When the probability that an item has one of two possible conditions is p, the probability of another item having the other condition is (1 − p), usually referred to as q, so that p + q = 1. There are only 2 possible conditions, e.g. male/female, good/bad, black/white. Outcomes have discrete values and no continuous range, e.g. when dealing with people.
Poisson: n is large, say greater than 50, and p is small in relation to q, so that np is less than, say, 5.

$P(X \ge x) = P(Z > z)$, where $z = \frac{(x - \tfrac{1}{2}) - np}{\sqrt{npq}}$
$P(X \le x) = P(Z < z)$, where $z = \frac{(x + \tfrac{1}{2}) - np}{\sqrt{npq}}$

a) $P(X \ge 15) = P(Z > z)$, where $z = \frac{(15 - \tfrac{1}{2}) - 10}{\sqrt{5}} = \frac{14.5 - 10}{\sqrt{5}} = 2.012$
P(Z > 2.012) = 1 − 0.978 = 0.022

b) $P(6 \le X \le 13)$
Lower limit: $z = \frac{(6 - \tfrac{1}{2}) - 10}{\sqrt{5}} = \frac{5.5 - 10}{\sqrt{5}} = \frac{-4.5}{\sqrt{5}} = -2.012$
Upper limit: $z = \frac{(13 + \tfrac{1}{2}) - 10}{\sqrt{5}} = \frac{13.5 - 10}{\sqrt{5}} = \frac{3.5}{\sqrt{5}} = 1.565$
Area from −2.012 to 0 = area from 0 to 2.012 = 0.978 − 0.5 = 0.478
Area from 0 to 1.565 = 0.941 − 0.5 = 0.441
P(6 ≤ X ≤ 13) = 0.441 + 0.478 = 0.919 ≈ 0.92

[Figure: standard normal curve with z = −2.012, μ = 0 and z = 1.565 marked.]
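To see how close the normal approximation comes, the two answers for the 20-coin example can be set beside the exact binomial sums. The sketch below is illustrative: the function names are assumptions, and the normal areas come from `math.erf` rather than from the printed table.

```python
from math import comb, erf, sqrt

def phi(z):
    """Area under the standard normal curve from minus infinity up to z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def approx_at_least(k, n, p):
    """Normal approximation to P(X >= k), using the 1/2 continuity correction."""
    mean, sd = n * p, sqrt(n * p * (1 - p))
    return 1 - phi((k - 0.5 - mean) / sd)

def approx_between(lo, hi, n, p):
    """Normal approximation to P(lo <= X <= hi), using the 1/2 continuity correction."""
    mean, sd = n * p, sqrt(n * p * (1 - p))
    return phi((hi + 0.5 - mean) / sd) - phi((lo - 0.5 - mean) / sd)

def exact_between(lo, hi, n, p):
    """Exact binomial P(lo <= X <= hi)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(lo, hi + 1))

approx_a, exact_a = approx_at_least(15, 20, 0.5), exact_between(15, 20, 20, 0.5)
approx_b, exact_b = approx_between(6, 13, 20, 0.5), exact_between(6, 13, 20, 0.5)
print(round(approx_a, 3), round(exact_a, 4))
print(round(approx_b, 3), round(exact_b, 4))
```

For n = 20 and p = 1/2 the approximate and exact values agree to about two decimal places.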

SAMPLING DISTRIBUTION
Basic Concepts of Sampling

Definitions
1) Population: an aggregate of all elements whose characteristics are being studied, e.g. the population of all families in Kenya owning vehicles, or the population of all firms started in 1992 in Kenya.
2) Sample: a portion of the population selected for study, e.g. a sample of 20 districts in a country, or a sample of 5 students in a class.
3) Sampling units: the individuals or elementary units whose characteristics are to be measured.
4) Sample survey: the technique of collecting information from a portion of the population, i.e. a sample.
5) Sample statistics: numerical summary measures calculated for sample data, e.g. $\bar{x}$, s, s², p, n.
6) Population parameters: numerical summary measures calculated for population data, e.g. μ, σ, σ², P, N.

RANDOM AND NON-RANDOM SAMPLES
1) Random sample: a sample drawn such that each member of the population has an equal chance of being included in the sample.
2) Non-random sample: a sample drawn such that members of the population do not have an equal chance of being included in the sample.

Examples
1) Consider a list of 100 companies with a sample of 10 companies to be drawn from it.
i. If we write the names of all 100 companies on pieces of paper, put them in a hat, mix them and then draw 10 names, we get a random sample.
ii. If we arrange these 100 companies alphabetically and pick the first 10 names, we get a non-random sample.

Correspondence between population parameters and sample statistics:

Measure              Population parameter   Sample statistic
Mean                 μ                      $\bar{x}$
Standard deviation   σ                      s
Variance             σ²                     s²
Proportion           P                      p
Size                 N                      n

Population Parameters
1. $\mu = \frac{\sum fx}{N}$
2. $\sigma = \sqrt{\frac{\sum f(x - \mu)^2}{\sum f}} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
3. $\sigma^2 = \frac{\sum f(x - \mu)^2}{\sum f} = \frac{\sum (x - \mu)^2}{N}$
4. $P = \frac{X}{N}$
5. N = population size

Sample Statistics
1. $\bar{x} = \frac{\sum fx}{\sum f} = \frac{\sum x}{n}$
2. $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
3. $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
4. $p = \frac{x}{n}$
5. n = sample size

METHODS OF SAMPLING
1) Simple random sampling
2) Stratified random sampling
3) Systematic sampling

4) Cluster sampling
5) Multistage sampling

1) Systematic Sampling
A sample item is selected at every nth item after choosing a starting point at random. E.g. to sample every 100th item with a randomly chosen start point of 67, the sample would include the 67th, 167th, 267th, 367th items and so on. The gap between selections is known as the sampling interval; only the starting point is randomly selected.

2) Stratified Random Sampling
The population is divided into groups known as strata. Random samples are taken from within each group in the proportion that each group bears to the population as a whole. E.g. to sample 100 staff at a group of hospitals whose staff could be stratified as follows:

Staff               Number   Proportion
Doctors             200      10%
Nurses              600      30%
Administrators      400      20%
Auxiliary workers   800      40%
Total               2000     100%

To pick a sample of 100 using stratified random sampling:
10% of 100 must be doctors = 10
30% of 100 must be nurses = 30
20% of 100 must be administrators = 20
40% of 100 must be auxiliary workers = 40

3) Simple Random Sampling
It is a method of choosing a sample such that each unit in the population has an equal and independent chance of being included in the sample. This means that each of the N units in the population has a 1/N chance of being chosen at the first draw.

SIMPLE RANDOM SAMPLING PROCEDURE
Let the population consist of N units. Suppose you require a simple random sample of size n (where n < N). Then there are $^{N}C_{n}$ possible samples in each of which no population unit is included more than once, i.e. each sample has a $1/{}^{N}C_{n}$ chance of being chosen from the population.

Population size N, sample size n:
$^{N}C_{n} = \frac{N!}{(N - n)!\,n!}$

Example
Population size N = 4, i.e. A, B, C, D
Sample size n = 2
Possible combinations are AB, AC, AD, BC, BD, CD. Therefore there are 6 possible samples of size 2.
$^{4}C_{2} = \frac{4!}{(4 - 2)!\,2!} = \frac{4 \times 3 \times 2 \times 1}{2 \times 2} = \frac{24}{4} = 6$ possible samples
Probability of picking each sample = $1/{}^{N}C_{n}$ = 1/6

Such a sample can be obtained sequentially by drawing members from the population one at a time without replacement (W.O.R.), so that at each draw each of the remaining units has an equal chance of being chosen.

Methods of Drawing a Simple Random Sample
There are two methods:
1. Lottery method
2. Random number generator

1. Lottery Method
Identify the N units in the population with the numbers 1 to N written on cards which are made as homogeneous as possible in shape, size, etc. These cards are shuffled a number of times and n cards are drawn one at a time. The n cards constitute the random sample drawn. The lottery method is time consuming if the population is large.

2. Random Number Generator
The most practical and inexpensive method of selecting a sample uses a random number generator, which has been constructed such that each of the digits 0, 1, ..., 9 appears with equal probability and the digits are independent of each other. If we have to select a sample from a population of size N ≤ 99, the digits are combined in twos to give pairs from 00 to 99. Similarly, if N ≤ 999, the digits are combined in threes to give triplets from 000 to 999.
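The systematic, stratified and simple random procedures above can be sketched in a few lines. All function and variable names here are illustrative, the population of 400 items in the systematic example is an assumed size, and `random.Random` stands in for the random number generator described in the notes.

```python
import random
from itertools import combinations
from math import comb

def systematic_positions(start, interval, population_size):
    """Positions picked by taking every interval-th item from a random start."""
    return list(range(start, population_size + 1, interval))

def proportional_allocation(strata, sample_size):
    """Share a sample among strata in proportion to their sizes (floor division)."""
    total = sum(strata.values())
    return {name: sample_size * count // total for name, count in strata.items()}

staff = {"Doctors": 200, "Nurses": 600, "Administrators": 400, "Auxiliary workers": 800}
allocation = proportional_allocation(staff, 100)

# Simple random sampling: every one of the C(N, n) samples is equally likely.
population = ["A", "B", "C", "D"]
all_samples = ["".join(s) for s in combinations(population, 2)]
one_sample = random.Random(0).sample(population, 2)   # drawn without replacement

print(systematic_positions(67, 100, 400))
print(allocation)
print(all_samples, comb(4, 2), one_sample)
```

Floor division is exact here because every stratum share is a whole number; other figures would need a rounding rule.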

4) Multistage Sampling
It is similar to stratified sampling except that the groups and subgroups are selected on a geographical or location basis rather than on social characteristics, e.g. age, height, sex, weight.

5) Cluster Sampling
This is a useful system for reducing sampling costs and for dealing with the lack of a satisfactory sampling frame. A few geographical areas, perhaps a village or a street in a town, are selected at random and every single household in the selected area is interviewed.

POPULATION AND SAMPLING DISTRIBUTION
DEFN: Population Distribution
It is the probability distribution derived from the population data, i.e. from information on all elements of the population.

Example
Suppose there are only 5 employees in a small company. The following data give their annual salaries in thousands of dollars: 17, 24, 35, 43, 35. Produce a population distribution for this data.

Solution
Let x = employee's salary.

Population Distribution
x       f            P = f/N     fx
17      1            1/5         17
24      1            1/5         24
35      2            2/5         70
43      1            1/5         43
Total   Σf = N = 5   ΣP = 1      Σfx = 154

Mean = Σfx / Σf = 154/5 = 30.8

SAMPLING DISTRIBUTION
It is the probability distribution of a sample statistic.
N/B: A sample statistic is always a random variable, i.e. it varies from one sample to another, whereas a population parameter is a constant. Every random variable must possess a probability distribution. A sample statistic, being a random variable, has a probability distribution known as its sampling distribution.

EXAMPLES OF SAMPLING DISTRIBUTIONS
1. Sampling distribution of $\bar{x}$
2. Sampling distribution of p
3. Sampling distribution of s

Example
Population: N = 30, X₁ = 10, X₂ = 20, P = X₁/N = 10/30
Sample: n = 15, x₁ = 6, x₂ = 9, p = x₁/n = 6/15
$\bar{x} \sim N(\mu, \sigma/\sqrt{n})$
$p \sim N(P, \sqrt{pq/n})$

1. Sampling Distribution of $\bar{x}$
It is the probability distribution of $\bar{x}$, i.e. it lists the various values that $\bar{x}$ can assume and the probability of each value of $\bar{x}$.

Example (refer to the annual salaries of the 5 employees)
Suppose we label the 5 employees as A, B, C, D, E. Then

Employee   Salary
A          17
B          24
C          35
D          35
E          43

Required
Determine the sampling distribution of $\bar{x}$ when the sample size is 3.

Solution
Observe that N = 5 and n = 3. The number of possible samples is given by
$^{N}C_{n} = {}^{5}C_{3} = \frac{5!}{(5 - 3)!\,3!} = \frac{5 \times 4}{2 \times 1} = 10$ possible samples.

      POSSIBLE SAMPLE   SALARIES      $\bar{x}$
1     ABC               17, 24, 35    25.33
2     ABD               17, 24, 35    25.33
3     ABE               17, 24, 43    28.00
4     ACD               17, 35, 35    29.00
5     ACE               17, 35, 43    31.67
6     ADE               17, 35, 43    31.67
7     BCD               24, 35, 35    31.33
8     BCE               24, 35, 43    34.00
9     BDE               24, 35, 43    34.00
10    CDE               35, 35, 43    37.67

SAMPLING DISTRIBUTION OF $\bar{x}$
$\bar{x}$     f         P($\bar{x}$) = f / Σf
25.33   2         2/10
28.00   1         1/10
29.00   1         1/10
31.33   1         1/10
31.67   2         2/10
34.00   2         2/10
37.67   1         1/10
Total   Σf = 10   ΣP($\bar{x}$) = 1
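The table of sample means can be generated by listing every sample of three employees. The sketch below uses illustrative variable names and exact fractions, and it also checks numerically that the mean of the sampling distribution equals the mean of the five salaries.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

salaries = {"A": 17, "B": 24, "C": 35, "D": 35, "E": 43}   # thousands of dollars

# All C(5, 3) = 10 possible samples of employees and their sample means.
samples = list(combinations(salaries, 3))
sample_means = [Fraction(sum(salaries[e] for e in s), 3) for s in samples]

# Sampling distribution: each distinct mean with probability f / (number of samples).
distribution = {m: Fraction(f, len(samples))
                for m, f in sorted(Counter(sample_means).items())}

mean_of_means = sum(m * p for m, p in distribution.items())
population_mean = Fraction(sum(salaries.values()), len(salaries))

for m, p in distribution.items():
    print(round(float(m), 2), p)
print(float(mean_of_means), float(population_mean))
```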


Also Required
Calculate the mean of $\bar{x}$.
Mean = Σf$\bar{x}$ / Σf = 308/10 = 30.8
Mean of $\bar{x}$ = population mean, i.e. E($\bar{x}$) = μ

DEFN: Sampling Error
It is the difference between the value of a sample statistic and the value of the corresponding population parameter, e.g. 25.33 − 30.8 = −5.47. It arises because the value of $\bar{x}$ depends on the sample chosen. E.g. sampling error of $\bar{x}$ = $\bar{x}$ − μ, which depends on chance.

DEFN: Non-Sampling Error
It is the error that occurs in the collection, recording and tabulation of data.

MEAN AND STANDARD DEVIATION OF $\bar{x}$
Definition: Mean of $\bar{x}$ ($\mu_{\bar{x}}$)
It is the mean of the sampling distribution of $\bar{x}$, denoted by $\mu_{\bar{x}}$, which is equal to the population mean, i.e. $\mu_{\bar{x}} = \mu$.

DEFN: Standard Deviation of $\bar{x}$ ($\sigma_{\bar{x}}$)
It is the standard deviation of the sampling distribution of $\bar{x}$, denoted by $\sigma_{\bar{x}}$:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
where σ = population standard deviation and n = sample size. It is also referred to as the standard error of $\bar{x}$.

N/B: $\bar{x}$ has a normal distribution with mean μ and standard deviation $\sigma/\sqrt{n}$:
$\bar{x} \sim N(\mu_{\bar{x}} = \mu,\ \sigma/\sqrt{n})$

[Figure: normal curve of $\bar{x}$ centred at $\mu_{\bar{x}} = \mu$, horizontal axis marked at ±σ/√n, ±2σ/√n and ±3σ/√n.]

2. ESTIMATION THEORY
Consider a population of size N = 6. The values $Y_i$ (i = 1, 2, ..., 6) of a variate or characteristic for this population are 8, 3, 2, 11, 5 and 7 respectively. Suppose a simple random sample of size n = 2 is drawn from the population. The values $y_i$ (i = 1, 2) of the variate for this sample are 8 and 11 respectively.

POPULATION   Y1   Y2   Y3   Y4   Y5   Y6
VALUES       8    3    2    11   5    7

SAMPLE       y1   y2
VALUES       8    11

Then
Population mean $\bar{Y} = \frac{Y_1 + Y_2 + Y_3 + Y_4 + Y_5 + Y_6}{6} = \frac{8 + 3 + 2 + 11 + 5 + 7}{6} = \frac{36}{6} = 6$
Sample mean $\bar{y} = \frac{y_1 + y_2}{2} = \frac{8 + 11}{2} = \frac{19}{2} = 9.5$
The sample mean is meant to estimate the population mean.

DEFN: ESTIMATE
It is a value of a sample variate that approximates the value of a population parameter, e.g. 9.5 is an estimate for 6. There are two types of estimates:
i. Point estimate
ii. Interval estimate

DEFN: Point Estimate
It is a fixed number which we infer as an estimate or approximation of a population parameter, e.g. $\bar{y}$ = 9.5 is a point estimate of 6.

DEFN: Interval Estimate
It is a range or subset of the possible values that we infer as an estimate or approximation of a population parameter, e.g. 4 < $\bar{Y}$ < 10.

DEFN: Estimator
It is a sample variate or statistic that is used to approximate the population parameter, e.g.
$\bar{y}$ is an estimator for $\bar{Y}$
s is an estimator for S
p is an estimator for P
i.e. estimators are sample statistics.

CHARACTERISTICS OF ESTIMATORS
1. Unbiasedness
DEFN: An estimator $\bar{y}$ for a parameter $\bar{Y}$ is said to be an unbiased estimator if its expected value is equal to the value of the population parameter, i.e.
E(estimator) = population parameter
E($\bar{y}$) = $\bar{Y}$
Alternatively, an estimator $\bar{y}$ is said to be an unbiased estimator for the population parameter $\bar{Y}$ if the average of all values of $\bar{y}$ over all possible samples equals $\bar{Y}$.

2. Consistency

DEFN: A good estimator should be one for which the precision becomes higher as the sample size increases, i.e. the estimator ought to be better when it is based on 20 observations than when it is based on 2. Alternatively, let $\bar{y}_1$ be an estimate of $\bar{Y}$ based on a sample of size 1, let $\bar{y}_2$ be an estimate of $\bar{Y}$ based on a sample of size 2 and, in general, let $\bar{y}_n$ be an estimate of $\bar{Y}$ based on a sample of size n. Then $\bar{y}_1, \bar{y}_2, \bar{y}_3, \ldots, \bar{y}_n$ is a sequence of estimates of $\bar{Y}$. If the estimator is consistent, we desire that

$$E(\bar{y}_n - \bar{Y})^2 \to 0 \quad \text{as } n \to \infty$$

                      Population (parameter)   Sample (sample statistic)
Mean                  $\mu$                    $\bar{x}$
Proportion            $P$                      $\hat{p}$
Variance              $\sigma^2$               $s^2$
Standard deviation    $\sigma$                 $s$

OTHER CHARACTERISTICS OF ESTIMATORS

3. Sufficiency: an estimator is said to be sufficient if it uses all the information in the sample in estimating the required population parameter.
4. Efficiency: an estimator is said to be more efficient than another if, in repeated sampling, its variance is smaller.

Example
A population consists of N = 4 elements with values 1, 2, 3 and 4, from which samples of size n = 2 are drawn.
i. Calculate the point estimates for the population mean.
ii. Prove that the estimator of the population mean is an unbiased estimator.
iii. Repeat (i) and (ii) for the standard deviation.

Solution

Population element      Y1   Y2   Y3   Y4
Characteristic value     1    2    3    4

Number of possible samples:

$${}^{N}C_{n} = {}^{4}C_{2} = \frac{4!}{(4-2)!\,2!} = \frac{4 \times 3 \times 2 \times 1}{2 \times 2} = \frac{12}{2} = 6$$

i) Point estimates of the population mean

Possible sample   Sample values (y1, y2)   Sample mean $\bar{y} = \sum y_i / n$
1                 1, 2                     1.5
2                 1, 3                     2.0
3                 1, 4                     2.5
4                 2, 3                     2.5
5                 2, 4                     3.0
6                 3, 4                     3.5

Sampling error is the difference between the value of a sample statistic and the corresponding population parameter. For sample 6, Sampling Error = $\bar{x} - \mu$ = 3.5 - 2.5 = 1, i.e. the value of $\bar{x}$ depends on the sample chosen.

ii) Prove unbiasedness of the mean, i.e. $E(\bar{y}) = \mu$

$$E(\bar{y}) = \frac{\sum \bar{y}_i}{6} = \frac{1.5 + 2.0 + 2.5 + 2.5 + 3.0 + 3.5}{6} = \frac{15}{6} = 2.5$$

$$\mu = \bar{Y} = \frac{Y_1 + Y_2 + Y_3 + Y_4}{N} = \frac{1 + 2 + 3 + 4}{4} = \frac{10}{4} = 2.5$$

$E(\bar{y}) = \mu$, i.e. 2.5 = 2.5. Therefore the estimator $\bar{y}$ of the population mean is an unbiased estimator.

iii) Repeat (i) and (ii) for the standard deviation, i.e. calculate point estimates for the population standard deviation and examine unbiasedness.

Sample variance: $s^2 = \frac{1}{n-1}\sum_{i}(y_i - \bar{y})^2$; sample standard deviation: $s = \sqrt{s^2}$.

Possible sample   Sample values   Sample variance $s^2$                                      Sample standard deviation $s$
1                 1, 2            [(1 - 1.5)^2 + (2 - 1.5)^2] / (2 - 1) = 0.25 + 0.25 = 0.5   sqrt(0.5) = 0.7071
2                 1, 3            [(1 - 2)^2 + (3 - 2)^2] / (2 - 1) = 1 + 1 = 2               sqrt(2) = 1.4142
3                 1, 4            [(1 - 2.5)^2 + (4 - 2.5)^2] / (2 - 1) = 2.25 + 2.25 = 4.5   sqrt(4.5) = 2.1213
4                 2, 3            [(2 - 2.5)^2 + (3 - 2.5)^2] / (2 - 1) = 0.25 + 0.25 = 0.5   sqrt(0.5) = 0.7071
5                 2, 4            [(2 - 3)^2 + (4 - 3)^2] / (2 - 1) = 1 + 1 = 2               sqrt(2) = 1.4142
6                 3, 4            [(3 - 3.5)^2 + (4 - 3.5)^2] / (2 - 1) = 0.25 + 0.25 = 0.5   sqrt(0.5) = 0.7071
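The enumeration in this example can be checked with a short Python sketch. It lists every possible sample of size 2 from the population 1, 2, 3, 4 and averages the sample statistics. The variable names are illustrative only, and the population variance is computed with divisor N - 1, which is an assumption made here so that it is comparable with the sample variance formula used in the table.

```python
from itertools import combinations
from statistics import mean, stdev, variance

population = [1, 2, 3, 4]   # the example population, N = 4
n = 2                       # sample size

# All possible samples of size n drawn without replacement (4C2 = 6 of them).
samples = list(combinations(population, n))

sample_means = [mean(s) for s in samples]
sample_variances = [variance(s) for s in samples]   # divisor n - 1
sample_stdevs = [stdev(s) for s in samples]

# Averages of each statistic over all possible samples.
expected_mean = mean(sample_means)
expected_variance = mean(sample_variances)
expected_stdev = mean(sample_stdevs)

print("number of samples:", len(samples))
print("E(sample mean) =", expected_mean, "population mean =", mean(population))
# Assumption: population variance taken with divisor N - 1.
print("E(s^2) =", expected_variance, "population variance =", variance(population))
print("E(s) =", expected_stdev, "population std dev =", stdev(population))
```

Under these assumptions the averaged sample mean and sample variance reproduce the population values, while the average of the six sample standard deviations comes out below the population standard deviation, so the unbiasedness argument used for the mean does not carry over unchanged to the standard deviation.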

ESTIMATION OF POPULATION PROPORTIONS

DEFN: Population Proportion Is the ratio of the number of elements in a population with a specific characteristic to the total number of elements in the population, denoted by P, i.e. $P = X/N$, where X is the number of elements in the population with the specific characteristic and N is the total number of elements in the population. The population proportion P is estimated using the sample proportion $\hat{p}$, i.e. $\hat{p} = x/n$, where x is the number of elements in the sample with the specific characteristic and n is the sample size.

SAMPLING DISTRIBUTION OF THE SAMPLE PROPORTION, $\hat{p}$

i) Mean of $\hat{p}$
The mean of the sample proportion $\hat{p}$ is equal to the population proportion P and is denoted by $\mu_{\hat{p}}$, i.e.

$$\mu_{\hat{p}} = \frac{\sum_{i} \hat{p}_i}{{}^{N}C_{n}} = P$$

(summation of $\hat{p}$ over all possible samples, divided by the number of sample proportions).

ii) Standard deviation of $\hat{p}$
Is denoted by $\sigma_{\hat{p}}$, i.e.

$$\sigma_{\hat{p}} = \sqrt{\frac{Pq}{n}}$$

where P = population proportion, q = 1 - P and n = sample size. When P is not known, the estimate $s_{\hat{p}} = \sqrt{\hat{p}\hat{q}/n}$ is used, with $\hat{q} = 1 - \hat{p}$.

If $n/N > 0.05$ then

$$\sigma_{\hat{p}} = \sqrt{\frac{Pq}{n}}\,\sqrt{\frac{N-n}{N-1}}$$

where the factor $\sqrt{(N-n)/(N-1)}$ is called the FINITE POPULATION CORRECTION FACTOR. If $n/N \le 0.05$ the factor is close to 1 and the uncorrected formula is used.

Example
18% of the working people polled by the Roper organization said their careers are both personally and financially rewarding. Assume that this result is true for the population of all working people. Let $\hat{p}$ be the proportion in a random sample of 100 working people who hold this view. Find the mean and standard deviation of $\hat{p}$.

Solution
P = 18% = 0.18; q = 1 - P = 1 - 0.18 = 0.82
i. Mean of $\hat{p}$ = $\mu_{\hat{p}}$ = P = 0.18
ii. Standard deviation of $\hat{p}$:

$$\sigma_{\hat{p}} = \sqrt{\frac{Pq}{n}} = \sqrt{\frac{0.18 \times 0.82}{100}} = 0.0384$$

INTERVAL ESTIMATION

Definition Is the construction of an interval around the point estimate, followed by a probabilistic statement that this interval is likely to contain the corresponding population parameter. The interval is obtained by subtracting a number k from the point estimate and adding the same number to the point estimate, i.e.

Interval estimate = Point estimate $\pm$ k, where k is a number.

The value of k depends on:
i. The standard error of the sample statistic.
ii. The level of confidence to be attached to the interval (the wider the interval, the higher the confidence).

DEFN: Confidence Level Is a probabilistic statement that gives a measure of confidence or certainty that an interval contains the true population parameter. The confidence level is denoted by $(1 - \alpha)100\%$, where $\alpha$ = significance level. E.g. 90%, 95% and 99% confidence levels correspond to 10%, 5% and 1% significance levels respectively. The confidence level expressed as a probability is called the confidence coefficient and is denoted by $1 - \alpha$, e.g. 0.90, 0.95 and 0.99 are confidence coefficients.

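DEFN: Confidence Interval Is an interval that is constructed with regard to a confidence level, i.e. based on the confidence level.

Confidence Interval for $\mu$ (Population Mean) With Large Samples
According to the central limit theorem, for large samples (i.e. $n \ge 30$) the sampling distribution of the sample mean $\bar{x}$ is approximately normal. Hence we use the normal distribution to construct a confidence interval for $\mu$. The $(1 - \alpha)100\%$ confidence interval for $\mu$ is given by

$$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} \quad \text{if } \sigma \text{ is known}$$

or

$$\bar{x} \pm z_{\alpha/2}\,s_{\bar{x}} \quad \text{if } \sigma \text{ is not known}$$

where $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ and $s_{\bar{x}} = s/\sqrt{n}$. The quantity added and subtracted is the maximum error or margin of error.

When the population is given, calculate $\sigma$. When the population is not given, use the sample standard deviation s; the interval then runs from $\bar{x} - z_{\alpha/2}\,s_{\bar{x}}$ to $\bar{x} + z_{\alpha/2}\,s_{\bar{x}}$. The value $z_{\alpha/2}$ is read from the standard normal distribution table.

[Figure: standard normal curve with the central area $(1 - \alpha)100\%$ (the confidence level) between $-z_{\alpha/2}$ and $+z_{\alpha/2}$, and an area of $\alpha/2$ (probability of error, significance region) in each tail. Total area = 1; removing the error leaves $1 - (\alpha/2 + \alpha/2) = 1 - \alpha$.]

Maximum Error of Estimate for $\mu$ (E)
Is the quantity that is subtracted from and added to the value of $\bar{x}$ to obtain a confidence interval for $\mu$, and is denoted by E, i.e.

$$E = z_{\alpha/2}\,\sigma_{\bar{x}}$$

or, if $\sigma$ is not known,

$$E = z_{\alpha/2}\,s_{\bar{x}}$$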

Example
A publishing company has just published a new college textbook. Before the company decides the price at which to sell this book, it wants to know the average price of all such textbooks in the market. The research department at the company took a sample of 36 such textbooks and collected information on their prices. The information produced a mean price of 48.4 dollars for this sample. It is known that the standard deviation of the prices of all such textbooks is 4.5 dollars.

Required
a) What is the point estimate of the mean price of all such college textbooks?
b) Calculate the margin of error (maximum error) for this estimate. (Use a 95% confidence level.)
c) Construct a 90% confidence interval for the mean price of all such college textbooks.

Solution
n = 36; $\bar{x}$ = 48.4; $\sigma$ = 4.5

a) The point estimate for the mean $\mu$ is given by $\bar{x}$ = 48.4 dollars.

b) Margin of error = $z_{\alpha/2}\,\sigma_{\bar{x}}$ (or $z_{\alpha/2}\,s_{\bar{x}}$ if $\sigma$ is not known).

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{4.5}{\sqrt{36}} = \frac{4.5}{6} = 0.75$$

Confidence level = 95% = $(1 - \alpha)100\%$, so $1 - \alpha = 0.95$, $\alpha = 0.05$ and $\alpha/2 = 0.025$ (the area in each tail).
$z_{\alpha/2}$ is the z value with cumulative probability 1 - 0.025 = 0.975. From the tables, z = 1.96.

Margin of error = $z_{\alpha/2}\,\sigma_{\bar{x}}$ = 1.96 x 0.75 = 1.47

c) The 90% confidence interval is given by $\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$, where $\bar{x}$ = 48.4 and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ = 0.75.

Confidence level = 90% = $(1 - \alpha)100\%$, so $1 - \alpha = 0.9$, $\alpha = 0.1$ and $\alpha/2 = 0.05$.
$z_{\alpha/2}$ is the z value with cumulative probability 1 - 0.05 = 0.95. In the tables, 0.95 lies between 0.949 (z = 1.64) and 0.951 (z = 1.65), so we take the average:

$$z_{\alpha/2} = \frac{1.64 + 1.65}{2} = 1.645$$

The 90% confidence interval is $\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$

= 48.4 $\pm$ 1.645 x 0.75 = 48.4 $\pm$ 1.23
i.e. 48.4 - 1.23 to 48.4 + 1.23, which is 47.17 to 49.63 dollars.

Question 1
A sociologist wishes to estimate the mean number of hours that a computer student spends on a computer during a particular week. A random sample of 64 students is selected and the number of hours spent on the computer is recorded. The sample mean is 14.7 hours and the population variance is 6.25. Find the 90% confidence limits for the population mean. (4 marks)

Solution
n = 64; $\bar{x}$ = 14.7; $\sigma^2$ = 6.25; $\sigma = \sqrt{6.25}$ = 2.5

The 90% confidence limits are given by $\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$, where $\bar{x}$ = 14.7 and

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.5}{\sqrt{64}} = \frac{2.5}{8} = 0.3125$$

Confidence level = 90% = $(1 - \alpha)100\%$, so $1 - \alpha = 0.9$, $\alpha = 0.1$ and $\alpha/2 = 0.05$.
$z_{\alpha/2}$ is the z value with cumulative probability 1 - 0.05 = 0.95. In the tables, 0.95 lies between 0.949 and 0.951, whose z values are 1.64 and 1.65 respectively; hence we take the average (1.64 + 1.65)/2 = 1.645.

Substituting, the 90% confidence limits are 14.7 $\pm$ 1.645 x 0.3125 = 14.7 $\pm$ 0.514.
The confidence limits are therefore 14.186 to 15.214 hours.

Confidence Interval for $\mu$ with Small Samples
The confidence interval for $\mu$ when $n < 30$ is given by

$$\bar{x} \pm t_{\alpha/2}\,\sigma_{\bar{x}} \quad \text{or} \quad \bar{x} \pm t_{\alpha/2}\,s_{\bar{x}}$$

Correction factor: if the population size N is given, then use the formula

$$\bar{x} \pm t_{\alpha/2}\,\sigma_{\bar{x}}\,\sqrt{\frac{N-n}{N-1}}$$

where t is read from the Student's t distribution table.
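The large-sample interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ used in the two worked examples above can be sketched in Python as follows. The function name and its parameters are illustrative assumptions, and the z value is taken from the standard library's NormalDist rather than from printed tables, so it is 1.6449 rather than the tabular average 1.645.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(x_bar, sigma, n, confidence):
    """Large-sample confidence interval for a mean with known sigma.

    Returns (lower limit, upper limit, margin of error).
    """
    alpha = 1 - confidence
    # z value with cumulative probability 1 - alpha/2 (replaces the table lookup).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin, margin


# Textbook prices: n = 36, mean 48.4, sigma 4.5, 90% confidence.
low, high, margin = mean_confidence_interval(48.4, 4.5, 36, 0.90)
print(round(low, 2), round(high, 2))

# Hours on the computer: n = 64, mean 14.7, sigma 2.5, 90% confidence.
low_q, high_q, margin_q = mean_confidence_interval(14.7, 2.5, 64, 0.90)
print(round(low_q, 3), round(high_q, 3))
```

For small samples the same structure applies with a t value in place of z; the standard library has no t distribution, so that value would still have to come from tables or a third-party package.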

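Confidence Interval for P (Population Proportion) with Large Samples
For large samples, i.e. $n \ge 30$, the sampling distribution of the sample proportion $\hat{p}$ is approximately normal. Hence we use the normal distribution to construct the confidence interval, i.e. the $(1 - \alpha)100\%$ confidence interval for P is given by

$$\hat{p} \pm z_{\alpha/2}\,\sigma_{\hat{p}} \quad \text{if } \sigma_{\hat{p}} \text{ is known}$$

or

$$\hat{p} \pm z_{\alpha/2}\,s_{\hat{p}} \quad \text{if } \sigma_{\hat{p}} \text{ is not known}$$

where

$$\sigma_{\hat{p}} = \sqrt{\frac{Pq}{n}},\ q = 1 - P \qquad \text{and} \qquad s_{\hat{p}} = \sqrt{\frac{\hat{p}\hat{q}}{n}},\ \hat{q} = 1 - \hat{p}$$

and z is read from the standard normal tables.

Maximum Error of Estimate for P

$$E_p = z_{\alpha/2}\,\sigma_{\hat{p}} \quad \text{or} \quad E_p = z_{\alpha/2}\,s_{\hat{p}}$$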

Confidence Interval for P (Population Proportion) With Small Samples
The sample size is small when $n < 30$. The confidence interval for P when n is small is given by

$$\hat{p} \pm t_{\alpha/2}\,\sigma_{\hat{p}}$$

or, when the population size N is given,

$$\hat{p} \pm t_{\alpha/2}\,\sigma_{\hat{p}}\,\sqrt{\frac{N-n}{N-1}}$$

Example
Estimate the population mean or proportion at the 95% confidence level where the sample data are:
a) i. Sample size = 25  ii. Standard deviation = 15  iii. Mean = 950
b) 30 out of 57 parts were defective.

Solution
Confidence level = 95% = $(1 - \alpha)100\%$, so $1 - \alpha = 0.95$, $\alpha = 0.05$ and $\alpha/2 = 0.025$.

(a) n = 25, s = 15, $\bar{x}$ = 950 (n is small)
The confidence interval for $\mu$ when n is small is given by $\bar{x} \pm t_{\alpha/2}\,s_{\bar{x}}$, where

$$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{15}{\sqrt{25}} = \frac{15}{5} = 3$$

The population size N is not given, so no finite population correction factor is applied. From the t tables with n - 1 = 24 degrees of freedom, $t_{0.025}$ = 2.064.

$\bar{x} \pm t_{\alpha/2}\,s_{\bar{x}}$ = 950 $\pm$ 2.064 x 3 = 950 $\pm$ 6.192
i.e. 950 - 6.192 to 950 + 6.192, which is 943.808 to 956.192.

(b) Let x be the number of defectives: x = 30, n = sample size = 57.

$$\hat{p} = \frac{x}{n} = \frac{30}{57} = 0.5263$$

Since n = 57 is at least 30, the sample is large and the z value is used: for 95% confidence, the z value with cumulative probability 1 - 0.025 = 0.975 is 1.96. The confidence interval for P is $\hat{p} \pm z_{\alpha/2}\,s_{\hat{p}}$, where

$$s_{\hat{p}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{(30/57)(27/57)}{57}} = 0.0661$$

$\hat{p} \pm z_{\alpha/2}\,s_{\hat{p}}$ = 0.5263 $\pm$ 1.96 x 0.0661 = 0.5263 $\pm$ 0.1296
i.e. 0.3967 to 0.6559.

Assignment
a) Define the simple random sampling method.
b) A simple random sample of 30 households was drawn from a city area containing 14,848 households. The numbers of persons per household in the sample were as follows:

5   6   3   3   2   3   3   3   4   4
6   2   7   4   3   5   4   4   3   3
4   3   3   6   2   4   3   4   2   4
Column totals:
15  11  13  13  7   12  10  11  9   11

Estimate the confidence interval for the mean number of people per household in the area; use a 95% confidence level. (10 marks)
c) Distinguish between a point estimate and an interval estimate. (6 marks)

Solution
a) Simple random sampling is a method of choosing a sample such that each unit of the population has an equal and independent chance of being included in the sample.

b) n = 30, N = 14,848. Since N is given, the confidence interval includes the finite population correction factor:

$$\text{Confidence interval} = \bar{x} \pm z_{\alpha/2}\,s_{\bar{x}}\,\sqrt{\frac{N-n}{N-1}}$$

x     f     xf    $x - \bar{x}$   $(x - \bar{x})^2$   $f(x - \bar{x})^2$
2     4     8     -1.73           2.9929              11.9716
3     11    33    -0.73           0.5329              5.8619
4     9     36    0.27            0.0729              0.6561
5     2     10    1.27            1.6129              3.2258
6     3     18    2.27            5.1529              15.4587
7     1     7     3.27            10.6929             10.6929
Total 30    112                                       47.8670

$$\bar{x} = \frac{\sum xf}{\sum f} = \frac{112}{30} = 3.73$$

$$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}} = \sqrt{\frac{47.867}{29}} = \sqrt{1.6506} = 1.2847$$

$$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{1.2847}{\sqrt{30}} = \frac{1.2847}{5.4772} = 0.2346$$

Confidence level = 95% = $(1 - \alpha)100\%$, so $\alpha = 0.05$, $\alpha/2 = 0.025$ and $z_{\alpha/2}$ = 1.96.

$$\sqrt{\frac{N-n}{N-1}} = \sqrt{\frac{14{,}848 - 30}{14{,}848 - 1}} = \sqrt{0.99805} = 0.99902$$

Confidence interval = 3.73 $\pm$ 1.96 x 0.2346 x 0.99902 = 3.73 $\pm$ 0.459
i.e. 3.27 to 4.19 persons per household.
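As a cross-check on part (b), the sketch below computes the interval $\bar{x} \pm z_{\alpha/2}\,s_{\bar{x}}\sqrt{(N-n)/(N-1)}$ directly from the household frequency table. It assumes that the standard deviation is weighted by the frequencies and uses divisor n - 1, and that a z value is appropriate because n = 30; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Frequency table from the assignment: persons per household -> number of households.
freq = {2: 4, 3: 11, 4: 9, 5: 2, 6: 3, 7: 1}
N = 14848                      # households in the city area
n = sum(freq.values())         # sample size (30)

# Sample mean from the frequency table.
x_bar = sum(x * f for x, f in freq.items()) / n

# Assumption: frequency-weighted sum of squares with divisor n - 1.
sum_sq = sum(f * (x - x_bar) ** 2 for x, f in freq.items())
s = sqrt(sum_sq / (n - 1))

std_error = s / sqrt(n)
fpc = sqrt((N - n) / (N - 1))              # finite population correction factor
z = NormalDist().inv_cdf(1 - 0.05 / 2)     # 95% confidence level

margin = z * std_error * fpc
lower, upper = x_bar - margin, x_bar + margin
print(round(x_bar, 2), round(s, 4), round(fpc, 5))
print(round(lower, 2), round(upper, 2))
```

With a sampling fraction as small as 30 out of 14,848 the correction factor is very close to 1, so it changes the limits only in the third decimal place.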

c) POINT ESTIMATE
Is a fixed number which we infer as an estimate or approximation of a population parameter. E.g. $\bar{y} = 9.5$ is a point estimate of 6, given that the population values and sample values are as follows.

Population values   Y1   Y2   Y3   Y4   Y5   Y6
                     8    3    2   11    5    7
Sample values       y1   y2
                     8   11

INTERVAL ESTIMATE
Is a subset or range of values that we infer as an estimate or approximation of a population parameter, e.g. $4 < \bar{Y} < 10$.

3. TESTS OF HYPOTHESIS

DEFN: Statistical Hypothesis Is a statement about the probability distribution of a random variable x under study. It is a prediction about some aspect of a random variable. When a hypothesis is stated in terms of one or more parameters of an appropriate distribution, statistical methods can be used to test its validity.

Example
i. The mean age of marriage for men in Kenya is at least 25 years.
ii. The rate of serious crimes has remained the same over the past 4 years.
iii. Married women have more depression than do married men.

N/B Hypotheses are derived from theory and they serve as guides to research.

STATISTICAL TEST / HYPOTHESIS TESTING / SIGNIFICANCE TESTS
Involves comparing what is expected according to the hypothesis with what is actually observed in the sample data. Possible results of hypothesis testing are:
i. Accept a true hypothesis (correct)
ii. Reject a false hypothesis (correct)
iii. Reject a true hypothesis (incorrect)
iv. Accept a false hypothesis (incorrect)

ELEMENTS OF A STATISTICAL TEST
There are 5 basic elements of a statistical test of a hypothesis about a parameter, namely:
i. Assumptions
ii. Hypotheses
iii. Test statistic
iv. Significance level
v. Conclusion

i. Assumptions
Statistical tests are based on conditions that must be met in order for the test to be valid, i.e.
i. The form of the population distribution, e.g. for some tests the variable must be continuous or even normally distributed.
ii. The method of sampling, i.e. the formulas for most tests require simple random sampling.
iii. The sample size, i.e. it must be known to be either small or large.

EXAMPLE 1
A company is proposing to introduce a new system of production bonuses with the aim of improving productivity. Last year, the average production per man per day was 1020. Before introducing the bonuses throughout the company, it is decided to test the new bonus scheme on a random sample of 60 employees. The mean production per day was found to be 1050 with a standard deviation of 120. Is there any evidence that the bonus scheme has improved production? (Use the 5% level of significance.)

SOLUTION
Observe: X = average production per man per day.
$\mu$ = 1020; $\bar{x}$ = 1050; s = 120; n = 60

Assumptions
1. 5% level of significance.
2. Random sampling method used.
3. Sample size n = 60, i.e. large.
4. $\bar{x} \sim N(\mu, \sigma^2/n)$, i.e. normally distributed.

ii. Hypotheses
A statistical test focuses on two hypotheses about the value of a parameter:
1. Null Hypothesis
2. Alternative Hypothesis

1) NULL HYPOTHESIS ($H_0$)
This is the hypothesis that is actually tested and is denoted by $H_0$. The purpose of the test is to analyse in probabilistic terms how strong the sample evidence is against the null hypothesis.

2) ALTERNATIVE HYPOTHESIS ($H_1$ or $H_A$)
This is the hypothesis that is accepted when the null hypothesis is rejected and is denoted by $H_1$ or $H_A$. It consists of an alternative set of parameter values to those given in the null hypothesis. The researcher usually conducts the test to investigate whether the alternative hypothesis is true.

N/B The alternative hypothesis is judged to be acceptable if the null hypothesis can be shown to be inconsistent with the observed data. The two hypotheses should be formulated before analysing the data.

Example
     Null hypothesis        Alternative hypothesis
1.   $H_0: \mu = \mu_0$     $H_A: \mu = \mu_A$ (some value different from $\mu_0$)
2.   $H_0: \mu = \mu_0$     $H_A: \mu \ne \mu_0$
3.   $H_0: \mu = \mu_0$     $H_A: \mu < \mu_0$
4.   $H_0: \mu = \mu_0$     $H_A: \mu > \mu_0$

Refer to Example 1.
Hypotheses
$H_0: \mu = \mu_0$, i.e. $\mu = 1020$ (the bonus scheme has not improved production)
$H_A: \mu > \mu_0$, i.e. $\mu > 1020$ (the bonus scheme has improved production)

iii. Test Statistic
Is a point estimator of the parameter about which the hypotheses are made. It is derived from the sample obtained from the population under study and helps us test the null hypothesis. Knowledge of the sampling distribution of the statistic allows us to calculate the probability that specific values of the statistic would occur if the null hypothesis were actually true.

E.g. $\bar{x}$ is an estimator of the population mean $\mu$, and $\bar{x} \sim N(\mu, \sigma^2/n)$. Transform this into a standard normal variable, i.e.

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

hence z is a test statistic.

[Figure: standard normal curve over z showing the acceptance region and the rejection region.]

Example
Refer to Example 1.

Solution
Observe: $\mu$ = 1020; n = 60; $\bar{x}$ = 1050; s = 120

The test statistic is given by

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \quad \text{or} \quad z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{1050 - 1020}{120/\sqrt{60}} = \frac{30}{15.492} = 1.9365$$

If the null hypothesis is true, the z value = 1.9365.

iv. Significance level
Is a measure of how unusual the test statistic value is relative to what would be expected for its value if $H_0$ were true. It is the probability of rejecting the null hypothesis when it is true. It is usually denoted by $\alpha$ and is sometimes known as the significance level of the test. For the example, $\alpha$ is 5%.

v. Conclusion
This is the decision that we make of either rejecting or accepting the null hypothesis.

Example
Refer to Example 1. Is z = 1.94 greater than the 5% point? If YES, then the result is significant, hence reject the null hypothesis; if NO, then the result is not significant, hence accept the null hypothesis.

For $\alpha$ = 5% = 0.05 in a one tailed test, 1 - 0.05 = 0.95 and the equivalent z value is 1.645. If the value of the test statistic lies in the right-hand region, i.e. z > 1.645, then we reject the null hypothesis. Here z = 1.94 > 1.645, so the null hypothesis is rejected, i.e. there is evidence at the 5% level that the bonus scheme has improved production.

Example
A finance company is concerned about the arrears of payment on its loan accounts. In 1982, the average arrears of those accounts which were in arrears was 72.43 pounds. A survey conducted in 1983 using a simple random sample of 100 loan accounts which were in arrears showed that the mean arrears was 77.83 pounds with a standard deviation of 16 pounds. Is there evidence that there has been a significant change in the mean arrears? Use $\alpha$ = 1%.

Solution
Observe: $\mu$ = 72.43; n = 100; $\bar{x}$ = 77.83; s = 16; $\alpha$ = 1%

(i) Assumptions
1. Random sampling was used.
2. $\bar{x} \sim N(\mu, \sigma^2/n)$
3. n = 100, large.
4. $\alpha$ = 1%

(ii) Hypotheses
$H_0: \mu = \mu_0$, i.e. $\mu = 72.43$ (no significant change in mean)
$H_A: \mu \ne 72.43$ (there is a significant change in mean)

(iii) Test statistic

$$z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{77.83 - 72.43}{16/\sqrt{100}} = \frac{5.40}{16/10} = \frac{5.40}{1.6} = 3.375$$

(iv) Significance level / rejection region
$\alpha$ = 1%. Type of test: two tailed test, so $\alpha/2$ = 0.005 in each tail.

[Figure: standard normal curve with the acceptance region between the critical points $z_1$ and $z_2$ and a rejection region of area $\alpha/2$ = 0.005 in each tail.]

From the tables, $z_1$ = -2.58 and $z_2$ = 2.58.

(v) Conclusion
Is $z > z_2$ or $z < z_1$? Is 3.375 > 2.58 or 3.375 < -2.58? Since 3.375 > 2.58, we reject $H_0$, i.e. there is a significant change in the mean at the 1% level.
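The five elements of the test (assumptions, hypotheses, test statistic, significance level and conclusion) can be sketched as a small Python function. The function name, its parameters and the tail labels are illustrative assumptions, and the critical values come from the standard library's NormalDist instead of printed tables, so they differ slightly from the rounded table values (for example 2.5758 instead of 2.58).

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(x_bar, mu0, s, n, alpha, tail):
    """Large-sample z test for a mean.

    tail is "two", "right" or "left" (the direction of the alternative hypothesis).
    Returns (z statistic, critical value, reject_h0).
    """
    z = (x_bar - mu0) / (s / sqrt(n))
    std_normal = NormalDist()
    if tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    elif tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif tail == "left":
        critical = std_normal.inv_cdf(alpha)
        reject = z < critical
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return z, critical, reject


# Example 1 (bonus scheme): H0: mu = 1020 against HA: mu > 1020 at the 5% level.
z_bonus, crit_bonus, reject_bonus = z_test_mean(1050, 1020, 120, 60, 0.05, "right")
print(round(z_bonus, 4), round(crit_bonus, 3), reject_bonus)

# Loan arrears: H0: mu = 72.43 against HA: mu != 72.43 at the 1% level.
z_loan, crit_loan, reject_loan = z_test_mean(77.83, 72.43, 16, 100, 0.01, "two")
print(round(z_loan, 3), round(crit_loan, 3), reject_loan)
```

Passing the tail as an argument keeps the one tailed and two tailed cases in one place, so the only decision left to the user is the direction implied by the wording of the problem.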

ONE TAILED AND TWO TAILED TESTS. One tailed tests involve testing a change in one direction, while a two tailed test involves a change in both directions. N/B Problem situations with words that imply a change in one direction require one tailed test e.g. better, worse, improved, increased, reduced etc. Problem situations with words that imply a change in either direction require a two tailed test e.g. is there any difference? Is there any change?

Example
A sociologist wishes to estimate the mean number of hours that a computer student spends on a computer during a particular week. A random sample of 64 students is selected and the number of hours spent on the computer is recorded. The sample mean is 14.7 hours and the population variance is 6.25. Test whether the students spent more than 14 hours on the computer. Use $\alpha$ = 0.05. (8 marks)

Solution
Observe: n = 64; $\bar{x}$ = 14.7; $\sigma^2$ = 6.25; $\mu$ = 14

(i) Assumptions
Significance level $\alpha$ = 0.05.
Random sampling used.
Sample size large, n = 64.
$\bar{x} \sim N(\mu, \sigma^2/n)$

(ii) Hypotheses
$H_0: \mu = 14$ (exactly 14 hours)
$H_A: \mu > 14$ (more than 14 hours)

(iii) Test statistic

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{14.7 - 14}{2.5/\sqrt{64}} = \frac{0.7}{0.3125} = 2.24$$

(iv) Critical region / significance level
Significance level $\alpha$ = 0.05. Type of test: one tailed test (right tail).
The critical point is the z value with cumulative probability 1 - 0.05 = 0.95. In the tables, 0.95 lies between 0.949 (z = 1.64) and 0.951 (z = 1.65), therefore the critical point is (1.64 + 1.65)/2 = 3.29/2 = 1.645.

[Figure: standard normal curve with the acceptance region to the left of the critical point 1.645 and the rejection region (critical region) of area $\alpha$ = 0.05 in the right tail.]

(v) Conclusion
Is z = 2.24 > 1.645? If YES, we reject $H_0$; if NO, we accept $H_0$. 2.24 lies in the rejection region, thus we reject.
Decision: reject $H_0$, i.e. the students spent more than 14 hours on the computer.

Exercise
A sugar refinery packs sugar into packets weighing on average 1 kg. The average weight of packets filled by the machines tends to fluctuate. In order to detect

shifts in the mean weight, the refinery periodically selects random samples of 100 packets and weighs them; for one such sample the average is 1.03 kg with a standard deviation of 0.1 kg. Are the consumers justified to complain of a difference in the average weight of the packets?
i. At $\alpha$ = 0.05
ii. At $\alpha$ = 0.01

Observe: $\mu$ = 1 kg; n = 100 packets; $\bar{x}$ = 1.03 kg; s = 0.1 kg

1. Assumptions
Significance level $\alpha$ = 0.05 (and $\alpha$ = 0.01).
Random sampling used.
Sample size large, n = 100 packets.
$\bar{x} \sim N(\mu, \sigma^2/n)$

2. Hypotheses
$H_0: \mu = \mu_0$, i.e. $\mu = 1$
$H_A: \mu \ne \mu_0$, i.e. $\mu \ne 1$

3. Test statistic

$$z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{1.03 - 1}{0.1/\sqrt{100}} = \frac{0.03}{0.1/10} = \frac{0.03}{0.01} = 3$$

4. Significance level
(i) At $\alpha$ = 0.01. Test type: two tailed test, so $\alpha/2$ = 0.005 in each tail. The upper critical point is the z value with cumulative probability 1 - 0.005 = 0.995, i.e. $z_2$ = 2.58, and the lower critical point is $z_1$ = -2.58.
(ii) At $\alpha$ = 0.05. Test type: two tailed test, so $\alpha/2$ = 0.025 in each tail. The upper critical point is the z value with cumulative probability 1 - 0.025 = 0.975, i.e. $z_2$ = 1.96, and the lower critical point is $z_1$ = -1.96.

5. Conclusion
Is $z > z_2$ or $z < z_1$? The test statistic is z = 3. At $\alpha$ = 0.05, 3 > 1.96, and at $\alpha$ = 0.01, 3 > 2.58. We therefore reject $H_0$ at both levels, i.e. there is a significant change in the weight of the packets.

Types of Errors
During statistical testing, we make a decision on rejecting the null hypothesis or accepting it on the basis of a random sample of size n, where n is fixed. In this case, the sample space of observations can be partitioned into two disjoint subsets, $w$ (the rejection region) and $\bar{w}$ (the acceptance region).

If the observed point falls inside $w$ we reject the null hypothesis; if it falls in $\bar{w}$ we accept the null hypothesis. We call $w$ the rejection region or critical region, and $\bar{w}$ is called the acceptance region. In making the two possible decisions, i.e. rejection or acceptance, there are two possible errors that we can commit, i.e.
i. Type I error
ii. Type II error

(i) Type I Error

Is the error of rejecting the null hypothesis when it is true.

(ii) Type II Error
Is the error of accepting the null hypothesis when it is false.

N/B Ideally the probability of making either type of error should be very small, and a test that minimised both would be the best test, but there isn't one. One way out is to fix the probability of Type I error and minimise the probability of Type II error, i.e. we fix
Pr (Type I error) = $\alpha$, where $0 \le \alpha \le 1$ is a specified number, and we minimise
Pr (Type II error) = $\beta$.
We call Pr (Type I error) = $\alpha$ the size of the test, the size of the critical region or the level of significance of the test.

SIGNIFICANCE TEST FOR PROPORTION
The procedure for testing is similar to that of testing a single mean. The hypothesis in this case considers the proportion $P_0$ (the actual population proportion), and z in this case is calculated as

$$z = \frac{\hat{p} - P_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - P_0}{\sqrt{\dfrac{P_0 q_0}{n}}} = \frac{\hat{p} - P_0}{\sqrt{\dfrac{P_0 (1 - P_0)}{n}}}$$

where n = sample size, $\hat{p}$ = sample proportion and $P_0$ = actual population proportion.

Example
It is claimed that a process produces not more than 30% defective items. A random sample of 100 items from the process was found to contain 42% defectives. Investigate the validity of the claim at $\alpha$ = 1%.

Solution
Observe: $P_0$ = 30%; n = 100; $\hat{p}$ = 42%; $\alpha$ = 1%

(i) Assumptions
Random sampling used.
$\hat{p} \sim N(P, Pq/n)$
Sample size n = 100, large.
$\alpha$ = 1%

(ii) Hypotheses
$H_0: P = 30\%$ (defectives are not more than 30%, i.e. the claim is valid)
$H_A: P > 30\%$ (defectives are more than 30%)

(iii) Test statistic

$$z = \frac{\hat{p} - P_0}{\sqrt{\dfrac{P_0 q_0}{n}}} = \frac{0.42 - 0.3}{\sqrt{\dfrac{0.3 \times 0.7}{100}}} = \frac{0.12}{\sqrt{0.21}/10} = 2.6186$$

(iv) Significance level / critical region
$\alpha$ = 1% = 0.01. It is a one tailed test (right tail), because only a proportion above 30% contradicts the claim. The critical point is the z value with cumulative probability 1 - 0.01 = 0.99, which lies midway between the table entries for 2.32 and 2.33, i.e. (2.32 + 2.33)/2 = 2.325.

[Figure: standard normal curve with the acceptance region to the left of the critical point 2.325 and the rejection region of area $\alpha$ = 0.01 in the right tail.]

(v) Conclusion
z = 2.6186 > 2.325, so z lies in the rejection region. We reject $H_0$, i.e. at the 1% level the sample does not support the claim that the process produces not more than 30% defectives.

4. REGRESSION THEORY

Definition: Regression Is a term that is used to describe relationships between two or more variables. In many situations, a relationship exists between two or more variables, e.g.
i. Mass and height.
ii. Total cost and number of units produced. An increase in the number of units produced leads to an increase in total cost.
iii. Income and expenditure. An increase in income leads to an increase in expenditure.
iv. Exam performance and number of lectures attended. The more lectures attended, the higher the likelihood of performing well in exams.
v. Income and level of education.

DEFN: Dependent Variable Is a variable whose value is determined by the value of another variable. It is denoted by Y.

DEFN: Independent Variable Is a variable whose value is not affected by a change in the value of another variable. It controls change in the dependent variable. It is denoted by X.

N/B For various purposes, it is useful to express relationships between variables in mathematical terms, i.e.

$$Y = f(X)$$

where Y is the dependent variable and X is the independent variable. The above equation is called a regression equation or model.
Bivariate regression: two variable relationships.
Multivariate regression: many variable relationships.

Linear and Nonlinear Regression
A regression model or equation that gives a straight line relationship between two variables is called a linear regression. Otherwise it is called a nonlinear regression.

[Figure: two scatter plots of Y against X, one labelled LINEAR REGRESSION with points lying about a straight line and one labelled NON LINEAR REGRESSION with points lying about a curve.]

LINEAR REGRESSION ANALYSIS
The following techniques are used:
i. Scatter diagrams
ii. Curve and line fitting
iii. Least squares method

i. Scatter Diagrams
A scatter diagram is a plot of paired observations on the Cartesian graph, i.e. dependent variable against independent variable.

Example
Draw a scatter diagram for the data below.

Income (in dollars) X         35   49   21   39   15   28   25
Expenditure (in dollars) Y     9   15    7   11    5    8    9

[Figure: scatter diagram of expenditure Y (axis from 2 to 16) against income X (axis from 15 to 50), with a fitted line or curve sketched through the points.]

A simple scatter diagram may help in:
i. Telling whether or not there appears to be a relationship between two variables.
ii. If there is a relationship, indicating whether it is linear or nonlinear.
iii. If the relationship is linear, showing whether it is negative or positive.
+ve: as x increases, y also increases.
-ve: as x increases, y decreases.

Exercise
Draw a scatter diagram for the data below and make conclusions.

Year                     1     2     3    4    5    6    7    8    9    10
Advertising X ('000)     106   105   90   80   80   85   87   92   90   95
Sales Y ('000)           9     8     5    2    4    6    4    7    6    7

[Figure: scatter diagram of sales Y (axis from 1 to 10) against advertising X (axis from 80 to 115).]

Conclusion
i. There is a relationship.
ii. The relationship is linear.
iii. It is a positive relationship.

(ii) Curve or line fitting
Is sketching an approximating curve or line on the scatter diagram, i.e. approximating the curve or line of best fit freehand. Not all individually observed points will necessarily be on the line. Points that lie far from the line are called outliers.

(iii) Method of least squares
Is a method that is used to determine the equation of the straight line that best fits the observations, i.e. the line of best fit. The line of best fit is the one that reduces the sum of squares of the deviations of each observation (normally the vertical deviations) from the line to the minimum.

[Figure: graph of Y against X showing the points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ scattered about a straight line, with vertical deviations $d_1, d_2, \ldots, d_n$ from the line.]

The sum $d_1^2 + d_2^2 + d_3^2 + \cdots + d_n^2$ gives the measure of goodness of fit of the line or curve. The line of best fit is the one that reduces this value to a minimum.

$$Y = A + Bx$$

where Y is the dependent variable, x is the independent variable, A is the Y intercept (constant) and B is the slope or gradient.

The above equation gives an exact relationship between X and Y, hence it is called a deterministic model. It states that Y is determined exactly by X, and for a given value of X there is only one value of Y. However, in most cases the relationship between variables is not exact, i.e. apart from the independent variable, other factors do affect the value of the dependent variable. Hence we allow an error factor (which is random) in our regression model, i.e.

$$Y = A + Bx + \varepsilon$$

Here $\varepsilon$ is the random error. The new regression model above is called the probabilistic model. The random error may be due to:
1. Missing or omitted variables.
2. Errors in collecting data.

The population regression line is estimated using sample data.
Population regression line: $Y = A + Bx + \varepsilon$
Sample regression line: $\hat{y} = a + bx$, so that each observation satisfies $y = a + bx + e$, where $e = y - \hat{y}$.
The sum of these errors is always 0, i.e. $\sum e = \sum (y - \hat{y}) = 0$. The errors are the deviations, some of which are positive and some negative.

Normal Equations of the Regression Line
To find the line that best fits the scatter points, we cannot minimise the sum of errors, but we minimise the error sum of squares, denoted by S.S.E. (Sum of Squared Errors):

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} d_i^2$$

This is based on n observations $(x_i, y_i)$, i = 1, 2, ..., n, which implies that

$$y_i = a + bx_i + e_i, \qquad e_i = y_i - a - bx_i, \qquad \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - bx_i)^2$$

To minimise the sum of squared errors, we differentiate partially with respect to b and a and set the derivatives to zero:

$$\frac{\partial SSE}{\partial b} = 0 \quad \text{(i)} \qquad \frac{\partial SSE}{\partial a} = 0 \quad \text{(ii)}$$

From (i):

$$\frac{\partial SSE}{\partial b} = 2\sum (y_i - a - bx_i)(-x_i) = 0$$
$$\sum (y_i - a - bx_i)\,x_i = 0$$
$$\sum x_i y_i - a\sum x_i - b\sum x_i^2 = 0$$
$$\sum x_i y_i = a\sum x_i + b\sum x_i^2$$

From (ii):

$$\frac{\partial SSE}{\partial a} = 2\sum (y_i - a - bx_i)(-1) = 0$$
$$\sum (y_i - a - bx_i) = 0$$
$$\sum y_i - na - b\sum x_i = 0$$
$$\sum y_i = na + b\sum x_i$$

So, the two equations are:
i. $\sum xy = a\sum x + b\sum x^2$
ii. $\sum y = na + b\sum x$

Solving these equations simultaneously, from (ii) $\sum y = na + b\sum x$. Making na the subject, $na = \sum y - b\sum x$. Dividing by n on both sides,

$$a = \frac{\sum y}{n} - b\,\frac{\sum x}{n}, \qquad \text{thus } a = \bar{y} - b\bar{x}$$

Multiplying (ii) by $\sum x / n$ gives

$$\frac{\sum x \sum y}{n} = a\sum x + b\,\frac{(\sum x)^2}{n}$$

Subtracting this from (i):

$$\sum xy - \frac{\sum x \sum y}{n} = b\sum x^2 - b\,\frac{(\sum x)^2}{n}$$

Multiplying by n and factoring out b:

$$n\sum xy - \sum x \sum y = b\left(n\sum x^2 - \left(\sum x\right)^2\right)$$

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$
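The closed-form solution of the two normal equations can be sketched in Python as below. The function name is an illustrative assumption, and the data are the income and expenditure pairs from the scatter diagram example, used here only to exercise the formulas.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope from the normal equations: b = (n*Sxy - Sx*Sy) / (n*Sx2 - Sx^2).
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = mean(y) - b * mean(x).
    a = sum_y / n - b * sum_x / n
    return a, b


# Income (X) and expenditure (Y) from the scatter diagram example.
income = [35, 49, 21, 39, 15, 28, 25]
expenditure = [9, 15, 7, 11, 5, 8, 9]

a, b = least_squares_line(income, expenditure)
residuals = [y - (a + b * x) for x, y in zip(income, expenditure)]

print(round(a, 3), round(b, 4))
# The residuals of a least squares line sum to zero (up to rounding error).
print(abs(sum(residuals)) < 1e-9)
```

The residual check at the end illustrates the statement that the errors about the fitted line always sum to zero, which is why the sum of squared errors is minimised instead.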

"a" is the Y intercept and "b" is the slope of the least squares regression line.

The variance of x (denoted by $s_x^2$) is given by

$$s_x^2 = \frac{1}{n}\sum (x - \bar{x})^2$$

Expanding the bracket:

$$s_x^2 = \frac{1}{n}\sum \left(x^2 - 2x\bar{x} + \bar{x}^2\right) = \frac{\sum x^2}{n} - 2\bar{x}\,\frac{\sum x}{n} + \bar{x}^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2 \quad \text{(i)}$$

The covariance of x and y (denoted by $s_{xy}$) measures the joint variation between two related variables and is given by

$$s_{xy} = \frac{\sum xy}{n} - \frac{\sum x \sum y}{n^2}$$

Therefore,

$$b = \frac{\text{Covariance}(x, y)}{\text{Variance}(x)} = \frac{s_{xy}}{s_x^2}$$

PROOF

$$\frac{s_{xy}}{s_x^2} = \frac{\dfrac{\sum xy}{n} - \dfrac{\sum x \sum y}{n^2}}{\dfrac{\sum x^2}{n} - \dfrac{(\sum x)^2}{n^2}}$$

Multiplying the numerator and the denominator by $n^2$:

$$\frac{s_{xy}}{s_x^2} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = b$$

hence proved.

Examples
1. Given that $\sum x = 1322$, $\sum xy = 7072$, $\sum y^2 = 46$, $\sum y = 78$, $\sum x^2 = 117{,}532$ and n = 15, find the regression equation connecting the values of x and y. Find also the covariance of (x, y).

Solution
(i) The regression line is given by $y = a + bx$. Therefore

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{15 \times 7072 - 1322 \times 78}{15 \times 117{,}532 - (1322)^2} = \frac{106080 - 103116}{1762980 - 1747684} = \frac{2964}{15296} = 0.19378$$

Also

$$a = \bar{y} - b\bar{x} = \frac{\sum y}{n} - b\,\frac{\sum x}{n} = \frac{78}{15} - 0.19378 \times \frac{1322}{15} = 5.2 - 17.0781 = -11.8781$$

This means that the regression equation is given by $y = a + bx$, i.e.
$y = -11.8781 + 0.19378x$

(ii) The covariance is given by

$$s_{xy} = \frac{\sum xy}{n} - \frac{\sum x \sum y}{n^2} = \frac{7072}{15} - \frac{1322 \times 78}{15^2} = 471.4667 - 458.2933 = 13.17$$

Exercise

Student   Mark in paper 1 (X)   Mark in paper 2 (Y)
1. A      42                    31
2. B      84                    83
3. C      50                    42
4. D      42                    60
5. E      33                    28
6. F      50                    63
7. G      69                    59
8. H      81                    92
9. I      50                    73
10. J     35                    40

Required
a) Draw a scatter diagram for the data above.
b) Find the regression equation connecting X and Y.

Solution

Student   X     Y     X²      XY      Y²
A         42    31    1764    1302    961
B         84    83    7056    6972    6889
C         50    42    2500    2100    1764
D         42    60    1764    2520    3600
E         33    28    1089    924     784
F         50    63    2500    3150    3969
G         69    59    4761    4071    3481
H         81    92    6561    7452    8464
I         50    73    2500    3650    5329
J         35    40    1225    1400    1600
Total     536   571   31720   33541   36841

ΣX = 536, ΣY = 571, ΣX² = 31720, ΣXY = 33541, ΣY² = 36841

The regression line is given by y = a + bx, with

ΣX = 536, ΣY = 571, ΣX² = 31720, ΣY² = 36841, ΣXY = 33541, n = 10

$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$

b = (10 × 33541 - 536 × 571) / (10 × 31720 - 536²)
b = (335410 - 306056) / (317200 - 287296)
b = 29354 / 29904 = 0.981607811

$a = \bar{Y} - b\bar{X} = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}$

a = 571/10 - 0.981608 × 536/10
a = 57.1 - 52.6141888
a = 4.4858112

This means the regression equation is given by y = a + bx:

y = 4.485811 + 0.981607811x

To choose the scales for the scatter diagram, get the range of each variable:

         x     y
Max      84    92
Min      33    28
Range    51    64
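The calculation of a and b from the column totals can be sketched in Python as follows. This is a minimal illustration, not part of the original notes; the function name `least_squares` and the variable names are assumptions chosen for this example, and the data are the paper 1 and paper 2 marks from the exercise.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: b = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = mean(y) - b * mean(x)
    a = sum_y / n - b * sum_x / n
    return a, b


# Marks in paper 1 (X) and paper 2 (Y) for students A to J.
paper1 = [42, 84, 50, 42, 33, 50, 69, 81, 50, 35]
paper2 = [31, 83, 42, 60, 28, 63, 59, 92, 73, 40]

a, b = least_squares(paper1, paper2)
print(f"y = {a:.4f} + {b:.4f}x")
```

Working from the raw columns rather than from hand-copied totals avoids transcription slips in ΣXY and ΣX².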

Question
1. A manufacturer wishes to establish a procedure to estimate direct labour cost for small batch production orders. He has obtained a random sample of recorded actual direct labour cost for a sample of 10 batches as follows.

Batch size X   Direct labour cost Y   X²      XY
15             200                    225     3000
18             240                    324     4320
18             260                    324     4680
21             290                    441     6090
23             300                    529     6900
23             320                    529     7360
26             380                    676     9880
28             370                    784     10360
32             400                    1024    12800
37             470                    1369    17390
ΣX = 241       ΣY = 3230              ΣX² = 6225   ΣXY = 82780

Required
1. Calculate the least squares regression line relating labour costs to batch size.

n = 10

$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$

b = (10 × 82780 - 241 × 3230) / (10 × 6225 - 241²)
b = (827800 - 778430) / (62250 - 58081)
b = 49370 / 4169 = 11.8422

$a = \bar{Y} - b\bar{X} = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}$

a = 3230/10 - 11.8422 × 241/10
a = 323 - 285.3963 = 37.6037

The regression equation connecting x and y is y = a + bx:

y = 37.6037 + 11.8422x

Exercise
US production of steel in tonnes is shown below.

YEAR    PRODUCTION IN TONNES
1946    66.6
1947    84.9
1948    88.9
1949    78.0
1950    96.8
1951    155.2
1952    93.2
1953    111.6
1954    88.3
1955    117.0
1956    115.2

Required
a) Draw a scatter diagram for the above data. (4 marks)
b) Calculate the line of least squares. (9 marks)
c) Calculate the forecast production of the year 1958. (3 marks)

S      X       Y        X²          XY
1      1946    66.6     3786916     129603.6
2      1947    84.9     3790809     165300.3
3      1948    88.9     3794704     173177.2
4      1949    78.0     3798601     152022.0
5      1950    96.8     3802500     188760.0
6      1951    155.2    3806401     302795.2
7      1952    93.2     3810304     181926.4
8      1953    111.6    3814209     217954.8
9      1954    88.3     3818116     172538.2
10     1955    117.0    3822025     228735.0
11     1956    115.2    3825936     225331.2
Total  21461   1095.7   41870521    2138143.9

b) n = 11, ΣXY = 2138143.9, ΣX² = 41870521, ΣY = 1095.7, ΣX = 21461

$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$

b = (11 × 2138143.9 - 21461 × 1095.7) / (11 × 41870521 - 21461²)
b = (23519582.9 - 23514817.7) / (460575731 - 460574521)
b = 4765.2 / 1210 = 3.9381818

$a = \bar{Y} - b\bar{X} = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}$

a = 1095.7/11 - 3.9381818 × 21461/11
a = 99.60909 - 7683.392692
a = -7583.783602

The regression equation of the line is y = a + bx:

y = -7583.78 + 3.9382x

c) The forecast production of the year 1958 is

y = a + bx
y = -7583.783602 + 3.9381818 × 1958
y = -7583.783602 + 7710.959964
y = 127.176362

Thus the forecast production of 1958 is 127.18 tonnes.

2. In a large city, the distribution of incomes per family had standard deviation $\sigma = 3250$. For a random sample of 400 families from this population, calculate the probability that the sample mean $\bar{x}$ is correct to within
(i) ±100
(ii) ±500

i) n = 400 (large), σ = 3250, allowed error = 100

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3250}{\sqrt{400}} = 162.5$

$Z_{\alpha/2}\,\sigma_{\bar{x}} = 100$

$Z_{\alpha/2} = \frac{100}{162.5} = 0.615 \approx 0.62$

From the normal tables, the area between Z = 0 and Z = 0.62 is 0.2324. The sample mean may lie on either side of the population mean, so the probability is 2 × 0.2324 = 0.4648, i.e. about 0.46.

ii) n = 400 (large), σ = 3250, allowed error = 500, $\sigma_{\bar{x}} = 162.5$

$Z_{\alpha/2}\,\sigma_{\bar{x}} = 500$

$Z_{\alpha/2} = \frac{500}{162.5} = 3.077 \approx 3.08$

From the normal tables, the area between Z = 0 and Z = 3.08 is 0.4990. Therefore the probability is 2 × 0.4990 = 0.998.
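The probability that a sample mean falls within a given distance of the population mean can also be computed directly instead of being read from tables. The sketch below is an illustration only; the function name `prob_mean_within` is an assumption, and it uses the identity P(|Z| ≤ z) = erf(z/√2) for the standard normal distribution, which relies on the sample being large enough for the normal approximation.

```python
import math


def prob_mean_within(tolerance, sigma, n):
    """Two-sided probability that the sample mean lies within
    +/- tolerance of the population mean (normal approximation)."""
    standard_error = sigma / math.sqrt(n)   # sigma / sqrt(n)
    z = tolerance / standard_error
    # P(-z <= Z <= z) for a standard normal variable
    return math.erf(z / math.sqrt(2))


for tolerance in (100, 500):
    p = prob_mean_within(tolerance, sigma=3250, n=400)
    print(f"within +/-{tolerance}: {p:.4f}")
```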

5. Correlation Theory
DEFN: Correlation is a term used to describe the degree of the relationship between variables.
- It determines how well a linear or other equation describes the relationship between variables.
- Variables are perfectly linearly correlated if all the observations lie on a straight line. If there is no obvious relationship, the variables are uncorrelated.
- Correlation can be either positive (y increases as x increases) or negative (y decreases as x increases).

[Scatter diagrams of Y against X: perfect correlation, positive correlation, negative correlation, no correlation]

Product moment correlation coefficient
Coefficient of correlation (r)

DEFN: It is a measure of the extent to which two variables are related. It measures the extent to which the independent variable x accounts for variability in the dependent variable y. The value of r varies from -1 to +1. If r = 0 there is no linear correlation. If r = +1 there is perfect positive correlation. If r = -1 there is perfect negative correlation.

CONCEPTS OF VARIATION
Consider the following scatter diagram.

[Scatter diagram: Y (1 to 7) against X (70 to 150), showing the observed points, the mean line Y = 5.2 and the line of regression Y = a + bx]

The line y = 5.2 superimposed on the scatter diagram represents the mean value of y, and it is the same for all values of x. There are of course variations of the values of y about $\bar{y}$. If $\bar{y}$ has been calculated correctly, the sum of the individual deviations must equal 0, i.e. $\sum (y - \bar{y}) = 0$. The total variation is therefore found by summing the squares of the individual deviations:

Total variation $= \sum (y - \bar{y})^2$

If we take the average of the squared deviations, we have the variance of y:

$S_y^2 = \frac{1}{n}\sum (y - \bar{y})^2 = \frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2$

Suppose also that a line of regression is inserted on the scatter diagram. Each point $\hat{Y}$ on the regression line differs from the mean $\bar{Y}$, and we consider the deviation $(\hat{Y} - \bar{Y})$, of which the sum of the squares is $\sum (\hat{Y} - \bar{Y})^2$. This variation is entirely explained by the regression line, since $\hat{Y}$ depends on X. There is yet another variation to be considered, i.e. the variation of the points on the scatter diagram around the regression line, which is not explained: $(Y - \hat{Y})$, whose sum of squares is $\sum (Y - \hat{Y})^2$. This forms part of the total variation.

Thus:
Total variation $= \sum (Y - \bar{Y})^2$
Explained variation $= \sum (\hat{Y} - \bar{Y})^2$
Unexplained variation $= \sum (Y - \hat{Y})^2$

Therefore Total variation = Explained variation + Unexplained variation, i.e.

$\sum (Y - \bar{Y})^2 = \sum (\hat{Y} - \bar{Y})^2 + \sum (Y - \hat{Y})^2$

Dividing both sides by the total variation:

$1 = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2} + \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2}$

The quantity

$\frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2} = \frac{\text{Explained variation}}{\text{Total variation}}$

is called the COEFFICIENT OF DETERMINATION and is denoted by $r^2$. It always lies between 0 and 1.

The coefficient of determination measures the proportion of the total variation that can be explained by the variation in x. 100% explained variation shows perfect correlation. It is also used to measure how good the line of best fit is. The value of the coefficient of determination should NOT be greater than 1 NOR less than 0, since we cannot have less than NO variation explained. When $r^2 = 0$, none of the variation is explained. If $\sum (Y - \hat{Y})^2 = 0$ then we have perfect CORRELATION, and if not then there is LESS THAN perfect correlation.

The quantity given by

$r = \pm\sqrt{\frac{\text{Explained variation}}{\text{Total variation}}} = \pm\sqrt{\frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}}$

(taking the sign of the slope b) is called the COEFFICIENT OF CORRELATION, and it ranges between +1 and -1.
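The split of the total variation into explained and unexplained parts can be checked numerically. The following sketch is illustrative only: the function name `variation_components` is an assumption, and the data reused here are the paper 1 and paper 2 marks from the regression exercise.

```python
def variation_components(xs, ys):
    """Return (total, explained, unexplained) variation of y
    about its mean, using the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    fitted = [a + b * x for x in xs]          # points on the regression line
    total = sum((y - mean_y) ** 2 for y in ys)
    explained = sum((f - mean_y) ** 2 for f in fitted)
    unexplained = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return total, explained, unexplained


paper1 = [42, 84, 50, 42, 33, 50, 69, 81, 50, 35]
paper2 = [31, 83, 42, 60, 28, 63, 59, 92, 73, 40]

total, explained, unexplained = variation_components(paper1, paper2)
r_squared = explained / total                 # coefficient of determination
print(round(total, 1), round(explained + unexplained, 1), round(r_squared, 4))
```

The first two printed values agree, which is the identity Total = Explained + Unexplained; it holds only when the fitted line is the least squares line.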


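The above method is, however, more involved, since it calls for the calculation of the regression line before taking the variations. The same quantity can be calculated by the method below.

$r = \frac{\text{Covariance}(x, y)}{\sqrt{\text{Variance}(x) \cdot \text{Variance}(y)}} = \frac{S_{xy}^2}{\sqrt{S_x^2 \cdot S_y^2}} = \frac{S_{xy}^2}{S_x \cdot S_y}$

But

Covariance (x, y), the joint variance of x and y: $S_{xy}^2 = \frac{\sum XY}{n} - \frac{\sum X \sum Y}{n^2}$

Variance (x): $S_x^2 = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2$

Variance (y): $S_y^2 = \frac{\sum Y^2}{n} - \left(\frac{\sum Y}{n}\right)^2$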

Substituting these expressions gives

$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left(n\sum X^2 - (\sum X)^2\right)\left(n\sum Y^2 - (\sum Y)^2\right)}}$

which is the correlation coefficient, normally referred to as Pearson's Product Moment Correlation Coefficient.

N/B. The coefficient of determination is $r^2$, and it measures the proportion of total variation that can be explained by the variation in x.

Example
Calculate Pearson's product moment correlation coefficient for the data below.

Quantity X   Cost Y     X²      Y²        XY
15           180        225     32400     2700
12           140        144     19600     1680
20           230        400     52900     4600
17           190        289     36100     3230
12           160        144     25600     1920
25           300        625     90000     7500
22           270        484     72900     5940
9            110        81      12100     990
18           240        324     57600     4320
30           320        900     102400    9600
ΣX = 180     ΣY = 2140  ΣX² = 3616   ΣY² = 501600   ΣXY = 42480

r = (10 × 42480 - 180 × 2140) / √[(10 × 3616 - 180²)(10 × 501600 - 2140²)]
r = 39600 / √[(36160 - 32400)(5016000 - 4579600)]
r = 39600 / √(3760 × 436400)
r = 39600 / √1640864000
r = 0.977594
r = 0.9776 (Pearson product moment correlation coefficient)

The coefficient of determination is given by r²:
r² = (0.9776)² = 0.9557 = 95.57%

Interpretation of r
The value suggests that there is a strong positive relationship between the variables.

Interpretation of r²
95.57% of the variation in cost (y) is explained by the change in quantity (x). 4.43% of the variation in cost (y) is not due to the changes in quantity but to other factors.

Exercise
Calculate the coefficient of correlation, and hence the coefficient of determination, for the data below. Hence interpret.

n     X      Y      X²      Y²      XY
1     15     60     225     3600    900
2     24     45     576     2025    1080
3     25     50     625     2500    1250
4     30     35     900     1225    1050
5     35     42     1225    1764    1470
6     40     46     1600    2116    1840
7     45     28     2025    784     1260
8     65     20     4225    400     1300
9     70     22     4900    484     1540
10    75     15     5625    225     1125
Total 424    363    21926   15123   12815

$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left(n\sum X^2 - (\sum X)^2\right)\left(n\sum Y^2 - (\sum Y)^2\right)}}$

r = (10 × 12815 - 424 × 363) / √[(10 × 21926 - 424²)(10 × 15123 - 363²)]
r = (128150 - 153912) / √[(219260 - 179776)(151230 - 131769)]
r = -25762 / √(39484 × 19461)
r = -25762 / 27719.99502
r = -0.929365

The coefficient of determination is given by r² = (-0.9294)²
r² = 0.8638 = 86.38%

Interpretation of r
The value suggests that there is a strong negative relationship between the variables.

Interpretation of r²
86.38% of the variation in y is explained by the change in x. 13.62% of the variation in y is not due to the changes in x but to other factors.

SPEARMAN'S RANK CORRELATION COEFFICIENT
Its purpose is to establish whether there is any form of association between two variables when they are arranged in a ranked form. Ranking is done in some way, e.g.
- Order of size
- Importance
- Merit etc.

When two variables are both ranked, the rankings should be very similar if there is a high degree of correlation between them. This degree is measured using Spearman's coefficient of rank correlation, denoted by $r_s$, i.e.

$r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$

Where D = difference between the ranks of corresponding pairs (x, y)
n = number of pairs (x, y)

OR, when two or more scores tie in their ranking:

$r_s = 1 - \frac{6\left(\sum D^2 + \frac{t^3 - t}{12}\right)}{n(n^2 - 1)}$

where t = number of tied scores.

N/B. When two or more scores tie for the same position (rank), they are assigned the average of their individual ranks.

Example
Eight students took E.D.A examinations in a college in both economics and accounts. Calculate the coefficient of rank correlation between the subjects if their results are as follows.

Student   Economics mark   Rank   Accounts mark   Rank
A         76               3      60              3
B         73               4      54              5
C         84               1      76              1
D         71               5      56              4
E         63               7      51              6.5
F         81               2      69              2
G         58               8      49              8
H         65               6      51              6.5

Solution
Arrange in descending order of rank, i.e.

Student   Economics rank   Accounts rank   D      D²
C         1                1               0      0
F         2                2               0      0
A         3                3               0      0
B         4                5               1      1
D         5                4               1      1
H         6                6.5             0.5    0.25
E         7                6.5             0.5    0.25
G         8                8               0      0
                                           ΣD² =  2.5

Note: ΣD² = 2.5, n = 8, t = 2

$r_s = 1 - \frac{6\left(\sum D^2 + \frac{t^3 - t}{12}\right)}{n(n^2 - 1)}$

r_s = 1 - 6(2.5 + (2³ - 2)/12) / (8(8² - 1))
r_s = 1 - 6(2.5 + 0.5) / (8 × 63)
r_s = 1 - (6 × 3)/(8 × 63) = 1 - 1/28
r_s = 27/28 = 0.9643
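The ranking and the tie correction can be sketched in Python as below. This is an illustration rather than part of the original notes: the function names `average_ranks` and `spearman_rs` are assumptions, and the correction term is applied once per group of tied scores (summing (t³ - t)/12 over the groups), which reduces to the formula above when there is a single tie.

```python
from collections import Counter


def average_ranks(scores):
    """Rank scores in descending order; tied scores share the
    average of the ranks they would otherwise occupy."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1              # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rs(xs, ys):
    """Spearman's rank correlation with the (t^3 - t)/12 tie correction."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    ties = sum((t ** 3 - t) / 12
               for data in (xs, ys)
               for t in Counter(data).values() if t > 1)
    return 1 - 6 * (sum_d2 + ties) / (n * (n ** 2 - 1))


economics = [76, 73, 84, 71, 63, 81, 58, 65]   # students A to H
accounts = [60, 54, 76, 56, 51, 69, 49, 51]
rs = spearman_rs(economics, accounts)
print(round(rs, 4))
```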

Exercise
2001 paper
1. a) Distinguish between regression and correlation. (2 marks)
b) A software development company is analysing demand for one of its software products. The company has collected the data below for the analysis.

Year    No. of products purchased
1993    51
1994    50
1995    53
1996    58
1997    60
1998    63
1999    60

Required
i. Determine the regression line relating quantity purchased to time.
ii. Determine the correlation coefficient for the relationship determined in (i).
iii. Determine the coefficient of determination and comment on your results.

1998 paper
2. a. Explain the following terms.
(i) Coefficient of rank correlation
(ii) Coefficient of determination
(iii) Spurious correlation
b. The data below shows the record of all expenses incurred in running Mr. Magari's car for the last 10 years.

Age of car (years)   1   2   3   4   5   6   7    8    9    10
Expenses             2   3   5   8   6   9   10   13   12   13

Required
i. What does the value of the coefficient of correlation indicate?
ii. What is the purpose of finding the coefficient of correlation in respect to the data above?

Solution
1. a) Regression is a term that is used to describe the relationship between two or more variables, whereas correlation refers to the degree of the relationship between two or more variables.

b) (i)

n      Year X    Products purchased Y   X²         Y²      XY
1      1993      51                     3972049    2601    101643
2      1994      50                     3976036    2500    99700
3      1995      53                     3980025    2809    105735
4      1996      58                     3984016    3364    115768
5      1997      60                     3988009    3600    119820
6      1998      63                     3992004    3969    125874
7      1999      60                     3996001    3600    119940
Total  13972     395                    27888140   22443   788480

The regression line is given by y = a + bx, where n = 7 and

$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$

b = (7 × 788480 - 13972 × 395) / (7 × 27888140 - 13972²)
b = (5519360 - 5518940) / (195216980 - 195216784)
b = 420 / 196 = 2.142857

$a = \bar{Y} - b\bar{X} = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}$

a = 395/7 - b × 13972/7
a = 56.42857 - 1996b
a = 56.42857 - 1996 × 2.142857
a = -4220.7143

The regression line is therefore y = -4220.7143 + 2.142857x.

ii) Correlation coefficient

$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left(n\sum X^2 - (\sum X)^2\right)\left(n\sum Y^2 - (\sum Y)^2\right)}}$

r = (7 × 788480 - 13972 × 395) / √[(7 × 27888140 - 13972²)(7 × 22443 - 395²)]
r = 420 / √(196 × 1076)
r = 420 / 459.2341451
r = 0.914566

(iii) The coefficient of determination is r² = (0.914566)²
r² = 0.8364

Interpretation of r²
83.64% of the variation in Y is explained by the change in X. 16.36% of the variation in Y is not due to the change in X but to other factors.

2. a) (i) Coefficient of rank correlation
Its purpose is to establish whether there is any form of association between two variables when the variables are arranged in a ranked form. When two variables are ranked, the rankings should be very similar if there is a high degree of correlation between them. It is denoted by $r_s$. Thus

$r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$

where D is the difference between the ranks of corresponding pairs.

(ii) Coefficient of determination
It is a measure calculated to find out how good the line of best fit really is. It is denoted by r² (because it is the square of the correlation coefficient, r). It calculates what proportion of the variation in the actual values of Y may be predicted by changes in the values of x. Thus r² is the ratio

$r^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{\sum (Y_E - \bar{Y})^2}{\sum (Y - \bar{Y})^2}$

Where $Y_E$ = estimate of y given by the regression equation for each value of x
$\bar{Y}$ = mean of the actual values of y
Y = individual actual values of y

(iii) Spurious correlation / nonsense correlation
It is a situation whereby two variables produce a high calculated r value yet have no causal relationship, e.g. the wheat harvest in America and the number of deaths by drowning in Britain.

b)

Age in years X   Expenses Y   X²     Y²     XY
1                2            1      4      2
2                3            4      9      6
3                5            9      25     15
4                8            16     64     32
5                6            25     36     30
6                9            36     81     54
7                10           49     100    70
8                13           64     169    104
9                12           81     144    108
10               13           100    169    130
ΣX = 55          ΣY = 81      ΣX² = 385   ΣY² = 801   ΣXY = 551

$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left(n\sum X^2 - (\sum X)^2\right)\left(n\sum Y^2 - (\sum Y)^2\right)}}$

r = (10 × 551 - 55 × 81) / √[(10 × 385 - 55²)(10 × 801 - 81²)]
r = (5510 - 4455) / √[(3850 - 3025)(8010 - 6561)]
r = 1055 / √(825 × 1449)
r = 1055 / 1093.35
r = 0.96

Interpretation of r
The value suggests that there is a strong positive relationship between the variables, i.e. the running expenses rise as the car gets older.
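The product moment formula can be evaluated directly from the two data columns. The sketch below is illustrative; the function name `pearson_r` is an assumption, and the data are the age and expenses figures from the question on Mr. Magari's car.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2)
                            * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


age = list(range(1, 11))                      # age of car in years
expenses = [2, 3, 5, 8, 6, 9, 10, 13, 12, 13]

r = pearson_r(age, expenses)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```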

6. Operations Research
DEFN: It is a research approach for solving problems in design and decision making.
N/B. One of the functions of managers is to make decisions on planning, controlling and monitoring business operations. Decision making involves:
i. Finding occasions for making decisions
ii. Finding possible courses of action
iii. Choosing among courses of action
iv. Evaluating past choices.

Managers require quality and timely information for decision making. However, there are some constraints which affect any decision to be made, e.g. limited personnel, limited finances, limited time and limited raw material. Therefore, operations research provides a set of techniques for analysing available information in order to make the best decision possible, or to solve a problem in the best way possible.

Characteristics of Operations Research (OR)
Operations Research Steps or Procedures
1) Identify the problem, objective function, constraints and corresponding decision variables
2) Formulate the model
3) Assess the model parameters
4) Obtain the solution of the model
5) Exploit all the solutions.

Operations Research Techniques
1) Linear programming
2) Network planning / analysis
3) Simulation
Others
4) Dynamic programming
5) Integer programming
6) Goal programming
7) Quadratic programming

Application areas
1) Optimisation in networks e.g. cabling, plumbing etc.
2) Transport planning e.g. in urban areas
3) Controlling traffic flows
4) Scheduling vehicles and service towns
5) Stock control systems.

OPERATIONS RESEARCH TECHNIQUES

1. LINEAR PROGRAMMING (L.P)


DEFN: It is a technique for allocating limited resources in the best way possible in order to achieve the desired objective. It is concerned with the utilisation of scarce resources to the best advantage.
NB. L.P is concerned with optimisation problems, e.g. minimising cost or maximising profit subject to some constraints (labour, machine time, space etc).

Example
1) Consider M industrial estates and N housing estates. The problem is to minimise the total transport cost of workers from residence to factory.
2) Consider M warehouses supplying N supermarkets. The problem is to minimise transport cost.

Linear Programming Assumptions
1) The problem must be capable of being stated in numeric terms.
2) The variables involved must have linear relationships, e.g. doubling the output doubles the labour required.
3) The problem must permit a choice between alternative courses of action.
4) There must be restrictions on the factors involved.

Linear Programming Concepts
1. Model
Is a representation / abstraction of the actual object or situation. There are physical and logical models etc. Examples of models include flowcharts and mathematical equations.
2. Objective function
Is a mathematical function which is to be optimised (minimised or maximised) according to the desired objectives.
3. Constraints
Are mathematical equations or inequalities which express the restrictions imposed on the problem.
4. Feasible solution
Are values of the decision variables which satisfy all the constraints.

FORMULATION OF L.P MODELS
Is the formation of the objective function and the constraints in terms of the decision variables.

Examples
1. A textile factory produces 3 types of cloth A, B and C, each of which is a mixture of wool and Terylene. The number of units of wool and Terylene needed to make a unit of each type, together with the profit to be made, is given below.

Type   Units of wool   Units of Terylene   Profit
A      3               2                   4
B      1               1                   1
C      4               3                   5

The maximum number of units of wool and Terylene are determined by the board, the current units being wool = 8000 units and Terylene = 3000 units. The management requires to maximise the profit. Formulate an L.P model for this problem.

Solution
Decision variables
Let x1 = number of units of A to be produced
x2 = number of units of B to be produced
x3 = number of units of C to be produced
z = total profit.

Then maximise profit subject to the wool constraint and the Terylene constraint.

L.P model
Max z = 4x1 + 1x2 + 5x3          (objective function)
Such that
Wool:      3x1 + 1x2 + 4x3 ≤ 8000     (constraint)
Terylene:  2x1 + 1x2 + 3x3 ≤ 3000     (constraint)
x1, x2, x3 ≥ 0                         (minor constraint)

2. A farmer requires to feed his pigs as cheaply as possible. The pigs require a diet consisting of a minimum amount of 3 nutrients n1, n2 and n3, which form a part of 4 commercially available foodstuffs f1, f2, f3 and f4. The number of units of each nutrient contained in each foodstuff and the cost per unit are given below.

              Foodstuff
Nutrient      f1     f2     f3     f4     Minimum requirement
n1            5      8      4      1      50
n2            3      8      7      5      42
n3            4      0      5      4      8
Cost / Unit   1.0    0.9    1.2    0.9

Required
Set up an L.P model which will help the farmer to solve his problem.

Solution
Decision variables
Let x1 = number of units of f1
x2 = number of units of f2
x3 = number of units of f3
x4 = number of units of f4
i.e. let x1, x2, x3 and x4 be the number of units of f1, f2, f3 and f4 to be bought, and z denote the total cost.

Problem: minimise cost subject to the constraints on nutrients n1, n2 and n3.

Then, min z = 1.0x1 + 0.9x2 + 1.2x3 + 0.9x4     (objective function)
Subject to the nutrient constraints
n1: 5x1 + 8x2 + 4x3 + 1x4 ≥ 50
n2: 3x1 + 8x2 + 7x3 + 5x4 ≥ 42
n3: 4x1 + 0x2 + 5x3 + 4x4 ≥ 8
x1, x2, x3, x4 ≥ 0

Exercise
1. A tailor has the following materials available: 16 units cotton, 11 units silk, 15 units wool. A suit requires 2 units cotton, 1 unit silk and 1 unit wool. A gown requires 1 unit cotton, 2 units silk and 3 units wool. If a suit sells at 30 dollars and a gown at 50 dollars, how many of each garment should the tailor make to obtain a maximum amount of money? Formulate an L.P model for the problem.

Materials       Available   Suit         Gown
Cotton          16          2            1
Silk            11          1            2
Wool            15          1            3
Selling price               30 dollars   50 dollars

Solution
Let x1 and x2 be the number of suits and gowns to be made respectively, and z the total amount of money.

                Suit         Gown         Available
Cotton          2            1            16
Silk            1            2            11
Wool            1            3            15
Selling price   30 dollars   50 dollars

Objective function: Max z = 30x1 + 50x2
Constraints:
Cotton: 2x1 + 1x2 ≤ 16
Silk:   1x1 + 2x2 ≤ 11
Wool:   1x1 + 3x2 ≤ 15
x1, x2 ≥ 0 (minor constraint)

2. A company has 3 factories and 3 distribution centres for a particular product. The capacity of each factory, the requirement of each centre and the cost of transporting the item from any factory to any centre are as given below.

              Centre
Factory       X      Y      Z      Availability
A             35     30     40     40
B             40     30     100    20
C             50     60     95     40
Requirement   50     20     30     100

Set up an L.P model which will help the company in deciding how the centres should be supplied.

              Centre
Factory       X          Y           Z           Availability
A             x1 (35)    x2 (30)     x3 (40)     x4 (40)
B             x5 (40)    x6 (30)     x7 (100)    x8 (20)
C             x9 (50)    x10 (60)    x11 (95)    x12 (40)
Requirement   50         20          30          100

Solution
Decision variables
Let x1

Methods of solving L.P Problems
There are two methods:
i. Graphical
ii. Simplex

1. Graphical method
This method is applicable only to those problems which involve only 2 decision variables. All the constraints are represented on the same graph, and their intersection gives the feasible region of the problem, i.e. the region within which feasible solutions are found.

Example
Refer to exercise 1 (the tailor problem).

Solution
Model formulation
NOTE. Summarise the information in a table as below.

          Suit         Gown         Available
Cotton    2            1            16
Silk      1            2            11
Wool      1            3            15
Price     30 dollars   50 dollars

Let x be the number of suits to be made and y be the number of gowns to be made. Let z be the total amount of money. Then maximise the total amount subject to the constraints.

L.P model
Max z = 30x + 50y          (objective function)
Subject to
Cotton: 2x + y ≤ 16
Silk:   x + 2y ≤ 11
Wool:   x + 3y ≤ 15
x, y ≥ 0 (minor constraint)

Graphical method
Plot each constraint as a straight line through its two intercepts.

(i) 2x + y = 16
x    0     8
y    16    0

(ii) x + 2y = 11
x    0      11
y    5.5    0

(iii) x + 3y = 15
x    0    15
y    5    0

[Graph: the lines 2x + y = 16, x + 2y = 11 and x + 3y = 15 drawn for x ≥ 0, y ≥ 0; the feasible region is bounded by the extreme points (0, 0), (0, 5), (3, 4), (7, 2) and (8, 0), and the area beyond the lines is the unwanted region]

Feasible (extreme) points x, y    z = 30x + 50y
i.   0, 0                          0
ii.  0, 5                          250
iii. 8, 0                          240
iv.  7, 2                          310
v.   3, 4                          290

Hence the maximum amount is 310 dollars at (7, 2), i.e. to maximise the amount of money, the tailor has to make 7 suits and 2 gowns.
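The graphical method amounts to evaluating the objective function at every corner of the feasible region. A minimal Python sketch of that idea is given below; it is an illustration only, the function name `solve_2var_lp` is an assumption, and it handles only maximisation problems with two variables and "less than or equal to" constraints.

```python
from itertools import combinations


def solve_2var_lp(objective, constraints):
    """Maximise c1*x + c2*y subject to a*x + b*y <= rhs and x, y >= 0
    by checking every corner (extreme point) of the feasible region.
    Returns (best value, x, y), or None if no feasible corner exists."""
    c1, c2 = objective
    # Non-negativity written in the same form: -x <= 0 and -y <= 0.
    lines = list(constraints) + [(-1, 0, 0), (0, -1, 0)]
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(lines, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue                          # parallel lines never intersect
        # Intersection of the two boundary lines (Cramer's rule).
        x = (r1 * b2 - r2 * b1) / det
        y = (a1 * r2 - a2 * r1) / det
        # Keep the point only if it satisfies every constraint.
        if all(a * x + b * y <= r + 1e-9 for a, b, r in lines):
            value = c1 * x + c2 * y
            if best is None or value > best[0]:
                best = (value, x, y)
    return best


# Tailor problem: cotton, silk and wool constraints as (a, b, rhs).
tailor = [(2, 1, 16), (1, 2, 11), (1, 3, 15)]
value, suits, gowns = solve_2var_lp((30, 50), tailor)
print(value, suits, gowns)
```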

Question KNEC 1999
1. Using the graphical method, find the solution that maximises 3x + 10y given that
2x + 4y ≤ 80
x + 5y ≤ 70
x ≥ 0, y ≥ 0

2. A linear programming problem is stated as
Max z = 12x1 + 10x2
Such that
6x1 + 8x2 ≤ 240
4x1 + 2x2 ≤ 80
x1 + x2 ≤ 30
x1 ≤ 12
x1, x2 ≥ 0
Determine the optimum solution using the graphical method.

Solution
1.
2x + 4y = 80
x    0     40
y    20    0

x + 5y = 70
x    0     70
y    14    0

x ≥ 0, y ≥ 0

Feasible (extreme) points x, y    3x + 10y
i.   0, 0                          0
ii.  0, 14                         140
iii. 20, 10                        160
iv.  40, 0                         120

The maximum value is 160, at x = 20, y = 10.

2. Let x1 and x2 be x and y respectively.

6x + 8y = 240
x    0     40
y    30    0

4x + 2y = 80
x    0     20
y    40    0

x + y = 30
x    0     30
y    30    0

x ≤ 12
x, y ≥ 0

Feasible (extreme) points x, y    Max z = 12x + 10y
i.   0, 0                          0
ii.  0, 30                         300
iii. 10, 20                        320
iv.  12, 16                        304
v.   12, 0                         144

The optimum solution is x1 = 10, x2 = 20, giving z = 320.

2. SIMPLEX METHOD
Is a step by step arithmetic method of solving L.P problems, moving progressively from a position of zero production, and hence zero contribution, until no further contribution can be made. It consists of a number of steps, each giving a better result than the one before, i.e. in a maximising problem a given step gives a greater value for z than the one before, and in a minimising problem a lower value than the one before.

SIMPLEX METHOD CONCEPTS
1. Slack variable
Is a non-negative variable which, when added to a "less than or equal to" inequality, makes it an equation. E.g. x1 + 3x2 ≤ b becomes x1 + 3x2 + x3 = b, where x3 is a slack variable.
2. Surplus variable
Is a non-negative variable which, when subtracted from a "greater than or equal to" inequality, makes it an equation. E.g. x1 + 3x2 ≥ b becomes x1 + 3x2 - x3 = b, where x3 is a surplus variable.

SIMPLEX METHOD PROCEDURE
1) Introduce slack or surplus variables to the constraints.

2) Construct the initial simplex table and identify the pivot point. To identify the pivot point, select the column with the largest positive indicator in row z (the pivot column). Divide each positive entry in this column into the corresponding element in the last column, i.e. the solution quantity. Take the minimum of these values and note the row (the pivot row). The value which is in both the pivot row and the pivot column is the pivot entry. Circle the pivot entry.
3) Construct table 2 from table 1 as follows: divide the values in the pivot row by the pivot entry, and rename the resulting row with the name of the pivot column, i.e. the new solution variable. Reduce all other rows in table 1 by multiples of the pivot row, such that all the values in the pivot column become 0 except the value in the pivot row.
4) Repeat steps 2 and 3 until all the indicators in row z are equal to or less than 0.

Example
Solve the model below using the simplex method.
Max z = 30x1 + 40x2
such that
x1 + x2 ≤ 50
2x1 + x2 ≤ 90
x1 + 2x2 ≤ 80
x1, x2 ≥ 0.

Solution
Step 1: Introduce slack variables x3, x4 and x5 to the major constraints, because they are "less than or equal to" inequalities. Then
Max z = 30x1 + 40x2
such that
x1 + x2 + x3 = 50
2x1 + x2 + x4 = 90
x1 + 2x2 + x5 = 80

Step 2: Draw the initial simplex table and identify the pivot point (entry).

Table I
                          DECISION      SLACK
       SOLUTION           VARIABLES     VARIABLES          SOLUTION
ROW    VARIABLE           X1     X2     X3     X4     X5   QUANTITY
I      x3                 1      1      1      0      0    50
II     x4                 2      1      0      1      0    90
III    x5                 1      (2)    0      0      1    80     Pivot row
IV     Z                  30     40     0      0      0    0      Row z

The pivot column is X2 (largest indicator, 40). The ratios are 50/1 = 50, 90/1 = 90 and 80/2 = 40, so the pivot row is row III and the pivot entry is 2.

Step 3: Draw table II from table I. Rename x5 in the pivot row by x2 and divide the values in that row by the pivot entry; then reduce the other rows.

Table II
                          DECISION      SLACK
       SOLUTION           VARIABLES     VARIABLES           SOLUTION
ROW                VARIABLE   X1     X2     X3     X4     X5     QUANTITY
Row I - Row III    x3         (1/2)  0      1      0      -1/2   10       Pivot row
Row II - Row III   x4         3/2    0      0      1      -1/2   50
Row III ÷ 2        x2         1/2    1      0      0      1/2    40
Row IV - 40 Row III  Z        10     0      0      0      -20    -1600

The pivot column is now X1 (indicator 10). The ratios are 10 ÷ 1/2 = 20, 50 ÷ 3/2 = 33.3 and 40 ÷ 1/2 = 80, so the pivot row is row I and the pivot entry is 1/2.

Step 4: Draw table III from table II.

Table III
                          DECISION      SLACK
       SOLUTION           VARIABLES     VARIABLES           SOLUTION
ROW                  VARIABLE   X1    X2    X3     X4    X5     QUANTITY
Row I × 2            x1         1     0     2      0     -1     20
Row II - 3/2 Row I   x4         0     0     -3     1     1      20
Row III - 1/2 Row I  x2         0     1     -1     0     1      30
Row IV - 10 Row I    Z          0     0     -20    0     -10    -1800

Table III is optimal because all the indicators in row z are less than or equal to 0.

Answer
The values of x1, x2 and z are read from the solution quantities, which are 20, 30 and -1800. The z value is the negative of the solution quantity in row z. Thus x1 = 20, x2 = 30, z = 1800.

Exercise
Solve using the simplex method.
Max z = 30x1 + 50x2
such that
2x1 + x2 ≤ 16
x1 + 2x2 ≤ 11
x1 + 3x2 ≤ 15
x1, x2 ≥ 0.

Step 1: Introduce slack variables to the major constraints.
Max z = 30x1 + 50x2
such that
2x1 + x2 + x3 = 16
x1 + 2x2 + x4 = 11
x1 + 3x2 + x5 = 15

Step 2: Draw the initial table.

                          DECISION      SLACK
       SOLUTION           VARIABLES     VARIABLES          SOLUTION
ROW    VARIABLE           X1     X2     X3     X4     X5   QUANTITY
I      x3                 2      1      1      0      0    16
II     x4                 1      2      0      1      0    11
III    x5                 1      (3)    0      0      1    15     Pivot row
IV     Z                  30     50     0      0      0    0

The pivot column is X2 and the pivot entry is 3 (ratios 16, 5.5 and 5).

Step 3: Draw table II.

                          DECISION      SLACK
       SOLUTION           VARIABLES     VARIABLES            SOLUTION
ROW                   VARIABLE   X1      X2    X3    X4    X5      QUANTITY
Row I - Row III       x3         5/3     0     1     0     -1/3    11
Row II - 2 Row III    x4         1/3     0     0     1     -2/3    1
Row III ÷ 3           x2         1/3     1     0     0     1/3     5
Row IV - 50 Row III   Z          40/3    0     0     0     -50/3   -250

Row z still contains a positive indicator (40/3 in column X1), so the table is not yet optimal. Continuing the iterations in the same way gives x1 = 7, x2 = 2 and z = 310, which agrees with the graphical solution of the same problem.
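The simplex procedure described above can be sketched in Python for checking hand calculations. This is an illustration under stated assumptions, not a general solver: the function name `simplex_max` is hypothetical, and it covers only maximisation problems whose constraints are all "less than or equal to" with non-negative right-hand sides. It follows the convention of these notes, in which row z holds the indicators and its solution quantity holds -z. Exact fractions are used so the intermediate tables match the hand-worked ones.

```python
from fractions import Fraction


def simplex_max(c, A, b):
    """Maximise c.x subject to A.x <= b, x >= 0 (b >= 0).
    Returns (values of the decision variables, maximum z)."""
    m, n = len(A), len(c)
    # Each row: decision variables | slack variables | solution quantity.
    rows = [[Fraction(v) for v in A[i]]
            + [Fraction(int(i == j)) for j in range(m)]
            + [Fraction(b[i])] for i in range(m)]
    z = [Fraction(v) for v in c] + [Fraction(0)] * (m + 1)
    basis = [n + i for i in range(m)]         # slack variables start in the basis
    while True:
        # Pivot column: largest positive indicator in row z.
        col = max(range(n + m), key=lambda j: z[j])
        if z[col] <= 0:
            break                             # optimal: no positive indicator left
        # Pivot row: minimum ratio of quantity to positive column entry.
        ratios = [(rows[i][-1] / rows[i][col], i)
                  for i in range(m) if rows[i][col] > 0]
        if not ratios:
            raise ValueError("problem is unbounded")
        _, r = min(ratios)
        pivot = rows[r][col]
        rows[r] = [v / pivot for v in rows[r]]
        # Reduce every other row (and row z) so the pivot column becomes 0.
        for i in range(m):
            if i != r:
                f = rows[i][col]
                rows[i] = [v - f * p for v, p in zip(rows[i], rows[r])]
        f = z[col]
        z = [v - f * p for v, p in zip(z, rows[r])]
        basis[r] = col
    x = [Fraction(0)] * n
    for i, var in enumerate(basis):
        if var < n:
            x[var] = rows[i][-1]
    return x, -z[-1]


x, z_max = simplex_max([30, 50], [[2, 1], [1, 2], [1, 3]], [16, 11, 15])
print([str(v) for v in x], z_max)
```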

Example (Refer to the example on the textile factory)
Use the simplex method to solve that L.P model.
Max z = 4x1 + x2 + 5x3
Such that
3x1 + x2 + 4x3 ≤ 8000
2x1 + x2 + 3x3 ≤ 3000
x1, x2, x3 ≥ 0

Solution
Step 1: Introduce slack variables x4 and x5 for the two constraints respectively.
Max z = 4x1 + x2 + 5x3
such that
3x1 + x2 + 4x3 + x4 = 8000
2x1 + x2 + 3x3 + x5 = 3000
x1, x2, x3, x4, x5 ≥ 0

Step II: Draw the initial table.

       SOLUTION    DECISION VARIABLES    SLACK VARIABLES   SOLUTION
ROW    VARIABLE    X1     X2     X3      X4     X5         QUANTITY
I      x4          3      1      4       1      0          8000
II     x5          2      1      (3)     0      1          3000     Pivot row
III    Z           4      1      5       0      0          0

The pivot column is X3 and the pivot entry is 3 (ratios 8000/4 = 2000 and 3000/3 = 1000).

Step III and IV: Table II

                     SOLUTION    DECISION VARIABLES     SLACK VARIABLES   SOLUTION
ROW                  VARIABLE    X1      X2      X3     X4     X5         QUANTITY
Row I - 4 Row II     x4          1/3     -1/3    0      1      -4/3       4000
Row II ÷ 3           x3          (2/3)   1/3     1      0      1/3        1000     Pivot row
Row III - 5 Row II   Z           2/3     -2/3    0      0      -5/3       -5000

The pivot column is X1 and the pivot entry is 2/3 (ratios 4000 ÷ 1/3 = 12000 and 1000 ÷ 2/3 = 1500).

Step V: Table III

                        SOLUTION    DECISION VARIABLES    SLACK VARIABLES   SOLUTION
ROW                     VARIABLE    X1    X2      X3      X4    X5          QUANTITY
Row I - 1/3 Row II      x4          0     -1/2    -1/2    1     -3/2        3500
Row II ÷ 2/3            x1          1     1/2     3/2     0     1/2         1500
Row III - 2/3 Row II    Z           0     -1      -1      0     -2          -6000

Answer
x1 = 1500
x2 = 0
x3 = 0
z = -(-6000) = 6000

Example KNEC 2001
A linear programming model has been formulated as follows:
Maximise: 70x + 150y + 250z          (objective function)
Subject to:
x + 4y + 5z ≤ 2500
2x + 3y + 6z ≤ 3500

(i) Explain why the problem above cannot be solved using the graphical method.
The graphical method is used to solve problems with two decision variables / unknowns, whereas this problem has three decision variables.
(ii) Set up the initial table for the problem.
(iii) Show the 2nd simplex table by applying one simplex calculation to the initial table.

Solution
Let x, y and z become x1, x2 and x3 respectively, and let z denote the value of the objective function.

Step 1: Introduce slack variables x4 and x5 for the two constraints respectively.
Max z = 70x1 + 150x2 + 250x3
such that
x1 + 4x2 + 5x3 + x4 = 2500
2x1 + 3x2 + 6x3 + x5 = 3500

Step 2: Draw the initial table.

       SOLUTION    DECISION VARIABLES     SLACK VARIABLES   SOLUTION
ROW    VARIABLE    X1     X2     X3       X4     X5         QUANTITY
I      x4          1      4      (5)      1      0          2500     Pivot row
II     x5          2      3      6        0      1          3500
III    Z           70     150    250      0      0          0

The pivot column is X3 and the pivot entry is 5 (ratios 2500/5 = 500 and 3500/6 = 583.3).

Step III and IV: Second table

                       SOLUTION    DECISION VARIABLES      SLACK VARIABLES   SOLUTION
ROW                    VARIABLE    X1      X2      X3      X4      X5        QUANTITY
Row I ÷ 5              x3          1/5     4/5     1       1/5     0         500
Row II - 6 Row I       x5          (4/5)   -9/5    0       -6/5    1         500      Pivot row
Row III - 250 Row I    Z           20      -50     0       -50     0         -125000

The next pivot column is X1 (indicator 20) and the pivot entry is 4/5 (ratios 500 ÷ 1/5 = 2500 and 500 ÷ 4/5 = 625).

7. NETWORK ANALYSIS / PLANNING


DEFN. Is a technique developed to aid management to plan and control projects. Shows interrelationship of various jobs/activities/tasks which make up the overall project and clearly identify the critical paths of the project. NB. Projects are affected by a number of factors i.e. time, resources and costs. Above factors, if not properly managed leads to projects:i. Not completed on time ii. Costing more than what was budgeted for iii. Not meeting user requirements Project mismanagement leads to washing of time, money and effort. BASIC NETWORK TERMINOLOGIES AND NOTATIONS. 1) ACTIVITY Is a network/job/work that takes time and resources. Is represented in the network by an arrowed line i.e. Head of arrow indicates where task ends and the Tail where task begins. The arrow points from left to right. Example of activity 1. build a wall 2. verify debtors 3. Acquire hardware. 2) EVENT 92

Is a point in time and indicates the start or finish of an activity or activities. Is represented in a network by a circle/node i.e.

Examples of events.
1) Wall built
2) Debtors verified.
3) DUMMY ACTIVITY
Is an activity which does not consume time or resources. Is used to show clear and logical dependencies between activities so as not to violate the rules for drawing networks. (Usually not listed with real activities.)
4) NETWORK DIAGRAM/CHART.
Is a diagram showing a combination of activities, dummy activities and events in a logical sequence according to the rules for drawing networks.

RULES FOR DRAWING NETWORKS.
1) A complete network should have only one point of entry (i.e. start event) and only one point of exit (i.e. finish event).
2) Every activity must have one preceding or tail event and one succeeding or head event. Many activities may use the same tail event and many may use the same head event. However, an activity must not share the same tail event and the same head event with any other activity.
3) No activity can start until its tail event is reached.
4) An event is not complete until all activities leading into it are complete.
[Figure: two activities, of 1 day and 3 days, leading into the same event.]
5) Loops i.e. series of activities leading back to the same event are not allowed because the essence of networks is a progression of activities always moving onwards in time.
[Figure: a loop of activities (Day 1 to Day 4) leading back to the same event. Not allowed.]
6) All activities must be tied into the network i.e. must contribute to the progression or be discarded as irrelevant. Activities which do not link into the overall project are termed danglers i.e.


[Figure: network containing a dangler.]

ACTIVITY IDENTIFICATION
Activities may be identified in the following ways:
a) Shortened description of the job e.g. plaster wall, order timber etc.
b) Alphabetical or numeric codes e.g. A, B, C or 100, 101, 102 etc.
c) Identification by tail and head events e.g. 0-1, 1-2, 2-3, 3-4 where the numbers 0, 1, 2, 3, 4 etc. are used for events.

Example
Draw a network diagram for the activities below.

Activity   Preceding activity
A          -
B          -
C          A
D          B and C
E          B

Solution
[Figure: network diagram for activities A to E, with numbered events and dummy activities.]

Exercise

Activity   Preceding activity
A          -
B          -
C          A
D          B
E          B
F          B
G          C, D
H          F
I          F

[Figure: network diagram for activities A to I, with events 0 to 6 and a dummy activity d.]

Exercise
1.
Activity   Preceding activity
A          -
B          -
C          A
D          A
E          A
F          C
G          C
H          C
J          B, D
K          F, J
L          E, H, G, K
M          E, H
N          L, M

[Figure: network diagram for activities A to N, with numbered events and a dummy activity d.]

2.
Activity   Preceding activity
A          -
B          A
C          A
D          A
E          B
F          C
G          E
H          F
J          G, H
K          C
L          D
M          J, K, L

[Figure: network diagram for activities A to M, with events 0 to 9.]

TIME ANALYSIS
Once the logic of the network has been agreed and the outline network drawn, it can be completed by inserting the activity duration times. The multiple time estimate approach is used to estimate the time for each activity using the following three estimates:
i. Optimistic time (to)
ii. Most likely time (tm)
iii. Pessimistic time (tp)
These time estimates are then combined to give an expected time (te) and the accepted formula is
te = (to + 4tm + tp) / 6
1. Optimistic Time
Is the shortest possible time in which an activity can be completed.
2. Pessimistic Time.
Is the longest time an activity is likely to take.

3. Most likely Time.
Is the time an activity is likely to take to be completed under normal circumstances. Usually lies between the optimistic and pessimistic times.

Example
Assume that the three estimates for an activity are:
to = 11 days
tp = 18 days
tm = 15 days
Determine the expected time.
Solution
te = (to + 4tm + tp) / 6 = (11 + 4 x 15 + 18) / 6 = 89 / 6 = 14.8 days
NB: Time units could be minutes, hours, days, weeks, months etc. All time estimates within a project must have the same units.

CRITICAL PATH
Is the chain of activities with the longest duration i.e. the longest path through the network. The critical path gives the shortest time in which the whole project can be completed. It is possible to have more than one critical path in a network, and a critical path may run through a dummy activity.
NB. Activities on the critical path must be started and completed on time.
[Figure: example network with activities A to E and a dummy d, with a table of paths whose durations are 7, 7 and 5.]
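The expected-time formula from the worked example above can be checked with a few lines of Python. This is a minimal sketch; the function name `expected_time` is an assumption made for illustration.

```python
def expected_time(optimistic, most_likely, pessimistic):
    """Expected activity time te = (to + 4*tm + tp) / 6."""
    return (optimistic + 4 * most_likely + pessimistic) / 6


# Estimates from the example: to = 11, tm = 15, tp = 18 (days).
te = expected_time(11, 15, 18)
print(round(te, 1))  # 14.8
```

The most likely time carries four times the weight of either extreme, so the result always lies between the optimistic and pessimistic estimates.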

EARLIEST START TIME (EST)
Is the earliest possible time at which a succeeding activity can start.
LATEST START TIME (LST)
Is the latest possible time at which a succeeding activity can start if the whole project is to be completed on time.

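NB. The earliest start time of the succeeding activity is the earliest finish time (EFT) of the preceding activity. The latest start time of the succeeding activity is the latest finish time (LFT) of the preceding activity.
[Figure: node notation. Each event node shows the node identifier/label, the EST and the LST; the arrow between two nodes carries the activity name and the activity duration. The node at the tail of activity B shows the EST and LST for B; the node at its head shows the EST and LST for C, which are also the EFT and LFT for B.]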

CALCULATING THE EST
Forward pass method
a. The EST of a head event is obtained by adding the linking activity duration onto the EST of the tail event i.e. starting from event 0, time 0 and working forward through the network.
b. Where two or more routes meet at an event, the longest route time must be taken.
c. The earliest start time in the finish event is the project duration and is the shortest time in which the whole project can be completed.

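Example
Insert the EST for the network diagram below.
[Figure: network with events 0 to 5 and activities A (0-1, duration 1), B (1-2, duration 2), C (1-3, duration 3), D (2-4, duration 4), E (3-4, duration 1) and F (4-5, duration 2).]

Event   EST
0       0
1       0 + 1 = 1
2       1 + 2 = 3
3       1 + 3 = 4
4       7 (the longer of 3 + 4 = 7 and 4 + 1 = 5)
5       7 + 2 = 9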
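The forward pass can be sketched in Python as follows. The activity list below is an assumption read off the example figure (events 0 to 5, with every tail event numbered lower than its head event), and the names `ACTIVITIES` and `forward_pass` are illustrative only.

```python
# Each activity is (tail event, head event, duration); activity-on-arrow form.
ACTIVITIES = {
    "A": (0, 1, 1),
    "B": (1, 2, 2),
    "C": (1, 3, 3),
    "D": (2, 4, 4),
    "E": (3, 4, 1),
    "F": (4, 5, 2),
}


def forward_pass(activities):
    """Return the EST of every event, assuming tail number < head number."""
    est = {}
    # Sorting by tail event guarantees a tail's EST is final before it is used.
    for tail, head, duration in sorted(activities.values()):
        est.setdefault(tail, 0)
        # Where routes meet at an event, keep the longest route time.
        est[head] = max(est.get(head, 0), est[tail] + duration)
    return est


est = forward_pass(ACTIVITIES)
print(est)
```

The EST of the finish event (event 5 here) is the project duration.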

CALCULATING THE LATEST START TIME.
Backward pass method
a. Starting at the finish event, insert the latest start time (i.e. latest start time = earliest start time of the finish event) and work backward through the network, deducting each activity duration from the previously calculated latest start time.
b. Where the tails of two or more activities join an event, the latest start time for that event is the lowest number.
E.g. insert the LST for the diagram below.

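[Figure: the network from the EST example with both EST and LST inserted at each event.]

Event   EST   LST
0       0     0
1       1     1
2       3     3
3       4     6
4       7     7
5       9     9

Exercise
i. Draw a network diagram for the activities below.
ii. Insert the EST and LST in the network diagram.

Activity   Preceding activity   Activity duration
A          -                    9
B          -                    3
C          A                    8
D          A                    2
E          A                    3
F          C                    2
G          C                    6
H          C                    1
J          B, D                 4
K          F, J                 1
L          E, H, G, K           2
M          E, H                 3
N          L, M                 4

Solution
[Figure: network diagram for activities A to N with a dummy activity d, showing the EST and LST at each event.]

Event (reached on completion of)   EST   LST
Start                              0     0
A                                  9     9
C                                  17    17
B, D                               11    18
F, J                               19    22
E, H                               18    22
G, K (and dummy d)                 23    23
L, M                               25    25
N (finish)                         29    29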

Critical path = A C G L N (these activities do not have spare time)
Critical path duration = project duration = 29.

CRITICAL PATH IMPLICATIONS
Activities along the critical path are vital and must start and finish on time (their earliest and latest times coincide); otherwise the project will be delayed. Non-critical activities have spare time or float available i.e. they can be delayed a little without delaying the project.
To reduce project duration, the time of one or more critical activities must be reduced, perhaps by:
i. Using more labour
ii. Using more equipment
iii. Using better equipment
iv. Working overtime.
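The forward and backward passes for the exercise network, and the identification of the critical path, can be sketched as below. The event numbers in `ARCS` are assumptions chosen for this illustration (the source diagram is not fully legible); only the durations and precedence relations come from the exercise table. The approach assumes every tail event has a lower number than its head event.

```python
# activity: (tail event, head event, duration); "dummy" has zero duration.
ARCS = {
    "A": (0, 1, 9), "B": (0, 3, 3), "C": (1, 2, 8), "D": (1, 3, 2),
    "E": (1, 5, 3), "F": (2, 4, 2), "G": (2, 6, 6), "H": (2, 5, 1),
    "J": (3, 4, 4), "K": (4, 6, 1), "dummy": (5, 6, 0),
    "L": (6, 7, 2), "M": (5, 7, 3), "N": (7, 8, 4),
}


def event_times(arcs):
    """Return (EST, LST) dictionaries keyed by event number."""
    est = {}
    for tail, head, d in sorted(arcs.values()):
        est.setdefault(tail, 0)
        est[head] = max(est.get(head, 0), est[tail] + d)  # longest route in
    finish = max(est)
    lst = {finish: est[finish]}  # LST of finish event = its EST
    # Work backwards: arcs with the highest head event are handled first.
    for tail, head, d in sorted(arcs.values(), key=lambda a: a[1], reverse=True):
        lst[tail] = min(lst.get(tail, float("inf")), lst[head] - d)
    return est, lst


est, lst = event_times(ARCS)
# An activity is critical when LFT - EST - D (its total float) is zero.
critical = [name for name, (tail, head, d) in ARCS.items()
            if name != "dummy" and lst[head] - est[tail] - d == 0]
print(critical, est[max(est)])
```

With these inputs the sketch reproduces the critical path A C G L N and the project duration of 29 found above.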

DEFN: FLOAT
Is the spare time for a non-critical activity. Critical activities do not have float. There are three types of float:
i. Total float
ii. Free float
iii. Independent float

i. Total float
Total spare time available to an activity i.e.
Total float = latest finish time - earliest start time - activity duration
Total Float = LFT - EST - D
ii. Free float
Time an activity can be delayed before it is started without affecting the start of subsequent activities i.e.
Free float = earliest finish time - earliest start time - activity duration
Free Float = EFT - EST - D
iii. Independent float
Time an activity can be delayed when all preceding activities are completed as late as possible and all succeeding activities are started as early as possible i.e.
Independent float = earliest finish time - latest start time - activity duration
Independent Float = EFT - LST - D

Example
Consider part of the network diagram below.
[Figure: part of a network showing activities F, J, K and G. The tail event of K has EST 10 and LST 20; the head event of K has EST 40 and LST 50.]

Example
Calculate all the floats for activity K, whose duration is 10.
Solution
i. Total Float = LFT - EST - D = 50 - 10 - 10 = 30
ii. Free Float = EFT - EST - D = 40 - 10 - 10 = 20
iii. Independent Float = EFT - LST - D = 40 - 20 - 10 = 10

Exercise

Activity   Preceding activity   Duration
A          -                    9
B          -                    3
C          A                    8
D          A                    2
E          A                    3
F          C                    2
G          C                    6
H          C                    1
J          B, D                 4
K          F, J                 1
L          E, H, G, K           2
M          E, H                 3
N          L, M                 4

Required
i. Draw the network diagram.
ii. Determine the critical path.
iii. Calculate all the floats for each activity.

Solution
i) [Figure: network diagram for activities A to N with a dummy activity d. EST/LST at the events: start 0/0; after A 9/9; after C 17/17; after B and D 11/18; after F and J 19/22; after E and H 18/22; after G and K 23/23; after L and M 25/25; finish 29/29.]

ii) The critical path is A C G L N.
iii) Floats for each activity
Activity A
i. Total float = LFT - EST - D = 9 - 0 - 9 = 0
The critical activities do not have float, thus activities A, C, G, L and N do not have float.
Activity B
i. Total float = LFT - EST - D = 18 - 0 - 3 = 15
ii. Free float = EFT - EST - D = 11 - 0 - 3 = 8
iii. Independent float = EFT - LST - D = 11 - 0 - 3 = 8
Activity D
i. Total float = LFT - EST - D = 18 - 9 - 2 = 7
ii. Free float = EFT - EST - D = 11 - 9 - 2 = 0
iii. Independent float = EFT - LST - D = 11 - 9 - 2 = 0
Activity E
i. Total float = LFT - EST - D = 22 - 9 - 3 = 10
ii. Free float = EFT - EST - D = 18 - 9 - 3 = 6
iii. Independent float = EFT - LST - D = 18 - 9 - 3 = 6
Activity F
i. Total float = LFT - EST - D = 22 - 17 - 2 = 3
ii. Free float = EFT - EST - D = 19 - 17 - 2 = 0
iii. Independent float = EFT - LST - D = 19 - 17 - 2 = 0
Activity H
i. Total float = LFT - EST - D = 22 - 17 - 1 = 4
ii. Free float = EFT - EST - D = 18 - 17 - 1 = 0
iii. Independent float = EFT - LST - D = 18 - 17 - 1 = 0
Activity J
i. Total float = LFT - EST - D = 22 - 11 - 4 = 7
ii. Free float = EFT - EST - D = 19 - 11 - 4 = 4
iii. Independent float = EFT - LST - D = 19 - 18 - 4 = -3
Activity K
i. Total float = LFT - EST - D = 23 - 19 - 1 = 3
ii. Free float = EFT - EST - D = 23 - 19 - 1 = 3
iii. Independent float = EFT - LST - D = 23 - 22 - 1 = 0

Activity M
i. Total float = LFT - EST - D = 25 - 18 - 3 = 4
ii. Free float = EFT - EST - D = 25 - 18 - 3 = 4
iii. Independent float = EFT - LST - D = 25 - 22 - 3 = 0

Summary

Activity   LST   EST   EFT   LFT   D   Total float     Free float      Indep. float
                                       = LFT-EST-D     = EFT-EST-D     = EFT-LST-D
B          0     0     11    18    3   18-0-3=15       11-0-3=8        11-0-3=8
D          9     9     11    18    2   18-9-2=7        11-9-2=0        11-9-2=0
E          9     9     18    22    3   22-9-3=10       18-9-3=6        18-9-3=6
F          17    17    19    22    2   22-17-2=3       19-17-2=0       19-17-2=0
G          17    17    23    23    6   23-17-6=0       23-17-6=0       23-17-6=0
H          17    17    18    22    1   22-17-1=4       18-17-1=0       18-17-1=0
J          18    11    19    22    4   22-11-4=7       19-11-4=4       19-18-4=-3
K          22    19    23    23    1   23-19-1=3       23-19-1=3       23-22-1=0
M          22    18    25    25    3   25-18-3=4       25-18-3=4       25-22-3=0
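The three float formulas can be expressed as one small Python function. This is a sketch only: the function name `floats` and its parameter names are assumptions, where the "tail" values are the EST and LST at the start event of the activity and the "head" values are the EFT and LFT at its end event.

```python
def floats(tail_est, tail_lst, head_est, head_lst, duration):
    """Return (total, free, independent) float for one activity."""
    total = head_lst - tail_est - duration        # LFT - EST - D
    free = head_est - tail_est - duration         # EFT - EST - D
    independent = head_est - tail_lst - duration  # EFT - LST - D
    return total, free, independent


# Activity K from the partial-network example: tail 10/20, head 40/50, D = 10.
print(floats(10, 20, 40, 50, 10))
# Activity J from the exercise: tail 11/18, head 19/22, D = 4.
print(floats(11, 18, 19, 22, 4))
```

A negative independent float, as for activity J, means the activity has no independent float at all.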

PERT MODEL (Project Evaluation and Review Technique)
DEFN: Is a model or technique that is used to analyse projects characterised by uncertainty. Is used to determine:
- Probability of project completion by a specified date.
- Measures of variability i.e. standard deviation.

Calculation of the Variability of Project Duration and Expected Duration (Mean).
Calculating the standard deviation of the project duration i.e. the duration of the critical path involves the following steps:
1) Determine the standard deviation of the duration of each activity on the critical path using the formula below.
σi = (tp - to) / 6, so that σi² = ((tp - to) / 6)²
Where σi = standard deviation of activity i
tp = pessimistic time of activity i
to = optimistic time of activity i
i = critical activities.
2) Determine the standard deviation of the total duration of the critical path on the basis of the above information i.e. sum the variances of all critical activities and take the square root.
σ² (project duration) = σ1² + σ2² + σ3² + ... + σn² = Σσi²
σ (project duration) = √(σ1² + σ2² + σ3² + ... + σn²) = √(Σσi²)

NB: Expected project duration is given by the sum of the expected durations of all critical activities i.e.
te (project) = te1 + te2 + te3 + ... + ten = Σtei
Where tei = (to + 4tm + tp) / 6 for activity i.

Calculation of probability of completion by a specified date (TIME)
Armed with information about the expected project duration (T) and the standard deviation (σ) of the critical path duration, which is taken to be normally distributed, we can compute the probability of project completion by a specified date (D) as follows.
i. Find z = (D - T) / σ i.e. standardise the specified date.
ii. Obtain the probability from the standard normal distribution table.

[Figure: normal curve centred on T with the area to the left of D (z) shaded.]
Use the tables to get the area that corresponds to z.

C.P.M MODEL (CRITICAL PATH METHOD)
DEFN: Is a model or technique used to analyse projects which are relatively risk free. Is used to determine the project schedule which minimises total cost by:
1) Variation of activity time as a result of resource assignment.
2) Variation of activity cost as a result of resource assignment.
C.P.M analysis seeks to examine the consequences of crashing on total cost i.e. the relationship between total cost and project duration.

Example
The time estimates for each activity in a project are given below:

                          Time estimates
Activity   Preceded by    Optimistic   Most likely   Pessimistic
A          -              5            7             15
B          -              3            5             7
C          -              2            4             6
D          A              1            2             3
E          A              6            9             18
F          C              1            6             11
G          B, E, F        1            3             5
H          B, E, F        4            7             10
I          D, G           1            3             5

Required
i. Construct the network diagram and determine the critical path activities and the expected duration of the project.
ii. What is the probability that the project will be completed within 27 days?
iii. Construct a 95% confidence interval for the expected completion time.

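Solution
i. [Figure: network diagram for activities A to I with the expected durations on the arrows and the EST and LST at each event; the finish event shows 25/25.]

Activity   Expected duration te = (to + 4tm + tp) / 6
A          8
B          5
C          4
D          2
E          10
F          6
G          3
H          7
I          3

Critical path: A E H
Expected duration of the project = teA + teE + teH = 8 + 10 + 7 = 25 days.

ii. z = (D - T) / σ, where D = 27 days and T = 25 days.
σ (project) = √(σA² + σE² + σH²)
= √(((15 - 5) / 6)² + ((18 - 6) / 6)² + ((10 - 4) / 6)²)
= √((100 + 144 + 36) / 6²)
= √(280 / 36)
= 2.789
i.e. z = (27 - 25) / 2.789 = 2 / 2.789 = 0.717
Probability of project completion within 27 days = area to the left of z = 0.717, which is approximately 0.76.

iii. 95% confidence interval = T ± zσ
With 0.025 in each tail, the equivalent table value is z = 1.96, so the interval is 25 ± 1.96 x 2.789 = 25 ± 5.47 i.e. from 19.53 to 30.47 days.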
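The PERT calculation above can be reproduced with Python's standard library. This is a hedged sketch: the variable names are illustrative, and the normal probability is computed from the error function `math.erf` instead of being read from a printed table, so the last decimal place may differ slightly from a table lookup.

```python
from math import erf, sqrt

# (optimistic, most likely, pessimistic) for the critical activities A, E, H.
critical = {"A": (5, 7, 15), "E": (6, 9, 18), "H": (4, 7, 10)}

# Expected project duration: sum of (to + 4*tm + tp) / 6 over the critical path.
expected = sum((to + 4 * tm + tp) / 6 for to, tm, tp in critical.values())

# Project standard deviation: square root of the summed activity variances.
sigma = sqrt(sum(((tp - to) / 6) ** 2 for to, tm, tp in critical.values()))

deadline = 27
z = (deadline - expected) / sigma
# Standard normal cumulative probability P(Z <= z).
probability = 0.5 * (1 + erf(z / sqrt(2)))

# 95% interval for the completion time, using the table value 1.96.
low, high = expected - 1.96 * sigma, expected + 1.96 * sigma

print(expected, round(sigma, 3), round(z, 3), round(probability, 3))
print(round(low, 2), round(high, 2))
```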


8. TIME SERIES ANALYSIS
DEFN: Time Series.
Arrangement of statistical data in chronological order. A set of values which occur sequentially in time. A time series depicts the relationship between two variables, one of them being time and the other the occurrence of a particular event e.g. the population of a country over the years, deaths over the years, births over the years etc.
If data are arranged by time (hours, days, weeks, months, years) the value of the variable changes from time to time. These fluctuations are not caused by a single force, but are due to the net effect of a multiplicity of forces pushing the values up and down.
Examples
The retail prices of a particular commodity are influenced by a number of factors:
- Demand
- Supply
- Soil profile
- Irregular factors / random factors.

COMPONENTS OF A TIME SERIES.
Forces affecting time series data can broadly be classified into:
i. Secular trend
ii. Seasonal variations
iii. Cyclic variations
iv. Random/irregular movements (factors)

i. Secular trend.
Is the general tendency of the data to increase or decrease during a long period of time. Trend is the result of those forces which are either constant or change very gradually over a long period of time e.g. literacy.
Examples of upward tendency are:

- Population data
- Agricultural production
- Currency in circulation data.
Examples of downward tendency are:
- Birth and death data
- Epidemics i.e. we come up with better techniques of treating diseases.
Examples of forces causing trend are:
- Advancement in medical sciences
- Better medical facilities
- Literacy
- Higher standards of living.

ii) Seasonal variations.
Are periodic and regular movements caused by rhythmic forces inherent in most time series. In most cases the period of one cycle is less than one year. Seasonal variations are attributed to two forces:
1. Natural forces
- Rainfall
- Weather
2. Manmade conventions
- Marriages
- Festivals e.g. Christmas, Easter.
Examples of seasonal variations are seen in the following:
- Agricultural commodities
- Sales and profits.

iii) Cyclic variations
Are the oscillatory movements in a time series with a period of more than 1 year e.g. 7 to 11 years. One complete movement or fluctuation period forms a cycle. Cyclic movements in a time series are generally attributed to the business cycle i.e. a four phase cycle composed of:
- Period of prosperity/boom
- Period of recession
- Period of depression
- Period of recovery.

Examples of cyclic variations are economic and commercial series relating to:
- Prices
- Production
- Wages
i.e. all affected by the business cycle.

TIME SERIES ANALYSIS
Involves:
- Identification of the forces or components at work, the net effect of whose interaction is exhibited by the movement of a time series.
- Isolating, studying, analysing and measuring these components independently i.e. by holding other things constant.

MODELS OF TIME SERIES
There are two models that are used to describe time series i.e. the additive model and the multiplicative model.
1. Additive model.
This model assumes that an observation (Y) in a time series is the result of adding algebraically the trend (T), the seasonal factor/seasonal variation (S) and the residual factor (R). This may be expressed as:
Y = T + S + R, where the trend T covers both secular and cyclic movements.
2. Multiplicative model.
This assumes that an observation (Y) is the product of the trend (T), the seasonal factor (S) and the residual factor (R). This may be expressed as:
Y = T x S x R.

Methods of Decomposition.
A time series can be decomposed into its components using the following methods.
i. Trend
- Graphic method
- Method of semi-averages
- Method of moving averages
- Method of curve fitting by the principle of least squares.
ii. Seasonal
- Simple averages
- Moving averages
- Link relatives.
iii. Cyclic
- Moving averages.

DETERMINATION OF TREND
i. Graphic method.
A free hand smooth curve is obtained on plotting the values of Y against time. It allows us to form an idea about the general trend of the series.

Smoothing of the curve eliminates other components i.e. regular and irregular fluctuations.
Advantages
- The method is simple and flexible.
Disadvantages
- It is subjective or biased (i.e. different people can come up with different curves).

ii. Method of semi-averages.
Divide the whole data into two parts with respect to time e.g. given Y for t from 1961 to 1972, the two parts are the data from 1961 to 1966 and from 1967 to 1972. For an odd number of years, the two parts are obtained by omitting the value corresponding to the middle year e.g. for 1961-1971 we have 1961 to 1965 and 1967 to 1971, 1966 being omitted.
Next, we compute the arithmetic mean for each part (i.e. x̄1 and x̄2) and plot these means against the mid values of the respective periods covered in each part.

Example
Fit a trend line to the following data by the method of semi-averages.

Year (X)   Bank clearances ('000) (Y)
1956       53
1957       79
1958       76
1959       66
1960       69
1961       94
1962       105
1963       87
1964       79
1965       104
1966       97
1967       92
1968       101

Solution
n = 13 i.e. an odd number, so the middle year (1962) is omitted.
Part I = 1956-1961
Part II = 1963-1968
Part I: x̄1 = (53 + 79 + 76 + 66 + 69 + 94) / 6 = 437 / 6 = 72.83
Part II: x̄2 = (87 + 79 + 104 + 97 + 92 + 101) / 6 = 560 / 6 = 93.33

iii. Moving average method.
It involves measurement of trend by smoothing out the fluctuations of the data by means of a moving average. A moving average of period m is the series of arithmetic means of m terms at a time, starting with the 1st m terms, then the m terms from the 2nd to the (m+1)th term etc. If m is odd (m = 2k + 1) the moving average is placed against the middle value of the time interval it covers.

Example
The data below shows the takings of a shop during the last 3 weeks.

         Mon   Tue   Wed   Thur   Fri
Week 1   128   168   80    190    230
Week 2   142   186   84    201    240
Week 3   150   194   86    210    262

Required: By use of a moving average, find the trend.
Solution
The period of the daily data is 5. We therefore use a 5 term moving average.

Day         Takings (x)   5 term moving total   Moving average (trend values)
Monday      128
Tuesday     168
Wednesday   80            796                   159.2
Thursday    190           810                   162.0
Friday      230           828                   165.6
Monday      142           832                   166.4
Tuesday     186           843                   168.6
Wednesday   84            853                   170.6
Thursday    201           861                   172.2
Friday      240           869                   173.8
Monday      150           871                   174.2
Tuesday     194           880                   176.0
Wednesday   86            902                   180.4
Thursday    210
Friday      262

NB: To find the trend, we calculate the moving average i.e. divide the moving total by the period.

Exercise
(i) Use the least squares method to calculate the line of best fit (regression line) for the given data.

X     Y     XY      X²
1     128   128     1
2     168   336     4
3     80    240     9
4     190   760     16
5     230   1,150   25
6     142   852     36
7     186   1,302   49
8     84    672     64
9     201   1,809   81
10    240   2,400   100
11    150   1,650   121
12    194   2,328   144
13    86    1,118   169
14    210   2,940   196
15    262   3,930   225
Σx = 120   ΣY = 2,551   Σxy = 21,615   Σx² = 1,240

The regression line is given by Y = a + bx, where
b = (nΣxy - ΣxΣY) / (nΣx² - (Σx)²)
a = Ȳ - b x̄, with Ȳ = ΣY / n and x̄ = Σx / n
b = (15 x 21,615 - 120 x 2,551) / (15 x 1,240 - 120²) = 18,105 / 4,200 = 4.3107
a = 2,551 / 15 - b (120 / 15) = 170.067 - 4.3107(8) = 135.58
The line of best fit is Y = 135.58 + 4.31x

(ii) Use the least squares method to get the trend values for the given data.
The trend values are obtained by substituting the time values x = 1, 2, ..., 15 (not the takings) into Y = 135.58 + 4.31x.

X     Y     Trend value = 135.58 + 4.31x
1     128   139.89
2     168   144.20
3     80    148.51
4     190   152.82
5     230   157.13
6     142   161.44
7     186   165.75
8     84    170.06
9     201   174.37
10    240   178.68
11    150   182.99
12    194   187.30
13    86    191.61
14    210   195.92
15    262   200.23
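The least squares coefficients for the shop takings can be checked with a short Python sketch. The variable names are illustrative; the formulas are the standard ones for a straight line Y = a + bx, with x taken as the day number 1 to 15.

```python
takings = [128, 168, 80, 190, 230, 142, 186, 84, 201, 240, 150, 194, 86, 210, 262]
days = list(range(1, len(takings) + 1))
n = len(days)

sum_x = sum(days)
sum_y = sum(takings)
sum_xy = sum(x * y for x, y in zip(days, takings))
sum_x2 = sum(x * x for x in days)

# b = (n*Sxy - Sx*Sy) / (n*Sx2 - Sx^2);  a = mean(y) - b*mean(x)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

# Trend values come from substituting the day number, not the takings.
trend = [a + b * x for x in days]

print(sum_x, sum_y, sum_xy, sum_x2)
print(round(a, 2), round(b, 4))
```

Because the sketch keeps full precision for a and b, its trend values can differ in the second decimal place from values computed with the rounded line Y = 135.58 + 4.31x.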

DETERMINATION OF SEASONAL VARIATIONS.
i. Moving averages method.
- Calculate the moving average values for the data i.e. this will give estimates of the combined effect of secular trend and cyclic variations (T).
- Calculate the difference between the original data and the trend values (Y - T) i.e. this will give estimates of the combined effect of the seasonal and residual factors (S + R).
- Average the combined effect of the seasonal and residual factors for each period of the season i.e. this removes the residual factor R, leaving us with the seasonal factor S.

Example
Refer to the previous example on the takings of a shop. Calculate the seasonal factors i.e. the average daily variation.

Time (days)   Y (takings)   T (5 term moving average)   S + R (difference Y - T)   S (seasonal factor)
1             128           -                           -                          -24
2             168           -                           -                          18
3             80            159.2                       -79.2                      -87
4             190           162.0                       28.0                       28
5             230           165.6                       64.4                       65
6             142           166.4                       -24.4                      -24
7             186           168.6                       17.4                       18
8             84            170.6                       -86.6                      -87
9             201           172.2                       28.8                       28
10            240           173.8                       66.2                       65
11            150           174.2                       -24.2                      -24
12            194           176.0                       18.0                       18
13            86            180.4                       -94.4                      -87
14            210           -                           -                          28
15            262           -                           -                          65

Redraw the table.

          Mon     Tue    Wed      Thur   Fri
Week 1    -       -      -79.2    28.0   64.4
Week 2    -24.4   17.4   -86.6    28.8   66.2
Week 3    -24.2   18.0   -94.4    -      -
Total     -48.6   35.4   -260.2   56.8   130.6
Average   -24.3   17.7   -86.7    28.4   65.3

Since the original data is given to the nearest integer, we round off the seasonal factors to the nearest integer i.e.

                       Mon   Tue   Wed   Thur   Fri
Seasonal factors (S)   -24   18    -87   28     65
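The moving average trend and the additive seasonal factors for the shop takings can be reproduced with the sketch below. The helper name `moving_average` and the variable names are assumptions for illustration; the method is the one described above (5 term moving average placed against the middle day, then averaging Y - T for each weekday).

```python
def moving_average(values, m):
    """Arithmetic means of m consecutive terms, one per window."""
    return [sum(values[i:i + m]) / m for i in range(len(values) - m + 1)]


takings = [128, 168, 80, 190, 230, 142, 186, 84, 201, 240, 150, 194, 86, 210, 262]
period = 5                      # five trading days per week
offset = period // 2            # trend[i] sits against day index i + offset

trend = moving_average(takings, period)

# Collect Y - T (seasonal + residual) separately for each weekday.
differences = {weekday: [] for weekday in range(period)}   # 0 = Mon ... 4 = Fri
for i, t in enumerate(trend):
    position = i + offset
    differences[position % period].append(takings[position] - t)

# Averaging removes the residual; round to integers like the original data.
seasonal = [round(sum(d) / len(d)) for d in differences.values()]

print([round(t, 1) for t in trend])
print(seasonal)
```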

SEASONALLY ADJUSTED DATA.
This is obtained by subtracting algebraically the seasonal factors from the data e.g. for the above example, subtract the seasonal factor (S) from the original values (Y).

Day   Takings (Y)   Seasonal factor (S)   Seasonally adjusted data (Y - S)
1     128           -24                   128 - (-24) = 152
2     168           18                    168 - 18 = 150
3     80            -87                   80 - (-87) = 167
4     190           28                    190 - 28 = 162
5     230           65                    230 - 65 = 165
6     142           -24                   142 - (-24) = 166
7     186           18                    186 - 18 = 168
8     84            -87                   84 - (-87) = 171
9     201           28                    201 - 28 = 173
10    240           65                    240 - 65 = 175
11    150           -24                   150 - (-24) = 174
12    194           18                    194 - 18 = 176
13    86            -87                   86 - (-87) = 173
14    210           28                    210 - 28 = 182
15    262           65                    262 - 65 = 197

NB: The seasonal factors should sum up to 0. In the above case, they happen to sum to 0. Sometimes they do not sum to 0, and this is due to the presence of the residual factor. If this happens, some adjustment is necessary e.g. if the five seasonal factors had summed up to 5, then we would subtract 1 from each factor.

ii. Simple average method.
- Arrange the data by years and by period (months or quarters).
- Compute the average x̄i for each period or season (e.g. each month or quarter).
- Compute the average x̄ of the periodic or seasonal averages.
- Compute the seasonal indices for the different periods i.e.
seasonal index, S.I = (x̄i / x̄) x 100

The seasonal index is taken as seasonal variation.

Example
The data below gives the average quarterly prices of a commodity for 4 years.

Year   1st quarter   2nd quarter   3rd quarter   4th quarter
1967   40.3          44.8          46.0          48.0
1968   50.1          55.3          55.3          59.5
1969   47.3          52.1          52.1          55.2
1970   55.4          59.0          66.6          65.3

Required
Calculate the seasonal variation indices.
Solution

Quarter   Average of each quarter x̄i   S.I = (x̄i / x̄) x 100
1         48.28                         90.87
2         52.25                         98.34
3         55                            103.5
4         57                            107.28

x̄ = Σx̄i / 4 = 212.53 / 4 = 53.13
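The seasonal index step of the simple average method can be sketched as follows. The quarterly averages are taken as given in the solution table above; the names `quarter_averages` and `seasonal_indices` are illustrative assumptions.

```python
# Quarterly averages as listed in the solution table.
quarter_averages = [48.28, 52.25, 55.0, 57.0]

# Average of the quarterly averages.
overall_average = sum(quarter_averages) / len(quarter_averages)

# Seasonal index = (quarter average / overall average) x 100.
seasonal_indices = [avg / overall_average * 100 for avg in quarter_averages]

print(round(overall_average, 2))
print([round(s, 2) for s in seasonal_indices])
```

A useful check on any set of indices produced this way is that they average to 100.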

DETERMINATION OF RESIDUAL FACTOR.
i. Moving average method.
Under the additive model an observation is the sum of the trend (T), seasonal (S) and residual (R) factors:
Y = T + S + R
Y - T = S + R leaves us with the seasonal and residual factors. We therefore algebraically subtract the seasonal factors to remain with Y - T - S = R i.e. the residual factor.

Example
Refer to the example on the takings of a shop. Calculate the residual factors.

Takings (Y)   Trend (T)   Seasonal factor (S)   Y - T - S = R   R (rounded)
128           -           -24                   -               -
168           -           18                    -               -
80            159.2       -87                   7.8             8
190           162.0       28                    0               0
230           165.6       65                    -0.6            -1
142           166.4       -24                   -0.4            0
186           168.6       18                    -0.6            -1
84            170.6       -87                   0.4             0
201           172.2       28                    0.8             1
240           173.8       65                    1.2             1
150           174.2       -24                   -0.2            0
194           176.0       18                    0               0
86            180.4       -87                   -7.4            -7
210           -           28                    -               -
262           -           65                    -               -

2. MULTIPLICATIVE MODEL.
Here, the calculation of the trend is the same as for the additive model, thus the trend values for the additive and multiplicative models are the same. To find the seasonal factors using the multiplicative model, carry out the following division:
Y / T = (T x S x R) / T = S x R
Y / T = S x R includes the seasonal and random factors. Multiplying by 100 expresses S x R as a percentage. Averaging for each period of the season removes the random factor R, leaving the seasonal factor (S).

Example
Refer to the example of shop takings.
Required
Use the multiplicative model to get S i.e. the seasonal factors.

Takings (Y)   Trend (T)   Y / T = S x R   (S x R) x 100
128           -           -               -
168           -           -               -
80            159.2       0.50            50
190           162.0       1.17            117
230           165.6       1.39            139
142           166.4       0.85            85
186           168.6       1.10            110
84            170.6       0.49            49
201           172.2       1.17            117
240           173.8       1.38            138
150           174.2       0.86            86
194           176.0       1.10            110
86            180.4       0.48            48
210           -           -               -
262           -           -               -

                   Mon    Tue   Wed   Thur   Fri
Week 1             -      -     50    117    139
Week 2             85     110   49    117    138
Week 3             86     110   48    -      -
Total              171    220   147   234    277
Average            85.5   110   49    117    138.5
Seasonal factors   86     110   49    117    139

1a) Explain briefly each of the following components of time series.
i. Cyclic variation
ii. Seasonal variation
iii. General trend.

Solution
Cyclic variation is the oscillatory movement in a time series with a period of more than 1 year.

b) Table 1 shows the number of shoes sold by a shop attendant.

Year   1st Quarter   2nd Quarter   3rd Quarter   4th Quarter
1993   28            38            34            32
1994   32            50            48            42
1995   38            56            52            46
1996   52            74            66            60

Required
i. Calculate the trend values using the moving averages method.
ii. By using the additive model, calculate the average seasonal variation for each quarter.

Solution
i. Each trend value is a 4 term moving average placed against the last quarter it covers.

X     Y     Trend values (T)   Y - T = S + R
1     28    -                  -
2     38    -                  -
3     34    -                  -
4     32    33.0               -1
5     32    34.0               -2
6     50    37.0               13
7     48    40.5               7.5
8     42    43.0               -1
9     38    44.5               -6.5
10    56    46.0               10
11    52    47.0               5
12    46    48.0               -2
13    52    51.5               0.5
14    74    56.0               18
15    66    59.5               6.5
16    60    63.0               -3

ii.
                   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1993               -           -           -           -1
1994               -2          13          7.5         -1
1995               -6.5        10          5           -2
1996               0.5         18          6.5         -3
Total              -8          41          19          -7
Average            -2.67       13.67       6.33        -1.75
Seasonal factors   -3          14          6           -2

The seasonal factors sum to -3 + 14 + 6 - 2 = 15 and not 0, so they are adjusted. Subtract 3 from each factor (3 x 4 = 12), which leaves 15 - 12 = 3, then subtract a further 1 from three of the factors:

                            Quarter 1   Quarter 2   Quarter 3   Quarter 4
Seasonal factor             -3          14          6           -2
Less 3                      -6          11          3           -5
Less remaining 1            -6          10          2           -6
Adjusted seasonal factors   -6          10          2           -6

Exercise

a) Explain the components of a time series. [8 marks]
b) Page Up Limited has been selling computer accessories for the last 5 years. Its quarterly sales (in Sh. '000) for the last 3 years are shown in the table below:

                Quarters
Year   1     2     3     4
1997   46    86    120   61
1998   53    89    125   61
1999   51    91    132   66

i. Using the method of moving averages, and by applying the multiplicative model, find the trend and average seasonal factors. [9 marks]
ii. Using the results of b (i) above, estimate the sales for the 1st quarter of year 2000. [3 marks]

Solution

Year   Quarter   Time (x)   Sales (y)   Trend (T)   y / T = S x R   (S x R) x 100
1997   1         1          46          -           -               -
       2         2          86          -           -               -
       3         3          120         -           -               -
       4         4          61          78.25       0.78            78
1998   1         5          53          80          0.66            66
       2         6          89          80.75       1.10            110
       3         7          125         82          1.52            152
       4         8          61          82          0.74            74
1999   1         9          51          81.5        0.63            63
       2         10         91          82          1.11            111
       3         11         132         83.75       1.58            158
       4         12         66          85          0.78            78

           Quarter 1   Quarter 2   Quarter 3   Quarter 4
1997       -           -           -           78
1998       66          110         152         74
1999       63          111         158         78
Total      129         221         310         230
Average    64.5        110.5       155         76.7
Seasonal   65          111         155         77

Rearrange the trend values in a table and get the average trend value for each quarter i.e. this gives the trend estimates for the next year.

Year      Quarter 1   Quarter 2   Quarter 3   Quarter 4
1997      -           -           -           78.25
1998      80          80.75       82          82
1999      81.5        82          83.75       85
Average   80.75       81.38       82.88       81.75

The estimate for each quarter in the following year is 80.75, 81.38, 82.88 and 81.75 i.e. approximately 81, 81, 83 and 82.

The sales of a Pentium II computer for a certain firm over the last 13 months are shown in Table II below.

Month           1     2     3     4     5     6     7     8     9     10    11    12    13
Sales (units)   450   440   460   410   380   400   370   360   410   450   470   490   460

i. Using a 3 month period, calculate the sales forecasts for the 4th month and above using the moving average. [2 marks]
ii. Using the forecasts produced for the 7th month to the 13th month, calculate the mean squared error of the forecasts. [4 marks]

Solution
i. Instead of placing the trend values at the middle or end of the period they cover, this question requires that you place them at the beginning of the next period, where they act as forecasts.

Month (x)   Sales (y)   3 month moving average forecast
1           450         -
2           440         -
3           460         -
4           410         450
5           380         436.67
6           400         416.67
7           370         396.67
8           360         383.33
9           410         376.67
10          450         380
11          470         406.67
12          490         443.33
13          460         470
14          -           473.33

ii)
Month (x)   Sales (Y)   Forecast (T)   (Y - T)²
7           370         396.67         711.29
8           360         383.33         544.29
9           410         376.67         1,110.89
10          450         380            4,900
11          470         406.67         4,010.69
12          490         443.33         2,178.09
13          460         470            100
Total                                  13,555.2

Mean squared error = Σ(Y - T)² / (n - 1), where n - 1 is the degrees of freedom
M.S.E = 13,555.2 / (7 - 1) = 2,259.21
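The forecasts and the mean squared error can be checked with the following sketch. The function name `moving_average_forecasts` is an assumption for illustration, and the divisor n - 1 follows the convention used in the worked solution. Because the sketch does not round the forecasts to two decimal places, its result differs from the hand calculation in the second decimal place.

```python
sales = [450, 440, 460, 410, 380, 400, 370, 360, 410, 450, 470, 490, 460]


def moving_average_forecasts(values, m):
    """Forecast for each month = mean of the m preceding months (months are 1-based)."""
    forecasts = {}
    for month in range(m + 1, len(values) + 1):
        window = values[month - 1 - m: month - 1]
        forecasts[month] = sum(window) / m
    return forecasts


forecasts = moving_average_forecasts(sales, 3)

# Squared forecast errors for months 7 to 13.
months = range(7, 14)
squared_errors = [(sales[m - 1] - forecasts[m]) ** 2 for m in months]

# Mean squared error with n - 1 in the denominator, as in the solution.
mse = sum(squared_errors) / (len(squared_errors) - 1)

print({m: round(f, 2) for m, f in forecasts.items()})
print(round(mse, 2))
```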


9. S I M U L A T I O N
DEFN: The process of experimenting a model and noting the results that occur. It consists of inserting different input values and observing the output values. At the heart of simulation is the concept of a model i.e. any representation of a real system. Business models can take the forms of flowcharts, formula equations etc. FACTORS TO CONSIDER WHEN CONSTRUCTING SIMULATION MODEL. a) Objective oriented. Must be construed with some definite purpose and model results be directly related to the purpose. b) Critical variable and relationship. Identifies those variables and relationships which must be included in the model. c) Simplicity Simple models have adequate productive qualities. d) Management involvement. Good models are constructed thorough understanding the actual operations and only management have this knowledge hence must be included. Types of simulation 1) Physical 2) Mathematical 3) Deterministic 4) Probabilistic 5) Monte Carlo. Monte Carlo simulation DEFN: Is simulation involving random or probabilistic elements. Because models must behave like the real system under observation, they must contain these random or probabilistic features. Examples. 123

1) Arrivals at a store may average 200 per day but the actual arrival pattern is likely to be highly variable. 2) Queuing system 3) Inventory system. RANDOM SELECTION To carry out realistic simulation involving probabilistic elements, it is necessary to avoid bias in the selection of input values which varies. This is done by selecting randomly using one of the following methods. Lottery method Random number generators Random number tables Paulette wheel. Repeated random selection of input values and the resulting outputs is the very essence of simulation. In this way, an understanding is gained of the likely pattern of results so that a more informed decision can be taken. Variables in a simulation model. a) INPUT VARIABLES. These are of two types:i. Controlled variables. Can be controlled by management by changing input values and noting the change in the output results. ii. Non-controlled variables. Not under management control i.e. they vary in some uncontrolled probabilistic function. Also called probabilistic or stochastic variables. b) PARAMETERS. They are also input variables which for a given simulation have a constant value. They help to specify the relationship between other types of variables e.g. time taken for routine maintenance cost of stock outs etc. c) STATUS VARIABLES. Variables that specify the state of the system at various times or seasons. In some simulations, the behaviours of the system vary not only according to individuals characteristics but also according to the general state of the system at various times or seasons. E.g. in a simulation of a supermarket demand and check out, queuing will be probabilistic and variable on any given day but the general level of demand will be greatly influenced by the day of the week and season of the year. d) OUTPUT VARIABLES. 124

They form the results of the simulation. They arise from the calculations and tests performed in the model using the input values of the controlled and non-controlled variables, the specified parameters and the status variables.

GUIDELINES FOR CONSTRUCTING A SIMULATION MODEL.
STEP 1. Identify the objectives of the simulation. A detailed listing of the results expected from the simulation will help to clarify the steps and the output variables.
STEP 2. Identify the input variables, distinguishing controlled from non-controlled variables.
STEP 3. Where necessary, determine the probability distribution for the non-controlled variables (random variables).
STEP 4. Identify any parameters and status variables.
STEP 5. Identify the output variables.
STEP 6. Determine the logic of the model.

Advantages
1) Can be used in areas where analytical techniques are not available or are too complex.
2) Involvement of management allows deeper insight into a problem.
3) It is less risky than altering the real system.
4) It is cheaper in terms of time, since computers are used.

Disadvantages
1) Requires a substantial amount of managerial and technical skill.
2) Requires the use of computers, which are not available to some organisations.
3) Does not produce optimal results, i.e. at the end of the simulation management still have to make a decision.

APPLICATION AREAS.
1) Queuing systems (problems).
2) Supply and demand systems (problems).
3) Cash flow systems (problems).
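The kinds of variable described in this section can be illustrated with a small Monte Carlo sketch based on the store-arrivals example. Only the average of 200 arrivals per day comes from the notes. The normal distribution, its standard deviation of 40, the capacity of 60 customers per till and the function name simulate_store are all assumptions made for illustration.

```python
import random

def simulate_store(n_days, tills, mean_arrivals=200, sd_arrivals=40,
                   capacity_per_till=60, seed=1):
    """Return the average number of customers per day who exceed capacity.

    tills             -- controlled variable (set by management)
    arrivals          -- non-controlled variable (sampled at random)
    capacity_per_till -- parameter (constant for a given simulation)
    return value      -- output variable
    """
    rng = random.Random(seed)  # random number generator, seeded for repeatability
    unserved_total = 0
    for _ in range(n_days):
        # Assumed distribution: normal, truncated at zero and rounded.
        arrivals = max(0, round(rng.gauss(mean_arrivals, sd_arrivals)))
        capacity = tills * capacity_per_till
        unserved_total += max(0, arrivals - capacity)
    return unserved_total / n_days

# Change the controlled variable and note the change in the output.
for tills in (3, 4, 5):
    print(tills, simulate_store(n_days=1000, tills=tills))
```

Because the same seed is used for each setting, every run sees the same arrival pattern, so differences in the output are due to the controlled variable alone.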


Example 1
A wholesaler stocks an item for which demand is uncertain. He wishes to assess two reordering policies, i.e. order 10 units at a reorder level of 10 units or order 15 units at a reorder level of 15 units, to see which is more economical over a 10-day period. The following information is available:

DAILY DEMAND (UNITS)    PROBABILITY
4                       0.10
5                       0.15
6                       0.25
7                       0.30
8                       0.20

Holding (carrying) cost is £15 per unit per day.
Ordering cost is £50 per order.
Loss of goodwill for each unit out of stock is £30.
Lead time is 3 days (before goods are delivered after ordering).
Opening stock is 17 units.
Demand is to be simulated from the probability distribution using the following random numbers.
1st set: 41, 92, 05, 44, 66, 07, 00, 00, 14, 62.
2nd set: 20, 07, 95, 05, 79, 97, 64, 26, 06, 48.
Note: The stock compared with the reorder level is physical stock + any replenishment orders outstanding.

SOLUTION
Step 1: Objectives of Simulation.
a) To simulate the behaviour of 2 ordering policies i.e.
i. Order 15 at reorder level of 15.
ii. Order 10 at reorder level of 10.
b) To establish the cheaper policy.

Step 2: Input Variables.
a) Controlled variables.
i. Order quantity.
ii. Reorder level.
b) Non-controlled variable.
i. Probabilistic demand.

Step 3: Probability Distribution.
Random numbers are allocated to demand as follows:

Daily Demand (units)   Probability   Cumulative Probability (C.P)   CP%   Random No. Range
4                      0.10          0.10                           10    00 - 09
5                      0.15          0.25                           25    10 - 24
6                      0.25          0.50                           50    25 - 49
7                      0.30          0.80                           80    50 - 79
8                      0.20          1.00                           100   80 - 99
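The allocation of random numbers to demand can be sketched in code as follows. The function names build_ranges and lookup are illustrative only. The probabilities and the first set of random numbers are those given in the example.

```python
# Daily demand distribution from the example: units -> probability.
demand_probs = {4: 0.10, 5: 0.15, 6: 0.25, 7: 0.30, 8: 0.20}

def build_ranges(probs):
    """Turn probabilities into two-digit random number ranges (low, high, value)."""
    ranges = []
    low = 0
    cumulative = 0.0
    for value, p in probs.items():
        cumulative += p
        high = round(cumulative * 100) - 1  # e.g. C.P 0.25 -> range ends at 24
        ranges.append((low, high, value))
        low = high + 1
    return ranges

def lookup(random_number, ranges):
    """Return the demand whose range contains the given random number."""
    for low, high, value in ranges:
        if low <= random_number <= high:
            return value
    raise ValueError("random number must lie between 00 and 99")

ranges = build_ranges(demand_probs)
first_set = [41, 92, 5, 44, 66, 7, 0, 0, 14, 62]
demands = [lookup(r, ranges) for r in first_set]

print(ranges)   # [(0, 9, 4), (10, 24, 5), (25, 49, 6), (50, 79, 7), (80, 99, 8)]
print(demands)  # [6, 8, 4, 6, 7, 4, 4, 4, 5, 7]
```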

Step 4: Parameters and status variables.
Parameters.
1) Opening stock 17 units.
2) Carrying cost £15 per unit per day.
3) Ordering cost £50 per order.
4) Loss of goodwill £30 per unit out of stock.
5) Lead time 3 days.
There are no status variables in this example.

Step 5: Output variables.
Total cost.
Number of orders placed.
Number of stock-outs.
Cost of stock-outs.
Total carrying costs.
Total order costs.

Step 6: Determine the logic of the model.
Stock level for reordering = physical stock + replenishment orders outstanding.
Closing stock = opening stock - demand (zero if demand exceeds opening stock).
Carrying cost per day = closing stock x 15.
Goodwill cost = stock shortfall x 30.
Total cost = goodwill cost + carrying cost + order cost.

SIMULATION RUN
DAY   R. NO   DEMAND
1     41      6
2     92      8
3     05      4
4     44      6
5     66      7
6     07      4
7     00      4
8     00      4
9     14      5
10    62      7

Results for the simulation using order 15 units at a reorder level of 15 units are shown below.

DAY      OPENING STOCK   DEMAND   CLOSING STOCK   ORDER COST (£)   CARRYING COST (£)   STOCK-OUT COST (£)   TOTAL COST (£)
1        17              6        11              50               165                 -                    215
2        11              8        3               -                45                  -                    45
3        3               4        0               -                -                   30                   30
4        0 + 15          6        9               50               135                 -                    185
5        9               7        2               -                30                  -                    30
6        2               4        0               -                -                   60                   60
7        0 + 15          4        11              50               165                 -                    215
8        11              4        7               -                105                 -                    105
9        7               5        2               -                30                  -                    30
10       2 + 15 = 17     7        10              50               150                 -                    200
TOTALS                                            200              825                 90                   1115

Costs for the policy of ordering 15 units at a reorder level of 15 units.
Total cost             1115
Total carrying cost    825
Total stock-out cost   90
Total order cost       200
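The model logic for this run can be sketched as a short program. Two readings are assumptions, chosen because they reproduce the figures in the table: unmet demand is lost rather than carried forward, and an order is placed at the end of a day when physical stock plus outstanding orders is strictly below the reorder level. The function name simulate_policy is illustrative.

```python
def simulate_policy(demands, order_qty, reorder_level, opening_stock=17,
                    holding_cost=15, order_cost=50, goodwill_cost=30,
                    lead_time=3):
    """Run one simulation and return the cost totals as a dict."""
    stock = opening_stock
    due = {}  # arrival day -> units on order
    costs = {"order": 0, "carrying": 0, "stock_out": 0}
    for day, demand in enumerate(demands, start=1):
        stock += due.pop(day, 0)             # delivery arrives at start of day
        shortfall = max(demand - stock, 0)   # assumption: unmet demand is lost
        stock = max(stock - demand, 0)       # closing stock
        costs["stock_out"] += shortfall * goodwill_cost
        costs["carrying"] += stock * holding_cost
        # Assumption: reorder when physical stock plus outstanding orders
        # is strictly below the reorder level.
        if stock + sum(due.values()) < reorder_level:
            due[day + lead_time] = order_qty
            costs["order"] += order_cost
    costs["total"] = sum(costs.values())
    return costs

# Demands generated from the 1st set of random numbers.
first_run_demands = [6, 8, 4, 6, 7, 4, 4, 4, 5, 7]
result = simulate_policy(first_run_demands, order_qty=15, reorder_level=15)
print(result)
# {'order': 200, 'carrying': 825, 'stock_out': 90, 'total': 1115}
```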

Exercise
Run the simulation for the policy of order 10 units at a reorder level of 10 units using the 2nd set of random numbers.

DAY   R. NO   DEMAND
1     20      5
2     07      4
3     95      8
4     05      4
5     79      7
6     97      8
7     64      7
8     26      6
9     06      4
10    48      6

DAY      OPENING STOCK   DEMAND   CLOSING STOCK   ORDER COST (£)   CARRYING COST (£)   STOCK-OUT COST (£)   TOTAL COST (£)
1        17              5        12              -                180                 -                    180
2        12              4        8               50               120                 -                    170
3        8               8        0               -                -                   -                    -
4        0               4        0               -                -                   120                  120
5        0 + 10          7        3               50               45                  -                    95
6        3               8        0               -                -                   150                  150
7        0               7        0               -                -                   210                  210
8        0 + 10          6        4               50               60                  -                    110
9        4               4        0               -                -                   -                    -
10       0               6        0               -                -                   180                  180
TOTALS                                            150              405                 660                  1215

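Costs for the policy of ordering 10 units at a reorder level of 10 units.
Total cost             1215
Total carrying cost    405
Total stock-out cost   660
Total order cost       150

On these two runs, the policy of ordering 15 units at a reorder level of 15 units is the cheaper one (total cost 1115 against 1215).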
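The two runs above use different sets of random numbers and cover only 10 days each, so part of the cost difference may come from the demand pattern rather than from the policy. The sketch below compares both policies on the same randomly generated demand sequences over many runs. It makes the same assumptions as the tables appear to (unmet demand is lost, and an order is placed when physical stock plus outstanding orders is strictly below the reorder level). The function names, the number of runs and the seed are illustrative.

```python
import random
from bisect import bisect_left

UPPER_BOUNDS = [9, 24, 49, 79, 99]  # top of each random number range
DEMANDS = [4, 5, 6, 7, 8]           # demand allocated to each range

def demand_for(random_number):
    """Map a two-digit random number (00 to 99) to a daily demand."""
    return DEMANDS[bisect_left(UPPER_BOUNDS, random_number)]

def total_cost(demands, order_qty, reorder_level, opening_stock=17,
               holding_cost=15, order_cost=50, goodwill_cost=30, lead_time=3):
    """Total cost of one run under the assumed model logic."""
    stock, due, total = opening_stock, {}, 0
    for day, demand in enumerate(demands, start=1):
        stock += due.pop(day, 0)                          # delivery arrives
        total += max(demand - stock, 0) * goodwill_cost   # stock-out cost
        stock = max(stock - demand, 0)                    # closing stock
        total += stock * holding_cost                     # carrying cost
        if stock + sum(due.values()) < reorder_level:     # assumed trigger
            due[day + lead_time] = order_qty
            total += order_cost
    return total

def compare(n_runs=1000, days=10, seed=0):
    """Average total cost of each policy over the same random demand runs."""
    rng = random.Random(seed)
    sum_15, sum_10 = 0, 0
    for _ in range(n_runs):
        demands = [demand_for(rng.randrange(100)) for _ in range(days)]
        sum_15 += total_cost(demands, order_qty=15, reorder_level=15)
        sum_10 += total_cost(demands, order_qty=10, reorder_level=10)
    return sum_15 / n_runs, sum_10 / n_runs

# Check against the worked exercise (2nd set of random numbers).
second_set = [20, 7, 95, 5, 79, 97, 64, 26, 6, 48]
print(total_cost([demand_for(r) for r in second_set], 10, 10))  # 1215

avg_15, avg_10 = compare()
print(avg_15, avg_10)
```

Using the same demand sequence for both policies in each run means that any difference in average cost reflects the policies themselves.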