
Volume 4 Issue 5 May 2012

Researchers Corner

Working out Percentages and Random Number Generation

We have seen tally marking used to prepare a frequency table, and the various parts and features of a table, in the preparation for tabular presentation (Feb 2012 issue). Before we explore the four steps to tabular presentation, the following two tables, found in two recent draft papers, attracted my attention.
Subject-wise distribution of books

Subjects                  Number of Books   Percentage
Art & Architecture                     10         1.06
Biographies                            38         4.03
Generalia                               9         0.95
Language & Literature                 116        12.30
Mysticism                               6         0.64
Religion & Philosophy                  33         3.50
Sciences                               46         4.88
Social sciences                       685        72.64
Total                                 943       100

Branch-Wise Distribution of Engineering Students

Columns: Sl. No.; Branch; Number of Students; Percentage
Branches: Electronic Communication Engineering; Computer Science Engineering; Information Technology; Electrical and Electronic Engineering; Mechanical Engineering; Civil Engineering
Total: 150 students, 100 percent
Values as extracted (row order unclear): Sl. No. 4 5 6 2 3 1; 25 19 14 17 13 09; 32 30 21 20 30 20

It is a common mistake in tabular presentations to work out percentages in the wrong direction. In the above tables, the sample books and sample students are distributed by subject and by branch of engineering respectively, ignoring the total population. It would have been more meaningful to give the books digitized in each subject as a percentage of the total books in that subject and, similarly, the number of sample students in each branch as a percentage of the total students in that branch. The first is a digitization study covering 943 books out of over 3 lakh books in the library; the second is a study of the use of e-resources by engineering students, with data collected through a questionnaire from 150 sample students selected from a population of 2160. It is claimed that the simple random method was adopted, but there is no clue about how the sample size was determined or what process was followed for random sampling. In neither case is it examined how the sample (or response) is distributed among characteristics such as subject in the first case and engineering branch in the second. Some tips relating to percentages are:
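The two directions can be made concrete with the figures quoted above. The following is a minimal sketch: it reproduces the subject-wise percentages of the first table (share within the sample) and then computes the overall sampling fractions for both studies. The value 300000 is an assumption standing in for "over 3 lakh"; the per-subject and per-branch population totals are not given in the papers, so only the overall fractions can be shown.

```python
# Share of each subject within the sample of 943 books
# (the direction used in the first table).
books_in_sample = {
    "Art & Architecture": 10,
    "Biographies": 38,
    "Generalia": 9,
    "Language & Literature": 116,
    "Mysticism": 6,
    "Religion & Philosophy": 33,
    "Sciences": 46,
    "Social sciences": 685,
}

sample_total = sum(books_in_sample.values())


def percent(part, whole):
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


share_of_sample = {s: percent(n, sample_total) for s, n in books_in_sample.items()}

# Sampling fraction: the sample relative to the population it was drawn from.
ASSUMED_LIBRARY_BOOKS = 300_000  # assumption: "over 3 lakh" taken as exactly 3 lakh
books_fraction = percent(sample_total, ASSUMED_LIBRARY_BOOKS)
students_fraction = percent(150, 2160)

for subject, pct in share_of_sample.items():
    print(f"{subject:25s} {pct:6.2f}%")
print("Books sampled:", books_fraction, "% of the library (assumed total)")
print("Students sampled:", students_fraction, "% of the population")
```

With the per-subject totals of the library in hand, the same `percent` helper could be applied subject by subject to give the more meaningful direction.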

- Percentages, including ratios and proportions, should be computed in the direction of the causal factor, if any
- Percentages should run only in the direction in which a sample is representative
- Do not average percentages (without weighting by the size of the samples)
- Do not use very large percentages (e.g. 1200% increase)
- Do not use too small a base (e.g. 33 1/3% for 1 in 3)
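The tip about averaging percentages can be illustrated with a small sketch. The two samples below are hypothetical and chosen only to make the gap visible: one small sample with a high percentage and one large sample with a low percentage.

```python
# Hypothetical data: (number with the attribute, sample size) for two samples.
samples = [(10, 20), (90, 900)]

percentages = [100 * hits / size for hits, size in samples]  # 50% and 10%

# Naive average: treats both samples as equally important.
unweighted = sum(percentages) / len(percentages)

# Weighted average: each percentage is weighted by its sample size,
# which is the same as pooling the raw counts (100 out of 920).
total_size = sum(size for _, size in samples)
weighted = sum(p * size for p, (_, size) in zip(percentages, samples)) / total_size

print(f"Unweighted average: {unweighted:.2f}%")
print(f"Weighted average:   {weighted:.2f}%")
```

The unweighted figure is pulled towards the small sample, which is why the weighted form is the safer one to report.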

Incidentally, the size of the sample should be:
- Adequate to provide an estimate with sufficiently high precision
- Representative, to mirror the various patterns and sub-classes of the population
- Neither too large nor too small, but optimum to meet efficiency (cost), reliability (precision) and flexibility
The higher the precision required and the larger the variance, the larger the sample size and the higher the cost.
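Neither draft paper says how its sample size was fixed. One widely used textbook approach for estimating a proportion is Cochran's formula with a finite population correction. It is sketched here only as an illustration, not as the method either study used. The 95% confidence level (z = 1.96), the 5% margin of error and p = 0.5 are all assumptions, and the function name is hypothetical.

```python
import math


def sample_size_for_proportion(population, margin=0.05, z=1.96, p=0.5):
    """Cochran's sample size for a proportion, with finite population correction.

    population: size N of the finite population
    margin:     acceptable margin of error (assumed 5%)
    z:          standard normal value for the confidence level (assumed 95%)
    p:          anticipated proportion; 0.5 gives the largest variance
    """
    # Sample size for an infinite population: grows with the variance p(1-p)
    # and with the precision demanded (a smaller margin).
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite population correction reduces the size for small populations.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


print(sample_size_for_proportion(2160))               # 327 under these assumptions
print(sample_size_for_proportion(2160, margin=0.03))  # tighter precision, larger sample
```

Under these assumptions a population of 2160 calls for 327 respondents, and tightening the margin raises the figure, in line with the remark that higher precision means a larger sample.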

The essence of Simple Random Sampling (SRS) is that every unit in the population has an equal, non-zero probability of being selected. At any single draw from a population of N units, the probability of a given unit being selected is 1/N; when sampling without replacement, the probability for each remaining unit at the second draw becomes 1/(N-1), and so on. If we select n units (the sample size) without replacement from a finite population of N, every possible sample of n units has the same probability, n!(N-n)!/N!, and every unit has probability n/N of being included in the sample. For example, if N=5 and n=2, there are 10 possible samples, each with probability 2!3!/5! = 1/10, and each unit has probability 2/5 of being included. A simple random sample is usually selected without replacement. The phrases Random Sample and Simple Random Sample are often wrongly used interchangeably. As mentioned above, in SRS each unit of the population has an equal, non-zero probability of being selected, whereas in a random sample each unit has a known (equal or unequal) probability of selection.
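The probabilities for N=5 and n=2 can be checked by listing every possible sample. This is a minimal sketch that enumerates all samples drawn without replacement, assigns each the same probability, and adds up the probability that a given unit is included.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

N, n = 5, 2
population = range(1, N + 1)

# Every possible sample of n units drawn without replacement (order ignored).
samples = list(combinations(population, n))

# Under SRS each of these samples is equally likely.
prob_each_sample = Fraction(1, len(samples))

# Probability that a given unit ends up in the sample:
# sum the probabilities of all samples that contain it.
inclusion = {
    unit: sum(prob_each_sample for s in samples if unit in s)
    for unit in population
}

print("Number of possible samples:", len(samples), "=", comb(N, n))
print("Probability of each sample:", prob_each_sample)
for unit, prob in inclusion.items():
    print(f"Unit {unit}: inclusion probability {prob}")
```

The enumeration gives 10 equally likely samples, so each has probability 1/10, and every unit appears in 4 of them, giving an inclusion probability of 4/10 = 2/5 = n/N.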

The selection process for a finite population could be one of the following:

1. The lottery method (picking blindfolded or using a rotating drum) is an old classical method. All the units in the population are numbered from 1 to N (this list is called the sampling frame), the numbers are written on small slips of paper, and the slips are thoroughly mixed in the drum before being picked blindfolded. This method is used when the population is small.

2. A random number table (like Tippett's numbers) is used for larger populations, as it is difficult to mix the slips properly in the lottery method. For example, one takes two-digit numbers from the table of random numbers if the population is up to 100, starting from any column or row of the table. Any number above the population size is ignored, and a repeated number is not considered in sampling without replacement. For example, to select 10 items from a population of 150 items:
- Use the three-digit random numbers from 1 to 900 (900 being the highest multiple of 150 less than 1000), so that every item corresponds to the same number of random numbers, for instance through the remainder after dividing by 150
- Select a starting position in the random number table
- Continue to choose numbers between 1 and 900 that have not already been selected until you have 10
Both the lottery method and the random number table method can be cumbersome, particularly for large sample sizes.

3. Computer-generated random numbers can be obtained from free sources like StatTrek's Random Number Generator (http://stattrek.com/statistics/random-number-generator.aspx) or the Random Integer Generator (http://www.random.org/integers/). Just answer the online questions, such as how many random numbers are needed, the minimum and maximum values, whether to allow duplicates, and an optional seed number, and you will have the SRS numbers in seconds.
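The random number table procedure for picking 10 items out of 150 can be imitated in a few lines. In this sketch Python's `random` module stands in for the printed table, and the remainder rule used to map a number between 1 and 900 to an item between 1 and 150 is an assumption about how the table numbers are put to use. The function name is hypothetical.

```python
import random


def table_method_sample(population_size, sample_size, seed=None):
    """Imitate the random number table method for sampling without replacement."""
    if sample_size > population_size:
        raise ValueError("sample size cannot exceed population size")

    rng = random.Random(seed)                    # stand-in for a printed table
    digits = len(str(population_size))           # 150 -> three-digit numbers
    # Highest multiple of the population size that fits in that many digits:
    # for 150 this is 900, so each item matches exactly six table numbers.
    limit = (10 ** digits - 1) // population_size * population_size

    chosen = []
    while len(chosen) < sample_size:
        number = rng.randrange(10 ** digits)     # 000 to 999 for three digits
        if number == 0 or number > limit:
            continue                             # outside 1..limit: ignore
        # Assumed mapping: remainder after division, with 0 read as the last item.
        unit = number % population_size or population_size
        if unit not in chosen:                   # repeats are skipped
            chosen.append(unit)
    return chosen


print(table_method_sample(150, 10, seed=1))
```

Fixing the seed plays the role of fixing the starting position in the table: the same seed gives the same sample again.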

Running the above example (item 2, random number table) of choosing a sample of 10 from a population of 150 in the Random Integer Generator gave the result:

7 109 14 46 82 35 80 73 87 134
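The same kind of draw can be made offline. A minimal sketch using `random.sample` from Python's standard library, which selects without replacement, is given below. The seed value is arbitrary and the function name is hypothetical; the numbers drawn will differ from those returned by the online generator.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw a simple random sample of unit numbers 1..population_size."""
    rng = random.Random(seed)  # an optional seed makes the draw repeatable
    # random.sample picks distinct elements, i.e. sampling without replacement.
    return rng.sample(range(1, population_size + 1), sample_size)


print(simple_random_sample(150, 10, seed=2012))
```

Recording the seed along with the sample lets a reader reproduce the selection, which addresses the complaint that the sampling process was left undocumented.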

There are other selection processes, such as the grid system for selecting a sample of an area. Note that SRS is the basic selection process, and all other complex random sampling procedures are built on it.

M S Sridhar
sridhar@informindia.co.in