
While calculating the median of grouped data of total frequency $N$, which value should be matched against the cumulative frequencies to find the median class: $\frac{N}{2}$ or $\frac{N+1}{2}$ (it seems both are used)? I think $\frac{N+1}{2}$ should be taken, since in the case of a list of values (i.e. ungrouped data) a fractional value of it indicates that the average of the $\frac{N}{2}$th and $\left(\frac{N}{2}+1\right)$th values gives the median.
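For ungrouped data, the position rule described above can be checked with a short Python sketch. The helper name `median_by_position` is only an illustrative choice and not something from the question; it uses 1-based positions, takes the single value at position (N+1)/2 when that is a whole number, and otherwise averages the (N/2)th and (N/2+1)th values. The sample lists are made up for the check.

```python
import statistics


def median_by_position(values):
    """Median of ungrouped data using 1-based positions (illustrative helper)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        # (N + 1) / 2 is a whole number: take that single value.
        position = (n + 1) // 2
        return ordered[position - 1]
    # (N + 1) / 2 ends in .5: average the (N/2)th and (N/2 + 1)th values.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


odd_sample = [7, 1, 5]        # N = 3, position (3 + 1) / 2 = 2
even_sample = [7, 1, 5, 3]    # N = 4, position 2.5: average of 2nd and 3rd

print(median_by_position(odd_sample), statistics.median(odd_sample))
print(median_by_position(even_sample), statistics.median(even_sample))
```

Both lines should print matching pairs, since the standard library's `statistics.median` follows the same convention for odd and even N.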
And then comes the second part of my question: while calculating the median of grouped data, if the value of $\frac{N+1}{2}$ (or $\frac{N}{2}$) is a fraction, say 50.5, and there is a cumulative frequency of exactly 50, then what should we do? Should we take two median classes, the one with cumulative frequency 50 and the one after it, calculate a median from each using the formula $L + \frac{\frac{N}{2} - C}{f} \times w$, and take their average as the final median? Or do something else? I mean, what is the correct procedure in this kind of situation?

EDIT:
So, here is a specific problem regarding the second part of my question:

We have to find out the median score from the following frequency distribution table:

Score                : 0-10  10-20  20-30  30-40  40-50
Number of students   :   4      3      5      6      7
Cumulative frequency :   4      7     12     18     25
Here the intervals are left-open and right-closed, i.e. of the type $(a, b]$.

Now, $N = 25 \implies \frac{N}{2} = 12.5$, which means that we have to look for the interval which covers the 12th item and the 13th item. Looking at the cumulative frequencies, we see that the 3rd interval (i.e. 20-30) covers the 12th item, while the 4th interval (i.e. 30-40) covers the 13th item. If we are supposed to take both intervals as median classes for the sake of using the formula $\text{median} = L + \frac{\frac{N}{2} - C}{f} \times w$, then we will end up with two medians. We could take the average of these as the required median, though. I want to know the correct procedure here.
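The lookup of the classes that hold the 12th and 13th items can be sketched in Python as follows. This is only a minimal illustration using the table above; the helper name `class_of_item` is hypothetical, and it assumes items are counted from 1 and that a class contains the k-th item when its cumulative frequency is the first one that reaches k.

```python
from bisect import bisect_left
from itertools import accumulate

# Frequency table from the question; intervals are of type (lower, upper].
intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [4, 3, 5, 6, 7]

cumulative = list(accumulate(frequencies))  # running totals of the frequencies


def class_of_item(k):
    """Return the interval containing the k-th smallest item (1-based)."""
    if not 1 <= k <= cumulative[-1]:
        raise ValueError("k must be between 1 and the total frequency")
    # First class whose cumulative frequency is at least k.
    return intervals[bisect_left(cumulative, k)]


print(cumulative)
print(class_of_item(12), class_of_item(13))
```

With this table the 12th item falls in 20-30 and the 13th in 30-40, which is exactly the boundary situation described above.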
Note 1:
I am only concerned with using the above formula and not any other method of finding median of grouped data.
There is a variation of the above formula where $\frac{N+1}{2}$ is used instead of $\frac{N}{2}$; the first part of my question refers to this confusion as well.
Note 2:
In the formula,
$L$ = lower boundary of the median class
$N$ = total frequency
$C$ = cumulative frequency of the class preceding the median class
$f$ = frequency of the median class
$w$ = width of the median class, i.e. upper boundary - lower boundary
Note 3:
If we consider the interval 20-30 as the median class and use the above formula, then the median will be

$$20 + \frac{\frac{25}{2} - 7}{5} \times 10 = 31$$
Considering the interval 30-40 as the median class instead, the same formula gives $30 + \frac{\frac{25}{2} - 12}{6} \times 10 \approx 30.83$, which is close to 31 but not equal to it. So the two choices of median class do not give exactly the same median here, and I am not sure how much they can differ in other problems of this type.
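As a quick check of the arithmetic in this note, the formula can be evaluated for both candidate classes with a few lines of Python. This is only a sketch: the function name `grouped_median` and its parameter names are illustrative, and the inputs are read off the table in the question. Evaluating it gives 31 for the class 20-30 and about 30.83 for the class 30-40, so the two results are close but not identical.

```python
def grouped_median(lower, width, total, cum_before, freq):
    """L + (N/2 - C) / f * w for one chosen median class."""
    return lower + (total / 2 - cum_before) * width / freq


N = 25
# Candidate 1: class 20-30, preceded by cumulative frequency 7, frequency 5.
median_20_30 = grouped_median(lower=20, width=10, total=N, cum_before=7, freq=5)
# Candidate 2: class 30-40, preceded by cumulative frequency 12, frequency 6.
median_30_40 = grouped_median(lower=30, width=10, total=N, cum_before=12, freq=6)

print(round(median_20_30, 2), round(median_30_40, 2))
```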
Note 4:
I don't know whether there is any rule for this kind of situation saying that we have to select the cumulative frequency (and hence the corresponding interval as the median class) that is nearer to the value of $\frac{N}{2}$; in that case we would have to take the interval 20-30 in this example as the median class. It would be great, and enough, if anyone could confirm such a rule.
statistics data-analysis median

edited Nov 10 '17 at 20:54
asked Jan 18 '16 at 18:56

Snehasish Karmakar
2 Answers
Because this is essentially a duplicate, I address a few issues that do not explicitly overlap with the related question or its answer:
If a class has cumulative relative frequency 0.5 (that is, its cumulative frequency is exactly $N/2$), then the median is at the boundary between that class and the next larger one.

If $N$ is large (really the only case where this method is generally successful), there is little difference between $N/2$ and $(N+1)/2$ in the formula. All references I checked use $N/2$.
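To put a number on "little difference": for a fixed median class, replacing $N/2$ by $(N+1)/2$ in the formula shifts the result by exactly $w/(2f)$, which shrinks as the class frequency $f$ grows. The Python sketch below illustrates this with the class 30-40 from the question's table; the function name `interpolated_median` is hypothetical, and applying it to such a small table is only for illustration.

```python
def interpolated_median(lower, width, target, cum_before, freq):
    """L + (target - C) / f * w, where target is N/2 or (N + 1)/2."""
    return lower + (target - cum_before) * width / freq


N = 25
# Class 30-40 from the question: L = 30, w = 10, C = 12, f = 6.
with_half_n = interpolated_median(30, 10, N / 2, 12, 6)
with_half_n_plus_1 = interpolated_median(30, 10, (N + 1) / 2, 12, 6)

gap = with_half_n_plus_1 - with_half_n
print(round(gap, 4), round(10 / (2 * 6), 4))  # the gap and w / (2 f)
```

Here $f = 6$ is small, so the gap is a noticeable fraction of the class width; with hundreds of observations in the median class it would be negligible.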
Before computers were widely available, large datasets were customarily reduced to categories (classes) and plotted
as histograms. Then the histograms were used to approximate the mean, variance, median, and other descriptive
measures. Nowadays, it is best just to use a statistical computer package to find exact values of all measures.

One remaining application is to try to recover the descriptive measures from grouped data or from a histogram published in a journal. These are cases in which the original data are no longer available.

This procedure to approximate the sample median from grouped data assumes that the data are distributed in a roughly uniform fashion throughout the median interval. Then it uses interpolation to approximate the median. (By contrast, the methods for approximating the sample mean and sample variance from grouped data assume that all observations are concentrated at their class midpoints.)
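The midpoint assumption for the mean and variance can be sketched in Python as follows. This is only an illustration, using the frequency table from the question as input and assuming the sample variance is taken with divisor $n - 1$; the variable names are not from any particular package.

```python
# Frequency table from the question (class limits and counts).
intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [4, 3, 5, 6, 7]

n = sum(frequencies)
midpoints = [(lo + hi) / 2 for lo, hi in intervals]

# Treat every observation in a class as if it sat at the class midpoint.
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
grouped_var = sum(
    f * (m - grouped_mean) ** 2 for f, m in zip(frequencies, midpoints)
) / (n - 1)  # sample variance, divisor n - 1

print(round(grouped_mean, 2), round(grouped_var, 2))
```

For this table the approximate mean is 28.6. The median formula uses the same table differently: it spreads the observations of the median class evenly across the interval instead of stacking them at the midpoint.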
