
Data Representation & Statistics

Data Set: The data set lists every country with more than 1 million internet users, together with the number of internet users in each country as of 31 December 2011. The data is sourced from the website http://www.internetworldstats.com/stats.htm. The website does not provide a single list of internet users by country; it groups the figures by geographic region, so a combined data set was created by collecting the figures from each region. The data set is shown below. To simplify presentation and calculation, the number of internet users is also expressed in millions, rounded to two decimal places.

Data Set-Internet Users in World.xlsx

Country                 Internet Users (as of 31 Dec 2011)   Internet Users (in millions)
China                   513100000                            513.10
United States           245203319                            245.20
India                   121000000                            121.00
Japan                   101228736                            101.23
Brazil                  79245740                             79.25
Germany                 67364898                             67.36
Russia                  61472011                             61.47
Indonesia               55000000                             55.00
United Kingdom          52731209                             52.73
France                  50290226                             50.29
Nigeria                 45039711                             45.04
Mexico                  42000000                             42.00
Korea, South            40329660                             40.33
Iran                    36500000                             36.50
Italy                   35800000                             35.80
Turkey                  35000000                             35.00
Spain                   30654678                             30.65
Vietnam                 30516587                             30.52
Philippines             29700000                             29.70
Pakistan                29128970                             29.13
Argentina               28000000                             28.00
Canada                  27757540                             27.76
Colombia                25000000                             25.00
Poland                  23852486                             23.85
Egypt                   21691776                             21.69
Australia               19554832                             19.55
Thailand                18310000                             18.31
Malaysia                17723000                             17.72
Taiwan                  16147000                             16.15
Morocco                 15656192                             15.66
Ukraine                 15300000                             15.30
Netherlands             15071191                             15.07
Saudi Arabia            11400000                             11.40
Venezuela               10976342                             10.98
Kenya                   10492785                             10.49
Chile                   10000000                             10.00
Peru                    9973244                              9.97
Romania                 8578484                              8.58
Belgium                 8489901                              8.49
Sweden                  8441718                              8.44
Uzbekistan              7550000                              7.55
Czech Republic          7220732                              7.22
South Africa            6800000                              6.80
Hungary                 6516627                              6.52
Switzerland             6430363                              6.43
Austria                 6143600                              6.14
Bangladesh              5501609                              5.50
Portugal                5455217                              5.46
Kazakhstan              5448965                              5.45
Israel                  5263146                              5.26
Greece                  5043550                              5.04
Tanzania                4932535                              4.93
Denmark                 4923824                              4.92
Hong Kong               4894913                              4.89
Algeria                 4700000                              4.70
Finland                 4661265                              4.66
Norway                  4560572                              4.56
Syria                   4469000                              4.47
Belarus                 4436800                              4.44
Slovakia                4337868                              4.34
Sudan                   4200000                              4.20
Uganda                  4178085                              4.18
Dominican Republic      4120801                              4.12
Serbia                  4107000                              4.11
Ecuador                 4075500                              4.08
Tunisia                 3856984                              3.86
Azerbaijan              3689000                              3.69
Singapore               3658400                              3.66
New Zealand             3625553                              3.63
United Arab Emirates    3555100                              3.56
Bulgaria                3464287                              3.46
Ireland                 3122358                              3.12
Croatia                 2656089                              2.66
Yemen                   2609698                              2.61
Sri Lanka               2503194                              2.50
Guatemala               2280000                              2.28
Kyrgyzstan              2194400                              2.19
Lithuania               2103471                              2.10
Ghana                   2085501                              2.09
Nepal                   2031245                              2.03
Costa Rica              2000000                              2.00
Senegal                 1989396                              1.99
Jordan                  1987400                              1.99
Bolivia                 1985970                              1.99
Bosnia-Herzegovina      1955277                              1.96
Uruguay                 1855000                              1.86
Oman                    1741804                              1.74
Cuba                    1702206                              1.70
Puerto Rico             1698301                              1.70
Latvia                  1540859                              1.54
Paraguay                1523273                              1.52
Palestine (West Bk.)    1512273                              1.51
Panama                  1503441                              1.50
Zimbabwe                1445717                              1.45
Albania                 1441928                              1.44
Moldova                 1429154                              1.43
Slovenia                1420776                              1.42
Armenia                 1396550                              1.40
Lebanon                 1367220                              1.37
Iraq                    1303760                              1.30
Georgia                 1300000                              1.30
El Salvador             1257380                              1.26
Afghanistan             1256470                              1.26
Kuwait                  1100000                              1.10
Macedonia               1069432                              1.07
Honduras                1067560                              1.07

a) Summary Statistics:

Summary Statistics (internet users in millions)
Mean                  21.1035909
Median                4.9093685
Standard Deviation    57.2179238
Sample Variance       3273.890804
Kurtosis              54.57074044
Skewness              6.837902102
Range                 512.03244
Minimum               1.06756
Maximum               513.1

b) Pictorial Representation

Box Plot (generated with the Box Plot.xlsx template from the companion CD of the book Complete Business Statistics, 7e, by Aczel et al.)

Lower Whisker    1.06756
Lower Hinge      2.044809
Median           4.909369
Upper Hinge      18.16325
Upper Whisker    42

As seen from the above box plot, the data set contains outliers that distort the representation of the data. These outliers are:

Country          Internet Users (as of 31 Dec 2011)   Internet Users (in millions)
China            513100000                            513.10
United States    245203319                            245.20
India            121000000                            121.00
Japan            101228736                            101.23
Brazil           79245740                             79.25

We now redraw the box plot without these outliers to see their impact on the pictorial representation of the data.

Lower Whisker    1.06756
Lower Hinge      2
Median           4.661265
Upper Hinge      15.3
Upper Whisker    35
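The hinge and whisker values listed for the two box plots appear consistent with quartiles computed by linear interpolation between order statistics and with whiskers drawn to the most extreme values lying within 1.5 times the interquartile range of the hinges. That is an inference from the numbers, not something the source states, so the following Python sketch should be read as one plausible way to obtain such a summary rather than as the method of the Box Plot.xlsx template. The function name `box_plot_summary`, the parameter `k` and the toy data are hypothetical.

```python
import statistics


def box_plot_summary(values, k=1.5):
    """Box plot summary with whiskers limited to k * IQR beyond the hinges.

    Assumption: hinges are quartiles interpolated at position (n - 1) * p,
    which is what statistics.quantiles(..., method="inclusive") computes.
    """
    data = sorted(values)
    lower_hinge, median, upper_hinge = statistics.quantiles(
        data, n=4, method="inclusive"
    )
    iqr = upper_hinge - lower_hinge
    low_fence = lower_hinge - k * iqr
    high_fence = upper_hinge + k * iqr
    # Whiskers end at the most extreme observations still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "lower_whisker": inside[0],
        "lower_hinge": lower_hinge,
        "median": median,
        "upper_hinge": upper_hinge,
        "upper_whisker": inside[-1],
        "beyond_whiskers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical toy data, not taken from the table above.
toy = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = box_plot_summary(toy)
print(summary)
```

Under this rule every value beyond a whisker is flagged. Applied to the full table, that would include every country above the upper whisker of 42 million, which is a larger set than the five countries removed in this report, so the choice of which points to drop remains a judgment call.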

Histogram

Bin (millions)    Frequency
10                71
20                10
30                7
40                5
50                3
60                3
70                2
80                1
100               0
200               2
300               1
400               0
500               0
More              1
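The frequencies above are consistent with each bin label being an inclusive upper limit, so a value of exactly 10 million counts in the "10" bin and anything above 500 falls into "More". The Python sketch below illustrates that counting rule under this assumption. The function name `bin_counts` is hypothetical, and the sample is only the 24 largest values from the table, so the lower bins come out smaller than in the full histogram.

```python
from bisect import bisect_left


def bin_counts(values, upper_edges):
    """Count values per bin; a value v goes to the first edge with v <= edge.

    The extra final slot collects everything above the last edge ("More").
    upper_edges must be sorted in ascending order.
    """
    counts = [0] * (len(upper_edges) + 1)
    for v in values:
        counts[bisect_left(upper_edges, v)] += 1
    return counts


edges = [10, 20, 30, 40, 50, 60, 70, 80, 100, 200, 300, 400, 500]

# The 24 largest values from the table, in millions of users.
top24 = [
    513.10, 245.20, 121.00, 101.23, 79.25, 67.36, 61.47, 55.00,
    52.73, 50.29, 45.04, 42.00, 40.33, 36.50, 35.80, 35.00,
    30.65, 30.52, 29.70, 29.13, 28.00, 27.76, 25.00, 23.85,
]

counts = bin_counts(top24, edges)
for label, count in zip(edges + ["More"], counts):
    print(label, count)
```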

c) Report: The mean of the data set is 21.10 million, which is the average number of internet users per country in the data set. The range is 512.03 million, which is very large because of the wide gap between the minimum (1.07 million) and the maximum (513.10 million). The data set is positively skewed (skewness 6.84): a large number of countries have fewer than 10 million internet users, while a few countries have very large values. The kurtosis is also high at 54.57, indicating a sharply peaked distribution with heavy tails. The effect of the outliers on the representation of the data is clearly evident from the box plots. Countries such as China, the United States, India, Japan and Brazil have very large numbers of internet users and dominate the summary statistics of the whole data set.

d) Inference: The summary statistics give the false impression that a typical country in the data set has about 21.1 million internet users, but this is not the case: the median is only 4.91 million. The outliers have so many internet users that they heavily distort the summary statistics. To get an accurate picture of the data set, these outliers should be removed before the summary statistics are computed and the analysis is carried out. As shown below, the summary statistics give a much better view of the data once the outliers are removed.

Summary Statistics (Without Outliers, internet users in millions)
Mean                  11.65547366
Median                4.661265
Standard Deviation    15.06040502
Sample Variance       226.8157994
Kurtosis              2.947400469
Skewness              1.884445381
Range                 66.297338
Minimum               1.06756
Maximum               67.364898
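The source does not say how the summary statistics were computed. The layout resembles the descriptive statistics output of a spreadsheet, so the Python sketch below assumes the bias-adjusted sample formulas such tools commonly use: an n - 1 denominator for the variance, adjusted sample skewness, and adjusted excess kurtosis. The function name `describe` is hypothetical, and the example uses only the ten largest values from the table to show how dropping the five listed outliers shifts the statistics.

```python
import statistics


def describe(values):
    """Descriptive statistics with bias-adjusted sample skewness and kurtosis.

    Assumption: these are the spreadsheet-style sample formulas, which the
    source does not confirm were used.
    """
    n = len(values)
    if n < 4:
        raise ValueError("sample kurtosis needs at least 4 values")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation, n - 1 denominator
    if s == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    z3 = sum(((x - mean) / s) ** 3 for x in values)
    z4 = sum(((x - mean) / s) ** 4 for x in values)
    skewness = n / ((n - 1) * (n - 2)) * z3
    kurtosis = (
        n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    )
    return {
        "mean": mean,
        "median": statistics.median(values),
        "standard_deviation": s,
        "sample_variance": statistics.variance(values),
        "kurtosis": kurtosis,
        "skewness": skewness,
        "range": max(values) - min(values),
        "minimum": min(values),
        "maximum": max(values),
    }


# The ten largest values from the table, in millions of users.
top10 = [513.10, 245.20, 121.00, 101.23, 79.25, 67.36, 61.47, 55.00, 52.73, 50.29]

with_outliers = describe(top10)
without_outliers = describe(top10[5:])  # drop the five countries listed as outliers
print(with_outliers["mean"], with_outliers["median"])
print(without_outliers["mean"], without_outliers["median"])
```

Running `describe` on all 106 values, and again on the 101 values that remain after removing the outliers, is one way to check the two summary tables in this report.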
