Data Set 1 is a continuous variable because weight can be measured in arbitrarily fine detail, limited only by the precision of the measuring equipment. Data Set 2 is a discrete variable because each plot can contain only a whole number of seedlings, and each seedling is either present or absent. Data Set 3 is a derived variable, as the level of infection in the population is calculated from the number of infected species.
2.
Data Set 2    Data Set 3
5.9097        0.4124
3             0.4
40.4044       0.2224
6.3564        0.0495
[Figure: Frequency (y-axis) against Level of Beetle Infection (x-axis)]
4. The near equivalence of the mean and median of Data Set 1, coupled with its low deviation and the continuous nature of the data, suggests a normal distribution. However, Figure One indicates that the data have a strong positive skew, which may require consideration in further statistical testing. Data Set 2 is likely a Poisson distribution, given the difference between its median and mean and its high variance. Figure Two supports this, as it shows multiple frequency peaks throughout the data range. The median and mean of Data Set 3 are essentially equal, indicating that the data are symmetric about the mean, as shown in Figure Three.
5. I used my findings in question 4 to choose the tests: a Kolmogorov-Smirnov test for a normal distribution on Data Set 1, a Kolmogorov-Smirnov test for a Poisson distribution on Data Set 2, and a Wilcoxon signed-rank test on Data Set 3. Table 2 shows that my distribution predictions for Data Sets 1 and 3 were correct but that Data Set 2 does not follow a Poisson distribution.
Table 2: Statistical results of Kolmogorov-Smirnov and Wilcoxon signed-rank tests on Data Sets 1, 2, and 3
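To illustrate what a Kolmogorov-Smirnov test for normality measures, here is a minimal sketch using only the Python standard library. It is not the software used for the results above. The `weights` values are hypothetical, the function names are my own, and the critical value 1.36/sqrt(n) is only the large-sample 5% approximation. Because the mean and standard deviation are estimated from the same sample, the comparison is conservative, so treat the output as indicative only.

```python
import math
import statistics


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal(mu, sigma) variable."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic_normal(sample):
    """One-sample KS statistic D against a normal distribution whose
    mean and standard deviation are estimated from the sample itself."""
    xs = sorted(sample)
    n = len(xs)
    mu = statistics.mean(xs)
    sigma = statistics.stdev(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mu, sigma)
        # Largest gap between the empirical step function and the fitted CDF,
        # checked just after and just before each observation.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical weights, NOT the values of Data Set 1.
weights = [4.1, 4.8, 5.0, 5.2, 5.3, 5.5, 5.6, 5.9, 6.1, 6.8]

d = ks_statistic_normal(weights)
critical = 1.36 / math.sqrt(len(weights))  # approximate 5% critical value

print(f"D = {d:.4f}, approximate 5% critical value = {critical:.4f}")
if d > critical:
    print("Reject the hypothesis of a normal distribution")
else:
    print("No evidence against a normal distribution")
```

A small D means the empirical distribution stays close to the fitted normal curve. The same idea applies to a Poisson test by swapping in the Poisson cumulative distribution function.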
6. Although Data Set 1 was found to have a normal distribution, its large skew means that 3 out of every 20 sample means will not have a normal distribution. To reduce this number, the data can be transformed so that they are more centralized, lessening the standard deviation. One way to do this would be to apply a logarithmic transformation to the data set.
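The effect of a log transformation on skew can be sketched as follows. This is an illustration under assumptions, not the analysis performed on Data Set 1: the `weights` values are hypothetical, `skewness` is my own helper computing the moment coefficient of skewness, and the natural logarithm is assumed (it requires strictly positive values).

```python
import math
import statistics


def skewness(values):
    """Moment coefficient of skewness: third central moment divided by
    the second central moment raised to the power 1.5."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical positively skewed weights, NOT the values of Data Set 1.
weights = [1.2, 1.5, 1.7, 2.0, 2.4, 3.1, 4.5, 7.9, 15.3]

# The natural log compresses large values more than small ones,
# which pulls in the long right-hand tail.
logged = [math.log(w) for w in weights]

print(f"skewness before transformation: {skewness(weights):.2f}")
print(f"skewness after transformation:  {skewness(logged):.2f}")
```

For data with a positive skew, the transformed values should show a skewness closer to zero, which is why the transformation is a common first step before tests that assume normality.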