
Assignment 1 (Sol.)
Introduction to Data Analytics
Prof. Nandan Sudarsanam & Prof. B. Ravindran

1. In descriptive statistics, the aim is to:


(a) analyze data in a way that helps describe or show it meaningfully so that,
for example, patterns might emerge from the data.
(b) use probability theory to learn about a population from sample data.
(c) quantitatively describe or summarize the data.
(d) describe the data by measures of central tendency and measures of variability.
(e) all of the above
Sol. (a), (c) & (d)
Descriptive statistics provide simple summaries about the sample and about the observations
that have been made. Such summaries may be either quantitative, i.e. summary statistics
(measures of central tendency and measures of variability), or visual (histograms, pie charts,
scatter plots, etc.). Using probability theory to learn about a population from sample data,
on the other hand, is part of inferential statistics.
2. Which of the following is a measure of dispersion:
(a) percentiles.
(b) quartiles.
(c) interquartile range.
(d) all of the above.
(e) none of the above.
Sol. (c)
The interquartile range (IQR) is a measure of variability based on dividing a data set into
quartiles. Quartiles divide a rank-ordered data set into four equal parts. The IQR is a measure of
statistical dispersion equal to the difference between the 75th and 25th percentiles.
Percentiles and quartiles on their own mark positions in the ordered data and do not measure spread.
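As a rough illustration of the definition above, the IQR can be computed from the quartile cut points. This is only a sketch: the data set is hypothetical, and the "inclusive" quartile convention used here is one of several in common use, so other tools may report slightly different quartiles.

```python
from statistics import quantiles

# Hypothetical data set for illustration (not from the assignment).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 requests the three quartile cut points; "inclusive" treats the data
# as the whole population and interpolates between observed values.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# IQR = 75th percentile minus 25th percentile.
iqr = q3 - q1

print(q1, q3, iqr)
```

Here `q1` and `q3` are positions in the ordered data, while `iqr` is the single number that describes spread.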
3. In a statistical study, when data are collected only for a portion or subset of all the elements
of interest, we are using:
(a) a sample
(b) a parameter
(c) a population
(d) (a) and (b)
(e) (b) and (c)
Sol. (a)

4. To test the linear relationship between y (dependent) and x (independent) continuous variables,
the best plot is:
(a) bar chart
(b) scatter plot
(c) histogram
(d) pie chart
(e) none of the above
Sol. (b)
A scatter plot is a good way to check for a linear relationship between two continuous
variables. It displays the relationship between two quantitative variables, so we can see
how one variable changes with respect to the other.
5. The algebraic sum of deviations from the mean is
(a) mean
(b) zero
(c) maximum
(d) minimum
(e) undefined
Sol. (b)
In the algebraic sum, the positive deviations and the negative deviations are equal in total
magnitude, so they cancel each other out.
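The cancellation can also be seen in one line of algebra. The notation is assumed here rather than taken from the assignment: $x_1, \dots, x_n$ are the observations and $\bar{x}$ is their mean.

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0,
\qquad \text{where } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .
```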
6. At an agricultural research center, scientists collected 20 years of past rainfall data along
with the corresponding crop yield. If they want to perform regression analysis on this data,
which variable should they treat as the independent variable and which as the dependent
variable?
(a) Independent variable: yield, Dependent variable: rainfall
(b) Independent variable: rainfall, Dependent variable: yield
Sol. (b) We expect the amount of rainfall to have an impact on the crop yield and not the
other way around.
7. In a glass production house, John recorded temperature values in degrees Celsius. After
working through the data, he found that the mean is 28.6 °C and the variance is 4.0 (°C)^2. If the
data values were converted to Fahrenheit (°F), what would the mean and variance be?
We use the following formula to convert a temperature value from degrees Celsius (C) to
Fahrenheit (F):
$F = \frac{9}{5} C + 32$

(a) mean = 28.6 °F and variance = 4.0 (°F)^2
(b) mean = 57.2 °F and variance = 8.0 (°F)^2
(c) mean = 87.22 °F and variance = 16.38 (°F)^2
(d) mean = 83.48 °F and variance = 12.96 (°F)^2
Sol. (d)
Mean = $\frac{9}{5} \times 28.6 + 32 = 83.48$ °F
Variance = $\left(\frac{9}{5}\right)^2 \times 4.0 = 12.96$ (°F)^2
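The solution relies on how a linear change of units y = a·x + b acts on summary statistics: the mean becomes a·mean + b, while the variance is scaled by a² and is unaffected by the shift b. The sketch below checks this numerically. The helper name `linear_transform_stats` and the four readings are hypothetical, chosen only so that their mean is 28.6 and their population variance is 4.0.

```python
from statistics import fmean, pvariance

def linear_transform_stats(mean, variance, a, b):
    """Mean and variance of y = a*x + b, given the mean and variance of x."""
    return a * mean + b, a ** 2 * variance

# Values stated in the question, converted with F = (9/5) C + 32.
mean_f, var_f = linear_transform_stats(28.6, 4.0, a=9 / 5, b=32)

# Hypothetical readings (not from the assignment) with mean 28.6 and
# population variance 4.0, converted one by one as a cross-check.
readings_c = [26.6, 26.6, 30.6, 30.6]
readings_f = [9 / 5 * c + 32 for c in readings_c]

print(round(mean_f, 2), round(var_f, 2))
print(round(fmean(readings_f), 2), round(pvariance(readings_f), 2))
```

Both printed lines should agree with option (d), up to floating-point rounding.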

8. If a data set has an even number of observations, then the median of the data set:
(a) cannot be calculated
(b) is equal to the mean
(c) is average of two middle items
(d) is one of the two middle items, chosen at random
(e) none of the above
Sol. (c)
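A minimal sketch of the rule, using a small hypothetical data set: with an even number of observations, the median is the average of the two middle values of the sorted data, which is also what Python's `statistics.median` returns.

```python
from statistics import median

# Hypothetical data set with an even number of observations.
values = [7, 1, 5, 3]

ordered = sorted(values)          # rank-order the data first
mid = len(ordered) // 2           # index of the upper middle item

# Average of the two middle items.
by_hand = (ordered[mid - 1] + ordered[mid]) / 2

print(by_hand, median(values))
```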
9. In a given data set of 100 observations, if the largest value is doubled, which of the following
statements is/are false? (Assume that the largest value is nonzero.)
(Note: multiple options may be correct)
(a) the variance increases
(b) the mean increases
(c) the median increases
(d) the IQR increases
(e) the range increases
Sol. (c) & (d)
The median and interquartile range are resistant measures, so doubling the largest value does not change them.
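The effect can be checked on a concrete case. The sketch below uses a hypothetical data set, the integers 1 to 100, and assumes the largest value is positive. The `summary` helper is illustrative, and the "inclusive" quartile convention is one choice among several.

```python
from statistics import fmean, median, pvariance, quantiles

def summary(data):
    """Collect the five statistics named in the question."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "mean": fmean(data),
        "median": median(data),
        "variance": pvariance(data),
        "iqr": q3 - q1,
        "range": max(data) - min(data),
    }

# Hypothetical data: 100 observations with a positive maximum.
original = list(range(1, 101))
doubled = original[:-1] + [2 * original[-1]]   # largest value 100 becomes 200

before, after = summary(original), summary(doubled)
for name in before:
    change = "increases" if after[name] > before[name] else "unchanged"
    print(f"{name}: {before[name]:.2f} -> {after[name]:.2f} ({change})")
```

Only the mean, variance, and range respond to the changed maximum. The median and IQR depend on values near the middle of the ordered data, which are untouched.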
10. A Boeing 747 passenger flight is cancelled during severe snowstorms. The following
histogram shows the number of days missed (per year) in the last 75 years. Which of the following
should you use as a measure to describe the center of the distribution?

(a) mean, because it covers information from all 75 years
(b) IQR, because it is unaffected by the outliers
(c) median, because the distribution is skewed to the right
(d) standard deviation, because it is unaffected by outliers and the distribution is skewed

Sol. (c)
IQR and standard deviation are not measures of central tendency. Since the distribution is
right skewed, the mean is pulled toward the few large values in the right tail. The median is a
resistant measure and should be used here.
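To make the contrast concrete, here is a small sketch with made-up right-skewed counts. These values are hypothetical and are not read from the assignment's histogram: most years have few missed days and one year has many.

```python
from statistics import fmean, median

# Hypothetical right-skewed data: days missed per year (not from the histogram).
days_missed = [0, 0, 1, 1, 1, 2, 2, 3, 4, 15]

mean_days = fmean(days_missed)      # pulled upward by the single large value
median_days = median(days_missed)   # stays near the bulk of the data

print(mean_days, median_days)
```

In this sketch the mean lands above most of the observations, while the median stays close to a typical year.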
