1/18/2020 © Universal Class, Inc. - Class Lesson: Lesson 1 - Statistics Terms and Motivation

Lesson 1 - Statistics Terms and Motivation


Lesson Summary: This lesson will explain why the study of statistics is important, and it will
introduce you to some of the key terms and ideas that will be critical to understanding
subsequent lessons.

Introduction

This lesson will explain why the study of statistics is important, and it will introduce
you to some of the key terms and ideas that will be critical to understanding
subsequent lessons. Although this lesson is not math intensive, it lays the
foundation for understanding the mathematics that we will discuss in later lessons.

Key Terms

o Statistics

o Univariate data

o Bivariate data

o Multivariate data

o Population

o Sample

Objectives

o Learn why a knowledge of statistics is important and helpful

o Recognize and understand the meaning of various key terms

Lesson


Introduction to Statistics

Statistics is a subject that has earned a certain amount of notoriety because of its
misuse in various contexts. Nevertheless, statistics is a tool that, if used properly,
can be of tremendous help in math, science, engineering, history, politics, and
numerous other fields. As you study this subject, always keep in mind that statistics
is more than just math: it is not simply manipulation of numbers through addition,
subtraction, multiplication, division, and other mathematical operations. Statistics
also involves language and units: when a statistician (or layman) provides a
statistic, it involves a number and a label of some sort. For instance, the number
5.3 is not in and of itself a statistical value; "an average age of 5.3 years," however,
is a statistical value. This linguistic aspect of statistics sometimes allows a certain
amount of ambiguity that can be misleading. By studying statistics, you will equip
yourself to identify and understand both uses and abuses of this tool.

Statistics is used for quantifying sets of data such as attributes of a group of people
and measurements taken in a laboratory. Consider, for instance, the population of a
particular country. The people who reside in that country have varying heights:
some are short, some are tall, some are in between. If we wanted to compare the
height of this population with that of some other population in a convenient manner,
we would not want to compare individual people. Such a task would be burdensome
(the number of people in a country might be in the millions or billions) and would
not necessarily be particularly helpful as a means of comparing populations as a
whole. Instead, we can use an average or median height as the basis for our
comparison. These statistical values are single numbers that quantify the data (the
heights of a country's population) and that provide a convenient way to express and
compare certain characteristics of those data. Part of the goal of this course is to
teach you how to select and use statistical tools like averages and medians, as well
as a host of others, in assessing and comparing data.
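
As a minimal sketch of the comparison described above, the following Python snippet summarizes two sets of heights with a mean and a median. The height values and the names heights_country_a, heights_country_b, and summarize are made up for illustration; they do not come from the lesson.

```python
from statistics import mean, median

# Hypothetical heights in centimeters (made-up values for illustration only).
heights_country_a = [158.0, 162.5, 170.0, 171.5, 180.0]
heights_country_b = [150.0, 165.0, 168.0, 172.0, 175.5, 190.0]


def summarize(heights):
    """Return the (mean, median) pair for a list of heights."""
    return mean(heights), median(heights)


mean_a, median_a = summarize(heights_country_a)
mean_b, median_b = summarize(heights_country_b)

# Two numbers per country replace a person-by-person comparison.
print(f"Country A: mean={mean_a:.1f} cm, median={median_a:.1f} cm")
print(f"Country B: mean={mean_b:.1f} cm, median={median_b:.1f} cm")
```

In this made-up data the two medians happen to coincide while the means differ, which is one reason the choice of statistic matters when comparing data sets.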

Simply defined, statistics (sometimes colloquially termed "stats") is the study of
collecting, analyzing, interpreting, and representing sets of numerical data. Thus,
virtually any field of study that uses numbers can, at least occasionally, involve
statistics. Because statistics makes extensive use of numbers, it is math-intensive,
and a decent grasp of basic arithmetic and algebra is required to study this field.

Types of Data

A set of data can involve a single variable or multiple variables. In this course, we
will only consider data sets that involve one variable (univariate data) or two
variables (bivariate data). The height of persons in a particular country is an
example of univariate data, since there is only one variable: height. An example of
bivariate data is a person's height with respect to age; in this case, there are
two variables: age and height. Much of the course studies univariate data, but the
principles that apply in this case can also be extended to multivariate data (two,
three, or more variables). The course also covers (to a lesser extent) bivariate data,
which allows us to explore concepts such as correlation and regression.
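
One possible way to picture the difference between univariate and bivariate data is as a difference in how each record is stored. The sketch below uses made-up measurements; the names heights_cm, age_height, and variables_per_record are hypothetical and used only for illustration.

```python
# Hypothetical measurements for three children (made-up values).
heights_cm = [110.0, 125.5, 140.0]                    # univariate: height only
age_height = [(5, 110.0), (8, 125.5), (11, 140.0)]    # bivariate: (age in years, height in cm)


def variables_per_record(data):
    """Count how many variables each record in the data set carries."""
    first = data[0]
    return len(first) if isinstance(first, tuple) else 1


print(variables_per_record(heights_cm))   # 1
print(variables_per_record(age_height))   # 2
```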

Practice Problem: A pollster wants to find out the relationship between age and
income for a certain segment of the population. How many variables are involved in
the data that the pollster must analyze?

Solution: The pollster is looking at two different aspects of the population: age and
income. Thus, he is dealing with two different variables (bivariate data).

Populations versus Samples

Obviously, it is not always possible for a scientist, pollster, anthropologist,
journalist, or other professional (or non-professional) to measure or consider every
last member of a particular set to calculate a statistic. Strictly speaking, however,
every member must be measured to get an exact value. Thus, if you wanted to calculate
the average gas mileage of all vehicles currently on the road, you would need to go
measure (or simply record) the gas mileage of every last vehicle. This would be a
daunting task that would not be worth the time and trouble. Instead of taking this
exhaustive approach, you might instead take the approach of measuring or
recording the mileage of a representative subset of the vehicles on the road and
using this subset to calculate the average. Such an approach offers obvious
benefits, but it also requires more care, since assumptions must be made
concerning what constitutes a "representative" subset. The set of all vehicles would
be considered a population--the totality of all members of a particular set. A
subset of vehicles used for the purposes of statistical analysis is a sample--some
portion of the population. Whether or not a particular sample is actually
representative may be debatable (inappropriately skewing a sample is one potential
way in which statistics can be abused).
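
The relationship between a population and a sample can be sketched in Python as follows. This is only an illustration under stated assumptions: the "population" is simulated with made-up mileage values between 15 and 45 miles per gallon, the sample is drawn at random (one common way of aiming for a representative subset, not the only one), and the sizes 10,000 and 200 are arbitrary choices.

```python
import random
from statistics import mean

# Fixed seed so that repeated runs produce the same simulated data.
rng = random.Random(0)

# Hypothetical population: gas mileage (mpg) of 10,000 simulated vehicles.
population = [rng.uniform(15.0, 45.0) for _ in range(10_000)]

# Sample: 200 vehicles chosen at random, without replacement.
sample = rng.sample(population, 200)

population_mean = mean(population)   # exact value, requires every member
sample_mean = mean(sample)           # estimate, requires only the sample

print(f"Population mean: {population_mean:.2f} mpg")
print(f"Sample mean:     {sample_mean:.2f} mpg")
```

The sample mean will generally be close to, but not exactly equal to, the population mean; how close depends on the sample size and on how the sample is chosen.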

It is important to note that the term population need not necessarily refer to
people. A population can be the set of all vehicles, the set of all potential outcomes
of an event or series of events, or the set of all entities of a given type (for
instance, the set of all stars in the universe). The population in a given context is
simply the set of all instances from which we might choose a sample for statistical
analysis.


To differentiate between statistical calculations for populations versus those for
samples, Greek letters are used for the former (for instance, σ for the population
standard deviation) and Latin letters are used for the latter (for instance, s for the
sample standard deviation). Although these choices are purely conventional, they
are helpful in avoiding confusion and ambiguity.
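
The distinction between σ and s is more than notational: under the standard definitions (which this lesson has not yet introduced), the population standard deviation divides the summed squared deviations by the number of values N, while the sample standard deviation divides by n - 1. Python's standard library exposes both as statistics.pstdev and statistics.stdev. The data values below are made up for illustration.

```python
from statistics import pstdev, stdev

# Made-up data set with a mean of 5.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Treating the data as an entire population: divide by N.
sigma = pstdev(data)

# Treating the same data as a sample from a larger population: divide by n - 1.
s = stdev(data)

print(f"sigma (population) = {sigma:.3f}")
print(f"s (sample)         = {s:.3f}")
```

For the same data, s is slightly larger than σ, so it matters which of the two a reported value refers to.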

Practice Problem: An investor wants to determine how to diversify his portfolio.
To quantify each potential investment area (technology, financial, manufacturing,
and so on), he calculates the average growth of the top 20 companies in each area.
Is the average growth in each area that of the population or a sample?

Solution: Each area of investment could have numerous companies--probably far
more than 20. Thus, the investor is calculating the average growth of a sample of
companies from each area rather than that of the population.

Exercises

1. Determine whether a statistical analysis should best involve examination of the
population or of a sample in each case.

a. Average weight of humans on Earth

b. Median income of members of Congress

c. Variance of relative brightness of stars in the Milky Way

2. A biologist studying wolves wants to find a correlation between the lifespan of
wolves and the geographic latitude of their habitats. How many variables are
involved in such an analysis?


Answers to Exercises

1. a. Sample

b. Population

c. Sample

In each case, simply determine whether it is feasible to measure or otherwise
determine the appropriate value for each member of the population. Since there
are billions of humans on Earth and many billions of stars in the Milky Way, a
sample is best in these cases. Since there are only several hundred members of
Congress, it is relatively simple to calculate the median for the population.

2. Two variables (bivariate data)

The biologist wants to relate latitude to lifespan; by finding a representative
sample of wolf lifespans for a set of latitudes, evidence for (or against) such a
correlation can be gathered. The data thus involve two variables.

© Universal Class, Inc.
