
The Data Exploration Project

By

Gerald Hendrix
AP Statistics
Fudge 7th

The Data Exploration Project | Gerald Hendrix

INTRODUCTION
Physical attributes are perhaps the most important aspects of athletes. Across
different sports, athletes with certain genetic traits are clearly predisposed to
succeed at the sports that reward those traits. Football, one of the most physically
diverse sports, shows this well: many different body types are on the field at any
given time. Linebackers tend to be bigger, in both weight and height, than running
backs. Rowing is no different. Crew is dominated by people who are
vertically gifted. Although shorter people can be successful (short being relative; it is
very rare to see anyone shorter than 5'11" competing at higher levels of competition),
those who are taller typically find themselves at an advantage when competing. Taller
people are able to extend the length of their stroke, propelling the boat through the water
for longer. Though height is an important factor in a successful oarsman, coaches take
into account other variables when looking at prospective athletes.
Rowing machines (or "ergs") are an important tool for evaluating athletes. The
machine is not biased and shows an athlete's individual speed based strictly on the
amount of work he or she is putting into the machine. Although there are many
confounding variables when testing an athlete's ability solely on a rowing machine, it
does give coaches a standardized assessment of an athlete's speed.
Another reason that coaches pay so much attention to an athlete's erg scores is a
theory called the "Weight-to-Watt Ratio." The idea behind this theory is stated in its
title; it is based on the average number of watts an athlete pulls over a
2-kilometer piece. Coaches who follow this theory will put the average number of watts
from the piece over the athlete's weight. The outcome is the number of watts the athlete
pulls for every pound of body weight (loosely, how many times the athlete pulls his or
her body weight) during the piece. The equation is shown below:

$$\text{Weight-to-Watt Ratio} = \frac{\text{Average Watts}}{\text{Weight (lb)}}$$

To give an example, say a high school athlete who weighs 160 pounds goes
2 kilometers in 6 minutes and 30 seconds, meaning his average power was 377.6 watts
over the piece. To estimate the amount of weight that he pulled, he would set
up his equation:
$$\frac{377.6}{160} = 2.36$$
What this tells the athlete is that he pulled 2.36 times his bodyweight, or 2.36
watts for every pound that he weighed.
Why is this important? According to the Weight-to-Watt Ratio, every watt generated is
equivalent to a pound of pressure during the stroke. When racing, a rower must generate
enough energy to move himself or herself, the added weight of the boat and everything in
it, and his or her teammates, plus excess power to add speed. Coaches look at an
athlete's weight-to-watt ratio to estimate how much power the athlete generates, and how
effective he or she would be in the boat, without even meeting the athlete.


THE DATA
In order to calculate the weight-to-watt ratio of athletes, two things must be
collected: their weight, and the time it takes them to go 2 kilometers on a rowing
machine. Using a calculator provided by Concept2, the athlete's time can then be
converted to watts and used in the Weight-to-Watt Ratio formula.
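The conversion from a 2-kilometer time to a ratio can be sketched in a few lines of Python. This is only an illustration: it assumes the relationship watts = 2.80 / pace³ (pace in seconds per meter) that Concept2 publishes alongside its online calculator, which the project itself does not state, and the function names are hypothetical.

```python
def watts_from_time(seconds: float, distance_m: float = 2000.0) -> float:
    """Average watts for a piece, assuming watts = 2.80 / pace**3."""
    pace = seconds / distance_m  # pace in seconds per meter
    return 2.80 / pace ** 3


def weight_to_watt_ratio(seconds: float, weight_lb: float) -> float:
    """Average watts divided by body weight in pounds."""
    return watts_from_time(seconds) / weight_lb


# The example athlete from the introduction: 160 lb, 2 km in 6:30.
time_s = 6 * 60 + 30
watts = watts_from_time(time_s)
ratio = weight_to_watt_ratio(time_s, 160)
print(f"{watts:.1f} watts, {ratio:.2f} watts per pound")
```

Keeping the two steps in separate functions mirrors the way the data was processed: time to watts first, then watts over weight.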
The population that I pulled from was rowers who had signed up for a
recruiting website and had been featured in the website's "most popular athletes" lists.
I decided to look at these lists, as opposed to randomly selecting athletes from all
over the website, because the athletes on these lists represent what college coaches
want to see in high school athletes; otherwise their profiles would not have been
clicked on.
The stats that I collected for each athlete were height, weight, and
2-kilometer personal record. From this information, I was able to convert the 2-kilometer
time into average wattage and then find that athlete's weight-to-watt ratio. For
standardization, I converted the height to inches and kept the weight in pounds.
Weight-to-watt ratios can be shown as watts generated per kilogram of weight; however, to
keep the data friendly, the weight-to-watt ratios here all represent watts pulled per pound.
I collected my data by randomly selecting three months from the 2014-2015
school year and then viewing the website's "Top 10 Most Popular Athletes" list for each
month. I viewed each athlete's profile and recorded the data I needed.
The website used: www.berecruited.com
I collected the weight and 2-kilometer personal record in order to be able to
calculate each athlete's weight-to-watt ratio. Although the height is not necessary for
this calculation, I believed that it would be interesting to see how an athlete's height
affected his weight-to-watt ratio.
My motivation for collecting this type of data is that I am being recruited
myself, and I was interested to see the stats of other people in my recruiting class who
were getting a lot of interest from collegiate programs in order to compare myself to
them. Part of the motivation for collecting this data is to see where I should be in order to
get more collegiate attention. The data that I collected all came from male rowers who are
seniors as of fall 2015 in order to not compare the stats of an experienced senior male
rower to an inexperienced freshman female rower. Doing this, I was able to keep the data
more consistent and balanced.


FURTHER CALCULATIONS
In order to make sense of the data collected, there are multiple values that can be
found with a calculator that make the information easier to understand. The first step is
inputting the collected data into a calculator. Hit Stat -> Edit in order to input your data
into L1. Once all data points are in the calculator, hit 2nd -> Quit in order to return to the
calculator's main screen, then hit Stat -> Calc and scroll down to 1-Var Stats.
Make sure that List is set to L1, and then press Enter to calculate. A list of
numerous variables will be calculated. We will only be interested in the bottom 6
variables.
The bottom 6 variables tell us (in order from top to bottom): sample size,
minimum (smallest data value), first quartile (the middle number between the smallest
number and the median), median (middle of the data collected), third quartile (the
middle number between the median and the largest number), and maximum (largest
data value). The bottom 5 variables together give us our 5 number summary.
For my data:
Sample Size: 30
Five Number Summary:
o Minimum: 1.653
o First Quartile: 1.894
o Median: 2.109
o Third Quartile: 2.227
o Maximum: 2.409
In order to find the mean and standard deviation, all you have to do is scroll back
up. The mean is the very top variable (x with a line over it). Standard deviation is given
twice: Sx gives the sample standard deviation, while σx gives the population standard
deviation. (Standard Deviation and Variance on a Graphing Calculator, 2013)
Since I used data taken from a sample, I will use Sx as my standard deviation.
The range of a set of data is taken by subtracting the minimum value from the
maximum value. This gives the distance on the number line that all of the data falls on.
Range:
$2.409 - 1.653 = .756$
The variance is simply the standard deviation squared. After you have
determined which standard deviation to use, hit 2nd -> Quit to return to the calculator's
home screen, type in the value of the standard deviation, and then hit x² -> Enter to
find the variance.
Variance:
$.202^2 = .041$



Finally, to find the interquartile range (IQR) you subtract the first quartile from
the third quartile. This gives you the range of the middle 50% of your data. (Quartiles,
n.d.)
IQR:
$2.227 - 1.894 = .333$
All of the data grouped together:
Mean: 2.081
Median: 2.109
Range: .756
Standard Deviation: .202
Variance: .041
IQR: .333
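For readers without a graphing calculator, the same summary can be approximated in Python. This is a minimal sketch, not the method used in the project: it assumes the 30 ratios listed in the Distribution section, uses Python's standard statistics module, and takes each quartile as the median of one half of the sorted data, a convention chosen here because it reproduces the quartiles reported above. The helper name five_number_summary is hypothetical.

```python
import statistics

# The 30 weight-to-watt ratios listed in the Distribution section.
ratios = [
    2.254, 1.894, 1.653, 2.107, 2.196, 2.194,
    2.111, 2.218, 2.304, 2.275, 2.185, 2.227,
    1.707, 1.979, 2.176, 2.024, 1.887, 1.831,
    2.292, 2.261, 1.876, 1.889, 2.397, 2.409,
    2.078, 1.736, 2.102, 2.067, 1.961, 2.147,
]


def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]            # values below the median position
    upper = data[n // 2 + n % 2:]     # values above it (skips the middle if n is odd)
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])


minimum, q1, median, q3, maximum = five_number_summary(ratios)
mean = statistics.mean(ratios)
sx = statistics.stdev(ratios)         # sample standard deviation (divides by n - 1)
variance = statistics.variance(ratios)
data_range = maximum - minimum
iqr = q3 - q1

print(f"n={len(ratios)} min={minimum} Q1={q1} median={median:.3f} Q3={q3} max={maximum}")
print(f"mean={mean:.3f} Sx={sx:.3f} variance={variance:.3f} "
      f"range={data_range:.3f} IQR={iqr:.3f}")
```

Other software may use a different quartile convention and give slightly different values for Q1 and Q3.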
In order to look at this data fully, we need to identify which data points are
detached from the main body. These points, the outliers, fall outside of the
normal range and should be identified so that they do not distort the data.
Outliers are found by multiplying the IQR by 1.5 and then subtracting it from
the first quartile and adding it to the third quartile. If a value falls outside of these
two boundaries, then it is considered an outlier. (Stapel, n.d.)
Outliers:
$1.5(.333) = .4995$
$1.894 - .4995 = 1.3945$
$2.227 + .4995 = 2.7265$
Since neither the minimum nor the maximum value of my data falls outside
of the range 1.3945 to 2.7265, I know that none of my data points are outliers.
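The outlier check can be written as a short Python sketch. It is an illustration only: the function name outlier_fences is hypothetical, and the quartiles, minimum, and maximum are simply the values reported above. Because every data point lies between the minimum and the maximum, testing those two values is enough.

```python
def outlier_fences(q1: float, q3: float, k: float = 1.5):
    """Lower and upper boundaries using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles and extremes reported in Further Calculations.
lower, upper = outlier_fences(1.894, 2.227)
extremes = [1.653, 2.409]  # minimum and maximum of the data

# Any extreme value outside the boundaries would be an outlier.
outliers = [x for x in extremes if x < lower or x > upper]
print(f"boundaries: {lower:.4f} to {upper:.4f}; outliers: {outliers}")
```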


HISTOGRAM, BOXPLOT, AND STEMPLOT

[Figure: histogram titled "Frequency of Weight-to-Watt Ratios." Vertical axis: Frequency (0 to 10). Horizontal axis: bin boundaries 1.653, 1.793, 1.933, 2.073, 2.213, 2.353, 2.493, and More.]

Split Stem Plot of Weight-to-Watt Ratios (rounded to the nearest tenth)

1 | 6778999
2 | 0000111111222223333344

Key: 1 | 6 = 1.6

The data is skewed left, meaning that more people had higher weight-to-watt ratios.
The spread was just .756. The median of the data was 2.109. There are no outliers.


FURTHER CALCULATIONS PT. 2 (+100)


Now we will recalculate all of the values found above in Further Calculations
for when the data points have 100 added to them. We will re-enter the data values into the
calculator under L2 and proceed with the same steps as before to find their values.
For the new data:
Sample Size: 30
5 Number Summary:
o Minimum: 101.653
o First Quartile: 101.894
o Median: 102.109
o Third Quartile: 102.227
o Maximum: 102.409
To find the new range, we will still subtract the minimum value from the
maximum value:
$102.409 - 101.653 = .756$
The range is .756, which is the same as in Further Calculations, because adding
a numerical value to every data point will ultimately just increase the values and not their
spread.
We can still see the mean and standard deviation the same way that we saw
them in Further Calculations. By scrolling up from where we see the 5 number summary,
the mean is still represented by the x with the line over it, while the standard deviation
that we'll use is still Sx. By adding 100, all that we really did was increase the mean by
100 units, while the standard deviation stayed the same.
Our new mean is 102.081. The standard deviation stays at .202.
Since our standard deviation is the same, we should expect our variance to
remain the same too. After all, it is directly related to the standard deviation.
$.202^2 = .041$
As is expected, our variance has remained at .041.
Since we saw that our range stayed the same, once again we can expect the IQR
of the data set to be the same since it is technically the range between the first and third
quartiles.
$102.227 - 101.894 = .333$



When we group this data:
Mean: 102.081
Median: 102.109
Range: .756
Standard Deviation: .202
Variance: .041
IQR: .333

While observing the new trends for this data, I noticed that the mean and the
median both increased by exactly 100, since they both describe where the data is
located. The standard deviation, on the other hand, stayed the same
as the original because it describes how spread out the data is, which
hadn't changed.
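The effect of adding a constant can also be checked outside the calculator. The following is a minimal Python sketch, not part of the original project: it assumes the 30 ratios listed in the Distribution section and uses Python's standard statistics module, and the helper name summarize is hypothetical.

```python
import statistics

# The 30 weight-to-watt ratios listed in the Distribution section.
ratios = [
    2.254, 1.894, 1.653, 2.107, 2.196, 2.194,
    2.111, 2.218, 2.304, 2.275, 2.185, 2.227,
    1.707, 1.979, 2.176, 2.024, 1.887, 1.831,
    2.292, 2.261, 1.876, 1.889, 2.397, 2.409,
    2.078, 1.736, 2.102, 2.067, 1.961, 2.147,
]


def summarize(values):
    """Mean, median, sample standard deviation (Sx), and range."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # divides by n - 1, like Sx
        "range": max(values) - min(values),
    }


original = summarize(ratios)
shifted = summarize([x + 100 for x in ratios])

# Measures of center move by 100; measures of spread do not move.
for name in original:
    print(f"{name:>6}: {original[name]:8.3f} -> {shifted[name]:8.3f}")
```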

HISTOGRAM, BOXPLOT, AND STEMPLOT PT. 2 (+100)

[Figure: histogram titled "Frequency of Weight-to-Watt Ratios (+100)." Vertical axis: Frequency (0 to 10). Horizontal axis: bin boundaries 101.653, 101.793, 101.933, 102.073, 102.213, 102.353, 102.492, and More.]


Stem Plot for Weight-to-Watt Ratios (rounded to the nearest tenth)

101 | 6778999
102 | 0000111111222223333344

Key: 101 | 6 = 101.6

The data is skewed left. The spread was only .756. The median was 102.109. There
were no outliers.

FURTHER CALCULATIONS PT. 3 (+50%)


We will recalculate our data points one last time by adding 50% of each value to
that value. We can expect to see every statistic change, because the amount added
differs for each data point.
Once again, we can begin by inputting all of our new data into L3
on our calculator under Stat -> Edit. Once the values are in the calculator, exit out of Edit
and go back to Stat -> Calc -> 1-Var Stats. Make sure your list is set to L3 before you hit
Enter. We can once again see the sample size plus the 5 number summary by scrolling
down the list.
For the new data:
Sample Size: 30
Minimum: 2.48
First Quartile: 2.841
Median: 3.164
Third Quartile: 3.341
Maximum: 3.614



The new range is found by subtracting the minimum from the maximum:
$3.614 - 2.48 = 1.134$
We can see that the new range increased with the rest of the values by 50% of its
own value, following the trend set by the data points. This is due to the range being
directly related to the data points.
In order to see the mean and standard deviation, we will still go to 1-Var Stats
and look at the x with the line above it and Sx. Both the mean and the standard
deviation have increased; the mean now sits at 3.122 and the standard deviation is at
.303.
Once again, these values increased with the rest of the data set by 50% of the
original value.
To find variance, we will once again square the standard deviation:
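$.303^2 = .092$
The variance did not follow the trend that the other values have been following:
it grew from .041 to .092, an increase of about 125% rather than 50%. This is because
it is the square of the standard deviation, so multiplying the data by 1.5 multiplies
the variance by 1.5² = 2.25.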
The last value we will find is the IQR, taken from subtracting the first quartile
from the third quartile:
$3.341 - 2.841 = .500$
When we group this data:
Mean: 3.122
Median: 3.164
Range: 1.134
Standard Deviation: .303
Variance: .092
IQR: .500
The mean, median, and standard deviation in this calculation are all 50% greater than
the mean, median and standard deviation in the first calculation.
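The pattern across the three sets of calculations can be summarized in general form. As a sketch (the symbols a, b, x_i, and y_i are introduced here for illustration and do not appear in the original calculations), suppose every data point is transformed as y_i = a x_i + b with a > 0:

```latex
\begin{aligned}
\bar{y} &= a\,\bar{x} + b, &\qquad \operatorname{median}(y) &= a\,\operatorname{median}(x) + b,\\
s_y &= a\,s_x, &\qquad \operatorname{range}(y) &= a\,\operatorname{range}(x),\\
\mathrm{IQR}(y) &= a\,\mathrm{IQR}(x), &\qquad s_y^{2} &= a^{2}\,s_x^{2}.
\end{aligned}
```

Adding 100 to every value corresponds to a = 1 and b = 100, so the mean and median move by 100 while every measure of spread is unchanged. Adding 50% of each value corresponds to a = 1.5 and b = 0, so the mean, median, standard deviation, and range are each multiplied by 1.5, while the variance is multiplied by 1.5² = 2.25 (.041 × 2.25 ≈ .092).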


HISTOGRAM, BOXPLOT, AND STEMPLOT PT. 3 (+50%)

[Figure: histogram titled "Frequency of Weight-to-Watt Ratios (+50%)." Vertical axis: Frequency. Horizontal axis: bin boundaries 2.48, 2.68, 2.88, 3.08, 3.28, 3.48, 3.68, and More.]

Stem Plot of Weight-to-Watt Ratios (+50%) (rounded to the nearest tenth)

2 | 56678888
3 | 001122223333334444
3 | 566

Key: 2 | 5 = 2.5

The data is skewed left. The median of the data is 3.164. The spread is 1.134. There are
no outliers.


DISTRIBUTION
In order to find the number of people who are 5 units above my mean, I first
must decide on the unit I will use. In order to keep it simple, we will round to the nearest
hundredth for each data point.
To figure out how many of my original data points are 5 units above the mean, I
am going to need my sample size as well as the original mean:
Sample Size: 30
Mean: 2.081
Now all that we have to do is count the number of figures 5 units (hundredths)
above 2.081.
2.254  1.894  1.653  2.107  2.196  2.194
2.111  2.218  2.304  2.275  2.185  2.227
1.707  1.979  2.176  2.024  1.887  1.831
2.292  2.261  1.876  1.889  2.397  2.409
2.078  1.736  2.102  2.067  1.961  2.147
There are 14 figures that are greater than 2.081 plus 5 units (2.131).
$$\frac{14}{30} = 46.67\%$$
The next step will be a bit tricky; we are going to find the percentage of ratios that
fall between 3 hundredths below the mean and 2 hundredths above the mean. To calculate
the range where the weight-to-watt ratios must fall:
$2.081 - .03 = 2.051$
$2.081 + .02 = 2.101$
In order for a weight-to-watt ratio to be counted, it must fall between 2.051 and 2.101.
There are only 2 data points that match this requirement: 2.067 and 2.078.
$$\frac{2}{30} = 6.67\%$$
In order to find the number of data points that make up the top 10%, all we do is divide
our sample size by 10:
$$\frac{30}{10} = 3$$
There are 3 data points in the top 10%. These are: 2.409, 2.397,
and 2.304.
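The three counts in this section can be reproduced with a short Python sketch. It is an illustration rather than the procedure used in the project: it assumes the 30 ratios listed above and the rounded mean of 2.081, and the variable names are hypothetical.

```python
# The 30 weight-to-watt ratios listed in this section.
ratios = [
    2.254, 1.894, 1.653, 2.107, 2.196, 2.194,
    2.111, 2.218, 2.304, 2.275, 2.185, 2.227,
    1.707, 1.979, 2.176, 2.024, 1.887, 1.831,
    2.292, 2.261, 1.876, 1.889, 2.397, 2.409,
    2.078, 1.736, 2.102, 2.067, 1.961, 2.147,
]
mean = 2.081  # rounded mean from Further Calculations
n = len(ratios)

# Ratios more than 5 hundredths above the mean.
above = [x for x in ratios if x > mean + 0.05]

# Ratios from 3 hundredths below to 2 hundredths above the mean.
band = [x for x in ratios if mean - 0.03 <= x <= mean + 0.02]

# The top 10% of the sample: the largest n / 10 values.
top = sorted(ratios, reverse=True)[: n // 10]

print(f"above mean + .05: {len(above)} of {n} ({len(above) / n:.2%})")
print(f"within band:      {len(band)} of {n} ({len(band) / n:.2%})")
print(f"top 10%:          {top}")
```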

CONCLUSION
After having studied the data thoroughly for hours, we can come to the
conclusion that a high weight-to-watt ratio does increase a coach's interest. Many of the
subjects that data was collected on were pulling well above 2.1 times their bodyweight
over 2 kilometers. Although there are confounding variables, such as the person's rowing
experience and fat-to-muscle ratio, the Weight-to-Watt Ratio theory is a good predictor
as to what a person can bring to the coach's team in regard to speed. There were a few
rowers whom I took data on who did not have the best weight-to-watt ratio, but looking
back at the data you can see that they were just really fast on a rowing machine.
Despite the few ratios that were relatively low, a majority of the rowers on the
"Most Popular Athletes" lists pull at least twice their bodyweight. Despite many
confounding variables, athletes who can pull a good weight-to-watt ratio can grab the
attention of many collegiate coaches. Unless they are just plain fast, athletes who have
good weight-to-watt ratios show coaches that they can be just as good on the water,
where they will ultimately end up racing, as on the machine.


REFERENCES
Quartiles. (n.d.). Retrieved from Math Is Fun:
https://www.mathsisfun.com/data/quartiles.html
Standard Deviation and Variance on a Graphing Calculator. (2013, April 17). Retrieved
from Math Bootcamps RSS:
http://www.mathbootcamps.com/how-to-find-the-standard-deviation-and-variance-with-a-graphing-calculator-ti83-or-ti84/
Stapel, E. (n.d.). Box-and-Whisker Plots: Interquartile Ranges and Outliers. Retrieved
from Purplemath: http://www.purplemath.com/modules/boxwhisk3.htm
