
Ryan Butler
1040 Final Project
Eportfolio Link: http://rbutler635147.weebly.com

Statistical Analysis: College Graduates' Starting Compensations
We will evaluate entry-level salaries for college graduates, using histograms as
our visual reference. The histograms were created from data supplied by the US
Department of Labor Statistics on college students. The data were collected in 2014.
I can therefore use these data to compare and analyze my choice of major.
I will use a variety of procedures in this project, which will allow me to analyze the
data from several standpoints. I will determine the five-number summary (the
minimum, first quartile, median, third quartile, and maximum) for each field. I will
create boxplots to display these five-number summaries visually. I will determine
the IQR (interquartile range) and the upper and lower fences and, from those,
any outliers in each major field. I will construct confidence intervals from the data,
and finally, I will carry out hypothesis tests on the data.
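As a rough illustration of the five-number summary described above, here is a minimal Python sketch. The salary values are invented for the example and are not from the project's data set, and the function name `five_number_summary` is hypothetical. Quartile conventions differ between textbooks and software, so the quartiles from this sketch may not match values computed by hand or in Excel.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
    # The "inclusive" method is one of several common quartile conventions.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

# Hypothetical salaries, made up for illustration only.
example_salaries = [31000, 35000, 38000, 40000, 42000, 45000, 52000]
summary = five_number_summary(example_salaries)
print(summary)
```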
Major Field                     Sample Mean    Sample Std. Dev.
Business                        54537          7232
Communications                  46227          7214
Computer Science                59542          4670
Education                       40021          2365
Engineering                     60664          7879
Humanities & Social Sciences    38817          7146
Math & Sciences                 41923          4938

Data Collection
Histograms for Compensation Distribution for the
Following Fields of Study

[Figure: seven histograms, one each for Business, Communications, Computer Science,
Education, Engineering, Humanities & Social Sciences, and Mathematics & Sciences.
Horizontal axis: Salary in Tens of Thousands. Vertical axis: Frequency.]

Based on a visual analysis of the data, the highest-paying degree is Engineering
and the lowest-paying degree is Humanities and Social Sciences. Not all of the
graphs exhibit a normal distribution: Mathematics and Sciences is strongly
right-skewed, while Computer Science has a slight left skew.
This would seem discouraging, because I have chosen Sociology and Social Work
as my focus. I guess one could say it's a good thing I'm not doing it for the money.
Next we will evaluate the data using five-number summaries and boxplots for
visual analysis.

Boxplot:
Graduating Student Compensation Distribution for each
Field of Study

[Figure: boxplots of starting compensation, one each for Business, Communications,
Computer Science, Education, Engineering, Humanities & Social Sciences, and
Mathematics & Sciences.]

Business:
IQR: 62633 - 46710 = 15923
Outliers: 46710 - 1.5 x 15923 = 22825.50 ≈ 22826
          62633 + 1.5 x 15923 = 86517.50 ≈ 86518

Communications:
IQR: 54728 - 39263 = 15465
Outliers: 39263 - 1.5 x 15465 = 16065.50 ≈ 16066
          54728 + 1.5 x 15465 = 77925.50 ≈ 77926

Computer Science:
IQR: 63945 - 54709 = 9236
Outliers: 54709 - 1.5 x 9236 = 40855
          63945 + 1.5 x 9236 = 77799

Education:
IQR: 42801 - 37346 = 5455
Outliers: 37346 - 1.5 x 5455 = 29163.50 ≈ 29164
          42801 + 1.5 x 5455 = 50983.50 ≈ 50984

Engineering:
IQR: 70444 - 52931 = 17513
Outliers: 52931 - 1.5 x 17513 = 26661.50 ≈ 26662
          70444 + 1.5 x 17513 = 96713.50 ≈ 96714

Humanities and Social Sciences:
IQR: 46313 - 31308 = 15005
Outliers: 31308 - 1.5 x 15005 = 8800.50 ≈ 8801
          46313 + 1.5 x 15005 = 68820.50 ≈ 68821

Mathematics and Sciences:
IQR: 46795 - 36193 = 10602
Outliers: 36193 - 1.5 x 10602 = 20290
          46795 + 1.5 x 10602 = 62698

Observations of the Boxplots

No outliers were found using the fence formulas (Q1 - 1.5 x IQR and Q3 + 1.5 x IQR),
where IQR = Q3 - Q1. None of the major fields has extremely high or low values.
We would normally use the t-distribution since we are working from a sample with an
unknown population standard deviation. According to the Central Limit Theorem,
when n > 30 the sampling distribution of the sample mean is approximately normal.
Therefore, both the t and normal distributions would be appropriate for this data
set.
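The fence arithmetic above can be reproduced with a short Python sketch. The Q1 and Q3 values are copied from the boxplot calculations in this report. The names `QUARTILES`, `fences`, and `is_outlier` are illustrative choices, not part of the original project.

```python
# Q1 and Q3 for each field, copied from the boxplot calculations in this report.
QUARTILES = {
    "Business": (46710, 62633),
    "Communications": (39263, 54728),
    "Computer Science": (54709, 63945),
    "Education": (37346, 42801),
    "Engineering": (52931, 70444),
    "Humanities & Social Sciences": (31308, 46313),
    "Mathematics & Sciences": (36193, 46795),
}

def fences(q1, q3):
    """Return (IQR, lower fence, upper fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return iqr, q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(value, q1, q3):
    """A value is an outlier if it lies outside the fences."""
    _, lower, upper = fences(q1, q3)
    return value < lower or value > upper

for field, (q1, q3) in QUARTILES.items():
    iqr, lower, upper = fences(q1, q3)
    print(f"{field}: IQR = {iqr}, fences = ({lower:.2f}, {upper:.2f})")
```

A salary would be flagged only if it fell below the lower fence or above the upper fence for its field, for example `is_outlier(90000, 46710, 62633)` for Business.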

Term Project- Part 2

Confidence Interval Estimates


A confidence interval is a range of values, derived mathematically from the
given sample data, that is used to estimate the true value of a population
parameter. A confidence interval is also called an interval estimate.
A confidence interval estimate for the mean starting compensation for
students graduating in Humanities and Social Sciences:
Confidence Level = 95% = 0.9500
α = 0.0500
α/2 = 0.0250
n = 50
D.F. = 49
Critical Value: t = 2.009
s = 7,146
x̄ = 38,817
Margin of Error (E) = 2,030.29
Confidence Interval = 36,786.71 < μ < 40,847.29
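The margin of error above follows the usual formula for a mean with unknown population standard deviation, E = t x s / √n. The following is a minimal Python sketch of that arithmetic. The critical value 2.009 is passed in as given in the report (Python's standard library has no t-distribution table), and the function name `t_interval` is an illustrative choice.

```python
import math

def t_interval(sample_mean, sample_sd, n, t_critical):
    """Return (margin of error, lower limit, upper limit) for a mean."""
    margin = t_critical * sample_sd / math.sqrt(n)
    return margin, sample_mean - margin, sample_mean + margin

# Humanities and Social Sciences values as listed in this report.
margin, lower, upper = t_interval(38817, 7146, 50, 2.009)
print(f"E = {margin:.2f}; {lower:.2f} < mu < {upper:.2f}")
```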
A confidence interval estimate for the true standard deviation of starting
compensations for students graduating in Business:
Confidence Level = 99% = 0.9900
α = 0.01
α/2 = 0.0050
n = 50
D.F. = 49
s = 7,232
x̄ = 54,537
χ²L = 29.707
χ²R = 76.154
Confidence Interval = 5,801.10 < σ < 9,288.10
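The interval for the standard deviation uses the chi-square formula √((n - 1)s² / χ²R) < σ < √((n - 1)s² / χ²L). Here is a small Python sketch of that calculation. The two critical values are taken as given from the report and passed in as inputs, since the standard library has no chi-square table; they should be checked against a table for the row and tail areas that match the stated confidence level. The name `sd_interval` is illustrative.

```python
import math

def sd_interval(sample_sd, n, chi2_left, chi2_right):
    """Return (lower limit, upper limit) for the population standard deviation."""
    df = n - 1
    # The larger (right-tail) critical value gives the lower limit.
    lower = math.sqrt(df * sample_sd ** 2 / chi2_right)
    upper = math.sqrt(df * sample_sd ** 2 / chi2_left)
    return lower, upper

# Business values and critical values as listed in this report.
sd_lower, sd_upper = sd_interval(7232, 50, 29.707, 76.154)
print(f"{sd_lower:.2f} < sigma < {sd_upper:.2f}")
```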
A confidence interval estimate for the proportion of all students with
starting compensation over $50,000:
Confidence Level = 80% = 0.8000
α = 0.2000
α/2 = 0.1000
Critical Value: z = -1.28 and +1.28
p̂ = 0.426
q̂ = 0.574
x = 149
n = 350
E = 0.0338
Confidence Interval = 0.3922 < p < 0.4598
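The proportion interval follows E = z x √(p̂q̂ / n). The sketch below reproduces the arithmetic in Python under one assumption that should be checked against the raw data: the total sample size is taken to be 350 (seven fields of 50 graduates each), which is the value consistent with the stated interval and with 149 / 350 ≈ 0.426. The function name `proportion_interval` is illustrative.

```python
import math
from statistics import NormalDist

def proportion_interval(p_hat, n, z_critical):
    """Return (margin of error, lower limit, upper limit) for a proportion."""
    margin = z_critical * math.sqrt(p_hat * (1 - p_hat) / n)
    return margin, p_hat - margin, p_hat + margin

# z for 80% confidence: 0.10 in each tail, rounded to two places like a z table.
z = round(NormalDist().inv_cdf(1 - 0.20 / 2), 2)

# p-hat from this report; n = 350 is an assumption (7 fields x 50 graduates).
prop_margin, prop_lower, prop_upper = proportion_interval(0.426, 350, z)
print(f"z = {z}; E = {prop_margin:.4f}; {prop_lower:.4f} < p < {prop_upper:.4f}")
```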

Discussing and Interpreting the Confidence Interval Results:
We can say with 95% confidence that the mean starting compensation for students
graduating in Humanities and Social Sciences is between $36,786.71 and
$40,847.29. We can also say with 99% confidence that the true standard deviation of
starting compensations for students graduating in Business is between $5,801.10
and $9,288.10. In addition, we can say with 80% confidence that the proportion
of all students with starting compensation over $50,000 is between 39.22% and
45.98%.
Hypothesis Tests
Hypothesis testing is a procedure for testing a claim about a property of a
population. A hypothesis test is also called a test of significance.
The claim that students graduating in Education have an average starting
compensation of under $35,000, tested at a 0.05 significance level:
H0: μ = 35,000
Ha: μ < 35,000
α = 0.05
x̄ = 40,021
n = 50
s = 2,365
Confidence Level = 95% = 0.9500
Critical Value: t = -1.676 (left-tailed test)
Test Statistic: t = (40,021 - 35,000) / (2,365 / √50) = 15.01
We fail to reject the null hypothesis because the test statistic (15.01) does not
fall below the critical value (-1.676). There is not sufficient evidence to support
the claim that the average starting compensation package for graduating students
in Education is less than $35,000; the sample mean of $40,021 is above $35,000.
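For a claim about a mean with unknown population standard deviation, the standard test statistic is t = (x̄ - μ0) / (s / √n). The following Python sketch applies that formula to the Education values listed above. The critical value 1.676 is taken from the report and given a negative sign because the alternative hypothesis is "less than" (a left-tailed test). The name `t_statistic` is illustrative.

```python
import math

def t_statistic(sample_mean, hypothesized_mean, sample_sd, n):
    """One-sample t statistic for a claim about a population mean."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

# Education values as listed in this report.
t_stat = t_statistic(40021, 35000, 2365, 50)

# Left-tailed test: reject H0 only if t falls below the negative critical value.
t_critical = -1.676
reject_h0 = t_stat < t_critical
print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```

Because the sample mean is larger than the hypothesized mean, the statistic is positive and cannot fall in a left-tail rejection region.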
The claim that over 80% of students graduating with a college degree will
find a starting compensation package valued at over $40,000, tested at a
0.01 significance level:
H0: p = 0.80
Ha: p > 0.80
α = 0.01
x = 269
n = 350
Confidence Level = 99% = 0.9900
Critical Value: z = 2.33 (right-tailed test)
p = 0.80
q = 0.20
p̂ = 0.768571
Test Statistic: z = (0.768571 - 0.80) / √(0.80 x 0.20 / 350) = -1.47
We fail to reject the null hypothesis because the test statistic (-1.47) does not
exceed the critical value (2.33). There is not sufficient evidence to support the
claim that over 80% of graduating students with a college degree will find a
starting compensation package that is valued over $40,000.
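For a claim about a proportion, the standard test statistic is z = (p̂ - p) / √(pq / n). The Python sketch below applies it under an assumption that should be checked against the raw data: that 269 of 350 graduates had a package over $40,000, which is the count consistent with the listed sample proportion (269 / 350 = 0.768571). The name `proportion_z` is illustrative.

```python
import math
from statistics import NormalDist

def proportion_z(successes, n, p0):
    """Return (sample proportion, z statistic) for a claim about a proportion."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return p_hat, (p_hat - p0) / standard_error

# Assumed counts: 269 successes out of 350 graduates (see lead-in).
p_hat, z_stat = proportion_z(269, 350, 0.80)

# Right-tailed test at the 0.01 level: reject H0 only if z exceeds the critical value.
z_critical = NormalDist().inv_cdf(1 - 0.01)
reject_h0_prop = z_stat > z_critical
print(f"p-hat = {p_hat:.6f}, z = {z_stat:.2f}, critical z = {z_critical:.2f}")
```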

Discussing and Interpreting the Hypothesis Testing Results:

In the first hypothesis test, we fail to reject the null hypothesis: the sample
mean for Education ($40,021) is above $35,000, so there is not sufficient evidence
to support the claim that the average starting compensation package for graduating
students in Education is less than $35,000. In the second hypothesis test, we also
fail to reject the null hypothesis: the sample proportion (0.768571) is below 0.80,
so there is not sufficient evidence to support the claim that over 80% of graduating
students with a college degree will find a starting compensation package that is
valued over $40,000.

Reflection:
The conditions for doing an interval estimate and hypothesis test for a
population proportion are that the sample observations must be a simple random
sample and must meet the conditions for a binomial distribution. The conditions for
a binomial distribution are that there is a fixed number of independent trials, that
the probability of success is constant, and that each trial has two outcomes: the
success category and the failure category. The conditions for doing an interval
estimate and hypothesis test for a population mean when sigma is not known are that
the sample must be a simple random sample and that either the population is
normally distributed or n > 30. The conditions for doing an interval estimate and
hypothesis test for sigma (standard deviation and variance) are that the sample
must be a simple random sample and that the population must be normally
distributed, even if the sample is large.
We do not know, however, whether our data come from a simple random sample or
from a normally distributed population. Therefore, we could have made an error in
assuming that the data are normally distributed and come from a simple random
sample when they actually do not. For estimating the population mean, though, the
normality requirement is covered because n > 30 (n = 50). Therefore, we can only
assume that the procedures for the population mean met their conditions. The
sampling method could be improved by letting us know ahead of time whether the
data are a simple random sample and are normally distributed. We can conclude from
this statistical research that it is important to challenge claims and statements.
When we are able to do the statistical research, we can look in more depth at what
is being claimed and see whether we reject or fail to reject the null hypothesis.
We are also able to examine the confidence level and confidence interval for the
data and claims given.


Final Reflection:
What have you learned as a result of this project?
I could really use a course in the applications of Excel and Word. But overall,
what I have learned from the data analysis is that the career path I have chosen is
the lowest paying. But I knew this before I started. I'm not doing it for the money.
Discuss how the math skills that you applied in this project will impact
other classes you will take in your school career.
I'm currently graduating with a degree in Sociology. I have to be able to
analyze social statistics to be an effective sociologist (if I continue down that path).
It's looking more like Social Work now.
Identify specific parts of the project and your own process in completing
the project that may have applications for other classes.
Being able to use the formulas in Excel and to create different types of graphs
as visual aids. I find this will be a valuable asset in the future, especially with
PowerPoint presentations.
Discuss how the project helped to develop your problem solving skills.
It taught me to ask for outside help in problem solving. Trying to teach yourself
how to do anything is always harder than asking questions and seeking out those
who can help.

Discuss how this project changed the way you think about real-world math
applications. If your thinking was not changed, then discuss how the
project supported your views about real-world math applications.
This course has been most beneficial; I can see and understand the practical
application of statistics. We see it every day in newspapers, on social media, and
on TV. Being able to understand the process of data collection and the analysis of
the data is indispensable. I can now spot biased and unreliable data sources, which
cuts down on my frustration and strengthens my arguments, which is all part of
being an effective student.
