Laboratory Exercise No.7
Paired t-Test
Course Code: IE 301 Program: BSIE
Course Title: Advanced Statistics for IE Date Performed: Sep. 12, 2015
Section: IE31FB3 Date Submitted: Sep. 12, 2015
Members: Madrigal, Dominic R. Instructor: Engr. Rica Navarro
Cruz II, Robert D.
1. Objective(s):
The activity aims to introduce basic ideas of power and sample size calculations for 2-sample t-Test.
2. Intended Learning Outcomes (ILOs):
The students shall be able to:
2.1 test for a difference between two population means using a 2-sample t-test
2.2 determine the sample size required to detect an effect of a given size with a given degree of
confidence.
3. Discussion:
A paired t-test helps determine whether the mean difference between paired observations is significant.
Statistically, the paired t-test is equivalent to performing a 1-sample t-test on the differences. A paired t-test
also helps you evaluate whether the mean difference is equal to a specific value.
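The equivalence described above can be sketched in a few lines of Python. This is only an illustration, not the Minitab procedure used in this exercise: the function names and the `before`/`after` values are hypothetical, and only the t statistic and degrees of freedom are computed (a p-value would need a t-distribution table or a statistics package).

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return (mean - mu0) / (sd / math.sqrt(n)), n - 1


def paired_t(first, second, mu0=0.0):
    """Paired t-test computed as a 1-sample t-test on the pairwise differences."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for a, b in zip(first, second)]
    return one_sample_t(differences, mu0)


# Hypothetical paired measurements (not taken from the lab data set)
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [9.0, 10.0, 9.0, 8.0, 12.0]

t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.3f}, df = {df}")
```

Passing a nonzero `mu0` tests whether the mean difference equals a specific value, which matches the second use of the paired t-test mentioned above.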
assumption, provided the observations are collected randomly and the data are continuous, unimodal,
and reasonably symmetric.
4. Click Graphs
5. Check Individual value plot and Boxplots of differences.
6. Click OK in each dialog box
7. Interpret the results
8. Draw conclusions.
Part 3: Checking for Normality: the paired t-test is actually a 1-sample t-test on the pairwise differences.
Therefore, the pairwise differences must satisfy the 1-sample t-test assumptions, including normality.
Before checking for normality, store the pairwise differences in the worksheet.
1. Choose Stat ►Basic Statistics ►2 Variances
2. Complete the dialog box as shown below.
3. Click OK.
4. Interpret the results.
5. Draw conclusions.
6. Data and Results:
Part 1:
The boxplot of differences shows the mean within the hinges of the box, and the individual value plot of differences
shows most of the points clustered around the mean. The plots therefore suggest no mean difference between the paired
observations in the population.
7. Data Analysis and Conclusion:
Part 2:
The two graphs show that the data points approximately follow the straight line; therefore, the two products show no
difference.
Part 3:
The individual value plot shows that the mean of SupplrA is higher than that of SupplrB. In the histogram, SupplrA has an
approximately normal shape while SupplrB is left-skewed. The individual value plots of SupplrA and SupplrB do not overlap
and the boxplot shows that the medians are close; therefore, the difference is significant.
8. Assessment (Rubric for Laboratory Performance):
TIP-VPAA–054D
Revision Status/Date:0/2009 September 09
Safety Precautions | Members do not follow safety precautions. | Members follow safety precautions most of the time. | Members follow safety precautions at all times.
Work Habits
Time Management/Conduct of Experiment | Members do not finish on time with incomplete data. | Members finish on time with incomplete data. | Members finish ahead of time with complete data and time to revise data.
Cooperative and Teamwork | Members do not know their tasks and have no defined responsibilities. Group conflicts have to be settled by the teacher. | Members have defined responsibilities most of the time. Group conflicts are cooperatively managed most of the time. | Members are on tasks and have responsibilities at all times. Group conflicts are cooperatively managed at all times.
Neatness and Orderliness | Messy workplace during and after the experiment. | Clean and orderly workplace with occasional mess during and after the experiment. | Clean and orderly workplace at all times during and after the experiment.
Ability to do independent work | Members require supervision by the teacher. | Members require occasional supervision by the teacher. | Members do not need to be supervised by the teacher.
Other Comments/Observations:
TOTAL SCORE
RATING=
x 100%
Laboratory Exercise No.8
Correlation
Course Code: IE 301 Program: BSIE
Course Title: Advanced Statistics for IE Date Performed: Sep. 12, 2015
Section: IE31FB3 Date Submitted: Sep. 12, 2015
Members: Madrigal, Dominic R. Instructor: Engr. Rica Navarro
Cruz II, Robert D.
The sample correlation coefficient, r, measures the degree of linear association between two variables
(the degree to which one variable changes with another). A positive correlation indicates that both
variables tend to increase or decrease together. A negative correlation indicates that, as one variable
increases, the other tends to decrease.
Use correlation when you have data for two continuous variables and wish to determine whether a linear
relationship exists between them. The correlation does not tell you whether the variables are related in a
nonlinear fashion.
Some statisticians argue that correlation should not be used if one variable is a dependent response of the
other.
1. Are two variables related in a linear manner?
2. What is the strength of the relationship?
Example
A. Is there a linear relationship between dollars spent on training and customer satisfaction ratings?
B. What is the relationship between revenue and the number of sales calls made?
Additional Considerations
Correlation quantifies the degree of linear association between two variables.
A strong correlation does not imply a cause-and-effect relationship. For example, a strong correlation
between two variables may be due to the influence of a third variable not under consideration.
A correlation coefficient close to zero does not necessarily mean no association. The variables may have a
nonlinear association. Always plot the data so that you can identify nonlinear relationships when they are
present.
Correlation assumes that the values of both variables are free to vary. Correlation is not appropriate if you
fix the values of one variable to study changes in another.
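As a minimal sketch of the points above, the following Python snippet computes the sample correlation coefficient r from its definition and applies it to two hypothetical data sets: one perfectly linear and one perfectly quadratic. The function name `pearson_r` and the data are assumptions made for illustration only; they are not part of the Minitab exercise.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient r between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y depends on x exactly in both cases
x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * v + 1 for v in x]   # straight-line relationship
curved = [v * v for v in x]       # symmetric quadratic relationship

r_linear = pearson_r(x, linear)
r_curved = pearson_r(x, curved)
print(f"linear: r = {r_linear:.3f}")
print(f"curved: r = {r_curved:.3f}")
```

In the quadratic case y is completely determined by x, yet r is zero because the association is not linear. This is why the data should always be plotted before relying on r.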
4. Resources:
MiniTab Software/Manual
Training Data Sets
Textbooks
5. Procedure:
Practice Problem: The sales department for a software company wants to determine whether a relationship
exists between the number of sales calls made and the revenue earned. Analysts record the number of
sales calls and the revenue earned each day for a period of 420 days.
Variable | Description
Revenue | Daily Revenue in thousands of dollars, rounded to the nearest dollar
Sales Calls | Number of sales calls made each day.
Part 1:
1. Open SoftRev1.MPJ
2. Choose Graph ►Scatterplot
3. Choose Simple, then click OK
4. Complete the dialog box as shown below.
5. Click OK.
6. Interpret the results
12. Complete the dialog box as shown below.
13. Click OK
14. Interpret the results
15. Draw conclusions.
6. Data and Results:
The plot shows the data values of the x and y variables. As the number of sales calls increases, the
revenue also tends to increase.
Therefore, the two variables are positively related and the correlation is positive.
The correlation is equal to 0.802. Revenue increases with sales calls, but not by a consistent amount, so the points do
not fall exactly on a straight line.
Laboratory Exercise No.9
Simple Linear Regression
Course Code: IE 301 Program: BSIE
Course Title: Advanced Statistics for IE Date Performed: Sep. 12, 2015
Section: IE31FB3 Date Submitted: Sep. 12, 2015
Members: Madrigal, Dominic R. Instructor: Engr. Rica Navarro
Cruz II, Robert D.
1. Objective(s):
The activity aims to measure the degree of linear association between two variables using graphs and
correlation, and to model the relationship between a continuous response variable and one or more predictor variables.
2. Intended Learning Outcomes (ILOs):
The students shall be able to:
2.1 Evaluate the linear relationship between two variables using scatterplot, correlation, and fitted line plot.
2.2 Analyze and interpret results and draw conclusions about the output provided by Minitab.
3. Discussion:
Simple Linear Regression examines the relationship between a continuous response variable (y) and one
predictor variable (x). The general equation for a simple linear regression model is:
$Y = \beta_0 + \beta_1 X + \varepsilon$
Where Y is the response, X is the predictor, $\beta_0$ is the intercept (the value of Y when X equals zero), $\beta_1$ is
the slope (the change in Y for each one-unit increase in X), and $\varepsilon$ is the random error term.
Use simple linear regression when you have a continuous y and one predictor, x. The following conditions
should also be met:
1. X can be ordinal or continuous
2. In theory, x should be fixed by the investigator. In practice, however, it is often allowed to vary.
3. Any random variation in the measurement of x is assumed to be negligible compared to the range
in which x is measured.
The y-values obtained in your sample differ from those predicted by the regression model (unless all points
happen to fall on a perfectly straight line). These differences are called residuals.
To confirm that the analysis is valid, verify all assumptions about the model error term. Use residual plots to
check that the errors have the following characteristics:
1. Normally distributed
2. Constant variance for all fitted values
3. Random over time
Simple Linear Regression can help answer questions such as:
1. How important is x in predicting y?
2. What value can you expect for y when x is 5?
3. How much does y change if x increases by one unit?
For example,
Is the number of mistakes made in processing loans related to cycle time?
What salary can you expect to make with five years' experience in a particular field?
How much does salary increase for every additional year of experience?
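The questions above map directly onto the fitted intercept and slope. The sketch below computes the least-squares line by hand in Python rather than with Minitab's fitted line plot; the function name `fit_line` and the five data points are hypothetical and serve only to illustrate the arithmetic.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx                # slope: change in y per one-unit increase in x
    b0 = mean_y - b1 * mean_x     # intercept: fitted y when x equals zero
    return b0, b1


# Hypothetical data (not the SoftRev data set)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

b0, b1 = fit_line(x, y)
predicted_at_5 = b0 + b1 * 5      # "What value can you expect for y when x is 5?"

print(f"fitted line: y = {b0:.2f} + {b1:.2f} x")
print(f"expected y at x = 5: {predicted_at_5:.2f}")
print(f"change in y per unit of x: {b1:.2f}")
```

The slope answers the third question (how much y changes per unit of x), and substituting x = 5 into the fitted line answers the second.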
S is an estimate of the average variability about the regression line. S is the positive square root of the
mean square error (MSE). For a given problem, the better the equation predicts the response, the lower S
is.
$R^2$ (R-Sq)
$R^2$ is the proportion of variability in the response that is explained by the equation. Acceptable values for
$R^2$ vary depending on the study. For example, engineers studying chemical reactions may require an
$R^2$ of 90% or more. However, someone studying human behavior (which is more variable) may be
satisfied with much lower $R^2$ values.
$R^2$ adjusted (R-Sq (adj))
$R^2$ adjusted is sensitive to the number of terms in the model and is important when comparing models
with different numbers of terms.
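The three summary statistics described above (S, R-Sq, and R-Sq (adj)) can be computed from the residuals of a fitted line. The following Python sketch shows one way to do so for a simple linear regression with two estimated coefficients (intercept and slope). The function name `regression_summary` and the data are hypothetical; Minitab reports these values directly.

```python
import math


def regression_summary(x, y):
    """Return (S, R-Sq, R-Sq(adj)) for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x

    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error sum of squares
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total sum of squares

    mse = sse / (n - 2)            # two coefficients are estimated
    s = math.sqrt(mse)             # S: positive square root of the MSE
    r_sq = 1 - sse / sst           # proportion of variability explained
    r_sq_adj = 1 - mse / (sst / (n - 1))  # penalised for the number of terms
    return s, r_sq, r_sq_adj


# Hypothetical data (not the SoftRev data set)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

s, r_sq, r_sq_adj = regression_summary(x, y)
print(f"S = {s:.4f}, R-Sq = {r_sq:.2%}, R-Sq(adj) = {r_sq_adj:.2%}")
```

Because the adjustment divides each sum of squares by its degrees of freedom, R-Sq (adj) is never larger than R-Sq, which is what makes it useful for comparing models with different numbers of terms.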
Prediction bands provide the estimated range in which a single new observation for a given value of the
predictor is expected to fall.
Analysts want to be confident that the mean and the individual points of the y-variable, Revenue, fall within
certain limits of variability.
Use the default confidence level of 95%
Confidence Interval
The 95% confidence interval defines a likely range of values for the population mean of y. For any given
value of x, you can be 95% confident that the population mean for y is between the indicated lines.
Prediction interval
The 95% prediction interval defines a likely range of y values for future individual observations. For any
given value of x, you can be 95% confident that the corresponding value of y for a single future observation
is between the indicated lines.
Note: The prediction interval is always wider than the confidence interval because of the added uncertainty
involved in predicting a single response versus the mean response.
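The reason for the extra width can be seen in the standard textbook formulas for the two intervals in simple linear regression. The notation here is an assumption introduced for illustration and does not appear in the exercise: $x_0$ is the predictor value of interest, $\hat{y}_0$ the fitted value at $x_0$, $\bar{x}$ the sample mean of x, $S_{xx} = \sum (x_i - \bar{x})^2$, $n$ the number of observations, and $S$ the square root of the mean square error.

```latex
\begin{align*}
\text{Confidence interval (mean of } y \text{ at } x_0\text{):}\quad
  & \hat{y}_0 \pm t_{\alpha/2,\,n-2}\, S
    \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \\[4pt]
\text{Prediction interval (single new } y \text{ at } x_0\text{):}\quad
  & \hat{y}_0 \pm t_{\alpha/2,\,n-2}\, S
    \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
\end{align*}
```

The only difference is the additional 1 under the square root, which accounts for the variability of an individual observation about the regression line. Both intervals are narrowest at $x_0 = \bar{x}$ and widen as $x_0$ moves away from the center of the data.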
Residuals
The residual for each observation is the difference between the observed value of the response and the
value predicted by the model (the fitted value). For example, if the observed response value is 12 and the
model predicts 10, the residual is 2.
Assumptions
To confirm that the analysis is valid, verify all assumptions about the model error term. Use residual
plots to check that the errors have the following characteristics:
1. Normally distributed
2. Constant variance for all fitted values
3. Random over time
Histogram
Use the normal probability plot to make decisions about the normality of the residuals. With a reasonably
large sample size, the histogram displays information compatible with the normal probability plot.
The histogram of the residuals should appear approximately bell-shaped with no unusual values or outliers.
Use the histogram as an exploratory tool to learn about the following characteristics of the data.
-Typical values, spread or variation, and shape
-Unusual values in the data
Curvilinear | A quadratic term may be missing from the model
Fanning or uneven spread of residuals across the different fitted values | Nonconstant variance of the residuals
Additional Considerations
1. Be careful when using regression analysis to assert that changes in the predictor cause changes in the
response, unless the predictor values were fixed at predetermined levels in a controlled experiment. If the
values of the predictors are allowed to vary randomly, other factors may influence both the predictors and the response.
2. Do not apply regression results to values of x that are outside the sample range. The relationship
between Sales calls and Revenue may be very different for sales calls above 168.
3. Be alert for outliers when using regression procedures. Some outliers (called high leverage points)
have a large effect on the calculation of the least squares regression line. In such cases, the line
may no longer represent the rest of the data very well.
4. Time order trends in the data can violate the assumption of independence. A run chart or individuals
chart is a useful tool for detecting such effects.
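To illustrate the high leverage points mentioned in item 3, the sketch below computes the leverage of each observation in a simple linear regression using the standard formula h = 1/n + (x - mean of x)^2 / Sxx. The function name `leverages` and the data are hypothetical; the last x value is deliberately placed far from the others.

```python
def leverages(x):
    """Leverage of each observation in a simple linear regression on x."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]


# Hypothetical predictor values: the last one lies far from the rest
x = [1, 2, 3, 4, 20]
h = leverages(x)

for xi, hi in zip(x, h):
    print(f"x = {xi:>2}: leverage = {hi:.3f}")
```

Leverage depends only on how far an x value lies from the mean of x, so the isolated point carries almost all of the weight in determining the slope. The leverages always sum to the number of estimated coefficients (two here).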
4. Resources:
MiniTab Software/Manual
Training Data Sets
Textbooks
5. Procedure:
Practice Problem: The sales department for a software company wants to determine whether a relationship
exists between the number of sales calls made and the revenue earned. Analysts record the number of
sales calls and the revenue earned each day for a period of 420 days. Determine the effect of Sales calls on
Revenue. Use a fitted line plot to calculate and plot the regression equation.
Variable | Description
Revenue | Daily Revenue in thousands of dollars, rounded to the nearest dollar
Sales Calls | Number of sales calls made each day.
4. Click OK.
5. Interpret the results.
6. Evaluate the results using the ANOVA results to evaluate whether the simple regression model is
useful for predicting revenue. State Hypothesis
7. Interpret the p-value (P) .
8. Make a conclusion.
4. Click OK
5. Click Graphs
6. Complete the dialog box shown below
7. Click OK in each dialog box.
8. Interpret Results
5. Normal Probability Plot
6. Histogram
7. Residual versus fits
8. Residual versus order
9. Make conclusions
7. Data Analysis and Conclusion:
Part 1:
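Ho: $\beta_1 = 0$ (the slope of the regression line is zero, so sales calls are not useful for predicting revenue)
Ha: $\beta_1 \neq 0$ (the slope is not zero)
The p-value is reported as 0.000, which is less than 0.05, so we reject Ho. Therefore, the simple regression model is
useful for predicting revenue: there is a significant linear relationship between the number of sales calls and the revenue earned.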
Part 2:
Fitted Line Plot – we can see that as the number of sales calls increases, so does the revenue.
Normal Probability Plot – the data points follow the straight line and the p-value is less than 0.05; therefore, the normal
distribution appears to fit the sample data.
Versus Order – shows that the data are correlated with each other.