0:04
I was sitting in my office, talking on the phone. In the hallway, I could hear the faint
chit-chat of a few colleagues. Meanwhile Marley, an energetic and very open person who I
was talking to on the phone, was very eager to tell me this.
0:21
I have finally persuaded my management to change the implementation of the policy we
talked about. Remember that our ANOVA analysis showed that the underlying factor
had absolutely no effect on our KPIs. I'm so happy that we have found this and that the
plans have changed.
0:40
So why am I telling you this? Well, this is one of the many times that I have realized that
a tool as basic and intuitive as the ANOVA technique can really make a difference,
and make people understand how something works, or how it doesn't work, and use this
knowledge to make better informed decisions.
1:01
In this series of three videos you will learn how to perform such ANOVA analysis.
1:08
ANOVA stands for analysis of variance.
1:12
In this first video I will explain the basics.
1:16
In the second video I will show you how to interpret the outcome. And in the third and last
video I will show you how to validate the outcomes.
1:28
Remember that you saw this diagram before in the introduction video on data analysis.
1:35
This diagram helps you to select the appropriate tool for the correct problem.
1:40
So when do we use ANOVA? We see that you have to look at the type of data.
1:47
And ANOVA is the appropriate statistical technique when the Y variable is numerical and
the X variable is categorical.
1:57
Let me start with an example in which I illustrate when to use the ANOVA technique.
2:03
Imagine you work in a factory producing coffee and you are in charge of the quality
inspections. The raw coffee beans are roasted before they are ground and
packed. One of the important quality metrics is the moisture content of the coffee
beans. If there is too much moisture, the coffee will rot very quickly. If there is too little
moisture, the coffee will not taste very good. As a quality inspector, one of the
problems you have noticed is that this moisture percentage is not always within
specifications.
2:34
How would you solve this problem?
2:38
Well, you start a project and at one point you have an idea. There are four different
machines in the factory that make this coffee. Could it be that the machine influences
the moisture content of the coffee?
2:52
In order to answer this question, you first collect some data. From each machine, you
collect ten batches of coffee, and for each batch you measure the moisture percentage.
3:04
You also record which machine the batch was made on.
3:09
This is the collected data.
3:12
What will be your next step? This is a graph of the data. It shows the moisture percentage
per machine. What do you see? There appears to be a difference between the
machines. More specifically, coffee produced on machine 1 shows a higher moisture
content, on average at least, than coffee produced on machine 2. And this is exactly how
ANOVA will turn out to work. It forms groups according to your categorical influence
factor and tests whether these groups have equal means or not.
3:45
Note that we have only measured ten data points per machine. Is this enough to conclude
that there are differences between these machines in the whole population? Suppose
that we take a second sample tomorrow. Do you expect machine 1 to still give the
highest moisture percentage?
4:04
So, can we generalize these conclusions to the entire population of batches? Well, to
answer this question we will perform a statistical analysis to generalize our findings from
the sample. But which statistical analysis should you use? For that, you have to think in
terms of Y and X variables. The Y variable is the variable you wish to understand or
explain. In a Six Sigma project, this is nearly always your CTQ. In this example it is moisture
percentage.
4:37
Now look at your X variable. That is the variable that potentially influences your Y.
4:43
In our example, that is the machine.
4:46
Now ask yourself, what type of variables are these? Well, your Y variable, the moisture
percentage, is numerical. And your X variable, machine, is categorical. So use the
tree diagram to see which analysis method you should use, and here that is ANOVA.
5:05
So, ANOVA is used to study the relationship between a numerical Y variable, moisture
percentage, and a categorical X variable, machine. ANOVA can be performed in three
steps. The first step is to organize your data. The X and Y variables should each have their
own column. Each row in your data set needs to contain one unit, a batch in our
example. The dataset you saw was not in this format: each column contained the ten
measurements from one machine.
5:39
If you do have a dataset that is in the correct format, you can obviously skip this step.
5:45
In Minitab you can stack your data. In the Data menu under Stack, you have the possibility
to stack columns. You enter the columns you want, and the result should look like this.
5:58
We see that in our example we have two columns with 40 rows. Each row now contains
a batch that we have measured, and moisture percentage and machine are the variables
in the columns. With data structured like this, we can go to step two and actually perform
the ANOVA. The third step consists of an assumption check, because after performing an
ANOVA we need to verify the reliability of the results. Steps two and three are discussed
in the next two videos.
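For readers who work outside Minitab, the stacking step can be sketched in a few lines of Python. This is only an illustration: the measurement values, the column names and the helper `stack_columns` are made up and are not the course data.

```python
# Hypothetical wide-format data: one column per machine (values are made up).
wide = {
    "Machine 1": [11.2, 10.8, 11.5],
    "Machine 2": [10.1, 10.4, 9.9],
}


def stack_columns(wide_table):
    """Turn {column name: measurements} into one (moisture, machine) row per batch."""
    rows = []
    for machine, measurements in wide_table.items():
        for moisture in measurements:
            rows.append((moisture, machine))
    return rows


stacked = stack_columns(wide)

# Each row is now one unit (a batch): its Y value and its X level.
for moisture, machine in stacked:
    print(moisture, machine)
```

After stacking, every row holds one batch, so the number of rows equals the total number of measurements.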
6:31
Okay, let's summarize. ANOVA is a suitable statistical method if you are comparing
means across various groups. The analysis consists of three steps: organizing your
data, performing the ANOVA, and performing a residual analysis to check whether the
assumptions are met.
M5-V2 ANOVA analysis
0:05
Remember that ANOVA consists of three steps. In the previous video, we talked about
the first step: when to use an ANOVA and how to organize your data. In this video, I will
explain how to perform a basic ANOVA, reusing the example from the previous video.
And in the next video, I will show you how to validate the conclusions using a residual
analysis.
0:28
Okay, we were studying the moisture percentage of batches of coffee and we were
wondering if this is influenced by the machine the batch is produced on. Therefore, the
moisture percentage is our Y variable, or the CTQ in your project. The machine is our
X variable, or the influence factor. Moisture is a numerical variable and machine is a
categorical variable. Using our tree diagram, we see that we need to perform an ANOVA
test to see if machine is a significant influence factor.
1:03
These are the three steps of ANOVA. We will focus on the main analysis step. The goal
is to study if the group means are identical for each level of your X variable. Which means
that we will study if the moisture percentage is equal for each machine.
1:21
This was our collected data per machine. Now, in order to perform this ANOVA step, we
will go, of course, to Minitab. Please pause the video and load your data before continuing.
1:36
This is what your data in Minitab should look like. You have Machine 1, 2, 3 and 4 in the
first four columns, but the first step of ANOVA is to organize your data. I have already
done this by stacking my data into the columns Moisture and Machine.
1:55
ANOVA is a statistical technique, so you can find it under the menu Stat. Under ANOVA
we take the One Way option. Now, Minitab asks you, what is your Response? That's
your Y variable or the CTQ, and that's of course Moisture. Now the next question is,
what's the Factor? Well, the factor is the influence factor or X variable, and that's Machine.
2:23
We also go to Graph and we ask for an Individual value plot. You can uncheck the other
plots because we don't need them. Okay, under Options, we uncheck the Assume equal
variances option. When starting this analysis, we do not know whether we can assume
this or not.
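To make the computation behind the one-way ANOVA less of a black box, here is a minimal sketch of the classical F statistic in plain Python. Note the simplification: this version assumes equal variances across groups, so it is not the same calculation as the one run in the video, where that option is unchecked. The function name `one_way_anova_f` and the example numbers are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict {level: [values]}.

    Classical one-way ANOVA; assumes equal variances in all groups.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean (explained by X).
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Variation of the measurements around their own group mean (unexplained).
    ss_within = sum(
        (v - mean(values)) ** 2
        for values in groups.values()
        for v in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up example with three groups of three measurements each.
example = {"A": [1.0, 2.0, 3.0], "B": [2.0, 3.0, 4.0], "C": [6.0, 7.0, 8.0]}
f_stat, df_b, df_w = one_way_anova_f(example)
print(f_stat, df_b, df_w)
```

A large F means the group means differ a lot relative to the spread inside the groups. Turning F into a p-value requires the F distribution, which statistical software supplies.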
2:46
See the video on the test for equal variances for more information on this assumption.
Okay, let's study the output.
2:56
You will get an Individual Value Plot and you will also see that you have quite a lot of output
in your session window.
3:08
The Individual Value Plot shows the measured moisture percentage for each machine.
The blue line connects the group means. At first sight, we see differences in the group
means, as the line is not horizontal.
3:21
Machines 1 and 3 appear to produce coffee with relatively high moisture content. Note that
you will never get exactly horizontal lines, even if the group means are in reality equal.
Especially for small data sets, you should statistically test whether the differences
between the means are real or due to random fluctuations.
3:44
That is why we look at the statistical analysis in the session window.
3:48
The p-value indicates the chance that differences at least this large would occur due to random
fluctuation alone if the group means were in reality equal. In this example, it is 1.2%. As the
probability that this difference occurred by chance is so low, we conclude that the machines
differ. In fact, the threshold is often set to 5%, and the p-value here is below this 5%. This
means that the effect is significant and that it translates to the population. If the p-value had
been bigger than 0.05, it would mean that either there is no real difference or you did not
gather enough data to prove it.
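The idea of a difference arising from random fluctuation can be made concrete with a permutation test. This is a different technique from the F-test p-value that Minitab reports: it shuffles the machine labels many times and counts how often the shuffled data show group differences at least as large as the real data. It is a sketch under stated assumptions; the data, the function names and the number of permutations are all hypothetical.

```python
import random
from statistics import mean


def f_statistic(groups):
    """Classical one-way F statistic for a list of groups (lists of numbers)."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (len(groups) - 1)) / (ss_within / (len(values) - len(groups)))


def permutation_p_value(groups, n_permutations=2000, seed=0):
    """Estimate how often random relabeling gives an F at least as large as observed."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        # Small tolerance so rounding differences do not hide an equal F.
        if f_statistic(shuffled) >= observed * (1 - 1e-9):
            hits += 1
    # Count the observed labeling itself, so the estimate is never exactly 0.
    return (hits + 1) / (n_permutations + 1)


# Made-up moisture percentages for three machines.
example = [[11.2, 10.8, 11.5, 11.0], [10.1, 10.4, 9.9, 10.6], [11.4, 11.1, 10.9, 11.6]]
print(permutation_p_value(example))
```

If the estimate falls below the 5% threshold, the observed differences are hard to attribute to chance alone, which is the same reasoning the transcript applies to the Minitab p-value.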
4:35
The statistical tests involving p-values are formally called hypothesis tests. For this
ANOVA analysis, the hypothesis that all group means are equal is our null hypothesis.
The hypothesis that there is a significant difference between the group means is the
alternative hypothesis. In this case, the low p-value suggests that it is very unlikely that
the group means are equal. Thus, we can reject the null hypothesis and support the
alternative. It is important to note that the p-value only tells you whether an influence factor
has a significant effect. This does not mean that it has a large effect. So, we also have to
wonder whether the influence factor is relevant. Let's have a look at how relevant the
effect of machine in our example is. We go back to Minitab.
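Written out for the four machines, the two hypotheses and the decision rule look as follows. The symbols are notation introduced here for illustration: mu_i stands for the mean moisture percentage of machine i in the population, and alpha for the 5% threshold.

```latex
H_0:\ \mu_1 = \mu_2 = \mu_3 = \mu_4
\qquad
H_1:\ \text{at least one } \mu_i \text{ differs from the others}

\text{Reject } H_0 \text{ if } p < \alpha, \quad \alpha = 0.05
```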
5:30
To see how relevant an influence factor is, we can take a look at the R-squared. The
R-squared always lies between zero and 100% and tells you how much of the variation in
your Y variable is explained by your X variable.
5:47
In our example, the R squared is 26% so the influence factor machine accounts for or
explains 26% of the variation in the CTQ moisture percentage.
6:01
Let's have a closer look at the R squared. Our data looked like this and we see that the
machines have some effect on the moisture content. Now consider the circled
measurements in the graph. There is a difference between these two measurements and
it cannot be explained by the machine as they have been produced on the same machine.
6:21
This is part of the other 74% of variation that is not explained by the machines but is the
result of other factors.
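The split into explained and unexplained variation can be sketched as a small calculation: R-squared is the variation between the group means divided by the total variation. The helper `r_squared` and the numbers below are hypothetical and only illustrate the ratio.

```python
from statistics import mean


def r_squared(groups):
    """Share of the total variation in Y that is explained by the grouping X."""
    values = [v for g in groups.values() for v in g]
    grand = mean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    return ss_between / ss_total


# Made-up example: most of the spread comes from differences between groups.
example = {"A": [1.0, 2.0, 3.0], "B": [2.0, 3.0, 4.0], "C": [6.0, 7.0, 8.0]}
print(f"{r_squared(example):.1%}")
```

Whatever is left, one minus R-squared, is the variation inside the groups, such as the difference between two batches from the same machine.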
6:31
Consider that this was the measured data.
6:34
What would you think the R-squared would be for this dataset?
6:38
The measurements are closer to their group means, meaning that the machine explains more of
the variation. The R-squared will thus be higher, 58%.
6:49
We say that an influence factor is more relevant in the right example than it is in the left
example. The R squared measures the impact of the influence factor on the dependent
variable. If it is large, we can say the influence factor is more important or relevant: it is
a big fish. If the R squared is small, this means the explanatory power of the influence
factor is weak and that vital influence factors are missing. In other words, this influence
factor is a small fish. Summarizing, the p-value tells us whether the differences between
the means of the machines are significant, meaning that these differences are real and
not due to random fluctuations. The R squared tells us how relevant the effect of the
machines is. Always remember that ANOVA consists of three steps, and that before you
can be sure that the conclusions in your second step are valid, you will have to perform
a residual analysis. For that, see the next video.
M5-V3 ANOVA residual analysis
0:04
The residual analysis is a vital part of any statistical method. In the previous two videos,
you learned when and how to perform an ANOVA analysis. The p-value we use in the main
analysis is only valid if the assumptions are satisfied. In this video, you will learn how to
validate these assumptions using a residual analysis. Remember that we were wondering
whether the moisture content in coffee beans differs between the four machines it can be
produced on. Moisture was our numerical Y variable, and machine was our categorical X
variable. We performed an ANOVA analysis and these were the results we obtained. Our
ANOVA analysis gave us a p-value, which shows a statistically significant difference
between the average moisture percentages of the machines, because the p-value is
below 0.05. This difference in the means can also be seen in the individual value plot, as
the line connecting the means is not horizontal. Let's take a look at the R squared. This
shows that the influence factor machine explains 26% of the variation in the moisture
percentage. However, before we can completely trust these conclusions, we have to
validate the assumptions underlying the ANOVA. These checks are called the residual
analysis, and this is the final step of your ANOVA. As you probably remember,
ANOVA consists of three steps in total.
To validate the assumptions, we will check if the residuals are normally distributed and if
there are any outliers or other irregularities present.
But what is a residual? Let's take a look at the data to answer this question. Every dot in
the graph is one measurement. We also know the value that we would expect from a
measurement for Machine 1. That is the estimated mean. So there is a difference between
the measurement and our expectation. This difference is not explained by our influence
factor machine. It is left over variation, and this difference is called the residual. The
residuals are calculated by subtracting the expected value from each observation.
In the case of ANOVA, this expected value is the mean output of the relevant machine.
This is our data in time order. Our categorical variable has four different groups and the
red lines are the group means. Then the residuals will look like this, with the mean of the
residuals equal to zero by construction. Okay, let's go back to our moisture example and
let's perform a residual analysis with Minitab. Now, pause the video and load your data into
Minitab before continuing.
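The definition of a residual translates directly into a small calculation: subtract the mean of the relevant group from each observation. The sketch below uses made-up measurements and a hypothetical helper, `residuals_by_group`, and also confirms that the residuals average to zero.

```python
from statistics import mean


def residuals_by_group(rows):
    """rows: list of (value, group). Returns one residual per row, in the same order."""
    by_group = {}
    for value, group in rows:
        by_group.setdefault(group, []).append(value)
    group_means = {group: mean(values) for group, values in by_group.items()}
    # Residual = observation minus the expected value (the mean of its group).
    return [value - group_means[group] for value, group in rows]


# Made-up stacked data: (moisture, machine).
rows = [(11.0, "M1"), (12.0, "M1"), (13.0, "M1"), (9.0, "M2"), (10.0, "M2")]
residuals = residuals_by_group(rows)
print(residuals)
print(sum(residuals))  # zero by construction, up to rounding
```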
Once you have loaded your data into Minitab, this is what your data file should look like. You
have Machine 1 in the first column, then Machine 2, Machine 3, and Machine 4. Note that I
already stacked my data into the columns Moisture and Machine. Okay, let's look at our
residual analysis.
We can find this in our ANOVA menu, which was under Stat > ANOVA > One Way.
Maybe you still have it filled in, but otherwise, fill in your response, which is Moisture,
and your factor, which is Machine. Your residual analysis can be found under the option
Graph, where halfway down it asks you for residual plots. If you click on Four in one, you
get all plots at once. Furthermore, you can also uncheck the interval plot, because we don't
need it. Well, that's it. OK > OK, and then this is your four in one plot. Let's study the four
in one plot. Remember that we needed to check two things in the residual analysis.
Let's start with the normality assumption. This can be checked in the probability plot.
Are your residuals normally distributed? Yes, they are. Now, let's have a look at the
second assumption, that there are no outliers or irregularities in the residuals. To check
this assumption, we take a look at the four in one plot again. But now, we look at the
line graph. We see that there are no outliers or strange patterns present. This means that
this assumption is also satisfied, and that the original analysis is valid.
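The probability plot can be approximated outside Minitab by pairing each sorted residual with the normal quantile expected at its rank. If the residuals are roughly normal, the points lie close to a straight line, so their correlation is close to 1. This is a rough sketch rather than Minitab's procedure: the plotting positions (i + 0.5) / n are one common choice and an assumption here, and the residual values are made up.

```python
from statistics import NormalDist, mean


def normal_plot_points(residuals):
    """Pair each sorted residual with the standard normal quantile for its rank."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Assumed plotting positions: (i + 0.5) / n for i = 0 .. n-1.
    quantiles = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(quantiles, ordered))


def correlation(points):
    """Pearson correlation of the (x, y) points; near 1 means a straight line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Made-up residuals: the second set contains one clear outlier.
well_behaved = [-0.9, -0.4, -0.1, 0.0, 0.2, 0.3, 0.9]
with_outlier = [-1.0, -0.5, 0.0, 0.5, 1.0, 10.0]

print(correlation(normal_plot_points(well_behaved)))
print(correlation(normal_plot_points(with_outlier)))
```

A visual check of the plot remains the main tool; the correlation is only a compact summary of how straight the points are.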
Let's have a look at another example, and assume that these are our residuals. We see
in the probability plot that the residuals are not normally distributed. And in the line plot,
we see outliers in the residuals. This means that if these were your residuals, the
assumptions of the ANOVA would be violated. This implies that the conclusions in step two
would not have been valid, or at least not very precise. If this is the case, you can
perform a Kruskal-Wallis analysis.
In summary, in this series of videos I have explained that ANOVA is a technique to
test whether a categorical influence factor X has a significant effect on a numerical Y.
After organizing your data in the first step, you run the analysis in the second step and
interpret the p-value for significance and the R squared for importance. In the third step,
you validate your conclusions by checking whether the residuals are normally
distributed and whether they are free of outliers or other strange patterns.
M5-V4 Kruskal-Wallis test
0:04
Let's talk about the Kruskal-Wallis Test. After this video, you will be able to perform this
test and you will be able to interpret the results. But first, let's look at when to use the
Kruskal-Wallis test.
0:18
Our tree diagram shows that it is an alternative to the ANOVA technique. And hence, the
Kruskal-Wallis test is used to analyze the relationship between a numerical CTQ and a
categorical influence factor.
0:34
The Kruskal-Wallis test is a non-parametric test. This means that no specific
distribution is assumed for the residuals. Remember that the ANOVA was based on the
assumption that the residuals are normally distributed, and that the residuals contain no
irregularities such as outliers. Of course, letting go of assumptions comes at a price: when
the ANOVA assumptions do hold, the Kruskal-Wallis test is less powerful than the ANOVA
analysis. This means that you basically need more data to show the same difference. Okay,
now let's take a look at an example.
1:10
Consider a call center. Some of the employees received training while
others did not. You want to know whether this training has any effect on the total handling
time, that is, the total time it takes an employee to handle an incoming call. You have
measured the total handling time for some calls, and have recorded whether the employee
handling the call has received this training. Remember that you should always start by
first identifying your Y variable. That is the total handling time in this example, and it is
numerical. Next, identify your influence factor X. That is training or not, and it is
categorical. Now, the tree diagram will tell you to analyze this relationship between total
handling time and training using an ANOVA analysis. Performing this analysis, you will get,
in your third step, this four in one plot.
2:13
As you can see, the residuals are not normally distributed. Furthermore, the time graph
also shows some irregularities, hence we need to apply an alternative analysis technique.
In this case, we will apply the Kruskal-Wallis test.
2:31
The data that is gathered for the call center looks like this. Now, please pause the video
and load this data into Minitab before continuing.
2:44
After loading your data into Minitab, it should look like this, with Total Handling Time in
the first column and Training in the second column. Now I will show you how to perform
a Kruskal-Wallis analysis using Minitab. Of course, as it is a statistical test, you will find it
under the menu Stat. Remember that the Kruskal-Wallis test is a nonparametric test so
we go to the menu Nonparametrics. Here you will find the Kruskal-Wallis analysis.
3:17
Now, Minitab asks us for the response, which is your Y or your CTQ. And that's the total
handling time. Next, we have to fill in the factor, which is the training. And okay.
3:33
Now, let's study the Minitab output which you can find in the Session window. The
Kruskal-Wallis analysis is based on medians which are reported. The median handling
time for employees with training is 191 seconds. And the median handling time for
employees without training is 246 seconds.
3:57
So it took people without training a lot more time to handle a phone call than it took
people who received the training. The difference is nearly a minute.
4:09
The p-value is also given.
4:11
It is lower than 0.05. So the difference in medians that we found is statistically significant.
It can be concluded that the training is an effective method to lower the total handling time
by 55 seconds. Now, let's summarize. You learned to perform a Kruskal-Wallis test. This
test should be performed when the underlying assumptions of the ANOVA analysis are
not met.
4:42
To interpret the output, you have to look at the medians for each group and, of course,
at the p-value to see if this difference is statistically significant.
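For readers curious about what happens behind the Kruskal-Wallis menu, the test statistic can be sketched in plain Python. The test ranks all measurements together and compares the rank sums of the groups. Assumptions of this sketch: no correction for ties is applied, the p-value uses the large-sample chi-square approximation, and the p-value helper only covers the two-group case, as in the call center example. All handling times and function names are made up.

```python
from math import erfc, sqrt
from statistics import median


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of groups (no tie correction)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


def p_value_two_groups(h):
    """Upper tail of the chi-square distribution with 1 df; two groups only."""
    return erfc(sqrt(h / 2))


# Made-up handling times in seconds.
trained = [180.0, 195.0, 188.0, 201.0, 176.0]
untrained = [240.0, 252.0, 229.0, 260.0, 247.0]

print(median(trained), median(untrained))
h = kruskal_wallis_h([trained, untrained])
print(h, p_value_two_groups(h))
```

As in the video, the interpretation combines the two outputs: the medians show the size and direction of the difference, and the p-value shows whether it is statistically significant.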
TEST 1
1.
In the figure we see differences between teams in terms of processing times (PT). ANOVA is used to
analyze the difference between the teams ("Team").
Although we observe differences between Teams, the data cannot be used as the group sizes are too
small.
Although we observe differences between Teams, the evidence in the data is not significant,
indicated by the low p-value.
The mean processing times differ significantly between Teams. Furthermore, "Team" is a very
important influence factor as indicated by the R-square.
Correct
Incorrect
0/1 points
2.
In the figure we see the productivity for each shift (morning, afternoon, night). ANOVA is used to
analyze the difference in productivity between the shifts.
Although we observe differences between shifts, the evidence in the data is not significant, indicated
by the small p-value.
The low R-squared suggests that we should have performed another test to compare teams.
Correct
1/1 points
3.
Study the output. It contains an ANOVA residual analysis. The ANOVA studied whether the
influence factor Shift (morning, afternoon, night) influences the CTQ Productivity. What is the best
conclusion?
One of the residuals is an outlier, and therefore the ANOVA is not valid.
Correct
The influence factor is not categorical, so ANOVA is not the appropriate technique.
Correct
1/1 points
4.
CTQ: Number of remarks per report
Influence factor: the office handling the report (North, East, West)
We have performed a Kruskal-Wallis test to see if the number of remarks differs across offices.
Correct
Incorrect
0/1 points
5.
We have performed a 2-sample t-test on the number of e-mails somebody receives on weekdays
and weekend-days. Refer to the output.
There is no evidence for the difference in means being more than 0 (zero).
Incorrect
0/1 points
6.
CTQ: productivity.
Consider the output. Next you wish to study the difference in location between the shifts.
TEST 2
1.
In the figure we see differences between teams in terms of processing times (PT). ANOVA is used to
analyze the difference between the teams ("Team").
The mean processing times differ significantly between Teams. Furthermore, "Team" is a very
important influence factor as indicated by the R-square.
Although we observe differences between Teams, the evidence in the data is not significant,
indicated by the low p-value.
Although we observe differences between Teams, the data cannot be used as the group sizes are too
small.
Incorrect
0/1 points
2.
In the figure we see the productivity for each shift (morning, afternoon, night). ANOVA is used to
analyze the difference in productivity between the shifts.
The low R-squared suggests that we should have performed another test to compare teams.
Although we observe differences between shifts, the evidence in the data is not significant, indicated
by the small p-value.
Correct
1/1 points
3.
Study the output. It contains an ANOVA residual analysis. The ANOVA studied whether the
influence factor Shift (morning, afternoon, night) influences the CTQ Productivity. What is the best
conclusion?
The influence factor is not categorical, so ANOVA is not the appropriate technique.
Correct
One of the residuals is an outlier, and therefore the ANOVA is not valid.
Correct
1/1 points
4.
CTQ: Number of remarks per report
Influence factor: the office handling the report (North, East, West)
We have performed a Kruskal-Wallis test to see if the number of remarks differs across offices.
Correct
Incorrect
0/1 points
5.
We have performed a 2-sample t-test on the number of e-mails somebody receives on weekdays
and weekend-days. Refer to the output.
There is no evidence for the difference in means being more than 0 (zero).
Incorrect
0/1 points
6.
CTQ: productivity.
Consider the output. Next you wish to study the difference in location between the shifts.
TEST 3
1.
In the figure we see differences between teams in terms of processing times (PT). ANOVA is used to
analyze the difference between the teams ("Team").
Although we observe differences between Teams, the data cannot be used as the group sizes are too
small.
The mean processing times differ significantly between Teams. Furthermore, "Team" is a very
important influence factor as indicated by the R-square.
Correct
Although we observe differences between Teams, the evidence in the data is not significant,
indicated by the low p-value.
Incorrect
0/1 points
2.
In the figure we see the productivity for each shift (morning, afternoon, night). ANOVA is used to
analyze the difference in productivity between the shifts.
Although we observe differences between shifts, the evidence in the data is not significant, indicated
by the small p-value.
The low R-squared suggests that we should have performed another test to compare teams.
Correct
1/1 points
3.
Study the output. It contains an ANOVA residual analysis. The ANOVA studied whether the
influence factor Shift (morning, afternoon, night) influences the CTQ Productivity. What is the best
conclusion?
The residuals are clearly bimodal, so the ANOVA is not valid.
Correct
The influence factor is not categorical, so ANOVA is not the appropriate technique.
One of the residuals is an outlier, and therefore the ANOVA is not valid.
Correct
1/1 points
4.
CTQ: Number of remarks per report
Influence factor: the office handling the report (North, East, West)
We have performed a Kruskal-Wallis test to see if the number of remarks differs across offices.
Correct
Correct
1/1 points
5.
We have performed a 2-sample t-test on the number of e-mails somebody receives on weekdays
and weekend-days. Refer to the output.
There is no evidence for the difference in means being more than 0 (zero).
Correct
Correct
1/1 points
6.
CTQ: productivity.
Consider the output. Next you wish to study the difference in location between the shifts.
Final test 1
1.
It appears that the size of the batch affects the scrap percentage.
Which variable is likely to be the CTQ and which variable the influence factor?
Batch size is the CTQ and scrap percentage the influence factor.
Batch size is the influence factor and scrap percentage the CTQ.
Correct
Correct
1/1 points
2.
You want to know whether the Throughput Time is influenced by the day of the week that the
request is submitted.
Which test of the following four options is appropriate for this purpose?
ANOVA.
Correct
Regression.
Logistic regression.
Chi-square.
Incorrect
0/1 points
3.
You obtain a p-value equal to 0.01.
Correct
1/1 points
4.
This scatterplot implies that cheese consumption per capita (Cheese consumption) is related with
the number of people who died by becoming tangled in their bedsheets (Bedsheet tangling).
Many people who die from tangling have first eaten a lot of cheese.
Correct
Incorrect
0/1 points
5.
In the figure we see the productivity for each shift (morning, afternoon, night). ANOVA is used to
analyze the difference in productivity between the shifts.
The low R-squared suggests that we should have performed another test to compare shifts.
Although we observe small differences between shifts, the evidence in the data is not significant,
indicated by the R-square that is larger than 5%.
Although we observe small differences between shifts, the evidence in the data is not significant,
indicated by the p-value that is larger than 0.05.
Incorrect
0/1 points
6.
In the figure we see the productivity for each machine (1 through 5). ANOVA is used to analyze the
effect of machines on the productivity.
The mean productivity differs significantly between machines. However, besides the machine, other
influence factors are also important as indicated by the R-square.
Although we observe differences between machines, the evidence in the data is not significant,
indicated by the small p-value.
The mean productivity differs significantly between machines. Furthermore, machine is a very
important influence factor as indicated by the high R-square.
Although we observe differences between machines, the evidence in the data is not significant,
indicated by the low R-square.
Correct
1/1 points
7.
The output is a residual analysis corresponding to an ANOVA, which studies the effect of shift
(morning, afternoon, night) on the CTQ: Productivity. What is a valid conclusion based on the shown
output?
The influence factor is not categorical, therefore ANOVA is not the appropriate test.
The residuals are clearly nonnormal, so the ANOVA is valid.
The residuals show an outlier, therefore the results of the ANOVA (p-value, R-squared) are not
reliable.
Correct
Correct
1/1 points
8.
Influence factor: the office handling the report (North, East, West)
We have performed a Kruskal-Wallis test to see if the number of remarks differs across offices.
There is insufficient evidence that the medians of the groups are different.
Correct
Correct
1/1 points
9.
We have performed a 2-sample t-test on the number of e-mails somebody receives on week-days
and weekend-days.
There is no evidence for the difference in means being more than 0 (zero).
Correct
Incorrect
0/1 points
10.
CTQ: Productivity.
You performed an equal variances test on the data from which the output is given.
In the next step you want to study the difference in average productivity between the three shifts. To
this end, which analysis should follow?
Perform another test for equal variances not assuming equal variance.
Perform an ANOVA assuming equal variances.
Teste final 2
It appears that the size of the batch affects the scrap percentage.
Which variable is likely to be the CTQ and which variable the influence factor?
Batch size is the CTQ and scrap percentage the influence factor.
Batch size is the influence factor and scrap percentage the CTQ.
Correto
Correto
1/1 pontos
2.
You want to know whether the Throughput Time is influenced by the day of the week that the
request is submitted.
Which test of the following four options is appropriate for this purpose?
Logistic regression.
ANOVA.
Correto
Regression.
Chi-square.
Correto
1/1 pontos
3.
You obtain a p-value equal to 0.01.
Correto
Correto
1/1 pontos
4.
This scatterplot implies that cheese consumption per capita (Cheese consumption) is related with
the number of people who died by becoming tangled in their bedsheets (Bedsheet tangling).
Many people who die from tangling's have first eaten a lot of cheese.
Correto
Incorreto
0/1 pontos
5.
In the figure we see the productivity for each shift (morning, afternoon, night). ANOVA is used to
analyze the difference in productivity between the shifts.
The low -squared suggests that we should have performed another test to compare shifts.
Although we observe small differences between shifts, the evidence in the data is not significant,
indicated by the -square that is larger than 5%.
The mean productivity differs significantly between shifts.
Although we observe small differences between shifts, the evidence in the data is not significant,
indicated by the -value that is larger than 0.05.
Incorreto
0/1 pontos
6.
In the figure we see the productivity for each machine (1 through 5). ANOVA is used to analyze the
effect of machines on the productivity.
The mean productivity differs significantly between machines. However, besides the machine, other
influence factors are also important as indicated by the R-square.
Although we observe differences between machines, the evidence in the data is not significant,
indicated by the small p-value.
The mean productivity differs significantly between machines. Furthermore, machine is a very
important influence factor as indicated by the high R-square.
Although we observe differences between machines, the evidence in the data is not significant,
indicated by the low R-square.
Correct
1/1 points
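The answer options above rest on two different numbers: the F statistic (which drives the p-value and hence significance) and the R-square (the share of variation in the CTQ explained by the influence factor). A minimal sketch of how both follow from the sums of squares, using only the standard library; the function name `one_way_anova` and the productivity figures are hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F statistic, R-squared) for a list of samples, one per group."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    r_squared = ss_between / (ss_between + ss_within)
    return f_stat, r_squared


# Hypothetical productivity figures for three machines.
machines = [[5, 6, 7], [5, 6, 7], [8, 9, 10]]
f_stat, r_squared = one_way_anova(machines)
print(f_stat, r_squared)
```

Turning the F statistic into a p-value needs the F distribution, which is not in the standard library; a statistics package such as SciPy (`scipy.stats.f_oneway`) reports it directly.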
7.
The output is a residual analysis corresponding to an ANOVA, which studies the effect of shift
(morning, afternoon, night) on the CTQ: Productivity. What is a valid conclusion based on the shown
output?
The residuals show an outlier, therefore the results of the ANOVA (p-value, R-squared) are not
reliable.
Correct
The influence factor is not categorical, therefore ANOVA is not the appropriate test.
Correct
1/1 points
8.
CTQ: Number of remarks per report
Influence factor: the office handling the report (North, East, West)
We have performed a Kruskal-Wallis test to see if the number of remarks differs across offices.
Correct
There is insufficient evidence that the medians of the groups are different.
Correct
1/1 points
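For readers who want to see what the Kruskal-Wallis test computes, here is a rough sketch of the H statistic with average ranks and the usual tie correction, using only the standard library. The helper names and the remark counts are hypothetical. The p-value shortcut `exp(-H / 2)` is the chi-square approximation with 2 degrees of freedom, so it applies only to exactly three groups (as with the three offices here) and is approximate for small samples.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction; it is zero only when every observation is identical.
    ties = sum(c ** 3 - c for c in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def p_value_three_groups(h):
    """Chi-square (2 degrees of freedom) upper tail; valid for 3 groups only."""
    return math.exp(-h / 2)


# Hypothetical numbers of remarks per report for the three offices.
north, east, west = [2, 3, 3, 5], [1, 4, 4, 6], [2, 3, 5, 7]
h = kruskal_wallis_h([north, east, west])
print(h, p_value_three_groups(h))
```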
9.
We have performed a 2-sample t-test on the number of e-mails somebody receives on weekdays
and weekend days.
Correct
There is no evidence for the difference in means being more than 0 (zero).
Incorrect
0/1 points
10.
CTQ: Productivity.
You performed an equal variances test on the data from which the output is given.
In the next step you want to study the difference in average productivity between the three shifts. To
this end, which analysis should follow?
Perform an ANOVA not assuming equal variances.
Perform another test for equal variances not assuming equal variance.
Final test 3
1.
It appears that the size of the batch affects the scrap percentage.
Which variable is likely to be the CTQ and which variable the influence factor?
Batch size is the CTQ and scrap percentage the influence factor.
Batch size is the influence factor and scrap percentage the CTQ.
Correct
Correct
1/1 points
2.
You want to know whether the Throughput Time is influenced by the day of the week that the
request is submitted.
Which test of the following four options is appropriate for this purpose?
ANOVA.
Correct
Regression.
Logistic regression.
Chi-square.
Correct
1/1 points
3.
You obtain a p-value equal to 0.01.
Correct
Correct
1/1 points
4.
This scatterplot implies that cheese consumption per capita (Cheese consumption) is related to
the number of people who died by becoming tangled in their bedsheets (Bedsheet tangling).
Correct
Many people who die from tangling have first eaten a lot of cheese.
5.
In the figure we see the productivity for each shift (morning, afternoon, night). ANOVA is used to
analyze the difference in productivity between the shifts.
Although we observe small differences between shifts, the evidence in the data is not significant,
indicated by the p-value that is larger than 0.05.
Correct
The low R-squared suggests that we should have performed another test to compare shifts.
Correct
1/1 points
6.
In the figure we see the productivity for each machine (1 through 5). ANOVA is used to analyze the
effect of machines on the productivity.
The mean productivity differs significantly between machines. However, besides the machine, other
influence factors are also important as indicated by the R-square.
Correct
Although we observe differences between machines, the evidence in the data is not significant,
indicated by the small p-value.
Although we observe differences between machines, the evidence in the data is not significant,
indicated by the low R-square.
The mean productivity differs significantly between machines. Furthermore, machine is a very
important influence factor as indicated by the high R-square.
Correct
1/1 points
7.
The output is a residual analysis corresponding to an ANOVA, which studies the effect of shift
(morning, afternoon, night) on the CTQ: Productivity. What is a valid conclusion based on the shown
output?
The residuals are clearly nonnormal, so the ANOVA is valid.
The influence factor is not categorical, therefore ANOVA is not the appropriate test.
The residuals show an outlier, therefore the results of the ANOVA (p-value, R-squared) are not
reliable.
Correct
Correct
1/1 points
8.
CTQ: Number of remarks per report
Influence factor: the office handling the report (North, East, West)
We have performed a Kruskal-Wallis test to see if the number of remarks differs across offices.
There is insufficient evidence that the medians of the groups are different.
Correct
Correct
1/1 points
9.
We have performed a 2-sample t-test on the number of e-mails somebody receives on weekdays
and weekend days.
There is no evidence for the difference in means being more than 0 (zero).
Correct
The standard deviations differ significantly.
Correct
1/1 points
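As a rough illustration of what sits behind a 2-sample t-test output, the sketch below computes the t statistic and degrees of freedom in the form that does not assume equal variances (Welch's version). Whether the quiz output used this form or the pooled one is not shown in the text, so that choice is an assumption, as are the function name `welch_t` and the e-mail counts.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) without assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contribution of each sample mean.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b

    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical daily e-mail counts.
weekend_days = [1, 2, 3, 4, 5]
weekdays = [2, 4, 6, 8, 10]
t_stat, df = welch_t(weekend_days, weekdays)
print(t_stat, df)
```

The p-value then comes from the t distribution with `df` degrees of freedom, which a statistics package provides; a confidence interval for the difference in means that contains 0 corresponds to the "no evidence" conclusion in the answer above.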
10.
CTQ: Productivity.
You performed an equal variances test on the data from which the output is given.
In the next step you want to study the difference in average productivity between the three shifts. To
this end, which analysis should follow?
Perform another test for equal variances not assuming equal variance.
Correct