
Statistical theory for preference tests

Statistical tests

A statistical test is a formal procedure which allows a sensory researcher to make an
objective decision about the question of a sensory evaluation on the basis of the
experimental data collected.

For example, in a simple paired test, we need to test whether two products, or two
different formulations of the same product, are truly perceived by people as being different.

The sensory specialist wants to check in this test whether he/she can show that the
difference is perceptible by a certain population group (or consumer group).
The experimental data are collected and analysed.
Two conclusions are possible:

The difference is perceptible, or

There is no perceptible difference between the two products.

Note that even when the sensory specialist knows that there are some physical and
chemical differences between the two products, those differences do not necessarily
lead to a perceptible difference in taste. So there may truly be no difference in taste.

On the other hand, people respond differently, and the random sample of participants
selected for the sensory panel may not be large enough to pick up a true but small
difference in taste with confidence. (We are not considering here the situation of a biased
selection, where panellists were, for example, not familiar with the products and thus could
not differentiate well between them anyway.)

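In summary, a decision matrix for that test would have four cells:

No true difference, and the test reports no difference: correct decision.
No true difference, but the test reports a difference: Type I error (false positive).
True difference, and the test reports a difference: correct decision.
True difference, but the test reports no difference: Type II error (false negative).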

Any statistical test operates at a certain level of significance, that is, a standard for how
strong the experimental results must be. This standard can be expressed as the magnitude of
detectable (perceptible) difference between the products, called the critical value of the
test. Alternatively, the strength of the result can be reported as the probability of obtaining
a result at least as extreme as the one observed when there is in fact no difference. This
probability is called the p-value (or p-level) of the test. The p-value is compared to a
pre-assigned probability of a false positive (detecting a difference when there is none),
called the level of significance (often denoted by the Greek letter α).

Testing rules

All testing rules can be expressed in terms of critical values or significance levels.

Critical value testing rule

If (a certain summary from the test ≥ critical value of the test), accept the risk of a Type I error
and report that there is a true perceptible difference.

P-value testing rule

If (the p-value of the test < significance level of the test), accept the risk of a Type I error and
report that there is a true perceptible difference.
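
The two rules can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the critical value rule assumes that the summary is a count for which larger values are stronger evidence of a difference, as in the preference tests discussed in these notes.

```python
def significant_by_critical_value(statistic: int, critical_value: int) -> bool:
    """Critical value rule: report a difference when the count reaches the critical value."""
    return statistic >= critical_value


def significant_by_p_value(p_value: float, alpha: float = 0.05) -> bool:
    """P-value rule: report a difference when the p-value is below the significance level."""
    return p_value < alpha


# Illustrative inputs only (not taken from any real panel).
print(significant_by_critical_value(statistic=22, critical_value=20))  # True
print(significant_by_p_value(p_value=0.10))                            # False
```

Both rules lead to the same decision when the critical value and the p-value come from the same distribution and the same level of significance.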

The level of significance chosen does not depend on the direction of the question. Rather, it depends
on the stage of the sensory evaluation in product development. The risk of Type I error one would
tolerate can be quite high at early stages (10%) but would generally be low at later stages (1%). In
this course, we will use the conventional 5% level of significance (α = 5%).

Notice, however, that the critical value, which is also governed by the level of significance, may
change depending on how we expect the panel to respond if there is a true difference between samples.
If the number of correct responses is expected to increase as the difference increases, the test is
called one-sided, directional or unilateral; if the number of correct responses is expected to be
either very large or very small, the corresponding test is two-sided, non-directional or bilateral.

Statistical theory for paired preference tests

Today, we will only consider a simple paired preference test. The intensity rating for difference
between products and the rating for preference will be considered in Session 5. The intensity ranking
and ranking for preference will be considered in Session 6.

The paired preference test is a forced choice test, which asks an evaluator to choose between two
products. The sensory specialist may have some expectation about which particular product
ought to be preferred more often. In this case, the test is conducted in a unilateral or
one-sided way. Alternatively, the sensory specialist may not expect any particular winner between
the two products tested and may just want to see if there is any difference in preference between
them. In that case, the test is conducted in a bilateral or two-sided way.

The statistical logic of the test is similar to that of the duo-trio difference test, in which a
panellist has to choose, between two samples, the one which is similar to the reference sample.
In the paired preference test, a random guesser would pick one product over the other
with probability 1/2 = 0.5 (50%). Therefore, if the number of panellists in a sensory panel
favouring one particular product is much larger than half of the panel, we tend to conclude that
that product would be preferred by consumers in general (who are represented by the panel).
In the analysis, we will adjust the procedure according to the following two expectations about the
results: (1) one particular product is expected to be preferred, and (2) no particular
preference is assumed before the test.

Say we are conducting a paired preference test to conclude whether Product A is preferred to
Product B. We have made some taste modification to Product A, and our preliminary small-scale
sensory trials also suggested that Product A tastes better than Product B. With all this
in mind, we would expect the majority of the panel to choose Product A. However, if the achieved
difference in taste is not very strong on average, or if our definition of taste improvement differs
from that of the consumers, we have to allow that the hypothesis of random guessing may be true, in
which case we would expect only about 50% of the panel to choose Product A.

This is a one-tailed situation. For the analysis, we count only the preferences for A, and we
expect a count larger than half of the panel. The critical value for this test is taken from a
binomial table for the specified level of significance and the specified size of panel. The p-value
of the test is the upper-tail probability of the binomial distribution with the corresponding
parameters, that is, the probability of at least the observed number of successes, where number of
successes = number of preferences for Product A, number of trials = size of the panel, and
probability of success = 0.5.
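
As a minimal sketch of this calculation, the one-sided p-value can be computed directly from binomial coefficients using only the Python standard library. The function name `one_sided_p_value` is an illustrative choice, not something defined in the course material.

```python
from math import comb


def one_sided_p_value(preferences_for_a: int, panel_size: int) -> float:
    """P(X >= preferences_for_a) for X ~ Binomial(panel_size, 0.5)."""
    # With success probability 0.5, every outcome has probability 1 / 2**panel_size,
    # so the tail probability is a count of favourable outcomes over the total.
    favourable = sum(comb(panel_size, k)
                     for k in range(preferences_for_a, panel_size + 1))
    return favourable / 2 ** panel_size


print(round(one_sided_p_value(20, 30), 4))  # 0.0494
```

With 20 preferences for Product A in a panel of 30, the p-value falls just below 0.05.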

From Table 3 in the Sensory Evaluation manual, you can see, for example, that at least 20 panellists
preferring Product A are needed to conclude that the preference is significant at the 5% level for a
panel of 30 people.
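
A table value of this kind can be reproduced, under the assumption that the table lists the smallest count whose upper-tail binomial probability is below the level of significance. The sketch below makes that assumption explicit; the function names are hypothetical.

```python
from math import comb
from typing import Optional


def upper_tail(count: int, panel_size: int) -> float:
    """P(X >= count) for X ~ Binomial(panel_size, 0.5)."""
    return sum(comb(panel_size, k) for k in range(count, panel_size + 1)) / 2 ** panel_size


def one_sided_critical_value(panel_size: int, alpha: float = 0.05) -> Optional[int]:
    """Smallest count whose one-sided p-value is below alpha (assumed table definition)."""
    for count in range(panel_size + 1):
        if upper_tail(count, panel_size) < alpha:
            return count
    return None  # the panel is too small to reach significance at this alpha


print(one_sided_critical_value(30))  # 20
```

For a panel of 30 this gives 20, in agreement with the value quoted above.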

If we do not have any expectation about which product is preferred, the test is conducted in
a two-sided way. We simply count the number of preferences for each of the two
products, and then compare the larger count to the critical value from the corresponding table
(Table 4 in the Sensory Evaluation Manual).

For the same two products, if we did not speculate in advance whether Product A or Product B would
be preferred, the critical value should be taken from Table 4. For example, for a panel of 30 people,
the critical value at the 5% significance level is 21 people preferring one of the two products.

This relation holds in general. At the same level of significance, the critical value for a one-sided
test is never larger than the critical value for the two-sided test.
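
The comparison between the two kinds of critical value can be checked numerically. The sketch below assumes that the two-sided p-value is twice the upper-tail probability of the larger count, which is the usual convention when the guessing probability is 0.5; the function names and the panel sizes looped over are illustrative.

```python
from math import comb
from typing import Optional


def upper_tail(count: int, panel_size: int) -> float:
    """P(X >= count) for X ~ Binomial(panel_size, 0.5)."""
    return sum(comb(panel_size, k) for k in range(count, panel_size + 1)) / 2 ** panel_size


def critical_value(panel_size: int, alpha: float = 0.05,
                   two_sided: bool = False) -> Optional[int]:
    """Smallest count that is significant at alpha, one-sided or two-sided."""
    sides = 2 if two_sided else 1  # assumption: two-sided p-value = 2 * upper tail
    for count in range(panel_size + 1):
        if sides * upper_tail(count, panel_size) < alpha:
            return count
    return None  # no count reaches significance for this panel size


for n in (10, 20, 30, 40, 50):
    print(n, critical_value(n), critical_value(n, two_sided=True))
```

For a panel of 30 this prints 20 and 21, the two values quoted above. For small panels the two values can coincide (both are 9 for a panel of 10), but the one-sided value does not exceed the two-sided one.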

For a paired preference test in which Product A was expected to be the favourite, and 19 people out
of 30 preferred Product A:

The p-value (about 0.10) is larger than 5%. There is not enough evidence to suggest that Product A is
preferred to Product B.

For a two-sided paired preference test in which one product was preferred by 19 out of 30 panellists:

The p-value (about 0.20) is larger than 0.05. There is not enough evidence to suggest that one product
is preferred to the other.
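
Both examples can be worked through with a few lines of Python. This is a sketch under the assumption that the two-sided p-value is twice the one-sided upper-tail probability (capped at 1); the variable and function names are illustrative.

```python
from math import comb


def upper_tail(count: int, panel_size: int) -> float:
    """P(X >= count) for X ~ Binomial(panel_size, 0.5)."""
    return sum(comb(panel_size, k) for k in range(count, panel_size + 1)) / 2 ** panel_size


panel_size = 30
larger_count = 19  # preferences for Product A, or for the more popular product
alpha = 0.05

p_one_sided = upper_tail(larger_count, panel_size)
p_two_sided = min(1.0, 2 * p_one_sided)  # assumption: doubled tail for the two-sided test

print(f"one-sided p = {p_one_sided:.3f}, significant: {p_one_sided < alpha}")
print(f"two-sided p = {p_two_sided:.3f}, significant: {p_two_sided < alpha}")
# one-sided p = 0.100, significant: False
# two-sided p = 0.200, significant: False
```

In both cases the p-value exceeds 0.05, so 19 preferences out of 30 are not enough to report a significant preference.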
