
History of Usability Testing

The Dawn of Usability Testing

Usability testing as a means of getting feedback on and evaluating products began in the early 1980s. In 1981, Alphonse Chapanis and colleagues suggested that observing about five to six users reveals most of the problems in a usability test. Wanting a more precise estimate than 5-6, Jim Lewis (1982) published the first paper describing how the binomial distribution can be used to model the sample size needed to find usability problems. The model is based on the probability "p" of discovering a problem, for a given set of tasks and user population, with a sample size of "n" users. From 1983 through 1989, little research was published on usability testing or its sample sizes.
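To make that model concrete, here is a minimal Python sketch of the binomial relationship (the function name and example values are illustrative, not taken from Lewis's paper): if a problem affects a proportion p of users, the chance that at least one of n test participants encounters it is 1 - (1 - p)^n.

```python
def prob_problem_seen(p, n):
    """Chance that a problem affecting a proportion p of users
    is encountered at least once by n test participants."""
    return 1 - (1 - p) ** n

# Illustrative values: a problem that affects 30% of users has
# roughly an 83% chance of showing up with 5 participants.
print(prob_problem_seen(0.30, 5))  # ~0.832
```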

The Developing Period of Usability Evaluation

Starting in 1990, as the GUI (Graphical User Interface) became more widespread, the need for more precision in usability evaluation generated multiple papers that reassuringly propose using the binomial to model sample sizes. In 1990, Robert Virzi details three experiments at the HFES conference replicating earlier work from Nielsen. His paper explicitly uses the same binomial formula Jim Lewis had used eight years earlier. He later published these findings in more detail in a 1992 Human Factors paper. The two papers state that (1) additional subjects are less and less likely to reveal new information, (2) the first 4-5 users find 80% of problems in a usability test, and (3) severe problems are more likely to be detected by the first few users. Wright and Monk (1991) also show how Virzi's formula can be used to identify sample sizes in iterative usability testing. Moreover, Jakob Nielsen and Tom Landauer (1993), in a separate set of eleven studies, found that a single user or heuristic evaluator on average finds 31% of problems. Jim Lewis, who had been sitting in the audience at Virzi's 1990 HFES talk, wondered how severity and frequency could be associated. His 1994 paper confirmed Virzi's first finding (the first few users find most of the problems) and partially confirmed the second, but his data did not show that severity and frequency are associated. It could be that more severe problems are easier to detect, or it could be that it is very difficult to assign severity without being biased by frequency. Little has been published on this topic since then.
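The often-quoted figure that roughly five users uncover about 80-85% of problems follows from plugging Nielsen and Landauer's average of p = 0.31 into the same binomial model; the short sketch below reproduces that arithmetic as an illustration under that assumption, not a calculation from the original papers.

```python
# Cumulative chance of seeing a problem at least once, assuming
# Nielsen & Landauer's average problem frequency of p = 0.31.
p = 0.31
for n in range(1, 6):
    print(n, round(1 - (1 - p) ** n, 2))
# 1 -> 0.31, 2 -> 0.52, 3 -> 0.67, 4 -> 0.77, 5 -> 0.84
```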

The Declination Period of Usability Testing

From 1995 to 2000, professionals and practitioners did not write much about usability testing or its sample sizes, owing to the Dot-Com boom in computer and information technology, even though usability testing went mainstream. However, Nielsen (2000) publishes the widely cited web article "Why You Only Need to Test with 5 Users," which summarizes the past decade's research. Its graph comes to be known as the "parabola of optimism."

The Debating Period of Usability Evaluation

Since the start of the 21st century, there has been a great deal of research and a great number of papers on usability testing and its sample sizes. At the same time, skepticism builds over the "magic number five" for usability testing. In 2001, Jared Spool and Will Schroeder show that serious problems were still being discovered even after dozens of users, disagreeing with Virzi's findings but agreeing with Lewis's. This was later reiterated by Perfetti and Landesman. Unlike most studies, these authors used open-ended tasks that allowed users to freely browse up to four websites looking for unique CDs. Caulton (2001) argues that different types of users will find different problems and suggests including an additional parameter for the number of sub-groups of users. Furthermore, Hertzum and Jacobsen (2001) caution that an average problem frequency estimated from the first few users will be inflated. Lewis (2001) provides a correction for estimating the average problem occurrence from the first 2-4 users. In 2002, Carl Turner, Jim Lewis, and Jakob Nielsen respond to criticisms of the usability testing formula at the UPA (Usability Professionals' Association) conference. In 2003, Laura Faulkner also shows variability in the problems users encounter: while on average five users found 85% of problems in her study, some combinations of five users found as few as 55% and others as many as 99%. Also in 2003, Dennis Wixon argues that the discussion about how many users are needed to find problems is mostly irrelevant and that the emphasis should be on fixing problems (the RITE method). In the same year, a CHI (Computer-Human Interaction) panel with many of the usual suspects defends and debates the legitimacy of the "Magic Number 5."

The Clarification Period of Usability Testing

In 2006, in a paper based on the UPA panel four years earlier, Carl Turner, Jim Lewis, and Jakob Nielsen review the criticisms of the sample size formula and show how it can and should be legitimately used. Jim Lewis (2006) provides a detailed history of how sample sizes are found using "mostly math, not magic," including an explanation of how Spool and Schroeder's results can be accounted for by estimating the value of p for their study. In 2007, Gitte Lindgaard and Jarinee Chattratichart, using CUE-4 data, remind us that if you change the tasks you'll find different problems. The following year, in response to calls for a better statistical model, Martin Schmettow proposes the beta-binomial to account for the variability in problem frequency, but with limited success. In 2010, Sauro wrote an article visually showing that the binomial math predicts sample sizes fine; the problem lies in how it is often misinterpreted. The article reiterates the important caveats made over the past decades about the magic number 5: (1) you won't know whether you've seen 85% of ALL problems, just 85% of the more obvious problems (the ones that affect 31% or more of users); (2) the sample size formula only applies when you test users from the same population performing the same tasks on the same applications; and (3) as a strategy, don't try to guess the average problem frequency. Instead, choose a minimum problem frequency you want to detect (p), and the binomial will tell you how many users you need to observe to have a good chance of detecting problems with at least that probability of occurrence.
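As a rough illustration of that last recommendation (the function name and example values below are assumptions, not taken from the article), the binomial can be inverted to give the required sample size: the smallest n for which 1 - (1 - p)^n meets a chosen detection goal.

```python
import math

def users_needed(p, goal=0.85):
    """Smallest n such that a problem affecting at least a proportion p
    of users has probability >= goal of being seen at least once."""
    return math.ceil(math.log(1 - goal) / math.log(1 - p))

# Illustrative: problems affecting 31% of users need about 6 participants
# for an 85% chance of detection (5 participants give ~84%), while
# problems affecting only 10% of users need about 19.
print(users_needed(0.31))  # 6
print(users_needed(0.10))  # 19
```

The contrast between the two calls illustrates why guessing the average frequency is risky: problems that affect a smaller share of users require disproportionately larger samples to detect reliably.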
