
This article is provided courtesy of STQE, the software testing and quality engineering magazine.

Tools & Automation

Three Web load testing blunders, and how to avoid them


by Alberto Savoia

TRADE SECRETS
FROM A WEB TESTING EXPERT

QUICK LOOK
■ Why concurrent users is a misleading metric
■ The impact of user abandonment
■ Accurately analyzing results
Load testing has rapidly become one of the top QA priorities for companies with mission-critical Web sites. How many users will your site be able to serve while still maintaining acceptable response times? That's an indispensable piece of information for planning marketing campaigns, estimating IT budgets, and basic delivery of service. And yet practically all Web site load tests are seriously flawed—because they all seem to make mistakes that have a huge impact on the accuracy of the test and the reliability of the results. Let's look at three of the biggest and most common Web load testing blunders, and how to avoid them.
www.stqemagazine.com STQE May/June 2001

1. Misunderstanding Concurrent Users

The first blunder centers on the widespread use of the concept of concurrent users to quantify loads and design load tests. I find it amazing that concurrent users is the prevailing metric when it comes to describing a Web load, because the approach is riddled with obvious problems. So many problems, in fact, that I could probably write a book about it—but fortunately for you, the space constraints of this article force me to be concise.

In that spirit, I'll focus on the main problem with concurrent users: the number of concurrent users shouldn't be seen as input for a load test run, but as the result of a number of factors. And yet whenever you read a load testing plan, you probably see something like this: "The Web site will be tested with a load of 1,000 concurrent users."

To explain why this is the wrong way to look at things, let's do a simple thought experiment.

Let's assume that three users—Alan, Betty, and Chris—visit a financial Web site to get stock quotes on three consecutive days, and that each of them plans to get three different stock quotes. None of our three users knows each other, so their actions on each day are completely independent and unsynchronized, like those of most Web users.

On Monday, Alan starts his session at 12:00, Betty at 12:01, and Chris at 12:02, and each of their sessions consists of four page requests (Home Page→Quote 1→Quote 2→Quote 3). Each of our users will, after receiving each page, spend 10 seconds looking at it before requesting the next page (in load testing parlance, this is called think time). If the Web site takes 5 seconds to respond to each page request, Alan's, Betty's, and Chris's sessions will not overlap. Alan will have received and read his four pages before Betty's session starts, and Betty will be done with her session before Chris starts his (see Figure 1). In this case, it's accurate to say that Chris's, Betty's, and Alan's sessions are not concurrent.

Let's now assume that by lunchtime on Tuesday the site has slowed down a little bit. Now instead of taking 5 seconds, each page request takes 10 seconds, but the time it takes the user to read it remains constant. Each session will now last 80 seconds instead of 60. Alan's and Betty's sessions will overlap, and so will Chris's and Betty's. In this case we will have some periods of time with one user and some periods of time with two concurrent users.

On Wednesday, the Web site slows down even more. Now instead of taking 10 seconds, each page request takes 30 seconds. Each session will now last 160 seconds. Alan's and Betty's sessions will overlap, as well as Chris's and Betty's, and for a while all three sessions will also overlap, resulting in three concurrent users.

As Figure 1 illustrates, the number of concurrent users is not a measure of load. The load was identical in all three cases: three users, starting a minute apart, viewing four pages each with 10 seconds of think time per page. The number of concurrent users was a result: a measure of the Web site's ability to handle a specific load. A slower Web site resulted in more concurrent users.

When it comes to measuring Web site scalability, it turns out that the number of concurrent users is not even that useful as an output result of load testing. If the Web site is somewhat slow, the number of concurrent users increases. If it's really slow, a lot of real users will abandon it, thus reducing the number of concurrent users. (More on user abandonment in the next section.) But, on the other hand, if a Web site is very fast, sessions will complete more quickly, also reducing the number of concurrent users. You see the problem?

The bottom line is that concurrent users is a dangerously misleading metric that can be misused in so many ways that it's practically guaranteed to give you questionable results.

So what should you use in its place? For describing an input load, my favorite metric is user sessions started per hour. This metric offers a major advantage: the number of user sessions started per hour is a constant, unaffected by the performance of the Web site under test. If you've launched a big marketing campaign and expect to draw a peak of 10,000 user sessions per hour to your Web site, those users will come to the Web site and request the first page, regardless of whether or not your site can handle them. Whether they complete their sessions or not, however, depends on the Web site's ability to support the load with acceptable response time—and that's precisely what you want to find out with a load test, isn't it?

FIGURE 1 Users' concurrency influenced by page request times (Monday, Tuesday, and Wednesday panels showing when Alan, Betty, and Chris are receiving or reading pages from 12:00 onward, and when two or three sessions overlap)
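The Monday-to-Wednesday scenario is easy to reproduce numerically. The sketch below is mine, not the article's: it models each session as lasting pages * (response time + think time) seconds, as in the thought experiment, and counts overlapping sessions with a simple sweep over start and end events.

```python
# Sketch (not from the article): concurrency as an *output* of load and speed.
# Three users start 60 seconds apart; each session is 4 pages with
# 10 seconds of think time per page.

def max_concurrent(start_times, session_duration):
    """Peak number of overlapping sessions (sweep over start/end events)."""
    events = []
    for start in start_times:
        events.append((start, 1))                      # session begins
        events.append((start + session_duration, -1))  # session ends
    # Process an end at time t before a start at time t, so sessions
    # behave as half-open intervals [start, end).
    events.sort(key=lambda event: (event[0], event[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak

STARTS = [0, 60, 120]    # Alan, Betty, Chris (seconds after 12:00)
PAGES, THINK_TIME = 4, 10

for response_time in (5, 10, 30):   # Monday, Tuesday, Wednesday
    duration = PAGES * (response_time + THINK_TIME)
    print(f"{response_time}s per page -> sessions last {duration}s, "
          f"peak concurrency {max_concurrent(STARTS, duration)}")
```

Running it gives a peak concurrency of 1, 2, and 3 for the 5-, 10-, and 30-second response times: an identical input load produces three different concurrent-user counts, which is exactly why concurrency belongs among the results, not the inputs.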

2. Miscalculating User Abandonment

Let's move on to another constantly overlooked load testing factor that has a huge impact on load testing results: user abandonment. Have you ever left a Web site because its pages were loading too slowly? Unless you have the patience of a Benedictine monk, I am sure you have. And since many people seem to have the attention span of a chipmunk when they're on the Internet, abandoned sessions are extremely common. Considering that the magnitude of user abandonment is going to be quite high at the levels that are likely to be used in a load test, and that this user abandonment will have a significant impact on the resulting load, I find it very surprising that most Web load tests don't even attempt to simulate this very common behavior with any degree of realism.

You should simulate user abandonment as realistically as possible. If you don't, you'll be creating a type of load that will never occur in real life—and creating bottlenecks that might never happen with real users. At the same time, you will be ignoring one of the most important load testing results: the number of users that might abandon your Web site due to poor performance. In other words, your test might be quite useless.

To explain why this is, let's perform another simple thought experiment.

Let's assume that you want to test your Web site at a load of 10,000 concurrent users (just testing—I hope you've banished concurrent users from your vocabulary)…As I was saying, let's assume you want to test your Web site at a load of 10,000 user session starts per hour. Let's also assume that if the home page response time is less than 5 seconds, no users will abandon the Web site because of performance. We'll say that as home page response time increases farther and farther away from 5 seconds, more and more users will abandon. For example: 30% will abandon between 5 and 10 seconds, 45% between 10 and 15 seconds, and so on. Let's also assume that each complete user session consists of four pages.

In one scenario, let's assume that the Web site under test is able to handle the 10,000 user sessions per hour with a home page response time consistently below 5 seconds. In this case, all 10,000 users will complete their sessions and the Web site will have served a load of 40,000 pages.

In another scenario, let's assume that the Web site is not as scalable. When it's confronted with a load level of 10,000 user session starts per hour, the response time increases to 15 seconds per page. What happens in this case is not as straightforward. Initially, as the performance deteriorates, some users will start to abandon; but since this abandonment reduces the load, the performance will start improving again. As performance improves, fewer additional users will abandon—at least until the load increases again and the cycle repeats itself.

It's clear that in this example not all sessions will conclude happily; a number of people will abandon their session. That's a very important result, a critical piece of information that your load test should help you discover. After all, aren't you doing a load test to ensure that the Web site can serve a specific number of users at a specific performance level? User abandonment is a very important metric and a clear indication that the Web site is not able to operate satisfactorily at that load level.

Considering how critical abandonment is, it's surprising that most load tests are designed to use scripts that simulate abandonment only in extreme cases. (I most commonly hear of 60- to 120-second timeouts. Unfortunately, I don't know anybody in my immediate and extended family with that kind of patience; actually, I don't know anybody in my area code with that kind of patience.)

So, how do you implement realistic user abandonment? Here's a simple approach you can use to get started. When you write your load testing scripts, determine what the acceptable page response times would be for each type of page, and what the likely abandonment rates are going to be when they are exceeded. Then program each simulated user script to terminate if the response time for a page exceeds its pre-specified threshold.

Table 1 shows a sample matrix in which to map out the possibilities.

User Abandonment Matrix

  PAGE TYPE               % ABANDONMENT BY PAGE RESPONSE TIME
                          0–5 sec    5–10 sec   10–15 sec   15–20 sec
  Home Page                  0%         30%        45%         75%
  Stock Quote                0%         15%        25%         45%
  Stock Transaction          0%          0%         0%         15%
  Account Information        0%          5%        15%         35%

TABLE 1 Estimated user abandonment rates for different page types
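A load testing script can consult such a matrix directly. The sketch below is only an illustration of the approach described above, using the rates from Table 1; the names, the probabilistic abandonment decision, and the behavior beyond the last band are my own assumptions, not taken from any particular load testing tool.

```python
import random

# Abandonment bands from Table 1: (upper bound of the response-time band
# in seconds, fraction of users who abandon in that band). The rates are
# the article's estimates, not measurements.
ABANDONMENT_RATES = {
    "home":        [(5, 0.00), (10, 0.30), (15, 0.45), (20, 0.75)],
    "quote":       [(5, 0.00), (10, 0.15), (15, 0.25), (20, 0.45)],
    "transaction": [(5, 0.00), (10, 0.00), (15, 0.00), (20, 0.15)],
    "account":     [(5, 0.00), (10, 0.05), (15, 0.15), (20, 0.35)],
}

def abandonment_rate(page_type, response_seconds):
    """Estimated fraction of users who give up at this response time."""
    bands = ABANDONMENT_RATES[page_type]
    for upper_bound, rate in bands:
        if response_seconds <= upper_bound:
            return rate
    return bands[-1][1]  # beyond the last band, reuse its rate (an assumption)

def run_session(pages, fetch, rng=random):
    """Fetch pages in order; return ('completed' or 'abandoned', pages seen).

    `fetch` stands in for the real page request and returns the
    response time in seconds.
    """
    for visited, page in enumerate(pages, start=1):
        response_time = fetch(page)
        if rng.random() < abandonment_rate(page, response_time):
            return ("abandoned", visited)
    return ("completed", len(pages))
```

A virtual user that draws a 12-second home page has a 45% chance of terminating on the spot, so in the simulation, just as in real life, a slow page sheds load from the rest of the site.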



This matrix takes into account the fact that people expect home pages to load very quickly, but are more tolerant of pages that they assume require more work for the servers (e.g., completing a stock transaction).

This kind of table is actually a great way to get your peers and management to formally discuss and document the performance expectations for your Web site. As we will see a little later, this type of model also lets you create much better load testing reports—since you will be providing information that's going to be significantly more meaningful and relevant than what you'd be able to deliver without a simulation of user abandonment.

Accurately Simulating Abandonment

Whenever I talk about user abandonment, people agree with its importance, and with the goal of simulating it as realistically as possible. But how can they do that? They have no idea what the user abandonment behavior might be for their Web site, and what percentages they should put in their user abandonment matrix. This is a very valid concern that, fortunately, can be addressed in several ways, depending on how accurate and realistic you want to be.

If you want to be as accurate and realistic as possible, you could set up your Web site to redirect a percentage of your visitors to a slower mirror version of the Web site—one that's identical to the main site, except that it's artificially slowed down. This is not as complicated as it sounds, and it can be accomplished in a number of ways; the simplest method might be to add to each page some simple JavaScript code whose only purpose is to add an artificial delay of several seconds before displaying the content of the page. Let's walk through a very simple example of how you might use this approach.

Assume that you've set up your Web site so that 90% of the traffic is sent to the regular server, while the other 10% is routed to a server identical to the first, but in which the home page has been artificially slowed down by, say, 5 seconds. Run your Web site in this configuration for a few hours, or a few days, until you have enough sessions to make the results statistically significant. (I would set a minimum threshold of 1,000 user sessions before drawing any conclusions.)

After this period of time, using a log file analyzer on both the regular server and the slowed-down server, take a look at what percentage of sessions requested the home page and then proceeded no further (i.e., all the sessions that abandoned after the home page—I call them home-alone pages). If the percentage of home-alone pages is 6% for the regular server and, say, 20% for the slowed-down server, you must conclude that, since everything else was equal, the increased abandonment of 14% had to be caused by impatient users not putting up with an additional delay of 5 seconds on the home page.

This approach does have a downside: it requires some effort, and you will have caused some inconvenience for 10% of your Web site visitors (and may have lost a few of them). But for some Web sites the cost of this type of experimentation will be easily justified if it leads to a better understanding of customer behavior—an understanding which can then be applied to maximize the success of the overall Web site. Frederic Haubrich, the Chief Web and Technical Officer for the Web site Hooked on Phonics™ (www.hop.com), for example, regularly tests new Web site designs and navigation options on a small percentage of users to determine if the new design increases or decreases the percentage of sessions that result in a purchase. New 40KB graphics for a home page might look great, but will the improved aesthetics compensate for the additional loading time?

The investment required for this approach is appropriate for a large, mission-critical Web site, where just 1% abandonment may mean losing big annual revenue; but it may be harder to justify for smaller sites. In such cases, your best bet is to make some educated guesses about the low and high abandonment rates for the various pages. You can be pretty certain, for example, that if a home page takes 30 seconds to load, a lot of people will not put up with it; so you can set the abandonment range with a low (best case) of 20% and a high (worst case) of, say, 50%. You can then run one load test with the best-case numbers and one with the worst-case numbers to get a rough idea of what the range of abandonment might be. Your goal is not to get these percentages exactly right, but to recognize and document your users' expectations and behavior.

This abandonment is not only an interesting result in itself; it also has a major impact on what parts of your Web site will get stressed under real conditions. If you have a very slow home page, for example, most real users will not continue their session and therefore will not put any load on the rest of the Web site. In this case, if you don't realistically simulate home page abandonment, you will apply an improbable and disproportionate load to the rest of the Web site. This improbable load might create an improbable bottleneck, and you might get stuck fixing a virtual problem that might never have occurred naturally—while ignoring the slow home page that is causing massive abandonment.

It's important to realize that even the most primitive abandonment model is a giant leap in realism when compared to the commonly used 60- or 120-second timeouts. And since there's a lot of money involved, there is no doubt that our understanding of Internet user expectations and behavior is going to increase dramatically in the next few years, allowing us to create increasingly accurate load models. (See this article's Sticky-Notes for more information.)
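The home-alone analysis lends itself to a small script. The sketch below is illustrative rather than from the article: the log format is an assumption (each session reduced to the ordered list of page paths it requested, with "/" as the home page), but the comparison mirrors the 6%-versus-20% example above.

```python
# Sketch (assumed data format): each session is the ordered list of
# page paths it requested; "/" is the home page.

def home_alone_rate(sessions):
    """Fraction of sessions that requested the home page and went no further."""
    home_alone = sum(1 for pages in sessions if pages == ["/"])
    return home_alone / len(sessions)

# Synthetic logs mirroring the article's example: 6% of regular-server
# sessions stop at the home page, versus 20% on the slowed-down mirror.
regular = [["/"]] * 3 + [["/", "/quote"]] * 47    # 3 of 50 sessions -> 6%
slowed  = [["/"]] * 10 + [["/", "/quote"]] * 40   # 10 of 50 sessions -> 20%

extra_abandonment = home_alone_rate(slowed) - home_alone_rate(regular)
print(f"{extra_abandonment:.0%} extra abandonment attributable to the delay")
```

Since everything else is equal between the two servers, the 14-point gap in home-alone sessions can be attributed to the artificial 5-second delay, which is precisely the inference the experiment is designed to support.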

3. Over-Averaging Page Response Times

Even if you manage to design, develop, and execute an incredibly realistic load test, you have one last opportunity to mess things up—really mess them up—when you analyze and report the results. As you might expect, load tests generate loads of data, and all this data can be processed, mangled, diced, and sliced in a number of ways by using, misusing, and abusing statistics to produce reports that can look very pretty but fail to shed light on what really matters. (Before you think that I have something against statistics, let me reassure you that I am a big fan of this discipline…at least 62.3% of the time. And even though 73.7% of statistics are usually made up on the spot, the remaining 26.3% are potentially very useful.)

When it comes to reporting load testing results, the greatest opportunity for voluntary, or involuntary, misuse of statistics is related to average page response time (APRT). Typically, the main objective of a load test is to determine the scalability of a Web site, an important question if you don't want to lose potential customers due to performance problems (i.e., slow-loading pages). Unfortunately, by averaging page response times, you run the risk of masking serious performance problems that will impact your users.

Let me show you how, by using a final thought experiment.

Let's assume that you run a test with a load of 10,000 session starts per hour and you get an APRT for the home page of 4 seconds. Since your chart tells you that at less than 5 seconds you will experience negligible abandonment, you are in pretty good shape, right? Well, maybe yes, maybe no—this single piece of data does not tell you much. You could have an APRT of 4 seconds, although at that load level the Web site performance would be unacceptable for a large percentage of users.

How? Let's consider three cases:

1. One way you can get an APRT of 4 seconds is if each home page was returned in approximately 4 seconds.

2. Another way to get an APRT of 4 seconds is if 5,000 of the home pages were returned in approximately 2 seconds and the other 5,000 in approximately 6 seconds.

3. Yet another way of getting an APRT of 4 seconds is if 9,000 users experience a 1-second response time, and 1,000 users experience a 31-second response time.

In the first case, all users should be happy, since they all experienced a response time below 5 seconds. In the second case, the APRT is the same, but half the users are experiencing a 6-second page response time—a time lag that you know will cause some abandonment in a real-world situation. In the last case, even though the APRT is still the same 4 seconds, you have a thousand users with a truly unacceptable response time of 31 seconds, which points to a potentially serious performance problem and massive abandonment. Unfortunately, this danger sign was well hidden by the averaging process.

Another way that APRT distorts otherwise valid load test results is when the response times from different types of pages are carelessly combined into a single APRT. Let's assume that you run a load test and you get an APRT of 4 seconds for all pages. Pretty good, no? Well, possibly not. The home page response time may have been 30 seconds (which can easily happen when Web designers get carried away with fancy graphics), while all the other pages loaded in 2 or 3 seconds. In this case the APRT might look good, but—as you should know by now—most real users would never have gotten past that home page.

Here's a simple way to remember that averages can be very misleading. The next time you hear an average number, remember this: I could put one of your feet in a bucket of icy cold (0 degrees Celsius) water and the other one in a bucket of boiling (100 degrees Celsius) water and tell you that, on average, your feet are in a nice, warm, cozy 50-degree bath.

FIGURE 2 Chart showing average, minimum, and maximum response times (page response time in seconds versus load level in sessions started per hour)

Overcoming the Problem of Averages

So, how do you get around the problems associated with the APRT? Fortunately, there are several ways.

1. First of all, make sure that you report different APRTs for different types of pages.

2. Augment the APRT number with other statistical information such as standard deviation or median; however, first make sure that you take into account the statistical competence of your audience. How many people will be able to really understand what your statistics mean, or how to act on them? (For example, "At a load of 6,000 user sessions per hour, the average APRT for the home page was 4.9 seconds with a standard deviation of 2.3 seconds and a median value of 4.8 seconds.")

3. Show a chart with not only the average values, but the minimum and maximum values as well, as seen in Figure 2.

4. Forego the average altogether and report the percentage of pages returned within a specific, relevant, time limit (e.g., "At a load of 6,000 user sessions per hour, 63% of the product information pages were returned in under 5 seconds.").

5. Present a histogram of the page response time distribution, as illustrated in Figure 3.
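The three cases are easy to reproduce. This sketch is mine, not the article's: it builds the three distributions and reports, for each, the mean alongside two of the suggested alternatives, the percentage of pages returned within 5 seconds and a simple nearest-rank 95th percentile.

```python
from statistics import fmean

# The three ways to get a 4-second APRT from the thought experiment.
cases = {
    "all ~4s":                   [4.0] * 10_000,
    "half 2s, half 6s":          [2.0] * 5_000 + [6.0] * 5_000,
    "9,000 at 1s, 1,000 at 31s": [1.0] * 9_000 + [31.0] * 1_000,
}

def within_limit(times, limit_seconds):
    """Fraction of pages returned within the given time limit."""
    return sum(1 for t in times if t <= limit_seconds) / len(times)

def percentile(times, fraction):
    """Nearest-rank percentile; simple but adequate for reporting."""
    ordered = sorted(times)
    index = min(int(fraction * len(ordered)), len(ordered) - 1)
    return ordered[index]

for label, times in cases.items():
    print(f"{label}: APRT={fmean(times):.1f}s, "
          f"within 5s={within_limit(times, 5):.0%}, "
          f"p95={percentile(times, 0.95):.0f}s")
```

All three cases report an identical 4.0-second APRT, but the within-limit figure (100%, 50%, 90%) and the 95th percentile (4s, 6s, 31s) expose the thousand users stuck at 31 seconds that the average hides.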

FIGURE 3 Histogram of page response times (percentage of pages versus page response time in seconds, at 6,000 sessions/hr)

FIGURE 4 User abandonment rates (completed versus abandoned sessions across load levels, in sessions started per hour)

All of these methods enrich the APRT and will help highlight any performance abnormality. But unfortunately, they still deal with pretty dry metrics to which most people cannot relate. Case in point, this conversation between two people who speak different languages:

QA project leader: Thirty-seven percent of the users experienced a home page response time greater than five seconds.

VP of Sales and Marketing: How interesting. But…so what? How do I use this number? Is it good or bad? What should it be, ideally?

Using the table and model described in the previous section, you can turn those numbers into meaningful information, showing not only performance problems, but the impact they might have on the business. My favorite approach is to make page response time just one of the result metrics, and instead focus on user satisfaction and potential abandonment. The chart in Figure 4 is an example of how you might report the results, showing how changes in performance will impact the user experience and, potentially, your business. This will greatly complement the over-simplified (and potentially misleading) data based on page response time.

Conclusion

Concurrent users, session timeouts, and average page response time are three of the most fundamental concepts in load testing—three concepts that are regularly misunderstood, misused, and misrepresented, leading to potentially misleading load testing results.

When you adopt concurrent users as a load testing input parameter and fail to account for user abandonment, you run the risk of creating loads that are highly unrealistic and improbable. As a result, you may be confronted with bottlenecks that might never occur under real circumstances. Risks abound at the other end of the load testing cycle, too: improper use of simple averages in the analysis phase might easily obscure very serious performance problems.

Some of the solutions we've looked at here are very simple to implement, while others require substantially more work. In the end, it's going to be up to you as a quality assurance professional to determine how realistic and accurate your load tests—and their results—have to be for your particular situation. STQE

Alberto Savoia (alberto.savoia@keynote.com) is Chief Technologist of Keynote's load testing division, and has also served as founder and CTO of Velogic, General Manager of SunTest, and Director of Software Research at Sun Microsystems Laboratories. His sixteen-year career has been focused on applying scientific methodology and rigor to software testing.

Editor's note: This is the second of Alberto Savoia's three Web-related articles for STQE. His first article, Web Load Test Planning, appeared in the March/April 2001 issue. His third article in the series will appear in the July/August 2001 issue.
STQE magazine is produced by STQE Publishing, a division of Software Quality Engineering. www.stqemagazine.com
