Abstract—We present a new method to identify navigation-related Web usability problems based on comparing actual and anticipated usage patterns. The actual usage patterns can be extracted from Web server logs routinely recorded for operational websites by first processing the log data to identify users, user sessions, and user task-oriented transactions, and then applying a usage mining algorithm to discover patterns among actual usage paths. The anticipated usage, including information about both the path and time required for user-oriented tasks, is captured by our ideal user interactive path (IUIP) models constructed by cognitive experts based on their cognition of user behavior. The comparison is performed via the mechanism of a test oracle for checking results and identifying user navigation difficulties. The deviation data produced from this comparison can help us discover usability issues and suggest corrective actions to improve usability. A software tool was developed to automate a significant part of the activities involved. In an experiment on a small service-oriented website, we identified usability problems, which were cross-validated by domain experts, and quantified usability improvement through a higher task success rate and lower time and effort for given tasks after suggested corrections were implemented. This case study provides an initial validation of the applicability and effectiveness of our method.
Index Terms—Cognitive user model, sessionization, software tool, test oracle, usability, usage pattern, Web server log.
I. INTRODUCTION
As the World Wide Web becomes prevalent today, building and ensuring easy-to-use Web systems is becoming
a core competency for business survival [26], [41]. Usability
is defined as the effectiveness, efficiency, and satisfaction with
which specific users can complete specific tasks in a particular environment [5]. Three basic Web design principles, i.e.,
structural firmness, functional convenience, and presentational
delight, were identified to help improve users' online experience
[42]. Structural firmness relates primarily to the characteristics
that influence the website security and performance. Functional
convenience refers to the availability of convenient characteristics, such as a site's ease of use and ease of navigation, that
performance enhancements for Internet Web servers [4]. Because of the vastly uneven Web traffic, massive user population,
and diverse usage environment, coverage-based testing is insufficient to ensure the quality of Web applications [20]. Therefore,
server-side logs have been used to construct Web usage models for usage-based Web testing [20], [39], or to automatically generate test cases to improve test efficiency [34].
Server logs have also been used by organizations to learn
about the usability of their products. For example, search queries
can be extracted from server logs to discover user information
needs for usability task analysis [31]. There are many advantages to using server logs for usability studies. Logs can provide insight into real users performing actual tasks in natural working conditions versus the artificial setting of a lab. Logs also
represent the activities of many users over a long period of time
versus the small sample of users in a short time span in typical
lab testing [37]. Data preparation techniques and algorithms can
be used to process the raw Web server logs, and then mining can
be performed to discover users' visitation patterns for further
usability analysis [14]. For example, organizations can mine
server-side logs to predict users' behavior and context to satisfy users' needs [40]. Users' revisitation patterns can be discovered by mining server logs to develop guidelines for browser history mechanisms that can be used to reduce users' cognitive and physical effort [36].
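To make this preparation step concrete, the following minimal sketch reconstructs sessions from a raw access log. It is an illustration rather than the authors' tool: the Apache combined log format, the 30-minute inactivity timeout, and the use of IP addresses as user proxies are all our assumptions.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Apache combined-format access log; this layout is an assumption.
LOG_RE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+) [^"]*"')
TIMEOUT = timedelta(minutes=30)  # common sessionization heuristic

def sessionize(lines):
    """Group page requests into per-user sessions, splitting a user's
    visit stream wherever the inactivity gap exceeds TIMEOUT."""
    hits = defaultdict(list)  # ip -> [(timestamp, url), ...]
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip malformed lines
        ip, ts, url = m.groups()
        when = datetime.strptime(ts.split()[0], "%d/%b/%Y:%H:%M:%S")
        hits[ip].append((when, url))
    sessions = []
    for ip, visits in sorted(hits.items()):
        visits.sort()
        current = [visits[0]]
        for prev, cur in zip(visits, visits[1:]):
            if cur[0] - prev[0] > TIMEOUT:
                sessions.append((ip, current))
                current = []
            current.append(cur)
        sessions.append((ip, current))
    return sessions
```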
Client-side logs can capture accurate, comprehensive usage
data for usability analysis, because they allow low-level user
interaction events such as keystrokes and mouse movements
to be recorded [18], [25]. For example, using these client-side
data, the evaluator can accurately measure time spent on particular tasks or pages as well as study the use of the back button and user clickstreams [19]. Such data are often used with task-based approaches and models for usability analysis by comparing discrepancies between the designer's anticipation and a user's actual behavior [10], [27]. However, the evaluator must
program the UI, modify Web pages, or use an instrumented
browser with plug-in tools or a special proxy server to collect
such data. Because of privacy concerns, users generally do not want any instrumentation installed on their computers. Therefore, logging actual usage on the client side can best be used in lab-based
experiments with explicit consent of the participants.
B. Cognitive User Models
In recent years, there has been a growing need to incorporate insights from cognitive science about the mechanisms, strengths,
and limits of human perception and cognition to understand the
human factors involved in user interface design [28]. For example, the various constraints on cognition (e.g., system complexity) and the mechanisms and patterns of strategy selection
can help human factor engineers develop solutions and apply
technologies that are better suited to human abilities [30], [40].
Commonly used cognitive models include GOMS, EPIC, and
ACT-R [2], [21], [28]. The GOMS model consists of Goals, Operators, Methods, and Selection rules. As the high-level architecture, GOMS describes behavior and defines interactions as
a static sequence of human actions. As the low-level cognitive
Fig. 1. Architecture of our method.
We propose a new method to identify navigation-related usability problems by comparing Web usage patterns extracted
from server logs against anticipated usage represented in some
cognitive user models (RQ2). Fig. 1 shows the architecture of
our method. It includes three major modules: Usage Pattern Extraction, IUIP Modeling, and Usability Problem Identification.
First, we extract actual navigation paths from server logs and
discover patterns for some typical events. In parallel, we construct IUIP models for the same events. IUIP models are based
on the cognition of user behavior and can represent anticipated
paths for specific user-oriented tasks. The result checking employs the mechanism of a test oracle. An oracle is generally used to determine whether a test has passed or failed [6]. Here, we use IUIP models as the oracle to identify usability issues related to users' actual navigation paths by analyzing the deviations between the two. This method and its three major modules will be described in detail in Sections IV–VI.
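As a concrete illustration of the oracle mechanism, the sketch below compares one observed trail against the anticipated path and per-page benchmark times of an IUIP model. The data shapes and the function name are our assumptions for illustration; the paper describes the tool only at the architectural level.

```python
def oracle_check(trail, expected_path, benchmarks):
    """Check one observed trail (a list of (page, seconds) pairs) against
    the anticipated page sequence and per-page benchmark times of an
    IUIP model. Returns the logical and temporal deviations found."""
    logical, temporal = [], []
    for (page, seconds), expected in zip(trail, expected_path):
        if page != expected:
            logical.append((expected, page))      # unanticipated step
        elif page in benchmarks and seconds > benchmarks[page]:
            temporal.append((page, seconds))      # exceeded benchmark time
    return logical, temporal  # unequal-length trails are truncated by zip
```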
We used the Furniture Giveaway (FG) 2009 website as the
case study to illustrate our method and its application. We also used the server log data of the FG 2010 website,
the next version of FG 2009, to help us validate our method.
All the usability problems in FG 2009 identified by our method were fixed in FG 2010. The functional convenience aspect of
usability for this website is quantified by its task completion
rate and time to complete given tasks. The ability to implement
recommended changes and to track quantifiable usability improvement over iterations is an important reason for us to use
this website to evaluate the applicability and effectiveness of
our method (RQ3).
The FG website was constructed by a charity organization
to provide free furniture to new international students in Dallas. Similar to e-commerce websites, it provided registration,
Based on the aggregated trail tree, further mining can be performed for some interesting pattern discovery. Typically, good
mining results require close interaction with human experts to specify the characteristics that make navigation patterns interesting. In our method, we focus on the paths that are used by a
sufficient number of users to finish a specific task. The paths can
be initially prioritized by their usage frequencies and selected by
using a threshold specified by the experts. Application-domain
knowledge and contextual information, such as criticality of
specific tasks, user privileges, etc., can also be used to identify
interesting patterns. For the FG 2009 website, we extracted
30 trails each for Tasks 1, 2, and 3, and 5 trails for Task 4.
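The frequency-based selection can be sketched as follows, assuming each trail is a sequence of page names and using a support threshold in place of the expert-specified cutoff:

```python
from collections import Counter

def frequent_trails(trails, min_support):
    """Rank navigation trails by usage frequency and keep only those
    followed by at least `min_support` users (an expert-chosen threshold)."""
    counts = Counter(tuple(trail) for trail in trails)
    return [(trail, n) for trail, n in counts.most_common() if n >= min_support]
```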
V. IDEAL USER INTERACTIVE PATH MODEL CONSTRUCTION
Our IUIP models are based on the cognitive models surveyed
in Section II, particularly the ACT-R model. Due to the complexity of ACT-R model development [9] and the low-level rule-based programming language it relies on [12], we constructed
our own cognitive architecture and supporting tool based on the
ideas from ACT-R.
In general, user behavior patterns can be traced as a
sequence of states and transitions [30], [32]. Our IUIP consists
of a number of states and transitions. For a particular goal, a
sequence of related operation rules can be specified for a series
of transitions. Our IUIP model specifies both the path and the
benchmark interactive time (i.e., a maximum allowed time)
for some specific states (pages). The benchmark time can first
be specified based on general rules for common types of Web
pages. For example, human factors guidelines specify the upper
bound for the response time to mitigate the risk that users will
lose interest in a website [22].
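As a minimal sketch, such a model can be represented as an anticipated page sequence plus per-state benchmark times. The concrete pages and times below are illustrative only, not values taken from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class IUIPModel:
    """Anticipated interaction path for one user-oriented task: an ordered
    sequence of states (pages) plus benchmark times for selected states."""
    path: list                                      # anticipated page sequence
    benchmarks: dict = field(default_factory=dict)  # page -> max seconds

# Hypothetical model for a furniture-selection task; times are made up.
first_selection = IUIPModel(
    path=["index.php", "login.php", "show.php?cat=1", "my_selection"],
    benchmarks={"index.php": 15, "show.php?cat=1": 60},
)
```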
Humans usually try to complete their tasks in the most efficient manner by attempting to maximize their returns while
minimizing the cost [28]. Typically, experts and novices will
have different task performance [1]. Novices need to learn task-specific knowledge while performing the task, but experts can
complete the task in the most efficient manner [28]. Based on
this cognitive mechanism, IUIP models need to be constructed separately for novices and experts by cognitive experts, utilizing their domain expertise and knowledge of different users' interactive behavior. For specific situations, we can
adapt the durations by performing iterative tests with different
users [38].
Diagrammatic notation methods and tools are often used to
support interaction modeling and task performance evaluation
[11], [15], [33]. To facilitate IUIP model construction and reuse,
we used C++ and XML to develop our IUIP modeling tool based
on the open-source visual diagram software DIA. DIA allows
users to draw customized diagrams, such as UML, data flow,
and other diagrams. Existing shapes and lines in DIA form part
of the graphic notations in our IUIP models. New ones can
be easily added by writing simple XML files. The operations,
operation rules, and computation rules can be embedded into
the graphic notations with an XML schema we defined to form
our IUIP symbols. Currently, about 20 IUIP symbols have been
created to represent typical Web interactions. IUIP symbols used
Fig. 3. IUIP model for the event First Selection (top) and explanation of the symbols used (bottom).
cally linked to the related pages. In user Trail 7, one user spent
more time on index.php than the benchmark time for the
corresponding state S2. So, one temporal deviation was counted.
Furthermore, two users took more time on the page Selection Rules
than the benchmark time specified for the corresponding state
S4. Therefore, two temporal deviations were counted. Similarly,
we obtained two temporal deviations for the category pages: one
for the page cat=1 and one for the page cat=5.
We can perform the same comparison and calculation for all
the trails we extracted for all the corresponding tasks. Results
were obtained for the FG 2009 website in this way. Tables I and
II show the specific states (pages) in the IUIP models with large (≥5) cumulative logical and temporal deviations, respectively.
The results single out these Web pages and their design for further analysis (to be described in the next section), because such
large deviations may be indications of some usability problems.
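The cumulative counts behind Tables I and II can be obtained by summing per-trail deviations for each state; a sketch, assuming per-trail deviation lists such as those produced by the oracle_check sketch above:

```python
from collections import Counter

def cumulative_deviations(per_trail_deviations, threshold=5):
    """Sum deviations per state (page) across all user trails and report
    the states at or above the reporting threshold of Tables I and II."""
    totals = Counter()
    for deviations in per_trail_deviations:   # one list per user trail
        totals.update(page for page, _ in deviations)
    return {page: n for page, n in totals.items() if n >= threshold}
```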
TABLE I
STATES (PAGES) WITH LARGE LOGICAL DEVIATIONS

Task  States (pages)       Logical deviations
1     index.php            16
      Selection Rules      10
      Category Pages        7
      Register.php(post)    6
2     Category Pages       18
      My Selection         10
      Show.php?cat=2        9
      Show.php?cat=1        7
TABLE II
STATES (PAGES) WITH LARGE TEMPORAL DEVIATIONS

Task  States (pages)        Temporal deviations
1     index.php             27
      register.php          23
      Selection Rules        7
      Category  cat=2       14
                cat=1        7
                cat=5        6
      Details   detail=18    6
                detail=31    5
      My Selections         23
2     Category  cat=5       23
                cat=2        8
      Details   detail=9     7
                detail=69    5
      My Selections          6
By processing the raw log data using our tool, we identified 58 unique users and
81 sessions. Then, we constructed four event models for four
typical tasks. We extracted 95 trails for these tasks. Meanwhile,
a designer with three years of GUI design experience and an expert with five years of experience in human factors practice for the Web constructed four IUIP models for the same tasks based on their cognition of users' interactive behavior.
By checking the extracted usage patterns against the four IUIP
models, we obtained logical and temporal deviations shown in
Tables I and II and identified 17 usability issues or potential
usability problems. Some usability issues were identified by
both logical and temporal deviation analyses. Next, we further
analyze these deviations for usability problem identification and
improvement.
In Table I, 16 deviations took place on the page index.php. The unanticipated follow-up page is the page login.php, followed by the page index.php?f=t (login failure). Further reviewing the index page, we found that the page design is too simplistic: no instruction was provided to help users log in or register. We inferred that some users with limited online shopping experience were trying to use their regular email addresses and passwords to log in to the FG 2009 website. They did not realize that they needed to use their email addresses to register as new users and set up a password. Therefore, numerous login failures occurred. Once this issue was identified, the index page was redesigned to instruct users to log in or register.
We also found some structural design issues. For example, we
observed that some users repeatedly visited the page Selection
Rules. It is likely that when the users were not permitted to
select any furniture in some categories (the FG website limited each user to one piece of furniture per category),
they had to go to the page Selection Rules to find the reasons.
To reduce these redundant operations and improve user experience, the help function for selection rules should be redesigned
to make it more convenient for users to consult.
Usability, unlike other quality attributes such as reliability or capability, typically involves users' perception and experts' subjective judgment. Therefore, in the absence of direct user feedback, validation of the effectiveness of usability practice must be performed by usability experts.
A designer of the FG website with three years of GUI design experience and an expert with five years of experience in human factors practice for the Web were invited to serve as the usability
specialists to review and validate our results. Among the 17
usability problems identified by our method for the FG website
described in Section VII, three problems were combined into one. All 15 problems were confirmed as usability problems by
the usability specialists.
Additionally, it is meaningful to determine the severity of the
problems to assess their impact on the usability of the system.
The four-level severity measure [5] was adopted in our study.
Among 15 identified usability problems, there was one problem
that prevented the completion of a task (severity level 1), ten
problems that created significant delay and frustration (level 2),
three problems that had a minor effect on usability of the system
(level 3), and one problem that pointed to a future enhancement
(level 4). The results indicate that our method can effectively
identify some usability problems, especially those frustrating to
users.
C. Impact on Usability Improvement (RQ3.3)
We used version 2009 and version 2010 of the FG website
to evaluate the impact of our method on usability improvement.
All the usability problems in FG 2009 identified by our method
were fixed in FG 2010. FG 2009 and FG 2010 users are different
incoming classes of international students, but they share many
common characteristics. Therefore, FG 2009 and FG 2010 can
be used to compare usability change and to evaluate usability
improvement due to the introduction and application of our
method.
For well-defined tasks, we can measure the task success rate
and compare it in successive iterations to evaluate usability
improvement. Another way is to examine the amount of effort
required to complete a task, typically by measuring the number
of steps (pages) required to perform the task [41]. Time-on-task
can also be used to measure usability efficiency.
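These measures are straightforward to compute from the extracted trails. A small sketch, which also reproduces Task 1's FG 2009 success rate from Table III as a sanity check:

```python
def task_usability_metrics(successes, attempts, steps_per_trail, minutes_per_trail):
    """Task success rate, average effort (steps), and average time-on-task."""
    return {
        "success_rate": successes / attempts,
        "avg_steps": sum(steps_per_trail) / len(steps_per_trail),
        "avg_minutes": sum(minutes_per_trail) / len(minutes_per_trail),
    }

print(f"{45 / 53:.1%}")  # Task 1, FG 2009: 45/53 -> 84.9% (matches Table III)
```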
Table III shows the usability improvement between FG 2009
and FG 2010 in terms of changes in task success rate, average effort-on-task, and average time-on-task. The average improvement in task success rate is 8.2%. The average number
of steps for each task decreased from 7.7 to 5.8 and the average
time for each task was reduced from 2.5 to 0.8 min. The usability
of the FG website has apparently improved, with a higher task
success rate and reduced effort and time required for the same
tasks.
TABLE III
TASK SUCCESS RATES, AVERAGE EFFORT, AND TIME ON TASKS FOR FG 2009 AND FG 2010

         Task success rate                           Average effort (# of steps)     Average time (minutes)
Task     FG 2009         FG 2010         Improv.     FG 2009  FG 2010  Improv.      FG 2009  FG 2010  Improv.
1        45/53 or 84.9%  37/39 or 94.9%  10.0%       5        3        2            2.9      1.2      1.7
2        38/45 or 84.4%  35/37 or 94.6%  10.2%       9        8        1            2.5      1.4      1.1
3        37/50 or 74.0%  40/48 or 83.3%   9.3%       10       7        3            2.4      1.3      1.1
4        5/13 or 38.5%   9/17 or 52.9%   14.5%       4        3        1            0.5      0.4      0.1
Overall  77.6%           85.8%            8.2%       7.7      5.8      1.9          2.5      0.8      1.7
IX. DISCUSSION
Our method is not intended to and cannot replace traditional
usability practices. First, we need to extract usage patterns from
Web server logs, which can only become available after Web
applications are deployed. Therefore, our method cannot be
directly used to identify usability problems for the initial prototype design, as can be done through traditional heuristic evaluation by experts, nor can it replace usability testing during initial Web application development, before the application is fully operational and made available to users. After the initial deployment
of the Web applications, our method can be used to identify
navigation-related usability issues and improve usability. In addition, it can be used to develop questions or hypotheses for
traditional heuristic evaluation and usability testing for the subsequent updates and improvements in the iterative development
and maintenance processes for Web applications. Therefore, our
method can complement existing usability practices and become
an important part of an integrated strategy for Web usability
assurance.
Traditional usability testing involving actual users requires
significant time and effort [26]. In heuristic evaluation, substantial effort is required for human factors experts to inspect
a large number of Web pages and interface elements. In contrast, our method can be semiautomatically and independently
performed with the tools and models we developed. The total
cost includes three parts: 1) model construction (preparation);
2) test oracle implementation to identify usability problems; and
3) follow-up inspection. Although usability experts are needed to construct event and IUIP models in our method, it is a one-time effort that cumulatively injects the experts' domain knowledge
and cognition. The automated tasks in our method include log data processing, trail tree construction, trail extraction, and the comparison between IUIP models and user trails to calculate logical and temporal deviations. This type of method offers substantial benefits over the alternative of time-consuming unaided analysis of potentially large amounts of raw data [19]. Human factors experts must manually inspect the output results to identify usability problems. However, they only need to inspect and
confirm the usability problems identified by our method. For
subsequent iterations, our method would be even more cost-effective because of 1) reuse of the IUIP models and event models, possibly with minor adjustment, 2) automated tool support
for a significant part of the activities involved, and 3) limited
scope of follow-up inspections.
There are some issues with server log data, including unique
user identification and caching [8]. Typically, each unique IP
address in a server log may represent one or more unique
users. Pages loaded from client- or proxy-side cache will not
ACKNOWLEDGMENT
The authors would like to thank the anonymous reviewers for
their constructive comments and suggestions.
REFERENCES
[1] A. Agarwal and M. Prabaker, "Building on the usability study: Two explorations on how to better understand an interface," in Human-Computer Interaction. New Trends, J. Jacko, Ed. New York, NY, USA: Springer, 2009, pp. 385–394.
[2] J. R. Anderson, D. Bothell, M. D. Byrne, S. Douglass, C. Lebiere, and Y. Qin, "An integrated theory of the mind," Psychol. Rev., vol. 111, pp. 1036–1060, 2004.
[3] T. Arce, P. E. Roman, J. D. Velasquez, and V. Parada, "Identifying web sessions with simulated annealing," Expert Syst. Appl., vol. 41, no. 4, pp. 1593–1600, 2014.
[4] M. F. Arlitt and C. L. Williamson, "Internet Web servers: Workload characterization and performance implications," IEEE/ACM Trans. Netw., vol. 5, no. 5, pp. 631–645, Oct. 1997.
[5] C. M. Barnum and S. Dragga, Usability Testing and Research. White Plains, NY, USA: Longman, Oct. 2001.
[6] B. Beizer, Software Testing Techniques. Boston, MA, USA: Int. Thomson Comput. Press, 1990.
[7] J. L. Belden, R. Grayson, and J. Barnes, "Defining and testing EMR usability: Principles and proposed methods of EMR usability evaluation and rating," Healthcare Information and Management Systems Society, Chicago, IL, USA, Tech. Rep., 2009. [Online]. Available: http://www.himss.org/ASP/ContentRedirector.asp?ContentID=71733
[8] M. C. Burton and J. B. Walther, "The value of Web log data in use-based design and testing," J. Comput.-Mediated Commun., vol. 6, no. 3, 2001.
[9] M. D. Byrne, "ACT-R/PM and menu selection: Applying a cognitive architecture to HCI," Int. J. Human-Comput. Stud., vol. 55, no. 1, pp. 41–84, 2001.
[10] T. Carta, F. Paternò, and V. F. D. Santana, "Web usability probe: A tool for supporting remote usability evaluation of web sites," in Human-Computer Interaction – INTERACT 2011. New York, NY, USA: Springer, 2011, pp. 349–357.
[11] G. Christou, F. E. Ritter, and R. J. Jacob, "CODEIN: A new notation for GOMS to handle evaluations of reality-based interaction style interfaces," Int. J. Human-Comput. Interaction, vol. 28, no. 3, pp. 189–201, 2012.
[12] M. A. Cohen, F. E. Ritter, and S. R. Haynes, "Applying software engineering to agent development," AI Mag., vol. 31, no. 2, pp. 25–44, 2010.
[13] J. Conallen, Building Web Applications with UML. Reading, MA, USA: Addison-Wesley, 2003.
[14] R. Cooley, B. Mobasher, and J. Srivastava, "Data preparation for mining World Wide Web browsing patterns," Knowl. Inf. Syst., vol. 1, no. 1, pp. 5–32, 1999.
[15] O. L. Georgeon, A. Mille, T. Bellet, B. Mathern, and F. E. Ritter, "Supporting activity modelling from activity traces," Expert Syst., vol. 29, no. 3, pp. 261–275, 2012.
[16] S. R. Haynes, M. A. Cohen, and F. E. Ritter, "Designs for explaining intelligent agents," Int. J. Human-Comput. Stud., vol. 67, no. 1, pp. 90–110, Jan. 2009.
[17] M. Heinath, J. Dzaack, and A. Wiesner, "Simplifying the development and the analysis of cognitive models," in Proc. Eur. Cognitive Sci. Conf., Delphi, Greece, 2007, pp. 446–451.
[18] D. M. Hilbert and D. F. Redmiles, "Extracting usability information from user interface events," ACM Comput. Surveys, vol. 32, no. 4, pp. 384–421, 2000.
[19] M. Y. Ivory and M. A. Hearst, "The state of the art in automating usability evaluation of user interfaces," ACM Comput. Surveys, vol. 33, no. 4, pp. 470–516, 2001.
[20] C. Kallepalli and J. Tian, "Measuring and modeling usage and reliability for statistical Web testing," IEEE Trans. Softw. Eng., vol. 27, no. 11, pp. 1023–1036, Nov. 2001.
[21] D. E. Kieras and D. E. Meyer, "An overview of the EPIC architecture for cognition and performance with application to human-computer interaction," Human-Comput. Interaction, vol. 12, no. 4, pp. 391–438, 1997.
[22] S. J. Koyani, R. W. Bailey, and J. R. Nall, Research-Based Web Design and Usability Guidelines. Washington, DC, USA: U.S. Dept. Health Human Services, 2004.
[23] B. Liu, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. New York, NY, USA: Springer, 2007.
Ruili Geng (M'12) received the B.S. degree in computer application from the University of Jinan, Jinan,
China, in 1998, and the M.S. degree in computer science from Beijing Technology and Business University, Beijing, China, in 2004. She is currently working toward the Ph.D. degree in computer science with
Southern Methodist University, Dallas, TX, USA.
She has worked as a Software Engineer, Test Engineer, and Project Manager for several IT companies,
including Microsoft China, between 1998 and 2008.
Her research interests include software quality, usability, reliability, software measurement, and usage mining.