Library Declaration Form

SAPERE-AUDE

University of Otago Library

Author's full name and year of birth (for cataloguing purposes): Steven Ross Smithies, 1976
Title of thesis: Freehand Formula Entry System
Degree: Master of Science
Department: Computer Science
Permanent Address: 4B Fairbanks Place, Glendene, Auckland 1008
I agree that this thesis may be consulted for research and study purposes and that
reasonable quotation may be made from it, provided that proper acknowledgement of
its use is made.
I consent to this thesis being copied in part or in whole for
i) a library
ii) an individual
at the discretion of the University of Otago.
Signature:
Date:

Freehand Formula Entry System


Steve Smithies

a thesis submitted for the degree of

Master of Science

at the University of Otago, Dunedin,


New Zealand.
May 24, 1999

Abstract

Current equation editing systems rely either on text-based equation description
languages or on interactive construction by means of selecting and filling in templates.
These systems are often tedious to use, even for experts, because the user is forced to
determine the structure of the formula before entry.
This thesis describes a system that enables the freehand entry and editing of formulae
using a pen and tablet. The raw input strokes are passed through a new algorithm for
automatically segmenting them into symbols. It uses a character recogniser to evaluate
different possible groupings. The user interface has tools for easy correction of the
inevitable errors occurring in the grouping and recognition stage. A pop-up menu offers
character recognition alternatives while stroke segmentation errors are corrected by
drawing a temporary line through the strokes that belong to a single character. The
recognised symbols can be passed through a graph rewriting formula parser, producing a
linear command string representation of the formula.
A user test was designed and conducted to evaluate the effectiveness of the user
interface and the effectiveness of a graph rewriting parser for processing handwritten
input. It was found that a usable pen-based formula entry system can be built and, more
importantly, is preferable to use over existing template or command-string based
systems, although the character recognition and formula processing modules in the
system need improvement before it would be of commercial value.

Acknowledgements

I wish to thank my supervisor, Dr. Kevin Novins, for all the guidance, ideas,
motivation and help he gave me while I was working on my Masters and writing this
thesis. Second, thanks to everyone in the Graphics Lab for the fun, ideas and help that
have gone into this system over the past year. Parts of Chapter 4 of this thesis were
originally part of a paper submitted to Graphics Interface '99, so I would like to say
thanks to my coauthors on that paper, Kevin Novins and Jim Arvo, for the work they did
helping write the paper and, as a result, parts of Chapter 4. Last, but not least, I
would like to thank my family and all my friends for providing the essential
friendship, support and encouragement I needed over the year.

Contents

1 Introduction
  1.1 Background
2 Literature Review
  2.1 Existing Formula Entry Methods
    2.1.1 Command Line Interfaces
    2.1.2 Template-style Editors
    2.1.3 Graphical Online Pen Entry Systems
  2.2 Issues In Formula Recognition
    2.2.1 Input
    2.2.2 Noise Versus Small Symbols
    2.2.3 Symbol Segmentation and Recognition
    2.2.4 Ambiguous Symbols
    2.2.5 Identifying Significant Spatial Relationships
    2.2.6 Ambiguity of Symbol Placement
    2.2.7 Little Redundancy
    2.2.8 Connected and Overlapping Symbols
    2.2.9 Ambiguity in the Formula
    2.2.10 Post-processing Error Correction Rules
  2.3 Formula Parsers
    2.3.1 Modified Grammars
    2.3.2 Box Languages
    2.3.3 Projection Profile Cutting
    2.3.4 Procedurally Coded Math Syntax
    2.3.5 Stochastic Grammars
    2.3.6 Graph Rewriting
    2.3.7 Data Driven and Knowledge Driven Modules
  2.4 Summary
3 The Formula Processor
  3.1 Formula Processor Details
    3.1.1 Bounding Regions
    3.1.2 Input to the Formula Processor
    3.1.3 Building the Initial Graph
    3.1.4 Building the Arcs
    3.1.5 Initial Graph Preprocessing
    3.1.6 Main Processing
    3.1.7 Parser Implementation
  3.2 Formula Straightening
  3.3 Summary
4 The Interface
  4.1 Aspects of User Interface Design
  4.2 Pen Based Computing
  4.3 A Pen Based Formula Entry System
    4.3.1 The Character Recogniser
    4.3.2 Basic Input
    4.3.3 Stroke Segmentation
    4.3.4 Online Annotation
    4.3.5 Stroke Regrouping
    4.3.6 Modify Characters
    4.3.7 Parsing and Preview
    4.3.8 Correcting Equation Parsing Errors
5 User Testing
  5.1 Designing the Test
  5.2 Choosing Participants
  5.3 Ethical Considerations
  5.4 The Test Itself
  5.5 Post-test Analysis
  5.6 Usability Inspection
6 Evaluation
  6.1 User Testing
    6.1.1 Working Styles
    6.1.2 Time Spent Entering and Correcting Formulae
    6.1.3 Parsing
    6.1.4 Comparative Timing Results
    6.1.5 Error Rates
    6.1.6 Evaluation of User Interface Features
    6.1.7 Evaluation of the User Testing
  6.2 Usability Inspection
    6.2.1 Visibility of System Status
    6.2.2 Match Between the System and the Real World
    6.2.3 User Control and Freedom
    6.2.4 Consistency and Standards
    6.2.5 Recognition Rather Than Recall
    6.2.6 Flexibility and Efficiency of Use
    6.2.7 Aesthetic and Minimalist Design
    6.2.8 Help Users Recognise, Diagnose, and Recover From Errors
    6.2.9 Error Prevention
    6.2.10 Help and Documentation
    6.2.11 Overall Degree of Usability
  6.3 The Overall System
    6.3.1 Positive
    6.3.2 Negative
    6.3.3 Overall
7 Future Work
  7.1 The Formula Parser
  7.2 Keyboard Input
  7.3 Magic Hot-Spots
  7.4 Indication of Areas
  7.5 Training of the Character Recogniser
  7.6 Squiggle Select for Other Selecting Operations
  7.7 Morphing of Symbols
8 Conclusion
References
A Accompanying CD-ROM
  A.1 Readme Text File
  A.2 The Thesis
  A.3 Quicktime Movie
  A.4 Tar File
  A.5 Graphics Interface '99 Paper
B Ethical Statement
C Participant Consent Form
D Anonymous Questionnaire
E Oral Questionnaire
F Anonymous Responses
G Oral Responses
H Raw Data
  H.1 Error Rates
  H.2 Parsing Attempts
  H.3 Drawing and Correction Times

List of Figures

2.1 Screenshot of Microsoft's Equation Editor.
2.2 Without the context of the surrounding symbols, the identity of a symbol sometimes
    cannot be determined. In (a), it is not possible to determine whether it is an o
    (the letter o) or 0 (the digit zero). In (b) it is a 0, yet in (c) it is an o.
2.3 The relative location of symbols can be ambiguous. (a) represents a times x, and
    (c) represents a to the power of x, but what about (b)?
2.4 Overlapping symbols.
2.5 A box language summation template.
2.6 The y-centres of these symbols line up, although their bounding boxes do not.
2.7 Building a projection profile.
3.1 Sample rules from a graph grammar.
3.2 Construction of bounding regions.
3.3 These bounding boxes overlap, although the symbols do not.
3.4 Using centre points instead of bounding boxes for geometric tests. While the 1 is
    outside the fraction bar's bounding box, the 3 is not.
3.5 A simple formula.
3.6 Preprocessing and parsing of a simple formula.
3.7 Regions used by the geometric check.
3.8 Is "A" to the top-right of "B"? Yes.
3.9 As the length of the exponent grows, it moves from the "top-right" to the "right"
    region.
3.10 The shape of the "right" region that Lavirotte and Pottier use.
3.11 No arc is built between symbols that have other symbols between them.
3.12 The "nothing inside" test. If a grammar rule collapsed the to a single node, the
    centre point of the 3 would end up inside its new bounding region. Because of this,
    the application of the rule is not permitted.
3.13 A problem with the no-inside restriction. Collapsing the integral is forbidden due
    to the fraction bar ending up inside it.
3.14 A sloped or skewed formula can be difficult to process reliably.
4.1 A suggested undo gesture.
4.2 A formula that has been entered into, and parsed by, the system.
4.3 Delays between strokes for writing the alphabet.
4.4 All groupings for four units.
4.5 A user beginning to enter a formula. The first three characters have been
    recognised, and the remaining two are still waiting to be recognised.
4.6 Modifying stroke groupings.
4.7 Correcting a misrecognised character.
4.8 A formula being processed.
4.9 The LaTeX preview window.
4.10 The display for a formula that the system was unable to parse.
5.1 The proportion of problems with a user interface found as the number of evaluators
    is increased.
6.1 Times for people entering Formula 1.
6.2 Times for people entering Formula 2.
6.3 Times for people entering Formula 3.
6.4 Times for people entering Formula 4.
6.5 Times for people entering Formula 5.
6.6 Times for people entering all the formulae.
6.7 Time spent by users entering and correcting formulae.
6.8 Part-way through parsing a formula, with an erroneous decision having been made by
    the parser.
6.9 A misinterpreted formula.
6.10 Misrecognition and misgrouping rates.
6.11 Parsing attempts for each formula.
6.12 The overlapping bounding boxes of the square-root symbol and the z are
    indistinguishable, making it hard to tell at a glance if they are correctly grouped.

Chapter 1
Introduction

1.1 Background

Mathematical formulae have a two-dimensional nature. A lot of information is conveyed
through the relative positions of symbols in a formula: the operation that is being
applied to the symbols, or what the arguments to a function are. The original systems
for doing mathematics on a computer were developed in the times of character-only
interfaces (Littin, 1995). As a result, formulae were entered as a linear text string of
"commands". Graphical entry was out of the question as the equipment required was
either unavailable or prohibitively expensive.
A linear command string can represent all the same formulae as a traditional
two-dimensional notation. Turning a two-dimensional formula into a linear representation
does not incur a loss of information, but it is necessary to know the overall structure
before it can be linearised. In effect, the user has to do a mental "parse" of the formula
to determine its structure before starting to enter it.
Even with the proliferation of graphics-capable displays and pointing devices such as
mice or pens and tablets, applications and systems are still produced with the option
of, or even restricted to, the use of keyboard entry for formulae.
Entering a formula as a command string is not very difficult if you have the formula
in front of you and are just transcribing it, assuming you have a knowledge of the
command language you are using. It is only when you are trying to write a formula
from a "mental image", or want to manipulate it afterwards, that it becomes difficult.
For example, LaTeX (Lamport, 1994) is a non-WYSIWYG system for creating and
typesetting documents, using text commands for all layout and description of the
structure of a document. The entry of formulae is also through a command language.
The formula:

\[ \int_{0}^{4} x^{2}\,dx \]

is entered into LaTeX as \int^{4}_{0}{x^2}dx. While this is not complex, the structure
of a formula becomes a lot harder to visualise and manipulate as the length of its
command string grows. For example, the formula:

\[ f(a) = \frac{1}{2\pi i} \int_{0}^{2\pi} \frac{f(e^{i\theta})}{T^{-1}(e^{i\theta})} \frac{d}{d\theta}\left(T^{-1}(e^{i\theta})\right) d\theta \]

is entered as:

f(a) = \frac{1}{2\pi i} \int^{2\pi}_{0}{
\frac{f(e^{i\theta})}{T^{-1}(e^{i\theta})} \frac{d}{d\theta}
(T^{-1}(e^{i\theta})) d\theta}

By looking at the command string for this formula, it is difficult to gain an idea
of its overall structure. Furthermore, changing the layout of the components in the
formula, or performing other major editing operations, is very difficult.
The people who are most likely to benefit from the ability to carry out mathematics
on a computer are also the ones who are least likely to be interested in typing in complex
or unnatural notations for formulae. Mathematicians would have spent many years of
their lives using pencil and paper, a method which does allow 2D input and display.
In spite of the enormous potential offered by computers for mathematical computation
and manipulation, a simple means of entry still does not exist.
While most computer algebra systems have graphical formula entry front-ends
available, for example Mathematica (Wolfram, 1996), sometimes this front-end is
unavailable for the particular package of interest, or difficult to use. Typically these are
template based editors, where the user chooses a template for the operation they
desire, and fills in the boxes provided by the user interface. While template systems are
a step towards the goal of being able to enter formulae while keeping their
two-dimensional layout, they are still not as fluid to use as a pencil and paper and still
require a mental pre-parse of formulae to determine their structure before entry.
An ideal computer-based mathematics system would be one that had the ease of use
of pencil and paper but with the power of a computer behind it. A user would be able
to enter their formula as they would write it on a piece of paper: drawing with a pen
on a tablet and having their strokes appear on the screen. Once the user had drawn
their formula, they would then be able to freely manipulate, add to and change the
formula as they desired. Once the formula was entered to their satisfaction, the system
could then insert it into their word processor, LaTeX document, or other application.
Should the formula entry system be part of a WYSIWYG system, a typeset version of
the formula could be displayed. If it was part of a computer algebra system, they could
perform operations on it such as expansion, simplification, or evaluation.
This thesis describes a new system that is a step towards the ideal system described
above: the Freehand Formula Entry System (FFES). It allows the freehand entry and
editing of formulae using a pen and tablet. Automatic handwritten formula recognition
is then used to generate a LaTeX command string for the formula.
The system consists of a number of independent modules:
• A handwriting recogniser that recognises the strokes input by the user to
  determine the symbols written.
• A formula processor which takes the list of symbols and their positions, and
  returns the formula they represent.
• A user interface which provides transparent interaction with the handwriting
  recogniser and formula processor. The user interface provides easy entering and
  manipulation of formulae, along with the means to correct any errors made by
  the other components in the system.
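As a rough illustration of the data that passes between these modules, the following Python sketch packages a group of pen strokes and a recogniser label into a symbol with a position, which is the kind of input the formula processor is described as taking. The names `Stroke`, `Symbol`, `bounding_box` and `make_symbol` are hypothetical and chosen for this example only; they are not taken from the FFES implementation, and the use of a bounding box as the symbol's position is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Stroke:
    """One pen-down to pen-up trace, as a list of sampled tablet points."""
    points: List[Point]


@dataclass
class Symbol:
    """A recognised symbol together with the bounding box of its strokes."""
    label: str
    box: Box


def bounding_box(strokes: List[Stroke]) -> Box:
    """Smallest axis-aligned box containing every point of every stroke."""
    xs = [x for stroke in strokes for x, _ in stroke.points]
    ys = [y for stroke in strokes for _, y in stroke.points]
    return (min(xs), min(ys), max(xs), max(ys))


def make_symbol(label: str, strokes: List[Stroke]) -> Symbol:
    """Combine a recogniser label (assumed given) with the strokes' position."""
    return Symbol(label, bounding_box(strokes))


# An "x" drawn as two crossing strokes.
x_strokes = [Stroke([(0.0, 0.0), (2.0, 3.0)]), Stroke([(0.0, 3.0), (2.0, 0.0)])]
symbol = make_symbol("x", x_strokes)
print(symbol)
```

Keeping the symbol list independent of how the strokes were recognised is one way the modules could stay separable, as the text describes.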
The handwriting recognition module was developed as part of a larger system
(Smithies, Novins and Arvo, 1999), and the formula processor is an implementation of
published techniques (Lavirotte and Pottier, 1997). The synthesis and evaluation of
these modules and the user interface are the main contribution of this thesis. The user
interface includes a new method for automatically determining the correct grouping of
the user's strokes into symbols and a number of new user interface tools. The tools
enable the user to easily correct the inevitable mistakes made by the character recogniser
and automatic stroke grouping process.
This thesis also reports on user testing carried out on this system. It was found that
a pen-based formula entry system is easier and more comfortable to use than existing
formula entry systems, both for the initial entry and subsequent editing of formulae.
Chapter 2 reviews existing formula processors and formula entry systems. Each
formula processor is analysed and its suitability for handwritten formula interpretation
is discussed. Chapter 3 describes in detail the implementation of the formula processor,
a subset of the system described by Lavirotte and Pottier (1997).
Chapter 4 describes the new user interface designed and implemented for the
freehand entry and editing of formulae, including the new techniques developed for
automatically grouping a user's strokes, and correcting errors that arise as the user enters
their formula.
Chapter 5 describes user testing and how the user testing of this system was
conducted. Chapter 5 also discusses usability inspections. Chapter 6 is an evaluation
of the system, based on results of the user testing and an informal usability study.
Chapter 7 discusses directions that further development of the system could take, in
response to issues and ideas raised in Chapter 6.
The final chapter, Chapter 8, summarises the findings and contributions of this
thesis, and suggests some future avenues of research.

Chapter 2
Literature Review
Since the late 1960s a lot of research has been done in the field of parsing mathematical
formulae, for a range of purposes from the recognition of a person's scribings with a
pen and tablet, through to the automatic interpretation and efficient storage of printed
tables of integrals that have been scanned from books. Whatever the purpose, there
are a number of approaches that can be used. Some are more suited to typeset input,
while others are more suited to handwritten input.
The main difference between handwritten and typeset input is that typeset input
has, at least for a single source, a much more rigorous and predictable nature with
regard to its layout and appearance. Handwriting, on the other hand, exhibits a great
deal of variability, even for a single author. There are unpredictable variations in the
size of symbols, layout, and notational conventions used, which even vary within a
single formula. When dealing with handwritten input, it is important to be able to
interpret formulae reliably, in spite of minor disturbances in the positions and sizes of
symbols.
A good source of general information on formula recognition is "General Diagram
Recognition Methodologies" by Blostein (1996), and Chapter 22 of "Handbook of
Character Recognition and Document Image Analysis", by Blostein and Grbavec (1996).
This chapter will discuss existing formula entry methods: command line interfaces,
template-style editors, and graphical pen and tablet entry systems. For each of these,
their strengths and weaknesses are discussed. This chapter then discusses issues in
formula recognition that apply to both handwritten and typeset input. Existing formula
parsing systems are then presented, with attention paid to their ability to process
handwritten input.
2.1 Existing Formula Entry Methods

This section discusses existing methods for entering formulae into a computer. Take
the following formula as an example:

\[ \int_{10}^{20} \frac{4x^{3}}{\ln x}\,dx \]

Either the formula can be turned into some linear form, and then typed at the keyboard,
or some system could be used where the 2D nature of the formula is preserved.
The paper by Kajler and Soiffer (1998) gives a good overview of the techniques
and considerations involved in making interfaces for computer algebra systems. They
predominantly discuss window based, template style entry systems, though they do
have a section on alternative input methods, such as pen and tablet or voice.

2.1.1 Command Line Interfaces

Command string based equation entry is fast and powerful for a user who can visualise
the formula and is familiar with a system's commands. Command string input is quick
as it depends upon typing skills; advanced users will be able to type faster than they
can write (Brown, 1988). If a user is able to determine the correct command string for
their formula, it does not take them long to type it in.
For a novice user, command string based entry can be frustrating to use while they
are overcoming the initial learning curve. A significant amount needs to be learnt
before they can enter complex formulae. There is a lot of information that has to be
remembered, such as syntax and command names.
In spite of these disadvantages, the use of command string based equation entry is
still popular.
LaTeX

LaTeX (Lamport, 1994) is a software system for typesetting documents. It is in common
use in many disciplines, including Computer Science, Mathematics and Physics, for
publication quality typesetting of documents and mathematical formulae.
Formulae are expressed in LaTeX as textual command strings. As with all command
string based systems, the learning curve is steep and many different commands must
be learnt to describe the expression and its layout. Even long time users of LaTeX can
find it frustrating as they still have to occasionally refer to books to find out how to
achieve the result they desire.
The example formula given at the beginning of this section is entered into LaTeX
as:

\int^{20}_{10}{\frac{4x^3}{\ln{x}}}dx

For longer and more complex formulae, the nesting and balancing of braces can
become difficult. It also becomes difficult to visualise the formula from its command
string. For example, the formula already presented in Section 1.1:

\[ f(a) = \frac{1}{2\pi i} \int_{0}^{2\pi} \frac{f(e^{i\theta})}{T^{-1}(e^{i\theta})} \frac{d}{d\theta}\left(T^{-1}(e^{i\theta})\right) d\theta \]

is entered into LaTeX as:

f(a) = \frac{1}{2\pi i} \int^{2\pi}_{0}{
\frac{f(e^{i\theta})}{T^{-1}(e^{i\theta})} \frac{d}{d\theta}
(T^{-1}(e^{i\theta})) d\theta}
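
To make the brace-balancing burden concrete, the following Python sketch counts how deeply the curly braces in a command string nest and whether they balance. It is only an illustration of the bookkeeping a user does mentally; the function name `brace_depths` is an assumption for this example and it is not part of LaTeX or of the system described in this thesis.

```python
def brace_depths(command: str):
    """Return (max_depth, balanced) for the curly braces in a command string."""
    depth = 0
    max_depth = 0
    i = 0
    while i < len(command):
        ch = command[i]
        if ch == "\\" and i + 1 < len(command):
            i += 2  # skip the character after a backslash, e.g. \{ or \}
            continue
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return max_depth, False  # a closing brace with no opener
        i += 1
    return max_depth, depth == 0


short = r"\int^{4}_{0}{x^2}dx"
long = (r"f(a) = \frac{1}{2\pi i} \int^{2\pi}_{0}{ "
        r"\frac{f(e^{i\theta})}{T^{-1}(e^{i\theta})} \frac{d}{d\theta} "
        r"(T^{-1}(e^{i\theta})) d\theta}")

print(brace_depths(short))
print(brace_depths(long))
print(brace_depths(long[:-1]))  # final brace removed
```

The short formula never nests braces more than one level deep, while the longer one reaches three levels; dropping its final brace leaves it unbalanced, which is the kind of slip that becomes easy to make as the command string grows.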

Mathematica

Mathematica (Wolfram, 1996), produced by Wolfram Research, is a powerful system
for carrying out a wide variety of mathematics. Formulae can be entered in two ways,
either as a command string, or through a template based equation editor. The template
based editor is discussed later in Section 2.1.2.
Mathematica's command string based entry shares the same problem as LaTeX: the
user has to know the correct "command" for each mathematical operation and symbol.
As Mathematica offers an interactive session, online help is available which helps to
alleviate this problem.
The example formula is entered for evaluation into Mathematica with the command
string:

Integrate[(4x^3/Log[x]),{x,10,20}]

The language used by Mathematica is simpler and more consistent than that used
by LaTeX. Functions have their names fully spelt out, their arguments are contained
within a single pair of square brackets and each argument is separated by a comma.
LaTeX often uses shortened command names, has pairs of curly braces around each
argument and the number of these pairs varies depending on the operation. The variety
of brackets used in Mathematica's expressions tends to make them more readable and
easier to understand. This difference is probably due to the fact that LaTeX's primary
purpose is typesetting, while Mathematica is a mathematics package.

LISP-like Prefix Notation

In a LISP-like prefix notation, all components are entered as operator, argument 1,
argument 2, ..., argument n. If an argument of an operation is another operation, it
is enclosed in parentheses. For example, in a LISP-like prefix notation, the example
formula from above is:

(Integrate (/ (* 4 (^ x 3)) (Log x)) x 10 20)

LISP-like prefix notation is more difficult for a user to enter, but better for computer
processing. It also provides a good intermediate representation of formulae, due to its
consistency and regular structure.
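
The regular structure of prefix notation can be illustrated with a small Python sketch that stores the example formula as a nested tuple and prints it both as a LISP-like string and as a LaTeX command string. The tree encodes 4x^3 as the product of 4 and x^3. The tuple layout and the function names `to_prefix` and `to_latex` are assumptions made for this example, and only the handful of operators needed for the example formula are handled.

```python
# Expression tree as nested tuples: (operator, operand, ...); leaves are strings.
formula = ("Integrate",
           ("/", ("*", "4", ("^", "x", "3")), ("Log", "x")),
           "x", "10", "20")


def to_prefix(node) -> str:
    """Write the tree in LISP-like prefix notation."""
    if isinstance(node, str):
        return node
    return "(" + " ".join(to_prefix(child) for child in node) + ")"


def to_latex(node) -> str:
    """Write the tree as a LaTeX command string (example operators only)."""
    if isinstance(node, str):
        return node
    op, *args = node
    a = [to_latex(arg) for arg in args]
    if op == "Integrate":
        body, var, lo, hi = a
        return f"\\int^{{{hi}}}_{{{lo}}}{{{body}}}d{var}"
    if op == "/":
        return f"\\frac{{{a[0]}}}{{{a[1]}}}"
    if op == "*":
        return "".join(a)  # implicit multiplication by juxtaposition
    if op == "^":
        exponent = a[1] if len(a[1]) == 1 else "{" + a[1] + "}"
        return f"{a[0]}^{exponent}"
    if op == "Log":
        return f"\\ln{{{a[0]}}}"
    raise ValueError(f"unknown operator: {op}")


print(to_prefix(formula))
print(to_latex(formula))
```

Because every node has the same shape, both printers are short recursive functions, which is one way to see why such a notation suits computer processing better than hand entry.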
2.1.2 Template-style Editors

Template-style equation editors require the user to select, either from a toolbar or
menu, templates for the mathematical constructs that they wish to use.
All template editors have a similar style of use. Basic operations such as addition
and subtraction can be entered from the keyboard by pressing the appropriate key. For
operations which are not on the keyboard or are two-dimensional in nature, such as
exponentiation, integration, square-root, summation and fractions, the user typically
uses the mouse to select from a toolbar the template for the operator they want. The
operator then appears on the screen, and the user positions the cursor in boxes, using
the TAB key, arrows on the keyboard, or by clicking in the boxes with the mouse, and
then fills in the boxes. The boxes are initially placeholders for, and finally contain, the
operands for the operators.
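
The template-and-box style of entry described above can be sketched in Python as follows. Each template holds named boxes that start empty and are later filled with text or with another template. The `Template` class, the `[ ]` placeholder marker and the helper names are hypothetical, chosen only for this illustration; they do not describe how any of the editors discussed here are implemented.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Template:
    """An operator template whose named boxes start out empty (None)."""
    pattern: str                                  # str.format pattern, one slot per box
    boxes: Dict[str, object] = field(default_factory=dict)


def fraction() -> Template:
    return Template(r"\frac{{{num}}}{{{den}}}", {"num": None, "den": None})


def log(arg=None) -> Template:
    return Template(r"\ln{{{arg}}}", {"arg": arg})


def render(node) -> str:
    """Produce a command string, marking unfilled boxes with '[ ]'."""
    if node is None:
        return "[ ]"
    if isinstance(node, str):
        return node
    filled = {name: render(child) for name, child in node.boxes.items()}
    return node.pattern.format(**filled)


def empty_boxes(node) -> int:
    """Count the boxes the user still has to fill in."""
    if node is None:
        return 1
    if isinstance(node, str):
        return 0
    return sum(empty_boxes(child) for child in node.boxes.values())


frac = fraction()
frac.boxes["num"] = "4x^3"          # numerator typed in, denominator still empty
partial = render(frac)
remaining = empty_boxes(frac)
frac.boxes["den"] = log("x")        # a template nested inside a box
complete = render(frac)
print(partial, remaining, complete)
```

The sketch also shows why the user must decide on the structure first: the fraction template has to be chosen before either of its operands can be entered.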
Figure 2.1 shows a screenshot of a user part way through entering a formula using
Microsoft's template based equation editor. Along the top of the window is a toolbar
that offers templates for the operations that the system supports. In the main entry
area, you can see a partially entered formula. The box in the lower part of the fraction
template is yet to be filled.

Figure 2.1: Screenshot of Microsoft's Equation Editor.

A number of template based editors exist, typically as part of some larger system.
Some examples include:
• Microsoft's Equation Editor. This usually comes as part of Microsoft Word
  (Microsoft, 1993).
• Mathematica's equation editor for its graphical front-end, produced by Wolfram
  Research (Wolfram, 1996).
• LyX (Ettrich, 1999), a front-end for the LaTeX typesetting system. It provides
  a WYSIWYG interface for the creation and editing of LaTeX documents, and
  includes a template based equation editor.
• INFORM (van Egmond, Heeman and van Vliet, 1989). An interactive
  syntax-directed formulae editor.
• Newton (Hayden and Lamagna, 1998) is a tool designed for teaching
  introductory mathematics. As part of it, there is a formula entry system which is based
  on a template style of entry.

2.1.3 Graphical Online Pen Entry Systems

An example of a pen entry system is that developed by Littin (1995). Littin's system
allows the entry of formulae with a mouse or pen and tablet. Characters are recognised
as they are drawn, using a feature-based character recogniser. Because of a low
processing requirement for his parsing technique, the formula is parsed as the user writes.
The output of the system is a linear format, such as a LISP-like prefix notation.
As explained later in Section 2.3.1, Littin's use of a modified SLR(1) parser puts a
number of restrictions on the system. Users are limited in the order in which symbols
for a particular formula can be entered, though the order is fairly reasonable. Editing
is also limited to the modification or deletion of the most recently entered symbol. As
a result, arbitrary editing of formulae with his system is impossible.
After a character written by a user has been recognised, it is morphed to a
predefined ideal stroke shape. Because the formula is parsed as the user enters it, the
morphing also rearranges the positions of the characters to the correct arrangement
according to the parser's current understanding of them. The rearrangement is done
so that enough room is left around the most recently entered symbols for the user
to still write around them.
Littin claims that moving characters to their correct positions as the person writes
encourages them to write in straight lines. He also notes that morphing characters
to what they have been recognised as, and rearranging the formula in real time,
provides valuable feedback to the user on how the computer's interpretation of their
formula is going.
When compared to other methods of formula entry, well designed pen based systems
have the advantage of being more natural and intuitive to use. A system that lets
users enter formulae as they would write them on a piece of paper offers a style of
interaction that is as close as possible to writing with pencil and paper, yet it also
has the power of a computer available to perform computations on, and manipulations
of, the formulae entered.
The disadvantage of pen based systems is that allowing freehand input of formulae
means that the system must be able to deal with the sloppiness inherent in handwritten
input, and ideally the arbitrary order in which users enter formulae. Littin's system
avoids this by limiting the order in which users can enter symbols for their formulae,
and restricts the editing of formulae to the most recently entered symbol. Ideally a
system would allow a completely arbitrary symbol entry order.
Pens and tablets are also not common hardware for home users, and writing neatly
and quickly with a mouse is very difficult.
2.2 Issues In Formula Recognition

The goal of processing a mathematical formula is to take a list of symbols and their
locations, then return a description of the formula that they represent. Recognising
the symbols themselves is an issue as well.
This section discusses a number of general issues that arise during the processing
of mathematical formulae.
2.2.1 Input
When designing a parser, a supply of input is required. A lot of systems that process
typeset input use input generated by LaTeX, described in Section 1.1, for several reasons.
It is a convenient way to generate input. The process can be easily automated so
that an input string can be passed into a system and, after the formula processing,
the result can easily be compared to check that it was the same as the input string.
The existence of an input string guarantees that there is a "solution" to the parsing
process. If something was known to be generated by LaTeX, then it should be possible
to regenerate the LaTeX for it. Generating input from screenshots gives clean input
data, free from noise and other artifacts that would be introduced as part of a printing
and scanning process.
For handwritten input, data is gathered as the user writes with a pen and tablet. In
contrast to typeset input, it is quite possible that there will be input that, although it
is quite reasonable for the person who wrote it, cannot have LaTeX generated for it, no
matter how good the underlying formula processor is. This may be as a result of LaTeX
not being powerful enough to represent the user's input or, more likely, the formula
processor not being programmed to anticipate a particular user's style for laying out
formulae.
Each individual author will have a fairly consistent style that they use, allowing for
variations in the positions and sizes of symbols. Mathematicians also invent their own
notations to improve the brevity and readability of their formulae. To accommodate
this, an online handwriting based entry tool would ideally be easily extensible by the
user, possibly through some sort of GUI tool.
To simplify the problem of recognising mathematical formulae, the consistency of
input can be improved by restricting oneself to a certain style of mathematical notation.
For example, this can be achieved for typeset input by taking all input from a single
source, such as LaTeX, or an individual publication. Within this single source, the style
(i.e.: fonts, sizes, spacings, etc.) will be relatively consistent.
Both typeset and handwriting based systems have to recognise the characters that
are input to the system. There are issues of segmentation and recognition, then dealing
with errors arising from these steps.
If we are using handwritten input, the fact that the user may give sloppy input,
erroneous input, or an incomplete formula must be faced. While books can have
mistakes as well, the likelihood of a user giving erroneous input is much higher.


2.2.2 Noise Versus Small Symbols

The system that is processing the input data has to be able to distinguish between
noise, dots, commas and symbol annotations, for example: $A'$ and $\dot{x}$. If the
information is noise, then it must be removed from consideration. However, removing
something which is an annotation must be avoided. This is a more significant problem in the
processing of scanned input, as noise is likely to arise from the scanning. Online input
with a pen and tablet is not as susceptible, unless you want to account for a user
accidentally tapping on the tablet with the pen and drawing odd dots and lines. This
does not occur very often, and it is easy enough for users to notice that they have
done so and to undo or delete the mistake themselves.
2.2.3 Symbol Segmentation and Recognition

Whether typeset or handwritten, the input is processed to form a set of symbols, their
positions, and sizes. For typeset input, working from scanned images of pages, there
is a large variety of fonts, sizes and styles. Within a single publication this will be
restricted to a smaller subset. Raw input pixels have to be segmented into individual
symbols and then recognised. The analogous problem for online handwritten input is
determining which strokes belong to which characters, then recognising them.
Some symbols do not have a constant aspect ratio, for example: brackets, $\int$, and
horizontal lines such as the fraction bar. Their size depends on the symbols that they
are associated with. The segmentation process must also be able to find symbols that
are inside others. This allows for the recognition of the square-root operator,
"$\sqrt{\ }$", and any other symbols inside it.
Some systems (Chou, 1989; Miller and Viola, 1998) enable feedback to the character
recogniser from later stages of formula processing, so that the identity of symbols can
be determined based on surrounding symbols. This helps with cases like that illustrated
in Figure 2.2. The symbol shown in Figure 2.2(a) could be either an o (the letter o) or a
0 (the digit zero). It is not until it is viewed in the context of the surrounding symbols
that its identity can be determined. In Figure 2.2(b) it is a 0, in Figure 2.2(c) it is an
o.
For handwritten input there is also a large variety of writing styles. It would be
ideal to have a recogniser sufficiently flexible that it can be trained to work well
with a particular user, but also sufficiently general that multiple users can use it
without additional training.

Figure 2.2: Without the context of the surrounding symbols, the identity of a symbol
sometimes cannot be determined. In (a), it is not possible to determine whether it is
an o (the letter o) or 0 (the digit zero). In (b) it is a 0, yet in (c) it is an o.
2.2.4 Ambiguous Symbols
Some symbols have many possible meanings, and a distinction can only be made by
examining them in the context of surrounding symbols. Examples of unambiguous symbols,
and several ambiguous ones, are listed here.

• "!" is always a postfix factorial operator. Its argument can always be found to
  its immediate left.
• "$\int$" is always an integration operator, but we do not know how many arguments
  it has. There will be an integrand and a differential (the "dx" part), but either
  zero, one or two limits.
• "-" (a horizontal line) can be an infix subtraction operator, a prefix negation
  operator, or a fraction bar. It is also possible that it is part of some other symbol,
  such as =.
• "." (a dot) can be multiplication, a decimal point, part of a symbol, or an
  annotation, e.g.: $3x.y$, $2.71828$, $!$, or $\dot{x}$.
• $a_{ij}$ can mean either the array element $a(i, j)$: the element in the ith row and
  jth column of a 2D array, or $a(i \times j)$: the element of a 1D array whose index
  is the product of i and j.
• The a in "$X^a$" can either be a power or, as some authors use it, an index into an
  array.

Some of the ambiguities listed above are concerned with understanding the
underlying meaning of things: their semantics. Others are to do with syntax. For example,
the last case is to do with semantics. Determining the meaning of $X^a$ is impossible
without the knowledge of what the author intended it to mean. The second example
above, determining the number of limits on an integral, is a syntactical problem. If a
limit is found, its function is unambiguous. The problem is that the number of limits
to look for is indeterminable in advance.
A large amount of reliance is placed on the knowledge and experience of the person
reading mathematical formulae. They are expected to understand the context in which
something is written, and thus interpret things correctly. To build such experience into
an automated system can be difficult.
If the purpose of parsing the formula is to produce LaTeX that generates output that
looks like the user's input, determining the underlying meaning is not as important; we
are only interested in appearance, not meaning. If we are generating input for
mathematical computation packages, such as Mathematica or Matlab, to do calculations with
or operations on the formulae, then it is important to know the underlying meaning of
conventions that the formula's author uses, so that a correct command string can be
produced.
Anderson (1971) and Bernstein (1971) believe that the syntax and semantics of a
formula are different, and say that the parsing stage should only return something which
describes the layout of the formula. Bernstein's view is that if we are intending to pass
the formula onto some later stage that has its own input format, then the problem of
going from the layout description to this input format should be done as a subsequent
stage of processing. This could be done, for example, with a 1D string parser.
An example of a formula represented by a layout description follows. This is taken
from the paper by Fateman, Tokuyasu, Berman and Mitchell (1996).
The formula
$$\int \frac{x^q - 1}{x^p - x^{-p}} \, \frac{dx}{x} = \frac{\pi}{2p} \tan \frac{q\pi}{2p}$$
is represented in a positional notation as:

(hbox
  (vbox integral nil nil)
  (vbox quotient
    (hbox (expbox x q) - 1)
    (hbox (expbox x p) -
          (expbox x (box - p))))
  (vbox quotient
    (hbox d x)
    x)
  =
  (vbox quotient
    pi
    (hbox 2 p))
  Tan
  (vbox quotient
    (hbox q pi)
    (hbox 2 p)))

The hbox and vbox are operators that perform horizontal and vertical concatenation
of symbols and subexpressions, in a similar manner to the concatenation operators that
Martin uses (Martin, 1967), described in Section 2.3.2. For example, a fraction is a
vertical concatenation of the numerator, a horizontal line and the denominator.
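The box operators lend themselves to mechanical processing. As a minimal sketch of what a later processing stage might look like, the following Python fragment walks a layout description and emits a LaTeX string. The nested-tuple encoding, the function name to_latex, the symbol table, and the reading of the two nil slots of the integral as its lower and upper limits are all assumptions made for illustration; none of them is taken from Fateman et al.

```python
# Hypothetical later-stage processor: layout description -> LaTeX string.
# The tuple encoding and all names below are illustrative assumptions.

SYMBOLS = {"integral": r"\int", "pi": r"\pi", "Tan": r"\tan"}


def to_latex(node):
    """Convert a layout node (a string atom or a tagged tuple) to LaTeX."""
    if isinstance(node, str):
        return SYMBOLS.get(node, node)
    tag, *args = node
    if tag in ("hbox", "box"):
        # Horizontal concatenation: children are laid out left to right.
        return " ".join(to_latex(a) for a in args)
    if tag == "expbox":
        base, exponent = args
        return f"{to_latex(base)}^{{{to_latex(exponent)}}}"
    if tag == "vbox" and args[0] == "quotient":
        # Vertical concatenation of numerator, bar and denominator.
        _, numerator, denominator = args
        return rf"\frac{{{to_latex(numerator)}}}{{{to_latex(denominator)}}}"
    if tag == "vbox" and args[0] == "integral":
        # Assumed slot order: lower limit, then upper limit; None means absent.
        _, lower, upper = args
        out = r"\int"
        if lower is not None:
            out += f"_{{{to_latex(lower)}}}"
        if upper is not None:
            out += f"^{{{to_latex(upper)}}}"
        return out
    raise ValueError(f"unknown layout operator: {tag}")


# The right-hand quotient of the example formula, pi over 2p.
print(to_latex(("vbox", "quotient", "pi", ("hbox", "2", "p"))))
```

Under these assumptions the final call prints \frac{\pi}{2 p}. Author-specific conventions, such as treating a superscript as an array index, would be handled in a stage like this one rather than in the layout processor.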
Splitting the formula processor into two parts, with the first stage being a layout
processor returning a description of the layout of the formulae, and the second stage
being a formula processor that takes the layout description and returns the
command-string for the formula, has the advantages that:

• it breaks the system into two distinct, independent, simpler stages.
• the layout processor does not have to take into account the meaning of the
  formula, as it is not the final stage in the process. As a result it decouples the layout
  processor from the formula processor, simplifying its code. All author-dependent
  customisation can be done at the level of the formula processor, independent of
  the layout processor.
• either the layout processor or formula processing unit can then be easily taken
  out and replaced with minimal effort. Each unit in itself is relatively simple
  with respect to a combined function unit, and has very well defined inputs and
  outputs.
It can also be argued that a single combined function unit can provide the same
thing, if it uses a carefully chosen final language. For example, LISP-like notation
essentially describes the layout of a formula. Splitting the process into two parts
means you have to write a parser that will take the positional description and output
a more human-readable version, such as LaTeX or a Mathematica command string. It
also makes it harder for the layout processor to use contextual information in making
choices in ambiguous situations, as the layout processor is now a separate part.

2.2.5 Identifying Significant Spatial Relationships

A lot of information is conveyed by the relative location of symbols in a formula.
Operations such as exponentiation and implicit multiplication use the relative positions
of symbols to indicate the operation intended. This is in contrast to operations that
use an explicit symbol to indicate the operation, such as addition where a "+" will
always appear between its operands.
Integration uses a symbol to indicate the operation, but uses the relative positioning
of symbols to indicate the parameters for the integration. The limits appear near the
top and bottom of the integral symbol, and the integrand between the symbol and the
differential.

Figure 2.3: The relative location of symbols can be ambiguous. (a) represents a times
x, and (c) represents a to the power of x, but what about (b)?
Because so much information is conveyed by the relative positions of symbols it
is important, when interpreting a formula, to correctly identify the intended relative
positioning between symbols. In many cases there is a grey area between alternatives.
Figure 2.3 illustrates this. In the context of a mathematical formula, it is reasonably
"obvious" that Figure 2.3(a) is "a times x" and that Figure 2.3(c) is "a to the power
of x", but what about Figure 2.3(b)?
This is primarily a problem for the processing of handwritten formulae, as there
must be a degree of leniency in the allowable positions in which a user can write symbols,
but it does also apply to the processing of typeset formulae.

Geometric relationships between symbols, or groups of symbols, can be determined
by using:

• global thresholds.
• local thresholds, based on the symbol involved.
• statistical labelling, which determines the probabilities of various possible
  arrangements of the symbols.
• constraints based on symbol identity: some arrangements of a given pair of symbols
  are legal, while others are not.
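As a rough illustration of a local threshold, the sketch below classifies a symbol written to the right of another as a superscript, a subscript, or an inline neighbour. The Box class, the function name relation, the upward-growing y axis and the threshold value of 0.25 are all assumptions chosen for the example; they do not come from any of the systems discussed here.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box; y is assumed to grow upwards."""
    left: float
    bottom: float
    right: float
    top: float

    @property
    def height(self):
        return self.top - self.bottom

    @property
    def y_centre(self):
        return (self.top + self.bottom) / 2.0


def relation(base, other, threshold=0.25):
    """Classify `other`, which is written to the right of `base`.

    The vertical offset between the two centres is scaled by the height of
    `base`, so the threshold is local: it adapts to the size of the symbol.
    """
    offset = (other.y_centre - base.y_centre) / base.height
    if offset > threshold:
        return "superscript"
    if offset < -threshold:
        return "subscript"
    return "inline"


a = Box(0, 0, 10, 10)
print(relation(a, Box(11, 0, 21, 10)))  # same height and baseline as a
print(relation(a, Box(11, 6, 16, 12)))  # small symbol raised above a
```

A hard threshold like this one always gives an answer, even for arrangements in the grey area between the alternatives; statistical labelling would instead report how likely each alternative is.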

Template based equation editors, where the user selects a template for the operator
they want and then fills in the boxes in the template, avoid this problem of having
to determine the correct geometric relationship between symbols. By having the user
select operators from menus and then fill in the appropriately positioned boxes for the
operator's arguments, the user is explicitly specifying the structure of the formula, even
to the extent of what the arguments for each operator are.
2.2.6 Ambiguity of Symbol Placement
Taken in small local contexts, it is not always possible to determine the correct
relationship between symbols. For example, when an i is written slightly below and to the
right of an x, the i could either be a subscript of x, as in $x_i y_j$, or a coincidental
alignment, as in $a^x i$. More examples of ambiguities, even ones that would confuse a
human, are in a paper by Martin (1971).
2.2.7 Little Redundancy
There is very little redundancy in mathematical notation. Because of this, there are
very few cross-checks that can be made to confirm that an interpretation of a formula
is correct. Some operators come in pairs, e.g. left and right brackets, or an integral
sign and differential, but the majority do not.
2.2.8 Connected and Overlapping Symbols
When symbols are connected or overlap, there is the problem of separating them.
This is more of a problem with scanned input, as the input is simply an image. For
handwritten input, there is information available on the timing and order of the strokes
drawn.

Figure 2.4: Overlapping symbols.

Figure 2.4 shows another problem, where the tail of the y overlaps the fraction bar
and the x. This makes it hard to determine the geometric relationship between these
symbols reliably. Anderson (1968) initially represents the position of each symbol with
a rectangular bounding box, the smallest rectangle that contains all of the symbol's
original pixels. He then shrinks these bounding boxes to a single centre point to avoid
problems when considering the relationships between symbols. The position of the
centre point is dependent on the identity of the character. The way Anderson defines
the centre points of bounding boxes is covered in more detail in the discussion of his
equation parser in Section 2.3.2.
2.2.9 Ambiguity in the Formula

Zhao, Sakurai, Sugiura and Torii (1996) discuss "tacit agreements", which fall into two
classes: determinable and indeterminable. The indeterminable agreements include the
examples of ambiguity that Martin talks about in his paper (Martin, 1971). One of
Martin's examples is:

Does $\sum_{i=5}^{10} i + Y$ mean $\sum_{i=5}^{10} (i + Y)$ or $\left(\sum_{i=5}^{10} i\right) + Y$?

Indeterminable agreements require the knowledge and experience of the reader to
resolve. Determinable agreements correspond to rules in the interpretation of formulae,
such as the implicit precedence of operations. For example, readers of the formula
$a + b \times c$, knowing the standard precedence of mathematical operators, understand
that the multiplication operation has precedence over the addition. This means that
they interpret it as $a + (b \times c)$ and not $(a + b) \times c$.
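A determinable agreement such as operator precedence can be written down as a rule. The following sketch uses precedence climbing, a standard parsing technique that is not one of the methods described by Zhao et al., to show how a table of precedences turns a flat symbol sequence into the intended grouping. The table PRECEDENCE, the function name parse and the use of "*" as a stand-in for the multiplication sign are assumptions for the example.

```python
# Assumed precedence table: a higher number binds more tightly.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def parse(tokens, min_prec=1):
    """Parse a flat token list into nested tuples by precedence climbing.

    `tokens` is consumed in place. Only binary, left-associative operators
    and single-symbol operands are handled in this sketch.
    """
    left = tokens.pop(0)
    while tokens and PRECEDENCE.get(tokens[0], 0) >= min_prec:
        op = tokens.pop(0)
        # Operators that bind tighter than `op` are grouped into its
        # right operand before `op` itself is applied.
        right = parse(tokens, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left


print(parse(["a", "+", "b", "*", "c"]))
```

The call prints ('+', 'a', ('*', 'b', 'c')), which is the reading a + (b × c). An indeterminable agreement has no such table to consult, which is why it cannot be resolved by the grammar alone.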

Zhao et al. discuss different types of grammars, the complexity of each depending
on the level of formality of the formula entry system. More formality means that
the user spends more of their time specifying "boxes", not dissimilar to the boxes in
template-based equation editors, that encode the geometric and logical relationships
between various elements in their formulae. The higher the formality, the fewer tacit
agreements have to be encoded into the grammar.
Zhao et al. describe three levels of formalisation: strong, weak, and free. As you
move from one to the next the complexity of the grammar increases, as it has to be able
to determine more of the tacit agreements in the formulae. Unfortunately, although the
strong formalisation is the easiest to write a grammar for, and is the one that offers the
most confidence that after processing you have got the right thing, it also involves
the most additional work for the user who is entering formulae. Free formalisation is
the exact opposite.
Strong Formalisation

Every structure in a system using strong formalisation has to be put in a box by the
user; all determinable and indeterminable tacit agreements are indicated by the user
during input. No information is needed in the grammar on the priority of operators,
as everything is in a hierarchy of boxes that the user has supplied.
For example, the formula:
$$\int_0^4 x + 3 \sin 4x \, dx$$
is encoded as:

[Figure: the formula with an explicit box drawn around every operator argument and
sub-expression.]

It can be seen that the user has had to explicitly encode the precedence of all the
operators, along with specifying the two dimensional layout of the formula, and the
arguments for each of the operators.
Weak Formalisation

The user has to supply fewer boxes in a system using weak formalisation; the grammar
now encodes the precedence of operators. The boxes that no longer have to be drawn
are those that originally indicated the precedence of operations.
The resulting grammar is called a "weak grammar" which uses grammatical
categories such as sentence, relation, calculation, term, factor and atom. The priority of
operators is encoded using these categories.
For example, to encode the implicit operator precedence of multiplication over
function application, for expressions such as "sin xy", a grammar can be designed with
rules like:
    <sin op> → sin <term>   and
    <term> → <term> <factor>
The example formula above is now entered as:

[Figure: the same formula with the boxes that only indicated operator precedence
omitted.]

This level of boxing of entries is similar to that used by template based equation
editors.
Free Formalisation

The only boxes required are those that specify the layout of formulae.

[Figure: the same formula with only the boxes that specify its layout.]

When using a free formalism, the grammar for parsing the formula has to be
extended to determine the start and end of groups of symbols, such as operands for
operators. In the example above, the grammar has to determine where the integrand
starts and finishes.
No Formalisation

From the user's point of view, an ideal system would be a step beyond free formalisation,
where the system would determine where the boxes are. This would let the user
concentrate on the meaning of the formula and not worry about having to explicitly
define its layout. The only problem with this is that the user then has to trust the
system to interpret their formula correctly. The system has to be sufficiently powerful
to do so, without overly restricting the positions in which symbols can be placed with
respect to one another.

2.2.10 Post-processing Error Correction Rules

These rely on the programmer anticipating what sort of errors may occur and providing
means for dealing with them in advance. For example:

• "5in" → "sin",
• "c0s" → "cos",
• "= < i < n" → "1 < i < n",
• "5" → "S" (where the symbol carries an attached script "2").

The limitation of this approach is that it is not possible to automatically generate
these rules. However, a stochastic grammar is able to automatically choose likely
alternatives when given an unparsable formula. Miller and Viola (1998) give an example,
where their system attempted to parse "$0^{00}$". The system realised that "$0^{00}$" was
not a legal construct and concluded that it was most likely to be "00". As the output
"$0^{00}$" would be illegal, due to a two digit number not being allowed to begin with a
"0", it considered all the digits to be approximately the same size, and used an alternative
recognition of the first digit, giving "600" as the best interpretation. While this was
not the "correct" interpretation of the formula given that the input was "$0^{00}$", it was
the most likely interpretation within the constraints of the grammar the system was
using.
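Hand-written correction rules of this kind amount to a lookup table applied to the recogniser's output. A minimal sketch, assuming the recognised symbols arrive as a plain string and using only the two rules from the list above whose content is unambiguous (the names RULES and correct are hypothetical):

```python
import re

# Each rule is (pattern, replacement). These two are modelled on the examples
# in the text; a real system would need a much larger, hand-built table.
RULES = [
    (re.compile(r"5in"), "sin"),
    (re.compile(r"c0s"), "cos"),
]


def correct(symbols):
    """Apply every hand-written rule to the recognised symbol string."""
    for pattern, replacement in RULES:
        symbols = pattern.sub(replacement, symbols)
    return symbols


print(correct("5in x + c0s y"))
```

The call prints sin x + cos y. The sketch also shows the weakness of the approach: a rule fires wherever its pattern occurs, so every entry has to be anticipated and checked by the programmer.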

2.3 Formula Parsers

This section discusses types of existing formula parsing systems, past and present.
These systems may be components of some larger existing system. Their ability to
process input ranging from neat typeset print to sloppy handwritten entry is discussed.
The goal of a formula parser is to start with a set of recognised symbols, and
return a description of the formula that they represent. Blostein and Grbavec (1996)
give a good overview of the categories of existing techniques for parsing mathematical
formulae.
2.3.1 Modified Grammars
One technique is to take an existing one dimensional grammar that parses 1D strings,
and modify it so that it incorporates checks of the geometric relationships between
symbols. This technique is only applicable to online input as time is used to order the
symbols before parsing. This is the approach used by Littin (1993).
Littin uses an SLR(1) parsing technique, with additional tests for checking the
geometric relationship between tokens. He does this by first building an SLR(1) parser
then extending it where necessary to include these geometric tests.
To enable the use of this modified SLR(1) parser, the input has to be ordered
correctly: the user has to enter the symbols for a formula in a predefined order. For
example, a fraction whose numerator is $a + b$ and whose denominator runs from $c$ to
$d$ has to be entered in the order: $a$, $+$, $b$, the fraction bar, and then the symbols
of the denominator from left to right, ending with $d$. For some formulae there are a
number of possible orders, but this only occurs when a symbol has both a subscript
and superscript. The user is then able to decide whether to write the subscript or
superscript first. Editing of the formula during and after entry is limited to the deletion
or alteration of the most recently written symbol.
Permitting the user to enter symbols in an arbitrary order would mean that this
type of grammar could not be used. Littin justifies this restriction through the fact that
people tend to enter formulae in a fairly standard order, which he informally verified
by observing several people writing a number of formulae. Although this assumption
is reasonable, users who want to go back later and edit their formulae are unable to
do so. Littin has created a formula entry system, rather than a formula editing
system.
The modification to the SLR(1) grammar adds geometric tests which check that
the symbols are in the correct locations. Tolerance for the sloppiness of handwritten
input is implemented by putting a threshold on the distance symbols can appear from
their expected locations.
One major advantage of SLR(1) parsing is its time and space efficiency. Littin
shows the computational complexity of his system to be O(g), where g is the number
of symbols in the input formula.
2.3.2 Box Languages

A box language divides the input plane into areas based on the symbols found. For
example, the rule for a summation includes a term that subdivides the input area, as
in Figure 2.5.

Figure 2.5: A box language summation template.

When the parser finds a "$\Sigma$", it looks in the appropriate areas above and below the
"$\Sigma$" for the limits, then to the right for the expression being summed.
The way that box languages divide up the input area can be implemented in two
ways. It can use explicit definitions in the grammar rules that specify where the boxes
are, with respect to the symbol being considered. This approach is used by
Anderson (1968). An alternative approach is to use "concatenation operators" which offer
geometric operations such as "vertical concatenation". Using concatenation operators,
a fraction can be recognised as a vertical concatenation of the numerator, a horizontal line
and the denominator. This approach is used by Martin (1967).
When the concatenation operators are applied, they define the subdivision of the
input plane, restricting the positions in which to look for symbols. The way the
concatenation operators actually subdivide the input area is defined externally, not as part
of the grammar.
The use of box grammars is a common approach, from the early formula parsers
(Anderson, 1968; Martin, 1967) through to systems currently being developed (Fateman,
Tokuyasu, Berman and Mitchell, 1996; Zhao, Sakurai, Sugiura and Torii, 1996).
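The summation template can be pictured as three regions placed relative to the summation sign. The following sketch sorts the surrounding symbols into those regions using explicit region definitions. The Symbol class, the function name summation_arguments, the upward-growing y axis and the half_width and half_height parameters are assumptions for illustration, not definitions taken from Anderson's or Martin's grammars.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str
    x: float  # x-centre
    y: float  # y-centre, with y assumed to grow upwards


def summation_arguments(sigma, symbols, half_width=1.0, half_height=1.0):
    """Sort the symbols around a summation sign into template regions.

    `half_width` and `half_height` describe the assumed extent of the sign
    itself; every symbol is classified relative to the centre of the sign.
    Symbols to the left of the sign belong to no region and are ignored.
    """
    regions = {"upper": [], "lower": [], "body": []}
    for s in symbols:
        if s.x > sigma.x + half_width:
            regions["body"].append(s.name)       # expression being summed
        elif abs(s.x - sigma.x) <= half_width:
            if s.y > sigma.y + half_height:
                regions["upper"].append(s.name)  # limit above the sign
            elif s.y < sigma.y - half_height:
                regions["lower"].append(s.name)  # limit below the sign
    return regions


sigma = Symbol("sum", 0.0, 0.0)
around = [
    Symbol("10", 0.0, 2.0),
    Symbol("i", -0.5, -2.0), Symbol("=", 0.0, -2.0), Symbol("5", 0.5, -2.0),
    Symbol("i", 2.0, 0.0), Symbol("+", 3.0, 0.0), Symbol("Y", 4.0, 0.0),
]
print(summation_arguments(sigma, around))
```

In a full box language each region would then be parsed recursively with the same grammar, so that the limits and the summed expression can themselves contain fractions or further summations.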
Martin's system from 1967 (Martin, 1967) is one of the earliest systems for the
parsing of mathematical formulae, processing handwritten formulae entered using a
pen and tablet.
Martin gives details of the method he uses for analysing the formula on the 2D
input plane, and how to decide where to look next for each part of the equation, based
on these positional or "concatenation" operators. The rule for addition in Martin's
grammar is (C= T* + E*). This defines the horizontal concatenation (C=) of a Term
(T*), a plus symbol (+) and an Expression (E*).
Martin's system uses several specific hard-coded tests and rules in addition to the
grammar, to ensure that his system works correctly. He constantly looks two
characters ahead for exponents on derivatives, e.g.: the i's in $\frac{d^i y}{dx^i}$. Each
symbol input to the system has a bounding box that defines the area it covers. As the
system works in a single left-to-right pass on the input, Martin needs to ensure that the
correct symbols are encountered first. He extends the bounding boxes of $\Sigma$'s (for
summation) and fraction bars ($-$) one character to the left. Doing so means that the symbol
indicating the operation is found first, so when the symbols around it are found, the
parser is able to associate them correctly with the operator.
Figure 2.6: The y-centres of these symbols (b and p) line up, although their bounding
boxes do not.
Another well known early equation parser is Anderson's system (Anderson, 1968).
He assumes that the OCR problem has already been solved, so each character or
"syntactic unit" has known physical bounds and an x- and y-centre, not necessarily
the average of the left and right or upper and lower bounds. Working with these x-
and y-centres avoids the problem of the bounding boxes of ascenders and descenders
not lining up. Figure 2.6 demonstrates this. The bounding boxes are not lined up;
however, the y-centres of the symbols are.
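The idea of a centre that depends on the identity of the character can be sketched as follows. The character classes, the fractions 0.25, 0.5 and 0.75, and the function name y_centre are illustrative guesses only; they are not Anderson's definitions.

```python
# Assumed character classes; y is taken to grow upwards.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gpqy")


def y_centre(char, bottom, top):
    """Return an identity-dependent y-centre for a character's bounding box.

    The centre is placed at a fraction of the box height measured from the
    bottom: low for ascenders (their body sits in the lower part of the
    box), high for descenders (their body sits in the upper part).
    """
    if char in ASCENDERS:
        fraction = 0.25
    elif char in DESCENDERS:
        fraction = 0.75
    else:
        fraction = 0.5
    return bottom + fraction * (top - bottom)


# A "b" whose box rises above the body and a "p" whose box hangs below it.
print(y_centre("b", 0.0, 2.0), y_centre("p", -1.0, 1.0))
```

With these assumed values both calls give 0.5, even though the midpoints of the two bounding boxes are 1.0 and 0.0, so a test for symbols lying on the same line succeeds on the centres where it would fail on the boxes.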
To save on processing time, Anderson uses a preprocessing step to do an initial
lexical analysis of the symbols input. Individual symbols are preassigned with their
syntactic category, instead of using rules in the main grammar such as:

    a | b | c | ... | z  ⟹  <letter>

Instead of using concatenation operators, Anderson puts tests in as part of rules
in the grammar, limiting the positions where symbols can be legally positioned. As a
result, the 2D rules in his grammar have several parts to them, that:

• check to see if the right syntactic units are present.
• check the layout of these syntactic units, using geometric constraints.
• have a rule to determine the physical characteristics of the result of applying the
  rule. This calculates things such as the size, position, and x- and y-centres of
  the new bounding box.
• build up a parse tree or a LISP-like expression that represents the formula.


The paper by Martin (1971) discusses various aspects of input, parsing, and display
of mathematical formulae. It provides useful results of a study of the syntax
and layout of mathematical formulae. He determined that there is no one "official"
layout, but he does come up with a number of general observations about the layout
of formulae that hold true in most cases.
He discusses a method for parsing formulae, which is the same as that described
in his earlier paper (Martin, 1967), and how to avoid problems with it. He also
discusses the splitting of formulae across several lines when the formula is too long,
display of formulae, ambiguity in input and how to judge if it is a valid expression.
More recent papers by Fateman et al. (1996) and Fateman and Tokuyasu (1996)
discuss their recent work on the automatic interpretation of typeset formulae that have
been scanned in from books of tables of integrals. Once interpreted, they intend to
store the formulae so that they can be retrieved from an online integral lookup table,
for use in computer algebra systems.
In order to simplify the problem, they assume that the layout of the typeset
notations being input to the system is constant. This is a valid assumption, as all their
input is being taken from a single book. They also assume that the output of the OCR
system that feeds the recogniser is 100% correct, so misrecognition errors are of no
concern. Test input is currently cleaned up by hand if there are characters that have
been mistakenly joined, or if there is excessive noise that is confusing the OCR stage.
Their formula parsing technique is similar to the box language used by
Anderson (1968). The input to their system is taken from the output of a sub-system that
automatically processes raw bitmaps of scanned pages or screen-captures of previewed
LaTeX formulae. The sub-system automatically locates and recognises text in these
images. This is then passed on to the formula parsing stages.
Initially, before the main parsing stage, they do a lexical analysis to collect up
multi-character components. For example, "cos" would originally be passed as three
separate characters, i.e.: "c", "o" and "s". These are put together to make a single
"cos" element. Other multi-element symbols are gathered up as well, such as "=" and
"i", should this not have happened already in the preceding OCR stage. The main
parsing stage then takes this and turns it into a description of the symbols' layout
using positional operators. This is then parsed to generate a representation of the
mathematical formula.

2.3.3 Projection Profile Cutting


Projection profile cutting, also known as "structural analysis", determines the structure
of a formula from a number of repeated horizontal and vertical projections of a formula's
image. Based on these projections the formula is subdivided, each subdivision being
recursively projected and further subdivided. A tree structure is created, representing
the formula's geometric structure. This structure is then further processed, taking into
account the symbols in the formula, and the formula is determined.

Figure 2.7: Building a projection profile.


Figure 2.7 shows how a projection profile is created. The height of the histogram
is determined by the area of the bounding boxes (Ha, Haralick and Phillips, 1995).
The histogram can also be created using the density of pixels at each x- or y-position.
For example, if this formula was subdivided based on the minima in the histogram, it
would be split into three parts: the integral symbol, the fraction, and the differential.
The fraction could then be horizontally projected, which would identify the numerator
and denominator.
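The cutting step can be sketched in a few lines of Python. This is an illustration only, not taken from any of the cited systems: it cuts wherever the projection of the bounding boxes is empty for more than min_gap pixels rather than at general histogram minima, and it assumes that each box is an (x, y, width, height) tuple, which is how the example formula in Section 3.1.2 appears to be encoded. The function name and the min_gap parameter are hypothetical.

```python
def cut_by_projection(boxes, axis, min_gap=3):
    """Split boxes into groups separated by empty gaps in their projection.

    Each box is (x, y, width, height). axis=0 projects onto the x-axis
    (giving vertical cuts), axis=1 onto the y-axis (horizontal cuts).
    """
    groups, current, current_end = [], [], None
    for box in sorted(boxes, key=lambda b: b[axis]):
        start, end = box[axis], box[axis] + box[axis + 2]
        # An empty stretch wider than min_gap ends the current group.
        if current and start - current_end > min_gap:
            groups.append(current)
            current, current_end = [], None
        current.append(box)
        current_end = end if current_end is None else max(current_end, end)
    if current:
        groups.append(current)
    return groups


# Bounding boxes of the example formula from Section 3.1.2.
integral = (14, 15, 37, 127)
fraction = [(63, 42, 25, 39), (92, 16, 24, 21), (61, 85, 70, 8), (86, 100, 28, 43)]
differential = [(150, 54, 20, 56), (171, 73, 31, 35)]

parts = cut_by_projection([integral] + fraction + differential, axis=0)
rows = cut_by_projection(parts[1], axis=1)
```

With these boxes the vertical cut gives three parts (integral, fraction, differential). The horizontal cut of the fraction then separates the exponent 2 from its base x as well as the numerator from the denominator, which illustrates why superscripts need special handling.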
Blostein and Grbavec (1996) say the disadvantage of projection profile cutting is
that special processing is required for square-roots, sub- and super-scripts. They report
that projection profile cutting has been used on both typeset and handwritten input,
although with handwritten input it has trouble with square-roots, closely-written symbols and skew.
Projection profile cutting is also used to process scanned text documents, subdividing
the text into columns, paragraphs, and lines (Srihari, 1986; Ha et al., 1995). It
is also used for analysing scanned images of sheet music, to separate the staves and
musical symbols.
2.3.4 Procedurally Coded Math Syntax
Procedurally coded math syntax uses a collection of rule-of-thumb observations about
formulae. These observations are coded into a formula processing program. An example
of a rule, quoted by Blostein and Grbavec (1996), is: "A length threshold
of 20 pixels is used to classify a horizontal line as a long bar or a short bar. If a long
bar has symbols both above and below it, it is treated as a division. If there are no
symbols above it, it is treated as boolean negation. If a short bar has no symbols above
or below it, it is treated as a minus sign. If it has characters above or below it, then
combination characters (e.g.: =, , ) are formed."
A collection of rules such as these is used to parse formulae. The use of thresholds
is sufficient for the processing of typeset input taken from a uniform source, but the
high variability of handwritten input could make it fail.
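To show what "coded into the program" means in practice, the quoted rule could be written roughly as follows. This is a sketch only: the function name and return labels are hypothetical, and the rule as quoted does not say how a bar of exactly 20 pixels, or a long bar with symbols above it only, should be treated.

```python
LONG_BAR_THRESHOLD = 20  # pixels, as in the quoted rule


def classify_bar(length, has_above, has_below):
    """Classify a horizontal bar following the quoted rule of thumb."""
    if length > LONG_BAR_THRESHOLD:  # assumption: exactly 20 pixels is "short"
        if has_above and has_below:
            return "division"
        if not has_above:
            return "negation"
        return "unclassified"  # symbols above only: not covered by the quote
    if not has_above and not has_below:
        return "minus"
    return "combination"  # part of a combination character such as "="
```

Changing the threshold, or adding a case, means editing and rebuilding the program, which is the drawback discussed in the following paragraphs.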
The rules that are hard coded into the system perform essentially the same function
as the rules in a box language, described in Section 2.3.2. The only difference is that
in procedurally coded syntax, they are built into the system as part of the code for
the formula processor. In a box language, the rules are provided through a modifiable
external data file.
Using procedurally coded math syntax means that it may be easier to write more
complex or "intelligent" rules that use extra processing that could not be encoded
as part of box language rules. However, the major disadvantages arise from the fact
that rules are coded into the system itself, so that changing them involves rewriting
parts of the program. It may be impossible for the end user to make modifications
or extensions. The ability to modify a handwriting based formula parser is important
due to the variability of notations, and the need to allow advanced users to create new
notations.
Rules are typically added to the system as necessary throughout its development,
correcting errors as they occur. As a result, the set of rules for a system progressively
grows, with each new rule addressing the current problem. Systems end up with a large
number of rules, with specialised sections of code for dealing with particular situations
and problems.
2.3.5 Stochastic Grammars
Stochastic grammars are reported to yield good results with both typeset and handwritten
input. A stochastic approach can be added to any type of grammar, and two
examples of systems that use this type of approach are Chou's (1989), and Miller and
Viola's (1998). Chou's work processes typeset input, while Miller and Viola are now
moving from typeset to handwritten input.
A stochastic grammar has associated with every production a probability that the
production is used. Thus, for any given sequence of productions in a given parse, the
overall probability of this sequence can be calculated. The correct parsing of a set of
symbols is the parsing that has the highest probability.
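As a small illustration of choosing the most probable parse, the sketch below multiplies the probabilities of the productions used (by summing their logarithms) and keeps the best candidate. The production names and probabilities are invented for the example and do not come from Chou or from Miller and Viola.

```python
import math

# Hypothetical production probabilities, for illustration only.
RULE_PROBABILITY = {"juxtapose": 0.5, "superscript": 0.2, "subscript": 0.1}


def parse_log_probability(productions):
    """Log of the product of the probabilities of the productions used."""
    return sum(math.log(RULE_PROBABILITY[name]) for name in productions)


def most_likely_parse(candidate_parses):
    """Pick the candidate whose sequence of productions is most probable."""
    return max(candidate_parses, key=parse_log_probability)


candidates = [["juxtapose", "superscript"], ["juxtapose", "subscript"]]
best = most_likely_parse(candidates)
```

Working with logarithms avoids numerical underflow when a parse uses many productions.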
To use this approach, each production in the grammar needs to be assigned a probability.
There are a number of algorithms for assigning probabilities, typically working
from a set of example strings which are known to be in the language that the grammar
describes. Chou (1989) describes how to adapt the "inside/outside algorithm",
originally designed for linear one dimensional input, to a two dimensional grammar
that uses vertical and horizontal concatenation operators. The inside/outside algorithm
makes a number of passes over the grammar and examples from that grammar's
language, determining the probabilities for each rule in the grammar.
Stochastic grammars cope well, and in a pleasing way, with geometric tests, as
symbols can be given a probability that they have a particular geometric relation
to other symbols or subexpressions. This is in contrast to other approaches which
judge arrangements to be either valid or invalid. Miller and Viola (1998) model the
positions of symbols as Gaussian variables: the probability that two elements are in a
particular relationship relative to each other is defined by a two-dimensional Gaussian
distribution around the expected position of the second expression. This helps cope
with ambiguities, such as those described in Section 2.2.5.
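The idea of a soft geometric test can be sketched as follows. This is not Miller and Viola's actual model: the expected offsets, the standard deviations, and the simplification to an axis-aligned Gaussian (independent in x and y) are all assumptions made for the example.

```python
import math

# Hypothetical expected offsets (dx, dy) of the second symbol's centre,
# in units of the first symbol's height, with y increasing downwards.
EXPECTED_OFFSET = {"right": (1.0, 0.0), "superscript": (1.0, -0.6)}
SIGMA = (0.4, 0.3)  # assumed standard deviations in x and y


def relation_density(relation, dx, dy):
    """Axis-aligned two-dimensional Gaussian density at offset (dx, dy)."""
    mean_x, mean_y = EXPECTED_OFFSET[relation]
    sigma_x, sigma_y = SIGMA
    z = ((dx - mean_x) / sigma_x) ** 2 + ((dy - mean_y) / sigma_y) ** 2
    return math.exp(-0.5 * z) / (2 * math.pi * sigma_x * sigma_y)


def relation_probabilities(dx, dy):
    """Normalise the densities so the candidate relations sum to one."""
    densities = {r: relation_density(r, dx, dy) for r in EXPECTED_OFFSET}
    total = sum(densities.values())
    return {r: d / total for r, d in densities.items()}
```

An offset halfway between the two expected positions gives each relation a probability of one half, rather than forcing a valid/invalid decision.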
Another advantage of a stochastic approach is that it can take as its input the
output of the character recogniser in the form of symbols and possible alternatives,
along with confidence values. As a result, the stochastic parser itself can choose from
the alternatives in ambiguous cases, or when an error occurs, to get the most likely
parse.

To simplify matters, Miller and Viola sub-class symbols into sets of equivalent
symbols:

• ascender letters (b,d,h,i,k,l,t,A-Z (except Q), ).
• descender letters (g,p,q,y, ).
• small letters (a,c,e,m,n,o,r,s,u,v,w,x,z, ).
• ascender/descenders (f,j,Q, ).
• binary operators (+, ,=).
• zero (0).
• non-zero digits (1-9).
• other symbols, each in a class of their own (round, curly and square brackets,
  fraction symbol).
Chou maintains all possible recognitions of symbols with a probability over a given
threshold. Miller and Viola note that all symbols in each subclass are syntactically
equivalent, so only keep the best from each subclass.
Miller and Viola use an A* heuristic to guide their search, with the probabilities of
each production being calculated during the parsing process. The heuristic provides
an estimate of the number of steps between the current state and the solution. For a
heuristic to be A*, it is required to never overestimate the number of steps. Thus, if
such a heuristic is used to guide a search, then the shortest path to the solution will
be found.
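A generic A* search is sketched below to make the role of the heuristic concrete. It is a textbook formulation on a toy problem, not Miller and Viola's parser: the states are integers, each step adds 1 or 2, and the heuristic (remaining distance divided by the largest step, rounded up) never overestimates the number of steps left.

```python
import heapq
import math


def a_star(start, goal, neighbours, heuristic):
    """Return the cost of the cheapest path from start to goal, or None."""
    frontier = [(heuristic(start), 0, start)]
    best_cost = {start: 0}
    while frontier:
        _, cost, state = heapq.heappop(frontier)
        if state == goal:
            return cost
        if cost > best_cost[state]:
            continue  # stale queue entry: a cheaper route was found later
        for next_state, step_cost in neighbours(state):
            new_cost = cost + step_cost
            if new_cost < best_cost.get(next_state, math.inf):
                best_cost[next_state] = new_cost
                estimate = new_cost + heuristic(next_state)
                heapq.heappush(frontier, (estimate, new_cost, next_state))
    return None


GOAL = 5


def neighbours(n):
    """Toy state space: from n, move to n + 1 or n + 2 at a cost of one step."""
    return [(m, 1) for m in (n + 1, n + 2) if m <= GOAL]


def steps_lower_bound(n):
    """Admissible heuristic: at best every remaining step advances by 2."""
    return math.ceil((GOAL - n) / 2)


shortest = a_star(0, GOAL, neighbours, steps_lower_bound)
```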
Their paper reports the success their system has, and how much faster their system
is with the techniques that they use. They report that the heuristics reduce the parsing
times for formulae from minutes to seconds. The paper by Miller and Viola has some
very good results: it currently works very well on typeset text and their preliminary
results with handwritten input are also very encouraging.
The use of stochastic grammars seems to be one of the two best approaches available
for the processing of handwritten input, the other being graph rewriting.
2.3.6 Graph Rewriting

Graph rewriting uses a graph, consisting of nodes and arcs, to represent a formula.
"Rewrite rules" are used to progressively reduce the graph, repeatedly replacing
subgraphs with new graphs. This technique is described by Blostein and Grbavec (1996).
Lavirotte and Pottier (1995; 1997; 1998) present a system they have implemented that
uses this technique, processing scanned LaTeX formulae, and an optimisation they have
developed for speed.
To parse a formula using a graph grammar approach, a graph is first defined that
represents the recognised symbols and the geometric relationship between them. Nodes
in the graph, one for each symbol, have attributes holding information about the
identity, lexical class (letter, digit, etc.) and position of symbols in the formulae.
Arcs in the graph represent relative positions of the symbols: above, below, up-right,
down-right, etc.
Rules in the grammar are also graphs, typically one to five nodes in size, defining
sub-graphs that are searched for in the graph representing the formula. These subgraphs
are typically templates for expressions or parts of expressions found in formulae.
As the rules in graph grammars are productions, each rule also contains a second graph
that is the result of applying the rule: what the first graph is replaced with. Finally,
the rule also contains information on how to transfer the attributes of the nodes of
the first graph to the replacement graph. There may also be components in the rule
that define how to treat the arcs that were connected to the subgraph that has been
removed.
These rules are extended by Lavirotte and Pottier to contain graphs that define the
contexts in which it is allowable to apply a particular rule. This increases the efficiency
of the parsing process, as the contexts define extra conditions that must be met before
a rule is applied. The extra conditions remove some ambiguities from the grammar,
and thus reduce the possibility of exploring erroneous derivations due to choosing the
wrong rule.
As the parsing process works by successively finding sub-graphs and replacing them
with smaller graphs, at the end of the parsing process a single node is left representing
the original formula. The original formula can then be determined by examining the
attributes of this remaining node.
Graph rewriting is a very general and powerful tool and has been used for a wide
range of tasks. Graph rewriting has been used in the recognition and interpretation of
schematic diagrams (Bunke, 1982), and the description and processing of "structured"
pictures, including flow charts, organic chemistry molecules, and images of particle
trajectories produced in physics experiments.
Blostein reports that graph grammars are tolerant to irregular symbol positioning,
as found in handwritten input. Work by Lavirotte and Pottier has only been on single
typeset formulae and works well. Input to their system is scanned images of printed
LaTeX formulae.
Lavirotte and Pottier optimise the graph reduction process by adding context information
to the rules in the graph grammar that avoid ambiguities where two or more
rules can be applied in a given situation. These rules are created semi-automatically,
also using information such as operator precedence information supplied by the person
who builds the grammar.
As a graph rewriting system is used in this thesis, it is described in more detail in
Chapter 3.
2.3.7 Data Driven and Knowledge Driven Modules


Blostein (1996) outlines systems known as a "blackboard architecture", after the similarity
to a number of human experts communicating by writing on a blackboard. This
approach takes a number of independent modules that communicate via shared memory.
Central to this approach is the shared memory, appropriate data structures to
hold and exchange the information, and controlling logic that arbitrates and directs
the access to the data.
Each module takes information from the blackboard, processes it and then puts
results, possibly with confidence information, back onto the blackboard. This information
is then available for other modules to work with.
One of the benefits of a blackboard architecture is that it supports multiple, possibly
conflicting, hypotheses and the ability to explore them all. It also allows the easy
integration of new "knowledge sources" to the system. It also makes it easy to set up
communication between various modules, so that a high level module, for example a
formula parser, can provide contextual information to a low-level module, such as a
character recogniser.
This approach is also useful to bring together multiple approaches, and to be able
to automatically choose between them in a given situation.
Although this approach has a number of advantages, it was not used as part of
the system described by this thesis. It would have provided a method for
exploring many different solution paths and given different parts of the
system the ability to communicate with each other, but it was left out
in order to keep the overall complexity of this system down.
2.4 Summary

Existing commercial formula entry systems are either command-string or template
based. While these are reasonably powerful and not too hard to use, they do not
provide easy entry and editing of formulae. The ideal input method is via a pen-based
interface, which means that the formula has to be parsed either as it is entered, or once
it is complete.
Each of the existing methods for parsing handwritten or typeset formulae has its
own strengths and weaknesses with respect to the various issues of formula processing.
Of the approaches presented here, a formula processor using either a stochastic
grammar or graph rewriting appears to be the best choice for parsing handwritten
formulae.

Chapter 3
The Formula Processor
The formula processor implemented is a graph rewriting system, similar to that described
by Lavirotte and Pottier (1997). Using an input of symbols and their bounding
boxes, a graph is constructed with nodes representing the symbols in the formula. Each
node has data associated with it, holding the identity of the symbol, the location of
its bounding box and a unique ID. As parsing proceeds, nodes will come to represent
subexpressions within the formula. Directed arcs are built in the graph, within predetermined
restrictions, signifying the geometric relationships between the symbols in
the formula. Each arc has a label that stores what geometric relationship it represents
between the two nodes it connects.
By making it possible to build more than one arc between any two given nodes,
uncertain relationships can be represented. For example, to represent the possibility
that a symbol may be either to the "top-right" or "right" of another, two arcs are built
between their respective nodes.
The restrictions on the building of arcs limit the arcs to "sensible" ones, based on
their types and positions relative to each other and to other symbols. This helps the
later parsing process by simplifying the graph built.
After being constructed from the symbols in the input formula, the graph is parsed
by a graph grammar parser. The grammar used by the parser describes the syntax
and layout of mathematical formulae through a number of small, typically one to five
node, graphs that are templates for subexpressions that are found in mathematical
expressions. Figure 3.1 shows some rules from a graph grammar.
In Figure 3.1, we can see boxes representing nodes in the graph. Each box has
two parts. The lower part is the "data" component, and the upper part is the "type"
component. When rules are being matched to a graph representing a formula, the
"type" values are considered. When the rule is being applied, the "data" components
dictate the data component of the resulting node after application. As nodes are
combined, a new bounding box is calculated based on the nodes being combined.

Figure 3.1: Sample rules from a graph grammar. (a) Addition, giving A+C. (b) Superscript, giving A^{B}. (c) Integral with limits, giving \int^{B}_{C}DE.

The superscript rule in Figure 3.1(b) shows how a superscript operation is defined.
Two nodes of type "element", with a "top-right" arc between them, are found. If the
rule is applied, these nodes are then collapsed to a single node. The type of the new
node is "element" and the data value for the new node is constructed from the data
values of the original two nodes as shown, the "A" and "B" being the original data
values of the original two nodes.
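The effect of applying the superscript rule can be sketched as below. The Node class and function name are hypothetical and the real rules are graph templates rather than functions; the sketch only shows the matching on "type" values and the construction of the new "data" value.

```python
from dataclasses import dataclass


@dataclass
class Node:
    type: str  # lexical class, compared when a rule is matched
    data: str  # what the node represents; built up into a LaTeX string


def apply_superscript(base, exponent, arc_label):
    """Collapse two "element" nodes joined by a "top-right" arc, else None."""
    if (base.type, exponent.type, arc_label) != ("element", "element", "top-right"):
        return None
    # The template A^{B}: A and B are the data values of the matched nodes.
    return Node(type="element", data=f"{base.data}^{{{exponent.data}}}")


x_squared = apply_superscript(Node("element", "x"), Node("element", "2"), "top-right")
```

Because the result is again of type "element", it can itself be matched by later rules, for example as the numerator of a fraction.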
It can be seen in these rules that not all the data values of the original nodes are
always used. The discarded data values are typically those of the operator symbol. For
example, in the integral rule in Figure 3.1(c), the integral is node "A"; however, the
integration is represented by the LaTeX command \int in the final string.
It is possible to build either an abstract representation of the formula, such as a
parse tree, or a LaTeX string as is being done here. The parse tree is a much more
versatile output format and can be linearised after parsing to generate LaTeX or some
other notation. Here, the direct generation of LaTeX is sufficient for this system's
purposes. A LISP-like prefix notation is also commonly used (Anderson, 1968; Martin,
1971; Fateman et al., 1996), and is more suited as an intermediate representation if
you do not want to use LaTeX or an internal data structure for the parse tree.
As described in Section 2.2.4, Anderson (1971) and Bernstein (1971) recommend
having the formula parser initially produce a layout description of the formula, then
having a later stage of processing to determine the actual formula based on this. As the
system aims to generate LaTeX that represents the formula that the user has entered, it
is valid to bypass the intermediate layout representation and generate LaTeX directly.
However, should it prove necessary to generate the positional notation, this would be
simple: only the output strings in the grammar's rules need to change, from LaTeX to
the LISP-like layout description code.
3.1 Formula Processor Details

This section gives details of the internal workings of the graph rewriting formula processor.
It covers the definition of bounding regions, the input to the formula parser,
then how the formula parser builds a graph that represents the formula and parses it.
3.1.1 Bounding Regions

Bounding region information for a symbol, or groups of symbols, is important as it is
used in the building of arcs between nodes, as described in Section 3.1.4, and deciding
whether or not to apply rules, as described later in Section 3.1.6.

Figure 3.2: Construction of bounding regions. (a) Rectangular regions. (b) Individual rectangular regions. (c) Smallest convex hull.
Figure 3.2 shows three different methods for managing bounding regions. Figure 3.2(a)
is the approach used by this system. The original bounding boxes are
the smallest rectangles that enclose the symbols' strokes. When combined, the new
bounding box is made from the outermost extents of the bounding boxes of the original
symbols. Other approaches are to keep track of the individual bounding boxes that
went into it, as shown in Figure 3.2(b), though this raises the issue of how to deal with
the gaps between the boxes. Figure 3.2(c) illustrates the use of the smallest convex hull
around the pixels or strokes of the original characters, as Miller and Viola do (Miller
and Viola, 1998). When the symbols are combined, the new bounding region is the
smallest convex hull enclosing all the symbols.
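The rectangular approach of Figure 3.2(a) amounts to taking the outermost extents of the boxes being combined. A minimal sketch, assuming boxes are stored as (x, y, width, height) tuples as the input in Section 3.1.2 suggests; the function name is hypothetical.

```python
def combine_boxes(a, b):
    """Smallest rectangle enclosing boxes a and b, each (x, y, width, height)."""
    left = min(a[0], b[0])
    top = min(a[1], b[1])
    right = max(a[0] + a[2], b[0] + b[2])
    bottom = max(a[1] + a[3], b[1] + b[3])
    return (left, top, right - left, bottom - top)


# The x and its exponent 2 from the example formula in Section 3.1.2.
x_squared_box = combine_boxes((63, 42, 25, 39), (92, 16, 24, 21))
```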
The only disadvantage of using rectangular bounding boxes is that the bounding
box of sloped characters is excessively large and takes in a large amount of empty
space. As a result, users can accidentally put things overlapping or inside other symbols
without intending to. Figure 3.3 demonstrates this. From the user's point of view, the
characters do not overlap, but a system which works with the bounding boxes will say
they do. This causes problems when parsing the formula, as the system will interpret
the geometric relationship between the symbols incorrectly. As a result, the formula
will either be misparsed or be completely unparsable.

Figure 3.3: These bounding boxes overlap, although the symbols do not.

Treating the symbol being tested against another as a single centre point, and checking
whether this point is inside the other's bounding box, helps, but the problem can still occur.
Figure 3.4 shows that the centre point of the 1 is not inside the fraction bar's bounding
box, but the centre point of the 3 still is. The use of this approach has, in spite of this problem,
worked well unless users write excessively sloped fraction bars or integral symbols.
It is in situations like this that Miller and Viola's (1998) convex hull approach for
bounding regions is much better. The computational complexity for creating convex
hulls from a set of n pixels is O(n log n). The union of two convex hulls can be computed
in O(l + m), where l and m are the number of vertices in the convex hulls. The
intersection of two convex hulls can be determined in O(l + m) also. Convex hulls
will also at times take in large amounts of empty space for large subexpressions in a
formula, as the rectangular regions do, but for single symbols the area they cover is
more intuitively what "belongs" to a single symbol. In the case of a sloped fraction
bar, a problem for rectangular bounding regions, the convex hull approach is a great
improvement.

Figure 3.4: Using centre points instead of bounding boxes for geometric tests. While
the 1 is outside the fraction bar's bounding box, the 3 is not.

This system does not use the convex hull approach due to the fact that it was
discovered after the system was already using rectangular regions. The rectangular
regions work well enough that changing the system to use convex hulls is not currently
necessary.
3.1.2 Input to the Formula Processor
The input to the formula processor is a set of tuples. There is one tuple per symbol that
the system has recognised, each tuple holding the symbol's identity and its bounding
box information.
For example, the formula shown in Figure 3.5 is encoded as:
integral 14 15 37 127
x 63 42 25 39
d 150 54 20 56
x 171 73 31 35
2 92 16 24 21
- 61 85 70 8
4 86 100 28 43

Figure 3.5: A simple formula.


Note that these do not have to be sorted in a "logical" input order, as order information is not used by a graph rewriting parser.
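Reading this input is straightforward. The sketch below is not the system's own code: the Symbol type and function name are hypothetical, and the four numbers are assumed to be x, y, width and height (the fraction bar "- 61 85 70 8" is consistent with a wide, thin box).

```python
from typing import NamedTuple


class Symbol(NamedTuple):
    identity: str
    x: int
    y: int
    width: int   # assumed meaning of the third number
    height: int  # assumed meaning of the fourth number


def parse_symbols(text):
    """Turn one 'identity x y width height' line per symbol into Symbol tuples."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        identity, *numbers = fields
        symbols.append(Symbol(identity, *map(int, numbers)))
    return symbols


FORMULA = """\
integral 14 15 37 127
x 63 42 25 39
d 150 54 20 56
x 171 73 31 35
2 92 16 24 21
- 61 85 70 8
4 86 100 28 43
"""

symbols = parse_symbols(FORMULA)
```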
3.1.3 Building the Initial Graph
The tuples are used to build a graph encoding their information. A node is created for
each symbol in the formula, and the data and type attributes for each node are both
initially set to the symbol's identity. These may be changed in the later preprocessing
step, which will perform an initial categorisation of the symbols and will group basic
collections of symbols, such as multi-digit numbers, into a single node.
Figure 3.6(a) shows the initial graph built, prior to the preprocessing and initial
building of arcs. The nodes are represented by rectangles. The position and size of
each rectangle conveys the position and size information encoded within the node. The
top label in each node shows the "data" value for the node, what the node actually is,
and the lower label is the "type" value, which holds the node's lexical class.
3.1.4 Building the Arcs
Directed labelled arcs are added to the graph, representing the geometric relationship
between the nodes they link. Instead of making a complete graph, with arcs between
every pair of nodes, it is restricted so that only "reasonable" arcs are put in. The
constraint on having only "reasonable" arcs simplifies the resulting graph. The fewer
arcs in a graph, the less likely it is to represent multiple formulae. For example, if a 2 is
linked so that a single x node is both to its "right" and "top-right", it simultaneously
represents the formulae 2x and 2^{x}. On the other hand, with too few arcs the graph will
not represent a valid formula at all, either because it is not connected or an essential arc
is missing. Depending on the graph matching algorithm used by the graph-rewriting
parser, having fewer edges in the graph may also result in a speed increase.

Figure 3.6: Preprocessing and parsing of a simple formula. (a) Initial graph. (b) Initial graph with arcs built. (c) After the preprocessing step. (d) After applying the "superscript" rule. (e) After applying the "fraction" rule. (f) The final graph, after applying the "integral" rule.

Every time the graph is changed, in later preprocessing and parsing steps, the arcs
are rebuilt. While graph grammars can have rules that specify how arcs are handled
as nodes are replaced and added, for a formula processing application the arcs have
to be rebuilt every time the graph changes. This is because you need to check if arcs
that may not have originally been in the graph have to be added. For example, in the
formula:
x^{-y}
there would be no link between the original x and - nodes, as the arc construction
type check, described below, does not allow the expression x^{-}. After collapsing the
- and y to make a single -y node, this node has to be linked to the x for it to continue
parsing. Without globally recomputing all the arcs in the graph, it is not possible to
automatically determine whether or not new links such as this should be added.
As each arc represents a geometric relationship between the two nodes it connects,
various checks are made to confirm that the link makes sense. There are three tests,
related to the types, the geometric relationship between them, and whether or not
there is anything else between them.
Type Check

The grammar designer can specify particular types of nodes that are not permitted to
have particular types of arcs going into, or coming out of them.
For example, an integral sign never has anything to its top-left, so the graph builder
is told never to build arcs so that a "∫" has something to the "top-left" of it. A "+"
can never be found to the bottom left of something, so "bottom-left" arcs ending at a
"+" are not built.
Currently, these restrictions are read in from an external data file. Ideally these
rules should be automatically determined, at least in part, by analysing the grammar.
Geometric Test

A geometric check is made to ensure that the symbols or subexpressions that nodes
represent are actually arranged the way that an arc between these nodes implies they
are.
Figure 3.7: Regions used by the geometric check (Top-Left, Top, Top-Right, Left, Inside, Right, Bottom-Left, Bottom, Bottom-Right).

Figure 3.7 shows the regions used by the geometric check. There are nine regions,
including an "inside" region, all defined relative to the first symbol's bounding box.
Note that neighbouring regions, for example the "top-right" and "right" regions, overlap.
This helps deal with problems occurring when people accidentally write characters
too close to one another and end up with overlapping bounding boxes. It also allows
some leniency in the placement of characters.
Figure 3.8: Is "A" to the top-right of "B"? Yes.


Before building an arc between two nodes A and B that would encode relationship
x, it is tested to see whether or not symbol A is actually geometrically x of B. The
centre point of A's bounding box is tested against the regions defined in terms of the
dimensions of B's bounding box. Figure 3.8 illustrates the check to see if A is to the
top-right of B. This approach of using rectangular regions is also used by Littin (1995).
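A centre-point test of this kind might look as follows. The exact extents of the regions in Figure 3.7 are not given in the text, so the reach and overlap factors below are invented for illustration; only the overall shape of the test (centre of A against a rectangle defined from B's box) follows the description. Boxes are assumed to be (x, y, width, height) with y increasing downwards.

```python
def centre(box):
    """Centre point of an (x, y, width, height) box."""
    x, y, width, height = box
    return (x + width / 2, y + height / 2)


def in_top_right(a, b, reach=2.0, overlap=0.25):
    """Is the centre of box a inside an assumed "top-right" region of box b?

    reach and overlap are hypothetical tuning factors, expressed as
    multiples of b's width and height.
    """
    bx, by, bw, bh = b
    cx, cy = centre(a)
    left, right = bx + bw - overlap * bw, bx + bw + reach * bw
    top, bottom = by - bh, by + overlap * bh  # y increases downwards
    return left <= cx <= right and top <= cy <= bottom


x_box, two_box = (63, 42, 25, 39), (92, 16, 24, 21)
two_is_top_right_of_x = in_top_right(two_box, x_box)
```

The overlap factor lets the region extend slightly into B's own box, which is one way of getting the overlapping, lenient regions described above.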
Symbols have to be sufficiently close to one another to be linked. The impact of
this on the user is that symbols in their formulae cannot be spread out too much.
Increasing the maximum range means that the user can space things out more, but
the graph builder ends up putting more false connections between nodes, which then
results in longer and more erroneous parsing.
There are a variety of ways of dividing up the input area around each character to
determine which areas are "to the right" and "above", etc. An alternative approach to
the rectangular subdivision of the input area is to test based on the angle that the line
between the centre points of the items concerned makes with horizontal. For example,
an item is to the "right" of another if the angle is between -22.5° and 22.5°.
This works well for individual symbols, but fails for groups of symbols such as
subexpressions in a formula. Figure 3.9 illustrates this. In Figure 3.9(a) B is in the
"top-right" region relative to A. However, if the length of the superscript B grows, as
in Figure 3.9(b), its centre point moves into the "right" region. Using a rectangular
subdivision, as this system does, or other shaped regions, such as that used by Lavirotte
and Pottier (1998), avoids this problem.

Figure 3.9: As the length of the exponent grows, it moves from the "top-right" to the
"right" region.
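The angle-based alternative, and the way it fails for long exponents, can be shown with a short sketch. The eight 45° sectors and the function name are assumptions; the text only states the range for "right". Screen coordinates with y increasing downwards are assumed.

```python
import math

DIRECTIONS = ["right", "top-right", "top", "top-left",
              "left", "bottom-left", "bottom", "bottom-right"]


def direction(from_centre, to_centre):
    """Classify the direction between two centre points into 45-degree sectors."""
    dx = to_centre[0] - from_centre[0]
    dy = from_centre[1] - to_centre[1]  # flip so that "up" is positive
    angle = math.degrees(math.atan2(dy, dx)) % 360
    # Shift by half a sector so that "right" covers -22.5 to 22.5 degrees.
    return DIRECTIONS[int(((angle + 22.5) % 360) // 45)]


short_exponent = direction((0, 0), (10, -10))  # centre close to the base
long_exponent = direction((0, 0), (30, -10))   # same height, centre further right
```

Here the short exponent is classified as "top-right", but the longer one, written at the same height, falls into the "right" sector, which is the problem shown in Figure 3.9.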
Figure 3.10: The shape of the "right" region that Lavirotte and Pottier use.
Figure 3.10 shows the region Lavirotte and Pottier use for "right". The shape of
this region is based on the fact that most mathematical notations, while 2D, are based
on normal reading, from left to right. As a result, symbols which have a "top-" or
"bottom-" aspect to them are typically close to the parent symbol, while horizontal
links (such as "left" and "right") can be far off. This is a good idea as it means that
fewer erroneous links will be built. The idea behind this technique has been used by this
system by making the height of the geometric regions involving a "top" or "bottom"
component less than that for "left" and "right".
Overlap Check

Arcs are not built between two nodes if any other bounding regions, belonging to other
symbols or subexpressions, are crossed by a line running between the centre points of
the symbols or subexpressions being linked. Within the subset of mathematical formulae
this system can currently process, and to the limit of my mathematical knowledge,
this will not restrict the system in any way.
Figure 3.11 shows a situation where an arc is not built. There is no link built
between the 2 and the 4 because the +'s bounding box is on the line between the
centres of the 2 and 4.

Figure 3.11: No arc is built between symbols that have other symbols between them.
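One way to implement such an overlap check is to clip the line between the two centres against each other bounding box. The sketch below uses a standard slab-clipping test and is an assumption about how the check could be done, not a description of this system's code; boxes are again taken to be (x, y, width, height).

```python
def centre(box):
    """Centre point of an (x, y, width, height) box."""
    x, y, width, height = box
    return (x + width / 2, y + height / 2)


def segment_crosses_box(p, q, box):
    """Does the straight line from p to q pass through the box?"""
    x, y, width, height = box
    t_enter, t_exit = 0.0, 1.0
    for start, end, low, high in ((p[0], q[0], x, x + width),
                                  (p[1], q[1], y, y + height)):
        delta = end - start
        if delta == 0:
            if not low <= start <= high:
                return False  # parallel to this pair of edges and outside them
            continue
        t_a, t_b = (low - start) / delta, (high - start) / delta
        t_enter = max(t_enter, min(t_a, t_b))
        t_exit = min(t_exit, max(t_a, t_b))
        if t_enter > t_exit:
            return False
    return True


def arc_is_blocked(a, b, others):
    """True if any other box lies on the line between the centres of a and b."""
    return any(segment_crosses_box(centre(a), centre(b), box) for box in others)


# Boxes from the example formula in Section 3.1.2.
x_box, two_box = (63, 42, 25, 39), (92, 16, 24, 21)
bar_box, four_box = (61, 85, 70, 8), (86, 100, 28, 43)
```

With these boxes the line from the x to the 4 passes through the fraction bar, so no arc would be built between them, while the line from the x to the 2 is clear.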
This process for building the graph works well, though it possibly builds too many
arcs in the graph due to too much leniency in the geometric test. Unfortunately, this
is a trade off between having too many arcs but being able to parse sloppy formulae,
and having a more modest number of links but requiring the input to be neat
and closely spaced.
The complexity of building all the arcs in the graph varies between O(n^2) and
O(n^3), where n is the number of nodes in the graph. The variability is due to the fact
that, if the initial O(1) type and geometric tests fail, then the arc building algorithm
can rule out having to carry out the O(n) overlap test.
Continuing the earlier example, Figure 3.6(b) shows the initial graph for the formula
with the arcs built. Note, for example, that there is no link between the "x" and "4".
Although the 4 is in a subscript position relative to the x, no link is built because it would
pass over the fraction bar.

3.1.5 Initial Graph Preprocessing

The next step is the preprocessing step. After observing that there is a subset of rules
in the grammar that can be applied first, then ignored for the rest of the processing,
a preprocessing grammar was made. This approach is not new, also being used by
Anderson (1968). The steps in the preprocessing grammar could be included in the
main grammar, if desired, though possibly at the expense of parsing time.
The result of applying all the rules in the preprocessing grammar, which do basic
categorisation of symbols and collect up multi-symbol items such as numbers, is shown
in Figure 3.6(c).
3.1.6 Main Processing
The main parsing occurs once the initial graph has been created and preprocessed.
The graph grammar is a set of rules which are templates describing the way various
mathematical constructs are made.
The system checks all the rules in the grammar against the current formula graph,
trying to find one that matches a subgraph in the current version of the formula's
graph. If it is unable to find a rule that matches, it backtracks the parsing process
to an earlier point that had more than one match and proceeds from there with an
alternative choice. The backtracking is controlled by a priority queue using a simple
heuristic. The heuristic chooses the graph with the smallest number of nodes. In
the case of a tie, it chooses the search that has applied the largest number of rules so
far.
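
The ordering used by the backtracking queue can be sketched as follows. This is only an illustration: the class and method names are hypothetical, and a binary heap from Python's standard library is used here for brevity in place of whatever structure the parser itself uses.

```python
import heapq
from itertools import count


class ParseQueue:
    """Backtracking queue: fewest graph nodes first, then most rules applied."""

    def __init__(self):
        self._heap = []
        self._order = count()  # insertion counter, breaks any remaining ties

    def push(self, node_count, rules_applied, state):
        # heapq pops the smallest key, so the rule count is negated to make
        # "more rules applied" win when node counts are equal.
        key = (node_count, -rules_applied, next(self._order))
        heapq.heappush(self._heap, (key, state))

    def pop(self):
        return heapq.heappop(self._heap)[1]

    def __len__(self):
        return len(self._heap)
```

Each saved parse state is pushed with its current node count and the number of rules applied so far; `pop` then returns the state the heuristic prefers.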
The first rule that matches in this example is the "superscript rule", shown previously
in Figure 3.1(b). This says that an "element" with another "element" to the
"top-right" is a superscript. The A and B nodes are collapsed into a single node, their
position and bounding box created from the two original nodes. Figure 3.6(d) shows
the new graph. The data value for the new node is created by taking the data values
of the original nodes, represented by A and B, and inserting them into the string
"A^{B}".
The method used for searching for such a matching subgraph within the main graph
is a brute force approach and is the main bottleneck of the parsing process. For every
rule in the graph grammar, a search is done in the current graph to see if it exists. To
find an n node rule graph in a g node formula graph, all n node sub-graphs of the
graph are generated, testing to see if any match the rule graph. The number of n node
sub-graphs of a g node graph is $C^g_n = \frac{g!}{n!\,(g-n)!}$. This search is repeated for every rule
in the grammar. So, as the size of the formula graph grows, or the size and number of
the rules in the grammar grow, the searching takes longer.
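
To give a feel for how quickly this brute force search grows, the following Python sketch counts the candidate node subsets examined in one pass over a grammar. The function names are hypothetical, and the sketch only enumerates the subsets; the further cost of comparing each candidate against the rule graph is not modelled.

```python
from itertools import combinations
from math import comb


def candidate_subgraphs(graph_nodes, rule_size):
    """Yield every set of rule_size nodes that could be tested against a rule."""
    return combinations(graph_nodes, rule_size)


def candidates_per_pass(graph_size, rule_sizes):
    """Total node subsets generated when every rule is tried once."""
    return sum(comb(graph_size, n) for n in rule_sizes)
```

For a 10 node formula graph, a single 3 node rule already gives 120 candidate subsets, and the total is summed again over every rule in the grammar.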
Bunke and Messmer (1997) describe a method for speeding up attributed graph
matching. This works by doing an initial pre-process of the rule graphs in the grammar
and finding common substructures in them. They end up with a tree structure that
describes how to build all the rule graphs progressively, starting from all the initial
types of nodes.

The main graph is then searched for the rule graphs by taking each node in the graph
in turn and testing all the rule graphs that could be built up starting with that
type of node. This testing involves seeing, as the rule graphs are progressively built,
whether they match the formula graph. Should they not match, the saving is made
due to the fact that all the rule graphs that shared that non-matching sub-part are
ruled out.
This technique was not used since, at the time the graph searching was implemented,
it was not clear whether or not optimisations would be necessary. As this was only
a prototype system, keeping the complexity of the system to a minimum was also
desirable.
The next step applies the "fraction" rule. Figure 3.6(e) shows the new graph.
Finally, the integral rule, shown in Figure 3.1(c), is applied. This gives the final graph,
shown in Figure 3.6(e). From this point a number of productions are applied that
change the node's data type from "Element" to "Formula", the parser's goal. It can
be seen that as the parsing proceeds, the "data" value inside the nodes builds up to
the final, in this case LaTeX, representation of the formula.
If the parser gets to a point where it is unable to match any of the rules in the
grammar to the current graph, it backtracks to an earlier point and tries alternative
productions. Because of the current implementation of the graph processor, not having
contextual information as Lavirotte and Pottier do, there are numerous cases where
more than one rule can be applied to a given graph. These alternatives are tracked
using a priority queue that prioritises based on the number of nodes that are in the
graph and the current parse depth.
In determining the order in which to apply rules, a priority system can be used where
each rule has a priority, either implicitly defined by its position in the grammar, or with
an integer associated with each rule (Pottier, 1995). A grammar can be constructed so
that only a maximum of one rule in the grammar can be applied at a time. Lavirotte
and Pottier (1997) take this latter approach, describing a semi-automatic method for
turning an ambiguous grammar to an unambiguous one by adding "context rules" that
describe the conditions which must be true before a certain rule is applied. The system
currently uses an implicit priority, based on the rule's position in the grammar, which
means at times erroneous productions are made and the parser has to backtrack.
The only optimisation in the otherwise brute force graph matching is that before
searching the graph to see if a rule matches, an initial test is done to check that all the
node types required by the rule exist somewhere within the graph.

Due to the implicit ordering of rules, a "perfect" parse almost always chooses the
first rule that matches from the grammar every time. After finding that the slowest
part of the parsing process was checking the rules in the grammar against the current
formula graph, the system always initially follows the first match found in the grammar.
When this approach finally runs out of choices, the system then backtracks and looks
for further matches.
The "Nothing Inside" Test

Before the system applies a rule from the grammar, an additional test is made to see
if the application of the rule would mean that other nodes end up inside the nodes
created as a result of the production.
As long as people do not write formulae with overlapping symbols, this does not
restrict the ability of the system to parse mathematical formulae. This approach was
independently developed by Miller and Viola (1998) who use the same condition, but
testing with convex hulls, to limit the application of rules in their stochastic grammar.

Figure 3.12: The "nothing inside" test. If a grammar rule collapsed the fraction $\frac{2}{4}$ to a single
node, the centre point of the 3 would end up inside its new bounding region. Because
of this, the application of the rule is not permitted.

This test is useful because it reduces the number of rules that can be applied in
a given situation, but does not limit the ability of the system to parse mathematical
formulae. In situations like that shown in Figure 3.12, it is possible to apply a fraction
rule that collapses the fraction part, leaving the 3 behind. If the fraction were to be collapsed, the
3 would end up inside the new bounding box. Noting this, we can avoid applying the
rule.
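
A minimal Python sketch of the "nothing inside" test is given here. It assumes bounding regions are plain rectangles written as `(left, top, right, bottom)` tuples and that the region of a collapsed node is the union of the regions it replaces; the function names are hypothetical.

```python
def union_box(boxes):
    """Smallest rectangle containing every box in the list."""
    lefts, tops, rights, bottoms = zip(*boxes)
    return (min(lefts), min(tops), max(rights), max(bottoms))


def centre(box):
    left, top, right, bottom = box
    return ((left + right) / 2, (top + bottom) / 2)


def nothing_inside(collapsing, bystanders):
    """True if no bystander's centre falls inside the collapsed node's box."""
    left, top, right, bottom = union_box(collapsing)
    for other in bystanders:
        cx, cy = centre(other)
        if left <= cx <= right and top <= cy <= bottom:
            return False  # the rule must not be applied
    return True
```

A rule is applied only when `nothing_inside` returns true for the nodes it would collapse, with every other node in the graph passed as a bystander.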
A problem arising from the "nothing inside" test is shown in Figure 3.13. A fraction
has been written so that the fraction bar overlaps the integral sign. When deciding
whether or not to collapse the integral $\int x^2\,dx$, the denominator of the fraction in
Figure 3.13, the bounding box that would be created is found to include the centre
point of the fraction bar, so the application of the rule is cancelled. This means that
the integral will never be collapsed, making the formula unparsable.

Figure 3.13: A problem with the no-inside restriction. Collapsing the integral is forbidden due to the fraction bar ending up inside it.
Fortunately, as people tend to avoid overlapping symbols as they write, the problem
of symbols overlapping each other like this is not common.
3.1.7 Parser Implementation

The graph rewriting parser uses two main data structures: a labelled directed graph
and a priority queue. A graph is represented within the system using:

• an adjacency matrix, that allows for quick adjacency tests.
• a collection of nodes, each having a number of attributes that store the identity
of the node, its lexical class, and the position of its bounding box on the input
area.
• a collection of arcs which store, in addition to which nodes they link, information
about the geometric relationship between these nodes.
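
The three parts of the graph representation listed above can be sketched as follows. This is an illustrative Python outline only; the class and field names are assumptions and do not come from the parser's source.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    identity: str        # e.g. the recognised symbol
    lexical_class: str   # e.g. "element"
    box: tuple           # bounding box as (left, top, right, bottom)


@dataclass
class Arc:
    source: int          # index of the parent node
    target: int          # index of the linked node
    relation: str        # geometric relationship, e.g. "top-right"


@dataclass
class FormulaGraph:
    nodes: list = field(default_factory=list)
    arcs: list = field(default_factory=list)
    adjacency: list = field(default_factory=list)  # square boolean matrix

    def add_node(self, node):
        for row in self.adjacency:
            row.append(False)  # new column for the new node
        self.nodes.append(node)
        self.adjacency.append([False] * len(self.nodes))
        return len(self.nodes) - 1

    def add_arc(self, source, target, relation):
        self.arcs.append(Arc(source, target, relation))
        self.adjacency[source][target] = True  # directed: source to target only

    def adjacent(self, source, target):
        """Constant-time adjacency test via the matrix."""
        return self.adjacency[source][target]
```

Keeping the matrix alongside the arc list trades some memory for the quick adjacency tests mentioned above, while the arc list retains the geometric label of each link.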
The priority queue is implemented as a doubly linked list, the items in the list
ordered by their score. Each item in the queue holds a copy of the formula graph,
along with additional information about the state of the formula processing at that
point.
3.2 Formula Straightening

Excessively sloped formulae can be difficult to process reliably, no matter how good the
underlying formula processor is. This is because the geometric relationships become
ambiguous.

Figure 3.14: A sloped or skewed formula can be difficult to process reliably.

Figure 3.14 shows a skewed formula. While the author perhaps intended it to be
$3x^y + 4$, it is most likely to be parsed as $3x^{y+4}$.
In sloped formulae, the positions of the symbols are most commonly skewed, but the
symbols themselves are not rotated. A brief investigation into the automatic straightening
of formulae was carried out. Least squares linear regression and Hough transforms
were used to attempt to determine the baseline of symbols entered. From the baseline
determined, formulae were then skewed to bring the baseline back to horizontal.
While it was occasionally helpful and showed initial promise, it had trouble with short
formulae, and formulae with exponentials or subscripts. For example, if the formula
shown in Figure 3.14 was meant to be $3x^{y+4}$, the slope correction would "correct" it
to $3x^y + 4$.
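
The least squares variant of this idea can be sketched in a few lines of Python. The sketch assumes each symbol is reduced to a single reference point (for example the centre of its bounding box) and that the correction is a vertical shear, so symbols are moved but not rotated; the function names are hypothetical, and the Hough transform variant is not shown.

```python
def fit_baseline(points):
    """Least squares line y = slope * x + intercept through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return 0.0, mean_y  # all points share one x: no slope can be estimated
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def straighten(points):
    """Shear the points vertically so the fitted baseline becomes horizontal."""
    slope, _ = fit_baseline(points)
    return [(x, y - slope * x) for x, y in points]
```

The sketch also shows why the approach struggles: a genuine exponent pulls the fitted line away from the true baseline, which is the failure described above for short formulae and formulae with exponents or subscripts.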
These ambiguities can be difficult to resolve in general, but attributes such as the
relative size of symbols can be used. For example: knowing that exponents are typically
smaller than the base symbol, top-right arcs will only be built if the target node is
smaller than the parent node. Care has to be taken, however, if the target node is a
subexpression in a formula.

3.3 Summary

This chapter described a basic graph rewriting formula parser. It works well for small
formulae, but slows down significantly as the size of the formula or grammar grows.
Any future development of the system should include more advanced techniques for
graph matching and take advantage of contextual information.
The graph rewriting parser is able to process formulae well, and is easy to experiment
with: adding and removing rules to the grammar for different mathematical
constructs is simple.


Chapter 4
The Interface
A prototype interface was built in order to test the hypothesis that a pen-based formula
entry system was a viable alternative to existing systems, and that a graph-grammar
based parser would be able to parse handwritten formulae well. In the development
of this interface a number of new interface concepts for the correction of character
recognition errors were created.
4.1 Aspects of User Interface Design

There are a large number of publications (Microsoft, 1995; Apple Computer, Inc., 1987;
Schumacher Jr., 1992; Smith and Mosier, 1986; Shneiderman, 1992) that discuss how
to design a good user interface, some offering "standard" interface design guidelines.
Two conclusions may be obtained from these publications. Firstly, the interface
should be consistent with other applications on that platform. Secondly, when a user
is required to perform a task, it should be consistent with their computing experience.
For evaluating the effectiveness and quality of user interface designs, Nielsen (1994)
has created a list of ten usability guidelines, which summarise most design guidelines.

• Visibility of system status. The system should always keep users informed about
what is going on, through appropriate feedback within reasonable time.
• Match between system and the real world. The system should speak the user's
language, with words, phrases and concepts familiar to the user, rather than
system-oriented terms. Follow real-world conventions, making information appear
in a natural and logical order.
• User control and freedom. Users often choose system functions by mistake and
will need a clearly marked "emergency exit" to leave the unwanted state without
having to go through an extended dialogue. Support undo and redo.
• Consistency and standards. Users should not have to wonder whether different
words, situations, or actions mean the same thing. Follow platform conventions.
• Recognition rather than recall. Make objects, actions and options visible. The
user should not have to remember information from one part of the dialogue to
another. Instructions for use of the system should be visible or easily retrievable
whenever appropriate.
• Flexibility and efficiency of use. Accelerators, unseen by the novice user, may
often speed up the interaction for the expert user such that the system can cater to
both inexperienced and experienced users. Allow users to tailor frequent actions.
• Aesthetic and minimalist design. Dialogues should not contain information which
is irrelevant or rarely needed. Every extra unit of information in a dialogue
competes with the relevant units of information and diminishes their relative
visibility.
• Help users recognise, diagnose, and recover from errors. Error messages should
be expressed in plain language (no codes), precisely indicate the problem, and
constructively suggest a solution.
• Error prevention. Even better than good error messages is a careful design which
prevents a problem from occurring in the first place.
• Help and documentation. Even though it is better if the system can be used
without documentation, it may be necessary to provide help and documentation.
Any such information should be easy to search, focused on the user's task, list
concrete steps to be carried out, and not be too large.
The use of colour in a user interface is also an important consideration. While the
use of colour can provide a lot of useful feedback to the user, the designer must consider
the fact that monochromatic displays are still used, as well as the existence of problems
such as colour-blindness. 1 in 11 males and 1 in 300 females are colour blind. 1 in 3
million people have complete colour blindness.

4.2 Pen Based Computing

The interface is designed primarily for pen input; however, a mouse can still be used as
all operations use at most one button. This single button operation is achieved with
the pen by pressing it against the tablet.
While some pens do offer additional buttons on their barrel or at the top end
of the pen, it can be inconvenient or difficult to use these with precision. Thus, all
functions in addition to basic drawing of strokes, such as editing operations, should
be conveyed to the system through the use of menus, toolbars, keystrokes, or
ideally specialised gestures.
There are a number of "standard" editing gestures used in pen based computing.
Books such as Microsoft's "The Windows Interface Guidelines for Software Design"
(Microsoft, 1995) describe a set of suggested gestures to use. Some of these
gestures are based on "traditional" proofreading marks, or reasonable abbreviations.
For example, as shown in Figure 4.1, a circled "u" means undo.

Figure 4.1: A suggested undo gesture.


With a system based on a character recogniser, it is inevitable that recognition errors
will occur as the user writes their formulae, no matter how advanced the recogniser
is. Humans often have trouble reading each other's writing, and, at times, people are
even unable to read their own. Thus, a simple method for correcting errors needs to
be provided.
4.3 A Pen Based Formula Entry System

A complete user interface for equation editing has been developed, using the recognition
and parsing modules described earlier. The interface allows handwritten entry of
mathematical formulae, correction of errors made in the automatic interpretation of
what the user enters, and basic equation editing.

Figure 4.2: A formula that has been entered into, and parsed by, the system.


Formulae can then be parsed to determine the LaTeX command string to generate
them, and the result is automatically passed through external tools to generate a
preview. Figure 4.2 shows a screenshot of the system. The formula has been written in
by the user, and parsed by the system which then presents a typeset result. The user
is able to copy the LaTeX code from the entry area at the top of the preview window
and insert it into their LaTeX document.
The remainder of this chapter describes the new system created. The user interface
controlling routines are written in Tcl/Tk. The parser, character recogniser, and stroke
grouping routines are written in C and C++. All implementation was done on a
180MHz Intel Pentium Pro based system, running X-Windows under Linux.
Each of the elements used to make the final system would be greatly improved if they
were able to take advantage of contextual information. Using contextual information
has been a focus of research in formula recognition (Anderson, 1968; Lavirotte and
Pottier, 1997; Miller and Viola, 1998). However, even with higher level information
and improvements in character recognition systems, errors are still possible. Knowing
this, designing an interface that simplifies the correction of such errors is important.

4.3.1 The Character Recogniser

The underlying character recogniser used by this system is the one used by Smithies,
Novins, and Arvo (1999). Symbols are encoded as collections of polylines representing
individual user-drawn strokes. The recogniser uses an extremely fast on-line recognition
algorithm based on nearest-neighbour classification in a feature space of approximately
50 dimensions. Rubine (1991) and Avitzur (1992) both use a similar feature-based
strategy.
To train the character recogniser, the user supplies ten to twenty handwritten samples
of each character. These samples are stored and used by the recogniser to recognise
the input characters.
Although the recogniser is theoretically user-dependent, the system is relatively
user-independent in practice. For example, even though the recogniser was trained
using samples supplied by just two people, others had little difficulty in using the
system.
For each group of strokes passed to the character recogniser, the top n interpretations
of those strokes are returned, along with a confidence for each interpretation.
This confidence information is important as it is used by the stroke grouping method,
described in Section 4.3.3.


To get higher recognition rates from the character recogniser, and thus improve the
performance of the stroke grouping, more versatile classifiers, such as neural nets (Yaeger,
Webb and Lyon, 1996), and perhaps the use of contextual information from later
processing stages, as described by Miller and Viola (1998), could be used.
Virtually any character recognition module can be incorporated into this system.
The only requirement imposed by the system on the recognition module is that it must
be capable of ranking the n most likely candidates for a single pattern by a numerical
measure of confidence, and that the confidence measures of different patterns must be
directly comparable.
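
The interface required of a recognition module can be sketched with a toy nearest-neighbour classifier in Python. Everything here is an assumption made for illustration: the function name, the representation of training samples as `(label, feature_vector)` pairs, and in particular the mapping from distance to confidence, which is simply `1 / (1 + distance)` so that scores from different patterns share one scale. Feature extraction from strokes is not modelled.

```python
import math


def top_n(features, samples, n=3):
    """Return up to n (label, confidence) pairs, most confident first.

    samples is a list of (label, feature_vector) training examples.
    """
    nearest = {}
    for label, vector in samples:
        distance = math.dist(features, vector)
        # Keep only the closest training sample for each label.
        if label not in nearest or distance < nearest[label]:
            nearest[label] = distance
    ranked = sorted(nearest.items(), key=lambda item: item[1])[:n]
    # Assumed confidence measure: 1.0 for an exact match, falling towards 0.
    return [(label, 1.0 / (1.0 + distance)) for label, distance in ranked]
```

Any module exposing a ranked list of this shape would satisfy the two requirements above: it ranks the n most likely candidates, and its confidences can be compared across patterns.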
4.3.2 Basic Input

Upon startup, the program is in "draw mode" which permits the user to enter strokes
into the system by drawing with the pen on the drawing tablet. As the user writes,
the system automatically interprets their strokes, after waiting for the user to get a
number of strokes ahead before it begins processing. Since the processing, described in
Section 4.3.4, annotates the user's input, this delay helps avoid any potential distraction
for the user, and also ensures that the processing will not interfere with the symbol
that the user is currently drawing.
Processing also automatically begins after a user definable period of inactivity,
defaulting to one second. Thus, the system will "notice" that the user has finished
entering their formula, and automatically catch up recognition of all outstanding strokes.
Alternatively, the user can tap the pen on the tablet or choose a menu option for the
same effect.
Other basic editing operations such as selection, moving and cutting are provided
as well, through a select and move mode. This lets the user drag a box around a region
containing the characters that they wish to be selected. The contents of the selection
can then be dragged around the screen at will, or deleted.
The user interface supports multi-level undo and redo, and allows for the loading,
saving and printing of formulae that the user has entered.
4.3.3 Stroke Segmentation

As the character recogniser works on a symbol by symbol basis, the stream of strokes
provided by the user drawing with the pen on the tablet must be divided up into
separate symbols.
There are a number of approaches that can be used for stroke segmentation. These
are discussed below.
Pauses Between Symbols

The user entering symbols is required to wait for a brief period, for example 500
milliseconds, between each symbol that they write. This delay allows the system to
tell when the user has finished each symbol.
This is appealing because it is easy to implement, and offers a high accuracy of
determining where each symbol ends. Unfortunately, it is frustrating for a user to
write like this as it is not natural. Being forced to concentrate on writing slowly also
takes away from the user's ability to concentrate on the formula itself.
Finer Timing Information

A small investigation was made to see if the delay between strokes of separate symbols
was longer than the delay between strokes in the same symbol. This investigation
was not in depth, and only provides an indication of the possible success of such
an approach. The idea was that strokes within symbols would be drawn with less delay
than strokes of separate symbols.
The two graphs in Figure 4.3 show the delays between strokes as the letters of the
alphabet were swiftly written out from a to z. The labels on the x-axis show which
strokes the delay was between. The label "a-b" means that this is the delay between
the letters a and b. Labels such as "x2" indicate the second stroke for the symbol "x".
The lighter shaded bars indicate the eight strokes which are second strokes for single
symbols.
If the assumption that shorter delays imply that two strokes belong to the same
symbol is correct, then this should be evident in Figure 4.3(a), which shows the sorted
delay data. The assumption holds true apart from the o-p, n-o, and i2-j delays. If
a threshold is to be determined, there is only a small difference in the duration of the delay
between the last of the gray bars and the beginning of the solid bars: the difference
between the k-k2 and d2-e delays is only seven milliseconds.
Looking at Figure 4.3(b), where the data is in its original entry order, suggests that
it may also be possible to determine which strokes belong to the same character by
noticing that they fall in troughs. Still, this is not reliable as the o-p and q-r delays
also come into this category, and the p-p2 delay, which should, does not.

Figure 4.3: Delays between strokes for writing the alphabet. (a) Sorted by delay. (b) In order of entry. Both panels plot the delay in milliseconds (0 to 400) against the pair of strokes the delay falls between.


Timing information on its own does not appear to be sufficient for segmenting
strokes into symbols. It is not possible to consistently determine where strokes should
be grouped together or separated. For such a scheme to work, variations due to the
user pausing or slowing down as they write would also have to be compensated for, and
thresholds would have to be automatically determined as the strokes are progressively
entered by the user.
Overlapping Strokes

Looking at the geometric relationship between strokes can yield some useful information.
If two or more strokes overlap, it can almost be said with certainty that they
belong to the same symbol. The doubt is only caused by the fact that a sloppy writer
may accidentally overlap adjacent symbols. From my experience, most people writing
with a pen and tablet are not inclined to overlap symbols, so this is not a problem.
We can make the assumption that any strokes that touch other strokes all belong
to the same symbol. Typically, if a person is writing with reasonable care then this is
not a problem.
Mathematical expressions typically consist of symbols taken from the set of Arabic
numerals, Roman and Greek alphabets, along with other miscellaneous symbols and
notations. Nearly all of these are drawn with overlapping strokes apart from a small
minority, for example: i, j, %, !, =, ) and . In these cases, the overlapping strokes
approach is going to fail.
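
The overlapping strokes approach can be sketched in Python as follows. Strokes are assumed to be polylines given as lists of `(x, y)` points, touching is tested segment by segment, and touching strokes are merged with a small union-find; all names are hypothetical and the sketch is not the grouping code used by the system.

```python
def _orient(a, b, c):
    """Cross product sign: which side of line a-b the point c lies on."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _within(a, b, c):
    """For c collinear with a-b: does c lie between a and b?"""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_touch(p1, p2, q1, q2):
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # proper crossing
    return ((d1 == 0 and _within(q1, q2, p1)) or (d2 == 0 and _within(q1, q2, p2))
            or (d3 == 0 and _within(p1, p2, q1)) or (d4 == 0 and _within(p1, p2, q2)))


def strokes_touch(s, t):
    return any(segments_touch(a, b, c, d)
               for a, b in zip(s, s[1:]) for c, d in zip(t, t[1:]))


def group_touching(strokes):
    """Return lists of stroke indices; touching strokes share a list."""
    parent = list(range(len(strokes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(strokes)):
        for j in range(i + 1, len(strokes)):
            if strokes_touch(strokes[i], strokes[j]):
                parent[find(j)] = find(i)
    groups = {}
    for i in range(len(strokes)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Given the two strokes of a "t" followed by the two strokes of an "=", the sketch merges the crossing pair but leaves the two bars of the "=" as separate groups, which is the failure case described above.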
Procedural Code to Group Strokes

If the distance between two strokes is less than some threshold, or if they satisfy some
other criteria, we can consider them to be the same character. For example, if there
are two "short" "horizontal" lines within y pixels of each other, then combine them to
make an "=". Similar rules could be written for other unconnected characters.
Unfortunately this means that for every unconnected character such a rule has to
be written, which limits the system's character set and makes it harder for an end user
to extend.
It is also possible to have a generic rule that says that strokes that are approximately
horizontally aligned, but vertically near to each other, are part of the same symbol.
This handles cases such as i, j, and =, but symbols such as  and ) need a different
approach.

Combined Stroke Grouping and Formula Parsing

Let the character recogniser recognise an "i" as a "." and an "ı" (the dot and the "i"
without the dot). These characters can then be pieced back together by a preprocessing
stage (Fateman et al., 1996) or the parser (Pottier, 1995) at some later point. This is
similar to using procedural code for grouping the strokes, but it is the preprocessor or
parser that is making the decisions. If it is done by the parser, it allows the system to
backtrack and correct any mistakes it might make.
The Approach Used by This System

Most of the methods considered for segmenting strokes into symbols did not appeal as
they either relied on heuristics, or forced the user to have a certain behaviour.
If the character recogniser is able to return confidence information for interpretations
of strokes, it is possible to automatically test different combinations of strokes
and pick the best. This is a new approach, although it is similar to that used by Yaeger,
Webb and Lyon (1996), which wasn't seen until afterwards.
The approach that the system uses is to combine overlapping strokes into indivisible
units, then use the character recogniser to test groupings of strokes and these indivisible
units. This solution works well, only limited by the strength of the underlying character
recogniser. While it is less reliable than other stroke segmentation methods, such as
the user pausing between characters, it does allow a much more natural and fluid entry
of symbols into the system.
The process makes two assumptions: strokes that cross belong to the same character
and all the strokes that belong to the one character will be drawn before the user moves
onto the next. In other words, all i's must be dotted and all t's crossed before the next
symbol is drawn. From casual observation of people writing with pen and paper, and
from observation of people using this system, neither of these assumptions interferes
with people's writing: most people tend to write like this anyway.
The process used to determine how to segment the strokes supplied by the user
works as follows:
• Determine the maximum number of strokes that a symbol can have. This can be
determined by analysing the character data set used by the character recogniser.
Call this maximum number m.
• Wait until the user has entered 2m strokes; this way there will be at least one
fully completed symbol to recognise.
• Collect together all strokes that cross or touch, and call each such collection a
single "unit". Put each of the remaining strokes into a unit of its own, so that each
of these units holds one stroke. There will be k units, where $k \le 2m$.
• Generate all possible groupings of up to the first $\min(k, m)$ of these units, and assign
a confidence level to each. Each grouping corresponds to a possible grouping of
strokes into symbols. The character recogniser is used to recognise the symbols in
each grouping and return a confidence level for each symbol. The confidence level
for a given grouping is the lowest confidence level across the symbols
recognised in that grouping. This technique, of the confidence level of a group as a
whole being equal to the lowest confidence within it, is often applied in expert
systems (Turban, 1992).
Two other methods investigated for determining the confidence of the overall
group were:
  – to use a product of the confidences of each symbol within a group. This
  unfairly penalises groups with more symbols.
  – to use the average confidence of the symbols in a group. Although this
  worked, it can boost what should intuitively be a low scoring group. For example,
  if a symbol has a confidence of near zero, other symbols contributing
  to the average can pull up the overall score.

The relative confidence scores for three sequences of confidence levels using these
three different methods are illustrated in the following table:

Symbol Confidences    Minimum    Average    Product
0.8 0.3 0.1           0.1        0.4        0.024
0.8 0.3 0.1 0.8       0.1        0.5        0.0192
0.8 0.0 0.1           0.0        0.3        0.0

• Of all the groups generated, select the group with the highest confidence and
return information on the group selected to the user interface. This information
includes the stroke groupings and what each group of strokes was recognised as.
The system also includes alternative recognitions for each character for use in
modify characters mode, described in Section 4.3.6.
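
The scoring and selection steps above can be sketched in Python. The three scoring functions reproduce the minimum, average and product methods compared in the table; `best_grouping` and the `recognise` callback (assumed to return a `(symbol, confidence)` pair for one group of strokes) are hypothetical names introduced only for this illustration.

```python
from math import prod


def score_minimum(confidences):
    """Method adopted in the text: a grouping is as good as its worst symbol."""
    return min(confidences)


def score_average(confidences):
    return sum(confidences) / len(confidences)


def score_product(confidences):
    return prod(confidences)


def best_grouping(groupings, recognise, score=score_minimum):
    """Pick the grouping whose recognised symbols score highest.

    groupings: iterable of groupings, each a list of stroke groups.
    recognise: callable mapping one stroke group to (symbol, confidence).
    """
    def grouping_score(grouping):
        return score([recognise(group)[1] for group in grouping])

    return max(groupings, key=grouping_score)
```

Swapping `score` for `score_average` or `score_product` makes it easy to see the effects described above: the average can hide a near-zero symbol, and the product falls as more symbols are added.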
With the current character training data, k = 4 so all groupings of 1, 2, 3, and 4
units are generated and tested. Due to the assumption that all strokes belonging to
the same symbol are drawn in order, we can avoid a combinatorial explosion. The total
number of combinations of up to the first k units is $2^k - 1$.

Figure 4.4: All groupings for four units. (Panels: One Unit, Two Units, Three Units, Four Units.)


Figure 4.4 illustrates the new method developed for generating all the groupings of
up to four units. To generate all groupings of n units, the tree is built starting with
a parent node of n 1's. The children of a node are then created by the addition of
all adjacent pairs of boxes. Children that already exist elsewhere in the graph are not
generated. The children that are not generated are represented by dotted boxes.
These groups indicate the groupings of strokes to try. For example, the group "1 2
1" means: "Take the first unit on its own, then the next two units together, then the
last unit on its own."
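
The set of groupings can also be enumerated directly, as in the Python sketch below. Note that this is a swapped-in technique: it lists the ordered ways of splitting n units recursively instead of building the tree of Figure 4.4, although both produce the same groupings. The function names are hypothetical.

```python
def compositions(total):
    """All ordered ways of writing total as a sum of positive integers."""
    if total == 0:
        return [()]
    result = []
    for first in range(1, total + 1):
        for rest in compositions(total - first):
            result.append((first,) + rest)
    return result


def groupings_up_to(k):
    """Groupings of the first 1, 2, ..., k units."""
    return [g for size in range(1, k + 1) for g in compositions(size)]


def apply_grouping(units, grouping):
    """Split the leading units into consecutive groups of the given sizes."""
    groups, start = [], 0
    for size in grouping:
        groups.append(units[start:start + size])
        start += size
    return groups
```

For four units this gives 1 + 2 + 4 + 8 = 15 groupings, in agreement with the $2^k - 1$ count, and the grouping `(1, 2, 1)` splits four units into a single unit, a pair, and a single unit.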
4.3.4 Online Annotation
As the user writes, the system runs a background thread which applies the above stroke
grouping technique and recognises the symbols that the user has written.
As mentioned in Section 4.3.2, there is a delay between the user's writing and the
system's processing of their strokes. The number of strokes ahead that the system
lets the user proceed is automatically determined by analysing the data used by the
character recogniser. The limit is set to twice the number of strokes in the character
with the largest number of strokes. With the current set of data for the character
recogniser, the lag is eight strokes.
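The lag rule amounts to one line of arithmetic. In the sketch below the training data is a made-up mapping from character to stroke count, and the name `stroke_lag` is hypothetical; only the rule itself (twice the largest stroke count) comes from the text.

```python
def stroke_lag(stroke_counts):
    """Strokes the user may get ahead: twice the largest per-character stroke count."""
    return 2 * max(stroke_counts.values())

# Hypothetical training data: character -> number of strokes used to write it.
training = {"x": 2, "=": 2, "+": 2, "E": 4}
print(stroke_lag(training))
```

With a largest character of four strokes this gives a lag of eight strokes, consistent with the figure quoted above.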
As characters are recognised, the system places a shaded bounding box over them
and annotates the box with the character that the system has determined it represents.

Figure 4.5: A user beginning to enter a formula. The first three characters have been
recognised, and the remaining two are still waiting to be recognised.
Figure 4.5 shows a screen capture of the drawing area of the program as a user
is beginning to enter a formula. The first three symbols have been recognised by the
system and their bounding boxes are marked and annotated with the system's current
interpretation. As a character is recognised, the colour of its strokes is changed to
indicate that the recognition has taken place.
4.3.5 Stroke Regrouping
The strokes entered by the user are initially grouped by the stroke grouping algorithm,
described in Section 4.3.3. The algorithm tries all possible groupings of strokes, ranking
each based on confidences returned by the character recogniser. As the success of this
algorithm relies on the accuracy of the character recogniser, it is possible that strokes
will be incorrectly grouped.
In a study of new users using the system with a character recogniser trained to
a style of writing similar to theirs, 13% of the characters they wrote were misgrouped
by the automatic stroke grouping process. This does improve if the user has trained
the character recogniser.
A simple and effective method is required to fix grouping errors. There are two
possible situations that the user has to correct:

• Strokes that should be recognised as a single character are grouped as parts of
separate characters, or
• Strokes that should be recognised as parts of separate characters are grouped into
a common character.
The user can correct both of these types of errors by entering modify stroke groups
mode. In this mode, drawing with the pen will temporarily mark out a line. Upon
finishing the line, any strokes that were touched by that line are forced into a group
of their own, possibly causing a regrouping of other strokes. The temporary line then
disappears, and the system automatically reruns the character recogniser on all affected
groups. This technique has been called "squiggle select", as the user can either draw a
line or squiggle over the strokes they want grouped together. SGI Inperson (SGI, 1999)
uses the same technique for selecting objects in a multi-user collaborative white-board
application.
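The regrouping step can be sketched as below. This is a simplified illustration under stated assumptions: strokes are identified by integer ids in writing order, the set of strokes touched by the temporary line has already been computed, and leftover strokes simply stay in their old group (the system itself may regroup them differently). The name `squiggle_regroup` is hypothetical.

```python
def squiggle_regroup(groups, touched):
    """Force the touched strokes into one group of their own.

    groups  -- list of lists of stroke ids (one list per character)
    touched -- set of stroke ids crossed by the temporary line
    Returns (new_groups, affected), where affected lists the groups that
    should be passed to the character recogniser again.
    """
    touched = set(touched)
    new_groups, affected = [], []
    for group in groups:
        rest = [s for s in group if s not in touched]
        if rest:
            new_groups.append(rest)
            if len(rest) != len(group):  # this group lost some strokes
                affected.append(rest)
    forced = sorted(touched)
    if forced:
        new_groups.append(forced)
        affected.append(forced)
    new_groups.sort(key=min)  # keep groups in writing order
    return new_groups, affected

# Joining two separately grouped strokes, then splitting a merged group.
print(squiggle_regroup([[0], [1], [2, 3, 4]], {0, 1}))
print(squiggle_regroup([[0, 1, 2, 3]], {0, 1}))
```

The same operation covers both kinds of error: touching strokes from different groups joins them, and touching a subset of one group splits it.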
Drawing a line through the strokes to be grouped is better than circling or drawing
a box around the strokes. A single line is able to target specific strokes easily, and is
easier to draw. If there is a group of closely spaced strokes, it is easier to pick out
individual ones with a single line.
Figure 4.6 shows the modify stroke groups mode being used to correct the two types
of grouping errors. Figure 4.6(a) shows the initial state, in which the strokes in the "=",
the "4" and the "2" are not correctly grouped.
First, the user draws a line through the two strokes of the "=" that should be
combined into a single group, as shown in Figure 4.6(b). Figure 4.6(c) shows the result
after the pen was lifted. Note that the temporary line has disappeared and the "="
has now been correctly recognised.
To split the "4" and the "2" apart, the user draws a line through one or more
strokes that should be split off from the larger group. In Figure 4.6(d), a line is drawn
through the two strokes of the "4". A line through the "2" would have had the same
effect. Figure 4.6(e) shows the final formula, with the strokes now correctly grouped
and recognised.
(a) Initial grouping.
(b) The user indicates that two strokes should be grouped together.
(c) The system displays the regrouped and re-recognised characters.
(d) The user indicates that two strokes should form their own group.
(e) The final result.

Figure 4.6: Modifying stroke groupings.

4.3.6 Modify Characters
No matter how good the underlying character recogniser is, errors in the recognition
of symbols are still going to occur. The modify character mode allows the user to
click on a misrecognised symbol's shaded bounding box and select from a pop up
menu the correct interpretation for that symbol. The pop up menu contains the best
choices, currently the top five, supplied by the character recogniser for that grouping of
strokes. If there are repeated symbols in the top five, due to the fact that the character
recogniser is able to recognise multiple styles for individual symbols, all but one of the
occurrences are removed. Should the character that the user desires not appear on
this pop up menu, the user may choose an enter option, and type the correct character
from the keyboard. Symbols that do not appear on the keyboard are entered using a
long-hand name. For example, "Σ" is entered as "Sigma". The user is also able to choose
non-keyboard symbols from a toolbar.
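Building the pop up menu from the recogniser's ranked results can be sketched as follows. The function name `menu_choices` and the example ranking are hypothetical; the sketch assumes the recogniser returns symbols best first, possibly with the same symbol appearing more than once because several writing styles matched.

```python
def menu_choices(ranked_symbols, top_n=5):
    """Take the recogniser's top_n results and drop repeated symbols, keeping order."""
    choices = []
    for symbol in ranked_symbols[:top_n]:
        if symbol not in choices:
            choices.append(symbol)
    return choices

# Two styles of "2" and two styles of "z" appear in the top five.
print(menu_choices(["2", "z", "2", "Z", "z", "7"]))  # ['2', 'z', 'Z']
```

Because duplicates are removed after the top five are taken, the menu can end up with fewer than five entries, as in this example.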
Figure 4.7 shows a user correcting a misrecognised character in modify character
mode. The "z" in Figure 4.7(a) that the user drew was misrecognised as a "2". When
the user clicks on the character, a pop up menu appears, as shown in Figure 4.7(b).
Selecting the correct choice from this menu then overrides the recogniser. Figure 4.7(c)
shows the corrected character.
Even though the pop up menu correction method is easy and intuitive, users found
the process of correcting character interpretation errors somewhat tedious if the correct
alternative was not on the pop up menu. Having to resort to manually entering the
character is distracting as it requires that the user switch from using the pen to using
the keyboard. High character recognition rates are therefore very important, and any
serious user of the system must take the time to train the recogniser with their own
handwriting.
4.3.7 Parsing and Preview
When the user selects the menu option to parse the formula, the system pops up a
window that shows the workings of the parser as it attempts to process the formula.
The parsing can be cancelled by the user at any point.
Figure 4.8 is a screenshot of the system part-way through processing a formula.
This is the same formula as shown in Figure 4.2. In the screenshot you can see the
graph rewriting formula processor displaying its current graph. As the parsing proceeds
this graph is updated. This graph was primarily intended as a debugging aid and is
of little use to most users of the system. However, someone who understands how
the system works can use it to diagnose why their formula may be taking a long time
to parse, causing erroneous parsing, or causing the parser to be unable to parse the
formula at all.
Once the formula has been parsed and a LaTeX string generated for it, external
tools are used to generate a preview image which is then displayed. While the parsing
can at times be quick, less than a second for small formulae, the preview generation is
currently slow, taking eight to ten seconds per formula.

(a) Initial interpretation.
(b) Selecting the correct interpretation from the pop up menu.
(c) The corrected character.

Figure 4.7: Correcting a misrecognised character.

Figure 4.8: A formula being processed.

Figure 4.9: The LaTeX preview window.

Figure 4.9 shows the window that appears after the preview is generated. The
LaTeX command string at the top of the window is in a text entry area and can be
copied and pasted into the user's LaTeX document. The user is also able to edit this
command string and press the redo preview button. This regenerates the preview for
the new, edited, command string. However, editing in this field does not change the
formula entered in the main pen-based formula editor. This facility is provided for the
user if they wish to make small changes and quickly check their effects.
Should the parser be unable to process the formula successfully, it indicates to the
user what the best parse found was. This is described in Section 4.3.8.

4.3.8 Correcting Equation Parsing Errors

The most difficult problem that the user faces is that the formula parser sometimes
fails to recognise the user's formula. If this happens, the user has to rearrange their
formula so that it is parsable.
If the system is unable to parse the formula, it shows the "best" parse
found so far by boxing sections of their formula. Figure 4.10 shows the display that
the system gives the user when it is unable to parse the user's formula. In this example,
there is a single limit on the integral, but the parser's grammar does not accept this
construct. As a result, the best that the parser was able to do was reduce the underlying
graph to two nodes: one for the integral itself, and one for the extra limit. This is
displayed to the user by outlining the bounding boxes of these two nodes, with the
outlines overlaid on the user's input.

Figure 4.10: The display for a formula that the system was unable to parse.


The graph grammar based parser allows for some leniency in the placement of characters
so it usually parses formulae that are part of the grammar on its first attempt.
Nonetheless, significant deviations in placement from what the grammar expects can
cause parsing failures. In such a case, the user must manually realign the input characters
by using the select and move mode.
Other problems that lead to the misparsing of a formula include misgrouped or
misrecognised characters that are not noticed by the user. When the formula does not
parse, the user has to find these mistakes and correct them using the modify stroke
groups or modify character modes.


Chapter 5
User Testing
Until an application has undergone user testing, it is not possible to confidently say
whether or not it is truly effective and useful. User testing can provide insights into
how people may use a system, in contrast to how you may envisage them using it.
Those who develop a system may find it easy to use and understand because they have
been using it over the period it was developed, and thus are unable to give an unbiased
evaluation of a system. They are too familiar with it.
In a user test a number of people, representative of the final intended users of
an application, use it and give feedback. Usually, they will be observed and asked
questions to determine if the design of an application is logical, understandable and
usable, and if there were any flaws and inconsistencies.
The main motivation for user testing this system was to evaluate the desirability of
a pen-based formula entry system: to see if it is something that people would want and
use over existing systems, such as command-string or template-based editors. It also
provided the opportunity to see how people found the user interface ideas: the modify
stroke groups with a squiggle select and the pop up menu for overriding the character
recogniser. It also provided the opportunity to gather data on the accuracy of the new
stroke grouping algorithm and the performance of a graph rewriting formula parser on
handwritten input.
This chapter discusses aspects of conducting user testing: designing the test, choosing
participants, ethical considerations, running the test itself and the post-test analysis.
This chapter also discusses usability inspection, another method for evaluating
user interfaces.

5.1 Designing the Test

There are a number of ways to conduct user testing. The first, and possibly the most
common, is the "thinking aloud" method (Dumas and Redish, 1993). When doing this
sort of testing, an observer sits with the participant as they use the system and the
participant is asked to voice every thought that they have related to using the system.
If the participant forgets to talk, the observer prompts them. Also, if more information
on the thoughts that the user is having is required, the observer can ask them what they
are thinking. Ideally, the observer is not anyone who wrote the system or has a vested
interest: impartiality is desirable.
While this method has the advantage that you can "get inside" the user's head
and find out what exactly they are thinking, the user has to remember to always voice
their thoughts. Remembering to vocalise one's thoughts, and doing so constantly, is a
skill that very few people have, as it is something that people do not usually do. Vocalising
thoughts also has the disadvantage that it slows the person down and breaks the
natural flow of operation and thought that they might normally have while using an
application (Wildman, 1995).
Paired user testing (Wildman, 1995) is an approach that tries to overcome the
awkwardness of getting a person to think aloud as they work. In this approach, the
user testing of a system is done by pairs of users, who are both working with the
program on a single machine at the same time. The users are instructed to discuss
what they are thinking and doing. When they strike a problem, they are encouraged
to discuss the problem and how they are going about solving it.
Paired user testing gives a much more natural and relaxed interaction with the
system. It can be argued that results gleaned from such a study are more representative
of a real world situation as it reflects the fact that in the real world people tend to rely
on their peers for advice and help with applications that they are not familiar with.
With this approach, it is easier to gain insight into how they solve the problems that
they face.
For the user testing of this system, the participant used the system while an observer
(myself) watched them and helped them when they were not able to solve problems
themselves. After they had finished using the system, the observer then discussed
any issues that may have arisen throughout the testing. This was supplemented by an
anonymous questionnaire and a verbal question session.
The thinking aloud method was not used as it would have meant that a large amount
of time would have had to be spent transcribing and analysing the users' comments.
This is something that was outside the scope of this thesis. The system is also relatively
small so, as the aim of the testing was to get feedback on the pen-based entry and new
interface ideas, the questionnaire and verbal questions were sufficient.
5.2 Choosing Participants

The ideal users for testing a system are those in the application's target population,
having a full knowledge of the terminology and techniques in their field, and a knowledge
of the tasks that an application will be required to be capable of. Testing with
people from the target population also means that they have a comparable computing
skill level to that the end users will have.
Landay (1997) recommends that if such ideal users are unavailable, a "closest
approximation" can be used instead. For example, if a system is being targeted for
doctors and it is not possible to arrange to have a real doctor test your system, then
a medical student can do the testing with almost equivalent skill, and provide similar
feedback to what the doctor would have given.
The goal of testing this system was to get feedback on pen-based formula entry
systems, and the new interface concepts created. A number of people were chosen that
effectively represent the people who could be expected to use a formula entry system:
mathematicians, physicists, computer scientists, and high-school students.
One important aspect to consider is the number of participants to involve in the
user testing. If too few are used, only a small number of problems in an application
will be found. Too many, and a lot of time will be spent with minimal returns for
the additional time and effort of organising and analysing the additional participants'
results and responses.
A large amount of the literature on doing user testing and usability studies discusses
this important issue of how many people to involve. The curve shown in Figure 5.1 is
typical for the number of evaluators and the proportion of problems found with a user
interface, taken from Nielsen (1994).
This suggests that the "ideal" number of people to test or evaluate a system lies
between four and ten. Nielsen (1994), for example, believes that evaluation tends
to work best with three to five evaluators. Dumas and Redish (1993) suggest six to
twelve participants for user testing. These figures apply to large systems, such as word
processors or email programs. This system is relatively small in comparison.

Figure 5.1: The proportion of problems with a user interface found as the number of
evaluators is increased. (Axes: Number of Testers; Proportion of Problems Found, 0 to 100.)
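Curves of this shape are often modelled, following Nielsen and Landauer, as 1 - (1 - L)^n, where n is the number of evaluators and L is the proportion of problems a single evaluator finds. The sketch below is an illustration of that model only: the value L = 0.31 is a commonly quoted average, used here as an assumption, and is not a figure measured in this thesis.

```python
def proportion_found(testers, detection_rate=0.31):
    """Expected share of problems found by `testers` independent evaluators."""
    return 1.0 - (1.0 - detection_rate) ** testers

# Diminishing returns: each extra tester finds fewer new problems.
for n in (1, 3, 5, 10):
    print(n, round(100 * proportion_found(n)), "%")
```

Under this assumption five evaluators find roughly 84% of the problems, which is in line with the recommendation of a small number of evaluators.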
A total of nine participants took part in the testing, though one of these solely
observed another participant using the system and made comments based on what he
saw, without using the system himself. The nine participants consisted of two high
school students, two physics postgraduates, two mathematics postgraduates, and one
postgraduate and two undergraduate computer science students.
There are a number of different ways that participants can be obtained for user
testing. You can advertise for participants, then screen them as they apply to see if
they are suitable. You can go through a recruitment agency, giving them a list of the
criteria that the participants must meet, and they will do the screening for you. It is also
considered acceptable to use personal networks to get participants (Dumas and Redish,
1993), though this is not ideal. Care has to be taken because, as such participants are
likely to know you personally, their impartiality may be limited. They may be inclined
to be kinder in their responses and opinions, hoping to tell you what you would prefer
to hear, rather than be honestly critical. Balancing this is the possibility that friends
may be less shy during the test, as they are not apprehensive about saying
critical things about something to someone that they do not know.
The people who tested this system comprised three people I, the observer, had
never met before, and six people I already knew: either as friends of friends or personal
friends.
Of the people who used the system, six had not seen it before, two had seen it, and
only one had used it before. This meant that there was little bias from participants'
previous experience or knowledge of the system. Only two of the users had used a pen
and tablet before, unfortunately resulting in the remainder of the participants having
to become familiar with the pen and tablet as they used the system. As a result, this
tended to decrease the ease with which they were able to use the system. From my
personal experience, the time it takes for a person to become comfortable and accurate
with a pen and tablet varies between hours and days.
5.3 Ethical Considerations

Ethical considerations are one important thing that must be addressed when doing
user testing. These are the aspects that relate to how the user doing the testing is
treated and how they might possibly feel as a result of the testing.
It is likely that the person testing the system can get the impression that it is
they who are being tested, and not the system itself. This is quite easy to believe,
especially if the testing involves the person being video recorded, along with them
being either directly observed by a person sitting beside them, or through one-way
glass. Landay (1997) tells of how users have left user testing sessions in tears as a
result of the pressure they felt and their frustration with and embarrassment from not
being able to use the application they were testing.
The user testing participants must be aware that participation is voluntary and
that they are able to leave at any time they wish to, as well as being allowed to take
breaks as they need them. Ideally, a user test should last no more than one or two
hours (Dumas and Redish, 1993). Aspects of the environment that the user is in need
to be considered as well; you should make the user physically comfortable. The aim is to
get the participant sufficiently relaxed so that they behave as normally as possible,
thus giving a real indication of how the system would be used in the "real world."
If the environment is set up properly, the participants' comments will relate to the
application and not external factors.
For the testing of this system, an entire session with a single user would take
about an hour. This time includes an introduction to the system, the testing of the
system, and then the oral and written questionnaire. Over this hour long period, it
was casually noticed that after about forty minutes the user was likely to start getting
bored or frustrated with problems that they might be having. This time did vary
somewhat, though, depending on the amount of trouble they were having and the
degree of enthusiasm that they had towards taking part in the testing.
Some organisations require formal ethical approval before allowing any user testing
to take place. In the ethical proposal, the researcher sets out what they propose
to do and how they have addressed the possible issues that may arise as a result
of their testing. At the University of Otago, where this research was carried out,
researchers must go through an approval process before being allowed to do user testing.
This process involves the researcher submitting a document outlining the testing
that they intend to do, detailing the intended treatment of the participants,
confidentiality, informed consent, the participant's right to withdraw and the intended
use of the data gathered. The ethical statement prepared is included in Appendix B.
In accordance with these rules a consent form was created, reproduced in Appendix C,
for the participants to read and sign. It outlined what they were going to be
doing and what was going to happen to them. It described the conditions under which
the testing was going to be done, and what was going to be done with the information
gathered. As each participant's activities were video taped, it was explained that the
purpose of the video camera was to record the screen and that the tapes were to be
later reviewed to gather information about the behaviour of the system. A monitor
connected to the video camera allowed them to see that only the screen was being
recorded. The participants were also informed verbally of the points covered in the
consent form.
5.4 The Test Itself

As the use of a pen and tablet was new to most users and most of the users had
not used, or even seen, this system before, each testing session started with an initial
training phase. The users were able to become familiar with the use of a pen and tablet
and with using the system. The training was conducted by first giving
them a short demonstration of the system, then the observer guiding them through a
set of practice formulae. These formulae were similar in complexity and structure to
those that were used in the main testing phase. When the users felt that they were
ready, or had worked through all the formulae, they moved on to the main testing phase.
During the training the possible additional pressure of being observed was minimised
by not having the video camera running.

To guide the testing session, users were given a set of five formulae to work through,
starting with a simple one and progressively increasing in complexity. Having a set of
predetermined formulae meant that all the participants were doing the same things.
As a result, it was possible to make comparisons and compute statistics across the users
who took part. It also means that if, in the future, further testing is done, comparisons can
be made across users using the system under different conditions.
The five formulae that were entered by the users for the unaided part of the user
testing are shown below. These formulae are representative of the complexity of formulae
that the formula processor's current grammar can handle.
(1)
(2)
(3)
(4)

Z +4
x + 4dx
Z x +4
4 dx
X
z + 4z + 2
z
Z (2x + 4x)
pz dx = 8

=0

(5)

As the participants used the system, their activities were recorded by a video camera
pointed at the computer's screen for later analysis. Dumas and Redish (1993) give
floor plans of a number of professional user testing suites, all having a number of video
cameras trained on the user and computer, and often additional observers watching
through one-way glass.
An additional way to monitor participants' activities is to modify the application,
or run an additional program, so that an electronic record of the user's actions is kept.
Individual keystrokes and mouse activity can be recorded. Automatically generated
reports on various aspects of their use of the user interface can then be produced, as
well as providing logs for later analysis. The user testing of this system relied solely
on the video recordings of the user's screen and any observations made by the observer
watching the users use the system.
On the conclusion of each testing session, the participants were asked to respond to
both oral and written questions about the system. The advantage of a written or
online questionnaire is that nobody has to transcribe the participant's answers, and it
also offers the opportunity for anonymity: the participant can freely express themselves
without fear of retribution. Having a verbal question session means that the questioner
is able to ask additional questions to get more information in relation to answers that
the participant has given, or as a result of something the participant did during the
testing. It is important to design the questions so that they are non-leading. It is also
advisable to have some redundancy in the questions that you are asking: designing
questions with overlap.
For the user testing of this system, the participants filled out an anonymous written
questionnaire, then answered a number of questions that were asked verbally by
the observer. The anonymous questionnaire, included in Appendix D, gathered basic
information to gauge the user's experience with computers and formula-entry type
systems. It then asked questions to get the user's overall opinion of the system, and
how they found it in comparison to other systems that they may have already used.
The verbal questions, included in Appendix E, sought the participant's opinions of the
new interface concepts developed. Some overlap did exist, as a number of comments
received in the verbal questioning duplicated answers given in the anonymous
questionnaire.
5.5 Post-test Analysis

After the testing was completed, it was necessary to analyse the information gathered
and to draw conclusions from it. Landay (1997) suggests the following:

• summarise the data gathered for statistical measures such as error rates.
• make a list of all the "critical incidents." These are all the major points that
occurred, both positive and negative.

Analysis of the above will indicate if the user interface really did work as expected.
This analysis may result in the creation of a list of modifications and improvements
that should be made to the system.
After the user testing was complete, the video recordings of the computer's screen
were reviewed, and information on the error rates and the time it took them to enter each
formula was recorded. All responses to the anonymous and verbal questionnaires were then
collated.
The above results were then analysed. The conclusions are presented and discussed
in Chapter 6.

5.6 Usability Inspection

An alternative to user testing is usability inspection (Nielsen and Mack, 1994). A
number of experts in the field of user interface design methodically work through an
application and evaluate its usability. There are a number of methods for evaluating
usability. For example, it can be judged using a set of heuristics (a set of ideals for
user interface designs), or against a cognitive model of the way that people interact
with computer programs. The experts are experienced in the design of user interfaces,
and can reliably find a large proportion of the major and minor problems in a user
interface's design.
Unfortunately, these experts do not come cheap. A cost of US$500 to US$1000
per evaluator is typical (Nielsen, 1994). While this approach works best with interface
evaluation experts, either software engineers or users who are knowledgeable in the
application's target field can be substituted for the experts. The non-experts are
introduced to and taught about user interface evaluation, then allowed to evaluate the
system.
While not using experts reduces the cost, the benefits gained are also fewer. From
the studies done by Desurvire (1994), it is seen that the proportion of problems in a
system that experts found was five to ten times more than non-experts found. This
finding was consistent across all different types of usability inspection methods. The
experience that experts have with usability guidelines, user testing and the design of
user interfaces leads to more effective usability testing and reporting of problems with
user interfaces.
Formal methods for evaluating user interfaces, while they may be successful in
finding user interface flaws, are still restricted by the fact that they do not use real
users. The interfaces are evaluated by an expert using a set of ideals or an approximate
psychological model of humans. The other problem with experienced evaluators
is that, while they may be skilled in evaluating user interfaces, they may not know
much about the field an application is targeted for and the skills that people in that field
have, for example how much experience they have in using computers. Nielsen and
Mack (1994) state that usability inspection, having experts analyse the interface, is
not yet a substitute for user testing with real users. There is no completely accurate
model of the human mind predicting how people will think and react when using a
user interface. It is hard to predict what a real user will actually do when faced with
a given situation. Even the reactions across a number of real users, all faced with the
same situation, vary. Because of this, a common approach in an application's design
cycle is to do a usability inspection first, then, after revising the system and
addressing the issues found, do full user testing.
Nielsen and Mack (1994) point out that when choosing between doing no testing at
all and doing some testing, no matter how limited in scope, formality, number of
participants, or skill of the testers, the limited testing is still better than none at all.
You are still going to get some insights into the usability of an interface and get
suggestions and ideas on how it could be improved.
A true user interface analysis was approximated by a self-evaluation against the ten
usability guidelines that Nielsen (1994) proposes, which were discussed in Section 4.1.
The results of this are presented in Section 6.2.


Chapter 6
Evaluation
This chapter first reports the findings based on the user testing: overall impressions,
working styles, timing information and error rates. It then evaluates the user interface
with respect to how the test users found it. After evaluating the effectiveness of the
user testing, it gives an informal usability evaluation of the user interface with respect
to Nielsen's ten usability guidelines (Nielsen, 1994). This chapter ends with a summary
of the positive and negative aspects of the system.
While a total of nine participants gave a good indication of the usability of the system,
statistical results generated from the user testing information should be treated as
preliminary results. For more statistically valid results, a larger number of participants
is required.
6.1 User Testing

Participants in the user testing found the interface easy to learn, easy to use and
effective for entering formulae. A number of the participants also asked when a fully
featured version of the system would be available. When asked to rate the style of
interaction that this system used against other systems they had used (typically
Microsoft's equation editor or LaTeX), on a scale of 0 (Worse) to 5 (Better), the answers
were all at or above 3, with an average of 4.2.
Testers who were mathematicians, those who would have the highest requirements
of such a system, remarked positively on the system and what it was able to do. Their
main criticism was that there was only a limited subset of mathematical formulae that
the system could understand, thus being of no present value to them. As the system
is based on a grammar loaded at run-time, this grammar can be edited so that it can
perform as they desire. Ideally a GUI tool would enable end users to work with the
grammar, adding or changing any mathematical constructs as they desire.
Participants in the user testing filled in an anonymous questionnaire and then answered verbal questions. The answers given for the anonymous questionnaire are included in Appendix F. Answers given during the verbal questioning are in Appendix G.
6.1.1 Working Styles

After a short period of using the system, all but one of the participants in the user
testing settled into a style of working in which they would write the entire formula,
then go back and make any necessary corrections. This style of operation proved to
be a more efficient and comfortable method of formula entry than stopping to correct
any grouping or recognition errors as they occurred.
This working style may have been partly induced by the fact that the system
required the user to interrupt the flow of drawing in order to press a button on the
toolbar, to change the system from drawing to one of the two correction modes. Had it
been possible to correct mistakes without having to go through a "change-mode" step,
this working style may not have been so prominent.
6.1.2 Time Spent Entering and Correcting Formulae

Figures 6.1 to 6.5 show the drawing and correction times for each of the five
test formulae. Figure 6.6 shows the total times taken by each participant across all the
formulae. There is no total time for participant seven, as they did not enter formula
two or formula three. Figure 6.7 is a pie graph showing the proportion of time spent
by the average user in the various stages of entering and correcting formulae.
Time spent by users fixing their own mistakes, for example writing the wrong symbol
and then having to delete it and rewrite it correctly, is included in the drawing time.
The drawing time also includes waiting for the interface to time out and recognise the
outstanding strokes that the user has drawn.
Over the period of entry for the five formulae, the total entry times for the
participants in the user testing ranged from 117 to 236 seconds, with an average of 194
seconds. An expert user who is familiar with the system is able to enter the same
formulae in around 68 seconds.
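As a rough illustration of the size of this gap, the sketch below computes how many times longer than the expert each of the quoted totals is. It uses only the four times quoted above; the function name slowdown and the constant names are chosen here for illustration and are not part of the system.

```python
# Total entry times quoted in the text, in seconds, for the five test formulae.
FASTEST_S = 117   # quickest test participant
SLOWEST_S = 236   # slowest test participant
AVERAGE_S = 194   # average over the participants with a total time
EXPERT_S = 68     # approximate time for an expert user


def slowdown(observed_s: float, expert_s: float = EXPERT_S) -> float:
    """Return how many times longer than the expert an entry took."""
    return observed_s / expert_s


for label, seconds in [("fastest", FASTEST_S),
                       ("average", AVERAGE_S),
                       ("slowest", SLOWEST_S)]:
    extra = seconds - EXPERT_S
    print(f"{label}: {seconds} s, {slowdown(seconds):.1f} times the expert, "
          f"{extra} s longer")
```

On these figures the average participant took a little under three times as long as the expert, and even the fastest participant took well over one and a half times as long.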
The difference between the ideal and observed time is due to the fact that most
participants did not write quickly with the pen and tablet, due to unfamiliarity and

Figure 6.1: Times for people entering Formula 1. (Chart "Drawing and Correction Times for Formula 1": time in seconds for each participant and the average, split into Drawing, Correcting Grouping, Correcting Recognition and Correcting Positions.)

Figure 6.2: Times for people entering Formula 2. (Chart "Drawing and Correction Times for Formula 2": time in seconds for each participant and the average, split into Drawing, Correcting Grouping, Correcting Recognition and Correcting Positions.)



Figure 6.3: Times for people entering Formula 3. (Chart "Drawing and Correction Times for Formula 3": time in seconds for each participant and the average, split into Drawing, Correcting Grouping, Correcting Recognition and Correcting Positions.)

Figure 6.4: Times for people entering Formula 4. (Chart "Drawing and Correction Times for Formula 4": time in seconds for each participant and the average, split into Drawing, Correcting Grouping, Correcting Recognition and Correcting Positions.)



Figure 6.5: Times for people entering Formula 5. (Chart "Drawing and Correction Times for Formula 5": time in seconds for each participant and the average, split into Drawing, Correcting Grouping, Correcting Recognition and Correcting Positions.)

Figure 6.6: Times for people entering all the formulae. (Chart "Drawing and Correction Times for All Formulae Combined": time in seconds for each participant and the average, split into Drawing, Correcting Grouping, Correcting Recognition and Correcting Positions.)

