Markus Borg Department of Computer Science Lund Institute of Technology Lund University
Outline
1. Introduction
2. Corpus description
3. Event and time detection
4. Ordering events
5. Evaluation
6. Conclusions
1. Introduction
Minute-by-minute reporting is popular
Subscribers get information in real time
Examples: Aftonbladet, UEFA
Certain advantageous characteristics:
Concrete information
Short reports
The sentences are already ordered
Only the sentence-internal order is of interest
The extracted information opens up possibilities:
Display the action
Enable collection of statistics
Objective:
Discover events and time information
Sort the events
Structure the collected information
2. Corpus description
Online management game
960 707 users (May 9)
Running since August 1997
40 languages
Estonia world champions!
Sentences are produced from 170 events
An event has five wordings on average
A result of different language administrators
Suitability: The texts are vivid yet limited
Quantity: More than a million reports each week
Availability: All reports are easily accessible from the server
Format: No graphics, lines, indenting, etc.
Sample report (English):
In the 18th minute cheers broke out as Nicolas Jullien found his way through the guests' central defence, clipping the 1 - 0 goal in for Rydebäck. Daniel Fridquist of Rydebäck received a yellow card in the 20th minute for unsportsmanlike behaviour. In the 22nd minute of the match, the visitors' central line of defence had to look on as Mikael Martinsson dashed through, knocking home 2 - 0 for Rydebäck.

Sample report (Swedish):
Efter 18 minuters spel bröt jublet lös då Nicolas Jullien kom igenom gästernas mittförsvar och dundrade in 1 - 0 för Rydebäck. Daniel Fridquist i Rydebäck tilldelades efter 20 minuter gult kort för osportsligt uppträdande. I den 22:e matchminuten fick gästernas mittförsvar se sig rundat av Mikael Martinsson som slog in 2 - 0 för Rydebäck.
3. Detection
Annotation scheme:
TimeML (modified subset)
Tags used: <TIMEX3>, <EVENT>, <TLINK>
Prototype in Java
Uses the built-in package for regular expressions
50 lines of code
Finding time expressions
Only absolute time
Few regular expressions needed
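A minimal sketch of how absolute time expressions could be found with Java's built-in regular expressions. The two patterns are assumptions derived from the Swedish sample report ("Efter 18 minuters spel", "efter 20 minuter", "I den 22:e matchminuten"); they are not the prototype's actual expressions. The class and method names are hypothetical, and the `<TIMEX3>` output carries only a `tid` attribute as a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TimeExpressionSketch {
    // Assumed patterns, modelled on the sample report only:
    //   "efter N minuter" / "efter N minuters spel"
    //   "i den N:e matchminuten" (also accepts the ":a" ordinal suffix)
    private static final Pattern ABSOLUTE_TIME = Pattern.compile(
        "[Ee]fter ([0-9]+) minuter(?:s spel)?"
        + "|[Ii] den ([0-9]+):[ae] (?:match)?minuten");

    /** Returns the match minutes mentioned in the report, in text order. */
    static List<Integer> findMinutes(String report) {
        List<Integer> minutes = new ArrayList<>();
        Matcher m = ABSOLUTE_TIME.matcher(report);
        while (m.find()) {
            // Exactly one of the two capture groups is set per match.
            String digits = m.group(1) != null ? m.group(1) : m.group(2);
            minutes.add(Integer.parseInt(digits));
        }
        return minutes;
    }

    /** Wraps every detected time expression in a simplified TIMEX3 tag. */
    static String annotate(String report) {
        StringBuilder out = new StringBuilder();
        Matcher m = ABSOLUTE_TIME.matcher(report);
        int last = 0;
        int id = 1;
        while (m.find()) {
            out.append(report, last, m.start());
            out.append("<TIMEX3 tid=\"t").append(id++).append("\">")
               .append(m.group())
               .append("</TIMEX3>");
            last = m.end();
        }
        out.append(report.substring(last));
        return out.toString();
    }
}
```

Because the reports only use absolute match minutes, a single alternation is enough here; relative expressions would need more patterns and some form of anchoring.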
Finding events
Only goal-scoring events
More challenging...
45 lines of code
(reducera|kvittera) till [0-9]+ - [0-9]+
drygade [\\w] ut (sin ledning|ledningen till [0-9]+ [0-9]+)
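The two expressions above can be tried out with a sketch like the following. It makes three assumptions that the slide does not confirm: the player name is matched with `\\w+` rather than the single-character `[\\w]`, the score in the second expression is written with the same ` - ` separator as in the first, and `Pattern.UNICODE_CHARACTER_CLASS` is set so that `\\w` also covers Swedish letters. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GoalEventSketch {
    // Adapted from the slide's two expressions (see assumptions in the lead-in).
    private static final Pattern GOAL_EVENT = Pattern.compile(
        "(reducera|kvittera) till [0-9]+ - [0-9]+"
        + "|drygade \\w+ ut (sin ledning|ledningen till [0-9]+ - [0-9]+)",
        Pattern.UNICODE_CHARACTER_CLASS);

    /** Returns the matched goal-scoring phrases in text order. */
    static List<String> findGoalEvents(String report) {
        List<String> events = new ArrayList<>();
        Matcher m = GOAL_EVENT.matcher(report);
        while (m.find()) {
            events.add(m.group());
        }
        return events;
    }

    public static void main(String[] args) {
        String report = "Gasterna kunde kvittera till 1 - 1. "
            + "Senare drygade Martinsson ut ledningen till 3 - 1.";
        for (String event : findGoalEvents(report)) {
            System.out.println("<EVENT>" + event + "</EVENT>");
        }
    }
}
```

Each of the five or so wordings per event would need its own alternative, which is one plausible reason why event detection is described as more challenging than time detection.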
4. Ordering
Baseline
Assume that the events happened in the same order as they appear in the text
Second strategy
Utilize the division into types
Types
Given by grouping the regular expressions
Example event words: Dribbling (dribble), TV-räddning (TV save), Inkast (throw-in), Nicktoucha (glancing header), Reflexräddning (reflex save), Språngnick (diving header)
Type numbering
0. RESULTCHANGE
1. SAVE
2. FINISH
3. PREFINISH
4. IDLEBALL
5. OTHER
Higher numbers happen before lower ones
If many of the same type, order as in the text
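The two ordering strategies can be sketched as follows: the baseline keeps the text order, and the type-based strategy sorts by type number in descending order with a stable sort, so that events of the same type keep their text order. The `Event` class, the labels, and the method names are hypothetical and only serve the illustration; the assignment of a label to a type is likewise assumed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class EventOrderingSketch {
    // Declared in the order of the slide, so ordinal() equals the type number.
    enum Type { RESULTCHANGE, SAVE, FINISH, PREFINISH, IDLEBALL, OTHER }

    static final class Event {
        final String label;
        final Type type;

        Event(String label, Type type) {
            this.label = label;
            this.type = type;
        }
    }

    /** Baseline: events happened in the order they appear in the text. */
    static List<Event> baseline(List<Event> inTextOrder) {
        return new ArrayList<>(inTextOrder);
    }

    /** Second strategy: higher type numbers first, ties keep text order. */
    static List<Event> orderByType(List<Event> inTextOrder) {
        List<Event> ordered = new ArrayList<>(inTextOrder);
        // List.sort is stable, so equal types stay in text order.
        ordered.sort(Comparator.comparingInt((Event e) -> e.type.ordinal()).reversed());
        return ordered;
    }

    /** Convenience helper for printing or comparing an ordering. */
    static List<String> labels(List<Event> events) {
        List<String> result = new ArrayList<>();
        for (Event e : events) {
            result.add(e.label);
        }
        return result;
    }
}
```

For a sentence that mentions the goal first and the build-up afterwards, the type-based strategy moves the build-up (PREFINISH) ahead of the finish and the result change, which the baseline cannot do.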
5. Evaluation
Experimental setup
25 reports in the training set, used for crafting the patterns:
8 higher divisions
7 mid-leagues
5 lower divisions
5 national games
Absolute time expressions
Recall: 100%
Event detection
Recall: 79.4%
Precision: 87.5%
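For reference, recall and precision are computed from counts of correct detections (true positives), missed items (false negatives), and spurious detections (false positives). The sketch below uses the standard definitions; the counts in `main` are made up for illustration and are not the counts behind the figures above, which the slides do not give. The class and method names are hypothetical.

```java
final class DetectionMetricsSketch {
    /** Recall: share of the annotated items that the system found. */
    static double recall(int truePositives, int falseNegatives) {
        int total = truePositives + falseNegatives;
        if (total == 0) {
            throw new IllegalArgumentException("no annotated items");
        }
        return (double) truePositives / total;
    }

    /** Precision: share of the system's detections that were correct. */
    static double precision(int truePositives, int falsePositives) {
        int total = truePositives + falsePositives;
        if (total == 0) {
            throw new IllegalArgumentException("no detections");
        }
        return (double) truePositives / total;
    }

    public static void main(String[] args) {
        // Illustrative counts only: 3 correct, 2 missed, 1 spurious.
        System.out.printf("recall=%.3f precision=%.3f%n",
            recall(3, 2), precision(3, 1));
    }
}
```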
6. Conclusions
Real-time generated reports interesting
Presented a system to extract time information
Absolute time is easy to find
Dividing events into types is useful
Simple techniques can be useful for closed-domain texts
Should be possible to extend to other domains
Future work
Other event categories
More robust system: part-of-speech tagger, word sense disambiguation (WSD)
Information about participants