
Time Extraction from Real-time Generated Football Reports

Markus Borg Department of Computer Science Lund Institute of Technology Lund University

Outline
1. Introduction
2. Corpus description
3. Event and time detection
4. Ordering events
5. Evaluation
6. Conclusions

1. Introduction
Minute-by-minute reporting is popular
Subscribers get info in real time

1. Introduction
Aftonbladet

1. Introduction
UEFA

1. Introduction
Certain advantageous characteristics

Concrete information
Short reports
The sentences are already ordered
Only sentence-internal order is of interest

1. Introduction
Information can give possibilities:
Display the action
Enable collection of statistics

1. Introduction
Objective:
Discover events and time information
Sort the events
Structure the collected information

2. Corpus description

Online management game
960 707 users (May 9)
Running since August 1997
40 languages
Estonia world champions!

2. Corpus description
Sentences produced from 170 events
An event has on average five wordings
Result of different language administrators

2. Corpus description
Suitability
The texts are vivid yet limited

Quantity
More than a million reports each week

Availability
All reports are easily accessible from the server

Format
No graphics, lines, indenting etc.

2. Corpus description
In the 18th minute cheers broke out as Nicolas Jullien found his way through the guests' central defence, clipping the 1 - 0 goal in for Rydebäck. Daniel Fridquist of Rydebäck received a yellow card in the 20th minute for unsportsmanlike behaviour. In the 22nd minute of the match, the visitors' central line of defence had to look on as Mikael Martinsson dashed through, knocking home 2 - 0 for Rydebäck.

Efter 18 minuters spel bröt jublet lös då Nicolas Jullien kom igenom gästernas mittförsvar och dundrade in 1 - 0 för Rydebäck. Daniel Fridquist i Rydebäck tilldelades efter 20 minuter gult kort för osportsligt uppträdande. I den 22:e matchminuten fick gästernas mittförsvar se sig rundat av Mikael Martinsson som slog in 2 - 0 för Rydebäck.

3. Detection
Annotation scheme:
TimeML (modified subset)
<TIMEX3> <EVENT> <TLINK>

Full example later

3. Detection
Prototype in Java
Built-in package for regular expressions used
50 lines of code

3. Detection
Finding time expressions
Only absolute time
Few regular expressions needed

(I|i) (den)? [0-9]+:e (match|spel)?minuten
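A minimal sketch of how such a pattern could be applied with Java's built-in `java.util.regex` package. The class name `TimeExpressionFinder`, the method `findMinutes`, and the sample sentence are assumptions made for illustration; the pattern is adapted from the one above (the optional "den" carries its own trailing space, and a capturing group is added around the minute).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the prototype's actual code.
class TimeExpressionFinder {
    // Adapted from the slide's pattern; group 1 captures the minute number.
    private static final Pattern MINUTE =
        Pattern.compile("[Ii] (?:den )?([0-9]+):e (?:match|spel)?minuten");

    // Returns every minute mentioned as an absolute time expression, in text order.
    static List<Integer> findMinutes(String text) {
        List<Integer> minutes = new ArrayList<>();
        Matcher m = MINUTE.matcher(text);
        while (m.find()) {
            minutes.add(Integer.parseInt(m.group(1)));
        }
        return minutes;
    }

    public static void main(String[] args) {
        // Illustrative sentence, not taken from the corpus.
        String report = "I den 22:e matchminuten gjorde Mikael Martinsson 2 - 0.";
        System.out.println(findMinutes(report));
    }
}
```

Because only absolute minute expressions are targeted, a single pattern of this kind already covers several surface variants ("I 5:e minuten", "i den 5:e spelminuten").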

3. Detection
Finding events
Only goal scoring events
More challenging...
45 lines of code

(reducera|kvittera) till [0-9]+ - [0-9]+
drygade [\\w] ut (sin ledning|ledningen till [0-9]+ - [0-9]+)
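As a sketch of how one of these event patterns might be used, the snippet below applies the first pattern and reads out the verb and the new score. The class `GoalEventFinder`, the method `findGoalEvent`, and the sample sentences are hypothetical; capturing groups around the score are an addition for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the prototype's actual code.
class GoalEventFinder {
    // First goal pattern from the slide, with capturing groups added for the score.
    private static final Pattern REDUCE_OR_EQUALIZE =
        Pattern.compile("(reducera|kvittera) till ([0-9]+) - ([0-9]+)");

    // Returns "<verb> <home>-<away>" for the first match, or null if none is found.
    static String findGoalEvent(String sentence) {
        Matcher m = REDUCE_OR_EQUALIZE.matcher(sentence);
        if (!m.find()) {
            return null;
        }
        return m.group(1) + " " + m.group(2) + "-" + m.group(3);
    }

    public static void main(String[] args) {
        // Illustrative sentence, not taken from the corpus.
        System.out.println(findGoalEvent("Hemmalaget kunde kvittera till 1 - 1."));
    }
}
```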

4. Ordering

Crucial part of the application

Connecting time and events


<TLINK // type=DURING>

Connecting different events


Two approaches tested

4. Ordering
Baseline
Assume that the events happened in the same order as they appear in the text

Second strategy
Utilize the division into types

4. Ordering
Types
Given by grouping the regular expressions

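Skott (shot)
Inlägg (cross)
Hörnspark (corner kick)
Dribbling (dribble)
TV-räddning (spectacular save)
Inkast (throw-in)
Nicktoucha (flick on with the head)
Reflexräddning (reflex save)
Språngnick (diving header)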

4. Ordering
Types
0. RESULTCHANGE
1. SAVE
2. FINISH
3. PREFINISH
4. IDLEBALL
5. OTHER

Higher numbers happen before lower ones
If many of same type, order as in the text
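The type-based rule can be sketched as a stable sort on the type number in descending order: higher-numbered types come first, and events of the same type keep their order from the text. The names `EventOrdering`, `Event`, and `orderByType` and the sample events are assumptions for illustration; the enum order mirrors the numbering above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the type-based ordering, not the prototype's actual code.
class EventOrdering {
    // Declared in the slide's order, so ordinal() equals the type number 0..5.
    enum Type { RESULTCHANGE, SAVE, FINISH, PREFINISH, IDLEBALL, OTHER }

    static final class Event {
        final String text;
        final Type type;

        Event(String text, Type type) {
            this.text = text;
            this.type = type;
        }

        @Override
        public String toString() {
            return text + "(" + type + ")";
        }
    }

    // Input: events in the order they appear in the sentence.
    // Output: events in assumed chronological order.
    static List<Event> orderByType(List<Event> inTextOrder) {
        List<Event> ordered = new ArrayList<>(inTextOrder);
        // List.sort is stable, so equal types keep their text order.
        ordered.sort(Comparator.comparingInt((Event e) -> e.type.ordinal()).reversed());
        return ordered;
    }

    public static void main(String[] args) {
        List<Event> sentence = new ArrayList<>();
        sentence.add(new Event("goal", Type.RESULTCHANGE));
        sentence.add(new Event("cross", Type.PREFINISH));
        sentence.add(new Event("shot", Type.FINISH));
        System.out.println(orderByType(sentence));
    }
}
```

The baseline strategy corresponds to returning the input list unchanged.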

5. Evaluation
Experimental setup
25 reports in training set for crafting:
8 higher divisions
7 mid-leagues
5 lower divisions
5 national games

3 reports in the test set:
games with 4 goals

5. Evaluation
Absolute time expressions
Recall: 100%

Event detection
Recall: 79.4%
Precision: 87.5%

Event order (sentence accuracy)
Disregarding: Baseline: 66.7%, Types: 100%
Keeping: Baseline: 66.7%, Types: 83.3%

6. Conclusions
Real-time generated reports interesting
Presented a system to extract time information

6. Conclusions
Absolute time easy to find
Dividing events into types is useful
Simple techniques can be useful for closed domain texts
Should be possible to extend to other domains

6. Conclusions
Future work
Other event categories
More robust system, part of speech tagger, WSD
Information about participants

Time extraction in Swedish
A machine learning approach to extract temporal information from texts in Swedish and generate animated 3D scenes.
Berglund et al. (2006)

Football related extraction
Ontology-based information extraction with SOBA.
Buitelaar et al. (2006)

Ordering within sentences
Inferring sentence-internal temporal relations.
Lapata and Lascarides (2004)
