
Dataset hpl-tamil-iso-char-online-1.0 & hpl-tamil-iso-char-offline-1.0
Updated Jun 8, 2006

The dataset hpl-tamil-iso-char-online-1.0 contains samples of 156 character classes
collected from different writers using a TabletPC application. Most writers contributed two
samples (trials) per class; a few contributed as many as 10. The directory structure is the same
for both the online and offline datasets, except that the root directory in the offline case is
named hpl-tamil-iso-char-offline-1.0. The offline data contains bilevel TIFF images derived from
the online data. Details of the directory structure, file contents and statistics regarding the number
of samples per class are provided in the following sections.

1. Directory Structure

hpl-tamil-iso-char-online-1.0
    usr_16
        000t01.txt ... 155t10.txt
    usr_50
        000t01.txt ... 155t02.txt
    usr_229
        000t01.txt ... 155t02.txt

The hpl-tamil-iso-char-online-1.0 directory is the root directory for the online data.
The hpl-tamil-iso-char-offline-1.0 directory is the root directory for the offline data.
Ink data is organized by writer into subdirectories of the form usr_<user-id>, e.g., usr_16
corresponds to the writer with user-id 16.
User-ids are in the range 16 to 229, but are not contiguous.
Ink data is stored in files of the form <3-digit class-id>t<trial-id>.txt, e.g., 008t03.txt is the 3rd
trial of the character with class-id 008.
Class-ids are in the range 000 to 155.
Trial-ids are in the range 01 to 02 for most users and 01 to 10 for some. However, they are not
guaranteed to be contiguous, since bad samples may have been removed.
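
The naming rules above can be turned into a small directory walker. The following is a minimal sketch, assuming the online dataset has been unpacked locally and that trial-ids are always written with two digits (as in the examples 008t03.txt and 155t10.txt). The names INK_NAME, parse_ink_name and iter_samples are illustrative and are not part of the dataset distribution.

```python
import re
from pathlib import Path

# Assumed pattern, inferred from the naming rule above:
# 3-digit class-id, the letter 't', 2-digit trial-id, '.txt' extension.
INK_NAME = re.compile(r"^(\d{3})t(\d{2})\.txt$")


def parse_ink_name(name):
    """Return (class_id, trial_id) as ints, or None if the name does not match."""
    match = INK_NAME.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def iter_samples(root):
    """Yield (user_id, class_id, trial_id, path) for each ink file under root."""
    for user_dir in sorted(Path(root).glob("usr_*")):
        suffix = user_dir.name[len("usr_"):]
        if not (user_dir.is_dir() and suffix.isdigit()):
            continue  # skip anything that is not a usr_<user-id> directory
        for path in sorted(user_dir.iterdir()):
            parsed = parse_ink_name(path.name)
            if parsed is not None:
                yield int(suffix), parsed[0], parsed[1], path
```

Because user-ids and trial-ids are not contiguous, the walker lists whatever is on disk rather than looping over an assumed numeric range.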

2. Ink File Contents


In each file, ink data is represented in UNIPEN v1.0 format, as shown below.
The channels reported for each ink point are X, Y and T. Files corresponding to some
users have valid T (time) values for the first and last points of each stroke, with
intermediate values set to 0. For other users, the time channel is set to 0 for all points.
Since the ink was captured using Microsoft Ink Picture Controls on a TabletPC, the spatial
resolution and sampling rate correspond to the interpolated ink returned by the control,
and are higher than what is supported by the hardware.
.VERSION 1.0
.HIERARCHY CHARACTER
.COORD X Y T
.SEGMENT CHARACTER
.X_POINTS_PER_INCH 2500
.Y_POINTS_PER_INCH 2500
.POINTS_PER_SECOND 1200
.PEN_DOWN
935 523 0
935 523 0
935 523 0
935 520 0
935 517 0
935 514 0
935 511 0
.PEN_UP

For more information on the UNIPEN format, please refer to http://unipen.nici.ru.nl/unipen.def
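
As an illustration of how such a file might be read, here is a minimal sketch of a parser for the subset of UNIPEN keywords that appears in the sample above. It is not a full UNIPEN implementation; the function name parse_unipen and the choice to keep header values as strings are assumptions made for this example. Coordinates are read as floats so that non-integer values, if any occur, do not cause an error.

```python
def parse_unipen(text):
    """Parse the UNIPEN subset shown above into (header, strokes).

    header maps keyword -> raw string value (e.g. "X_POINTS_PER_INCH" -> "2500").
    strokes is a list of strokes; each stroke is a list of (x, y, t) tuples.
    """
    header = {}
    strokes = []
    current = None  # points of the stroke being read; None while the pen is up
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("."):
            keyword, _, value = line[1:].partition(" ")
            if keyword == "PEN_DOWN":
                current = []
            elif keyword == "PEN_UP":
                if current is not None:
                    strokes.append(current)
                current = None
            else:
                header[keyword] = value.strip()
        elif current is not None:
            x, y, t = (float(v) for v in line.split()[:3])
            current.append((x, y, t))
    if current is not None:  # tolerate a file that ends without .PEN_UP
        strokes.append(current)
    return header, strokes


# A shortened copy of the sample from this section.
SAMPLE = """.VERSION 1.0
.COORD X Y T
.X_POINTS_PER_INCH 2500
.Y_POINTS_PER_INCH 2500
.PEN_DOWN
935 523 0
935 520 0
.PEN_UP
"""

header, strokes = parse_unipen(SAMPLE)
# Convert the first point to inches using the resolution stated in the header.
x_inches = strokes[0][0][0] / float(header["X_POINTS_PER_INCH"])
```

Since T is 0 for most or all points, code that consumes the strokes should not rely on timestamps being present.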

3. Samples per class

Class-id    No. of samples
0           527
1           521
2           522
3           509
4           518
5           525
6           513
7           522
8           512
9           501
10          497
11          522
12          517
13          497
14          508
15          508
16          524
17          523
18          513
19          492
20          518
21          521
22          519
23          508
24          516
25          515
26          503
27          503
28          512
29          498
30          521
31          477
32          271
33          509
34          469
35          508
36          490
37          515
38          512
39          512
40          511
41          516
42          506
43          520
44          511
45          518
46          519
47          519
48          518
49          515
50          518
51          521
52          524
53          511
54          500
55          271
56          511
57          475
58          515
59          486
60          513
61          524
62          510
63          520
64          504
65          502
66          518
67          508
68          514
69          509
70          512
71          514
72          500
73          507
74          502
75          509
76          513
77          511
78          270
79          512
80          509
81          519
82          502
83          502
84          506
85          509
86          516
87          512
88          514
89          508
90          516
91          515
92          521
93          510
94          520
95          494
96          502
97          507
98          519
99          488
100         460
101         501
102         495
103         512
104         505
105         525
106         522
107         522
108         523
109         521
110         522
111         509
112         519
113         418
114         495
115         507
116         511
117         507
118         528
119         504
120         512
121         488
122         443
123         426
124         272
125         470
126         457
127         455
128         438
129         266
130         440
131         439
132         516
133         487
134         507
135         516
136         524
137         523
138         524
139         506
140         513
141         520
142         523
143         508
144         525
145         516
146         508
147         510
148         518
149         520
150         511
151         484
152         271
153         508
154         472
155         516
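
The per-class counts listed above could be checked against a local copy of the online data by counting ink files per class-id. The sketch below assumes the directory layout and file naming described in Section 1; samples_per_class is a hypothetical helper, and whether its output matches the published counts depends on the copy of the dataset at hand.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed file-name pattern: 3-digit class-id, 't', 2-digit trial-id, '.txt'.
INK_NAME = re.compile(r"^(\d{3})t(\d{2})\.txt$")


def samples_per_class(root):
    """Count ink files per class-id across all usr_<user-id> directories."""
    counts = Counter()
    for path in Path(root).glob("usr_*/*.txt"):
        match = INK_NAME.match(path.name)
        if match is not None:
            counts[int(match.group(1))] += 1
    return counts


def print_counts(root):
    """Print one 'class-id count' line per class, in class-id order."""
    for class_id, n in sorted(samples_per_class(root).items()):
        print(f"{class_id:3d} {n}")
```

Calling print_counts("hpl-tamil-iso-char-online-1.0") from the directory that contains the dataset root would print the counts in the same order as the table.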