Académique Documents
Professionnel Documents
Culture Documents
Pr o f. Tu l a s i Pr a s a d S a r i k i ,
S CS E , V I T C h e n n a i C a m p u s
w w w. l e a r n e r s d e s k . w e e b l y. c o m
Contents
Text Processing
Text Preprocessing
Challenges in Text Preprocessing
Types of Writing Systems
Even for a single language like English no single encoding was adequate
for all the letters, punctuation, and technical symbols in common use.
Unicode covers all the characters for all the writing systems of the world,
modern and ancient. It also includes technical symbols, punctuations, and
many other characters used in writing text. The Unicode Standard is
intended to support the needs of all types of users, whether in business
or academia, using mainstream or minority scripts.
Character Encoding
◦ ASCII, ISCII, Unicode
Font Encoding
◦ Eenadu, vaartha, Kumudam , Daily Thanthi