Fulbright Grant: Research Proposal

STATEMENT OF PROPOSED STUDY OR RESEARCH
Pamela Fox | Australia | Computer Sciences

A World Wide Web of Comparative Linguistics
Overview: The goal of this project is to develop a visual interface that allows the
user to input a word in any available language and interactively view the
etymological history of that word on a map of the world. The project has
several objectives: developing an intuitive visual interface for displaying
etymological information to non-linguists in an entertaining yet educational way;
increasing a sense of global connectedness by showing how languages are intertwined
with one another; discovering the best techniques for describing to a computer database
how words in different languages are connected to one another; and identifying the most
efficient method for storing words and discovering connections.
System Description: For example, after the user inputs “pass” in English, the system
would draw an arrow from England to France, which would display “passer,” and then to
Italy, which would display “passus.” Mousing over any word would display information
on its meaning and language (e.g. “step” and “Latin” for “passus”). By clicking on a word
from one of the non-modern languages and choosing the “Find Descendants” option, the
user would have the system find all words derived from that word. For example, clicking on
the Latin “passus” would show the user that the Spanish word “paso” and
another English word, “pace,” were derived from it. The user would then realize that
modern-day “pace” and “paso” share a related meaning because of their common
ancestor, and can be called cognates: words or morphemes related by derivation,
borrowing, or descent. The system would also include an option to
automatically search for and display all the possible cognates of a word. For words
that are simply borrowed from another language with no derivational change (called
“loan words”), a visually distinct arrow would be used to represent that type of
connection. Notably, typical dictionaries give only the ancestral etymology of a
given word, but since this system will combine the etymological knowledge of every
word, it can easily provide descendant information (the forward links) as well.
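The backward and forward links described above could be stored as a directed graph. The following is only a minimal sketch of that idea, not the design the project would necessarily adopt: the class name EtymologyGraph, its methods, and the "derived"/"loan" edge labels are all hypothetical, and the only data used is the “pass” example from this section.

```python
from collections import defaultdict, deque

class EtymologyGraph:
    """Hypothetical store of etymological links; words are (form, language) pairs."""

    def __init__(self):
        self.parents = defaultdict(list)   # word -> [(ancestor, kind)]
        self.children = defaultdict(list)  # word -> [(descendant, kind)]

    def add_link(self, child, parent, kind="derived"):
        # kind is "derived" or "loan"; a loan link would get a distinct arrow.
        self.parents[child].append((parent, kind))
        self.children[parent].append((child, kind))

    def _walk(self, start, edges):
        # Breadth-first traversal that returns every word reachable from start.
        seen, queue = [], deque([start])
        while queue:
            word = queue.popleft()
            for nxt, _kind in edges.get(word, []):
                if nxt != start and nxt not in seen:
                    seen.append(nxt)
                    queue.append(nxt)
        return seen

    def ancestors(self, word):
        return self._walk(word, self.parents)

    def descendants(self, word):
        return self._walk(word, self.children)

    def cognates(self, word):
        # Words sharing an ancestor with `word`, excluding its own lineage.
        lineage = self.ancestors(word)
        found = []
        for ancestor in lineage:
            for other in self.descendants(ancestor):
                if other != word and other not in lineage and other not in found:
                    found.append(other)
        return found

# Data taken from the example in this section.
PASS, PASSER = ("pass", "English"), ("passer", "French")
PASSUS, PASO, PACE = ("passus", "Latin"), ("paso", "Spanish"), ("pace", "English")

graph = EtymologyGraph()
graph.add_link(PASS, PASSER)
graph.add_link(PASSER, PASSUS)
graph.add_link(PASO, PASSUS)
graph.add_link(PACE, PASSUS)

print(graph.ancestors(PASS))  # backward links, as in a typical dictionary
print(graph.cognates(PASS))   # words reached through the forward links
```

Keeping both the parent and child indexes is what makes the “Find Descendants” query as cheap as the ancestral lookup; a dictionary that records only ancestors would have to scan every entry to answer it.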
System Extensions: Here are several extensions I have already thought of; I
expect more to arise while I am conducting the research. When the information is
available to do so, a word could be broken down into its roots, and the user could view the
etymological history of just one root. In some cases, this will reveal connections
between words that would not be found when querying the entire word.
In my search for related work, I was frustrated that I could not find linguistic
visualization programs on my own until a veteran of the field gave me links. It seems to
me that research of an interactive nature should be made easily available for its
audience to use and evaluate, and so I would code my program’s visual interface in a
web-accessible format (e.g. Flash, Java). To further increase its interactivity, I would
make it possible for certain users (e.g. linguists) to input their own linguistic knowledge
into the system. The project would therefore also be an experiment in building
dynamic and interactive programs, which I see as a new paradigm for research in the
Internet age.
Related Work: The Tower of Babel (A) is an interactive etymological database,
but it does not display the information visually, and it is clearly designed only for a
technical audience: it expects knowledge of specialized jargon and is not user-friendly.
The Visual Thesaurus (B) is a fun interactive visual tool for displaying
synonymy/antonymy between words in the English language. Kirrkirr (C) is a similar
project, a visual dictionary for indigenous languages that offers semantic and some
limited translation information in various multimedia formats. VerbOcean (D) is a project
that finds precedence links between English verbs by mining the web and displays the
connections (e.g. discuss -> pursue -> support -> approve).
My proposed project would combine more dimensions of information into one
view than any of the above visualization projects: temporal (word:time), spatial
(word:location), semantic (word:meaning), and parent-child (word:word). The challenges
posed by the storing, processing, and visualization of cross-lingual data in so many
dimensions are probably why such a project has never been attempted before, but I am
prepared to face them and resolve any related issues.
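To make the four dimensions concrete, here is one hypothetical way a single word record could carry all of them, along with a helper that produces the arrows the map would draw. This is a sketch under stated assumptions only: the WordRecord type, its field names, and the arrows function are invented for illustration, places are plain country names rather than coordinates, each word has at most one parent, and the period field is left unset because the proposal gives no dates.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class WordRecord:
    """Hypothetical record combining the four dimensions named above."""
    form: str
    language: str
    meaning: Optional[str] = None           # semantic (word:meaning)
    place: Optional[str] = None             # spatial (word:location)
    period: Optional[str] = None            # temporal (word:time); unset in this sketch
    parent: Optional["WordRecord"] = None   # parent-child (word:word)

def arrows(word: WordRecord) -> List[Tuple[Optional[str], Optional[str], str]]:
    """Follow parent links and return (from_place, to_place, label) map segments."""
    segments = []
    while word.parent is not None:
        segments.append((word.place, word.parent.place, word.parent.form))
        word = word.parent
    return segments

# The "pass" example from the System Description.
passus = WordRecord("passus", "Latin", meaning="step", place="Italy")
passer = WordRecord("passer", "French", place="France", parent=passus)
pass_en = WordRecord("pass", "English", place="England", parent=passer)

for segment in arrows(pass_en):
    print(segment)
```

A real system would need several parents per word and a richer notion of place and time than a single label; the single-parent chain here is a deliberate simplification.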
Sponsor: If I am granted a Fulbright, this project will be conducted at the University of
Melbourne in Australia under the supervision of Professor ----, whose research focuses on
computational models for linguistic information. I found the sponsor after a summer
research mentor in computational linguistics recommended that I look into Melbourne’s
“Human Language Technology” group, of which Prof. --- is director. The group’s interests are in
discovering the best ways to digitally store, process, and perform computations on
linguistic information, which aligns closely with my project’s interests. After researching
related computer science and linguistics departments worldwide, I have concluded that this
particular research center and professor will provide the best intellectual support and
resources.
In terms of language, Australia is well suited to this research because it is
English-speaking in modern times but still has highly studied Aboriginal
languages. A primarily English-speaking country is best for the project because the English
language is (in)famous for the extremely varied origins of its words, and its lexicon
would make for the most interesting starting point for this project. Once the feasibility of
storing and visualizing the information has been established for languages like English, whose
ancestors spread across the globe, it will be an interesting extension to see how the system
could handle and display the connectedness of the much more isolated Australian Aboriginal
languages.
Timeline: I would conduct the research over the span of two Australian
semesters, beginning February 2007 and ending November 2007. This aligns with the
University of Melbourne schedule so that I may also choose to enroll in classes relevant
to the research simultaneously. I would begin by locating the necessary
etymological resources (e.g. dictionaries), then determine the best
way to convert and store those potentially diverse resources in one database, and then
develop the connection algorithms and create the actual visualization program. After
developing an initial prototype, I will work on efficiency and on the extensions mentioned
above, which will add to the uniqueness of the research. As suggested by my sponsoring
professor, I will submit at least one paper detailing the results of my research to an
academic conference during that time.
Cited Works
A: Tower of Babel: An Etymological Database Project http://starling.rinet.ru/
B: The Visual Thesaurus http://www.visualthesaurus.com/
C: Kirrkirr http://www-nlp.stanford.edu/kirrkirr/
D: VerbOcean http://semantics.isi.edu/ocean/
