
Visual Data Mining

Pak Chung Wong

Pacific Northwest National Laboratory

Seeing is knowing, though merely seeing is not enough. When you understand what you see, seeing becomes believing. A while ago scientists discovered that seeing and understanding together enable humans to glean knowledge and deeper insight from large amounts of data. The approach integrates the human mind's exploration abilities with the enormous processing power of computers to form a powerful knowledge discovery environment that capitalizes on the best of both worlds. The technology builds on visual and analytical processes developed in various disciplines including scientific visualization, data mining, statistics, and machine learning with custom extensions that handle very large, multidimensional, multivariate data sets. The methodology is based on both functionality that characterizes structures and displays data and human capabilities that perceive patterns, exceptions, trends, and relationships. Here I'll define the vision, present the state of the art, and discuss the future of a young discipline called visual data mining.

The vision

The vision of a visual data mining system stems from the following principles: simplicity, user autonomy, reliability, reusability, availability, and security.

A visual data mining system must be syntactically simple to be useful. Simple doesn't mean trivial or nonpowerful. Simple to learn means use of intuitive and friendly input mechanisms as well as instinctive and easy-to-interpret output knowledge. Simple to apply means an effective discourse between humans and information. Simple to retrieve or recall means a customized data structure to facilitate fast and reliable searches. Simple to execute means a minimum number of steps needed to achieve the results. In short, simple means the smallest, functionally sufficient system possible.

A genuine visual data mining system must not impose knowledge on its users, but instead guide them through the mining process to draw conclusions. Humans should study the visual abstractions and gain insight instead of accepting an automated decision.

A reliable visual data mining system must provide estimated error or accuracy of the projected information for each step of the mining process. This error information can compensate for the deficiency that imprecise analysis of data visualization can cause.

A reusable visual data mining system must be adaptable to a variety of systems and environments to reduce the customization effort, provide assured performance, and improve system portability.

A practical visual data mining system must be generally and widely available. The quest for new knowledge or deeper insights of existing knowledge cannot be planned. It may mean a portable system through telelinks or an embedded (local) system within the information domain. This requires that the knowledge received from one domain adapt to another domain through physical means or electronic connections.

Finally, a complete visual data mining system must include security measures to protect the data, the newly discovered knowledge, and the user's identity because of various social issues.

So far I've ignored discussing the underlying visualization and mathematical techniques of visual data mining. This is partly because of the space limit and partly because of the steady and incremental technological advancements in the field of visual data mining. You can find samples of the latest technologies in this special issue. Although no one involved in this exciting field has all the technical solutions today, everyone is fully aware of the grand challenges ahead.

Current state of the art

Visualization has been used routinely in data mining as a presentation tool to generate initial views, navigate data with complicated structures, and convey the results of an analysis. Generally, the analytical methods themselves don't involve visualization. The loosely coupled relationships between visualization and analytical data mining techniques represent the majority of today's state of the art in visual data mining. The process sandwich strategy, which interlaces analytical

2 September/October 1999 0272-1716/99/$10.00 © 1999 IEEE


processes with graphic visualization, penalizes both procedures with each other's deficiencies and limitations. For example, because an analytical process can't analyze multimedia data, we have to give up the strengths of visualization to study movies and music in a visual data mining environment.

Perhaps a stronger visual data mining strategy lies in tightly coupling the visualizations and analytical processes into one data mining tool. Letting human visualization participate in an analytical process decision-making remains a major challenge. Certain mathematical steps within an analytical procedure may be substituted by human decisions based on visualization to allow the same analytical procedure to analyze a broader scope of information. Visualization supports humans in dealing with decisions that can no longer be automated. This results in a tightly coupled visual data mining environment that truly takes advantage of the strengths of all worlds.

The future

All signs indicate that the field of visual data mining will continue to grow at an even faster pace in the future. In universities and research labs, visual data mining will play a major role in physical and information sciences in the study of even larger and more complex scientific data sets. It will also play an active role in nontechnical disciplines to establish knowledge domains to search for answers and truths. For example, there may exist standard "man" pages for our favorite visual data mining functions on our Unix system. An advanced form of scatterplot matrix may substitute for the use of covariance and regression in statistics studies. National standards will be developed to govern the functionality and resources of visual data mining.

In industries and households across the country, visual data mining will be embedded in public utilities and home appliances. Many searching references, such as the yellow pages, dictionaries, and even newspapers, will have visual mining capability. There may be computer chips dedicated to support visual data mining activities. The term "visual data mining" will be included in school textbooks and literature. Audio- or haptic-based substitutes will help the visually impaired. Our imagination is the only limit of the future.

About the articles

This special issue on visual data mining attracted high-quality submissions from England, Germany, and the United States. All of the articles presented interesting and promising research in visual data mining. Unfortunately, there's only room to include a few articles.

Hinneburg, Keim, and Wawryniuk introduce a novel clustering algorithm on large amounts of high-dimensional data. Visualization techniques instead of automated decisions guide the recursive partitioning of the new clustering algorithm. This is a major step towards the goal of a tightly coupled visual data mining environment.

Ribarsky et al. present a clustering algorithm for very large data sets. The new technique enables clustering of large amounts of data with a fast interactive response time. This provides continuous interactions between man and machine during the data unfolding process. Because of the layered design of the visualization and clustering processes, it's considered a loosely coupled visual mining system.

Rohrer, Ebert, and Sibert describe a shape-based visualization system to support data mining of text. The text information is mapped to document vectors before it's visualized using implicit surface modeling techniques. The system supports querying articles of a corpus by matching the vector's shape through visualization. Zoom views support data drilling of the document text.

Conclusion

This issue showcases an exciting field where people turn seeing into knowing, believing, and eventually human insights. I believe the vision defined here can be reached and the proposed tasks accomplished. As the articles in this issue show, both the loosely and tightly coupled visual data mining systems perform well in certain domains and environments. Scientists and engineers will continue to explore new ground and find new applications in this young discipline. As for the future, I see the advancement of visual data mining resembling the rapid growth of personal computers in our society. The active participation of humans and the decisions based on visualization combine the art of human intuition and the science of mathematical deduction, forever changing the landscape of data analysis.

Acknowledgments

The Pacific Northwest National Laboratory is operated for the US Department of Energy by Battelle Memorial Institute under contract DE-AC06-76RLO 1830.

Pak Chung Wong is a senior research scientist in the Synthesis, Analysis, and Visualization of Information group at the Pacific Northwest National Laboratory in Richland, Washington, where he performs research and development on scientific computation and information technology. His research interests include visualization, data mining, scientific data abstraction, steganography, and wavelets. He received a PhD in computer science from the University of New Hampshire.

Readers may contact Wong at the Pacific Northwest National Laboratory, 902 Battelle Blvd., P.O. Box 999, MSIN: K7-28, Richland, WA 99352, e-mail pak.wong@pnl.gov.

IEEE Computer Graphics and Applications 3
