
Decision Tree Applet Design Notes

Overview:
This manual provides a brief overview of the architecture and function of the components used to build the decision tree applet. The applet uses a layered design that separates the underlying algorithm from the GUI controls. More detailed information on each class can be found in the corresponding Javadoc HTML file.

Applet Packages:
The applet code is divided into four separate Java packages, each of which is described in the table below.

Package                  Description
ai.common                Contains classes that provide a variety of general services. Classes in this package can potentially be reused in other applets and applications.
ai.decision.algorithm    Contains classes that support the implementation of the decision tree learning algorithm.
ai.decision.gui          Contains classes that provide GUI components used to control and display output from the decision tree learning algorithm.
ai.decision.applet       Contains the main decision tree applet class.

Table 1 - Applet package descriptions.

Applet Structure:
The diagram that follows provides a high-level view of the major applet components and their interactions. Most communication between components is handled through a ComponentManager instance. The ComponentManager acts as a 'reference repository': when one component wants to send a message to another component, the sender asks the ComponentManager for the required reference. With this structure, GUI and algorithm components do not need to keep multiple references internally, which simplifies the overall applet design.
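The 'reference repository' idea can be sketched in a few lines. This is only an illustration under stated assumptions: the class names ComponentRegistry, StatusBar and Learner, and the register/lookup methods, are hypothetical and are not taken from the applet's actual ComponentManager API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reference repository; not the applet's real ComponentManager.
class ComponentRegistry {
    private final Map<Class<?>, Object> components = new HashMap<>();

    // Register a component under the type other components will ask for.
    <T> void register(Class<T> type, T component) {
        components.put(type, component);
    }

    // Look up a component; fails loudly if nothing was registered for the type.
    <T> T lookup(Class<T> type) {
        Object component = components.get(type);
        if (component == null) {
            throw new IllegalStateException("No component registered for " + type.getName());
        }
        return type.cast(component);
    }
}

// Stand-in for a GUI component that receives messages (hypothetical).
class StatusBar {
    String lastMessage = "";

    void show(String message) {
        lastMessage = message;
    }
}

// Stand-in for an algorithm component that holds only the registry (hypothetical).
class Learner {
    private final ComponentRegistry registry;

    Learner(ComponentRegistry registry) {
        this.registry = registry;
    }

    void finish() {
        // Ask the registry for the receiver instead of keeping a direct reference.
        registry.lookup(StatusBar.class).show("tree complete");
    }
}
```

The sender keeps a single reference (to the registry) no matter how many components it eventually talks to, which is the simplification the paragraph above describes.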

[Figure: the GUI layer (Dataset Menu, Dataset Table, Algorithm Menu, Visual Representation of Tree) and the algorithm layer (Dataset, Decision Tree Algorithm, Decision Tree), connected through the Component Manager]

Figure 1 - Applet architecture

Package Details:
The sections that follow provide more detailed information on important classes contained in each of the applet packages.

Package ai.common:
The ai.common package contains a series of general utility classes that can potentially be reused in other applets and applications.

AlgorithmFramework Class:
AlgorithmFramework encapsulates the functionality that allows an algorithm to run in a separate thread. The class implements a series of synchronized methods to control the state of an algorithm (for example, whether it is started or stopped). Additionally, the class maintains a reference to the current 'run mode', which is one of NORMAL_MODE, TRACE_MODE, or BREAK_MODE.
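A minimal sketch of the pattern described above (an algorithm running in its own thread, with synchronized methods guarding its state) might look as follows. The class name RunnableAlgorithm, its methods, and the integer values of the mode constants are assumptions made for illustration; only the constant names NORMAL_MODE and TRACE_MODE come from the text.

```java
// Hypothetical sketch; not the applet's actual AlgorithmFramework.
class RunnableAlgorithm implements Runnable {
    static final int NORMAL_MODE = 0; // assumed value
    static final int TRACE_MODE = 1;  // assumed value

    private boolean running = false;
    private int runMode = NORMAL_MODE;
    private int stepsDone = 0;
    private final int totalSteps;

    RunnableAlgorithm(int totalSteps) {
        this.totalSteps = totalSteps;
    }

    synchronized void setRunMode(int mode) { runMode = mode; }
    synchronized int getRunMode() { return runMode; }
    synchronized boolean isRunning() { return running; }
    synchronized int getStepsDone() { return stepsDone; }

    // Request a stop; the worker thread notices before its next step.
    synchronized void stop() { running = false; }

    // Mark the algorithm as started and launch it in its own thread.
    Thread start() {
        synchronized (this) {
            running = true;
        }
        Thread worker = new Thread(this);
        worker.start();
        return worker;
    }

    @Override
    public void run() {
        while (true) {
            // Each step checks and updates shared state under the lock.
            synchronized (this) {
                if (!running || stepsDone >= totalSteps) {
                    running = false;
                    return;
                }
                stepsDone++;
            }
        }
    }
}
```

Because every state change goes through a synchronized method or block, a GUI thread can call stop() or setRunMode() at any time without racing the worker thread.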

AlgorithmListener, HighlightListener and TreeChangeListener Interfaces:
These interfaces are loosely based on the Java 1.2 event model. For example, classes that need to respond to events associated with algorithm execution (algorithm start, stop and step events) should implement the AlgorithmListener interface.
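As a rough illustration of this listener style, the sketch below defines a listener interface with one callback per event and a source that notifies registered listeners. The method names (algorithmStarted, algorithmStepped, algorithmStopped) and the helper classes AlgorithmEventSource and EventLog are hypothetical; the text only states that start, stop and step events exist.

```java
import java.util.ArrayList;
import java.util.List;

// Callback names are assumptions; the real interface may differ.
interface AlgorithmListener {
    void algorithmStarted();
    void algorithmStepped();
    void algorithmStopped();
}

// Hypothetical event source that fires start, step and stop events in order.
class AlgorithmEventSource {
    private final List<AlgorithmListener> listeners = new ArrayList<>();

    void addAlgorithmListener(AlgorithmListener listener) {
        listeners.add(listener);
    }

    void runSteps(int steps) {
        for (AlgorithmListener l : listeners) l.algorithmStarted();
        for (int i = 0; i < steps; i++) {
            for (AlgorithmListener l : listeners) l.algorithmStepped();
        }
        for (AlgorithmListener l : listeners) l.algorithmStopped();
    }
}

// Hypothetical listener that simply records the events it receives.
class EventLog implements AlgorithmListener {
    final List<String> events = new ArrayList<>();

    @Override public void algorithmStarted() { events.add("start"); }
    @Override public void algorithmStepped() { events.add("step"); }
    @Override public void algorithmStopped() { events.add("stop"); }
}
```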

CodePanel, CodeReader and InvalidCodeFileException Classes:
The CodePanel class extends the Swing JPanel component class, and is used to display algorithm pseudo-code. CodePanel implements the HighlightListener interface, allowing lines of pseudo-code to be highlighted as an algorithm executes. Pseudo-code is stored in an external HTML file, which is read and parsed by an instance of the CodeReader class. In addition to the standard HTML tags, CodeReader and CodePanel recognize the following tags:

- The <function name="name"> and </function> tags, which are used to identify the pseudo-code for a specific function. The tags must appear in pairs. Each line of pseudo-code between the function tags should be terminated with a period; a parsing error causes an InvalidCodeFileException to be thrown.
- The <tab> tag, which is used to indicate a standard indent level. Multiple <tab> tags can appear together; when the pseudo-code is rendered in the CodePanel, <tab> tags are automatically replaced with a fixed number of spaces. This facilitates uniform indentation of nested lines of pseudo-code.
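The tab-tag replacement step can be sketched as a simple string substitution. The class name PseudoCodeFormatter and the indent width of four spaces are assumptions; the text says only that each tab tag becomes "a fixed number of spaces".

```java
// Hypothetical helper; not part of the applet's CodeReader or CodePanel.
class PseudoCodeFormatter {
    // Assumed indent width; the source does not state the actual number.
    static final String INDENT = "    ";

    // Replace every <tab> tag in a pseudo-code line with the fixed indent.
    static String expandTabs(String line) {
        return line.replace("<tab>", INDENT);
    }
}
```

Two consecutive tab tags therefore yield two indent levels, which keeps nested pseudo-code lines aligned regardless of how the HTML source is laid out.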

Package ai.decision.algorithm:
Classes in the ai.decision.algorithm package implement standard decision tree learning and pruning algorithms. The package is independent of any specific GUI components, allowing the learning and pruning algorithms to run as part of a standalone, text-based application.

Attribute Class:
The Attribute class stores information about one particular attribute from a decision tree dataset, including the attribute name and a list of possible values. Additionally, each Attribute object contains an internal statistics array, which is populated with values as part of the decision tree construction process.

[Figure: a two-dimensional array indexed by attribute value index and target value index; each cell holds the number of examples with that attribute value and associated target value]

Figure 2 - Internal Attribute statistics array

The decision tree learning algorithm uses the values stored in the statistics array to determine which attribute to split on at each position in the tree.

AttributeMask Class:
An AttributeMask is a one-dimensional array with a size equal to the number of attributes in a given decision tree dataset (including the target attribute). The mask tracks which attributes are 'split on' along the path to a certain position in the tree. Each cell in the mask can contain an attribute value index, or the UNUSED designation, which indicates that an attribute is not used along a particular path. As an example, consider a dataset with four attributes (including the target attribute). The mask for the lower-left leaf node is shown in the diagram below.

Attribute           Mask cell
Target attribute    Target class 1
Attribute 1         Value index 0
Attribute 2         Value index 0
Attribute 3         Unused

Figure 3 - An example Attribute mask

In this case, attributes 1 and 2 are used along the path to the leaf. The path descends from the node representing attribute 1 along the arc labeled with attribute 1 value index 0. It then descends from the node representing attribute 2 along attribute 2 value index 0 to the leaf. The value stored at the leaf, 1, identifies the target classification for examples that follow the path described. Note that indices are used, instead of string identifiers, for the sake of efficiency; attribute 1 might correspond to "width" or "color", for example. The target attribute is always located at position 0 in the mask. Other attributes derive their indices from their positions in the dataset.
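The two array structures described in this section, the per-attribute statistics array and the attribute mask, can be pictured with the sketch below. The class names, method names and the sentinel value chosen for UNUSED are assumptions for illustration; the source names the UNUSED designation but does not give its value or the classes' actual methods.

```java
import java.util.Arrays;

// Hypothetical sketch of a per-attribute statistics array.
class AttributeStatsSketch {
    // counts[v][t] = number of examples with attribute value index v
    // and target value index t.
    private final int[][] counts;

    AttributeStatsSketch(int numValues, int numTargetValues) {
        counts = new int[numValues][numTargetValues];
    }

    // Tally one example.
    void record(int valueIndex, int targetIndex) {
        counts[valueIndex][targetIndex]++;
    }

    int count(int valueIndex, int targetIndex) {
        return counts[valueIndex][targetIndex];
    }
}

// Hypothetical sketch of an attribute mask.
class AttributeMaskSketch {
    // Assumed sentinel; the real UNUSED value is not given in the source.
    static final int UNUSED = -1;

    // One cell per attribute; position 0 is the target attribute.
    private final int[] cells;

    AttributeMaskSketch(int numAttributes) {
        cells = new int[numAttributes];
        Arrays.fill(cells, UNUSED);
    }

    void set(int attributeIndex, int valueIndex) {
        cells[attributeIndex] = valueIndex;
    }

    int get(int attributeIndex) {
        return cells[attributeIndex];
    }

    boolean isUsed(int attributeIndex) {
        return cells[attributeIndex] != UNUSED;
    }
}
```

For the four-attribute example above, the mask would be built with set(0, 1), set(1, 0) and set(2, 0), leaving position 3 at UNUSED.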

Dataset, FileParser, InvalidMetaFileException and InvalidDataFileException Classes:
A Dataset object encapsulates a decision tree dataset. Datasets are stored on disk in two separate files: a meta file and a data file. The meta file describes dataset attributes and their associated values; the data file contains all the examples from the dataset, one line per example. When a Dataset is created, it uses an instance of the FileParser class to parse both the meta file and the data file. Examples in the data file are converted from a string representation to an integer representation to save space. For instance, the third attribute listed in the meta file might be "Height", which takes on the values "Tall" and "Short". The Dataset assigns the Height attribute an index value of 2 (with 0 as the first index); Tall and Short are assigned attribute value indices of 0 and 1, respectively. If the meta file contains a syntax error, the FileParser will throw an InvalidMetaFileException. Likewise, errors in the data file cause the FileParser to throw an InvalidDataFileException.

A Dataset object is capable of splitting a set of examples into two groups, a training group and a testing group. Examples from each group can be accessed independently.
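The string-to-index conversion can be sketched as follows. Everything here is illustrative: the classes AttributeSpec and ExampleEncoder are hypothetical, the sketch throws a standard IllegalArgumentException rather than the applet's own exception classes, and the "Class" and "Color" attributes in the usage note are made up (only "Height" with "Tall" and "Short" comes from the text).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical description of one attribute: its name and ordered values.
class AttributeSpec {
    final String name;
    final List<String> values;

    AttributeSpec(String name, String... values) {
        this.name = name;
        this.values = Arrays.asList(values);
    }
}

// Hypothetical encoder that turns one example's strings into value indices.
class ExampleEncoder {
    private final List<AttributeSpec> attributes = new ArrayList<>();

    // Attributes get their index from the order in which they are added.
    void addAttribute(AttributeSpec spec) {
        attributes.add(spec);
    }

    // Returns the zero-based attribute index, or -1 if the name is unknown.
    int indexOfAttribute(String name) {
        for (int i = 0; i < attributes.size(); i++) {
            if (attributes.get(i).name.equals(name)) {
                return i;
            }
        }
        return -1;
    }

    // Convert one example (one string per attribute) to value indices.
    int[] encode(String... fields) {
        if (fields.length != attributes.size()) {
            throw new IllegalArgumentException("Expected " + attributes.size() + " fields");
        }
        int[] encoded = new int[fields.length];
        for (int i = 0; i < fields.length; i++) {
            int valueIndex = attributes.get(i).values.indexOf(fields[i]);
            if (valueIndex < 0) {
                throw new IllegalArgumentException("Unknown value: " + fields[i]);
            }
            encoded[i] = valueIndex;
        }
        return encoded;
    }
}
```

With attributes added in the order "Class", "Color", "Height", the Height attribute receives index 2, and an example whose height is "Short" stores the value index 1 in that position.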

DecisionTree and DecisionTreeNode Classes:
A DecisionTree manages the set of DecisionTreeNode objects that form the current decision tree. Each node is identified by a label (which corresponds to the attribute or target classification that the node represents) and contains references to any child nodes. The DecisionTree class controls access to the nodes, and maintains tree-related information (such as the number of internal and leaf nodes in the tree, whether the tree is complete or not, etc.). A DecisionTree object can inform associated TreeChangeListeners when the state of the tree changes (for example, when a node is added or removed).
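A rough sketch of a tree that owns its nodes, tracks node counts and notifies listeners on change is given below. The names TreeNodeSketch, TreeChangeListenerSketch and TreeSketch are hypothetical, as is the simplification that a node without children counts as a leaf.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node: a label plus references to child nodes.
class TreeNodeSketch {
    final String label;
    final List<TreeNodeSketch> children = new ArrayList<>();

    TreeNodeSketch(String label) {
        this.label = label;
    }

    // Simplifying assumption: a node with no children is a leaf.
    boolean isLeaf() {
        return children.isEmpty();
    }
}

// Hypothetical listener with a single callback.
interface TreeChangeListenerSketch {
    void treeChanged();
}

// Hypothetical tree that controls node insertion and notifies listeners.
class TreeSketch {
    private TreeNodeSketch root;
    private final List<TreeChangeListenerSketch> listeners = new ArrayList<>();

    void addListener(TreeChangeListenerSketch listener) {
        listeners.add(listener);
    }

    // A null parent installs the node as the root.
    void addNode(TreeNodeSketch parent, TreeNodeSketch node) {
        if (parent == null) {
            root = node;
        } else {
            parent.children.add(node);
        }
        for (TreeChangeListenerSketch l : listeners) {
            l.treeChanged();
        }
    }

    int leafCount() {
        return countLeaves(root);
    }

    int internalCount() {
        return countInternal(root);
    }

    private static int countLeaves(TreeNodeSketch node) {
        if (node == null) return 0;
        if (node.isLeaf()) return 1;
        int total = 0;
        for (TreeNodeSketch child : node.children) total += countLeaves(child);
        return total;
    }

    private static int countInternal(TreeNodeSketch node) {
        if (node == null || node.isLeaf()) return 0;
        int total = 1;
        for (TreeNodeSketch child : node.children) total += countInternal(child);
        return total;
    }
}
```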

DecisionTreeAlgorithm Class:
The DecisionTreeAlgorithm class is the largest and most complex class in the series of ai packages, implementing standard decision tree learning and pruning algorithms. In addition to being self-contained (able to build and prune a tree while running in its own thread), many of the class methods are 'exposed', giving GUI classes the ability to support interactive tree building and pruning.

Package ai.decision.gui:
The ai.decision.gui package contains classes that provide access and display services for the underlying decision tree algorithms.

AlgorithmMenu and AlgorithmPanel Classes:
The AlgorithmMenu class controls the current algorithm thread, allowing a user to run, suspend, back up and restart the decision tree learning and pruning algorithms. In each mode, the menu keeps track of available menu options, enabling and disabling certain items based on the state of the algorithm. AlgorithmPanel is a thin wrapper class for the CodePanel that contains the algorithm pseudo-code.

DatasetMenu, DatasetPanel and DatasetTableModel Classes:
Together, the DatasetMenu and DatasetPanel classes are responsible for loading and displaying a decision tree dataset. DatasetMenu spawns a separate thread to load a dataset, which keeps the GUI responsive. The listing of available datasets in the menu is derived from a parameter in the applet's HTML page (see the DecisionTreeApplet section for more information). DatasetPanel uses an instance of the DatasetTableModel class to display examples from a dataset in a standard JTable. Note that the target attribute is always shown in the leftmost column of the table.

TreeLayoutPanel, VisualTreePanel, AbstractTreeNode, VacantTreeNode, VisualTreeNode and VisualTreeArc Classes:
The TreeLayoutPanel class provides a tree display canvas, which can properly position and display an m-ary tree with optimal width and spacing characteristics. The positioning algorithm implemented by the class is detailed in John Walker's 1990 Software: Practice and Experience paper "A Node-positioning Algorithm for General Trees". Data required by the positioning algorithm is contained in instances of the AbstractTreeNode class. AbstractTreeNode is an abstract class and cannot be instantiated; instead, tree nodes that are actually drawn are instances of VacantTreeNode or VisualTreeNode, both of which extend AbstractTreeNode. A VacantTreeNode represents a position in the decision tree that is available for extension or growth, and is hence 'vacant'. A VisualTreeNode, on the other hand, provides a visual representation of an underlying DecisionTreeNode.

Because vacant positions have to be displayed, the applet keeps two representations of a decision tree at all times. The first is the representation composed of DecisionTreeNodes, which form the tree that is manipulated by the underlying algorithm. The second representation, containing at least as many nodes as the underlying decision tree, provides a visual depiction of the tree and vacant positions.

VisualTreePanel extends TreeLayoutPanel and provides support for interactive construction and pruning of the decision tree. In addition, VisualTreePanel stores a number of global variables, including the current zoom level. VisualTreeNodes and VacantTreeNodes automatically check the global variable settings before drawing themselves. The VisualTreeArc class contains a single static method that draws an arc between two tree nodes. The arc can have a text label painted at its midpoint if required.
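The relationship between the two representations can be sketched by mirroring an underlying tree into a drawable one and filling ungrown positions with placeholder nodes. All names here (UnderlyingNode, DrawableNode, FilledNode, VacantNode, VisualTreeBuilder) are hypothetical, and the use of a null child slot to mark an ungrown position is an assumption of this sketch, not something the source states.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical underlying node; a null child slot marks a position
// that has not been grown yet (an assumption of this sketch).
class UnderlyingNode {
    final String label;
    final UnderlyingNode[] children;

    UnderlyingNode(String label, int arity) {
        this.label = label;
        this.children = new UnderlyingNode[arity];
    }
}

// Hypothetical abstract base for anything that gets drawn.
abstract class DrawableNode {
    final List<DrawableNode> children = new ArrayList<>();

    abstract String caption();

    // Number of nodes in the subtree rooted here.
    int size() {
        int total = 1;
        for (DrawableNode child : children) total += child.size();
        return total;
    }
}

// Drawn node backed by an underlying node.
class FilledNode extends DrawableNode {
    private final UnderlyingNode source;

    FilledNode(UnderlyingNode source) {
        this.source = source;
    }

    @Override
    String caption() {
        return source.label;
    }
}

// Drawn node for a position that is still open for growth.
class VacantNode extends DrawableNode {
    @Override
    String caption() {
        return "?";
    }
}

class VisualTreeBuilder {
    // Mirror the underlying tree, inserting a VacantNode for each empty slot.
    static DrawableNode mirror(UnderlyingNode node) {
        if (node == null) {
            return new VacantNode();
        }
        DrawableNode visual = new FilledNode(node);
        for (UnderlyingNode child : node.children) {
            visual.children.add(mirror(child));
        }
        return visual;
    }
}
```

Since every underlying node produces one drawable node and every empty slot produces an extra one, the visual tree always has at least as many nodes as the underlying tree.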

[Figure: the underlying representation consists of DecisionTreeNodes; the visual representation consists of VisualTreeNodes and VacantTreeNodes]

Figure 4 - Relationship between visual and underlying tree nodes

Package ai.decision.applet:
The ai.decision.applet package contains a single class, DecisionTreeApplet, which is the applet implementation.

DecisionTreeApplet Class:
The DecisionTreeApplet class is responsible for instantiating various backend and GUI components and assembling the GUI display. The class looks for one particular parameter in its companion HTML page, the "Datasets" parameter, which references a semicolon-delimited list of available datasets.
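Splitting such a semicolon-delimited parameter value could look like the sketch below. The class DatasetListParser is hypothetical, and trimming whitespace and skipping empty entries are assumptions; in an applet the input string would typically come from a call such as getParameter("Datasets").

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of the applet's actual code.
class DatasetListParser {
    // Split a semicolon-delimited list into trimmed, non-empty names.
    static List<String> parse(String parameterValue) {
        List<String> names = new ArrayList<>();
        if (parameterValue == null) {
            return names; // parameter missing from the HTML page
        }
        for (String piece : parameterValue.split(";")) {
            String name = piece.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }
}
```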
