This part shows how to characterize global satisfaction and how to explore the interactions between all variables.
The import wizard automatically detects the file separators and title line.
The first column is an identifier. Since this information is not useful for analysis, the column becomes grey: it is unused.
The file contains missing data. Each missing value is replaced by the mean of the observed values in the same column.
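The mean replacement described above can be sketched outside the tool. This is a minimal illustration, not BayesiaLab's own code: the function name `impute_column_mean` and the sample rows are hypothetical, and missing values are assumed to be encoded as `None`.

```python
from statistics import mean

def impute_column_mean(rows, column):
    """Return a copy of rows where None in `column` is replaced
    by the mean of the observed (non-missing) values of that column."""
    observed = [r[column] for r in rows if r[column] is not None]
    if not observed:
        # Nothing to average: leave the column untouched.
        return [dict(r) for r in rows]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

# Hypothetical sample: three poll responses, one missing mark.
rows = [{"id": 1, "SE1": 8}, {"id": 2, "SE1": None}, {"id": 3, "SE1": 6}]
completed = impute_column_mean(rows, "SE1")
```

The original rows are left unchanged, so the imputed table can be compared with the raw one.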
Data information is displayed here. 711 poll responses are gathered in this dataset.
Variables represent evaluation marks from 1 to 10. Manual discretization displays the cumulative distribution function of the selected continuous variable.
Generating an equal-distance discretization with three intervals leads to this graph.
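An equal-distance discretization splits the observed range into intervals of identical width. The sketch below shows the idea for three intervals; the function name and the sample marks are assumptions made for illustration, and the tool may treat interval boundaries differently.

```python
def equal_width_bins(values, n_bins=3):
    """Split the observed range into n_bins intervals of equal width.
    Returns the interval edges and the interval index of each value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]

    def bin_index(v):
        if width == 0:
            return 0  # all values identical: a single interval
        # The maximum value is assigned to the last interval.
        return min(int((v - lo) / width), n_bins - 1)

    return edges, [bin_index(v) for v in values]

# Hypothetical evaluation marks between 1 and 10.
marks = [1, 2, 4, 5, 7, 8, 10]
edges, bins = equal_width_bins(marks)
```

With marks spanning 1 to 10, each interval is 3 units wide, so the edges fall at 1, 4, 7 and 10.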
To characterize global satisfaction, the first step is to use the search function to find the Satisfaction node.
This node is the target variable of the analysis. We are interested in the >7 satisfaction value.
The augmented Markov blanket is used to characterize the target variable. It finds the minimal set of variables that characterize global satisfaction.
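For orientation, the standard Markov blanket of a node in a directed graph is made of its parents, its children and the other parents of its children. The sketch below computes that standard set only; the "augmented" variant used by the tool is not reproduced here. Apart from Satisfaction and SE1, the node names are hypothetical.

```python
def markov_blanket(parents, target):
    """parents maps each node to the set of its parent nodes.
    Returns parents, children and co-parents (spouses) of target."""
    children = {n for n, ps in parents.items() if target in ps}
    spouses = set()
    for child in children:
        spouses |= parents[child]
    blanket = set(parents.get(target, set())) | children | spouses
    blanket.discard(target)  # the target is not part of its own blanket
    return blanket

# Hypothetical toy graph, for illustration only.
dag = {
    "Satisfaction": {"SE1"},
    "Loyalty": {"Satisfaction", "Price"},
    "SE1": set(),
    "Price": set(),
    "Unrelated": {"Price"},
}
blanket = markov_blanket(dag, "Satisfaction")
```

Here "Unrelated" is left out: once the blanket variables are known, it brings no further information about the target.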
Zoom in and out tools are available for better graph visualization.
The force-directed layout algorithm organizes the nodes on the workspace.
After switching to validation mode, note that the network selects only 15 of the 215 nodes as relevant.
To highlight important relationships between variables, the arc force tool is used.
An arc's thickness is proportional to its relevance with regard to the target variable. The SE1 variable is the most important for global satisfaction.
The SE1 node is in first position: it is the most important variable in this analysis.
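The text does not state how the relevance of an arc is measured. As one possible illustration, the mutual information between two discrete variables can serve as a strength score computed from paired observations. The choice of this measure is an assumption, and the sample data are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(pairs):
    """Mutual information (in bits) between two discrete variables,
    estimated from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

# Hypothetical observations: a fully dependent pair and an independent pair.
dependent = [(0, 0), (1, 1), (0, 0), (1, 1)]
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]
mi_dependent = mutual_information(dependent)
mi_independent = mutual_information(independent)
```

A higher score would correspond to a thicker arc; ranking the scores gives an ordering of variables comparable to the one reported for SE1.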
The probabilistic profile of polls presenting a global satisfaction mark >=7 is also reported.
After closing the report, note that all correlations between variables can be monitored by right-clicking on the right side of the screen.
The monitors display the probability distributions and allow the values of the variables to be changed.
Monitors can be used to find the probabilistic profile of polls presenting a high satisfaction mark.
When clicking on this modality, the probabilities are propagated throughout the network. The probabilistic profile becomes readable.
The same technique can be applied to other modalities and variables. The results are automatically propagated to the remaining variables.
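What happens when a modality is clicked can be illustrated on a two-node network with Bayes' rule: the evidence on one variable updates the distribution of the other. This is only a sketch of the principle. All probabilities below are invented for the example, and the modality labels "low" and "high" for SE1 are assumptions.

```python
def posterior(prior, likelihood, evidence):
    """prior is P(X), likelihood[x] is P(E | X = x).
    Returns P(X | E = evidence) by Bayes' rule."""
    unnormalized = {x: prior[x] * likelihood[x][evidence] for x in prior}
    z = sum(unnormalized.values())
    return {x: p / z for x, p in unnormalized.items()}

# Invented numbers, for illustration only.
prior_se1 = {"low": 0.5, "high": 0.5}
p_satisfaction_given_se1 = {
    "low": {"<=7": 0.8, ">7": 0.2},
    "high": {"<=7": 0.4, ">7": 0.6},
}
# Observing Satisfaction > 7 updates the distribution of SE1.
profile = posterior(prior_se1, p_satisfaction_given_se1, ">7")
```

In a full network the same update is carried out for every variable at once, which is what makes the probabilistic profile readable in the monitors.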
After target variable characterization, the second part of this tutorial explores the relationship between all variables of the poll.
By using the positioning and zoom tools, the graph becomes more reader-friendly. In this case, where the graph is large but has average connectivity, symmetric positioning is adequate.
To increase network readability, a comments dictionary can be linked with the graph. In this file, the name of each node is complemented by a comment.
A modality dictionary can also be designed interactively. This is done by double-clicking a node and opening the modality names sheet.
Once the modality labels are validated, the dictionary can be exported as a text file.
The dictionary can now be associated back with all nodes of the graph.
The same process can be applied to attribute values to modalities and generate a modality values dictionary. This is done in modeling mode, by double-clicking a node and opening the values sheet.
The poor modality scores 0 points, average scores 10 points and very good scores 20 points.
The same process of exporting the dictionary, modifying the text file and importing it back can be applied to attribute values to the modalities of all nodes.
The total and average values of the graph modalities are calculated.
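One plausible reading of these totals is an expected value per node, obtained by weighting each modality value by its probability, then summed and averaged over the nodes. The sketch below follows that reading; it is an assumption about the computation, and the node names and distributions are hypothetical. Only the 0, 10 and 20 point values come from the example above.

```python
# Values attributed to the modalities in the example above.
MODALITY_VALUES = {"poor": 0, "average": 10, "very good": 20}

def expected_value(distribution, values=MODALITY_VALUES):
    """Probability-weighted value of a node's modalities."""
    return sum(p * values[m] for m, p in distribution.items())

# Hypothetical nodes with invented distributions.
nodes = {
    "Q1": {"poor": 0.25, "average": 0.5, "very good": 0.25},
    "Q2": {"poor": 0.0, "average": 0.5, "very good": 0.5},
}
per_node = {name: expected_value(dist) for name, dist in nodes.items()}
total = sum(per_node.values())
average = total / len(per_node)
```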
Every question is related to a theme. For instance, this poll has 36 themes. The class concept in BayesiaLab is useful for associating themes with nodes. The themes dictionary is contained in a text file.
Clicking the icon that has newly appeared at the bottom right of the window opens the class editor. Modifications can then be applied to whole classes instead of individual nodes.
Opens the class editor
Readability can be increased by applying automatic class colours. This is done by selecting all the classes with Ctrl+A and clicking the colour button.
Note that nodes of the same colour are mostly grouped together. This provides useful information about links between and within themes. In this case, it also indicates a well-designed poll.
When closing the Edit classes window, the nodes become coloured depending on their class.
In this example, themes have been created based on expert knowledge. Nevertheless, BayesiaLab provides tools for automatic theme design by grouping semantically close variables.
In validation mode, the variable clustering is based on the associations discovered in the network.
Moving this cursor forces the number of groups. The node colours are changed accordingly.
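Forcing the number of groups can be pictured as cutting a hierarchical clustering at a chosen level. The sketch below uses generic single-linkage agglomerative clustering on a similarity table. It is not the algorithm used by the tool, which the text does not describe, and the variable names and similarity scores are hypothetical.

```python
def cluster_variables(similarity, n_groups):
    """similarity maps frozenset({a, b}) to a score (missing pairs count as 0).
    Merges the two most similar clusters until n_groups remain."""
    names = sorted({v for pair in similarity for v in pair})
    clusters = [{v} for v in names]

    def link(a, b):
        # Single linkage: similarity of the closest pair of members.
        return max(similarity.get(frozenset((x, y)), 0.0) for x in a for y in b)

    while len(clusters) > n_groups:
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters

# Hypothetical similarity scores between four variables.
similarity = {
    frozenset(("A", "B")): 0.9,
    frozenset(("C", "D")): 0.8,
    frozenset(("B", "C")): 0.1,
}
two_groups = cluster_variables(similarity, 2)
one_group = cluster_variables(similarity, 1)
```

Changing `n_groups` plays the role of the cursor: a smaller number merges groups that were separate at the previous setting.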
BayesiaLab can build latent variables from the clustering just performed.
In modeling mode, multiple clustering clusters the individuals separately for each group of variables.
This wizard configures the multiple clusterings to be performed (one per identified cluster).
As with data clustering, an HTML report is created for each clustering. These reports are useful for renaming the new variables and their modalities.
Once the clusterings are performed, a new network is created with one node per latent variable (keeping the initial colour).
An internal database is created. It contains the most probable cluster value for each line of the initial file. This database can be saved to a separate file from the data menu.
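Selecting the most probable cluster for each line amounts to taking, per row, the cluster with the highest posterior probability. A minimal sketch follows, with hypothetical cluster labels and invented probabilities.

```python
def most_probable_cluster(posteriors):
    """posteriors is a list of dicts mapping cluster label to probability,
    one dict per line of the data. Returns the most probable label per line."""
    return [max(row, key=row.get) for row in posteriors]

# Invented posterior probabilities for three lines of the initial file.
posteriors = [
    {"C1": 0.7, "C2": 0.2, "C3": 0.1},
    {"C1": 0.1, "C2": 0.3, "C3": 0.6},
    {"C1": 0.25, "C2": 0.5, "C3": 0.25},
]
assigned = most_probable_cluster(posteriors)
```

Applying this once per latent variable gives one new column per clustering.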
Probabilistic relationships between the nodes of this new network can be discovered with the SopLEQ algorithm. After computation and automatic node positioning, the resulting network presents 51 nodes representing the latent variables of the initial dataset.