Probability Modeling
Hal Hagood
u07a1
PROBABILITY MODELING
(Instructions) Using the case study “O” and the classroom instructions, the following procedure was used …
“1) In SAS Enterprise Miner, create a new project, and a new diagram within that project, then:
2) Click on Sample > File Import node to import the MS Excel file into a SAS dataset.
3) Be sure Category Gross is set to "Target" for role in the File Import node.
4) Use Text Miner > Text Parsing node to parse the data (default settings are OK).
5) Use Text Miner > Text Filter node to filter the text (default settings are OK, although you can adjust if you wish to do any of the additional detailed steps in original tutorial from the textbook).
6) Use Text Miner > Text Cluster node to cluster the data with all default settings except the following:
SVD Resolution should be set to "High" and Max SVD Dimensions should be set to "5".
8) Once this full diagram is run, go into the properties for the Text Cluster node, and click the ... to the
9) In the "Sample Properties" window, click on "Plot...", leave it as Scatter (the first option), and click
"Next>". Scroll down in the list of variables, click in the cell under the "Role" column next to
"TextCluster_SVD1", and select "X". Click in the cell under the "Role" column next to "TextCluster_SVD2"
and select "Y", then click "Finish". This will create a scatterplot of SVD1 vs SVD2.
10) Do the same for each combination of the 5 SVD variables, and take screenshots of each as you go,
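Steps 4 through 6 correspond roughly to building a term-by-document matrix and reducing it with a singular value decomposition (SVD). The following is a minimal Python sketch of that idea, assuming NumPy is available for the decomposition step. It is not the SAS Text Miner implementation, which also applies its own parsing, filtering, and term weighting; the sample descriptions and the function names `term_document_matrix` and `svd_coordinates` are hypothetical.

```python
from collections import Counter


def term_document_matrix(docs):
    """Return (vocabulary, rows) where rows[i][j] counts term j in document i."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def svd_coordinates(rows, max_dims=5):
    """Project documents onto at most `max_dims` SVD dimensions (needs NumPy)."""
    import numpy as np  # imported lazily so the counting step works without it

    matrix = np.asarray(rows, dtype=float)
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(max_dims, len(s))
    return u[:, :k] * s[:k]  # one row of coordinates per document


# Hypothetical movie descriptions, for illustration only.
docs = [
    "space crew fights alien",
    "alien invades small town",
    "small town romance",
]
vocab, rows = term_document_matrix(docs)
```

Calling `svd_coordinates(rows, max_dims=5)` would then give each description one coordinate per retained dimension, which plays a role similar to the TextCluster_SVD1 through TextCluster_SVD5 variables, although the values will not match those produced by SAS.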
Create scatter plots of the SVD dimensions using statistical software for text mining.
TextCluster_SVD1 vs TextCluster_SVD2.
TextCluster_SVD1 vs TextCluster_SVD3.
TextCluster_SVD1 vs TextCluster_SVD4.
TextCluster_SVD1 vs TextCluster_SVD5.
TextCluster_SVD2 vs TextCluster_SVD3.
TextCluster_SVD2 vs TextCluster_SVD4.
TextCluster_SVD2 vs TextCluster_SVD5.
TextCluster_SVD3 vs TextCluster_SVD4.
TextCluster_SVD3 vs TextCluster_SVD5.
TextCluster_SVD4 vs TextCluster_SVD5.
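The ten plots listed above are every unordered pair of the five SVD variables. As a cross-check outside SAS, the pairs can be enumerated and plotted with a short Python sketch. It assumes the SVD columns have been exported from SAS (for example to CSV) and loaded into a dictionary of lists, and that matplotlib is installed for the plotting step; the function names `svd_pairs` and `plot_pairs` are hypothetical.

```python
from itertools import combinations

# Variable names as they appear in the Text Cluster node output.
SVD_VARS = [f"TextCluster_SVD{i}" for i in range(1, 6)]


def svd_pairs(variables=SVD_VARS):
    """Return every unordered (x, y) pair of SVD variables, in listing order."""
    return list(combinations(variables, 2))


def plot_pairs(table, out_dir="."):
    """Save one scatter plot per pair; `table` maps variable name -> list of values.

    Assumes matplotlib is installed and that `table` was loaded beforehand
    from data exported out of SAS.
    """
    import os
    import matplotlib
    matplotlib.use("Agg")  # render to files, no display needed
    import matplotlib.pyplot as plt

    for x_name, y_name in svd_pairs():
        fig, ax = plt.subplots()
        ax.scatter(table[x_name], table[y_name])
        ax.set_xlabel(x_name)
        ax.set_ylabel(y_name)
        ax.set_title(f"{x_name} vs {y_name}")
        fig.savefig(os.path.join(out_dir, f"{x_name}_vs_{y_name}.png"))
        plt.close(fig)


# Print the pairs in the same order as the plots in this section.
for x_name, y_name in svd_pairs():
    print(f"{x_name} vs {y_name}")
```

Enumerating the pairs programmatically makes it easy to confirm that no combination was skipped when taking screenshots by hand.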
Add a Decision Tree node from Model → Decision Tree and connect it to the Text Cluster node.
“Be sure to include a screenshot or download the visual representation of your Tree from the
results of the Decision Tree diagram, and explain what it means in terms of the content/text of the movies
(and their descriptions) and the original business problem of attempting to predict box office
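When interpreting the tree, it can help to see how a single split on an SVD variable is chosen. The sketch below searches for the threshold that best separates two gross categories, using Gini impurity for simplicity; the splitting criterion that SAS Enterprise Miner actually applies depends on the Decision Tree node's settings and may differ. The SVD scores, the "low"/"high" labels, and the function names `gini` and `best_split` are hypothetical and are not taken from the case study data.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows, labels):
    """Find the split 'value <= threshold' with the lowest weighted Gini impurity.

    rows: list of dicts mapping feature name -> numeric value.
    Returns (feature, threshold, weighted_impurity).
    """
    n = len(rows)
    best = None
    for feature in rows[0]:
        values = sorted({row[feature] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


# Hypothetical SVD scores and gross categories, for illustration only.
rows = [
    {"TextCluster_SVD1": 0.10, "TextCluster_SVD2": 0.90},
    {"TextCluster_SVD1": 0.20, "TextCluster_SVD2": 0.10},
    {"TextCluster_SVD1": 0.80, "TextCluster_SVD2": 0.80},
    {"TextCluster_SVD1": 0.90, "TextCluster_SVD2": 0.20},
]
labels = ["low", "low", "high", "high"]
feature, threshold, impurity = best_split(rows, labels)
```

In this toy data the first SVD variable separates the two categories cleanly while the second does not, which is the kind of pattern a tree diagram makes visible: the variables near the root are the text dimensions most associated with the target.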
Reference
https://courserooma.capella.edu/webapps/blackboard/content/listContent.jsp?course_id=_49663_1&content_id=_5183598_1&mode=reset
Viewer.Books. (2017). Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications: Tutorial O - Predicting Box Office Success of Motion Pictures with Text Mining.
http://viewer.books24x7.com/assetviewer.aspx?bookid=49265&chunkid=981083261&resumebookmarkid=8bd103e7-f488-e711-a9c3-00505686029c#