with
KNIME Analytics Platform
KNIME AG
https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
Data Preparation -> Model Training -> Model Optimization -> Model Evaluation -> Deployment
It always starts with some data …
• Data Preparation: Data Manipulation, Data Blending, Missing Values Handling, Feature Generation, Dimensionality Reduction, Feature Selection, Outlier Removal, Normalization, Partitioning, …
• Model Training: Model Training, Bag of Models, Model Selection, Ensemble Models, Own Ensemble Model, External Models, Import Existing Models, Model Factory, …
• Model Optimization: Parameter Tuning, Parameter Optimization, Regularization, Model Size, No. Iterations, …
• Model Evaluation: Performance Measures, Accuracy, ROC Curve, Cross-Validation, …
• Deployment: Files & DBs, Dashboards, REST API, SQL Code Export, Reporting, …
• Data Preparation: Original Data Set with Past Observations, partitioned into Training Set, Validation Set, and Test Set
• Model Training: Training Set
• Model Optimization: Validation Set
• Model Evaluation: Test Set
• Deployment: New Data from Real World Applications
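The partitioning into training, validation, and test sets can be sketched in plain Python. This is only an illustration, not the behaviour of any particular KNIME node: the function name `partition`, the 60/20/20 proportions, and the fixed seed are assumptions chosen for the example.

```python
import random


def partition(rows, train_frac=0.6, valid_frac=0.2, seed=42):
    """Shuffle rows and split them into training, validation, and test sets.

    The fractions and the seed are illustrative assumptions; the test set
    receives whatever remains after the first two slices.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


observations = list(range(100))  # stand-in for past observations
train_set, valid_set, test_set = partition(observations)
print(len(train_set), len(valid_set), len(test_set))
```

The three sets are disjoint by construction, so the test set stays untouched until the final evaluation.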
• Databases
– MySQL, PostgreSQL
– any JDBC (Oracle, DB2, MS SQL Server)
• Files
– CSV, txt
– Excel, Word, PDF
– SAS, SPSS
– XML
– PMML
– Images, texts, networks, chem
• Web, Cloud
– REST, Web services
– Twitter, Google
• Spark
• HDFS support
• Hive
• Impala
• In-database processing
• Preprocessing
– Row, column, matrix based
• Data blending
– Join, concatenate, append
• Aggregation
– Grouping, pivoting, binning
• Feature Creation and Selection
• Regression
– Linear, logistic
• Classification
– Decision tree, ensembles, SVM, MLP, Naïve Bayes
• Clustering
– k-means, DBSCAN, hierarchical
• Validation
– Cross-validation, scoring, ROC
• Deep Learning
– Keras, DL4J
• External
– R, Python, Weka, H2O, Keras
• Interactive Visualizations
• JavaScript-based nodes
– Scatter Plot, Box Plot, Line Plot
– Networks, ROC Curve, Decision Tree
– Plotly Integration
– Adding more with each release!
• Misc
– Tag cloud, OpenStreetMap, molecules
• Script-based visualizations
– R, Python
• Database
• Files
– Excel, CSV, txt
– XML
– PMML
– to: local, KNIME Server, SSH or FTP server
• BIRT Reporting
The toolbar buttons act on the active workflow. The most important buttons:
– Execute selected and executable nodes (F7)
– Execute all executable nodes
– Execute selected nodes and open first view
– Cancel all selected, running nodes (F9)
– Cancel all running nodes
Not Configured:
The node is waiting for configuration or incoming data.
Configured:
The node has been configured correctly, and can be executed.
Executed:
The node has been successfully executed. Results may be
viewed and used in downstream nodes.
Port types: Data, DB Connection, DB Data, Image
• Right-click node
• Select Execute in the context menu
• If execution is successful, the status shows a green light
• If execution encounters errors, the status shows a red light
• Right-click node
• Select Views in context menu
• Select output port to inspect execution results
KNIME Forum
Account Credentials
1. Edit Metadata
• Node Execution
– Shift + F10 executes all configured nodes and opens all views
– F9 cancels selected running nodes
– Shift + F9 cancels all running nodes
• Node Connections
– Ctrl + L connects selected nodes
– Ctrl + Shift + L disconnects selected nodes
• Move Nodes and Annotations
– Ctrl + Shift + Arrow moves the selected node in the arrow direction
– Ctrl + Shift + PgUp/PgDown moves the selected annotation to the front or to the back of all overlapping annotations
• Workflow Operations
– F8 resets selected nodes
– Ctrl + S saves the workflow
– Ctrl + Shift + S saves all open workflows
– Ctrl + Shift + W closes all open workflows
• Metanode
– Shift + F12 opens metanode wizard
Blog: knime.com/blog
Forum: forum.knime.com
KNIME Hub: hub.knime.com
Follow us on social media:
• Today we construct a workflow that joins diverse data sources into a set
of complete customer records. Using this, we will build and deploy a
predictive model to find people who might be interested in a newly
available product.
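[Figure: node with its output port, status, and node label]
[Figure: configuration dialog with file path, basic settings, advanced settings, preview, and Help button]
[Figure: configuration dialog with mountpoint-relative URL, local path, sheet-specific settings, and preview]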
This educational material was produced for the course held at ODSC India 2019 in Bangalore on Aug 10, 2019. Do not copy or distribute.
Copyright © 2019 KNIME AG
New Node: Table Reader
File path
Join by ID: Inner Join
Join by ID: missing values in the right table, missing values in the left table
Joiner mode
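The join modes named above can be illustrated with a small sketch in plain Python. It is not the Joiner node's implementation: the helper `join_by_id`, the table contents, and the column names are hypothetical, IDs are assumed to be unique within each table, and `None` stands in for a missing value.

```python
def join_by_id(left, right, mode="inner"):
    """Join two lists of dict rows on the "id" key.

    mode: "inner", "left" (left outer), or "right" (right outer).
    Rows without a partner get None for the other table's columns.
    """
    left_cols = [c for c in left[0] if c != "id"]
    right_cols = [c for c in right[0] if c != "id"]
    right_by_id = {row["id"]: row for row in right}
    left_ids = {row["id"] for row in left}

    result = []
    for row in left:
        match = right_by_id.get(row["id"])
        if match is not None:
            result.append({**row, **match})
        elif mode == "left":
            # Missing values in the right table's columns
            result.append({**row, **{c: None for c in right_cols}})
    if mode == "right":
        for row in right:
            if row["id"] not in left_ids:
                # Missing values in the left table's columns
                result.append({**{c: None for c in left_cols}, **row})
    return result


customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]
contracts = [{"id": 2, "product": "A"}, {"id": 3, "product": "B"}, {"id": 4, "product": "C"}]

inner_rows = join_by_id(customers, contracts, "inner")
left_rows = join_by_id(customers, contracts, "left")
right_rows = join_by_id(customers, contracts, "right")
```

The inner join keeps only IDs 2 and 3. The left outer join also keeps ID 1 with a missing `product`, and the right outer join also keeps ID 4 with a missing `name`.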
Aggregation methods
Select criteria to keep row
• Workflow annotations
• Node labels
• Metanodes
− Right click -> Create Metanode...
− Organize workflow by task
− Hide complexity & improve readability
1 Column
Color range for numerical values
Discrete colors for nominal values
Apply selection
Table View
• Click layout button when inside Component to assign views to rows and columns
• Add views and rows via drag&drop
• Add columns using + buttons
Input table:
Sex  Hair   Age
f    blond  31
m    red    22
f    blond  53
m    brown  16
f    brown  47
f    black  22
m    blond  13
m    red    55

Aggregation: Count
Sex  blond  brown  black  red
f    2      1      1      0
m    1      1      0      2

Aggregation: Mean(Age)
Sex  blond  brown  black  red
f    42     47     22     0
m    13     16     0      38.5
Pivots ~ Columns
Aggregation
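The pivoting example can be recomputed in plain Python from the eight input rows. This is a sketch of the idea (group by Sex, pivot on Hair, aggregate Age), not the Pivoting node itself. One assumption to note: combinations with no rows get `None` for the mean here, whereas the table above shows 0 in those cells.

```python
from collections import defaultdict

rows = [
    ("f", "blond", 31), ("m", "red", 22), ("f", "blond", 53),
    ("m", "brown", 16), ("f", "brown", 47), ("f", "black", 22),
    ("m", "blond", 13), ("m", "red", 55),
]

# Collect the ages for every (group, pivot) combination.
ages = defaultdict(list)
for sex, hair, age in rows:
    ages[(sex, hair)].append(age)

hair_colors = ["blond", "brown", "black", "red"]  # pivot values become columns

# Aggregation: Count
count = {
    sex: [len(ages.get((sex, h), [])) for h in hair_colors]
    for sex in ("f", "m")
}

# Aggregation: Mean(Age); None marks a combination without any rows.
mean_age = {
    sex: [
        sum(ages[(sex, h)]) / len(ages[(sex, h)]) if (sex, h) in ages else None
        for h in hair_colors
    ]
    for sex in ("f", "m")
}

print(count)
print(mean_age)
```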
Example Applications:
• Anomaly Detection (fraud, predictive maintenance)
• Association Rule Learning (market basket analysis)
• Clustering (market segmentation)
• Classification (next best offer, churn prevention)
• Regression (trend estimation)
[Figure: the Original Data Set is split into a Training Set and a Test Set. Training Set -> Train Model; Test Set -> Apply Model -> Score Model. New Data!]
• Methods
− Decision Trees
− Neural Networks
− Naïve Bayes
− Logistic Regression
Target Column
• C4.5 builds a tree from a set of training data using the concept of
information entropy.
• At each node of the tree, the attribute of the data with the highest
normalized information gain (difference in entropy) is chosen to split the
data.
• The C4.5 algorithm then recurses on the smaller sublists.
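The split criterion described above can be sketched as follows. Normalized information gain is commonly called gain ratio: the information gain of a split divided by the entropy of the split itself. The sketch covers only the choice of the split attribute for nominal data, not recursion, pruning, or numeric attributes, and the toy rows and column names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Information gain of splitting on `attribute`, normalized by split information."""
    labels = [row[target] for row in rows]
    subsets = {}
    for row in rows:
        subsets.setdefault(row[attribute], []).append(row[target])
    # Weighted entropy that remains after the split
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    gain = entropy(labels) - remainder
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical training data: "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "yes"},
]

best = max(["outlook", "windy"], key=lambda a: gain_ratio(rows, a, "play"))
print(best)
```

In this toy data, splitting on `outlook` removes all uncertainty about the target, while splitting on `windy` removes none, so `outlook` is chosen.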
• Methods
− Linear
− Polynomial
− Regression Trees
− Partial Least Squares
• Applications
− Market Segmentation
− Diversity picking
• Methods
− K-means/medoids
− Hierarchical
− DBSCAN
− OPTICS
− Neighbourgrams
…
• Pick a different random subset of the training data for each model in the
ensemble (bag)
• Allows testing the model using the training data: when validating, each
model should only vote on data points that were not used to train it
[Figure: for the data points X1 and X2, the models P1, P2, …, Pn produce the out-of-bag predictions y1OOB and y2OOB]
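Bagging with out-of-bag validation can be sketched in plain Python. Several choices here are assumptions for illustration only: the subsets are drawn as bootstrap samples (with replacement), the base learner is a one-dimensional nearest-centroid classifier, and the toy data and all function names are hypothetical.

```python
import random


def train_centroids(points):
    """Base learner: one centroid per class (1-D nearest-centroid classifier)."""
    by_class = {}
    for x, y in points:
        by_class.setdefault(y, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in by_class.items()}


def predict(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))


def bagging_oob_accuracy(data, n_models=25, seed=0):
    rng = random.Random(seed)
    n = len(data)
    votes = [[] for _ in range(n)]  # out-of-bag votes per data point
    for _ in range(n_models):
        # Bootstrap sample: draw n indices with replacement.
        in_bag = [rng.randrange(n) for _ in range(n)]
        model = train_centroids([data[i] for i in in_bag])
        # Only points NOT used to train this model receive its vote.
        for i in set(range(n)) - set(in_bag):
            votes[i].append(predict(model, data[i][0]))
    # Majority vote per point; points that were never out-of-bag are skipped.
    scored = [(max(set(v), key=v.count), data[i][1]) for i, v in enumerate(votes) if v]
    return sum(pred == truth for pred, truth in scored) / len(scored)


# Hypothetical, well-separated toy data: (feature, class)
data = [(x / 10, "low") for x in range(10)] + [(10 + x / 10, "high") for x in range(10)]
oob_accuracy = bagging_oob_accuracy(data)
print(oob_accuracy)
```

Because each point is scored only by models that never saw it, the out-of-bag accuracy estimates performance on unseen data without setting aside a separate test set.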
• A loop block is defined by appropriate loop start and loop end nodes
• Loop body = Nodes in between and side branches
Loop start node -> Loop body -> Loop end node
(Hint: don’t forget to use the Flow Variable in your learner node)
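The structure of a loop block can be mirrored in plain Python: the loop start exposes one value per iteration as a flow variable, the loop body uses that value in a learner, and the loop end collects the results. Everything in this sketch is hypothetical, including the function names, the `threshold` variable, and the stand-in learner; it shows only the control flow.

```python
def loop_start(values):
    """Loop start: expose one value per iteration as a flow variable."""
    for iteration, value in enumerate(values):
        yield {"iteration": iteration, "threshold": value}


def loop_body(flow_variables, rows):
    """Loop body: a stand-in learner that reads its setting from the flow variable."""
    threshold = flow_variables["threshold"]
    correct = sum((x >= threshold) == label for x, label in rows)
    return {"threshold": threshold, "accuracy": correct / len(rows)}


def loop_end(results):
    """Loop end: collect the output of every iteration into one table."""
    return list(results)


# Hypothetical data: the label is True when x >= 5.
rows = [(x, x >= 5) for x in range(10)]
table = loop_end(loop_body(fv, rows) for fv in loop_start([3, 5, 7]))
best = max(table, key=lambda row: row["accuracy"])
print(best)
```

If the body ignored the flow variable, every iteration would produce the same result, which is why the hint above matters.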
Integrating External Tools
• This session gives a quick overview of the external tools that can be called
within KNIME, e.g.:
− Java, R, Python
− Web services
[Figure: R script editor with syntax highlighting, create and store templates, R workspace, show results, evaluate script, and R console output]
JSON Response:
XML Response:
https://www.knime.com/blog/a-restful-way-to-find-and-retrieve-data
https://www.knime.com/blog/OSM-meets-CSV-file-and-Google-API
https://www.knime.org/blog/giving-the-knime-server-a-rest
KNIME Server as a REST resource
• Use the XML Reader (or GET Resource) node to get an XML cell
• Use XPath nodes to query the XML and extract certain parameters
• The editor window simplifies construction of XPath queries by auto-generating them (click on XML elements)
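Querying an XML response with XPath can be sketched outside KNIME using Python's standard library. The XML below is a made-up response, not the actual KNIME Server format, and its element and attribute names are assumptions. Note that `xml.etree.ElementTree` supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML response; the element and attribute names are made up.
xml_text = """
<jobs>
  <job id="a1"><name>Churn scoring</name><state>EXECUTED</state></job>
  <job id="b2"><name>Data import</name><state>CONFIGURED</state></job>
</jobs>
"""

root = ET.fromstring(xml_text)

# Extract the text of every <name> element below <job>.
all_names = [element.text for element in root.findall("./job/name")]

# Predicate query: only jobs whose <state> child has the text EXECUTED.
executed_ids = [job.get("id") for job in root.findall("./job[state='EXECUTED']")]

print(all_names)
print(executed_ids)
```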
[Figure: workflow steps: enter credentials, create bucket, create new directory in bucket, create URIs of local file paths, upload!]
Input
• File (CSV, Table, XLS, …)
• Database
• JSON for REST API
Output
• Report (BIRT, Tableau, Spotfire)
• Email
• File (CSV, Table, XLS, …)
• WebPortal
To BIRT Report
Also available: nodes for Tableau and Spotfire
Step 1: Upload File
Step 2: Select Columns
Step 3: Customize Column Domains
Step 4: Interactive View
Step 5: Download Image
WebPortal Page (Step 1): Upload File
WebPortal Page (Step 4): Interactive View
Available in KNIME Server
[Figure: workflow with File Selection, Column Selection, Stacked Area Chart, Filter by Range, and Row Filter]
Provide email credentials, host, etc.
Convert binary column to file and save to temp dir
Open the workflow and click the Report Editor button in the toolbar