From Excel To KNIME 081921-44
All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or
transmission in any form or by any means, electronic, mechanical, photocopying, recording or likewise.
KNIME Press
Hardturmstrasse 66
8005 Zurich
Switzerland
3
Table of Contents
General Usage ..... 6
Spreadsheets | Workflows and Nodes ..... 7
Folders | Workspace ..... 8
The KNIME Workbench ..... 9
Building a KNIME Workflow ..... 10
Display Data Table ..... 11
Input/Output ..... 12
4
Filtering / Removing Rows with Different Values | Rule-based Row Filter ..... 27
Reordering and Renaming Columns | Column Resorter and Column Rename Node ..... 31
5
General Usage
6
Spreadsheets | Workflows and Nodes
Spreadsheets: Microsoft Excel is a spreadsheet program that features calculation, graphing tools, pivot tables, and a macro programming language (Visual Basic for Applications, VBA for short).
By using cell mathematics, macros, and VBA you can edit a sheet. This can be very simple cell mathematics, like summing the values of cells A1 and B1 (=SUM(A1, B1)), but it can also be really complex, embedded logic.
Workflows and Nodes: KNIME Analytics Platform implements visual programming. This means that each data analysis step is represented by an icon block, called a node, in a graphical editor. Each node performs one specific task. For example, the Excel Reader node reads one sheet of an Excel file, and the Row Filter node filters rows based on a filter criterion.
A sequence of connected nodes is called a workflow. It is the counterpart of an Excel sheet with many functions and/or VBA macros.
Data is organized in data tables, where each data cell is identified by a column header and a Row ID. To visualize the content of a data table, see page 11.
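The idea of a workflow as a chain of single-purpose nodes can be sketched in a few lines of Python. This is only an analogy under stated assumptions: the table layout (a list of dictionaries) and the function names row_filter and add_sum_column are made up for this sketch and are not part of KNIME.

```python
# Each "node" is modeled as a plain function that takes a table and returns a new one.
# The table layout (a list of dicts) is an assumption of this sketch, not KNIME's data model.

def row_filter(table, column, value):
    """Keep only the rows whose `column` equals `value`."""
    return [row for row in table if row[column] == value]

def add_sum_column(table, a, b, new_column):
    """Append a column holding a + b, like =SUM(A1, B1) filled down a sheet."""
    return [{**row, new_column: row[a] + row[b]} for row in table]

table = [
    {"region": "north", "q1": 10, "q2": 5},
    {"region": "south", "q1": 7, "q2": 3},
    {"region": "north", "q1": 2, "q2": 8},
]

# The "workflow": the output of one node feeds the input of the next.
result = add_sum_column(row_filter(table, "region", "north"), "q1", "q2", "total")
print(result)
```

Each function leaves its input untouched and returns a new table, which loosely mirrors how every node keeps its own output table.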
Note. Nodes have four possible states displayed by a little traffic light under the
node itself:
- Not configured -> red light
- Configured -> yellow light
- Successfully executed -> green light
- Executed with error -> red light with cross
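The four traffic-light states can be pictured as a small state machine. The sketch below is a hypothetical model for illustration only: the class names NodeState and Node are assumptions and do not correspond to any KNIME API.

```python
from enum import Enum

class NodeState(Enum):
    NOT_CONFIGURED = "red"
    CONFIGURED = "yellow"
    EXECUTED = "green"
    FAILED = "red with cross"

class Node:
    """Hypothetical stand-in for a node: a task plus its settings and state."""

    def __init__(self, task):
        self.task = task
        self.settings = None
        self.state = NodeState.NOT_CONFIGURED
        self.output = None

    def configure(self, **settings):
        self.settings = settings
        self.state = NodeState.CONFIGURED

    def execute(self, table):
        if self.state is NodeState.NOT_CONFIGURED:
            raise RuntimeError("configure the node before executing it")
        try:
            self.output = self.task(table, **self.settings)
            self.state = NodeState.EXECUTED
        except Exception:
            self.state = NodeState.FAILED
        return self.output

# A row-filter-like task, used here only to drive the state changes.
node = Node(lambda table, column, value: [r for r in table if r[column] == value])
states = [node.state]
node.configure(column="city", value="Zurich")
states.append(node.state)
node.execute([{"city": "Zurich"}, {"city": "Konstanz"}])
states.append(node.state)
print([s.value for s in states])
```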
7
Folders | Workspace
Folders: Excel files are normally saved in different folders. A single Excel file can contain multiple sheets.
Workspace: The workspace defines the folder where all workflows, data and intermediate data are saved. One workflow corresponds to an Excel sheet with all formulas, visualizations and VBAs. All the projects and datasets saved in a workspace are available in the KNIME Explorer, located in the top left corner of the KNIME workbench. The path to the workspace is selected at the very beginning, after starting KNIME Analytics Platform.
8
The KNIME Workbench
After downloading and installing KNIME Analytics Platform you can start it from the desktop or from the installation folder. The KNIME workbench, which you can see
below, opens including the following panels:
- “KNIME Explorer” showing the list of currently available workflows and KNIME servers for the selected workspace and the My-KNIME-Hub mountpoint.
- “Workflow Coach” recommending the next node based on the KNIME user statistics and the node currently selected in the “Workflow Editor”.
- “Node Repository” containing all currently installed nodes. A “Search” box is available at the top of this panel to search for nodes.
- “Workflow Editor” in the center allowing for the creation and editing of workflows.
- “Node Description” showing a text describing the task and configuration settings of the node selected either in the “Workflow Editor” or in the “Node Repository” panel.
- “Node Monitor” showing a preview of the output table of the node selected in the “Workflow Editor”.
- “KNIME Hub” allowing use of the KNIME Hub to search for nodes, workflows, components, and extensions.
- “Outline” offering an overview of the workflow.
- “Console” showing execution messages, e.g. error and warning messages.
[Figure: the KNIME workbench with its panels labeled: KNIME Explorer, Workflow Coach, Node Repository, Outline, Workflow Editor, Node Description, KNIME Hub, Console / Node Monitor]
9
Building a KNIME Workflow
KNIME workflows are created by dragging and dropping nodes from the “Node Repository” or “Workflow Coach” panel to the “Workflow Editor”. To find the correct node for your next step, use the search box at the top of the Node Repository or browse through the nodes, which are sorted into categories.
Nodes are connected to each other through their input and output ports. Just click the output port of the first node and release at the input port of the second node.
Nodes that have just been created show a red light status: not yet configured. To configure a node, right-click the node and select the option “Configure” or alternatively
double-click the node. The node “Configuration” window opens. Configure the node and close the configuration window. If the configuration is successful, the node status
changes to a yellow traffic light.
The node is now configured, but not yet executed. To execute the node, right-click the node and select the “Execute” option. If the execution is successful, the node
changes its status to a green light.
[Figure: Step 1: Search and create a node via drag&drop. Step 2: Connect the nodes.]
Note 1: To create a new, empty workflow right click in the KNIME Explorer panel, select “Create New KNIME Workflow…” and define the name and destination of the new
workflow in the new window.
Note 2: Click the magnifier next to the search box in the node repository to change the mode of the search box to a fuzzy search. This makes finding the correct node
easier in the beginning.
Note 3: The “Getting Started Guide” guides you step by step through building your first example workflow.
10
Display Data Table
In Excel: What you see is what you get. This means that the data table you see is the final data table.
In KNIME: The output data tables produced after node execution are always available. To see them:
- Right-click the node in the workflow
- Select the last option in the context menu
Note: Some nodes, like plotting and modeling nodes, also have a more complex “View” function. The option leading to this “View” is usually displayed in the middle of the context menu.
11
Input/Output
12
Opening an Excel File | Excel Reader Node
[Figure: Excel Reader node settings: File path, Sheet name, Column headers]
Note: To read multiple Excel files that have the same column headers and are all stored in one folder, set the Mode in the “Input location” section to “Files in folder”. This reads all Excel files and concatenates them, i.e. stacks them on top of each other.
13
Opening a CSV or txt File | CSV Reader Node
[Figure: CSV Reader node settings: Delimiter, Column header]
Note 1: Click the “Autodetect format” button if the node doesn’t create the preview.
Note 2: Check out the additional tabs to limit the number of rows or to change the
encoding.
14
Importing Content from Multiple Files of the Same Type to a Single Table
[Figure: reader node settings: Mode, Filter, Union or intersection]
Note: In the Transformation tab you can define whether you want to use the union or the intersection of the columns from the different tables.
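The difference between the union and the intersection of columns can be sketched in plain Python. This is an illustration under assumptions, not the reader node's implementation: the two in-memory CSV texts stand in for files in a folder, the helper names read_rows and stack are hypothetical, and None stands in for a missing value.

```python
import csv
import io

# Two in-memory "files" stand in for CSV files in one folder (hypothetical data).
file_a = "id,name,price\n1,apple,0.5\n2,pear,0.7\n"
file_b = "id,name,stock\n3,plum,40\n"

def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def stack(tables, mode="union"):
    """Stack non-empty tables, keeping the union or the intersection of their columns."""
    columns = list(tables[0][0].keys())
    for table in tables[1:]:
        names = list(table[0].keys())
        if mode == "union":
            columns += [c for c in names if c not in columns]
        else:
            columns = [c for c in columns if c in names]
    # Cells that a file does not provide are filled with None.
    return [{c: row.get(c) for c in columns} for table in tables for row in table]

tables = [read_rows(file_a), read_rows(file_b)]
union = stack(tables, "union")
intersection = stack(tables, "intersection")
print(union)
print(intersection)
```

With the union, every column appears and the gaps are filled with missing values; with the intersection, only the columns shared by all files survive.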
15
Importing Content from Multiple Sheets into a Single Table
Note 1: Lesson 3 of the free KNIME Self-Paced Course “L2-DW KNIME Analytics Platform for Data Wranglers” introduces flow variables.
Note 2: Lesson 4 of the same course introduces loops in KNIME.
16
Saving an Excel File | Excel Writer Node
[Figure: Excel Writer node settings: Output location, Sheet name, Column headers and row key]
Note 1: To write multiple tables into different sheets you can add dynamic input
ports and define a sheet name for each input table.
17
Adding a Sheet to an Excel File | Excel Writer Node
[Figure: Excel Writer node settings: Location of the existing file, Append option, Sheet settings]
18
Data Types in Excel | Data Types in KNIME
19
Connect to a Database | Database Connector Nodes
A number of database connector nodes are available to connect to the most commonly used databases. In addition, the DB Connector node allows you to connect to any JDBC-compliant database.
There are more database nodes that help you build a SQL query for in-database processing. You can use them between the DB Table Selector and the DB Reader node.
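The pattern of connecting, selecting a table, processing inside the database, and only then reading the result can be sketched with Python's built-in sqlite3 module. SQLite is used purely as a stand-in here; the table name sales and its columns are invented for this sketch, and the comments map the steps to the node roles only loosely.

```python
import sqlite3

# "Connector": open a connection (an in-memory SQLite database as a stand-in).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("apple", 10.0), ("pear", 4.0), ("apple", 6.0)],
)

# "Table selection" plus in-database processing: the aggregation runs
# inside the database, so only the small result has to be transferred.
query = (
    "SELECT product, SUM(amount) AS total "
    "FROM sales GROUP BY product ORDER BY product"
)

# "Reader": fetch the result of the query into local memory.
rows = conn.execute(query).fetchall()
conn.close()
print(rows)
```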
20
Tips on Reading Data with KNIME Analytics Platform
All reader nodes require a path to the input file location. Let’s collect some Tips&Tricks for this:
Tip&Trick 1: Use drag&drop from the KNIME Explorer:
Data files saved in the workspace folder are available in the KNIME Explorer panel (top left panel). To read in one of these files, you just drag&drop the file from the KNIME Explorer panel to the workflow editor. KNIME automatically creates the correct reader node and sets the path of the input location.
Tip&Trick 2: Different options to define a file path:
In KNIME we have different options to provide a file path. This becomes important when you start sharing your workflows or exporting them to other KNIME Analytics Platform installations or KNIME Servers. There are four default file systems available in KNIME Analytics Platform.
• Local File System: Allows you to select a file/folder from your local system.
• Mountpoint: You can connect to a KNIME Server or the KNIME Hub via additional mountpoints in the KNIME Explorer. To read data from either LOCAL or another mountpoint select “Mountpoint“. When selected, a new drop-down menu appears so that you can choose the mountpoint. Unconnected mountpoints are grayed out but can still be selected (note that browsing is disabled in this case). Go to the KNIME Explorer and connect to the mountpoint to enable browsing.
• Relative to: Allows you to choose whether to resolve the path relative to the current mountpoint, current workflow, or the current workflow's data area. When selected a new drop-down menu appears to choose which of the three options to use.
Tip&Trick 3: Reading from another file system:
KNIME Analytics Platform allows you to connect and read from many different sources / file systems, e.g. Amazon S3, Microsoft SharePoint Online, Databricks to name just a few. Three steps are necessary (the file handling guide gives you further information).
Step 1: Click “...” in the bottom left corner of the reader node icon to add a File System Connection port
Step 2: Connect to the desired file system via the dedicated connector node and connect it with the reader node
Step 3: Select the file/folder in the connected file system
21
Appending / Joining Data
22
Appending Data | Concatenate Node
In Excel: Note 1: Before copying and pasting ensure that all tables have the same column structure.
In KNIME: The Concatenate node writes two or more tables below each other.
23
VLOOKUP | Filter and Joiner Node
In Excel:
2. Join columns based on a primary key (look up value), e.g. join product information based on the product ID.
An alternative function for the second task is INDEX MATCH.
In KNIME:
Note 1: Your full original table is still available at the output port of the Table Reader node. See more information about the Row Filter and Column Filter nodes on pages 26 and 30.
2. Join columns based on a joining column, e.g. join product information based on product ID.
[Figure: node settings: Filter columns, Join column(s), Join mode]
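Joining product information onto another table by product ID, which is what VLOOKUP does in a sheet, can be sketched with a dictionary lookup. This is a minimal illustration under assumptions: the function join, the mode names "inner" and "left_outer", and the sample data are hypothetical, and the sketch assumes that each key occurs only once in the lookup table.

```python
products = [
    {"product_id": 1, "name": "apple"},
    {"product_id": 2, "name": "pear"},
]
orders = [
    {"order": "A", "product_id": 2},
    {"order": "B", "product_id": 1},
    {"order": "C", "product_id": 9},  # no matching product
]

def join(left, right, key, mode="inner"):
    """Join `right` onto `left` by `key`; assumes unique keys in `right`."""
    lookup = {row[key]: row for row in right}
    right_columns = [c for c in right[0] if c != key]
    joined = []
    for row in left:
        match = lookup.get(row[key])
        if match is not None:
            joined.append({**row, **{c: match[c] for c in right_columns}})
        elif mode == "left_outer":
            # Keep unmatched rows and fill the looked-up columns with None.
            joined.append({**row, **{c: None for c in right_columns}})
    return joined

inner = join(orders, products, "product_id")
left_outer = join(orders, products, "product_id", mode="left_outer")
print(inner)
print(left_outer)
```

The inner variant drops order C, much like a lookup that finds nothing; the left outer variant keeps it with an empty product name.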
24
Filtering, Transformations,
and Aggregations
25
Filtering / Removing Rows with a Specific Value | Row Filter
In Excel: To remove rows, select the rows you want to delete, right-click and select “Delete Rows”.
In KNIME: The Row Filter node filters the table based on a filter criterion, e.g. by including / excluding all rows with a certain value in the filter column.
[Figure: Row Filter node settings: Filter column]
Note 1: On the right you can choose whether you want to include or exclude the rows with the matching value.
Note 2: If you are only interested in the rows with one specific value, you can use the Row Filter node.
Note 3: If you want to include rows based on different values, you can use the Rule-based Row Filter (see next page).
Note 4: Further filter options are available, e.g. filtering on a numerical range, by row number or row ID, or on missing values only.
26
Filtering / Removing Rows with Different Values | Rule-based Row Filter
In Excel: Select the values you are interested in from the drop-down menu. To remove rows, select the rows you want to delete, right-click and choose “Delete Rows”.
[Figure: Rule-based Row Filter node settings: Column List, List of functions, Expression]
Note 1: At the bottom of the configuration window you can choose whether you want to include or exclude TRUE matches.
Note 2: Columns are given by their name surrounded by $. Add them to the expression frame by double-clicking a column name in the Column List.
Note 3: The Rule-based Row Filter node has a number of different functions for many advanced filter options.
Note 4: Different rows in the expression frame are combined like a logical OR.
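How several rules combine like an OR, and how the include/exclude switch inverts the result, can be sketched as follows. The sketch uses Python predicates instead of the node's own rule syntax, and it assumes that every rule marks its matches as TRUE; the function name rule_based_row_filter and the sample data are made up for this illustration.

```python
orders = [
    {"country": "CH", "amount": 50},
    {"country": "DE", "amount": 150},
    {"country": "DE", "amount": 20},
]

# One predicate per rule line; a row matches if ANY rule matches (logical OR).
rules = [
    lambda row: row["country"] == "CH",
    lambda row: row["amount"] > 100,
]

def rule_based_row_filter(table, rules, include_true=True):
    """Keep rows matching any rule, or the remaining rows if include_true is False."""
    return [row for row in table if any(rule(row) for rule in rules) == include_true]

included = rule_based_row_filter(orders, rules)
excluded = rule_based_row_filter(orders, rules, include_true=False)
print(included)
print(excluded)
```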
27
Removing Duplicates Duplicate Row Filter
Note 1: In the “Advanced” tab you can change the treatment for duplicates, for
example to keep duplicate rows and to add a column showing which of the rows
are unique, chosen, or duplicates.
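The three labels mentioned in the note can be illustrated with a short sketch. It assumes that the first occurrence of a repeated key is the one that is kept ("chosen"); the function names and the column name "status" are hypothetical and only serve this example.

```python
from collections import Counter

def classify_rows(table, key_columns):
    """Label each row as 'unique', 'chosen' (first of a group) or 'duplicate'."""
    keys = [tuple(row[c] for c in key_columns) for row in table]
    counts = Counter(keys)
    seen = set()
    result = []
    for row, key in zip(table, keys):
        if counts[key] == 1:
            status = "unique"
        elif key not in seen:
            status = "chosen"
        else:
            status = "duplicate"
        seen.add(key)
        result.append({**row, "status": status})
    return result

def remove_duplicates(table, key_columns):
    """Default behaviour in this sketch: drop every row labelled 'duplicate'."""
    return [
        row
        for row, labelled in zip(table, classify_rows(table, key_columns))
        if labelled["status"] != "duplicate"
    ]

customers = [
    {"email": "a@example.com", "city": "Zurich"},
    {"email": "b@example.com", "city": "Konstanz"},
    {"email": "a@example.com", "city": "Berlin"},
]
labelled = classify_rows(customers, ["email"])
deduplicated = remove_duplicates(customers, ["email"])
print([row["status"] for row in labelled])
```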
28
Sorting Rows by Multiple Key Columns Sorter Node
Note 1: You can add as many key-columns as you want by clicking the “Add Rule”
button.
Note 2: You can temporarily sort the output table of a node. Click on the column
header based on which you want to sort and select whether you want to sort
ascending or descending.
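Sorting by several key columns, each with its own direction, can be sketched with Python's stable sort. The function sort_rows and its rule format (a list of column and direction pairs) are assumptions of this sketch, not the Sorter node's configuration format.

```python
def sort_rows(table, rules):
    """Sort by several key columns.

    rules: list of (column, ascending) pairs, most significant key first.
    """
    result = list(table)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields the combined ordering.
    for column, ascending in reversed(rules):
        result.sort(key=lambda row: row[column], reverse=not ascending)
    return result

staff = [
    {"dept": "B", "salary": 10},
    {"dept": "A", "salary": 5},
    {"dept": "A", "salary": 9},
]
# First by department (ascending), then by salary (descending).
ordered = sort_rows(staff, [("dept", True), ("salary", False)])
print(ordered)
```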
29
Removing Columns Column Filter Node
Note 1: You can use the arrow buttons in the middle to move columns from the
Include to the Exclude frame and vice versa.
Note 2: You can use the Wildcard/Regex Selection to automatically remove
columns by a name pattern.
Note 3: You can use the Type Selection to automatically remove columns by data
type.
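Removing columns by name pattern or by data type can be sketched like this. The function column_filter and its parameters are hypothetical, a regular expression stands in for the wildcard/regex selection, and the type check simply inspects the Python values in each column.

```python
import re

def column_filter(table, name_pattern=None, data_type=None):
    """Drop columns whose name matches `name_pattern` or whose cells are of `data_type`."""
    def excluded(column):
        if name_pattern is not None and re.fullmatch(name_pattern, column):
            return True
        if data_type is not None and all(isinstance(row[column], data_type) for row in table):
            return True
        return False

    keep = [c for c in table[0] if not excluded(c)]
    return [{c: row[c] for c in keep} for row in table]

table = [{"id": 1, "tmp_a": "x", "tmp_b": "y", "price": 2.5}]
without_tmp = column_filter(table, name_pattern=r"tmp_.*")
without_float = column_filter(table, data_type=float)
print(without_tmp)
print(without_float)
```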
30
Reordering and Renaming Columns | Column Resorter and Column Rename Node
In Excel: To rename a column just click on the column cell and change the cell value.
In KNIME: To rename columns you can use the Column Rename node. Double-click the column you want to rename, activate the checkbox “Change” and define the column header in the textbox.
31
Changing Data Types String to Number and Number to String
Note 1: In the String to Number node you can choose between different numerical
types, e.g. Double, Integer, and Long.
32
Tip on Data Manipulation with KNIME Analytics Platform
Resort columns
Remove columns
33
Data Aggregation
34
Pivot Tables | Pivoting Node
[Figure: pivot settings: Columns, Values, Rows]
Note 1: The Pivoting node doesn’t have “Filter” options, but you can simply use a
Row Filter node beforehand.
Note 2: In KNIME you have to choose at least one column for the Groups and
Pivots. In case you want to choose only “Rows” you can use the GroupBy node.
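A pivot with one group column (the pivot table's rows), one pivot column (its columns) and a summed value can be sketched as follows. The function pivot, its parameter names, and the choice of sum as the aggregation are assumptions of this sketch.

```python
from collections import defaultdict

def pivot(table, group, pivot_column, value):
    """Sum `value` for every combination of group and pivot value."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in table:
        totals[row[group]][row[pivot_column]] += row[value]
    pivots = sorted({row[pivot_column] for row in table})
    # One output row per group, one output column per pivot value;
    # combinations without data get None.
    return [
        {group: g, **{p: cells.get(p) for p in pivots}}
        for g, cells in totals.items()
    ]

sales = [
    {"region": "north", "year": "2020", "amount": 10.0},
    {"region": "north", "year": "2021", "amount": 5.0},
    {"region": "south", "year": "2020", "amount": 7.0},
    {"region": "north", "year": "2020", "amount": 1.0},
]
pivoted = pivot(sales, "region", "year", "amount")
print(pivoted)
```

Leaving out the pivot column reduces this to a plain group-and-aggregate, which is the case the note refers to the GroupBy node for.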
35
Pivot Table without Columns | GroupBy Node
[Figure: pivot settings: Values, Rows]
36
Unpivot Unpivoting Node
• This opens the “Power Query Editor”. Select the columns to unpivot by
holding down the shift key.
• Click the “Transform” tab of the Power Query Editor and select “Unpivot
Columns”.
• Click the “Home” tab of the “Power Query Editor”, and select “Close &
Load” to save the data unpivoted back in the Excel workbook.
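What unpivoting does to a table can be sketched in a few lines: every selected column becomes one row per original row. The function unpivot and the output column names "column" and "value" are hypothetical choices for this illustration.

```python
def unpivot(table, id_columns, value_columns):
    """Turn each selected value column into its own row, keeping the id columns."""
    return [
        {**{c: row[c] for c in id_columns}, "column": name, "value": row[name]}
        for row in table
        for name in value_columns
    ]

wide = [
    {"product": "apple", "q1": 10, "q2": 12},
    {"product": "pear", "q1": 4, "q2": 6},
]
long_table = unpivot(wide, ["product"], ["q1", "q2"])
print(long_table)
```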
37
Math Functions and
Text Functions
38
Math Functions Math Formula Node
Note 1: You can decide whether you want to append a new column or replace one
of the columns, by using the checkboxes underneath the Expression frame.
Note 2: By activating the checkbox “Convert to Int” you can ensure that your
output appended / replaced column is of type Integer.
Note 3: To perform the same mathematical expression on multiple columns you
can use the Math Formula (Multi Column) node.
39
Math Functions Math Formula Node
Excel: ABS(number1) | KNIME: abs(Col) | Returns the absolute value of each value in the selected column
40
Concatenation and Find&Replace | String Manipulation Node
2. Find&Replace
In the dialog that opens, you can define the value you want to replace and the value you want to replace it with.
41
Formatting Excel Tables
42
In chapter 1 we introduced the Excel Writer, which you
can use to write your result table into an Excel Sheet.
By default, this is a simple table without any
formatting like colors, border cells, etc. In this chapter,
we want to show you how to use the XLS Formatting
nodes of the community extension Continental Nodes
for KNIME. These nodes enable you to add formatting
instructions and advanced settings to already existing
XLS files, so that you can create Excel reports that
have the look and feel you are used to.
43
Figure 2 At the bottom left you can see a control table with tag values, which is the key for your styled table. Based on
the tag values, the yellow XLS formatting nodes collect formatting instructions, which are then applied by the
XLS Formatter (apply) node, producing the styled table (top right).
44
This chapter is divided into two sections. The first section of this chapter
shows two ways of creating an XLS Control Table with tags. The second
section introduces some of the nodes that are available to add formatting
instructions.
Hint: You can’t find the nodes in your node repository? The Continental
Nodes for KNIME are a community extension that you can install by
dragging the extension from the KNIME Hub to KNIME Analytics Platform
or by installing the extension as described in this video.
As the saying goes, many roads lead to Rome. This section introduces
two different roads or approaches for creating an XLS Control Table.
(The second approach happens to be my personal favorite!) The “key
node” in both examples is the XLS Control Table Generator node.
Figure 3 The configuration dialog for the XLS Control Table Generator node
45
Approach 1: Table Creator + XLS Control Table Generator
The first approach to creating an XLS Control Table with tags involves a
combination of a Table Creator and an XLS Control Table Generator node.
This is an easy approach. The downside is that creating the tag table
entails a lot of manual work and that the resulting tag table is static.
Therefore, this approach is only recommended for small
tables, where the number of rows and columns won’t change.
Open the configuration window of the Table Creator node to add one or multiple tag values for each cell. If you want to enter multiple tags, remember to separate them with a comma. The XLS Control Table Generator node transforms the table into an XLS Control table and replaces the column names with letters and the row IDs with numbers. The checkbox “write column header to first row” gives you the option of retaining the column headers, similar to the option “add column headers” in the Excel Writer node.
[Figure: Here you can see one option to create a control table using the Table Creator node and the XLS Control Table Generator node]
46
Figure 5 Here you can see the resulting table when activating the checkbox “unpivot result
table” in the XLS Control Table Generator node. The node creates one row for each cell
including value, row number, column header, etc.
This table is a great basis to now transform values into tags with the Rule
Engine node. For example, we can replace all values in the first row with the
tag “header”, or replace all values in the first column that have a row number
higher than three with the tag value “cw”.
Hint: Activate the checkbox “Replace Column” and select the column “Value”.
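The two example rules can be sketched in Python on a small unpivoted table. This is an illustration only: the field names "row", "column" and "value" are assumed stand-ins for the columns of the unpivoted control table, the rules are written as Python predicates instead of Rule Engine syntax, and keeping the original value when no rule matches is a choice of this sketch.

```python
# One record per cell, as in an unpivoted control table (field names are assumptions).
cells = [
    {"row": r, "column": c, "value": f"{c}{r}"}
    for r in range(1, 6)
    for c in "AB"
]

# Rules are checked top to bottom; the first matching rule provides the tag.
rules = [
    (lambda cell: cell["row"] == 1, "header"),
    (lambda cell: cell["column"] == "A" and cell["row"] > 3, "cw"),
]

def apply_rules(cell, rules):
    for condition, tag in rules:
        if condition(cell):
            return tag
    # No rule matched: keep the original value (a choice made for this sketch).
    return cell["value"]

# Replace the "value" field with the resulting tag.
tagged = [{**cell, "value": apply_rules(cell, rules)} for cell in cells]
print([cell["value"] for cell in tagged])
```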
47
Figure 6 On the left is the configuration dialog of the Rule Engine node. In the Expression section you can see
the rules defined to replace the original values with tags, based on the row and column number. On the right you can
see the output table, where the rules have been applied and the values replaced with the different tags.
A second XLS Control Table Generator node can transform this table back into its original
form, where the values are replaced with the different tags. This feature is automatically
activated when the node detects an input table that was created by an XLS Control Table
Generator node in unpivot mode.
This approach involves much less manual work compared to the first approach and can be
implemented in a way to handle changing table dimensions gracefully.
Tip: If you want to write multiple tables with changing table specs below each other, you can
create an XLS Control Table for each table and concatenate them afterwards.
Hint: Another helpful node for creating a static XLS Control Table is the XLS Control Table
from Cell Range node.
48
Adding Formatting Actions based on Tag Values
The next step towards adding background colors, borders, etc. to your table is a sequence of
XLS Formatting nodes, similar to the workflow in figure 2.
As you can see all nodes in the example workflow have two input ports and one output port:
The green square is a special port type of the extension, which collects the different
formatting instructions. The data input port expects the table with the tag values.
The optional input port can be used to feed an XLS Formatting table with previous
formatting instructions to which the instructions of the node should be added.
The figure on the right shows you an overview of all the nodes in the Continental extension. I
will introduce my favorite ones and leave it up to you to explore the others.
49
The XLS Background Colorizer node changes the background color of cells. You can assign either a static color and / or pattern fill. One option is to assign the same color to all cells with a specific tag value, e.g. all cells with tag “header” should have a yellow background. Another option is to use RGB values in either hex syntax #FFD800 or decimal syntax R/G/B as tags and use them as the background color.
The XLS Conditional Formatter node changes the background for the cells with a certain tag value according to their numerical value. In the configuration window you can define a color scale by setting a minimum and maximum value and assigning a color to each. Optionally you could set a mid point value and assign a color to that. Cells with values higher or lower than the thresholds will have the background color of the minimum / maximum value.
The XLS Sheet Selector and the XLS Merger node are really helpful nodes if your Excel file has more than one sheet. By default the formatting is always applied to the first sheet. So, if you have an Excel file with only one sheet you don’t have to worry about these two nodes. However, if you have multiple sheets the XLS Sheet Selector allows you to define which sheet your XLS Control table is for.
The XLS Border Formatter node can add borders to a given range specified by a certain tag or by all tags. By activating the corresponding checkboxes, you can add borders to the top, right, bottom, and left. In addition to adding a border around the range specified by the tags, the node gives you the option to use inner vertical and horizontal border lines in each cell, too.
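The color scale of the XLS Conditional Formatter node described above can be pictured with a small sketch. It assumes a linear blend between the minimum and the maximum color, which the text does not state, and it leaves out the optional mid point; the function names hex_to_rgb and scale_color are hypothetical. The clamping step reflects the statement that values outside the thresholds receive the minimum or maximum color.

```python
def hex_to_rgb(code):
    """Convert a hex color such as '#FFD800' to an (R, G, B) tuple."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

def scale_color(value, minimum, maximum, min_color, max_color):
    """Blend linearly between two colors (linear blending is an assumption)."""
    t = (value - minimum) / (maximum - minimum)
    # Values beyond the thresholds get the minimum / maximum color.
    t = max(0.0, min(1.0, t))
    low, high = hex_to_rgb(min_color), hex_to_rgb(max_color)
    return tuple(round(l + t * (h - l)) for l, h in zip(low, high))

print(hex_to_rgb("#FFD800"))
print(scale_color(2, 0, 10, "#FFFFFF", "#FF0000"))
print(scale_color(25, 0, 10, "#FFFFFF", "#FF0000"))
```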
50
The XLS Formatter (apply) node reads an unformatted Excel file, applies all the collected formatting instructions, and saves the nice Excel file in the defined output location.
The XLS Cell Merger node merges the cells for given rectangular ranges of input tags into one cell. For example, we can merge all cells in the first row and center the title with the XLS Font Formatter node. This node works only on strictly rectangular ranges. The value of the merged cell is the value of the top-left cell of the merged range.
The XLS Format Merger node allows you to either combine formatting instructions for different sheets prior to
using the XLS Formatter (apply) node or, when applied to the same sheet, to merge the properties at the lowest
detail level. For example, if the formatting instruction for cell A1 is bold in control table one and italic in control table two,
the resulting formatting instruction for A1 is italic and bold. In case of conflicting information (e.g. two different
font colors for the same cell), the upper input port overwrites the lower one.
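The merging behavior can be sketched with nested dictionaries. The data layout (cell address mapped to a dictionary of properties) and the function name merge_formats are assumptions made for this illustration and say nothing about how the extension stores formatting instructions internally.

```python
def merge_formats(upper, lower):
    """Merge per-cell properties; on conflicts the upper input wins."""
    merged = {}
    for cell in lower.keys() | upper.keys():
        # Properties from `upper` are unpacked last, so they override `lower`.
        merged[cell] = {**lower.get(cell, {}), **upper.get(cell, {})}
    return merged

upper = {"A1": {"bold": True, "font_color": "red"}}
lower = {"A1": {"italic": True, "font_color": "blue"}, "B2": {"bold": True}}
merged = merge_formats(upper, lower)
print(merged["A1"])
```

Cell A1 ends up bold and italic, and the conflicting font color is taken from the upper input.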
This was a short introduction. You can find further information about the
different XLS Formatter nodes in the Continental extension in the
documentation https://www.knime.com/community/continental-nodes-for-knime-xls-formatter
or, from within KNIME Analytics Platform, by looking in the
node description of each individual node.
51
The KNIME Booklet for Excel Users
Are you an experienced Excel user and want to start using KNIME Analytics Platform?
It’s sometimes difficult to switch from one software tool to another. But this booklet is the
perfect starting point as it maps the most commonly used Excel functions and techniques
to their KNIME equivalents. Find out, for example, how data reading, filtering, sorting, and
vlookup work in KNIME.
For a complete introduction to KNIME, please refer to my book “KNIME Beginner’s Luck”
available from KNIME Press under https://www.knime.com/knimepress
Kathrin Melcher is currently a Data Scientist at KNIME. She holds an MSc in Mathematics,
from the University of Konstanz, Germany. She joined the KNIME Evangelism team in May
2017 and has a strong interest in data science, machine learning, and algorithms. She enjoys
teaching and sharing her knowledge on these topics.
52