How To Process Excel Workbook Using Talend?

The document discusses how to process Excel workbooks using Talend. It describes using the tFileList and tFileInputExcel components to create a double loop to iterate over files in a directory and sheets within each workbook. As the files and sheets are processed, the filename and sheet name are extracted and used to populate additional columns for the encoded business values they contain. This allows values like operating region embedded in the filename and sales region in the sheet name to be loaded into the database along with the row/column data. The document provides an example configuration and expressions to extract the encoded values from the filename and sheet name.

About Bigdata Dimension Labs
Bigdata Dimension Labs was built on the ideal that every business should be
positioned to make more profitable, data-driven decisions. To achieve this, they
have partnered with the best modern data sharing technologies in the world and
are recognized as centers of excellence for:

Bigdata Dimension Labs have empowered organizations such as 3M, Agile and
Bill Gates Foundation with 360° performance overview, real-time analytics for
better decision-making, standardized reporting and faster data access and
processing.

Imagine what they can do for you.

Find out more at BDDLabs.com

Content
How to Process Excel Workbook Using Talend?
Conclusion

How To Process Excel Workbook Using Talend?
Processing input files often involves looking in a directory and iterating through a set of files that match a
particular filename pattern (for example, *.xlsx for Excel 2007 files). There may be additional processing
required if each Excel file is a Workbook with more than one Sheet.
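
The filename-pattern step can be pictured in plain Java. The sketch below is only an illustration of glob matching such as *.xlsx, not what tFileList does internally; the class name FileMaskSketch and the method filterByMask are hypothetical names introduced for this example.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps only the file names that match a glob pattern.
class FileMaskSketch {
    static List<String> filterByMask(List<String> fileNames, String glob) {
        // "glob:" selects the glob syntax of the default file system's matcher.
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> matches = new ArrayList<>();
        for (String name : fileNames) {
            if (matcher.matches(Paths.get(name))) {
                matches.add(name);
            }
        }
        return matches;
    }
}
```

Calling filterByMask with the names exceltest_west.xlsx and notes.txt and the pattern *.xlsx keeps only the workbook. Whether matching is case sensitive depends on the platform's file system.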

This example uses the tFileList and tFileInputExcel components to create a double-loop of processing.
The outer loop iterates over the files in a directory, defined as a global variable using a ContextGroup and
Context. The inner loop iterates over each Sheet in the Workbook. As the files are processed, both the
Excel filename and the Sheet name are used in the data loading. That is, the filename and the sheet name
contain encoded business values that aren’t found in the rows and columns, but that need to be loaded
into the database.

In this tutorial, an Operating Region is embedded in the filename. So, “exceltest_west.xlsx” results in
“west” being used for that column. Sales Region is embedded in the Sheet name. A Sheet named “New
England” would provide the value “New England” for that column.

Start off by creating a DATA_DIR variable in the context. Create a ContextGroup matching the Job name.
Add a DATA_DIR variable.

Talend Open Studio – Create Context Group and Variable

Then, add a default value for the variable, DATA_DIR. In practice, there might be Contexts created for
each environment.

Talend Open Studio – Set Variable Value


Drag the ContextGroup onto the Job so that DATA_DIR is available.

Next, add the four components: tFileList, tFileInputExcel, tMap, and tFileOutputDelimited (append mode).

Talend Open Studio – Add Components


Configure the components. Use the following as a guide for configuring the tFileList component.

Talend Open Studio – Configure tFileList

Configure the tFileInputExcel component. The most important part of this configuration is to use both the
DATA_DIR variable and the CURRENT_FILE_PATH as the File name/Stream. CURRENT_FILE_PATH
comes from the tFileList component and is prefixed with “tFileList_1_”.

A tMap component maps the incoming rows to the delimited text file. All of the fields in the spreadsheet
appear in the tMap and are dragged to the empty schema of the text file. Two extra fields are added:
OperatingRegion and SalesRegion.

Talend Open Studio – Map Fields


Here is the expression used for OperatingRegion. It uses some Java String functions to break off the suffix
(.xlsx) and the first part (exceltest_).

((String)globalMap.get("tFileList_1_CURRENT_FILE")).split("\\.xlsx")[0].split("_")[1]
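
To see what this expression evaluates to outside of Talend, here is a minimal sketch that stands in a plain HashMap for Talend's globalMap. The class name OperatingRegionSketch and the method operatingRegion are hypothetical; only the expression itself comes from the tutorial.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for evaluating the tMap expression outside Talend.
class OperatingRegionSketch {
    static String operatingRegion(Map<String, Object> globalMap) {
        return ((String) globalMap.get("tFileList_1_CURRENT_FILE"))
                .split("\\.xlsx")[0]   // drop the ".xlsx" suffix
                .split("_")[1];        // take the token after the first underscore
    }

    public static void main(String[] args) {
        // A plain HashMap simulates the globalMap that Talend provides to a Job.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileList_1_CURRENT_FILE", "exceltest_west.xlsx");
        System.out.println(operatingRegion(globalMap)); // prints "west"
    }
}
```

The indexing is fragile: a file named exceltest_north_east.xlsx would yield only "north", because the second split() keeps just the token at index 1.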

split() is a Java String method, and the expression above isn’t particularly elegant. It can be cleaned up
using a library such as Apache Commons Lang (formerly Jakarta Commons Lang). Look for
StringUtils.substringBetween() to grab the middle of a String without the brittle indexing. Take a look at
this blog post for instructions on how to bring this functionality into Talend.

StringUtils.substringBetween((String)globalMap.get("tFileList_1_CURRENT_FILE"), "exceltest_", ".xlsx")
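
If adding the library to the Job is not an option, the same idea can be sketched without dependencies. The method below is a hand-written approximation of the behavior of StringUtils.substringBetween() for this use case, not the library's code; the class name SubstringBetweenSketch is hypothetical.

```java
// Hypothetical, dependency-free approximation of substringBetween().
class SubstringBetweenSketch {
    /** Returns the text between the first 'open' and the next 'close', or null if either is missing. */
    static String substringBetween(String str, String open, String close) {
        if (str == null || open == null || close == null) {
            return null;
        }
        int start = str.indexOf(open);
        if (start < 0) {
            return null;
        }
        int from = start + open.length();
        int end = str.indexOf(close, from);
        return end < 0 ? null : str.substring(from, end);
    }

    public static void main(String[] args) {
        System.out.println(substringBetween("exceltest_west.xlsx", "exceltest_", ".xlsx"));
    }
}
```

Unlike the split() version, this keeps a multi-part region intact: exceltest_north_east.xlsx gives "north_east". A filename that lacks the expected prefix or suffix gives null, so the tMap expression may need a null check.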

Here is the expression used for SalesRegion.

(String)globalMap.get("tFileInputExcel_1_CURRENT_SHEET")
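
Putting the two expressions together, the effect of the tMap on a single row can be sketched as follows. This is an assumption-laden illustration rather than Talend-generated code: the class RowEnrichmentSketch, the method enrich, the sample field values, and the ";" delimiter are all hypothetical, and a HashMap again simulates globalMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: append the two derived columns to one spreadsheet row.
class RowEnrichmentSketch {
    static String enrich(List<String> sheetFields, Map<String, Object> globalMap, String delimiter) {
        // OperatingRegion comes from the current filename (tFileList).
        String fileName = (String) globalMap.get("tFileList_1_CURRENT_FILE");
        String operatingRegion = fileName.split("\\.xlsx")[0].split("_")[1];
        // SalesRegion comes from the current sheet name (tFileInputExcel).
        String salesRegion = (String) globalMap.get("tFileInputExcel_1_CURRENT_SHEET");

        List<String> out = new ArrayList<>(sheetFields);
        out.add(operatingRegion);
        out.add(salesRegion);
        return String.join(delimiter, out);
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileList_1_CURRENT_FILE", "exceltest_west.xlsx");
        globalMap.put("tFileInputExcel_1_CURRENT_SHEET", "New England");
        List<String> row = new ArrayList<>();
        row.add("Widget");
        row.add("100");
        System.out.println(enrich(row, globalMap, ";")); // prints "Widget;100;west;New England"
    }
}
```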

It’s not uncommon for an Excel file to contain special values encoded in the filename or the Sheet name.
A few global variables set by components during processing can capture these. This post used
_CURRENT_FILE, a variable from tFileList, and _CURRENT_SHEET, from tFileInputExcel. The available
globals are listed in the documentation, but to get the exact syntax, use auto complete.

Conclusion
You, too, are ready to begin your own enterprising journey. We hope that How to Process Excel Workbook
Using Talend? has armed you with the knowledge and know-how.

If you still want to learn more, we encourage you to visit BDDLabs.com to gain more information
about modern cloud data sharing. You can also access documentation, view webinars, browse our
offers, view scoops of upcoming events, and get support. We also invite you to schedule a free
demo of our data sharehousing technology so your business can get started right away!

Email: [email protected]
Phone: 888-856-2238
