How To Process Excel Workbook Using Talend?
This example uses the tFileList and tFileInputExcel components to create a double loop of processing.
The outer loop iterates over the files in a directory, whose path is defined as a context variable using a
ContextGroup and Context. The inner loop iterates over each Sheet in the Workbook. As the files are
processed, both the Excel filename and the Sheet name are used in the data loading. That is, the filename
and the Sheet name contain encoded business values that aren’t found in the rows and columns but that
need to be loaded into the database.
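The double loop described above can be sketched in plain Java. This is only an illustration of the data flow, not what Talend generates: the nested map is a hypothetical in-memory stand-in for a directory of workbooks, the sample rows are made up, and the class and method names (DoubleLoopSketch, sampleDirectory, flatten) are invented for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DoubleLoopSketch {

    // Hypothetical stand-in for a directory of workbooks:
    // file name -> (sheet name -> data rows). Sample data is made up.
    static Map<String, Map<String, List<String>>> sampleDirectory() {
        Map<String, List<String>> west = new LinkedHashMap<>();
        west.put("Pacific", List.of("widget,10", "gadget,4"));
        west.put("Mountain", List.of("widget,7"));

        Map<String, List<String>> east = new LinkedHashMap<>();
        east.put("New England", List.of("gadget,12"));

        Map<String, Map<String, List<String>>> dir = new LinkedHashMap<>();
        dir.put("exceltest_west.xlsx", west);
        dir.put("exceltest_east.xlsx", east);
        return dir;
    }

    // Outer loop: files (the role played by tFileList).
    // Inner loop: sheets (the role played by tFileInputExcel).
    // Every output row carries the file name and the sheet name.
    static List<String> flatten(Map<String, Map<String, List<String>>> dir) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, List<String>>> file : dir.entrySet()) {
            for (Map.Entry<String, List<String>> sheet : file.getValue().entrySet()) {
                for (String row : sheet.getValue()) {
                    out.add(row + "," + file.getKey() + "," + sheet.getKey());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        flatten(sampleDirectory()).forEach(System.out::println);
    }
}
```

Each printed line ends with the file name and the sheet name, which is the information the rest of this tutorial turns into two extra columns.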
In this tutorial, an Operating Region is embedded in the filename. So, “exceltest_west.xlsx” results in
“west” being used for that column. Sales Region is embedded in the Sheet name. A Sheet named “New
England” would provide the values for that column.
Start off by creating a DATA_DIR variable in the context: create a ContextGroup matching the Job name
and add a DATA_DIR variable to it.
Next, add the four components: tFileList, tFileInputExcel, tMap, and tFileOutputDelimited (append mode).
The tMap component feeds the delimited text file. All of the fields in the spreadsheet appear in
the tMap and are dragged to the empty schema of the text file. Two extra fields are added: Operating
Region and Sales Region. The Operating Region is extracted from the current filename with this expression:
((String)globalMap.get("tFileList_1_CURRENT_FILE")).split("\\.xlsx")[0].split("_")[1]
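To see what this expression does step by step, it can be run outside Talend. The sketch below wraps the same expression in a method; the HashMap is a stand-in for Talend's globalMap, and the class and method names are invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

public class OperatingRegionSplit {

    // Same expression as in the tMap, wrapped in a method so it can run
    // outside Talend. The globalMap argument is a plain HashMap stand-in.
    static String operatingRegion(Map<String, Object> globalMap) {
        return ((String) globalMap.get("tFileList_1_CURRENT_FILE"))
                .split("\\.xlsx")[0]   // "exceltest_west.xlsx" -> "exceltest_west"
                .split("_")[1];        // "exceltest_west" -> "west"
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileList_1_CURRENT_FILE", "exceltest_west.xlsx");
        System.out.println(operatingRegion(globalMap)); // prints "west"
    }
}
```

The indexing is sensitive to the filename pattern. A name without an underscore, such as "exceltest.xlsx", throws an ArrayIndexOutOfBoundsException, and "exceltest_north_west.xlsx" yields only "north".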
split() is a Java String method, and the statement above isn’t particularly elegant. It can be cleaned up
using a library like Apache Commons Lang (formerly Jakarta Commons Lang). Look for
StringUtils.substringBetween() to grab the middle of a String without the brittle indexing. Take a look at
this blog post for instructions on how to bring this functionality into Talend.
StringUtils.substringBetween( (String)globalMap.get("tFileList_1_CURRENT_FILE"),
"exceltest_", ".xlsx")
The Sales Region is taken directly from the name of the Sheet currently being read:
(String)globalMap.get("tFileInputExcel_1_CURRENT_SHEET")
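For readers who want to try the substringBetween() approach to the filename without adding Commons Lang to the classpath, here is a minimal plain-Java sketch. The between() helper is a stand-in written for this example, not the library method itself; it follows the documented behaviour of StringUtils.substringBetween(str, open, close), returning null when an argument is null or a delimiter is missing.

```java
public class SubstringBetweenSketch {

    // Plain-Java stand-in written for this example. It mirrors the documented
    // behaviour of Commons Lang's StringUtils.substringBetween(str, open, close):
    // null when an argument is null or when either delimiter is not found.
    static String between(String str, String open, String close) {
        if (str == null || open == null || close == null) {
            return null;
        }
        int start = str.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length();
        int end = str.indexOf(close, start);
        if (end < 0) {
            return null;
        }
        return str.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(between("exceltest_west.xlsx", "exceltest_", ".xlsx"));
        System.out.println(between("exceltest_north_west.xlsx", "exceltest_", ".xlsx"));
        System.out.println(between("report.xlsx", "exceltest_", ".xlsx"));
    }
}
```

Unlike the split() version, a region containing an underscore is returned whole ("north_west"), and a filename that does not match the pattern gives null instead of an exception, so the tMap may need a null check before loading.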
It’s not uncommon for an Excel file to have special values encoded in its filename or Sheet names.
A few global variables set by components during processing can capture these. This post used
_CURRENT_FILE, a variable from tFileList, and _CURRENT_SHEET, from tFileInputExcel. The available
globals are listed in the documentation, but to get the exact syntax, use autocomplete.