
Pentaho Data Integration Overview

Pentaho Data Integration (Kettle) is an open-source ETL tool that reads procedures stored in XML format. The Spoon graphical tool is used to develop these procedures by linking different components. Procedures can connect to various data sources like databases, files, and web services. JavaScript and Java can also be used to develop complex routines. Procedures are then collected into jobs that can be run from the Spoon tool, Pentaho BI Suite, command line, or scheduled on a clustered environment or with web services.

Uploaded by

Peeyush Singh

Pentaho Data Integration (Kettle)

PDI Overview (Kettle)

An entry-level tool for data manipulation (ETL)


PDI (Kettle) reads procedures stored in XML format
Spoon is a graphical tool used to develop those procedures
Procedures are designed by linking components
Many data sources can be used: JDBC, files, web services
JavaScript and Java are supported for complex routines
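Because procedures are stored as XML, they can be inspected outside Spoon. The sketch below lists the steps and hops of a heavily trimmed, hypothetical transformation file. The element names (`info/name`, `step`, `order/hop`) follow the layout commonly seen in Kettle `.ktr` files and should be treated as an assumption. The step names and the helper `summarize_transformation` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed transformation; real files carry many more elements.
SAMPLE_KTR = """
<transformation>
  <info><name>users_dimension</name></info>
  <order>
    <hop><from>Query users</from><to>Write users_dimension</to><enabled>Y</enabled></hop>
  </order>
  <step><name>Query users</name><type>TableInput</type></step>
  <step><name>Write users_dimension</name><type>TableOutput</type></step>
</transformation>
"""


def summarize_transformation(xml_text):
    """Return (name, steps, hops) read from a transformation's XML text."""
    root = ET.fromstring(xml_text.strip())
    name = root.findtext("info/name")
    # Each step is a component; each hop links one component to the next.
    steps = [(s.findtext("name"), s.findtext("type")) for s in root.findall("step")]
    hops = [(h.findtext("from"), h.findtext("to")) for h in root.findall("order/hop")]
    return name, steps, hops


name, steps, hops = summarize_transformation(SAMPLE_KTR)
print(name)
for step_name, step_type in steps:
    print(f"step: {step_name} ({step_type})")
for source, target in hops:
    print(f"hop: {source} -> {target}")
```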

www.robertomarchetto.com

Development environment


Example: source database


Example: destination database


Schema comparison


Procedure users_dimension

Query users:
SELECT u.id, CONCAT(u.first_name, ' ', u.last_name) AS fullname, u.title
FROM users u
WHERE u.first_name IS NOT NULL AND u.last_name IS NOT NULL
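The WHERE clause keeps only users whose first and last names are both present, so the concatenated full name is never built from a missing part. The sketch below reproduces that behavior on an in-memory SQLite database. The sample rows are invented, and SQLite's `||` operator stands in for `CONCAT`, so this illustrates the filter rather than the source database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT, first_name TEXT, last_name TEXT, title TEXT)")
# Invented sample rows: two of them are missing part of the name.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        ("u1", "Ada", "Lovelace", "Analyst"),
        ("u2", None, "Hopper", "Admiral"),
        ("u3", "Alan", None, None),
    ],
)

# Without the filter, a NULL name part makes the whole concatenation NULL.
unfiltered = conn.execute(
    "SELECT u.id, u.first_name || ' ' || u.last_name FROM users u ORDER BY u.id"
).fetchall()

# With the filter from the slide, only complete names reach the dimension.
filtered = conn.execute(
    """
    SELECT u.id, u.first_name || ' ' || u.last_name AS fullname, u.title
    FROM users u
    WHERE u.first_name IS NOT NULL AND u.last_name IS NOT NULL
    """
).fetchall()

print(unfiltered)  # [('u1', 'Ada Lovelace'), ('u2', None), ('u3', None)]
print(filtered)    # [('u1', 'Ada Lovelace', 'Analyst')]
```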

Testing


Procedure accounts_dimension

Query accounts:
SELECT a.id, a.name, a.industry, a.billing_address_postalcode,
a.billing_address_city, a.billing_address_country
FROM accounts a

Procedure opportunities_fact

Query opportunities:
SELECT o.id, o.date_entered, o.date_closed, o.assigned_user_id,
o.sales_stage, o.name, o.amount
FROM opportunities o
WHERE o.sales_stage IN ('Closed Won', 'Closed Lost')
ORDER BY o.id

Procedure dates_dimension
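The transformation shown on this slide is not reproduced in the text. As a rough illustration of what a dates dimension usually contains, the sketch below generates one row per calendar day. The column names, the `YYYYMMDD` integer key, and the helper `build_dates_dimension` are assumptions, not the author's actual design.

```python
from datetime import date, timedelta


def build_dates_dimension(start, end):
    """Return one dictionary per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append(
            {
                # Assumed surrogate key format: YYYYMMDD as an integer.
                "date_key": day.year * 10000 + day.month * 100 + day.day,
                "full_date": day.isoformat(),
                "year": day.year,
                "quarter": (day.month - 1) // 3 + 1,
                "month": day.month,
                "day": day.day,
            }
        )
        day += timedelta(days=1)
    return rows


dates = build_dates_dimension(date(2024, 2, 28), date(2024, 3, 1))
for row in dates:
    print(row["date_key"], row["full_date"], "Q%d" % row["quarter"])
```

A fact table such as opportunities_fact could then reference these rows by deriving the same key from its own date columns.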


Collect procedures in a job


Using JNDI

Edit the JNDI settings in /simple-jndi/jdbc.properties or in
C:/Documents and Settings/<user>/.pentaho/simplejndi/default.properties
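An entry in these files typically groups the settings of one connection under a shared name prefix. The fragment below is a sketch only: the connection name `crm_source`, the driver class, the URL, and the credentials are placeholders, not values from the presentation.

```properties
# Hypothetical connection named "crm_source"; every value is a placeholder.
crm_source/type=javax.sql.DataSource
crm_source/driver=com.mysql.jdbc.Driver
crm_source/url=jdbc:mysql://localhost:3306/crm
crm_source/user=etl_user
crm_source/password=change_me
```

A database connection defined in Spoon with the JNDI access type would then refer to the name `crm_source` instead of carrying its own host and credentials.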


Running procedures

Directly from Spoon

From Pentaho BI Suite

Using command line (Kitchen, Pan)


kitchen.bat /file:D:\Jobs\jobname.kjb /level:Basic

In a clustered environment

Using a web service (Carte)
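The command-line option can also be driven from a script. The sketch below builds the same Kitchen command shown above and runs it as a subprocess. The helper names are hypothetical, `kitchen.bat` is assumed to be on the PATH, and treating exit code 0 as success is an assumption to verify against the installed version.

```python
import subprocess


def build_kitchen_command(job_file, level="Basic", kitchen="kitchen.bat"):
    """Build the argument list for a Kitchen run, using the slide's /option:value style."""
    return [kitchen, f"/file:{job_file}", f"/level:{level}"]


def run_job(job_file, level="Basic"):
    """Run the job and report success (assumes exit code 0 means the job succeeded)."""
    result = subprocess.run(build_kitchen_command(job_file, level))
    return result.returncode == 0


command = build_kitchen_command(r"D:\Jobs\jobname.kjb")
print(" ".join(command))  # kitchen.bat /file:D:\Jobs\jobname.kjb /level:Basic
```

Calling `run_job(r"D:\Jobs\jobname.kjb")` would start the job; it is left out of the example so that the snippet runs without a Kettle installation.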


Publishing on Pentaho


Running from Pentaho


Scheduling

Using Pentaho's scheduler

Using an external scheduler (cron)
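With cron, scheduling comes down to a single crontab line that calls Kitchen. The entry below is a sketch for a Linux host: the installation path, job path, log path, and the 02:00 schedule are all placeholders.

```
# Hypothetical crontab entry: run the job every day at 02:00 and append output to a log.
0 2 * * * /opt/pentaho/data-integration/kitchen.sh -file=/opt/etl/jobs/jobname.kjb -level=Basic >> /var/log/etl/jobname.log 2>&1
```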

