Pentaho Data Integration (Kettle)
PDI Overview (Kettle)
An entry-level tool for data manipulation (ETL)
PDI (Kettle) reads procedures stored in XML format
Spoon is the graphical tool used to develop those procedures
Procedures are designed by linking components
Many data sources can be used: JDBC, files, web services
JavaScript and Java are supported for complex routines
www.robertomarchetto.com
Development environment
Example: source database
Example: destination database
Schema comparison
Procedure users_dimension
Query users:
SELECT u.id, CONCAT(u.first_name, ' ', u.last_name) AS fullname, u.title
FROM users u
WHERE u.first_name IS NOT NULL AND u.last_name IS NOT NULL
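The effect of the NULL filter in this query can be tried outside PDI. The sketch below is only an illustration, not part of the original procedure: it runs an equivalent query against an in-memory SQLite database with made-up sample rows. SQLite writes concatenation as `||` where the source query uses `CONCAT()`; in MySQL, for instance, `CONCAT()` returns NULL when any argument is NULL, which is one likely reason for filtering such rows out.

```python
import sqlite3

# Hypothetical sample rows; only the column names come from the query above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER, first_name TEXT, last_name TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        (1, "Ada", "Lovelace", "Analyst"),
        (2, None, "Smith", "Manager"),   # dropped: first_name is NULL
        (3, "Grace", None, "Engineer"),  # dropped: last_name is NULL
    ],
)

# Same logic as the source query, with || standing in for CONCAT().
rows = conn.execute(
    """
    SELECT u.id, u.first_name || ' ' || u.last_name AS fullname, u.title
    FROM users u
    WHERE u.first_name IS NOT NULL AND u.last_name IS NOT NULL
    """
).fetchall()

print(rows)  # [(1, 'Ada Lovelace', 'Analyst')]
```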
Testing
Procedure accounts_dimension
Query accounts:
SELECT a.id, a.name, a.industry, a.billing_address_postalcode,
a.billing_address_city, a.billing_address_country
FROM accounts a
Procedure opportunities_fact
Query opportunities:
SELECT o.id, o.date_entered, o.date_closed, o.assigned_user_id,
o.sales_stage, o.name, o.amount
FROM opportunities o
WHERE o.sales_stage in ('Closed Won', 'Closed Lost') ORDER BY o.id
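As a quick check of what this query feeds into the fact table, the sketch below runs it unchanged against an in-memory SQLite database. The three sample rows are invented for illustration; only the table and column names come from the query. It shows that open opportunities are excluded and that the remaining rows arrive sorted by `id`, which can matter for downstream steps that expect sorted input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE opportunities (
        id INTEGER, date_entered TEXT, date_closed TEXT,
        assigned_user_id INTEGER, sales_stage TEXT, name TEXT, amount REAL
    )
    """
)
# Hypothetical sample data, deliberately inserted out of id order.
conn.executemany(
    "INSERT INTO opportunities VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (3, "2010-01-05", "2010-02-01", 1, "Closed Won", "Deal C", 500.0),
        (1, "2010-01-02", "2010-03-01", 2, "Closed Lost", "Deal A", 120.0),
        (2, "2010-01-03", None, 1, "Prospecting", "Deal B", 300.0),  # still open
    ],
)

fact_rows = conn.execute(
    """
    SELECT o.id, o.date_entered, o.date_closed, o.assigned_user_id,
           o.sales_stage, o.name, o.amount
    FROM opportunities o
    WHERE o.sales_stage IN ('Closed Won', 'Closed Lost') ORDER BY o.id
    """
).fetchall()

ids = [row[0] for row in fact_rows]
print(ids)  # [1, 3]
```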
Procedure dates_dimension
Collect procedures in a job
Using JNDI
Define JNDI connections by editing /simple-jndi/jdbc.properties or
C:/Documents and Settings/<user>/.pentaho/simplejndi/default.properties
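For orientation, a data source in these files is described by a handful of `name/key=value` lines. The sketch below prints such an entry; the data source name `dwh`, the MySQL driver class, the URL, and the credentials are all placeholder assumptions and must be replaced with values for the actual destination database.

```python
def jndi_entries(name, driver, url, user, password):
    """Render the simple-jndi lines that describe one JDBC data source."""
    fields = [
        ("type", "javax.sql.DataSource"),
        ("driver", driver),
        ("url", url),
        ("user", user),
        ("password", password),
    ]
    return "\n".join(f"{name}/{key}={value}" for key, value in fields)


# Placeholder values, not taken from the slides.
entries = jndi_entries(
    name="dwh",
    driver="com.mysql.jdbc.Driver",
    url="jdbc:mysql://localhost:3306/dwh",
    user="etl_user",
    password="secret",
)
print(entries)
```

The printed lines can be appended to the properties file; the connection is then referenced in Spoon by its JNDI name (here `dwh`) instead of hard-coded connection details.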
Running procedures
Directly from Spoon
From Pentaho BI Suite
Using command line (Kitchen, Pan)
kitchen.bat /file:D:\Jobs\jobname.kjb /level:Basic
In a clustered environment
Using a web service (Carte)
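When the command-line option is driven from another program, the Kitchen call shown above can be assembled as an argument list. This is a minimal sketch: the helper name `kitchen_command` is made up, and the only options used are `/file:` and `/level:` from the example command.

```python
def kitchen_command(job_file, level="Basic", launcher="kitchen.bat"):
    """Build the argument list for running a .kjb job with Kitchen."""
    return [launcher, f"/file:{job_file}", f"/level:{level}"]


cmd = kitchen_command(r"D:\Jobs\jobname.kjb")
print(" ".join(cmd))  # kitchen.bat /file:D:\Jobs\jobname.kjb /level:Basic
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where PDI is installed, so that a non-zero exit code raises an error in the calling script.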
Publishing on Pentaho
Running from Pentaho
Scheduling
Using Pentaho's scheduler
Using an external scheduler (cron)
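For the external scheduler option, a crontab line that runs a job nightly might be built as follows. This is a sketch under assumptions: the installation path, job file, and log file are hypothetical, and it uses the Unix launcher `kitchen.sh` with the `-file=` and `-level=` option style rather than the Windows syntax shown earlier.

```python
def cron_entry(minute, hour, job_file, log_file,
               kitchen="/opt/pentaho/data-integration/kitchen.sh"):
    """Format a crontab line that runs a Kitchen job once a day."""
    command = f"{kitchen} -file={job_file} -level=Basic >> {log_file} 2>&1"
    return f"{minute} {hour} * * * {command}"


# Hypothetical paths: run the job every day at 02:30 and append output to a log.
entry = cron_entry(30, 2, "/opt/etl/jobname.kjb", "/var/log/etl/jobname.log")
print(entry)
```

The resulting line is added with `crontab -e` for the user that owns the PDI installation.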