Frequently Asked Questions
How can I improve performance when loading data into a database table using Pentaho?
You can run multiple copies of the output step and choose ‘Distribute rows’ when connecting the hop from the preceding step, so that the rows are spread across the copies.
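As a rough illustration of what ‘Distribute rows’ means, the Python sketch below hands rows to several step copies in round-robin order. It is not Pentaho code; the function name and the sample rows are hypothetical and exist only to show the idea.

```python
from itertools import cycle


def distribute_rows(rows, copies):
    """Hand each row to exactly one of `copies` buckets, in round-robin order."""
    buckets = [[] for _ in range(copies)]
    targets = cycle(range(copies))
    for row in rows:
        buckets[next(targets)].append(row)
    return buckets


rows = [{"id": i} for i in range(7)]
buckets = distribute_rows(rows, 3)
sizes = [len(b) for b in buckets]
print(sizes)  # [3, 2, 2]
```

Each copy receives only part of the stream, which is why several output copies can write at the same time. With ‘Copy rows’, by contrast, every copy would receive every row.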
Can I make the steps in a transformation run in sequence?
No. By design, all steps in a transformation run in parallel, while job entries run in sequence. Changing this would require an architectural change that could affect performance.
Is it possible to use group by without sorting?
Yes, you can use the ‘Memory Group By’ step in the transformation, which groups rows in memory and therefore does not need sorted input.
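To see why an in-memory group by can skip the sort, here is a small Python sketch (not Pentaho code, sample data made up). A streaming group by only merges adjacent rows with the same key, so unsorted input produces fragmented groups, while an in-memory version keeps one running total per key.

```python
from collections import defaultdict
from itertools import groupby

# Sample rows (region, amount), deliberately NOT sorted by region.
rows = [("east", 10), ("west", 5), ("east", 7), ("west", 1)]

# Streaming grouping: only adjacent rows with the same key are merged.
streamed = [
    (key, sum(amount for _, amount in grp))
    for key, grp in groupby(rows, key=lambda r: r[0])
]

# In-memory grouping: one running total per key, input order does not matter.
totals = defaultdict(int)
for region, amount in rows:
    totals[region] += amount
in_memory = dict(totals)

print(streamed)   # [('east', 10), ('west', 5), ('east', 7), ('west', 1)]
print(in_memory)  # {'east': 17, 'west': 6}
```

The trade-off is memory: the in-memory approach has to hold every group at once, so it suits data whose set of groups fits in memory.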
Is there a way to implement SCD logic directly in Pentaho?
Yes, you can use the ‘Dimension lookup/update’ step in the transformation, which performs SCD type-2 logic directly. You can also use the ‘Merge rows (diff)’ step, which flags each row as identical, changed, new, or deleted, to build similar logic.
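For readers new to SCD type 2, the following Python sketch shows the general idea: when an attribute changes, the current dimension row is closed and a new version is appended. It is a simplified illustration under assumed field names (`key`, `attrs`, `version`, `valid_from`, `valid_to`) and an assumed open-ended date, not what the Pentaho step does internally.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed marker for "still current"


def apply_scd2(dimension, incoming, today):
    """Apply one incoming record to `dimension` (a list of dicts) in place."""
    current = next(
        (r for r in dimension
         if r["key"] == incoming["key"] and r["valid_to"] == OPEN_END),
        None,
    )
    if current is not None and current["attrs"] == incoming["attrs"]:
        return dimension  # nothing changed, keep the current version
    if current is not None:
        current["valid_to"] = today  # close the old version
    dimension.append({
        "key": incoming["key"],
        "attrs": incoming["attrs"],
        "version": current["version"] + 1 if current else 1,
        "valid_from": today,
        "valid_to": OPEN_END,
    })
    return dimension


dim = []
apply_scd2(dim, {"key": "C1", "attrs": {"city": "Pune"}}, date(2020, 1, 1))
apply_scd2(dim, {"key": "C1", "attrs": {"city": "Delhi"}}, date(2021, 6, 1))
apply_scd2(dim, {"key": "C1", "attrs": {"city": "Delhi"}}, date(2022, 1, 1))
print(len(dim))  # 2
```

The third call changes nothing because the attributes match the current version, so history holds exactly two rows for `C1`.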
How can I run a process repeatedly between transformations/jobs?
Enable the ‘Execute for every input row’ option. To find it, double-click the transformation/job entry and look in the Execution section of the Options tab.
Is it possible to find the Pentaho version using a job/transformation step?
Yes, use the ‘Get System Info’ step in a transformation and choose ‘Kettle version’ in the ‘Type’ column.
How can I aggregate a whole dataset using Pentaho?
You can calculate aggregate functions over the whole dataset by leaving the ‘The fields that make up the group’ table blank in the ‘Group By’ step.
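The effect of an empty group definition can be pictured with this Python sketch (illustrative only; the function and field names are hypothetical). With no group fields every row gets the same empty key, so the aggregate covers the whole dataset and a single result comes out.

```python
from collections import defaultdict


def group_sum(rows, group_fields, value_field):
    """Sum `value_field` per combination of `group_fields`."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[f] for f in group_fields)  # () when no group fields
        totals[key] += row[value_field]
    return dict(totals)


rows = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
]

by_region = group_sum(rows, ["region"], "amount")
overall = group_sum(rows, [], "amount")
print(by_region)  # {('east',): 17, ('west',): 5}
print(overall)    # {(): 22}
```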
Can I run a process repeatedly inside the same transformation?
No, a loop cannot be formed inside a single transformation. However, you can build a loop between transformations/jobs inside a job.
How can I handle loading huge amounts of data into multiple database systems?
You can use the bulk loader steps, such as Vertica Bulk Loader, Oracle Bulk Loader, and MySQL Bulk Loader.
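What is the difference between a regular join and a database join?
The database join step executes a prepared SQL statement directly against the database, using fields from the incoming rows as parameters. A regular join step only combines rows that are already inside the transformation and cannot do that.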
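The idea behind a database join can be sketched in Python with the standard `sqlite3` module: a parameterized query is run once per incoming row, and the result is attached to that row. The table, column, and field names are made up for the example, and keeping unmatched rows with a `None` value is a choice made for this sketch rather than a statement about Pentaho's default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")]
)

# Rows arriving from a previous step (hypothetical order stream).
orders = [
    {"order_id": 100, "customer_id": 2},
    {"order_id": 101, "customer_id": 1},
    {"order_id": 102, "customer_id": 9},  # no matching customer
]

# The "?" placeholder is filled from each incoming row.
QUERY = "SELECT name FROM customers WHERE id = ?"

joined = []
for row in orders:
    match = conn.execute(QUERY, (row["customer_id"],)).fetchone()
    joined.append({**row, "name": match[0] if match else None})

conn.close()
print([r["name"] for r in joined])  # ['Ravi', 'Asha', None]
```

Because the query runs in the database for every row, this pattern works best when the lookup is selective and indexed.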