Hello everyone, I'm new to Kettle. Does the Table Input step load all of the data from a table into memory at once? If so, how can I avoid running out of memory?
PDI doesn't pull the entire table down into local memory if it can avoid it.
It sets up a streaming cursor and only brings down the data it can work with at the moment, discarding rows from memory once they reach the end of the pipeline.
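To make that concrete, here's a minimal JDBC sketch of the streaming pattern PDI relies on under the hood. The connection details, table, and columns are made up for illustration, and the exact behavior is driver-specific (PostgreSQL, for instance, only streams when autocommit is off; MySQL needs a fetch size of Integer.MIN_VALUE):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class StreamingRead {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details, for illustration only.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/sales", "etl", "secret")) {
            conn.setAutoCommit(false); // PostgreSQL streams only with autocommit off
            try (PreparedStatement stmt = conn.prepareStatement(
                    "SELECT id, amount FROM orders")) {
                // Ask the driver to fetch a small window of rows at a time
                // instead of materializing the whole result set in memory.
                stmt.setFetchSize(500);
                try (ResultSet rs = stmt.executeQuery()) {
                    while (rs.next()) {
                        process(rs.getLong("id"), rs.getBigDecimal("amount"));
                    }
                }
            }
        }
    }

    static void process(long id, java.math.BigDecimal amount) {
        // Stand-in for the downstream steps; once a row leaves the pipeline,
        // nothing holds a reference to it and it can be garbage-collected.
    }
}
```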
Selecting only the columns you need, and avoiding steps like "Sort Rows", will help you stay clear of out-of-memory conditions.
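To see why sorting is the problem case, contrast a streaming step with a sorting step. This is just an illustrative sketch, not PDI's actual code (the real Sort Rows step can also spill to temp files past a configured in-memory row limit), but the difference in memory profile is the point:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortVsStream {
    record Row(long key, String payload) {}

    // A streaming step holds one row at a time: memory use stays
    // constant no matter how many rows pass through.
    static void streamRows(Iterable<Row> input) {
        for (Row row : input) {
            emit(row); // the row can be collected right after this
        }
    }

    // A sorting step must buffer *every* row before it can emit the
    // first one, so memory use grows with the size of the table.
    static void sortRows(Iterable<Row> input) {
        List<Row> buffer = new ArrayList<>();
        for (Row row : input) {
            buffer.add(row);
        }
        buffer.sort(Comparator.comparingLong(Row::key));
        for (Row row : buffer) {
            emit(row);
        }
    }

    static void emit(Row row) {
        // Stand-in for handing the row to the next step in the pipeline.
    }
}
```

If you can, push the sort into the database with an ORDER BY in the Table Input query instead, so the database does the buffering rather than the JVM.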
Thank you for your answer. So, as you say, I needn't worry about running out of memory if I want to extract and transform millions of rows or more from one table into another?
Or should I configure something in Kettle?
If you are doing Table Input -> Table Output, then you shouldn't run into out-of-memory errors, even for millions of rows.
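As a rough picture of why that pattern stays flat on memory, here's a hypothetical JDBC sketch of a streaming read feeding batched writes. Only one window of rows is ever in memory at a time, and the batch size here plays the same role as Table Output's commit size; the URLs and table names are assumptions:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class StreamingCopy {
    public static void main(String[] args) throws Exception {
        // Hypothetical connections; substitute your own JDBC URLs.
        try (Connection src = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/src", "etl", "secret");
             Connection dst = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/dst", "etl", "secret")) {
            src.setAutoCommit(false); // required for PostgreSQL to stream
            dst.setAutoCommit(false);

            try (PreparedStatement read = src.prepareStatement(
                     "SELECT id, amount FROM orders");
                 PreparedStatement write = dst.prepareStatement(
                     "INSERT INTO orders_copy (id, amount) VALUES (?, ?)")) {
                read.setFetchSize(1000); // stream the source in small windows

                final int batchSize = 1000; // analogous to the commit size
                int pending = 0;
                try (ResultSet rs = read.executeQuery()) {
                    while (rs.next()) {
                        write.setLong(1, rs.getLong("id"));
                        write.setBigDecimal(2, rs.getBigDecimal("amount"));
                        write.addBatch();
                        if (++pending == batchSize) {
                            write.executeBatch(); // flush, freeing the batched rows
                            dst.commit();
                            pending = 0;
                        }
                    }
                }
                if (pending > 0) { // flush the final partial batch
                    write.executeBatch();
                    dst.commit();
                }
            }
        }
    }
}
```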
Depending on how many steps you have, and what those steps do, you could still run into memory issues.
Talking in hypotheticals is difficult, so why don't you build your transformation and see whether you actually hit an out-of-memory error?
If you do, the usual knobs are the JVM heap size (the -Xmx setting in Spoon's startup script) and reworking whichever step is buffering rows.