I want to let users store input files (CSV, Excel) in the Pentaho repository, and I want to read those files from PDI.
I know that the Pentaho repository is not a file system (although it can be synced with one using the Pentaho Repository Synchronizer).
The repository is not designed for that. Wrong tool for the described job.
You may wish to load the files into a table and then let the users work with the data,
or use Pentaho to keep a record of where the files are located.
But long story short, the repository stores XML files that are used to extract, transform, load, analyse and report.
File storage is not one of those solutions.
Pentaho can move your files between places, e.g. into a CMS, and report on the link. It can even download a file and present it to the user.
You can generate a CSV with the required data on the fly, for example.
It may be worth taking a look at the "Community File Repository" plugin for the BA Server. It will allow uploads and downloads to the BA Server, not in the Jackrabbit repository, but still on the server in a location that could be accessible to KTRs. That would stave off the need to store information in yet another system.
Another approach, which you point out, is referencing files in another location. Many PDI input steps support VFS (virtual file system) URIs.
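To make the VFS idea a bit more concrete, here is a rough Python sketch of what such URIs tend to look like. It assumes the Apache Commons VFS conventions that PDI builds on, where a layered URI such as `zip:` wraps another URI and uses `!` to mark the path inside the archive. The function name `split_vfs_uri`, the host and the paths are all made up for illustration; this is not part of PDI.

```python
def split_vfs_uri(uri):
    """Split a VFS-style URI into (scheme, container, path).

    Assumption: layered URIs look like 'zip:<inner uri>!<path in archive>'.
    For a plain URI there is no container, so the whole URI is the path.
    """
    outer, bang, inner_path = uri.partition("!")
    scheme, _, rest = outer.partition(":")
    if bang:
        # Layered: 'rest' is the URI of the archive, 'inner_path' the file in it.
        return (scheme, rest, inner_path)
    return (scheme, None, uri)


# Hypothetical locations a PDI input step might be pointed at.
layered = split_vfs_uri("zip:file:///data/uploads.zip!/input.csv")
plain = split_vfs_uri("sftp://etl@files.example.com/incoming/input.csv")
```

The point is only that the file can live on another server or inside an archive and still be addressed by a single string in the step's filename field.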
Sorry to hijack this thread, but have you tried CFR with Pentaho 8 yet?
I'm about to, and just wondered whether to trust that it works, or whether I need to find some time to work through the issues first.
Going by the docs, CFR is pretty nice; see the readme here:
FYI CFR has now been updated for Pentaho 8.
Here are some comments on using it - the API is pretty darned nice!
Uploading files with CFR and Pentaho | Codeks Blog
You can, but it's not ideal. I managed to connect and grab the contents of a file with the following endpoint. I've not used CFR, but it sounds like a much better option if it works on newer versions of the BA Server.
Still, I don't see why in the world you would do that. You need third-party software to see the contents of each file. It's simply not best practice.
I would recommend developing a data acquisition methodology where every data format ends up in the same place: a consistent table.
Then your users always know where to go (the acquisitions table) and which tool to use (PDI).
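As a rough sketch of that idea, outside of PDI and purely for illustration, the following Python snippet loads a CSV into one generic table. The table name `acquisitions`, its columns, and the function `acquire_csv` are my own assumptions, not anything Pentaho provides; in practice you would build the equivalent as a PDI transformation (a CSV or Excel input step feeding a table output step).

```python
import csv
import io
import sqlite3


def acquire_csv(conn, source_name, csv_text):
    """Load every cell of a CSV document into one shared 'acquisitions' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS acquisitions ("
        "source TEXT, row_num INTEGER, field TEXT, value TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    # One table row per (source file, row number, column name, cell value).
    rows = [
        (source_name, row_num, field, value)
        for row_num, record in enumerate(reader, start=1)
        for field, value in record.items()
    ]
    conn.executemany("INSERT INTO acquisitions VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# In-memory database and made-up sample data, just to show the shape.
conn = sqlite3.connect(":memory:")
loaded = acquire_csv(conn, "sales.csv", "region,amount\nnorth,10\nsouth,25\n")
```

Because every file lands in the same narrow layout, users only ever need to query one table, whatever format the original upload was in.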