I'm building a workflow in HCI to ingest files from HCP.
The files are bzip2-compressed .eml files. As they arrive on intermediate content servers, their metadata is extracted and stored alongside the original file as a JSON object.
I developed a program in Python to upload the original files to HCP and to convert metadata from JSON to XML and upload it as well.
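For context, the JSON-to-XML conversion step looks roughly like the sketch below. This is a simplified illustration and not my exact program: the function name `json_to_xml`, the root tag `metadata`, and the `item` tag used for list entries are placeholders I picked for the example, and it assumes every JSON key is already a valid XML element name.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(json_text, root_tag="metadata"):
    """Convert a JSON object (possibly nested) into an XML string.

    Assumption: all JSON keys are valid XML element names. ElementTree
    does not validate tag names, so invalid keys would yield broken XML.
    """
    def build(parent, value):
        if isinstance(value, dict):
            # Each key becomes a child element.
            for key, child in value.items():
                build(ET.SubElement(parent, str(key)), child)
        elif isinstance(value, list):
            # List entries are wrapped in a generic <item> element.
            for entry in value:
                build(ET.SubElement(parent, "item"), entry)
        else:
            # Scalars become element text; None becomes empty text.
            parent.text = "" if value is None else str(value)

    root = ET.Element(root_tag)
    build(root, json.loads(json_text))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = '{"subject": "Hello", "to": ["a@example.com", "b@example.com"]}'
    print(json_to_xml(sample))
```

The resulting XML string is what gets uploaded to HCP as the object's custom metadata.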
The problem is that HCI does see the custom metadata (HCP_customMetadata BOOLEAN true in HCI when testing a processing pipeline), but I don't see any way to make this custom metadata useful to HCI, that is, to manipulate it and store it in HCI's index.
Sadly, I have no idea what I'm doing wrong here. I am aware that HCI already has filters for e-mail files, but first, the files are bz2-compressed (so decompression would be needed), and second, there are more than 25 million of them at the moment, and adding just 0.1 second per file processed would add roughly an extra month to the ingest (and there's already a risk that it will take longer than anticipated).
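To show where the "extra month" figure comes from, here is the back-of-the-envelope calculation. It assumes the files are processed one at a time with no parallelism, which is a simplification; the variable names are only for illustration.

```python
# Rough estimate of the extra ingest time caused by per-file overhead.
# Assumption: strictly serial processing, no parallel workers.
files = 25_000_000              # current number of files
extra_seconds_per_file = 0.1    # added processing time per file

extra_seconds = files * extra_seconds_per_file
extra_days = extra_seconds / 86_400  # 86,400 seconds per day

print(f"Extra ingest time: about {extra_days:.0f} days")
```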
Maybe it's just a matter of looking in the wrong place, and the answer is obvious, but I was unable to find it.