Our new application will produce 1,000 files of 6 KB each per second, and we will need to keep these files for 10 years.
That means 1,000 x 60 x 60 x 24 x 365 x 10 ≈ 315 billion objects.
I am assuming that each HCP cluster will be able to store ~30 billion objects in the real world (the data sheet maximum is 64 billion), which means about 10 HCP clusters.
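As a sanity check on the arithmetic, here is a small back-of-envelope sketch. The 30-billion-objects-per-cluster figure is the assumption stated above, not a vendor-confirmed limit; note that a strict ceiling gives 11 clusters rather than the rounded ~10.

```python
import math

# Assumptions from the post: 1,000 files/s, 6 KB each, 10-year retention,
# ~30 billion objects per cluster as a conservative real-world ceiling.
FILES_PER_SECOND = 1_000
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
RETENTION_YEARS = 10
OBJECTS_PER_CLUSTER = 30_000_000_000  # assumed; data sheet maximum is 64 billion

total_objects = FILES_PER_SECOND * SECONDS_PER_YEAR * RETENTION_YEARS
clusters = math.ceil(total_objects / OBJECTS_PER_CLUSTER)
raw_bytes = total_objects * 6 * 1024  # payload only, before DPL/metadata overhead

print(f"total objects : {total_objects:,}")        # 315,360,000,000
print(f"clusters      : {clusters}")               # 11 (post rounds to ~10)
print(f"raw payload   : {raw_bytes / 1e15:.2f} PB")
```

The raw payload works out to under 2 PB, which supports the point in the last question below: object count, not capacity, is the binding constraint.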
I have the following questions:
- When an object is stored inside HCP, does it count as only one object? Is the associated XML counted as a second object?
- Any idea of the real-world maximum object count? We are unlikely to have 10 HCPs initially, and we could end up discovering the real limit in the field … when errors occur...
- We are thinking about using one HCP cluster per year (if the calculations and assumptions are correct). After we fill up the first HCP (say 8 nodes), we would leave it as an inactive HCP (powered on, read-only when needed) and start writing to the second HCP. Is it possible to retire some of the nodes (e.g. 4 nodes from the first HCP) and add them to the second HCP cluster? An inactive HCP may not need that many nodes (no more ingestion, just occasional reads).
- From the performance white paper, it looks like for a 1 KB file we can do around 1,000 object writes per second. Any idea about the latency? We need to be sure the write is committed before the application marks a transaction as completed.
- In fact, we will need two sites with no data loss, and because HCP replication is asynchronous, we plan to have the application duplex-write into both the production HCP and the DR HCP. But that forces each HCP to use DPL=2 in order to self-repair (since the two HCPs are not aware of each other). Can you think of a smarter way to achieve RPO=0 on HCP?
- I know that in order to store 80 PB in a cluster, we need 80 nodes. But in this case, capacity shouldn't be a concern, only the number of objects. How many nodes, and how much memory per node, do we need to store the maximum number of objects?
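To make the duplex-write question concrete, here is a minimal sketch of the commit logic we have in mind: the application writes the object to both clusters in parallel and only acknowledges the transaction when both writes succeed. The `put_primary`/`put_dr` callables are hypothetical stand-ins for whatever client performs the actual HTTP PUT against each cluster's gateway; this is a sketch of the RPO=0 idea, not a definitive implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def duplex_put(object_path, data, put_primary, put_dr):
    """Write one object to both clusters; commit only if BOTH succeed.

    put_primary / put_dr: callables (path, data) -> bool, e.g. thin
    wrappers around an HTTP PUT to the production and DR clusters.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_primary = pool.submit(put_primary, object_path, data)
        fut_dr = pool.submit(put_dr, object_path, data)
        ok_primary, ok_dr = fut_primary.result(), fut_dr.result()

    if ok_primary and ok_dr:
        return "committed"
    # One side failed: do NOT acknowledge the transaction. The application
    # must retry (or queue) the failed side before marking it complete,
    # otherwise the two sites diverge and RPO=0 is lost.
    return "retry"
```

The cost of this approach is that transaction latency becomes the slower of the two PUTs, which is why the latency figure asked about above matters twice over in a two-site design.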