Re: Bulk Performace

Mike Hearn
 

Thanks for the explanation.

By lazy loading, what I meant was actually a more technical optimisation at the app layer.

Let's imagine you have a node and you'd like to load 50 million records into it, so peers can build transactions on top of the resulting states (e.g. one state per car).

You can structure your app so the 50 million records start out in an app-specific table, not states. Then in your flows, when a peer requests the state for a car (or indicates interest some other way), the flow does a criteria query via JPA to select that row, builds a tx and locally commits it, then marks it as "transferred to ledger" e.g. by recording the tx ID. Then you proceed as normal, as if it had been bulk imported already.

This lets you shift the cost of importing each record to the moment it's requested, amortising the import over time and smoothing the load out quite significantly. From the peer's perspective it should be transparent.
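To make the pattern concrete, here is a minimal sketch in plain Java. It deliberately does not use the Corda API: the in-memory map stands in for the app-specific table (which would really be queried via JPA), and `commitLocally` stands in for building and locally committing the transaction. All names here (`LazyCarImporter`, `CarRow`, `requestState`, the VIN key) are hypothetical and only illustrate the "import on first request, then record the tx ID" idea.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical stand-in for a node that imports records onto the ledger lazily.
final class LazyCarImporter {

    /** One row of the app-specific staging table (assumed schema). */
    static final class CarRow {
        final String vin;
        final String payload;
        String ledgerTxId; // null until the row has been "transferred to ledger"

        CarRow(String vin, String payload) {
            this.vin = vin;
            this.payload = payload;
        }
    }

    // Stand-in for the app-specific table; a real app would query it via JPA.
    private final Map<String, CarRow> stagingTable = new HashMap<>();
    private int commits = 0;

    /** Bulk-load step: cheap, because no transaction is built here. */
    synchronized void stage(String vin, String payload) {
        stagingTable.put(vin, new CarRow(vin, payload));
    }

    /**
     * Called when a peer asks for a car. Imports the row on first request
     * and returns the recorded tx ID on every later request.
     */
    synchronized Optional<String> requestState(String vin) {
        CarRow row = stagingTable.get(vin);
        if (row == null) {
            return Optional.empty(); // unknown car
        }
        if (row.ledgerTxId == null) {
            row.ledgerTxId = commitLocally(row); // mark as transferred
        }
        return Optional.of(row.ledgerTxId);
    }

    /** Placeholder for "build a tx and commit it locally"; returns a fake tx ID. */
    private String commitLocally(CarRow row) {
        commits++;
        byte[] seed = (row.vin + "|" + row.payload).getBytes(StandardCharsets.UTF_8);
        return UUID.nameUUIDFromBytes(seed).toString();
    }

    /** Number of rows actually imported so far. */
    synchronized int commitCount() {
        return commits;
    }
}
```

Staging 50 million rows this way costs only the table inserts; the transaction cost is paid once per car, and only for cars somebody actually asks about. Repeated requests for the same car return the same recorded tx ID without committing again.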


From: corda-dev@groups.io <corda-dev@groups.io> on behalf of Remo Meier via Groups.Io <remo.meier.ch@...>
Sent: Friday, October 4, 2019 18:21
To: corda-dev@groups.io <corda-dev@groups.io>
Subject: Re: [corda-dev] Bulk Performace
 

Hi Mike

We are addressing these imports in the context of cardossier, which is setting up a business network around cars. There are three complementary reasons:

- Many different kinds of companies are involved, and every single car is a collaborative effort between a subset of those companies. The initial focus is on optimizing B2B processes, and with that a significant number of data sets get touched sooner rather than later. It is also not trivial to determine which companies must participate in that lazy loading process.
- We want the system to be able to become the master, so that people do not have to maintain redundant data structures.
- We would like the system to be tamper-proof. So every single data item gets anchored, one way or another, to one hash value describing the state of the system (not unlike blockchain, but distributed). Having off-ledger data would open the door to attacks even after the system becomes operational. Nevertheless, it is quite likely that some data will not leave its source node in the early days, except for the corresponding transaction hashes.
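As an illustration of the last point only, one common way to anchor many data items to a single hash value is a Merkle tree. The sketch below is an assumption about how such anchoring could look, not a description of how cardossier or Corda actually does it. The class name `AnchorSketch` and the choice to duplicate the last node on odd-sized levels are hypothetical simplifications.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: reduce a list of data items to one root hash.
final class AnchorSketch {

    /** SHA-256 over the concatenation of the given byte arrays. */
    static byte[] sha256(byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] part : parts) {
                md.update(part);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** Hashes every item, then combines hashes pairwise until one remains. */
    static byte[] merkleRoot(List<String> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("at least one item is required");
        }
        List<byte[]> level = new ArrayList<>();
        for (String item : items) {
            level.add(sha256(item.getBytes(StandardCharsets.UTF_8)));
        }
        while (level.size() > 1) {
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                byte[] left = level.get(i);
                // Simplification: an unpaired node is combined with itself.
                byte[] right = (i + 1 < level.size()) ? level.get(i + 1) : left;
                next.add(sha256(left, right));
            }
            level = next;
        }
        return level.get(0);
    }

    /** Lower-case hex encoding, for printing or comparing roots. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Changing any single item changes the root, which is the property that makes off-ledger edits detectable once the root is shared. A production design would also need to distinguish leaf hashes from inner-node hashes, which this sketch omits.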

But you are absolutely right, there is a bit of a window here for optimizing things. There is no decision yet on whether to source all data (including historic) or only current data. But we hope for the former, as far as the system allows it.

Regards Remo
