Inconsistent DB behaviour on DB failover in an HA Corda Setup


Hi All,

I ran a few tests to check DB failover cases between two parties, PartyA and PartyB, using the IOU CorDapp. PartyA is in an HA hot-cold setup pointing to a clustered Postgres 9.6 vault DB in a primary/secondary configuration. The setup runs on a RHEL 7.6 system using Corda Enterprise 4.1.
From a client, I fired continuous IOUFlow requests from PartyA to PartyB and vice versa. While the requests were being fired, the DB of the PartyA node was brought down. Observations after 100 requests were:
- There was a switchover: the secondary DB became the primary, and new requests were recorded in the vault.
- But the number of records written to the vaults of PartyA and PartyB differed by 1 in some cases. I compared the record counts of PartyA (iou_states + node_checkpoint) against those of PartyB (iou_states + node_checkpoint).

Sometimes the record count differed in the iou_states table of PartyA and PartyB.
Sometimes the record count differed in the node_checkpoint table of PartyA and PartyB. These checkpoints also remain as they are and do not get scheduled later.
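
A comparison like the one described above could be scripted roughly as follows. This is only a sketch, not the exact check that was run: the helper names (count_rows, diff_counts, make_demo_db) are made up for illustration, the table names are taken as written in this post, and an in-memory SQLite DB stands in for the two Postgres vaults so the snippet runs on its own. Against the real setup the connections would come from a Postgres driver instead.

```python
import sqlite3

# Table names as written in this post (assumption: they match the real schema).
TABLES = ("iou_states", "node_checkpoint")


def count_rows(conn, tables=TABLES):
    """Return {table: row count} for one party's DB (any DB-API connection)."""
    counts = {}
    cur = conn.cursor()
    for table in tables:
        # Table names come from the fixed tuple above, not from user input.
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        counts[table] = cur.fetchone()[0]
    cur.close()
    return counts


def diff_counts(counts_a, counts_b):
    """Return {table: countA - countB} for tables whose counts differ."""
    return {
        table: counts_a[table] - counts_b[table]
        for table in counts_a
        if counts_a[table] != counts_b[table]
    }


def make_demo_db(n_states, n_checkpoints):
    """Hypothetical stand-in for a party's DB: in-memory SQLite with dummy rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE iou_states (id INTEGER)")
    conn.execute("CREATE TABLE node_checkpoint (id INTEGER)")
    conn.executemany(
        "INSERT INTO iou_states (id) VALUES (?)",
        [(i,) for i in range(n_states)],
    )
    conn.executemany(
        "INSERT INTO node_checkpoint (id) VALUES (?)",
        [(i,) for i in range(n_checkpoints)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    # Made-up numbers purely to exercise the functions, not measured results.
    party_a = make_demo_db(n_states=100, n_checkpoints=0)
    party_b = make_demo_db(n_states=99, n_checkpoints=1)
    print("PartyA:", count_rows(party_a))
    print("PartyB:", count_rows(party_b))
    print("Difference (A - B):", diff_counts(count_rows(party_a), count_rows(party_b)))
```

An empty result from diff_counts means the per-table counts match on both sides. A leftover row in node_checkpoint on one side only would show up as a non-zero entry for that table.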

- What can cause the difference in records? Is this normal? I was expecting the transactions recorded in the vaults of both parties to be the same.

- Why do the node_checkpoint records not get picked up after some time?
