Re: Need to know on network map

Mike Hearn
 

As JC says, the closest thing to that is to request a segregated sub-zone from the Corda Network Foundation. Nodes in the segregated sub-zone cannot communicate with other nodes on the Corda Network (tCN), but in exchange they won't appear in the global network map. They will still have a directory of each other.

Corda doesn't support the kind of on-demand lookup you're requesting, which is a deliberate design decision. Here are some of the reasons why not.

Availability. If all identity lookups hit a central directory server, then that directory server becomes a single point of failure. Any outage at it creates an outage of the entire Corda network. This in turn makes it an attractive target for DoS attackers, and that, in turn, triggers complex SLA negotiations and high DevOps costs because you suddenly need a very rigorous on-call rotation. In a P2P system there aren't supposed to be (m)any single points of failure. Corda's current design allows nodes to cache all the data they will need locally, and because the network map is just static data it can be easily distributed via caching CDNs, which are the closest things we have to bulletproof shield walls against DoS attacks. If the underlying serving infrastructure goes down, it doesn't matter until HTTP caches start expiring, and good CDNs can be configured to cache indefinitely if the backend is unreachable.
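To make the caching point concrete, here is a minimal sketch in Python of the "keep serving the last good copy while the backend is down" behaviour. It is not Corda or CDN code: the class name StaleOnErrorCache, the fetch callback and the ttl_seconds parameter are all made up for illustration.

```python
import time
from typing import Callable, Optional


class StaleOnErrorCache:
    """Hypothetical cache: refreshes after a TTL, serves stale data on failure."""

    def __init__(self, fetch: Callable[[], list], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch            # assumed to download the network map
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[list] = None
        self._fetched_at: Optional[float] = None

    def get(self) -> list:
        now = self._clock()
        fresh = (self._fetched_at is not None
                 and now - self._fetched_at < self._ttl)
        if fresh:
            return self._value
        try:
            self._value = self._fetch()
            self._fetched_at = now
        except Exception:
            # Backend unreachable: keep serving the last good copy, if any.
            if self._value is None:
                raise
        return self._value
```

With a cache like this in front of static data, a backend outage only becomes visible to a client that has never fetched a copy at all.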

Privacy. The usual reason for requesting this kind of on-demand lookup is privacy. However, a central directory server still needs to know all the members. Moreover, if nodes are doing interactive lookups whenever they need an identity, that central server now learns all the trading and business relationships of the members. This is not obvious and I see people overlook it constantly: query patterns are valuable business intelligence. Probably your users want to run on a peer-to-peer, decentralised system largely to stop a service provider knowing these things! Interactive directory servers create a fundamental privacy tradeoff: individual members may learn less, but the network operator now learns much more. In fact they learn data that in the current design nobody has anywhere (except maybe the NSA/GCHQ).
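As a rough illustration of how little effort this takes on the operator's side, the Python sketch below shows a directory server that simply counts who asked about whom. The LoggingDirectory class and its method names are hypothetical and not part of any Corda API.

```python
from collections import Counter


class LoggingDirectory:
    """Hypothetical interactive directory that records every lookup."""

    def __init__(self, members):
        self._members = set(members)
        self.lookups = Counter()  # (requester, target) -> number of queries

    def lookup(self, requester, target):
        # Answering the query necessarily reveals the pair to the operator.
        self.lookups[(requester, target)] += 1
        return target if target in self._members else None

    def inferred_relationships(self, min_lookups=1):
        # Pairs queried at least min_lookups times look like counterparties.
        return sorted(pair for pair, n in self.lookups.items()
                      if n >= min_lookups)
```

Nothing here is clever: the relationship graph falls out of the access log as a side effect of serving queries.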

Bad UI. Interactive directory servers get their queries from somewhere. Ultimately the source of the query is a human, who needs to specify an identity via some user interface.

With nodes locally caching the network map, an application can provide a ComboBox-type filter widget so the user can interactively type query fragments and pick what they want from the resulting hits, until they find the identity they're looking for. This may not matter much now in the early days, but in a large network with many similarly named companies a sophisticated query widget with very low latency response will be helpful.
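The filtering behind such a widget can be sketched in a few lines of Python. This assumes the app already holds the cached list of names in memory; filter_identities and its parameters are illustrative names, not a Corda API.

```python
def filter_identities(cached_names, fragment, limit=10):
    """Return cached names matching a typed fragment, prefix matches first."""
    needle = fragment.casefold().strip()
    if not needle:
        return []
    prefix_hits, other_hits = [], []
    for name in cached_names:
        folded = name.casefold()
        if folded.startswith(needle):
            prefix_hits.append(name)
        elif needle in folded:
            other_hits.append(name)
    # No network round trip: every keystroke is answered from local data.
    return (sorted(prefix_hits) + sorted(other_hits))[:limit]


names = ["Acme Bank", "Acme Insurance", "Bank of Acme", "Zenith"]
print(filter_identities(names, "acm"))
# ['Acme Bank', 'Acme Insurance', 'Bank of Acme']
```

Because the data is local, the widget can run this on every keystroke without telling anyone what the user typed.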

If nodes don't know who is available in the network, you can't create such a UI. Instead you have to give the user a single input box, and if they mistype a name they just get an error message back saying "Unknown identity". There is no way for them to tell the difference between "the company I typed in doesn't use this app" and "the company I typed in is spelled differently".

That's a problem for any UX in which the user may not know ahead of time whether their desired counterparty uses the system. You may think this doesn't matter now, when your app is small, but over time it may easily evolve in such a way that users want to check whether they can interact with a company via your application, without having to manually phone up that company and find the right person in the bureaucracy who already knows.

In practice this user experience is far too bad to be usable for any reasonably sized network. You will be forced to improve it by your stakeholders. The only way to improve it is to switch to an API where a fuzzy query returns sets of plausible results the user can pick from. That's because:

  1. Spelling correction engines give you sets of results, not a single match.
  2. In international situations where there's a desire to support native alphabets, for instance if you have Chinese or Arabic names of companies in your network, you may need to handle multiple transliterations, e.g. via Soundex-style matching.

The moment you make this change, which you will be forced to make by your users, you lose all privacy anyway, because now I can write a scraper tool that just iterates through random-looking queries and creates a union of all the suggested results, letting me brute-force my way through the entire dataset. So anyone who wants it can get back to the old global network map situation, and anyone who lacks this very small amount of programming skill will believe network membership is private when it isn't.
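For a sense of how small that scraper is, here is a sketch in Python. It uses systematic prefix enumeration rather than random queries, and it runs against a stand-in suggestion API that I made up: make_suggest_api, the prefix-matching rule and the cap of three results are all assumptions, not a real service.

```python
import string


def make_suggest_api(members, max_results=3):
    """Hypothetical directory: returns at most max_results prefix matches."""
    def suggest(query):
        q = query.casefold()
        hits = sorted(m for m in members if m.casefold().startswith(q))
        return hits[:max_results]
    return suggest


def scrape_members(suggest, max_results=3,
                   alphabet=string.ascii_lowercase + " "):
    """Union the suggestions for ever-longer prefixes until none is capped."""
    found = set()
    pending = list(alphabet)
    while pending:
        prefix = pending.pop()
        hits = suggest(prefix)
        found.update(hits)
        if len(hits) >= max_results:
            # Capped answer: more names may be hidden, so refine the prefix.
            pending.extend(prefix + ch for ch in alphabet)
    return found
```

This sketch only walks names made of ASCII letters and spaces, so a real attempt would need a wider alphabet, but the shape of the attack is the same: every capped answer tells the scraper exactly where to dig next.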

Mostly, this sort of feature request comes from a desire to hide app developer customer lists from competitors. But a competitor that wants this data will certainly be capable of writing such a program.

For these reasons we're skeptical this is a good tradeoff. Central identity lookup servers create a lot of centralisation with consequent reliability issues and leak a lot of private data to the operator, while apparently trivial UX improvements can silently destroy whatever privacy you thought you had gained without anyone ever realising it.

The Corda Network Foundation is a non-profit that doesn't care about hiding who is using Corda. In return for not caring about this, it makes things much better for its users.

For these reasons I doubt Corda will ever be changed to support interactive identity lookups. You could of course fork the open source codebase and add support for this mode of operation. However, recall that the network map doesn't show you who uses what apps. As the Corda Network grows, which nodes are there for which reasons will start to become unclear and blur together. Eventually merely using Corda won't tell you anything about business relationships (except maybe with R3).
