Need to know on network map
Avery Starr
Hi,
Are we able to make the network map need-to-know as well? The current implementation makes the network map visible to all participants in the network. We want a node to discover another node on demand, only when it wants to interact with that node. Nodes that never have to interact with a given node should remain invisible to it.
Any suggestions and tips on how to do this are appreciated!
Thank you, Avery
|
JC Jollant <jcjollant@...>
Look into the notion of a Segregated Sub zone. That should accomplish a similar outcome.
|
|
Mike Hearn
As JC says, the closest thing to that is to request a segregated sub-zone from the Corda Network Foundation. Nodes in the segregated sub-zone cannot communicate with other nodes on the Corda Network (tCN) but won't appear in the global network map. They will still have a directory of each other.
Corda doesn't support the kind of on-demand lookup you're requesting, which is a deliberate design decision. Here are some of the reasons why not.
Availability. If all identity lookups hit a central directory server, then that directory server becomes a single point of failure. Any outage at it creates an outage of the entire Corda network. This in turn makes it an attractive target for DoS attackers, and that, in turn, triggers complex SLA negotiations and high DevOps costs because you suddenly need a very rigorous on-call rotation. In a P2P system there aren't supposed to be (m)any single points of failure. Corda's current design allows nodes to cache all the data they will need locally, and because the network map is just static data it can be easily distributed via caching CDNs, which are the closest things we have to bulletproof shield walls against DoS attacks. If the underlying serving infrastructure goes down it doesn't matter until HTTP caches start expiring, and good CDNs can be configured to cache indefinitely if the backend is unreachable.
Privacy. The usual reason for requesting this kind of on-demand lookup is privacy. However, a central directory server still needs to know all the members and moreover, if nodes are doing interactive lookups whenever they need an identity, that central server now learns all the trading and business relationships of the members. This is not obvious and I see people overlook it constantly: query patterns are valuable business intelligence. Probably your users want to run on a peer-to-peer, decentralised system largely to stop a service provider knowing these things! Interactive directory servers create a fundamental privacy tradeoff: individual members may learn less, but the network operator now learns much more. In fact they learn data that in the current design nobody has anywhere (except maybe the NSA/GCHQ).
Bad UI. Interactive directory servers get their queries from somewhere. Ultimately the source of the query is a human, who needs to specify an identity via some user interface. With nodes locally caching the network map, an application can provide a ComboBox type filter widget so the user can interactively type query fragments and pick what they want from the resulting hits, until they find the identity they're looking for. This may not matter much now in the early days, but in a large network with many similarly named companies a sophisticated query widget with very low latency response will be helpful.
If nodes don't know who is available in the network, you can't create such a UI. Instead you have to give the user a single input box, and if they mistype a name they just get an error message back saying "Unknown identity". There is no way for them to tell the difference between "the company I typed in doesn't use this app" and "the company I typed in is spelled differently". That's a problem for any UX in which the user may not know ahead of time whether their desired counterparty uses the system. You may think this doesn't matter now, when your app is small, but over time it may easily evolve in such a way that users want to check whether they can interact with a company via your application, without having to manually phone up that company and find the right person in the bureaucracy who already knows.
In practice this user experience is far too bad to be usable for any reasonably sized network. You will be forced to improve it by your stakeholders. The only way to improve it is to switch to an API where a fuzzy query returns sets of plausible results the user can pick from. That's because:
Mostly, this sort of feature request comes from a desire to hide app developer customer lists from competitors. But a competitor that wants this data will certainly be capable of writing such a program.
For these reasons we're skeptical this is a good tradeoff. Central identity lookup servers create a lot of centralisation with consequent reliability issues, leak a lot of private data to the operator, and apparently trivial UX improvements can silently destroy whatever privacy you thought you had gained without anyone ever realising it.
The Corda Foundation is a non-profit that doesn't care about hiding who is using Corda. In return for not caring about this, it makes things much better for its users. For these reasons I doubt Corda will ever be changed to support interactive identity lookups. You could of course fork the open source codebase and add support for this mode of operation.
However, recall that the network map doesn't show you who uses what apps. As the Corda Network grows, which nodes are there for which reasons will start to become unclear and blur together. Eventually merely using Corda won't tell you anything about business relationships (except maybe with R3).
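To make the point about fuzzy queries concrete, here is a minimal sketch of the kind of program a curious competitor could write against an interactive lookup server. It is not Corda code: `FuzzyDirectory`, its `search` method, the result limit and the lower-case member names are all hypothetical stand-ins for a server that returns a capped list of plausible matches for a query fragment.

```python
import string

# Assumption for this sketch: member names use only lower-case letters and spaces.
ALPHABET = string.ascii_lowercase + " "


class FuzzyDirectory:
    """Hypothetical lookup server: returns at most `limit` names starting with the query."""

    def __init__(self, names, limit=3):
        self._names = sorted(names)
        self.limit = limit
        self.queries_served = 0

    def search(self, prefix):
        self.queries_served += 1
        return [n for n in self._names if n.startswith(prefix)][: self.limit]


def enumerate_members(directory):
    """Recover the member list by extending any prefix whose result list looks capped."""
    found = set()
    pending = list(ALPHABET)
    while pending:
        prefix = pending.pop()
        hits = directory.search(prefix)
        found.update(hits)
        # A full page of results may hide more entries, so try every longer prefix.
        if len(hits) == directory.limit:
            pending.extend(prefix + ch for ch in ALPHABET)
    return sorted(found)
```

Capping the page size only raises the number of queries needed; under these assumptions it does not stop the enumeration.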
|
Stefano Franz
Corda supports an extra folder called "additional-node-infos". Nodes can use this to look up nodes that are not part of the network map.
If you wanted to have truly on-demand discovery, it would be possible to write a (custom) service which used flows to ask a central party for permission to see a given X500 name. This service would return the node-info and the node would write it into the additional-node-infos folder. Corda periodically reloads from this folder.
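The last step of that idea, dropping a received node-info into the folder the node polls, could look roughly like the sketch below. It assumes the custom service has already returned the serialized, signed node-info as raw bytes; the flow and the permission check are not shown. The function name `install_node_info` and the content-hash file name are assumptions made for this sketch, not something the thread or the Corda documentation prescribes here.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def install_node_info(node_base_dir, node_info_bytes):
    """Write one serialized node-info blob into <base>/additional-node-infos."""
    base = Path(node_base_dir)
    target_dir = base / "additional-node-infos"
    target_dir.mkdir(parents=True, exist_ok=True)

    # Assumed naming scheme: a content hash makes repeated writes idempotent.
    digest = hashlib.sha256(node_info_bytes).hexdigest()
    target = target_dir / f"nodeInfo-{digest}"

    # Write outside the polled folder first, then rename into place, so the
    # node never observes a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=base, prefix="node-info-tmp-")
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(node_info_bytes)
    os.replace(tmp_name, target)
    return target
```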
In this situation, there would only be two entries in the network map, the node-provider and the notary for the network.
|
|
Mike Hearn
You could do this but note:
|
|
Barry Kreiser
Mike,
You bring up an interesting point about centralization and the potential for DoS targets. It seems to me that Business Networks themselves require centralized membership management, which could be a point of failure. Has anyone explored options to avoid this centralized approach, given that the current membership management in the Corda solutions repo is centralized?
Barry
Barry Kreiser | Director/CTO | GuildOne Inc. Suite 940 333-5th Avenue SW, Calgary, AB T2P 3B6
|
Good question Barry!
The BNO membership service is push based, if I recall correctly. Maybe Ivan can comment further. Pushing data to nodes means that if the BNO node goes offline for a while it doesn't matter, and because each node has the membership list, queries (e.g. to put access control on flows) can be done locally. So scaling and performance are good. It's somewhat network mappish.
The downside is that, again, each node learns the members of the business networks they're in. This may not matter. One upgrade would be to implement data distribution groups and then use those. In future perhaps we can do something about query privacy by using SGX. Moxie Marlinspike has explored some of these issues in Signal.
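As a rough illustration of why a pushed membership list keeps access-control checks local, consider the sketch below. It is not the BNO membership service API: `MembershipCache`, `apply_push` and `check_counterparty` are hypothetical names, and the party names are made up.

```python
class MembershipCache:
    """Local copy of a business network's member list, updated by pushes from the BNO."""

    def __init__(self):
        self._members = set()

    def apply_push(self, added=(), removed=()):
        # Called whenever the BNO pushes a membership change to this node.
        self._members.update(added)
        self._members.difference_update(removed)

    def is_member(self, party_name):
        # Purely local lookup: still works while the BNO node is offline.
        return party_name in self._members


def check_counterparty(cache, party_name):
    """Reject a flow up front if the counterparty is not a known member."""
    if not cache.is_member(party_name):
        raise PermissionError(f"{party_name} is not a member of this business network")
```

Because the check never leaves the node, the BNO does not learn who is querying whom, but every node does hold the full member list of its business network.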
|
Millar, Patrick
I’m following this discussion with interest…
One quick add is to encourage anyone tackling these types of projects to remember that the infosec teams that manage access to your client’s node may have some very restrictive firewall rules. Distributed applications are not a well-known pattern – and many (most?) conservative infosec teams currently prefer to whitelist traffic from known IP blocks representing other known participants on the network rather than leaving ports completely open. This would make auto-discovery difficult.
I am optimistic this will change over time, but this is a challenge for the present.
Patrick Millar | Head of Technology The Institutes’ RiskStream Collaborative 720 Providence Road | Suite 100 | Malvern, PA 19355-3433 P (610) 644-2100, ext. 7664 : DD (610) 251-2777
From: corda-dev@groups.io [mailto:corda-dev@groups.io]
On Behalf Of Mike Hearn via Groups.Io
Sent: Tuesday, September 10, 2019 8:08 AM To: corda-dev@groups.io Subject: Re: [corda-dev] Need to know on network map
You could do this but note:
As I said above, "You could of course fork the open source codebase and add support for this". It's essentially doing that, so you may as well fork it properly and add whatever new APIs you need to make it work well rather than [ab]using the additional-node-infos directory.
|
Mike Hearn
Yes. It's a problem. The NodeInfos contain inbound IPs but I don't think there's any requirement to use those same IPs for outbound traffic.
If we extended the protocol so nodes could advertise separate outbound and inbound IPs, we could write a script that converts the network map data into configuration files for various firewall products, for example iptables rules. Then corporate IT teams that can't quite treat a Corda node like a web server yet, even with the Corda Firewall, could make running that script a regular task, like daily, or put it in a cron job. Bonus points if it knows how to intersect the network map with BNO membership lists.
But be aware that Corda Network policy forbids this kind of IP whitelisting. It says you must accept connections from anywhere on the internet: https://corda.network/policy/ip-addresses.html
"Node P2P ports must be globally reachable via the internet, from any part of the internet. By implication you may not use TCP/IP firewall rules to block who connects to your node. Instead, access control should be done cryptographically using TLS termination and membership rule checking at the start of flow logic (i.e. before any code other than session setup runs)."
So using a tool like that on Corda Network would put you in violation of the network policy. The reason is that, beyond the obvious scaling problems with this sort of approach, in future there may be reasons for nodes to contact each other that firewall admins cannot predict.
Cryptographic firewalls aren't a novel concept. Google has been promoting their use for some time now. Try visiting this website: https://dashboard.corp.google.com
You'll see Google's intranet login page. Actually, many internal websites can be reached this way. They replaced their perimeter firewalls with TLS and client integrity checking infrastructure back in 2012, in a project called BeyondCorp: https://duo.com/blog/rsac-2017-beyondcorp-how-google-protects-its-corporate-security-perimeter-without-firewalls
That's where we need to go for a global network of business nodes to achieve its full potential.
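For private networks where the policy above does not apply, the script idea from earlier in this message (turning network map data into firewall rules) might be sketched as follows. The input is assumed to be a plain list of peer outbound IPv4 addresses, which the protocol does not advertise today, so `peer_ips`, `p2p_port` and the function name are hypothetical; only the iptables command syntax is standard.

```python
import ipaddress


def iptables_rules(peer_ips, p2p_port, chain="INPUT"):
    """Build iptables commands that accept the P2P port only from known peers."""
    # Validate and de-duplicate; an invalid address raises ValueError.
    peers = sorted({str(ipaddress.IPv4Address(ip)) for ip in peer_ips})
    rules = [
        f"iptables -A {chain} -p tcp -s {ip} --dport {p2p_port} -j ACCEPT"
        for ip in peers
    ]
    # Everything else hitting the P2P port is dropped.
    rules.append(f"iptables -A {chain} -p tcp --dport {p2p_port} -j DROP")
    return rules


if __name__ == "__main__":
    # Example addresses come from the documentation ranges, not from any real node.
    for rule in iptables_rules(["203.0.113.7", "198.51.100.2"], 10002):
        print(rule)
```

Run from a daily cron job, the output would replace the previous rule set; intersecting the peer list with a BNO membership list would just be a filter applied to `peer_ips` before the call.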
|
Millar, Patrick
Thanks Mike – I think I agree with all you’re saying.
RE: violations of the Corda Network policies, that’s an interesting adjacent problem. Considering the specific issue of traffic being blocked by firewall rules:
- For those participating on the full Corda Network, it would be difficult to police and even harder to enforce – especially during the adoption phase of the network when there is an incentive to have more participants, not less.
- For those operating on private networks of some manner (but still on the Corda Network), I’m not sure that it’s really enforceable either. (The assumption being that private networks will typically only want to transact with other known participants.)
Longer term, I do agree that more open is better. Unfortunately, in the interim there are some challenges to be navigated before all parties fully trust the underlying technology. This may limit the viability of some more novel approaches to using the technology until that trust is established – but I’m an optimist that we’ll get there.
Patrick Millar | Head of Technology The Institutes’ RiskStream Collaborative 720 Providence Road | Suite 100 | Malvern, PA 19355-3433 P (610) 644-2100, ext. 7664 : DD (610) 251-2777
|
JC Jollant <jcjollant@...>
In the context of (large) banks running Corda behind corporate firewalls the reality is:
1) Firewalls and proxies are distinct services => separate public IPs
2) There is no such thing as an inbound port open to any source. Hardcoded whitelisting is the norm.
Adding the proxy public IP to the NetworkMap would reduce friction. If you have a chance to influence these clients, push for a cloud based Corda deployment to bypass #2.
|
Avery Starr
Thank you all for commenting on this topic!
However, we found that the replies so far still do not satisfy our needs.
I want to start by saying that we are developing a private network. We have a centralized server that manages business logic and memberships for this private network. All participants are strictly authenticated entities with unequal, complicated business relationships among them. We do not wish to participate in the global Corda network. We also want to run our own notary for our own network.
On our network, there are two layers of participant identity: one at our centralized business server layer, and one at the Corda node layer. For simplicity, let’s call the identities at the business server layer user A and user B, and those at the Corda layer node A and node B.
At the business server layer, user A knows user B and wishes to transact with user B. Therefore, user A issues a trade command to our Corda network to transact with B, but node A does not know about node B. At this point, we want node A to dynamically discover the address of node B.
We were thinking of implementing something like a Private Network Dispatcher. On request from user A, the dispatcher gives user A node B’s Corda address; user A then issues a command to node A to trade with node B.
However, we also do not want to implement anything that could cause problems in the future when Corda upgrades. Any comments on the Private Network Dispatcher idea? And yes, this dispatcher has to work across different corporate firewalls.
|
|
Mike Hearn
If you're running your own network, you could initialise every node with a randomised Corda identity that gives away nothing about who they are, e.g. a UUID. Then you implement your own protocol (or flow) to resolve user input to that randomised identity. The resolution must return zero results if the user input is even slightly wrong, for the reasons discussed above, so your own notion of business identity would need to be constructed with that in mind (e.g. check digits, if using numbers).
The advantage of this approach is you don't need to modify Corda. The disadvantage is nodes can see how many participants exist in the network and their IP addresses, but wouldn't know who owns them. The mere existence of an IP address in the system may still reveal too much information for you though.
The problem with wanting to be compatible with corporate firewalls whilst still hiding possible IP addresses is, as JC observes, that some companies want to whitelist IP addresses in advance. This isn't compatible with being peer to peer and also not knowing who your peers are.
Another way to do it is as above, but with the addition of a VPN. Everyone VPNs to a central point you administer and is allocated an internal IP address, so IP ownership is secret and corporate firewalls can whitelist the VPN endpoints. The network map would still reveal the number of participants in the system but nothing else. You can fill it with dummy entries if you want to hide the size of the system.
At this point though, it may not be entirely clear what balance of power you're wanting to achieve. The central party would control all user interactions, would see all business relationships and could probably change nodes' public keys to impersonate them without anyone noticing. Availability would be identical to a centralised system. What powers do you want users to have in your system? Perhaps there's a more direct way to achieve it.
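The randomised-identity idea from the start of this message could be sketched as below. This is only an illustration under stated assumptions: business identifiers are taken to be digit strings protected by a Luhn check digit (one possible choice of check digit scheme), and `IdentityResolver` with its `register` and `resolve` methods is a hypothetical stand-in for your own resolution protocol or flow.

```python
import uuid


def luhn_check_digit(digits):
    """Return the Luhn check digit for a string of decimal digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 0:  # double every second digit, starting from the rightmost
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def is_valid_business_id(business_id):
    """True only for ASCII digit strings whose last digit is a correct check digit."""
    return (
        len(business_id) >= 2
        and business_id.isascii()
        and business_id.isdigit()
        and luhn_check_digit(business_id[:-1]) == business_id[-1]
    )


class IdentityResolver:
    """Maps business identifiers to randomised node identities, exact matches only."""

    def __init__(self):
        self._by_business_id = {}

    def register(self, payload_digits):
        business_id = payload_digits + luhn_check_digit(payload_digits)
        node_identity = str(uuid.uuid4())  # reveals nothing about the owner
        self._by_business_id[business_id] = node_identity
        return business_id, node_identity

    def resolve(self, user_input):
        # No fuzzy matching and no hints: anything invalid or unknown yields nothing.
        if not is_valid_business_id(user_input):
            return []
        match = self._by_business_id.get(user_input)
        return [match] if match else []
```

The check digit means a single mistyped digit cannot land on another participant's valid identifier, so returning zero results for bad input does not cost much usability.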
|
Avery Starr
Thanks Mike!
Our network is not a democratic network, so we don’t have the philosophical issue of a power struggle. The participants in the network are strictly managed and have assigned roles to play. We use both a centralized application/server and Corda to take advantage of both: the centralized application manages complicated business relationships, business processes, roles, and identity; the trading activity happens on Corda, and therefore we gain the shared ledger and traceability.
Revealing IP addresses would not be OK in our network, as it would be too easy to guess or programmatically figure out identities should there be bad actors in the network. We have the bank accounts of our participants connected to the network too, so we need to be very careful.
Regarding corporate firewalls, we could possibly require that all nodes of the corporate participants be installed on a web server outside the firewall, so we don’t have to worry about fighting with different versions of corporate firewalls. If we do that, the nodes will have to interact with each other over HTTP instead of RPC, right? I recall Corda implemented HTTP too. Our current code uses RPC calls.
So is there still anything in Corda we can leverage to implement our Private Network Dispatcher? Or maybe we could take Corda’s Network Map, modify the code, and make it fit our scenario? If that is doable, what about future Corda upgrades? If that is not doable, how about we simply develop a piece of software of our own that feeds each node the contact info of its counterparties on demand? Would this approach break anything in Corda and give us headaches when Corda updates in the future?
|
If you're looking primarily for signed transaction graphs without the rest of Corda, what you might want to investigate is just using the "core" module alone, without the node at all. This would be a currently unexplored approach, but a lot of the code in Corda and the node is really about building that peer to peer network and providing distributed governance in various ways. If you don't want that, it's all superfluous and will get in the way.
For the firewalls, R3's enterprise version of Corda has a component called "Corda Firewall" which is designed to solve firewall traversal issues - it's what's called a cryptographic firewall, i.e. it makes per connection allow/deny decisions based on the certificate and public key of the peer rather than IP address. Corda Firewall has enabled even very conservative organisations with complex firewalls to deploy a node, as the node itself can run behind the firewall with only a small set of components outside.
Still, the problem remains that Corda really wants to be a p2p network. Providing SOCKS proxies or running nodes for your customers is probably the way to go.
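To show what "allow/deny decisions based on the certificate rather than the IP address" means in the simplest terms, here is a toy sketch. It says nothing about how Corda Firewall is actually implemented: `CryptographicAllowList` is a hypothetical name, the certificates are treated as opaque DER bytes, and a real deployment would validate the full certificate chain during the TLS handshake instead of comparing fingerprints.

```python
import hashlib


def fingerprint(cert_der):
    """SHA-256 fingerprint of a certificate given as DER-encoded bytes."""
    return hashlib.sha256(cert_der).hexdigest()


class CryptographicAllowList:
    """Decides per connection from the peer's certificate, never from its address."""

    def __init__(self, allowed_cert_ders):
        self._allowed = {fingerprint(cert) for cert in allowed_cert_ders}

    def allow(self, peer_cert_der, peer_ip):
        # peer_ip is accepted only so it can be logged; it plays no part in the decision.
        return fingerprint(peer_cert_der) in self._allowed
```

With a rule like this, a known peer is admitted from any source address and an unknown peer is refused even from a familiar one, which is why no IP whitelist needs to be maintained.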
|