An overview of Block Pane’s validator architecture

It has been more than a year since I published information about how we build and operate our Tendermint validator nodes. Much has changed since then, so I figured an update was overdue.

Todd G · Sep 23, 2022
Todd x DALL-E “a person looking confused standing in front of a complex machine with hundreds of knobs and levers, digital art”

One significant change is that our validators have moved from using single local signers to a threshold signing cluster. More about that later. Another difference is that one of our favorite data center providers (Hetzner) announced a zero-tolerance policy for cryptocurrency-related hosting. You can rest assured that Block Pane no longer hosts validator nodes on Hetzner. What hasn’t changed is that all validators still run on hardware.

I’ll break this post into three sections: geographic distribution, validator architecture, and security.

Geography

If anything, the situation with Hetzner woke up a lot of validators and got them thinking not only about decentralization of consensus but also about the distribution of nodes across providers.

Node placement follows four goals, meant to counter the risk of either a single data center or a provider’s entire network going offline (assuming a 2-of-3 signing cluster); a rough sketch of checking these rules appears after the list:

  • Each node should be in a distinct geographic area, at least in another city.
  • A different data center provider should host each node.
  • The data centers should have less than 50ms of latency between them.
  • No other Block Pane validator should use the same combination of three sites.
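
To make these rules concrete, here is a rough Go sketch of how a candidate three-site placement could be checked against them. The site data, latency function, and helper names are invented for illustration; this is not our actual tooling.

```go
// Illustrative only: check a candidate 3-site placement against the four
// placement goals. Real latency numbers would come from measurements.
package main

import "fmt"

type site struct {
	City, Provider string
}

type placement [3]site

// ok reports whether the sites use distinct cities and providers, whether the
// combination is unused by other validators, and whether latency is acceptable.
func ok(p placement, used map[string]bool, worstLatencyMs func(placement) int) (bool, string) {
	cities := map[string]bool{}
	providers := map[string]bool{}
	key := "" // note: order-sensitive; real code would sort the sites first
	for _, s := range p {
		if cities[s.City] {
			return false, "duplicate city: " + s.City
		}
		if providers[s.Provider] {
			return false, "duplicate provider: " + s.Provider
		}
		cities[s.City] = true
		providers[s.Provider] = true
		key += s.City + "/" + s.Provider + "|"
	}
	if used[key] {
		return false, "site combination already used by another validator"
	}
	if worstLatencyMs(p) >= 50 {
		return false, "inter-site latency too high"
	}
	return true, "placement looks fine"
}

func main() {
	p := placement{
		{City: "London", Provider: "provider-a"},
		{City: "Rotterdam", Provider: "provider-b"},
		{City: "Frankfurt", Provider: "provider-c"},
	}
	// Pretend measurement: worst round-trip latency between the three sites.
	measured := func(placement) int { return 18 }
	fmt.Println(ok(p, map[string]bool{}, measured))
}
```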

These four goals are challenging to accomplish, so on Secret Network, the 3-provider rule is broken. And we haven’t moved Kava to a threshold signer configuration. Otherwise, the Juno, Osmosis, Sifchain, and Stargaze validators are well-distributed across both location and provider.

This list will likely be out of date by the time you read it; Block Pane’s footprint is constantly changing. Here are the locations of the validator hardware nodes:

  • Seattle, WA
  • Denver, CO
  • (Near) Washington D.C.
  • New Jersey, N.J.
  • London, UK
  • Rotterdam, NL
  • Frankfurt, DE
  • Warsaw, PL
  • Singapore, SG

Not all of our systems need to run on hardware; I won’t cover those here, but we do use cloud providers in other locations for simpler services.

Validator Architecture

In the last update, the Block Pane validators used four nodes: one block producer and three sentries. Before the most recent change, most Block Pane validators signed via TMKMS, a remote signer. A more robust design using threshold signatures has made that configuration obsolete.

What is a threshold signature? Think of it as a multi-sig for consensus signatures: the consensus key is split into “n” shards, and “m” of “n” co-signers must participate to vote on a block. Remote/threshold signing has three main benefits: if a node is compromised, the key is still not disclosed (only a shard); one or more nodes can go offline without missing blocks; and it takes a lot of effort to accidentally double-sign when using a remote signer.
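
To make the splitting idea concrete, here is a toy Go sketch using Shamir secret sharing: a number standing in for the consensus key is split into three shares, and any two of them recover it. Horcrux’s real protocol is more sophisticated (co-signers produce partial signatures, and the full key is never reassembled on one machine); this is purely illustrative.

```go
// Toy 2-of-3 secret sharing over a small prime field. Real threshold signers
// work over the signature scheme's group order and never rebuild the key.
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

var prime = big.NewInt((1 << 61) - 1) // a Mersenne prime, large enough for a demo

// splitSecret returns n points on a random degree-(m-1) polynomial whose
// constant term is the secret; any m points reconstruct the secret.
func splitSecret(secret *big.Int, m, n int) map[int64]*big.Int {
	coeffs := []*big.Int{new(big.Int).Set(secret)}
	for i := 1; i < m; i++ {
		c, _ := rand.Int(rand.Reader, prime)
		coeffs = append(coeffs, c)
	}
	shares := make(map[int64]*big.Int, n)
	for x := int64(1); x <= int64(n); x++ {
		y, xBig := big.NewInt(0), big.NewInt(x)
		for i := len(coeffs) - 1; i >= 0; i-- { // Horner's rule mod prime
			y.Mul(y, xBig)
			y.Add(y, coeffs[i])
			y.Mod(y, prime)
		}
		shares[x] = y
	}
	return shares
}

// recoverSecret interpolates the polynomial at x=0 (Lagrange) from any m shares.
func recoverSecret(shares map[int64]*big.Int) *big.Int {
	secret := big.NewInt(0)
	for xi, yi := range shares {
		num, den := big.NewInt(1), big.NewInt(1)
		for xj := range shares {
			if xi == xj {
				continue
			}
			num.Mul(num, big.NewInt(-xj))
			num.Mod(num, prime)
			den.Mul(den, big.NewInt(xi-xj))
			den.Mod(den, prime)
		}
		term := new(big.Int).Mul(yi, num)
		term.Mul(term, new(big.Int).ModInverse(den, prime))
		secret.Add(secret, term)
		secret.Mod(secret, prime)
	}
	return secret
}

func main() {
	secret := big.NewInt(424242) // stand-in for the consensus private key
	shares := splitSecret(secret, 2, 3)

	// Any two of the three shares are enough to recover the secret.
	subset := map[int64]*big.Int{1: shares[1], 3: shares[3]}
	fmt.Println("recovered:", recoverSecret(subset)) // 424242
}
```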

This threshold signature scheme is made possible by “Horcrux,” a tool maintained by Strangelove Ventures that expands on a version Unit410 created.

A note for other validators: the transition from single signers to threshold signers is dangerous. At least one large validator has accidentally “equivocated,” or double-signed, during the switch; the migration requires careful planning and testing.
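
The usual guard against equivocation in a remote signer is a persisted “high-water mark”: refuse to sign anything at or below the last height/round/step already signed. Below is a minimal Go sketch of that idea using my own field names; it is not Horcrux’s or TMKMS’s actual state format, and it omits the case where re-signing an identical payload is allowed.

```go
// Minimal "last signed height/round/step" guard against double-signing.
package main

import (
	"errors"
	"fmt"
	"sync"
)

type signState struct {
	mu     sync.Mutex
	height int64
	round  int64
	step   int8 // e.g. 1=proposal, 2=prevote, 3=precommit
}

// allowSign advances the high-water mark, or returns an error if signing at
// (h, r, step) could conflict with something already signed.
func (s *signState) allowSign(h, r int64, step int8) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if h < s.height ||
		(h == s.height && r < s.round) ||
		(h == s.height && r == s.round && step <= s.step) {
		return errors.New("refusing to sign: would regress the sign state")
	}
	s.height, s.round, s.step = h, r, step
	return nil
}

func main() {
	st := &signState{}
	fmt.Println(st.allowSign(100, 0, 3)) // <nil>: first precommit at height 100
	fmt.Println(st.allowSign(100, 0, 3)) // error: same height/round/step again
	fmt.Println(st.allowSign(101, 0, 2)) // <nil>: new height
}
```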

Because no single node can cause an outage, the risk of running the validator nodes exposed directly to the network is much lower. Four nodes are no longer required, only three, which reduces both cost and maintenance.

A diagram showing three nodes connected by a WireGuard mesh VPN
Basic three-node, 2-of-3 validator configuration.

Now the most-basic Block Pane validator configuration looks like this:

  • Three hardware servers are located in different cities using distinct providers.
  • The nodes use a WireGuard mesh-VPN to provide privacy. The Horcrux cluster communicates using this VPN.
  • The monitoring tools (Tenderduty, M/Monit, and Prometheus) connect to the services using the VPN, as does all administration of the servers.
  • Each server is directly connected to the P2P network.

This config is 100% managed with Ansible; the only exceptions are the initial server installation and the configuration of the RAID arrays.

Security

Little has changed in how the nodes are secured, but I will cover it anyway.

Firewall

  • Only necessary services are exposed directly to the internet: P2P, WireGuard, and any intentionally public services such as RPC endpoints (and most public services are only reachable through Cloudflare).
  • It is easy to misconfigure a firewall, and I’ve done it countless times. I got tired of spinning up a short-lived VPS to run nmap scans, so I wrote a simple web service that performs a reverse scan and reports any Tendermint/Cosmos-related services left open; a minimal sketch of the idea follows this list.
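
For illustration, here is a minimal Go sketch of the reverse-scan idea: an HTTP handler probes the caller’s IP for a few common Tendermint/Cosmos ports and reports which ones answer. The port list, timeout, and endpoint are assumptions; this is not the actual Block Pane service.

```go
// Minimal reverse port-scan web service: whoever calls /scan gets a report of
// common Tendermint/Cosmos ports that are reachable on their own address.
package main

import (
	"fmt"
	"log"
	"net"
	"net/http"
	"time"
)

// Common default ports; adjust per chain and configuration.
var ports = map[string]string{
	"26656": "p2p",
	"26657": "rpc",
	"1317":  "api (LCD)",
	"9090":  "grpc",
	"26660": "prometheus",
}

func scanHandler(w http.ResponseWriter, r *http.Request) {
	// Assumes the caller connects directly (no proxy in front of this service).
	host, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		http.Error(w, "cannot determine caller address", http.StatusBadRequest)
		return
	}
	for port, name := range ports {
		conn, err := net.DialTimeout("tcp", net.JoinHostPort(host, port), 2*time.Second)
		if err != nil {
			fmt.Fprintf(w, "%s (%s): closed or filtered\n", port, name)
			continue
		}
		conn.Close()
		fmt.Fprintf(w, "%s (%s): OPEN\n", port, name)
	}
}

func main() {
	http.HandleFunc("/scan", scanHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```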

Keys

  • Block Pane does not reuse operator keys.
  • The operator’s seed phrases are not stored online or loaded into local wallets.
  • We use Ledger hardware wallets and rarely use a dApp (such as Keplr), preferring the command line with locally compiled binaries.
  • Horcrux shards the consensus keys: once a key is split, each shard is uploaded to its co-signer node and the original key is removed and stored offline. Recovering the full key would require compromising multiple nodes.
  • All remote SSH uses YubiKey’s GPG smart-card capability. The keys are generated on the device and cannot be extracted (don’t worry, there are multiple keys in case of a failure), and a password is required to unlock the key. It might be overkill, since SSH is only accessible over the mesh VPN.

I hope this post on how Block Pane validates on Tendermint/Cosmos chains was informative. And if you want to send some delegations, it would make me ecstatic.
