Osmosis Epoch: Why is it so slow?

Todd G
Dec 1, 2021
Image credit: amirali mirhashemian (unsplash)

Osmosis uses a custom Cosmos-SDK module, ‘epoch’, which triggers once a day at roughly 17:15 UTC. The purpose of this module is to pay out rewards to liquidity providers and stakers. It is very compute-intensive and has been taking 16–18 minutes over the last few days. During this period the chain is halted, consensus is lost, and many API nodes are unresponsive. Most of the explorers take even longer, as much as 30 minutes, to recover.

This situation could be improved, and it’s up to the Osmosis validators and developers. Ultimately this is a code issue and will get fixed, but in the meantime the validators could get epoch processing down to around three minutes. There are a couple of ways this could be done:

1. Validators that are underperforming can update their systems and configuration.

2. The community can delegate to the validators that are processing the epoch quickly and redistribute consensus power.

How fast could epoch be?

The following table shows how many validators had finished processing the epoch block and had signed the next proposed block. This data was collected by watching validator pre-votes in the consensus-state RPC during the epoch on Dec. 1, 2021:
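For reference, data like this can be pulled from any node’s local RPC without special tooling. A minimal sketch, assuming the default Tendermint RPC port and jq installed (the exact JSON field names can differ slightly between Tendermint versions):

```bash
# Poll the local consensus-state RPC during epoch and print the current
# height/round/step along with the pre-vote bit array, which shows how much
# voting power has pre-voted so far.
while true; do
  curl -s http://localhost:26657/consensus_state |
    jq -r '.result.round_state | ."height/round/step" + "  " + .height_vote_set[0].prevotes_bit_array'
  sleep 5
done
```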

Although the cause of the long epoch is code-related, it is apparent that it need not be so long. Even after consensus is regained, many validators are still down; as many as 30 were still missing 25 minutes after the epoch block.

Missing validator count following epoch for roughly 90 minutes on Dec. 1st, 2021. The purple represents the number of validators not participating in consensus, and the orange is the percentage of consensus missing. Note: this is after consensus has been restored, after the initial 18-minute chain halt. From https://osmosis-stats.blockpane.com/missed.html

What can validators do to increase performance?

I’ve worked on this problem for hours and hours, trying different settings, hardware, filesystems, and so on, and have gotten to the point where I sign roughly two minutes after the epoch block. This is fundamentally a code problem, and that part we (validators) can’t do anything about. But there are parts of this we do have control over.

At epoch there are three things contributing to a node failing to sign:

1. The epoch module loops over what appears to be (though this may not be entirely accurate) everything in state.

2. A whole bunch of calculations are performed and written to state and the KV store.

3. Everyone loses a bunch of peers.

If a node is down for a long time, it won’t try to reconnect to its persistent peers, which is why many validators stay down for even longer than it takes to regain network consensus. There are various theories about why the nodes do not reconnect (exponential backoff? blocked for not responding?), but a definitive cause has yet to be identified.
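One quick way to see whether a node has picked its peers back up after epoch is to check the peer count on the local RPC. A minimal sketch, assuming the default port; the systemd service name at the end is only an example:

```bash
# How many peers does the node currently have? (local Tendermint RPC, default port)
curl -s http://localhost:26657/net_info | jq -r '.result.n_peers'

# If the count sits at or near zero long after epoch, the node has probably given up
# dialing its persistent peers; restarting the daemon is a blunt but reliable fix.
sudo systemctl restart osmosisd   # service name is an example, adjust for your setup
```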

· Fast storage helps, but only to a point, and it doesn’t solve everything. Osmosis nodes really need to be running on NVMe, SAS3, or fast SSD drives. If you are using a cloud provider’s virtual block device, like EBS, striping 3–4 volumes with LVM can provide better read/write throughput (a rough sketch of this follows the list).

· Pruning state greatly reduces the impact of the first problem. I personally prune everything and have had zero issues for at least a month. I take frequent ZFS snapshots and maintain one full node and an archive node (plus three pruning nodes), so if I do have a failure I can restore a snapshot or manually fail over quickly. Others have had issues with pruned state; I’m guessing this happens if the daemon gets hit with a SIGKILL, so it is probably more dangerous when running in a Docker container with default stop timeouts (and especially risky during epoch). The pruning, indexer, and logging settings from this and the next two points are shown in the config sketch after the list.

· Disabling the KV indexer helps with the second part. There’s really no need for the indexer on the validator or sentries unless the node is used as an LCD/RPC endpoint.

· Turning the logging level down to error helps with the third aspect. When peers start dropping, the P2P module logs thousands and thousands of errors, whittling away valuable I/O. Setting the log output to JSON also reduces log size because it is terser than plain-text logs.

· Cosmovisor can completely lock up if given too much data. Setting the DAEMON_LOG_BUFFER_SIZE=512 env variable can prevent this (an example follows the list).
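For the storage point above, this is roughly what striping a few cloud block volumes with LVM looks like; the device names, volume group name, and stripe size are only examples and will differ per provider and setup:

```bash
# Combine three attached block devices (e.g. EBS volumes) into one striped volume.
# Check lsblk for the real device names before running anything like this.
pvcreate /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
vgcreate osmo_vg /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
lvcreate -n osmo_data -i 3 -I 64 -l 100%FREE osmo_vg   # 3 stripes, 64KB stripe size
mkfs.ext4 /dev/osmo_vg/osmo_data
mount /dev/osmo_vg/osmo_data /home/osmosis/.osmosisd   # adjust to the node's home directory
```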
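The pruning, indexer, and logging changes are all plain config edits. Roughly, for the Cosmos SDK/Tendermint versions Osmosis was running at the time (key names may vary slightly in other releases):

```toml
# app.toml -- prune aggressively so epoch has far less state to walk
pruning = "everything"

# config.toml -- quieter, terser logs so the P2P error flood wastes less I/O
log_level = "error"
log_format = "json"

# config.toml -- drop the tx indexer unless this node serves LCD/RPC queries
[tx_index]
indexer = "null"
```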
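And the Cosmovisor setting is just an environment variable for whatever supervises the daemon:

```bash
# Cap cosmovisor's log buffer so a burst of output at epoch can't lock it up.
# Under systemd this would go in the unit file instead, e.g.
#   Environment="DAEMON_LOG_BUFFER_SIZE=512"
export DAEMON_LOG_BUFFER_SIZE=512
```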

After all this, peers will still drop, but they will start getting picked up again as soon as the epoch processing is done: about 1–2 minutes on a pruned node with a null kv indexer, about 10 on a default/kv node, and about 20 on an archive node. The node will still probably miss a block or two; it really depends on how well connected its peers are.

This is best demonstrated by what happens on an archive node at epoch, where the I/O is much more evident (note the write numbers are a bit inflated because it’s running on a ZFS mirror, so it takes roughly a 4x penalty between the intent log and metadata). This is on a reasonably powerful system (Ryzen 5950X, 16 physical cores, 128GB RAM, 2 x DC NVMe drives), and it still takes about 20 minutes to handle epoch processing.

archive node block I/O during epoch

For contrast, here is a pruned sentry node with no kv indexer, running on ext4. Also worth noting: this is not a particularly powerful system (a VPS with 8 vCPUs, 32GB RAM, and NVMe storage), yet it still manages to process the epoch in only about three minutes.

Pruned / No kv store node I/O chart during epoch
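Charts like these don’t require anything exotic to reproduce; watching standard disk statistics during epoch shows the same picture. For example:

```bash
# Extended per-device stats in MB/s, refreshed every 5 seconds (sysstat package).
iostat -xm 5

# On ZFS, per-pool and per-vdev throughput:
zpool iostat -v 5
```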

Who is processing epoch quickly?

Here is a list of who signed a pre-vote within 5 minutes following the epoch block (2215857) on Dec. 1, 2021 (the previous block was at 2021–12–01 17:16:21):

If you made it this far, thanks for reading. Right now, we are active on the FIO, Osmosis, Kava, Stargaze, Juno and Sifchain networks, with more planned. Vote for us here:
