Osmosis Grants Program: Tenderduty
I’m thrilled to announce that the Osmosis Grants Program has approved our proposal for a re-write of Tenderduty.
We have opted to defer all payment on the grant until the project is 100% complete: if we don’t deliver, we don’t get paid. We expect development to take around 80–100 hours, and the grant totals $10,000.
What is Tenderduty?
Many validators regard Tenderduty as an essential monitoring tool. Unlike many other Cosmos monitoring tools, it watches for a validator’s signature in finalized blocks rather than only checking the health of nodes. This makes downtime detection reliable, with a low false-positive rate, and the tool is very simple to set up.
The tool is fully functional but very simple and could be significantly improved. Current features are:
- Monitoring of a single chain using multiple RPC endpoints for redundancy.
- Alerts using PagerDuty when a configurable number of consecutive pre-commits are missing.
- Alerts when a validator is removed from the active set.
- Alerts when a chain has not produced new blocks for a configurable time frame.
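To make the consecutive-miss rule above concrete, here is a minimal sketch of how such a counter could work. It is an illustration only, not Tenderduty's actual code: the `MissTracker` type, its fields, and the `Observe` method are hypothetical names, and the sketch assumes the caller already knows whether each finalized block carries the validator's signature.

```go
package main

// MissTracker counts consecutive finalized blocks that lack the
// validator's signature. Hypothetical type, for illustration only.
type MissTracker struct {
	Threshold   int  // consecutive misses required before alerting
	consecutive int  // current run of missed blocks
	alerting    bool // true once an alert has fired for the current run
}

// Observe records one finalized block. It returns true exactly once per
// run of misses, at the moment the run reaches Threshold.
func (m *MissTracker) Observe(signed bool) bool {
	if signed {
		// A signed block ends the run and re-arms the alert.
		m.consecutive = 0
		m.alerting = false
		return false
	}
	m.consecutive++
	if m.consecutive >= m.Threshold && !m.alerting {
		m.alerting = true
		return true
	}
	return false
}
```

Firing only once per run keeps a long outage from paging repeatedly; a real tool would likely also send a "resolved" notification when the validator signs again.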
Several enhancements would make the tool significantly more helpful, and multiple validators have requested new functionality.
What will the grant cover?
The grant covers a complete re-write of the existing tool that adds some of these features:
- The ability to monitor multiple chains using a single instance.
- Differentiation between types of missed blocks: where a pre-commit was seen but not included will be handled differently than a signature that was missing entirely.
- The ability to alert on a percentage of missed blocks over time, not only consecutive misses.
- A visual dashboard that displays missed blocks:
  - A graph showing a histogram/heat map for all nodes monitored.
  - The dashboard will differentiate between finalized pre-commits (green), pre-commits seen but not included in blocks (yellow), pre-votes seen but no pre-commit (orange), and entirely missing votes (red).
  - Finally, an indicator will show any endpoints not responding or falling behind.
- Allow more flexible notification options: add Telegram and Discord, webhook support, and different severities for PagerDuty.
- A Prometheus exporter to allow visualization in tools like Grafana. (The Tendermint Prometheus exporter for missed blocks is unreliable because it reports using the consensus state and results in false positives if polled before the node has committed.)
- Add alerts for nodes being down or lagging the chain’s head.
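Two of the items above, the four block states and the percentage-based alert, can be sketched together. This is a rough illustration under our own assumptions rather than the planned implementation: `BlockStatus`, `Classify`, `Window`, and their methods are hypothetical names, the window is counted in blocks rather than wall-clock time, and every state other than a finalized pre-commit is treated as a missed block.

```go
package main

// BlockStatus mirrors the four dashboard colors described above.
type BlockStatus int

const (
	Finalized            BlockStatus = iota // pre-commit included in the block (green)
	PrecommitNotIncluded                    // pre-commit seen but not included (yellow)
	PrevoteOnly                             // pre-vote seen, no pre-commit (orange)
	Missed                                  // no vote seen at all (red)
)

// Classify maps what was observed for one height to a status.
func Classify(sawPrevote, sawPrecommit, inBlock bool) BlockStatus {
	switch {
	case inBlock:
		return Finalized
	case sawPrecommit:
		return PrecommitNotIncluded
	case sawPrevote:
		return PrevoteOnly
	default:
		return Missed
	}
}

// Window is a fixed-size ring buffer holding the most recent statuses.
type Window struct {
	statuses []BlockStatus
	next     int // index the next status will be written to
	filled   int // number of valid entries, at most len(statuses)
}

// NewWindow returns a window covering the last size blocks (minimum 1).
func NewWindow(size int) *Window {
	if size < 1 {
		size = 1
	}
	return &Window{statuses: make([]BlockStatus, size)}
}

// Add records the status of the newest block, evicting the oldest if full.
func (w *Window) Add(s BlockStatus) {
	w.statuses[w.next] = s
	w.next = (w.next + 1) % len(w.statuses)
	if w.filled < len(w.statuses) {
		w.filled++
	}
}

// MissedPercent reports the share of blocks in the window that were not
// finalized with the validator's pre-commit (assumption: yellow, orange,
// and red all count as missed).
func (w *Window) MissedPercent() float64 {
	if w.filled == 0 {
		return 0
	}
	missed := 0
	for i := 0; i < w.filled; i++ {
		if w.statuses[i] != Finalized {
			missed++
		}
	}
	return 100 * float64(missed) / float64(w.filled)
}
```

An alert rule would then compare `MissedPercent()` against a configurable threshold after each `Add`. Keeping the per-block status in the window, rather than a plain counter, means the same data could feed the dashboard's color coding.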
This re-write will also coincide with the creation of an easier-to-use Go websocket client for Tendermint. The existing RPC/HTTP client requires writing boilerplate code for subscriptions; this library will abstract that away. TLS/WSS subscriptions do not work in the Tendermint library, which limits the endpoints that can be used, and support for them has been the most frequent feature request so far. The new library will avoid using the Tendermint client entirely and add a few convenience functions that stream fully protobuf-decoded Tendermint/Cosmos types, hiding much of the complexity of type casting. It will not attempt to replace the RPC client; it will handle only websockets and will initially focus on what’s needed for Tenderduty.
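As a rough idea of the plumbing such a client has to hide, the sketch below uses only the Go standard library to derive a websocket URL from an RPC address and to build the JSON-RPC `subscribe` request that Tendermint accepts on its `/websocket` endpoint. It does not show the new library's API, which has not been written yet: `WebsocketURL`, `SubscribeRequest`, and the example host are hypothetical, and opening the connection itself is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// WebsocketURL converts an RPC address such as https://host:443 or
// tcp://host:26657 into the matching ws:// or wss:// URL ending in
// /websocket. Hypothetical helper, for illustration only.
func WebsocketURL(rpc string) (string, error) {
	u, err := url.Parse(rpc)
	if err != nil {
		return "", err
	}
	switch u.Scheme {
	case "http", "tcp":
		u.Scheme = "ws"
	case "https":
		u.Scheme = "wss" // TLS endpoints need the secure websocket scheme
	case "ws", "wss":
		// already a websocket URL
	default:
		return "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	if !strings.HasSuffix(u.Path, "/websocket") {
		u.Path = strings.TrimSuffix(u.Path, "/") + "/websocket"
	}
	return u.String(), nil
}

// rpcRequest is the JSON-RPC 2.0 envelope used for subscriptions.
type rpcRequest struct {
	JSONRPC string            `json:"jsonrpc"`
	Method  string            `json:"method"`
	ID      int               `json:"id"`
	Params  map[string]string `json:"params"`
}

// SubscribeRequest builds the message to send over the websocket in order
// to subscribe to events matching query, e.g. tm.event='NewBlock'.
func SubscribeRequest(id int, query string) ([]byte, error) {
	return json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		Method:  "subscribe",
		ID:      id,
		Params:  map[string]string{"query": query},
	})
}
```

For example, `WebsocketURL("https://rpc.example.com:443")` yields `wss://rpc.example.com:443/websocket`, and the bytes from `SubscribeRequest(1, "tm.event='NewBlock'")` would be the first message written to that connection. Decoding the streamed events into typed values is the part the planned library is meant to take care of.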
About Block Pane
We are a validator on several Cosmos-SDK chains, and contribute tools for monitoring and analyzing blockchains. Your support helps us continue to contribute to the health of the interchain. We would be grateful for staking delegations on our validator nodes: