Licence: MIT
Version: 0.59.0
Deps: 0
Size: 32 kB
Vulns: 0
Weekly: 0
@lidofinance/api-metrics
Utils to work with common API metrics
Installation
yarn add @lidofinance/api-metrics
Getting started
collectStartupMetrics
Just call it in the same place where you call collectDefaultMetrics.
import { Registry, collectDefaultMetrics } from 'prom-client'
import getConfig from 'next/config'
import { METRICS_PREFIX } from 'config'
import buildInfoJson from 'build-info.json'
import { collectStartupMetrics } from '@lidofinance/api-metrics'

const { publicRuntimeConfig } = getConfig()
const { defaultChain, supportedChains } = publicRuntimeConfig

export const registry = new Registry()

collectStartupMetrics({
  prefix: METRICS_PREFIX,
  registry,
  defaultChain,
  supportedChains: supportedChains.split(','),
  version: process.env.npm_package_version ?? 'unversioned',
  commit: buildInfoJson.commit,
  branch: buildInfoJson.branch,
})

collectDefaultMetrics({ prefix: METRICS_PREFIX, register: registry })
rpcMetricsFactory
This is mostly an internal utility for @lidofinance/api-rpc and @lidofinance/eth-api-providers, but you should use it directly if you make RPC requests in some other way.
Exported metrics:
- rpc_service_request (Counter): labels chainId, provider. Incremented when a request starts.
- rpc_service_request_methods (Counter): labels method. One increment per JSON-RPC method in the batch.
- rpc_service_response (Histogram): labels chainId, provider, status. Observes response latency for every attempt, including thrown requests (recorded under status='xxx').
- rpc_service_response_result (Counter): labels chainId, provider, result ('success' | 'failure'). Use this for explicit success/failure rate queries without touching the histogram's label set.
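To make the recording semantics of these four metrics concrete, here is a minimal sketch of one attempt being recorded. It uses in-memory stand-ins rather than prom-client, and every name in it (LabeledTally, onRequestStart, onRequestEnd, the 'example' provider) is hypothetical and not part of this package's API. It also assumes that "one increment per JSON-RPC method in the batch" means one increment per call in the batch.

```typescript
// In-memory stand-in for a labelled metric. Hypothetical, for illustration only.
type Labels = Record<string, string>

class LabeledTally {
  private readonly values = new Map<string, number[]>()

  // Build a key that does not depend on the order of the label properties.
  private key(labels: Labels): string {
    return Object.keys(labels)
      .sort()
      .map((k) => `${k}=${labels[k]}`)
      .join(',')
  }

  // Record one value: 1 for a counter increment, seconds for a histogram.
  record(labels: Labels, value = 1): void {
    const key = this.key(labels)
    const recorded = this.values.get(key) ?? []
    recorded.push(value)
    this.values.set(key, recorded)
  }

  // Number of recordings for an exact label set.
  count(labels: Labels): number {
    return (this.values.get(this.key(labels)) ?? []).length
  }
}

const metrics = {
  request: new LabeledTally(), // stands in for rpc_service_request
  requestMethods: new LabeledTally(), // stands in for rpc_service_request_methods
  response: new LabeledTally(), // stands in for rpc_service_response
  responseResult: new LabeledTally(), // stands in for rpc_service_response_result
}

interface Attempt {
  chainId: string
  provider: string
  methods: string[]
}

// Called before the request is sent.
function onRequestStart({ chainId, provider, methods }: Attempt): void {
  metrics.request.record({ chainId, provider })
  for (const method of methods) {
    metrics.requestMethods.record({ method })
  }
}

// Called for every attempt, whether it returned a response or threw.
function onRequestEnd(
  { chainId, provider }: Attempt,
  status: string,
  result: 'success' | 'failure',
  seconds: number,
): void {
  metrics.response.record({ chainId, provider, status }, seconds)
  metrics.responseResult.record({ chainId, provider, result })
}

// A batch of two calls that throws after a made-up 10 s connect timeout.
const attempt: Attempt = {
  chainId: '1',
  provider: 'example',
  methods: ['eth_call', 'eth_call'],
}
onRequestStart(attempt)
onRequestEnd(attempt, 'xxx', 'failure', 10)
```

In this sketch the thrown attempt still lands in the latency histogram under status 'xxx' and is counted once as a failure, which is the behaviour the list above describes.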
Dashboard tips:
- For latency (p50/p95/p99), filter the histogram to successful responses so long timeouts don't skew the distribution:
  histogram_quantile(0.95, sum by (le, provider) (rate(rpc_service_response_bucket{status!="xxx"}[5m])))
- For availability / success rate, use the result counter:
  rate(rpc_service_response_result_total{result="success"}[5m]) / rate(rpc_service_response_result_total[5m])
- A thrown request (e.g. connect timeout) still observes the histogram under status='xxx' so failures stay visible, but including it in latency queries will push p99 toward the timeout duration.
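The effect of thrown requests on tail latency can be checked with a small calculation. The sketch below uses a nearest-rank percentile over raw samples, which is a simplification and not the bucket interpolation that histogram_quantile performs. The latencies are made up: 97 responses at 0.2 s and 3 thrown requests that hit a hypothetical 10 s timeout.

```typescript
// Nearest-rank percentile over raw samples (p is an integer from 1 to 100).
// This is an illustration only, not how Prometheus computes quantiles.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error('percentile() needs at least one sample')
  }
  const sorted = [...samples].sort((a, b) => a - b)
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100))
  return sorted[rank - 1]
}

interface Sample {
  status: string
  seconds: number
}

// Made-up data: 97 fast responses plus 3 thrown requests at the timeout.
const samples: Sample[] = [
  ...Array.from({ length: 97 }, () => ({ status: '2xx', seconds: 0.2 })),
  ...Array.from({ length: 3 }, () => ({ status: 'xxx', seconds: 10 })),
]

const allSeconds = samples.map((s) => s.seconds)
const respondedSeconds = samples
  .filter((s) => s.status !== 'xxx')
  .map((s) => s.seconds)

const p99All = percentile(allSeconds, 99) // 10, the timeout duration
const p99Responded = percentile(respondedSeconds, 99) // 0.2

console.log({ p99All, p99Responded })
```

With only 3% of attempts timing out, the unfiltered p99 already equals the timeout, while the filtered p99 reflects what responding providers actually deliver.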
rpcMetricsUtils
A set of utilities that help reduce label cardinality. For example, collect '2xx' instead of separate labels for '200', '201', and the other HTTP response statuses, because in most cases the exact code doesn't matter.
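As an illustration of the idea, a status-bucketing helper could look like the sketch below. toStatusBucket is a hypothetical name and is not necessarily one of the utilities this package exports. The 'xxx' fallback for a missing or invalid status mirrors the label used for thrown requests in the section above.

```typescript
// Collapse an HTTP status code into its class ('2xx', '4xx', ...) so the
// status label has at most a handful of distinct values.
// Hypothetical helper, not the package's actual API.
function toStatusBucket(status?: number): string {
  if (
    status === undefined ||
    !Number.isInteger(status) ||
    status < 100 ||
    status > 599
  ) {
    // No usable status, e.g. the request threw before a response arrived.
    return 'xxx'
  }
  return `${Math.floor(status / 100)}xx`
}

console.log(toStatusBucket(200)) // '2xx'
console.log(toStatusBucket(201)) // '2xx'
console.log(toStatusBucket(404)) // '4xx'
console.log(toStatusBucket(undefined)) // 'xxx'
```

Bucketing this way caps the status label at six possible values instead of one time series per distinct status code.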