0.59.0 • Published 1 month ago

@lidofinance/api-metrics

Licence: MIT
Version: 0.59.0
Dependencies: 0
Size: 32 kB
Vulnerabilities: 0
Weekly downloads: 0

Utils to work with common API metrics

Installation

yarn add @lidofinance/api-metrics

Getting started

collectStartupMetrics

Call it in the same place where you call collectDefaultMetrics from prom-client.

import { Registry, collectDefaultMetrics } from 'prom-client'
import getConfig from 'next/config'
import { METRICS_PREFIX } from 'config'
import buildInfoJson from 'build-info.json'
import { collectStartupMetrics } from '@lidofinance/api-metrics'

const { publicRuntimeConfig } = getConfig()
const { defaultChain, supportedChains } = publicRuntimeConfig

export const registry = new Registry()

collectStartupMetrics({
  prefix: METRICS_PREFIX,
  registry,
  defaultChain,
  supportedChains: supportedChains.split(','),
  version: process.env.npm_package_version ?? 'unversioned',
  commit: buildInfoJson.commit,
  branch: buildInfoJson.branch,
})

collectDefaultMetrics({ prefix: METRICS_PREFIX, register: registry })

rpcMetricsFactory

This is mostly an internal utility for @lidofinance/api-rpc and @lidofinance/eth-api-providers, but you should use it directly if you make RPC requests in some other way.

Exported metrics:

  • rpc_service_request (Counter) — labels chainId, provider. Incremented when a request starts.
  • rpc_service_request_methods (Counter) — labels method. One increment per JSON-RPC method in the batch.
  • rpc_service_response (Histogram) — labels chainId, provider, status. Observes response latency for every attempt, including thrown requests (recorded under status='xxx').
  • rpc_service_response_result (Counter) — labels chainId, provider, result ('success' | 'failure'). Use this for explicit success/failure rate queries without touching the histogram's label set.
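
The recording points described in the list above can be sketched as a wrapper around a single RPC request. This is only an illustration, not the package's implementation: instrumentedRpc, RpcMetrics, CounterLike and HistogramLike are hypothetical names, the two interfaces are merely shaped like prom-client's Counter.inc and Histogram.observe, and both the latency unit (seconds) and the rule that only 2xx responses count as 'success' are assumptions.

```typescript
type Labels = Record<string, string | number>

// Minimal shapes, loosely modelled on prom-client's Counter and Histogram (assumption).
interface CounterLike { inc(labels: Labels): void }
interface HistogramLike { observe(labels: Labels, value: number): void }

// Hypothetical container for the four metrics listed above.
interface RpcMetrics {
  request: CounterLike        // rpc_service_request
  requestMethods: CounterLike // rpc_service_request_methods
  response: HistogramLike     // rpc_service_response
  responseResult: CounterLike // rpc_service_response_result
}

interface RpcCall { method: string }

// Hypothetical wrapper: records all four metrics around one (possibly batched) request.
async function instrumentedRpc(
  metrics: RpcMetrics,
  chainId: number,
  provider: string,
  batch: RpcCall[],
  send: (batch: RpcCall[]) => Promise<{ status: number }>,
): Promise<{ status: number }> {
  // Incremented when the request starts, before any network activity.
  metrics.request.inc({ chainId, provider })
  // One increment per JSON-RPC method in the batch.
  for (const { method } of batch) metrics.requestMethods.inc({ method })

  const startedAt = Date.now()
  const elapsedSeconds = () => (Date.now() - startedAt) / 1000
  try {
    const res = await send(batch)
    // Collapse the exact code into its class ('2xx', '4xx', ...) to limit cardinality.
    const status = `${Math.floor(res.status / 100)}xx`
    metrics.response.observe({ chainId, provider, status }, elapsedSeconds())
    // Assumption: only 2xx responses count as success.
    const result = res.status >= 200 && res.status < 300 ? 'success' : 'failure'
    metrics.responseResult.inc({ chainId, provider, result })
    return res
  } catch (err) {
    // A request that throws is still observed, under status='xxx'.
    metrics.response.observe({ chainId, provider, status: 'xxx' }, elapsedSeconds())
    metrics.responseResult.inc({ chainId, provider, result: 'failure' })
    throw err
  }
}
```

Keeping the result counter separate from the histogram means a success-rate query never depends on which status labels the histogram happens to carry.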

Dashboard tips:

  • For latency (p50/p95/p99), filter the histogram to successful responses so long timeouts don't skew the distribution: histogram_quantile(0.95, sum by (le, provider) (rate(rpc_service_response_bucket{status!="xxx"}[5m]))).
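  • For availability / success rate, use the result counter and aggregate both sides so the label sets match (dividing the raw series would pair each success series with itself and always yield 1): sum by (provider) (rate(rpc_service_response_result_total{result="success"}[5m])) / sum by (provider) (rate(rpc_service_response_result_total[5m])).
  • A request that throws (e.g. on a connect timeout) is still observed in the histogram under status='xxx', so failures stay visible, but including it in latency queries pushes p99 toward the timeout duration.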

rpcMetricsUtils

A set of utilities that help reduce label cardinality. For example, collect '2xx' instead of '200', '201', ... and other individual HTTP response statuses, because in most cases the exact status code doesn't matter.
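
As an illustration of the idea, a status-class helper could look like the sketch below. The name toStatusClass is hypothetical and is not taken from the package's exports. Falling back to 'xxx' for a missing or out-of-range status is an assumption, chosen to match the status='xxx' label used for requests that throw.

```typescript
// Hypothetical helper: collapses an exact HTTP status into its class
// so the `status` label has at most six distinct values.
function toStatusClass(status: number | undefined): string {
  // No status at all (e.g. the request threw) or an out-of-range value.
  if (status === undefined || !Number.isInteger(status) || status < 100 || status > 599) {
    return 'xxx'
  }
  return `${Math.floor(status / 100)}xx`
}
```

With this, 200 and 201 both map to '2xx' and 404 maps to '4xx', so the number of time series per metric stays bounded no matter how many distinct codes an upstream provider returns.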