Key monitoring concepts you should be aware of
Discover the key monitoring concepts you need to know when working with monitoring and time series visualization tools such as Prometheus, VictoriaMetrics, or Grafana.

Metric
A numeric measure or observation of something. Here are example metrics about requests on a web application. A metric's name should make clear what is actually measured:
requests_total
requests_success_total
request_errors_total
Time series
A combination of a metric and its labels.
requests_total{path="/", code="200"}
path="/" and code="200" are two labels associated with the requests_total metric.
Time series labels are key/value pairs. Two time series with the same label keys but different label values are different time series. Here is an example:
# Two different time series
requests_total{path="/", code="200"}
requests_total{path="/contact", code="200"}
The requests_total{path="/", code="200"} time series could also be written like this:
# __name__ is a special label that can be used to indicate the metric name
{__name__="requests_total", path="/", code="200"}
# __name__ can also be omitted
{"requests_total", path="/", code="200"}
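To make the identity rule concrete, here is a minimal Python sketch. It assumes a series can be modelled as a metric name plus a dictionary of labels; the `series_key` helper is hypothetical and not part of any monitoring tool.

```python
def series_key(name, labels):
    """Build a hashable identity for a time series from its metric name and labels."""
    # Fold the metric name into the label set under the special __name__ label.
    all_labels = dict(labels, __name__=name)
    # Label order does not matter, so a frozenset of (key, value) pairs is the identity.
    return frozenset(all_labels.items())


a = series_key("requests_total", {"path": "/", "code": "200"})
b = series_key("requests_total", {"code": "200", "path": "/"})
c = series_key("requests_total", {"path": "/contact", "code": "200"})

print(a == b)  # True: same label keys and values, only the order differs
print(a == c)  # False: same label keys, but a different value for "path"
```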
Cardinality
For the monitoring system as a whole, cardinality is the number of unique time series. From the point of view of a single metric, cardinality is the number of unique time series produced for that metric.
High cardinality may increase memory usage.
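As an illustration, the following Python sketch counts cardinality per metric and for the whole system. It assumes the observed series are available as (metric name, labels) pairs; the `cardinality_per_metric` helper and the sample data are made up for the example.

```python
from collections import defaultdict


def cardinality_per_metric(series):
    """Count unique label sets per metric name.

    `series` is an iterable of (metric_name, labels_dict) pairs.
    """
    unique = defaultdict(set)
    for name, labels in series:
        unique[name].add(frozenset(labels.items()))
    return {name: len(label_sets) for name, label_sets in unique.items()}


observed = [
    ("requests_total", {"path": "/", "code": "200"}),
    ("requests_total", {"path": "/contact", "code": "200"}),
    ("requests_total", {"path": "/", "code": "200"}),  # same series as the first entry
    ("request_errors_total", {"path": "/", "code": "500"}),
]

per_metric = cardinality_per_metric(observed)
total = sum(per_metric.values())  # cardinality of the whole system

print(per_metric)  # {'requests_total': 2, 'request_errors_total': 1}
print(total)       # 3
```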
Churn rate
The speed at which old time series are replaced by new ones. A high churn rate is mainly associated with labels whose values change frequently (timestamp, queryid, hash, etc.).
A high churn rate increases the total number of time series in the monitoring system's database and may slow down queries that span multiple days.
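The effect can be sketched in Python by comparing the series seen in two consecutive intervals. The series names and the queryid values below are invented for the example, and `new_series` is a hypothetical helper.

```python
def new_series(previous, current):
    """Return the series present in the current interval but not in the previous one."""
    return current - previous


# Series seen during two consecutive hours; the queryid label changes constantly.
hour_1 = {'queries_total{queryid="a1"}', 'queries_total{queryid="b2"}', 'up{job="db"}'}
hour_2 = {'queries_total{queryid="c3"}', 'queries_total{queryid="d4"}', 'up{job="db"}'}

appeared = new_series(hour_1, hour_2)
ever_seen = hour_1 | hour_2

print(len(appeared))   # 2 series created during the second hour
print(len(hour_2))     # 3 active series, the same as in the first hour
print(len(ever_seen))  # 5 series stored overall
```

The number of active series stays flat while the number of stored series keeps growing, which is why queries over long time ranges have more series to scan.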
Raw sample (or data point)
The (value, timestamp) pair associated with a time series. A raw sample is also called a data point.
# Raw sample in Prometheus text exposition format
requests_total{path="/", code="200"} 123 4567890
The raw sample or data point associated with the requests_total{path="/", code="200"} time series is represented by 123 (sample value) and 4567890 (sample timestamp).
In Pull model monitoring systems, the sample's timestamp is added by the program that collects the metric.
In Push model monitoring systems, the timestamp is added directly by the application or client sending the metric.
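To show how a sample line breaks down into its parts, here is a small Python sketch. It assumes a line with all three fields present (series, value, timestamp) separated by single spaces; `Sample` and `parse_sample` are hypothetical names, and this is not a complete parser for the exposition format.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    series: str     # metric name with its labels
    value: float    # sample value
    timestamp: int  # sample timestamp, kept as the raw integer from the line


def parse_sample(line):
    """Split 'series value timestamp' from the right, so spaces inside the labels survive."""
    series, value, timestamp = line.rsplit(" ", 2)
    return Sample(series, float(value), int(timestamp))


sample = parse_sample('requests_total{path="/", code="200"} 123 4567890')
print(sample.series)     # requests_total{path="/", code="200"}
print(sample.value)      # 123.0
print(sample.timestamp)  # 4567890
```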
Time series resolution (or step)
The minimum interval between raw samples (or data points) of a time series. A time series whose value is updated every 30 seconds has a resolution of 30 seconds.
In Pull model monitoring systems, resolution is controlled by the clients collecting (scraping) the metrics and corresponds to the scrape interval (the time interval separating two scrapes).
In Push model monitoring systems, resolution is the interval between the timestamps of the raw samples received by the monitoring system.
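Following the definition above, resolution can be estimated from the sample timestamps alone. This Python sketch assumes timestamps in seconds; the `resolution` helper is hypothetical.

```python
def resolution(timestamps):
    """Return the smallest gap between consecutive timestamps, or None if fewer than two."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return min(gaps) if gaps else None


# Timestamps (in seconds) of a series scraped every 15 seconds.
scraped = [1740144001, 1740144016, 1740144031, 1740144046, 1740144061]
print(resolution(scraped))  # 15

# A missed scrape leaves a larger gap but does not change the resolution.
with_gap = [0, 30, 60, 120]
print(resolution(with_gap))  # 30
```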
Instant query and range query
Deduplication
Ensures that only the last raw sample of each time series is kept for each discrete interval of X time units. For example, if multiple scrapers collect the same targets and send metrics to the monitoring system every 15s, configuring deduplication with X=15s cleans up the duplicated data and avoids wasting storage space.
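The idea can be sketched in a few lines of Python. The sketch assumes samples of a single time series given as (timestamp in seconds, value) pairs, and buckets aligned on multiples of the interval; real monitoring systems may align intervals differently, and `deduplicate` is a hypothetical helper.

```python
def deduplicate(samples, interval):
    """Keep the sample with the highest timestamp in each `interval`-second bucket.

    `samples` is an iterable of (timestamp_seconds, value) pairs for ONE time series.
    """
    last_per_bucket = {}
    for ts, value in sorted(samples):
        # Later samples overwrite earlier ones that fall into the same bucket.
        last_per_bucket[ts // interval] = (ts, value)
    return [last_per_bucket[bucket] for bucket in sorted(last_per_bucket)]


# Two scrapers collect the same target every 15s, two seconds apart.
received = [(1, 10), (3, 10), (16, 12), (18, 12)]
deduplicated = deduplicate(received, interval=15)
print(deduplicated)  # [(3, 10), (18, 12)]
```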
Downsampling
For each interval (5 minutes, for instance), keeps only the last sample among samples older than X days. Some monitoring tools, like VictoriaMetrics, also support configuring downsampling separately for different sets of time series. Have a look at VictoriaMetrics downsampling for more.
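Here is a minimal Python sketch of that rule, assuming samples of a single series as (timestamp in seconds, value) pairs. The `downsample` helper and its parameters are hypothetical and do not reflect how any particular tool is configured.

```python
def downsample(samples, now, older_than, interval):
    """Keep one sample (the last) per `interval` for samples older than `older_than`.

    All arguments are in seconds; samples newer than the cutoff are kept untouched.
    """
    cutoff = now - older_than
    last_per_bucket = {}
    recent = []
    for ts, value in sorted(samples):
        if ts < cutoff:
            last_per_bucket[ts // interval] = (ts, value)  # last sample of the bucket wins
        else:
            recent.append((ts, value))
    return [last_per_bucket[bucket] for bucket in sorted(last_per_bucket)] + recent


now = 900000
samples = [
    (600000, 1), (600060, 2), (600120, 3),  # old, same 5-minute bucket
    (600300, 4),                            # old, next 5-minute bucket
    (899940, 5), (899970, 6),               # newer than one day, kept as-is
]
kept = downsample(samples, now=now, older_than=86400, interval=300)
print(kept)  # [(600120, 3), (600300, 4), (899940, 5), (899970, 6)]
```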
Relabeling
Consists of modifying time series labels before they are stored. Have a look at Prometheus-compatible relabeling for relabeling examples that work with both Prometheus and VictoriaMetrics.
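Conceptually, relabeling is a transformation applied to a label set before storage. The Python sketch below only illustrates that idea with two simple actions (dropping and renaming labels); it is not the Prometheus relabeling configuration syntax, and the `relabel` helper and the pod_hash label are invented for the example.

```python
def relabel(labels, drop=(), rename=None):
    """Return a new label set with some labels dropped and others renamed."""
    rename = rename or {}
    result = {}
    for key, value in labels.items():
        if key in drop:
            continue  # this label is removed before storage
        result[rename.get(key, key)] = value
    return result


before = {"__name__": "requests_total", "path": "/", "code": "200", "pod_hash": "5f9c"}
after = relabel(before, drop={"pod_hash"}, rename={"code": "status"})
print(after)  # {'__name__': 'requests_total', 'path': '/', 'status': '200'}
```

Dropping a frequently changing label such as the hypothetical pod_hash is a typical way to reduce cardinality and churn.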
Types of metrics
Counter
- Counts events (number of requests, logs, etc.)
- Increases or stays the same over time
- Decreases only when the metric is reset to zero (for example, when the exposing service restarts)
Well-named counter metrics will generally have one of the following suffixes:
_total
_sum
_count
The query language functions most commonly used with counters are rate and increase.
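Because a counter only decreases on a reset, a drop in value can be treated as a restart from zero. The Python sketch below shows one simple way to compute an increase that tolerates resets; `increase_with_resets` is a hypothetical helper, and real query engines may handle resets with more care.

```python
def increase_with_resets(values):
    """Sum the growth of a counter, treating any drop in value as a reset to zero."""
    total = 0
    for previous, current in zip(values, values[1:]):
        if current >= previous:
            total += current - previous
        else:
            # The counter restarted: everything counted since the reset is new.
            total += current
    return total


# The exposing service restarts between the second and third samples.
observed_values = [100, 150, 20, 60]
print(increase_with_resets(observed_values))  # 110 (50 before the reset, 20 + 40 after)
```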
Gauge
Histogram
Summary
Commonly used metrics query language functions
Rate and Increase
Mostly used on counter metrics. Here is the data sample we are going to use to clarify what those functions do:
nginx_http_requests_total 133 1740144001 # 2025-02-22T14:20:01Z
nginx_http_requests_total 133 1740144016 # 2025-02-22T14:20:16Z
nginx_http_requests_total 854 1740144031 # 2025-02-22T14:20:31Z
nginx_http_requests_total 854 1740144046 # 2025-02-22T14:20:46Z
nginx_http_requests_total 1671 1740144061 # 2025-02-22T14:21:01Z
This is the data returned by the nginx_http_requests_total query over the time range from 2025-02-22T14:20:01Z to 2025-02-22T14:21:01Z (1 minute).
If we run increase(nginx_http_requests_total[1m]), we calculate the number of new requests over the last minute, between a=2025-02-22T14:20:01Z and b=2025-02-22T14:21:01Z:
- (value at b) - (value at a) = 1671 - 133 = 1538 new requests.
If we run rate(nginx_http_requests_total[1m]) on that same time range, we calculate the average per-second speed at which requests increase over the last minute (requests / second):
- [(value at b) - (value at a)] / (calculation time range in brackets, in seconds)
- [1671 - 133] / 60 = 1538 / 60 ≈ 25.63 requests / second
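The two calculations above can be reproduced with a short Python sketch using the same data sample. The `increase` and `rate` helpers are simplified stand-ins written for this example: they ignore counter resets and any window-boundary adjustments that a real query engine may apply, so actual results can differ slightly.

```python
# (timestamp in seconds, value) pairs from the nginx_http_requests_total sample.
samples = [
    (1740144001, 133),
    (1740144016, 133),
    (1740144031, 854),
    (1740144046, 854),
    (1740144061, 1671),
]


def increase(samples):
    """Difference between the last and the first value in the window."""
    return samples[-1][1] - samples[0][1]


def rate(samples, window_seconds):
    """Average per-second increase over the window."""
    return increase(samples) / window_seconds


print(increase(samples))            # 1538
print(round(rate(samples, 60), 2))  # 25.63
```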