Basic example of Prometheus Query Language¶
PromQL (Prometheus Query Language) is a powerful and flexible query language used to select and aggregate time series data in Prometheus. Here are the main concepts with an example to illustrate how PromQL works.
Basic Concepts¶
- Metrics: These are the time series data points collected by Prometheus.
- Labels: These are key-value pairs that identify dimensions of a metric.
- Functions: These are built-in operations that can be applied to metrics.
- Operators: These are used to combine and manipulate time series.
Example Scenario¶
Assume we have a metric http_requests_total that tracks the total number of HTTP requests received by our server. This metric has labels such as method (e.g., GET, POST), status_code (e.g., 200, 404), and instance (the server instance).
Example Queries¶
1. Basic Metric Query¶
To get the current value of the http_requests_total metric:
http_requests_total
2. Filtered Query¶
To get the total number of HTTP requests for the GET method:
http_requests_total{method="GET"}
3. Aggregation¶
To get the total number of HTTP requests across all methods and instances:
sum(http_requests_total)
4. Rate Calculation¶
To get the per-second rate of HTTP requests over the last 5 minutes:
rate(http_requests_total[5m])
5. Aggregation with Rate¶
To get the per-second rate of HTTP requests for each method over the last 5 minutes:
sum by (method) (rate(http_requests_total[5m]))
Detailed Example¶
Let's say you want to monitor the error rate (status codes 4xx and 5xx) of HTTP requests over the last 5 minutes for each server instance. Here's how you can do it:
- Filter for Error Status Codes:
http_requests_total{status_code=~"4..|5.."}
The =~ operator is used for regular expression matching.
- Calculate the Rate:
rate(http_requests_total{status_code=~"4..|5.."}[5m])
This calculates the per-second rate of error requests over the last 5 minutes.
- Aggregate by Instance:
sum by (instance) (rate(http_requests_total{status_code=~"4..|5.."}[5m]))
This sums the error rates for each instance.
Putting It All Together¶
Here’s a complete PromQL query that calculates the per-second error rate of HTTP requests over the last 5 minutes, grouped by server instance:
sum by (instance) (rate(http_requests_total{status_code=~"4..|5.."}[5m]))
Explanation¶
http_requests_total{status_code=~"4..|5.."}: Selects the time series where the status code matches the pattern for 4xx and 5xx errors.rate(http_requests_total{status_code=~"4..|5.."}[5m]): Calculates the per-second rate over the last 5 minutes.sum by (instance): Sums the rates, grouped by theinstancelabel.
By mastering these concepts and examples, you'll be able to construct PromQL queries to monitor and analyze your Prometheus metrics effectively.