Query large set of logs

Hey,
we are using the InfluxDB Helm chart (InfluxDB 2.7.4, chart 2.1.2) to deploy InfluxDB on an on-premise single-node k3s instance.
We collect logs from all of our pods with Fluent Bit and store them in InfluxDB.

Sometimes we want to collect all stored logs from the last 7 days for diagnostic purposes via a small Go service, write them to a zip file, and download it.
For most components this works fine, but one component logs a lot more than the others.
When I query the data for that component, not all rows that should be included are returned. For example, entries for entire days are missing, even if I limit the number of results.
The rows are also missing if I run the same query for that range in the InfluxDB web UI.
If I make the range smaller (e.g. 2h), it works.

I tried to make more use of the pushdown functions as described in start-queries-with-pushdowns.

I also saw that InfluxDB 1.x has a “max-row-limit” setting, which does not seem to exist in version 2.x.

So here are my queries:

To avoid one huge response, I first get the list of apps with the following query:

from(bucket: "the_logs_bucket")
	|> range(start: -7d)
	|> limit(n: 1)
	|> keep(columns: ["kubernetes_labels_app"])
	|> distinct(column: "kubernetes_labels_app")
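
For the app list itself, I assume the schema package would be a more direct way to list the tag values (just a sketch, not what I currently run):

import "influxdata/influxdb/schema"

// List all values of the kubernetes_labels_app tag seen in the last 7 days.
schema.tagValues(
    bucket: "the_logs_bucket",
    tag: "kubernetes_labels_app",
    start: -7d,
)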

Then I collect the logs for each of these apps with a separate query:

from(bucket: "the_logs_bucket")
  |> range(start: -7d)
  |> filter(fn: (r) => r.kubernetes_labels_app == "the_app_name")
  |> sort(columns: ["_time"], desc: true)
  |> limit(n: 1000)
  |> keep(columns: ["table", "_field", "_value", "_time", "kubernetes_labels_app"])

I use the “filter > sort > limit” order because it is mentioned as a pushdown combination. Before that I tried:

from(bucket: "the_logs_bucket")
    |> range(start: -7d)
    |> filter(fn: (r) => r.kubernetes_labels_app == "the_app_name")
    |> keep(columns: ["table", "_field", "_value", "_time", "kubernetes_labels_app"])
    |> tail(n: 1000)

But all in all it seems to make no difference. The very latest logs are always missing. Even this query does not return the last entry:

from(bucket: "the_logs_bucket")
    |> range(start: -7d)
    |> filter(fn: (r) => r.kubernetes_labels_app == "the_app_name")
    |> last()
    |> keep(columns: ["table", "_field", "_value", "_time", "kubernetes_labels_app"])
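
One thing I am unsure about: as far as I understand, sort(), limit(), tail(), and last() all operate on each table (i.e. each series) separately, so the n: 1000 above is per series rather than overall. Here is a variant that merges all series before limiting (just a sketch; I have not checked whether it still pushes down):

from(bucket: "the_logs_bucket")
  |> range(start: -7d)
  |> filter(fn: (r) => r.kubernetes_labels_app == "the_app_name")
  |> group()                                // merge all series into one table
  |> sort(columns: ["_time"], desc: true)   // newest rows first across all series
  |> limit(n: 1000)                         // 1000 rows overall instead of per series
  |> keep(columns: ["table", "_field", "_value", "_time", "kubernetes_labels_app"])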

Is there any way to solve this without reducing the amount of logs for that component?
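
As a fallback I could probably have the Go service split the 7-day range into smaller windows (since 2h ranges do work) and concatenate the results. A sketch of a single window, with placeholder bounds that the service would fill in per iteration:

from(bucket: "the_logs_bucket")
    |> range(start: -4h, stop: -2h)  // placeholder 2h window; the service would step through the 7 days
    |> filter(fn: (r) => r.kubernetes_labels_app == "the_app_name")
    |> keep(columns: ["table", "_field", "_value", "_time", "kubernetes_labels_app"])

But I would prefer a way to query the whole range directly.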