Why does Grafana take time to read data from InfluxDB

Hey, I have a sysstat bucket with measurements such as pidstat, iostat, mpstat, vmstat, ram-memory, and diskspace. When I run my JAR file, it populates each measurement every N seconds. Locally, I don't have any issues with data loading in Grafana; everything works perfectly and in real time.
However, when I switch to a robust test server that hosts multiple buckets, I have to wait 20 to 30 minutes for the data to load, which defeats the real-time aspect. I've already removed unnecessary tags and fields, but nothing has changed.
I use Java and Kafka for data ingestion.

// for pidstat

            logger.info("pidstat - Polling ...");
     
            Point point = Point.measurement(measurement)
                              .time(System.currentTimeMillis(), WritePrecision.MS);
   
            // Add all the fields from GrafanaLog to InfluxDB
            point.addTag("user", pidstatValue.getUsername());
            point.addTag("host", pidstatValue.getHost());
            point.addField("%usr", pidstatValue.getUsrPrct());
            point.addField("%system", pidstatValue.getSystemPrct());
            point.addField("%guest", pidstatValue.getGuestPrct());
            point.addField("%wait", pidstatValue.getWaitPrct());
            point.addField("%CPU", pidstatValue.getCpuPrct());
            point.addTag("CPU", pidstatValue.getCpu());
            point.addField("minflt/s", pidstatValue.getMinfltS());
            point.addField("majflt/s", pidstatValue.getMajfltS());
            point.addField("VSZ", pidstatValue.getVsz());
            point.addField("RSS", pidstatValue.getRss());
            point.addField("%MEM", pidstatValue.getMemPrct());
            point.addField("threads", pidstatValue.getThreads());
            point.addTag("Command", pidstatValue.getCommand());
            point.addTag("insideCommand", pidstatValue.getInsideCommand());
            writeApi.writePoint(point);
          }
          pidstatConsumer.commitSync();

// for vmstat

            logger.info("vmstat - Polling ...");
            Point point = Point.measurement(measurement)
                .time(System.currentTimeMillis(), WritePrecision.MS);
              
            point.addTag("user", vmstatValue.getUsername());
            point.addTag("host", vmstatValue.getHost());
            point.addTag("r", vmstatValue.getR());
            point.addField("b", vmstatValue.getB());
            point.addField("swpd", vmstatValue.getSwpd());
            point.addField("free", vmstatValue.getFree());
            point.addField("buff", vmstatValue.getBuff());
            point.addField("cache", vmstatValue.getCache());
            point.addField("si", vmstatValue.getSi());
            point.addField("so", vmstatValue.getSo());
            point.addField("bi", vmstatValue.getBi());
            point.addField("bo", vmstatValue.getBo());
            point.addField("in", vmstatValue.getIn());
            point.addField("cs", vmstatValue.getCs());
            point.addField("us", vmstatValue.getUs());
            point.addField("sy", vmstatValue.getSy());
            point.addField("id", vmstatValue.getId());
            point.addField("wa", vmstatValue.getWa());
            point.addField("st", vmstatValue.getSt());
            writeApi.writePoint(point);
          }
          vmstatConsumer.commitSync();

// …
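
For reference, here is a minimal sketch of how such a writer is typically set up with influxdb-client-java (the URL, token, org, and batch settings below are placeholders, not my real configuration; the non-blocking WriteApi buffers points and flushes them in batches, which is worth keeping in mind when reasoning about write delays):

import com.influxdb.client.InfluxDBClient;
import com.influxdb.client.InfluxDBClientFactory;
import com.influxdb.client.WriteApi;
import com.influxdb.client.WriteOptions;

public class InfluxWriterSetup {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        InfluxDBClient client = InfluxDBClientFactory.create(
                "http://localhost:8086", "example-token".toCharArray(), "example-org", "sysstat");

        // Non-blocking write API: points are buffered and flushed either when the
        // batch fills up or when the flush interval elapses, whichever comes first.
        WriteApi writeApi = client.makeWriteApi(WriteOptions.builder()
                .batchSize(1_000)       // points per flush
                .flushInterval(1_000)   // milliseconds between forced flushes
                .build());

        // ... writeApi.writePoint(point) calls from the consumer loops above ...

        writeApi.close();   // flushes any remaining buffered points
        client.close();
    }
}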

@Sarindra_Therese_Ran There’s a lot that could factor into this. My first thought would be network latency, but that wouldn’t (shouldn’t) account for a 20-30 minute delay. Have you tried querying InfluxDB directly to see if the data is actually making it to InfluxDB on time? Try just using the /api/v2/query endpoint or even the built-in Data Explorer that runs as part of your InfluxDB server (I assume you’re using v2 OSS) to see how old the most recent data is. If you use the Data Explorer, switch to the raw data view.

A query similar to this will give you the most recent data:

from(bucket: "example-bucket")
    |> range(start: -30m)
    |> filter(fn: (r) => r._measurement == "example-measurement")
    |> last()

  • If the most recent data is within the last N seconds, then there is some kind of network delay between your Grafana and InfluxDB instances.
  • If the most recent data is 20-30 minutes old, there is something in your write pipeline that isn’t working as it should.
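
If you'd rather check this from code instead of the Data Explorer, here's a rough sketch (placeholders throughout; adjust the bucket, measurement, URL, org, and token) that uses the Java client's QueryApi to run the same kind of query against the test server and print how old the newest point is:

import com.influxdb.client.InfluxDBClient;
import com.influxdb.client.InfluxDBClientFactory;
import com.influxdb.client.QueryApi;
import com.influxdb.query.FluxRecord;
import com.influxdb.query.FluxTable;

import java.time.Duration;
import java.time.Instant;
import java.util.List;

public class FreshnessCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details for the test server.
        try (InfluxDBClient client = InfluxDBClientFactory.create(
                "https://your-test-server.com:8086", "example-token".toCharArray(), "example-org")) {

            String flux = "from(bucket: \"example-bucket\")"
                    + " |> range(start: -30m)"
                    + " |> filter(fn: (r) => r._measurement == \"example-measurement\")"
                    + " |> last()";

            QueryApi queryApi = client.getQueryApi();
            List<FluxTable> tables = queryApi.query(flux);

            // Each table holds the last record of one series; report how old it is.
            for (FluxTable table : tables) {
                for (FluxRecord record : table.getRecords()) {
                    Duration age = Duration.between(record.getTime(), Instant.now());
                    System.out.printf("%s / %s: newest point is %d seconds old%n",
                            record.getMeasurement(), record.getField(), age.getSeconds());
                }
            }
        }
    }
}

If the ages printed here are small but Grafana still shows stale panels, the problem is on the Grafana/query side rather than in the write pipeline.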

@scott
Thank you for your response. As soon as I start the pipeline, the data is already available in InfluxDB. Locally I really don't have any issues; I've even created 30 other buckets and still don't see this 20 to 30-minute latency. I've been told it's due to cardinality (the tags and fields), but I believe I've tried everything to fix it, including changing the data retention.

Is there a way for me to check whether the InfluxDB instance on the test server is saturated, or whether other buckets are causing this latency?

Cardinality will affect performance, but if the data is the same locally as it is on your test server, I don’t see this being the issue. You can check the cardinality of data using the influxdb.cardinality() function:

import "influxdata/influxdb"

influxdb.cardinality(
    bucket: "example-bucket",
    org: "example-org-name",
    host: "https://your-test-server.com:8086",
    token: "token-with-read-access-to-the-bucket",
    start: -1y,
)

If you start the pipeline and data is immediately in your test server, it should be immediately queryable.

@scott
I executed the query against my old bucket and got a cardinality of 554. Now I'm running the pipeline with a new bucket, and the value has climbed from 18 to 100, then from 100 to 200, and is currently at 501 locally. Here are the commands that are executed during the pipeline:

  • pidstat -H -h -u -r -v -t
  • vmstat
  • iostat
  • mpstat
  • free (ram memory)
  • df (disk space)

The test server is often busy, so for each command above, the output differs from what I see locally, which will double or further increase the cardinality compared to mine. I think I need to filter, but since the insertion is automatic, do you know how I can get a list of all the tags and tell my Java code to exclude them? I haven't tried this yet, and if you have any tutorials, links, or suggestions to recommend, I'm all ears.

If you've only just started writing to the bucket, this behavior is somewhat expected. The cardinality will grow until all the different tag permutations have been written into InfluxDB, after which it should level off and stay fairly consistent. A cardinality of around 500 isn't high at all; at that level, there should be no performance degradation.
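
To come back to your question about listing tags: Flux's schema package can return every tag key in a bucket, and you could read that list from Java before deciding what to exclude. A rough sketch (connection details and bucket name are placeholders, and the exclusion step is only illustrative):

import com.influxdb.client.InfluxDBClient;
import com.influxdb.client.InfluxDBClientFactory;
import com.influxdb.query.FluxRecord;
import com.influxdb.query.FluxTable;

import java.util.HashSet;
import java.util.Set;

public class ListTagKeys {
    public static void main(String[] args) throws Exception {
        try (InfluxDBClient client = InfluxDBClientFactory.create(
                "https://your-test-server.com:8086", "example-token".toCharArray(), "example-org")) {

            // schema.tagKeys() returns one row per tag key found in the bucket.
            String flux = "import \"influxdata/influxdb/schema\"\n"
                    + "schema.tagKeys(bucket: \"example-bucket\")";

            Set<String> tagKeys = new HashSet<>();
            for (FluxTable table : client.getQueryApi().query(flux)) {
                for (FluxRecord record : table.getRecords()) {
                    tagKeys.add(String.valueOf(record.getValue()));
                }
            }

            // tagKeys now holds the tag keys in the bucket (plus internal columns
            // such as "_measurement" and "_field", which you can ignore). The Java
            // producer could check a hard-coded exclusion set before calling
            // point.addTag(...) for keys you don't want indexed.
            System.out.println("Tag keys in the bucket: " + tagKeys);
        }
    }
}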

@scott Thank you for your involvement :smile: . After I presented everything you taught me, we examined the configuration of each Grafana and InfluxDB server, and it turned out to be a UTC time-related issue.
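
For anyone who hits the same thing: if Grafana and InfluxDB disagree on the current time, a dashboard range like "Last 5 minutes" can simply miss the newest points, which looks exactly like a long ingest delay. A quick sanity check I could have run from Java is to compare the local clock against the Date header returned by the InfluxDB server (the URL below is a placeholder, and this assumes the server sets a standard Date header, which most HTTP servers do):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class ClockSkewCheck {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("https://your-test-server.com:8086/health")).GET().build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());

        // The Date header is RFC 1123 formatted and always expressed in GMT.
        response.headers().firstValue("date").ifPresent(date -> {
            Instant serverTime = ZonedDateTime
                    .parse(date, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
            Duration skew = Duration.between(serverTime, Instant.now());
            System.out.println("Approximate clock skew vs. server: " + skew.getSeconds() + " s");
        });
    }
}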