Queries sometimes return empty results

I have been running InfluxDB for a long time now, feeding it the same kind of simple data from Node-RED the whole time: each measurement has exactly one value column that’s always a number (always a float or always an int, depending on the measurement) and three tags that I currently don’t actually care about :wink:

From the beginning of querying those measurements with Grafana, I always had it happen that a query sporadically returned empty results, leaving the graph blank. Just refreshing a second later typically worked; sometimes a different graph issuing another query at that point then returned empty instead, but mostly all was fine.

Now I have a simple query running from Node-RED that started showing the same behaviour, just sometimes returning empty, so I constructed a loop that simply reissues the query with a 3 s delay until I get a valid response.
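The loop is essentially the following (sketched here in Python rather than the actual Node-RED flow; run_query is a placeholder for the real query call):

```python
import time

def query_with_retry(run_query, delay_s=3, max_attempts=None):
    """Reissue run_query() until it returns a non-empty result.

    run_query is a stand-in for the actual InfluxDB query call;
    delay_s mirrors the 3 s pause between attempts. If max_attempts
    is set, give up after that many tries and return whatever came back.
    """
    attempt = 0
    while True:
        attempt += 1
        result = run_query()
        if result:  # non-empty response: done
            return result
        if max_attempts is not None and attempt >= max_attempts:
            return result  # give up, return the (empty) result
        time.sleep(delay_s)
```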

Suddenly even that started to fail though, meaning I would not get a proper response for hours. Today I tried the same queries on the DB host through Influx’s command-line tool and I can still see the same issue: sometimes the query just returns empty; if I press up and enter to reissue the command less than a second later, it can start to work. Or the other way round: it works at first, then fails not a second later.

Unfortunately I don’t get any errors while doing so, nor does the InfluxDB journal show anything that stands out to me. Any ideas what’s going wrong, or pointers on what to check or try out?

Example of my attempts:

All of those queries happened within at most 10 s, so the data was always within the time range of the filter.
Text version of the queries used:

Full:
SELECT mean("value"),max("value") FROM ( SELECT mean("value") AS value FROM "5/0/56" WHERE time > (now()-2m) and time < now() GROUP BY time(10s) fill(previous) ) WHERE time > (now()-2m) and time < now()

Inner only (for checking for actual data existence):
SELECT mean("value") AS value FROM "5/0/56" WHERE time > (now()-2m) and time < now() GROUP BY time(10s) fill(previous)
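For clarity, what the inner query should be computing, and what the outer SELECT mean("value"), max("value") then does with it, is roughly the following (a Python sketch of GROUP BY time(10s) with fill(previous), not InfluxDB’s actual implementation; timestamps are plain seconds for illustration):

```python
from statistics import mean

def downsample(points, start, end, window=10):
    """Bucket (timestamp, value) points into fixed windows and average each.

    Empty windows carry the previous window's value forward, mimicking
    GROUP BY time(10s) fill(previous). Windows before the first data
    point stay None, as fill(previous) has nothing to carry forward yet.
    """
    buckets = []
    prev = None
    t = start
    while t < end:
        vals = [v for (ts, v) in points if t <= ts < t + window]
        prev = mean(vals) if vals else prev
        buckets.append((t, prev))
        t += window
    return buckets

def outer_mean_max(buckets):
    """The outer SELECT mean("value"), max("value") over the filled buckets."""
    vals = [v for (_, v) in buckets if v is not None]
    return (mean(vals), max(vals)) if vals else (None, None)
```

So as long as there is at least one data point in the inner range, neither stage should ever be empty, which is why the sporadic empty responses are so puzzling.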

Cheers,
Chris

Hello @Alloc,
What version are you running? I haven’t heard of this before. I wonder if you could back up the data and reinstall? How long have you been running this instance?

How much data are you querying? High query load or system resource constraints can sometimes lead to intermittent query failures. Have you checked CPU and disk I/O during these failures?

What do your shard size and retention period look like? I’m wondering if you’re sometimes querying across hot and cold shards, causing this issue? Even though that shouldn’t happen, maybe it’s contributing.

Hi, thanks for looking into this, highly appreciate that!

I will try to answer to the best of my knowledge :slight_smile:

What version are you running?
1.8.10

I wonder if you could backup the data and reinstall?

Surely I could, but I wonder if a reinstall could even help? Wouldn’t it rather be data-related (meaning I’d have to lose the data), as the software package would be the same?

How long have you been running this instance?

About 1.5 years, roughly since October 2022.

How much data are you querying?

Queries are only issued while e.g. Grafana is open. But this also happens when just running that single query from the CLI without anything else issuing queries at the time, other than the quite frequent inserts of new data into different measurements.

High query load or system resource constraints can sometimes lead to intermittent query failures. Have you checked CPU and disk I/O during these failures?

Hardly any I/O, actually only InfluxDB writing to the database. Roughly up to 100 KiB/s peak, but not constant. It’s on an SSD, so that amount should be no issue?

What does your shard size look like and retention period?

I suppose this is the info about the shards? (From influx_inspect report:)

Summary:
  Files: 96
  Time Range: 2022-10-31T16:44:57.518785339Z - 2024-05-28T18:04:40Z
  Duration: 13801h19m42.481214661s 

Statistics
  Series:
     - node (est): 4420 (44%)
     - _internal (est): 5510 (55%)
  Total (est): 9956
Completed in 800.807201ms

Total size of /var/lib/influxdb/data is 373 MiB.
Retention period is the default autogen, i.e. unlimited retention.

I’m wondering if sometimes youre querying across hot and cold shards causing this issue?

I honestly don’t know what hot and cold shards are in InfluxDB specifically, but assuming the latest shard is always hot, that should not be the case: the specific query I’m looking at these days always queries just the past few minutes. I started with 2 hours for the inner query, but am currently testing with 2 minutes. The time of day when running the query does not matter either; it really switches between failing and succeeding (in both directions) within seconds.

Is there any way to have influxdb or the CLI output details on why a query does not return anything?

Hello,
Can you please explain more?

Sorry, not sure what you mean? Any specific part that’s unclear?

Anyone any further ideas? Or any data I could provide to help pinpoint the issue?

I have exactly the same problem with a query in C#. I write several records to the bucket every 2 minutes. Every now and then it happens that the following query does not return any data, although there is data in the bucket. A short time later, the query returns the data correctly.

string query = $@"from(bucket: ""my_bucket"")
    |> range(start: -15m)
    |> filter(fn: (r) =>
        r[""_measurement""] == ""measure1""
        and r[""location""] == ""loc1""
        and r[""_field""] == ""field1"")
    |> last()";
var tables = await _connection.Client
.GetQueryApi()
.QueryAsync(query, _connection.Org);
var lastRecord = tables.SelectMany(table => table.Records)
.LastOrDefault();

Is a lock set when the data is written that prevents reading at that moment?