InfluxQL function to use GROUP BY and SELECT * together

Hey there,

we are using InfluxDB v1.8 for our time-series measurements that we capture with ROS. We mainly rely on the feature of GROUP BY time(timeinterval) to synchronize our measurement data.

Previously, we had a configuration .json file which we used for a Python script, that would then generate the SQL query in the form of

SELECT $$SIGNALS$$ FROM "$$MEASUREMENT$$" WHERE time >= $$STARTTIME$$ms and time <= $$ENDTIME$$ms GROUP BY time($$SAMPLETIME$$) fill(previous);

where for every Field ($$SIGNALS$$ in the above template) we are interested in, we would append
mean("$$SIGNAL$$") AS "$$SIGNAL$$" seperated by comman and space, of course.

Now to avoid having to type out every interesting Field in the measurement, because we capture a lot of signals and have changing sensor setups, I would like to simply query all Fields in the measurement, so
SELECT *.

But, as I mentioned, the GROUP BY time(timeinterval) function is important to us. Since this function requires a InfluxQL function in the SELECT statement, as pointed out here in the docs, this won’t work right away.

In a previous issue on this forum here, the solution was to use SELECT FIRST(*). Now, some ROS messages can be dynamic, so that means the number of objects, e.g. an image detection listing all objects in the picture, can change over time. I suspect that this method thus won’t work for me, since this cuts down the measurement to the first point in time where all Fields have had data for the first time. If the maximum number of objects is reached at any other point in time than at measurement start, the measurement will get shortened by this query.

The other proposed method in this thread was to use GROUP BY *,time(__), but I also haven’t managed to get that working.

Could you please help me formulate the InfluxQL query in such a way that SELECT * and GROUP BY time(timeinterval) work at the same time when not all Fields provide a value at measurement start and the fill(previous) method is not changed?

Regards
Ludwig

As a general definition, when using GROUP BY any “column” that you want returned must be in the grouping (the aggr key) or use an aggregation function (what you want to measure).
In InfluxDB v1.x you can group only tags, and aggregate only fields (this is not 100% true, but I’d avoid anything against it)

where for every Field ($$SIGNALS$$ in the above template) we are interested in, we would append
mean(“$$SIGNAL$$”) AS “$$SIGNAL$$” seperated by comman and space, of course.

Since you build that query dynamically and calculate the mean of several defined fields, you can just write something like

SELECT mean(*) FROM __ WHERE __ GROUP BY __

this will return the mean of all the fields, but there is a catch… you can’t control aliases.
by manually typing mean(abc) AS abc you can control each alias (keeping the original name for example), when using mean(*) you can’t, the result will be {function}_{field} → mean_abc and you can do nothing about it…

Thank you so much for the help. Initially i thought this would solve the issue, too.

But to me, it appears that the fact that not all fields start at the same time and not all fields are available near the start of the measurement, the query does not deliver the expected results. Without fill(previous); the result is empty, and with fill the results are filled with NaN up until when all fields have yielded at least one data point.

It’s strange that such a query for such a use case does not work. But I don’t see how the query should be formulated differently.

FILL(previous) respects time range filters, meaning it can’t fill anything unless it has a previous value inside the whole filtered range (as you noticed).
if viable you can increase your time range, but it won’t guarantee that you will find data to fill gaps… this kind of situation is highly dependent on the data at hand