How to efficiently get the timestamp of the first and last records in a table?

asthomas · February 17, 2022, 9:48pm

I have this simple Flux code:

indices = from(bucket: "Desktop")
    |> range(start: 0)
    |> filter(fn: (r) => r._measurement == "Simulated" and r._field == "index" and r._value < 1000)
    |> keep(columns: ["fullname", "_time", "_value"])
    |> group()

time1 = indices
    |> min(column: "_time")

rstart = (time1 |> findColumn(fn: (key) => true, column: "_time"))[0]

indices produces a table of 1000 rows, and takes about 3 seconds on a data set of 40 million values.

time1 produces a table with 1 row and takes negligible time.

rstart produces a timestamp, and takes about 3 seconds to compute, even though it is operating on a table of only one row.

I can replace findColumn with findRecord, like this:

rstart = (time1 |> findRecord(fn: (key) => true, idx: 0))._time

This also takes about 3 seconds to run.

Ultimately I want to get the timestamps of the first and last records in the indices table. Is there a faster way to reference those, like a straight array reference, something like this?

rstart = indices[0]._time
rstop = indices[length(indices) - 1]._time

scott · February 18, 2022, 9:11pm

@asthomas The following method takes advantage of some pushdown optimizations to ensure the query is as performant as it can be. Give it a try and see how it works:

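// A block with `return` is only valid inside a function body, so the query
// is wrapped in a function (getTimes is just an illustrative name)
getTimes = () => {
    _data = from(bucket: "Desktop")
        |> range(start: 0)
        |> filter(fn: (r) => r._measurement == "Simulated" and r._field == "index")
        |> limit(n: 1000)

    // union() does not guarantee row order, so sort the two rows by time
    return union(tables: [_data |> first(), _data |> last()])
        |> sort(columns: ["_time"])
        |> findColumn(fn: (key) => true, column: "_time")
}

// Returns an array with the first time as the first element and the
// last time as the second element
indices = getTimes()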

// Use an array reference to reference the timestamps
rstart = indices[0]
rstop = indices[1]
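
As a rough sketch of how these two values might then be used, assuming the goal is to re-query the same bucket between the two timestamps (this follow-up query is only an illustration and is not part of the original question):

```flux
// rstart and rstop are the time values extracted above.
// range() excludes rows whose _time equals the stop value, so the last
// record itself is not returned by this query.
from(bucket: "Desktop")
    |> range(start: rstart, stop: rstop)
    |> filter(fn: (r) => r._measurement == "Simulated")
```

If the last record needs to be included, the stop value would have to be pushed slightly past rstop.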

asthomas · February 18, 2022, 11:56pm

Thank you @scott. That saved one of the three expensive calls. As far as I can tell, the original filter for the indices is scanning the entire database. In a database of 40 million records, the filter on “index” takes about 3 seconds. Oddly, the findColumn call also takes about 3 seconds.

Still, your suggestion has eliminated one apparent scan of the database (I had 2 findColumn calls), so it reduces the total run time for this query by around 25-30%.

Now, how can I eliminate the time spent in findColumn?
