Complex queries

latest_anomaly = from(bucket: "bucket_anomaly")
  |> range(start: 0)  // Adjust the time range as necessary
  |> filter(fn: (r) => r["_field"] == "anomaly" and r["_value"] == 1)
  |> sort(columns: ["_time"], desc: true)
  |> limit(n: 1)
  |> findRecord(fn: (key) => true, idx: 0)


from(bucket: "bucket_logs")
  |> range(start: 0,stop: latest_anomaly._time)  // Adjust the time range as necessary
  |> filter(fn: (r) => r["_field"] == "prediction")
  |> filter(fn: (r) => r["_value"] == 1)
  |> sort(columns: ["_time"], desc: true)
  |>limit(n:5)
  |> keep(columns: ["_time"])

i have two buckets bucket_anomaly and bucket_logs. The above code returns the latest anomaly and the last 5 anomalous log timestamps that might be causing the issue. The bucket_logs has another field called log-content. Is there any way we can return that along with the timestamps

@Karthik_Pai Some suggestions and notes:

  • There’s a better way to get the latest anomaly that doesn’t require resorting your data and should be more performant. Instead of sort() |> limit(), just use the last() function:

    latest_anomaly = from(bucket: "bucket_anomaly")
      |> range(start: 0)  // Adjust the time range as necessary
      |> filter(fn: (r) => r["_field"] == "anomaly" and r["_value"] == 1)
      |> last()
      |> findRecord(fn: (key) => true, idx: 0)
    
  • When querying your bucket_logs bucket, in range(), you set the stop parameter to the _time of the last reported anomaly. It’s important to note that the stop time is exclusive, meaning it only keeps data with time stamps less than the specified time, not less than or equal to (noted in the docs). So you’re actually filtering out data with the timestamp that you want.

    I’d suggest using date.add() to add a nanosecond to the stop time:

    import "date"
    
    from(bucket: "bucket_logs")
      |> range(start: 0, stop: date.add(d: 1ns, to: latest_anomaly._time))
      // ...
    
  • Why not just query the log-content field instead of the prediction field?

  • When querying the logs, instead of using sort() |> limit(), just use tail().

  • If you want to include the log content in the results, add _value to keep()

With these suggestions, the query would look something like this:

import "date"

latest_anomaly = from(bucket: "bucket_anomaly")
    |> range(start: 0)
    |> filter(fn: (r) => r["_field"] == "anomaly" and r["_value"] == 1)
    |> last()
    |> findRecord(fn: (key) => true, idx: 0)

from(bucket: "bucket_logs")
    |> range(start: 0,stop: date.add(d: 1ns, to: latest_anomaly._time))
    |> filter(fn: (r) => r["_field"] == "log-content")
    |> tail(n: 5)
    |> keep(columns: ["_time", "_value"])