Flux behaves differently on same data in different buckets

Hello,
I am scratching my head for second day over this problem.

Using InfluxDB 1.8.5

I have data stored in measurement “modbus” in telegraf/autogen. I copied it into modbusdb/autogen via CLI:

use telegraf
SELECT * INTO modbusdb."autogen".modbus FROM modbus

Copying of the data seems ok, InfluxQL queries gives same results but Flux queries returns no data after changing the bucket.

Original(working) flux query:

actual = from(bucket: "telegraf/autogen")
  |> range(start: -6mo)
  |> filter(fn: (r) => r._measurement == "modbus" and (r._field == "energy_delivered"))
  |> filter(fn: (r) => r.name == "something.example.com")
  |> window(every: 1mo, period: 1mo)
  |> last()
  |> group(columns: ["_time", "_start", "_stop", "_value"], mode: "except")
  
diff = from(bucket: "telegraf/autogen")
  |> range(start: -6mo)
  |> filter(fn: (r) => r._measurement == "modbus" and (r._field == "energy_delivered"))
  |> filter(fn: (r) => r.name == "something.example.com")
  |> window(every: 1mo, period: 1mo)
  |> last()
  |> group(columns: ["_time", "_start", "_stop", "_value"], mode: "except")
  |> difference(
    nonNegative: false,
  	columns: ["_value"],
  	keepFirst: true
  )
  
join(
  tables: {actual:actual, diff:diff},
  on: ["_time"]
)

|> map(fn: (r) => ({
	_time:   r._time,
	_value1: r._value_actual,
	_value2: r._value_diff
}))

Altered flux query:

actual = from(bucket: "modbusdb/autogen")
  |> range(start: -6mo)
  |> filter(fn: (r) => r._measurement == "modbus" and (r._field == "energy_delivered"))
  |> filter(fn: (r) => r.name == "something.example.com")
  |> window(every: 1mo, period: 1mo)
  |> last()
  |> group(columns: ["_time", "_start", "_stop", "_value"], mode: "except")
  
diff = from(bucket: "modbusdb/autogen")
  |> range(start: -6mo)
  |> filter(fn: (r) => r._measurement == "modbus" and (r._field == "energy_delivered"))
  |> filter(fn: (r) => r.name == "something.example.com")
  |> window(every: 1mo, period: 1mo)
  |> last()
  |> group(columns: ["_time", "_start", "_stop", "_value"], mode: "except")
  |> difference(
    nonNegative: false,
  	columns: ["_value"],
  	keepFirst: true
  )
  
join(
  tables: {actual:actual, diff:diff},
  on: ["_time"]
)

|> map(fn: (r) => ({
	_time:   r._time,
	_value1: r._value_actual,
	_value2: r._value_diff
}))

After trying different things I figured out that maybe the filter r.name == “something.example.com” is causing trouble because even the most simple flux query doesnt work with it. When I remove it than the query starts returning some data. I thought that maybe the “name” keyword is reserved so I copied the data again with:

SELECT *, "name" AS hostname INTO modbusdb."autogen".modbus FROM modbus

but filtering on hostname field behaves the same.
Filtering on name with InfluxQL works without problem.

Has somebody idea what is going on?

@jkalnik This is curious behavior. There shouldn’t be anything wrong about filtering by r.name. It’s not a reserved keyword. I’d have to see actual data to get a better understand of what’s going on.

Also, I think there’s a better way to get the results you’re after. This query will give you the same results without a join:

from(bucket: "telegraf/autogen")
  |> range(start: -6mo)
  |> filter(fn: (r) => r._measurement == "modbus" and r._field == "energy_delivered")
  |> filter(fn: (r) => r.name == "something.example.com")
  |> aggregateWindow(every: 1mo, fn: last)
  |> duplicate(column: "_value", as: "diff")
  |> difference(columns: ["diff"], keepFirst: true)
  |> map(fn: (r) => ({
      _time: r._time,
      _value1: r._value,
      _value2: r.diff
  }))

Hello, thank you for your reply.
Yes, your query seems a lot simpler/better. I will use it.

I copied the influx databases to testing server, anonymized data and stripped the records (production DB has gigabytes of data) so I can send it to you. I get the same behavior on the testing server with this sample data so you should be able to replicate it.

I made a backup for you with: influxd backup -portable ./backup_sample
You can access it here (can’t upload to this forum directly because of new member policy).
backup

You can test the behavior with:

from(bucket: "telegraf/autogen")
  |> range(start: -6mo)
  |> filter(fn: (r) => r._measurement == "modbus" and r._field == "energy_delivered")
  |> filter(fn: (r) => r.name == "otsaleelmera.xyz.com")
  |> aggregateWindow(every: 1mo, fn: last)
  |> duplicate(column: "_value", as: "diff")
  |> difference(columns: ["diff"], keepFirst: true)
  |> map(fn: (r) => ({
      _time: r._time,
      _value1: r._value,
      _value2: r.diff
  }))

When you change the bucket to modbusdb/autogen the query stops working altough there should be identical data.

@jkalnik After restoring the sample backup you provided, I found that the modbusdb database doesn’t contain any data. The data base and autogen retention policy exist, but they’re empty.
I used the following InfluxQL query to copy the data over:

SELECT * INTO modbusdb.autogen.modbus FROM telegraf.autogen.modbus

The problem with this is, because there is no GROUP BY clause, tags are copied over as fields. That’s why you don’t get any results when you query the copied data. The name column doesn’t exist. You have a name field.

To correct this, you need to delete and recopy all of the data, but include a GROUP BY clause to maintain tags and series groupings (see Common issues with the INTO clause).

SELECT * INTO modbusdb.autogen.modbus FROM telegraf.autogen.modbus GROUP BY *

Once you copy the data that way, the Flux queries will behave the same.

@scott
well thats weird with that missing data. Nevermind.

I can confirm the query now works fine. Now I see I should read the documentation more carefully next time.

Thank you very much for your time and advices.

Happy to help. Cheers!