Cloud TSM Query vs Cloud Serverless Query Flux Group Function

I am trying to see if there is a documented change of behavior between how the group function works in Cloud TSM and how it gets translated for Cloud Serverless (Running IOX).

I currently have a flux query like the following:

from(bucket: "qa-config")
  |> range(start: 1970-01-01T00:00:00.000Z, stop:2023-08-21T19:49:02.143Z)
          |> filter(fn: (r) => r["_measurement"] == "CustomerConfig")
          |> filter(fn: (r) => exists r.customerId)
          |> filter(fn: (r) => r["businessID"] == "fakeBusiness")
          |> filter(fn: (r) => r["customerId"] == "1234")
          |> group(columns: ["_measurement"], mode:"by")
          |> sort(columns: ["_time"], desc: true)
          |> unique(column: "customerId")
          |> filter(fn: (r) => not exists r.softDelete or r.softDelete != "deleted")

Which in cloud TSM when operating on a dataset returns the latest row from the CustomerConfig measurement where the customerId === 1234, importantly the tagset on the latest row is returned “as is” without any changes. When I run the same query against a Cloud Serverless Organization I get the correct response, however the tagset only includes tags which have keys in across all measurements. That is to say if the latest measurement for my CustomerConfig where the customerId === 1234 contains a tag like foo:"bar" and its not present on prior measurements of customerId it is not present in the response.

I know that Flux for cloud serverless is changed into SQL before querying, and I am wondering if there is a way I can see the transformed query prior to querying. Additionally if there is any documentation on this change for group function.

Finally I am unsure if this place is the best place to post this question. If there is a better place please let me know.

Example with Data
So I realize that I haven’t provided any specific data sets for the query to operate on.

Here are two JSON objects which serve as rows inside of the CustomerConfig measurement.

{
  result: '_result',
  table: 0,
  _start: '1970-01-01T00:00:00Z',
  _stop: '2023-08-21T19:49:02.143Z',
  _time: '2023-08-21T18:57:21.955569681Z',
  _value: 'test',
  _field: 'customerName',
  _measurement: 'CustomerConfig',
  businessID: 'fakeBusiness',
  currency: 'USD',
  customerId: '1234',
  email: 'foobar@foo.com',
  paymentChannel: 'manual',
  taxExempt: 'none'
}
{
  result: '_result',
  table: 1,
  _start: '1970-01-01T00:00:00Z',
  _stop: '2023-08-21T19:49:02.143Z',
  _time: '2023-08-21T19:46:04.845713362Z',
  _value: 'test',
  _field: 'customerName',
  _measurement: 'CustomerConfig',
  address: '{}',
  businessID: 'fakeBusiness',
  creditBalance: '0',
  currency: 'USD',
  customerId: '1234',
  email: 'foobar@foo.com',
  offeringEnrollmentDate: '2023-08-21T19:46:04.818Z',
  offeringId: 'abc123',
  paymentChannel: 'manual',
  taxExempt: 'none'
}

If I run the query mentioned above I get the following as a response

{
  result: '_result',
  table: 0,
  _start: '1970-01-01T00:00:00Z',
  _stop: '2023-08-21T19:49:02.143Z',
  _time: '2023-08-21T19:46:04.845713362Z',
  _value: 'test',
  _field: 'customerName',
  _measurement: 'CustomerConfig',
  businessID: 'fakeBusiness',
  currency: 'USD',
  customerId: '1234',
  email: 'foobar@foo.com',
  paymentChannel: 'manual',
  taxExempt: 'none'
}

Notice that the _time value is correct, however my latest response is now missing offeringId , offeringEnrollmentDate, creditBalance and address.
Let me know if you’re not able to reproduce this issue or have other problems

Hello @Daniel_Wasserlauf,
Welcome!
Hmm I’m not sure that there is a way to see the transformed SQL query. I’ll ask around.
You could try querying with SQL to compare the differences.

@Daniel_Wasserlauf InfluxDB Cloud Serverless doesn’t convert Flux to SQL before querying the 3.0 storage engine. Flux connects to IOx through a gRPC layer it has implemented that is supposed to emulate what the TSM engine provides. I’m guessing this behavior may be stemming from somewhere in that implmentation.