I am trying to see if there is a documented change of behavior between how the group
function works in Cloud TSM and how it gets translated for Cloud Serverless (Running IOX).
I currently have a flux query like the following:
from(bucket: "qa-config")
|> range(start: 1970-01-01T00:00:00.000Z, stop:2023-08-21T19:49:02.143Z)
|> filter(fn: (r) => r["_measurement"] == "CustomerConfig")
|> filter(fn: (r) => exists r.customerId)
|> filter(fn: (r) => r["businessID"] == "fakeBusiness")
|> filter(fn: (r) => r["customerId"] == "1234")
|> group(columns: ["_measurement"], mode:"by")
|> sort(columns: ["_time"], desc: true)
|> unique(column: "customerId")
|> filter(fn: (r) => not exists r.softDelete or r.softDelete != "deleted")
Which in cloud TSM when operating on a dataset returns the latest row from the CustomerConfig
measurement where the customerId === 1234
, importantly the tagset on the latest row is returned “as is” without any changes. When I run the same query against a Cloud Serverless Organization I get the correct response, however the tagset only includes tags which have keys in across all measurements. That is to say if the latest measurement for my CustomerConfig
where the customerId === 1234
contains a tag like foo:"bar"
and its not present on prior measurements of customerId
it is not present in the response.
I know that Flux for cloud serverless is changed into SQL before querying, and I am wondering if there is a way I can see the transformed query prior to querying. Additionally if there is any documentation on this change for group
function.
Finally I am unsure if this place is the best place to post this question. If there is a better place please let me know.
Example with Data
So I realize that I haven’t provided any specific data sets for the query to operate on.
Here are two JSON objects which serve as rows inside of the CustomerConfig
measurement.
{
result: '_result',
table: 0,
_start: '1970-01-01T00:00:00Z',
_stop: '2023-08-21T19:49:02.143Z',
_time: '2023-08-21T18:57:21.955569681Z',
_value: 'test',
_field: 'customerName',
_measurement: 'CustomerConfig',
businessID: 'fakeBusiness',
currency: 'USD',
customerId: '1234',
email: 'foobar@foo.com',
paymentChannel: 'manual',
taxExempt: 'none'
}
{
result: '_result',
table: 1,
_start: '1970-01-01T00:00:00Z',
_stop: '2023-08-21T19:49:02.143Z',
_time: '2023-08-21T19:46:04.845713362Z',
_value: 'test',
_field: 'customerName',
_measurement: 'CustomerConfig',
address: '{}',
businessID: 'fakeBusiness',
creditBalance: '0',
currency: 'USD',
customerId: '1234',
email: 'foobar@foo.com',
offeringEnrollmentDate: '2023-08-21T19:46:04.818Z',
offeringId: 'abc123',
paymentChannel: 'manual',
taxExempt: 'none'
}
If I run the query mentioned above I get the following as a response
{
result: '_result',
table: 0,
_start: '1970-01-01T00:00:00Z',
_stop: '2023-08-21T19:49:02.143Z',
_time: '2023-08-21T19:46:04.845713362Z',
_value: 'test',
_field: 'customerName',
_measurement: 'CustomerConfig',
businessID: 'fakeBusiness',
currency: 'USD',
customerId: '1234',
email: 'foobar@foo.com',
paymentChannel: 'manual',
taxExempt: 'none'
}
Notice that the _time
value is correct, however my latest response is now missing offeringId
, offeringEnrollmentDate
, creditBalance
and address
.
Let me know if you’re not able to reproduce this issue or have other problems