Aggregating results by a field in chronograf

dani09 · July 18, 2024, 2:56pm

I’m a beginner in influx and have been trying to run a query with no success.

I have a measurement called “hotel_search”. It contains a “has_availability” tag and the following fields: hotel_id, checkin and checkout. I’m currently trying to find the TOP unavailable hotels. The desired outcome is:

hotel_id,unavailable_results (amount of failed searches)
1, 100
2, 90
3, 80

Initially I’ve tried to group my measurements by the hotel_id, but it fails as it is a field.

Is there another way I can achieve this?

scott · July 18, 2024, 3:01pm

@dani09 Are you using Flux or InfluxQL in Chronograf?

dani09 · July 18, 2024, 3:14pm

@scott I’m using InfluxQL.

scott · July 18, 2024, 3:28pm

Ok, you’re going to need to use a subquery here. The inner query should calculate the number of vacancies per hotel. The outer query should return the top N results of the inner query.

How do you calculate the number of vacancies per hotel? Is the has_availability tag scoped to each hotel or to each room?

dani09 · July 18, 2024, 4:27pm

@scott It’s scoped to each hotel, and it’s actually a boolean field. It indicates if the hotel has any available rooms. Unfortunately I do not know the amount of rooms per hotel.

scott · July 18, 2024, 4:59pm

Ok, I may have misunderstood the original question. What identifies a hotel as a “TOP unavailable hotel”? How should the hotels be sorted? In your example desired results above, what does unavailable_results represent and how is it calculated?

dani09 · July 18, 2024, 5:42pm

@scott the unavailable_results represent the amount of points in which the has_availability tag is 0, ie, the hotel has no available rooms. The top unavailable hotels is a list of hotels whose searches resulted in no available rooms. They should be ordered by the unavailable_results in a descending order

scott · July 18, 2024, 5:57pm

Ok, I think I understand. But looking back at your original description, I want to clarify something:

has_availability is something I would typically store as a field, since it likely changes over time. hotel_id is something I’d typically store as a tag, since it’s constant and identifies where the data comes from. What do checkin and checkout represent? If these are “static” check-in and check-out times, I’d also typically store these as tags.

I’m wondering if we’re mixing up terminology or if this is the actual schema of your data. The reason I ask is because this will change how and if you can query the results you’re looking for. Just to be sure, could you provide the results from the following queries:

SHOW FIELD KEYS FROM hotel_search

SHOW TAG KEYS from hotel_search

dani09 · July 18, 2024, 6:28pm

@scott That’s the actual schema of my data. Actually, I’m mantaining someone else’s work, so I didn’t think much of the schema. Your points sound very valid tho. My issues running the query probably stems from the poorly designed schema then

scott · July 18, 2024, 7:21pm

Unfortunately, I think this is the case. In an ideal world, hotel_id would be a tag and you could group by it and compute a count of all rows where the has_availability field per hotel ID. But InfluxQL can’t group by fields.

There may be a potential workaround if you’re willing/able to use Flux.

dani09 · July 18, 2024, 8:00pm

@scott Thanks for your help, Scott.

I’m not particularly familiar with Flux. Can you point to me which feature/concept might help me with this workaround?

scott · July 18, 2024, 8:24pm

If you’re using InfluxDB 1.x, you’ll first need to make sure Flux is enabled. If you’re using InfluxDB 2.x, Flux is the primary query language. In Chronograf, you can use the Flux Query Builder or Script Editor.

Flux is essentially a functional scripting language designed specifically for querying and processing data. It gives you a lot more flexibility when you need to modify the overall structure of your data (which you do in this case).

With what I know about the schema, the Flux query would look something like this:

from(bucket: "your-bucket-name")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "hotel_search")
    |> filter(fn: (r) => r._field == "hotel_id")
    |> map(fn: (r) => ({_time: r._time, hotel_id: r._field, unavailable_results: "tmp" }))
    |> group(columns: ["hotel_id"])
    |> count(column: "unavailable_results")
    |> group()
    |> sort(columns: ["unavailable_results"], desc: true)
    |> top(n: 5)

Topic		Replies	Views
Group by a field with FLUX InfluxDB 2 influxdb , grafana , query , flux	5	3706	June 27, 2023
Different result between Chronograf query vs HTTP query InfluxDB 2 influxdb , chronograf	3	511	October 17, 2020
Convert to from InfluxQL to FLUX InfluxDB 2 influxdb , influxql , query , flux	3	1991	June 10, 2021
Chronograf return different result then influx cli Dashboards influxdb , chronograf	2	1039	July 11, 2017
How to use query count("field"),"field2"? Telegraf influxdb , grafana	1	593	July 2, 2020

Aggregating results by a field in chronograf

Related topics