Aggregating results by a field in chronograf

I’m a beginner in influx and have been trying to run a query with no success.

I have a measurement called “hotel_search”. It contains a “has_availability” tag and the following fields: hotel_id, checkin and checkout. I’m currently trying to find the TOP unavailable hotels. The desired outcome is:

hotel_id,unavailable_results (amount of failed searches)
1, 100
2, 90
3, 80

Initially I’ve tried to group my measurements by the hotel_id, but it fails as it is a field.

Is there another way I can achieve this?

@dani09 Are you using Flux or InfluxQL in Chronograf?

@scott I’m using InfluxQL.

Ok, you’re going to need to use a subquery here. The inner query should calculate the number of vacancies per hotel. The outer query should return the top N results of the inner query.

How do you calculate the number of vacancies per hotel? Is the has_availability tag scoped to each hotel or to each room?

@scott It’s scoped to each hotel, and it’s actually a boolean field. It indicates if the hotel has any available rooms. Unfortunately I do not know the amount of rooms per hotel.

Ok, I may have misunderstood the original question. What identifies a hotel as a “TOP unavailable hotel”? How should the hotels be sorted? In your example desired results above, what does unavailable_results represent and how is it calculated?

@scott the unavailable_results represent the amount of points in which the has_availability tag is 0, ie, the hotel has no available rooms. The top unavailable hotels is a list of hotels whose searches resulted in no available rooms. They should be ordered by the unavailable_results in a descending order

Ok, I think I understand. But looking back at your original description, I want to clarify something:

has_availability is something I would typically store as a field, since it likely changes over time. hotel_id is something I’d typically store as a tag, since it’s constant and identifies where the data comes from. What do checkin and checkout represent? If these are “static” check-in and check-out times, I’d also typically store these as tags.

I’m wondering if we’re mixing up terminology or if this is the actual schema of your data. The reason I ask is because this will change how and if you can query the results you’re looking for. Just to be sure, could you provide the results from the following queries:

SHOW FIELD KEYS FROM hotel_search
SHOW TAG KEYS from hotel_search

@scott That’s the actual schema of my data. Actually, I’m mantaining someone else’s work, so I didn’t think much of the schema. Your points sound very valid tho. My issues running the query probably stems from the poorly designed schema then :sweat_smile:

Unfortunately, I think this is the case. In an ideal world, hotel_id would be a tag and you could group by it and compute a count of all rows where the has_availability field per hotel ID. But InfluxQL can’t group by fields.

There may be a potential workaround if you’re willing/able to use Flux.

1 Like

@scott Thanks for your help, Scott.

I’m not particularly familiar with Flux. Can you point to me which feature/concept might help me with this workaround?

If you’re using InfluxDB 1.x, you’ll first need to make sure Flux is enabled. If you’re using InfluxDB 2.x, Flux is the primary query language. In Chronograf, you can use the Flux Query Builder or Script Editor.

Flux is essentially a functional scripting language designed specifically for querying and processing data. It gives you a lot more flexibility when you need to modify the overall structure of your data (which you do in this case).

With what I know about the schema, the Flux query would look something like this:

from(bucket: "your-bucket-name")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "hotel_search")
    |> filter(fn: (r) => r._field == "hotel_id")
    |> map(fn: (r) => ({_time: r._time, hotel_id: r._field, unavailable_results: "tmp" }))
    |> group(columns: ["hotel_id"])
    |> count(column: "unavailable_results")
    |> group()
    |> sort(columns: ["unavailable_results"], desc: true)
    |> top(n: 5)