Help with a query - sum unique values and duration of played songs

Hi! Can someone help me with the query please.
What I would like: a list (top 10) of which songs have been listened the most. So basically somehow adding the unique values from “_value”.
AND how long I have listened to the songs and that per day/week/month.

How far I’ve come:
With this query I get the date/times from “_time” and the song titles, but I can’t manage to sum the unique values of “_value” and somehow sum the total duration of the played songs.

from(bucket: "home_assistant")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "media_player.spotify_i")
  |> filter(fn: (r) => r["domain"] == "media_player")
  |> filter(fn: (r) => r["_field"] == "media_title_str")
  |> filter(fn: (r) => r["friendly_name"] == "Spotify i")
  |> filter(fn: (r) => r["entity_id"] == "spotify_i")
  |> sort()
  |> yield(name: "sort")
  |> aggregateWindow(every: 1d, fn: count)

thanks in advance!

@anon13751922 Getting the top 10 you’ve listened to the most should be pretty simple:

from(bucket: "home_assistant")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "media_player.spotify_i")
    |> filter(fn: (r) => r["domain"] == "media_player")
    |> filter(fn: (r) => r["_field"] == "media_title_str")
    |> filter(fn: (r) => r["friendly_name"] == "Spotify i")
    |> filter(fn: (r) => r["entity_id"] == "spotify_i")
    |> group(columns:  ["_value"])
    |> count()
    |> group()
    |> top(n: 10)

When it comes to calculating how long you’ve listened to each song, it doesn’t appear you are recording that data. If you know how long each song is, you can multiply the count by the song duration. You could even store the song lengths in an external relational database and join that data with your time series data. Or, if song durations is available from the Spotify API, you can get it from there. But as is, from the data I can see, you don’t have the information necessary to calculate the amount of time you’ve spent listening to each song.

thanks for your answer!
When I run your query I get an error
invalid: cannot aggregate columns that are part of the group key
do you know what this has to do with?


in terms of duration . I understand that this is not possible if it is not in the table. I dug deeper and found another table with “media_duration” data.

from(bucket: "home_assistant")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "media_player.spotify_i")
  |> filter(fn: (r) => r["_field"] == "media_duration")
  |> filter(fn: (r) => r["friendly_name"] == "Spotify i")
  |> filter(fn: (r) => r["domain"] == "media_player")
  |> sort()
  |> yield(name: "sort")

output


not quite what I’m looking for because this output shows ALL states so for example pauses, idle and playing. I just want to see the status “playing” in a timeline.

however, that’s what I get from another table when I run this query

from(bucket: "home_assistant")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "media_player.spotify_i")
  |> filter(fn: (r) => r["_field"] == "state" and r["_value"] == "playing")
  |> aggregateWindow(every: v.windowPeriod, fn: last, createEmpty: false)

output:

so somehow i have to combine data from 2 tables? so I only see the _value playing in combination with the media_duration values.
If I’m asking too much please let me know

Oh, right, sorry. The following should work. It duplicates the value column as a new media_title_str column and then groups by that column. That way it can store the count in the _value column:

from(bucket: "home_assistant")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "media_player.spotify_i")
    |> filter(fn: (r) => r["domain"] == "media_player")
    |> filter(fn: (r) => r["_field"] == "media_title_str")
    |> filter(fn: (r) => r["friendly_name"] == "Spotify i")
    |> filter(fn: (r) => r["entity_id"] == "spotify_i")
    |> duplicate(column: "_value", as: "media_title_str")
    |> group(columns:  ["media_title_str"])
    |> count()
    |> group()
    |> top(n: 10)

Looking at the timestamps of the two fields (media_duration and state), it appears these are collected at different intervals. media_duration seems to be an “event” that emits a point whenever something happens and the activity interval is inconsistent. state appears to be a “metric” that is reported every 10s and isn’t tied to a specific event. The reason I say this is, typically, you could join these datasets based on the time, but state and media_duration don’t have common time values and there are no other identifying tags that you could join on. Unfortunately, I don’t know that you can do what you’re trying to do with the data as it is.

Do you have control over the schema of the data and how it’s ingested?

I assumed that the time values would be the same. Unfortunately. I’m already happy with the top 10 query! I will try to play around with the duration value myself, if necessary request more info from Spotify. thanks for your time and effort!!!

1 Like

I found out that it checks every 30 seconds. so if a song is playing for 2 minutes it logs a total of 4 times.
Is it also possible with this query to only show a song in the output if it occurs a second time? so it’s actually been playing for 1 minute?
would you like to help me with? I’m grateful.

the reason; otherwise I will also see all the songs that I have skipped and not listened to