Flux query generating separate line for each tag

Hello there,

I am new to InfluxDB and I hope this isn’t too ignorant a problem :frowning: But I’ve spent days on this and I do not know what I am not seeing.

I have generated a dataset (which represents expenses) using the line protocol, and I want to create a simple line graph from this dataset, which I want to display using Grafana.

Here is a snippet from the dataset I’m importing:

transaction,source=csv,category=none amount=-134.06,description="example description text" 1610409651297000000
transaction,source=csv,category=food amount=55.08,description="example description text 2" 1610520338723000000
transaction,source=csv,category=none amount=1000,description="example description text 3" 1610541142475000000

This is the Flux query I am using at the moment:

from(bucket: "expenses")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "transaction")
  |> filter(fn: (r) => r["_field"] == "amount")
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  |> yield(name: "mean")

My problem is that InfluxDB/Grafana always creates a separate line for each of the tags. However, the tags only represent metadata. I only want to see a single line representing all expenses from the dataset.

Is this clear? What am I doing wrong here?

Influx is a time series database, meaning that timestamps are important. Looking at the sample data set, all three points have unique timestamps, so by default they will end up on separate lines. You either need to make sure they share the same timestamp when inserting into the database, or you need to modify the timestamps afterwards, for example with a large every in aggregateWindow().
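
For example, something like this rough sketch based on your query (the 1d window is just an assumption, use whatever granularity fits your data) would snap all the points inside each day onto the same window timestamp:

from(bucket: "expenses")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "transaction")
  |> filter(fn: (r) => r["_field"] == "amount")
  // assumption: one aggregated value per day instead of per v.windowPeriod
  |> aggregateWindow(every: 1d, fn: mean, createEmpty: false)
  |> yield(name: "daily mean")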

When you have matching timestamps you should be able to use pivot() to “flatten” the data.

  |> pivot(
    rowKey:["_time"],
    columnKey: ["_field"],
    valueColumn: "_value"
  )

Thank you so much for responding!

I have a question, though: since Influx is a time series database, why is it a problem that the timestamps are unique? Each datapoint should be on a different timestamp, I would assume.

In my scenario every datapoint represents an expense which naturally has a timestamp. Or do I misunderstand the timestamps in line protocol?

What do you mean by “large every”?

From your original post, my understanding is that you want a single line?

To accomplish this you need to have the same timestamp for all the points that should end up on the same line when viewing - provided you want the output to be a time series as well. The timestamp is the key Influx uses to figure out which datapoints belong together.

If the timestamps on ingestion to the database are different for all items, you need another timestamp for Influx to collect the data on. Hence the every in aggregateWindow needs to be "large", where "large" is relative to the data. Do you want an aggregated line per day? Per hour? Per week? That depends on the use case. For a restaurant, perhaps per hour would make sense; for a household, perhaps per day would make more sense.

I also notice that you use mean as the aggregate function. That also depends on the use case. For a restaurant, mean might make sense to see how spending is distributed between lunch and dinner customers. For a family budget, sum is probably more interesting.
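
In your query that would just mean swapping the aggregateWindow() call for something like this (the one-day window is again only a guess at what fits your data):

  // assumption: daily totals; sum instead of mean
  |> aggregateWindow(every: 1d, fn: sum, createEmpty: false)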

I think I need to explain myself better.

My original intent was to generate a line chart with cumulativeSum, because my data contains expenses as well as income.

This is the query I am now using

from(bucket: "expenses")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "transaction")
  |> filter(fn: (r) => r["_field"] == "amount")
  |> filter(fn: (r) => r["category"] == "none" or r["category"] == "food")
  |> cumulativeSum()

And this is the time series chart that is generated from that (within the "Explore" tab of InfluxDB):

This is generally correct, except that there are two separate lines, one for each of the category tags (food/none). I want InfluxDB to completely ignore that there is a category tag at all. It is not relevant for this graph.

Ah! Sorry for being a bit dense. :slight_smile: I assumed you meant line as in log line. Of course it's a graph line in this context. :slight_smile:

I think you get what you want by dropping the category column like this

from(bucket: "expenses")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "transaction")
  |> filter(fn: (r) => r["_field"] == "amount")
  |> filter(fn: (r) => r["category"] == "none" or r["category"] == "food")
  |> drop(columns: ["category"])
  |> cumulativeSum()

Thank you!
This feels like the right solution, but somehow I end up with this. It looks like one line alright, but it's still not quite what I was expecting.

My guess is that the values are still in two tables. Perhaps sort() will do the trick? Or perhaps you also need to group the data?

from(bucket: "expenses")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "transaction")
  |> filter(fn: (r) => r["_field"] == "amount")
  |> filter(fn: (r) => r["category"] == "none" or r["category"] == "food")
  // remove the tag that splits the data into separate series
  |> drop(columns: ["category"])
  // merge everything into a single table
  |> group()
  // put the rows in chronological order before the running sum
  |> sort(columns: ["_time"])
  |> cumulativeSum()

Yes! Grouping and sorting the data by time did do the trick!

Thank you so much for your help! I really appreciate it!

You're welcome. :slight_smile: BTW: if you mark my previous post as "solution", it will be easier for other people searching for similar issues to find it.
