Grouping/correlating data

I have come across influxdb today in my search for a time series database.
I have managed to import csv files containing performance metrics from the storage arrays (hosts, virtualization, switches, etc will follow) using telegraf, but now I feel stuck.
The only reason for doing this (I can generate graphs in many other ways) is data correlation. To start, I want to group LUNs into groups. A LUN is allocated to a server, and the server is part of a “server type group”.
For example:

  • LUN1 and LUN2 are allocated to ServerA, Linux server
  • LUN3, 4 and 5 are allocated to ServerB, Linux server
  • Both ServerA and ServerB are part of a Docker cluster
I want to be able to see the metrics of all the LUNs allocated to servers in the Docker cluster, or to Linux servers, etc.

Can this be achieved? And if yes, how? Any feedback is appreciated!

@AndreiQ Yes, this can be achieved using Flux, but it also depends on how your data is structured. What’s the schema you’re working with? Is the LUN being stored as a tag?

@scott,
This is a sample of the CSV I am importing.

measurement hosts timefield LUN1 LUN2 LUN3 LUN4
read_iops Site1 5/27/2019 13:05 11 85 30 1001
read_iops Site1 5/27/2019 13:06 12 345 4 393
read_iops Site1 5/27/2019 13:07 9 165 3 947
read_iops Site1 5/27/2019 13:08 12 311 1 1416
read_iops Site1 5/27/2019 13:09 9 840 11 410
read_iops Site1 5/27/2019 13:10 9 1076 5 592
read_iops Site1 5/27/2019 13:11 11 266 1 624
read_iops Site1 5/27/2019 13:12 12 671 1 1433

I process the initial data to add the measurement and hosts columns so further modifications will not be a problem.

I’m not 100% sure how this CSV schema would get imported into InfluxDB, but I’m thinking it needs to be updated a bit. With this structure, I’m guessing it will write LUN1, LUN2, LUN3, and LUN4 as fields, which I don’t think is what you want. I think the LUNs should be tags and read_iops should be the field stored under a higher-level measurement like disk. Something like this:

measurement host lun read_iops time
disk Site1 LUN1 11 5/27/2019 13:05
disk Site1 LUN1 12 5/27/2019 13:06
disk Site1 LUN1 9 5/27/2019 13:07
disk Site1 LUN1 12 5/27/2019 13:08
disk Site1 LUN1 9 5/27/2019 13:09
disk Site1 LUN1 9 5/27/2019 13:10
disk Site1 LUN1 11 5/27/2019 13:11
disk Site1 LUN1 12 5/27/2019 13:12
disk Site1 LUN2 85 5/27/2019 13:05
disk Site1 LUN2 345 5/27/2019 13:06
disk Site1 LUN2 165 5/27/2019 13:07
disk Site1 LUN2 311 5/27/2019 13:08
disk Site1 LUN2 840 5/27/2019 13:09
disk Site1 LUN2 1076 5/27/2019 13:10
disk Site1 LUN2 266 5/27/2019 13:11
disk Site1 LUN2 671 5/27/2019 13:12
disk Site1 LUN3 30 5/27/2019 13:05
disk Site1 LUN3 4 5/27/2019 13:06
disk Site1 LUN3 3 5/27/2019 13:07
disk Site1 LUN3 1 5/27/2019 13:08
disk Site1 LUN3 11 5/27/2019 13:09
disk Site1 LUN3 5 5/27/2019 13:10
disk Site1 LUN3 1 5/27/2019 13:11
disk Site1 LUN3 1 5/27/2019 13:12
disk Site1 LUN4 1001 5/27/2019 13:05
disk Site1 LUN4 393 5/27/2019 13:06
disk Site1 LUN4 947 5/27/2019 13:07
disk Site1 LUN4 1416 5/27/2019 13:08
disk Site1 LUN4 410 5/27/2019 13:09
disk Site1 LUN4 592 5/27/2019 13:10
disk Site1 LUN4 624 5/27/2019 13:11
disk Site1 LUN4 1433 5/27/2019 13:12
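With lun as a tag, selecting the LUNs that belong to a particular server or group becomes a simple filter rather than a schema change. A rough sketch (assuming a bucket named example-bucket and read_iops stored as a field under the disk measurement, as in the table above) that would select only ServerA's LUNs:

from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "disk" and r._field == "read_iops")
  |> filter(fn: (r) => r.lun == "LUN1" or r.lun == "LUN2")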

This structure would be much easier to work with. To answer your original question about how to visualize and correlate this data, what version of InfluxDB are you using?

InfluxDB 2.0
When I imported the data, it looked fine. All LUNs could be represented individually or per site. Using your structure, I could add host and host_groups columns to make the groups I want. Is there any other way?

@AndreiQ Ok, without changing the schema, I think the following would work. It creates two stream variables based on the servers and simply drops the LUN columns that shouldn’t be included in each server’s metrics. It then creates and sets values for server and server-type columns on each row.

ServerA = from(bucket: "example-bucket")
  |> range(start: -1h)
  |> drop(fn: (column) => column =~ /LUN[3-5]/)
  |> set(key: "server", value: "A")
  |> set(key: "server-type", value: "linux")

ServerB = from(bucket: "example-bucket")
  |> range(start: -1h)
  |> drop(fn: (column) => column =~ /LUN[1-2]/)
  |> set(key: "server", value: "B")
  |> set(key: "server-type", value: "linux")

With these streams defined, you could then filter and visualize the metrics from each server:

ServerA
  |> filter(fn: (r) => r._measurement == "read_iops")
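To see the metrics for a whole group (for example, everything in the Docker cluster), the two streams could then be combined with union() and grouped by the server-type column set above. A sketch:

union(tables: [ServerA, ServerB])
  |> filter(fn: (r) => r._measurement == "read_iops")
  |> group(columns: ["server-type"])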