Grouping/correlating data

I have come across influxdb today in my search for a time series database.
I have managed to import csv files containing performance metrics from the storage arrays (hosts, virtualization, switches, etc will follow) using telegraf, but now I feel stuck.
The only reason for doing this (I can generate graphs in many other ways) is data correlation. To start, I want to group LUNs into groups. A LUN is allocated to a server, and the server is part of a “server type group”.
For example:

  • LUN1 and LUN2 are allocated to ServerA, Linux server
  • LUN3, 4 and 5 are allocated to ServerB, Linux server
  • Both ServerA and ServerB are part of a Docker cluster
I want to be able to see the metrics of all the LUNs allocated to servers in the Docker cluster, or to Linux servers, etc.

Can this be achieved? And if yes, how? Any feedback is appreciated!

@AndreiQ Yes, this can be achieved using Flux, but it also depends on how your data is structured. What’s the schema you’re working with? Is the LUN being stored as a tag?

@scott,
This is a sample of the CSV I am importing.

measurement hosts timefield LUN1 LUN2 LUN3 LUN4
read_iops Site1 5/27/2019 13:05 11 85 30 1001
read_iops Site1 5/27/2019 13:06 12 345 4 393
read_iops Site1 5/27/2019 13:07 9 165 3 947
read_iops Site1 5/27/2019 13:08 12 311 1 1416
read_iops Site1 5/27/2019 13:09 9 840 11 410
read_iops Site1 5/27/2019 13:10 9 1076 5 592
read_iops Site1 5/27/2019 13:11 11 266 1 624
read_iops Site1 5/27/2019 13:12 12 671 1 1433

I process the initial data to add the measurement and hosts columns so further modifications will not be a problem.

I’m not 100% sure how this CSV schema would get imported into InfluxDB, but I’m thinking it needs to be updated a bit. With this structure, I’m guessing it will write LUN1, LUN2, LUN3, and LUN4 as fields, which I don’t think is what you want. I think the LUNs should be tags and read_iops should be the field stored under a higher-level measurement like disk. Something like this:

measurement host lun read_iops time
disk Site1 LUN1 11 5/27/2019 13:05
disk Site1 LUN1 12 5/27/2019 13:06
disk Site1 LUN1 9 5/27/2019 13:07
disk Site1 LUN1 12 5/27/2019 13:08
disk Site1 LUN1 9 5/27/2019 13:09
disk Site1 LUN1 9 5/27/2019 13:10
disk Site1 LUN1 11 5/27/2019 13:11
disk Site1 LUN1 12 5/27/2019 13:12
disk Site1 LUN2 85 5/27/2019 13:05
disk Site1 LUN2 345 5/27/2019 13:06
disk Site1 LUN2 165 5/27/2019 13:07
disk Site1 LUN2 311 5/27/2019 13:08
disk Site1 LUN2 840 5/27/2019 13:09
disk Site1 LUN2 1076 5/27/2019 13:10
disk Site1 LUN2 266 5/27/2019 13:11
disk Site1 LUN2 671 5/27/2019 13:12
disk Site1 LUN3 30 5/27/2019 13:05
disk Site1 LUN3 4 5/27/2019 13:06
disk Site1 LUN3 3 5/27/2019 13:07
disk Site1 LUN3 1 5/27/2019 13:08
disk Site1 LUN3 11 5/27/2019 13:09
disk Site1 LUN3 5 5/27/2019 13:10
disk Site1 LUN3 1 5/27/2019 13:11
disk Site1 LUN3 1 5/27/2019 13:12
disk Site1 LUN4 1001 5/27/2019 13:05
disk Site1 LUN4 393 5/27/2019 13:06
disk Site1 LUN4 947 5/27/2019 13:07
disk Site1 LUN4 1416 5/27/2019 13:08
disk Site1 LUN4 410 5/27/2019 13:09
disk Site1 LUN4 592 5/27/2019 13:10
disk Site1 LUN4 624 5/27/2019 13:11
disk Site1 LUN4 1433 5/27/2019 13:12
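With lun as a tag, selecting the LUNs that belong to a particular server or group becomes a simple filter rather than a schema change. A rough sketch (assuming a bucket named example-bucket and read_iops stored as a field under the disk measurement, as in the table above) that would select only ServerA's LUNs:

from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "disk" and r._field == "read_iops")
  |> filter(fn: (r) => r.lun == "LUN1" or r.lun == "LUN2")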

This structure would be much easier to work with. To answer your original question about how to visualize and correlate this data, what version of InfluxDB are you using?

InfluxDB 2.0
When I imported the data, it looked fine. All LUNs could be represented individually or per site. Using your structure, I could add host and host_groups columns to make the groups I want. Is there any other way?

@AndreiQ Ok, without changing the schema, I think the following would work. It creates two stream variables based on the servers and simply drops the LUN columns that shouldn’t be included in each server’s metrics. It then creates and sets values for server and server-type columns on each row.

ServerA = from(bucket: "example-bucket")
  |> range(start: -1h)
  |> drop(fn: (column) => column =~ /LUN[3-5]/)
  |> set(key: "server", value: "A")
  |> set(key: "server-type", value: "linux")

ServerB = from(bucket: "example-bucket")
  |> range(start: -1h)
  |> drop(fn: (column) => column =~ /LUN[1-2]/)
  |> set(key: "server", value: "B")
  |> set(key: "server-type", value: "linux")

With these streams defined, you could then filter and visualize the metrics from each server:

ServerA
  |> filter(fn: (r) => r._measurement == "read_iops")
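To see the metrics for a whole group (for example, everything in the Docker cluster), the two streams could then be combined with union() and grouped by the server-type column set above. A sketch:

union(tables: [ServerA, ServerB])
  |> filter(fn: (r) => r._measurement == "read_iops")
  |> group(columns: ["server-type"])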