Using multiple tag keys as index

rvdheij · July 23, 2020, 12:02pm

I struggle a bit with the design of my metrics for InfluxDB 1.8. The data at hand has a two-dimensional index, so the combination of two tags is needed to distinguish the object (think of Linux data by host, by application, by process). I like separate tags since that lets me sum() by host or by application, for example. There are also additional tags that qualify the object, but are not necessary to make it unique (like data center region, machine type, etc.).
Now I am importing the data into a Jupyter Notebook with the InfluxDB Python library, and want to create a DataFrame with the proper index. The best option I see is to use SHOW TAG KEYS FROM the measurement to identify the index columns. But that loses the distinction between primary and secondary keys, as well as the order of the primary keys.
When you do a Grafana dashboard, the knowledge about which tags to use is in your head (or documentation). But when you want to build some automated logic, things are harder.
I am now considering naming the keys in such a way that I can identify and order the primary keys. Is this a silly approach?
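
A minimal sketch of what such a naming scheme could look like, assuming a purely hypothetical convention where primary tag keys are named "k<N>_<name>" and N gives the position in the index. The prefix, the helper split_tag_keys, and the tag names are illustrative assumptions, not anything InfluxDB defines:

```python
import re

# Hypothetical convention: primary tag keys are named "k<N>_<name>",
# where N is the position in the index. Every other tag is a qualifier.
PRIMARY = re.compile(r"^k(\d+)_(.+)$")


def split_tag_keys(tag_keys):
    """Return (ordered primary keys, secondary keys) from a list of tag key names."""
    primary, secondary = [], []
    for key in tag_keys:
        match = PRIMARY.match(key)
        if match:
            primary.append((int(match.group(1)), key))
        else:
            secondary.append(key)
    # Sort on the numeric position so that k10 comes after k2.
    return [key for _, key in sorted(primary)], sorted(secondary)


# Tag key names in arbitrary order, e.g. as collected from SHOW TAG KEYS.
tag_keys = ["k2_application", "region", "k1_host", "machine_type", "k3_process"]
primary, secondary = split_tag_keys(tag_keys)
print(primary)    # ['k1_host', 'k2_application', 'k3_process']
print(secondary)  # ['machine_type', 'region']
```

The ordered list of primary keys could then be passed to pandas DataFrame.set_index (with append=True to keep the existing time index) to build the hierarchical index.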

Anaisdg · August 10, 2020, 5:24pm

Hello @rvdheij,
I encourage you to take a look at this post.
If you use a pivot(), you can easily convert your InfluxDB query output to a pandas DataFrame.
Example with pivot()

General info on using the client with pandas

rvdheij · August 10, 2020, 8:54pm

Thank you Anais. I also watched your talk on anomaly detection (and was going to do something with that, but need to look at patterns over time as well). I can get my stuff into Pandas just fine, I’m just struggling with the hierarchical multi-index to make my measurement self-describing. -Rob

Anaisdg · August 10, 2020, 9:18pm

@rvdheij,
I’m sorry, I’m having trouble visualizing your problem. Can you provide an example of what you mean, please? Have you tried using pivot() before converting to a DataFrame yet?
Thank you.

rvdheij · August 10, 2020, 10:11pm

Hi Anais, I don’t think pivot() helps me because I have many values in my measurement. This is an example of part of my dataframe. My samples are typically one minute apart (this shows part of one sample).

zvm_cpu_sytprp,core=26,cpc=0DA1F7,cpctype=3906,cputype=IFL,pfxcpuad=004D,polar=High,ssi=STSSI03,systid=GDLMSTL1,thread=1
zvm_cpu_sytprp,core=27,cpc=0DA1F7,cpctype=3906,cputype=IFL,pfxcpuad=004E,polar=High,ssi=STSSI03,systid=GDLMSTL1,thread=0
zvm_cpu_sytprp,core=27,cpc=0DA1F7,cpctype=3906,cputype=IFL,pfxcpuad=004F,polar=High,ssi=STSSI03,systid=GDLMSTL1,thread=1

Per sample I have multiple data points, distinguished by the combination of keys. What you can’t see from this dataframe is that the CPU number is the primary key, but other measurements use multiple keys in specific order (to aggregate over different levels).
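
One way to see which tag combinations can serve as a unique key is to test candidates against the series themselves. The sketch below parses the three series keys shown above and checks a few candidate key lists. The helpers parse_series_key and is_unique_key are hypothetical names, the parser ignores line protocol escaping (the sample has none), and which tag actually holds the CPU number is not stated in the thread, so the candidates are only guesses:

```python
def parse_series_key(line):
    """Split 'measurement,tag=value,...' into (measurement, tags dict).

    Simplification: assumes no escaped commas or equals signs.
    """
    measurement, *pairs = line.split(",")
    tags = dict(pair.split("=", 1) for pair in pairs)
    return measurement, tags


def is_unique_key(series, keys):
    """True if the combination of the given tag keys is distinct for every series."""
    seen = set()
    for tags in series:
        value = tuple(tags[key] for key in keys)
        if value in seen:
            return False
        seen.add(value)
    return True


lines = [
    "zvm_cpu_sytprp,core=26,cpc=0DA1F7,cpctype=3906,cputype=IFL,pfxcpuad=004D,polar=High,ssi=STSSI03,systid=GDLMSTL1,thread=1",
    "zvm_cpu_sytprp,core=27,cpc=0DA1F7,cpctype=3906,cputype=IFL,pfxcpuad=004E,polar=High,ssi=STSSI03,systid=GDLMSTL1,thread=0",
    "zvm_cpu_sytprp,core=27,cpc=0DA1F7,cpctype=3906,cputype=IFL,pfxcpuad=004F,polar=High,ssi=STSSI03,systid=GDLMSTL1,thread=1",
]
series = [parse_series_key(line)[1] for line in lines]

# Candidate keys are guesses; the data alone cannot say which one is intended.
print(is_unique_key(series, ["pfxcpuad"]))        # True
print(is_unique_key(series, ["core"]))            # False: core=27 appears twice
print(is_unique_key(series, ["core", "thread"]))  # True
```

A check like this can confirm that a key is sufficient for the data seen so far, but it cannot recover the intended order of the keys, which is the part that still needs a naming convention or separate documentation.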
