Using multiple tag keys as index

I am struggling a bit with the design of my metrics for InfluxDB 1.8. The data at hand has a two-dimensional index: the combination of two tags is needed to identify an object (think of Linux data by host, by application, by process). I like separate tags because they let me sum() by host or by application, for example. There are also additional tags that qualify the object but are not needed to make it unique (such as data center region or machine type).
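To illustrate why separate tags are convenient once the data is in pandas, here is a minimal sketch with made-up data (the host, application, process and region values are hypothetical). Each tag becomes its own column, so a total at any level is a single groupby, and a descriptive tag such as region simply rides along.

```python
import pandas as pd

# Hypothetical sample: one value per (host, application, process) object,
# plus a descriptive tag (region) that is not needed to identify the object.
df = pd.DataFrame({
    "host": ["h1", "h1", "h2", "h2"],
    "application": ["web", "db", "web", "web"],
    "process": ["p1", "p2", "p3", "p4"],
    "region": ["eu", "eu", "us", "us"],
    "cpu": [1.0, 2.0, 3.0, 4.0],
})

# Because each tag is its own column, totals at any level are one groupby away.
by_host = df.groupby("host")["cpu"].sum()
by_app = df.groupby("application")["cpu"].sum()
```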
Now I am importing the data into a Jupyter Notebook with the InfluxDB Python library and want to create a DataFrame with the proper index. The best option I see is to use the SHOW TAG KEYS FROM query to identify the index columns. But then I lose the distinction between primary and secondary keys, as well as the order of the primary keys.
When you build a Grafana dashboard, the knowledge about which tags to use is in your head (or in the documentation). But when you want to build automated logic, things get harder.
I am now considering naming the keys so that I can identify and order the primary keys. Is this a silly approach?
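A minimal sketch of what such a naming scheme could look like, assuming a hypothetical convention where primary keys are named `k<N>_<name>` and `N` is the position in the index. The function takes the plain list of tag key names (for example, as read from the SHOW TAG KEYS output) and splits it into ordered primary keys and the remaining secondary keys. The position is compared as an integer so that `k10_` sorts after `k2_`.

```python
import re

# Hypothetical convention: primary key tags are named "k<N>_<name>", where N
# gives their position in the index; any other tag is a secondary attribute.
PRIMARY = re.compile(r"^k(\d+)_(.+)$")


def split_tag_keys(tag_keys):
    """Return (ordered primary keys, sorted secondary keys) from tag key names."""
    primary, secondary = [], []
    for key in tag_keys:
        match = PRIMARY.match(key)
        if match:
            # Keep the numeric position so the sort is numeric, not lexical.
            primary.append((int(match.group(1)), key))
        else:
            secondary.append(key)
    primary.sort()
    return [key for _, key in primary], sorted(secondary)


primary_keys, secondary_keys = split_tag_keys(
    ["k2_app", "region", "k1_host", "k10_pid", "dc"]
)
```

The trade-off is that the prefixes show up everywhere the tags are used, including every Grafana query.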

Hello @rvdheij,
I encourage you to take a look at this post.
If you use pivot(), you can easily convert your InfluxDB query output to a pandas DataFrame 🙂
Example with pivot()
https://www.influxdata.com/blog/birch-for-anomaly-detection-with-influxdb/
General info on using the client with pandas
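For readers who have not used it: Flux's pivot() with rowKey: ["_time"], columnKey: ["_field"] and valueColumn: "_value" turns one row per field value into one row per timestamp with one column per field. A sketch of the same reshaping done in pandas with `DataFrame.pivot`, on made-up data (the field names busy and idle are hypothetical):

```python
import pandas as pd

# Long format, as a Flux query returns it before pivot(): one row per field value.
long_df = pd.DataFrame({
    "_time": ["t1", "t1", "t2", "t2"],
    "_field": ["busy", "idle", "busy", "idle"],
    "_value": [30.0, 70.0, 40.0, 60.0],
})

# Same reshaping as Flux pivot(rowKey: ["_time"], columnKey: ["_field"],
# valueColumn: "_value"): one row per timestamp, one column per field.
wide_df = long_df.pivot(index="_time", columns="_field", values="_value")
```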

Thank you, Anais. I also watched your talk on anomaly detection (and was going to do something with it, but I need to look at patterns over time as well). I can get my data into pandas just fine; I'm just struggling with the hierarchical MultiIndex needed to make my measurement self-describing. -Rob

@rvdheij,
I’m sorry, I’m having trouble visualizing your problem. Could you provide an example of what you mean? Have you tried using pivot() before converting to a DataFrame yet?
Thank you.

Hi Anais, I don’t think pivot() helps me, because I have many values in my measurement. Below is part of my DataFrame. My samples are typically one minute apart (this shows part of one sample).

zvm_cpu_sytprp,core=26,cpc=0DA1F7,cpctype=3906,cputype=IFL,pfxcpuad=004D,polar=High,ssi=STSSI03,systid=GDLMSTL1,thread=1
zvm_cpu_sytprp,core=27,cpc=0DA1F7,cpctype=3906,cputype=IFL,pfxcpuad=004E,polar=High,ssi=STSSI03,systid=GDLMSTL1,thread=0
zvm_cpu_sytprp,core=27,cpc=0DA1F7,cpctype=3906,cputype=IFL,pfxcpuad=004F,polar=High,ssi=STSSI03,systid=GDLMSTL1,thread=1
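The three lines above are series keys: the measurement name followed by comma-separated tag=value pairs. A minimal sketch that turns them into a DataFrame, assuming no escaped commas, spaces or equals signs in the keys or values, and assuming (only for illustration) that `pfxcpuad` is the tag holding the CPU number. Which column goes into the index is exactly the information the tag names themselves do not carry.

```python
import pandas as pd


def parse_series_key(line):
    """Split 'measurement,tag=value,...' into (measurement, {tag: value})."""
    # Assumes no escaped commas, spaces or '=' in tag keys or values.
    measurement, *pairs = line.split(",")
    tags = dict(pair.split("=", 1) for pair in pairs)
    return measurement, tags


series = [
    "zvm_cpu_sytprp,core=26,cpc=0DA1F7,cpctype=3906,cputype=IFL,pfxcpuad=004D,polar=High,ssi=STSSI03,systid=GDLMSTL1,thread=1",
    "zvm_cpu_sytprp,core=27,cpc=0DA1F7,cpctype=3906,cputype=IFL,pfxcpuad=004E,polar=High,ssi=STSSI03,systid=GDLMSTL1,thread=0",
    "zvm_cpu_sytprp,core=27,cpc=0DA1F7,cpctype=3906,cputype=IFL,pfxcpuad=004F,polar=High,ssi=STSSI03,systid=GDLMSTL1,thread=1",
]

parsed = [parse_series_key(s) for s in series]
df = pd.DataFrame([tags for _, tags in parsed])

# Assumption: pfxcpuad holds the CPU number, i.e. the primary key here.
indexed = df.set_index("pfxcpuad")
```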

Each sample has multiple data points, distinguished by the combination of keys. What you can’t see from this DataFrame is that the CPU number is the primary key here, while other measurements use multiple keys in a specific order (to aggregate at different levels).
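An alternative to encoding the order in tag names is to keep it next to the analysis code: a small registry that lists, per measurement, the primary keys in index order. This is a minimal sketch; the registry, the measurement `example_process` and its keys are hypothetical, and mapping `zvm_cpu_sytprp` to `pfxcpuad` alone is an assumption. Once the keys form a MultiIndex, aggregating at a given level is a `groupby(level=...)`, and secondary tags such as region stay ordinary columns.

```python
import pandas as pd

# Hypothetical registry: for each measurement, its primary keys in index order.
# It lives outside InfluxDB (e.g. in the notebook or a small config file).
PRIMARY_KEYS = {
    "zvm_cpu_sytprp": ["pfxcpuad"],                # assumed: CPU number alone
    "example_process": ["systid", "app", "pid"],   # hypothetical 3-level key
}


def index_frame(measurement, df):
    """Set the index from the registered primary keys, in registry order."""
    return df.set_index(PRIMARY_KEYS[measurement]).sort_index()


proc = pd.DataFrame({
    "systid": ["A", "A", "A", "B"],
    "app": ["web", "web", "db", "web"],
    "pid": [1, 2, 3, 4],
    "region": ["eu", "eu", "eu", "us"],  # secondary tag: stays a column
    "cpu": [1.0, 2.0, 3.0, 4.0],
})
indexed = index_frame("example_process", proc)

# Aggregate at different levels of the hierarchy using the index levels.
per_app = indexed["cpu"].groupby(level=["systid", "app"]).sum()
per_system = indexed["cpu"].groupby(level="systid").sum()
```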