Tag date is giving me very high cardinality

Julien_Cappiello · July 30, 2019, 7:14am

Hi,

I have a measurement that is not timestamp related.

I parse a file and, inside it, I find a date in yyyy-mm-dd format. For each file I will insert:

timestamp (useless),
fields: filename
date ( yyyy-mm-dd )

Right now, the only query I want to run is a delete based on a date criterion.

If date is a field, I will not be able to run this delete.

If date is a tag, I will get very high cardinality, which seems problematic in terms of memory.

What should the approach be if I need to work with dates that are not the timestamp?

MarcV · July 31, 2019, 11:20am

Hi @Julien_Cappiello,

Did you have a memory issue?
Do you have records with the same filename and date,
or is the date always unique for a given filename?
If you have duplicate values there, the timestamp is not useless,
because it will prevent your duplicate records from being overwritten.

If you don’t have duplicates, you could parse the file so that the date becomes the timestamp.

Hope this helps,
best regards

Julien_Cappiello · July 31, 2019, 11:40am

Hi @MarcV,

Thanks for your answer.

I don’t have “memory issues” yet; I am just trying to optimize the memory consumption of InfluxDB.

To answer your question:
Do you have records with the same filename and date? --> NO
Or is the date always unique for a given filename? --> YES
BUT the date itself is not unique, so it can’t be the timestamp. That was my first attempt, in fact.

That is exactly my problem.

MarcV · July 31, 2019, 11:41am

Can you make filename a tag?
If you can, the date only needs to be unique for a given filename.

Julien_Cappiello · July 31, 2019, 11:57am

I could make filename a tag, but it would be worse for cardinality.

It would be better to have filename as a field and date as a tag, converted to epoch so it is an int.

If I have 1000 files over 100 days, that would make only 100 series.

I have already done that, but I had hoped I could optimize cardinality a bit more.
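To make the cardinality comparison concrete, here is a minimal Python sketch that counts distinct tag sets for each candidate schema. The data is hypothetical and assumes the worst case where every one of the 1000 files appears on every one of the 100 days; the function name and the generated values are illustrative only.

```python
from itertools import product


def series_cardinality(points, tag_keys):
    """Count distinct tag-value combinations, i.e. the series in one measurement."""
    return len({tuple(p[k] for k in tag_keys) for p in points})


# Hypothetical data set: 1000 files, each seen on each of 100 days.
files = [f"file_{i}.csv" for i in range(1000)]
days = [f"day_{d}" for d in range(100)]
points = [{"filename": f, "date": d} for f, d in product(files, days)]

print(series_cardinality(points, ["date"]))              # 100
print(series_cardinality(points, ["filename"]))          # 1000
print(series_cardinality(points, ["filename", "date"]))  # 100000
```

Only the keys used as tags contribute to the count; anything stored as a field or as the timestamp does not add series.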

rvdheij · August 2, 2019, 10:23am

But what is the unique key for your data when it’s not the time and not the filename? What are the metrics you’re going to keep and retrieve? You can use the date to construct the timestamp for InfluxDB and use the filename to build the series. That way you could, say, retrieve the number of files with a timestamp on Feb 28, or whatever.
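If the date is stored as the timestamp, both the count and the delete by date become plain time-range queries. The Python sketch below only builds the InfluxQL statement strings; it assumes InfluxDB 1.x with InfluxQL, UTC midnight as the day boundary, and a hypothetical measurement name `data`.

```python
from datetime import datetime, timedelta, timezone


def day_bounds(date_str):
    """Return RFC3339 strings for the start of the day and the start of the next day (UTC)."""
    start = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), end.strftime(fmt)


def time_clause(date_str):
    """Half-open time range covering exactly one calendar day."""
    start, end = day_bounds(date_str)
    return f"time >= '{start}' AND time < '{end}'"


def count_statement(date_str, measurement="data"):
    # "data" is a placeholder measurement name, not taken from a real schema.
    return f'SELECT COUNT(*) FROM "{measurement}" WHERE {time_clause(date_str)}'


def delete_statement(date_str, measurement="data"):
    return f'DELETE FROM "{measurement}" WHERE {time_clause(date_str)}'


print(count_statement("2019-02-28"))
print(delete_statement("2019-02-28"))
```

The half-open range (`>=` start, `<` next day) avoids having to pick a "last instant" of the day.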

Julien_Cappiello · August 2, 2019, 2:10pm

Thank you for your response rvdheij,

The unique key is always Time/Filename. Those are my metrics; I have no other fields in the measurement.

I can’t use the date to build the timestamp, as it is truncated to the day. You seem to have understood that.

I don’t think I understand your solution.

Can you please elaborate?

rvdheij · August 2, 2019, 9:08pm

My suggestion is to convert your date to a timestamp, for example midnight on that date. You then build points like this to write:
data,file= obs=1
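As a rough illustration of that suggestion, the Python sketch below turns a filename and a yyyy-mm-dd date into one line-protocol point with the timestamp set to midnight on that date. It assumes UTC midnight, nanosecond precision, and a helper name (`to_line`) that is purely illustrative; the measurement `data`, tag `file` and field `obs` follow the line above.

```python
from datetime import datetime, timezone


def escape_tag_value(value):
    """Backslash-escape the characters that are special in line-protocol tag values."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def to_line(filename, date_str, measurement="data"):
    """Build one line-protocol point: filename as a tag, date as the timestamp."""
    # Assumption: the date is interpreted as midnight UTC.
    day = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    ts_ns = int(day.timestamp()) * 1_000_000_000  # seconds to nanoseconds
    return f"{measurement},file={escape_tag_value(filename)} obs=1 {ts_ns}"


print(to_line("a.csv", "2019-07-30"))
# data,file=a.csv obs=1 1564444800000000000
```

Two files with the same date land in different series because their `file` tag differs, so they do not overwrite each other even though the timestamps are identical.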

It’s a bit artificial, since you don’t really have anything to measure and don’t really have time series (with multiple observations per series over time). It would be different if you were dealing with size= for example, and wanted to report the amount of data by month or so.

If this is all you need, then a generic NoSQL database like MongoDB would be a more obvious choice.
