Tag date is giving me very high cardinality

Hi,

I have a measurement that is not timestamp related.

I parse a file and, inside it, I get a date (yyyy-mm-dd). I will insert:

timestamp (useless),
fields: filename
date ( yyyy-mm-dd )

Right now, the only query I want to run is a delete based on a date criterion.

If date is a field, I will not be able to make this delete.

If date is a tag, I will get very high cardinality, which seems problematic in terms of memory.

What should the approach be if I need to work with a date that is not the timestamp?

hi @Julien_Cappiello ,

Did you have a memory issue?
Do you have records with the same filename and date,
or is the date always unique for a given filename?
If you have duplicate values there, the timestamp is not useless,
because it will prevent your duplicate records from being overwritten.

If you don’t have duplicates, you could parse the file so that the date becomes the timestamp.

hope this helps ,
best regards

Hi @MarcV,

Thanks for your answer.

I don’t have “memory issues” yet; I am just trying to optimize the memory consumption of InfluxDB.

To answer your question:
do you have records with the same filename and date --> NO
or is the date always unique for a given filename ? --> YES
BUT the date is not unique across files, so it can’t be the timestamp. That was my first try, indeed.

That is exactly my problem.

Can you make filename a tag?
If you can, the date only has to be unique for a given filename.
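To illustrate why that works: InfluxDB identifies a point by measurement, tag set and timestamp, and a later write with the same identity overwrites the earlier field values. The snippet below is only a toy model of that rule in plain Python, not InfluxDB code; the `write` helper, the file names and the chosen day are made up for the example.

```python
def write(store, measurement, tags, fields, timestamp):
    """Toy model: a point is keyed by measurement + tag set + timestamp."""
    key = (measurement, tuple(sorted(tags.items())), timestamp)
    # Same key: field values are merged and the newer value wins.
    store.setdefault(key, {}).update(fields)

MIDNIGHT = 1677542400  # 2023-02-28T00:00:00Z in seconds (hypothetical day)

# filename as a field: two files with the same date collapse into one point
as_field = {}
write(as_field, "data", {}, {"filename": "a.csv"}, MIDNIGHT)
write(as_field, "data", {}, {"filename": "b.csv"}, MIDNIGHT)

# filename as a tag: the same two files stay separate points
as_tag = {}
write(as_tag, "data", {"file": "a.csv"}, {"obs": 1}, MIDNIGHT)
write(as_tag, "data", {"file": "b.csv"}, {"obs": 1}, MIDNIGHT)

print(len(as_field), len(as_tag))  # 1 2
```

With filename as a tag, two files sharing a date no longer collide, at the cost of one series per file.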

I could make filename a tag, but it would be worse for cardinality.

It would be better to have filename as data and date as a tag, converted to epoch so it is an int.

If I have 1000 files, on 100 days, it would only make 100 series.
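A quick way to check that arithmetic is to count distinct tag sets for each tagging choice. This is a rough sketch on synthetic data, assuming 1000 files spread evenly over 100 days with exactly one date per file; the file names, start date and the `series_count` helper are invented for the example.

```python
from datetime import date, timedelta

# Hypothetical data set: 1000 files over 100 days, one date per file.
start = date(2023, 1, 1)
records = [
    {
        "filename": f"file_{i:04d}.txt",
        "date": (start + timedelta(days=i % 100)).isoformat(),
    }
    for i in range(1000)
]

def series_count(records, tag_keys):
    """Count distinct tag sets, i.e. the series one measurement would hold."""
    return len({tuple(r[k] for k in tag_keys) for r in records})

date_only = series_count(records, ["date"])
filename_only = series_count(records, ["filename"])
both = series_count(records, ["date", "filename"])

print(date_only, filename_only, both)  # 100 1000 1000
```

Under these assumptions, tagging by date gives 100 series and tagging by filename gives 1000. Adding the date as a second tag on top of filename does not increase the count, because each file has only one date.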

I have already done that, but I had hoped that I could optimize cardinality a bit more.

But what is the unique key for your data when it’s not the time and not the filename? What are the metrics you’re going to keep and retrieve? You can use the date to construct the timestamp for InfluxDB and use the filename to build the series. That way you could say retrieve the number of files with timestamp in Feb 28, or whatever.
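If the date becomes the timestamp, the delete asked about at the start of the thread turns into a plain time-range delete. Below is a minimal sketch that builds such a statement, assuming InfluxQL (InfluxDB 1.x), UTC dates, and a measurement called `data`; the helper names and the example day are made up, and the string is built without any escaping, so treat it as an illustration only.

```python
from datetime import date, datetime, timedelta, timezone

def day_bounds(day_str):
    """Return (start, end) UTC datetimes covering the given yyyy-mm-dd day."""
    day = date.fromisoformat(day_str)
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return start, start + timedelta(days=1)

def delete_statement(measurement, day_str):
    """Build an InfluxQL DELETE covering exactly one calendar day."""
    start, end = day_bounds(day_str)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return (
        f'DELETE FROM "{measurement}" '
        f"WHERE time >= '{start.strftime(fmt)}' AND time < '{end.strftime(fmt)}'"
    )

stmt = delete_statement("data", "2023-02-28")
print(stmt)
```

The half-open range (`>=` start, `<` next midnight) keeps the statement correct even if some points are later written with a time of day other than midnight.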

Thank you for your response rvdheij,

The unique key is always Time/Filename. Those are my metrics. I have no other fields in the measurement.

I can’t use the date to build the timestamp, as it is truncated to the day. You seem to have understood that.

I think I am not understanding your solution.

Can you please elaborate ?

My suggestion is to convert your date to a timestamp, for example midnight on that date. You would then write metrics like this:
data,file= obs=1

It’s a bit artificial since you don’t really have anything to measure and don’t really have time series (with multiple observations per series over time). It would be different if you were dealing with size= for example, and wanted to report the amount of data by month or so.

If this is all you need, then a generic NoSQL database like MongoDB would be a more obvious choice.
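For reference, the midnight-timestamp suggestion above could look roughly like this when building line protocol by hand. It is a sketch under assumptions: dates are interpreted as UTC, the timestamp uses nanosecond precision, and the `escape_tag` and `to_line` helpers and the file name are hypothetical.

```python
from datetime import date, datetime, timezone

def escape_tag(value):
    """Backslash-escape commas, equals signs and spaces in a tag value."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value

def to_line(filename, day_str):
    """Build one line-protocol line: filename as tag, date as midnight UTC."""
    day = date.fromisoformat(day_str)
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    ts_ns = int(midnight.timestamp()) * 1_000_000_000  # nanoseconds
    return f"data,file={escape_tag(filename)} obs=1 {ts_ns}"

line = to_line("report 01.csv", "2023-02-28")
print(line)  # data,file=report\ 01.csv obs=1 1677542400000000000
```

Each file then becomes its own series with a single point at midnight of its date, which is what makes a per-day time-range delete possible.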