Wrong max-series-per-database exceeded error?

Hi,

we are currently facing the problem that our Telegraf output shows the following errors:

2019-09-21T18:20:51Z E! [outputs.influxdb]: when writing to [http://10.10.xxx.xxx:8086]: received error partial write: max-series-per-database limit exceeded: (1000000) dropped=832; discarding points
2019-09-21T18:20:51Z E! [outputs.influxdb]: when writing to [http://10.10.xxx.xxx:8086]: received error partial write: max-series-per-database limit exceeded: (1000000) dropped=1000; discarding points
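
To get a feeling for how many points are being discarded, the error lines can be tallied with a few lines of Python. This is only a minimal sketch: the regular expression is modeled on the two log lines above, and other Telegraf or InfluxDB versions may word the message differently. The names `PATTERN` and `summarize_drops` are made up for this example.

```python
import re

# Pattern modeled on the two log lines quoted above (an assumption about the
# message format, not an official Telegraf log specification).
PATTERN = re.compile(
    r"max-series-per-database limit exceeded: "
    r"\((?P<limit>\d+)\) dropped=(?P<dropped>\d+)"
)

def summarize_drops(log_lines):
    """Return (limit, total_dropped); limit is None if no line matched."""
    limit = None
    total_dropped = 0
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            limit = int(match.group("limit"))
            total_dropped += int(match.group("dropped"))
    return limit, total_dropped

sample = [
    "2019-09-21T18:20:51Z E! [outputs.influxdb]: when writing to "
    "[http://10.10.xxx.xxx:8086]: received error partial write: "
    "max-series-per-database limit exceeded: (1000000) dropped=832; "
    "discarding points",
    "2019-09-21T18:20:51Z E! [outputs.influxdb]: when writing to "
    "[http://10.10.xxx.xxx:8086]: received error partial write: "
    "max-series-per-database limit exceeded: (1000000) dropped=1000; "
    "discarding points",
]

limit, dropped = summarize_drops(sample)
print(f"limit={limit} dropped={dropped}")
```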

So we checked the metric _internal.monitor -> database -> max(numSeries) in Chronograf, which shows 246k.

So we should be far below the limit of 1 million series per database. We have also reloaded the Telegraf config and restarted the Telegraf container, without success.

What we haven’t done yet is restart InfluxDB, but we don’t expect that this will help.

Any suggestions how to continue?

Best Regards,

Stephan Walter

How many fields do you have? There seem to be two ways of counting series. One counts unique combinations of measurement + tags, regardless of the number of fields. The other, used when checking cardinality limits, counts measurement + tags + field. So if your 246k series have more than 4 fields each on average (246k × 4 = 984k), you might be in the second case.

I find it disturbing that there are two definitions of what a series is, but as this is what appears in the docs, there were probably some very good reasons that led to this choice. Maybe someone from Influx can elaborate.
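
To make the difference between the two ways of counting concrete, here is a small Python sketch. It only illustrates the two definitions as described in the reply above; it does not claim to reproduce how InfluxDB counts internally. The sample points and the function name `count_series` are invented for the example.

```python
def count_series(points):
    """Count series two ways.

    points: iterable of (measurement, tags_dict, fields_dict).
    Returns (by_tagset, by_field):
      by_tagset = unique measurement + tag set combinations
      by_field  = unique measurement + tag set + field key combinations
    """
    by_tagset = set()
    by_field = set()
    for measurement, tags, fields in points:
        # Sort tags so that tag order does not create spurious duplicates.
        key = (measurement, tuple(sorted(tags.items())))
        by_tagset.add(key)
        for field_name in fields:
            by_field.add(key + (field_name,))
    return len(by_tagset), len(by_field)

# Invented sample data, loosely shaped like Telegraf cpu/mem metrics.
points = [
    ("cpu", {"host": "a", "cpu": "cpu0"}, {"usage_user": 1.0, "usage_system": 0.5}),
    ("cpu", {"host": "a", "cpu": "cpu1"}, {"usage_user": 2.0, "usage_system": 0.7}),
    ("mem", {"host": "a"}, {"used": 10, "free": 5, "cached": 1}),
    # Same series as the first point, just a later sample.
    ("cpu", {"host": "a", "cpu": "cpu0"}, {"usage_user": 1.5, "usage_system": 0.4}),
]

tagset_count, field_count = count_series(points)
print(tagset_count, field_count)  # 3 7
```

With only two or three fields per measurement the second count is already more than double the first, which is the effect the reply is pointing at.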

Hi,

thank you for this new detail. I wasn’t aware of it.

This database contains 19 measurements from the default Telegraf input plugins.

They have between 2 and more than 10 fields each, with lots of tags. So I would assume that we are far above 1 million if we count measurements + tags + fields.

Do you have any idea how to calculate the number of series according to this second definition?

We were not aware of this problem, so we may have missed it for quite some time, which we have to avoid in the future. It would therefore be great to have a Kapacitor-based alert that fires before this happens, so that we can react.
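
One way to at least look at the number InfluxDB itself reports is the InfluxQL statement `SHOW SERIES CARDINALITY`, sent to the 1.x `/query` HTTP endpoint. The Python sketch below is a starting point under several assumptions: the server is reachable without authentication, the base URL and database name are placeholders, and the helper names are invented. Whether the limit check uses this count or a per-field count is not settled in this thread, so treat the result as one data point rather than the answer.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def build_query_url(base_url, statement):
    """Build a GET URL for the InfluxDB 1.x /query endpoint."""
    return f"{base_url.rstrip('/')}/query?{urlencode({'q': statement})}"

def first_value(payload):
    """Return the first cell of the first series in a /query response, or None."""
    try:
        return payload["results"][0]["series"][0]["values"][0][0]
    except (KeyError, IndexError):
        return None

def series_cardinality(base_url, database):
    """Ask InfluxDB for the series cardinality of one database.

    Assumes no authentication; add credentials to the query string or
    headers if your server requires them.
    """
    statement = f'SHOW SERIES CARDINALITY ON "{database}"'
    with urlopen(build_query_url(base_url, statement), timeout=10) as resp:
        return first_value(json.load(resp))

if __name__ == "__main__":
    # Placeholder host and database name; adjust to your setup.
    print(series_cardinality("http://localhost:8086", "telegraf"))
```

`SHOW SERIES EXACT CARDINALITY` and `SHOW FIELD KEY CARDINALITY` can be sent the same way if you want to compare an exact count or count field keys.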

Best Regards,

Stephan

Ok, one more thing.

I have now checked the reported numSeries for the last 7 days. Until 5 October we had 402k series in the database with the problem. Then there was a drop to 159k series.

So I wonder how this could happen and how it fits with the possible explanation above.

We have now increased max-series-per-database by a factor of 10, and all error messages have gone.

Nevertheless, it is not clear to us how we could detect that we are running out of series again.

So it would be great to get some advice.
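
As a starting point for such a check, the comparison itself is simple: take the reported numSeries, divide by the configured limit, and warn before the ratio reaches 1. The Python sketch below shows that logic only. The 80% warning threshold and the function name `limit_status` are arbitrary choices for the example, and given the open question above about how series are counted, a low ratio may not guarantee that writes succeed.

```python
def limit_status(num_series, max_series, warn_ratio=0.8):
    """Classify how close a database is to max-series-per-database.

    A max_series of 0 means the limit is disabled in InfluxDB 1.x.
    warn_ratio is an arbitrary example threshold.
    """
    if max_series <= 0:
        return "unlimited"
    ratio = num_series / max_series
    if ratio >= 1.0:
        return "crit"
    if ratio >= warn_ratio:
        return "warn"
    return "ok"

# numSeries values taken from this thread, against the default limit.
for reported in (159_000, 246_000, 402_000):
    print(reported, limit_status(reported, 1_000_000))
```

All three values from this thread come out as "ok" against a limit of 1,000,000, which is exactly why the errors were surprising.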

Best Regards,

Stephan Walter

I have seen that there are quite old files at /var/lib/influxdb/data/telegraf/_series/

root@influxdb:/var/lib/influxdb/data# ls -hal telegraf/*/*/*
-rw-r--r-- 1 root root 4.0M Apr 29 14:06 telegraf/_series/00/0000
-rw-r--r-- 1 root root 8.0M May 8 05:43 telegraf/_series/00/0001
-rw-r--r-- 1 root root 16M Jun 3 16:21 telegraf/_series/00/0002
-rw-r--r-- 1 root root 32M Jul 16 12:58 telegraf/_series/00/0003
-rw-r--r-- 1 root root 64M Oct 8 20:08 telegraf/_series/00/0004
-rw-r--r-- 1 root root 8.1M Sep 11 05:45 telegraf/_series/00/index
-rw-r--r-- 1 root root 4.0M Apr 29 13:54 telegraf/_series/01/0000
-rw-r--r-- 1 root root 8.0M May 8 05:52 telegraf/_series/01/0001
-rw-r--r-- 1 root root 16M Jun 3 16:23 telegraf/_series/01/0002
-rw-r--r-- 1 root root 32M Jul 16 12:59 telegraf/_series/01/0003
-rw-r--r-- 1 root root 64M Oct 8 20:08 telegraf/_series/01/0004
-rw-r--r-- 1 root root 8.1M Sep 11 06:46 telegraf/_series/01/index

So is this maybe the source of our max-series-per-database problem?

In the thread "InfluxDB 1.7.4 fails after 9 months without issues" it was suggested to delete shards manually. So maybe the same applies to series?

The reason I ask is that we modified the retention policy after quite a long time.

Does nobody have anything to say about the series folder?

Should we run this utility to recreate the index?