"Points beyond retention policy" dropped despite infinite retention policy duration

Steps to reproduce:
I'm writing data to InfluxDB in bulk using the Python client library. When I push data for one specific period, the write fails with 400: {"error":"partial write: points beyond retention policy dropped=1"}
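For context, the bulk write looks roughly like this (a minimal sketch assuming the influxdb-python client; connection details and point values are placeholders, not my exact code):

from influxdb import InfluxDBClient

# Placeholder connection details for illustration.
client = InfluxDBClient(host='localhost', port=8086, database='mydatabase')

# Illustrative batch in line protocol; timestamps are in seconds.
points = [
    'my_measurement,SensorId=10045 kWh=54191.0 1568744999',
    'my_measurement,SensorId=10045 kWh=54192.0 1568745059',
]

# The bulk write fails with InfluxDBClientError:
#   400: {"error":"partial write: points beyond retention policy dropped=1"}
client.write_points(points, time_precision='s', protocol='line')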

I dug in a little to find out why this specific period triggers the error and found that some shard groups overlap (e.g. below, groups 447 and 321 both cover 2018-11-12 through 2018-11-19). Could that be the reason?

Writing the same data for the same period one point at a time works fine.

> show shards
name: mydatabase
id  database   retention_policy shard_group start_time           end_time             expiry_time          owners
--  --------   ---------------- ----------- ----------           --------             -----------          ------
447 mydatabase autogen          447         2018-10-22T00:00:00Z 2018-11-19T00:00:00Z 2018-11-19T00:00:00Z
321 mydatabase autogen          321         2018-11-12T00:00:00Z 2018-11-19T00:00:00Z 2018-11-19T00:00:00Z
449 mydatabase autogen          449         2018-11-19T00:00:00Z 2018-12-17T00:00:00Z 2018-12-17T00:00:00Z
67  mydatabase autogen          67          2018-12-24T00:00:00Z 2018-12-31T00:00:00Z 2018-12-31T00:00:00Z
3   mydatabase autogen          3           2018-12-31T00:00:00Z 2019-01-07T00:00:00Z 2019-01-07T00:00:00Z
453 mydatabase autogen          453         2018-12-17T00:00:00Z 2019-01-14T00:00:00Z 2019-01-14T00:00:00Z
4   mydatabase autogen          4           2019-01-07T00:00:00Z 2019-01-14T00:00:00Z 2019-01-14T00:00:00Z
6   mydatabase autogen          6           2019-01-14T00:00:00Z 2019-01-21T00:00:00Z 2019-01-21T00:00:00Z
8   mydatabase autogen          8           2019-01-21T00:00:00Z 2019-01-28T00:00:00Z 2019-01-28T00:00:00Z
10  mydatabase autogen          10          2019-01-28T00:00:00Z 2019-02-04T00:00:00Z 2019-02-04T00:00:00Z

I also checked the retention policy; its duration is set to infinite.

> show retention policies
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 0s       672h0m0s           1        true

I have changed the shardGroupDuration multiple times. Could that cause this issue?
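The changes were along these lines (a sketch; the exact durations varied). As far as I understand, an ALTER only affects shard groups created afterwards, so groups created under the old and new durations can end up covering overlapping spans, as in the SHOW SHARDS output above:

from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='mydatabase')

# Change the shard group duration on the default policy; existing
# shard groups keep their old boundaries.
client.query('ALTER RETENTION POLICY "autogen" ON "mydatabase" SHARD DURATION 4w')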

Expected behavior:
The bulk write should succeed for this period, just as the point-by-point writes do.

Actual behavior:
Points are dropped with a "beyond retention policy" error even though the retention policy duration is infinite.

Environment info:

  • System info: Linux 4.4.0-1087-aws x86_64
  • InfluxDB version: InfluxDB v1.7.6 (git: 1.7 01c8dd4)

I investigated a little more and found that if I push a batch containing two particular series, the write fails with the retention policy error:

my_measurement,SensorId=10045 kWh=54191.0 1568744999
my_measurement,SensorId=10048 kWh=82350600.0 1568658599

where SensorId is a tag.
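A sketch of the comparison (same placeholder client as above): the two points fail as one batch but succeed when written individually.

from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='mydatabase')

points = [
    'my_measurement,SensorId=10045 kWh=54191.0 1568744999',
    'my_measurement,SensorId=10048 kWh=82350600.0 1568658599',
]

# One batch: raises InfluxDBClientError
# ("points beyond retention policy dropped=1").
try:
    client.write_points(points, time_precision='s', protocol='line')
except Exception as e:
    print('bulk write failed:', e)

# One point per request: both writes succeed.
for p in points:
    client.write_points([p], time_precision='s', protocol='line')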

Further investigation: I queried data from different shards and used EXPLAIN to check how many series are scanned for a particular date range.

> SHOW SHARDS
.
.
658 mydb autogen          658         2019-07-22T00:00:00Z 2019-07-29T00:00:00Z 2020-07-27T00:00:00Z
676 mydb autogen          676         2019-07-29T00:00:00Z 2019-08-05T00:00:00Z 2020-08-03T00:00:00Z
.
.

Running EXPLAIN for a date range that falls entirely within shard 658 gives the expected result: SensorId is the only tag key, and since the range hits a single shard, the plan shows NUMBER OF SERIES: 1.

> EXPLAIN select "kWh" from Reading where (SensorId =~ /^1186$/) AND time >= '2019-07-27 00:00:00' AND time <= '2019-07-28 00:00:00' limit 10;
QUERY PLAN
----------
EXPRESSION: <nil>
AUXILIARY FIELDS: "kWh"::float
NUMBER OF SHARDS: 1
NUMBER OF SERIES: 1
CACHED VALUES: 0
NUMBER OF FILES: 2
NUMBER OF BLOCKS: 4
SIZE OF BLOCKS: 32482

But when I run the same query on a date range that falls into shard 676, the number of series is 13140 instead of just one.

> EXPLAIN select "kWh" from Reading where (SensorId =~ /^1186$/) AND time >= '2019-07-29 00:00:00' AND time < '2019-07-30 00:00:00';
QUERY PLAN
----------
EXPRESSION: <nil>
AUXILIARY FIELDS: "kWh"::float
NUMBER OF SHARDS: 1
NUMBER OF SERIES: 13140
CACHED VALUES: 0
NUMBER OF FILES: 11426
NUMBER OF BLOCKS: 23561
SIZE OF BLOCKS: 108031642

Any help would be appreciated.

I found that the shard has "SensorId" as a tag and also as a field, and that is what causes the high cardinality when filtering on "SensorId".

> SELECT COUNT("SensorId") from Reading GROUP BY "SensorId";
name: Reading
tags: SensorId=
time                 count
----                 -----
1970-01-01T00:00:00Z 40

But when I check the tag values for the key "SensorId", the empty value that appears in the query above is not listed:

> show tag values with key = "SensorId"
name: Reading
key      value
---      -----
SensorId 10034
SensorId 10037
SensorId 10038
SensorId 10039
SensorId 10040
SensorId 10041
.
.
.
SensorId 9938
SensorId 9939
SensorId 9941
SensorId 9942
SensorId 9944
SensorId 9949
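
A quick way to confirm the collision is to intersect the tag and field key listings (a sketch, assuming the influxdb-python client):

from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='mydb')

tag_keys = {p['tagKey'] for p in
            client.query('SHOW TAG KEYS FROM "Reading"').get_points()}
field_keys = {p['fieldKey'] for p in
              client.query('SHOW FIELD KEYS FROM "Reading"').get_points()}

# Any key in both sets exists as a tag and as a field;
# here it prints {'SensorId'}.
print(tag_keys & field_keys)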

I inspected the data using influx_inspect dumptsm and re-validated that series without the SensorId tag are present. Note the index entries whose key is just Reading, including one where SensorId itself is a field:

$ influx_inspect dumptsm -index -filter-key "" /var/lib/influxdb/data/mydb/autogen/235/000008442-000000013.tsm

Index:

  Pos   Min Time                Max Time                Ofs     Size    Key                     Field
  1     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    5       103     Reading                 1001
  2     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    108     275     Reading                 2001
  3     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    383     248     Reading                 2002
  4     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    631     278     Reading                 2003
  5     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    909     278     Reading                 2004
  6     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    1187    184     Reading                 2005
  7     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    1371    103     Reading                 2006
  8     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    1474    250     Reading                 2007
  9     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    1724    103     Reading                 2008
  10    2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    1827    275     Reading                 2012
  11    2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    2102    416     Reading                 2101
  12    2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    2518    103     Reading                 2692
  13    2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    2621    101     Reading                 SensorId
  14    2019-07-29T00:00:05Z    2019-07-29T05:31:07Z    2722    1569    Reading,SensorId=10034  2005
  15    2019-07-29T05:31:26Z    2019-07-29T11:03:54Z    4291    1467    Reading,SensorId=10034  2005
  16    2019-07-29T11:04:14Z    2019-07-29T17:10:16Z    5758    1785    Reading,SensorId=10034  2005
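
For completeness, my guess at how the collision was introduced: the same key written as a tag in some points and as a field in others. A hypothetical pair of writes consistent with the index entries above (field values are made up):

from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='mydb')

# SensorId as a tag: series key is "Reading,SensorId=10034".
client.write_points(['Reading,SensorId=10034 2005=1.5 1564358405'],
                    time_precision='s', protocol='line')

# SensorId as a field: the series key is just "Reading", and SensorId
# appears as a field next to "2005" - matching the untagged index
# entries in the dumptsm output above.
client.write_points(['Reading 2005=1.5,SensorId=10034 1564623991'],
                    time_precision='s', protocol='line')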