Steps to reproduce:
I'm writing data to InfluxDB in bulk using the Python client library. If I push data for certain specific periods, the write fails with 400: {"error":"partial write: points beyond retention policy dropped=1"}
I dug in a little to find out why it fails for those specific periods and found that some shard groups overlap. Could that be the reason?
Writing the same data for the same time range one point at a time works fine.
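For context, here is a minimal sketch of the write path (the influxdb Python client is assumed and the connection details are hypothetical; the two points are the ones detailed further down in this report):

from influxdb import InfluxDBClient

# Hypothetical connection details for illustration.
client = InfluxDBClient(host='localhost', port=8086, database='mydatabase')

# Line protocol points with epoch-second timestamps (see below).
lines = [
    'my_measurement,SensorId=10045 kWh=54191.0 1568744999',
    'my_measurement,SensorId=10048 kWh=82350600.0 1568658599',
]

# Bulk write: this call fails with
# 400: {"error":"partial write: points beyond retention policy dropped=1"}
client.write_points(lines, time_precision='s', protocol='line')

# Writing the same points one at a time works.
for line in lines:
    client.write_points([line], time_precision='s', protocol='line')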
> show shards
name: mydatabase
id database retention_policy shard_group start_time end_time expiry_time owners
-- -------- ---------------- ----------- ---------- -------- ----------- ------
447 mydatabase autogen 447 2018-10-22T00:00:00Z 2018-11-19T00:00:00Z 2018-11-19T00:00:00Z
321 mydatabase autogen 321 2018-11-12T00:00:00Z 2018-11-19T00:00:00Z 2018-11-19T00:00:00Z
449 mydatabase autogen 449 2018-11-19T00:00:00Z 2018-12-17T00:00:00Z 2018-12-17T00:00:00Z
67 mydatabase autogen 67 2018-12-24T00:00:00Z 2018-12-31T00:00:00Z 2018-12-31T00:00:00Z
3 mydatabase autogen 3 2018-12-31T00:00:00Z 2019-01-07T00:00:00Z 2019-01-07T00:00:00Z
453 mydatabase autogen 453 2018-12-17T00:00:00Z 2019-01-14T00:00:00Z 2019-01-14T00:00:00Z
4 mydatabase autogen 4 2019-01-07T00:00:00Z 2019-01-14T00:00:00Z 2019-01-14T00:00:00Z
6 mydatabase autogen 6 2019-01-14T00:00:00Z 2019-01-21T00:00:00Z 2019-01-21T00:00:00Z
8 mydatabase autogen 8 2019-01-21T00:00:00Z 2019-01-28T00:00:00Z 2019-01-28T00:00:00Z
10 mydatabase autogen 10 2019-01-28T00:00:00Z 2019-02-04T00:00:00Z 2019-02-04T00:00:00Z
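As a quick sanity check on the overlap, the shard group time ranges from the listing above can be compared directly (plain Python; the tuples are copied from the output):

from datetime import datetime

# (shard id, start_time, end_time) copied from the SHOW SHARDS output above.
groups = [
    (447, '2018-10-22T00:00:00Z', '2018-11-19T00:00:00Z'),
    (321, '2018-11-12T00:00:00Z', '2018-11-19T00:00:00Z'),
    (449, '2018-11-19T00:00:00Z', '2018-12-17T00:00:00Z'),
    (453, '2018-12-17T00:00:00Z', '2019-01-14T00:00:00Z'),
    (67,  '2018-12-24T00:00:00Z', '2018-12-31T00:00:00Z'),
    (3,   '2018-12-31T00:00:00Z', '2019-01-07T00:00:00Z'),
    (4,   '2019-01-07T00:00:00Z', '2019-01-14T00:00:00Z'),
]

def parse(ts):
    return datetime.strptime(ts, '%Y-%m-%dT%H:%M:%SZ')

# Print every pair of shard groups whose [start, end) ranges intersect,
# e.g. 447/321 and 453/67, 453/3, 453/4 in the listing above.
for i, (id_a, start_a, end_a) in enumerate(groups):
    for id_b, start_b, end_b in groups[i + 1:]:
        if parse(start_a) < parse(end_b) and parse(start_b) < parse(end_a):
            print(f'shard groups {id_a} and {id_b} overlap')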
I also checked the retention policy, which is set to an infinite duration:
> show retention policies
name duration shardGroupDuration replicaN default
---- -------- ------------------ -------- -------
autogen 0s 672h0m0s 1 true
I have changed shardGroupDuration multiple times. Could that cause this issue?
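For reference, those changes were made with statements of roughly this form (a sketch; the 168h shard duration is only an example value):

from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='mydatabase')

# Change only the shard group duration of the default autogen policy,
# keeping the infinite duration and replication factor shown above.
client.query(
    'ALTER RETENTION POLICY "autogen" ON "mydatabase" '
    'DURATION INF REPLICATION 1 SHARD DURATION 168h'
)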
Expected behavior:
Data should be written successfully in bulk (chunks) as well.
Actual behavior:
Points are dropped with a retention policy error.
Environment info:
- System info: Linux 4.4.0-1087-aws x86_64
- InfluxDB version: InfluxDB v1.7.6 (git: 1.7 01c8dd4)
I investigated a little more and found that if I push a bulk write containing two specific series, the points are dropped with the retention policy error.
my_measurement,SensorId=10045 kWh=54191.0 1568744999
my_measurement,SensorId=10048 kWh=82350600.0 1568658599
where SensorId is a tag.
Further investigation:
I queried data from different shards and used EXPLAIN to check how many series are scanned for a particular date range.
> SHOW SHARDS
.
.
658 mydb autogen 658 2019-07-22T00:00:00Z 2019-07-29T00:00:00Z 2020-07-27T00:00:00Z
676 mydb autogen 676 2019-07-29T00:00:00Z 2019-08-05T00:00:00Z 2020-08-03T00:00:00Z
.
.
Executing EXPLAIN for data from shard 658 gives the expected result in terms of number of series: SensorId is the only tag key, and since the date range falls entirely within that one shard, it reports NUMBER OF SERIES: 1.
> EXPLAIN select "kWh" from Reading where (SensorId =~ /^1186$/) AND time >= '2019-07-27 00:00:00' AND time <= '2019-07-28 00:00:00' limit 10;
QUERY PLAN
----------
EXPRESSION: <nil>
AUXILIARY FIELDS: "kWh"::float
NUMBER OF SHARDS: 1
NUMBER OF SERIES: 1
CACHED VALUES: 0
NUMBER OF FILES: 2
NUMBER OF BLOCKS: 4
SIZE OF BLOCKS: 32482
But when I run the same query on a date range that falls into shard 676, the number of series is 13140 instead of just one.
> EXPLAIN select "kWh" from Reading where (SensorId =~ /^1186$/) AND time >= '2019-07-29 00:00:00' AND time < '2019-07-30 00:00:00';
QUERY PLAN
----------
EXPRESSION: <nil>
AUXILIARY FIELDS: "kWh"::float
NUMBER OF SHARDS: 1
NUMBER OF SERIES: 13140
CACHED VALUES: 0
NUMBER OF FILES: 11426
NUMBER OF BLOCKS: 23561
SIZE OF BLOCKS: 108031642
Any help would be appreciated.
I found out that the shard has "SensorId" both as a tag and as a field, which causes the high cardinality when querying with a "SensorId" filter.
> SELECT COUNT("SensorId") from Reading GROUP BY "SensorId";
name: Reading
tags: SensorId=
time count
---- -----
1970-01-01T00:00:00Z 40
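One way to confirm the tag/field name collision is to list the tag keys and field keys of the measurement and check that "SensorId" appears in both (a sketch using the Python client; SHOW TAG KEYS and SHOW FIELD KEYS are standard InfluxQL):

from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='mydb')

# "SensorId" appearing in both listings means the same name is used as a tag
# and as a field, which produces the extra "empty tag" series counted above.
print(list(client.query('SHOW TAG KEYS FROM "Reading"').get_points()))
print(list(client.query('SHOW FIELD KEYS FROM "Reading"').get_points()))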
But when I check tag values with key "SensorId", the empty string that appears in the COUNT query above is not listed.
> show tag values with key = "SensorId"
name: Reading
key value
--- -----
SensorId 10034
SensorId 10037
SensorId 10038
SensorId 10039
SensorId 10040
SensorId 10041
.
.
.
SensorId 9938
SensorId 9939
SensorId 9941
SensorId 9942
SensorId 9944
SensorId 9949
I inspected the data using influx_inspect dumptsm and re-validated that entries with a null SensorId tag value are present:
$ influx_inspect dumptsm -index -filter-key "" /var/lib/influxdb/data/mydb/autogen/235/000008442-000000013.tsm
Index:
Pos Min Time Max Time Ofs Size Key Field
1 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 5 103 Reading 1001
2 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 108 275 Reading 2001
3 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 383 248 Reading 2002
4 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 631 278 Reading 2003
5 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 909 278 Reading 2004
6 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1187 184 Reading 2005
7 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1371 103 Reading 2006
8 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1474 250 Reading 2007
9 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1724 103 Reading 2008
10 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1827 275 Reading 2012
11 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 2102 416 Reading 2101
12 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 2518 103 Reading 2692
13 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 2621 101 Reading SensorId
14 2019-07-29T00:00:05Z 2019-07-29T05:31:07Z 2722 1569 Reading,SensorId=10034 2005
15 2019-07-29T05:31:26Z 2019-07-29T11:03:54Z 4291 1467 Reading,SensorId=10034 2005
16 2019-07-29T11:04:14Z 2019-07-29T17:10:16Z 5758 1785 Reading,SensorId=10034 2005
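For completeness: the tag-less entries at the top of the index (Pos 1-13), including a "SensorId" field, are the shape that would result from points written with SensorId in the field set rather than the tag set. Purely as an illustration of the two line protocol shapes (field values and timestamp are made up), not a confirmed root cause:

# Intended shape: SensorId as a tag (before the space), fields after it.
good_line = 'Reading,SensorId=10034 2005=1.0 1564358405'

# Problematic shape: no tags at all, SensorId written as a field instead.
bad_line = 'Reading SensorId=10034,2005=1.0 1564358405'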