dandv
July 16, 2019, 8:54pm
1
I’m trying to migrate data in order to perform some updates that are impossible in InflxuDB (e.g. convert tag to field, rename field, delete tag ). To do that, I’m using SELECT … LIMIT with increasing OFFSETs. However, that gets slower and slower (from 0 seconds to 4 minutes when the offset is around 30M):
opened 08:13AM - 12 Jan 17 UTC
closed 05:04AM - 31 Jan 20 UTC
area/queries
area/performance
1.x
wontfix
### Bug report
__System info:__
InfluxDB 1.0.2, Amazon Linux AMI on EC2
…
__Steps to reproduce:__
1. Start with a clean database
2. Insert an hours worth of second resolution data
3. Query 1000 data points with offset 0 and the start time equal to the start time of the data
4. Query the same 1000 data points, but with a start time a few weeks or months earlier and offset equal to the number of seconds difference between the new start time and the start time in 3 above
__Expected behavior:__
I expect the above offset to have minimal impact on query performance on an empty DB.
I could sort of understand slower performance on the second query on a full DB, but in this case the DB has no data other than the hour being queried for.
__Actual behavior:__
In my case, the first query responds in 200ms (over http).
The second query, however, responds in 13932ms with exactly the same result.
Worse, when the offset gets high enough, influxd
crashes:
opened 01:02AM - 17 Jul 19 UTC
closed 08:41PM - 24 Oct 19 UTC
1.x
wontfix
area/memory
__Steps to reproduce:__
1. Have a measurement with over 40M points
2. SELECT… * FROM measurement LIMIT 1 OFFSET 35000000
__Expected behavior:__
The query returns within a reasonable time (seconds at most), and the server doesn't crash.
__Actual behavior:__
Memory usage spikes to See #7826.
__Environment info:__
* System info: `Linux 4.15.0-54-generic x86_64`
* InfluxDB version: v1.7.7 (git: 1.7 f8fdf652f348fc9980997fe1c972e2b79ddd13b0)
* Other relevant environment details: data and meta on XFS volume
__Config:__
```
bind-address = "127.0.0.1:8188"
[meta]
dir = "/mnt/influxdb-old/meta"
[data]
dir = "/mnt/influxdb-old/data"
wal-dir = "/var/lib/influxdb/wal"
[http]
bind-address = "localhost:8100"
```
__Logs:__
I'm running InfluxDB with `influxd -config /path/to/the/config/above.conf`, and I only see `Killed` when it crashes.
[iostat.txt](https://github.com/influxdata/influxdb/files/3399616/iostat.txt)
[profiles.tar.gz](https://github.com/influxdata/influxdb/files/3399617/profiles.tar.gz)
[vars.txt](https://github.com/influxdata/influxdb/files/3399618/vars.txt)
Is there a workaround? Other than SELECT INTO
, which ended up corrupting my source measurement .
dandv
July 17, 2019, 1:18am
2
One workaround is to not use OFFSET, but add a WHERE time > ${timeOffset}
clause, and set timeOffset
to the last read value. It’s worked well for some series, reducing the query time to ~3 seconds, but for others (involving a WHERE foo='bar' OR foo='baz' AND time > ${timeOffset}
clause), the query time is still ~3 minutes.
brtknr
August 7, 2019, 10:16am
3
This suggestion worked for me too! Many thanks for the suggestion! Although it would be nice if the OFFSET had worked too.