Sorry if this is me being impatient, I’m new to influx. I have a database with roughly 80 million points in there, I have the query " SELECT * FROM /.*/ WHERE time > ‘2014-12-19T19:00:00Z’ " which I understand will return a large amount of data, but in the relational database (PostgreSQL) the data is also stored in it is currently stored in, the query execution time is roughly 25-30 mins. Currently InfluxDB is still running the query after 3 hours.
SYSTEM:
Processor: Intel® Corm i7-3517U CPU @ 1.90GHz 2.40GHZ
RAM: 8GB
Is this query time unusual? If so would it be possible I’m running something incorrectly / I can provide some more logs / data about my system for some advice ?
Cheers guys !
@danstenson This has the sound of a bug but there are a couple of factors that slow this query down:
- The data in InfluxDB is sharded by time. The default shard duration is 1 week. In your case, the query goes back over many one week periods. This causes the query to run over many different shards if the data is evenly spread out. This opens many files and uses an unusual amount of RAM.
- Older shards are considered
cold
. This is one of the assumptions we made while building the database. These old shards get compacted to save storage space. De-compaction to serve queries is expensive.
One question. How many field values do you have for those 80 Million points?
Thanks for the response! @jackzampolin !
I have roughly 5 fields per point, is it possible to change the default shard duration ?
Thanks
Dan Stenson
@danstenson Yup! When creating your database you can specify a retention policy as well:
> CREATE DATABASE "sensors" WITH DURATION 0d SHARD DURATION <duration> NAME "autogen"
You could also use ALTER RETENTION POLICY
to modify an existing retention policy, but the changes will only affect new shards. The best way to do this would be to recreate the database and reload the data in to the new DB.RP.