Out-of-memory on backup + queries

Hi,

I hope you enjoyed your holidays. Good to have you back.
In reply to your questions:

  • General structure is as described in the linked thread:
    InfluxDB evaluation scenario - memory/performance issue - #3 by qootec, which you closed in favour of the current one.

    • So, two measurements:
      • First: 1 tag (single value) and 1300 fields
      • Second: 1 tag (single value) and 15 fields
        The rest of the properties are kept the same throughout the experiments, as described in that other thread.
    • The 1300 fields are a lot, but their content is sparse (the first sketch below this list shows what such writes look like).
      • For a single timestamp, sometimes only one field has a value, sometimes 10, sometimes 1000…
      • Some fields get a value almost every 50ms, others only once a day or less often.
    • Each field reflects one property of a given system, so it does not feel natural to split them across different measurements.
  • You are (understandably) wondering about my intentions with this query “select last(*)…”

    • As described above, some fields don’t even have daily values.
      That happens because we log data values only “on change”, and some values simply don’t change that often.
      Several operational queries still need their value, though.
      If, for instance, I query some aggregate information over today and an involved field had no change today, the query will not yield any value for that field.
    • I’ve been looking at some other posts about “last value outside of the time range”, but the answer seems to be that Influx has no way to obtain that.
    • As a workaround, I walk over my database, inserting the last value of each hour at hour+5ms (a sketch of that script follows below this list).
      That way, my data contains at least one value for every field, every hour. We could live with that.
    • You noticed that I was running this kind of query every ~4 seconds; that is because each query took about that long and I simply schedule them to run sequentially.
      After a query finishes, I write one timestamp with 1300 fields at hour+5ms and start over for the next hour.
    • The final intention is to run that process once daily (e.g. at midnight+1h) for every hour of the previous day, to compensate for late-arriving data.
  • I have been playing with the shard duration (see the last sketch below).
    One database uses daily shards with a one-year retention (but only about 50 days of data have been captured so far).
    The other uses weekly shards, also with data covering about two months.
    Total data size is currently about 2GB (the entire Influx data directory).
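
To make the sparse layout concrete, here is a minimal sketch of the on-change write pattern, assuming the influxdb-python client. The measurement, tag and field names are made up for illustration; the real measurement has 1 tag and ~1300 fields.

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="mydb")

# Each point only carries the fields whose value actually changed at that
# timestamp, so most of the ~1300 fields are absent most of the time.
points = [
    {
        "measurement": "system_wide",          # hypothetical name
        "tags": {"system": "sys01"},           # the single tag
        "time": 1700000000000,                 # epoch ms
        "fields": {"temperature_012": 21.7},   # only one field changed here
    },
    {
        "measurement": "system_wide",
        "tags": {"system": "sys01"},
        "time": 1700000000050,                 # 50 ms later
        "fields": {"temperature_012": 21.8, "pressure_007": 1.02},
    },
]
client.write_points(points, time_precision="ms")
```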
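
And this is roughly what the hourly backfill looks like. It is a sketch only, assuming that the result should land at the end of the queried hour + 5ms; in InfluxQL 1.x, wildcard aggregates like last(*) return their columns prefixed with last_. All names are again hypothetical.

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="mydb")

def backfill_hour(hour_start_ms):
    """Re-insert the last value of every field seen in [hour_start, hour_end)
    at hour_end + 5ms, so the next hour holds a value for each field."""
    hour_end_ms = hour_start_ms + 3600 * 1000
    q = ("SELECT last(*) FROM system_wide "
         "WHERE time >= {0}ms AND time < {1}ms".format(hour_start_ms, hour_end_ms))
    result = client.query(q, epoch="ms")
    for row in result.get_points():
        # Wildcard aggregates name their columns last_<field>; strip the
        # prefix and drop fields that had no value at all in this hour.
        fields = {k[len("last_"):]: v for k, v in row.items()
                  if k.startswith("last_") and v is not None}
        if fields:
            client.write_points([{
                "measurement": "system_wide",
                "tags": {"system": "sys01"},
                "time": hour_end_ms + 5,   # hour+5ms
                "fields": fields,
            }], time_precision="ms")
```

Because the backfilled point lands 5ms inside the next hour, the next run's last(*) picks it up again, so each value keeps propagating forward even through hours without any real change.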
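
For reference, the two shard layouts correspond to retention policies roughly like these (database/policy names and retention durations are illustrative; CREATE RETENTION POLICY … SHARD DURATION is standard InfluxQL 1.x):

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086)

# Database 1: one-year retention with daily shard groups.
client.query('CREATE RETENTION POLICY "rp_daily" ON "db_daily" '
             'DURATION 52w REPLICATION 1 SHARD DURATION 1d DEFAULT')

# Database 2: weekly shard groups.
client.query('CREATE RETENTION POLICY "rp_weekly" ON "db_weekly" '
             'DURATION 52w REPLICATION 1 SHARD DURATION 1w DEFAULT')
```

As far as I understand, daily shards over a one-year retention means up to ~365 shard groups, each with its own index/file overhead, which may be relevant to the memory behaviour I'm seeing.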

Best regards,
Johan