I’d like to do a sanity check on a server setup I have in mind.
The context:
We have an Influx instance running in AWS. This is handling a modest and consistent write volume, of around 10-20 writes/sec. The query volume, on the other hand, is both quite high and quite uneven. The run-rate during periods of moderately heavy use is >50 queries/sec, and sometimes the instance gets hit by >1,000 queries within seconds.
The problem:
Currently the server is running on an EC2 m5.xlarge instance with 32GB RAM. During periods of high query activity, we’ve started experiencing out-of-memory issues.
Solutions that don’t seem quite right:
We could of course keep upgrading to larger and larger EC2 instance sizes, but that doesn’t seem like the right long-term solution - especially since at night, for example, the query load is 1/10th of the daily peaks, so we’d have massively over-provisioned capacity. Switching to an HA setup with Influx Enterprise/InfluxCloud may be an option (if we can afford it), but it also doesn’t seem like the best fit for simply dealing with heavy/variable query loads.
Solution we have in mind:
It seems that Influx (reluctantly?) supports keeping its data volume on a network drive. So we could in principle have an AWS Elastic File System that contains the influxdata partition, attached to our main EC2 instance over NFSv4. That in theory opens up the possibility of spinning up additional instances (or containers) to which this same volume is also attached over NFS. Let’s call these “query nodes”. Now, clearly they shouldn’t have write access to the influxdata volume, but perhaps if the query nodes have read-only mounts, they could still execute queries against the data?
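To make the idea concrete, here is a rough sketch of how I imagine the two kinds of mounts would differ. It only builds the mount commands; it says nothing about whether Influx will actually start on a read-only data directory, which is the open question. The file system DNS name and the mount point are placeholders I made up, and the NFS options are the ones I understand AWS to recommend for EFS, with `ro`/`rw` added on top.

```python
# Sketch only: builds the NFS mount command for the main (writer) node
# and for the hypothetical read-only "query nodes".

# Placeholder values, not from a real deployment.
EFS_DNS_NAME = "fs-XXXXXXXX.efs.us-east-1.amazonaws.com"
MOUNT_POINT = "/var/lib/influxdb"

# NFS options as I understand AWS recommends them for EFS.
EFS_BASE_OPTS = [
    "nfsvers=4.1",
    "rsize=1048576",
    "wsize=1048576",
    "hard",
    "timeo=600",
    "retrans=2",
    "noresvport",
]


def mount_command(efs_dns_name, mount_point, read_only):
    """Return the mount command as an argument list.

    read_only=True appends "ro" (query nodes); otherwise "rw" (main node).
    """
    opts = list(EFS_BASE_OPTS)
    opts.append("ro" if read_only else "rw")
    return ["mount", "-t", "nfs4", "-o", ",".join(opts),
            f"{efs_dns_name}:/", mount_point]


if __name__ == "__main__":
    # Main node: the only writer.
    print(" ".join(mount_command(EFS_DNS_NAME, MOUNT_POINT, read_only=False)))
    # Query nodes: same volume, mounted read-only.
    print(" ".join(mount_command(EFS_DNS_NAME, MOUNT_POINT, read_only=True)))
```

The only difference between the two node types would be the final mount option, so the query nodes could in principle be stamped out from a single image or container definition.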
I appreciate that there are some caveats here, e.g. the query nodes not having access to the latest data, which the main node may be holding in memory and not yet committed to the data volume. That’s probably ok.
But primarily, before embarking on a trial-and-error exercise, I wanted to get an idea of whether such a setup is possible, or whether the idea is fundamentally flawed – e.g. if a read-only mount is simply not an option.
Any other ideas for how to deal with our situation would also be welcome! (“find a way to reduce the query volume” is of course on our radar already)