Sorry about the back and forth, but we can’t read a profile without knowing the software version that you are running.
Offhand, I can think of a few reasons for seeing the server slow down.
- You may be low on physical memory, so the process swaps or can’t maintain an efficient cache of hot data. You can check this by looking at available memory at the process and OS level.
- You may be running queries whose cost scales with the number of series being read; as you read more series, the queries grow slower. Running the queries manually and timing them would show their baseline performance.
- You may be writing data in a way that causes adverse compaction behavior (for example, backfilling with new data, or overwriting/updating existing data that has already been compacted).
- You may be sending data inefficiently, for example by writing a lot of old points or by writing only a few points per batch.
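On the batching point, here is a minimal sketch of sending points in larger batches through the 1.x /write endpoint. The INFLUX_URL variable, the database name, the file paths, and the batch size are all placeholders I made up for illustration, so adjust them to your setup.

```shell
#!/bin/sh
# Sketch: send line-protocol points in large batches rather than a few per request.
# Assumptions (not from this thread): INFLUX_URL, the database name, the input
# file, and the batch size are illustrative placeholders.
INFLUX_URL="${INFLUX_URL:-http://localhost:8086}"

# Split line-protocol file $1 into chunks of at most $2 lines under directory $3.
split_batches() {
    mkdir -p "$3" && split -l "$2" "$1" "$3/batch_"
}

# POST every chunk in directory $2 to database $1, one HTTP request per chunk.
write_batches() {
    for f in "$2"/batch_*; do
        curl -sS -XPOST "$INFLUX_URL/write?db=$1" --data-binary @"$f" || return 1
    done
}
```

Something like split_batches points.txt 5000 /tmp/batches followed by write_batches mydb /tmp/batches would then send one request per 5000 lines. A few thousand points per request is only a starting point to tune from.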
Also of note is that InfluxDB will do file I/O per database, per retention policy. If you have multiple hot databases, this can be expensive.
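To time a query by hand, as suggested in the list above, something along these lines should work against the 1.x /query endpoint. It is a rough sketch: INFLUX_URL and the time_query helper are names I invented, and the timing comes from curl's own time_total value, so it includes network and response transfer time.

```shell
#!/bin/sh
# Sketch: time a single InfluxQL query over the 1.x HTTP API.
# Assumption: INFLUX_URL and time_query are illustrative names, not InfluxDB tooling.
INFLUX_URL="${INFLUX_URL:-http://localhost:8086}"

# Print the total request time in seconds for query $2 against database $1.
# The response body is discarded; only curl's time_total is written out.
time_query() {
    curl -sS -G -o /dev/null -w '%{time_total}\n' \
        "$INFLUX_URL/query" \
        --data-urlencode "db=$1" \
        --data-urlencode "q=$2"
}
```

For example, time_query mydb 'SELECT count(*) FROM cpu WHERE time > now() - 1h' (database and measurement names are hypothetical). Running the same query over narrower and wider sets of series should show whether the time grows with the number of series read.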
Filling in this bug report template (from the influxdb issues on GitHub) would help us figure out what’s happening:
System info: [Include InfluxDB version, operating system name, and other relevant details]
Steps to reproduce:
- [First Step]
- [Second Step]
- [and so on…]
Expected behavior: [What you expected to happen]
Actual behavior: [What actually happened]
Additional info: [Include gist of relevant config, logs, etc.]
Also, if this is an issue with performance, locking, etc., the following commands are useful for creating debug information for the team.
curl -o profiles.tar.gz "http://localhost:8086/debug/pprof/all?cpu=true"
curl -o vars.txt "http://localhost:8086/debug/vars"
iostat -xd 1 30 > iostat.txt
Please note that it will take at least 30 seconds for the first cURL command above to return a response.
This is because it runs a CPU profile as part of its information gathering, and that profile takes 30 seconds to collect.
Ideally you should run these commands when you’re experiencing problems, so we can capture the state of the system at that time.
If you’re concerned about running a CPU profile (which only has a small, temporary impact on performance), then you can set ?cpu=false or omit the cpu parameter altogether.
Please run those if possible and link them from a gist or simply attach them as a comment to the issue.