I am trying to find documentation for InfluxDB’s _internal database, but I can’t find any (and it’s driving me crazy…). I especially want to monitor the performance of the InfluxDB instance I am currently using, but I can’t find the unit or the meaning of each variable. What I want to collect is the number of queries and writes per second, the size of the database over time, the response time of query and write requests over time, and so on.
Does anyone know if this documentation exists, or if I can monitor the performance of InfluxDB in another way?
Thank you in advance !
There is currently an open issue on the docs repo (#758) related to the documentation of these stats. As of now they’re still undocumented, although you can usually parse the units and meaning of each measurement based on the measurement and field names.
That issue also links to an older 1.0 documentation page, which is most likely out of date at this point.
One thing to consider, though, is that you don’t want to have the _internal database enabled on production instances; you don’t want to monitor production with production. In most cases, customers will have a separate instance of InfluxDB dedicated to monitoring their main production instance using the Telegraf influxdb input plugin.
Let’s take a look at what data we’re getting. I’m assuming you’re accepting writes and queries over HTTP, and haven’t enabled InfluxDB’s other inputs. If that’s the case, we can look at the httpd measurement, which records statistics about InfluxDB’s internal httpd server, to get information about queries and writes. The httpd measurement has a number of fields. Requests are broken down into queryReq and writeReq, which are counters that provide the number of queries and writes, respectively, since the application was started.
To calculate the number of queries or writes per second, you will want to use the InfluxQL DERIVATIVE() function with the queryReq and writeReq counters, using a format that looks something like this:
SELECT derivative(first(value), 1s) FROM foo WHERE time > now() - 1h GROUP BY time(10s)
This returns the first value for each 10s interval over the last hour. From these results, it calculates the derivative between each 10s bucket and converts it to a 1s rate.
fill(0) is needed because the first group by query would not return all the data required to calculate a derivative for all the buckets. The derivative currently exits early if it’s missing some data, which is why you were only getting one point back with 0. derivative should probably be changed to assume 0 for missing data values though.
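To make the arithmetic behind that kind of query concrete, here is a minimal Python sketch of what taking the first value per 10s bucket and converting the difference to a 1s rate works out to. The function name and the sample data are made up for illustration, and it only approximates InfluxQL’s behaviour: buckets with no data are simply skipped here.

```python
def per_second_rates(samples, bucket_s=10):
    """Approximate derivative(first(value), 1s) ... GROUP BY time(10s).

    samples: list of (unix_seconds, counter_value) pairs.
    Returns a list of (bucket_start, rate_per_second) pairs.
    """
    # Keep only the first sample seen in each time bucket.
    firsts = {}
    for ts, value in sorted(samples):
        bucket_start = ts - ts % bucket_s
        firsts.setdefault(bucket_start, value)

    # Difference between neighbouring buckets, divided by elapsed seconds.
    buckets = sorted(firsts.items())
    rates = []
    for (t0, v0), (t1, v1) in zip(buckets, buckets[1:]):
        rates.append((t1, (v1 - v0) / (t1 - t0)))
    return rates


# Hypothetical writeReq counter readings: (seconds, total writes so far).
readings = [(0, 100), (5, 130), (10, 150), (20, 250)]
print(per_second_rates(readings))
```

Because the counters restart from zero when InfluxDB restarts, a rate computed this way goes negative across a restart. InfluxQL also has NON_NEGATIVE_DERIVATIVE(), which may be the better fit if that matters to you.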
The total request durations for queries and writes are available in the queryReqDurationNs and writeReqDurationNs fields, in nanoseconds, but unfortunately these are cumulative totals across all requests served, and so they probably won’t be much help.
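That said, one rough way to get something usable out of the cumulative totals is to difference two samples: the change in total duration divided by the change in request count gives a mean duration per request over that interval. The Python sketch below shows the arithmetic only; the function name and the sample numbers are hypothetical, and I haven’t verified this against a live instance.

```python
def mean_request_ms(prev, curr, count_field="writeReq",
                    duration_field="writeReqDurationNs"):
    """Mean request duration in milliseconds between two httpd samples.

    prev and curr are dicts of field values taken from two points in time.
    Returns None if no requests were served or the counters were reset.
    """
    delta_requests = curr[count_field] - prev[count_field]
    delta_ns = curr[duration_field] - prev[duration_field]
    if delta_requests <= 0 or delta_ns < 0:
        return None
    return delta_ns / delta_requests / 1_000_000  # ns -> ms


# Two hypothetical samples taken 10 seconds apart.
earlier = {"writeReq": 100, "writeReqDurationNs": 2_000_000_000}
later = {"writeReq": 150, "writeReqDurationNs": 2_500_000_000}
print(mean_request_ms(earlier, later))
```

Keep in mind this is only an average over the interval, so it says nothing about slow outliers.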
There is no measurement within the _internal database that contains the total disk space used by the database. In order to collect that, your best bet would be to use the exec plugin to execute a simple bash script with a du command, as described in this StackOverflow question.
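If you’d rather not shell out to du, the same idea can be sketched in Python: walk the data directory, add up the file sizes, and print one point in line protocol for the exec plugin to pick up. Everything named here is an assumption for illustration: the measurement name influxdb_disk_usage is made up, and /var/lib/influxdb is only the usual default data path on Linux packages, so adjust it to your install.

```python
import os
import sys


def dir_size_bytes(root):
    """Sum the sizes of all files under root (apparent size, not disk blocks)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # File vanished or is unreadable (e.g. compaction); skip it.
                continue
    return total


def to_line_protocol(path, size_bytes):
    """Format one point as InfluxDB line protocol with the path as a tag."""
    tag = path
    for ch in (",", "=", " "):
        tag = tag.replace(ch, "\\" + ch)  # escape special characters in tags
    return f"influxdb_disk_usage,path={tag} bytes={size_bytes}i"


if __name__ == "__main__":
    # Assumed default data directory; pass another path as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/influxdb"
    print(to_line_protocol(root, dir_size_bytes(root)))
```

The exec plugin would then need data_format = "influx" to parse that output. Note that this adds up apparent file sizes, so the number can differ slightly from what du reports in disk blocks.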
You might also want to keep an eye on Telegraf Issue #3945, which talks about adding this functionality directly to Telegraf.
I hope that helps!
Thank you for your answer @noahcrowley !
For the size of the DB, by reading this: https://github.com/influxdata/docs.influxdata.com/issues/227 , I found this command:
select sum(diskBytes)/(1024*1024) as db_size_mb from "_internal"."monitor"."shard" where time > now() - 10s group by hostname,"database"
Apparently this gives the size of the DB, but I don’t know if it is really its size. I have tested it, but with "_internal"."shard" instead of "_internal"."monitor"."shard" because I don’t have the monitor as a measurement. Do you think it gives the size of the database? I ask because I don’t know what it is actually measuring.
From the InfluxDB Glossary doc:
A shard contains the actual encoded and compressed data, and is represented by a TSM file on disk.
The _internal.shard measurement will give you the size on disk of the various shards, but it will not give you information about the size of the Write-Ahead-Log (or Hinted-Handoff Queue if you’re running the Enterprise version) or metadata stored on disk.
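One thing to watch when summing diskBytes: sum() adds up every point in the time window, so if the window holds more than one sample per shard, that shard is counted more than once. As a rough Python sketch of the safer calculation (latest sample per shard, then a total per database), assuming each row carries the database and id tags plus the diskBytes field; the row layout and function name are my own for illustration:

```python
from collections import defaultdict


def db_size_mb(rows):
    """Total shard size per database in MB, using each shard's latest sample.

    rows: dicts with "time", "database", "id" (shard id) and "diskBytes".
    """
    # Keep only the most recent sample for every (database, shard) pair.
    latest = {}
    for row in rows:
        key = (row["database"], row["id"])
        if key not in latest or row["time"] > latest[key]["time"]:
            latest[key] = row

    # Add up the shard sizes per database and convert bytes to MB.
    totals = defaultdict(float)
    for (database, _shard_id), row in latest.items():
        totals[database] += row["diskBytes"] / (1024 * 1024)
    return dict(totals)


# Hypothetical shard samples; shard "1" appears twice in the window.
samples = [
    {"time": 0, "database": "mydb", "id": "1", "diskBytes": 1048576},
    {"time": 10, "database": "mydb", "id": "1", "diskBytes": 2097152},
    {"time": 10, "database": "mydb", "id": "2", "diskBytes": 1048576},
    {"time": 10, "database": "_internal", "id": "3", "diskBytes": 524288},
]
print(db_size_mb(samples))
```

As noted above, this still only covers the shard data, not the WAL or metadata.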
You can have multiple databases on disk. What are you actually trying to monitor? Do you want to know the size of an individual database, or the total disk usage of InfluxDB?
Thanks for the info @noahcrowley
I am actually trying to monitor a single database that just stores data frequently, and to see its size on disk. But the total disk usage of InfluxDB would also be interesting.
New to the InfluxData documentation is a listing of the _internal measurement statistics and descriptions. See http://docs.influxdata.com/platform/monitoring/tools/measurements-internal/.