I am using Telegraf to write the value of a temperature sensor from an MQTT broker to InfluxDB. About every two seconds the sensor's value creates a new entry in the database. Every entry contains the following data:
table
_start
_stop
_time
_value
_field
_measurement
host
topic
This is a lot of data when I only need the value, the timestamp, and the MQTT topic, which I want to display in a chart via Grafana.
So here are my questions:
Is there a way to configure Telegraf in order to reduce the data that is written to a database?
Is there a way to calculate or show the data size that every entry in the database requires on my hard disk? I am planning to log a lot of sensors, so I am interested in the disk space required to log, let’s say, 1000 sensors that write a value every 2 seconds over the course of two years.
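As a rough starting point, you can at least count how many points such a setup would write; the bytes-per-point figure below is purely an assumption for illustration (real compressed size depends heavily on the data and must be measured):

```python
# Back-of-envelope estimate for 1000 sensors, one value every 2 seconds, 2 years.
sensors = 1000
interval_s = 2
years = 2

# Points written over the whole period (ignoring leap days).
points = sensors * (365 * 24 * 3600 // interval_s) * years
print(f"points written: {points:,}")

# ASSUMPTION: bytes per compressed point is a guess, not a measured figure.
# Actual on-disk size varies widely with timestamp regularity and value entropy.
bytes_per_point_guess = 2
print(f"rough size at {bytes_per_point_guess} B/point: "
      f"{points * bytes_per_point_guess / 1e9:.1f} GB")
```

The answer below explains why such a calculation can only be an upper-level guess.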
Filtering
You can filter the data before sending it to the output by keeping only specific keys and/or filtering on key values.
Have a look at the filtering docs; you just need to know what becomes a tag and what becomes a field in order to use the proper filter.
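For example, a minimal sketch of a Telegraf input using the metric-filtering options `fieldpass` and `tagexclude` (the broker address and topic are placeholders for your own setup):

```toml
[[inputs.mqtt_consumer]]
  ## Placeholders: adjust server and topic to your environment
  servers = ["tcp://localhost:1883"]
  topics = ["sensors/temperature"]
  data_format = "value"
  data_type = "float"

  ## Keep only the "value" field...
  fieldpass = ["value"]
  ## ...and drop the "host" tag while keeping "topic"
  tagexclude = ["host"]
```

The same `fieldpass`/`tagexclude` (and their counterparts `fielddrop`/`taginclude`) can also be set on an output plugin to filter everything that goes to InfluxDB.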
About the data size
InfluxDB is a columnar database, and I don’t think there is an easy way to check the space occupied by a single record (or point), since it doesn’t store the data in rows but uses compressed columns instead.
Therefore, even if you calculate the size of a point, that is not the space it will occupy in the database.
The best way I’ve found is to wait until a Retention Policy has been filled; from that moment on the occupied space should be steady (unless the structure of the data changes or new data/series are added).
Also note that “fresh” data are not yet compressed and therefore occupy much more space than compressed data.
To test it out, just gather the data from one sensor for a week or even less, then check the DB size; you can then expect to need roughly that amount of space for each sensor.
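Checking the DB size can be as simple as measuring the data directory; the paths below are the common defaults and are assumptions, so adjust them to your install:

```shell
# ASSUMPTION: default data directories; yours may differ (check your config).
# InfluxDB 1.x:
du -sh /var/lib/influxdb/data
# InfluxDB 2.x (OSS):
du -sh /var/lib/influxdb2/engine
```

Run it once before and once after your test week to see how much the single sensor added.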
You can also use an official utility to check the data size by measurement.