Inputs.file parsing

Hello,

I have a file that continuously gets overwritten, and I’m using inputs.file to read the file every interval and report into influxdb 2:

[[inputs.file]]
files = ["/var/lib/poller/status"]
data_format = "csv"
csv_header_row_count = 0
csv_column_names = ["host", "btime", "ctime", "utime", "One", "Five", "Fifteen", "tot_user", "uniq_user", "on_console"]
#csv_column_types = ["string", "int", "int", "int", "int", "int", "int", "int", "int", "int"]
csv_skip_rows = 0
csv_skip_columns = 0
csv_delimiter = " "
csv_trim_space = false

[[outputs.file]]
files = ["/var/log/telegraf/output"]

In the output I can see all lines get parsed and the batch gets written; however, only the last line of the file actually ends up stored each interval, despite what the log says.

Where am I going wrong here?

Can you provide a small sample of your csv file?

Sure! I’ve tried parsing this with both space-delimited CSV and a Grok parser, with the same results. Host names are scrubbed here.

hostname 1603576091 1623880591 1623372090 165 165 165 0 0 0
hostname 1578122694 1623880591 1623368404 135 135 135 0 0 0
hostname 1614793699 1623880591 1623352422 138 138 138 0 0 0
hostname 1548995647 1623880589 1622050254 207 207 207 0 0 0
hostname 1595386428 1623880589 1621532790 261 261 261 0 0 0
hostname 1591820780 1623880593 1616771857 204 204 204 0 0 0
hostname 1548996498 1623880589 1617345654 249 249 249 0 0 0
hostname 1548996515 1623880591 1606280815 189 189 189 0 0 0
hostname 1591821055 1623880589 1622848351 162 162 162 0 0 0
hostname 1584570844 1623880589 1623573031 105 105 105 0 0 0
hostname 1584737283 1623880595 1622841530 114 114 114 0 0 0
hostname 1584726333 1623880589 1623873783 63 63 63 0 0 0
hostname 1578123915 1623880593 1620399355 210 210 210 0 0 0

I have an idea why it may not work.
Could one of the columns be used as a timestamp?
Could one of the columns be used as a tag, e.g. the hostname?

I think the reason the data gets overwritten is a behaviour of InfluxDB that is not so obvious at first.
Whenever you write a point to InfluxDB that has the same measurement, tag set and timestamp as an existing point, the existing point gets overwritten. I don’t know the internals of the InfluxDB engine, but I think it makes some sense, because you could not run meaningful queries against rows that are indistinguishable from each other.

Your CSV rows all produce points with the same measurement, no tags at all, and the same timestamp.
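To make that concrete (shortened, and with made-up field types and timestamp), the first two rows of your sample come out of the parser as something roughly like the points below; because measurement, tag set and timestamp are identical, the second point simply replaces the first:

  file host="hostname",btime=1603576091i,One=165i,tot_user=0i 1623880600000000000
  file host="hostname",btime=1578122694i,One=135i,tot_user=0i 1623880600000000000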
A solution could be to add some uniqueness to your data rows.
For example, make the hostname a tag instead of a field.
Or take the timestamp from one of the columns, if one of them holds timestamp values (see the short sketch after the config below).
So this could be a solution:

[[inputs.file]]
  name_override = "csv"
  files = ["csvstatus.csv"]
  data_format = "csv"
  csv_column_names = ["host", "btime", "ctime", "utime", "One", "Five", "Fifteen", "tot_user", "uniq_user", "on_console"]
  csv_delimiter = " "
  csv_tag_columns = ["host"]

# file output only for debugging
[[outputs.file]]
  files = ["csvstatus.out"]
  influx_sort_fields = true
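If you would rather take the timestamp from the data itself instead of Telegraf’s collection time, the CSV parser can do that too. Just as a sketch, and assuming the ctime column is the Unix-seconds timestamp you actually want (that is a guess on my part), you would add two lines to the input section:

  # take the point timestamp from the "ctime" column, interpreted as Unix epoch seconds
  csv_timestamp_column = "ctime"
  csv_timestamp_format = "unix"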

Hi Franky,

Looks like you’re right on the money here.

I tagged the host field last night on your suggestion, and all the data came through.

Thank you very much for your assistance!