Inputs.file parsing

Hello,

I have a file that continuously gets overwritten, and I’m using inputs.file to read the file every interval and report into influxdb 2:

[[inputs.file]]
files = ["/var/lib/poller/status"]
data_format = "csv"
csv_header_row_count = 0
csv_column_names = ["host", "btime", "ctime", "utime", "One", "Five", "Fifteen", "tot_user", "uniq_user", "on_console"]
#csv_column_types = ["string", "int", "int", "int", "int", "int", "int", "int", "int", "int"]
csv_skip_rows = 0
csv_skip_columns = 0
csv_delimiter = " "
csv_trim_space = false

[[outputs.file]]
files = ["/var/log/telegraf/output"]

In the output I can see all lines get parsed and the batch gets written; however, only the last line of the file actually ends up stored each interval, despite what the log says.

Where am I going wrong here?

Can you provide a small sample of your csv file?

Sure! I’ve tried parsing this with both space-delimited CSV and a Grok parser, with the same results. Host names are scrubbed here.

hostname 1603576091 1623880591 1623372090 165 165 165 0 0 0
hostname 1578122694 1623880591 1623368404 135 135 135 0 0 0
hostname 1614793699 1623880591 1623352422 138 138 138 0 0 0
hostname 1548995647 1623880589 1622050254 207 207 207 0 0 0
hostname 1595386428 1623880589 1621532790 261 261 261 0 0 0
hostname 1591820780 1623880593 1616771857 204 204 204 0 0 0
hostname 1548996498 1623880589 1617345654 249 249 249 0 0 0
hostname 1548996515 1623880591 1606280815 189 189 189 0 0 0
hostname 1591821055 1623880589 1622848351 162 162 162 0 0 0
hostname 1584570844 1623880589 1623573031 105 105 105 0 0 0
hostname 1584737283 1623880595 1622841530 114 114 114 0 0 0
hostname 1584726333 1623880589 1623873783 63 63 63 0 0 0
hostname 1578123915 1623880593 1620399355 210 210 210 0 0 0

I have an idea why it may not work.
Could one of the columns be used as a timestamp?
Could one of the columns be used as a tag, e.g. the hostname?

I think the reason the data gets overwritten is a behaviour of InfluxDB that is not so obvious at first.
Whenever you write a point to InfluxDB that has the same measurement, tag set and timestamp as an existing point, the existing point gets overwritten. I don’t know the internals of the InfluxDB engine, but I think it makes some sense, because you could not run meaningful queries against rows that are indistinguishable from each other.

Your CSV rows all produce points with the same measurement, no tags at all, and the same timestamp.
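To make that concrete (shortened, and with made-up field types and timestamp), the first two rows of your sample come out of the parser as something roughly like the points below; because measurement, tag set and timestamp are identical, the second point simply replaces the first:

  file host="hostname",btime=1603576091i,One=165i,tot_user=0i 1623880600000000000
  file host="hostname",btime=1578122694i,One=135i,tot_user=0i 1623880600000000000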
A solution could be to add some uniqueness to your data rows.
For example, make the hostname a tag instead of a field.
Or take the timestamp from one of the columns, if one of them holds timestamp values (see the short sketch after the config below).
So this could be a solution:

[[inputs.file]]
  name_override = "csv"
  files = ["csvstatus.csv"]
  data_format = "csv"
  csv_column_names = ["host", "btime", "ctime", "utime", "One", "Five", "Fifteen", "tot_user", "uniq_user", "on_console"]
  csv_delimiter = " "
  csv_tag_columns = ["host"]

# file output only for debugging
[[outputs.file]]
  files = ["csvstatus.out"]
  influx_sort_fields = true
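If you would rather take the timestamp from the data itself instead of Telegraf’s collection time, the CSV parser can do that too. Just as a sketch, and assuming the ctime column is the Unix-seconds timestamp you actually want (that is a guess on my part), you would add two lines to the input section:

  # take the point timestamp from the "ctime" column, interpreted as Unix epoch seconds
  csv_timestamp_column = "ctime"
  csv_timestamp_format = "unix"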

Hi Franky,

Looks like you’re right on the money here.

I tagged the host field last night on your suggestion, and all the data came through.

Thank you very much for your assistance!