Telegraf: read, handle, and write big log files

Hello everyone. As I am new here, apologies in advance if I miss something, duplicate a topic, or don’t fill this out as needed.
I’m using a csv file with more than 10K lines which is generated on the server.
I did my best to find out how to handle the following problems or situations:

  1. In the case this big csv file is updated once per day, is it possible to run Telegraf and, after reading the whole file, stop? I don’t want to run it at an interval of every 10s or so…

  2. In the case the file is updated after the whole file has been read, how can I get only the new metrics that came with the update and write only them to InfluxDB?

In my case, Telegraf loads/reads the whole file every 10s and writes it to InfluxDB as unique lines. I’ve tried inputs.file, inputs.logparser (using a log file and grok patterns), and inputs.tail.

Hello @Zarko,

Thank you for your question! The tricky part of your problem is that your data is in csv format. The first solution that comes to mind would be to create a script (or use something like this csv-to-influx script) that converts your csv points to line protocol in a txt file. Then I would extend the script so that it appends updated values to the end of the txt file. If you use inputs.tail with from_beginning = false, then Telegraf will only collect those points once. I’ll let you know if I think of a more elegant solution.

These days Telegraf has pretty good support for csv, so I think it won’t be a problem, and it sounds like you already have that going. You can run this with the tail plugin as suggested by @Anaisdg to get the read-only-once behavior. However, if the file is updated on a daily basis, I suggest leaving from_beginning = true so that the entire file is read. It will still only be read once, start to end.
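For reference, a minimal inputs.tail fragment along those lines might look like this (a sketch only, not a complete config; the path and column names follow the sample data in this thread):

```toml
[[inputs.tail]]
  files = ["/var/log/mylog.csv"]
  # Read the whole file once, start to end, when Telegraf starts.
  from_beginning = true
  data_format = "csv"
  csv_column_names = ["user_id", "free_flag", "timestamp", "user_ip",
                      "user_action_type", "company_id", "reference_id"]
  csv_timestamp_column = "timestamp"
  csv_timestamp_format = "unix"
```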


Thanks @daniel.

@Zarko, here’s a blog post with an example of how to use the Telegraf csv plugin, in case you need it:
https://www.influxdata.com/blog/how-to-write-points-from-csv-to-influxdb/


@daniel Thanks for your answer. I’ve tried the tail plugin, but in this case it registers that the file is updated yet couldn’t write any lines to InfluxDB.

So, I would say that the tail plugin can read but cannot write to the db.

Maybe you can show your tail config and a few lines from the file?

Yes, of course.

From csv log file:

5cbe5580-6540-4376-9638-40055f8e4ee4,1,1559122530,207.46.13.92,retailer_view,4d02377c-7120-4107-83d3-3dead5a054c0,520b72d8-9d2c-4a23-846a-626d566e4bcb

5cbe5580-7990-44c8-886a-40055f8e4ee4,0,1559122530,207.46.13.92,retailer_logo,560cdc2c-126c-4515-b44b-0ed35f8e4e0e,5804c8d1-f6b8-402c-a9a3-774d5f8e4ee4

5cbe5580-83b8-4183-8d01-40055f8e4ee4,1,1559122530,207.46.13.92,gallery_image,568b9cf2-9420-4059-97a5-5bdb5f8e4ee4,56a74b43-da2c-4ff1-aab1-78b45f8e4ee4

5cbe5580-92f4-42ce-ad23-40055f8e4ee4,0,1559122530,207.46.13.92,gallery_image,568b9cf2-9420-4059-97a5-5bdb5f8e4ee4,56a749b1-6a48-4528-92ff-7b695f8e4ee4

Config file:

[global_tags]

[agent]
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
debug = true
quiet = false
logfile = ""
hostname = ""
omit_hostname = false

[[outputs.influxdb]]
urls = ["http://influxdb:8086"]
database = "Telegraf"
retention_policy = "autogen"

[[inputs.tail]]
files = ["/var/log/mylog.csv"]
from_beginning = true
pipe = false
watch_method = "inotify"
data_format = "csv"
csv_column_names = ["user_id", "free_flag", "timestamp", "user_ip", "user_action_type", "company_id", "reference_id"]
csv_header_row_count = 0
csv_skip_rows = 0
csv_skip_columns = 0
csv_comment = "#"
csv_measurement_column = "measurement_name"
csv_tag_columns = ["tag_key"]
csv_timestamp_column = "timestamp"
csv_timestamp_format = "unix"
fieldpass = ["user_id", "free_flag", "timestamp", "user_ip", "user_action_type", "company_id", "reference_id"]

Console screenshot: (image not reproduced here)

@Anaisdg Sorry, I forgot to mention you, and thanks for your answer.
I had already used that link; it helped me in the beginning, but afterwards I ran into new issues.


It looks like it was able to write successfully; the measurement name would be tail because the column named in csv_measurement_column = "measurement_name" could not be found.
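In other words, either drop csv_measurement_column (since no such column exists in this csv) or point it at a real column. To control the measurement name without a dedicated column, one option is the standard name_override setting, as in this sketch (the measurement name here is just an example):

```toml
[[inputs.tail]]
  files = ["/var/log/mylog.csv"]
  from_beginning = true
  data_format = "csv"
  # No "measurement_name" column exists in the csv, so don't reference one;
  # set the measurement explicitly instead.
  name_override = "user_actions"
  csv_column_names = ["user_id", "free_flag", "timestamp", "user_ip",
                      "user_action_type", "company_id", "reference_id"]
  csv_timestamp_column = "timestamp"
  csv_timestamp_format = "unix"
```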