Wildcard in inputs.file "files" parameter leads to infinite loop?

John_Sobanski · April 23, 2019, 8:05pm

I have about 1,000 files each with a JSON array that contains 2,500 JSON records.

The files are named foo.bar.part0.txt, foo.bar.part1.txt … foo.bar.part1000.txt.

I set up the following config in telegraf.conf:

[[inputs.file]]
   files = ["foo.bar.*"]
   json_time_key = "time"
   json_time_format = "2006-01-02T15:04:05Z07:00"
   json_name_key = "mykey"
   tag_keys = ["tag1","tag2"]
   data_format = "json"

Telegraf successfully ingests each record and emits it to InfluxDB.

2019-04-23T19:59:41Z D! [outputs.influxdb] wrote batch of 1000 metrics in 55.601643ms
2019-04-23T19:59:41Z D! [outputs.influxdb] wrote batch of 1000 metrics in 77.459865ms
2019-04-23T19:59:41Z D! [outputs.influxdb] wrote batch of 1000 metrics in 134.56589ms
2019-04-23T19:59:41Z D! [outputs.influxdb] wrote batch of 1000 metrics in 59.871681ms
2019-04-23T19:59:42Z D! [outputs.influxdb] wrote batch of 1000 metrics in 420.667414ms
2019-04-23T19:59:42Z D! [outputs.influxdb] buffer fullness: 10000 / 10000 metrics. 
2019-04-23T19:59:42Z D! [outputs.influxdb] wrote batch of 1000 metrics in 76.68497ms

When I log into influx I see the series.

The only issue is that this import has run for hours.

It’s recording about 5k JSON messages per second, so the roughly 2.5M records (1,000 files × 2,500 records each) should have taken only about 500 seconds (around 8 minutes) to import.
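For anyone checking the estimate, here is the arithmetic as a small Python sketch. The function name is made up for illustration, and the inputs are the approximate figures quoted above (about 1,000 files, 2,500 records per file, about 5,000 records per second).

```python
def estimate_seconds(files, records_per_file, records_per_second):
    """Time for ONE pass over all files at a steady ingest rate."""
    total_records = files * records_per_file
    return total_records / records_per_second


seconds = estimate_seconds(files=1000, records_per_file=2500, records_per_second=5000)
print(f"{seconds:.0f} s, about {seconds / 60:.1f} min")  # prints "500 s, about 8.3 min"
```

This only covers a single pass over the files; an import that runs for hours suggests the files are being read more than once.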

Did I mistakenly create an infinite loop with my wildcard approach?

Does Telegraf keep track of the files it’s already ingested?

MarcV · April 23, 2019, 10:09pm

Hi @John_Sobanski,

Is the import still ongoing?

The files are parsed every interval (that is the infinite loop). From the plugin README:

# File Input Plugin

The file plugin updates a list of files every interval and parses the contents using the selected input data format (telegraf/DATA_FORMATS_INPUT.md at master · influxdata/telegraf · GitHub).

Files will always be read in their entirety.
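To make the README wording concrete, here is a toy Python model of that behaviour. It is not Telegraf's code: the function names are hypothetical, and it only assumes what the README says, namely that the file list is refreshed every interval and each file is read in full, with no memory of earlier passes.

```python
import glob
import json


def gather_once(pattern):
    """Expand the glob and parse every matching JSON-array file in full."""
    records = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as fh:
            records.extend(json.load(fh))
    return records


def run_intervals(pattern, intervals):
    """Run one full pass per interval; nothing is remembered between passes."""
    return [len(gather_once(pattern)) for _ in range(intervals)]
```

In this model every pass returns the same record count, because the same files are parsed again each time. That matches an import that never finishes on its own.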

John_Sobanski · April 24, 2019, 1:01pm

Thanks! In my use case I have 1k legacy files that I would like to import once.

It appears that inputs.file re-reads every file each interval (hence the infinite loop). Is that correct?

I’ve tried these alternate approaches and the last one works.

1. Set interval to a really high number (10 hours)
   - Result: Nothing happens; it looks like I may have to wait 10 hours for it to start
2. Use inputs.tail and set from_beginning to true
   - Result: 2019-04-24T15:36:20Z E! [inputs.tail]: Error in plugin: open foo.bar.part1001.txt: too many open files
3. Have one giant newline-delimited JSON file with all (3M) records, use inputs.tail and set from_beginning to true
   - Result: Works! All 2.5M+ records ingested at a rate of thousands per second
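
The last approach needs the per-file JSON arrays flattened into a single newline-delimited file first. A minimal Python sketch of that step follows. It is not from the thread: the function names and the output filename all_records.ndjson are hypothetical, and it assumes each input file holds exactly one JSON array that fits in memory.

```python
import glob
import json
import re


def part_number(path):
    """Return the integer N from a name ending in 'partN.txt' (-1 if absent)."""
    match = re.search(r"part(\d+)\.txt$", path)
    return int(match.group(1)) if match else -1


def merge_to_ndjson(pattern, out_path):
    """Write each record of each matching JSON-array file as one line of out_path."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        # Sort numerically so part2 comes before part10.
        for path in sorted(glob.glob(pattern), key=part_number):
            with open(path, encoding="utf-8") as fh:
                for record in json.load(fh):
                    out.write(json.dumps(record) + "\n")
                    count += 1
    return count
```

Usage would look like merge_to_ndjson("foo.bar.part*.txt", "all_records.ndjson"). Pick an output name that the glob does not match, otherwise a second run would try to read the merged file as input.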
