Wildcard in inputs.file "files" parameter leads to infinite loop?

I have about 1,000 files each with a JSON array that contains 2,500 JSON records.

The files are named foo.bar.part0.txt, foo.bar.part1.txt, ..., foo.bar.part1000.txt.

I set up the following config in telegraf.conf:

[[inputs.file]]
   files = ["foo.bar.*"]
   json_time_key = "time"
   json_time_format = "2006-01-02T15:04:05Z07:00"
   json_name_key = "mykey"
   tag_keys = ["tag1","tag2"]
   data_format = "json"

Telegraf successfully ingests each record and emits it to InfluxDB.

2019-04-23T19:59:41Z D! [outputs.influxdb] wrote batch of 1000 metrics in 55.601643ms
2019-04-23T19:59:41Z D! [outputs.influxdb] wrote batch of 1000 metrics in 77.459865ms
2019-04-23T19:59:41Z D! [outputs.influxdb] wrote batch of 1000 metrics in 134.56589ms
2019-04-23T19:59:41Z D! [outputs.influxdb] wrote batch of 1000 metrics in 59.871681ms
2019-04-23T19:59:42Z D! [outputs.influxdb] wrote batch of 1000 metrics in 420.667414ms
2019-04-23T19:59:42Z D! [outputs.influxdb] buffer fullness: 10000 / 10000 metrics. 
2019-04-23T19:59:42Z D! [outputs.influxdb] wrote batch of 1000 metrics in 76.68497ms

When I log into influx I see the series.

The only issue is that this import has run for hours.

It’s recording about 5k JSON messages per second, so the 2.5M records (1,000 files × 2,500 records each) should only have taken about 500 seconds (roughly 8 minutes) to ingest.

Did I mistakenly create an infinite loop with my wildcard approach?

Does Telegraf keep track of the files it’s already ingested?

Hi @John_Sobanski,

Is the import still ongoing?

The files are parsed each interval (that is the infinite loop). From the plugin docs:

# File Input Plugin

The file plugin updates a list of files every interval and parses the contents using the selected input data format (telegraf/DATA_FORMATS_INPUT.md at master · influxdata/telegraf · GitHub).

Files will always be read in their entirety.
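
To make the cadence concrete, that interval comes from the agent settings (10s by default), so with a rough sketch like the one below every file matched by the glob is re-read and re-parsed in full on each cycle:

[[agent]] is shown here only for illustration:

[agent]
   ## How often inputs are gathered; on every interval the file plugin
   ## re-globs "foo.bar.*" and re-parses each matched file in its entirety,
   ## emitting the same records again.
   interval = "10s"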

Thanks! In my use case I have 1k legacy files that I would like to import once.

It appears that inputs.file reads every file over and over, once per interval (the infinite loop). Is that a correct statement?

I’ve tried these alternate approaches and the last one works.

  • Set interval to a really high number (10 hours)
    • Result: Nothing happens - looks like I may have to wait 10 hours for it to start
  • Use inputs.tail and set from_beginning to true
    • Result: 2019-04-24T15:36:20Z E! [inputs.tail]: Error in plugin: open foo.bar.part1001.txt: too many open files
  • Have one giant newline-delimited JSON file with all (3M) records, use inputs.tail and set from_beginning to true
    • Result: Works! All 2.5M+ records ingested at a rate of thousands per second (see the config sketch below)
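
For anyone else who hits this, here is a minimal sketch of the tail config that worked for me; foo.bar.all.json is just a placeholder name for the concatenated newline-delimited file:

[[inputs.tail]]
   ## Placeholder path for the single concatenated newline-delimited JSON file
   files = ["foo.bar.all.json"]
   ## Read the existing contents of the file instead of only newly appended lines
   from_beginning = true
   data_format = "json"
   json_time_key = "time"
   json_time_format = "2006-01-02T15:04:05Z07:00"
   json_name_key = "mykey"
   tag_keys = ["tag1","tag2"]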