I have about 1,000 files, each containing a JSON array of 2,500 JSON records.
The files are named foo.bar.part0.txt, foo.bar.part1.txt, …, foo.bar.part1000.txt.
I set up the following config in telegraf.conf:

[[inputs.file]]
  files = ["foo.bar.*"]
  data_format = "json"
  json_time_key = "time"
  json_time_format = "2006-01-02T15:04:05Z07:00"
  json_name_key = "mykey"
  tag_keys = ["tag1","tag2"]
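As a sanity check that the wildcard matches all of the part files, here is a small Python sketch using fnmatch, whose `*` semantics are comparable to Telegraf's glob matching for this simple pattern; the file names are reconstructed from the naming scheme above:

```python
import fnmatch

# Rebuild the file names from the naming scheme in the post and check
# that the "foo.bar.*" pattern from the config matches every part file.
names = [f"foo.bar.part{i}.txt" for i in range(1000)]
matched = [n for n in names if fnmatch.fnmatch(n, "foo.bar.*")]
print(len(matched))  # expect 1000: every part file is matched
```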
Telegraf successfully ingests each record and writes it to InfluxDB:
2019-04-23T19:59:41Z D! [outputs.influxdb] wrote batch of 1000 metrics in 55.601643ms
2019-04-23T19:59:41Z D! [outputs.influxdb] wrote batch of 1000 metrics in 77.459865ms
2019-04-23T19:59:41Z D! [outputs.influxdb] wrote batch of 1000 metrics in 134.56589ms
2019-04-23T19:59:41Z D! [outputs.influxdb] wrote batch of 1000 metrics in 59.871681ms
2019-04-23T19:59:42Z D! [outputs.influxdb] wrote batch of 1000 metrics in 420.667414ms
2019-04-23T19:59:42Z D! [outputs.influxdb] buffer fullness: 10000 / 10000 metrics.
2019-04-23T19:59:42Z D! [outputs.influxdb] wrote batch of 1000 metrics in 76.68497ms
When I log into InfluxDB, I can see the series.
The only issue is that this import has been running for hours.
It's writing about 5,000 JSON messages per second, so ingesting all 2.5 million records (1,000 files x 2,500 records) should have taken only about 500 seconds, roughly 8 minutes.
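The back-of-the-envelope arithmetic behind that estimate:

```python
# Expected ingest time from the numbers in the post
num_files = 1000
records_per_file = 2500
total_records = num_files * records_per_file  # 2,500,000 records
rate = 5000                                   # observed records/second
expected_seconds = total_records / rate
print(total_records, expected_seconds)        # 2500000 records, 500.0 seconds
```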
Did I accidentally create an infinite loop with my wildcard pattern?
Does Telegraf keep track of the files it has already ingested?
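To rule out duplicates inside the source data itself, I could check that records within a file have unique timestamps. A minimal local sketch, assuming each file holds one top-level JSON array with the "time" key from the config (the inline sample data here is made up, standing in for reading an actual part file):

```python
import json

# Stand-in for json.load(open("foo.bar.part0.txt")) - made-up sample records
records = json.loads("""[
    {"time": "2019-04-23T19:59:41Z", "mykey": "m", "tag1": "a", "tag2": "b"},
    {"time": "2019-04-23T19:59:42Z", "mykey": "m", "tag1": "a", "tag2": "b"}
]""")
times = [r["time"] for r in records]
duplicates = len(times) - len(set(times))
print(duplicates)  # 0 means no repeated timestamps in this file
```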